ERR: raise on invalid columns using a fixed HDFStore #13492
Comments
|
not really sure what you are doing. pls show an exact reproduction.
|
jreback
added Can't Repro IO HDF5
labels
Jun 20, 2016
amanhanda
commented
Jun 20, 2016
|
I am using the HDFStore interface. With your code snippet, please try and reset_index() on the returned frame, when the format="fixed"
|
|
I c. Well that's not really supported; you must have strings for column names. We did a fix for tables IIRC. want to do a pull-request? |
jreback
added Difficulty Novice Error Reporting Effort Low and removed Can't Repro
labels
Jun 20, 2016
jreback
added this to the
Next Major Release
milestone
Jun 20, 2016
jreback
changed the title from
DataFrame reset_index() fails when data frame read from HDF5. to ERR: raise on invalid columns using a fixed HDFStore
Jun 20, 2016
amanhanda
commented
Jun 21, 2016
|
The index name is string in the source data frame. Storing it to hdf5 and retrieving it back is when the type changes to numpy.string_. I have not done a pull request before. This would be my first. Will give it a shot. |
|
fixed is not very respectful of attributes like this |
|
Hi! I'm at the sprints at PyCon and am looking to pick this up! Managed to reproduce the issue even though for the type I get:
In [27]: type(s1.index.name)
Out[27]: numpy.str_
instead of Same issue arises when reading the table with In terms of expected behavior, I'm not entirely certain what we want here - should we be casting the |
|
Also can confirm that this doesn't happen with |
|
@makmanalp yeah, I think the best thing to do would be to cast |
|
On my python3 installation, I'm finding that |
|
Ugh that's unfortunate. I guess we should know the encoding inside the HDF reader. |
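For anyone following along, here is a minimal sketch of the kind of cast being discussed, assuming the name comes back either as a NumPy unicode scalar or as NumPy bytes. The helper name `to_native_str` and the default `utf-8` encoding are assumptions made for illustration; this is not the code from the eventual fix.

```python
import numpy as np


def to_native_str(name, encoding="utf-8"):
    """Cast a NumPy string scalar to a built-in str (hypothetical helper)."""
    if isinstance(name, bytes):
        # numpy.bytes_ subclasses bytes, so it needs decoding; the
        # encoding is an assumption and must match what was written.
        return name.decode(encoding)
    if isinstance(name, str):
        # numpy.str_ subclasses str; str() returns the built-in type.
        return str(name)
    # None, tuples, numbers and so on are passed through untouched.
    return name


plain_from_unicode = to_native_str(np.str_("rows"))
plain_from_bytes = to_native_str(np.bytes_(b"rows"))
```

The bytes branch is where the encoding matters: without knowing which encoding the writer used, the reader can only guess.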
|
Single-file example for easy reproduction:
import datetime

import numpy as np
import pandas as pd

# Both axes carry a plain-str name before the round trip.
idx = pd.Index(pd.to_datetime([datetime.date(2000, 1, 1), datetime.date(2000, 1, 2)]), name='cols')
idx1 = pd.Index(pd.to_datetime([datetime.date(2010, 1, 1), datetime.date(2010, 1, 2)]), name='rows')
s = pd.DataFrame(np.arange(4).reshape(2, 2), columns=idx, index=idx1)

# Write with the fixed format, then read the frame back.
with pd.HDFStore("test.h5", "w") as store:
    store.put("test", s, format="fixed")
with pd.HDFStore("test.h5", "r") as store:
    s1 = store["test"]
# s1.reset_index() |
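Until a fix is available, one possible user-side workaround is to cast the axis names back to built-in `str` after reading. The sketch below does not touch HDF5 at all: it builds a frame whose names are `numpy.str_` to mimic what was reported above, and `with_plain_axis_names` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def with_plain_axis_names(df):
    """Return a copy of df whose axis names are built-in str (hypothetical helper)."""

    def plain(name):
        # numpy.str_ subclasses str; str() gives back the built-in type.
        return str(name) if isinstance(name, str) else name

    out = df.copy()
    out.index = out.index.set_names([plain(n) for n in out.index.names])
    out.columns = out.columns.set_names([plain(n) for n in out.columns.names])
    return out


# Stand-in for a frame read back from a fixed-format store.
frame = pd.DataFrame(
    np.arange(4).reshape(2, 2),
    index=pd.Index(pd.to_datetime(["2010-01-01", "2010-01-02"]), name=np.str_("rows")),
    columns=pd.Index(pd.to_datetime(["2000-01-01", "2000-01-02"]), name=np.str_("cols")),
)

cleaned = with_plain_axis_names(frame)
flat = cleaned.reset_index()
```

This only covers unicode names; names that come back as bytes would also need decoding with the right encoding.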
makmanalp
added a commit
to makmanalp/pandas
that referenced
this issue
May 23, 2017
|
|
makmanalp |
48b91c5
|
makmanalp
referenced
this issue
May 23, 2017
Merged
BUG: Handle numpy strings in index names in HDF #13492 #16444
|
So, I just made a PR, it's just a first stab at the issue but hopefully it's in the right direction! Please let me know how happy you are with this fix and what I can do to get it release-ready! |
jreback
modified the milestone: 0.20.2, Next Major Release
Jun 2, 2017
TomAugspurger
closed this
in #16444
Jun 4, 2017
TomAugspurger
added a commit
that referenced
this issue
Jun 4, 2017
|
|
makmanalp + TomAugspurger |
18c316b
|
TomAugspurger
added a commit
that referenced
this issue
Jun 4, 2017
|
|
makmanalp + TomAugspurger |
7286bc7
|
amanhanda commented Jun 20, 2016
Code Sample
Expected Output
output of pd.show_versions()
# Problem occurs in 0.16.2 and 0.18.1