hierarchical index + frame_table + data_columns=True -> TypeError #4710

Closed
kghose opened this issue Aug 30, 2013 · 6 comments · Fixed by #4716
Labels
Bug IO HDF5 read_hdf, HDFStore

Comments

@kghose

kghose commented Aug 30, 2013

import pandas as pd, numpy

r = numpy.empty((3,4))
index = pd.MultiIndex.from_tuples([('A','a'), ('A','b'), ('B','a'), ('B','b')])
df = pd.DataFrame(r, columns=index)

store = pd.HDFStore('df.h5')
store.put('data',df) #->OK
store.put('data1',df,table=True) #-> Ok
store.put('data2',df,table=True,data_columns=['A']) #-> Ok
store.put('data3',df,table=True,data_columns=True) #-> raises hell 
#TypeError: not all arguments converted during string formatting

store['data']['A'] #->OK
store['data1']['A'] #-> KeyError: u'no item named A'

In [43]: df = store['data']

In [44]: df
Out[44]: 
   A                            B               
   a             b              a              b
0  2 -1.727234e-77  2.964394e-323   0.000000e+00
1  0  0.000000e+00   0.000000e+00   0.000000e+00
2  0  0.000000e+00   0.000000e+00  8.344027e-309

In [45]: df = store['data1']

In [46]: df
Out[46]: 
   (A, a)        (A, b)         (B, a)         (B, b)
0       2 -1.727234e-77  2.964394e-323   0.000000e+00
1       0  0.000000e+00   0.000000e+00   0.000000e+00
2       0  0.000000e+00   0.000000e+00  8.344027e-309


pd.__version__ -> '0.12.0'
@jreback
Contributor

jreback commented Aug 30, 2013

hmm might be a bug; just specify the data columns that you need in any event

@jreback
Contributor

jreback commented Aug 30, 2013

@kghose
Author

kghose commented Aug 30, 2013

Once again, man: an instant reply. Thanks so much! Sorry I had to edit the code several times. I was copying and pasting from my REPL and my editor and made a hash of it. I verified that the code as it stands now replicates the issue.

Also, exploiting your kindness, could you send me a link on how to use a MultiIndex with select and select_as_multiple?

I see your link. Thanks!

Thanks again!!

@jreback
Contributor

jreback commented Aug 30, 2013

yep... this is a bug... actually, this wasn't really tested with a column mi

@jreback
Contributor

jreback commented Aug 31, 2013

@kghose

There was a somewhat related bug that the multi-index columns were not being recreated properly. The PR #4716 fixes this.

But, I have disallowed a multi-index when you specify data_columns, because this is quite complicated. If you are interested in solving it, go for it.
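
For readers who hit this restriction, one possible workaround (a sketch, not something proposed in this thread) is to flatten the column MultiIndex into plain string names before storing with data_columns=True, and to rebuild it after reading. The helper names flatten_columns and restore_columns and the separator SEP are hypothetical, and the approach assumes that no column label contains the separator and that all labels are strings.

```python
import numpy as np
import pandas as pd

SEP = "_"  # hypothetical separator; assumed not to occur in any column label


def flatten_columns(df, sep=SEP):
    """Return a copy of df whose MultiIndex columns are joined into flat strings."""
    flat = df.copy()
    flat.columns = [sep.join(map(str, labels)) for labels in df.columns]
    return flat


def restore_columns(flat, sep=SEP):
    """Rebuild MultiIndex columns from the flat names made by flatten_columns."""
    restored = flat.copy()
    restored.columns = pd.MultiIndex.from_tuples(
        [tuple(name.split(sep)) for name in flat.columns]
    )
    return restored


columns = pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'b'), ('B', 'a'), ('B', 'b')])
df = pd.DataFrame(np.zeros((3, 4)), columns=columns)

flat = flatten_columns(df)  # flat column names such as 'A_a'

# With PyTables installed, the flat frame could then be written as a table, e.g.
#   store.put('data3', flat, format='table', data_columns=True)
# (format='table' is the spelling used by later pandas releases; the report
# above uses table=True from 0.12).

roundtrip = restore_columns(flat)  # MultiIndex columns again
```

The flat names are what a where clause would have to refer to, so queries are written against 'A_a' rather than against ('A', 'a').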

@kghose
Author

kghose commented Aug 31, 2013

Thanks @jreback . I'll pass for now on the mi ;-)
