BUG/ER: HDFStore write with empty frame reports an error (rather than succeeding) #4273

Closed
jreback opened this issue Jul 17, 2013 · 3 comments · Fixed by #4660

@jreback
Contributor

commented Jul 17, 2013

Writing an empty frame with invalid dtypes to an HDFStore raises; maybe it should just proceed (or is it the dtypes call that is actually wrong here? See #4272).

Came up in this question: http://stackoverflow.com/questions/17691912/problems-with-merging-on-disk-tables-with-millions-of-rows/17698740#17698740

In [26]: df = DataFrame(randn(10,2),columns=list('AB'))

In [28]: df['C'] = 'foo'

In [33]: df.to_hdf('test.h5','df',mode='w',table=True)

In [35]: pd.read_hdf('test.h5','df')
Out[35]: 
          A         B    C
0 -1.123712 -1.146515  foo
1  0.921705  1.800419  foo
2 -0.769236 -0.553307  foo
3 -0.747601 -1.783439  foo
4 -1.110340  1.601026  foo
5  0.743869 -2.135140  foo
6  1.033699  2.028479  foo
7 -0.755478 -1.060223  foo
8  0.079326 -2.671624  foo
9 -2.262756  0.406850  foo

In [36]: pd.read_hdf('test.h5','df').dtypes
Out[36]: 
A    float64
B    float64
C     object
dtype: object

In [37]: df[df.C=='bar']
Out[37]: 
Empty DataFrame
Columns: [A, B, C]
Index: []

In [38]: df[df.C=='bar'].dtypes
Out[38]: 
A   NaN
B   NaN
C   NaN
dtype: float64

In [39]: df[df.C=='bar'].to_hdf('test.h5','df',append=True)
TypeError: Cannot serialize the column [C] because
its data contents are [empty] object dtype
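A minimal sketch of two things related to the session above, assuming a recent pandas version rather than the 2013 one shown. First, it checks the dtypes of the empty selection (the question raised in #4272). Second, it guards the append with a hypothetical helper, `append_if_nonempty` (not a pandas API), that simply skips writing when there are no rows. This is a workaround sketch, not the fix that closed this issue; the actual write path needs the optional PyTables dependency.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the session above (random values; only dtypes matter).
df = pd.DataFrame(np.random.randn(10, 2), columns=list("AB"))
df["C"] = "foo"

# A boolean selection that matches no rows.
empty = df[df.C == "bar"]

# In recent pandas the empty selection keeps each column's original dtype
# instead of reporting NaN as in Out[38] above.
print(empty.dtypes)


def append_if_nonempty(path, key, frame):
    """Hypothetical helper: skip the append when there are no rows to write.

    Returns True if a write was attempted, False if the frame was empty.
    Writing requires the optional PyTables package.
    """
    if frame.empty:
        return False
    frame.to_hdf(path, key=key, format="table", append=True)
    return True


# The empty selection is skipped, so no HDF5 file is touched here.
print(append_if_nonempty("test.h5", "df", empty))
```

Skipping the call keeps the on-disk table unchanged, which is what "just proceed" amounts to for a frame with no rows.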
@abrakababra

commented Aug 21, 2013

The DataFrame does not have to be completely empty: rows that are entirely NaN are simply dropped when using append=True in to_hdf.
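The rule being described (a row is dropped only when every value in it is NaN) can be previewed in memory before writing. A minimal sketch using `DataFrame.dropna(how="all")`, which applies that same all-NaN test to an in-memory frame; it illustrates the rule and is not the code path `to_hdf` uses internally. The example frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Row 1 is entirely NaN; row 2 is only partly NaN.
frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, np.nan, np.nan]})

# how="all" removes only the rows in which every column is NaN,
# so the partly-NaN row 2 survives.
kept = frame.dropna(how="all")
print(kept)
```

Later pandas versions expose a `dropna` argument on `HDFStore.append`, with its default taken from the `io.hdf.dropna_table` option; check the documentation for the version you run before relying on either behavior.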

@jreback

Contributor Author

commented Aug 21, 2013

@abrakababra that is a separate issue. I thought I had an issue open for the all-NaN row dropping (which is done mainly for efficiency), but agreed, there should be a way to turn it off.

@jreback

Contributor Author

commented Aug 21, 2013

see #4625
