HDFStore file size mysteriously increases #2132

Closed
murbard opened this Issue Oct 26, 2012 · 1 comment


murbard commented Oct 26, 2012

To reproduce the bug, do the following:

import pandas

df = pandas.DataFrame([['a', 'b'] for i in range(1, 1000)])
store = pandas.HDFStore('test.h5')
store['x'] = df
store.close()

The file test.h5 is now about 1.1M (seems like a lot, but OK).

Now let's reopen the file:

store = pandas.HDFStore('test.h5')  # open it again
store['x'] = df  # do the same thing as before!
store.close()

If you look at store['x'] before and after, it holds the exact same DataFrame.

And yet, the file test.h5 is now 2.1M!

I've had h5 files that were initially a couple of MB grow to over 5.5 GB through repeated loads and saves.
I'm using pandas version 0.9.0.

EDIT:
stackoverflow reference

EDIT: This seems to be a limitation of HDF5. Assigning to an existing key is a delete followed by a write, but the delete does not free any space in the file, so each reassignment makes the file bigger. See http://www.hdfgroup.org/hdf5-quest.html#del
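
For anyone who lands here with a bloated file: since deleted space is not reclaimed in place, one workaround is to copy everything into a fresh file and swap it in. Below is a minimal sketch of that idea, not an official pandas recipe. The helper name rewrite_store is made up for illustration, and it assumes PyTables is installed, that every key fits in memory, and that nothing else has the file open. Note that it rewrites every object with put() in the default format, so objects originally stored as appendable tables would need extra handling.

```python
import os
import tempfile

import pandas as pd


def rewrite_store(path):
    """Copy every key in the HDF5 file at `path` into a new file, then replace the original.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".h5", dir=directory)
    os.close(fd)
    try:
        with pd.HDFStore(path, mode="r") as old, pd.HDFStore(tmp_path, mode="w") as new:
            for key in old.keys():
                # Each object is read fully into memory and written once,
                # so the new file carries no space left over from deletes.
                new.put(key, old[key])
        os.replace(tmp_path, path)
    except Exception:
        # Leave the original file untouched if anything goes wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling rewrite_store('test.h5') after the repeated assignments above should leave a file holding only the live data. The ptrepack utility that ships with PyTables and the h5repack tool from the HDF5 distribution do the same kind of copy from the command line.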

@wesm wesm closed this Nov 3, 2012

@hadim hadim referenced this issue in bnoi/scikit-tracker Apr 29, 2014

Closed

HDF5 limitation #36
