<h2>Freeing Space in HDF5</h2>
<h4>Using Pandas PyTables</h4>
    
A Stack Overflow (SO) [question](http://stackoverflow.com/questions/1124994/removing-data-from-a-hdf5-file) from 2009 was followed by [another](http://stackoverflow.com/questions/11194927/deleting-information-from-an-hdf5-file) in 2012 on the same issue.
    
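The HDF5 docs referenced in the latter question still state that space is not freed upon object/node removal. The snippet below confirms this: after all three large frames are removed, the file still occupies 153.01MiB even though only the small frame remains.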

A later follow-up was posted on SO [here](http://stackoverflow.com/questions/21090243/release-hdf5-disk-memory-after-table-or-node-removal-with-pytables-or-pandas); it describes a mechanism to reclaim this space, which is applied directly in the second cell of this notebook.
 

In [152]:
import pandas as pd
import os
import numpy as np


small_df = pd.DataFrame([[1,2,3], [3,2,1]])
large_df = pd.DataFrame(np.random.randint(9,size=(1000,10000)))
filename = 'test.h5'

# Step 1: create the file with only the small frame and record its size.
with pd.HDFStore(filename, mode='w') as store:
    store.put('test/small_df', small_df)
    print('keys: %s' % ', '.join(store.keys()))

print('Size of %s is %.2fMiB' % (filename, float(os.stat(filename).st_size)/1024**2))

# Step 2: append three copies of the large frame under separate nodes.
with pd.HDFStore(filename, mode='a') as store:
    store.put('test/n1/large_df', large_df)
    store.put('test/n2/large_df', large_df)
    store.put('test/n3/large_df', large_df)
    print('keys: %s' % ','.join(store.keys()))

print('Size of %s is %.2fMiB' % (filename, float(os.stat(filename).st_size)/1024**2))

# Step 3: remove the large frames again and check whether the file shrinks.
with pd.HDFStore(filename, mode='a') as store:
    for i in range(1,4):
        store.remove('test/n%s/large_df' % i)
    print('keys: %s' % ','.join(store.keys()))

print('Size of %s is %.2fMiB' % (filename, float(os.stat(filename).st_size)/1024**2))



keys: /test/small_df
Size of test.h5 is 0.01MiB
keys: /test/small_df,/test/n1/large_df,/test/n2/large_df,/test/n3/large_df
Size of test.h5 is 229.38MiB
keys: /test/small_df
Size of test.h5 is 153.01MiB


<h4>Reclaiming space</h4>
We use the `ptrepack` command-line utility, which ships with PyTables, to repack the HDF5 file into a new file that no longer carries the unused space.

In [153]:
from subprocess import call
outfilename = 'out.h5'
# Copy every node from `filename` into `outfilename`, recompressing with blosc.
command = ["ptrepack", "-o", "--chunkshape=auto", "--propindexes", "--complevel=9", 
           "--complib=blosc",filename, outfilename]
print('Size of %s is %.2fMiB' % (filename, float(os.stat(filename).st_size)/1024**2))
if call(command) != 0:
    print('Error')
else:
    # Report the repacked file under its own name (the label previously said `filename`).
    print('Size of %s is %.2fMiB' % (outfilename, float(os.stat(outfilename).st_size)/1024**2))


Size of test.h5 is 153.01MiB
Size of out.h5 is 0.01MiB
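
The repacking step leaves two files on disk: the bloated original and the compact copy. Below is a minimal sketch of how the copy might be swapped in for the original once `ptrepack` succeeds. The helper names `build_ptrepack_command` and `repack_in_place` are hypothetical (they are not part of pandas or PyTables), and the `runner` parameter is an assumption added so the external call can be stubbed out; the `ptrepack` flags are the same ones used in the cell above.

```python
import os
import subprocess
import tempfile


def build_ptrepack_command(src, dst, complevel=9, complib="blosc"):
    """Return the ptrepack argument list used in the cell above (hypothetical helper)."""
    return [
        "ptrepack", "-o", "--chunkshape=auto", "--propindexes",
        "--complevel=%d" % complevel, "--complib=%s" % complib,
        src, dst,
    ]


def repack_in_place(path, runner=subprocess.call):
    """Repack `path` into a temporary file and swap it in on success.

    Returns (size_before, size_after) in bytes. Hypothetical helper.
    """
    size_before = os.stat(path).st_size
    directory = os.path.dirname(os.path.abspath(path))
    # Reserve a unique name next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".h5", dir=directory)
    os.close(fd)
    os.remove(tmp_path)  # let ptrepack create the destination file itself
    try:
        if runner(build_ptrepack_command(path, tmp_path)) != 0:
            raise RuntimeError("ptrepack failed for %s" % path)
        os.replace(tmp_path, path)  # swap the compact copy in for the original
    finally:
        # Clean up a leftover temporary file if repacking failed part-way.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return size_before, os.stat(path).st_size
```

Calling `repack_in_place('test.h5')` after the removal step would be expected to leave a single, compact `test.h5`. The original is only replaced when `ptrepack` exits with status 0, so a failed run leaves it untouched.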
