https://dzone.com/articles/quick-hdf5-pandas

In [2]:
import numpy as np 
import pandas as pd

In [3]:
# create (or open) an HDF5 file; the default mode is append ('a')
hdf = pd.HDFStore('storage.h5')

In [5]:
# build a 5x3 DataFrame of random values
df = pd.DataFrame(np.random.rand(5,3), columns=('A','B','C'))

In [6]:
df

          A         B         C
0  0.920246  0.982024  0.677537
1  0.484375  0.795836  0.584861
2  0.058648  0.010145  0.612402
3  0.332054  0.295081  0.937325
4  0.503101  0.599171  0.899550


In [7]:
# put the dataset in the storage as a queryable table
hdf.put('d1', df, format='table', data_columns=True)

In [8]:
hdf['d1'].shape

(5, 3)

The data in the storage can be manipulated. For example, we can append new data to the dataset we just created:

In [10]:
hdf.append('d1', pd.DataFrame(np.random.rand(5,3), 
           columns=('A','B','C')), 
           format='table', data_columns=True)
hdf.close()  # close the file
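
The append step can be checked with a short sketch. It assumes PyTables is installed and writes to a temporary file (the name append_demo.h5 is made up for this example) so that storage.h5 is left alone:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical scratch file, so the example does not touch storage.h5
path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")

first = pd.DataFrame(np.random.rand(5, 3), columns=["A", "B", "C"])
second = pd.DataFrame(np.random.rand(5, 3), columns=["A", "B", "C"])

# The context manager closes the file even if an error occurs
with pd.HDFStore(path) as store:
    # Only datasets stored with format='table' can be appended to
    store.put("d1", first, format="table", data_columns=True)
    store.append("d1", second, format="table", data_columns=True)
    combined = store["d1"]

print(combined.shape)
```

The shape grows from (5, 3) to (10, 3). Each appended frame keeps its own index labels, so the combined index is not renumbered.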

There are several ways to open an HDF5 store. We could call the HDFStore constructor again, but the function read_hdf also lets us query the data as it is read:

In [13]:
# this query selects the columns A and B
# for the rows where the value of A is greater than 0.5
selected = pd.read_hdf('storage.h5','d1',where=['A>.5'], columns=['A','B'])
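
As a rough illustration of how such a query behaves, the sketch below uses small fixed values instead of random ones, so the selected rows are predictable. The file name query_demo.h5 and the values are assumptions made for this example. It also shows that HDFStore.select accepts the same where and columns arguments on an already open store:

```python
import os
import tempfile

import pandas as pd

# Hypothetical scratch file and fixed values, chosen so the result is predictable
path = os.path.join(tempfile.mkdtemp(), "query_demo.h5")
frame = pd.DataFrame({
    "A": [0.1, 0.6, 0.9, 0.3],
    "B": [1.0, 2.0, 3.0, 4.0],
    "C": [5.0, 6.0, 7.0, 8.0],
})

# data_columns=True makes every column usable in a where clause
frame.to_hdf(path, key="d1", format="table", data_columns=True)

# Query while reading, without opening the store explicitly
via_read_hdf = pd.read_hdf(path, "d1", where="A > 0.5", columns=["A", "B"])

# The same query through an open, read-only store
with pd.HDFStore(path, mode="r") as store:
    via_select = store.select("d1", where="A > 0.5", columns=["A", "B"])

print(via_read_hdf)
```

Only the rows with index 1 and 2 satisfy A > 0.5, and column C is left out of the result. Querying on a column works here because the dataset was written in table format with data_columns=True.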

At this point the store contains a single dataset. The contents of a store can be organized into groups. In the following example we add three more datasets to the HDF5 file: two in one group and a third in a different group:

In [14]:
hdf = pd.HDFStore('storage.h5')
hdf.put('tables/t1', pd.DataFrame(np.random.rand(20,5)))
hdf.put('tables/t2', pd.DataFrame(np.random.rand(10,3)))
hdf.put('new_tables/t1', pd.DataFrame(np.random.rand(15,2)))

In [15]:
hdf

<class 'pandas.io.pytables.HDFStore'>
File path: storage.h5
/d1                       frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index],dc->[A,B,C])
/new_tables/t1            frame        (shape->[15,2])                                                   
/tables/t1                frame        (shape->[20,5])                                                   
/tables/t2                frame        (shape->[10,3])                                                   

On the left we can see the hierarchy of the groups in the store, in the middle the type of each dataset, and on the right the list of attributes attached to it. Attributes are pieces of metadata that can be attached to objects in the file. The ones shown here are created automatically by pandas to record the information it needs to recover the data from the HDF5 file.
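
The same information can be read programmatically. The sketch below is one possible way to do it, assuming PyTables is installed. It rebuilds a store with the same layout in a temporary file (groups_demo.h5 is a made-up name), lists the keys, filters them by group prefix, and reads the pandas_type attribute that pandas attaches to each dataset:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical scratch file that mirrors the layout of storage.h5
path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")

with pd.HDFStore(path) as store:
    store.put("d1", pd.DataFrame(np.random.rand(5, 3), columns=["A", "B", "C"]),
              format="table", data_columns=True)
    store.put("tables/t1", pd.DataFrame(np.random.rand(20, 5)))
    store.put("tables/t2", pd.DataFrame(np.random.rand(10, 3)))
    store.put("new_tables/t1", pd.DataFrame(np.random.rand(15, 2)))

    # Every dataset is addressed by its full path inside the file
    keys = sorted(store.keys())

    # Datasets that live in the 'tables' group
    in_tables = [k for k in keys if k.startswith("/tables/")]

    # pandas_type is one of the attributes pandas writes for each dataset
    kinds = {k: str(store.get_storer(k).attrs.pandas_type) for k in keys}

print(keys)
print(in_tables)
print(kinds)
```

The value of pandas_type corresponds to the middle column of the listing: frame_table for the appendable table and frame for the datasets stored in the default fixed format.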