### HDF5 Format

For numerical data, in particular for large data volumes, HDF5 is a popular format.

* HDF5 data is portable between applications; all common programming languages provide interfaces for it.
* Large data can be processed in chunks, i.e. the full data volume never has to be loaded into RAM, so data sets larger than the available memory can be handled.
* HDF5 implements a directory-like hierarchy: datasets are organized in groups, similar to files in the directories of a file system.
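A minimal sketch of chunked processing with h5py, assuming a one-dimensional dataset: slicing an h5py dataset reads only the selected elements from disk, so a loop over slices keeps memory use bounded by the chunk size. The file name, dataset name and the `chunked_mean` helper are illustrative, not part of h5py.

```python
import h5py
import numpy as np

# Write a test file (file and dataset names are just examples)
with h5py.File('chunk_demo.hdf5', 'w') as f:
    f.create_dataset('values', data=np.arange(1_000_000, dtype=np.float64))

def chunked_mean(path, name, chunk_size=100_000):
    """Mean of a 1-D dataset, reading at most chunk_size elements at a time."""
    total = 0.0
    count = 0
    with h5py.File(path, 'r') as f:
        dset = f[name]                      # no data is read yet
        n = dset.shape[0]
        for start in range(0, n, chunk_size):
            stop = min(start + chunk_size, n)
            block = dset[start:stop]        # only this slice is loaded into RAM
            total += block.sum()
            count += block.size
    return total / count

print(chunked_mean('chunk_demo.hdf5', 'values'))
```

Only one block is in memory at any time, so the same loop works for datasets much larger than RAM; the chunk size trades memory use against the number of read calls.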

In Python there are several modules for HDF5: h5py, PyTables and the HDF5 I/O functions of pandas (which build on PyTables).
The following examples use h5py.

HDF5 has many more features, e.g. deep directory (group) hierarchies, metadata attributes and sophisticated compression.
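A short sketch of three of these features with h5py: a nested group path, a gzip-compressed dataset and metadata attributes. The file name, group path and attribute names are made up for illustration; random numbers compress poorly, so the compression here only demonstrates the API.

```python
import h5py
import numpy as np

with h5py.File('features_demo.hdf5', 'w') as f:
    # Groups act like directories; intermediate groups are created automatically
    grp = f.create_group('experiment/run1')
    dset = grp.create_dataset(
        'signal',
        data=np.random.randn(100_000),
        compression='gzip',     # lossless filter that ships with HDF5
        compression_opts=4,     # gzip level 0-9
        chunks=True,            # compression needs chunked storage; h5py picks the chunk shape
    )
    # Attributes store small pieces of metadata next to the data
    dset.attrs['units'] = 'V'
    dset.attrs['sampling_rate_hz'] = 1000.0

with h5py.File('features_demo.hdf5', 'r') as f:
    dset = f['experiment/run1/signal']   # path-like access through the groups
    print(dset.compression, dict(dset.attrs))
```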

See e.g. https://www.pythonforthelab.com/blog/how-to-use-hdf5-files-in-python/ 

### HDF5 examples with h5py


In [None]:
# create hdf5 data set
import h5py
import numpy as np

arr = np.random.randn(10000) # 10k normally distributed numbers

f = h5py.File('rand1.hdf5', 'w')
f.create_dataset("default", data=arr) # single data set as 'default'
f.close()
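The cell above writes the whole array at once. When data arrives in pieces (e.g. from a measurement), a resizable dataset can be extended block by block instead; this is a sketch assuming one-dimensional float data, with an illustrative file name and block size.

```python
import h5py
import numpy as np

# Append data in blocks without ever holding the full array in RAM
with h5py.File('append_demo.hdf5', 'w') as f:
    dset = f.create_dataset('stream', shape=(0,), maxshape=(None,),
                            dtype='f8', chunks=(10_000,))
    for _ in range(5):
        block = np.random.randn(10_000)    # stand-in for one incoming block
        n = dset.shape[0]
        dset.resize((n + block.size,))     # grow the dataset along axis 0
        dset[n:] = block                   # write only the new block

with h5py.File('append_demo.hdf5', 'r') as f:
    print(f['stream'].shape)   # (50000,)
```

`maxshape=(None,)` makes the first axis unlimited; resizable datasets must use chunked storage, hence the explicit `chunks`.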

In [None]:
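# read back from the hdf5 file
f = h5py.File('rand1.hdf5', 'r')
data = f['default']     # h5py Dataset; the values stay on disk until accessed
print(np.min(data))     # NumPy reads the data in one go (builtin min() iterates element by element)
print(np.max(data))
print(data[:15])        # slicing returns a NumPy array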

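#### HDF5 dataset vs. NumPy array
An h5py dataset behaves much like a NumPy array: it has a shape and a dtype, supports slicing, and can be passed to NumPy and Matplotlib functions. Its content can be copied into an in-memory NumPy array, e.g. with `data[:]`.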
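A small sketch of the similarities and differences, using a made-up file with the integers 0 to 19: the dataset object itself is not a NumPy array, but every read (a slice or `[()]` for the whole dataset) returns one.

```python
import h5py
import numpy as np

with h5py.File('compare_demo.hdf5', 'w') as f:
    f.create_dataset('default', data=np.arange(20, dtype=np.int64))

with h5py.File('compare_demo.hdf5', 'r') as f:
    dset = f['default']
    print(type(dset).__name__, dset.shape, dset.dtype)  # Dataset (20,) int64
    part = dset[5:10]             # reads only these 5 elements from disk
    print(type(part).__name__, part)                    # ndarray [5 6 7 8 9]
    full = dset[()]               # reads the whole dataset into memory
    print(full.sum())                                   # 190
```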

In [None]:
import matplotlib.pyplot as plt
plt.hist(data,50);

In [None]:
nd = data[:]
print(type(data), type(nd))

**However:** an h5py dataset is only usable while its file is open:

In [None]:
f.close()
print(data[:15]) # raises an error: the dataset is invalid once its file is closed
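A sketch of the usual way around this, with an illustrative file name: copy the data you need into a NumPy array inside a `with` block. The copy survives closing the file, while the h5py dataset object becomes invalid (its truth value is then `False`).

```python
import h5py
import numpy as np

with h5py.File('rand_copy_demo.hdf5', 'w') as f:
    f.create_dataset('default', data=np.random.randn(10000))

with h5py.File('rand_copy_demo.hdf5', 'r') as f:
    values = f['default'][:]     # in-memory copy (numpy.ndarray)
    dset = f['default']          # h5py object, tied to the open file

# Leaving the with-block closes the file
print(values[:5])                # still works: plain NumPy array
print(bool(dset))                # False: the dataset identifier is no longer valid
```

The `with` statement also guarantees that the file is closed even if an exception occurs inside the block.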



#### Store several datasets

In [None]:
nv = 10000
af = np.random.randn(nv)
ai = np.random.randint(0,10,(nv,3))


with h5py.File('rand2.hdf5', 'w') as f:
    f.create_dataset("floats", data=af)
    f.create_dataset("ints", data=ai)

In [None]:
# open for reading
f = h5py.File('rand2.hdf5', 'r')  # explicit read-only mode (older h5py versions defaulted to 'a')
for k in f.keys():
    print(k)

In [None]:
df = f['floats']
print(df[:10])

In [None]:
di = f['ints']
print(di[:10])

In [None]:
# don't forget to close the file
f.close()
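`f.keys()` only lists the top level. To inventory a file with nested groups, `visititems` walks the whole hierarchy; this sketch collects name, shape and dtype of every dataset. The file name and the nested dataset are made up for the example.

```python
import h5py
import numpy as np

with h5py.File('rand2_demo.hdf5', 'w') as f:
    f.create_dataset('floats', data=np.random.randn(10000))
    f.create_dataset('ints', data=np.random.randint(0, 10, (10000, 3)))
    f.create_dataset('nested/more_floats', data=np.zeros(5))  # creates group 'nested'

found = {}

def collect(name, obj):
    # Called for every group and dataset below the root; returning None continues the walk
    if isinstance(obj, h5py.Dataset):
        found[name] = (obj.shape, obj.dtype)

with h5py.File('rand2_demo.hdf5', 'r') as f:
    f.visititems(collect)

for name, (shape, dtype) in sorted(found.items()):
    print(name, shape, dtype)
```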