# Purpose of the *h5py* package

- Provide a Python interface that stays as close as possible to the HDF5 C library
  and exposes almost all of its features.
- Map the HDF5 feature set as closely as possible onto *NumPy* features, e.g.:

  + high level type system uses NumPy *dtype* objects
  + attribute and naming conventions like NumPy
  + mapping between HDF5 errors and exceptions
  
h5py makes HDF5 "pythonic" but adds no features of its own (e.g. no indexing; see *PyTables* for that).
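A minimal sketch of the NumPy-style mapping listed above. The file and dataset names are illustrative, and the file lives only in memory (`driver="core"` with `backing_store=False`), so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file: the name is arbitrary and no file is created on disk.
with h5py.File("purpose-demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("x", data=np.arange(6, dtype="int16").reshape(2, 3))

    # The type system uses NumPy dtype objects, and attributes are named as in NumPy.
    dtype_is_numpy = isinstance(dset.dtype, np.dtype)
    shape = dset.shape

    # A failed HDF5 lookup surfaces as an ordinary Python exception.
    try:
        f["does-not-exist"]
        missing_raises_keyerror = False
    except KeyError:
        missing_raises_keyerror = True

print(dtype_is_numpy, shape, missing_raises_keyerror)
```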
  

# Installation

Conda (binary package incl. dependencies):

    conda install h5py
   
pip:

    pip install h5py
   
h5py Tutorial:
 
> Installing on Windows from source practically impossible because
> of the C library dependencies involved.
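After either install route, a quick check shows which h5py version is active and which HDF5 C library it was built against. This is only a sanity check, not part of the original notebook.

```python
import h5py

# Version of the h5py package and of the underlying HDF5 C library
h5py_version = h5py.version.version
hdf5_version = h5py.version.hdf5_version
print("h5py", h5py_version, "/ HDF5", hdf5_version)
```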


# Creating an HDF5 file

In [182]:
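# Close a file handle left over from a previous run of this notebook, if any
try:
    h5.close()
except NameError:
    pass  # h5 has not been defined yet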

In [183]:
!rm -f example.h5

In [184]:
import h5py
h5py.enable_ipython_completer()

In [185]:
h5 = h5py.File('example.h5', 'w')
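Outside an interactive session, a `with` block is a common alternative because the file is closed automatically. A small sketch, assuming a scratch path in a temporary directory (hypothetical, chosen so `example.h5` stays untouched):

```python
import os
import tempfile

import h5py

# Hypothetical scratch location for this example only
path = os.path.join(tempfile.mkdtemp(), "scratch.h5")

# 'w' creates the file (truncating an existing one); the with-block closes it.
with h5py.File(path, "w") as f:
    f.create_group("sim")

# 'r' opens read-only; 'a' would open read/write and create the file if missing.
with h5py.File(path, "r") as f:
    top_level = list(f.keys())

print(top_level)
```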

## Creating some groups and a link

The `File` object is also the *root* group:

In [186]:
h5.name

'/'

In [187]:
g_sim = h5.create_group('sim')

In [188]:
g_sim

<HDF5 group "/sim" (0 members)>

In [189]:
g1 = h5.create_group('/sim/001')

In [190]:
g2 = g_sim.create_group('002')

In [191]:
g2

<HDF5 group "/sim/002" (0 members)>

Path-like addressing:

In [192]:
g2.name

'/sim/002'

In [193]:
list(g_sim.keys())

['001', '002']
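Path-like addressing also works for nested names. The sketch below uses an in-memory file and made-up group names (`003`, `run-a`) to show that intermediate groups are created as needed and that a whole tree can be walked with `visit`.

```python
import h5py

with h5py.File("groups-demo.h5", "w", driver="core", backing_store=False) as f:
    # 'sim' and 'sim/003' do not exist yet; they are created along the way.
    deep = f.create_group("sim/003/run-a")

    # require_group returns the existing group instead of raising an error.
    same = f.require_group("sim/003/run-a")
    same_object = (deep == same)

    deep_name = deep.name
    parent_name = deep.parent.name
    has_path = "sim/003" in f

    # visit() calls the function with every object name below the group.
    names = []
    f.visit(names.append)

print(deep_name, parent_name, names)
```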

In [194]:
# g.create_dataset?

Creating a hard link (not a copy!):

In [195]:
h5['link-to-002'] = g2

In [196]:
list(h5.keys()) 

['link-to-002', 'sim']

<div class="alert alert-block alert-info">
When using h5py from Python 3, the keys(), values() and items() methods will return view-like objects instead of lists. These objects support containership testing and iteration, but can’t be sliced like lists.
</div>
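A short illustration of the note above, using an in-memory file with two made-up groups: membership tests and iteration work on the view, indexing does not, and `list(...)` gives a real list when one is needed.

```python
import h5py

with h5py.File("keys-demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_group("sim")
    f.create_group("raw")

    keys = f.keys()
    has_sim = "sim" in keys          # membership test works
    names = sorted(keys)             # iteration works

    try:
        keys[0]                      # views cannot be indexed or sliced
        sliceable = True
    except TypeError:
        sliceable = False

    as_list = list(keys)             # convert explicitly to index or slice

print(has_sim, names, sliceable, len(as_list))
```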

In [257]:
h5['link-to-002'] == h5['/sim/002']

True
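To make "not a copy" concrete, here is a sketch on an in-memory file: both names refer to one object, deleting one name leaves the object reachable through the other, and a soft link (which stores only a path) ends up dangling. The attribute name `note` and link name `soft` are illustrative.

```python
import h5py

with h5py.File("links-demo.h5", "w", driver="core", backing_store=False) as f:
    g = f.create_group("sim/002")
    f["link-to-002"] = g                      # second hard link to the same group

    g.attrs["note"] = "written via /sim/002"
    note_via_link = f["link-to-002"].attrs["note"]

    del f["sim/002"]                          # removes one name, not the object
    original_gone = "sim/002" not in f
    note_after_delete = f["link-to-002"].attrs["note"]

    # A soft link stores a path; its target no longer exists here.
    f["soft"] = h5py.SoftLink("/sim/002")
    try:
        f["soft"]
        soft_resolves = True
    except KeyError:
        soft_resolves = False

print(note_via_link, original_gone, soft_resolves)
```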

# Creating datasets

Datasets:

- like NumPy arrays: homogeneous collections of data elements, with an immutable datatype and (hyper)rectangular shape
- additionally: compression, error-detection, and chunked I/O
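The extra features in the list above are requested when the dataset is created. A sketch with illustrative values (chunk shape and gzip level are arbitrary choices, not recommendations), again on an in-memory file:

```python
import h5py

with h5py.File("filters-demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "compressed",
        shape=(1000, 100),
        dtype="float32",
        chunks=(100, 100),      # chunked I/O: data is stored in 100x100 blocks
        compression="gzip",     # transparent compression per chunk
        compression_opts=4,     # gzip level 0-9
        fletcher32=True,        # checksum per chunk for error detection
    )
    dset[...] = 0.0             # written and read like any other dataset

    chunks = dset.chunks
    compression = dset.compression
    checksum = dset.fletcher32

print(chunks, compression, checksum)
```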


In [198]:
dsA = g1.create_dataset("A", shape=(10000,10,11), dtype='float32')
# dsA = g1.create_dataset("A", shape=(10000,10,11), dtype='float32', compression=7)

In [199]:
dsA

<HDF5 dataset "A": shape (10000, 10, 11), type "<f4">

Or from existing data:

In [200]:
import numpy as np

In [201]:
arr = np.random.randn(200,3)

In [202]:
dsB = g1.create_dataset("B", data=arr)

In [203]:
list(dsB.dims.keys())

[<"" dimension 0 of HDF5 dataset at 140144441544440>,
 <"" dimension 1 of HDF5 dataset at 140144441544440>]

In [204]:
dsA[:,::2,:]=-np.arange(-1, -12, -1)

In [205]:
# `Dataset.value` was removed in h5py 3.0; `dsA[()]` reads the whole dataset
dsA[()]

array([[[  1.,   2.,   3., ...,   9.,  10.,  11.],
        [  0.,   0.,   0., ...,   0.,   0.,   0.],
        [  1.,   2.,   3., ...,   9.,  10.,  11.],
        ..., 
        [  0.,   0.,   0., ...,   0.,   0.,   0.],
        [  1.,   2.,   3., ...,   9.,  10.,  11.],
        [  0.,   0.,   0., ...,   0.,   0.,   0.]],

       [[  1.,   2.,   3., ...,   9.,  10.,  11.],
        [  0.,   0.,   0., ...,   0.,   0.,   0.],
        [  1.,   2.,   3., ...,   9.,  10.,  11.],
        ..., 
        [  0.,   0.,   0., ...,   0.,   0.,   0.],
        [  1.,   2.,   3., ...,   9.,  10.,  11.],
        [  0.,   0.,   0., ...,   0.,   0.,   0.]],

       [[  1.,   2.,   3., ...,   9.,  10.,  11.],
        [  0.,   0.,   0., ...,   0.,   0.,   0.],
        [  1.,   2.,   3., ...,   9.,  10.,  11.],
        ..., 
        [  0.,   0.,   0., ...,   0.,   0.,   0.],
        [  1.,   2.,   3., ...,   9.,  10.,  11.],
        [  0.,   0.,   0., ...,   0.,   0.,   0.]],

       ..., 
       [[  1.,   2., 
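Reading the whole dataset is rarely necessary. A sketch of NumPy-style slicing on a smaller stand-in dataset (shape chosen for illustration); only the requested hyperslab is read from the file.

```python
import h5py
import numpy as np

with h5py.File("slice-demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("A", shape=(100, 10, 11), dtype="float32")

    # Broadcast the 11 values into every second row along axis 1
    dset[:, ::2, :] = np.arange(1, 12)

    block = dset[0, :2, :3]     # reads only this 2x3 block
    everything = dset[...]      # reads the full array into memory

print(block)
print(everything.shape)
```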

In [206]:
!ls -al *.h5

-rw-rw-r-- 1 mcrot mcrot 4409240 Apr 10 16:06 example.h5


In [207]:
# dsA holds 10000*10*11 float32 elements of 4 bytes each; it dominates the file size
10000*10*11*4

4400000
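The raw payload of an uncompressed dataset can also be estimated from its shape and dtype instead of by hand. A sketch on an in-memory dataset with the same shape and dtype as `A`:

```python
import h5py

with h5py.File("size-demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("A", shape=(10000, 10, 11), dtype="float32")

    # number of elements times bytes per element
    raw_bytes = dset.size * dset.dtype.itemsize

print(raw_bytes)
```

The file on disk is slightly larger because of other datasets and HDF5 metadata.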

In [208]:
h5['/sim/002/C'] = np.random.randint(0,10,1000)

In [209]:
# del h5['/sim/002/C']

In [210]:
h5.flush()

In [211]:
# h5.close()

# Resizing a dataset

In [212]:
dsD = h5.create_dataset("unlimited", (100, 10), maxshape=(None, 10))
dsD[:] = np.random.randn(100,10)
dsD.size

In [213]:
h5.flush()

In [214]:
dsD

<HDF5 dataset "unlimited": shape (100, 10), type "<f4">

In [227]:
dsD.resize(200, axis=0)

In [228]:
dsD.size

2000

In [225]:
dsD[100:,:] = 1

In [226]:
h5.flush()

Inspect the result in *HDFView* or *HDF Compass*.
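A typical use of `maxshape` is appending batches of rows. The sketch below assumes batches of 50 rows and a dataset called `log` (both made up for illustration) and grows the first axis before each write.

```python
import h5py
import numpy as np

with h5py.File("resize-demo.h5", "w", driver="core", backing_store=False) as f:
    # Start empty; axis 0 may grow without limit, axis 1 is fixed at 10.
    log = f.create_dataset("log", shape=(0, 10), maxshape=(None, 10), dtype="float32")

    for i in range(3):
        batch = np.full((50, 10), float(i), dtype="float32")
        start = log.shape[0]
        log.resize(start + len(batch), axis=0)   # grow first
        log[start:] = batch                      # then fill the new rows

    final_shape = log.shape
    last_value = float(log[-1, 0])

print(final_shape, last_value)
```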

# Using dimension scales

In [215]:
list(dsB.dims.keys())

[<"" dimension 0 of HDF5 dataset at 140144441544440>,
 <"" dimension 1 of HDF5 dataset at 140144441544440>]

In [216]:
h5['position'] = np.arange(200)
h5['flag'] = [0,1,2]

In [217]:
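# Mark both datasets as dimension scales.
# `make_scale` (h5py >= 2.10) replaces the deprecated `dims.create_scale`.
h5['position'].make_scale('position')
h5['flag'].make_scale('flag')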

In [218]:
d = dsB.dims[0]

In [219]:
d.attach_scale(h5['position'])             # 200 values along axis 0
dsB.dims[1].attach_scale(h5['flag'])       # 3 values along axis 1

In [220]:
dsB.dims[0].label = 'position'
dsB.dims[1].label = 'flag'

In [221]:
[ d.label for d in dsB.dims ]

['position', 'flag']

In [222]:
h5.flush()
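The whole dimension-scale workflow in one place, as a sketch on an in-memory file. It assumes h5py 2.10 or newer for `Dataset.make_scale` and attaches each scale to the axis whose length it matches.

```python
import h5py
import numpy as np

with h5py.File("dims-demo.h5", "w", driver="core", backing_store=False) as f:
    data = f.create_dataset("B", data=np.zeros((200, 3)))
    f["position"] = np.arange(200)
    f["flag"] = [0, 1, 2]

    # 1. Turn the coordinate datasets into dimension scales
    f["position"].make_scale("position")
    f["flag"].make_scale("flag")

    # 2. Attach each scale to the matching axis of the data
    data.dims[0].attach_scale(f["position"])
    data.dims[1].attach_scale(f["flag"])

    # 3. Optionally label the axes
    data.dims[0].label = "position"
    data.dims[1].label = "flag"

    labels = [d.label for d in data.dims]
    # A scale can be looked up by name and read like any dataset
    flag_values = data.dims[1]["flag"][...].tolist()

print(labels, flag_values)
```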

# Attributes (Metadata)

In [236]:
dsB.attrs['experiment'] = 123
dsB.attrs['description'] = "random data and integers mixed"

In [237]:
list(dsB.attrs.keys())

['DIMENSION_LIST', 'DIMENSION_LABELS', 'experiment', 'description']

In [238]:
h5.flush()
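Attributes behave like a small dictionary attached to any group or dataset. A sketch with illustrative names (`calibration`, `creator`, `operator` are not part of the notebook):

```python
import h5py
import numpy as np

with h5py.File("attrs-demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("B", data=np.zeros((4, 3)))

    dset.attrs["experiment"] = 123
    dset.attrs["description"] = "random data and integers mixed"
    dset.attrs["calibration"] = np.array([0.5, 1.5])   # small arrays are fine too
    f.attrs["creator"] = "tutorial"                     # the root group has attrs as well

    experiment = int(dset.attrs["experiment"])
    calibration = dset.attrs["calibration"].tolist()
    operator = dset.attrs.get("operator", "unknown")    # default for a missing key
    names = sorted(dset.attrs)

print(experiment, calibration, operator, names)
```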

# References

HDF5 supports not only hard and soft links but also *references*.

<div class="alert alert-block alert-info">
 References are low-level pointers to other objects which can be used like data.
</div>

For example, you can create datasets of references or store a reference in an attribute, e.g. as a pointer to a time vector:

In [254]:
dsB.attrs['D'] = dsD.ref

In [255]:
h5.flush()

In [256]:
h5[dsB.attrs['D']]

<HDF5 dataset "unlimited": shape (200, 10), type "<f4">
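Because references can be used like data, they can also be stored as dataset elements. A sketch on an in-memory file with made-up dataset names (`time`, `signal`, `refs`); `h5py.special_dtype(ref=h5py.Reference)` supplies the element type.

```python
import h5py
import numpy as np

with h5py.File("refs-demo.h5", "w", driver="core", backing_store=False) as f:
    time = f.create_dataset("time", data=np.linspace(0.0, 1.0, 5))
    signal = f.create_dataset("signal", data=np.zeros(5))

    # Reference stored in an attribute: a pointer from the signal to its time vector
    signal.attrs["time"] = time.ref
    time_name = f[signal.attrs["time"]].name

    # References stored as dataset elements
    ref_dtype = h5py.special_dtype(ref=h5py.Reference)
    refs = f.create_dataset("refs", shape=(2,), dtype=ref_dtype)
    refs[0] = time.ref
    refs[1] = signal.ref

    # Dereference by indexing the file with each reference
    ref_targets = [f[r].name for r in refs[...]]

print(time_name, ref_targets)
```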