# Purpose of package *h5py*

- Provide a Python interface that stays as close as possible to the interface of the HDF5 C library,
  exposing almost all of its features, and
- map the feature set of HDF5 as closely as possible onto *NumPy* features, e.g.:

  + high level type system uses NumPy *dtype* objects
  + attribute and naming conventions like NumPy
  + mapping between HDF5 errors and exceptions
  
h5py makes HDF5 "pythonic" but adds no new features on top of it (e.g. no indexing; see *PyTables* for that).
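The three mappings listed above can be illustrated with a minimal sketch. The file name `purpose_demo.h5`, the dataset name `temperature` and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "purpose_demo.h5")

with h5py.File(path, "w") as f:
    data = np.arange(6, dtype="float32").reshape(2, 3)
    ds = f.create_dataset("temperature", data=data)

    # Type system: the HDF5 datatype is exposed as a NumPy dtype object.
    dtype_is_numpy = isinstance(ds.dtype, np.dtype)

    # Naming conventions: shape, dtype and slicing work as on an ndarray.
    shape = ds.shape
    first_row = ds[0, :]  # reading a slice returns a NumPy array

    # Error mapping: asking for a missing object raises a plain KeyError.
    try:
        f["does-not-exist"]
        missing_raises_keyerror = False
    except KeyError:
        missing_raises_keyerror = True
```
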
  

# Installation

Conda (binary package incl. dependencies):

    conda install h5py
   
pip:

    pip install h5py
   
From the h5py tutorial:

> Installing on Windows from source is practically impossible because
> of the C library dependencies involved.


# Creating an HDF5 file

In [37]:
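# Close a file handle left over from a previous run of the notebook, if any
try:
    h5.close()
except NameError:
    pass  # h5 has not been defined yet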

In [38]:
!rm example.h5

rm: cannot remove 'example.h5': No such file or directory


In [39]:
import h5py
h5py.enable_ipython_completer()

In [40]:
h5 = h5py.File('example.h5', 'w')

## Creating some groups and a link

The `File` object is also the *root* group:

In [41]:
h5.name

'/'

In [42]:
g_sim = h5.create_group('sim')

In [43]:
g_sim

<HDF5 group "/sim" (0 members)>

In [44]:
g1 = h5.create_group('/sim/001')

In [45]:
g2 = g_sim.create_group('002')

In [46]:
g2

<HDF5 group "/sim/002" (0 members)>

Groups are addressed with path-like names:

In [47]:
g2.name

'/sim/002'

In [48]:
list(g_sim.keys())

['001', '002']
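A small sketch of path-like addressing in isolation, assuming a throwaway file in a temporary directory (the file name `paths_demo.h5` is hypothetical). It relies on h5py creating intermediate groups automatically and on `require_group` returning an existing group instead of failing.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "paths_demo.h5")

with h5py.File(path, "w") as f:
    # Absolute path: the intermediate group "sim" is created automatically.
    g = f.create_group("/sim/001")
    # Relative path, resolved against the group it is called on (here: root).
    f.create_group("sim/002")

    name = g.name                # full path of the group
    parent_name = g.parent.name  # path of the enclosing group
    has_nested = "sim/002" in f  # membership tests accept paths, too
    children = sorted(f["sim"].keys())

    # require_group opens the group if it exists and creates it otherwise.
    same_group = f.require_group("/sim/001") == g
```
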

In [49]:
# g.create_dataset?

Creating a hard link (not a copy!):

In [50]:
h5['link-to-002'] = g2

In [51]:
list(h5.keys()) 

['link-to-002', 'sim']

<div class="alert alert-block alert-info">
When using h5py from Python 3, the keys(), values() and items() methods will return view-like objects instead of lists. These objects support containership testing and iteration, but can’t be sliced like lists.
</div>
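The note above can be checked with a short sketch. The file and group names are made up for illustration.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "views_demo.h5")

with h5py.File(path, "w") as f:
    f.create_group("sim")
    f.create_group("log")

    keys = f.keys()        # a view-like object, not a list
    has_sim = "sim" in keys  # containership testing works
    names = sorted(keys)     # iteration works

    # Indexing or slicing the view directly is not supported ...
    try:
        keys[0]
        subscriptable = True
    except TypeError:
        subscriptable = False

    # ... so convert to a list first when positions are needed.
    first_two = list(keys)[:2]
```
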

In [52]:
h5['link-to-002'] == h5['/sim/002']

True
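That a hard link is a second name for the same object, not a copy, can be sketched as follows. The soft link `soft-to-002` and the dataset `x` are additions made for this illustration and are not part of the session above.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "links_demo.h5")

with h5py.File(path, "w") as f:
    g2 = f.create_group("/sim/002")
    f["link-to-002"] = g2                         # hard link
    f["soft-to-002"] = h5py.SoftLink("/sim/002")  # soft link (stores a path)

    # Data written through the link shows up under the original name.
    f["link-to-002"].create_dataset("x", data=[1, 2, 3])
    visible = "x" in f["/sim/002"]
    same_object = f["link-to-002"] == f["/sim/002"]

    # getlink=True returns the link itself instead of the object it points to.
    hard_kind = type(f.get("link-to-002", getlink=True)).__name__
    soft_target = f.get("soft-to-002", getlink=True).path

    # Removing one name does not delete the group while another hard link exists.
    del f["/sim/002"]
    hard_survives = "x" in f["link-to-002"]
```
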

# Creating datasets

Datasets:

- like NumPy arrays: homogeneous collections of data elements, with an immutable datatype and (hyper)rectangular shape
- additionally: compression, error-detection, and chunked I/O
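The additional features are requested through keyword arguments of `create_dataset`. The following is a sketch only; the chunk shape, the gzip level and the smaller dataset shape are arbitrary choices for illustration.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "filters_demo.h5")

with h5py.File(path, "w") as f:
    ds = f.create_dataset(
        "A",
        shape=(1000, 10, 11),
        dtype="float32",
        chunks=(100, 10, 11),  # chunked I/O: data is stored in blocks of this shape
        compression="gzip",    # compression filter
        compression_opts=4,    # gzip level (0-9)
        fletcher32=True,       # checksum per chunk for error detection
    )
    ds[:, ::2, :] = -1
    f.flush()  # make sure the chunks are written before measuring

    chunks = ds.chunks
    compression = ds.compression
    checksum = ds.fletcher32

    logical_bytes = ds.size * ds.dtype.itemsize  # uncompressed payload
    stored_bytes = ds.id.get_storage_size()      # bytes allocated in the file
```

Because the data is highly regular, `stored_bytes` ends up well below `logical_bytes`.
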


In [53]:
dsA = g1.create_dataset("A", shape=(10000,10,11), dtype='float32')
# dsA = g1.create_dataset("A", shape=(10000,10,11), dtype='float32', compression=7)

In [54]:
dsA

<HDF5 dataset "A": shape (10000, 10, 11), type "<f4">

Or from existing data:

In [55]:
import numpy as np

In [56]:
arr = np.random.randn(200,3)

In [57]:
dsB = g1.create_dataset("B", data=arr)

In [58]:
list(dsB.dims.keys())

[<"" dimension 0 of HDF5 dataset at 140144435541800>,
 <"" dimension 1 of HDF5 dataset at 140144435541800>]

In [59]:
dsA[:,::2,:]=-1

In [60]:
dsA[()]  # read the whole dataset; Dataset.value was removed in h5py 3.0

array([[[-1., -1., -1., ..., -1., -1., -1.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., -1., -1., ..., -1., -1., -1.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., -1., -1., ..., -1., -1., -1.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

       [[-1., -1., -1., ..., -1., -1., -1.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., -1., -1., ..., -1., -1., -1.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., -1., -1., ..., -1., -1., -1.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

       [[-1., -1., -1., ..., -1., -1., -1.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., -1., -1., ..., -1., -1., -1.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., -1., -1., ..., -1., -1., -1.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

       ..., 
       [[-1., -1., -1., ..., -1., -1., -1.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., -1., -1., ..., -1., -1., 

In [61]:
!ls -al *.h5

-rw-rw-r-- 1 mcrot mcrot 4409240 Apr 10 15:42 example.h5


In [62]:
# Expected payload of dsA: 10000*10*11 float32 values, 4 bytes each
10000*10*11*4

4400000
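The size estimate can also be derived from the dataset itself instead of being typed in by hand. A minimal sketch, assuming a fresh file that contains only a dataset with the same shape and dtype as `dsA`:

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "size_demo.h5")

with h5py.File(path, "w") as f:
    ds = f.create_dataset("A", shape=(10000, 10, 11), dtype="float32")
    ds[:, ::2, :] = -1

    # Number of elements times bytes per element (float32 -> 4 bytes).
    expected_bytes = ds.size * ds.dtype.itemsize

# The file is slightly larger than the payload because of HDF5 metadata.
file_bytes = os.path.getsize(path)
overhead_bytes = file_bytes - expected_bytes
```
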

In [63]:
h5['/sim/002/C'] = np.random.randint(0,10,1000)

In [71]:
# del h5['/sim/002/C']

In [65]:
h5.flush()

In [66]:
# h5.close()

# Using dimension scales

In [81]:
list(dsB.dims.keys())

[<"x" dimension 0 of HDF5 dataset at 140144435541800>,
 <"y" dimension 1 of HDF5 dataset at 140144435541800>]

In [73]:
h5['position'] = np.arange(200)
h5['flag'] = [0,1,2]

In [74]:
# Dataset.make_scale replaces the deprecated dims.create_scale (h5py >= 2.10)
h5['position'].make_scale('position')
h5['flag'].make_scale('flag')

In [83]:
d = dsB.dims[0]

In [85]:
d.attach_scale(h5['position'])            # 200 entries, matches axis 0
dsB.dims[1].attach_scale(h5['flag'])      # 3 entries, matches axis 1

In [87]:
dsB.dims[0].label = 'position'
dsB.dims[1].label = 'flag'

In [91]:
[ d.label for d in dsB.dims ]

['position', 'flag']
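The cells above were run out of order, so here is the same workflow as one linear sketch. It assumes h5py 2.10 or later (for `Dataset.make_scale`) and attaches each scale to the axis whose length it matches; the file name is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "dims_demo.h5")

with h5py.File(path, "w") as f:
    ds = f.create_dataset("B", data=np.random.randn(200, 3))

    # 1. Create ordinary datasets that will serve as scales.
    f["position"] = np.arange(200)
    f["flag"] = [0, 1, 2]

    # 2. Mark them as dimension scales.
    f["position"].make_scale("position")
    f["flag"].make_scale("flag")

    # 3. Attach each scale to the matching axis of the data.
    ds.dims[0].attach_scale(f["position"])
    ds.dims[1].attach_scale(f["flag"])

    # 4. Optionally label the axes.
    ds.dims[0].label = "position"
    ds.dims[1].label = "flag"

    labels = [d.label for d in ds.dims]
    # dims[i][0] is the first scale attached to axis i.
    scale_lengths = [ds.dims[i][0].shape[0] for i in range(2)]
```
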

In [70]:
h5.flush()

# References

HDF5 supports not only hard and soft links but also *references*.

<div class="alert alert-block alert-info">
 References are low-level pointers to other objects which can be used like data.
</div>

For example, you can create datasets of references or store a reference in an attribute, e.g. as a pointer to a time vector.
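Both uses can be sketched as follows. The dataset names `time`, `signal` and `refs` and the attribute name `time_axis` are hypothetical; `h5py.special_dtype(ref=h5py.Reference)` is used here to obtain the reference dtype.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "refs_demo.h5")

with h5py.File(path, "w") as f:
    t = f.create_dataset("time", data=np.linspace(0.0, 1.0, 5))
    sig = f.create_dataset("signal", data=np.zeros(5))

    # Reference in an attribute: the signal points to its time vector.
    sig.attrs["time_axis"] = t.ref
    # Indexing the file (or any group) with a reference dereferences it.
    attr_target_is_time = f[sig.attrs["time_axis"]] == t

    # Dataset whose elements are references.
    ref_dtype = h5py.special_dtype(ref=h5py.Reference)
    refs = f.create_dataset("refs", shape=(2,), dtype=ref_dtype)
    refs[0] = t.ref
    refs[1] = sig.ref
    targets_match = [f[refs[0]] == t, f[refs[1]] == sig]
```

Unlike a soft link, a reference is not a name in a group; it is a value that has to be dereferenced explicitly.
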