# hdf5

- hide: false
- toc: true
- comments: true
- categories: [python]

In [1]:
import h5py
import numpy as np

Notes on basic HDF5 use from Python with h5py.

## Create a file

In [2]:
f = h5py.File('demo.hdf5', 'w')

In [3]:
data = np.arange(10)
data

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]:
f['array'] = data

In [5]:
dset = f['array']

In [6]:
dset

<HDF5 dataset "array": shape (10,), type "<i8">

In [7]:
dset[:]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]:
dset[[1, 2, 5]]

array([1, 2, 5])
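
The cells above keep the file handle `f` open across the whole notebook. As a minimal sketch of the same write-and-read round trip, the example below uses a context manager so the file is closed automatically. The scratch path `roundtrip.hdf5` is an assumption, chosen so the sketch does not overwrite `demo.hdf5`.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch location so the sketch leaves demo.hdf5 untouched.
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.hdf5')

data = np.arange(10)

# The context manager closes the file even if an error occurs inside the block.
with h5py.File(path, 'w') as f:
    f['array'] = data  # shorthand for creating a dataset from an array

with h5py.File(path, 'r') as f:
    dset = f['array']
    everything = dset[:]      # reads the whole dataset into a NumPy array
    picked = dset[[1, 2, 5]]  # index lists must be in increasing order
    shape = dset.shape

print(everything, picked, shape)
```

Slicing a dataset returns an ordinary NumPy array, so the values stay usable after the file is closed, while the `dset` object itself does not.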

Add more data. A key containing a slash, such as `full/dataset`, creates the intermediate group `full` automatically.

In [9]:
f['dataset'] = data

In [10]:
f['full/dataset'] = data

In [11]:
list(f.keys())

['array', 'dataset', 'full']

In [12]:
grp = f['full']

In [13]:
'dataset' in grp

True

In [14]:
list(grp.keys())

['dataset']
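
Groups behave much like dictionaries whose keys are paths. The sketch below repeats the implicit group creation from the cells above and, for comparison, creates a group explicitly with `create_group`. The names `raw/run1` and `values` and the scratch file are hypothetical, used only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'groups.hdf5')
data = np.arange(10)

with h5py.File(path, 'w') as f:
    # Assigning to a nested path creates the group 'full' implicitly.
    f['full/dataset'] = data

    # Groups can also be created explicitly; 'raw' is created on the way.
    sub = f.create_group('raw/run1')
    sub['values'] = data

    top_level = sorted(f.keys())
    nested_exists = 'full/dataset' in f  # membership tests accept full paths
    is_group = isinstance(f['full'], h5py.Group)
    is_dataset = isinstance(f['raw/run1/values'], h5py.Dataset)

print(top_level, nested_exists, is_group, is_dataset)
```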

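Create an empty dataset with an explicit shape and gzip compression. Compressed datasets are stored in chunks that are only allocated when they are written, so declaring a very large shape does not write that much data to disk.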

In [15]:
dset = f.create_dataset('/full/bigger', (10000, 1000, 1000, 1000), compression='gzip')
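
To see why declaring such a large shape is cheap, here is a scaled-down sketch under stated assumptions: the shape `(1000, 1000)`, the explicit `dtype='f4'`, and the scratch file name are all choices made for this example, not values from the cell above. It writes a few values and compares the file size with what the full array would occupy uncompressed.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'chunked.hdf5')

with h5py.File(path, 'w') as f:
    # Asking for compression makes h5py store the dataset in chunks.
    dset = f.create_dataset('full/bigger', shape=(1000, 1000),
                            dtype='f4', compression='gzip')
    chunks = dset.chunks            # chunk shape chosen automatically by h5py
    compression = dset.compression

    dset[0, :10] = np.arange(10)    # only the chunk touched here gets written
    first_values = dset[0, :3]
    unwritten = dset[999, 999]      # unwritten regions read back as the fill value

size_on_disk = os.path.getsize(path)
full_size = 1000 * 1000 * 4         # bytes needed for the uncompressed array

print(chunks, compression, size_on_disk, full_size)
```

The file ends up far smaller than the declared array because only the chunks that were actually written take up space.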

## Set attributes

In [16]:
dset.attrs

<Attributes of HDF5 object at 140618810188336>

Attributes also behave like a dictionary, so an attribute can be added like this:

In [17]:
dset.attrs['sampling frequency'] = 'Every other week between 1 Jan 2001 and 7 Feb 2010'
dset.attrs['PI'] = 'Fabian'

In [18]:
# Attributes iterate like dictionary items: each entry is a (name, value) pair
for i in dset.attrs.items():
    print(i)

('PI', 'Fabian')
('sampling frequency', 'Every other week between 1 Jan 2001 and 7 Feb 2010')
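
Attributes are stored in the file next to the dataset, so they survive closing and reopening it. A minimal sketch, assuming a scratch file and a made-up numeric attribute `scale` to show that attributes are not limited to strings:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'attrs.hdf5')

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('array', data=np.arange(10))
    dset.attrs['PI'] = 'Fabian'
    dset.attrs['scale'] = 0.5  # hypothetical numeric attribute

with h5py.File(path, 'r') as f:
    attrs = dict(f['array'].attrs)  # copy into a plain dict before the file closes
    has_pi = 'PI' in f['array'].attrs

for name in sorted(attrs):
    print(name, attrs[name])
```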


## Open file

In [19]:
f.close()

In [20]:
f = h5py.File('demo.hdf5', 'r')

In [21]:
list(f.keys())

['array', 'dataset', 'full']

In [22]:
dset = f['array']

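HDF5 files are organised hierarchically, which is what the "H" in the name (Hierarchical Data Format) stands for. Every object has a name that is its full path from the root group `/`.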

In [23]:
dset.name

'/array'

In [24]:
root = f['/']

In [25]:
list(root.keys())

['array', 'dataset', 'full']

In [26]:
list(f['full'].keys())

['bigger', 'dataset']
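
Listing keys one level at a time works for small files. For deeper hierarchies, `visititems` walks every object below a group. The sketch below rebuilds a small file with the same layout as the cells above (in a scratch location, which is an assumption) and records each object's path and type. The helper `record` is a hypothetical name.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'walk.hdf5')

with h5py.File(path, 'w') as f:
    f['array'] = np.arange(10)
    f['full/dataset'] = np.arange(10)

found = {}

def record(name, obj):
    # Called once per object; returning None keeps the walk going.
    found[name] = type(obj).__name__

with h5py.File(path, 'r') as f:
    f.visititems(record)
    dset = f['/full/dataset']      # absolute paths start at the root group
    dset_name = dset.name
    parent_name = dset.parent.name
    root_name = f['/'].name

print(found, dset_name, parent_name, root_name)
```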

## Sources

- [Managing Large Datasets with Python and HDF5 - O'Reilly Webcast](https://www.youtube.com/watch?v=wZEFoVUu8h0)