# h5glance

* `h5ls` shows too little, `h5ls -rv` shows too much
* `hdfview` needs X forwarding and a lot of clicking

## Terminal view

In [23]:
!h5glance sample.h5

sample.h5
├group
│ └subgroup
│   ├0	[float64: 10 × 5 × 0] (1 attributes)
│   ├1	[float64: 10 × 5 × 1] (1 attributes)
│   ├2	[float64: 10 × 5 × 2] (1 attributes)
│   ├3	[float64: 10 × 5 × 3] (1 attributes)
│   ├4	[float64: 10 × 5 × 4] (1 attributes)
│   ├5	[float64: 10 × 5 × 5] (1 attributes)
│   ├6	[float64: 10 × 5 × 6] (1 attributes)
│   ├7	[float64: 10 × 5 × 7] (1 attributes)
│   ├8	[float64: 10 × 5 × 8] (1 attributes)
│   └9	[float64: 10 × 5 × 9] (1 attributes)
└latest	-> group/subgroup/9



In [24]:
!h5glance sample.h5 group/subgroup/6

sample.h5/group/subgroup/6
      dtype: float64
      shape: 10 × 5 × 6
   maxshape: 10 × 5 × 6
     layout: Contiguous

sample data:
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]

1 attributes:
* square: 36



**Tab completion** for bash & zsh

## Notebook

In [25]:
import h5py
from h5glance import H5Glance

In [26]:
f = h5py.File('sample.h5', 'r')
f

<HDF5 file "sample.h5" (mode r)>

In [27]:
H5Glance(f)

📋: Copy path to clipboard

In [28]:
f['/group/subgroup/1']

<HDF5 dataset "1": shape (10, 5, 1), type "<f8">

# h5py

## File & Group

Files and groups behave like nested dictionaries:

In [29]:
f['group']

<HDF5 group "/group" (1 members)>

In [30]:
for key, value in f.items():
    print(key, '--',  value)

group -- <HDF5 group "/group" (1 members)>
latest -- <HDF5 dataset "latest": shape (10, 5, 9), type "<f8">


In [31]:
f['group']['subgroup']

<HDF5 group "/group/subgroup" (10 members)>

In [32]:
f['group/subgroup']

<HDF5 group "/group/subgroup" (10 members)>

In [33]:
f['group/subgroup/8']

<HDF5 dataset "8": shape (10, 5, 8), type "<f8">
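The dictionary analogy also covers walking a whole hierarchy. A minimal sketch using h5py's `visititems`, run on a throwaway file rather than `sample.h5` (the file name `walk_demo.h5` and the helper `record` are made up for illustration):

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway file, created only for this example
path = os.path.join(tempfile.mkdtemp(), 'walk_demo.h5')
found = []

def record(name, obj):
    """Callback for visititems: note each object's path and kind."""
    kind = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'
    found.append((name, kind))
    # Returning None tells h5py to keep walking

with h5py.File(path, 'w') as f:
    # Intermediate groups are created automatically
    sub = f.create_group('group/subgroup')
    for i in range(3):
        sub[str(i)] = np.zeros((10, 5))

    f.visititems(record)                      # recursive walk
    keys = list(f['group/subgroup'].keys())   # dict-style listing

for name, kind in found:
    print(kind, name)
```

`keys()`, `values()` and `items()` only list one level, while `visititems` descends into every subgroup.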

## Datasets

Datasets behave like NumPy arrays; slice them to read data:

In [34]:
ds = f['group/subgroup/8']
ds[0, 0:3]

array([[0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])
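A short sketch of what slicing returns, on a throwaway file (the file name `slice_demo.h5` and dataset name `six` are assumptions for illustration; the `square` attribute mirrors the one `h5glance` reported above). Slicing reads only the selected region and hands back a plain NumPy array, while attributes are accessed dictionary-style through `.attrs`:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway file, created only for this example
path = os.path.join(tempfile.mkdtemp(), 'slice_demo.h5')

with h5py.File(path, 'w') as f:
    ds = f.create_dataset('six', data=np.zeros((10, 5, 6)))
    ds.attrs['square'] = 36

with h5py.File(path, 'r') as f:
    ds = f['six']              # a handle; no data has been read yet
    part = ds[0, 0:3]          # reads only a (3, 6) block from disk
    everything = ds[...]       # reads the whole dataset
    square = ds.attrs['square']

print(type(part).__name__, part.shape)
print(everything.shape, square)
```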

Create small datasets directly from NumPy arrays:

In [35]:
import numpy as np

arr = np.arange(30).reshape(5, 6)
arr

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])

In [36]:
f2 = h5py.File('demo.h5', 'w')
f2['data'] = arr
f2['data']

<HDF5 dataset "data": shape (5, 6), type "<i8">

Create big datasets without data, then fill them piecewise:

In [37]:
big_dataset = f2.create_dataset('big_data', shape=(1_000_000, 5, 6), dtype=np.int64)

for a in range(10):
    big_dataset[a] = arr

big_dataset[8:12, 0]

array([[0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

Datasets can grow if created with `maxshape` (`None` means unlimited along that axis):

In [38]:
growing_dataset = f2.create_dataset('growing_data', shape=(0, 5, 6), maxshape=(None, 5, 6), dtype=np.int64)

for a in range(103):
    dim0 = growing_dataset.shape[0]
    if a >= dim0:
        growing_dataset.resize((dim0 + 10, 5, 6))
    
    growing_dataset[a] = arr

growing_dataset.shape

(110, 5, 6)

In [39]:
f2.close()
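Growing in blocks of 10 leaves the dataset longer than the number of entries actually written (110 rows for 103 entries above). One possible follow-up, sketched here on a separate throwaway file (`trim_demo.h5` and `n_items` are illustrative names), is to shrink the dataset back with `resize` once writing is done:

```python
import os
import tempfile

import h5py
import numpy as np

arr = np.arange(30).reshape(5, 6)
n_items = 103
# Hypothetical throwaway file, created only for this example
path = os.path.join(tempfile.mkdtemp(), 'trim_demo.h5')

with h5py.File(path, 'w') as f:
    ds = f.create_dataset('growing_data', shape=(0, 5, 6),
                          maxshape=(None, 5, 6), dtype=np.int64)

    for a in range(n_items):
        if a >= ds.shape[0]:
            # Grow along the first axis in blocks of 10
            ds.resize(ds.shape[0] + 10, axis=0)
        ds[a] = arr

    grown_shape = ds.shape

    # Trim the unused rows at the end
    ds.resize(n_items, axis=0)
    final_shape = ds.shape
    last = ds[n_items - 1]

print(grown_shape, '->', final_shape)
```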

## Low-level API

In [40]:
ds.id

<h5py.h5d.DatasetID at 0x7f5c74eeb780>

In [41]:
ds.id.get_offset()

17832

In [42]:
dcpl = ds.id.get_create_plist()
dcpl.get_nfilters()

0
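One use of these low-level calls, sketched under the assumption that the dataset is contiguous and has no filters: its raw bytes then sit in one block at `get_offset()`, so they can be memory-mapped with NumPy without going through HDF5. The file name `lowlevel_demo.h5` is made up for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway file, created only for this example
path = os.path.join(tempfile.mkdtemp(), 'lowlevel_demo.h5')
data = np.arange(30, dtype=np.int64).reshape(5, 6)

with h5py.File(path, 'w') as f:
    f['data'] = data   # small array: contiguous layout, no filters

with h5py.File(path, 'r') as f:
    ds = f['data']
    offset = ds.id.get_offset()   # byte offset of the raw data in the file
    n_filters = ds.id.get_create_plist().get_nfilters()
    shape, dtype = ds.shape, ds.dtype

# Only valid because the data is one unfiltered, contiguous block
assert offset is not None and n_filters == 0
mapped = np.memmap(path, mode='r', dtype=dtype, shape=shape, offset=offset)

print(np.array_equal(mapped, data))
```

For chunked or compressed datasets `get_offset()` returns `None`, so this shortcut does not apply there.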

# Virtual datasets

![Virtual dataset concept](vds_concept.svg)

In [43]:
# Create source files (1.h5 to 4.h5)
for n in range(1, 5):
    with h5py.File('{}.h5'.format(n), 'w') as f:
        f['data'] = np.arange(100) + n

In [44]:
# Assemble virtual dataset
layout = h5py.VirtualLayout(shape=(4, 100), dtype='i4')

for n in range(1, 5):
    filename = "{}.h5".format(n)
    vsource = h5py.VirtualSource(filename, 'data', shape=(100,))

    layout[n - 1] = vsource

# Add virtual dataset to output file
with h5py.File("VDS.h5", 'w', libver='latest') as f:
    f.create_virtual_dataset('data', layout, fillvalue=-5)
    print("Virtual dataset:")
    print(f['data'][:, :10])

Virtual dataset:
[[ 1  2  3  4  5  6  7  8  9 10]
 [ 2  3  4  5  6  7  8  9 10 11]
 [ 3  4  5  6  7  8  9 10 11 12]
 [ 4  5  6  7  8  9 10 11 12 13]]


In [45]:
!h5glance VDS.h5

VDS.h5
└data	[int32: 4 × 100] virtual
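The `fillvalue=-5` passed to `create_virtual_dataset` only becomes visible where the virtual dataset has no source data behind it. A sketch of that case, assuming the fourth source file is simply never written (the temporary directory and absolute source paths are choices made for this example, not part of the cells above):

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()

# Write only sources 1-3; 4.h5 is deliberately left out
for n in range(1, 4):
    with h5py.File(os.path.join(workdir, '{}.h5'.format(n)), 'w') as f:
        f['data'] = np.arange(100) + n

# The layout still maps all four rows, including the missing file
layout = h5py.VirtualLayout(shape=(4, 100), dtype='i4')
for n in range(1, 5):
    source_path = os.path.join(workdir, '{}.h5'.format(n))
    layout[n - 1] = h5py.VirtualSource(source_path, 'data', shape=(100,))

with h5py.File(os.path.join(workdir, 'VDS.h5'), 'w', libver='latest') as f:
    f.create_virtual_dataset('data', layout, fillvalue=-5)
    vds_data = f['data'][:, :5]

# Rows backed by existing files show their data;
# the row whose source is missing reads back as the fill value
print(vds_data)
```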

