# Overview of HDF5

HDF5 is a high-performance data software library and file format for heterogeneous, high-dimensional data, with support for fast, flexible I/O and storage. It is supported by most popular programming languages, including Python. More information can be found here: https://www.hdfgroup.org/solutions/hdf5/

In Python, the de facto standard package for HDF5 is `h5py`. All `.hdf5` files that you receive conform to the HDF5 standard and can be opened directly with `h5py`. In addition, the `dl_core` library provides a high-level HDF5 API optimized for medical imaging.
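Because the files are standard HDF5, you can inspect them without `dl_core`. The sketch below uses plain `h5py` to list every dataset in a file with its shape and dtype. The internal layout of the files is not described here, so the sketch first writes a small stand-in file; the dataset name `data` is a hypothetical placeholder, not necessarily what your files contain.

```python
import os
import tempfile

import h5py
import numpy as np

# --- Write a small stand-in file (the dataset name 'data' is hypothetical)
path = os.path.join(tempfile.mkdtemp(), "example.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.zeros((4, 8, 8, 1), dtype="float32"))

# --- Open read-only and record every dataset with its shape and dtype
found = {}
with h5py.File(path, "r") as f:
    def record(name, obj):
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, obj.dtype)

    f.visititems(record)

for name, (shape, dtype) in found.items():
    print(name, shape, dtype)
```

To inspect one of your own files, point `path` at it and skip the writing step.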

# Usage

## Import dl_core

To use the `dl_core` library, the repository path must be on Python's module search path. If you run the Python interpreter directly (e.g. from the command line), add the repository path to the `$PYTHONPATH` environment variable. If you use an IPython interface (including Jupyter), set the path with the `sys` module:

```python
# --- Set PATH to the dl_core repository path
PATH = '../../'

# --- Add PATH to sys.path (the module search path that $PYTHONPATH feeds into)
import sys
if PATH not in sys.path:
    sys.path.append(PATH)
```
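A relative path such as `'../../'` is resolved against the current working directory, so the cell above only works when the notebook is started from the expected folder. A slightly more defensive sketch converts the path to an absolute one before adding it and then reports whether a `dl_core` package can be found. The helper name `add_repo_to_path` is hypothetical, not part of `dl_core`.

```python
import importlib.util
import os
import sys


def add_repo_to_path(path):
    """Append an absolute version of `path` to sys.path (only once) and
    report whether a top-level `dl_core` package can now be found."""
    abs_path = os.path.abspath(path)
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return importlib.util.find_spec("dl_core") is not None


importable = add_repo_to_path("../../")
print("dl_core importable:", importable)
```

Using an absolute path also keeps the check `PATH not in sys.path` meaningful if the working directory changes later in the session.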

## Import hdf5

```python
# --- Import hdf5 module
from dl_core.io import hdf5

# --- Import other modules
import numpy as np
```

## Loading Data

```python
# --- Set this to any HDF5 file in your dataset
HDF5_FILE = '../../data/hdfs/ID_2e28736ab7/dat.hdf5'
```

### Loading an entire volume

```python
data, meta = hdf5.load(HDF5_FILE)
```

```python
# --- data is a NumPy array holding the 4D volume (Z x Y x X x C)
print(type(data))
print(data.shape)
```
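Because the array is ordered (Z, Y, X, C), the first axis indexes slices and the last axis indexes channels. The sketch below uses a synthetic zero array in place of `data` (its shape is made up) to show the common indexing patterns.

```python
import numpy as np

# --- Stand-in for the loaded volume: 40 slices of 128 x 128 with 2 channels
volume = np.zeros((40, 128, 128, 2), dtype="float32")

single_slice = volume[20]                    # (Y, X, C): one slice, all channels
first_channel = volume[..., 0]               # (Z, Y, X): one channel, all slices
channels_first = np.moveaxis(volume, -1, 0)  # (C, Z, Y, X) for channels-first code

print(single_slice.shape, first_channel.shape, channels_first.shape)
```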

```python
# --- meta is a dict with any additional metadata (at minimum, a 4 x 4 affine matrix)
print(type(meta))
print(meta['affine'])
```
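A 4 x 4 affine maps voxel indices to physical (world) coordinates using homogeneous coordinates. This section does not state which index order the `dl_core` affine expects ((z, y, x) or (x, y, z)), so the sketch below is an assumption: it uses a made-up affine and treats the index vector in the same order as the columns of the 3 x 3 block.

```python
import numpy as np

# --- Hypothetical affine: 2.0 mm along the first axis, 0.5 mm along the others,
# --- plus a translation that places the origin
affine = np.array([
    [2.0, 0.0, 0.0, -40.0],
    [0.0, 0.5, 0.0, -32.0],
    [0.0, 0.0, 0.5, -32.0],
    [0.0, 0.0, 0.0,   1.0],
])

# --- Voxel spacing is the length of each column of the upper-left 3 x 3 block
spacing = np.linalg.norm(affine[:3, :3], axis=0)

# --- Map one voxel index to world coordinates (append 1 for homogeneous form)
index = np.array([10, 64, 64, 1])
world = affine @ index

print(spacing)
print(world[:3])
```

Taking column norms (rather than the diagonal) keeps the spacing correct even when the affine includes a rotation.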

### Loading portions of an entire volume

HDF5 supports partial I/O: when you read a subvolume (e.g. several slices), only the part of the file that overlaps the requested region (for chunked datasets, the overlapping chunks) is read, which is much cheaper than loading the full array. To do this with the `dl_core.io.hdf5` library, define an `infos` dictionary with two 3-element lists:

```
infos = {
    'point': [z0, y0, x0],
    'shape': [z1, y1, x1]
}
```

`infos['point']` is the position of the **center** of the subvolume to be opened, given as normalized coordinates in `[0, 1]` along each axis (so `0.5` is the middle of that axis).

`infos['shape']` is the matrix shape of the subvolume to be opened, in voxels along each axis.
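As an illustration of how a normalized center and a matrix shape might translate into array indices, here is a sketch. The rounding rule (center index = floor(point x dim), clamped to the volume; start = center - shape // 2) is an assumption; `dl_core` may round differently, so treat the exact indices as illustrative. The function name `center_to_bounds` is hypothetical.

```python
import math


def center_to_bounds(point, shape, dims):
    """Convert normalized center coordinates in [0, 1] into half-open
    (start, end) index ranges, one per axis. Ranges may extend outside
    [0, dim) when the requested subvolume is larger than the volume."""
    bounds = []
    for p, s, d in zip(point, shape, dims):
        center = min(max(math.floor(p * d), 0), d - 1)  # clamp, e.g. p == 1.0
        start = center - s // 2
        bounds.append((start, start + s))
    return bounds


# --- Middle of a 40 x 128 x 128 volume, requesting 5 slices of 128 x 128
print(center_to_bounds([0.5, 0.5, 0.5], [5, 128, 128], [40, 128, 128]))
```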

If the requested shape extends *beyond* the bounds of the underlying matrix (e.g. an N x 256 x 256 subvolume is requested from an N x 128 x 128 volume), the result is **zero-padded** as needed.
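Zero-padding of this kind can be sketched in NumPy by allocating a zero array of the requested shape and copying in only the region that overlaps the volume. This is an illustrative stand-in (the name `crop_or_pad` is hypothetical), not the `dl_core` implementation, and it assumes the rounding rule center index = floor(point x dim) clamped to the volume, with start = center - shape // 2.

```python
import math

import numpy as np


def crop_or_pad(volume, point, shape):
    """Extract a subvolume of `shape` (over the first three axes) centered at the
    normalized coordinate `point`, zero-padding wherever it leaves the volume.
    Trailing axes (e.g. channels) are kept whole."""
    out = np.zeros(tuple(shape) + volume.shape[3:], dtype=volume.dtype)
    src, dst = [], []
    for p, s, d in zip(point, shape, volume.shape[:3]):
        center = min(max(math.floor(p * d), 0), d - 1)
        start = center - s // 2
        lo, hi = max(start, 0), min(start + s, d)  # overlap in volume coordinates
        src.append(slice(lo, hi))
        dst.append(slice(lo - start, hi - start))  # same overlap in output coordinates
    out[tuple(dst)] = volume[tuple(src)]
    return out


# --- Request 256 x 256 slices from a 128 x 128 volume: the border is zero-padded
volume = np.ones((10, 128, 128, 1), dtype="float32")
sub = crop_or_pad(volume, [0.5, 0.5, 0.5], [1, 256, 256])
print(sub.shape, sub.sum())
```

Because the center is clamped inside the volume, the overlap on every axis is never empty, so the copy always has matching shapes.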

**IMPORTANT NOTE**: By default, HDF5 files serialized by the `dl_core.io.hdf5` library are *chunked* for slice-by-slice loading. It is therefore recommended to load data as stacks of N consecutive slices: set `y0` and `x0` above to `0.5` (i.e. the middle of the slice) and `y1` and `x1` to the full slice shape. Setting `z0` to a random number in `[0, 1]` then selects the slice(s) at a random location in the volume. This is the **most common** way to load data for radiology applications.
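To see what slice-wise chunking looks like at the `h5py` level, the sketch below writes a synthetic volume with one chunk per slice and reads a single slice back. The chunk shape `(1, Y, X, C)` and the dataset name `data` are assumptions chosen to match the description above; the exact layout `dl_core` writes is not documented here, and `dset.chunks` on one of your files will tell you.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "slicewise.hdf5")
volume = np.random.rand(40, 128, 128, 1).astype("float32")

with h5py.File(path, "w") as f:
    # --- One chunk per slice: reading slice z touches exactly one chunk
    f.create_dataset("data", data=volume, chunks=(1, 128, 128, 1))

with h5py.File(path, "r") as f:
    dset = f["data"]
    chunks = dset.chunks                  # chunk shape stored in the file
    z = np.random.randint(dset.shape[0])  # random slice index
    one_slice = dset[z]                   # reads a single chunk from disk

print(chunks, one_slice.shape)
```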

If you want to load data in any other way (e.g. cropped 3D subvolumes), it is recommended that you choose a different *chunking* scheme when serializing.
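If you mainly read cropped 3D subvolumes, one option is to rewrite the data with cube-like chunks. A minimal `h5py` sketch follows; the chunk shape `(16, 64, 64, C)` and the dataset name `data` are illustrative assumptions, and a good chunk size depends on your patch size and storage. For brevity it loads the whole source volume into memory and copies only the one dataset and its attributes.

```python
import os
import tempfile

import h5py
import numpy as np

tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "slicewise.hdf5")
dst_path = os.path.join(tmp, "cubes.hdf5")

# --- Stand-in source file with slice-wise chunks
volume = np.random.rand(64, 128, 128, 1).astype("float32")
with h5py.File(src_path, "w") as f:
    f.create_dataset("data", data=volume, chunks=(1, 128, 128, 1))

# --- Copy into a new file with cube-like chunks suited to 3D patch reads
with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
    data = src["data"]
    out = dst.create_dataset("data", data=data[...], chunks=(16, 64, 64, 1))
    for key, value in data.attrs.items():  # carry over any attributes
        out.attrs[key] = value

with h5py.File(dst_path, "r") as f:
    chunks = f["data"].chunks
    patch = f["data"][16:48, 32:96, 32:96]  # a 32 x 64 x 64 patch spans 2 x 2 x 2 chunks

print(chunks, patch.shape)
```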

```python
# --- Load single slices from random locations in the volume
# --- (set the last two entries of 'shape' to the slice shape of your data)
infos = {
    'point': [None, 0.5, 0.5],
    'shape': [1, 512, 512]
}

for _ in range(10):
    infos['point'][0] = np.random.rand()
    data, meta = hdf5.load(HDF5_FILE, infos=infos)
    print(data.shape)
```

```python
# --- Load stacks of 5 consecutive slices from random locations in the volume
infos = {
    'point': [None, 0.5, 0.5],
    'shape': [5, 512, 512]
}

for _ in range(10):
    infos['point'][0] = np.random.rand()
    data, meta = hdf5.load(HDF5_FILE, infos=infos)
    print(data.shape)
```