The Hierarchical Neuron Format (HNF) is a schema for storing neuron morphologies and meta data in HDF5 files.
We provide read/write implementations for R and Python.
Several file formats can already store neuron morphology. To name but a few:
- SWC for simple skeletons
- NeuroML is an XML-based format primarily used for modelling but can store compartment models (i.e. skeletons) of neurons and meta data
- NWB (Neurodata Without Borders) is an HDF5-based format focused on physiological data
- NRRD files can be used to store dotprops
Why then start a new format?
Because none of the existing formats tick all the boxes! We need a file format that can hold:
- thousands of neurons
- multiple representations (mesh, skeleton, dotprops) of a given neuron
- annotations (e.g. synapses) associated with each neuron
- meta data such as names, soma positions, etc.
Enter HDF5: basically a filesystem-in-a-file. The important thing is that we don't have to worry about how data is en-/decoded because other libraries (like h5py for Python or hdf5r for R) take care of that. All we have to do is come up with a schema.
HDF5 knows "groups" (=folders), "datasets" and "attributes". The basic idea for our schema is this:
- the root contains info about the format as attributes
- each group in root represents a neuron and the group's name is the neuron's ID
- a neuron group holds the neuron's meta data, and its representations (mesh, skeleton and/or dotprops) and annotations in separate sub-groups
To illustrate the basic principle:
.
├── attrs: format-related meta data
├── group: neuron1
│   ├── attrs: neuron-related meta data
│   ├── group: skeleton
│   │   ├── attrs: skeleton-related meta data
│   │   └── datasets: node table, etc.
│   ├── group: dotprops
│   │   ├── attrs: dotprops-related meta data
│   │   └── datasets: points, tangents, alpha, etc.
│   ├── group: mesh
│   │   ├── attrs: mesh-related meta data
│   │   └── datasets: vertices, faces, etc.
│   └── group: annotations
│       └── group: e.g. connectors
│           ├── attrs: connector-related meta data
│           └── datasets: connector data
├── group: neuron2
│   ├── ...
...
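The layout above maps fairly directly onto h5py calls. The following is a minimal sketch, assuming h5py and NumPy are installed; the file name, the neuron ID and the toy skeleton are made up for illustration, and this is not the navis writer itself:

```python
import h5py
import numpy as np

# In-memory HDF5 file: the "core" driver with backing_store=False never touches disk
with h5py.File("example_hnf.h5", "w", driver="core", backing_store=False) as f:
    # Root attributes hold the format-related meta data
    f.attrs["format_spec"] = "hnf_v1"
    f.attrs["format_url"] = "https://github.com/schlegelp/navis"

    # One group per neuron, named after the (stringified) neuron ID
    neuron = f.create_group("123456")
    neuron.attrs["neuron_name"] = "some name"

    # Each representation lives in its own sub-group
    skeleton = neuron.create_group("skeleton")
    skeleton.create_dataset("node_id", data=np.array([1, 2, 3]))
    skeleton.create_dataset("parent_id", data=np.array([-1, 1, 2]))
    toy_coords = ([0.0, 1.0, 2.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0])
    for axis, values in zip("xyz", toy_coords):
        skeleton.create_dataset(axis, data=np.array(values))

    # Annotations: one sub-group per annotation type
    # (h5py creates the intermediate "annotations" group automatically)
    neuron.create_group("annotations/connectors")

    # Collect a few facts before the file is closed
    neuron_ids = list(f.keys())
    subgroups = sorted(neuron.keys())
    spec = f.attrs["format_spec"]

print(neuron_ids, subgroups, spec)
```

To write to disk instead, drop the driver and backing_store arguments.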
The root meta data must contain two attributes:
- format_spec specifies format and version
- format_url points to a library or format specifications
.
├── attr['format_spec']: str = 'hnf_v1'
├── attr['format_url']: str = 'https://github.com/schlegelp/navis'
...
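A reader will typically want to check these two attributes before parsing anything else. The sketch below uses a plain dict in place of the root attributes and a hypothetical helper, parse_format_spec, which assumes the spec string always follows the name_v<integer> pattern seen in the example above:

```python
import re

REQUIRED_ROOT_ATTRS = {"format_spec", "format_url"}


def parse_format_spec(spec):
    """Split e.g. 'hnf_v1' into ('hnf', 1). Hypothetical helper, not part of navis."""
    match = re.fullmatch(r"(?P<name>[a-z]+)_v(?P<version>\d+)", spec)
    if match is None:
        raise ValueError(f"Unrecognised format_spec: {spec!r}")
    return match["name"], int(match["version"])


# Stand-in for the attributes stored on the root of the file
root_attrs = {
    "format_spec": "hnf_v1",
    "format_url": "https://github.com/schlegelp/navis",
}

# Names of required attributes that are absent (empty set if all are present)
missing = REQUIRED_ROOT_ATTRS - set(root_attrs)
name, version = parse_format_spec(root_attrs["format_spec"])
print(missing, name, version)
```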
Each neuron group contains properties that apply to all the neuron's potential representations - for example a neuron_name. Note that if an attribute is defined at the neuron level and again at a deeper level (i.e. the skeleton, mesh or dotprops), the more proximal attribute (i.e. the one defined on the representation itself) takes precedence for that representation.
.
└── group['123456']  # note that numeric IDs will be "stringified"
    ├── attr["neuron_name"]: str = "some name"
    ...
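The precedence rule can be sketched as a simple dictionary merge. This assumes that "more proximal" means the attribute stored on the representation's own group; resolve_attrs is a hypothetical helper and plain dicts stand in for HDF5 attributes:

```python
def resolve_attrs(neuron_attrs, representation_attrs):
    """Merge attributes for one representation; representation-level values win."""
    merged = dict(neuron_attrs)          # start with the neuron-level defaults
    merged.update(representation_attrs)  # override with the deeper level
    return merged


neuron_attrs = {"neuron_name": "example", "units_nm": (4, 4, 40)}
skeleton_attrs = {"units_nm": 1, "soma": 1}

resolved = resolve_attrs(neuron_attrs, skeleton_attrs)
print(resolved)
```

Here units_nm is taken from the skeleton, while neuron_name falls through from the neuron level.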
Attributes:
- units_nm (float | int | tuple, optional): specifies the units in nanometer space - can be a tuple of (x, y, z) if units are non-isotropic
- soma (int, optional): the node ID of the soma
Datasets:
- node_id (int): IDs for the nodes
- parent_id (int): for each node, the ID of its parent; nodes without parents (i.e. roots) have a parent_id of -1
- x, y, z (float | int): node coordinates
- radius (float | int, optional): radius for each node
└── group['123456']
    ├── attr['neuron_name'] = "example neuron with a skeleton"
    ├── attr['units_nm'] = (4, 4, 40)
    └── grp['skeleton']
        ├── attr['soma']: 1
        ├── ds['node_id']: (N, ) array
        ├── ds['parent_id']: (N, ) array
        ├── ds['x']: (N, ) array
        ├── ds['y']: (N, ) array
        ├── ds['z']: (N, ) array
        └── ds['radius']: (N, ) array, optional
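To show how the node table hangs together, here is a small NumPy sketch that finds the roots and sums up the cable length in nanometers. The toy arrays are invented, and the sketch assumes that units_nm is a per-axis scale factor (one unit along x equals 4 nm, and so on):

```python
import numpy as np

# Toy skeleton: node 1 is the root, nodes 3 and 4 both branch off node 2
node_id = np.array([1, 2, 3, 4])
parent_id = np.array([-1, 1, 2, 2])
xyz = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0], [10, 0, 5]], dtype=float)
units_nm = np.array([4, 4, 40])  # non-isotropic units

# Roots are the nodes without a parent
is_root = parent_id == -1
roots = node_id[is_root]

# Map node IDs to row indices so each child can look up its parent's row
index_of = {int(n): i for i, n in enumerate(node_id)}
parent_rows = np.array([index_of[int(p)] for p in parent_id[~is_root]])

# Convert to nanometers, then sum the child-to-parent distances
child_nm = xyz[~is_root] * units_nm
parent_nm = xyz[parent_rows] * units_nm
cable_length_nm = float(np.linalg.norm(child_nm - parent_nm, axis=1).sum())

print(roots, cable_length_nm)
```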
Meshes are principally represented as vertices + triangular faces (navis uses trimesh under the hood).
Attributes:
- units_nm (float | int | tuple, optional): specifies the units in nanometer space - can be a tuple of (x, y, z) if units are non-isotropic
- soma (tuple, optional): tuple of (x, y, z) coordinates of the soma
Datasets:
- vertices (int | float): (N, 3) array of vertex positions
- faces (int): (M, 3) array of vertex indices forming the faces (indices start at 0)
- skeleton_map (int, optional): (N, ) array mapping each vertex to a node ID in the skeleton
└── group['4353421']
    ├── attr['neuron_name'] = "example neuron with a mesh"
    ├── attr['units_nm'] = (4, 4, 40)
    └── grp['mesh']
        ├── attr['soma']: (1242, 6533, 400)
        ├── ds['vertices']: (N, 3) array
        ├── ds['faces']: (M, 3) array
        └── ds['skeleton_map']: (N, ) array, optional
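The shape and indexing constraints on the mesh datasets can be checked with a few lines of NumPy. This is a sketch only: check_mesh is a hypothetical helper, and the tetrahedron is made-up test data:

```python
import numpy as np


def check_mesh(vertices, faces, skeleton_map=None):
    """Return a list of problems found; an empty list means the arrays look consistent."""
    problems = []
    if vertices.ndim != 2 or vertices.shape[1] != 3:
        problems.append("vertices must be an (N, 3) array")
    if faces.ndim != 2 or faces.shape[1] != 3:
        problems.append("faces must be an (M, 3) array")
    elif faces.min() < 0 or faces.max() >= len(vertices):
        problems.append("faces must index into vertices (0-based)")
    if skeleton_map is not None and skeleton_map.shape != (len(vertices),):
        problems.append("skeleton_map needs one node ID per vertex")
    return problems


# A tetrahedron: 4 vertices, 4 triangular faces, each vertex mapped to a skeleton node
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
skeleton_map = np.array([1, 1, 2, 2])

problems = check_mesh(vertices, faces, skeleton_map)
# Fancy indexing turns face indices into per-triangle corner coordinates: (M, 3, 3)
corners = vertices[faces]
print(problems, corners.shape)
```

Faces exported with 1-based indices (as some mesh formats do) would be flagged by the index check.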
Attributes:
- k (int): number of k-nearest neighbours used to calculate the tangent vectors from the point cloud
- units_nm (float | int | tuple, optional): specifies the units in nanometer space - can be a tuple of (x, y, z) if units are non-isotropic
- soma (tuple, optional): tuple of (x, y, z) coordinates of the soma
Datasets:
- points (int | float): (N, 3) array of x/y/z positions
- vect (int | float, optional): (N, 3) array of tangent vectors - generated if not provided
- alpha (int | float, optional): (N, ) array of alpha values for each point in points - generated if not provided
└── group['65432']
    ├── attr['neuron_name'] = "example neuron with dotprops"
    └── grp['dotprops']
        ├── attr['k'] = 5
        ├── attr['units_nm'] = (4, 4, 40)
        ├── attr['soma']: (1242, 6533, 400)
        ├── ds['points']: (N, 3) array
        ├── ds['vect']: (N, 3) array
        └── ds['alpha']: (N, ) array
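When vect and alpha are missing, they have to be derived from points and k. The sketch below shows one common way to do this: take the principal direction of each point's k nearest neighbours as the tangent, and define alpha from the singular values as (s1 - s2) / (s1 + s2 + s3). Both the function name and this definition of alpha are assumptions for illustration, not necessarily what a given reader does:

```python
import numpy as np


def dotprops_from_points(points, k=5):
    """Estimate tangent vectors and alpha values from a point cloud (brute-force kNN)."""
    points = np.asarray(points, dtype=float)
    # Pairwise squared distances, shape (N, N)
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Indices of the k nearest neighbours (each point counts as its own neighbour)
    nn = np.argsort(dist2, axis=1)[:, :k]
    neighbours = points[nn]  # (N, k, 3)
    centered = neighbours - neighbours.mean(axis=1, keepdims=True)
    # One 3x3 scatter matrix per point
    scatter = np.einsum("nki,nkj->nij", centered, centered)
    # Batched SVD: the first right singular vector is the principal direction
    _, s, vh = np.linalg.svd(scatter)
    vect = vh[:, 0, :]
    alpha = (s[:, 0] - s[:, 1]) / s.sum(axis=1)
    return vect, alpha


# Ten points on a straight line along x: tangents should be parallel to x, alpha close to 1
points = np.column_stack([np.arange(10.0), np.zeros(10), np.zeros(10)])
vect, alpha = dotprops_from_points(points, k=5)
print(vect.shape, alpha.shape)
```

The sign of each tangent vector is arbitrary, and the brute-force neighbour search only suits small point clouds; a KD-tree would be the usual choice for real data.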
Annotations are meant to be flexible and are principally parsed into pandas DataFrames. Because they won't follow a common format, it is good practice to leave some (optional) meta data pointing to columns containing data relevant for e.g. plotting:
Attributes:
- point_col (str | list thereof): pointer to the column(s) containing x/y/z positions
- type_col (str): pointer to a column specifying types
- skeleton_map (str): pointer to a column associating the row with a node ID in the skeleton
Let's illustrate this with a mock synapse table:
└── group['32434566']
    ├── attr['neuron_name'] = "example neuron with synapse annotations"
    ├── attr['units_nm'] = 1
    └── grp['annotations']
        └── grp['synapses']
            ├── attr['point_col']: ['x', 'y', 'z']
            ├── attr['type_col']: 'prepost'
            ├── attr['skeleton_map']: 'node_id'
            ├── ds['x']: (N, ) array
            ├── ds['y']: (N, ) array
            ├── ds['z']: (N, ) array
            ├── ds['prepost']: (N, ) array of [0, 1, 2, 3, 4]
            └── ds['node_id']: (N, ) array
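Once an annotation group has been loaded, the pointer attributes tell a reader which columns to use. The sketch below assumes pandas is installed and uses plain Python objects as stand-ins for the attributes and datasets of a synapse group; all values are invented:

```python
import pandas as pd

# Stand-ins for the attributes and datasets of the annotation group
attrs = {"point_col": ["x", "y", "z"], "type_col": "prepost", "skeleton_map": "node_id"}
columns = {
    "x": [10.0, 12.0, 30.0, 31.0],
    "y": [5.0, 5.5, 8.0, 9.0],
    "z": [1.0, 1.0, 2.0, 2.0],
    "prepost": [0, 0, 1, 0],
    "node_id": [1, 1, 5, 7],
}

table = pd.DataFrame(columns)

# Follow the pointers instead of hard-coding column names
positions = table[attrs["point_col"]].to_numpy()                      # (N, 3) coordinates
type_counts = table[attrs["type_col"]].value_counts().to_dict()      # rows per type
rows_per_node = table.groupby(attrs["skeleton_map"]).size().to_dict()  # rows per skeleton node

print(positions.shape, type_counts, rows_per_node)
```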
"Hidden" attributes & datasets
It can be useful to have attributes and datasets that contain information that's only pertinent for the reader/writer but does not directly relate to the neuron.
For this, we prefix the attribute/dataset name with a dot (.):
└── group['4353421']
    ├── attr['neuron_name'] = "example neuron with a mesh"
    ├── attr['units_nm'] = (4, 4, 40)
    ├── attr['.hidden_attribute'] = "typically ignored when reading"
    └── grp['mesh']
        └── attr['soma']: (1242, 6533, 400)
We use hidden attributes to e.g. store a serialized version of a neuron instead of, or in addition to, the raw data to speed up reading.
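A reader can separate hidden from regular names with a simple prefix check. A minimal sketch with a hypothetical helper, split_hidden, operating on a list of attribute or dataset names:

```python
def split_hidden(names):
    """Split attribute/dataset names into (visible, hidden) by their leading dot."""
    visible = [name for name in names if not name.startswith(".")]
    hidden = [name for name in names if name.startswith(".")]
    return visible, hidden


names = ["neuron_name", "units_nm", ".hidden_attribute"]
visible, hidden = split_hidden(names)
print(visible, hidden)
```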
The above schema describes a "minimal" layout - i.e. we expect no less data than that. However, e.g. the navis implementations for reading/writing the schema are flexible: you can add more attributes or datasets and navis will by default try to read and attach them to the neuron.
Stable-ish: the format is versioned and I will maintain readers/writers for past versions in navis. In other good news: the HDF5 backend is stable - so even if navis acts up when parsing your file, you can always read it manually using h5py.
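Reading a file manually needs only a handful of h5py calls. The sketch below builds a tiny in-memory file first so that it runs on its own; summarize is a hypothetical helper and the neuron IDs are taken from the examples above:

```python
import h5py
import numpy as np


def summarize(f):
    """Map each neuron ID to the sorted names of the sub-groups it contains."""
    return {
        neuron_id: sorted(
            name for name, item in group.items() if isinstance(item, h5py.Group)
        )
        for neuron_id, group in f.items()
        if isinstance(group, h5py.Group)
    }


# Build a small in-memory file to read from (nothing is written to disk)
with h5py.File("manual_read.h5", "w", driver="core", backing_store=False) as f:
    f.attrs["format_spec"] = "hnf_v1"
    f.create_group("123456/skeleton").create_dataset("node_id", data=np.arange(3))
    f.create_group("4353421/mesh")

    # Manual reading: root attributes, groups, then datasets
    spec = f.attrs["format_spec"]
    summary = summarize(f)
    node_ids = f["123456/skeleton/node_id"][:].tolist()

print(spec, summary, node_ids)
```

For a file on disk, open it with h5py.File(path, "r") and use the same calls.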
The current version of the format is 1.0.
Changes:
- 2021/02/01: Version 1.0