
### INCF Workshop 

# Integrated Storage and Management of Data & Metadata with NIX

                                    The Neuroscience Information eXchange format

                                    Jan Grewe1, Michael Sonntag2

                                    1 Institute for Neurobiology
                                      Eberhard-Karls-Universität Tübingen
                                    
                                    2 Department Biologie II
                                      Ludwig-Maximilians-Universität München

                                    30.08. - 01.09.2021


![G-Node-logo.png](./resources/G-Node-logo.png)

## Data and Metadata (data annotation) - Tutorial 3

### What are metadata and why are they needed?

Metadata are data about data. As a non-scientific example, the title and the director of a movie are metadata about that movie.

In science, metadata describe the conditions under which the raw data of an experimental study were acquired.

Metadata can be anything related to an experiment or an analysis step:
- stimuli / protocols
- environmental factors, e.g. temperature, gas or liquid concentrations, ...
- operational information, e.g. experimenter, date, time, organism, strain, ...
- subject information, e.g. animal strain, history, ...
- hardware/software used
- settings

Traditionally, actively collected metadata are kept in spreadsheets or lab books. Further metadata can be found in raw data files, hardware information, code comments, etc.

Organizing these metadata and keeping them accessible is not a trivial task; over time, most laboratories have developed home-made solutions to keep track of their metadata. Collecting and organizing the metadata is already a tough job in its own right, since experiments are diverse and may even change over time.

Metadata is especially important when trying to make sense of data:
- that you are not familiar with
- that you have not worked with for a while

A major problem in this respect is that most metadata are kept separately from the data they belong to; finding the metadata for a given dataset, or the data for given metadata, is usually not trivial, especially after some time has passed.

With NIX, metadata can be stored alongside the data they belong to, collecting the metadata can be automated, and the results are machine-readable and can be searched programmatically.

## Data and data annotation in the same file

The entities of the data model discussed so far carry just enough information for a basic understanding of the stored data. Often, much more information is required.

NIX not only allows saving raw and analysed data within the same file; it also allows creating structured annotations of the conducted experiments and connects this information directly to the data.

Metadata in NIX files is stored in the [odML format](https://g-node.github.io/python-odml); odML is a hierarchically structured data format that provides grouping in nestable `Sections` and stores information in `Property`-`Value` pairs. `Sections` are the main structural elements, while `Properties` hold the actual metadata information.

## The odML data model in NIX
![](./resources/nix_odML_model_simplified.png)

On a conceptual level, data and metadata in a NIX file live side by side in parallel trees. Entities of the data tree, such as Blocks, DataArrays or MultiTags, can be linked to Sections of the metadata tree; conversely, when exploring the metadata tree, the data linked to a Section can be retrieved.

    ---------------- NIX File --------
    ├─ Section                  <--- ├─ Block
    |  ├─ Section                    |  ├─ DataArray
    |  |  └─ Property                |  ├─ DataArray
    |  └─ Section                    |  ├─ Tag
    |     └─ Property                |  └─ MultiTag
    └─ Section                  <--- └─ Block
       └─ Section               <---    ├─ DataArray
          ├─ Property                   ├─ DataArray
          ├─ Property                   └─ Group
          └─ Property                    


# Storing metadata in NIX

## Metadata basics: creating section-property trees

To introduce the usage of metadata functions in NIX, we'll keep it simple and abstract for now.

```python
# Let's explore the metadata functions of NIX before going into more detail.

import nixio as nix

f = nix.File.open("metadata.nix", nix.FileMode.Overwrite)
```


```python
# As expected, there are no metadata in our current file yet.
print(f.sections)
```

```
[]
```


```python
# Let's check how we can create a new Section. Sections can be created from File and Section objects.
help(f.create_section)
```

```
Help on method create_section in module nixio.file:

create_section(name, type_='undefined', oid=None) method of nixio.file.File instance
    Create a new metadata section inside the file.
    
    :param name: The name of the section to create.
    :type name: str
    :param type_: The type of the section.
    :type type_: str
    :param oid: object id, UUID string as specified in RFC 4122. If no id
                is provided, an id will be generated and assigned.
    :type oid: str
    
    :returns: The newly created section.
    :rtype: nixio.Section
```



```python
# First, we need to create a Section that can hold our annotations. We'll use abstract names and types for now.
sec = f.create_section(name="experiment_42", type_="project_AB")

f.sections
```

```
[Section: {name = experiment_42, type = project_AB}]
```

```python
# Like other NIX objects, Sections on the same level must have unique names.
section = f.create_section(name="experiment_42", type_="project_AB")
```

```
DuplicateName: Duplicate name - names have to be unique for a given entity type & parent. (create_section)
```

```python
# Sections can hold multiple further Sections as well as multiple Properties.
sec.sections
```

```
[]
```

```python
# The section currently does not contain any Properties.
sec.props
```

```
[]
```

```python
# We want to add information about a subject that was used in the experiment.
sub_sec = sec.create_section(name="subject", type_="experiment_42")
```


```python
# Properties can be created from Section objects.
help(sub_sec.create_property)
```

```
Help on method create_property in module nixio.section:

create_property(name='', values_or_dtype=0, oid=None, copy_from=None, keep_copy_id=True) method of nixio.section.Section instance
    Add a new property to the section.
    
    :param name: The name of the property to create/copy.
    :type name: str
    :param values_or_dtype: The values of the property or a valid DataType.
    :type values_or_dtype: list of values or a nixio.DataType
    :param oid: object id, UUID string as specified in RFC 4122. If no id
                is provided, an id will be generated and assigned.
    :type oid: str
    :param copy_from: The Property to be copied, None in normal mode
    :type copy_from: nixio.Property
    :param keep_copy_id: Specify if the id should be copied in copy mode
    :type keep_copy_id: bool
    
    :returns: The newly created property.
    :rtype: nixio.Property
```



```python
# We'll add the subject ID, species and age as Properties to the "subject" section.
prop = sub_sec.create_property(name="subjectID", values_or_dtype="78376446-f096-47b9-8bfe-ce1eb43a48dc")
prop = sub_sec.create_property(name="species", values_or_dtype="Mus Musculus")

# To fully describe metadata, Properties support storing a "unit" and an "uncertainty" together with their values.
prop = sub_sec.create_property(name="age", values_or_dtype="4")
prop.unit = "weeks"
```

```python
# Let's check what we have so far at the root of the file.
f.sections
```

```
[Section: {name = experiment_42, type = project_AB}]
```

```python
# File and Sections also support the "pprint" function, which gives an overview
# of the contents of the metadata tree.
f.pprint()
```

```
File: name = metadata.nix
  experiment_42 [project_AB]
    subject [experiment_42]
        |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)
        |- species: ('Mus Musculus',)
        |- age: ('4',)weeks
```


```python
# Access all Properties of the subsection containing subject-related information.
# Sections can be accessed via index or via name.
f.sections[0].sections['subject'].props
```

```
[Property: {name = subjectID}, Property: {name = species}, Property: {name = age}]
```

```python
# We can also again use the pprint function
f.sections[0].sections['subject'].pprint()
```

```
subject [experiment_42]
    |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)
    |- species: ('Mus Musculus',)
    |- age: ('4',)weeks
```


## Connecting data and metadata

So far, we have seen how to create and store metadata in NIX files. Next, we connect them to actual data.

## Automated handling of metadata

Metadata can become quite complex, and creating large trees over and over again quickly becomes tedious. To this end, "template" sections can be created once and re-used.

For example, an experiment typically uses a few different stimulus protocols or one or two hardware setups, while the stimuli or the hardware themselves do not change. The hardware metadata can therefore be defined once per setup and attached to the specific experimental data whenever new data are added to an existing NIX file.

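```python
# The following cells assume that `f` also contains the Block "tag_examples" with the
# MultiTag "tag_A" and the DataArray "membrane_voltage_A"; the file created above does not.
f
```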

```python
# We can now connect the Section describing our experiment directly to the MultiTag
# that references both the raw and the analysed data.
multi_tag = f.blocks['tag_examples'].multi_tags['tag_A']
multi_tag.metadata = f.sections['experiment_42']
```


```python
# When we now look at the data via the MultiTag, we can directly access all metadata attached to it,
# e.g. information about the subject used in the experiment.
multi_tag.metadata.sections['subject'].props
```


```python
# We can also attach the same Section to the raw DataArray itself, e.g. when no MultiTags have been used.
init_data = f.blocks['tag_examples'].data_arrays['membrane_voltage_A']
init_data.metadata = f.sections['experiment_42']
```


```python
# We can also search in reverse: select a Section and find all data that are connected to it.
sec = f.sections['experiment_42']

# Either via connected DataArrays ...
sec.referring_data_arrays
```


```python
# ... or via connected MultiTags.
sec.referring_multi_tags
```


```python
# And finally, we close our file.
f.close()
```


## Try it out

Now we move on to an actual exercise.

The public repository https://gin.g-node.org/RDMcourse2020/demo-lecture-07 contains a Jupyter notebook "2020_RDM_course_nix_exercise.ipynb".

Start it either
- locally, if you can use Python and have all dependencies installed, or
- via Binder, if you cannot use Python locally. The repository is already set up for use with Binder. Check the last lecture if you are unsure how to start the notebook using Binder.

This repository further contains a folder called "excercise". It contains calcium imaging data and rough metadata about the recordings.

The exercise is to
- read through the README.md and briefly familiarize yourself with the project and the data.
- load the raw data into the notebook. Ideally, transfer the "obj_substracted" column (column 3) from the data files, but any other column works as well.
- the time step in the "time_elapsed" column is roughly 100 ms. If you want, you can use a SampledDimension with an interval of 100 ms, which should be easier, or try to include the real times as a RangeDimension.
- create a new NIX file and put the raw data traces into NIX DataArrays, including labels and units. Note that the signal is fluorescence in arbitrary units (AU).
- plot data from these DataArrays.
- read through the metadata, try to put useful metadata into a NIX Section/Property structure and connect it to the DataArrays. Examples would be:
  - original file names of raw data files.
  - species.
  - recording equipment.
- identify a region of interest from the shift paradigm used, specify it by start and extent, and try to create a single MultiTag for this paradigm that references all three DataArrays.

Alternatively, you can also take some of your own data and put it into a NIX file along with some of your metadata.