
### INCF Workshop 

# Integrated Storage and Management of Data & Metadata with NIX

                                    The Neuroscience Information eXchange format

                                    Jan Grewe1, Michael Sonntag2

                                    1 Institute for Neurobiology
                                      Eberhard-Karls-Universität Tübingen
                                    
                                    2 Department Biologie II
                                      Ludwig-Maximilians-Universität München

                                    30.08. - 01.09.2021


![G-Node-logo.png](./resources/G-Node-logo.png)

## Data and Metadata (data annotation) - Tutorial 3

### What are metadata and why are they needed?

Metadata are data about data. They describe the conditions under which the raw data of an experimental study were acquired. Organizing such metadata and keeping them accessible may sound like a trivial task, and most laboratories have developed home-made solutions to keep track of them. Most of these solutions, however, break down once data and metadata need to be shared within a collaboration, because they rely on implicit knowledge of what is important and how it is organized, and the extent of that implicit knowledge is often underestimated.

## Data and data annotation in the same file

The entities of the data model that were discussed so far carry just enough information to get a basic understanding of the stored data. Often much more information than that is required.

NIX not only allows initial data and analysed data to be saved within the same file. It also allows structured annotations of the conducted experiments to be created and connects this information directly to the data.

Metadata in NIX files are stored in the [odML format](https://g-node.github.io/python-odml). They are saved side by side with the actual "DataTree" in a "MetadataTree" and can easily be connected to data in the DataTree.

odML is a hierarchically structured data format that provides grouping in nestable `Sections` and stores information in `Property`-`Value` pairs.
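To make the nesting concrete, here is a minimal pure-Python sketch of the idea. It is a toy model and not the odML or NIX API: the `Section` and `Property` classes and the `find` helper are hypothetical stand-ins, and the names and values simply mirror the ones used later in this tutorial.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Property:
    """A named value with an optional unit (toy stand-in, not the odML class)."""
    name: str
    value: object
    unit: Optional[str] = None


@dataclass
class Section:
    """A named, typed group that can hold Properties and further Sections."""
    name: str
    type_: str
    props: Dict[str, Property] = field(default_factory=dict)
    sections: Dict[str, "Section"] = field(default_factory=dict)

    def add_section(self, name: str, type_: str) -> "Section":
        child = Section(name, type_)
        self.sections[name] = child
        return child

    def add_property(self, name: str, value: object, unit: Optional[str] = None) -> Property:
        prop = Property(name, value, unit)
        self.props[name] = prop
        return prop

    def find(self, path: str) -> Property:
        """Resolve a path such as 'subject/age': Sections first, Property last."""
        *section_names, prop_name = path.split("/")
        current = self
        for sec_name in section_names:
            current = current.sections[sec_name]
        return current.props[prop_name]


# Build a small tree: one Section with a nested subsection and two Properties.
experiment = Section("experiment_42", "project_AB")
subject = experiment.add_section("subject", "experiment_42")
subject.add_property("species", "Mus musculus")
subject.add_property("age", 4, unit="weeks")

age = experiment.find("subject/age")
print(age.value, age.unit)
```

The point of the sketch is only the shape of the data: Sections nest to any depth, and the actual information sits in the Properties at the leaves.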

## The odML data model
![](./resources/nix_odML_model_simplified.png)

On a conceptual level, data and metadata in a NIX file live side by side in parallel trees. The different layers can be connected from the data side to the metadata side. Corresponding data can be retrieved when exploring the metadata tree.

    -----------NIX File---
    ├─ Section           ├─ Block
    |  ├─ Section        |  ├─ DataArray
    |  |  └─ Property    |  ├─ DataArray
    |  └─ Section        |  ├─ Tag
    |     └─ Property    |  └─ MultiTag
    └─ Section           └─ Block
       └─ Section           ├─ DataArray
          ├─ Property       ├─ DataArray
          ├─ Property       └─ Group
          └─ Property                    
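The connection between the two trees can be pictured with a small pure-Python sketch. It is a conceptual toy and does not show how NIX implements the link: the classes `MetaSection` and `DataObject`, the helper `referring`, and the second array name `membrane_voltage_B` are hypothetical. Each data object carries an optional reference to a Section, and the reverse lookup scans the data tree for objects that point at a given Section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # compare by identity: a Section only equals itself
class MetaSection:
    name: str
    props: Dict[str, object] = field(default_factory=dict)


@dataclass(eq=False)
class DataObject:
    name: str
    kind: str                                # e.g. "DataArray" or "MultiTag"
    metadata: Optional[MetaSection] = None   # link from data side to metadata side


def referring(section: MetaSection, data_tree: List[DataObject], kind: str) -> List[str]:
    """Reverse lookup: names of all data objects of `kind` linked to `section`."""
    return [obj.name for obj in data_tree
            if obj.kind == kind and obj.metadata is section]


experiment = MetaSection("experiment_42", {"species": "Mus musculus"})

data_tree = [
    DataObject("membrane_voltage_A", "DataArray"),
    DataObject("membrane_voltage_B", "DataArray"),  # hypothetical, stays unlinked
    DataObject("tag_A", "MultiTag"),
]

# Link from the data side to the metadata side.
data_tree[0].metadata = experiment
data_tree[2].metadata = experiment

# Forward direction: from a data object to its annotations.
print(data_tree[2].metadata.props["species"])

# Reverse direction: from a Section to the data that refer to it.
print(referring(experiment, data_tree, "DataArray"))
print(referring(experiment, data_tree, "MultiTag"))
```

Note that the link is stored on the data side only; the metadata tree stays unchanged when data are attached to it.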


In [None]:
# Let us annotate a DataArray of our last example.

# As we can see, we have not stored any metadata in our current file yet.
f.sections


In [None]:
# Let's check how we can create a new Section.
help(f.create_section)


In [None]:
# First we need to create a Section that can hold our annotations.
section = f.create_section(name="experiment_42", type_="project_AB")

f.sections


In [None]:
# This Section can hold further Sections as well as Properties.
section.sections


In [None]:
section.props

In [None]:
# Let's store additional information about the raw data of our MultiTag example.

# We want to add information about the subject that was used in the experiment.
sub_sec = section.create_section(name="subject", type_="experiment_42")


In [None]:
# Let's add some Properties to this Section.
help(sub_sec.create_property)


In [None]:
# We'll add information about subjectID, subject species and subject age. 
prop = sub_sec.create_property(name="subjectID", values_or_dtype="78376446-f096-47b9-8bfe-ce1eb43a48dc")
prop = sub_sec.create_property(name="species", values_or_dtype="Mus musculus")
prop = sub_sec.create_property(name="age", values_or_dtype="4")
prop.unit = "weeks"


In [None]:
# Let's check what we have so far at the root of the file.
f.sections


In [None]:
# We list all Sections held by our main Section, which documents the "tag_examples" Block.
f.sections['experiment_42'].sections


In [None]:
# We access all Properties of the subsection containing subject related information.
f.sections['experiment_42'].sections['subject'].props


In [None]:
# We can now connect the Section describing our experiment directly to the MultiTag 
#  that references both the raw as well as the analysed data.

multi_tag = f.blocks['tag_examples'].multi_tags['tag_A']
multi_tag.metadata = f.sections['experiment_42']


In [None]:
# Now when we look at the data via a MultiTag we can directly access all metadata that has been attached to it.
# E.g. get information about the subject the experiment was conducted with.
multi_tag.metadata.sections['subject'].props


In [None]:
# We can also attach the same Section to the raw DataArray itself, e.g. when no MultiTags have been used.
init_data = f.blocks['tag_examples'].data_arrays['membrane_voltage_A']
init_data.metadata = f.sections['experiment_42']


In [None]:
# And we can also go in reverse: we can select a Section and find all data that are connected to it.
sec = f.sections['experiment_42']

# Either via connected DataArrays.
sec.referring_data_arrays


In [None]:
# Or via connected MultiTags.
sec.referring_multi_tags


In [None]:
# And finally we close our file.
f.close()


## Try it out

Now we move on to an actual exercise.

The public repository https://gin.g-node.org/RDMcourse2020/demo-lecture-07 contains a Jupyter notebook "2020_RDM_course_nix_exercise.ipynb".

Start it either
- locally, if you can use Python; make sure all dependencies are installed,
- or with Binder, if you cannot use Python locally. The repository is already set up for use with Binder. Check the last lecture if you are unsure how to start the notebook using Binder.

This repository further contains a folder called "excercise". It contains calcium imaging data and rough metadata about the recordings.

The exercise is to
- read through the README.md and briefly familiarize yourself with the project and the data.
- load the raw data into the notebook. Ideally transfer the "obj_substracted" column (column 3) from the data files, but any other column works as well.
- the "time_elapsed" column advances in steps of roughly 100 ms. If you want to, you can use a SampledDimension with a sampling interval of 100 ms, which should be easier, or try to include the real times as a RangeDimension.
- create a new NIX file and put the raw data traces into NIX DataArrays including labels and units. Note that the signal is Fluorescence with unit AU (arbitrary unit).
- plot data from these DataArrays.
- read through the metadata, try to put useful metadata into a NIX Section/Property structure and connect it to the DataArrays. Examples would be
  - original file names of raw data files.
  - species.
  - recording equipment.

- identify a region of interest based on the shift paradigm that was used, specify it with start and extent, and try to create a single MultiTag that connects all three DataArrays.
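
For the choice between the two dimension types mentioned above, one option is to check how regular the recorded times actually are. The sketch below uses made-up `time_elapsed` values in milliseconds, a hypothetical helper `describe_sampling`, and an arbitrary default tolerance; none of these come from the exercise data. In nixpy, the two outcomes would correspond to calling `append_sampled_dimension` or `append_range_dimension` on a DataArray.

```python
from statistics import median
from typing import Dict, List


def describe_sampling(times_ms: List[float], tolerance_ms: float = 10.0) -> Dict[str, object]:
    """Summarise the spacing of time stamps and suggest a dimension type."""
    if len(times_ms) < 2:
        raise ValueError("need at least two time stamps")
    # Differences between consecutive time stamps.
    intervals = [later - earlier for earlier, later in zip(times_ms, times_ms[1:])]
    typical = median(intervals)
    max_deviation = max(abs(step - typical) for step in intervals)
    regular = max_deviation <= tolerance_ms
    return {
        "interval_ms": typical,
        "max_deviation_ms": max_deviation,
        "suggestion": "SampledDimension" if regular else "RangeDimension",
    }


# Made-up time stamps (ms) that are only roughly 100 ms apart.
times = [0.0, 100.0, 198.0, 300.0, 400.0, 503.0]

summary = describe_sampling(times)
print(summary)

# With a stricter tolerance the same data no longer count as regular.
strict = describe_sampling(times, tolerance_ms=1.0)
print(strict["suggestion"])
```

How much jitter is acceptable depends on the analysis; if exact timing matters, storing the real times is the safer choice.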

Alternatively, you can take some of your own data and try to put it into a NIX file along with some of your metadata.