## Data and Metadata (data annotation) - Tutorial 3

### What are metadata and why are they needed?

Metadata are data about data. A non-scientific example: the *title*, *date* or *director* of a *movie* are metadata.

In science, metadata describe the conditions under which the raw data of an experimental study were acquired or analysed.

Metadata can be anything that is related to an experiment or an analysis step:
- stimuli and protocols
- environmental factors, e.g. temperature, gas or liquid concentrations, ...
- operational information, e.g. experimenter, date, time, ...
- subject information, e.g. animal strain, history, ...
- hardware and software used, including versions, updates and customizations
- settings

Traditionally, actively collected metadata are kept in spreadsheets or lab books. Further metadata can be found in raw data files (file headers), manufacturer documentation, hardware information, code comments, etc.

All of this information may be required to fully understand how an experiment was conducted and how the data were analysed.

Organizing such metadata and keeping them accessible is not a trivial task; most laboratories have developed home-made solutions over time to keep track of their metadata. Collecting and organizing these metadata is a tough job in its own right, since experiments are diverse and may even change over time.

Metadata is especially important when trying to make sense of data:
- that you are not familiar with
- that you have not worked with for a while

A particular problem is that most metadata are usually stored separately from the data they belong to; searching data and retrieving the corresponding metadata, or vice versa, is often not trivial, especially after some time has passed.

With NIX, metadata can be stored alongside the data it belongs to.

The process of collecting the metadata can be automated, and the results are machine readable and can be searched programmatically.
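As a minimal, library-independent sketch of this idea, the snippet below gathers a few operational metadata entries automatically with Python's standard library and serialises them as JSON. The function name `collect_operational_metadata`, the key names and the experimenter name are hypothetical; JSON only stands in for a machine-readable store here and is not how NIX itself stores metadata.

```python
import datetime
import json
import platform


def collect_operational_metadata(experimenter):
    """Gather a few metadata entries automatically at recording time.

    Hypothetical helper: the keys are illustrative only and are not
    odML or NIX terminology.
    """
    return {
        "experimenter": experimenter,
        "date": datetime.date.today().isoformat(),
        "python_version": platform.python_version(),
        "operating_system": platform.system(),
    }


metadata = collect_operational_metadata("Jane Doe")

# Serialising to JSON makes the entries machine readable ...
serialised = json.dumps(metadata, indent=2)

# ... and, once loaded back, they can be searched programmatically.
loaded = json.loads(serialised)
matches = [key for key in loaded if "version" in key]
print(matches)  # → ['python_version']
```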

## Data and data annotation in the same NIX file

The entities of the NIX data model discussed so far carry enough information (dimensions, units, labels) to understand the stored data itself. Often, however, much more information is required to fully interpret the underlying experiment.

NIX not only allows storing raw and analysed data within the same file; it also allows creating structured annotations of the experiments that were conducted and connecting this information directly with the data.

Metadata in NIX files is stored in the [odML format](https://g-node.github.io/python-odml):
- odML is a hierarchically structured data format that provides grouping in nestable `Sections`.
- `Sections` can hold both `Sections` and `Properties`.
- the metadata themselves are stored as `Property`-`Value` pairs.
- `Sections` are the main structural elements, while `Properties` hold the actual metadata information.
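To make this Section/Property structure concrete, here is a toy model in plain Python. It is not nixio: the classes `ToySection` and `ToyProperty` and their `add_section`/`add_property` methods are hypothetical and only mimic the nesting described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProperty:
    """Holds the actual metadata: a name, one or more values and an optional unit."""
    name: str
    values: tuple
    unit: str = ""


@dataclass
class ToySection:
    """Structural element: can nest further Sections and hold Properties."""
    name: str
    type_: str = "undefined"
    sections: list = field(default_factory=list)
    props: list = field(default_factory=list)

    def add_section(self, name, type_="undefined"):
        child = ToySection(name, type_)
        self.sections.append(child)
        return child

    def add_property(self, name, values, unit=""):
        prop = ToyProperty(name, tuple(values), unit)
        self.props.append(prop)
        return prop


recording = ToySection("recording.20210405", "raw.data.recording")
subject = recording.add_section("subject", "raw.data.recording")
subject.add_property("species", ["Mus Musculus"])
subject.add_property("age", [4], unit="weeks")

print([s.name for s in recording.sections])
# → ['subject']
print([(p.name, p.values, p.unit) for p in subject.props])
# → [('species', ('Mus Musculus',), ''), ('age', (4,), 'weeks')]
```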

### The odML data model in NIX
![](./resources/nix_odML_model_simplified.png)

On a conceptual level, data and metadata in a NIX file live side by side in parallel trees. Entities of the data tree (e.g. Blocks, DataArrays, Tags) can be linked to Sections of the metadata tree, and, conversely, the corresponding data can be retrieved when exploring the metadata tree.

    --------------- NIX File --------
    ├─ Section              <---     ├─ Block
    |  ├─ Section                    |  ├─ DataArray
    |  |  └─ Property                |  ├─ DataArray
    |  └─ Section                    |  ├─ Tag
    |     └─ Property                |  └─ MultiTag
    └─ Section              <---     └─ Block
       └─ Section           <---        ├─ DataArray
          ├─ Property                   ├─ DataArray
          ├─ Property                   └─ Group
          └─ Property                    
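
The two directions of these links can be sketched with a toy model in plain Python: each data entity holds an optional reference to a metadata section (data to metadata), and the reverse direction is answered by scanning the data entities for that reference. The classes and the `referring` helper are hypothetical illustrations; they do not show how NIX implements these links internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataSection:
    name: str


@dataclass
class DataEntity:
    kind: str                                    # e.g. "DataArray" or "Tag"
    name: str
    metadata: Optional[MetadataSection] = None   # link into the metadata tree


def referring(section, entities):
    """Reverse lookup: names of all data entities linked to `section`."""
    return [e.name for e in entities if e.metadata is section]


rec = MetadataSection("recording.20210405")
data_tree = [
    DataEntity("DataArray", "recording.20210405", metadata=rec),
    DataEntity("DataArray", "recording.20210505.01"),  # no metadata attached
    DataEntity("Tag", "stimulus.down.3", metadata=rec),
]

print(data_tree[0].metadata.name)  # data → metadata: recording.20210405
print(referring(rec, data_tree))   # metadata → data: ['recording.20210405', 'stimulus.down.3']
```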


# Storing metadata in NIX

## Metadata basics: creating Section-Property trees and navigation

To introduce the usage of metadata functions in NIX, we'll keep it simple and abstract for now.

In [1]:
import nixio


In [2]:
# Let's explore the metadata functions of NIX before going into more detail.
# We will re-use this file throughout the following examples
f = nixio.File.open("metadata.nix", nixio.FileMode.Overwrite)


In [3]:
# As expected, there is no metadata in our current file yet.
print(f.sections)


[]


In [4]:
# Let's check how we can create a new Section. Sections can be created from File and Section objects.
help(f.create_section)


Help on method create_section in module nixio.file:

create_section(name, type_='undefined', oid=None) method of nixio.file.File instance
    Create a new metadata section inside the file.
    
    :param name: The name of the section to create.
    :type name: str
    :param type_: The type of the section.
    :type type_: str
    :param oid: object id, UUID string as specified in RFC 4122. If no id
                is provided, an id will be generated and assigned.
    :type oid: str
    
    :returns: The newly created section.
    :rtype: nixio.Section



You can find the class information, including all available methods, in the nixpy readthedocs API entry for [nixio.section](https://nixpy.readthedocs.io/en/latest/api/nixio.html#module-nixio.section).

In [5]:
# First we need to create a Section that can hold our annotations. 
sec = f.create_section(name="recording.20210405", 
                       type_="raw.data.recording")

f.sections


[Section: {name = recording.20210405, type = raw.data.recording}]

In [6]:
# Like other NIX objects Section (and Property) names on the same 
# level have to be unique. Otherwise a 'DuplicateName' exception 
# will be raised.
section = f.create_section(name="recording.20210405", 
                           type_="raw.data.recording")

DuplicateName: Duplicate name - names have to be unique for a given entity type & parent. (create_section)
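The uniqueness rule behind this exception can be illustrated with a toy class that keys its children by name. `ToySection` and `DuplicateNameError` are hypothetical stand-ins, not part of nixio; the point is only that a name must be unique under a given parent, while the same name may appear again under a different parent.

```python
class DuplicateNameError(ValueError):
    """Hypothetical stand-in for the DuplicateName exception shown above."""


class ToySection:
    def __init__(self, name):
        self.name = name
        self.children = {}  # child name -> ToySection; names unique per parent

    def add_section(self, name):
        if name in self.children:
            raise DuplicateNameError(f"Duplicate name {name!r} under {self.name!r}")
        child = ToySection(name)
        self.children[name] = child
        return child


root = ToySection("root")
first = root.add_section("recording.20210405")
try:
    root.add_section("recording.20210405")  # same parent: rejected
except DuplicateNameError as err:
    print(err)  # → Duplicate name 'recording.20210405' under 'root'

# The same name is allowed under a different parent.
first.add_section("recording.20210405")
```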

In [8]:
# Sections can hold multiple further Sections as well as
# multiple Properties.
sec.sections


[]

In [9]:
# The section currently does not contain any Properties.
sec.props

[]

In [11]:
# We want to add information about a subject that was used in the 
# experiment.
sub_sec = sec.create_section(name="subject", 
                             type_="raw.data.recording")


In [12]:
# Properties can be created from Section objects.
help(sub_sec.create_property)


Help on method create_property in module nixio.section:

create_property(name='', values_or_dtype=0, oid=None, copy_from=None, keep_copy_id=True) method of nixio.section.Section instance
    Add a new property to the section.
    
    :param name: The name of the property to create/copy.
    :type name: str
    :param values_or_dtype: The values of the property or a valid DataType.
    :type values_or_dtype: list of values or a nixio.DataType
    :param oid: object id, UUID string as specified in RFC 4122. If no id
                is provided, an id will be generated and assigned.
    :type oid: str
    :param copy_from: The Property to be copied, None in normal mode
    :type copy_from: nixio.Property
    :param keep_copy_id: Specify if the id should be copied in copy mode
    :type keep_copy_id: bool
    
    :returns: The newly created property.
    :rtype: nixio.Property



Again, you can find all class information in the nixpy readthedocs API entry for [nixio.property](https://nixpy.readthedocs.io/en/latest/api/nixio.html#module-nixio.property).

In [13]:
# We'll add metadata about subjectID, subject species and
# subject age as Properties to the "subject" section.
_ = sub_sec.create_property(name="subjectID", 
                            values_or_dtype="78376446-f096-47b9-8bfe-ce1eb43a48dc")

_ = sub_sec.create_property(name="species", 
                            values_or_dtype="Mus Musculus")

# To fully describe metadata, Properties support storing a "unit"
# and an "uncertainty" together with their values.
prop = sub_sec.create_property(name="age", 
                               values_or_dtype="4")

prop.unit = "weeks"
# The uncertainty can be set in the same way via prop.uncertainty

In [14]:
# Let's check what we have so far at the root of the file.
f.sections


[Section: {name = recording.20210405, type = raw.data.recording}]

In [15]:
# File and Sections also support the "pprint" function to make it easier 
# to get an overview of the contents of the metadata tree.
f.pprint()

File: name = metadata.nix
  recording.20210405 [raw.data.recording]
    subject [raw.data.recording]
        |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)
        |- species: ('Mus Musculus',)
        |- age: ('4',)weeks


In [16]:
# We access all Properties of the subsection containing subject related 
# information.
# Sections can be accessed via index or via name
f.sections[0].sections['subject'].props


[Property: {name = subjectID}, Property: {name = species}, Property: {name = age}]

In [17]:
# We can also again use the pprint function
f.sections[0].sections['subject'].pprint()

subject [raw.data.recording]
    |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)
    |- species: ('Mus Musculus',)
    |- age: ('4',)weeks


In [18]:
f.close()

## Connecting data and metadata

So far we have seen how to create and store metadata in NIX files. Next, let's see how to connect them to actual data.

In [19]:
f = nixio.File.open("metadata.nix", nixio.FileMode.ReadWrite)

In [20]:
# We'll add some minimal abstract data
rec_block = f.create_block(name="project.recordings", 
                           type_="example.raw.data")

In [21]:
example_data_01 = [2, 2, 2, 6, 6, 6, 6, 2, 2, 2]
da = rec_block.create_data_array(name="recording.20210405", 
                                 array_type="shift.data", 
                                 data=example_data_01,
                                 label="df/f")

da.append_sampled_dimension(0.001, label="time", unit="s")

SampledDimension: {index = 1}

In [22]:
example_data_02 = [2, 2, 2, 8, 8, 8, 8, 2, 2, 2]
da = rec_block.create_data_array(name="recording.20210505.01", 
                                 array_type="shift.data", 
                                 data=example_data_02,
                                 label="df/f")

da.append_sampled_dimension(0.001, label="time", unit="s")

SampledDimension: {index = 1}

In [23]:
# We'll also create a NIX Tag, that will reference a specific region 
# in the data.
stim_on = 4
stim_off = 8
# We create the Tag on the same Block as the DataArrays 
# it should reference.
stimulus_tag = rec_block.create_tag("stimulus.down.3", 
                                    "stimulus.shift", 
                                    position=[stim_on])

stimulus_tag.extent = [stim_off - stim_on]

# We append the DataArrays of both experiments to the tag
stimulus_tag.references.append(f.blocks["project.recordings"].data_arrays["recording.20210405"])
stimulus_tag.references.append(f.blocks["project.recordings"].data_arrays["recording.20210505.01"])

In [24]:
# Now we want to hook up the DataArrays and the Tag to more information;
# to the metadata we have defined before.

# We will only reference the appropriate metadata for recording 20210405, 
# since we have not defined metadata for the second recording yet.

# We'll set the metadata for both data array and tag
f.blocks["project.recordings"].data_arrays["recording.20210405"].metadata = f.sections["recording.20210405"]
f.blocks["project.recordings"].tags["stimulus.down.3"].metadata = f.sections["recording.20210405"]


In [25]:
# We can now access the metadata from DataArray and Tag:
f.blocks["project.recordings"].data_arrays["recording.20210405"].metadata.pprint()


recording.20210405 [raw.data.recording]
  subject [raw.data.recording]
      |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)
      |- species: ('Mus Musculus',)
      |- age: ('4',)weeks


In [26]:
f.blocks["project.recordings"].tags["stimulus.down.3"].metadata.pprint()


recording.20210405 [raw.data.recording]
  subject [raw.data.recording]
      |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)
      |- species: ('Mus Musculus',)
      |- age: ('4',)weeks


In [27]:
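# No metadata has been attached to the second recording yet,
# so this returns None and nothing is displayed.
f.blocks["project.recordings"].data_arrays["recording.20210505.01"].metadata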


In [28]:
# We can also access DataArrays via the metadata
f.sections["recording.20210405"].referring_data_arrays

[DataArray: {name = recording.20210405, type = shift.data}]

In [29]:
f.sections["recording.20210405"].referring_tags

[Tag: {name = stimulus.down.3, type = stimulus.shift}]

In [30]:
# The referring_objects property is a shortcut to all entities
# that reference a section
f.sections["recording.20210405"].referring_objects

[DataArray: {name = recording.20210405, type = shift.data},
 Tag: {name = stimulus.down.3, type = stimulus.shift}]

In [31]:
f.sections["recording.20210405"].referring_blocks

[]

In [32]:
f.sections["recording.20210405"].referring_groups


[]

In [33]:
f.sections["recording.20210405"].referring_multi_tags

[]

In [34]:
f.close()

## Automated handling of metadata

Metadata can become quite complex and it can become tedious to create large trees over and over again. To this end, "template" sections can be created and re-used.

As an example: an experiment usually involves a few different stimulus protocols or one or two hardware setups, while the stimuli or the hardware themselves do not change. The hardware metadata can therefore be pre-defined once for each setup and attached to the specific experimental data whenever new data are added to an existing NIX file.
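The idea behind templates can be reduced to plain Python dictionaries: each session receives its own deep copy of a template, so changing one session's copy affects neither the template nor the other sessions. This is a hypothetical stand-in for illustration only; it makes no claim about how nixio's `copy_section` is implemented.

```python
import copy

# Hypothetical templates as plain nested dicts (not the NIX/odML format).
templates = {
    "microscope_station_A": {"Manufacturer": "Company A", "pE LED intensity": "20"},
    "microscope_station_B": {"Manufacturer": "Company B", "pE LED intensity": "30"},
}

sessions = {}
for session, setup in [("recording.20210505.01", "microscope_station_A"),
                       ("recording.20210506.01", "microscope_station_B"),
                       ("recording.20210507.01", "microscope_station_A")]:
    # Deep-copy so that later edits to one session do not leak into the
    # template or into other sessions that use the same template.
    sessions[session] = {setup: copy.deepcopy(templates[setup])}

# Adjust the LED intensity for one session only.
sessions["recording.20210507.01"]["microscope_station_A"]["pE LED intensity"] = "25"

print(templates["microscope_station_A"]["pE LED intensity"])                           # → 20
print(sessions["recording.20210505.01"]["microscope_station_A"]["pE LED intensity"])  # → 20
```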

In [35]:
# The file that will contain templates for import
ft = nixio.File.open("metadata_templates.nix", nixio.FileMode.Overwrite)

# The current example file will contain the data and will import from the templates file
fi = nixio.File.open("metadata.nix", nixio.FileMode.ReadWrite)

In [36]:
# We will add basic templates representing two similar, fictional
# microscope setups with slightly different metadata.
sec_micro_A = ft.create_section(name="microscope_station_A", 
                                type_="hardware.microscopes")
_ = sec_micro_A.create_property(name="Manufacturer", 
                                values_or_dtype="Company A")
_ = sec_micro_A.create_property(name="Objective", 
                                values_or_dtype="Pln Apo 40x/1.3 oil DIC II")
_ = sec_micro_A.create_property(name="pE LED intensity", 
                                values_or_dtype="20")

sec_micro_B = ft.create_section(name="microscope_station_B", 
                                type_="hardware.microscopes")
_ = sec_micro_B.create_property(name="Manufacturer", 
                                values_or_dtype="Company B")
_ = sec_micro_B.create_property(name="Objective", 
                                values_or_dtype="Pln Apo 40x/1.3 oil DIC II")
_ = sec_micro_B.create_property(name="pE LED intensity", 
                                values_or_dtype="30")


In [37]:
# The root of the templates file now contains two microscope setup templates
ft.pprint(max_depth=2)

File: name = metadata_templates.nix
  microscope_station_A [hardware.microscopes]
      |- Manufacturer: ('Company A',)
      |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)
      |- pE LED intensity: ('20',)
  microscope_station_B [hardware.microscopes]
      |- Manufacturer: ('Company B',)
      |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)
      |- pE LED intensity: ('30',)


When running an experiment and adding new data to the NIX file, the appropriate template can be copied in full into the metadata of the new recording.

In [38]:
# Create a base section in the recording session file.
sec_ses = fi.create_section(name="sessions")

# Experiments from three different days are added, and the setup
# used for each is documented via the templates:
sec_session01 = fi.sections["sessions"].create_section(name="recording.20210505.01", 
                                                       type_="raw-data.ca-imaging")

sec_setup_A = ft.sections["microscope_station_A"]
sec_session01.copy_section(sec_setup_A)

sec_session02 = fi.sections["sessions"].create_section(name="recording.20210506.01", 
                                                       type_="raw-data.ca-imaging")

sec_setup_B = ft.sections["microscope_station_B"]
sec_session02.copy_section(sec_setup_B)

sec_session03 = fi.sections["sessions"].create_section(name="recording.20210507.01", 
                                                       type_="raw-data.ca-imaging")

sec_setup_A = ft.sections["microscope_station_A"]
sec_session03.copy_section(sec_setup_A)


Section: {name = microscope_station_A, type = hardware.microscopes}

In [39]:
fi.pprint()

File: name = metadata.nix
  Block: {name = project.recordings, type = example.raw.data}
    DataArray: {name = recording.20210405, type = shift.data}
      Shape: (10,) Unit:None
      SampledDimension: {index = 1}
    DataArray: {name = recording.20210505.01, type = shift.data}
      Shape: (10,) Unit:None
      SampledDimension: {index = 1}
    Tag: {name = stimulus.down.3, type = stimulus.shift}
      Position Length:1 Units: ()
  recording.20210405 [raw.data.recording]
    subject [raw.data.recording]
        |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)
        |- species: ('Mus Musculus',)
        |- age: ('4',)weeks
  sessions [undefined]
    recording.20210505.01 [raw-data.ca-imaging]
      microscope_station_A [hardware.microscopes]
          |- Manufacturer: ('Company A',)
          |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)
          |- pE LED intensity: ('20',)
    recording.20210506.01 [raw-data.ca-imaging]
      microscope_station_B [hardware.microscopes]
   

In [40]:
ft.close()
fi.close()

## Finding information and data by filtering

NIX files can grow quite large, both in the data and in the metadata they contain. Filter methods help to find specific metadata or data when simply walking through the file becomes too tedious.

In [41]:
# We will re-use the metadata.nix file created in the previous examples
fi = nixio.File.open("metadata.nix", nixio.FileMode.ReadOnly)

Both `File` and `Section` objects provide a `find_sections` method.
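Conceptually, such a search walks the whole Section tree and keeps every Section for which the given function returns `True`. The recursive sketch below shows this with a hypothetical `ToySection` class and `find` helper in plain Python; it reuses section names from this tutorial but is not nixio's implementation, and nixio's result order may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToySection:
    name: str
    type_: str = "undefined"
    sections: list = field(default_factory=list)


def find(sections, predicate):
    """Depth-first search: collect every section (at any depth) matching predicate."""
    found = []
    for sec in sections:
        if predicate(sec):
            found.append(sec)
        found.extend(find(sec.sections, predicate))
    return found


roots = [
    ToySection("recording.20210405", "raw.data.recording",
               [ToySection("subject", "raw.data.recording")]),
    ToySection("sessions", sections=[
        ToySection("recording.20210505.01", "raw-data.ca-imaging",
                   [ToySection("microscope_station_A", "hardware.microscopes")]),
        ToySection("recording.20210506.01", "raw-data.ca-imaging",
                   [ToySection("microscope_station_B", "hardware.microscopes")]),
    ]),
]

raw = find(roots, lambda sec: sec.type_.startswith("raw"))
print([sec.name for sec in raw])
# → ['recording.20210405', 'subject', 'recording.20210505.01', 'recording.20210506.01']
```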

In [42]:
# We can search all sections directly from file level
fi.find_sections(lambda sec: sec.name == "subject")

[Section: {name = subject, type = raw.data.recording}]

In [43]:
fi.find_sections(lambda sec: sec.type.startswith("raw"))

[Section: {name = recording.20210405, type = raw.data.recording},
 Section: {name = subject, type = raw.data.recording},
 Section: {name = recording.20210505.01, type = raw-data.ca-imaging},
 Section: {name = recording.20210506.01, type = raw-data.ca-imaging},
 Section: {name = recording.20210507.01, type = raw-data.ca-imaging}]

In [44]:
# Template A was copied into two sessions, so two matching Sections are found
fi.find_sections(lambda sec: sec.name == "microscope_station_A")

[Section: {name = microscope_station_A, type = hardware.microscopes},
 Section: {name = microscope_station_A, type = hardware.microscopes}]

In [45]:
fi.find_sections(lambda sec: len(sec.referring_data_arrays) > 0)

[Section: {name = recording.20210405, type = raw.data.recording}]

## Hands-on session 3

Now we move on to another hands-on session.

In the folder "day_1" of the repository https://gin.g-node.org/INCF-workshop-2021/NIX-Neo-workshop you will find a Jupyter notebook "hands_on_3.ipynb".

Again, you can start it either:
- locally, if you can use Python (make sure all dependencies are installed), or
- on Binder, if you cannot use Python locally. The repository is already set up for use with Binder.
