# Tutorial about metadata

In [None]:
import numpy as np
import pandas as pd
from google.protobuf import json_format, text_format

import locan as lc

In [None]:
lc.show_versions(system=False, dependencies=False, verbose=False)

## Metadata definition

We have defined a canonical set of metadata to accompany localization data.

Metadata is described by protobuf messages. Google's protobuf format makes it possible to enforce metadata definitions that can be easily attached to various file formats, exchanged with other programs, and implemented in different programming languages.

Metadata is instantiated through messages defined in the `locan.data.metadata_pb2` module.

In [None]:
list(lc.data.metadata_pb2.DESCRIPTOR.message_types_by_name.keys())

Each message class groups a logical set of information and is integrated into the main class `Metadata`.

`Metadata` contains the following fields:

In [None]:
metadata = lc.data.metadata_pb2.Metadata()
list(metadata.DESCRIPTOR.fields_by_name.keys())

Each field has a predefined type and can be set to appropriate values:

In [None]:
metadata.comment = "This is a comment"

In [None]:
try:
    metadata.comment = 1
except Exception as e:
    print(e)

In [None]:
metadata

Metadata values, including default values, can be shown in JSON format or as a dictionary:

In [None]:
json_format.MessageToDict(metadata)

In [None]:
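# Default values are included, except for empty fields with repeated message classes.
# Note: recent protobuf releases rename `including_default_value_fields`
# to `always_print_fields_with_no_presence`.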
json_format.MessageToDict(metadata, including_default_value_fields=True, preserving_proto_field_name=True)

In [None]:
json_format.MessageToJson(metadata, including_default_value_fields=True, preserving_proto_field_name=True)

To print metadata with timestamps and durations as a well-formatted string, use:

In [None]:
lc.metadata_to_formatted_string(metadata)

## Set metadata fields

### Repeated fields

To set selected fields, instantiate the appropriate messages. For repeated (list-like) fields, call `add()` on the field to append a new element and get it back.

In [None]:
metadata = lc.data.metadata_pb2.Metadata()

ou = metadata.experiment.setups.add().optical_units.add()
ou.detection.camera.electrons_per_count = 13.26

metadata

### Timestamp fields

Timestamp fields hold a date and time as a point in time in UTC and are of type `google.protobuf.Timestamp`.

In [None]:
metadata = lc.data.metadata_pb2.Metadata()
metadata.creation_time.GetCurrentTime()
metadata.creation_time

In [None]:
metadata.creation_time.FromJsonString('2022-05-14T06:58:00.514893Z')
metadata.creation_time.ToJsonString()
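
For orientation, a `google.protobuf.Timestamp` stores a point in time as whole seconds plus nanoseconds since the Unix epoch in UTC, and the string passed to `FromJsonString` is its RFC 3339 text form. The following sketch uses only the Python standard library, not protobuf or locan, to show what the example string encodes. The helper name `rfc3339_to_seconds_nanos` is hypothetical, and it assumes a trailing `Z` and a fractional part of at most six digits.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rfc3339_to_seconds_nanos(text):
    """Split an RFC 3339 UTC string into (seconds, nanos) since the Unix epoch.

    Hypothetical helper for illustration only; assumes the form
    'YYYY-MM-DDTHH:MM:SS.ffffffZ'.
    """
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    delta = parsed.replace(tzinfo=timezone.utc) - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    nanos = delta.microseconds * 1_000
    return seconds, nanos


seconds, nanos = rfc3339_to_seconds_nanos("2022-05-14T06:58:00.514893Z")
print(seconds, nanos)
```

The two integers correspond to the `seconds` and `nanos` fields of the Timestamp message.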

Duration fields hold time intervals and are of type `google.protobuf.Duration`.

In [None]:
metadata.experiment.setups.add().optical_units.add().detection.camera.integration_time.FromMilliseconds(20)
metadata.experiment.setups[0].optical_units[0].detection.camera.integration_time.ToMilliseconds()
# metadata.experiment.setups[0].optical_units[0].detection.camera.integration_time.ToJsonString()
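
A `google.protobuf.Duration` likewise stores whole seconds plus nanoseconds. As a unit check for the millisecond value used above, the sketch below converts an interval into a total nanosecond count and into that seconds/nanoseconds split, using only the standard library. The function names are hypothetical and are not part of locan or protobuf.

```python
from datetime import timedelta

ONE_MICROSECOND = timedelta(microseconds=1)


def timedelta_to_nanoseconds(interval):
    """Return the length of a timedelta as integer nanoseconds.

    timedelta has microsecond resolution, so the result is a multiple of 1000.
    """
    return (interval // ONE_MICROSECOND) * 1_000


def split_seconds_nanos(total_nanoseconds):
    """Split a non-negative nanosecond count into (seconds, nanos)."""
    return divmod(total_nanoseconds, 1_000_000_000)


integration_time = timedelta(milliseconds=20)
total_ns = timedelta_to_nanoseconds(integration_time)
print(total_ns)                       # 20000000
print(split_seconds_nanos(total_ns))  # (0, 20000000)
```

Integer nanoseconds are also the unit that the TOML metadata format in this tutorial expects for Duration elements.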

To print metadata with timestamps and durations as a well-formatted string, use:

In [None]:
lc.metadata_to_formatted_string(metadata)

## Metadata scheme

The overall scheme can be instantiated and visualized:

In [None]:
metadata = lc.data.metadata_pb2.Metadata()
scheme = lc.message_scheme(metadata)
scheme

### Metadata from TOML file

You can provide metadata in a [TOML](https://toml.io) file.

In [None]:
metadata_toml = \
"""
# Define the class (message) instances.

[[messages]]
name = "metadata"
module = "locan.data.metadata_pb2"
class_name = "Metadata"


# Fill metadata attributes
# Headings must be a message name or valid attribute.
# Use [[]] to add repeated elements.
# Use string '2022-05-14T06:58:00Z' for Timestamp elements.
# Use int in nanoseconds for Duration elements.

[metadata]
identifier = "123"
comment = "my comment"
ancestor_identifiers = ["1", "2"]
production_time = '2022-05-14T06:58:00Z'

[[metadata.experiment.experimenters]]
first_name = "First name"
last_name = "Last name"

[[metadata.experiment.experimenters.affiliations]]
institute = "Institute"
department = "Department"

[[metadata.experiment.setups]]
identifier = "1"

[[metadata.experiment.setups.optical_units]]
identifier = "1"

[metadata.experiment.setups.optical_units.detection.camera]
identifier = "1"
name = "camera name"
model = "camera model"
electrons_per_count = 3.1
integration_time = 10_000_000

[metadata.localizer]
software = "rapidSTORM"

[[metadata.relations]]
identifier = "1"
"""

In [None]:
toml_out = lc.metadata_from_toml_string(metadata_toml)
for k, v in toml_out.items():
    print(k, ":\n\n", v)
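
The `[[...]]` headings in the TOML string are TOML arrays of tables, which is why they map naturally onto repeated fields. The sketch below is independent of locan: it parses a shortened fragment of the string above with the standard-library `tomllib` module (available from Python 3.11; the `tomli` fallback for older versions is an assumption about your environment) to show the nested list structure a TOML parser produces.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed backport for older Python versions

# Shortened fragment of the metadata TOML used in this tutorial.
fragment = """
[metadata]
identifier = "123"

[[metadata.experiment.setups]]
identifier = "1"

[[metadata.experiment.setups.optical_units]]
identifier = "1"

[metadata.experiment.setups.optical_units.detection.camera]
electrons_per_count = 3.1
integration_time = 10_000_000
"""

parsed = tomllib.loads(fragment)

# Each [[...]] heading appends one element to a list.
setups = parsed["metadata"]["experiment"]["setups"]
camera = setups[0]["optical_units"][0]["detection"]["camera"]

print(type(setups).__name__, len(setups))
print(camera["integration_time"])
```

A plain `[...]` heading below a `[[...]]` heading refers to the most recently added element, so the camera table ends up inside the first optical unit of the first setup.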

To load metadata from a TOML file instead of a string, use `lc.load_metadata_from_toml` with the file path.

## Metadata for LocData

Metadata is instantiated for each LocData object and is accessible through the `LocData.meta` attribute.

### Sample data

In [None]:
df = pd.DataFrame(
    {
        'position_x': np.arange(0, 10),
        'position_y': np.random.random(10),
        'frame': np.arange(0, 10),
    }
)
locdata = lc.LocData.from_dataframe(dataframe=df)

locdata.meta

Fields can also be printed as a well-formatted string (using `lc.metadata_to_formatted_string`):

In [None]:
locdata.print_meta()

A summary of the most important metadata can be printed with:

In [None]:
locdata.print_summary()

Metadata fields can be printed and changed individually:

In [None]:
print(locdata.meta.comment)
locdata.meta.comment = 'user comment'
print(locdata.meta.comment)

Metadata can also be added at instantiation:

In [None]:
locdata_2 = lc.LocData.from_dataframe(
    dataframe=df,
    meta={'identifier': 'myID_1', 'comment': 'my own user comment'},
)
locdata_2.print_summary()