# From eLabFTW to NOMAD

In this notebook I will show how to transfer metadata from eLabFTW to NOMAD, as described in the metadata pipeline presented in the thesis. I will use both the eLabFTW and the NOMAD APIs.

<u>Prerequisite</u>: an eLabFTW experiment containing the metadata

<u>Steps of the procedure</u>:
1. Write a YAML Schema Package (the one we used can be found below).
2. GET the experiment from eLabFTW.
3. Create the Python dictionaries (that will be turned into JSON files) and edit them with the relevant metadata.
4. Prepare the JSON files for the upload in one zip folder.
5. Upload to NOMAD via API.

<u>Disclaimer</u>

In this notebook, for simplicity, I will only consider some of the metadata values coming from the pipeline. Yet, I deem them to be representative of the kind of work that should be done for transferring metadata from eLabFTW to NOMAD.

In particular, we will consider the metadata coming from the Python program. That is, the value of the following eLabFTW keys:
- `Experiment_DateAndTime`
- `Experiment_RunNumber`
- `Experiment_MeasurementParameters_field_direction`
- `UserCase_Sample_sample_name`
- `Experiment_MeasurementParameters_current_pulse` (mA)
- `Experiment_MeasurementParameters_magnetic_field_sequence` (V)

Moreover, since we are interested in describing the sample, we will consider some of the metadata entered by the user in eLabFTW, namely those values that describe the layers of the thin film (i.e., the sample) and the device. That is, the value of the following eLabFTW keys:
- `UserCase_Sample_layer_1_thickness` (nm)
- `UserCase_Sample_layer_2_thickness` (nm)
- `UserCase_Sample_layer_3_thickness` (nm)
- `UserCase_Sample_layer_4_thickness` (nm)
- `UserCase_Sample_layer_5_thickness` (nm)
- `UserCase_SampleMaterialProperties_layer_1_material`
- `UserCase_SampleMaterialProperties_layer_2_material`
- `UserCase_SampleMaterialProperties_layer_3_material`
- `UserCase_SampleMaterialProperties_layer_4_material`
- `UserCase_SampleMaterialProperties_layer_5_material`
- `UserCase_Sample_device_geometry`
- `UserCase_Sample_device_width` (mm)
- `UserCase_Sample_device_length` (mm)








## 1. Writing a YAML Schema Package for NOMAD

In this part we write the YAML Schema Package for NOMAD. The documentation on how to build this kind of schema is here: https://nomad-lab.eu/prod/v1/docs/howto/customization/basics.html

FAIRmat also provides a YouTube tutorial: https://www.youtube.com/watch?v=5VXGZNlz9rc

The YAML Schema Package used in this notebook is available in this repository: https://github.com/filippo-vasone/FAIR-metadata-pipeline-for-experimental-matsci/blob/main/SOT.schema.archive.yaml

(See the thesis for comments on the YAML Schema Package.)

There are three section definitions:
- `SampleStructure`
- `SOTMeasurement`
- `Substance`

```
definitions:
    name: SOT
    sections:
        SampleStructure:  
          base_sections:
            - nomad.datamodel.metainfo.basesections.CompositeSystem
            - nomad.datamodel.data.EntryData
          quantities:
            UserCase_Sample_device_width:
              type: np.float64
              unit: mm
              m_annotations:
                eln:
                  component: NumberEditQuantity
                  display:
                    unit: mm
            UserCase_Sample_device_length:
              type: np.float64
              unit: mm
              m_annotations:
                eln:
                  component: NumberEditQuantity
                  display:
                    unit: mm
            UserCase_Sample_device_geometry:
              type: str
              m_annotations:
                eln:
                  component: StringEditQuantity
          sub_sections:
            layers:
              section:
                quantities:
                  layer:
                    type: Substance
                    m_annotations:
                      eln:
                        component: ReferenceEditQuantity
              repeats: true

        SOTMeasurement:
            base_sections:
               - nomad.datamodel.metainfo.eln.ElnBaseSection
               - nomad.datamodel.data.EntryData
               - nomad.datamodel.metainfo.basesections.CompositeSystemReference
            quantities:
                UserCase_Sample_sample_name:
                  type: str
                  m_annotations:
                      eln:
                        component: StringEditQuantity
                Experiment_RunNumber:
                  type: int
                  m_annotations:
                      eln:
                        component: NumberEditQuantity
                Experiment_MeasurementParameters_field_direction:
                  type: str
                  m_annotations:
                      eln:
                        component: StringEditQuantity
                Experiment_MeasurementParameters_current_pulse:
                  type: np.float64
                  unit: mA
                  m_annotations:
                    eln:
                      component: NumberEditQuantity
                      display:
                        unit: mA                
                Experiment_MeasurementParameters_magnetic_field_sequence:
                  type: str # this is a string for simplicity
                  unit: V
                  m_annotations:
                      eln:
                        component: StringEditQuantity
                        display:
                          unit: V                

        Substance:
          base_sections:
          - nomad.datamodel.metainfo.eln.Substance
          - nomad.datamodel.data.EntryData
          quantities:
            UserCase_Sample_layer_thickness:
              type: np.float64
              unit: nm
              m_annotations:
                eln:
                  component: NumberEditQuantity
                  display:
                    unit: nm
```

Save this schema in a YAML file named `SOT.schema.archive.yaml` and place it in the same folder as this notebook.

## 2. GET the experiment metadata from eLabFTW

Now we use the eLabFTW API to retrieve the metadata from the experiment with id = 463.

In [None]:
# install the elab library
!pip install elabapi-python

# import modules
import time
import json
import elabapi_python

# replace with your api key (obtained from User Panel > API Keys)
my_api_key = '***' # censored

# START CONFIGURATION

configuration = elabapi_python.Configuration()
configuration.api_key['api_key'] = my_api_key
configuration.api_key_prefix['api_key'] = 'Authorization'
configuration.host = 'https://***//api/v2' # censored
configuration.debug = False
configuration.verify_ssl = False

# create an instance of the API class
api_client = elabapi_python.ApiClient(configuration)
# fix issue with Authorization header not being properly set by the generated lib
api_client.set_default_header(header_name='Authorization', header_value=my_api_key)

# END CONFIGURATION

In [None]:
# get the EXPERIMENT instance
from pprint import pprint
api_exp_instance = elabapi_python.ExperimentsApi(api_client)
exp_id = 463  # id of the eLabFTW experiment

# Read the experiment
api_response = api_exp_instance.get_experiment(exp_id)
pprint(api_response)

In [None]:
# get the metadata of the experiment as a Python dictionary
md_exp_dic = json.loads(api_response.metadata)

print(md_exp_dic)
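
The lookups in the following sections assume that each entry of `extra_fields` is itself a dictionary holding a `value` and, for some fields, a `unit`. A minimal sketch of that access pattern, using a hand-written stand-in for `md_exp_dic` (the values are made up, and `get_field` is a hypothetical helper, not part of `elabapi_python`):

```python
# Stand-in for md_exp_dic; the values are invented for illustration only.
sample_md = {
    "extra_fields": {
        "UserCase_Sample_layer_1_thickness": {"value": "5", "unit": "nm"},
        "UserCase_SampleMaterialProperties_layer_1_material": {"value": "Ta"},
    }
}


def get_field(metadata, key, cast=str):
    """Return (value, unit) for one extra field; unit is None if the field has none."""
    field = metadata["extra_fields"][key]
    return cast(field["value"]), field.get("unit")


thickness, unit = get_field(sample_md, "UserCase_Sample_layer_1_thickness", cast=float)
print(thickness, unit)
```

Passing the cast as an argument keeps the type conversion next to the lookup, which is the same conversion done by hand with `float(...)` and `int(...)` in section 3.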

## 3. Create the Python dictionaries

In this step we will create several Python dictionaries, which will then be turned into JSON files to be uploaded as entries in NOMAD.

We will hard code the dictionaries with the structure of a NOMAD entry and edit them with the relevant metadata from eLabFTW.



### 3.1 We create the dictionaries for the layers of the thin film
These dictionaries will be transformed into JSON files to be uploaded as raw files on NOMAD. For the files to be processed correctly, they need to have the right structure.

This structure is hard coded and it is inferred from other downloaded NOMAD raw files.

The structure specifies that the raw JSON file should follow the 'Substance' definition from the NOMAD Schema package.

Then, we will update the dictionary with the metadata coming from the experiment.
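
The hard-coded `m_def` strings used in this section end in an index (`/section_definitions/0`, `/1`, `/2`). Judging from the values used in this notebook, that index appears to follow the order in which the sections are listed in the YAML file. A small sketch of that mapping, where `SECTION_ORDER` and `m_def_for` are hypothetical names introduced only for illustration:

```python
SCHEMA_FILE = "SOT.schema.archive.yaml"

# Order of the section definitions as they appear in the YAML Schema Package.
SECTION_ORDER = ["SampleStructure", "SOTMeasurement", "Substance"]


def m_def_for(section_name):
    """Build the m_def reference for a section, assuming the index follows the YAML order."""
    index = SECTION_ORDER.index(section_name)
    return f"../upload/raw/{SCHEMA_FILE}#/definitions/section_definitions/{index}"


print(m_def_for("Substance"))
```

If the sections in the schema file are reordered, the indices in the hard-coded strings would need to change accordingly.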

In [None]:
# empty dictionary in which we will store each layer dictionary
layers = {}

for n in range(1, 6):
  # hard-coded base dictionary (taken from the structure of a NOMAD raw entry)
  base = {"data": {"m_def": "../upload/raw/SOT.schema.archive.yaml#/definitions/section_definitions/2", "name": ""}}

  # material name and thickness of layer n, read from the eLabFTW metadata
  material = md_exp_dic['extra_fields'][f'UserCase_SampleMaterialProperties_layer_{n}_material']['value']
  thickness = md_exp_dic['extra_fields'][f'UserCase_Sample_layer_{n}_thickness']['value']
  thickness = float(thickness)  # cast to float, as the schema declares this quantity as np.float64

  # update the base dictionary with the values from the metadata
  base['data']['name'] = material
  base['data']['UserCase_Sample_layer_thickness'] = thickness
  # add the layer dictionary to `layers`, keyed by material name
  # (note: two layers made of the same material would overwrite each other)
  layers[material] = base

print(layers)

# data quality check to check if the key of each nested dictionary corresponds to the value of 'name' inside it
for key, value in layers.items():
  if key == value['data']['name']:
    print("All good")
  else:
    print("Something is wrong")
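
Because `layers` is keyed by material name, a stack in which the same material appears twice would keep only the last of those layers. One possible way around this, sketched under the assumption that prefixing the layer position is acceptable as a key (the naming scheme and the helper `unique_layer_keys` are hypothetical, not part of the pipeline):

```python
def unique_layer_keys(materials):
    """Build one key per layer, prefixed with the layer position so duplicates stay distinct."""
    return [f"layer_{n}_{material}" for n, material in enumerate(materials, start=1)]


# Made-up stack in which the first and last layers share a material.
example_stack = ["Ta", "CoFeB", "MgO", "Ta"]
keys = unique_layer_keys(example_stack)
print(keys)
```

With keys of this form, the `name` stored inside each dictionary could still be the plain material name, so the key and the `name` would no longer be identical.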

### 3.2 We create the dictionary for the thin film (SampleStructure)


In [None]:
# As above, we first hard code the structured dictionary for SampleStructure

thin_film = {"data":{"m_def":"../upload/raw/SOT.schema.archive.yaml#/definitions/section_definitions/0","name":"thin_film"}}

# then we update this dictionary with the value from the eLabFTW metadata

thin_film['data']['UserCase_Sample_device_width'] = float(md_exp_dic['extra_fields']['UserCase_Sample_device_width']['value']) # for metadata quality, I turn this value into a float
thin_film['data']['UserCase_Sample_device_length'] = float(md_exp_dic['extra_fields']['UserCase_Sample_device_length']['value']) # for metadata quality, I turn this value into a float
thin_film['data']['UserCase_Sample_device_geometry'] = md_exp_dic['extra_fields']['UserCase_Sample_device_geometry']['value']

print(thin_film)

### 3.3 We create the dictionary for the experiment (SOTMeasurement)

In [None]:
# As above, we first hard code the structured dictionary for SOTMeasurement

exp_entry = {"data":{"m_def":"../upload/raw/SOT.schema.archive.yaml#/definitions/section_definitions/1"}}


# then we update this dictionary with the values from the eLabFTW metadata

# the name of the experiment is the 'title' in eLabFTW and thus it comes directly from the API response and not from the metadata dictionary
exp_entry['data']['name']= api_response.title

# I establish a list of the keys whose value I want to transfer between dictionaries
key_list_exp = ['UserCase_Sample_sample_name',
                'Experiment_RunNumber',
                'Experiment_MeasurementParameters_field_direction',
                'Experiment_MeasurementParameters_current_pulse',
                'Experiment_MeasurementParameters_magnetic_field_sequence']

# I update the exp_entry dict with the values coming from the metadata dictionary
for key in key_list_exp:
  exp_entry["data"][key] = md_exp_dic["extra_fields"][key]["value"]

# Since NOMAD (unlike eLabFTW) does not support units for strings, append the unit (V) to the string value
# of Experiment_MeasurementParameters_magnetic_field_sequence
field_seq = md_exp_dic["extra_fields"]["Experiment_MeasurementParameters_magnetic_field_sequence"]
exp_entry["data"]["Experiment_MeasurementParameters_magnetic_field_sequence"] = f'{field_seq["value"]}{field_seq["unit"]}'

# for metadata quality, turn the value of Experiment_RunNumber into integer
exp_entry["data"]["Experiment_RunNumber"] = int(exp_entry["data"]["Experiment_RunNumber"])

# for metadata quality, turn the value of Experiment_MeasurementParameters_current_pulse into float
exp_entry["data"]["Experiment_MeasurementParameters_current_pulse"] = float(exp_entry["data"]["Experiment_MeasurementParameters_current_pulse"])

print(exp_entry)
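
The copy-then-cast steps above can also be written as a single pass driven by a table of casts. This is only a sketch: `CASTS` and `cast_fields` are hypothetical names, and the input values are invented.

```python
# Which Python type each key should be converted to; keys not listed stay unchanged.
CASTS = {
    "Experiment_RunNumber": int,
    "Experiment_MeasurementParameters_current_pulse": float,
}


def cast_fields(extra_fields, keys, casts=CASTS):
    """Copy the value of each listed key, applying the cast registered for it (if any)."""
    result = {}
    for key in keys:
        value = extra_fields[key]["value"]
        cast = casts.get(key)
        result[key] = cast(value) if cast else value
    return result


# Invented stand-in for md_exp_dic["extra_fields"].
example_fields = {
    "Experiment_RunNumber": {"value": "7"},
    "Experiment_MeasurementParameters_current_pulse": {"value": "2.5", "unit": "mA"},
    "Experiment_MeasurementParameters_field_direction": {"value": "x"},
}
converted = cast_fields(example_fields, list(example_fields))
print(converted)
```

Keeping the casts in one table makes it easier to check them against the types declared in the YAML Schema Package.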

#### Metadata quality: add datetime to `exp_entry` with the right format for NOMAD (ISO 8601)
Example of NOMAD datetime (inferred): '2025-02-11T22:31:54.856891+00:00'

We want the datetime we insert into `exp_entry` from eLabFTW to have the same structure as the above (ISO 8601 format, in UTC).

In [None]:
# import necessary modules
from datetime import datetime

# we have a look at the datetime from eLabFTW
time_elab = md_exp_dic["extra_fields"]["Experiment_DateAndTime"]["value"]
print(f'The original datetime from eLabFTW is: {time_elab} - with type {type(time_elab)}')

# we transform the string into a Python datetime object
elab_date_obj = datetime.strptime(time_elab, '%Y-%m-%d %H:%M:%S.%f')
print(f'Now it is Python datetime object: {elab_date_obj} - with type {type(elab_date_obj)}')

In [None]:
# import necessary modules
from datetime import timezone
from zoneinfo import ZoneInfo

# we check that the Python datetime object is a naive datetime object (i.e., without info on the time zone)
print(f'The timezone of our datetime is: {elab_date_obj.tzinfo}')

# we transform the datetime object from naive to aware, by adding the current time zone (but without changing the datetime)
elab_date_obj_tz = elab_date_obj.replace(tzinfo=ZoneInfo("Europe/Rome"))
print(f'Now we added the timezone information. Thus the date is now: {elab_date_obj_tz} with this timezone: {elab_date_obj_tz.tzinfo}')

# we convert the datetime from the current time zone to UTC
elab_date_obj_utc = elab_date_obj_tz.astimezone(timezone.utc)
print(f'In UTC, our datetime is the following: {elab_date_obj_utc}')

# convert it to isoformat
elab_iso_date_utc = elab_date_obj_utc.isoformat()
print(f'In ISO 8601 format, the datetime is: {elab_iso_date_utc}')

# we update the exp_entry dictionary
exp_entry["data"]["datetime"] = elab_iso_date_utc
print(f'Here is our updated dictionary: {exp_entry}')
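
The conversion steps above (parse, attach the local time zone, convert to UTC, format) can be collected into one function. A minimal sketch, assuming the eLabFTW value always has the `'%Y-%m-%d %H:%M:%S.%f'` layout and was recorded in the `Europe/Rome` time zone; the function name and the input timestamp are made up for illustration:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def elab_to_nomad_datetime(value, tz_name="Europe/Rome"):
    """Convert a naive eLabFTW datetime string into an ISO 8601 string in UTC."""
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))  # same wall-clock time, now time-zone aware
    return aware.astimezone(timezone.utc).isoformat()


print(elab_to_nomad_datetime("2025-02-11 23:31:54.856891"))
# → 2025-02-11T22:31:54.856891+00:00
```

In February, `Europe/Rome` is one hour ahead of UTC, which is why the hour moves from 23 to 22 while the date stays the same.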

## 4. Prepare the JSON files for the upload in one zip folder


Now we have to do the following steps:
- Move the YAML Schema into the same folder as this notebook (in Google Colab: load it into the environment).
- Create an empty folder in the same folder as this notebook.
- Move the YAML Schema into that folder.
- Transform each of the Python dictionaries created in the previous section into a JSON file. Note: the dictionary `layers` has to be 'unpacked', because we want one JSON file for each nested dictionary inside `layers`.
- Move all these JSON files into the folder.
- ZIP the folder.

In [None]:
# import necessary modules
import os
import shutil

# create an empty folder
folder_name = "SOT_exp_API"
os.makedirs(folder_name, exist_ok=True)

# before running this cell: upload the YAML Schema to Google Colab, or move it into the same folder as this notebook

# move the YAML Schema into the folder
schema = "SOT.schema.archive.yaml"
shutil.move(schema, os.path.join(folder_name, schema))

In [None]:
# transform the dictionaries created in sec. 3 into JSON files and move them to the folder

# I transform the dict 'exp_entry' into a file and move it into the folder

# filename for the JSON file (with the '.archive.json' suffix needed by NOMAD)
file_name_exp_entry = exp_entry["data"]["name"]+".archive.json"

# write it to a JSON file
with open(file_name_exp_entry, "w") as filesotexp:
    json.dump(exp_entry, filesotexp)

# move the file to the folder
shutil.move(file_name_exp_entry, os.path.join(folder_name, file_name_exp_entry))

In [None]:
# I transform the dict 'thin_film' into a file and move it into the folder

# filename for the JSON file (with the '.archive.json' suffix needed by NOMAD)
file_name_thin_film = thin_film["data"]["name"]+".archive.json"

# write it to a JSON file
with open(file_name_thin_film, "w") as filesotexp:
    json.dump(thin_film, filesotexp)

# move the file to the folder
shutil.move(file_name_thin_film, os.path.join(folder_name, file_name_thin_film))

In [None]:
# create a JSON file for each nested dictionary inside `layers`.

for layer, content in layers.items():
    filename = f'{layer}.archive.json' # filename for the JSON files
    with open(filename, "w") as filesotexp:
        json.dump(content, filesotexp)       # create a JSON file for each dictionary
    shutil.move(filename, os.path.join(folder_name, filename)) # move the JSON file to the folder

In [None]:
# zip the folder
output_zip = "SOT_exp_API.zip"
shutil.make_archive(output_zip.replace(".zip", ""), 'zip', folder_name)
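
The write, move, and zip steps above could also be done in one go by writing the JSON entries straight into the archive with the standard-library `zipfile` module. This is an alternative sketch, not the procedure used in this notebook; `write_upload_zip` is a hypothetical helper.

```python
import json
import os
import zipfile


def write_upload_zip(zip_path, schema_path, entries):
    """Write the schema file and one <name>.archive.json per entry into a ZIP archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # store the schema at the top level of the archive, without its directory path
        archive.write(schema_path, arcname=os.path.basename(schema_path))
        for name, entry in entries.items():
            archive.writestr(f"{name}.archive.json", json.dumps(entry))
    return zip_path


# Possible usage with the dictionaries from section 3 (not executed here):
# entries = {exp_entry["data"]["name"]: exp_entry, "thin_film": thin_film, **layers}
# write_upload_zip("SOT_exp_API.zip", "SOT.schema.archive.yaml", entries)
```

This avoids creating intermediate files in the notebook folder, at the cost of not leaving an unzipped copy to inspect.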

## 5. Upload the ZIP folder to NOMAD via API

The functions `get_authentication_token` and `upload_to_NOMAD` used in this Notebook come from here: https://github.com/Master-Data-Management-and-Curation/Tools-Data-Management-Curation/blob/main/nomad/NOMAD_API.py

In [None]:
# import necessary modules
import requests

nomad_url = 'https://nomad-lab.eu/prod/v1/oasis/api/v1/' #public NOMAD oasis

# nomad authentication credentials
username='***' # censored
password='***' # censored

# function to get the authentication token
def get_authentication_token(nomad_url, username, password):
    '''Get the token for accessing your NOMAD unpublished uploads remotely'''
    try:
        response = requests.get(
            nomad_url + 'auth/token', params=dict(username=username, password=password), timeout=10)
        token = response.json().get('access_token')
        if token:
            return token

        print('response is missing token: ')
        print(response.json())
        return
    except Exception:
        print('something went wrong trying to get authentication token')
        return

# get the authentication token
token = get_authentication_token(nomad_url, username, password)

In [None]:
# function to upload to NOMAD using API
def upload_to_NOMAD(nomad_url, token, upload_file, upload_name, embargo_length=0):
    '''Upload a single file for NOMAD upload, e.g., zip format'''
    with open(upload_file, 'rb') as f:
        try:
            response = requests.post(
                nomad_url + 'uploads?upload_name=' + upload_name + '&embargo_length=' + str(embargo_length),
                headers={'Authorization': f'Bearer {token}', 'Accept': 'application/json'},
                data=f, timeout=30)
            upload_id = response.json().get('upload_id')
            if upload_id:
                return upload_id

            print('response is missing upload_id: ')
            print(response.json())
            return
        except Exception:
            print('something went wrong uploading to NOMAD')
            return
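
The upload function builds its query string by concatenation, which works as long as `upload_name` contains no spaces or special characters. A sketch of the same URL built with percent-encoding from the standard library (`build_upload_url` is a hypothetical helper; with `requests`, passing a `params` dictionary, as done in `get_authentication_token`, has a similar effect):

```python
from urllib.parse import urlencode


def build_upload_url(nomad_url, upload_name, embargo_length=0):
    """Return the uploads URL with the query parameters safely encoded."""
    query = urlencode({"upload_name": upload_name, "embargo_length": embargo_length})
    return f"{nomad_url}uploads?{query}"


print(build_upload_url("https://nomad-lab.eu/prod/v1/oasis/api/v1/", "SOT upload #1"))
# → https://nomad-lab.eu/prod/v1/oasis/api/v1/uploads?upload_name=SOT+upload+%231&embargo_length=0
```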

In [None]:
# upload the created ZIP archive to NOMAD and keep the returned upload id (None if the upload failed)
upload_id = upload_to_NOMAD(nomad_url, token, "SOT_exp_API.zip", "upload_zip_sot_def", embargo_length=0)
print(upload_id)