# MMB Data Pipeline

This notebook sets up a data pipeline for Mini Module Baseline (MMB) data from NOMAD. It includes:

1. Authentication with NOMAD API
2. Fetching all MMB-related data
3. Downloading the data
4. Processing and transforming into a tidy dataset
5. Preparing the data for visualization, modeling, and analysis

Date: June 27, 2025

## 1. Setup and Imports

First, let's import the necessary libraries and set up our environment.

In [1]:
# Ensure we can load the .env file
from pathlib import Path
from dotenv import load_dotenv

# Find the .env file in the project root (one level up from the notebook's working directory)
env_path = Path().absolute().parent / '.env'
if env_path.exists():
    load_dotenv(dotenv_path=env_path)
    print(f"Loaded environment from: {env_path}")
else:
    print(f"Warning: No .env file found at {env_path}")

# Now we can import the rest of our dependencies
import os
import sys
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from tqdm.notebook import tqdm

# Import NOMAD API modules
sys.path.append('../')
from nomad_api.auth import authenticate, OASIS_OPTIONS
from nomad_api.client import NomadClient
from nomad_api.data import query_sample_entries, get_all_samples_with_authors

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook", font_scale=1.2)

# Display settings for pandas
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 50)
pd.set_option('display.width', 1000)

Loaded environment from: /home/qkg/Documents/1_PROJECTS/NOMAD-Tools/NOMAD-Admin-Tools/.env
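
The setup cell assumes the `.env` file sits exactly one directory above the working directory. If the notebook is moved, that lookup fails with only a warning. The sketch below walks up a few parent directories instead. `find_env_file` and `max_levels` are illustrative names and are not part of `python-dotenv` or `nomad_api`.

```python
from pathlib import Path
from typing import Optional


def find_env_file(start: Path, filename: str = ".env", max_levels: int = 3) -> Optional[Path]:
    """Return the first `filename` found in `start` or one of its ancestors.

    Hypothetical helper: checks `start` itself plus up to `max_levels` parents.
    """
    current = start.resolve()
    for _ in range(max_levels + 1):
        candidate = current / filename
        if candidate.is_file():
            return candidate
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return None


env_file = find_env_file(Path.cwd())
print(f"Found: {env_file}" if env_file else "No .env file found nearby")
```

The returned path can be passed to `load_dotenv(dotenv_path=...)` as in the cell above. `python-dotenv` also ships `find_dotenv`, which performs a similar upward search.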


## 2. Authentication with NOMAD API

To access the NOMAD API, we need to authenticate. We'll use the `authenticate` function from `nomad_api.auth`.

In [12]:
# The authenticate function will automatically try to:
# 1. Use NOMAD_CLIENT_ACCESS_TOKEN if available
# 2. Fall back to NOMAD_USERNAME and NOMAD_PASSWORD from .env file
# 3. Prompt for credentials if neither is available
# Use the same Oasis URL for authentication and for the client below
token, user_info = authenticate(base_url=OASIS_OPTIONS['SE Oasis'])

print(f"Successfully authenticated as: {user_info.get('name', user_info.get('username'))}")

# Create the client with the obtained token
client = NomadClient(base_url=OASIS_OPTIONS['SE Oasis'], token=token)

Successfully authenticated as: Paolo Graniero
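
The fallback order listed in the comments above can be sketched as a small, side-effect-free function. This is only an illustration of the precedence: `choose_auth_method` is a hypothetical helper, and the real logic inside `nomad_api.auth.authenticate` may differ in detail.

```python
import os
from typing import Mapping, Optional, Tuple


def choose_auth_method(env: Optional[Mapping[str, str]] = None) -> Tuple[str, dict]:
    """Pick an authentication route: token first, then username/password, then prompt.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env

    token = env.get("NOMAD_CLIENT_ACCESS_TOKEN")
    if token:
        return "token", {"token": token}

    username = env.get("NOMAD_USERNAME")
    password = env.get("NOMAD_PASSWORD")
    if username and password:
        return "password", {"username": username, "password": password}

    # Neither a token nor a full username/password pair is available
    return "prompt", {}


method, _ = choose_auth_method()
print(f"Authentication route that would be used: {method}")
```

Passing the environment in as a mapping keeps the precedence easy to check without touching real credentials.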


## 3. Fetching MMB Data

Next, we query the NOMAD API for all `HySprint_Sample` entries visible to us, then narrow them down to the Mini Module Baseline (MMB) uploads by name.

In [13]:
# Use get_all_samples_with_authors to fetch all samples
# This function handles pagination and admin/visible access automatically

all_samples = get_all_samples_with_authors(
    client=client,
    section_type="HySprint_Sample",
    page_size=1000  # Adjust based on your needs
)


Attempting to retrieve samples with admin access...
Admin access failed, falling back to visible access...
Attempting to retrieve samples with visible access...
API Response: 200 - {"owner":"visible","query":{"and":[{"name":"results.eln.sections","value":{"any":["HySprint_Sample"]...
Found 1513 samples (approximately 2 pages)
API Response: 200 - {"upload_id":"-5RhcS3iSa6ZaqvRdDYMhw","data":{"process_running":false,"process_status":"READY","last...
API Response: 200 - {"upload_id":"5XLufSAnTjiqCeIdlicQow","data":{"process_running":false,"current_process":"edit_upload...
API Response: 200 - {"upload_id":"Yk5i_GB3RQqMEO3mIURIsg","data":{"process_running":false,"process_status":"READY","last...
API Response: 200 - {"owner":"visible","query":{"and":[{"name":"results.eln.sections","value":{"any":["HySprint_Sample"]...
Found 1513 samples (approximately 2 pages)
API Response: 200 - {"upload_id":"-5RhcS3iSa6ZaqvRdDYMhw","data":{"process_running":false,"process_status":"READY","last...
API Respo

In [14]:
print(f"Total HySprint_Sample entries found: {len(all_samples)}")
print(f'Example entry: {json.dumps(all_samples[0], indent=2)}')



Total HySprint_Sample entries found: 1513
Example entry: {
  "entry_id": "--d4nmrAC5AYG9dLtd9pB3RPtmAq",
  "upload_id": "-5RhcS3iSa6ZaqvRdDYMhw",
  "lab_id": "HZB_IJP-BL_1_3_C-23",
  "main_author": "f45973c6-e55c-47d8-a439-e174b0963d2b",
  "coauthors": [],
  "coauthor_groups": [
    "z3hvrdX0QVmD3jdbMZGqDw"
  ],
  "upload_create_time": "2025-04-22T14:31:28.775000",
  "published": false,
  "license": "CC BY 4.0",
  "upload_name": "IJP-BL_01"
}


In [15]:
unique_upload_names = list(set(sample['upload_name'] for sample in all_samples if 'upload_name' in sample))
unique_upload_names = [name for name in unique_upload_names if name]  # Filter out empty names

print(f"Unique upload names found: {len(unique_upload_names)}")
print('Upload names:')
for name in sorted(unique_upload_names):
    print(f"- {name}")


Unique upload names found: 82
Upload names:
- 1st_Batch_HySPRINT_Yuxin
- 1st_batch_IRIS_Yuxin
- 2nd_Batch_IRIS_Yuxin
- AF_SDC_MAPI_ink_B4
- AF_SDC_MAPIink_Batch7
- Batch VII - SAM Wettability
- Batch_5_CSMB_Yuxin
- Calender Week 15 2024 Module Baseline
- HZB_MMB_8_2001
- IJP-BL_01
- Introduction to Nomad - Workshop Material KJ
- KW10 Module Baseline
- KW15 FACs and PEtOx60
- KW16 FACs and PEtOx
- KW22 PEtOx60 in FACs with MACl
- KW25 POx in FACs - Polymer Variation I
- MAFA Batch 5
- MAFA Batch 8
- MAFA Batch 9
- MAFA Batch2
- MAPI ink_ref_spin_coated
- MMB Batch 12.0
- MMB Batch 12.1
- MMB Batch 12.10
- MMB Batch 12.11
- MMB Batch 12.12
- MMB Batch 12.5
- MMB Batch 12.7
- MMB Batch 12.8
- MMB Batch 12.9
- MMB Batch 13.0
- MMB Batch 14.0
- MMB Batch 15.0
- MMB Batch 16.0
- MMB Batch 17.0
- MMB Batch 19.0
- MMB Batch 20.0
- MMB Batch 2000
- MMB Batch 2002
- MMB Batch 21.0
- MMB Batch 23.0
- MMB Batch 24.0
- MMB Batch 25.0
- MMX B7
- SDC-PSC-7_8_Dec2023_b2
- SOP-02_20241010_TM
- SOP_CSMB
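
Besides the list of unique names, it can be useful to know how many samples each upload contributes. A minimal sketch with `collections.Counter` follows. The records are made up and only mimic the shape of `all_samples`; `count_samples_per_upload` is an illustrative name.

```python
from collections import Counter


def count_samples_per_upload(samples):
    """Count samples per upload_name, skipping missing or empty names."""
    return Counter(
        sample["upload_name"]
        for sample in samples
        if sample.get("upload_name")
    )


# Made-up records in the same shape as the entries in `all_samples`
demo_samples = [
    {"entry_id": "a", "upload_name": "MMB Batch 12.0"},
    {"entry_id": "b", "upload_name": "MMB Batch 12.0"},
    {"entry_id": "c", "upload_name": "IJP-BL_01"},
    {"entry_id": "d", "upload_name": None},
    {"entry_id": "e"},
]

counts = count_samples_per_upload(demo_samples)
for name, n in counts.most_common():
    print(f"{n:4d}  {name}")
```

In the notebook, `count_samples_per_upload(all_samples)` would give the same breakdown for the real data.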

In [6]:
mmb_uploads_names = [name for name in unique_upload_names if "MMB" in name]
print(f'MMB uploads found: {len(mmb_uploads_names)}')
print('Uploads:')
for name in sorted(mmb_uploads_names):
    print(f"- {name}")

MMB uploads found: 23
Uploads:
- HZB_MMB_8_2001
- MMB Batch 12.0
- MMB Batch 12.1
- MMB Batch 12.10
- MMB Batch 12.11
- MMB Batch 12.12
- MMB Batch 12.5
- MMB Batch 12.7
- MMB Batch 12.8
- MMB Batch 12.9
- MMB Batch 13.0
- MMB Batch 14.0
- MMB Batch 15.0
- MMB Batch 16.0
- MMB Batch 17.0
- MMB Batch 19.0
- MMB Batch 20.0
- MMB Batch 2000
- MMB Batch 2002
- MMB Batch 21.0
- MMB Batch 23.0
- MMB Batch 24.0
- MMB Batch 25.0
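
The listing above is sorted as plain strings, so `MMB Batch 12.10` appears before `MMB Batch 12.5`. If the numeric parts are meant to be read as numbers (an assumption about the naming scheme), a natural-sort key restores the expected order. `natural_key` is an illustrative helper.

```python
import re


def natural_key(name: str):
    """Split a name into text and integer chunks so that 12.9 sorts before 12.10."""
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


names = [
    "MMB Batch 12.10",
    "MMB Batch 12.9",
    "MMB Batch 2000",
    "MMB Batch 13.0",
    "HZB_MMB_8_2001",
]

for name in sorted(names, key=natural_key):
    print(f"- {name}")
```

Because text and number chunks always alternate, the key never compares a string with an integer.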


In [16]:
mmb_samples = [sample for sample in all_samples if sample.get('upload_name') in mmb_uploads_names]
print(f"Total MMB samples found: {len(mmb_samples)}")
print(f'Example MMB sample: {json.dumps(mmb_samples[0], indent=2)}')

Total MMB samples found: 483
Example MMB sample: {
  "entry_id": "-635Z62MBAoFkXDxqYfamU3qcgCX",
  "upload_id": "Uq7aoxCCRKe0g2sprkDbMg",
  "lab_id": "HZB_MMB_8-2002-3-0",
  "main_author": "df8bc696-58aa-4571-95fb-d71a800e1c07",
  "coauthors": [],
  "coauthor_groups": [
    "MjM4ze-URpu0NrHBulkRYg"
  ],
  "upload_create_time": "2025-03-19T11:06:57.233000",
  "published": false,
  "license": "CC BY 4.0",
  "upload_name": "MMB Batch 2002"
}


## 4. Retrieving Archive Data

Let's retrieve the complete archive data for a specific MMB sample using its entry ID. This will give us access to all the detailed information stored in the archive.

In [29]:
# Function to get archive data for a specific entry
def get_sample_archive(client, entry_id):
    """
    Retrieve the complete archive data for a specific entry using the NOMAD API.
    
    Args:
        client (NomadClient): Authenticated NOMAD client
        entry_id (str): The entry ID of the sample
        
    Returns:
        dict: The complete archive data for the entry
    """
    try:
        # Prepare the request body
        request_body = {
            "required": "*"
        }
        
        # Use the make_request method with the correct endpoint pattern and request body
        response = client.make_request(
            'post',
            f'entries/{entry_id}/archive/query',
            json_data=request_body
        )
        return response
    except Exception as e:
        print(f"Error retrieving archive data: {e}")
        return None


In [22]:
# Let's test the function with the first MMB sample
# (json is already imported in the setup cell)
if mmb_samples:
    # Get the entry_id of the first MMB sample
    test_entry_id = mmb_samples[0].get('entry_id')
    print(f"Retrieving archive data for entry_id: {test_entry_id}")
    
    # Get the archive data
    archive_data = get_sample_archive(client, test_entry_id)
    
    # Print the structure of the archive data
    if archive_data:
        print("\nArchive data structure:")
        print(json.dumps(list(archive_data.keys()), indent=2))
        
        # Print a sample of the data (first few keys)
        print("\nSample of archive data content:")
        sample_data = {k: archive_data[k] for k in list(archive_data.keys())[:3]}
        print(json.dumps(sample_data, indent=2))
else:
    print("No MMB samples available to test with.")

Retrieving archive data for entry_id: -635Z62MBAoFkXDxqYfamU3qcgCX
API Response: 200 - {"entry_id":"-635Z62MBAoFkXDxqYfamU3qcgCX","required":"*","data":{"entry_id":"-635Z62MBAoFkXDxqYfamU...

Archive data structure:
[
  "entry_id",
  "required",
  "data"
]

Sample of archive data content:
{
  "entry_id": "-635Z62MBAoFkXDxqYfamU3qcgCX",
  "required": "*",
  "data": {
    "entry_id": "-635Z62MBAoFkXDxqYfamU3qcgCX",
    "upload_id": "Uq7aoxCCRKe0g2sprkDbMg",
    "parser_name": "parsers/archive",
    "archive": {
      "processing_logs": [
        {
          "event": "Executing celery task",
          "proc": "Entry",
          "process": "process_entry",
          "process_worker_id": "IYE4O35pRZO6y7UitUY6gA",
          "parser": "parsers/archive",
          "logger": "nomad.processing",
          "timestamp": "2025-05-21 20:39.45",
          "level": "DEBUG"
        },
        {
          "proc": "Entry",
          "process": "process_entry",
          "process_worker_id": "IYE4O35pRZO6

In [28]:
archive_data['data']['archive']['m_ref_archives']

{}
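
Chained lookups such as `archive_data['data']['archive']['m_ref_archives']` raise a `KeyError` as soon as one level is missing, which is likely across hundreds of entries with differing content. The helper below returns a default instead. `dig` is a hypothetical name, and the demo dictionary is a reduced stand-in for a real archive response.

```python
from typing import Any


def dig(obj: Any, *path: Any, default: Any = None) -> Any:
    """Follow `path` through nested dicts/lists; return `default` if a step is missing."""
    current = obj
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Reduced stand-in for the archive response shown above
demo_archive = {
    "entry_id": "demo",
    "data": {
        "archive": {
            "m_ref_archives": {},
            "processing_logs": [{"event": "Executing celery task", "level": "DEBUG"}],
        }
    },
}

print(dig(demo_archive, "data", "archive", "m_ref_archives"))
print(dig(demo_archive, "data", "archive", "processing_logs", 0, "level"))
print(dig(demo_archive, "data", "archive", "results", default="not present"))
```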

In [11]:
archive_data

{'owner': 'public',
 'query': {'op': []},
 'pagination': {'page_size': 10,
  'order_by': 'entry_id',
  'order': 'asc',
  'total': 492,
  'next_page_after_value': '03pQ_u5eQFuH2eLc0-dUq6X7MGhA'},
 'required': '*',
 'data': [{'entry_id': '-3NKn5RhuG0XUokZP54AZIcuKJem',
   'upload_id': 'PobxL8nSRVy3pZoc65ryTA',
   'parser_name': 'nomad_hysprint.parsers:hysprint_parser',
   'archive': {'processing_logs': [{'event': 'Executing celery task',
      'proc': 'Entry',
      'process': 'process_entry',
      'process_worker_id': 'IYE4O35pRZO6y7UitUY6gA',
      'parser': 'nomad_hysprint.parsers:hysprint_parser',
      'logger': 'nomad.processing',
      'timestamp': '2025-05-21 20:32.09',
      'level': 'DEBUG'},
     {'exec_time': '0.0018453598022460938',
      'event': 'parser matching executed',
      'proc': 'Entry',
      'process': 'process_entry',
      'process_worker_id': 'IYE4O35pRZO6y7UitUY6gA',
      'parser': 'nomad_hysprint.parsers:hysprint_parser',
      'logger': 'nomad.processing'

## 5. Finding Referencing Entries

Let's create a function to find all entries that reference a specific target entry ID. This will help us track the relationships between different entries in the database.

In [33]:
# Function to get entries that reference a specific target entry
def get_referencing_entries(client, target_entry_id):
    """
    Find all entries that reference a specific target entry.
    
    Args:
        client (NomadClient): Authenticated NOMAD client
        target_entry_id (str): The entry ID to search for in references
        
    Returns:
        list: List of entries that reference the target entry
    """
    # Construct the query to search for entries with matching target_entry_id in references
    query = {
        "owner": "visible",
        "query": {
            "entry_references.target_entry_id": target_entry_id
        }
    }

    try:
        # Use the make_request method to query the entries
        response = client.make_request('post', 'entries/query', json_data=query)
        if response and 'data' in response:
            return response['data']
        return []
    except Exception as e:
        print(f"Error searching for referencing entries: {e}")
        return []

In [34]:
# Let's test the function with a sample entry ID from our MMB data
if mmb_samples:
    # Use the first MMB sample's entry_id as an example
    test_entry_id = mmb_samples[0].get('entry_id')
    print(f"Searching for entries that reference: {test_entry_id}")
    
    # Find referencing entries
    referencing_entries = get_referencing_entries(client, test_entry_id)
    
    # Display results
    print(f"\nFound {len(referencing_entries)} entries that reference this entry:")
    for entry in referencing_entries:
        print(f"\nEntry ID: {entry.get('entry_id')}")
        print(f"Entry Name: {entry.get('entry_name')}")
        print("References:")
        for ref in entry.get('entry_references', []):
            if ref.get('target_entry_id') == test_entry_id:
                print(f"  - {ref.get('source_name')} → {ref.get('target_name')}")
                print(f"    Path: {ref.get('source_path')} → {ref.get('target_path')}")
else:
    print("No MMB samples available to test with.")

Searching for entries that reference: -635Z62MBAoFkXDxqYfamU3qcgCX
API Response: 200 - {"owner":"visible","query":{"prefix":"entry_references","query":{"name":"target_entry_id","value":"-...

Found 10 entries that reference this entry:

Entry ID: 40ZwwTcNJjNXzMOHeDErzhfu1Yqf
Entry Name: evaporation C60
References:
  - reference → data
    Path: data.samples.reference → /data
  - section → data
    Path: workflow2.outputs.section → /data

Entry ID: 8f-SAa48H6JYCBg66XGGS_mQh4P2
Entry Name: slot die coating Me4PACz
References:
  - reference → data
    Path: data.samples.reference → /data
  - section → data
    Path: workflow2.outputs.section → /data

Entry ID: 95r74lX_to6WNLvmDd0wnMVrSu6M
Entry Name: evaporation BCP
References:
  - reference → data
    Path: data.samples.reference → /data
  - section → data
    Path: workflow2.outputs.section → /data

Entry ID: B-sug_IAoP5geTKBHu-eYS01KmDM
Entry Name: slot die coating MAFA
References:
  - reference → data
    Path: data.samples.reference 
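
The query above returned exactly 10 entries, which matches the `page_size` of 10 visible in the earlier response dump, so further matches may sit on later pages. The sketch below follows `next_page_after_value` until it is absent. The request field `page_after_value` is an assumption about the API, and `fake_make_request` is a stand-in for `client.make_request` so the example runs offline.

```python
def iter_query_results(make_request, query, page_size=100):
    """Yield every entry matching `query`, following next_page_after_value."""
    page_after_value = None
    while True:
        pagination = {"page_size": page_size}
        if page_after_value is not None:
            # Assumed request field for cursor-style pagination
            pagination["page_after_value"] = page_after_value
        body = {**query, "pagination": pagination}
        response = make_request("post", "entries/query", json_data=body) or {}
        yield from response.get("data", [])
        page_after_value = response.get("pagination", {}).get("next_page_after_value")
        if page_after_value is None:
            break


def fake_make_request(method, endpoint, json_data=None):
    """Offline stand-in that serves five made-up entries page by page."""
    entries = [{"entry_id": f"e{i}"} for i in range(5)]
    size = json_data["pagination"]["page_size"]
    after = json_data["pagination"].get("page_after_value")
    start = 0 if after is None else int(after)
    chunk = entries[start:start + size]
    end = start + len(chunk)
    page_info = {"page_size": size, "total": len(entries)}
    if end < len(entries):
        page_info["next_page_after_value"] = str(end)
    return {"data": chunk, "pagination": page_info}


demo_query = {"owner": "visible", "query": {}}
results = list(iter_query_results(fake_make_request, demo_query, page_size=2))
print(f"Collected {len(results)} entries across pages")
```

With the real client, passing `client.make_request` and the `query` dictionary from `get_referencing_entries` would be the equivalent call, provided the endpoint accepts the assumed pagination field.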

In [50]:
referencing_entries[1]['data'].keys()

dict_keys(['name', 'description', 'location', 'positon_in_experimental_plan', 'samples', 'layer', 'solution', 'annealing', 'quenching', 'properties', 'method', 'datetime'])

In [54]:
referencing_entries[1]['data']['layer']

[{'layer_type': 'Hole transport layer',
  'layer_material_name': 'Me4PACz',
  'layer_material': ''}]

In [62]:
def convert_json_to_tidy_dataframe(data):
    """
    Converts a specific JSON structure into a tidy pandas DataFrame,
    focusing on sample and process information.

    Args:
        data (dict): The dictionary parsed from the JSON file.

    Returns:
        pandas.DataFrame: A tidy DataFrame with one row per sample,
                          containing associated process details.
    """
    processed_rows = []
    
    # Extract common process details
    process_data = data.get('data', {})
    
    process_name = process_data.get('name')
    process_description = process_data.get('description')
    process_method = process_data.get('method')
    experimental_plan_position = process_data.get('positon_in_experimental_plan')
    process_datetime = process_data.get('datetime')

    # Layer details (assuming first layer entry is relevant)
    layer_info = process_data.get('layer', [{}])[0]
    layer_type = layer_info.get('layer_type')
    layer_material_name = layer_info.get('layer_material_name')

    # Solution details (assuming first solution entry is relevant)
    solution_info = process_data.get('solution', [{}])[0].get('solution_details', {})
    
    solute_info = solution_info.get('solute', [{}])[0]
    solution_solute_concentration_mol = solute_info.get('concentration_mol')

    solvent_info = solution_info.get('solvent', [{}])[0]
    solution_solvent_name = solvent_info.get('chemical_2', {}).get('name')
    solution_solvent_volume_milliliter = solvent_info.get('chemical_volume')
    solution_datetime = solution_info.get('datetime')

    # Annealing details
    annealing_info = process_data.get('annealing', {})
    annealing_temperature = annealing_info.get('temperature')
    annealing_time = annealing_info.get('time')

    # Quenching details
    quenching_info = process_data.get('quenching', {})
    quenching_air_knife_angle = quenching_info.get('air_knife_angle')
    quenching_bead_volume = quenching_info.get('bead_volume')

    # Slot Die Coating properties
    properties_info = process_data.get('properties', {})
    slot_die_flow_rate = properties_info.get('flow_rate')
    slot_die_head_distance_to_thinfilm = properties_info.get('slot_die_head_distance_to_thinfilm')
    slot_die_head_speed = properties_info.get('slot_die_head_speed')

    # Iterate through each sample and combine with process details
    samples = process_data.get('samples', [])
    for sample in samples:
        row = {
            'sample_lab_id': sample.get('lab_id'),
            'sample_name': sample.get('name'),
            'process_name': process_name,
            'process_description': process_description,
            'process_method': process_method,
            'experimental_plan_position': experimental_plan_position,
            'process_datetime': process_datetime,
            'layer_type': layer_type,
            'layer_material_name': layer_material_name,
            'solution_solute_concentration_mol': solution_solute_concentration_mol,
            'solution_solvent_name': solution_solvent_name,
            'solution_solvent_volume_milliliter': solution_solvent_volume_milliliter,
            'solution_datetime': solution_datetime,
            'annealing_temperature': annealing_temperature,
            'annealing_time': annealing_time,
            'quenching_air_knife_angle': quenching_air_knife_angle,
            'quenching_bead_volume': quenching_bead_volume,
            'slot_die_flow_rate': slot_die_flow_rate,
            'slot_die_head_distance_to_thinfilm': slot_die_head_distance_to_thinfilm,
            'slot_die_head_speed': slot_die_head_speed,
        }
        processed_rows.append(row)

    return pd.DataFrame(processed_rows)
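
One caveat with the pattern `process_data.get('layer', [{}])[0]`: the default only applies when the key is absent. If the key exists but holds an empty list, indexing raises an `IndexError`. A small guard avoids that. `first_or_empty` is a hypothetical helper, not part of the original function.

```python
def first_or_empty(items):
    """Return the first element of a list if it is a dict, otherwise an empty dict."""
    if isinstance(items, list) and items and isinstance(items[0], dict):
        return items[0]
    return {}


# Made-up process record with an empty 'layer' list
process_data = {"name": "demo process", "layer": []}

try:
    process_data.get("layer", [{}])[0]
except IndexError:
    print("The default is ignored because the key exists but the list is empty")

layer_info = first_or_empty(process_data.get("layer"))
print(layer_info.get("layer_type"))  # None instead of an exception
```

The same guard would apply to the `solution`, `solute`, and `solvent` lookups, which use the same pattern.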


In [None]:

# Convert the JSON data to a pandas DataFrame
df = convert_json_to_tidy_dataframe(referencing_entries[1])

In [66]:
referencing_entries[2]

{'upload_id': 'Uq7aoxCCRKe0g2sprkDbMg',
 'references': [],
 'origin': 'Katleen Kraft',
 'text_search_contents': ['HZB_MMB_8-2002-7-0',
  'HZB_MMB_8-2002-19-0',
  'evaporation BCP',
  'HZB_MMB_8-2002-14-0',
  'HZB_MMB_8-2002-23-0',
  'HZB_MMB_8-2002-20-0',
  'HZB_MMB_8-2002-8-0',
  'HZB_MMB_8-2002-12-0',
  'HZB_MMB_8-2002-24-0',
  'HZB_MMB_8-2002-5-0',
  'Evaporation',
  'BCP',
  'HZB_MMB_8-2002-1-0',
  'HZB_MMB_8-2002-18-0',
  'Buffer layer',
  'HZB_MMB_8-2002-17-0',
  'H10C13N1',
  'HZB_MMB_8-2002-2-0',
  'HZB_MMB_8-2002-16-0',
  'HZB_MMB_8-2002-4-0',
  'HZB_MMB_8-2002-21-0',
  'BCP 8.0 nanometer',
  'HZB_MMB_8-2002-3-0',
  'HZB_MMB_8-2002-11-0',
  'HZB_MMB_8-2002-13-0',
  'HZB_MMB_8-2002-10-0',
  'HZB_MMB_8-2002-15-0',
  'HZB_MMB_8-2002-6-0',
  'HZB_MMB_8-2002-9-0',
  'HZB_MMB_8-2002-22-0'],
 'quantities': ['',
  'data',
  'data.co_evaporation',
  'data.datetime',
  'data.description',
  'data.inorganic_evaporation',
  'data.inorganic_evaporation.chemical_2',
  'data.inorganic_evapor

In [None]:
def process_json_to_narrow_dataframe(data, process_type):
    """
    Processes a single JSON entry into a list of dictionaries suitable for a narrow
    pandas DataFrame. Each row represents a specific parameter for a sample and process.

    Args:
        data (dict): The 'data' section from a parsed JSON entry.
        process_type (str): A string indicating the type of process (e.g., 'slot_die_coating', 'evaporation').

    Returns:
        list: A list of dictionaries, where each dictionary is a row in the narrow DataFrame.
    """
    processed_rows = []

    # Extract common process details that identify the process step
    common_process_details = {
        'process_name': data.get('name'),
        'process_description': data.get('description'),
        'process_method': data.get('method'),
        'experimental_plan_position': data.get('positon_in_experimental_plan'),
        'process_datetime': data.get('datetime'),
    }

    # Extract layer details
    layer_info = data.get('layer', [{}])[0]
    common_process_details['layer_type'] = layer_info.get('layer_type')
    common_process_details['layer_material_name'] = layer_info.get('layer_material_name')
    # layer_material is present in evaporation data, add if available
    common_process_details['layer_material'] = layer_info.get('layer_material')


    # Iterate through each sample associated with this process
    samples = data.get('samples', [])
    for sample in samples:
        # Base row includes sample identifiers and common process info
        base_row = {
            'sample_lab_id': sample.get('lab_id'),
            'sample_name': sample.get('name'),
            **common_process_details  # Unpack common details into the base row
        }

        # Extract process-specific parameters based on process_type
        parameters_to_flatten = {}

        if process_type == 'slot_die_coating':
            # Solution details
            solution_details = data.get('solution', [{}])[0].get('solution_details', {})
            solute_info = solution_details.get('solute', [{}])[0]
            solvent_info = solution_details.get('solvent', [{}])[0]

            parameters_to_flatten.update({
                'solution_solute_concentration_mol': solute_info.get('concentration_mol'),
                'solution_solvent_name': solvent_info.get('chemical_2', {}).get('name'),
                'solution_solvent_volume_milliliter': solvent_info.get('chemical_volume'),
                'solution_datetime': solution_details.get('datetime'),
            })
            
            # Annealing details
            annealing_info = data.get('annealing', {})
            parameters_to_flatten.update({
                'annealing_temperature': annealing_info.get('temperature'),
                'annealing_time': annealing_info.get('time'),
            })
            
            # Quenching details
            quenching_info = data.get('quenching', {})
            parameters_to_flatten.update({
                'quenching_air_knife_angle': quenching_info.get('air_knife_angle'),
                'quenching_bead_volume': quenching_info.get('bead_volume'),
            })
            
            # Slot Die Coating specific properties
            properties_info = data.get('properties', {})
            parameters_to_flatten.update({
                'slot_die_flow_rate': properties_info.get('flow_rate'),
                'slot_die_head_distance_to_thinfilm': properties_info.get('slot_die_head_distance_to_thinfilm'),
                'slot_die_head_speed': properties_info.get('slot_die_head_speed'),
            })
        
        elif process_type == 'evaporation':
            evap_properties = data.get('evaporation_properties', {})
            parameters_to_flatten.update({
                'evaporation_deposition_rate': evap_properties.get('deposition_rate'),
                'evaporation_pressure': evap_properties.get('pressure'),
                'evaporation_substrate_temperature': evap_properties.get('substrate_temperature'),
            })
        
        # Add the flattened parameters as key-value pairs to the rows
        for param_name, param_value in parameters_to_flatten.items():
            # Only add parameters that have a non-None value
            # This avoids creating rows for parameters not relevant to a specific process type
            if param_value is not None:
                row = {
                    **base_row,
                    'parameter_name': param_name,
                    'parameter_value': param_value
                }
                processed_rows.append(row)
            
    return processed_rows

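# --- Main execution ---

# Assumption: indices 1 and 2 of referencing_entries are the entries listed above as
# "slot die coating Me4PACz" and "evaporation BCP"; adjust if the order differs
slot_die_coating_json = referencing_entries[1]
evaporation_json = referencing_entries[2]

all_processed_data = []

# Process the slot die coating data
all_processed_data.extend(process_json_to_narrow_dataframe(slot_die_coating_json['data'], 'slot_die_coating'))

# Process the evaporation data
all_processed_data.extend(process_json_to_narrow_dataframe(evaporation_json['data'], 'evaporation'))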

# Create the final tidy DataFrame
combined_df = pd.DataFrame(all_processed_data)

# Sort for better readability (optional)
combined_df = combined_df.sort_values(by=['sample_lab_id', 'experimental_plan_position', 'parameter_name']).reset_index(drop=True)
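
The narrow layout stores one row per sample, process, and parameter, while the earlier wide layout stores one row per sample and process. To make the relationship concrete, the sketch below pivots narrow rows back into wide records using only the standard library. The rows are made up, and `pivot_narrow_rows` is an illustrative name.

```python
from collections import defaultdict


def pivot_narrow_rows(rows, id_keys=("sample_lab_id", "process_name")):
    """Collapse narrow rows into one dict per id tuple, with one key per parameter."""
    wide = defaultdict(dict)
    for row in rows:
        key = tuple(row[k] for k in id_keys)
        wide[key][row["parameter_name"]] = row["parameter_value"]
    return [dict(zip(id_keys, key), **params) for key, params in wide.items()]


# Made-up narrow rows in the shape produced by process_json_to_narrow_dataframe
narrow_rows = [
    {"sample_lab_id": "S1", "process_name": "slot die coating",
     "parameter_name": "annealing_temperature", "parameter_value": 100.0},
    {"sample_lab_id": "S1", "process_name": "slot die coating",
     "parameter_name": "annealing_time", "parameter_value": 600.0},
    {"sample_lab_id": "S1", "process_name": "evaporation",
     "parameter_name": "evaporation_pressure", "parameter_value": 1e-6},
]

wide_rows = pivot_narrow_rows(narrow_rows)
for row in wide_rows:
    print(row)
```

On a DataFrame, pandas' `DataFrame.pivot` performs the same reshaping. The narrow form is convenient when process types carry different parameter sets, since irrelevant parameters simply produce no rows.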


Unnamed: 0,sample_lab_id,sample_name,process_name,process_description,process_method,experimental_plan_position,process_datetime,layer_type,layer_material_name,solution_solute_concentration_mol,solution_solvent_name,solution_solvent_volume_milliliter,solution_datetime,annealing_temperature,annealing_time,quenching_air_knife_angle,quenching_bead_volume,slot_die_flow_rate,slot_die_head_distance_to_thinfilm,slot_die_head_speed
0,HZB_MMB_8-2002-1-0,HZB_MMB_8-2002-1-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
1,HZB_MMB_8-2002-2-0,HZB_MMB_8-2002-2-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
2,HZB_MMB_8-2002-3-0,HZB_MMB_8-2002-3-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
3,HZB_MMB_8-2002-4-0,HZB_MMB_8-2002-4-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
4,HZB_MMB_8-2002-5-0,HZB_MMB_8-2002-5-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
5,HZB_MMB_8-2002-6-0,HZB_MMB_8-2002-6-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
6,HZB_MMB_8-2002-7-0,HZB_MMB_8-2002-7-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
7,HZB_MMB_8-2002-8-0,HZB_MMB_8-2002-8-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
8,HZB_MMB_8-2002-9-0,HZB_MMB_8-2002-9-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
9,HZB_MMB_8-2002-10-0,HZB_MMB_8-2002-10-0,slot die coating Me4PACz,SAM coating done on two substrate at a time,Slot Die Coating,4.0,2025-05-21T20:39:44.149402+00:00,Hole transport layer,Me4PACz,3e-06,Ethanol,5.0,2025-05-21T20:39:44.148844+00:00,100.0,600.0,60.0,0.5,0.1,60.0,10.0
