# HTMDEC DMS API Example

This example demonstrates how to use the new HTMDEC DMS `/form` and `/entry` REST API endpoints to search for BIRDSHOT data and convert the results to a Pandas dataframe for analysis and visualization.

The `birdshot.py` module implements some example utility functions for querying, analyzing, and visualizing BIRDSHOT data.

### Note:
During the ingest process, form data collected by Contextualize in the Campaign 1 structure was converted to the [Campaign 2 structure](https://docs.google.com/document/d/1FpiXwLQi8QAOuLB0Xr80V14qGfqtyCpT/edit). This means that:
* Sample IDs were converted to the Campaign 2 structure 
* Campaign 1 "child" sample numbers (T*nn*) have been replaced with ordinal letters (a, b, c, ...)
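
As an illustration of the second point, here is a minimal sketch of how a child sample number could map to an ordinal letter. It assumes that `T01` corresponds to `a`, `T02` to `b`, and so on; the exact mapping used during ingest is not stated here, and `child_number_to_letter` is a hypothetical helper, not part of `birdshot.py`.

```python
import re
import string


def child_number_to_letter(label: str) -> str:
    """Map a Campaign 1 child label such as 'T02' to an ordinal letter ('b').

    Assumption: numbering starts at T01 -> 'a'. This is illustrative only.
    """
    match = re.fullmatch(r"T(\d+)", label)
    if match is None:
        raise ValueError(f"Unexpected child label: {label!r}")
    number = int(match.group(1))
    if not 1 <= number <= 26:
        raise ValueError(f"Child number out of range: {number}")
    return string.ascii_lowercase[number - 1]


letters = [child_number_to_letter(t) for t in ("T01", "T02", "T03")]
print(letters)
```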


### Using the REST API

The [Girder Client](https://girder.readthedocs.io/en/latest/python-client.html) can be used to query the REST endpoints directly. When running via the DMS, the API URL and user token required for access are available in the current environment.


In [1]:
from girder_client import GirderClient
import os
import json
import pandas as pd

# Connect to the Girder instance using the API key
client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
client.authenticate(apiKey=os.environ["GIRDER_API_KEY"])

{'_id': '6424ac394236ff9b0883f243'}

In [2]:
import pprint

# Fetch a few entries to inspect their structure
entries = client.get('entry', parameters={'limit': 5})

pprint.pprint(entries[0])  # Look at the first entry to explore keys


{'_id': '66310e7b1b56c53abc7e0fb1',
 'created': '2024-04-30T15:30:03.326000+00:00',
 'data': {'Arc Melting': {'3 Part Sections': {'1': 90.0, '2': 90.0, '3': 110.0},
                          '3-Parts Pre-Mn Melting': {'1': '', '2': '', '3': ''},
                          'Full Ingot': {'1': 110.0,
                                         '2': 145.0,
                                         '3': 145.0,
                                         '4': 145.0,
                                         '5': 145.0,
                                         '6': 145.0,
                                         '7': 175.0},
                          'Ingot Mass Information': {'Final Ingot Mass': 29.9995,
                                                     'Mass Loss': 0.0},
                          'Process Overview': {'Completed By': 'Daniel',
                                               'Finish Date': '2023-06-04',
                                               'Start Date': '2023-05-30',
    

In [None]:
entry = entries[0]  # First entry from previous step
metadata = entry.get('meta', {})
print(metadata.keys())  # This shows the metadata fields like 'Iteration', 'Form', etc.

In [None]:
from girder_client import GirderClient
import os
import pprint

client = GirderClient(apiUrl="https://data.htmdec.org/api/v1")
# Read the API key from the environment rather than hard-coding a secret
client.authenticate(apiKey=os.environ["GIRDER_API_KEY"])

# Step 1: List all items in your folder
folder_id = "65fac36a60662ef084f6bc06"
items = list(client.listItem(folder_id))

print(f"Found {len(items)} items in the folder.")

# Step 2: Filter by metadata field "Iteration" = "AAA"
for item in items:
    meta = item.get("meta", {})
    if meta.get("Iteration") == "AAA":
        print(f"\nItem Name: {item['name']}")
        pprint.pprint(meta)




In [None]:
# Look at the metadata keys in the first few items
for i, item in enumerate(items[:5]):
    print(f"\nItem {i} metadata keys: {item.get('meta', {}).keys()}")

In [None]:
# client.get('entry/search', parameters={'query': f'^{iteration}.._VAM-.', 'limit': 1000})

To query VAM data from iteration 1 (AAA) across all forms, use the `/entry/search` endpoint to search by Sample ID. The data is returned as JSON:

In [3]:
iteration = 'AAA'
raw_data = client.get(
    'entry/search', parameters={'query': f'^{iteration}.._VAM-.', 'limit': 1000}
)
raw_data[0]["data"]

{'Forging': {'Ingot Condition': {'Soak Time': 30, 'Temperature': 1100},
  'Ingot Dimensions After': {'Length': 44.7,
   'Thickness': 3.8,
   'Thickness Reduction': -63.2,
   'Width': 27},
  'Ingot Dimensions Before': {'Length': 37.5,
   'Thickness': 10.3,
   'Width': 14.3},
  'Maximum Load': [{'Maximum Load Step': 244.2}],
  'Press Temperature': 397,
  'Process Overview': {'Completed By': 'Robert & Michael',
   'Finish Date': '2022-10-03',
   'Start Date': '2022-09-30',
   'Time Spent': '7:00'}},
 'Homogenization': {'Process Overview': {'Completed By': 'Michael',
   'Finish Date': '2022-10-02',
   'Start Date': '2022-09-28',
   'Time Spent': '6:00'},
  'Purging Sequence Pressure': {'1': 4.8e-05,
   '2': 3.8e-05,
   '3': 3e-05,
   '4': 1.5e-05},
  'Thermal Conditions': {'Atmosphere': 'Ar',
   'Cooling Rate': 'FC',
   'Duration': 24,
   'Pressure': 5,
   'Temperature': 1150}},
 'Notes': '',
 'sampleId': 'AAA01_VAM-B',
 'suffix': 'Syn',
 'targetPath': 'AAA/VAM-B/AAA01/Syn'}
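
The search pattern `^AAA.._VAM-.` used above reads like a regular expression over the Sample ID. The sketch below checks which IDs it would match, assuming the server interprets the pattern with semantics similar to Python's `re` module (this is not confirmed here). The two non-matching IDs are hypothetical and included only for contrast.

```python
import re

iteration = "AAA"
# Same pattern as in the query: iteration prefix, any two characters,
# the literal "_VAM-", then any single character
pattern = re.compile(f"^{iteration}.._VAM-.")

candidates = [
    "AAA01_VAM-B",  # appears in the output above
    "AAA10_VAM-B",
    "AAB01_VAM-B",  # hypothetical: different iteration
    "AAA01_DED-A",  # hypothetical: different synthesis label
]
matches = [sample_id for sample_id in candidates if pattern.search(sample_id)]
print(matches)
```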

Use the `/form` and `/entry` endpoints to query data from a specific form:

In [None]:
form_name = 'tensile-details.json'
# Look up the form by its entry file name to obtain its ID
form = client.get('form', parameters={'entryFileName': form_name, 'limit': 1000})
form_id = form[0]['_id']

# Fetch the entries submitted through that form
tensile_data = client.get('entry', parameters={'formId': form_id, 'limit': 1000})
tensile_data[0]["data"]
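
The `data` field of each entry is a nested dictionary, so it usually needs flattening before it fits into dataframe columns. The following is a minimal sketch of one way to do that with dot-separated keys; `flatten` is a hypothetical helper (not necessarily how `birdshot.py` does it), and the input is an excerpt of the Forging output shown earlier.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat = {}
    for key, value in nested.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the key path along
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Excerpt of an entry's "data" field from the search output above
entry_data = {
    "Forging": {
        "Ingot Condition": {"Soak Time": 30, "Temperature": 1100},
        "Press Temperature": 397,
    },
    "sampleId": "AAA01_VAM-B",
}

row = flatten(entry_data)
print(row)
```

A list of such rows can then be passed to `pd.DataFrame(rows)` and indexed by `sampleId`.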

### Using the `birdshot` Module

The `birdshot.py` module implements a few helper functions to query the REST API and convert data into a single Pandas dataframe for analysis:

In [4]:
import birdshot

The `query()` function takes the iteration identifier as an argument and returns a dataframe of results from multiple characterization methods. The example dataframe is intended to reproduce the information available in the Summary Synthesis Results (for example, see [HTMDEC AAB Summary Synthesis Results](https://docs.google.com/spreadsheets/d/15cdImpOComsvUpAIq20_nyff65WVzN_q/)).

In [5]:
#client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
#client.token = os.environ["GIRDER_TOKEN"]
#client.authenticate(apiKey=os.environ["GIRDER_TOKEN"])
df = birdshot.query("AAA")
df

| Sample ID | Target Composition (%).Al | Target Composition (%).Co | Target Composition (%).Cr | Target Composition (%).Cu | Target Composition (%).Fe | Target Composition (%).Mn | Target Composition (%).Ni | Target Composition (%).V | XRD.Phase | XRD.Lattice Parameters | ... | Maximum ∂2σ/∂ε2.a | UTS/YS Ratio.a | Ultimate Tensile Strength.a | Yield Strength.a | Elastic Modulus.b | Elongation.b | Maximum ∂2σ/∂ε2.b | UTS/YS Ratio.b | Ultimate Tensile Strength.b | Yield Strength.b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAA01_VAM-B | 0 | 45 | 10 | 0 | 20 | 0 | 15 | 10 | FCC | 3.583 | ... | -1726 | 1.728953 | 842 | 487 | 196 | 21 | -3671 | 1.605416 | 830 | 517 |
| AAA02_VAM-B | 0 | 30 | 10 | 0 | 5 | 0 | 45 | 10 | FCC | 3.571 | ... | -3363 | 1.54461 | 831 | 538 | 228 | 24 | -3479 | 1.77907 | 918 | 516 |
| AAA03_VAM-B | 0 | 30 | 5 | 0 | 30 | 0 | 20 | 15 | FCC | 3.605 | ... | -779 | 1.494505 | 680 | 455 | 206 | 22 | -242 | 1.690867 | 722 | 427 |
| AAA04_VAM-B | 0 | 25 | 10 | 0 | 20 | 0 | 40 | 5 | FCC | 3.578 | ... | -4567 | 1.603104 | 723 | 451 | 194 | 20 | -1436 | 1.624021 | 622 | 383 |
| AAA05_VAM-B | 0 | 10 | 10 | 0 | 55 | 0 | 25 | 0 | FCC | 3.594 | ... | -2342 | 1.446602 | 447 | 309 | 168 | 17 | -2094 | 1.324324 | 441 | 333 |
| AAA06_VAM-B | 0 | 35 | 25 | 0 | 5 | 0 | 30 | 5 | FCC | 3.58 | ... | -1945 | 1.814815 | 931 | 513 | 239 | 21 | -4975 | 1.512384 | 977 | 646 |
| AAA07_VAM-B | 0 | 55 | 10 | 0 | 10 | 0 | 15 | 10 | FCC | 3.579 | ... | -3035 | 1.57947 | 954 | 604 | 220 | 27 | -2321 | 1.778793 | 973 | 547 |
| AAA08_VAM-B | 0 | 10 | 10 | 0 | 25 | 0 | 50 | 5 | FCC | 3.574 | ... | -3360 | 1.64988 | 688 | 417 | 193 | 14 | -6069 | 1.488739 | 661 | 444 |
| AAA09_VAM-B | 5 | 0 | 5 | 0 | 25 | 0 | 60 | 5 | FCC | 3.585 | ... | 1660 | 1.887179 | 736 | 390 | 181 | 40 | 1349 | 2.371601 | 785 | 331 |
| AAA10_VAM-B | 0 | 5 | 15 | 0 | 5 | 0 | 70 | 5 | FCC | 3.57 | ... | -748 | 1.758025 | 712 | 405 | 208 | 28 | -877 | 2.03183 | 766 | 377 |
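
Once the results are in a dataframe, ordinary Pandas operations apply. The sketch below averages yield strength across the `.a` and `.b` columns and ranks samples by the result. It rebuilds a three-row excerpt of the dataframe above by hand so that it runs without DMS access, and it assumes the `.a`/`.b` suffixes denote the child samples of each alloy. The `Yield Strength (mean)` column name is made up for this example.

```python
import pandas as pd

# Three rows copied from the dataframe shown above
df = pd.DataFrame(
    {
        "Yield Strength.a": [487, 538, 390],
        "Yield Strength.b": [517, 516, 331],
    },
    index=["AAA01_VAM-B", "AAA02_VAM-B", "AAA09_VAM-B"],
)

# Select every per-child yield strength column and average across them
ys_cols = [col for col in df.columns if col.startswith("Yield Strength.")]
df["Yield Strength (mean)"] = df[ys_cols].mean(axis=1)

# Rank samples from highest to lowest mean yield strength
ranked = df.sort_values("Yield Strength (mean)", ascending=False)
print(ranked)
```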


In [None]:
raw_data = birdshot.query("AAA", raw=True)
print(raw_data)

In [None]:
from girder_client import GirderClient
import os

client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
#client.token = os.environ["GIRDER_TOKEN"]
# GIRDER_TOKEN holds a session token, not an API key, so set it directly
client.setToken(os.environ["GIRDER_TOKEN"])
print(client.get('user/me'))
raw_data_direct = client.get("entry/search", parameters={"query": "AAA", "limit": 1000})
print(raw_data_direct)


In [None]:
from girder_client import GirderClient
import os

client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
client.token = os.environ["GIRDER_TOKEN"]

# 1. List all collections and find the one named "BIRDSHOT (Center)"
birdshot_collection_id = None
for col in client.listCollection():
    print(col['name'], col['_id'])
    if col['name'] == 'BIRDSHOT (Center)':
        birdshot_collection_id = col['_id']
if birdshot_collection_id is None:
    raise LookupError('Collection "BIRDSHOT (Center)" not found')

# 2. List the subfolders of the BIRDSHOT (Center) collection
sample_data_folder_id = None
for folder in client.listFolder(birdshot_collection_id, parentFolderType='collection'):
    print(folder['name'], folder['_id'])
    if folder['name'] == 'sample_data':
        sample_data_folder_id = folder['_id']
if sample_data_folder_id is None:
    raise LookupError('Folder "sample_data" not found')

# 3. List subfolders inside "sample_data"
for sf in client.listFolder(sample_data_folder_id):
    print(sf['name'], sf['_id'])



In [None]:
df = birdshot.query("AAA")

In [None]:
import sys
print(sys.executable)

In [None]:
import os
print(os.environ.get("GIRDER_API_URL"))
# Report only whether the key is set, to avoid printing a secret
print("GIRDER_API_KEY set:", "GIRDER_API_KEY" in os.environ)
print(os.environ.get("TMP_URL"))


In [None]:
import birdshot
import pandas as pd

data = birdshot.query("AAA", raw=True)  # Assuming you can modify or patch query to return early
df = pd.DataFrame.from_dict(data, orient='index')
print("Actual columns:", df.columns)


In [None]:
from girder_client import GirderClient
import os

client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
# GIRDER_TOKEN holds a session token, not an API key, so set it directly
client.setToken(os.environ["GIRDER_TOKEN"])

# Send the same query that birdshot.query("AAA") uses
raw_data = client.get(
    'entry/search',
    parameters={'query': '^AAA.._VAM-.', 'limit': 1000}
)

# If you want to preview what keys/structure it has:
import json
print(json.dumps(raw_data[:2], indent=2))  # print a couple entries

# Create a DataFrame to see the actual columns
import pandas as pd
df = pd.DataFrame.from_records(raw_data)
print("Actual columns:", df.columns.tolist())


In [None]:
import birdshot
import importlib

importlib.reload(birdshot)


In [None]:
birdshot.show_plot()


In [None]:
import birdshot
import os
import socket

# Find a free port
def get_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))  # Let OS pick an available port
        return s.getsockname()[1]

free_port = get_free_port()
os.environ['TMP_URL'] = 'localhost:8888'  # This should match your running Jupyter server

def patched_show_plot():
    from dash import Dash
    import dash_bootstrap_components as dbc

    app = Dash(
        __name__,
        external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.FONT_AWESOME],
        requests_pathname_prefix=f"/proxy/{free_port}/",
    )
    app.layout = birdshot.serve_layout
    app.run(
        debug=False,
        jupyter_mode="jupyterlab",
        host="0.0.0.0",
        port=free_port,
        jupyter_server_url=f"http://{os.environ['TMP_URL']}/",
    )

birdshot.show_plot = patched_show_plot
birdshot.show_plot()


### Understanding the queries that populate the dataframe

In [None]:
# Define the iteration value
iteration = 'AAA'

# Build the query string using the iteration variable
query_string = f"^{iteration}.._VAM-."

# Construct the client.get statement (as it would be used); the REST path and
# parameters belong to the Girder client, not to birdshot.query
query_statement = f"client.get('entry/search', parameters={{'query': '{query_string}', 'limit': 1000}})"

# Print the constructed query statement
print(query_statement)


In [None]:
iteration = 'AAA'
raw_data = client.get(
        'entry/search', parameters={'query': f'^{iteration}.._VAM-.', 'limit': 1000}
)
raw_data[0]["data"]

In [None]:
# birdshot.query() takes the iteration identifier, not a REST path;
# it sends the '^AAA.._VAM-.' search query itself
birdshot.query("AAA")