# HTMDEC DMS API Example

This example introduces the HTMDEC DMS `/form` and `/entry` REST API endpoints for searching BIRDSHOT data. The notebook queries the DMS for BIRDSHOT data and loads the results into a Pandas dataframe for analysis and visualization.

The `birdshot.py` module implements some example utility functions for querying, analyzing, and visualizing BIRDSHOT data.

## TODO: write a description of the structure so people understand what forms and entries are

### Note:
During the ingest process, form data in the Campaign 1 structure (originally organized by Contextualized during their data seedling award) was converted to the [Campaign 2 structure](https://docs.google.com/document/d/1FpiXwLQi8QAOuLB0Xr80V14qGfqtyCpT/edit). This means that:
* Sample IDs were converted to the Campaign 2 structure
* Campaign 1 "child" sample numbers (T*nn*) have been replaced with ordinal letters (a, b, c, ...)


### Using the REST API

The [Girder Client](https://girder.readthedocs.io/en/latest/python-client.html) can be used to query the REST endpoints directly. When running inside the DMS, the API URL and the user token required for access are available in the current environment. This notebook instead runs outside the DMS, in a local Jupyter environment (typically on a laptop), so it authenticates with an API key that the user creates in their account profile on the DMS platform. Authenticating by key rather than by token is the fundamental difference between this example and the one published as a Tale on the data.htmdec.org platform.
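The choice between the two authentication paths can be sketched as a small helper that inspects the environment. The variable names `GIRDER_TOKEN` and `GIRDER_API_KEY` are the ones used in this notebook; the helper `choose_auth` is hypothetical and is not part of `girder_client` or `birdshot.py`.

```python
import os


def choose_auth(env=None):
    """Return ("token", value) or ("apiKey", value) based on the environment.

    Hypothetical helper: prefers a session token (available when running
    inside the DMS) and falls back to a user-created API key (local Jupyter).
    """
    env = os.environ if env is None else env
    if env.get("GIRDER_TOKEN"):
        return "token", env["GIRDER_TOKEN"]
    if env.get("GIRDER_API_KEY"):
        return "apiKey", env["GIRDER_API_KEY"]
    raise RuntimeError("Set GIRDER_TOKEN or GIRDER_API_KEY before connecting")


# Offline demonstration with a made-up value (no network access needed)
method, value = choose_auth({"GIRDER_API_KEY": "example-key"})
print(method)
```

With a real client, the result would be applied as `client.setToken(value)` for a token or `client.authenticate(apiKey=value)` for an API key.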


---

### Set up
1. Import modules
2. Read authentication information from environment variables

In [1]:
from girder_client import GirderClient
import os
import json
import pandas as pd

# Connect to the Girder instance using the API key
client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
client.authenticate(apiKey=os.environ["GIRDER_API_KEY"])

{'_id': '6424ac394236ff9b0883f243'}

In [2]:
# Fetch a few entries to inspect their structure
entries = client.get('entry', parameters={'limit': 5})

import pprint
pprint.pprint(entries[0])  # Look at the first entry to explore keys


{'_id': '66310e7b1b56c53abc7e0fb1',
 'created': '2024-04-30T15:30:03.326000+00:00',
 'data': {'Arc Melting': {'3 Part Sections': {'1': 90.0, '2': 90.0, '3': 110.0},
                          '3-Parts Pre-Mn Melting': {'1': '', '2': '', '3': ''},
                          'Full Ingot': {'1': 110.0,
                                         '2': 145.0,
                                         '3': 145.0,
                                         '4': 145.0,
                                         '5': 145.0,
                                         '6': 145.0,
                                         '7': 175.0},
                          'Ingot Mass Information': {'Final Ingot Mass': 29.9995,
                                                     'Mass Loss': 0.0},
                          'Process Overview': {'Completed By': 'Daniel',
                                               'Finish Date': '2023-06-04',
                                               'Start Date': '2023-05-30',
    

In [None]:
entry = entries[0]  # First entry from previous step
metadata = entry.get('meta', {})
print(metadata.keys())  # This shows the metadata fields like 'Iteration', 'Form', etc.

In [None]:
from girder_client import GirderClient
import os
import pprint

client = GirderClient(apiUrl="https://data.htmdec.org/api/v1")
# Read the API key from the environment; never hard-code a key in a notebook
client.authenticate(apiKey=os.environ["GIRDER_API_KEY"])

# Step 1: List all items in your folder
folder_id = "65fac36a60662ef084f6bc06"
items = list(client.listItem(folder_id))

print(f"Found {len(items)} items in the folder.")

# Step 2: Filter by metadata field "Iteration" = "AAA"
for item in items:
    meta = item.get("meta", {})
    if meta.get("Iteration") == "AAA":
        print(f"\nItem Name: {item['name']}")
        pprint.pprint(meta)




In [None]:
# Look at the metadata keys in the first few items
for i, item in enumerate(items[:5]):
    print(f"\nItem {i} metadata keys: {item.get('meta', {}).keys()}")

In [None]:
# client.get('entry/search', parameters={'query': f'^{iteration}.._VAM-.', 'limit': 1000})

To query VAM data from iteration 1 (AAA) across all forms, use the `/entry/search` endpoint to search by Sample ID. Data is returned as JSON:

In [3]:
iteration = 'AAA'
raw_data = client.get(
        'entry/search', parameters={'query': f'^{iteration}.._VAM-.', 'limit': 1000}
)
raw_data[0]["data"]

{'Forging': {'Ingot Condition': {'Soak Time': 30, 'Temperature': 1100},
  'Ingot Dimensions After': {'Length': 44.7,
   'Thickness': 3.8,
   'Thickness Reduction': -63.2,
   'Width': 27},
  'Ingot Dimensions Before': {'Length': 37.5,
   'Thickness': 10.3,
   'Width': 14.3},
  'Maximum Load': [{'Maximum Load Step': 244.2}],
  'Press Temperature': 397,
  'Process Overview': {'Completed By': 'Robert & Michael',
   'Finish Date': '2022-10-03',
   'Start Date': '2022-09-30',
   'Time Spent': '7:00'}},
 'Homogenization': {'Process Overview': {'Completed By': 'Michael',
   'Finish Date': '2022-10-02',
   'Start Date': '2022-09-28',
   'Time Spent': '6:00'},
  'Purging Sequence Pressure': {'1': 4.8e-05,
   '2': 3.8e-05,
   '3': 3e-05,
   '4': 1.5e-05},
  'Thermal Conditions': {'Atmosphere': 'Ar',
   'Cooling Rate': 'FC',
   'Duration': 24,
   'Pressure': 5,
   'Temperature': 1150}},
 'Notes': '',
 'sampleId': 'AAA01_VAM-B',
 'suffix': 'Syn',
 'targetPath': 'AAA/VAM-B/AAA01/Syn'}
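The search string used above reads like a regular expression over the Sample ID. Assuming the endpoint applies ordinary regex semantics (an assumption; the server-side matching rules are not shown in this notebook), the pattern can be checked locally with Python's `re` module before it is sent to the DMS. Only `AAA01_VAM-B` comes from the output above; the other sample IDs, including the `DED` method label, are made up for illustration.

```python
import re

iteration = "AAA"
# Each "." matches any single character, so ".." covers the two-digit
# sample number and the trailing "." covers the letter after "VAM-"
pattern = re.compile(f"^{iteration}.._VAM-.")

# 'AAA01_VAM-B' is from the output above; the other IDs are hypothetical
sample_ids = ["AAA01_VAM-B", "AAA12_VAM-A", "AAB01_VAM-A", "AAA01_DED-A"]

matches = [s for s in sample_ids if pattern.search(s)]
print(matches)
```

Only the IDs that start with the requested iteration and carry the `VAM` label are kept, which is the filtering the query is intended to perform.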

Use the `/form` and `/entry` endpoints to query data from a specific form:

In [None]:
form_name = 'tensile-details.json'
form = client.get('form', parameters={'entryFileName': form_name, 'limit': 1000})
form[0]['_id']

tensile_data = client.get('entry', parameters={'formId': form[0]['_id'], 'limit': 1000})
tensile_data[0]["data"]
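Each entry's `data` field is a nested dictionary, while a dataframe needs flat columns. One way to bridge the two, sketched here under the assumption that dotted column names are acceptable, is to flatten the nesting recursively. The `flatten` helper is hypothetical and is not taken from `birdshot.py`; the sample record is abridged from the Forging output shown earlier in this notebook.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # lists and scalars are kept as-is
    return flat


# Abridged from the Forging data shown earlier in this notebook
entry_data = {
    "Forging": {
        "Ingot Condition": {"Soak Time": 30, "Temperature": 1100},
        "Press Temperature": 397,
    },
    "sampleId": "AAA01_VAM-B",
}

row = flatten(entry_data)
print(row["Forging.Ingot Condition.Temperature"])
```

A list of such rows can be passed to `pd.DataFrame(rows)`; `pd.json_normalize` offers similar flattening out of the box.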

### Using the `birdshot` Module

The `birdshot.py` module implements a few helper functions to query the REST API and convert data into a single Pandas dataframe for analysis:

In [6]:
import birdshot

The `query()` function takes the iteration identifier as an argument and returns a dataframe of results from multiple characterization methods. The example dataframe is intended to reproduce the information available in the Summary Synthesis Results (for example, see [HTMDEC AAB Summary Synthesis Results](https://docs.google.com/spreadsheets/d/15cdImpOComsvUpAIq20_nyff65WVzN_q/)).

In [11]:
#client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
#client.token = os.environ["GIRDER_TOKEN"]
#client.authenticate(apiKey=os.environ["GIRDER_TOKEN"])
df = birdshot.query("BBA")
df

Unnamed: 0,Target Composition (%).Al,Target Composition (%).Co,Target Composition (%).Cr,Target Composition (%).Cu,Target Composition (%).Fe,Target Composition (%).Mn,Target Composition (%).Ni,Target Composition (%).V,Measured Composition (%).Al,Measured Composition (%).Co,...,Maximum ∂2σ/∂ε2.b,UTS/YS Ratio.b,Ultimate Tensile Strength.b,Yield Strength.b,Elastic Modulus.c,Elongation.c,Maximum ∂2σ/∂ε2.c,UTS/YS Ratio.c,Ultimate Tensile Strength.c,Yield Strength.c
BBA01_VAM-A,4,8,4,4,16,12,48,4,4.074,8.169333,...,826.40772,2.433921,608.363077,249.951812,,,,,,
BBA02_VAM-A,4,16,0,4,12,8,52,4,4.166667,14.974667,...,298.089239,2.30191,677.692158,294.404285,209.541831,1.0,-176.728705,2.30701,709.610253,307.588701
BBA03_VAM-A,4,12,8,4,16,8,48,0,4.252667,12.136,...,562.282798,2.606923,579.563986,222.317245,,,,,,
BBA04_VAM-A,0,12,8,4,16,20,36,4,,,...,,,,,,,,,,
BBA05_VAM-A,4,12,8,0,8,8,56,4,4.068,12.248667,...,654.294205,2.279349,726.744092,318.838402,,,,,,
BBA06_VAM-A,4,0,8,4,24,12,44,4,4.198,0.0,...,950.708832,2.567465,601.019294,234.090518,,,,,,
BBA07_VAM-A,4,12,0,0,16,8,52,8,5.03,12.254667,...,-35046.315706,1.098992,374.294781,340.579997,226.46786,1.0,2606.12752,1.946526,841.641202,432.381192
BBA08_VAM-A,0,24,8,0,12,16,32,8,0.0,22.892667,...,748.285761,2.481465,676.029199,272.43146,,,,,,
BBA09_VAM-A,4,16,8,0,12,0,48,12,4.898667,16.117333,...,-68584.274662,1.270553,358.572122,282.217325,,,,,,
BBA10_VAM-A,4,0,4,0,20,8,52,12,4.388,0.0,...,-486.2199,1.870213,690.342523,369.125009,221.709438,1.0,-13.990309,2.193465,825.504249,376.347128
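As a quick sanity check on the table above, the `UTS/YS Ratio.b` column appears to be `Ultimate Tensile Strength.b` divided by `Yield Strength.b`. The sketch below recomputes the ratio in plain Python for three rows copied from the output; that the column is derived this way is an inference from the numbers, not something stated by `birdshot.py`.

```python
# (UTS.b, YS.b, reported UTS/YS Ratio.b) copied from the table above
rows = {
    "BBA01_VAM-A": (608.363077, 249.951812, 2.433921),
    "BBA02_VAM-A": (677.692158, 294.404285, 2.30191),
    "BBA07_VAM-A": (374.294781, 340.579997, 1.098992),
}

for sample_id, (uts, ys, reported) in rows.items():
    ratio = uts / ys
    # Values in the table are rounded, so compare with a small tolerance
    consistent = abs(ratio - reported) < 1e-4
    print(sample_id, round(ratio, 4), consistent)
```

With the full dataframe, the same check reduces to dividing one column by the other and comparing against the reported ratio.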


In [None]:
raw_data = birdshot.query("AAA", raw=True)
print(raw_data)

In [4]:
from girder_client import GirderClient
import os

client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
#client.token = os.environ["GIRDER_TOKEN"]
# Authenticate with the API key, using the same variable as the set-up cell
client.authenticate(apiKey=os.environ["GIRDER_API_KEY"])
print(client.get('user/me'))
raw_data_direct = client.get("entry/search", parameters={"query": "AAA", "limit": 1000})
print(raw_data_direct)


{'_accessLevel': 2, '_id': '6424ac394236ff9b0883f243', '_modelType': 'user', 'admin': True, 'created': '2023-03-29T21:23:05.789000+00:00', 'email': 'elbert@jhu.edu', 'emailVerified': False, 'firstName': 'David', 'groupInvites': [], 'groups': ['641893cba82b019cd3c7005f', '659ed73d60662ef084f6b385', '66ec64d20132d76f601cecf7', '67040aa897930f6e9db881c4', '671e4e6130a3240163bb785d', '67905dfda1b0b3735eefcb04', '67c1ccf744f9c11af77af053'], 'lastLogin': '2025-03-23T16:44:24.427000+00:00', 'lastName': 'Elbert', 'login': 'elbert', 'otherTokens': [{'access_token': '<redacted>', 'expires_in': 172800, 'refresh_token': '<redacted>', 'resource_server': 'usc_isrd', 'scope': 'https://auth.globus.org/scopes/a77ee64a-fb7f-11e5-810e-8c705ad34f60/deriva_all', 'state': 'aS0jtMt2Rbb0nB8lZJfubrZTIN1y3MEorNudaLYgN0p5C1luyC4kkEc06cjr2bmj.https://d

In [None]:
from girder_client import GirderClient
import os

client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
client.token = os.environ["GIRDER_TOKEN"]

# 1. List all collections and find the one named "BIRDSHOT (Center)"
collections = client.listCollection()
for col in collections:
    print(col['name'], col['_id'])
    # If col['name'] == 'BIRDSHOT (Center)', store its _id
    if col['name'] == 'BIRDSHOT (Center)':
        birdshot_collection_id = col['_id']

# 2. List the subfolders of the BIRDSHOT (Center) collection
folders = client.listFolder(birdshot_collection_id, parentType='collection')
for folder in folders:
    print(folder['name'], folder['_id'])
    # If folder['name'] == 'sample_data', store its _id
    if folder['name'] == 'sample_data':
        sample_data_folder_id = folder['_id']

# 3. List subfolders inside "sample_data"
subfolders = client.listFolder(sample_data_folder_id)
for sf in subfolders:
    print(sf['name'], sf['_id'])
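The loops above leave `birdshot_collection_id` or `sample_data_folder_id` undefined when no name matches, which only surfaces later as a `NameError`. A small lookup helper, sketched here under the hypothetical name `find_id_by_name`, fails with a clearer message instead. The listing below is a made-up stand-in for what `listCollection()` returns.

```python
def find_id_by_name(resources, name):
    """Return the _id of the first resource whose 'name' matches, else raise."""
    for resource in resources:
        if resource.get("name") == name:
            return resource["_id"]
    raise LookupError(f"No resource named {name!r} was found")


# Hypothetical listing; real IDs come from client.listCollection()
collections = [
    {"name": "Other Collection", "_id": "id-001"},
    {"name": "BIRDSHOT (Center)", "_id": "id-002"},
]

birdshot_collection_id = find_id_by_name(collections, "BIRDSHOT (Center)")
print(birdshot_collection_id)
```

The same helper would apply to folder listings, for example `find_id_by_name(client.listFolder(...), 'sample_data')`.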



In [5]:
df = birdshot.query("AAA")

NameError: name 'birdshot' is not defined

In [None]:
import sys
print(sys.executable)

In [None]:
import os
print(os.environ.get("GIRDER_API_URL"))
print(os.environ.get("GIRDER_API_KEY"))
print(os.environ.get("TMP_URL"))


In [None]:
import birdshot
import pandas as pd

data = birdshot.query("AAA", raw=True)  # Assuming you can modify or patch query to return early
df = pd.DataFrame.from_dict(data, orient='index')
print("Actual columns:", df.columns)


In [None]:
from girder_client import GirderClient
import os

client = GirderClient(apiUrl=os.environ["GIRDER_API_URL"])
# Authenticate with the API key, using the same variable as the set-up cell
client.authenticate(apiKey=os.environ["GIRDER_API_KEY"])

# Send the same query that birdshot.query("AAA") uses
raw_data = client.get(
    'entry/search',
    parameters={'query': '^AAA.._VAM-.', 'limit': 1000}
)

# If you want to preview what keys/structure it has:
import json
print(json.dumps(raw_data[:2], indent=2))  # print a couple entries

# Create a DataFrame to see the actual columns
import pandas as pd
df = pd.DataFrame.from_records(raw_data)
print("Actual columns:", df.columns.tolist())


In [None]:
import birdshot
import importlib

importlib.reload(birdshot)


In [None]:
birdshot.show_plot()


In [None]:
import birdshot
import os
import socket

# Find a free port
def get_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))  # Let OS pick an available port
        return s.getsockname()[1]

free_port = get_free_port()
os.environ['TMP_URL'] = 'localhost:8888'  # This should match your running Jupyter server

def patched_show_plot():
    from dash import Dash
    import dash_bootstrap_components as dbc

    app = Dash(
        __name__,
        external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.FONT_AWESOME],
        requests_pathname_prefix=f"/proxy/{free_port}/",
    )
    app.layout = birdshot.serve_layout
    app.run(
        debug=False,
        jupyter_mode="jupyterlab",
        host="0.0.0.0",
        port=free_port,
        jupyter_server_url=f"http://{os.environ['TMP_URL']}/",
    )

birdshot.show_plot = patched_show_plot
birdshot.show_plot()


### Exploring the queries that populate the dataframe

In [None]:
# Define the iteration value
iteration = 'AAA'

# Build the query string using the iteration variable
query_string = f"^{iteration}.._VAM-."

# Construct the client.get call (as it would be used) so it can be inspected
query_statement = f"client.get('entry/search', parameters={{'query': '{query_string}', 'limit': 1000}})"

# Print the constructed query statement
print(query_statement)


In [None]:
iteration = 'AAA'
raw_data = client.get(
        'entry/search', parameters={'query': f'^{iteration}.._VAM-.', 'limit': 1000}
)
raw_data[0]["data"]
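Every query in this notebook passes `'limit': 1000`, which caps the number of results returned. Girder list endpoints generally accept an `offset` parameter alongside `limit`; assuming `entry/search` does too (not confirmed here), results can be collected page by page. The `fetch_all` helper is hypothetical, and the `fake_fetch` callable is a stand-in for `client.get(...)` so the sketch runs offline.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect results page by page until a short page signals the end."""
    results = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        results.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return results


# Stand-in for the server: 25 fake records served in slices
fake_records = [{"sampleId": f"AAA{n:02d}_VAM-A"} for n in range(1, 26)]


def fake_fetch(limit, offset):
    return fake_records[offset:offset + limit]


all_entries = fetch_all(fake_fetch, page_size=10)
print(len(all_entries))
```

With a real client, `fetch_page` could be a lambda that forwards `limit` and `offset` in the `parameters` dictionary of `client.get('entry/search', ...)`.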

In [None]:
# birdshot.query() takes the iteration identifier, not an endpoint and parameters
birdshot.query("AAA")