<a href="https://colab.research.google.com/github/mkywall/crucible-analysis-notebooks/blob/main/general/Crucible_Tutorial_Summer_School.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pycrucible Tutorial

This tutorial demonstrates how to use the pycrucible client to manage data through the Crucible Platform:
- Retrieve your Crucible API key
- Upload datasets to Crucible with automated metadata parsing
- Upload datasets to Crucible with manually curated metadata
- Associate datasets with batches
- Query datasets by batch or sample
- Query samples by batch or dataset
- Upload sample synthesis metadata
- Download data
- Generate an AutoBot batch report

In [None]:
!pip install git+https://github.com/MolecularFoundryCrucible/pycrucible.git

In [None]:
import os
import pprint

from pycrucible import CrucibleClient

In [None]:
# Mount Google Drive so the shared sample data is available under /content/drive
from google.colab import drive
drive.mount('/content/drive')

#### Step 1: Set up the Crucible Python Client

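In a web browser, navigate to https://crucible.lbl.gov/testapi/user_apikey. You will be prompted to log in with your ORCID iD. After logging in, copy the API key shown on the page and store it in an environment variable (or paste it into `API_KEY` in the next cell, but avoid saving or sharing a notebook that contains your key).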
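
If you keep the key in an environment variable, a small helper can fail loudly when it is missing instead of silently passing an empty key to the client. This is a minimal sketch; `CRUCIBLE_API_KEY` is a variable name chosen for this tutorial, not something pycrucible reads on its own.

```python
import os


def load_api_key(var_name: str = "CRUCIBLE_API_KEY") -> str:
    """Return the API key stored in `var_name`, raising if it is unset or blank.

    CRUCIBLE_API_KEY is a hypothetical variable name chosen for this tutorial.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; copy your key from "
            "https://crucible.lbl.gov/testapi/user_apikey"
        )
    return key
```

With the variable set, `API_KEY = load_api_key()` can stand in for the empty string in the next cell.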

In [None]:
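# Configuration: update these with your credentials
API_URL = "https://crucible.lbl.gov/testapi"  # replace with your API URL if different
# Read the key from an environment variable (name chosen for this tutorial), or paste it here
API_KEY = os.environ.get("CRUCIBLE_API_KEY", "")

# Initialize the client
client = CrucibleClient(API_URL, API_KEY)
print("Crucible client initialized.")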

#### Step 2: Use the Crucible Python client to upload and ingest a batch of SpecRun datasets

In [None]:
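# Folder on the mounted Google Drive that holds the SpecRun HDF5 files
data_folder = "/content/drive/Shareddrives/robot summer school/summer-school-sample-data/cbox"
# os.listdir returns files in arbitrary order; sort so "the first file" is the same on every run
h5_files = sorted(f for f in os.listdir(data_folder) if f.endswith('spec_run.h5'))
print(h5_files)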
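
The same listing can be wrapped in a small helper that returns full paths (so the upload loop does not need `os.path.join`) and works for any folder and suffix, for example `.jpg` for the photos in Step 4. `find_files` is a hypothetical helper, not part of pycrucible:

```python
from pathlib import Path
from typing import List


def find_files(folder: str, suffix: str) -> List[str]:
    """Return full paths of regular files in `folder` whose names end with `suffix`, sorted.

    Subdirectories are skipped even if their names match the suffix.
    """
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )


# Usage in the notebook:
# h5_paths = find_files(data_folder, "spec_run.h5")
```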

In [None]:
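# Upload only the first file for this demo; remove the [:1] slice to upload every file.
# After the loop, `results` holds the response for the last file uploaded.
for h5file in h5_files[:1]:
    h5 = os.path.join(data_folder, h5file)
    print(h5)
    results = client.build_new_dataset_from_file(files_to_upload = [h5],
                                                 ingestor = "SpinbotSpecRunIngestor",
                                                 verbose = False)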
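
Because `results` is overwritten on every pass, the loop keeps only the last response. To upload a whole batch, a sketch like the following collects every response and records failures without stopping. `upload_all` is a hypothetical helper; the lambda in the usage comment reuses the `build_new_dataset_from_file` call from the cell above.

```python
from typing import Any, Callable, Dict, List, Tuple


def upload_all(
    paths: List[str],
    upload_one: Callable[[str], Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """Call `upload_one` on every path and keep all responses, not just the last one.

    Returns (responses, failures), where failures maps path -> error message,
    so one bad file does not stop the rest of the batch.
    """
    responses: List[Dict[str, Any]] = []
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            responses.append(upload_one(path))
        except Exception as err:  # record the error and continue with the next file
            failures[path] = str(err)
    return responses, failures


# Usage in the notebook (commented out because it needs a live `client`):
# responses, failures = upload_all(
#     [os.path.join(data_folder, f) for f in h5_files],
#     lambda p: client.build_new_dataset_from_file(files_to_upload = [p],
#                                                  ingestor = "SpinbotSpecRunIngestor",
#                                                  verbose = False),
# )
# created = [r['created_record'] for r in responses]
```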

In [None]:
# The dataset record created by the (last) upload
ds = results['created_record']

##### Check out the data you just uploaded

In [None]:
found_ds = client.get_dataset(ds['unique_id'], include_metadata=True)
pprint.pprint(found_ds)
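
A dataset record that includes metadata can be deeply nested. Assuming `found_ds` is a plain (possibly nested) dict, as the `pprint` output suggests, this hypothetical `flatten` helper turns it into one `key: value` line per field, which is easier to scan:

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level with dotted keys: {"a": {"b": 1}} -> {"a.b": 1}.

    Lists, empty dicts, and scalar values are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name, sep))  # recurse into non-empty sub-dicts
        else:
            flat[name] = value
    return flat


# Usage in the notebook:
# for key, value in flatten(found_ds).items():
#     print(f"{key}: {value}")
```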

In [None]:
# Look up datasets by the file that was uploaded
client.list_datasets(file_to_upload = ds['file_to_upload'])

In [None]:
# Query samples by dataset: list the samples linked to this dataset
client.list_samples(dataset_id = ds['unique_id'])

In [None]:
# ID of the demo batch; a batch is stored in Crucible as a sample record
batch_id = '0t3h7ymbm5s27000z6tt82zvx4'

In [None]:
# Query samples by batch: list the samples whose parent is this batch
client.list_samples(parent_id = batch_id)

In [None]:
# see all datasets for a batch
client.list_datasets(sample_id = batch_id)
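
`list_datasets` returns a list of dataset records (the download loop in Step 7 indexes each one with `['unique_id']`). To see at a glance what kinds of data a batch holds, the records can be grouped by a field. This sketch assumes the records carry a `measurement` key matching the `measurement` argument used at upload; records without the field are grouped under `<missing>`. `group_by` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by(records: List[Dict[str, Any]], field: str,
             missing: str = "<missing>") -> Dict[str, List[Any]]:
    """Group record unique_ids by the value of `field`; records lacking it go under `missing`."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for rec in records:
        groups[str(rec.get(field, missing))].append(rec.get("unique_id"))
    return dict(groups)


# Usage in the notebook:
# pprint.pprint(group_by(client.list_datasets(sample_id = batch_id), "measurement"))
```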

#### Step 3: Send the dataset information to the data catalog (SciCat)

In [None]:
client.send_to_scicat(dsid = ds['unique_id'], wait_for_scicat_response= True)

Go to https://mf-scicat.lbl.gov for a quick look at your data.

##### Add a project to associate with your data

In [None]:
help(client.add_project)

In [None]:
client.add_project(project_info = {"project_id":"AUM_DEMO",
                                   "organization":"Summer School",
                                   "project_lead_email":"mkwall@lbl.gov"})
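
Because `add_project` takes a plain dict, a typo in a key or an email address is easy to miss. A minimal pre-flight check, assuming all three keys used above are required (an assumption; `help(client.add_project)` describes the actual requirements). `check_project_info` is a hypothetical helper:

```python
import re
from typing import Dict, List

# Keys used in the add_project call above; treating all of them as required is an assumption.
REQUIRED_PROJECT_KEYS = ("project_id", "organization", "project_lead_email")
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_project_info(info: Dict[str, str]) -> List[str]:
    """Return a list of problems with `info`; an empty list means it looks OK."""
    problems = [f"missing or empty: {k}" for k in REQUIRED_PROJECT_KEYS if not info.get(k)]
    email = info.get("project_lead_email")
    if email and not _EMAIL_RE.match(email):
        problems.append(f"not an email address: {email}")
    return problems


# Usage: run it on the dict before passing it as project_info to client.add_project
```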

#### Step 4: Use the Crucible Python client to upload and ingest photos of the batch as a dataset

In [None]:
metadata_to_add = {'comments': 'this is a fake dataset',
                   'weather': 'sunny',
                   'iphone_version': 11
                  }
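
Scientific metadata is most likely sent to the API as JSON (an assumption about how pycrucible transmits it). Values such as `datetime` objects or sets would then fail deep inside the upload. This sketch converts dates to ISO 8601 strings and raises immediately for anything else JSON cannot represent; `to_json_safe` is a hypothetical helper.

```python
import json
from datetime import date, datetime
from typing import Any, Dict


def to_json_safe(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Round-trip metadata through JSON so unsupported values fail here, early.

    datetime/date values become ISO 8601 strings, tuples become lists,
    and any other unsupported type raises TypeError.
    """
    def default(obj: Any) -> str:
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        raise TypeError(f"{type(obj).__name__} is not JSON serializable")

    return json.loads(json.dumps(metadata, default=default))


# Usage in the notebook:
# metadata_to_add = to_json_safe(metadata_to_add)
```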

In [None]:
batch_name = 'S-pMeMBAI-pre-2'
data_folder = "/content/drive/Shareddrives/robot summer school/summer-school-sample-data/photo_capture"
p1 = os.path.join(data_folder, 'DSC_0001.jpg')
p2 = os.path.join(data_folder, 'DSC_0002.jpg')

# Both photos go into one dataset; metadata_to_add is attached as its scientific metadata
results = client.build_new_dataset_from_file(files_to_upload = [p1, p2],
                                             dataset_name = 'S-pMeMBAI-pre-photo-capture',
                                             project_id = "AUM_DEMO",
                                             owner_orcid = None,
                                             instrument_name = "PhotoBox",
                                             measurement = "iphone_capture",
                                             session_name = batch_name,
                                             creation_time = None,
                                             source_folder = data_folder,
                                             scientific_metadata = metadata_to_add,
                                             keywords = [batch_name],  # tag the dataset with the batch name
                                             ingestor = 'ImageIngestor',
                                             verbose = False,
                                             wait_for_ingestion_response = True)

# The dataset record created by the upload
ds = results['created_record']

#### Step 5: Link the new dataset to its batch

In [None]:
client.add_dataset_to_sample(dataset_id = ds['unique_id'], sample_id = batch_id)

In [None]:
client.list_datasets(sample_id = batch_id)
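
When several datasets belong to the same batch, they can be linked in one pass while skipping those already linked. This hypothetical helper compares candidate IDs against the records returned by `client.list_datasets(sample_id = batch_id)`:

```python
from typing import Any, Dict, Iterable, List


def ids_to_link(candidate_ids: Iterable[str], linked_records: List[Dict[str, Any]]) -> List[str]:
    """Return candidate IDs not already among `linked_records`, in input order, without duplicates."""
    already = {rec["unique_id"] for rec in linked_records}
    todo: List[str] = []
    for dsid in candidate_ids:
        if dsid not in already:
            todo.append(dsid)
            already.add(dsid)  # also skips repeats within candidate_ids
    return todo


# Usage in the notebook (my_dataset_ids is a hypothetical list of dataset IDs):
# for dsid in ids_to_link(my_dataset_ids, client.list_datasets(sample_id = batch_id)):
#     client.add_dataset_to_sample(dataset_id = dsid, sample_id = batch_id)
```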

#### Step 6: Add Custom Metadata to a Sample

Record the synthesis parameters of a Spinbot batch with a Pydantic model and attach them to the batch's sample record.

In [None]:
from pydantic import BaseModel

class SpinbotBatchMetadata(BaseModel):
    """Synthesis parameters for one Spinbot batch; units are encoded in the field names where applicable."""
    sample_id: str
    sample_type: str = 'spinbot_batch'
    spin_duration_s: int
    spin_velocity_rpm: int
    dispense_delay_s: int
    pipette_height_mm: float
    dispense_speed_ul_s: int
    precursor_b_volume_ul: float
    annealing_duration_s: int
    molar_ratio_fai_macl: str
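
`molar_ratio_fai_macl` is a free-form string, so the model accepts values like `'5-1'` without complaint. A small stdlib-only parser (a hypothetical helper) can reject malformed ratios before the metadata is uploaded; in Pydantic v2 the same check could also be attached to the field with `field_validator`.

```python
from typing import Tuple


def parse_ratio(text: str) -> Tuple[int, ...]:
    """Parse a ratio string such as "5:1" into a tuple of positive ints, e.g. (5, 1).

    Raises ValueError for malformed input such as "5-1", "a:b", or "5:0".
    """
    parts = text.split(":")
    if len(parts) < 2:
        raise ValueError(f"expected 'a:b', got {text!r}")
    values = tuple(int(p.strip()) for p in parts)  # int() raises ValueError on non-numbers
    if any(v <= 0 for v in values):
        raise ValueError(f"ratio terms must be positive, got {text!r}")
    return values


# Usage in the notebook:
# parse_ratio('5:1')  # check the ratio before building SpinbotBatchMetadata
```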

In [None]:
batch_metadata = SpinbotBatchMetadata(sample_id = batch_id,
                                      spin_duration_s = 40,
                                      spin_velocity_rpm = 200,
                                      dispense_delay_s = 2,
                                      pipette_height_mm = 0.4,
                                      dispense_speed_ul_s = 4,
                                      precursor_b_volume_ul = 50,
                                      annealing_duration_s = 45,
                                      molar_ratio_fai_macl = '5:1')
batch_metadata

In [None]:
client.add_sample_metadata(**batch_metadata.model_dump())

#### Step 7: Download the data associated with a batch

Download all datasets associated with a batch.

In [None]:
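datasets_in_batch = client.list_datasets(sample_id = batch_id)
pprint.pprint(datasets_in_batch)
# Download each dataset; report failures and keep going
# (`dataset` is used as the loop variable so `ds` from earlier steps is not overwritten)
for dataset in datasets_in_batch:
    try:
        client.download_dataset(dsid = dataset['unique_id'])
        print(f"downloaded {dataset['unique_id']}")
    except Exception as err:
        print(f"failed to download {dataset['unique_id']}: {err}")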

Retrieve the batch metadata.

In [None]:
batch_md = client.get_sample_metadata(sample_id = batch_id)
pprint.pprint(batch_md)
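
To keep a local copy of the batch metadata next to the downloaded data, it can be written to a JSON file. A minimal sketch, assuming `batch_md` is JSON-like (dicts, lists, strings, numbers); anything else is stored via `str()`. `save_json` is a hypothetical helper and the output path in the usage comment is only an example.

```python
import json
from pathlib import Path
from typing import Any


def save_json(data: Any, path: str) -> Path:
    """Write `data` to `path` as indented JSON, creating parent folders as needed.

    Values JSON cannot represent (e.g. datetimes) are written as their str() form.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(data, indent=2, default=str))
    return out


# Usage in the notebook (example path):
# save_json(batch_md, f"/content/downloads/{batch_id}_metadata.json")
```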

#### Step 8: Generate a Batch Report Card
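
This step is still on the to-do list. In the meantime, a minimal report card could be rendered as Markdown from the batch metadata and the batch's dataset list. This is a hypothetical sketch, not the AutoBot report: it assumes `batch_md` is a flat dict of parameters and that dataset records may carry a `dataset_name` key (the argument used at upload in Step 4).

```python
from typing import Any, Dict, List


def batch_report(batch_id: str, metadata: Dict[str, Any], datasets: List[Dict[str, Any]]) -> str:
    """Render a Markdown report card: a parameter table followed by the linked datasets.

    Each dataset record needs "unique_id"; a missing "dataset_name" is shown as "(unnamed)".
    """
    lines = [f"# Batch report: {batch_id}", "", "| Parameter | Value |", "| --- | --- |"]
    for key in sorted(metadata):
        lines.append(f"| {key} | {metadata[key]} |")
    lines += ["", f"## Datasets ({len(datasets)})", ""]
    for rec in datasets:
        lines.append(f"- {rec.get('dataset_name', '(unnamed)')} (`{rec['unique_id']}`)")
    return "\n".join(lines)


# Usage in the notebook:
# from IPython.display import Markdown, display
# display(Markdown(batch_report(batch_id, batch_md, client.list_datasets(sample_id = batch_id))))
```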

##### TO DO

1. Modify `get_or_add_project` to accept `project_info`.
2. `add_project` has issues in SciCat when the project does not already exist.
3. Replace `get_or_add_project` with `add_project`.
4. Make sure download works.
5. Sample synthesis table and upload
    * from XML
6. Batch report
7. Add "Open in Colab" button