# PAD Analytics Complete Function Guide

A comprehensive demonstration of all PAD Analytics functions with pharmaceutical quality testing context.

**Version**: 0.2.1  
**Date**: July 2025  
**Context**: Paper Analytical Devices for Pharmaceutical Quality Testing

## Setup and Imports

In [50]:
# Import PAD Analytics and supporting libraries
import pad_analytics as pad
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

print(f"PAD Analytics version: {pad.__version__}")
print("Ready for pharmaceutical quality testing analysis!")

PAD Analytics version: 0.2.1
Ready for pharmaceutical quality testing analysis!


### Set Up Google Drive for Saving Results

In [None]:
# Set up Google Drive for saving results
from google.colab import drive
drive.mount('/content/drive')

# Workshop folder (same as pandas notebook)
workshop_folder = '/content/drive/MyDrive/ML_PAD_Workshop'

# Create subfolder for today's work
import os
pad_analysis_folder = f'{workshop_folder}/day_1'
os.makedirs(pad_analysis_folder, exist_ok=True)

print(f"📁 Results will be saved to: {pad_analysis_folder}")
print("🔗 This ensures your work persists after the session ends!")
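
The folder created above is where results from later cells can be written. The following is a minimal sketch of saving a DataFrame there and reading it back. It assumes the results are held in a pandas DataFrame; the file name `example_results.csv` and the temporary directory standing in for `pad_analysis_folder` are illustrative choices so the snippet also runs outside Colab.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for pad_analysis_folder so the sketch runs outside Colab (assumption).
output_dir = Path(tempfile.mkdtemp())

# Small illustrative table; real results would come from later cells.
example_results = pd.DataFrame(
    {
        "id": [1, 2],
        "sample_name": ["amoxicillin", "rifampicin"],
        "quantity": [100, 20],
    }
)

# index=False keeps the row index out of the file, so no "Unnamed: 0"
# column appears when the CSV is read back.
csv_path = output_dir / "example_results.csv"
example_results.to_csv(csv_path, index=False)

reloaded = pd.read_csv(csv_path)
print(reloaded)
```

In Colab, replacing `output_dir` with `Path(pad_analysis_folder)` would write the file to the mounted Drive folder instead.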

## Understanding PAD Concepts

Before diving into the functions, let's understand the key concepts in PAD Analytics:

### PAD Data Structure

- **Sample ID (`sample_id`)**: Unique identifier for each **physical** PAD card
  - In the image below, this would be the printed number (e.g., 53707)
  - One physical card can have multiple digital images

- **Card ID (`card_id`)**: Unique identifier for each PAD card **image** in the database
  - Different photos of the same physical card have different card_ids
  - Multiple images capture device and lighting variations for robust models

- **Projects**: Collections of physical cards with specific layouts and chemical components
  - Captured under different devices and illumination conditions
  - Represent specific research studies or validation campaigns

- **Datasets**: Curated collections for training/testing ML models
  - Cards from one or multiple projects
  - Split into training and test sets for model validation

- **Models**: ML algorithms trained on specific datasets
  - Drug classification: Identify the pharmaceutical compound
  - Concentration quantification: Measure drug concentration levels

![PAD Example](https://padproject.nd.edu/assets/589291/300x/15371_processed_pad_amoxicillin_100.png)

*Example: Amoxicillin PAD card showing colorimetric reaction patterns*

### Key Relationships

```
Physical PAD → Multiple Images → Projects → Datasets → Models → Predictions
   (sample_id)    (card_ids)                                      ↓
                                                        Quality Assessment
```

**Important**: 
- One `sample_id` → Multiple `card_ids` (same physical card, different photos)
- Multiple projects → Can contribute to one dataset
- One dataset → Trains one or more models
- Models → Make predictions for pharmaceutical quality testing
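
The one-to-many relationship between `sample_id` and `card_id` can be checked directly with pandas. The sketch below uses a few hand-typed rows shaped like the dataset tables in this guide, where the `id` column holds the card ID; it is an illustration, not output from the PAD database.

```python
import pandas as pd

# Illustrative rows: `id` is the card_id (one image),
# `sample_id` is the physical card that was photographed.
cards = pd.DataFrame(
    {
        "id": [15589, 15592, 15590, 15591],
        "sample_id": [53787, 53787, 53778, 53789],
    }
)

# Count how many distinct images exist for each physical card.
images_per_sample = cards.groupby("sample_id")["id"].nunique()
print(images_per_sample)

# Physical cards that were photographed more than once.
repeated = images_per_sample[images_per_sample > 1]
print(repeated.index.tolist())
```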

## PAD Technology Context

### What are Paper Analytical Devices (PADs)?

- **Low-cost diagnostic tools** for pharmaceutical quality testing
- **Colorimetric detection**: Produce color patterns when drug samples are applied
- **Field deployment**: Enable quality testing in resource-limited settings
- **Rapid analysis**: Results available within minutes

### How PAD Analytics Works

1. **Image Capture**: PAD cards are photographed with the PADReader app under various lighting and device conditions
2. **Feature Extraction**: ML algorithms analyze color patterns from specific card regions
3. **Model Prediction**: Trained models classify drugs and quantify concentrations
4. **Quality Assessment**: Results support pharmaceutical quality control decisions

### Real-World Applications

- **Pharmaceutical Quality Control**: Verify drug identity and potency
- **Counterfeit Detection**: Identify fake or substandard medications
- **Field Validation**: Quality testing in developing countries
- **Supply Chain Monitoring**: Ensure drug quality throughout distribution

---

# Function Categories

Now let's explore all PAD Analytics functions organized by category:

## 1. Dataset Management Functions

*Work with curated collections of PAD cards for ML training/testing*

**PAD Context**: Datasets contain train/test splits of card images from one or more projects, curated for model development and validation in pharmaceutical quality testing. See the [PAD Dataset Catalog](https://padproject.info/pad_dataset_registry/datasets/) for more details.


### 1.1 Discover Available Datasets

In [2]:
# List all available datasets
datasets = pad.get_datasets()
print(f"Total datasets available: {len(datasets)}")
print("\nAvailable datasets for pharmaceutical quality testing:")
datasets.head()

Total datasets available: 10

Available datasets for pharmaceutical quality testing:


| | Dataset Name | Total Records | Description | Documentation | Source |
|---|---|---|---|---|---|
| 0 | FHI2020_Stratified_Sampling | 8001 | Enhanced approach to selecting training/test s... | https://padproject.info/pad_dataset_registry/d... | catalog |
| 1 | FHI2021 | 0 | Dataset: FHI2021 from PaperAnalyticalDeviceND/... | https://padproject.info/pad_dataset_registry/d... | catalog |
| 2 | FHI2022 | 0 | Dataset: FHI2022 from PaperAnalyticalDeviceND/... | https://padproject.info/pad_dataset_registry/d... | catalog |
| 3 | FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0 | 5924 | Dataset: FHI360_FHI2020-FHI2022_MidTrainingSet... | https://padproject.info/pad_dataset_registry/d... | catalog |
| 4 | FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1 | 9027 | Dataset: FHI360_FHI2020_MidTrainingSet-Zero_Go... | https://padproject.info/pad_dataset_registry/d... | catalog |


In [3]:
# List the names of the available datasets
datasets['Dataset Name'].values

array(['FHI2020_Stratified_Sampling', 'FHI2021', 'FHI2022',
       'FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0',
       'FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1',
       'FHI360_FHI2020_MidTrainingSet_Good_v1.0',
       'FHI360_FHI360-FHI2020_BalancedData_v1.0',
       'Leiberman-Lab_ChemoPADNNtraining2024_Partial-Drug-Set_v1.0',
       'TFDA_MSH-Tanzania_v2.0', 'Veripad_ChemoPAD-idPAD2.4_v1.0'],
      dtype=object)

### 1.2 Load Dataset for Analysis

In [4]:
# Load the main pharmaceutical quality testing dataset
dataset_name = "FHI2020_Stratified_Sampling"
fhi_data = pad.get_dataset_cards(dataset_name)

print(f"Loaded '{dataset_name}' for pharmaceutical analysis:")
print(f"Total PAD card images: {len(fhi_data)}")
print(f"Unique drugs tested: {fhi_data['sample_name'].nunique()}")
print(f"Concentration levels: {sorted(fhi_data['quantity'].unique())}")

# Show sample data
print(f"Clean dataset view: {len(fhi_data)} PAD card records")
fhi_data.head() # head: Show the first 5 rows of the dataset

Loaded 'FHI2020_Stratified_Sampling' for pharmaceutical analysis:
Total PAD card images: 8001
Unique drugs tested: 23
Concentration levels: [0, 20, 50, 80, 100]
Clean dataset view: 8001 PAD card records


Unnamed: 0,id,sample_id,sample_name,quantity,camera_type_1,url,hashlib_md5,image_name
0,15589,53787,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,c7ffc09ba273d13dfd9f295aa2f66cb5,15589__53787__hydroxychloroquine__100.png
1,15590,53778,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,ac2f4e2656289c2c26c087bc3b918ed5,15590__53778__hydroxychloroquine__100.png
2,15591,53789,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,7f983c5681156fd10cd49c592ea3d25e,15591__53789__hydroxychloroquine__100.png
3,15592,53787,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,7c5e34b986e53ec0e05b0dfe086b4c2e,15592__53787__hydroxychloroquine__100.png
4,15595,53785,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,a765e98e08c00e506b511b39cdef0ac9,15595__53785__hydroxychloroquine__100.png


In [5]:
# Get clean dataset view (without internal columns)
print("\nSample pharmaceutical data:")
fhi_data[['id', 'sample_id', 'sample_name', 'quantity']].head()


Sample pharmaceutical data:


| | id | sample_id | sample_name | quantity |
|---|---|---|---|---|
| 0 | 15589 | 53787 | hydroxychloroquine | 100 |
| 1 | 15590 | 53778 | hydroxychloroquine | 100 |
| 2 | 15591 | 53789 | hydroxychloroquine | 100 |
| 3 | 15592 | 53787 | hydroxychloroquine | 100 |
| 4 | 15595 | 53785 | hydroxychloroquine | 100 |


In [6]:
# Get 8 random rows of the dataset
fhi_data[['id', 'sample_id', 'sample_name', 'quantity']].sample(8) 

| | id | sample_id | sample_name | quantity |
|---|---|---|---|---|
| 856 | 23871 | 52167 | rifampicin | 100 |
| 6416 | 24487 | 55293 | pyrazinamide | 20 |
| 3100 | 22848 | 52766 | ethambutol | 20 |
| 544 | 21659 | 51725 | hydroxychloroquine | 20 |
| 4066 | 19173 | 52077 | tetracycline | 20 |
| 6951 | 22926 | 52707 | ethambutol | 80 |
| 5424 | 20257 | 53253 | doxycycline | 100 |
| 2479 | 24636 | 55360 | azithromycin | 20 |


In [7]:
# Analyze drug distribution
drug_counts = fhi_data['sample_name'].value_counts()
print(f"\nPharmaceutical and distractor compounds in dataset: {len(drug_counts)}")
print("All Compounds by number of PAD images:")
print(drug_counts)


Pharmaceutical and distractor compounds in dataset: 23
All Compounds by number of PAD images:
sample_name
hydroxychloroquine            821
rifampicin                    476
ciprofloxacin                 455
pyrazinamide                  447
ferrous-sulfate               438
chloroquine                   438
azithromycin                  438
ripe                          427
ethambutol                    403
ceftriaxone                   387
amoxicillin                   386
tetracycline                  382
sulfamethoxazole              368
promethazine-hydrochloride    351
epinephrine                   334
isoniazid                     327
albendazole                   281
blank                         208
doxycycline                   205
ampicillin                    186
benzyl-penicillin              94
swiped-but-not-run             78
lactose                        71
Name: count, dtype: int64
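
The counts above differ a lot between compounds, which matters when interpreting overall accuracy later. As a rough sketch, the share of each class and a simple max/min imbalance ratio can be computed from the same counts. The numbers below are copied from the `value_counts()` output above; the variable names are illustrative.

```python
import pandas as pd

# Image counts per compound, copied from the value_counts() output above.
drug_counts = pd.Series(
    {
        "hydroxychloroquine": 821, "rifampicin": 476, "ciprofloxacin": 455,
        "pyrazinamide": 447, "ferrous-sulfate": 438, "chloroquine": 438,
        "azithromycin": 438, "ripe": 427, "ethambutol": 403,
        "ceftriaxone": 387, "amoxicillin": 386, "tetracycline": 382,
        "sulfamethoxazole": 368, "promethazine-hydrochloride": 351,
        "epinephrine": 334, "isoniazid": 327, "albendazole": 281,
        "blank": 208, "doxycycline": 205, "ampicillin": 186,
        "benzyl-penicillin": 94, "swiped-but-not-run": 78, "lactose": 71,
    }
)

total_images = int(drug_counts.sum())

# Percentage of all images contributed by each compound.
share_percent = (drug_counts / total_images * 100).round(2)

# Ratio between the most and least represented classes.
imbalance_ratio = drug_counts.max() / drug_counts.min()

print(f"Total images: {total_images}")
print(f"Largest class: {drug_counts.idxmax()} ({share_percent.max()}%)")
print(f"Smallest class: {drug_counts.idxmin()} ({share_percent.min()}%)")
print(f"Imbalance ratio (max/min): {imbalance_ratio:.1f}")
```

With the live data, the same calculation would start from `fhi_data['sample_name'].value_counts()` rather than a hand-typed Series.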


In [8]:
# get the dataset cards from the training and test sets separately 
fhi_data_train = pad.get_dataset_cards(dataset_name, data_type="train")
fhi_data_test = pad.get_dataset_cards(dataset_name, data_type="test")

print(f"Training set: {len(fhi_data_train)}")
print(f"Test set: {len(fhi_data_test)}")

display(fhi_data_train.head())
display(fhi_data_test.head())


Training set: 5923
Test set: 2078


Unnamed: 0,id,sample_id,sample_name,quantity,camera_type_1,url,hashlib_md5,image_name
0,15589,53787,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,c7ffc09ba273d13dfd9f295aa2f66cb5,15589__53787__hydroxychloroquine__100.png
1,15590,53778,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,ac2f4e2656289c2c26c087bc3b918ed5,15590__53778__hydroxychloroquine__100.png
2,15591,53789,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,7f983c5681156fd10cd49c592ea3d25e,15591__53789__hydroxychloroquine__100.png
3,15592,53787,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,7c5e34b986e53ec0e05b0dfe086b4c2e,15592__53787__hydroxychloroquine__100.png
4,15595,53785,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,a765e98e08c00e506b511b39cdef0ac9,15595__53785__hydroxychloroquine__100.png


Unnamed: 0,id,sample_id,sample_name,quantity,camera_type_1,url,hashlib_md5,image_name
0,15588,53796,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,ea1e1c9756b054e80aa3942596d6b25d,15588__53796__hydroxychloroquine__100.png
1,15593,53796,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,b7385bdcef1f35a26a242916fcf2731b,15593__53796__hydroxychloroquine__100.png
2,15594,53786,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,264c81897373a775e8f04099fb59d632,15594__53786__hydroxychloroquine__100.png
3,15596,53784,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,b92b37f1a8fb06f65afe8f550a10ba61,15596__53784__hydroxychloroquine__100.png
4,15598,53784,hydroxychloroquine,100,Google Pixel 3a,https://pad.crc.nd.edu//var/www/html/images/pa...,56a5f35f04ede36de1180b010c4a827b,15598__53784__hydroxychloroquine__100.png
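
Because one physical card can appear in several images, a useful sanity check on any train/test split is whether the same `sample_id` shows up on both sides. This guide does not state how the split was built, so the sketch below only shows how such a check could be written. The rows are copied from the `head()` outputs above; the variable names are illustrative.

```python
import pandas as pd

# A few rows copied from the training and test previews above.
train = pd.DataFrame(
    {
        "id": [15589, 15590, 15591, 15592],
        "sample_id": [53787, 53778, 53789, 53787],
    }
)
test = pd.DataFrame(
    {
        "id": [15588, 15593, 15594],
        "sample_id": [53796, 53796, 53786],
    }
)

# Physical cards present in both splits (should be empty to avoid leakage).
shared_samples = set(train["sample_id"]) & set(test["sample_id"])

# Individual images present in both splits (should also be empty).
shared_images = set(train["id"]) & set(test["id"])

print(f"Physical cards in both splits: {len(shared_samples)}")
print(f"Images in both splits: {len(shared_images)}")
```

On the full data, `fhi_data_train` and `fhi_data_test` would take the place of the two small frames.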


#### **Exercise 1**: Conduct an analysis of the drug distribution in the training set of the `FHI2020_Stratified_Sampling` dataset.

In [9]:
# Exercise 1: Your code here
# TODO:

### 1.3 Model-Dataset Relationships

In [10]:
# Understand which dataset was used to train specific models
model_16_dataset = pad.get_dataset_name_from_model_id(16)
model_18_dataset = pad.get_dataset_name_from_model_id(18)

print("Model-Dataset relationships for pharmaceutical analysis:")
print(f"Model 16 (Drug Classification): trained on '{model_16_dataset}' dataset")
print(f"Model 18 (Concentration PLS): trained on '{model_18_dataset}' dataset")

# Load the actual dataset used for model training
model_16_data = pad.get_model_data(model_id=16, data_type="train")
print(f"\nModel 16 training dataset: {len(model_16_data)} samples")
print(f"Drugs in training set: {model_16_data['sample_name'].nunique()}")



Model-Dataset relationships for pharmaceutical analysis:
Model 16 (Drug Classification): trained on 'FHI2020_Stratified_Sampling' dataset
Model 18 (Concentration PLS): trained on 'FHI2020_Stratified_Sampling' dataset

Model 16 training dataset: 5923 samples
Drugs in training set: 23


In [11]:
# get complete mapping between models and datasets
pad.get_model_dataset_mapping()

| | Model ID | Model Name | Endpoint URL | Dataset Name | Training Dataset | Test Dataset |
|---|---|---|---|---|---|---|
| 0 | 16 | 24fhiNN1classifyAPI | https://pad.crc.nd.edu/api/v2/neural-networks/16 | FHI2020_Stratified_Sampling | https://raw.githubusercontent.com/PaperAnalyti... | https://raw.githubusercontent.com/PaperAnalyti... |
| 1 | 19 | 24 fhi NN1 API concentration | https://pad.crc.nd.edu/api/v2/neural-networks/19 | FHI2020_Stratified_Sampling | https://raw.githubusercontent.com/PaperAnalyti... | https://raw.githubusercontent.com/PaperAnalyti... |
| 2 | 17 | 24fhiNN1concAPI | https://pad.crc.nd.edu/api/v2/neural-networks/17 | FHI2020_Stratified_Sampling | https://raw.githubusercontent.com/PaperAnalyti... | https://raw.githubusercontent.com/PaperAnalyti... |
| 3 | 18 | 24fhiPLS1conc | https://pad.crc.nd.edu/api/v2/neural-networks/18 | FHI2020_Stratified_Sampling | https://raw.githubusercontent.com/PaperAnalyti... | https://raw.githubusercontent.com/PaperAnalyti... |
| 4 | 20 | ChemoPAD NN training 2024 | https://pad.crc.nd.edu/api/v2/neural-networks/20 | Leiberman-Lab_ChemoPADNNtraining2024_Partial-D... | https://raw.githubusercontent.com/PaperAnalyti... | https://raw.githubusercontent.com/PaperAnalyti... |


## 2. Project Management Functions

*Explore PAD cards organized by research projects*

**PAD Context**: Projects represent collections of physical PAD cards with specific layouts and chemical components, captured under various conditions for model robustness in pharmaceutical quality testing.

In [12]:
# Discover pharmaceutical research projects
projects = pad.get_projects()
print(f"Total pharmaceutical research projects: {len(projects)}")

# Show sample projects
print("\nPharmaceutical quality testing projects:")
projects[['id', 'project_name', 'test_name']].tail() # tail: Show the last 5 rows of the dataset

Total pharmaceutical research projects: 34

Pharmaceutical quality testing projects:


| | id | project_name | test_name |
|---|---|---|---|
| 29 | 31 | 12LaneAirGap2024 | 12LanePADKenya2015 |
| 30 | 32 | ChemoPAD12laneairgapNNtraining2024 | ChemoPAD |
| 31 | 33 | 7LaneIDPAD2024 | idPAD 2.4 |
| 32 | 34 | 2024TB | 12LanePADKenya2015 |
| 33 | 35 | Parkinson | 12LanePADKenya2015 |


In [13]:
# Get detailed information about a specific project by project ID
project_id = 32
project_info = pad.get_project(project_id)

if project_info is not None:
    print("Project Details:")
    print(f"  Name: {project_info['project_name'].values[0]}")
    print(f"  Test Type: {project_info['test_name'].values[0]}")
    print(f"  Description: {project_info['annotation'].values[0]}")
    print(f"  Sample Names: {project_info['sample_names.sample_names'].values[0]}")
else:
    # Use the requested project_id here, not the built-in id() function
    print(f"Project with ID {project_id} not found or is empty.")



✅ Found project: 32
Project Details:
  Name: ChemoPAD12laneairgapNNtraining2024
  Test Type: ChemoPAD
  Description: ChemoAir
  Sample Names: ['Cisplatin', 'cyclophosphamide injectable', 'cyclophosphamide oral', 'Doxorubicin', 'hydroxyurea', 'MESNA', 'methotrexate injectable', 'methotrexate oral', 'ifosfamide injectable', 'oxaliplatin', 'leucovorin (calcium leucovorinate)']


In [14]:
# Get all PAD cards from a project by project name
project_cards = pad.get_project_cards("ChemoPAD12laneairgapNNtraining2024")

project_cards

✅ Found project: ChemoPAD12laneairgapNNtraining2024
ℹ️  Project 'ChemoPAD12laneairgapNNtraining2024' exists but has no cards.
ℹ️  No cards found from any of the requested projects.


## 3. Card & Sample Management Functions

*Retrieve individual PAD images and physical card information*

**PAD Context**: 
- **Card ID**: References a specific image capture of a PAD
- **Sample ID**: References the physical PAD card (may have multiple image captures)
- Multiple card_ids can share the same sample_id due to different lighting/device conditions for robust model training

### 3.1 Retrieve Individual PAD Cards

In [15]:
# Get specific PAD card by card_id (specific image)
card = pad.get_card(card_id=47918)
# see dataframe
card


Unnamed: 0,id,sample_name,test_name,user_name,date_of_creation,raw_file_location,processed_file_location,processing_date,camera_type_1,notes,...,project.project_name,project.annotation,project.test_name,project.sample_names.sample_names,project.neutral_filler,project.qpc20,project.qpc50,project.qpc80,project.qpc100,project.notes
0,47918,Amoxicillin,12LanePADKenya2015,api-D5HDZG76N3ICA3GBUYWC,2024-05-21T14:31:49,/var/www/html/images/padimages/raw_local/40000...,/var/www/html/images/padimages/processed/40000...,2024-05-21T14:31:49,Google Pixel 3a,"Predicted drug = ciprofloxacin (0.667), 50, %,...",...,FHI360-App,,12LanePADKenya2015,"[Albendazole, Amoxicillin, Ampicillin, Ascorbi...",Lactose,1,1,1,1,Project for FHI360 App test phase.


In [16]:
print(f"PAD Card Analysis:")
print(f"  Card ID: {card['id'].values[0]} (specific image)")
print(f"  Sample ID: {card['sample_id'].values[0]} (physical card)")
print(f"  Drug: {card['sample_name'].values[0]}")
print(f"  Concentration: {card['quantity'].values[0]}%")
print(f"  Project: {card['project.project_name'].values[0]}")
print(f"  Camera: {card['camera_type_1'].values[0]}")


PAD Card Analysis:
  Card ID: 47918 (specific image)
  Sample ID: 65490 (physical card)
  Drug: Amoxicillin
  Concentration: 100%
  Project: FHI360-App
  Camera: Google Pixel 3a


### 3.2 Quality Control - Identify Problematic Cards

In [17]:
# Essential for pharmaceutical quality control - list the issue categories used to flag problematic images
issues = pad.get_card_issues()
print(f"Quality Control: {len(issues)} issue categories are defined for PAD cards")
print("Cards flagged with these issues should be excluded from pharmaceutical analysis to ensure accuracy")

issues

Quality Control: 4 issue categories are defined for PAD cards
Cards flagged with these issues should be excluded from pharmaceutical analysis to ensure accuracy


| | id | name | description |
|---|---|---|---|
| 0 | 1 | Unspecified | |
| 1 | 2 | Leak | |
| 2 | 3 | Stuck | |
| 3 | 4 | Rectification | |


In [18]:
# Count the cards in a project that are flagged with each issue type
cards_proj = pad.get_project_cards("FHI2022")

cards_issues = cards_proj[cards_proj['issue.name'].isin(['Leak', 'Rectification', 'Stuck'])]

# Select by issue type
leak = cards_issues[cards_issues['issue.name'] == 'Leak']
rectification = cards_issues[cards_issues['issue.name'] == 'Rectification']
stuck = cards_issues[cards_issues['issue.name'] == 'Stuck']

print(f"Leak: {len(leak)}")
print(f"Stuck: {len(stuck)}")
print(f"Rectification: {len(rectification)}")

✅ Found project: FHI2022
✅ Found 3872 cards from project 'FHI2022'
Leak: 6
Stuck: 718
Rectification: 13
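
The three separate filters above can also be written as a single `value_counts()` call, and the same mask can be inverted to keep only cards without a flagged issue. The sketch below uses a small hand-made frame. It assumes that cards without an issue have a missing value in `issue.name`, which this guide does not confirm; the `isin` mask works either way.

```python
import pandas as pd

# Illustrative cards; None marks a card with no recorded issue (assumption).
cards = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "issue.name": ["Leak", None, "Stuck", "Stuck", None, "Rectification"],
    }
)

# Count cards per issue type in one step (missing values are skipped).
issue_counts = cards["issue.name"].value_counts()
print(issue_counts)

# Keep only cards that are not flagged with one of the listed issues.
flagged = cards["issue.name"].isin(["Leak", "Stuck", "Rectification"])
clean_cards = cards[~flagged]
print(f"Cards kept for analysis: {len(clean_cards)} of {len(cards)}")
```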


## 4. Model Management Functions

*Access ML models trained on PAD datasets*

**PAD Context**: Models are trained on specific datasets for pharmaceutical quality testing tasks:
- **Drug classification**: Identifying the pharmaceutical compound
- **Concentration quantification**: Measuring drug potency levels

In [19]:
# Discover available models for pharmaceutical analysis
models = pad.get_models()
print(f"Available ML models for pharmaceutical testing: {len(models)}")

models.head() 

Available ML models for pharmaceutical testing: 15


Unnamed: 0,id,name,drugs_size,drugs,labels,lanes_excluded,weights,weights_url,image_mean,image_size,architecture,brightness,type,description,test,training_set,version,SHA256
0,6,msh_tanzania_3k_10_lite,10,"[Paracetamol Starch, Penicillin Procaine, Star...","[Paracetamol Starch, Penicillin Procaine, Star...",,/var/www/html/neuralnetworks/tf_lite/msh_tanza...,https://pad.crc.nd.edu/neuralnetworks/tf_lite/...,,227227.0,,165.5,tf_lite,MSH Tanzania image classifier: Identify the PA...,12LanePADKenya2015,MSH Tanzania,1.0,c686850b957fe7e0bb5338a76c22e795659432e588b99b...
1,7,fhi360_small_lite,21,"[Albendazole, Amoxicillin, Ampicillin, Azithro...","[Albendazole, Amoxicillin, Ampicillin, Azithro...",,/var/www/html/neuralnetworks/tf_lite/fhi360_sm...,https://pad.crc.nd.edu/neuralnetworks/tf_lite/...,,227227.0,,0.0,tf_lite,FHI360 image classifier: Identify the PAD imag...,12LanePADKenya2015,,1.0,ff15f7a5224a65693d6e80e17a915e9d8a030b2cfc59fa...
2,8,fhi360_conc_large_lite,4,"[Albendazole, Amoxicillin, Ampicillin, Azithro...","[100, 80, 50, 20]",,/var/www/html/neuralnetworks/tf_lite/fhi360_co...,https://pad.crc.nd.edu/neuralnetworks/tf_lite/...,,227227.0,,0.0,tf_lite,FHI360 image classifier for concentration: Ide...,12LanePADKenya2015,,1.0,bac3a744ce8c29baf96a3d92e5e4cb21593c7a2e17bd2a...
3,9,idPAD_small_lite,6,"[Cocaine HCL, Crack Cocaine, Diphenhydramine, ...","[Cocaine HCL, Crack Cocaine, Diphenhydramine, ...",,/var/www/html/neuralnetworks/tf_lite/idPAD_sma...,https://pad.crc.nd.edu/neuralnetworks/tf_lite/...,,227227.0,,0.0,tf_lite,idPAD image classifier: Identify the PAD image...,idPAD 2.4,,1.0,5fbaf27079b901eed9b268ea496d19483dc3ad78f31cef...
4,10,pls_fhi360_conc,21,"[Albendazole, Amoxicillin, Ampicillin, Azithro...",[100],,/var/www/html/neuralnetworks/pls/fhi360_concen...,https://pad.crc.nd.edu/neuralnetworks/pls/fhi3...,,,,0.0,pls,PLSD method using 10 sections per lane.,12LanePADKenya2015,,1.0,6d6f72b0ddfd8aa58672b3b1fd60a0c3fa8ef628a3bbe1...


*Model Selection Guide*

**Choose the right model for your pharmaceutical analysis:**

- **Model 16**: FHI360 Neural Network classifier (returns drug name, confidence, energy)
    - Use for: FHI360 drug authentication, 23-drug classifier for antibiotics and antimalarials, trained on 12-lane PAD cards with concentrations 20%, 50%, 80%, 100%
    - Output: `('albendazole', 0.95, 12.3)`

- **Model 18**: Partial least squares (PLS) regression on the concentration (returns predicted concentration)
    - Use for: FHI360 potency analysis, quantitative concentration prediction for authenticated drugs, calibrated for 20-100% concentration range
    - Output: `75.2` (representing 75.2% concentration)

- **Model 20**: ChemoPAD Neural Network classifier (returns drug name, confidence, energy)
    - Use for: ChemoPAD-specific drug identification, specialized for concentrations 100%, 66%, 33%, 0%
    - Output: `('hydroxyurea', 0.9996, 9.586)`

**Selection criteria:**

- **Identification objective** → Use classification models (16, 20)
- **Quantification objective** → Use concentration models (18)
- **Regulatory compliance** → Document model validation data with `get_model_data()`
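
The two output shapes listed above (a `(drug, confidence, energy)` tuple for the classifiers and a single number for the PLS model) can be handled by one small helper. `describe_prediction` is a hypothetical name introduced for this sketch and is not part of `pad_analytics`; the example values are the ones shown in the guide.

```python
def describe_prediction(prediction):
    """Format a model output according to the two shapes listed in the guide."""
    if isinstance(prediction, tuple):
        # Classification models: (drug name, confidence, energy)
        drug_name, confidence, energy = prediction
        return f"{drug_name} (confidence {confidence:.1%}, energy {energy:.2f})"
    # Concentration models: a single numeric value in percent
    return f"{float(prediction):.1f}% concentration"


print(describe_prediction(("albendazole", 0.95, 12.3)))
print(describe_prediction(75.2))
```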


In [67]:
# Show model details
key_models = models[models['id'].isin([16,18, 20])]

key_models[['id', 'name', 'type', 'description']]

| | id | name | type | description |
|---|---|---|---|---|
| 10 | 16 | 24fhiNN1classifyAPI | tf_lite | FHI360 image classifier: Fhi360 mid data. Iden... |
| 12 | 18 | 24fhiPLS1conc | pls | PLSD method using 10 sections per lane. |
| 14 | 20 | ChemoPAD NN training 2024 | tf_lite | ChemoPAD based project to train Neural Network... |


In [68]:
# Get training and test data for model 16
model_train = pad.get_model_data(16, data_type='train')
model_test = pad.get_model_data(16, data_type='test')

dataset_name = pad.get_dataset_name_from_model_id(16)
print(f"Model 16 dataset: {dataset_name}")

if isinstance(model_train, pd.DataFrame):
    print(f"  Training samples: {len(model_train)}")
    print(f"  Drugs in training: {model_train['sample_name'].nunique()}")
    
if isinstance(model_test, pd.DataFrame):
    print(f"  Test samples: {len(model_test)}")
    print(f"  Drugs in testing: {model_test['sample_name'].nunique()}")

Model 16 dataset: FHI2020_Stratified_Sampling
  Training samples: 5923
  Drugs in training: 23
  Test samples: 2078
  Drugs in testing: 23


## 5. Visualization Functions

*Interactive displays of PAD cards and analysis results*

**PAD Context**: View PAD card images alongside metadata, predictions, and analysis results for pharmaceutical quality validation and research insights.

### 5.1 Single PAD Card Visualization

In [23]:
# Display a PAD card for pharmaceutical analysis
print("PAD Card Visualization for Quality Analysis:")

# show image and data of a card by card_id
pad.show_card(card_id=47918)

PAD Card Visualization for Quality Analysis:


*[Interactive widget: PAD card image with metadata for card 47918]*

In [24]:
# show image and data of a specific card by sample_id
pad.show_card(sample_id=65490)

*[Interactive widget: PAD card image with metadata for sample 65490]*

### 5.2 Multiple PAD Cards Comparison

In [25]:
# Compare multiple PAD cards for pharmaceutical validation
print("Comparative Analysis of Multiple PAD Cards:")
pad.show_cards(card_ids=[47918, 47919, 47920])

Comparative Analysis of Multiple PAD Cards:


*[Interactive widget: PAD card images with metadata for cards 47918, 47919, and 47920]*

### 5.3 Grouped Card Display

`show_grouped_cards()` displays a tabbed interface in which each tab represents a different group of cards. Cards are displayed as image grids within each tab.

In [26]:
# Get cards from a project
cards = pad.get_project_cards("FHI2020")

print("Possible columns to group by: ", cards.columns)

# Grouping by sample_name:
print("\nGrouping by sample_name:")
pad.show_grouped_cards(cards, 'sample_name')


✅ Found project: FHI2020
✅ Found 9706 cards from project 'FHI2020'
Possible columns to group by:  Index(['id', 'sample_name', 'test_name', 'user_name', 'date_of_creation',
       'raw_file_location', 'processed_file_location', 'processing_date',
       'camera_type_1', 'notes', 'sample_id', 'quantity', 'deleted',
       'project.id', 'project.user_name', 'project.project_name',
       'project.annotation', 'project.test_name',
       'project.sample_names.sample_names', 'project.neutral_filler',
       'project.qpc20', 'project.qpc50', 'project.qpc80', 'project.qpc100',
       'project.notes', 'issue.id', 'issue.name', 'issue.description',
       'issue'],
      dtype='object')

Grouping by sample_name:


*[Interactive output: tabbed image grids, one tab per `sample_name`]*

In [27]:
# Grouping by issue:
print("Grouping by issue:")
pad.show_grouped_cards(cards_issues, 'issue.name')

Grouping by issue:


*[Interactive output: tabbed image grids, one tab per `issue.name`]*

### 5.4 Drug-Specific Analysis and Concentration-Based Grouping

In [30]:
# Analyze specific pharmaceutical compounds
amoxicillin_cards = cards[cards['sample_name'].str.contains('Amoxicillin', case=False, na=False)]

# Concentration-Based Grouping
print("Amoxicillin Concentration Analysis:")
pad.show_grouped_cards(amoxicillin_cards, group_column='quantity')

Amoxicillin Concentration Analysis:


*[Interactive output: tabbed image grids of amoxicillin cards, one tab per `quantity`]*
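
The filter in the cell above relies on two options of `str.contains`: `case=False` matches both `Amoxicillin` and `amoxicillin` (both spellings appear in this guide), and `na=False` treats missing names as non-matches instead of propagating them into the mask. A minimal sketch on hand-made rows, not real PAD data:

```python
import pandas as pd

# Illustrative rows with mixed casing and one missing sample name.
cards = pd.DataFrame(
    {
        "sample_name": ["Amoxicillin", "amoxicillin", "Ampicillin", None, "Amoxicillin"],
        "quantity": [100, 20, 100, 50, 100],
    }
)

# case=False ignores letter case; na=False turns missing names into False.
mask = cards["sample_name"].str.contains("amoxicillin", case=False, na=False)
amoxicillin_cards = cards[mask]

# Number of matching cards at each concentration level.
per_level = amoxicillin_cards.groupby("quantity").size()
print(per_level)
```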

## 6. Prediction & Analysis Functions

*Apply trained models to PAD images for drug identification and quantification*

**PAD Context**: 
- **Classification models**: Identify pharmaceutical compounds for authenticity verification
- **Quantification models**: Measure drug concentration levels for potency assessment

### 6.1 Drug Classification Analysis

In [38]:
# Pharmaceutical compound identification
#actual, prediction = pad.predict(card_id=47918, model_id=16)
actual, prediction = pad.predict(card_id=17008, model_id=16)

drug_name, confidence, energy = prediction

print("Pharmaceutical Compound Identification:")
print(f"  Actual Drug: {actual}")
print(f"  Predicted Drug: {drug_name}")
print(f"  Model Confidence: {confidence:.2%}")
print(f"  Analysis Energy: {energy:.2f}")

# Quality assessment
if drug_name.lower() == actual.lower():
    print("\n✅ PASS: Correct drug identification")
else:
    print(f"\n❌ FAIL: Expected {actual}, predicted {drug_name}")

Pharmaceutical Compound Identification:
  Actual Drug: albendazole
  Predicted Drug: albendazole
  Model Confidence: 99.95%
  Analysis Energy: 15.86

✅ PASS: Correct drug identification


In [39]:
actual, prediction = pad.predict(card_id=16891, model_id=16)

drug_name, confidence, energy = prediction

print("Pharmaceutical Compound Identification:")
print(f"  Actual Drug: {actual}")
print(f"  Predicted Drug: {drug_name}")
print(f"  Model Confidence: {confidence:.2%}")
print(f"  Analysis Energy: {energy:.2f}")

# Quality assessment
if drug_name.lower() == actual.lower():
    print("\n✅ PASS: Correct drug identification")
else:
    print(f"\n❌ FAIL: Expected {actual}, predicted {drug_name}")

Pharmaceutical Compound Identification:
  Actual Drug: albendazole
  Predicted Drug: albendazole
  Model Confidence: 79.60%
  Analysis Energy: 11.30

✅ PASS: Correct drug identification
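
Both cards above are identified correctly, but with quite different confidence (99.95% versus 79.60%). One way to use that difference is to route low-confidence predictions to manual review. The sketch below is a hypothetical helper: the name `triage` and the 0.90 cutoff are assumptions made for illustration, not a validated threshold or part of `pad_analytics`.

```python
def triage(prediction, min_confidence=0.90):
    """Return (drug_name, status), where status is 'accept' or 'review'."""
    drug_name, confidence, _energy = prediction
    status = "accept" if confidence >= min_confidence else "review"
    return drug_name, status


# Prediction tuples with the values printed in the two cells above.
print(triage(("albendazole", 0.9995, 15.86)))
print(triage(("albendazole", 0.7960, 11.30)))
```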


#### 6.1.1 Integrated Prediction Visualization


In [42]:
pad.show_prediction(card_id=16891, model_id=16)
pad.show_prediction(card_id=17008, model_id=16)

*[Interactive widget: PAD card image with prediction for card 16891]*

*[Interactive widget: PAD card image with prediction for card 17008]*

### 6.2 Concentration Quantification

In [43]:
# Drug concentration/potency analysis
actual, predicted = pad.predict(card_id=16891, model_id=18)
abs_error = abs(actual - predicted)

print("Pharmaceutical Potency Analysis:")
print(f"  Expected Concentration: {actual:.1f}%")
print(f"  Measured Concentration: {predicted:.1f}%")
print(f"  Absolute Error: {abs_error:.1f}%")
# Relative error is undefined for 0% cards, so skip it in that case
if actual != 0:
    print(f"  Relative Error: {abs_error / actual * 100:.1f}%")

# Illustrative pass/fail check: the 10-point threshold is for demonstration only
error_threshold = 10
if abs_error <= error_threshold:
    print(f"\n✅ PASS: Concentration within acceptable range (±{error_threshold}%)")
else:
    print(f"\n❌ FAIL: Concentration outside acceptable range (±{error_threshold}%)")

Pharmaceutical Potency Analysis:
  Expected Concentration: 20.0%
  Measured Concentration: 41.7%
  Absolute Error: 21.7%
  Relative Error: 108.6%

❌ FAIL: Concentration outside acceptable range (±10%)


In [44]:
# Drug concentration/potency analysis
actual, predicted = pad.predict(card_id=17008, model_id=18)
abs_error = abs(actual - predicted)

print("Pharmaceutical Potency Analysis:")
print(f"  Expected Concentration: {actual:.1f}%")
print(f"  Measured Concentration: {predicted:.1f}%")
print(f"  Absolute Error: {abs_error:.1f}%")
# Relative error is undefined for 0% cards, so skip it in that case
if actual != 0:
    print(f"  Relative Error: {abs_error / actual * 100:.1f}%")

# Illustrative pass/fail check: the 10-point threshold is for demonstration only
error_threshold = 10
if abs_error <= error_threshold:
    print(f"\n✅ PASS: Concentration within acceptable range (±{error_threshold}%)")
else:
    print(f"\n❌ FAIL: Concentration outside acceptable range (±{error_threshold}%)")

Pharmaceutical Potency Analysis:
  Expected Concentration: 100.0%
  Measured Concentration: 103.8%
  Absolute Error: 3.8%
  Relative Error: 3.8%

✅ PASS: Concentration within acceptable range (±10%)
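
The error calculations used in the two cells above can be collected into small helpers. This sketch also guards against division by zero, since the dataset includes cards at 0% concentration, and maps a continuous prediction onto the nearest nominal level. The helper names are hypothetical. The inputs are the rounded values printed above, so the last digit of a result may differ from the notebook output.

```python
# Concentration levels listed for the FHI2020_Stratified_Sampling dataset.
NOMINAL_LEVELS = [0, 20, 50, 80, 100]


def concentration_errors(actual, predicted):
    """Return (absolute error, relative error in %); relative is None at 0%."""
    absolute = abs(actual - predicted)
    relative = absolute / actual * 100 if actual != 0 else None
    return absolute, relative


def nearest_level(predicted, levels=NOMINAL_LEVELS):
    """Return the nominal concentration level closest to the prediction."""
    return min(levels, key=lambda level: abs(level - predicted))


for actual, predicted in [(20.0, 41.7), (100.0, 103.8), (0.0, 5.0)]:
    absolute, relative = concentration_errors(actual, predicted)
    print(actual, predicted, round(absolute, 1), relative, nearest_level(predicted))
```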


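### 6.3 Batch Processing for Large-Scale Analysis

`apply_predictions_to_dataframe()` applies a model to every card in a DataFrame and returns the results as a DataFrame, which makes it suitable for evaluating a model on a whole test set.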

In [52]:
# This example demonstrates how to efficiently evaluate model performance
# on a large test dataset and analyze misclassifications

# Get the official test dataset for model 16 (Neural Network classifier)
# This ensures we're testing on data the model hasn't seen during training
test_data = pad.get_model_data(16, data_type='test')
print(f"Test dataset size: {len(test_data)} cards")

# Sample 200 cards for this analysis (using fixed seed for reproducibility)
# In production, you might process the entire test set
sample_size = 200
test_sample = test_data.sample(sample_size, random_state=2)
print(f"Analyzing {sample_size} randomly selected test cards...")

# Apply batch predictions with optimized batch size for performance
# batch_size=64 balances memory usage with processing speed
results = pad.apply_predictions_to_dataframe(test_sample, model_id=16, batch_size=64)

# Separate correctly classified vs misclassified samples
# For neural network models, 'label' is actual drug, 'prediction' is predicted drug
passed_results = results[results['label'] == results['prediction']]
failed_results = results[results['label'] != results['prediction']]

# Calculate and display performance metrics
accuracy = len(passed_results) / len(results) * 100
print(f"\n=== Model Performance Summary ===")
print(f"Accuracy: {accuracy:.2f}% ({len(passed_results)}/{len(results)} correct)")
print(f"Errors: {len(failed_results)} misclassifications")

# Analyze error patterns - which drugs are most often misclassified?
if len(failed_results) > 0:
    print(f"\n=== Error Analysis ===")
    # Show which drugs were confused with which
    error_summary = failed_results.groupby(['label', 'prediction']).size().reset_index(name='count')
    error_summary = error_summary.sort_values('count', ascending=False).head(10)
    print("Top 10 confusion pairs:")
    display(error_summary)

    # Show confidence distribution for failed predictions
    avg_confidence_failed = failed_results['confidence'].mean()
    avg_confidence_passed = passed_results['confidence'].mean()
    print(f"\nAverage confidence - Correct: {avg_confidence_passed:.2%}, Incorrect: {avg_confidence_failed:.2%}")

# Display a few example misclassifications for manual inspection
# Limiting output to keep the notebook readable
num_examples = min(3, len(failed_results))
if num_examples > 0:
    print(f"\n=== Showing {num_examples} Example Misclassifications ===")
    print("These warrant manual review to understand failure modes:\n")

    # Get the most confident failures first (interesting edge cases)
    failed_sorted = failed_results.sort_values('confidence', ascending=False)
    card_ids = failed_sorted['id'].tolist()

    for i, card_id in enumerate(card_ids[:num_examples], 1):
        card_info = failed_sorted[failed_sorted['id'] == card_id].iloc[0]
        print(f"Example {i}: Card {card_id}")
        print(f"  Actual: {card_info['label']}, Predicted: {card_info['prediction']} (confidence: {card_info['confidence']:.2%})")
        pad.show_prediction(card_id=card_id, model_id=16)
        print("-" * 50)

# print("💾 Saving misclassifications for further analysis...")
# result_path = f'{pad_analysis_folder}/model_16_misclassifications.csv'
# failed_results.to_csv(result_path, index=False)

# print(f"✅ Saved to: {result_path}")



Test dataset size: 2078 cards
Analyzing 200 randomly selected test cards...
Starting optimized batch prediction for 200 cards with model 16
Model type: tf_lite
Using optimized batch processing for Neural Network (batch_size=64)
Processing batch 1/4 (64 images)
Processing batch 2/4 (64 images)
Processing batch 3/4 (64 images)
Processing batch 4/4 (8 images)

=== Model Performance Summary ===
Accuracy: 98.50% (197/200 correct)
Errors: 3 misclassifications

=== Error Analysis ===
Top 10 confusion pairs:


|   | label | prediction | count |
|---|-------|------------|-------|
| 1 | promethazine-hydrochloride | azithromycin | 2 |
| 0 | albendazole | azithromycin | 1 |



Average confidence - Correct: 99.36%, Incorrect: 92.95%

=== Showing 3 Example Misclassifications ===
These warrant manual review to understand failure modes:

Example 1: Card 24724
  Actual: promethazine-hydrochloride, Predicted: azithromycin (confidence: 98.22%)



--------------------------------------------------
Example 2: Card 24874
  Actual: promethazine-hydrochloride, Predicted: azithromycin (confidence: 97.95%)



--------------------------------------------------
Example 3: Card 16988
  Actual: albendazole, Predicted: azithromycin (confidence: 82.68%)



--------------------------------------------------
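
An accuracy measured on a 200-card sample carries sampling uncertainty. As a rough illustration, the sketch below computes a Wilson score interval for the 197/200 result reported above. The helper name `wilson_interval` is an assumption for this example, and the interval treats the sampled cards as independent draws, which may not hold when several images come from the same physical card.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(197, 200)
print(f"Accuracy 98.50%, approx. 95% interval: {low:.1%} to {high:.1%}")
```

Running the full test set instead of a sample would narrow this interval.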


---

# PAD Analytics Workflows



## Workflow 1: Drug Quality Testing Across Concentrations

In [62]:
# Comprehensive quality analysis for a specific drug
print("Workflow 1: Comprehensive Drug Quality Assessment")

# 1. Get all samples for a specific drug
drug_of_interest = 'amoxicillin'
drug_cards = cards[cards['sample_name'].str.contains(drug_of_interest, case=False, na=False)]
print(f"\nStep 1: Found {len(drug_cards)} {drug_of_interest} PAD cards")

if len(drug_cards) > 0:
    # 2. Analyze concentration distribution
    conc_distribution = drug_cards['quantity'].value_counts().sort_index()
    print(f"\nStep 2: Concentration distribution:")
    for conc, count in conc_distribution.items():
        print(f"  {conc}%: {count} cards")
    
    # 3. Quality control - exclude problematic cards
    issue_ids = pad.get_card_issues()['id'].tolist() if len(issue_cards) > 0 else []
    clean_drug_cards = drug_cards[~drug_cards['id'].isin(issue_ids)]
    print(f"\nStep 3: Quality control - {len(clean_drug_cards)} cards after filtering")
    
    # 4. Sample analysis (limit for demonstration)
    sample_cards = clean_drug_cards.head(3)
    if len(sample_cards) > 0:
        print(f"\nStep 4: Analysis results for {len(sample_cards)} samples:")
        results = pad.apply_predictions_to_dataframe(sample_cards, model_id=16)
        
        # Quality assessment
        for _, row in results.iterrows():
            pred_drug = str(row['prediction']).split('(')[0].strip()
            status = 'PASS' if pred_drug.lower() == row['label'].lower() else 'REVIEW'
            print(f"  Card {row['id']}: {row['label']} → {pred_drug} [{status}]")
else:
    print(f"No {drug_of_interest} samples found in dataset")

Workflow 1: Comprehensive Drug Quality Assessment

Step 1: Found 414 amoxicillin PAD cards

Step 2: Concentration distribution:
  20%: 119 cards
  50%: 92 cards
  80%: 80 cards
  100%: 123 cards

Step 3: Quality control - 414 cards after filtering

Step 4: Analysis results for 3 samples:
Starting optimized batch prediction for 3 cards with model 16
Model type: tf_lite
Using optimized batch processing for Neural Network (batch_size=32)
Processing batch 1/1 (3 images)
  Card 15214: amoxicillin → amoxicillin [PASS]
  Card 15215: amoxicillin → amoxicillin [PASS]
  Card 15216: amoxicillin → amoxicillin [PASS]
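
Step 4 takes the first three rows with `head(3)`, which may all share one concentration (the output above does not show which). A minimal sketch of picking a fixed number of cards per concentration level follows. The function name `sample_per_concentration` is hypothetical, and it works on plain dict rows, such as those from `clean_drug_cards.to_dict("records")`, so it can run on its own.

```python
from collections import defaultdict


def sample_per_concentration(rows, per_level=1, key="quantity"):
    """Return up to `per_level` rows for each distinct concentration."""
    picked = defaultdict(list)
    for row in rows:
        bucket = picked[row[key]]
        if len(bucket) < per_level:
            bucket.append(row)
    # Flatten in ascending concentration order
    return [row for level in sorted(picked) for row in picked[level]]


# Made-up rows for illustration only
rows = [
    {"id": 1, "quantity": 100},
    {"id": 2, "quantity": 100},
    {"id": 3, "quantity": 20},
    {"id": 4, "quantity": 50},
    {"id": 5, "quantity": 80},
]
selected = sample_per_concentration(rows)
print([(r["id"], r["quantity"]) for r in selected])
```

The selected `id` values could then be used to filter the original DataFrame with `isin()` before calling `apply_predictions_to_dataframe()`.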


## Workflow 2: Model Validation Across Imaging Conditions

In [64]:
# Validate model robustness across different imaging conditions
print("Workflow 2: Model Validation Across Imaging Conditions")

# Find a sample_id with multiple card images (if available)
sample_counts = cards['sample_id'].value_counts()
multi_image_samples = sample_counts[sample_counts > 1]

if len(multi_image_samples) > 0:
    sample_id = multi_image_samples.index[0]
    sample_cards = cards[cards['sample_id'] == sample_id]
    
    print(f"\nAnalyzing Sample ID {sample_id}:")
    print(f"  Physical PAD card: {sample_cards['sample_name'].iloc[0]}")
    print(f"  Multiple images: {len(sample_cards)} different captures")
    
    # Analyze consistency across imaging conditions
    predictions = []
    for _, card in sample_cards.iterrows():
        try:
            actual, pred = pad.predict(card['id'], model_id=16)
            drug_name, confidence, energy = pred
            predictions.append({
                'card_id': card['id'],
                'camera': card.get('camera_type_1', 'Unknown'),
                'predicted_drug': drug_name,
                'confidence': confidence
            })
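        except Exception:
            # Skip cards whose prediction fails (e.g. a missing image);
            # note that skipped cards are not reported
            continue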
    
    if predictions:
        print(f"\nConsistency Analysis:")
        for pred in predictions:
            print(f"  Card {pred['card_id']}: {pred['predicted_drug']} ({pred['confidence']:.2%}) [{pred['camera']}]")
        
        # Check consistency
        unique_predictions = {p['predicted_drug'] for p in predictions}
        if len(unique_predictions) == 1:
            print(f"\n✅ CONSISTENT: All images predict same drug")
        else:
            print(f"\n⚠️  INCONSISTENT: {len(unique_predictions)} different predictions")
else:
    print("\nNo samples with multiple images found in this dataset subset")
    print("This analysis requires samples with multiple imaging conditions")

Workflow 2: Model Validation Across Imaging Conditions

Analyzing Sample ID 45056:
  Physical PAD card: albendazole
  Multiple images: 30 different captures
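
The output above stops before any "Consistency Analysis" lines, which suggests that every prediction in the loop raised an exception and was skipped silently. A minimal sketch of a loop that records failures instead of hiding them follows. `predict_with_log` and `fake_predict` are hypothetical names for this example; in the notebook the predictor could be `lambda cid: pad.predict(cid, model_id=16)`.

```python
def predict_with_log(card_ids, predict_fn):
    """Run `predict_fn` per card, keeping failures instead of hiding them."""
    successes, failures = [], []
    for card_id in card_ids:
        try:
            successes.append((card_id, predict_fn(card_id)))
        except Exception as exc:  # record the reason, then move on
            failures.append((card_id, f"{type(exc).__name__}: {exc}"))
    return successes, failures


# Stand-in predictor for illustration: odd card IDs fail
def fake_predict(card_id):
    if card_id % 2:
        raise RuntimeError("image not found")
    return ("albendazole", ("albendazole", 0.99, 10.0))


ok, failed = predict_with_log([2, 3, 4], fake_predict)
print(f"{len(ok)} succeeded, {len(failed)} failed")
for card_id, reason in failed:
    print(f"  Card {card_id}: {reason}")
```

Reporting the failure count makes it clear whether an empty consistency report means "no data" or "all images unavailable".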


## Workflow 3: Project-Based Comparative Analysis

In [65]:
# Compare drug coverage across different research projects
print("Workflow 3: Project-Based Pharmaceutical Analysis")

# Analyze drug distribution across projects (using our dataset)
project_analysis = cards.groupby('project.project_name')['sample_name'].nunique().sort_values(ascending=False)

print(f"\nDrug Diversity Across Projects:")
for project, drug_count in project_analysis.head().items():
    print(f"  {project}: {drug_count} different drugs")

# Select project with most diverse drug portfolio
top_project = project_analysis.index[0]
project_cards = cards[cards['project.project_name'] == top_project]

print(f"\nFocus Analysis: {top_project}")
print(f"  Total PAD cards: {len(project_cards)}")
print(f"  Unique drugs: {project_cards['sample_name'].nunique()}")
print(f"  Concentration range: {project_cards['quantity'].min()}% - {project_cards['quantity'].max()}%")

# Show drug distribution in this project
drug_dist = project_cards['sample_name'].value_counts().head()
print(f"\nTop drugs in {top_project}:")
for drug, count in drug_dist.items():
    print(f"  {drug}: {count} cards")

Workflow 3: Project-Based Pharmaceutical Analysis

Drug Diversity Across Projects:
  FHI2020: 33 different drugs

Focus Analysis: FHI2020
  Total PAD cards: 9706
  Unique drugs: 33
  Concentration range: 20% - 100%

Top drugs in FHI2020:
  tetracycline: 538 cards
  isoniazid: 536 cards
  sulfamethoxazole: 525 cards
  rifampicin: 500 cards
  ceftriaxone: 486 cards


## Workflow 4: Quality Control Protocol

In [66]:
# Comprehensive quality control protocol for pharmaceutical testing
print("Workflow 4: Pharmaceutical Quality Control Protocol")

# Step 1: Identify and exclude problematic cards
# (reuses `issue_ids` computed in Workflow 1, Step 3)
initial_count = len(cards)
qc_filtered = cards[~cards['id'].isin(issue_ids)]
excluded_count = initial_count - len(qc_filtered)

print(f"\nStep 1: Quality Control Filtering")
print(f"  Initial dataset: {initial_count} cards")
print(f"  Excluded problematic: {excluded_count} cards")
print(f"  QC-approved dataset: {len(qc_filtered)} cards")
print(f"  Data quality: {len(qc_filtered)/initial_count*100:.1f}%")

# Step 2: Validate data completeness
required_fields = ['sample_name', 'quantity', 'processed_file_location']
complete_records = qc_filtered.dropna(subset=required_fields)

print(f"\nStep 2: Data Completeness Validation")
print(f"  Complete records: {len(complete_records)} cards")
print(f"  Completeness rate: {len(complete_records)/len(qc_filtered)*100:.1f}%")

# Step 3: Drug authenticity check (sample)
sample_for_validation = complete_records.head(3)
if len(sample_for_validation) > 0:
    print(f"\nStep 3: Drug Authenticity Validation (Sample: {len(sample_for_validation)} cards)")
    validation_results = pad.apply_predictions_to_dataframe(sample_for_validation, model_id=16)
    
    pass_count = 0
    for _, row in validation_results.iterrows():
        pred_drug = str(row['prediction']).split('(')[0].strip()
        # For model 16 the ground-truth column is 'label' (as in section 6.4)
        is_authentic = pred_drug.lower() == row['label'].lower()
        status = 'AUTHENTIC' if is_authentic else 'SUSPICIOUS'
        if is_authentic:
            pass_count += 1
        print(f"  Card {row['id']}: {status} ({pred_drug} vs {row['label']})")
    
    print(f"\nQuality Control Summary:")
    print(f"  Authenticity rate: {pass_count/len(validation_results)*100:.1f}%")
    print(f"  Recommended action: {'APPROVE BATCH' if pass_count == len(validation_results) else 'INVESTIGATE SUSPICIOUS SAMPLES'}")

Workflow 4: Pharmaceutical Quality Control Protocol

Step 1: Quality Control Filtering
  Initial dataset: 9706 cards
  Excluded problematic: 0 cards
  QC-approved dataset: 9706 cards
  Data quality: 100.0%

Step 2: Data Completeness Validation
  Complete records: 9706 cards
  Completeness rate: 100.0%

Step 3: Drug Authenticity Validation (Sample: 3 cards)
Starting optimized batch prediction for 3 cards with model 16
Model type: tf_lite
Using optimized batch processing for Neural Network (batch_size=32)
Processing batch 1/1 (3 images)
Error in NN prediction for https://pad.crc.nd.edu//var/www/html/joomla/images/padimages/email/processed/50332_25586.processed.png: 404 Client Error: NOT FOUND for url: https://pad.crc.nd.edu//var/www/html/joomla/images/padimages/email/processed/50332_25586.processed.png
Error in NN prediction for https://pad.crc.nd.edu//var/www/html/joomla/images/padimages/email/processed/50332_17318.processed.png: 404 Client Error: NOT FOUND for url: https://pad.crc.nd.e

ZeroDivisionError: division by zero
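
The traceback is not shown in full. One plausible reading is that the 404 errors left no usable prediction rows, so the authenticity-rate division had a zero denominator. A minimal sketch of a summary step that tolerates an empty result set follows. The name `authenticity_summary` and the wording of the empty-case action are assumptions; rows are plain dicts with `label` and `prediction` keys, such as those from `validation_results.to_dict("records")`.

```python
def authenticity_summary(rows):
    """Summarise label/prediction agreement; tolerate an empty result set."""
    total = len(rows)
    if total == 0:
        return {"total": 0, "authentic": 0, "rate": None,
                "action": "NO PREDICTIONS: CHECK IMAGE AVAILABILITY"}
    authentic = sum(
        str(r["prediction"]).strip().lower() == str(r["label"]).strip().lower()
        for r in rows
    )
    action = "APPROVE BATCH" if authentic == total else "INVESTIGATE SUSPICIOUS SAMPLES"
    return {"total": total, "authentic": authentic,
            "rate": authentic / total * 100, "action": action}


# Made-up rows for illustration only
rows = [
    {"label": "amoxicillin", "prediction": "amoxicillin"},
    {"label": "albendazole", "prediction": "azithromycin"},
]
print(authenticity_summary(rows))
print(authenticity_summary([]))
```

Returning `None` for the rate in the empty case keeps "no data" distinct from a genuine 0% authenticity rate.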

---

# Best Practices for PAD Research

## Understanding ID Relationships

**Critical for pharmaceutical analysis:**

- **Use `card_id`** for analyzing specific PAD image captures
- **Use `sample_id`** to find all images of the same physical PAD card
- **Group by `sample_id`** to analyze consistency across imaging conditions
- **Account for variations** in lighting and camera devices when interpreting results

```python
# Example: Analyze consistency for one physical card
physical_card_images = pad.get_card(sample_id=12345)
# This returns all digital images of the same physical PAD
```
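
Grouping by `sample_id` can be sketched as follows: for each physical card, find the most common prediction and the share of images that agree with it. The function name `agreement_by_sample` and the row layout (dicts with `sample_id` and `prediction` keys) are assumptions for this example, not a PAD Analytics interface.

```python
from collections import Counter, defaultdict


def agreement_by_sample(predictions):
    """Map each sample_id to (most common prediction, share of images agreeing)."""
    grouped = defaultdict(list)
    for row in predictions:
        grouped[row["sample_id"]].append(row["prediction"])
    summary = {}
    for sample_id, labels in grouped.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        summary[sample_id] = (top_label, top_count / len(labels))
    return summary


# Made-up predictions for illustration only
predictions = [
    {"sample_id": 12345, "prediction": "albendazole"},
    {"sample_id": 12345, "prediction": "albendazole"},
    {"sample_id": 12345, "prediction": "azithromycin"},
    {"sample_id": 67890, "prediction": "amoxicillin"},
]
for sample_id, (label, share) in agreement_by_sample(predictions).items():
    print(f"Sample {sample_id}: {label} ({share:.0%} of images agree)")
```

A share below 100% flags physical cards whose classification depends on the imaging conditions.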

## Performance Optimization

**For large-scale pharmaceutical studies:**

- **Use batch processing**: `apply_predictions_to_dataframe()` 
- **Consider caching**: Enable offline analysis for field research (upcoming feature)
- **Optimize batch size**: Balance memory usage with processing speed
- **GPU acceleration**: TensorFlow 2.14.0 supports GPU for faster neural network inference

```python
# Optimized batch processing
results = pad.apply_predictions_to_dataframe(large_dataset, model_id=16, batch_size=32)
```
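
The effect of `batch_size` on the number of batches is simple arithmetic. The sketch below reproduces the batch sizes seen in the section 6.4 log (200 cards, `batch_size=64`); `batch_plan` is a hypothetical helper for illustration and does not call the library.

```python
import math


def batch_plan(n_cards, batch_size):
    """Return the size of each batch when n_cards are split into batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_batches = math.ceil(n_cards / batch_size)
    return [min(batch_size, n_cards - i * batch_size) for i in range(n_batches)]


print(batch_plan(200, 64))  # [64, 64, 64, 8], matching the log in section 6.4
print(batch_plan(3, 32))    # [3]
```

Larger batches mean fewer passes but more images held in memory at once.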

## Model Selection Guide

**Choose the right model for your pharmaceutical analysis:**

- **Model 16**: FHI360 Neural Network classifier (returns drug name, confidence, energy)
    - Use for: FHI360 drug authentication. This 23-drug classifier for antibiotics and antimalarials was trained on 12-lane PAD cards at concentrations of 20%, 50%, 80%, and 100%.
    - Output: `('albendazole', 0.95, 12.3)`

- **Model 18**: Partial least squares (PLS) regression on the concentration (returns predicted concentration)
    - Use for: FHI360 potency analysis. It gives a quantitative concentration prediction for authenticated drugs and is calibrated for the 20-100% concentration range.
    - Output: `75.2` (representing 75.2% concentration)

- **Model 20**: ChemoPAD Neural Network classifier (returns drug name, confidence, energy)
    - Use for: ChemoPAD-specific drug identification. It is specialized for concentrations of 100%, 66%, 33%, and 0%.
    - Output: `('hydroxyurea', 0.9996, 9.586)`

**Selection criteria:**

- **Identification objective** → Use classification models (16, 20)
- **Quantification objective** → Use concentration models (18)
- **Regulatory compliance** → Document model validation data with `get_model_data()`
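
Because classifiers and the concentration model return differently shaped predictions, downstream code has to branch on the output. A minimal sketch follows, assuming the two shapes listed above: a `(drug, confidence, energy)` tuple or a single number. `describe_prediction` is a hypothetical helper, not a library function.

```python
def describe_prediction(pred):
    """Format a prediction from either model family as a short string."""
    if isinstance(pred, (tuple, list)) and len(pred) == 3:
        drug, confidence, energy = pred
        return f"{drug} (confidence {confidence:.2%}, energy {energy:.2f})"
    # Otherwise assume a single number: predicted concentration in percent
    return f"{float(pred):.1f}% concentration"


# Example outputs taken from the model list above
print(describe_prediction(("albendazole", 0.95, 12.3)))
print(describe_prediction(75.2))
```

Checking the shape of the output instead of the model ID keeps the helper usable if further models of either type are added.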


---

# Resources

- **GitHub**: https://github.com/PaperAnalyticalDeviceND/pad-analytics
- **PyPI**: https://pypi.org/project/pad-analytics/
- **PAD Project**: https://padproject.nd.edu
- **Documentation**: https://pad.crc.nd.edu/docs
- **API Reference**: https://pad.crc.nd.edu/api/v2

---

*This notebook demonstrates the complete PAD Analytics workflow for pharmaceutical quality testing. For specific research questions or implementation support, consult the PAD project documentation and research team.*