# Complete PAD Analytics Function Demonstration

This notebook demonstrates the main functions of the PAD Analytics package (v0.2.1), organized by category.

## Table of Contents
1. [Setup and Installation](#setup)
2. [Dataset Management](#dataset-management)
3. [Card/Sample Management](#card-management)
4. [Project Management](#project-management)
5. [Model Management](#model-management)
6. [Visualization Functions](#visualization)
7. [Prediction & Analysis](#prediction)
8. [Caching & Performance](#caching)
9. [Utility Functions](#utility)

## 1. Setup and Installation <a name="setup"></a>

In [None]:
# Import the package
import pad_analytics as pad
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Check version
print(f"PAD Analytics version: {pad.__version__}")

## 2. Dataset Management <a name="dataset-management"></a>

Dataset functions help you discover, load, and manage PAD datasets used for training and testing ML models.

### 2.1 get_datasets() - List all available datasets

In [None]:
# Get list of all available datasets
datasets = pad.get_datasets()
print(f"Total datasets available: {len(datasets)}")
print("\nFirst 5 datasets:")
datasets.head()

### 2.2 get_dataset_list() - Get detailed dataset information

In [None]:
# Get comprehensive dataset information including model associations
dataset_list = pad.get_dataset_list()
print("Dataset information with model mappings:")
dataset_list.head()

### 2.3 get_dataset() - Load a complete dataset

In [None]:
# Load a specific dataset (combines training and test data)
dataset_name = "FHI2020_Stratified_Sampling"
fhi_data = pad.get_dataset(dataset_name)

print(f"Dataset '{dataset_name}' loaded:")
print(f"Total samples: {len(fhi_data)}")
print(f"Columns: {list(fhi_data.columns)[:10]}...")  # Show first 10 columns
is_train = fhi_data['is_train'].astype(bool)  # works whether the flag is stored as bool or 0/1
print(f"\nTraining samples: {is_train.sum()}")
print(f"Test samples: {(~is_train).sum()}")

### 2.4 get_dataset_cards() - Get clean dataset view

In [None]:
# Get dataset cards without internal columns like 'is_train'
cards = pad.get_dataset_cards("FHI2020_Stratified_Sampling")
print("Dataset cards (clean view):")
print(f"Total cards: {len(cards)}")
print("\nNote: the internal 'is_train' column is removed in this view")
cards[['card_id', 'sample_id', 'sample_name', 'quantity']].head()

### 2.5 get_dataset_info() - Get dataset metadata

In [None]:
# Get detailed information about a dataset
dataset_info = pad.get_dataset_info("FHI2020_Stratified_Sampling")
print("Dataset information:")
for key, value in dataset_info.items():
    print(f"{key}: {value}")

### 2.6 Model-Dataset Mapping Functions

In [None]:
# Get mapping between models and datasets
model_mapping = pad.get_model_dataset_mapping()
print("Model to Dataset Mapping:")
model_mapping.head()

In [None]:
# Get dataset name from model ID
model_id = 16
dataset_name = pad.get_dataset_name_from_model_id(model_id)
print(f"Model {model_id} was trained on dataset: '{dataset_name}'")

In [None]:
# Load dataset associated with a specific model
model_dataset = pad.get_dataset_from_model_id(model_id)
print(f"Dataset for model {model_id}:")
print(f"Total samples: {len(model_dataset)}")
print(f"Unique drugs: {model_dataset['sample_name'].nunique()}")

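Conceptually, resolving a model ID to its training dataset is a table lookup. The sketch below shows that idea with a hypothetical in-memory mapping (`MODEL_TO_DATASET`, placeholder entries only) and fails with an informative error for unknown IDs. It is an illustration, not how `pad.get_dataset_name_from_model_id()` is necessarily implemented.

```python
# Hypothetical mapping with placeholder names, for illustration only.
# Real model/dataset pairs come from pad.get_model_dataset_mapping().
MODEL_TO_DATASET = {101: "dataset_a", 102: "dataset_b"}


def dataset_name_for_model(model_id: int) -> str:
    """Return the training dataset name recorded for model_id, or raise KeyError."""
    try:
        return MODEL_TO_DATASET[model_id]
    except KeyError:
        known = sorted(MODEL_TO_DATASET)
        raise KeyError(
            f"No dataset recorded for model {model_id}; known models: {known}"
        ) from None


print(dataset_name_for_model(101))  # dataset_a
```

With real data, adapt the lookup to the columns returned by `pad.get_model_dataset_mapping()`, which this notebook does not list.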
## 3. Card/Sample Management <a name="card-management"></a>

Card functions help you retrieve and manage individual PAD test cards. Each card belongs to a sample, and several cards can share the same `sample_id`.

### 3.1 get_card() - Flexible card retrieval

In [None]:
# Get card by card_id
card = pad.get_card(card_id=47918)
print("Card retrieved by card_id:")
print(f"Sample: {card['sample_name'].iloc[0]}")
print(f"Quantity: {card['quantity'].iloc[0]}%")
print(f"Project: {card['project.project_name'].iloc[0]}")

In [None]:
# Get card(s) by sample_id - Note: multiple cards can share same sample_id
cards_by_sample = pad.get_card(sample_id=52677)
print(f"Cards retrieved by sample_id: {len(cards_by_sample)} card(s)")
if len(cards_by_sample) > 0:
    print(f"Card IDs: {list(cards_by_sample['card_id'])}")

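Because several cards can share one `sample_id`, it often helps to index card IDs by sample. A minimal sketch on plain dicts (the records are made up):

```python
from collections import defaultdict

# Made-up card records; real ones come from pad.get_card() or a dataset DataFrame.
example_records = [
    {"card_id": 1, "sample_id": 500},
    {"card_id": 2, "sample_id": 500},
    {"card_id": 3, "sample_id": 501},
]


def card_ids_by_sample(records):
    """Map each sample_id to the list of card_ids that share it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["sample_id"]].append(rec["card_id"])
    return dict(groups)


print(card_ids_by_sample(example_records))  # {500: [1, 2], 501: [3]}
```

On a pandas DataFrame such as `cards`, `cards.groupby('sample_id')['card_id'].apply(list)` produces the same grouping.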
### 3.2 get_card_by_id() and get_card_by_sample_id()

In [None]:
# Direct card retrieval by ID
card = pad.get_card_by_id(47918)
print(f"Card {card['card_id'].iloc[0]}: {card['sample_name'].iloc[0]} at {card['quantity'].iloc[0]}%")

In [None]:
# Get all cards for a sample
sample_cards = pad.get_card_by_sample_id(52677)
print(f"Found {len(sample_cards)} cards for sample_id 52677")
sample_cards[['card_id', 'sample_id', 'sample_name', 'quantity']].head()

### 3.3 get_card_issues() - Quality control

In [None]:
# Get cards with known quality issues
problematic_cards = pad.get_card_issues()
print(f"Cards with known issues: {len(problematic_cards)}")
print("\nThese cards should typically be excluded from analysis")
if len(problematic_cards) > 0:
    print(f"Example problematic card IDs: {list(problematic_cards['card_id'].head())}")

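Excluding known-problematic cards is a membership filter. Converting the excluded IDs to a set keeps each lookup O(1) on average. A minimal sketch with made-up card IDs:

```python
def exclude_cards(records, bad_card_ids):
    """Return the records whose card_id is not in bad_card_ids."""
    bad = set(bad_card_ids)  # set membership is O(1) on average
    return [rec for rec in records if rec["card_id"] not in bad]


example_records = [{"card_id": i} for i in (10, 11, 12, 13)]
kept = exclude_cards(example_records, [11, 13])
print([rec["card_id"] for rec in kept])  # [10, 12]
```

On DataFrames, the equivalent is `df[~df['card_id'].isin(problematic_cards['card_id'])]`.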
## 4. Project Management <a name="project-management"></a>

Project functions help you explore and retrieve data organized by research projects.

### 4.1 get_projects() - List all projects

In [None]:
# Get all available projects
projects = pad.get_projects()
print(f"Total projects available: {len(projects)}")
print("\nSample projects:")
projects[['project_id', 'project_name', 'description']].head()

### 4.2 get_project() - Flexible project retrieval

In [None]:
# Get project by ID
project = pad.get_project(id=1)
if project is not None:
    print(f"Project by ID: {project.get('project_name', 'N/A')}")

# Get project by name
project = pad.get_project(name="FHI Study 2020")
if project is not None:
    print(f"Project by name: ID={project.get('project_id', 'N/A')}")

### 4.3 get_project_cards() - Get all cards from project(s)

In [None]:
# Get cards from a specific project by name
project_cards = pad.get_project_cards(project_name="FHI Study 2020")
print(f"Cards in 'FHI Study 2020': {len(project_cards) if project_cards is not None else 0}")

# Get cards from multiple projects by IDs
multi_project_cards = pad.get_project_cards(project_ids=[1, 2, 3])
print(f"\nCards from projects 1, 2, 3: {len(multi_project_cards) if multi_project_cards is not None else 0}")

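The `is not None` checks above suggest that a project lookup can come back as `None`. If you combine results from several projects yourself, treating `None` as "no cards" keeps the loop simple. In this sketch, `fetch` is a hypothetical per-project callable and `fake_db` stands in for the API:

```python
from typing import Callable, Optional


def collect_cards(project_ids, fetch: Callable[[int], Optional[list]]):
    """Concatenate card lists from several projects, treating None as 'no cards'."""
    combined = []
    for pid in project_ids:
        batch = fetch(pid)
        if batch:  # skips both None and empty lists
            combined.extend(batch)
    return combined


# Hypothetical stand-in for per-project API results.
fake_db = {1: [{"card_id": 1}], 2: None, 3: [{"card_id": 7}, {"card_id": 8}]}
all_cards = collect_cards([1, 2, 3], fake_db.get)
print(len(all_cards))  # 3
```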
## 5. Model Management <a name="model-management"></a>

Model functions help you discover and retrieve information about available ML models.

### 5.1 get_models() - List all available models

In [None]:
# Get all available models
models = pad.get_models()
print(f"Total models available: {len(models)}")
print("\nKey models:")
print("- Model 16: Neural Network for drug classification")
print("- Model 17: Neural Network for concentration")
print("- Model 18: PLS for concentration")
print("- Model 19: Neural Network for concentration v2")
models[['model_id', 'model_name', 'model_type']].head()

### 5.2 get_model_by_id() - Get specific model information

In [None]:
# Get information about a specific model
model_16 = pad.get_model_by_id(16)
print("Model 16 (Drug Classification):")
for key, value in model_16.items():
    print(f"  {key}: {value}")

### 5.3 get_model_data() - Get model with associated data

In [None]:
# Get the data associated with a model; `type` selects what is returned:
# 'train' or 'test' return that split, 'all' returns model info plus training/test data
model_data = pad.get_model_data(16, type='train')
print(f"Model 16 training data: {len(model_data) if isinstance(model_data, pd.DataFrame) else 'N/A'} samples")

model_data = pad.get_model_data(16, type='test')
print(f"Model 16 test data: {len(model_data) if isinstance(model_data, pd.DataFrame) else 'N/A'} samples")

## 6. Visualization Functions <a name="visualization"></a>

Visualization functions create interactive displays of PAD cards and predictions.

### 6.1 show_card() - Display single card

In [None]:
# Display a single card with image and metadata
print("Displaying card 47918:")
pad.show_card(card_id=47918)

In [None]:
# Display card by sample_id (may show multiple if sample has multiple cards)
print("Displaying cards for sample 52677:")
pad.show_card(sample_id=52677)

### 6.2 show_cards() - Display multiple cards

In [None]:
# Display multiple cards by card IDs
print("Displaying multiple cards:")
pad.show_cards(card_ids=[47918, 47919, 47920])

In [None]:
# Display multiple cards by sample IDs
print("Displaying cards for multiple samples:")
pad.show_cards(sample_ids=[52677, 52678])

### 6.3 show_cards_from_df() - Display cards from DataFrame

In [None]:
# Select up to three rifampicin cards from the dataset loaded in section 2.4 and display them
rifampicin_cards = cards[cards['sample_name'].str.contains('rifampicin', case=False, na=False)].head(3)
if len(rifampicin_cards) > 0:
    print("Displaying rifampicin cards:")
    pad.show_cards_from_df(rifampicin_cards)
else:
    print("No rifampicin cards found in this dataset")

### 6.4 show_grouped_cards() - Display cards organized by groups

In [None]:
# Group ceftriaxone cards by concentration and display each group in its own tab
ceftriaxone_cards = cards[cards['sample_name'].str.contains('ceftriaxone', case=False, na=False)].head(15)
if len(ceftriaxone_cards) > 0:
    print("Displaying ceftriaxone cards grouped by concentration:")
    pad.show_grouped_cards(ceftriaxone_cards, group_column='quantity', images_per_row=3)
else:
    print("No ceftriaxone cards found in this dataset")

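The grouped display above can be thought of as two steps: group the cards by `group_column`, then split each group into rows of `images_per_row`. The sketch below shows that logic on plain dicts; `layout_groups` is a hypothetical helper, not the package's rendering code, and group values must be sortable.

```python
from collections import defaultdict


def layout_groups(records, group_column, images_per_row=3):
    """Group records by group_column, then split each group into rows.

    Returns {group_value: [row, row, ...]} with groups in sorted order,
    where each row holds at most images_per_row records.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_column]].append(rec)
    return {
        key: [items[i:i + images_per_row] for i in range(0, len(items), images_per_row)]
        for key, items in sorted(groups.items())
    }


example_records = [{"card_id": i, "quantity": q} for i, q in enumerate([50, 100, 50, 50, 50, 100])]
layout = layout_groups(example_records, "quantity", images_per_row=3)
print({k: [[r["card_id"] for r in row] for row in rows] for k, rows in layout.items()})
# {50: [[0, 2, 3], [4]], 100: [[1, 5]]}
```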
### 6.5 show_prediction() - Display card with prediction results

In [None]:
# Show prediction results for a card
print("Card with drug classification prediction (Model 16):")
pad.show_prediction(card_id=47918, model_id=16)

In [None]:
# Show concentration prediction
print("\nCard with concentration prediction (Model 18):")
pad.show_prediction(card_id=47918, model_id=18)

## 7. Prediction & Analysis <a name="prediction"></a>

Prediction functions apply ML models to PAD images for drug identification and quantification.

### 7.1 predict() - Single card prediction

In [None]:
# Drug classification prediction (Model 16)
actual, prediction = pad.predict(card_id=47918, model_id=16)
drug_name, confidence, energy = prediction
print("Drug Classification (Model 16):")
print(f"  Actual: {actual}")
print(f"  Predicted: {drug_name} (confidence: {confidence:.2%})")
print(f"  Energy: {energy:.2f}")

In [None]:
# Concentration prediction (Model 18 - PLS)
actual, predicted = pad.predict(card_id=47918, model_id=18)
print("\nConcentration Prediction (Model 18 - PLS):")
print(f"  Actual: {actual:.2f}%")
print(f"  Predicted: {predicted:.2f}%")
print(f"  Absolute error: {abs(actual - predicted):.2f} percentage points")

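A single-card error says little about a concentration model. Across several cards, mean absolute error (MAE) and root-mean-square error (RMSE) are the usual summaries; RMSE weights large misses more heavily. The values below are made up purely to show the arithmetic.

```python
import math


def error_summary(actuals, preds):
    """Return (MAE, RMSE) for paired actual/predicted concentrations."""
    if len(actuals) != len(preds) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [a - p for a, p in zip(actuals, preds)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Made-up example values in percent, not real model output.
mae, rmse = error_summary([100.0, 50.0, 80.0], [90.0, 55.0, 80.0])
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")  # MAE=5.00, RMSE=6.45
```

Here the differences are 10, -5 and 0, so MAE = 15 / 3 = 5 and RMSE = sqrt(125 / 3), about 6.45.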
### 7.2 predict_url() - Predict from image URL

In [None]:
# Get image URL from a card
card = pad.get_card(card_id=47918)
image_url = card['processed_file_location'].iloc[0]

# Make a prediction directly from the URL; Model 16 classifies drugs,
# so the actual value passed in is the drug name
actual_drug = card['sample_name'].iloc[0]
_, prediction = pad.predict_url(image_url, model_id=16, actual=actual_drug)
print("Prediction from URL:")
print(f"  Image: {image_url.split('/')[-1]}")
print(f"  Predicted drug: {prediction[0]}")

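The cell above takes the filename with `image_url.split('/')[-1]`, which keeps any query string or fragment attached. If your image URLs can carry them, parsing the URL first is safer. A small standard-library sketch (the example URLs are hypothetical):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def image_filename(url: str) -> str:
    """Return the last path segment of url, ignoring any query string or fragment."""
    return PurePosixPath(urlparse(url).path).name


print(image_filename("https://example.org/images/card_123.png?token=abc"))  # card_123.png
```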
### 7.3 apply_predictions_to_dataframe() - Batch predictions (Optimized in v0.2.1)

In [None]:
# Apply predictions to every row of a DataFrame (batch processing was optimized in v0.2.1)
# Use a 10-card subset to keep the demonstration fast
small_dataset = cards.head(10)

print("Applying batch predictions (optimized in v0.2.1):")
print(f"Processing {len(small_dataset)} cards...")

# Apply predictions with Model 16 (classification)
results = pad.apply_predictions_to_dataframe(small_dataset, model_id=16)

print(f"\nResults shape: {results.shape}")
print("\nPrediction results:")
results[['card_id', 'sample_name', 'quantity', 'actual', 'prediction']].head()

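This notebook does not show how the v0.2.1 batch optimization works internally. One common technique for I/O-bound work such as downloading card images is a thread pool; the sketch below applies it with a hypothetical `predict_one` callable and is not the package's implementation. `ThreadPoolExecutor.map` returns results in input order, so predictions stay aligned with their rows.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_batch(items, predict_one, max_workers=4):
    """Apply predict_one to each item concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict_one, items))


# Hypothetical stand-in for a per-card prediction call.
def fake_predict(card_id):
    return card_id * 2


print(predict_batch([3, 1, 2], fake_predict))  # [6, 2, 4]
```

Threads help when each call mostly waits on the network; CPU-bound feature extraction would benefit more from processes.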
In [None]:
# Batch predictions with concentration model
print("Batch concentration predictions:")
conc_results = pad.apply_predictions_to_dataframe(small_dataset, model_id=18)
conc_results[['card_id', 'sample_name', 'actual', 'prediction']].head()

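To summarize a batch of classification results, top-1 accuracy compares each actual drug with the predicted one. This sketch assumes each prediction is either a drug name or a `(drug_name, confidence, energy)` tuple, as `pad.predict()` returns for Model 16 in section 7.1; the exact format of the `prediction` column in batch results is not shown here, so check it before reusing this. The labels below are made up.

```python
def top1_accuracy(actuals, predictions):
    """Fraction of cards whose predicted drug matches the actual drug (case-insensitive)."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")

    def label(p):
        # Accept either a bare label or a (label, confidence, energy) tuple.
        return p[0] if isinstance(p, tuple) else p

    hits = sum(
        str(a).strip().lower() == str(label(p)).strip().lower()
        for a, p in zip(actuals, predictions)
    )
    return hits / len(actuals)


example_actuals = ["Rifampicin", "Ceftriaxone", "Amoxicillin"]
example_preds = [("rifampicin", 0.91, 3.2), "Ceftriaxone", ("Ampicillin", 0.55, 7.8)]
print(f"{top1_accuracy(example_actuals, example_preds):.2%}")  # 66.67%
```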
## 8. Caching & Performance (In Development) <a name="caching"></a>

Caching functions improve performance by storing downloaded images and predictions locally.

In [None]:
# Note: Caching features are in development (PR #13)
# Once merged, they will be available as:

print("Caching features (coming soon):")
print("- CacheManager: Manages local image cache")
print("- CachedDataset: Downloads and caches dataset images")
print("- predict_with_cache(): Uses cached images for faster predictions")
print("- Cache location: ~/.pad_cache/")
print("\nThese features are expected to enable offline analysis and 50-80% faster predictions.")

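Until the caching features land, the core idea of an image cache can be sketched in a few lines: key each downloaded file by a hash of its URL and fetch only on a miss. `SimpleImageCache` and `fake_fetch` are hypothetical names; this is not the API of the upcoming `CacheManager`, and the demo writes to a temporary directory rather than `~/.pad_cache/`.

```python
import hashlib
import tempfile
from pathlib import Path


class SimpleImageCache:
    """Minimal URL-keyed file cache (a hypothetical sketch, not pad's CacheManager)."""

    def __init__(self, root):
        self.root = Path(root).expanduser()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length filename.
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str, fetch):
        """Return cached bytes for url, calling fetch(url) only on a cache miss."""
        path = self._path(url)
        if path.exists():
            return path.read_bytes()
        data = fetch(url)
        path.write_bytes(data)
        return data


calls = []


def fake_fetch(url):
    # Hypothetical downloader that records each network call it would make.
    calls.append(url)
    return b"image-bytes"


with tempfile.TemporaryDirectory() as tmp:
    cache = SimpleImageCache(tmp)
    cache.get("https://example.org/card.png", fake_fetch)
    cache.get("https://example.org/card.png", fake_fetch)
    print(len(calls))  # 1: the second request is served from disk
```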
## 9. Utility Functions <a name="utility"></a>

Utility functions provide lower-level access and configuration options.

### 9.1 Configuration Functions

In [None]:
# Get current base URL
print(f"Current API base URL: {pad.BASE_URL}")

# You can change the base URL if needed (e.g., for testing)
# pad.set_base_url("https://test.pad.crc.nd.edu/api/v2")

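If you build request URLs from `pad.BASE_URL` yourself, note that `urllib.parse.urljoin` replaces the last path segment of a base URL that lacks a trailing slash. The helper below avoids that pitfall; `endpoint` is a hypothetical helper and `cards` is a placeholder path used only for illustration.

```python
from urllib.parse import urljoin


def endpoint(base_url: str, path: str) -> str:
    """Join base_url and a relative path without dropping the last base segment."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


base = "https://test.pad.crc.nd.edu/api/v2"
print(urljoin(base, "cards"))   # https://test.pad.crc.nd.edu/api/cards (v2 is lost)
print(endpoint(base, "cards"))  # https://test.pad.crc.nd.edu/api/v2/cards
```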
### 9.2 Image Processing Functions

In [None]:
# These functions are typically used internally but can be accessed directly
print("Available image processing functions:")
print("- get_rgb_from_pad_image(): Extract RGB values from PAD regions")
print("- extract_features_for_single_file(): Extract features for ML models")
print("\nThese are used internally by predict() functions")

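This notebook does not document how `get_rgb_from_pad_image()` locates PAD regions or aggregates their pixels. As a hedged illustration of the general idea only, averaging color over a rectangular region looks like this; the region coordinates, the list-of-rows image format, and `mean_rgb` itself are assumptions, not the package's code.

```python
def mean_rgb(pixels, top, left, height, width):
    """Average (R, G, B) over a rectangular region of a row-major pixel grid.

    pixels is a list of rows, each row a list of (r, g, b) tuples.
    """
    region = [px for row in pixels[top:top + height] for px in row[left:left + width]]
    if not region:
        raise ValueError("empty region")
    n = len(region)
    return tuple(sum(px[c] for px in region) / n for c in range(3))


# A tiny 2x3 synthetic image with made-up values: red on the left, blue on the right.
image = [
    [(255, 0, 0), (255, 0, 0), (0, 0, 255)],
    [(255, 0, 0), (255, 0, 0), (0, 0, 255)],
]
print(mean_rgb(image, top=0, left=0, height=2, width=2))  # (255.0, 0.0, 0.0)
print(mean_rgb(image, top=0, left=1, height=2, width=2))  # (127.5, 0.0, 127.5)
```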
### 9.3 Dataset Manager Access

In [None]:
# Get direct access to the dataset manager
dm = pad.get_dataset_manager()
print(f"DatasetManager instance: {type(dm)}")
print("\nThis provides advanced dataset operations for power users")

## Summary

This notebook demonstrated all major functions in PAD Analytics v0.2.1:

### Key Capabilities:
1. **Dataset Management** - Load and explore ML training/test datasets
2. **Card Management** - Retrieve individual PAD test results
3. **Project Management** - Organize data by research projects
4. **Model Management** - Access trained ML models
5. **Visualization** - Interactive displays of PAD images and results
6. **Prediction** - Apply ML models for drug identification and quantification
7. **Batch Processing** - Optimized parallel processing (50-80% faster in v0.2.1)
8. **Caching** - Local storage for offline analysis (coming soon)

### Common Workflows:
```python
import pad_analytics as pad

# 1. Load a dataset and make predictions
dataset = pad.get_dataset("FHI2020_Stratified_Sampling")
results = pad.apply_predictions_to_dataframe(dataset, model_id=16)

# 2. Analyze a specific drug (case-insensitive match; rows with missing names are skipped)
rifampicin = dataset[dataset['sample_name'].str.contains('rifampicin', case=False, na=False)]
pad.show_grouped_cards(rifampicin, group_column='quantity')

# 3. Quality control: drop cards with known issues
issues = pad.get_card_issues()
clean_data = dataset[~dataset['card_id'].isin(issues['card_id'])]
```

For more information, visit: https://github.com/PaperAnalyticalDeviceND/pad-analytics