<a href="https://colab.research.google.com/github/ikhlas15/ATHENS-AI-Medical-Imaging/blob/main/H14_project_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Notebook 14: Final Project Template for Medical Imaging AI**

### **Course**: Artificial Intelligence in Medical Imaging: From Fundamentals to Applications

***

## **1. Introduction**

Congratulations on reaching the final hands-on session of the course! You have learned all the core components of building a medical imaging AI model, from data loading to explainability. This notebook serves as a **capstone project template**. It brings together all the best practices we've discussed into a single, organized, and reusable structure.

The goal of this template is to provide you with a solid foundation for your own medical AI projects. A well-structured project is easier to develop, debug, reproduce, and share with collaborators.

#### **What this template provides:**
*   A logical **project structure** for organizing your code, data, and results.
*   Reusable, modular code for:
    *   Data loading and preprocessing (`Dataset`, `DataLoader`).
    *   Model definition.
    *   Training and validation loops.
    *   Saving and loading model checkpoints.
    *   Evaluation and visualization.
*   A complete, end-to-end workflow that you can adapt to your specific problem, whether it's classification or segmentation.

***

## **2. The Anatomy of a Medical AI Project**

A professional AI project is more than just a single script. It's a well-organized collection of files and folders, each with a specific purpose. This organization makes your work scalable and reproducible.

Here is a standard project layout. In Google Colab, you can create these directories in `/content/` or in your mounted Google Drive.

```
/my_medical_ai_project/
|
|--- 📂 data/
|    |--- 📂 train/
|    |    |--- 📂 images/
|    |    |--- 📂 masks/
|    |--- 📂 val/
|    |--- 📂 test/
|
|--- 📂 src/
|    |--- 📄 datasets.py       # Custom Dataset classes
|    |--- 📄 models.py         # Model architectures (e.g., UNet, ResNet)
|    |--- 📄 engine.py         # Training and validation loops
|    |--- 📄 utils.py          # Helper functions (e.g., seeding, plotting)
|    |--- 📄 config.py         # Configuration settings (hyperparameters)
|
|--- 📂 notebooks/
|    |--- 📄 01_data_exploration.ipynb
|    |--- 📄 02_model_training.ipynb
|
|--- 📜 train.py              # Main script to run training
|--- 📜 evaluate.py           # Script to run evaluation on a saved model
|
|--- 📂 outputs/
|    |--- 📂 checkpoints/     # Saved model weights
|    |--- 📂 logs/            # TensorBoard logs or metrics
|
`--- 📜 requirements.txt      # Project dependencies
```
For this template notebook, we will define all the necessary components in one place for simplicity. In a real project, you would split them into the separate `.py` files shown above.
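
If you want to create this layout programmatically, the following is a minimal sketch using only the standard library. The helper name `create_project_skeleton` and the `PROJECT_DIRS` list are illustrative assumptions, not part of the course code; the demo writes into a temporary directory so it leaves nothing behind.

```python
from pathlib import Path
import tempfile

# Sub-directories from the layout above, relative to the project root.
PROJECT_DIRS = [
    "data/train/images",
    "data/train/masks",
    "data/val",
    "data/test",
    "src",
    "notebooks",
    "outputs/checkpoints",
    "outputs/logs",
]


def create_project_skeleton(root):
    """Create the folder layout under `root` and return the created paths."""
    root = Path(root)
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created


# Demo in a temporary directory. In Colab you might instead pass
# "/content/my_medical_ai_project" or a folder on your mounted Drive.
with tempfile.TemporaryDirectory() as tmp:
    paths = create_project_skeleton(Path(tmp) / "my_medical_ai_project")
    all_exist = all(p.is_dir() for p in paths)
    print(f"Created {len(paths)} directories, all present: {all_exist}")
```

Because `mkdir` is called with `exist_ok=True`, running the helper again on an existing project does not raise an error or overwrite files.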

***

## **3. Step 1: Configuration**

A best practice is to manage all your hyperparameters and settings in one place. This makes experiments easy to track and modify.


In [None]:
# --- Configuration Settings ---
import torch


class Config:
    # Data settings
    DATA_DIR = "/content/data/pneumonia/"
    IMAGE_SIZE = 224

    # Model settings
    MODEL_NAME = "resnet18"
    NUM_CLASSES = 2
    PRETRAINED = True

    # Training settings
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
    BATCH_SIZE = 32
    LEARNING_RATE = 1e-4
    NUM_EPOCHS = 4

    # Paths
    CHECKPOINT_PATH = "/content/outputs/checkpoints/best_model.pth"

# Instantiate the config
config = Config()
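
One way to make experiments easier to track is to store the settings next to the results. The sketch below is an assumption about how you might do that, not part of the template: `DemoConfig` is a hypothetical stand-in for the `Config` class, and `config_to_dict` is an illustrative helper that collects every upper-case attribute into a dictionary that can be written out as JSON.

```python
import json


class DemoConfig:
    """Hypothetical stand-in for the notebook's Config class."""
    IMAGE_SIZE = 224
    MODEL_NAME = "resnet18"
    BATCH_SIZE = 32
    LEARNING_RATE = 1e-4
    NUM_EPOCHS = 4


def config_to_dict(cfg):
    """Collect the upper-case attributes of a config object or class."""
    return {name: getattr(cfg, name) for name in dir(cfg) if name.isupper()}


demo_settings = config_to_dict(DemoConfig())
demo_json = json.dumps(demo_settings, indent=2, sort_keys=True)
print(demo_json)  # could also be written to a file inside outputs/logs/
```

Saving this text alongside each checkpoint records which hyperparameters produced which model.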


***

## **4. Step 2: Setup and Utilities**

This section contains our standard setup code: installing packages, importing libraries, and defining helper functions.


In [None]:
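# Install required packages (uncomment if they are not already installed)
# !pip install -q torch torchvision medmnist scikit-learn seaborn captum

import os  # used by save_checkpoint to create the output directory
import torch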
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as T
import torchvision.models as models
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
from medmnist import PneumoniaMNIST
from tqdm import tqdm

#  Seeding for Reproducibility
def set_seed(seed=42):
    import random
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

    torch.backends.cudnn.deterministic = True   # ask cuDNN to use deterministic algorithms
    torch.backends.cudnn.benchmark = False      # disable the cuDNN autotuner, which can pick different algorithms per run

# Hint: Call the function with a fixed seed value
set_seed(_____)

# Hint: Print the device (CPU or GPU). Where is DEVICE stored?
print(f"Using device: {_____.DEVICE}")
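
To see what seeding buys you, here is a small sketch that assumes only PyTorch's CPU random generator. The helper `demo_seed` is a hypothetical, reduced version of `set_seed` used purely for illustration.

```python
import random

import torch


def demo_seed(seed):
    """Hypothetical helper: seed Python's and PyTorch's CPU generators."""
    random.seed(seed)
    torch.manual_seed(seed)


demo_seed(0)
first_draw = torch.rand(3)

demo_seed(0)
second_draw = torch.rand(3)   # same seed, so the same numbers again

third_draw = torch.rand(3)    # generator has advanced, so new numbers

print("first == second:", torch.equal(first_draw, second_draw))
print("first == third: ", torch.equal(first_draw, third_draw))
```

Re-seeding resets the generator, which is why weight initialization and data augmentation repeat exactly when the seed is set before they run.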


***

## **5. Step 3: Data Loading Module**

Here, we define our `Dataset` and `DataLoader`. This code would typically live in `src/datasets.py`.

In [None]:
# --- Data Transformations ---
data_transforms = {
    'train': T.Compose([
        # Hint: Resize to the image size defined in config
        T.Resize((config.IMAGE_SIZE, config.IMAGE_SIZE)),

        # Hint: Data augmentation (random flip)
        T.RandomHorizontalFlip(p=_____),

        # Hint: Small random rotation (degrees)
        T.RandomRotation(degrees=_____),

        T.ToTensor(),

        # Hint: Normalize grayscale image: mean=0.5, std=0.5
        T.Normalize(mean=[_____], std=[_____])
    ]),

    'val': T.Compose([
        T.Resize((config.IMAGE_SIZE, config.IMAGE_SIZE)),
        T.ToTensor(),
        # Hint: Same normalization for validation
        T.Normalize(mean=[_____], std=[_____])
    ]),
}

# --- Datasets ---
# Hint: Load PneumoniaMNIST training split
train_dataset = PneumoniaMNIST(
    split='train',
    transform=data_transforms['_____'],
    download=True
)

# Hint: Load PneumoniaMNIST validation split
val_dataset = PneumoniaMNIST(
    split='_____',
    transform=data_transforms['_____'],
    download=True
)

# --- DataLoaders ---
train_loader = DataLoader(
    train_dataset,
    batch_size=_____.BATCH_SIZE,   # Hint: batch size from config
    shuffle=_____                  # Hint: should training data be shuffled?
)

val_loader = DataLoader(
    val_dataset,
    batch_size=_____.BATCH_SIZE,   # Hint: same batch size
    shuffle=_____                  # Hint: validation is NOT shuffled
)

print(f"Data loaded: {len(train_dataset)} training images, {len(val_dataset)} validation images.")
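
The normalization step is easier to reason about with numbers. `T.ToTensor()` scales 8-bit intensities to the range [0, 1], and `T.Normalize` then applies `(x - mean) / std` per channel. The plain-Python sketch below reproduces that arithmetic for a few representative pixel values; the function name `normalize` is illustrative.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-pixel formula used by normalization: (x - mean) / std."""
    return (pixel - mean) / std


# Representative intensities after scaling to [0, 1].
scaled_pixels = [0.0, 0.25, 0.5, 1.0]
normalized_pixels = [normalize(p) for p in scaled_pixels]
print(normalized_pixels)
```

With a mean and standard deviation of 0.5, the range [0, 1] maps to [-1, 1], so the inputs are centered around zero.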


***

## **6. Step 4: Model Definition Module**

This defines our model architecture. In a real project, this would be in `src/models.py`.


In [None]:
# --- Build Model ---
def build_model(model_name, num_classes, pretrained=True):

    if model_name == "resnet18":

        # Hint: Load a pretrained ResNet18 if requested
        if pretrained:
            model = models.resnet18(weights="_____")
        else:
            model = models.resnet18()

        # Hint: Replace first conv layer for 1-channel (grayscale) input
        model.conv1 = nn.Conv2d(
            in_channels=_____,   # should be 1
            out_channels=_____,  # should be 64
            kernel_size=7,
            stride=2,
            padding=3,
            bias=False
        )

        # Hint: Grab number of features from final FC layer
        num_ftrs = model.fc._____

        # Hint: Replace classifier to match number of classes
        model.fc = nn.Linear(
            in_features=_____,   # same as num_ftrs
            out_features=_____   # same as num_classes
        )

    else:
        # Hint: Only resnet18 is implemented right now
        raise NotImplementedError(f"Model {model_name} not implemented.")

    # Hint: Send model to device (CPU/GPU)
    return model.to(config.______)


# --- Build model using config settings ---
model = build_model(
    model_name=_____.MODEL_NAME,     # Hint: e.g., "resnet18"
    num_classes=_____.NUM_CLASSES,   # Hint: 2 for pneumonia dataset
    pretrained=_____.PRETRAINED      # Hint: True or False
)
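
Note that replacing `conv1` with a new layer discards the pretrained weights of that layer, so it starts from random initialization. One common alternative, shown here as a sketch and not as the approach this template uses, is to collapse the three input-channel kernels into one by summing them. The `rgb_conv` layer below is a randomly initialized stand-in with the same shape as ResNet-18's first convolution, so the example runs without downloading weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained 3-channel first layer (same shape as ResNet-18's conv1).
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# New 1-channel layer with the same geometry.
gray_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

with torch.no_grad():
    # Summing the kernels over the input-channel axis gives the same response
    # as feeding the grayscale image copied onto all three channels.
    gray_conv.weight.copy_(rgb_conv.weight.sum(dim=1, keepdim=True))

gray_image = torch.rand(2, 1, 32, 32)
with torch.no_grad():
    out_gray = gray_conv(gray_image)
    out_rgb = rgb_conv(gray_image.repeat(1, 3, 1, 1))

max_diff = (out_gray - out_rgb).abs().max().item()
print(f"output shape: {tuple(out_gray.shape)}, max abs difference: {max_diff:.2e}")
```

Whether this helps on a given dataset is an empirical question; it simply preserves whatever the first layer had learned.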


***

## **7. Step 5: Training Engine Module**

This contains our core training and validation logic, along with checkpointing functions. This would live in `src/engine.py` and `src/utils.py`.
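
One detail in the loops below is worth spelling out: each batch loss is multiplied by the batch size before it is accumulated. `nn.CrossEntropyLoss` returns the mean loss over a batch by default, so multiplying by `images.size(0)` recovers the batch total, and dividing by the dataset size at the end gives a true per-sample average even when the last batch is smaller. The numbers in this sketch are hypothetical.

```python
# Hypothetical per-batch mean losses and batch sizes (the last batch is smaller).
batch_mean_losses = [0.9, 0.7, 0.2]
batch_sizes = [32, 32, 4]

# Naive: average of the batch means. The 4-sample batch counts as much as a full one.
naive_average = sum(batch_mean_losses) / len(batch_mean_losses)

# Weighted: sum(mean_loss * batch_size) / number of samples.
total_loss = sum(loss * n for loss, n in zip(batch_mean_losses, batch_sizes))
per_sample_average = total_loss / sum(batch_sizes)

print(f"naive: {naive_average:.4f}, per-sample: {per_sample_average:.4f}")
```

The two values differ whenever batch sizes are unequal, which is why the weighted form is the safer default.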


In [None]:
# --- Loss and Optimizer ---
# Hint: Use CrossEntropyLoss for classification
criterion = nn.__________()

# Hint: Use Adam optimizer with learning rate from config
optimizer = optim.Adam(
    model.__________(),      # model parameters
    lr=_____.LEARNING_RATE   # learning rate
)


# --- Training and Validation Loops ---
def train_one_epoch(model, data_loader, criterion, optimizer, device):
    model.__________()   # Hint: set model to training mode

    total_loss, total_correct = 0.0, 0

    for images, labels in data_loader:
        # Hint: Move data to device (CPU/GPU)
        images  = images.to(__________)
        labels  = labels.to(__________).reshape(-1).long()  # flatten (B, 1) -> (B,); also safe when B == 1

        optimizer.__________()  # Hint: reset gradients

        # Hint: forward pass
        outputs = model(images)

        # Hint: compute loss
        loss = criterion(__________, _________)

        # Hint: backward pass
        loss.__________()

        # Hint: update weights
        optimizer.__________()

        total_loss += loss.item() * images.size(0)

        # Hint: predicted class = argmax
        total_correct += (torch.argmax(outputs, dim=_____) == labels).sum().item()

    # Hint: divide by total dataset size
    return total_loss / len(data_loader.dataset), total_correct / len(data_loader.dataset)



def validate(model, data_loader, criterion, device):
    model.__________()   # Hint: evaluation mode

    total_loss, total_correct = 0.0, 0

    with torch.no_grad():
        for images, labels in data_loader:
            images = images.to(__________)
            labels = labels.to(__________).reshape(-1).long()  # flatten (B, 1) -> (B,); also safe when B == 1

            outputs = model(images)
            loss = criterion(outputs, labels)

            total_loss += loss.item() * images.size(0)
            total_correct += (torch.argmax(outputs, dim=_____) == labels).sum().item()

    return total_loss / len(data_loader.dataset), total_correct / len(data_loader.dataset)



# --- Checkpointing ---
def save_checkpoint(model, optimizer, epoch, path):

    # Hint: Create directory if it does not exist
    os.makedirs(os.path.dirname(path), exist_ok=True)

    # Hint: Save epoch, model weights, optimizer state
    torch.save({
        'epoch': _________,
        'model_state_dict': model.__________(),
        'optimizer_state_dict': optimizer.__________(),
    }, path)
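
A checkpoint is only useful if it can be restored. The following is a sketch of a matching loader, assuming the dictionary keys used in `save_checkpoint`. The function `load_checkpoint` is a hypothetical addition, and the tiny `nn.Linear` model stands in for the real network so the round trip runs quickly.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def load_checkpoint(model, optimizer, path, device="cpu"):
    """Hypothetical counterpart to save_checkpoint: restore states, return the epoch."""
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model_state_dict"])
    if optimizer is not None:
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"]


# Round trip with a tiny stand-in model.
torch.manual_seed(0)
saved_model = nn.Linear(4, 2)
saved_optimizer = optim.Adam(saved_model.parameters(), lr=1e-4)

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "checkpoints", "best_model.pth")
    os.makedirs(os.path.dirname(ckpt_path), exist_ok=True)
    torch.save({
        "epoch": 3,
        "model_state_dict": saved_model.state_dict(),
        "optimizer_state_dict": saved_optimizer.state_dict(),
    }, ckpt_path)

    restored_model = nn.Linear(4, 2)  # fresh, differently initialized weights
    restored_optimizer = optim.Adam(restored_model.parameters(), lr=1e-4)
    restored_epoch = load_checkpoint(restored_model, restored_optimizer, ckpt_path)

weights_match = torch.equal(saved_model.weight, restored_model.weight)
print(f"restored epoch: {restored_epoch}, weights match: {weights_match}")
```

Passing `map_location` lets a checkpoint saved on a GPU be loaded on a CPU-only machine.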


***

## **8. Step 6: Main Training Script**

This is the main executable part of the project that ties everything together. In a full project, this would be `train.py`.


In [None]:
# --- Training History ---
history = {
    'train_loss': [],
    'train_acc':  [],
    'val_loss':   [],
    'val_acc':    []
}

best_val_acc = 0.0

print("--- Starting Training ---")

for epoch in tqdm(range(config.__________)):   # Hint: number of epochs from config

    # Hint: call training function
    train_loss, train_acc = train_one_epoch(
        model,
        train_loader,
        criterion,
        optimizer,
        config.__________    # device
    )

    # Hint: call validation function
    val_loss, val_acc = validate(
        model,
        val_loader,
        criterion,
        config.__________     # device
    )

    # Store history
    history['train_loss'].append(__________)
    history['train_acc'].append(__________)
    history['val_loss'].append(__________)
    history['val_acc'].append(__________)

    print(f"Epoch [{epoch+1}/{config.NUM_EPOCHS}] | "
          f"Train Loss: {__________:.4f}, Train Acc: {__________:.4f} | "
          f"Val Loss: {__________:.4f}, Val Acc: {__________:.4f}")

    # --- Save the best model ---
    # Hint: Check if validation accuracy improved
    if val_acc > __________:
        best_val_acc = val_acc
        print(f"New best model found! Saving checkpoint to {config.__________}")

        # Hint: Save model
        save_checkpoint(
            model,
            optimizer,
            epoch,
            config.__________      # checkpoint path
        )

print("--- Training Finished ---")


***

## **9. Step 7: Evaluation and Visualization**

After training, load your best model and perform a thorough evaluation. This part would typically be in `evaluate.py` or an evaluation notebook.


In [None]:
# --- Load the best model for evaluation ---
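# Hint: The model in memory holds the weights from the final epoch, which are not
# necessarily the best ones. Normally you'd reload the best checkpoint from disk with
# torch.load; for simplicity, we evaluate the in-memory model here.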

model.eval()
all_labels = []
all_predictions = []

with torch.no_grad():
    for images, labels in val_loader:
        # Hint: Move images to device
        images = images.to(config.__________)

        outputs = model(images)

        # Hint: Convert model outputs to predicted class indices
        predictions = torch.argmax(outputs, dim=______)

        # Hint: Convert tensors to numpy arrays before extending the lists
        all_labels.extend(labels.__________())
        all_predictions.extend(predictions.cpu().__________())

# --- Final Metrics and Confusion Matrix ---
# Hint: Compute final accuracy using sklearn
final_accuracy = accuracy_score(__________, __________)
print(f"\nFinal Validation Accuracy of Best Model: {final_accuracy:.4f}")

# Hint: Compute confusion matrix
cm = confusion_matrix(__________, __________)

plt.figure(figsize=(6, 5))
sns.heatmap(
    cm,
    annot=True,
    fmt='d',
    cmap='Oranges',
    xticklabels=['Normal', 'Pneumonia'],
    yticklabels=['Normal', 'Pneumonia']
)
plt.title("Final Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()

# --- Plot Training Curves ---
plt.figure(figsize=(12, 5))

# --- Loss plots ---
plt.subplot(1, 2, 1)
plt.plot(history['__________'], label='Train Loss')       # Hint: fill history key
plt.plot(history['__________'], label='Validation Loss')  # Hint: fill history key
plt.title("Loss Curves")
plt.legend()

# --- Accuracy plots ---
plt.subplot(1, 2, 2)
plt.plot(history['__________'], label='Train Accuracy')       # Hint: fill history key
plt.plot(history['__________'], label='Validation Accuracy')  # Hint: fill history key
plt.title("Accuracy Curves")
plt.legend()

plt.show()
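
Accuracy alone can hide clinically important errors, so it is often useful to read sensitivity and specificity off the confusion matrix. The sketch below assumes the layout returned by scikit-learn's `confusion_matrix` for labels 0 (normal) and 1 (pneumonia): rows are true classes and columns are predicted classes. The counts are hypothetical, not results from this notebook.

```python
def sensitivity_specificity(cm):
    """cm is [[TN, FP], [FN, TP]]: rows = true class, columns = predicted class."""
    (tn, fp), (fn, tp) = cm
    sensitivity = tp / (tp + fn)   # fraction of pneumonia cases that were detected
    specificity = tn / (tn + fp)   # fraction of normal cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only.
example_cm = [[90, 10],
              [5, 95]]
sens, spec = sensitivity_specificity(example_cm)
print(f"Sensitivity: {sens:.2f}, Specificity: {spec:.2f}")
```

The same function accepts the 2x2 NumPy array produced by `confusion_matrix`, since it only relies on unpacking two rows of two values.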


***

## **10. Summary and Next Steps**

This notebook provides a complete, professional template for tackling a medical imaging AI project. By separating concerns (data, model, training, and evaluation), you create a codebase that is clean, reusable, and easy to modify for new experiments.

**How to use this template for your own project:**
1.  **Define Your Problem:** Is it classification or segmentation? What is your data source?
2.  **Gather and Organize Your Data:** Place your images and labels into the `data/` directory structure.
3.  **Adapt the `Dataset` Class:** Replace the `PneumoniaMNIST` dataset in Section 5 with a custom `Dataset` class that loads your specific file types (e.g., NIfTI, DICOM).
4.  **Choose and Define Your Model:** Select an architecture in Section 6. Will you use a ResNet, a U-Net, or something custom?
5.  **Configure Your Experiment:** Adjust the settings in the `Config` class in Section 3.
6.  **Run and Iterate:** Execute the training and evaluation steps. Analyze the results, and then go back to tweak your model, data augmentations, or hyperparameters.
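
As a starting point for item 3, here is a minimal sketch of a custom `Dataset`. It is an illustration under stated assumptions, not a class from the course: `NpyImageDataset` is a hypothetical name, and it reads 2-D arrays from `.npy` files so the example needs no medical imaging libraries. For DICOM or NIfTI data you would swap the `np.load` call for the reader of your choice.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import Dataset


class NpyImageDataset(Dataset):
    """Hypothetical dataset: each sample is a 2-D array stored in a .npy file."""

    def __init__(self, samples, transform=None):
        self.samples = samples        # list of (file_path, integer_label)
        self.transform = transform    # optional callable applied to the tensor

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        image = np.load(path).astype(np.float32)      # replace with your file reader
        image = torch.from_numpy(image).unsqueeze(0)  # add channel axis: (1, H, W)
        if self.transform is not None:
            image = self.transform(image)
        return image, torch.tensor(label, dtype=torch.long)


# Demo with three synthetic "scans" written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_samples = []
    for i, label in enumerate([0, 1, 1]):
        path = os.path.join(tmp, f"scan_{i}.npy")
        np.save(path, np.random.rand(28, 28))
        demo_samples.append((path, label))

    demo_dataset = NpyImageDataset(demo_samples)
    demo_length = len(demo_dataset)
    demo_image, demo_label = demo_dataset[1]

print(demo_length, tuple(demo_image.shape), demo_label.item())
```

Because the class implements `__len__` and `__getitem__`, it can be passed to `DataLoader` in the same way as `PneumoniaMNIST`.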

Good luck with your future projects! This structured approach will serve you well as you build innovative and impactful AI solutions for medicine.
