# Project Summary

## Dataset & Goal
- Dataset Source: RSNA Pneumonia Detection Challenge (Kaggle), an authoritative source organized by the Radiological Society of North America.
- Total Images: 26,684 unique chest X-ray images.
- Model Goal: Multi-class classification (3 classes), with a planned fallback to binary classification if performance is poor.
- Image Path Root: Images are unzipped into the Colab runtime under /content/train_images/stage_2_train_images/.

## Class Definitions & Mapping

| Class Name | Label (Target) | Pathological Status | Role in Classification |
| :--- | :--- | :--- | :--- |
| Normal | 0 | Healthy | Negative (healthy) |
| Lung Opacity | 1 | Pneumonia Present | Positive (pneumonia) |
| No Lung Opacity / Not Normal | 2 | Other Diseases/Issues | Hard negative (sick, but NOT pneumonia) |


# Data Preparation

## Load Datasets

In [None]:
# 1. Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# 2. Install pydicom (needed for medical images)
!pip install pydicom

# 3. Unzip images into the local Colab environment (FAST)
!unzip -n -q "/content/drive/MyDrive/STAT362 Final Project_RSNA/images.zip" -d "/content/train_images"

Mounted at /content/drive
Collecting pydicom
  Downloading pydicom-3.0.1-py3-none-any.whl.metadata (9.4 kB)
Downloading pydicom-3.0.1-py3-none-any.whl (2.4 MB)
Installing collected packages: pydicom
Successfully installed pydicom-3.0.1


In [None]:
import pandas as pd

# Load the detailed class info
base = "/content/drive/MyDrive/STAT362 Final Project_RSNA"  # adjust if your Drive folder (or shortcut) has a different name
detailed_class = pd.read_csv(f"{base}/stage_2_detailed_class_info.csv")
labels = pd.read_csv(f"{base}/stage_2_train_labels.csv")


## EDA

In [None]:
detailed_class.head()

Unnamed: 0,patientId,class
0,0004cfab-14fd-4e49-80ba-63a80b6bddd6,No Lung Opacity / Not Normal
1,00313ee0-9eaa-42f4-b0ab-c148ed3241cd,No Lung Opacity / Not Normal
2,00322d4d-1c29-4943-afc9-b6754be640eb,No Lung Opacity / Not Normal
3,003d8fa0-6bf1-40ed-b54c-ac657f8495c5,Normal
4,00436515-870c-4b36-a041-de91049b9ab4,Lung Opacity


In [None]:
labels.head()

Unnamed: 0,patientId,x,y,width,height,Target
0,0004cfab-14fd-4e49-80ba-63a80b6bddd6,,,,,0
1,00313ee0-9eaa-42f4-b0ab-c148ed3241cd,,,,,0
2,00322d4d-1c29-4943-afc9-b6754be640eb,,,,,0
3,003d8fa0-6bf1-40ed-b54c-ac657f8495c5,,,,,0
4,00436515-870c-4b36-a041-de91049b9ab4,264.0,152.0,213.0,379.0,1


In [None]:
# Merge the two datasets to explore the relationship between class and target
merge_df = pd.merge(detailed_class, labels[['patientId', 'Target']], on='patientId')
merge_df.head()

Unnamed: 0,patientId,class,Target
0,0004cfab-14fd-4e49-80ba-63a80b6bddd6,No Lung Opacity / Not Normal,0
1,00313ee0-9eaa-42f4-b0ab-c148ed3241cd,No Lung Opacity / Not Normal,0
2,00322d4d-1c29-4943-afc9-b6754be640eb,No Lung Opacity / Not Normal,0
3,003d8fa0-6bf1-40ed-b54c-ac657f8495c5,Normal,0
4,00436515-870c-4b36-a041-de91049b9ab4,Lung Opacity,1


In [None]:
merge_df.groupby(by=['class', 'Target']).count()

class,Target,patientId
Lung Opacity,1,16957
No Lung Opacity / Not Normal,0,11821
Normal,0,8851


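The class-to-target mapping is consistent: 'Lung Opacity' always has Target = 1 (pneumonia), while 'Normal' and 'No Lung Opacity / Not Normal' always have Target = 0 (non-pneumonia). Note that the counts above are merged rows, not patients: a patient with several bounding boxes appears multiple times, which is why 'Lung Opacity' shows 16,957 rows here but only 6,012 unique images after de-duplication in the next section. Because the two Target = 0 classes are clinically distinct, we will train a three-class CNN that separates healthy lungs, pneumonia, and other lung pathologies.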
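
A minimal sketch of running the same consistency check with one row per patient. The frame `toy` and its patient IDs are hypothetical stand-ins for `merge_df`, used only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for merge_df; patient "p3" has two rows,
# as a patient with two bounding boxes would.
toy = pd.DataFrame({
    "patientId": ["p1", "p2", "p3", "p3"],
    "class": ["Normal", "No Lung Opacity / Not Normal",
              "Lung Opacity", "Lung Opacity"],
    "Target": [0, 0, 1, 1],
})

# Count each patient once, then cross-tabulate class against Target.
per_patient = toy.drop_duplicates(subset="patientId")
mapping = pd.crosstab(per_patient["class"], per_patient["Target"])
print(mapping)

# The mapping is consistent if every class falls under exactly one Target.
is_consistent = bool(((mapping > 0).sum(axis=1) == 1).all())
print(is_consistent)
```

On the real data, the same two steps would be applied to `merge_df` instead of `toy`.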

## Process Labels (The Multi-Class Logic)

In [None]:
# 1. REMOVE DUPLICATES
# We only need one label per patientId
# .copy() avoids pandas' SettingWithCopyWarning when columns are added below
detailed_class = detailed_class.drop_duplicates(subset=['patientId']).copy()

# 2. DEFINE 3-CLASS MAPPING
# Normal = 0, Pneumonia = 1, Other Disease = 2
class_mapping = {
    'Normal': 0,
    'Lung Opacity': 1,
    'No Lung Opacity / Not Normal': 2
}

detailed_class['target'] = detailed_class['class'].map(class_mapping)

# 3. Create the file path column
# The patientId in CSV does not have .dcm extension, so we add it
detailed_class['path'] = detailed_class['patientId'].apply(lambda x: f"/content/train_images/stage_2_train_images/{x}.dcm")

print(f"Total unique images: {len(detailed_class)}")
print(detailed_class['target'].value_counts())

Total unique images: 26684
target
2    11821
0     8851
1     6012
Name: count, dtype: int64


## The Custom Dataset Class

This class is the core of the data pipeline: it tells PyTorch how to read a DICOM file, normalize its pixel values, and return an image tensor together with its label.

In [None]:
import torch
from torch.utils.data import Dataset
import pydicom
import numpy as np
from PIL import Image
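try:
    # pydicom 3.x location
    from pydicom.pixels import apply_voi_lut
except ImportError:
    # older pydicom releases
    from pydicom.pixel_data_handlers.util import apply_voi_lut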

class RSNADataset(Dataset):
    def __init__(self, dataframe, transform=None):
        self.dataframe = dataframe.reset_index(drop=True)
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        img_path = self.dataframe.loc[idx, "path"]
        label = int(self.dataframe.loc[idx, "target"])

        ds = pydicom.dcmread(img_path)
        img = ds.pixel_array.astype(np.float32)

        # apply VOI LUT (better windowing when available)
        try:
            img = apply_voi_lut(img, ds).astype(np.float32)
        except Exception:
            pass

        # handle inverted grayscale
        if getattr(ds, "PhotometricInterpretation", "") == "MONOCHROME1":
            img = img.max() - img

        # robust normalize with percentile clipping
        lo, hi = np.percentile(img, (1, 99))
        img = np.clip(img, lo, hi)
        img = (img - lo) / (hi - lo + 1e-6)  # -> [0, 1]

        # convert to 8-bit and 3-channel PIL for torchvision transforms
        img = (img * 255.0).astype(np.uint8)
        image = Image.fromarray(img).convert("RGB")

        if self.transform:
            image = self.transform(image)

        return image, torch.tensor(label, dtype=torch.long)
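
To make the normalization step in `__getitem__` easier to inspect, here is a minimal sketch that applies the same percentile clipping and 8-bit rescaling to a synthetic array instead of a real DICOM file. The helper name `normalize_to_uint8` and the synthetic data are assumptions made for illustration only:

```python
import numpy as np

def normalize_to_uint8(img, invert=False):
    """Percentile-clip a 2D array and rescale it to 8-bit, mirroring __getitem__."""
    img = img.astype(np.float32)
    if invert:  # MONOCHROME1 images are stored inverted, so flip them
        img = img.max() - img
    lo, hi = np.percentile(img, (1, 99))
    img = np.clip(img, lo, hi)
    img = (img - lo) / (hi - lo + 1e-6)  # -> [0, 1]
    return (img * 255.0).astype(np.uint8)

# Synthetic 12-bit "scan" with one extreme outlier pixel (hypothetical data)
rng = np.random.default_rng(0)
fake = rng.integers(0, 4096, size=(64, 64)).astype(np.float32)
fake[0, 0] = 60000.0  # e.g. a burned-in marker

out = normalize_to_uint8(fake)
print(out.dtype, out.min(), out.max())
```

Because the clip happens before rescaling, the single outlier pixel is pinned to the top of the range rather than compressing the rest of the image into a few gray levels.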




## Train-Test-Validation Split



In [None]:
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

# 1. Split (70/15/15) with stratify
train_df, temp_df = train_test_split(
    detailed_class, test_size=0.3, stratify=detailed_class["target"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["target"], random_state=42
)

print(f"Train: {len(train_df)}, Val: {len(val_df)}, Test: {len(test_df)}")

Train: 18678, Val: 4003, Test: 4003
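
A quick way to confirm that stratification preserved the class balance is to compare class shares across the three splits. The sketch below rebuilds only the label vector from the class counts printed earlier (an index array stands in for the dataframe rows, which is an assumption for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Class counts printed after de-duplication: 0=Normal, 1=Lung Opacity, 2=Other
counts = {0: 8851, 1: 6012, 2: 11821}
y = np.concatenate([np.full(n, c) for c, n in counts.items()])
idx = np.arange(len(y))  # stand-in for dataframe rows

train_idx, temp_idx = train_test_split(
    idx, test_size=0.3, stratify=y, random_state=42
)
val_idx, test_idx = train_test_split(
    temp_idx, test_size=0.5, stratify=y[temp_idx], random_state=42
)

def shares(indices):
    """Fraction of each class among the given rows."""
    return np.bincount(y[indices], minlength=3) / len(indices)

for name, part in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    print(name, len(part), np.round(shares(part), 3))
```

On the real dataframes, `train_df["target"].value_counts(normalize=True)` (and the same for `val_df` and `test_df`) gives the equivalent check.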


# CNN Model

### Instructions for Building the Model
Data preparation is complete, and the data is ready for modeling. The next step is to define and implement the CNN architecture:
- Feed batches from train_loader (created in the first training cell below) into the model.
- Use nn.CrossEntropyLoss() as the loss function, since this is 3-class classification.
- Run the training loop on the GPU (cuda).
- Make sure the final layer outputs 3 neurons to match the target labels (0, 1, 2).

Once the base model trains successfully, iterate on the architecture and hyperparameters.
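
As a rough illustration of these requirements, the sketch below wires a 3-logit output into `nn.CrossEntropyLoss` and runs one forward/backward pass. `TinyCNN` and its layer sizes are hypothetical and are not the architecture trained in this notebook; a random batch stands in for a `train_loader` batch:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Hypothetical minimal CNN; layer sizes are illustrative only."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (N, 32, 1, 1)
        )
        self.classifier = nn.Linear(32, num_classes)  # 3 logits, one per class

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tiny_model = TinyCNN().to(device)
criterion_demo = nn.CrossEntropyLoss()

# Random batch standing in for one train_loader batch
images = torch.randn(4, 3, 224, 224, device=device)
labels = torch.tensor([0, 1, 2, 1], device=device)

logits = tiny_model(images)            # shape (4, 3): raw scores, no softmax
loss = criterion_demo(logits, labels)  # CrossEntropyLoss applies log-softmax itself
loss.backward()
print(logits.shape, loss.item())
```

Note that the model returns raw logits: adding a softmax layer before `nn.CrossEntropyLoss` would apply the normalization twice.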

# First Round: Light Data Augmentation

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.models import densenet121, DenseNet121_Weights
import copy
from tqdm import tqdm

# ==========================================
# 1. SETUP TRANSFORMS & DATASETS (Using Global Split)
# ==========================================
# Train: Light augmentation
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(7),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25]),
])

# Val/Test: No augmentation
eval_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25]),
])

# Use the GLOBAL dataframes (train_df, val_df, test_df) you already created
train_dataset = RSNADataset(train_df, transform=train_transforms)
val_dataset   = RSNADataset(val_df,   transform=eval_transforms)
test_dataset  = RSNADataset(test_df,  transform=eval_transforms)

# Create DataLoaders
BATCH_SIZE = 64
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=2, pin_memory=True)
val_loader   = DataLoader(val_dataset,   batch_size=BATCH_SIZE, shuffle=False,
                          num_workers=0, pin_memory=True)
test_loader  = DataLoader(test_dataset,  batch_size=BATCH_SIZE, shuffle=False,
                          num_workers=0, pin_memory=True)

# ==========================================
# 2. INITIALIZE MODEL
# ==========================================
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model = densenet121(weights=DenseNet121_Weights.DEFAULT)

# Modify Head for 3 Classes
num_features = model.classifier.in_features
model.classifier = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(512, 3)
)
model = model.to(device)

# ==========================================
# 3. OPTIMIZER & TRAINING
# ==========================================
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

# Calculate weights based on global train_df
class_counts = train_df["target"].value_counts().sort_index().values
class_weights = 1.0 / torch.tensor(class_counts, dtype=torch.float32)
class_weights = (class_weights / class_weights.sum()) * len(class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights.to(device))

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=1
)

# Training Loop
num_epochs = 10
patience = 3
best_val_acc = -1.0
best_state_dict = None
epochs_no_improve = 0
scaler = torch.amp.GradScaler('cuda')

print("Starting Training (Restoring Round 1)...")

for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    train_correct = 0
    train_total = 0

    # TQDM Progress Bar
    loop = tqdm(train_loader, leave=True, desc=f"Epoch {epoch+1}/{num_epochs}")

    for images, labels in loop:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)

        optimizer.zero_grad(set_to_none=True)

        with torch.amp.autocast('cuda'):
            outputs = model(images)
            loss = criterion(outputs, labels)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        # Update metrics
        train_loss += loss.item() * images.size(0)
        preds = outputs.argmax(dim=1)
        train_correct += (preds == labels).sum().item()
        train_total += labels.size(0)

        # Show running accuracy on the progress bar
        loop.set_postfix(loss=loss.item(), acc=train_correct/train_total)

    # Calculate final epoch metrics
    train_loss /= train_total
    train_acc = train_correct / train_total

    # Validation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0

    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device, non_blocking=True)
            labels = labels.to(device, non_blocking=True)

            with torch.amp.autocast('cuda'):
                outputs = model(images)
                loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == labels).sum().item()
            val_total += labels.size(0)

    val_loss /= val_total
    val_acc = val_correct / val_total

    scheduler.step(val_acc)

    # Save Best Logic
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state_dict = copy.deepcopy(model.state_dict())
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1

    # Per-epoch summary
    print(
        f"Epoch [{epoch+1}/{num_epochs}] "
        f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} "
        f"| Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}"
    )

    if epochs_no_improve >= patience:
        print("Early stopping triggered")
        break

# Restore best weights
if best_state_dict is not None:
    model.load_state_dict(best_state_dict)
    print(f"Loaded best model with Val Acc = {best_val_acc:.4f}")

Using device: cuda
Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /root/.cache/torch/hub/checkpoints/densenet121-a639ec97.pth


100%|██████████| 30.8M/30.8M [00:00<00:00, 66.3MB/s]


Starting Training (Restoring Round 1)...


Epoch 1/10: 100%|██████████| 292/292 [09:42<00:00,  1.99s/it, acc=0.662, loss=0.405]


Epoch [1/10] Train Loss: 0.6945 | Train Acc: 0.6617 | Val Loss: 0.6363 | Val Acc: 0.6907


Epoch 2/10: 100%|██████████| 292/292 [08:14<00:00,  1.69s/it, acc=0.706, loss=0.476]


Epoch [2/10] Train Loss: 0.6070 | Train Acc: 0.7064 | Val Loss: 0.6201 | Val Acc: 0.7177


Epoch 3/10: 100%|██████████| 292/292 [08:04<00:00,  1.66s/it, acc=0.727, loss=0.466]


Epoch [3/10] Train Loss: 0.5721 | Train Acc: 0.7271 | Val Loss: 0.6132 | Val Acc: 0.7012


Epoch 4/10: 100%|██████████| 292/292 [08:16<00:00,  1.70s/it, acc=0.735, loss=0.594]


Epoch [4/10] Train Loss: 0.5501 | Train Acc: 0.7354 | Val Loss: 0.6327 | Val Acc: 0.6952


Epoch 5/10: 100%|██████████| 292/292 [08:13<00:00,  1.69s/it, acc=0.764, loss=0.577]


Epoch [5/10] Train Loss: 0.4984 | Train Acc: 0.7639 | Val Loss: 0.6514 | Val Acc: 0.6842
Early stopping triggered
Loaded best model with Val Acc = 0.7177


In [None]:
# Save the Round 1 model to Drive so you don't lose it again!
torch.save(model.state_dict(), "/content/drive/MyDrive/STAT362 Final Project_RSNA/densenet_round1_72acc.pth")
print("Round 1 saved successfully.")

Round 1 saved successfully.


## Test Performance

In [None]:
# Evaluating on test dataset (with extra metrics)

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, balanced_accuracy_score

model.eval()
test_loss = 0.0
test_total = 0

all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)

        outputs = model(images)
        loss = criterion(outputs, labels)

        test_loss += loss.item() * images.size(0)
        test_total += labels.size(0)

        preds = outputs.argmax(dim=1)
        all_preds.append(preds.cpu().numpy())
        all_labels.append(labels.cpu().numpy())

test_loss /= test_total

all_preds = np.concatenate(all_preds)
all_labels = np.concatenate(all_labels)

test_acc = (all_preds == all_labels).mean()
bal_acc = balanced_accuracy_score(all_labels, all_preds)
cm = confusion_matrix(all_labels, all_preds)

print(f"  Test Loss:          {test_loss:.4f}")
print(f"  Test Accuracy:      {test_acc:.4f}")
print(f"  Balanced Accuracy:  {bal_acc:.4f}")
print("  Confusion Matrix:\n", cm)
print("\n  Classification Report:\n")
print(classification_report(all_labels, all_preds, digits=4))


  Test Loss:          0.6067
  Test Accuracy:      0.7277
  Balanced Accuracy:  0.7344
  Confusion Matrix:
 [[1155    7  166]
 [  25  628  249]
 [ 280  363 1130]]

  Classification Report:

              precision    recall  f1-score   support

           0     0.7911    0.8697    0.8286      1328
           1     0.6293    0.6962    0.6611       902
           2     0.7314    0.6373    0.6811      1773

    accuracy                         0.7277      4003
   macro avg     0.7172    0.7344    0.7236      4003
weighted avg     0.7282    0.7277    0.7255      4003
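
The headline numbers in this report can be recomputed directly from the confusion matrix, which helps when reading it. A small sketch using the matrix printed above (rows are true classes, columns are predictions):

```python
import numpy as np

# Confusion matrix printed above (rows = true class, columns = predicted)
cm = np.array([[1155,    7,  166],
               [  25,  628,  249],
               [ 280,  363, 1130]])

recall = np.diag(cm) / cm.sum(axis=1)      # per-class recall
precision = np.diag(cm) / cm.sum(axis=0)   # per-class precision
accuracy = np.trace(cm) / cm.sum()
balanced_accuracy = recall.mean()          # unweighted mean of per-class recalls

print(np.round(recall, 4))
print(np.round(precision, 4))
print(round(float(accuracy), 4), round(float(balanced_accuracy), 4))
```

Balanced accuracy is higher than plain accuracy here because the weakest recall belongs to class 2, the largest class, which pulls the sample-weighted figure down more than the unweighted mean.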



## Binary Performance

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 1. Convert 3-Class predictions to Binary (0=Healthy, 1=Sick)
# We map classes [1, 2] -> 1 (Sick) and keep class [0] -> 0 (Healthy)
binary_preds  = np.where(all_preds == 0, 0, 1)
binary_labels = np.where(all_labels == 0, 0, 1)

# 2. Calculate Scores
bin_acc = accuracy_score(binary_labels, binary_preds)
bin_prec = precision_score(binary_labels, binary_preds)
bin_rec = recall_score(binary_labels, binary_preds)
bin_f1 = f1_score(binary_labels, binary_preds)

print(f"--- Binary Classification (Healthy vs. Sick) ---")
print(f"Binary Accuracy:  {bin_acc:.4f}")
print(f"Binary Precision: {bin_prec:.4f}")
print(f"Binary Recall:    {bin_rec:.4f}  <-- This is your sensitivity (ability to catch sickness)")
print(f"Binary F1 Score:  {bin_f1:.4f}")

--- Binary Classification (Healthy vs. Sick) ---
Binary Accuracy:  0.8806
Binary Precision: 0.9320
Binary Recall:    0.8860  <-- This is your sensitivity (ability to catch sickness)
Binary F1 Score:  0.9084
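
The same binary scores follow from collapsing the Round 1 3x3 confusion matrix into a 2x2 one. This sketch shows which cells count as errors once classes 1 and 2 are merged into "sick":

```python
import numpy as np

# Round 1 confusion matrix (rows = true class, columns = predicted)
cm = np.array([[1155,    7,  166],
               [  25,  628,  249],
               [ 280,  363, 1130]])

tn = cm[0, 0]            # healthy predicted healthy
fp = cm[0, 1:].sum()     # healthy predicted as class 1 or 2
fn = cm[1:, 0].sum()     # sick predicted healthy
tp = cm[1:, 1:].sum()    # sick predicted as either sick class

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)
print(round(float(accuracy), 4), round(float(precision), 4),
      round(float(recall), 4), round(float(f1), 4))
```

Confusions between classes 1 and 2 (249 + 363 images) no longer count as errors in this view, which accounts for most of the gap between the 3-class and binary accuracy.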


# Second Round: Stronger Data Augmentation & Boosted Class Weight

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.models import densenet121, DenseNet121_Weights
from torch.utils.data import DataLoader
from tqdm import tqdm
import copy
import numpy as np

# ==========================================
# 1. UPGRADE DATA AUGMENTATION (Stronger)
# ==========================================
print("Setting up Stronger Augmentation...")
train_transforms_tuned = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    # RandomAffine: Rotates (15°), Shifts (10%), and Zooms (10%)
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ToTensor(),
    # ImageNet normalization (matches the pretrained DenseNet weights)
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Validation transforms (No augmentation, just resize/norm)
eval_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# ==========================================
# 2. REFRESH DATA LOADERS
# ==========================================
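# We re-initialize the datasets to ensure the new transforms are applied.
# The test set is rebuilt too, so evaluation uses the same normalization as training.
train_dataset = RSNADataset(train_df, transform=train_transforms_tuned)
val_dataset   = RSNADataset(val_df,   transform=eval_transforms)
test_dataset  = RSNADataset(test_df,  transform=eval_transforms)

BATCH_SIZE = 64 # Reduce to 32 if you hit OutOfMemory error
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
val_loader   = DataLoader(val_dataset,   batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)
test_loader  = DataLoader(test_dataset,  batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)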

# ==========================================
# 3. INITIALIZE FRESH MODEL (DenseNet121)
# ==========================================
print("Initializing fresh DenseNet121 model...")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = densenet121(weights=DenseNet121_Weights.DEFAULT)

# Modify Head
num_features = model.classifier.in_features
model.classifier = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(512, 3)
)
model = model.to(device)

# ==========================================
# 4. SETUP OPTIMIZER & BOOSTED LOSS
# ==========================================
# Optimizer (Unfrozen, low LR)
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

# Scheduler
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=1
)

# Loss with MANUAL WEIGHT BOOST
# Calculate base weights
class_counts = train_df["target"].value_counts().sort_index().values
class_weights = 1.0 / torch.tensor(class_counts, dtype=torch.float32)
class_weights = (class_weights / class_weights.sum()) * len(class_counts)

# MANUAL BOOST: Multiply Class 1 (Pneumonia) weight by 2.0
class_weights[1] = class_weights[1] * 2.0
print(f"Final Boosted Class Weights: {class_weights}")

criterion = nn.CrossEntropyLoss(weight=class_weights.to(device))

# ==========================================
# 5. START TRAINING LOOP
# ==========================================
num_epochs = 10
patience = 3
best_val_acc = -1.0
best_state_dict = None
epochs_no_improve = 0
scaler = torch.amp.GradScaler('cuda')  # torch.cuda.amp.GradScaler() is deprecated

print(f"Starting training on {device}...")

for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    train_correct = 0
    train_total = 0

    loop = tqdm(train_loader, leave=True, desc=f"Epoch {epoch+1}/{num_epochs}")
    for images, labels in loop:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)

        optimizer.zero_grad(set_to_none=True)

        with torch.amp.autocast('cuda'):
            outputs = model(images)
            loss = criterion(outputs, labels)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        train_loss += loss.item() * images.size(0)
        preds = outputs.argmax(dim=1)
        train_correct += (preds == labels).sum().item()
        train_total += labels.size(0)

        loop.set_postfix(loss=loss.item())

    train_loss /= train_total
    train_acc = train_correct / train_total

    # Validation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0

    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device, non_blocking=True)
            labels = labels.to(device, non_blocking=True)

            with torch.amp.autocast('cuda'):
                outputs = model(images)
                loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == labels).sum().item()
            val_total += labels.size(0)

    val_loss /= val_total
    val_acc = val_correct / val_total

    scheduler.step(val_acc)

    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state_dict = copy.deepcopy(model.state_dict())
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1

    print(
        f"Epoch [{epoch+1}/{num_epochs}] "
        f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} "
        f"| Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}"
    )

    if epochs_no_improve >= patience:
        print("Early stopping triggered")
        break

if best_state_dict is not None:
    model.load_state_dict(best_state_dict)
    print(f"Loaded best model with Val Acc = {best_val_acc:.4f}")

Setting up Stronger Augmentation...
Initializing fresh DenseNet121 model...


Final Boosted Class Weights: tensor([0.9313, 2.7426, 0.6974])
Starting training on cuda...


Epoch 1/10: 100%|██████████| 292/292 [08:34<00:00,  1.76s/it, loss=0.744]


Epoch [1/10] Train Loss: 0.6710 | Train Acc: 0.5933 | Val Loss: 0.5966 | Val Acc: 0.6443


Epoch 2/10: 100%|██████████| 292/292 [08:07<00:00,  1.67s/it, loss=0.807]


Epoch [2/10] Train Loss: 0.5946 | Train Acc: 0.6434 | Val Loss: 0.5956 | Val Acc: 0.6350


Epoch 3/10: 100%|██████████| 292/292 [08:03<00:00,  1.66s/it, loss=0.504]


Epoch [3/10] Train Loss: 0.5684 | Train Acc: 0.6632 | Val Loss: 0.6004 | Val Acc: 0.6800


Epoch 4/10: 100%|██████████| 292/292 [08:02<00:00,  1.65s/it, loss=0.68]


Epoch [4/10] Train Loss: 0.5525 | Train Acc: 0.6707 | Val Loss: 0.5920 | Val Acc: 0.6458


Epoch 5/10: 100%|██████████| 292/292 [08:06<00:00,  1.67s/it, loss=0.57]


Epoch [5/10] Train Loss: 0.5383 | Train Acc: 0.6809 | Val Loss: 0.6487 | Val Acc: 0.6110


Epoch 6/10: 100%|██████████| 292/292 [08:04<00:00,  1.66s/it, loss=0.403]


Epoch [6/10] Train Loss: 0.4997 | Train Acc: 0.7018 | Val Loss: 0.6019 | Val Acc: 0.6835


Epoch 7/10: 100%|██████████| 292/292 [08:12<00:00,  1.69s/it, loss=0.451]


Epoch [7/10] Train Loss: 0.4787 | Train Acc: 0.7194 | Val Loss: 0.6108 | Val Acc: 0.6722


Epoch 8/10: 100%|██████████| 292/292 [08:11<00:00,  1.68s/it, loss=0.287]


Epoch [8/10] Train Loss: 0.4650 | Train Acc: 0.7234 | Val Loss: 0.6784 | Val Acc: 0.6812


Epoch 9/10: 100%|██████████| 292/292 [08:12<00:00,  1.69s/it, loss=0.409]


Epoch [9/10] Train Loss: 0.4283 | Train Acc: 0.7494 | Val Loss: 0.6281 | Val Acc: 0.6707
Early stopping triggered
Loaded best model with Val Acc = 0.6835
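
For reference, the inverse-frequency weighting and the manual boost used in this round can be reproduced with plain NumPy. This sketch assumes the full-dataset class counts printed earlier; the notebook itself uses the counts in `train_df`, which a stratified split keeps in nearly the same proportions, so the values should differ only in the last digits:

```python
import numpy as np

# Assumption: full-dataset class counts (classes 0, 1, 2), not train_df counts
counts = np.array([8851, 6012, 11821], dtype=np.float64)

inv = 1.0 / counts
weights = inv / inv.sum() * len(counts)   # normalized so the weights sum to 3

boosted = weights.copy()
boosted[1] *= 2.0                         # manual boost for class 1 (pneumonia)

print(np.round(weights, 4))
print(np.round(boosted, 4))
print(round(float(boosted[1] / boosted[2]), 2))  # class 1 vs class 2 penalty ratio
```

After the boost, an error on a pneumonia image costs roughly four times as much as an error on an "other disease" image, which pushes the model toward predicting class 1 whenever the two are hard to tell apart.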


## Test Performance

In [None]:
# Evaluating on test dataset (with extra metrics)

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, balanced_accuracy_score

model.eval()
test_loss = 0.0
test_total = 0

all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)

        outputs = model(images)
        loss = criterion(outputs, labels)

        test_loss += loss.item() * images.size(0)
        test_total += labels.size(0)

        preds = outputs.argmax(dim=1)
        all_preds.append(preds.cpu().numpy())
        all_labels.append(labels.cpu().numpy())

test_loss /= test_total

all_preds = np.concatenate(all_preds)
all_labels = np.concatenate(all_labels)

test_acc = (all_preds == all_labels).mean()
bal_acc = balanced_accuracy_score(all_labels, all_preds)
cm = confusion_matrix(all_labels, all_preds)

print(f"  Test Loss:          {test_loss:.4f}")
print(f"  Test Accuracy:      {test_acc:.4f}")
print(f"  Balanced Accuracy:  {bal_acc:.4f}")
print("  Confusion Matrix:\n", cm)
print("\n  Classification Report:\n")
print(classification_report(all_labels, all_preds, digits=4))

  Test Loss:          0.5585
  Test Accuracy:      0.6697
  Balanced Accuracy:  0.7199
  Confusion Matrix:
 [[1190   46   92]
 [  21  776  105]
 [ 318  740  715]]

  Classification Report:

              precision    recall  f1-score   support

           0     0.7783    0.8961    0.8330      1328
           1     0.4968    0.8603    0.6299       902
           2     0.7840    0.4033    0.5326      1773

    accuracy                         0.6697      4003
   macro avg     0.6864    0.7199    0.6652      4003
weighted avg     0.7174    0.6697    0.6542      4003
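
To see what the boosted weight changed, it helps to put the pneumonia class (label 1) from both rounds side by side. The sketch below uses the two confusion matrices printed in this notebook; the helper `class_metrics` is a hypothetical name introduced here:

```python
import numpy as np

# Rows = true class, columns = predicted class
cm_round1 = np.array([[1155, 7, 166], [25, 628, 249], [280, 363, 1130]])
cm_round2 = np.array([[1190, 46, 92], [21, 776, 105], [318, 740, 715]])

def class_metrics(cm, k):
    """Return (precision, recall) for class k."""
    tp = cm[k, k]
    return tp / cm[:, k].sum(), tp / cm[k, :].sum()

results = {}
for name, cm in [("round 1", cm_round1), ("round 2", cm_round2)]:
    p, r = class_metrics(cm, 1)
    moved = int(cm[2, 1])  # "other disease" images predicted as pneumonia
    results[name] = (p, r, moved)
    print(f"{name}: precision={p:.4f} recall={r:.4f} class2->1={moved}")
```

Pneumonia recall rises in Round 2, but precision falls because roughly twice as many class 2 images are now labeled as pneumonia, which is also why class 2 recall drops so sharply.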



## Binary Performance

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 1. Convert 3-Class predictions to Binary (0=Healthy, 1=Sick)
# We map classes [1, 2] -> 1 (Sick) and keep class [0] -> 0 (Healthy)
binary_preds  = np.where(all_preds == 0, 0, 1)
binary_labels = np.where(all_labels == 0, 0, 1)

# 2. Calculate Scores
bin_acc = accuracy_score(binary_labels, binary_preds)
bin_prec = precision_score(binary_labels, binary_preds)
bin_rec = recall_score(binary_labels, binary_preds)
bin_f1 = f1_score(binary_labels, binary_preds)

print(f"--- Binary Classification (Healthy vs. Sick) ---")
print(f"Binary Accuracy:  {bin_acc:.4f}")
print(f"Binary Precision: {bin_prec:.4f}")
print(f"Binary Recall:    {bin_rec:.4f}  <-- This is your sensitivity (ability to catch sickness)")
print(f"Binary F1 Score:  {bin_f1:.4f}")

--- Binary Classification (Healthy vs. Sick) ---
Binary Accuracy:  0.8808
Binary Precision: 0.9442
Binary Recall:    0.8733  <-- This is your sensitivity (ability to catch sickness)
Binary F1 Score:  0.9074


# Third Round: Stronger Data Augmentation, Standard Class Weights (Final Model)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.models import densenet121, DenseNet121_Weights
import copy
from tqdm import tqdm
from sklearn.model_selection import train_test_split

# ==========================================
# 1. CONFIGURATION (SAFE MODE)
# ==========================================
BATCH_SIZE = 64
SAVE_PATH = "/content/drive/MyDrive/STAT362 Final Project_RSNA/densenet_round3_best.pth"

# ==========================================
# 2. DATA SETUP
# ==========================================
# Train: Stronger augmentation (Zoom, Shift, Rotate) + Horizontal Flip
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Val/Test: No augmentation
eval_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Split (Using random_state=42 ensures this matches your Global Split)
train_df, temp_df = train_test_split(
    detailed_class, test_size=0.3, stratify=detailed_class["target"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["target"], random_state=42
)

# Create Datasets
train_dataset = RSNADataset(train_df, transform=train_transforms)
val_dataset   = RSNADataset(val_df,   transform=eval_transforms)
test_dataset  = RSNADataset(test_df,  transform=eval_transforms)

# Create Loaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=2, pin_memory=True)
val_loader   = DataLoader(val_dataset,   batch_size=BATCH_SIZE, shuffle=False,
                          num_workers=2, pin_memory=True)
test_loader  = DataLoader(test_dataset,  batch_size=BATCH_SIZE, shuffle=False,
                          num_workers=2, pin_memory=True)

# ==========================================
# 3. INITIALIZE MODEL
# ==========================================
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model = densenet121(weights=DenseNet121_Weights.DEFAULT)

num_features = model.classifier.in_features
model.classifier = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(512, 3)
)
model = model.to(device)

# ==========================================
# 4. OPTIMIZER & LOSS
# ==========================================
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=1
)

# Standard Weights (Math-based, no manual boost)
class_counts = train_df["target"].value_counts().sort_index().values
class_weights = 1.0 / torch.tensor(class_counts, dtype=torch.float32)
class_weights = (class_weights / class_weights.sum()) * len(class_counts)
print(f"Using Standard Weights: {class_weights}")

criterion = nn.CrossEntropyLoss(weight=class_weights.to(device))

# ==========================================
# 5. ROBUST TRAINING LOOP
# ==========================================
num_epochs = 10
patience = 3
best_val_acc = -1.0
best_state_dict = None
epochs_no_improve = 0
scaler = torch.amp.GradScaler('cuda')

print(f"Starting Round 3 Training (Batch Size {BATCH_SIZE})...")

for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    train_correct = 0
    train_total = 0

    loop = tqdm(train_loader, leave=True, desc=f"Epoch {epoch+1}/{num_epochs}")
    for images, labels in loop:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)

        optimizer.zero_grad(set_to_none=True)

        with torch.amp.autocast('cuda'):
            outputs = model(images)
            loss = criterion(outputs, labels)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        # Update metrics
        train_loss += loss.item() * images.size(0)
        preds = outputs.argmax(dim=1)
        train_correct += (preds == labels).sum().item()
        train_total += labels.size(0)

        # Show real-time accuracy
        loop.set_postfix(loss=loss.item(), acc=train_correct/train_total)

    train_loss /= train_total
    train_acc = train_correct / train_total

    # Validation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0

    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device, non_blocking=True)
            labels = labels.to(device, non_blocking=True)

            with torch.amp.autocast('cuda'):
                outputs = model(images)
                loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == labels).sum().item()
            val_total += labels.size(0)

    val_loss /= val_total
    val_acc = val_correct / val_total

    scheduler.step(val_acc)

    # Save Best Model Logic (With Drive Backup)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state_dict = copy.deepcopy(model.state_dict())
        epochs_no_improve = 0

        # --- CRITICAL: Save to Drive immediately ---
        torch.save(model.state_dict(), SAVE_PATH)
        print(f"  --> New Best! Saved to Drive (Acc: {best_val_acc:.4f})")
    else:
        epochs_no_improve += 1
        print(f"  --> No improvement. Best was {best_val_acc:.4f}")

    print(
        f"Epoch [{epoch+1}/{num_epochs}] "
        f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} "
        f"| Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}"
    )

    if epochs_no_improve >= patience:
        print("Early stopping triggered")
        break

# Restore best weights
if best_state_dict is not None:
    model.load_state_dict(best_state_dict)
    print(f"Training Complete. Best model with Val Acc = {best_val_acc:.4f} is saved at {SAVE_PATH}")

Using device: cuda
Using Standard Weights: tensor([0.9313, 1.3713, 0.6974])
Starting Round 3 Training (Batch Size 64)...


Epoch 1/10: 100%|██████████| 292/292 [08:52<00:00,  1.82s/it, acc=0.659, loss=0.647]


  --> New Best! Saved to Drive (Acc: 0.6385)
Epoch [1/10] Train Loss: 0.7013 | Train Acc: 0.6594 | Val Loss: 0.6695 | Val Acc: 0.6385


Epoch 2/10: 100%|██████████| 292/292 [08:11<00:00,  1.68s/it, acc=0.701, loss=0.65]


  --> New Best! Saved to Drive (Acc: 0.6732)
Epoch [2/10] Train Loss: 0.6230 | Train Acc: 0.7011 | Val Loss: 0.6900 | Val Acc: 0.6732


Epoch 3/10: 100%|██████████| 292/292 [08:17<00:00,  1.70s/it, acc=0.712, loss=0.417]


  --> New Best! Saved to Drive (Acc: 0.6900)
Epoch [3/10] Train Loss: 0.5998 | Train Acc: 0.7121 | Val Loss: 0.6421 | Val Acc: 0.6900


Epoch 4/10: 100%|██████████| 292/292 [08:12<00:00,  1.69s/it, acc=0.722, loss=0.47]


  --> No improvement. Best was 0.6900
Epoch [4/10] Train Loss: 0.5786 | Train Acc: 0.7222 | Val Loss: 0.6636 | Val Acc: 0.6667


Epoch 5/10: 100%|██████████| 292/292 [08:19<00:00,  1.71s/it, acc=0.729, loss=0.772]


  --> New Best! Saved to Drive (Acc: 0.7035)
Epoch [5/10] Train Loss: 0.5635 | Train Acc: 0.7287 | Val Loss: 0.6236 | Val Acc: 0.7035


Epoch 6/10: 100%|██████████| 292/292 [08:07<00:00,  1.67s/it, acc=0.738, loss=0.553]


  --> No improvement. Best was 0.7035
Epoch [6/10] Train Loss: 0.5474 | Train Acc: 0.7384 | Val Loss: 0.6263 | Val Acc: 0.7007


Epoch 7/10: 100%|██████████| 292/292 [08:07<00:00,  1.67s/it, acc=0.742, loss=0.495]


  --> No improvement. Best was 0.7035
Epoch [7/10] Train Loss: 0.5384 | Train Acc: 0.7422 | Val Loss: 0.6269 | Val Acc: 0.6937


Epoch 8/10: 100%|██████████| 292/292 [08:14<00:00,  1.69s/it, acc=0.765, loss=0.474]


  --> New Best! Saved to Drive (Acc: 0.7170)
Epoch [8/10] Train Loss: 0.4913 | Train Acc: 0.7647 | Val Loss: 0.6507 | Val Acc: 0.7170


Epoch 9/10: 100%|██████████| 292/292 [08:12<00:00,  1.69s/it, acc=0.776, loss=0.461]


  --> No improvement. Best was 0.7170
Epoch [9/10] Train Loss: 0.4695 | Train Acc: 0.7762 | Val Loss: 0.6646 | Val Acc: 0.7092


Epoch 10/10: 100%|██████████| 292/292 [08:12<00:00,  1.69s/it, acc=0.784, loss=0.293]


  --> No improvement. Best was 0.7170
Epoch [10/10] Train Loss: 0.4536 | Train Acc: 0.7845 | Val Loss: 0.7029 | Val Acc: 0.6927
Training Complete. Best model with Val Acc = 0.7170 is saved at /content/drive/MyDrive/STAT362 Final Project_RSNA/densenet_round3_best.pth
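
The weights printed at the start of this run are inverse class frequencies rescaled to sum to the number of classes. A minimal pure-Python sketch of that arithmetic follows. The training-split counts are not shown in this output, so `example_counts` uses the test-set supports from the classification report as a stand-in, on the assumption that the splits have similar class proportions. `inverse_frequency_weights` is a hypothetical helper, not part of the notebook.

```python
def inverse_frequency_weights(counts):
    """Return weights proportional to 1/count, rescaled to sum to len(counts)."""
    inverse = [1.0 / c for c in counts]
    total = sum(inverse)
    return [w / total * len(counts) for w in inverse]


# Stand-in counts (assumption): test-set supports for classes 0, 1, 2.
example_counts = [1328, 902, 1773]
example_weights = inverse_frequency_weights(example_counts)
print([round(w, 4) for w in example_weights])
```

The rarest class (1, Lung Opacity) receives the largest weight, so its misclassifications contribute most to the loss.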


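To check how the patience counter behaved in this run, the early-stopping rule from the training loop can be replayed on the logged validation accuracies. This is only a sketch: `replay_early_stopping` is a hypothetical helper, not part of the notebook, and it mirrors just the `val_acc > best_val_acc` comparison and the `patience` check.

```python
def replay_early_stopping(val_accs, patience):
    """Replay best-accuracy tracking; return (best_epoch, best_acc, stopped_epoch)."""
    best_acc, best_epoch, no_improve = -1.0, None, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:
            # New best: remember it and reset the patience counter
            best_acc, best_epoch, no_improve = acc, epoch, 0
        else:
            no_improve += 1
        if no_improve >= patience:
            return best_epoch, best_acc, epoch
    # Ran through every epoch without exhausting patience
    return best_epoch, best_acc, None


# Validation accuracies copied from the training log above
logged_val_accs = [0.6385, 0.6732, 0.6900, 0.6667, 0.7035,
                   0.7007, 0.6937, 0.7170, 0.7092, 0.6927]
best_epoch, best_acc, stopped_epoch = replay_early_stopping(logged_val_accs, patience=3)
print(best_epoch, best_acc, stopped_epoch)
```

With `patience=3` the counter never exceeds 2, so all 10 epochs run and the checkpoint kept is the one from epoch 8. Validation loss, by contrast, is lowest at epoch 5 (0.6236) and rises afterwards, which suggests the later epochs are beginning to overfit.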
In [None]:
# Evaluate the restored best model on the test set:
# loss, accuracy, balanced accuracy, confusion matrix, and per-class report

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, balanced_accuracy_score

model.eval()
test_loss = 0.0
test_total = 0

all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)

        outputs = model(images)
        loss = criterion(outputs, labels)

        test_loss += loss.item() * images.size(0)
        test_total += labels.size(0)

        preds = outputs.argmax(dim=1)
        all_preds.append(preds.cpu().numpy())
        all_labels.append(labels.cpu().numpy())

test_loss /= test_total

all_preds = np.concatenate(all_preds)
all_labels = np.concatenate(all_labels)

test_acc = (all_preds == all_labels).mean()
bal_acc = balanced_accuracy_score(all_labels, all_preds)
cm = confusion_matrix(all_labels, all_preds)

print(f"  Test Loss:          {test_loss:.4f}")
print(f"  Test Accuracy:      {test_acc:.4f}")
print(f"  Balanced Accuracy:  {bal_acc:.4f}")
print("  Confusion Matrix:\n", cm)
print("\n  Classification Report:\n")
print(classification_report(all_labels, all_preds, digits=4))


  Test Loss:          0.6118
  Test Accuracy:      0.7330
  Balanced Accuracy:  0.7364
  Confusion Matrix:
 [[1225    5   98]
 [  30  593  279]
 [ 326  331 1116]]

  Classification Report:

              precision    recall  f1-score   support

           0     0.7748    0.9224    0.8422      1328
           1     0.6383    0.6574    0.6477       902
           2     0.7475    0.6294    0.6834      1773

    accuracy                         0.7330      4003
   macro avg     0.7202    0.7364    0.7245      4003
weighted avg     0.7320    0.7330    0.7281      4003
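
The reported accuracy, balanced accuracy, and per-class precision/recall can all be recomputed from the confusion matrix alone (rows are true classes, columns are predictions). The sketch below does this in plain Python; `cm_rows` is a hypothetical name holding the matrix copied from the output above.

```python
# Confusion matrix copied from the test output (rows = true class, cols = predicted)
cm_rows = [
    [1225, 5, 98],
    [30, 593, 279],
    [326, 331, 1116],
]
n = len(cm_rows)
total = sum(sum(row) for row in cm_rows)

# Recall: correct predictions divided by the row total (all true members of the class)
recalls = [cm_rows[i][i] / sum(cm_rows[i]) for i in range(n)]
# Precision: correct predictions divided by the column total (all predictions of the class)
precisions = [cm_rows[i][i] / sum(cm_rows[r][i] for r in range(n)) for i in range(n)]

accuracy = sum(cm_rows[i][i] for i in range(n)) / total
balanced_accuracy = sum(recalls) / n  # unweighted mean of per-class recall

print(f"accuracy={accuracy:.4f} balanced_accuracy={balanced_accuracy:.4f}")
for i in range(n):
    print(f"class {i}: precision={precisions[i]:.4f} recall={recalls[i]:.4f}")
```

Reading the rows directly: 279 of the 309 misclassified class-1 images are predicted as class 2, and class-2 errors are split almost evenly between class 0 (326) and class 1 (331).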



In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 1. Convert 3-Class predictions to Binary (0=Healthy, 1=Sick)
# We map classes [1, 2] -> 1 (Sick) and keep class [0] -> 0 (Healthy)
binary_preds  = np.where(all_preds == 0, 0, 1)
binary_labels = np.where(all_labels == 0, 0, 1)

# 2. Calculate Scores
bin_acc = accuracy_score(binary_labels, binary_preds)
bin_prec = precision_score(binary_labels, binary_preds)
bin_rec = recall_score(binary_labels, binary_preds)
bin_f1 = f1_score(binary_labels, binary_preds)

print("--- Binary Classification (Healthy vs. Sick) ---")
print(f"Binary Accuracy:  {bin_acc:.4f}")
print(f"Binary Precision: {bin_prec:.4f}")
print(f"Binary Recall:    {bin_rec:.4f}  <-- This is your sensitivity (ability to catch sickness)")
print(f"Binary F1 Score:  {bin_f1:.4f}")

--- Binary Classification (Healthy vs. Sick) ---
Binary Accuracy:  0.8853
Binary Precision: 0.9575
Binary Recall:    0.8669  <-- This is your sensitivity (ability to catch sickness)
Binary F1 Score:  0.9099
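
The binary scores follow from collapsing the 3-class confusion matrix into a 2x2 table, with class 0 as Healthy and classes 1 and 2 merged into Sick. A minimal sketch of that collapse is below; `cm_rows` is a hypothetical name holding the matrix copied from the test output, and specificity is added here for illustration since the cell above does not print it.

```python
# 3-class confusion matrix from the test output (rows = true class, cols = predicted)
cm_rows = [
    [1225, 5, 98],
    [30, 593, 279],
    [326, 331, 1116],
]

tn = cm_rows[0][0]                              # healthy predicted healthy
fp = sum(cm_rows[0][1:])                        # healthy predicted as class 1 or 2
fn = sum(row[0] for row in cm_rows[1:])         # sick predicted healthy
tp = sum(sum(row[1:]) for row in cm_rows[1:])   # sick predicted as class 1 or 2

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)        # sensitivity
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"acc={accuracy:.4f} prec={precision:.4f} rec={recall:.4f} "
      f"spec={specificity:.4f} f1={f1:.4f}")
```

Confusions between classes 1 and 2 (279 + 331 images) stop counting as errors once both map to Sick, which accounts for most of the gap between binary and 3-class accuracy. The remaining concern is the 356 sick images still predicted as Healthy.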
