# **📋 Pneumonia Detection with ResNet & PyTorch**

### **Project Goal**
To build a Deep Learning model that looks at Chest X-Ray images and predicts if a patient has **Pneumonia** or is **Normal**.

### **1. Imports and Setup**

We begin by importing all necessary libraries for our pneumonia detection pipeline.

**Core Libraries:**

*   **`torch`**: PyTorch's main library - provides tensor operations, neural network building blocks, and GPU acceleration.
*   **`torch.nn`**: Contains neural network layers (Conv2d, Linear, etc.) and loss functions.
*   **`torchvision`**: Specialized for computer vision - provides datasets, pre-trained models, and image transformations.
*   **`DataLoader`**: Efficiently loads data in batches, handles shuffling, and supports parallel data loading with multiple worker processes.

**Utility Libraries:**

*   **`numpy`**: Numerical operations and array manipulation.
*   **`os`**: Navigate file systems and construct file paths.
*   **`matplotlib` & `seaborn`**: Visualization tools for plotting graphs and displaying images.
*   **`pandas`**: Data manipulation (optional, for structured data analysis).
*   **`tqdm`**: Progress bars to monitor training loops.

**Why These Imports Matter:** Medical imaging requires careful data handling, visualization for debugging, and robust model training infrastructure. These libraries form the foundation of professional deep learning workflows.

In [None]:
import os
import numpy as np
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from tqdm import tqdm

The `%matplotlib inline` magic makes charts appear directly inside the notebook; it is left commented out in the cell below.

In [None]:
# %matplotlib inline

### **2. Configuration (Constants)**

Configuration constants define the hyperparameters and paths for our project. Centralizing these values makes experimentation easier.

**Key Parameters:**

*   **`IMAGE_SIZE = 128`**: All input images are resized to 128×128 pixels.
    *   *Why fixed size?* Images in a batch must share one shape to be stacked into a single tensor, and the fully connected layers expect a fixed number of input features.
    *   *Trade-off:* Larger sizes (224×224) capture more detail but require more memory and compute. 128×128 is a compromise between speed and detail.

*   **`DATA_DIR`**: Path to your dataset folder containing train/val/test subdirectories.
    *   *Google Colab:* Data stored in Google Drive at `/content/drive/MyDrive/chest_xray`
    *   *Local:* Use absolute path like `'F:/datasets/chest_xray'`

*   **`BATCH_SIZE = 64`**: Number of images processed simultaneously before updating model weights.
    *   *Larger batches:* More stable gradients, better GPU utilization, faster training.
    *   *Smaller batches:* Better generalization, less memory usage.
    *   *Optimization:* 64 fits comfortably on a T4 GPU with ~15GB memory.

**Performance Note:** These settings are tuned for Google Colab's T4 GPU. Adjust based on your hardware.
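
To make the memory side of the image-size trade-off concrete, here is a minimal sketch that computes the size of one input batch. The helper name `batch_megabytes` is hypothetical, and the figure covers only the `float32` input tensor, not the activations or gradients the model creates during training.

```python
def batch_megabytes(batch_size, channels, image_size, bytes_per_value=4):
    """Size in MiB of one float32 input batch of shape (B, C, H, W)."""
    values = batch_size * channels * image_size * image_size
    return values * bytes_per_value / (1024 ** 2)

small = batch_megabytes(64, 3, 128)   # settings used in this notebook
large = batch_megabytes(64, 3, 224)   # the larger alternative mentioned above

print(f"128x128 batch: {small:.2f} MiB")
print(f"224x224 batch: {large:.2f} MiB ({large / small:.2f}x larger)")
```

Going from 128×128 to 224×224 roughly triples the input size per batch, and intermediate feature maps grow by the same factor.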

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# CONSTANTS
IMAGE_SIZE = 128
BATCH_SIZE = 64  # ⚡ Increased from 32 for better GPU utilization
DATA_DIR = '/content/drive/MyDrive/chest_xray'  # Update this path to where you uploaded the data

In [None]:
DATA_DIR

In [None]:
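import time

print(f"🖥️  Device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")
print(f"⚡ CUDA Available: {torch.cuda.is_available()}")

# The GPU-only checks are guarded so this cell also runs on a CPU-only machine
if torch.cuda.is_available():
    print(f"📊 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

    # Rough speed test: time one 1000x1000 matrix multiplication on the GPU
    x = torch.randn(1000, 1000).cuda()
    torch.cuda.synchronize()  # make sure the tensor is ready before timing
    start = time.time()
    y = x @ x
    torch.cuda.synchronize()  # wait for the GPU kernel to finish
    print(f"⏱️  GPU Speed Test: {(time.time()-start)*1000:.2f}ms")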

### **3. Data Transforms (The Preprocessing Pipeline)**

Before feeding images into the model, we apply a series of transformations. This pipeline ensures data consistency and prepares images for neural network processing.

**Transform Steps:**

1. **`Grayscale(num_output_channels=3)`**: Converts single-channel grayscale X-rays to 3-channel images.
    *   *Why?* ResNet and most pre-trained models expect RGB images (3 channels: Red, Green, Blue).
    *   *Technical Detail:* We duplicate the grayscale channel 3 times: [Gray] → [Gray, Gray, Gray].
    *   *Alternative:* Modify the first conv layer to accept 1 channel (an option discussed in the ResNet section).

2. **`Resize((IMAGE_SIZE, IMAGE_SIZE))`**: Standardizes all images to 128×128 pixels.
    *   *Why?* CNNs require fixed input dimensions. Medical images come in various sizes.
    *   *Method:* Uses bilinear interpolation to maintain image quality.

3. **`ToTensor()`**: Converts PIL Image (pixel values 0-255) to PyTorch tensor (values 0.0-1.0).
    *   *Technical:* Changes data type from uint8 to float32.
    *   *Shape:* Converts (Height, Width, Channels) → (Channels, Height, Width) for PyTorch.

4. **`Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])`**: Centers and scales pixel values.
    *   *Formula:* `output = (input - mean) / std`
    *   *Result:* Transforms range from [0, 1] to [-1, 1].
    *   *Why?* Normalized inputs lead to:
        - Faster convergence during training
        - More stable gradients
        - Better model generalization

**Pipeline Visualization:**

```
Original X-ray (variable size, 0-255 pixels)
    ↓ Grayscale(3)
3-channel grayscale (R=G=B)
    ↓ Resize(128, 128)
Standardized size
    ↓ ToTensor()
PyTorch tensor (0.0-1.0)
    ↓ Normalize
Centered values (-1.0 to 1.0)
    ↓
Ready for Neural Network!
```
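
The arithmetic of the last two steps can be checked by hand. The following is a small sketch in plain Python, not the torchvision implementation; the helper names `to_model_range` and `to_display_range` are hypothetical and operate on a single pixel value.

```python
def to_model_range(pixel, mean=0.5, std=0.5):
    """Mimic ToTensor (divide by 255) followed by Normalize for one pixel value."""
    scaled = pixel / 255.0            # ToTensor: 0..255 -> 0.0..1.0
    return (scaled - mean) / std      # Normalize: 0.0..1.0 -> -1.0..1.0

def to_display_range(value, mean=0.5, std=0.5):
    """Invert Normalize so the value is back in 0.0..1.0 for plotting."""
    return value * std + mean

for pixel in (0, 128, 255):
    normalized = to_model_range(pixel)
    print(pixel, round(normalized, 4), round(to_display_range(normalized), 4))
```

Black (0) maps to -1.0, white (255) maps to 1.0, and mid-gray lands close to 0, which is what "centered" means here.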


In [None]:
# TRANSFORMS (grayscale, resize, to tensor, normalize)

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

### **4. Loading Data (Creating Dataset & DataLoader)**

PyTorch's data loading pipeline consists of two main components: **Datasets** (what data to load) and **DataLoaders** (how to load it efficiently).

**ImageFolder - The Automatic Labeler:**

*   **`ImageFolder`**: PyTorch's clever tool that automatically creates labels from folder structure.
*   **Expected Structure:**
    ```
    chest_xray/
    ├── train/
    │   ├── NORMAL/       ← Label 0
    │   └── PNEUMONIA/    ← Label 1
    ├── val/
    │   ├── NORMAL/
    │   └── PNEUMONIA/
    └── test/
        ├── NORMAL/
        └── PNEUMONIA/
    ```
*   **How it works:** Folder name → Class label (alphabetically sorted).
*   **Transforms:** Each image passes through our preprocessing pipeline.

**DataLoader - The Efficient Delivery System:**
*   **Purpose:** Loads data in batches, handles shuffling, and supports parallel loading.
*   **Key Parameters:**
    - **`batch_size=64`**: Process 64 images per iteration (chosen to fit GPU memory).
    - **`shuffle=True`** (Training): Randomizes order to prevent memorization. *Critical for generalization!*
    - **`shuffle=False`** (Validation/Test): Maintains consistent evaluation order.
    - **`num_workers=2`**: Uses 2 worker processes for parallel data loading (reduces GPU idle time).
    - **`pin_memory=True`**: Pins host memory to speed up CPU→GPU transfer.
    - **`persistent_workers=True`**: Keeps workers alive between epochs (avoids recreation overhead).

**Train/Val/Test Split Strategy:**
*   **Training Set:** Model learns patterns from this data (typically 70-80% of data).
*   **Validation Set:** Used during training to tune hyperparameters and check for overfitting.
*   **Test Set:** Final evaluation on unseen data - **never** used during training!

**Performance Optimizations:**
These DataLoader settings can noticeably speed up training compared to a basic configuration; the exact gain depends on your hardware and storage:
- Parallel loading prevents GPU starvation
- Pinned memory accelerates data transfer
- Persistent workers eliminate process creation overhead
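
Two rules from this section can be sketched without touching any files: the alphabetical folder-to-label mapping and the number of batches per epoch. The helpers `class_to_index` and `batches_per_epoch` are hypothetical illustrations, not torchvision functions, and the 5000-image dataset size is an assumed example.

```python
import math

def class_to_index(folder_names):
    """Map class folder names to integer labels in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(folder_names))}

def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

mapping = class_to_index(["PNEUMONIA", "NORMAL"])
print(mapping)                       # {'NORMAL': 0, 'PNEUMONIA': 1}
print(batches_per_epoch(5000, 64))   # 79 (the last batch holds only 8 images)
```

Because the mapping depends only on sorted folder names, renaming a folder can silently change which class is label 0, so it is worth printing `train_dataset.classes` after loading.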

In [None]:
# LOAD DATASETS

train_dataset = datasets.ImageFolder(os.path.join(DATA_DIR, 'train'), transform=transform)
val_dataset = datasets.ImageFolder(os.path.join(DATA_DIR, 'val'), transform=transform)
test_dataset = datasets.ImageFolder(os.path.join(DATA_DIR, 'test'), transform=transform)

# ⚡ Optimized DataLoaders for speed
train_loader = DataLoader(
    train_dataset, 
    batch_size=BATCH_SIZE, 
    shuffle=True, 
    num_workers=2,           # Parallel loading
    pin_memory=True,         # Faster GPU transfer
    persistent_workers=True  # Keep workers alive
)
val_loader = DataLoader(
    val_dataset, 
    batch_size=BATCH_SIZE, 
    shuffle=False, 
    num_workers=2, 
    pin_memory=True,
    persistent_workers=True
)
test_loader = DataLoader(
    test_dataset, 
    batch_size=BATCH_SIZE, 
    shuffle=False, 
    num_workers=2, 
    pin_memory=True
)

print("✅ Classes : ", train_dataset.classes)
print("✅ Dataset sizes : Train", len(train_dataset))
print("✅ Dataset sizes : Validation", len(val_dataset))
print("✅ Dataset sizes : Test", len(test_dataset))

In [None]:
print(len(train_dataset))

In [None]:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

### **5. Data Visualization & Class Imbalance Check**

**Why Visualize Class Distribution?**
In medical datasets, class imbalance is extremely common. A model trained on imbalanced data may develop biases.

**The Class Imbalance Problem:**
*   **Scenario:** 4000 Pneumonia images vs 1000 Normal images (4:1 ratio).
*   **Naive Model Behavior:** Always predicting "Pneumonia" gives 80% accuracy!
*   **Real Problem:** This model is useless - it can't detect healthy patients.

**Our Solution Strategy:**
1. **Visualize:** Plot class distribution to quantify imbalance.
2. **Weighted Loss:** Make the minority class "more expensive" to misclassify (implemented later).
3. **Evaluation Metrics:** Use Precision, Recall, F1-Score instead of just accuracy.

**Code Optimization:**
*   **❌ Slow Method:** `[label for _, label in dataset]` - loads all images (takes minutes)
*   **✅ Fast Method:** `dataset.targets` - accesses pre-computed labels (takes milliseconds)
*   **Performance Gain:** Orders of magnitude faster for large datasets, because no image is read from disk.

**What to Look For:**
- **Balanced:** Both bars roughly equal height → No special handling needed
- **Imbalanced:** One bar much taller → Use class weights (see Section 18)
- **Severe Imbalance:** >10:1 ratio → Consider oversampling minority class
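
The 80% figure from the scenario above can be reproduced directly. This is a minimal sketch with synthetic labels matching the 4000 vs 1000 example; `binary_metrics` is a hypothetical helper, with Pneumonia treated as the positive class (label 1).

```python
def binary_metrics(true_labels, predicted_labels, positive=1):
    """Accuracy, precision, recall and specificity for a binary problem."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

true_labels = [1] * 4000 + [0] * 1000    # 4000 Pneumonia, 1000 Normal
always_pneumonia = [1] * 5000            # the "lazy" model
metrics = binary_metrics(true_labels, always_pneumonia)
print(metrics)
```

Accuracy and recall look fine, but specificity is 0.0: not a single healthy patient is identified, which is why accuracy alone is misleading here.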

In [None]:
from collections import Counter
import matplotlib.pyplot as plt

# FAST WAY: read the labels that ImageFolder already keeps in memory
# (avoids opening every image file just to count the classes)
labels = train_dataset.targets
label_counts = Counter(labels)

class_names = train_dataset.classes
# Sort by class index so bars and colors always follow the class order
class_indices = sorted(label_counts)
class_labels = [class_names[i] for i in class_indices]
counts = [label_counts[i] for i in class_indices]

# PLOTTING
plt.figure(figsize=(6, 4))
plt.bar(class_labels, counts, color=['green', 'red'])
plt.title("Class Distribution in Training Set")
plt.xlabel("Class")
plt.ylabel("Number of Images")
plt.show()

### **6. Sanity Check (Visual Inspection of Training Data)**

**Why This Step is Critical:**

Before training, always visually inspect your data. This catches:

- Corrupted images
- Incorrect labels
- Poor image quality
- Unexpected preprocessing artifacts

**What to Look For:**
1. **Correct Labels:** Do "Pneumonia" images actually show lung opacity/consolidation?
2. **Image Quality:** Are images clear or blurry? Proper contrast?
3. **Preprocessing:** Did resize/normalization work correctly? Any distortion?
4. **Variety:** Do you see different patient positions, image qualities, disease severities?

**Understanding the Visualization:**

*   **`unnormalize`**: Reverses the normalization transform for human viewing.
    - *Recall:* We normalized with mean=0.5, std=0.5 to get range [-1, 1]
    - *Formula:* `original = (normalized * std) + mean = (pixel * 0.5) + 0.5`
    - *Result:* Converts [-1, 1] back to [0, 1] for matplotlib display
*   **Transpose Operation:** `transpose((1, 2, 0))`
    - *PyTorch Format:* (Channels, Height, Width) = (3, 128, 128)
    - *Matplotlib Format:* (Height, Width, Channels) = (128, 128, 3)
    - *Why needed?* Different libraries use different dimension ordering

**Red Flags to Watch For:**
- Images appearing completely black/white → normalization issue
- Text/markers on images → need better preprocessing
- Wrong labels → need to fix dataset
- Only one type of X-ray angle → limited dataset diversity
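
What the transpose and unnormalize steps do can be seen on a toy "image" built from nested lists. This is only an illustration of the index reordering; the notebook itself uses NumPy's `transpose`, and the helper `chw_to_hwc` is hypothetical.

```python
def chw_to_hwc(image):
    """Reorder a nested list from (channels, height, width) to (height, width, channels)."""
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    return [[[image[c][h][w] for c in range(channels)]
             for w in range(width)]
            for h in range(height)]

def unnormalize(value, mean=0.5, std=0.5):
    """Map a value from [-1, 1] back to [0, 1]."""
    return value * std + mean

# Toy 3-channel image of size 2x2 with values in [-1, 1] (all channels equal, like our X-rays)
chw = [
    [[-1.0, 0.0], [0.5, 1.0]],   # channel 0
    [[-1.0, 0.0], [0.5, 1.0]],   # channel 1
    [[-1.0, 0.0], [0.5, 1.0]],   # channel 2
]
hwc = chw_to_hwc(chw)
display = [[[unnormalize(v) for v in pixel] for pixel in row] for row in hwc]

print(len(hwc), len(hwc[0]), len(hwc[0][0]))   # 2 2 3
print(display[0][0], display[1][1])            # [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
```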

In [None]:
def show_batch(dl, class_names):
    
    # Fetch one batch from the dataloader
    images, labels = next(iter(dl))
    fig, axes = plt.subplots(3, 8, figsize=(15, 6))

    # Show images up to batch size
    num_images = len(images)
    for i, ax in enumerate(axes.flatten()):
        if i < num_images:
            img = images[i].numpy().transpose((1, 2, 0))
            
            img = (img * 0.5) + 0.5  # unnormalize 
            ax.imshow(img)
            ax.set_title(class_names[labels[i]])
            ax.axis('off')
        else:
            ax.axis('off')
    plt.suptitle("Sample Images from Training Set", fontsize=16)
    plt.tight_layout()
    plt.show()

show_batch(train_loader, train_dataset.classes)

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F


### **7. Building a Custom CNN Model**
Since we are learning, let's first build a "Brain" from scratch before using a pre-trained one.

**The Architecture:**
*   **3 Convolutional Blocks:** Each block scans the image, finds patterns, and shrinks the size.
    *   `Conv2d`: The scanner (finds edges/textures).
    *   `BatchNorm`: The stabilizer (keeps math numbers healthy).
    *   `ReLU`: The activator (allows complex patterns).
    *   `MaxPool`: The compressor (shrinks image by 2x).
*   **Classifier Head:** The final layers that make the decision.
    *   `Flatten`: Squashes the 3D feature map into a 1D list of numbers.
    *   `Linear`: Connects neurons to the final answer.
    *   `Dropout`: Randomly turns off neurons to prevent overfitting (memorization).
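
The number of inputs the first `Linear` layer needs follows from the shrinking described above. Here is a minimal sketch of that arithmetic, assuming three blocks that each end in a 2x pooling step and 128 output channels in the last block; `flattened_features` is a hypothetical helper.

```python
def flattened_features(input_size, num_blocks=3, out_channels=128):
    """Spatial side length and flattened feature count after the conv blocks."""
    size = input_size
    for _ in range(num_blocks):
        size = size // 2          # 2x max pooling rounds odd sizes down
    return size, out_channels * size * size

for input_size in (128, 150):
    side, features = flattened_features(input_size)
    print(f"input {input_size} -> {side}x{side} feature map -> {features} features")
```

For a 128×128 input this gives a 16×16 map and 32768 features; for 150×150 it gives 18×18 and 41472. A mismatch between this number and the `Linear` layer's input size is a common cause of shape errors.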

In [None]:
class PneumoniaCNN(nn.Module):
    def __init__(self, input_size=128):
        super().__init__()

        # Block 1 : Input (3 channels) -> 32 filters
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2) # Image size halves: 128 -> 64
        )

        # Block 2 : 32 filters -> 64 filters
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2) # Image size halves: 64 -> 32
        )

        # Block 3 : 64 filters -> 128 filters
        self.conv_block3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2) # Image size halves: 32 -> 16
        )

        # 🧠 Dynamic Math Calculation for Flattening
        # We need to tell the Linear layer exactly how many inputs to expect.
        # Each of the three MaxPool2d(2) layers halves the size: 128 // 8 = 16
        final_size = input_size // 8
        flattened_size = 128 * final_size * final_size

        self.fc = nn.Sequential(
            nn.Dropout(0.5), # drop 50% of neurons to avoid overfitting
            nn.Linear(flattened_size, 256),
            nn.ReLU(),
            nn.Dropout(0.5), # drop 50% of neurons to avoid overfitting
            nn.Linear(256, 2) # 2 output classes: Normal, Pneumonia
        )

    def forward(self, x):
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = self.conv_block3(x)
        x = x.view(x.size(0), -1)  # Flatten 3D tensors to 1D
        x = self.fc(x)
        return x

### **8. Training Setup (The "Teacher" & "Rules")**

*   **GPU (`cuda`):** We move the model to the graphics card, which is typically much faster than training on the CPU.
*   **Class Imbalance Handling:**
    *   *Problem:* We have way more "Pneumonia" images than "Normal". The model might become lazy and just guess "Pneumonia".
    *   *Solution:* We calculate **Weights** as the inverse of each class frequency. If "Normal" is rare, we tell the model, for example: *"Pay 3x more attention to Normal images during grading."*
*   **Optimizer (`Adam`):** The algorithm that updates the weights.
*   **Scheduler (`ReduceLROnPlateau`):** If the model stops improving (validation accuracy plateaus), this tool lowers the Learning Rate to help it find a more precise solution.
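
The inverse-frequency weighting can be worked through on the earlier 4000 vs 1000 scenario. This is a minimal sketch using synthetic labels; `inverse_frequency_weights` is a hypothetical helper that mirrors the `total / count` rule, with label 0 standing for Normal and label 1 for Pneumonia.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """One weight per class, ordered by class index: total samples / class count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [total / counts[c] for c in sorted(counts)]

labels = [0] * 1000 + [1] * 4000           # assumed example counts
weights = inverse_frequency_weights(labels)
print(weights)                              # [5.0, 1.25]
print(weights[0] / weights[1])              # 4.0
```

With these counts a misclassified Normal image costs four times as much as a misclassified Pneumonia image, which offsets the 4:1 imbalance.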

In [None]:
from collections import Counter
import torch.optim as optim

# set device (a GPU is strongly recommended for faster training)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# initialize model with correct input size
model = PneumoniaCNN(input_size=IMAGE_SIZE).to(device)
model = torch.compile(model)

# Handle Class Imbalance (The 'Weighted Loss' Strategy)
# We count samples per class to see which class is rare
class_counts = Counter(train_dataset.targets)
total = sum(class_counts.values())

# Calculate weights: inverse of frequency (rarer class -> larger weight)
weights = [total / class_counts[i] for i in range(len(class_counts))]

# Define loss function with weights
class_weights = torch.tensor(weights, dtype=torch.float32, device=device)
criterion = nn.CrossEntropyLoss(weight=class_weights) # scorecard

# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=1e-4) # updater
# mode='max' because the scheduler is stepped with validation accuracy
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=3) # learning rate scheduler

### **9. The Advanced Training Loop (Mixed Precision)**
This isn't just a basic loop. It uses **Automatic Mixed Precision (AMP)**.

*   **Concept:** Normal math uses `float32` (heavy). AMP automatically runs selected operations in `float16` (light) where that is numerically safe.
*   **Benefit:** Training is often noticeably faster on GPUs with Tensor Cores (such as the T4) and uses **less memory**.
*   **`GradScaler`:** Multiplies the loss by a large factor before `backward()` so that small gradients do not underflow to zero in `float16`, then unscales them before the optimizer step.
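
The problem that loss scaling addresses can be shown with Python's standard library alone, since `struct` supports the half-precision format. This is a sketch of the principle, not of PyTorch's implementation; the gradient value and the scale factor of 65536 are illustrative choices.

```python
import struct

def to_float16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', value))[0]

tiny_gradient = 1e-8
scale = 65536.0   # 2**16, an illustrative scale factor

unscaled = to_float16(tiny_gradient)           # too small for float16: becomes 0.0
scaled = to_float16(tiny_gradient * scale)     # large enough to be represented
recovered = scaled / scale                     # unscale in higher precision

print(unscaled)     # 0.0
print(recovered)    # close to 1e-08
```

Without scaling the gradient vanishes entirely. With scaling it survives the trip through `float16` and is recovered to within the format's rounding error.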

In [None]:
from torch.cuda.amp import GradScaler
from torch import amp

# Mixed precision is only used on CUDA; on CPU the scaler becomes a no-op
use_amp = device.type == 'cuda'
scaler = GradScaler(enabled=use_amp)

def train_model(model, loader):
    model.train() # training mode: dropout active, batchnorm uses batch statistics
    total_loss, correct, seen = 0.0, 0, 0

    pbar = tqdm(loader, desc='Training', unit='batch')

    for step, (images, labels) in enumerate(pbar, start=1):
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad() # reset gradients
        
        # ⚡ MIXED PRECISION: forward pass runs in float16 where it is safe
        with amp.autocast(device_type=device.type, enabled=use_amp):
            outputs = model(images)
            loss = criterion(outputs, labels)
        
        # scale the loss before backward(); step() unscales the gradients before the update
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        total_loss += loss.item() 
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == labels).sum().item()
        seen += labels.size(0)

        # running averages over the batches seen so far (loss is not a percentage)
        pbar.set_postfix({'loss': f'{total_loss / step:.4f}', 'accuracy': f'{(correct / seen) * 100:.2f}%'})

    accuracy = correct / len(loader.dataset)
    return total_loss / len(loader), accuracy

In [None]:
def validate_model(model, loader):
    model.eval() # evaluation mode: dropout off, batchnorm uses running statistics
    total_loss, correct, seen = 0.0, 0, 0

    pbar = tqdm(loader, desc="Validating", leave=False)

    with torch.no_grad(): # disable gradient calculations
        # ⚡ Use mixed precision for validation too (only when running on CUDA)
        with amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
            for step, (images, labels) in enumerate(pbar, start=1):
                images = images.to(device)
                labels = labels.to(device)

                outputs = model(images)
                loss = criterion(outputs, labels)

                total_loss += loss.item() 
                preds = torch.argmax(outputs, dim=1)
                correct += (preds == labels).sum().item()
                seen += labels.size(0)

                # running averages over the batches seen so far (loss is not a percentage)
                pbar.set_postfix({'loss': f'{total_loss / step:.4f}', 'accuracy': f'{(correct / seen) * 100:.2f}%'})

    accuracy = correct / len(loader.dataset)
    return total_loss / len(loader), accuracy

### **10. Running the Experiment**
We loop through **Epochs** (full passes of the dataset).
*   **Checkpointing:** We save the model (`best_pneumonia_cnn.pth`) *only* when it beats its previous best score. This ensures we keep the smartest version, not the last version.

In [None]:
NUM_EPOCHS = 5  # change to 25 for better results
best_val_accuracy = 0.0

train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []

# Note: range(NUM_EPOCHS + 1) runs epochs 0..NUM_EPOCHS, i.e. NUM_EPOCHS + 1 passes
for epoch in range(NUM_EPOCHS + 1):

    # ⚡ Train
    train_loss, train_accuracy = train_model(model, train_loader)
    
    # ⚡ Validate on even-numbered epochs and on the final epoch
    if epoch % 2 == 0 or epoch == NUM_EPOCHS:
        val_loss, val_accuracy = validate_model(model, val_loader)

        # store history (only for epochs where we validated)
        train_losses.append(train_loss)
        val_losses.append(val_loss)

        train_accuracies.append(train_accuracy)
        val_accuracies.append(val_accuracy)
        
        # adjust learning rate based on validation accuracy
        scheduler.step(val_accuracy)

        # loss is an average cross-entropy value, not a percentage
        print(f"Epoch {epoch}/{NUM_EPOCHS}  "
              f"Train Loss: {train_loss:.4f}, Acc: {train_accuracy*100:.2f}%  "
              f"Val Loss: {val_loss:.4f}, Acc: {val_accuracy*100:.2f}%", flush=True)
        
        # Save best model
        if val_accuracy > best_val_accuracy:
            best_val_accuracy = val_accuracy
            torch.save(model.state_dict(), 'best_pneumonia_cnn.pth')
            print("✅ New Best Model Saved!", flush=True)
    else:
        print(f"Epoch {epoch}/{NUM_EPOCHS}  "
              f"Train Loss: {train_loss:.4f}, Acc: {train_accuracy*100:.2f}%", flush=True)

print(f"🎯 Best validation Accuracy: {best_val_accuracy*100:.2f}%")

### **11. Performance Analysis**
*   **Loss Graph:** Should go **DOWN**. If Validation Loss goes UP while Training Loss goes DOWN, you are **Overfitting**.
*   **Accuracy Graph:** Should go **UP**.
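
The overfitting pattern described above can also be checked programmatically. This is a minimal sketch on made-up loss values; `first_overfitting_epoch` is a hypothetical helper, and a single noisy epoch can trigger it, so in practice you would look for a sustained trend.

```python
def first_overfitting_epoch(train_losses, val_losses):
    """Index of the first epoch where val loss rises while train loss falls, else None."""
    for epoch in range(1, len(train_losses)):
        train_falling = train_losses[epoch] < train_losses[epoch - 1]
        val_rising = val_losses[epoch] > val_losses[epoch - 1]
        if train_falling and val_rising:
            return epoch
    return None

# Hypothetical curves, not results from this notebook
train_curve = [0.60, 0.45, 0.35, 0.28, 0.22]
val_curve = [0.62, 0.50, 0.44, 0.47, 0.53]
print(first_overfitting_epoch(train_curve, val_curve))   # 3
```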

In [None]:
# Create epoch indices that match validation points
validation_epochs = [i for i in range(NUM_EPOCHS + 1) if i % 2 == 0 or i == NUM_EPOCHS]

plt.figure(figsize=(12, 5))

# Plot Loss
plt.subplot(1, 2, 1)
plt.plot(validation_epochs, train_losses, label='Train Loss', marker='o')
plt.plot(validation_epochs, val_losses, label='Validation Loss', marker='o')
plt.title('Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot Accuracy
plt.subplot(1, 2, 2)
plt.plot(validation_epochs, train_accuracies, label='Train Accuracy', marker='o')
plt.plot(validation_epochs, val_accuracies, label='Validation Accuracy', marker='o')
plt.title('Accuracy over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### **12. Transfer Learning with ResNet18**

**Why Transfer Learning?**
Custom CNNs are good for learning, but pre-trained models usually perform better in real-world applications, especially when labeled data is limited.

**ResNet18 Advantages:**
*   **Pre-trained on ImageNet:** Already learned to detect edges, textures, shapes from 1.2M images across 1000 categories.
*   **Skip Connections:** Residual connections prevent vanishing gradients, enabling deeper networks.
*   **Proven Architecture:** A widely used and well-tested baseline across many computer vision benchmarks.
*   **Transfer Learning:** We leverage its learned features and fine-tune for medical imaging.

**Our Transfer Learning Strategy:**
1. **Load Pre-trained Weights:** Start with ImageNet knowledge (general visual features)
2. **Modify Architecture:** Replace final layer for our 2-class problem
3. **Fine-tune:** Train on chest X-rays while keeping learned features

**Data Augmentation - Teaching Robustness:**
*   **Purpose:** Artificially expand training data to prevent overfitting.
*   **Medical Image Considerations:** Be conservative! Extreme augmentations may introduce unrealistic artifacts.

**Safe Augmentations for X-rays:**
*   **`RandomHorizontalFlip(p=0.5)`**: Mirrors image left↔right
    - *Medical validity:* Pneumonia can affect either lung, so a mirrored image remains plausible (note that the flip also mirrors the heart position)
    - *50% chance:* Adds training variety without distortion
*   **`RandomRotation(degrees=3)`**: Slight rotation (±3°)
    - *Why small angle?* Patient positioning varies slightly
    - *Why not more?* Large rotations are unrealistic in medical imaging

**⚠️ Augmentations to AVOID for Medical Imaging:**
- **Color Jittering:** X-rays are grayscale - intensity changes can hide pathology
- **Vertical Flips:** Anatomically incorrect (lungs always at top)
- **Large Rotations:** Unrealistic patient positioning
- **Extreme Crops:** May cut off diagnostic regions

**Train vs Val/Test Transforms:**
*   **Training:** Includes augmentation to create variation and prevent overfitting
*   **Validation/Test:** NO augmentation - evaluate on original, unmodified images
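
To put a number on "small" versus "large" rotations, the sketch below computes how far a corner pixel of a 128×128 image moves when the image is rotated about its centre. The helper `corner_shift_pixels` is hypothetical and uses plain geometry (the chord length of the rotation arc); it is not part of torchvision.

```python
import math

def corner_shift_pixels(image_size, degrees):
    """Distance a corner pixel moves when the image is rotated about its centre."""
    radius = (image_size / 2) * math.sqrt(2)                   # centre-to-corner distance
    return 2 * radius * math.sin(math.radians(degrees) / 2)    # chord length

for angle in (3, 30):
    print(f"{angle} degrees -> corner moves {corner_shift_pixels(128, angle):.2f} px")
```

A 3° rotation shifts the corners by under 5 pixels, comparable to slight differences in patient positioning, while 30° shifts them by about 47 pixels, more than a third of the image width.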

In [None]:
IMAGE_SIZE = 128

# new augmentation pipeline

train_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.RandomHorizontalFlip(p=0.5),

    transforms.RandomRotation(degrees=3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

In [None]:
test_val_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

In [None]:
# ⚠️ NOTE: This cell redefines datasets - only run if you want to use augmented transforms
# Otherwise, use the datasets already loaded in cell 15

# Use the DATA_DIR from earlier (Google Drive path)
train_dir = os.path.join(DATA_DIR, 'train')
val_dir = os.path.join(DATA_DIR, 'val')
test_dir = os.path.join(DATA_DIR, 'test')

train_dataset = datasets.ImageFolder(train_dir, transform=train_transform)
val_dataset = datasets.ImageFolder(val_dir, transform=test_val_transform)
test_dataset = datasets.ImageFolder(test_dir, transform=test_val_transform)

BATCH_SIZE = 64  # Match the optimized batch size
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, 
                         num_workers=2, pin_memory=True, persistent_workers=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False,
                       num_workers=2, pin_memory=True, persistent_workers=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False,
                        num_workers=2, pin_memory=True)

class_names = train_dataset.classes
print("✅ Classes : ", class_names)

In [None]:
# get a batch of training data

images, labels = next(iter(train_loader))

# denormalize images for visualization
def denormalize(img):
    img = img * 0.5 + 0.5  # unnormalize
    img = img.numpy().transpose((1, 2, 0))
    return img


# plot a grid of images
plt.figure(figsize=(12, 6))
for i in range(min(len(images), 16)):  # the batch may hold fewer than 16 images
    plt.subplot(4, 4, i + 1)
    plt.imshow(denormalize(images[i]))
    plt.title(class_names[labels[i]])
    plt.axis('off')

plt.tight_layout()
plt.show()

### **13. Loading & Modifying ResNet18**


**Understanding the Modification:**

ResNet18 was designed for ImageNet (1000 object categories: cats, dogs, cars, etc.). We need to adapt it for our binary classification task (Normal vs Pneumonia).

**Modification Steps:**

1. **Load Pre-trained Model:**
   ```python
   resnet = models.resnet18(weights="IMAGENET1K_V1")
   ```
   - Downloads model with ImageNet weights (~46MB)
   - Contains learned features from 1.2 million images

2. **Modify First Layer (Optional for Grayscale):**
   ```python
   resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
   ```
   - **Original:** Expects 3-channel RGB input
   - **Modified:** Accepts 1-channel grayscale input (the new layer starts from random weights, so the pre-trained `conv1` filters are lost)
   - **Alternative:** Keep 3 channels and duplicate grayscale (our current approach)

3. **Replace Classification Head:**
   ```python
   resnet.fc = nn.Linear(num_features, 2)
   ```
   - **Original:** `Linear(512, 1000)` - outputs 1000 ImageNet classes
   - **Modified:** `Linear(512, 2)` - outputs 2 classes (Normal, Pneumonia)
   - **Why 512?** ResNet18's final feature layer outputs 512-dimensional vectors

**Transfer Learning Trade-off:**

*   **Frozen Layers:** Keep pre-trained weights → Faster training, less data needed
*   **Fine-tuning (our approach):** Train all layers → Better accuracy, needs more data
*   **Hybrid:** Freeze early layers, train later layers → Balance of both

**Architecture Comparison:**

```
ImageNet ResNet18:        Medical ResNet18:
Input: RGB (3×224×224)  →  Input: Grayscale (3×128×128)
  ↓ conv layers (frozen)      ↓ conv layers (fine-tuned)
Features: 512-dim        →  Features: 512-dim
  ↓ FC: 512→1000            ↓ FC: 512→2
Output: 1000 classes     →  Output: 2 classes
```
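
The size of the layers being swapped can be worked out by hand. The sketch below counts trainable parameters from layer shapes alone; `linear_params` and `conv2d_params` are hypothetical helpers that apply the standard formulas (weights plus optional biases) and do not load any model.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameter count of a square-kernel 2D convolution."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

original_head = linear_params(512, 1000)            # ImageNet classifier
new_head = linear_params(512, 2)                    # Normal vs Pneumonia
rgb_conv1 = conv2d_params(3, 64, 7, bias=False)     # pre-trained first layer
gray_conv1 = conv2d_params(1, 64, 7, bias=False)    # 1-channel replacement

print(original_head, new_head)   # 513000 1026
print(rgb_conv1, gray_conv1)     # 9408 3136
```

The new head has only 1026 parameters to learn from scratch, which is one reason fine-tuning works with a comparatively small medical dataset.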

In [None]:
from torchvision import models

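# load pre-trained ResNet18 model
resnet = models.resnet18(weights="IMAGENET1K_V1")

# Our transforms already emit 3-channel images (Grayscale(num_output_channels=3)),
# so the pre-trained 3-channel conv1 is kept. A 1-channel conv1 would fail on this data.
# Only enable the line below if the transforms are changed to produce 1-channel tensors
# (doing so discards conv1's pre-trained weights):
# resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

num_features = resnet.fc.in_features
resnet.fc = nn.Linear(num_features, 2)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet = resnet.to(device)

print("🤖 ResNet18 Loaded and Modified!")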

### **14. Advanced Training: Early Stopping & Checkpointing**

**The Overfitting Problem:**
Training too long causes the model to memorize training data instead of learning general patterns. This leads to poor performance on new patients.

**Our Solution: Two Guardian Mechanisms**

**1. Early Stopping - The Safety Brake**
*   **How it works:** Monitor validation accuracy each epoch. If it doesn't improve for `PATIENCE` consecutive epochs, stop training.
*   **Logic:**
    ```
    If validation accuracy improves:
        → Reset patience counter
        → Save model checkpoint
    Else:
        → Increment patience counter
        → If counter >= PATIENCE: STOP!
    ```
*   **Benefits:**
    - Prevents wasted computation on declining performance
    - Automatically finds optimal training duration
    - Protects against overfitting
*   **Setting PATIENCE:**
    - Too small (1-2): May stop too early, missing potential improvements
    - Too large (10+): Wastes time, allows overfitting
    - **Sweet spot: 3-5 epochs** for most medical imaging tasks

**2. Model Checkpointing - Saving the Best Version**
*   **Problem:** The final epoch model isn't always the best!
*   **Solution:** Save model weights every time we achieve new best validation accuracy.
*   **File:** `best_resnet18_pneumonia.pth` contains best model state.

**Learning Rate Scheduling - Adaptive Optimization**
*   **ReduceLROnPlateau:** Automatically lowers learning rate when progress stalls.
*   **Analogy:** Like slowing down a car when approaching a parking spot for precise positioning.
*   **Settings:**
    - `mode='max'`: Monitor increasing metric (accuracy)
    - `factor=0.5`: Cut learning rate in half when stuck
    - `patience=3`: Wait 3 epochs before reducing
    - `min_lr=1e-6`: Never go below this threshold

**Training Flow Diagram:**

```
Start Epoch
  ↓
Train on Training Set
  ↓
Evaluate on Validation Set
  ↓
Validation Accuracy Improved?
  ├─ YES → Save Checkpoint, Reset Patience
  └─ NO  → Increment Patience Counter
         ↓
    Patience >= PATIENCE?
      ├─ YES → Early Stop! Load Best Checkpoint
      └─ NO  → Continue to Next Epoch
```

**Key Variables Explained:**
*   **`NUM_EPOCHS = 25`:** Maximum training iterations (may stop early)
*   **`PATIENCE = 5`:** Tolerance for non-improvement (5 consecutive bad epochs)
*   **`best_val_accuracy`:** Tracks highest validation score achieved
*   **`epochs_no_improve`:** Counts consecutive epochs without improvement
*   **`best_model_wts`:** Stores parameters of best-performing model
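
The early-stopping logic can be traced without training anything by replaying a list of validation accuracies. This is a minimal sketch on made-up values; `run_early_stopping` is a hypothetical helper that follows the rule described above (reset the counter on improvement, stop once it reaches the patience).

```python
def run_early_stopping(val_accuracies, patience):
    """Return (best_epoch, best_accuracy, last_epoch_run) for a sequence of scores."""
    best_accuracy = 0.0
    best_epoch = None
    epochs_no_improve = 0
    for epoch, accuracy in enumerate(val_accuracies):
        if accuracy > best_accuracy:
            best_accuracy = accuracy       # new best: this is where a checkpoint is saved
            best_epoch = epoch
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                return best_epoch, best_accuracy, epoch   # stop early
    return best_epoch, best_accuracy, len(val_accuracies) - 1

# Hypothetical validation accuracies, not results from this notebook
history = [0.80, 0.85, 0.88, 0.87, 0.88, 0.86, 0.87, 0.85, 0.90]
best_epoch, best_accuracy, stopped_at = run_early_stopping(history, patience=5)
print(best_epoch, best_accuracy, stopped_at)   # 2 0.88 7
```

With `patience=5` training stops at epoch 7 and keeps the epoch-2 checkpoint, so the later 0.90 is never reached. This illustrates the trade-off in choosing `PATIENCE`: a larger value would have found the better score at the cost of more epochs.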

In [None]:
# Training Loop with early stopping and checkpointing

import copy

import torch.optim as optim  # needed for Adam and the LR scheduler below

# CONFIGURATIONS
NUM_EPOCHS = 25
PATIENCE = 5  # for early stopping

# path where the best model checkpoint is saved
BEST_MODEL_PATH = 'best_resnet18_pneumonia.pth'

# Loss, Optimizer, Scheduler
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet.parameters(), lr=1e-3)

# Scheduler: "Reduce learning rate on plateau"
# if validation accuracy stops improving (patience = 3 epochs), cut LR by half (factor = 0.5)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=3, min_lr=1e-6)

# logging
train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []

# early stopping 
best_val_accuracy = 0.0
epochs_no_improve = 0
best_model_wts = copy.deepcopy(resnet.state_dict())

for epoch in range(NUM_EPOCHS):
    resnet.train()
    train_loss, correct, total = 0.0, 0, 0

    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = resnet(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # loss.item() is the batch mean, so weight it by the batch size
        train_loss += loss.item() * images.size(0)
        preds = outputs.max(1)[1]
        correct += preds.eq(labels).sum().item()
        total += labels.size(0)

    # average loss per sample and accuracy for this epoch
    train_loss /= total
    train_accuracy = correct / total
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)

    # validation phase
    resnet.eval()
    val_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device)
            labels = labels.to(device)

            outputs = resnet(images)
            loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            preds = outputs.max(1)[1]
            correct += preds.eq(labels).sum().item()
            total += labels.size(0)

    val_loss /= total
    val_accuracy = correct / total
    val_losses.append(val_loss)
    val_accuracies.append(val_accuracy)

    # the scheduler monitors validation accuracy (mode='max')
    scheduler.step(val_accuracy)

    # loss is a raw value, not a percentage; only accuracy is shown in %
    print(f"Epoch {epoch+1}/{NUM_EPOCHS}  Train Loss: {train_loss:.4f}, Acc: {train_accuracy*100:.2f}%  Val Loss: {val_loss:.4f}, Acc: {val_accuracy*100:.2f}%")

    # check for improvement
    if val_accuracy > best_val_accuracy:
        best_val_accuracy = val_accuracy
        best_model_wts = copy.deepcopy(resnet.state_dict())
        torch.save(resnet.state_dict(), BEST_MODEL_PATH)
        epochs_no_improve = 0
        print("📌 Best model saved.")
    else:
        epochs_no_improve += 1

        # early stopping
        if epochs_no_improve >= PATIENCE:
            print("⏹️ Early stopping triggered.")
            break

# load best model weights
resnet.load_state_dict(best_model_wts)
print(f"✅ Training completed. Best Validation Accuracy: {best_val_accuracy*100:.2f}%")
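
To see how the plateau scheduler used in the training cell reacts to a stalled metric, the sketch below re-implements its core rule in plain Python. It is a simplified approximation, not PyTorch's code: it ignores the `threshold`, `cooldown`, and `eps` options, and the function name `simulate_plateau_lr` and the demo accuracies are hypothetical.

```python
def simulate_plateau_lr(metrics, lr=1e-3, factor=0.5, patience=3, min_lr=1e-6):
    """Return the learning rate in effect after each epoch (mode='max')."""
    best = float("-inf")
    num_bad_epochs = 0
    lrs = []
    for metric in metrics:
        if metric > best:
            best = metric
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        # The cut fires once the bad-epoch count EXCEEDS patience.
        if num_bad_epochs > patience:
            lr = max(lr * factor, min_lr)
            num_bad_epochs = 0
        lrs.append(lr)
    return lrs


# Hypothetical validation accuracies (illustration only).
val_accuracies_demo = [0.90, 0.91, 0.91, 0.90, 0.91, 0.90, 0.92]
lr_trace = simulate_plateau_lr(val_accuracies_demo)
for epoch, lr in enumerate(lr_trace, start=1):
    print(f"epoch {epoch}: lr = {lr:g}")
```

Under this rule, with `patience=3` the learning rate is halved on the fourth consecutive epoch without improvement (epoch 6 in the demo), and `min_lr` acts as a floor for repeated cuts.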

### **15. Visualizing Training Performance**

After training, we must diagnose the model's "health" using graphs.

*   **Loss Graph (Left):** Both lines should go **DOWN**.
    *   *Red Flag:* If Training Loss goes down but Validation Loss goes UP, the model is **Overfitting**.
*   **Accuracy Graph (Right):** Both lines should go **UP**.
    *   *Goal:* We want the Orange line (Validation) to be as high as possible.
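
The overfitting red flag can also be checked numerically rather than by eye. The sketch below is one simple heuristic, not a standard API: the function name `find_overfitting_onset` and the demo loss values are hypothetical. It reports the epoch with the lowest validation loss and whether validation loss ended higher while training loss kept falling.

```python
def find_overfitting_onset(train_losses, val_losses):
    """Return (best_epoch, diverged).

    best_epoch is the 1-based epoch with the lowest validation loss.
    diverged is True when, after that epoch, validation loss ended higher
    while training loss ended lower (the classic overfitting pattern).
    """
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    best_epoch = best_index + 1
    diverged = (
        best_epoch < len(val_losses)
        and val_losses[-1] > val_losses[best_index]
        and train_losses[-1] < train_losses[best_index]
    )
    return best_epoch, diverged


# Hypothetical loss curves (illustration only).
demo_train = [0.60, 0.40, 0.30, 0.22, 0.15, 0.10]
demo_val = [0.55, 0.42, 0.35, 0.36, 0.40, 0.47]

best_epoch, diverged = find_overfitting_onset(demo_train, demo_val)
print(f"lowest validation loss at epoch {best_epoch}, overfitting pattern: {diverged}")
```

With the lists recorded during training, the same call would be `find_overfitting_onset(train_losses, val_losses)`.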

In [None]:
epoch_range = range(1, len(train_losses) + 1)

plt.figure(figsize=(12, 5))

# loss plot
plt.subplot(1, 2, 1)
plt.plot(epoch_range, train_losses, label='Train Loss', color='blue')
plt.plot(epoch_range, val_losses, label='Validation Loss', color='orange')
plt.title('Loss over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

# accuracy plot
plt.subplot(1, 2, 2)
plt.plot(epoch_range, train_accuracies, label='Train Accuracy', color='blue')
plt.plot(epoch_range, val_accuracies, label='Validation Accuracy', color='orange')
plt.title('Accuracy over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

### **16. Deep Dive Evaluation**
Accuracy isn't everything. In medicine, **False Negatives** (missed pneumonia) are dangerous.

*   **Precision:** "Of all the cases predicted as Pneumonia, how many were actually Pneumonia?"
*   **Recall (Sensitivity):** "Of all the actual Pneumonia cases, how many did we find?" (We want this high!).
*   **F1-Score:** The harmonic mean of Precision and Recall.

**Confusion Matrix:**
*   **Diagonal:** Correct predictions.
*   **Off-Diagonal:** Mistakes. Look closely at the bottom-left box (False Negatives).
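
As a worked example of these definitions, the sketch below computes the metrics by hand from a 2x2 confusion matrix. The counts are hypothetical, and the layout assumes rows are true classes and columns are predicted classes in the order Normal, Pneumonia, so the bottom-left cell holds the False Negatives.

```python
def binary_metrics(cm):
    """Compute metrics for the positive class (Pneumonia).

    cm is [[TN, FP], [FN, TP]]: rows = true class, columns = predicted class.
    """
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)   # of predicted Pneumonia, how many were right
    recall = tp / (tp + fn)      # of actual Pneumonia, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts (illustration only): 10 missed pneumonia cases.
demo_cm = [[80, 20],
           [10, 90]]

metrics = binary_metrics(demo_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here recall (0.900) is higher than precision (0.818): the hypothetical model misses few pneumonia cases at the cost of more false alarms, which is usually the preferred direction for a screening tool.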

In [None]:
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

# put model in evaluation mode
resnet.eval()

# collect all predictions and true labels
y_true = []
y_pred = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = resnet(images)
        preds = outputs.max(1)[1]

        y_true.extend(labels.cpu().numpy())
        y_pred.extend(preds.cpu().numpy())

# generate classification report
class_names = ['Normal', 'Pneumonia']
print("📊 Classification Report:\n")
print(classification_report(y_true, y_pred, target_names=class_names))


# plot confusion matrix
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(cmap=plt.cm.Blues, values_format='d')
plt.title("Confusion Matrix")
plt.show()

### **17. Visualizing Model Mistakes**
It is crucial to see *what* the model is getting wrong.

*   **Correct Predictions:** Images where the model and doctor agreed.
*   **Incorrect Predictions:** Images where the model failed.
*   **Analysis:** Look at the incorrect images. Are they blurry? Is there a medical device obstructing the lung? This helps you understand your data quality.

1. Generate predictions
2. Sort into correct vs. incorrect indices
3. Display them

In [None]:
# step 1. generate predictions on the test set
# (the indices below match test_dataset only if test_loader was built with shuffle=False)
resnet.eval()

all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = resnet(images)
        preds = outputs.max(1)[1]

        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())


# step 2. Find correct and incorrect predictions
all_preds = np.array(all_preds)
all_labels = np.array(all_labels)

correct_indices = np.where(all_preds == all_labels)[0]
incorrect_indices = np.where(all_preds != all_labels)[0]


# step 3. helper that displays up to n images from a list of dataset indices
def show_images(indices, title, n=16):
    plt.figure(figsize=(12, 6))
    for i, idx in enumerate(indices[:n]):
        image, label = test_dataset[idx]
        image = denormalize(image)

        plt.subplot(4, 4, i + 1)
        plt.imshow(image)
        plt.title(f"Pred: {class_names[all_preds[idx]]}\nTrue: {class_names[label]}")
        plt.axis('off')

    plt.suptitle(title, fontsize=16)
    plt.tight_layout()
    plt.show()

# step 4. Show correct and incorrect predictions
show_images(correct_indices, "✅ Correct Predictions")
show_images(incorrect_indices, "❌ Incorrect Predictions")

### **18. Handling Class Imbalance (The Professional Way)**

**The Problem:** If we have 4000 Pneumonia images and only 1000 Normal images, the model naturally becomes biased towards predicting "Pneumonia".

**The Solution: Class Weighting**
Instead of deleting data, we make the "Normal" images **more expensive** to get wrong.

*   **`compute_class_weight`:** With `class_weight='balanced'`, this scikit-learn function weights each class inversely to its frequency: `n_samples / (n_classes * class_count)`. If "Normal" is 4x rarer, its errors count 4x more.
*   **Weighted Loss:** We pass these weights into `CrossEntropyLoss`.
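
To check the 4x claim by hand, the sketch below reproduces the "balanced" formula in plain Python for the 1000 vs. 4000 example. The helper `balanced_class_weights` is hypothetical and only mirrors the formula `n_samples / (n_classes * class_count)`; the label encoding 0 = Normal, 1 = Pneumonia is an assumption.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Hypothetical training labels: 1000 Normal (0) and 4000 Pneumonia (1).
demo_labels = [0] * 1000 + [1] * 4000

weights = balanced_class_weights(demo_labels)
print(weights)                              # {0: 2.5, 1: 0.625}
print("ratio:", weights[0] / weights[1])    # ratio: 4.0
```

A mistake on a Normal image therefore contributes four times as much to the loss as a mistake on a Pneumonia image, which offsets the 1:4 imbalance.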

In [None]:
from sklearn.utils.class_weight import compute_class_weight
import numpy as np

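# 1. Calculate weights automatically based on the training data
# (iterating the dataset loads every image; if train_dataset is a torchvision
#  ImageFolder, `train_dataset.targets` holds the same labels without loading images)
train_labels = [label for _, label in train_dataset]
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(train_labels), y=train_labels)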

# 2. Convert to Tensor and move to GPU
class_weights = torch.tensor(class_weights, dtype=torch.float).to(device)

print("Class Weights:", class_weights)

In [None]:
# 3. update the loss function
criterion = nn.CrossEntropyLoss(weight=class_weights)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Loss function with class weights
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Optimizer and Scheduler
# NOTE: `model` is assumed to be the ResNet-18 instance (called `resnet` in earlier cells)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.3, patience=2, min_lr=1e-6)

# Early stopping setup
best_val_acc = 0
early_stop_counter = 0
PATIENCE = 5
EPOCHS = 20

train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []

for epoch in range(EPOCHS):
    model.train()
    running_loss, correct, total = 0.0, 0, 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        _, preds = torch.max(outputs, 1)
        total += labels.size(0)
        correct += preds.eq(labels).sum().item()

    train_loss = running_loss / total
    train_acc = correct / total
    train_losses.append(train_loss)
    train_accuracies.append(train_acc)

    # Validation
    model.eval()
    val_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            _, preds = torch.max(outputs, 1)
            total += labels.size(0)
            correct += preds.eq(labels).sum().item()

    val_loss /= total
    val_acc = correct / total
    val_losses.append(val_loss)
    val_accuracies.append(val_acc)

    # loss is a raw value, not a percentage; only accuracy is shown in %
    print(f"Epoch {epoch+1}/{EPOCHS}  Train Loss: {train_loss:.4f}, Acc: {train_acc*100:.2f}%  Val Loss: {val_loss:.4f}, Acc: {val_acc*100:.2f}%")

    # Learning rate update
    scheduler.step(val_acc)

    # Checkpoint and early stopping
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save(model.state_dict(), "resnet18_pneumonia_best.pth")
        print("📌 Best model saved.")
        early_stop_counter = 0
    else:
        early_stop_counter += 1
        if early_stop_counter >= PATIENCE:
            print("⏹️ Early stopping triggered.")
            break

print("✅ Training complete.")
# Load best model
model.load_state_dict(torch.load("resnet18_pneumonia_best.pth"))


In [None]:
epochs_range = range(1, len(train_losses) + 1)

plt.figure(figsize=(14, 5))

# 📉 Loss Plot
plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_losses, label='Train Loss', color='blue')
plt.plot(epochs_range, val_losses, label='Val Loss', color='orange')
plt.title("Loss Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)

# 📈 Accuracy Plot
plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_accuracies, label='Train Accuracy', color='blue')
plt.plot(epochs_range, val_accuracies, label='Val Accuracy', color='orange')
plt.title("Accuracy Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()


In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Evaluation
model.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        all_preds.extend(predicted.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Print classification report
print(classification_report(all_labels, all_preds, target_names=class_names))

# Plot confusion matrix
cm = confusion_matrix(all_labels, all_preds)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.title("Confusion Matrix on Test Set")
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()

### **19. Inference (Using the Model)**

This is the final step: taking a raw image file from the real world (e.g., a new patient scan) and getting a diagnosis.

**Key Step:** You **must** apply the exact same transforms (Resize, Normalize) to the single image that you used during training.

*   **`unsqueeze(0)`**: PyTorch expects a batch of images (Batch, Channel, Height, Width). A single image is just (C, H, W). We add a fake "Batch dimension" of 1 to make it (1, C, H, W).
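
The shape change is easy to verify on a dummy tensor. This is a minimal sketch: the 224x224 size is an assumption for illustration (the notebook's actual `IMAGE_SIZE` may differ), and the tensor is all zeros rather than a real X-ray.

```python
import torch

# A single 3-channel image as produced by ToTensor(): (C, H, W).
# 224 is a placeholder size, not necessarily the notebook's IMAGE_SIZE.
image_tensor = torch.zeros(3, 224, 224)

# Add a batch dimension of size 1 at position 0: (1, C, H, W).
batch = image_tensor.unsqueeze(0)

# squeeze(0) removes it again, recovering the original shape.
restored = batch.squeeze(0)

print(tuple(image_tensor.shape))
print(tuple(batch.shape))
print(tuple(restored.shape))
```

No data is copied or changed by `unsqueeze(0)`; only the tensor's shape gains a leading axis, which is what the model's forward pass expects.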

In [None]:
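from PIL import Image
from torchvision import models

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# rebuild the architecture, then load the fine-tuned weights
# ASSUMPTION: the classifier head used in training was a single Linear layer
# with 2 outputs (Normal, Pneumonia); change this to match your training setup,
# otherwise load_state_dict fails with a size mismatch on `fc`
resnet = models.resnet18(weights=None)  # ImageNet weights not needed; the checkpoint overwrites them
resnet.fc = nn.Linear(resnet.fc.in_features, 2)
resnet.load_state_dict(torch.load("resnet18_pneumonia_best.pth", map_location=device))
resnet = resnet.to(device)
resnet.eval()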


# predict the single image

def predict_image(image_path):
    image = Image.open(image_path).convert("L")  # convert to grayscale
    image_tensor = transform(image).unsqueeze(0).to(device)  # add batch dimension and move to device

    with torch.no_grad():
        outputs = resnet(image_tensor)
        _, predicted = torch.max(outputs, 1)

    class_names = ['Normal', 'Pneumonia']
    prediction = class_names[predicted.item()]

    # show image with prediction
    plt.imshow(image, cmap='gray')
    plt.title(f"Prediction: {prediction}")
    plt.axis('off')
    plt.show()

    return prediction

# change path to your image
predict_image('/content/drive/MyDrive/chest_xray/test/NORMAL/IM-0001-0001.jpeg')  