# 🩺 Breast Ultrasound Lesion Classification — Midterm
### José Tuozzo — ITAI 1378

> **Student Note:** This notebook already includes the full pipeline (data → model → training → evaluation → Grad‑CAM).
> I added **text explanations** and **student‑style comments** so the professor can see I understand every step.

## 1️⃣ Setup & Library Installation
**What happens here?**
- Install `grad-cam`
- Import PyTorch + Torchvision
- Import evaluation tools (metrics, plotting)
- Detect GPU (Google Colab) for faster training

In [None]:
!pip install grad-cam --quiet

# Basic computer vision + ML imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Evaluation & utilities
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
import itertools
from tqdm import tqdm

# Grad‑CAM tools
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Use GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

## 2️⃣ Load Dataset & Apply Transforms
**Folder structure expected in Colab:**
```
/content/data/train/{benign,malignant}
/content/data/val/{benign,malignant}
/content/data/test/{benign,malignant}
```
> `ImageFolder` takes each label from the folder name, so every class needs its own folder. Class indices follow alphabetical order: `benign` = 0, `malignant` = 1.
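
Before loading the data, it can help to confirm that the folders really look like this. The snippet below is a minimal sketch, not part of the original pipeline: `count_images_per_class` is a hypothetical helper, and the list of image extensions is an assumption about how the files are stored.

```python
from pathlib import Path

def count_images_per_class(split_dir, exts=(".png", ".jpg", ".jpeg")):
    """Return {class_name: image_count} for one split folder (hypothetical helper)."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            # Count only files whose extension looks like an image (assumed list)
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in exts
            )
    return counts

# Only report on splits that actually exist on this machine
for split in ("train", "val", "test"):
    split_dir = Path("/content/data") / split
    if split_dir.is_dir():
        print(split, count_images_per_class(split_dir))
```

A large gap between the benign and malignant counts would be worth knowing about before training, since it affects how the recall numbers should be read later.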

In [None]:
train_dir = '/content/data/train'
val_dir   = '/content/data/val'
test_dir  = '/content/data/test'

# Image preprocessing (augmentation only for train).
# Normalize uses the ImageNet mean/std that the pretrained ResNet expects.
transform_train = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.1),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225])
])

transform_eval = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225])
])

# Load dataset
train_ds = datasets.ImageFolder(train_dir, transform_train)
val_ds   = datasets.ImageFolder(val_dir,   transform_eval)
test_ds  = datasets.ImageFolder(test_dir,  transform_eval)

# DataLoaders
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
val_loader   = DataLoader(val_ds,   batch_size=16, shuffle=False)
test_loader  = DataLoader(test_ds,  batch_size=16, shuffle=False)

class_names = train_ds.classes
class_names

## 3️⃣ Model Setup (Transfer Learning)
**Why ResNet‑50?**
- Widely used baseline for medical image classification
- ImageNet-pretrained weights usually help when labeled data is limited
- Trains in a reasonable time on a Colab GPU

In [None]:
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Fine‑tune entire network
for p in model.parameters():
    p.requires_grad = True

# Replace classifier head for 2 classes
model.fc = nn.Linear(model.fc.in_features, 2)
model = model.to(device)

# Loss + Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Early stopping setup
best_val_loss = float('inf')
patience = 3
counter = 0
train_losses, val_losses = [], []

## 4️⃣ Training With Early Stopping
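**Goal:** limit overfitting by stopping once validation loss has not improved for `patience` = 3 epochs in a row, and keep the checkpoint with the lowest validation loss.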
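
The patience logic can be looked at on its own, without the model. This is a minimal sketch under stated assumptions: `EarlyStopper` is a hypothetical helper class (the notebook itself uses plain variables), and the validation losses in the demo loop are made-up numbers, not results from this project.

```python
class EarlyStopper:
    """Track the best validation loss and signal when to stop (hypothetical helper)."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Return (improved, should_stop) for one epoch's validation loss."""
        if val_loss < self.best_loss:
            # New best: this is where the checkpoint would be saved
            self.best_loss = val_loss
            self.counter = 0
            return True, False
        # No improvement: count it against the patience budget
        self.counter += 1
        return False, self.counter >= self.patience

# Made-up validation losses, only to show the control flow
stopper = EarlyStopper(patience=3)
for epoch, val_loss in enumerate([0.60, 0.52, 0.55, 0.54, 0.56, 0.50], start=1):
    improved, should_stop = stopper.step(val_loss)
    print(f"Epoch {epoch} | Val {val_loss:.2f} | improved={improved}")
    if should_stop:
        print(f"Stopping at epoch {epoch}, best val loss {stopper.best_loss:.2f}")
        break
```

Note that the counter resets only on a strict improvement, so a long plateau at the same loss value still triggers the stop.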

In [None]:
for epoch in range(10):  # can increase later
    model.train(); train_total = 0
    for imgs, labels in tqdm(train_loader):
        imgs, labels = imgs.to(device), labels.to(device)
        optimizer.zero_grad()
        out = model(imgs)
        loss = criterion(out, labels)
        loss.backward()
        optimizer.step()
        train_total += loss.item()

    model.eval(); val_total = 0
    with torch.no_grad():
        for imgs, labels in val_loader:
            imgs, labels = imgs.to(device), labels.to(device)
            val_total += criterion(model(imgs), labels).item()

    train_loss = train_total/len(train_loader)
    val_loss   = val_total/len(val_loader)
    train_losses.append(train_loss); val_losses.append(val_loss)
    print(f"Epoch {epoch+1} | Train {train_loss:.4f} | Val {val_loss:.4f}")

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')
        counter = 0
    else:
        counter += 1
        if counter >= patience:
            print('Early stopping ✅')
            break

## 5️⃣ Plot Loss Curves

In [None]:
plt.plot(train_losses,label='Train'); plt.plot(val_losses,label='Val')
plt.xlabel('Epoch'); plt.ylabel('Cross-entropy loss')
plt.title('Training vs. Validation Loss'); plt.legend(); plt.show()

## 6️⃣ Test Evaluation

In [None]:
model.load_state_dict(torch.load('best_model.pth', map_location=device))
model.eval(); y_true=[]; y_pred=[]
with torch.no_grad():
    for imgs, labels in test_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        preds = model(imgs).argmax(1)
        y_true+=labels.cpu().numpy().tolist(); y_pred+=preds.cpu().numpy().tolist()
print(classification_report(y_true,y_pred,target_names=class_names))

### Confusion Matrix

In [None]:
def plot_cm(cm, classes):
    plt.imshow(cm); plt.title('Confusion Matrix'); plt.colorbar()
    ticks=np.arange(len(classes))
    plt.xticks(ticks,classes,rotation=45); plt.yticks(ticks,classes)
    for i,j in itertools.product(range(cm.shape[0]),range(cm.shape[1])):
        plt.text(j,i,cm[i,j],ha='center',color='white' if cm[i,j]>cm.max()/2 else 'black')
    plt.xlabel('Predicted'); plt.ylabel('True'); plt.show()

cm = confusion_matrix(y_true,y_pred)
plot_cm(cm,class_names)

## 7️⃣ Grad‑CAM Visualization

In [None]:
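import cv2

def gradcam_show(path):
    # OpenCV loads images as BGR; convert to RGB for matplotlib and the model
    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(f'Could not read image: {path}')
    img_r = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2RGB),(224,224))
    img_float = img_r.astype(np.float32)/255.0
    pil = transforms.ToPILImage()(img_r)
    t = transform_eval(pil).unsqueeze(0).to(device)
    model.eval()
    # Recent grad-cam releases removed the `use_cuda` argument;
    # the CAM runs on whatever device the model is already on
    cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
    # With no explicit targets, Grad-CAM explains the top-scoring class
    mask = cam(input_tensor=t)[0]
    heat = show_cam_on_image(img_float,mask,use_rgb=True)
    plt.subplot(1,2,1); plt.imshow(img_r); plt.title('Original'); plt.axis('off')
    plt.subplot(1,2,2); plt.imshow(heat); plt.title('Grad‑CAM'); plt.axis('off')
    plt.show()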

# Example:
# gradcam_show('/content/data/test/malignant/image.png')

## ✅ Next Steps (for final delivery)
- Run full training (more epochs)
- Improve malignant recall (medical priority)
- Include Grad‑CAM results in report
- Add inference notebook for single images
- Prepare demo video
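
One possible way to work on the malignant recall item is to lower the decision threshold on the malignant probability instead of always taking `argmax`. The snippet below is a rough sketch, not something this notebook has run: `recall_and_false_positives` is a hypothetical helper, the labels and probabilities are made-up toy values, and it assumes `malignant` is class index 1 (which is what `ImageFolder` assigns when the folders are `benign` and `malignant`).

```python
def recall_and_false_positives(y_true, p_malignant, threshold=0.5):
    """Malignant recall and false-positive count at a given threshold (hypothetical helper).

    y_true: list of 0 (benign) / 1 (malignant) labels
    p_malignant: predicted probability of the malignant class for each image
    """
    preds = [1 if p >= threshold else 0 for p in p_malignant]
    tp = sum(1 for t, y in zip(y_true, preds) if t == 1 and y == 1)
    fn = sum(1 for t, y in zip(y_true, preds) if t == 1 and y == 0)
    fp = sum(1 for t, y in zip(y_true, preds) if t == 0 and y == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp

# Toy values for illustration only. In the notebook the probabilities could come
# from torch.softmax(model(imgs), dim=1)[:, 1] on the validation set.
toy_labels = [1, 1, 1, 0, 0]
toy_probs = [0.9, 0.45, 0.3, 0.4, 0.1]

for threshold in (0.5, 0.35):
    recall, fp = recall_and_false_positives(toy_labels, toy_probs, threshold)
    print(f"threshold={threshold} | malignant recall={recall:.2f} | false positives={fp}")
```

Lowering the threshold catches more malignant cases but also flags more benign ones, so the threshold should be chosen on the validation set and only then checked once on the test set.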