In [None]:
# run this cell to ensure course package is installed
import sys
from pathlib import Path

course_tools_path = Path('../../Lessons/Course_Tools/').resolve() # change this to the local path of the course package
sys.path.append(str(course_tools_path))

from install_introdl import ensure_introdl_installed
ensure_introdl_installed(force_update=False, local_path_pkg=course_tools_path / 'introdl')

In [1]:
# please add all your imports here

import torch
import torchvision.transforms.v2 as transforms
from torch.utils.data import Dataset, DataLoader
from torchvision.io import read_image
from torchvision import tv_tensors
from pathlib import Path

# import local modules
from graphics_and_data_prep import display_yolo_predictions, prepare_penn_fudan_yolo

from introdl.utils import config_paths_keys

paths = config_paths_keys()
DATA_PATH = paths["DATA_PATH"]
MODELS_PATH = paths["MODELS_PATH"]


MODELS_PATH=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\models
DATA_PATH=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\data
TORCH_HOME=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads
HF_HOME=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads


## Homework 6

For this assignment there are two primary tasks.  

1.  Explore UNet and UNet++ on the nuclei segmentation task described in the textbook.
2.  Fine-tune a YOLO model for pedestrian detection and compare the results to the Faster R-CNN model in the lesson.

### Task 1 - Nuclei Segmentation (20 points)

You will use the `segmentation_models_pytorch` package, as we did in the lesson, to fine-tune and evaluate UNet and UNet++ models on the nuclei segmentation task shown in the textbook.

We've already prepared the data.  Run the next cell once to download it into `DATA_PATH`.  The cell after that contains most of a custom dataset class and some transforms to get you started.  You'll need to complete the lines marked with "####" to read each image and mask and to add appropriate augmentation transforms.  


In [3]:
# Run this cell once to download the Nuclei Segmentation dataset 

from graphics_and_data_prep import download_and_extract_nuclei_data

# Call the function
download_and_extract_nuclei_data(DATA_PATH)


C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\data\nuclei_data already exists. Skipping download and extraction.


In [None]:
class NucleiDataset(Dataset):
    def __init__(self, root, transform=None):
        """
        Args:
            root (str or Path): Path to the dataset (train or val folder).
            transform (callable, optional): Optional transforms to apply to both image and mask.
        """
        self.root = Path(root)  # Convert to pathlib Path object
        self.transform = transform
        self.data = []  # List to store (image_tensor, mask_tensor) tuples

        # Load all image and mask files
        all_imgs = sorted((self.root / "images").iterdir())
        all_masks = sorted((self.root / "masks").iterdir())

        # Ensure that the number of images and masks are the same
        assert len(all_imgs) == len(all_masks), "The number of images and masks must be the same"        

        # Read and store images and masks as tensors in memory
        for img_path, mask_path in zip(all_imgs, all_masks):
            # Read images and masks as tensors
            image = #### read the image from img_path with read_image, convert to float, and scale to [0, 1]
            mask = #### read the mask from mask_path, map entries greater than 0 to 1 and the rest to 0, convert to float

            # Store as tuple
            self.data.append((tv_tensors.Image(image), tv_tensors.Mask(mask)))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        image, mask = self.data[idx]

        # Apply transforms if provided
        if self.transform:
            image, mask = self.transform(image, mask)

        return image, mask

# Define a set of transforms
train_transforms = transforms.Compose([
    #### Add your augmentation transforms here
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Define a set of transforms for validation (without augmentation)
val_transforms = transforms.Compose([
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load dataset
train_dataset = NucleiDataset(root=DATA_PATH / "nuclei_data/train", transform=train_transforms)
val_dataset = NucleiDataset(root=DATA_PATH / "nuclei_data/val", transform=val_transforms)

# create dataloaders with batch size 8
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=8, shuffle=False)


Now set up and train `Unet` and `UnetPlusPlus` models with a pretrained `resnet50` backbone, as we did in the lesson.  Model your code on the "Better Training" part of the lesson notebook.  Set different learning rates for the encoder and decoder and use `OneCycleLR`, as we did there.  We found that 12 epochs of fine-tuning worked reasonably well.
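
As a reminder of the mechanics, here is a minimal sketch of per-group learning rates combined with `OneCycleLR`. It is not the lesson's code: `TinySegModel` is a hypothetical stand-in that merely has `encoder` and `decoder` submodules, and the peak learning rates and `steps_per_epoch` value are illustrative assumptions.

```python
import torch
from torch import nn

class TinySegModel(nn.Module):
    """Hypothetical stand-in with `encoder` and `decoder` submodules."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(8, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decoder(torch.relu(self.encoder(x)))

model = TinySegModel()
epochs = 12
steps_per_epoch = 5  # placeholder; in practice this would be len(train_loader)

# One parameter group per part of the network
optimizer = torch.optim.AdamW(
    [
        {"params": model.encoder.parameters()},
        {"params": model.decoder.parameters()},
    ],
    lr=1e-4,  # overwritten by the scheduler below
)

# A list for max_lr sets a separate peak rate for each parameter group
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=[1e-4, 1e-3],  # illustrative values: encoder peak, decoder peak
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
)

peak_lrs = [0.0, 0.0]
for _ in range(epochs * steps_per_epoch):
    # a real loop would compute the loss and call loss.backward() here
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch, not per epoch
    peak_lrs = [max(p, lr) for p, lr in zip(peak_lrs, scheduler.get_last_lr())]

print(peak_lrs)
```

Giving the pretrained encoder a smaller peak rate than the freshly initialized decoder is a common choice, since the encoder weights already encode useful features.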

For each model, display convergence graphs of the loss and IoU, along with sample images showing the ground-truth and predicted masks.  
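
For reference, IoU (intersection over union) for a single pair of binary masks can be sketched as follows. The helper `binary_iou` and the toy masks are hypothetical, and the metric reported by your training code may aggregate over batches or images differently.

```python
import torch

def binary_iou(pred_mask, true_mask):
    """IoU of two binary masks: |pred AND true| / |pred OR true|."""
    pred = pred_mask.bool()
    true = true_mask.bool()
    intersection = (pred & true).sum().item()
    union = (pred | true).sum().item()
    # Convention chosen here: two empty masks count as a perfect match
    return intersection / union if union > 0 else 1.0

# Toy 4x4 example: a 2x2 ground-truth blob and a prediction shifted one pixel right
true_mask = torch.zeros(4, 4)
true_mask[0:2, 0:2] = 1
pred_mask = torch.zeros(4, 4)
pred_mask[0:2, 1:3] = 1

iou = binary_iou(pred_mask, true_mask)
print(f"IoU = {iou:.3f}")  # 2 shared pixels / 6 pixels in the union
```

Even a one-pixel shift of a small object lowers IoU substantially, which is worth keeping in mind when interpreting scores for small nuclei.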

Answer the following follow-up questions:
1. Which model performs better?
2. Use AI to write a short summary of the differences between UNet and UNet++.
3. Report the highest value of the IoU metric on the validation set.  Interpret that value in the context of this problem.  What does it tell you about the predicted masks for the cell nuclei?

Please number your responses.

### Task 2 - Fine-Tune a YOLO11 Model for Pedestrian Detection

YOLO (You Only Look Once) models are a family of object detection models known for their speed and accuracy. Unlike traditional object detection methods that use a sliding window approach, YOLO models frame object detection as a single regression problem, directly predicting bounding boxes and class probabilities from full images in one evaluation. Because the whole image is processed in a single forward pass, YOLO models are fast enough for real-time applications.

YOLO models consist of a single convolutional network that simultaneously predicts multiple bounding boxes and class probabilities for those boxes. The architecture is divided into three key components:

1. **Backbone**: This is typically a convolutional neural network (CNN) that extracts essential features from the input image.
2. **Neck**: This part of the network aggregates and combines features from different stages of the backbone. It often includes components like Feature Pyramid Networks (FPN) or Path Aggregation Networks (PAN).
3. **Head**: The final part of the network, which predicts the bounding boxes, objectness scores, and class probabilities. It usually consists of convolutional layers that output the final detection results.

YOLO models are quite easy to load and train because the Ultralytics package provides pre-trained weights and a straightforward API for customization and fine-tuning.  The hardest part may be preparing the data in the format that the API expects, but we've done that for you.  
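
To give a sense of what that format looks like: Ultralytics detection labels are plain-text files with one line per object, written as `class x_center y_center width height`, with all four box values normalized by the image size. The sketch below converts a pixel-coordinate box to such a line. The helper `box_to_yolo` and the example numbers are hypothetical, and it is an assumption that the provided preparation function does something along these lines.

```python
def box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    """Convert a pixel (xmin, ymin, xmax, ymax) box to a YOLO label line."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Made-up example: one pedestrian box in a 400x500 image
label_line = box_to_yolo(100, 50, 300, 450, img_w=400, img_h=500)
print(label_line)  # 0 0.500000 0.500000 0.500000 0.800000
```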

You'll need to install two packages.  Copy this into a code cell and run it once on each server you use:
```python
!pip install ultralytics torchmetrics
```

Run the cell below once to prepare the Penn-Fudan Pedestrian dataset in YOLO format.  This dataset uses the same splits we used in the lesson so that you can compare your results to the Faster R-CNN model we trained there.

In [None]:
# only need to run this once per platform, but it's safe to run multiple times
prepare_penn_fudan_yolo(DATA_PATH)

# the dataset will be here:
dataset_path = DATA_PATH / "PennFudanPedYOLO"

# you may wish to set an output path for the model
output_path = MODELS_PATH / "PennFudanPedYOLO"

# the YAML file for the dataset is here:
yaml_path = dataset_path / "dataset.yaml"

Visit the Ultralytics website to learn about YOLO11.  You can watch a short video there to learn more about it.  Sample code is provided to show you how to load and train a model (use 'yolo11s.pt').  Pass `project=output_path` to `model.train` to store the output in your models directory.  After training, you might want to look at some of the images created in that directory.
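
As a rough orientation, a fine-tuning call with the Ultralytics API might look like the sketch below. The wrapper `fine_tune_yolo` is hypothetical, and the `epochs` and `imgsz` defaults are placeholder assumptions rather than recommended settings; consult the Ultralytics documentation for the options you actually want.

```python
def fine_tune_yolo(yaml_path, output_path, epochs=10, imgsz=640):
    """Hypothetical wrapper: fine-tune YOLO11s and return validation mAP values."""
    # Imported lazily so this cell can be defined before ultralytics is installed
    from ultralytics import YOLO

    model = YOLO("yolo11s.pt")  # downloads pretrained weights on first use
    model.train(
        data=str(yaml_path),       # dataset YAML prepared earlier
        epochs=epochs,             # placeholder value
        imgsz=imgsz,               # placeholder value
        project=str(output_path),  # store runs in the models directory
    )
    metrics = model.val(data=str(yaml_path))
    # metrics.box.map50 is mAP at IoU 0.5; metrics.box.map is mAP over 0.5 to 0.95
    return model, metrics.box.map50, metrics.box.map
```

Calling `model, map50, map50_95 = fine_tune_yolo(yaml_path, output_path)` would then leave a `model` object available for the display cell that follows.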

You can run the following cell to show selected images and boxes from the validation set.  You can replace `indices=selected_indices` with `num_samples=3` to display 3 randomly selected images.  The selected images we chose should align with the images we showed in the lesson.

In [None]:
selected_indices = [28,29,33]
display_yolo_predictions(yaml_path, model, indices=selected_indices, show_confidence=True, conf=0.5)

Answer the following follow-up questions:

1. Find and plot an image with a false positive box in the validation data.
2. How is the process of fine-tuning the YOLO model different from that for the Faster R-CNN model in the lesson?  Is it easier or harder?  Why?
3. What did you get for mAP50 and mAP50-95 on the validation data with your YOLO model?
4. How do those values compare to the values in the lesson?
5. How do the predicted boxes compare qualitatively to the boxes predicted by Faster R-CNN in the lesson?  Do they align better or worse with the ground truth boxes?
6. Thoroughly explain what your mAP50 value tells you about the performance of your YOLO model at detecting pedestrians in the validation data.

Please number your responses.
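
When thinking about false positives and mAP50, it may help to recall how a predicted box is judged at an IoU threshold of 0.5. The sketch below is a simplified illustration, not the evaluator used by Ultralytics or torchmetrics: the helpers `box_iou` and `label_predictions` and the example boxes are hypothetical, and the predictions are assumed to be sorted by descending confidence.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_predictions(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedily match predictions to ground truth; each ground-truth box is used once."""
    matched = set()
    labels = []
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            if idx in matched:
                continue
            iou = box_iou(pred, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            labels.append("TP")
        else:
            labels.append("FP")  # no sufficiently overlapping unmatched ground truth
    return labels

# Made-up example: one pedestrian, one good prediction, one spurious prediction
gt_boxes = [(10, 10, 50, 90)]
pred_boxes = [(12, 12, 50, 88), (60, 10, 100, 90)]

labels = label_predictions(pred_boxes, gt_boxes)
print(labels)  # ['TP', 'FP']
```

Average precision at this threshold is then computed from the precision and recall obtained as the confidence cutoff varies, and mAP50 averages that quantity over classes (here, a single pedestrian class).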