## CAS Deep Learning - Computer Vision mit Deep Learning (Part 1)

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

try:
    import jupyter_black

    jupyter_black.load()
except ImportError:
    print("black not installed")

# Image Classification - Project

## Learning Goals

- Learn how to model an image classification task
- Learn how to systematically implement data prep, model, training, and evaluation
- Learn how to check and verify the implementation
- Learn how to incorporate boilerplate code from [torchvision](https://pytorch.org/vision/0.9/index.html) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/)

## Setup

We set up our environment and the paths for saving and loading data.

In [None]:
import os
from pathlib import Path

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
import torch
from tqdm.notebook import tqdm

Mount your Google Drive to store data and results.

In [None]:
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

print(f"In colab: {IN_COLAB}")

In [None]:
if IN_COLAB:
    from google.colab import drive

    drive.mount("/content/drive")

Modify the following paths if necessary.

In [None]:
if IN_COLAB:
    DATA_PATH = Path("/content/drive/MyDrive/cas-dl-module-compvis-part1")
else:
    DATA_PATH = Path("../data")

Install any packages you need that are not part of the base Colab environment.

In [None]:
if IN_COLAB:
    os.system("pip install torchshow torchinfo pytorch-lightning")

**Set the following parameter to `False`**: `DEV_RUNS = False`

Otherwise the model training code only runs for a few batches as a quick functional test.

In [None]:
# set the following parameter to False, otherwise model training is only conducted for a small number of steps to test the code
DEV_RUNS = 5

## Project Selection

Choose one of the following projects to work on. 

Choose Cats vs Dogs if you mainly want to click through the template (some modifications are still required!). All other datasets require more substantial adaptations.


### Cats vs Dogs

**Goal**: Develop a model to classify images of cats and dogs. The dataset is designed to facilitate the identification of these animals from images.

**Approach**: Create a Convolutional Neural Network (CNN) to classify the images into two categories: cats and dogs. Experiment with various CNN architectures and techniques to determine the most effective method. Use data augmentation techniques to handle variations in pose, lighting, and background. Ensure the model generalizes well by using cross-validation and monitoring for overfitting.

**Dataset**: The dataset contains 25,000 images, with approximately 12,500 images per class (cats and dogs). Each image varies in size and resolution. The data is provided by Microsoft as part of their Kaggle competition.

[Source](https://www.microsoft.com/en-us/download/details.aspx?id=54765)

![Dog](dog.jpg)
![Cat](cat.jpg)


### Concrete Crack Detection

**Goal**: Develop a model to classify concrete images as having cracks or not. The dataset is designed to facilitate the identification of structural issues in concrete buildings.

**Approach**: Create a Convolutional Neural Network (CNN) to classify the images into negative (no crack) and positive (crack) categories. Experiment with various CNN architectures and techniques to determine the most effective method. Use image processing techniques to handle variations in surface finish and illumination. Ensure the model generalizes well by using cross-validation and monitoring for overfitting.

**Dataset**: The dataset contains 40,000 images, with 20,000 images per class (negative and positive). Each image is 227 x 227 pixels with RGB channels. The data is collected from 458 high-resolution images (4032 x 3024 pixels) from various METU Campus Buildings. No data augmentation such as random rotation or flipping is applied.

[Source](https://data.mendeley.com/datasets/5y9wdsg2zt/2)

![Crack](crack_example.jpg)
![No Crack](crack_negative.jpg)


### Scene Classification

**Goal**: Develop a model to classify natural scene images into one of six categories. The dataset aims to facilitate the recognition of various natural scenes from around the world.

**Approach**: Design a Convolutional Neural Network (CNN) to classify images into six categories: buildings, forest, glacier, mountain, sea, and street. Test different CNN architectures to find the best performing model. Apply data augmentation techniques to improve generalization. Separate the data into training, testing, and prediction sets to evaluate model performance effectively.

**Dataset**: The dataset contains around 25,000 images of size 150 x 150 pixels, distributed across six categories. The data is separated into training (14,000 images), testing (3,000 images), and prediction (7,000 images) sets.

[Source](https://www.kaggle.com/datasets/puneet6060/intel-image-classification?resource=download)


![Buildings](natural_scenes_buildings.jpg)
![Forest](natural_scenes_forest.jpg)
![Glacier](natural_scenes_glacier.jpg)

### Satellite Land Cover Classification

**Goal**: Develop a model to classify satellite images into different land cover types. The dataset contains images of 10 different classes and aims to support land use and land cover classification tasks.

**Approach**: Develop Convolutional Neural Networks (CNNs) to model the satellite image data. Experiment with different CNN architectures to identify the best performing model. Compare pre-trained models with those trained from scratch. Use data augmentation techniques to enhance model generalization. Given the relatively small dataset, pay attention to overfitting and compare models robustly.

**Dataset**: The dataset consists of 27,000 RGB images categorized into 10 classes. The dataset is available in two formats: one in RGB and another with 13 spectral bands. Use the RGB dataset for this project.

[Source](https://github.com/phelber/eurosat)

![Crop](sat_crop.jpg)
![Forest](sat_forest.jpg)
![Highway](sat_highway.jpg)


### Choose your own dataset!

Feel free to choose your own dataset.

# Overall Approach

Inspired by [A Recipe for Training Neural Networks by Andrej Karpathy](https://karpathy.github.io/2019/04/25/recipe/)

For your chosen dataset, do the following:


## 1) Data Preparation & Data Inspection

- Download the data
- Inspect the data formats
- Build a `torch.utils.data.Dataset`
    - define training, validation and test sets
- Implement a `torch.utils.data.DataLoader`
- Inspect the data:
    - Look at samples
    - Inspect the label distribution

## 2) Baselines

- Implement a small CNN
- Learn an input-independent baseline (keep the labels but feed random noise as input)
- Overfit the CNN on one batch
- Inspect pre-processing
  
## 3) (Over)fit
- Build a larg(er) architecture (pre-trained or self-implemented)
- Train a high-performing model with respect to training set
  
## 4) Regularize
- Is it beneficial to collect more data?
- Data Augmentation
- Early Stopping on Validation Set
- Weight Decay

## 5) Hyper-Parameter Tuning
- Define HPs and parameterise architecture
- Do a grid or random search over the HP grid
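
A minimal sketch of both search strategies, using only the standard library. The hyper-parameter names and values (`lr`, `batch_size`, `weight_decay`) and both helper functions are illustrative assumptions, not settings prescribed by this notebook.

```python
import itertools
import random

# Hypothetical search space; names and values are placeholders.
search_space = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [16, 32, 64],
    "weight_decay": [0.0, 1e-4],
}


def grid_search_configs(space: dict) -> list[dict]:
    """Enumerate every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def random_search_configs(space: dict, n_trials: int, seed: int = 123) -> list[dict]:
    """Draw n_trials configurations, sampling each value independently."""
    rng = random.Random(seed)
    return [
        {key: rng.choice(values) for key, values in space.items()}
        for _ in range(n_trials)
    ]


grid_configs = grid_search_configs(search_space)
random_configs = random_search_configs(search_space, n_trials=5)
print(len(grid_configs), "grid configurations")
print(random_configs[0])
```

Each configuration would then be used to build and train one model. Grid search grows multiplicatively with every added hyper-parameter, whereas random search keeps the number of trials fixed.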

## 6) Squeeze out the juice
-  Ensembling
-  Longer training
-  Special techniques: AdamW optimizer, fancy data augmentation, label smoothing, stochastic depth
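
Ensembling can be as simple as averaging the class probabilities of several independently trained models. The following is a minimal sketch in plain Python; the probabilities are made up and the helper names `ensemble_average` and `argmax` are hypothetical.

```python
def ensemble_average(prob_sets: list[list[list[float]]]) -> list[list[float]]:
    """Average class probabilities over models.

    prob_sets[m][i][c] is the probability model m assigns to class c for sample i.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [sum(model[i][c] for model in prob_sets) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax(values: list[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up probabilities from three hypothetical models for two samples.
model_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.8, 0.2], [0.2, 0.8]],
]
avg_probs = ensemble_average(model_probs)
ensemble_preds = [argmax(p) for p in avg_probs]
print(ensemble_preds)
```

For the second sample the models disagree, and the averaged probabilities decide the final prediction.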

# Step 1 - Data Preparation & Data Inspection

**_This step is critical. I like to spend copious amount of time (measured in units of hours) scanning through thousands of examples, understanding their distribution and looking for patterns. (A. Karpathy)_**

- Download the data
- Inspect the data formats and file organization
- Remove corrupt data
- Build a [torch.utils.data.Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)
    - define training, validation and test sets
- Implement a [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)
- Inspect the data:
    - Look at samples
    - Inspect the label distribution


Important information about how to define a dataset can be found here: [https://pytorch.org/docs/stable/data.html](https://pytorch.org/docs/stable/data.html)

## Download the Data

The following code snippets help you get started with the data. Downloading may take a while.

Load the functions and jump to the dataset that you have chosen!

In [None]:
from pathlib import Path


def download_and_extract_zip(url: str, save_path: Path, extract_path: Path):
    """
    Downloads a ZIP file from a given URL and extracts its contents to a specified directory.

    Args:
        url (str): The URL of the ZIP file to download.
        save_path (Path): The path where the downloaded ZIP file will be saved.
        extract_path (Path): The directory where the ZIP file will be extracted.
    """
    import os
    import requests
    import zipfile

    # Make sure the directory exists
    os.makedirs(os.path.dirname(save_path), exist_ok=True)

    if not save_path.exists():
        # Download the file in chunks; fail early on HTTP errors
        response = requests.get(url, stream=True)
        response.raise_for_status()
        with open(save_path, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                _ = file.write(chunk)

        print(f"File downloaded and saved to {save_path}")

    if not extract_path.exists():
        # Unzip the file
        with zipfile.ZipFile(save_path, "r") as zip_ref:
            zip_ref.extractall(extract_path)

        print(f"File extracted to {extract_path}")


def download_from_gdrive_and_extract_zip(
    file_id: str, save_path: Path, extract_path: Path
):
    """
    Downloads a ZIP file from Google Drive using its file ID and extracts its contents to a specified directory.

    Args:
        file_id (str): The Google Drive file ID of the ZIP file to download.
        save_path (Path): The path where the downloaded ZIP file will be saved.
        extract_path (Path): The directory where the ZIP file will be extracted.
    """
    import os
    import gdown
    import zipfile

    url = f"https://drive.google.com/uc?id={file_id}"
    # Make sure the target directory exists
    save_path.parent.mkdir(parents=True, exist_ok=True)
    if not save_path.exists():
        gdown.download(url, str(save_path), quiet=False)
        print(f"File downloaded and saved to {save_path}")

    if not extract_path.exists():
        # Unzip the file
        with zipfile.ZipFile(save_path, "r") as zip_ref:
            zip_ref.extractall(extract_path)

        print(f"File extracted to {extract_path}")


def delete_bad_file(file_path: Path):
    """
    Deletes a specified file if it exists.

    Args:
        file_path (Path): The path of the file to be deleted.
    """
    import os

    # Check if file exists before trying to delete it
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"{file_path} has been deleted")
    else:
        print(f"{file_path} does not exist")

Cats vs Dogs

In [None]:
download_and_extract_zip(
    url="https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip",
    save_path=DATA_PATH.joinpath("cats_vs_dogs.zip"),
    extract_path=DATA_PATH.joinpath("cats_vs_dogs/"),
)

bad_files = [
    DATA_PATH.joinpath("cats_vs_dogs") / "PetImages" / "Cat" / "666.jpg",
    DATA_PATH.joinpath("cats_vs_dogs") / "PetImages" / "Dog" / "11702.jpg",
]


for bad_file in bad_files:
    delete_bad_file(bad_file)

If you chose the EuroSAT data:

In [None]:
download_and_extract_zip(
    url="https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip?download=1",
    save_path=DATA_PATH.joinpath("EuroSAT_RGB.zip"),
    extract_path=DATA_PATH.joinpath("EuroSAT_RGB/"),
)

Concrete Data:

In [None]:
download_and_extract_zip(
    url="https://prod-dcd-datasets-cache-zipfiles.s3.eu-west-1.amazonaws.com/5y9wdsg2zt-2.zip",
    save_path=DATA_PATH.joinpath("concrete.zip"),
    extract_path=DATA_PATH.joinpath("concrete/"),
)

Scene classification

In [None]:
download_from_gdrive_and_extract_zip(
    file_id="1Bx3R56VBONS-x91wCDU6KX3xqPoJoH9P",
    save_path=DATA_PATH / "scene_classification.zip",
    extract_path=DATA_PATH.joinpath("scene_classification/"),
)

If you have your own dataset, you could organize the images into class-specific folders.
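
The sketch below shows such a layout with one sub-folder per class, so that the folder name can serve as the label. The class names, file names, and the temporary directory are placeholders chosen for illustration; the files are empty stand-ins, not real images.

```python
import tempfile
from pathlib import Path

# Hypothetical class names and file names, for illustration only.
layout = {"class_a": ["img_001.jpg", "img_002.jpg"], "class_b": ["img_003.jpg"]}

root = Path(tempfile.mkdtemp()) / "my_dataset"
for class_name, file_names in layout.items():
    class_dir = root / class_name
    class_dir.mkdir(parents=True)
    for file_name in file_names:
        (class_dir / file_name).touch()  # empty placeholder instead of a real image

# One sub-folder per class: the folder name serves as the label.
samples = sorted(
    (str(path.relative_to(root)), path.parent.name) for path in root.glob("*/*.jpg")
)
for rel_path, label in samples:
    print(f"{label:8s} {rel_path}")
```

With this structure, the root folder can be passed to a loader that derives labels from the sub-folder names.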

## Inspect Data Format & Organization, Build Dataset and Loader

We need to figure out how the data is organized. In particular, we need to know how the data is labelled in order to wrap it correctly in a [torch.utils.data.Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset).

First you should look at the data / folder structure of the downloaded data.

Once you have figured out how the data is organized we can build a [torch.utils.data.Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset). A [torch.utils.data.Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) allows for iterating over your dataset, returning tuples of images and labels at each iteration.

**Note: To evaluate and select models in a later stage, we already create a training, validation and a test dataset.**

Adapt the following code if necessary:

In [None]:
from typing import Callable, Tuple, List

from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset, Subset


def load_image_paths_and_labels(image_dir: str) -> Tuple[List[str], List[str]]:
    """
    Load image paths and corresponding labels.

    Args:
        image_dir: Directory with all the images.

    Returns:
        A tuple of (image paths, labels).
    """
    image_paths = []
    labels = []
    classes = os.listdir(image_dir)
    image_extensions = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

    for label in classes:
        class_dir = os.path.join(image_dir, label)
        for img_name in os.listdir(class_dir):
            img_path = os.path.join(class_dir, img_name)
            if any(img_path.lower().endswith(ext) for ext in image_extensions):
                image_paths.append(img_path)
                labels.append(label)

    return image_paths, labels


def create_train_test_split(
    image_paths: List[str],
    labels: List[str],
    test_size: float = 0.2,
    random_state: int | None = None,
) -> Tuple[List[str], List[str], List[str], List[str]]:
    """
    Create stratified train and test splits.

    Args:
        image_paths: List of image paths.
        labels: List of labels.
        test_size: The proportion of the dataset to include in the test split.
        random_state: Controls the shuffling applied to the data before applying the split.

    Returns:
        train_image_paths, test_image_paths, train_labels, test_labels
    """
    train_image_paths, test_image_paths, train_labels, test_labels = train_test_split(
        image_paths,
        labels,
        stratify=labels,
        test_size=test_size,
        random_state=random_state,
    )

    return train_image_paths, test_image_paths, train_labels, test_labels


class ImageDataset(Dataset):

    def __init__(
        self,
        image_paths: List[str],
        labels: List[str],
        transform: Callable | None = None,
        classes: List[str] | None = None,
    ):
        """
        Args:
            image_paths: List of image paths.
            labels: List of labels.
            transform: Optional transform to be applied on a sample.
            classes: List of class names.
        """
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform
        self.classes = classes if classes is not None else sorted(set(labels))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx: int):
        """
        Args:
            idx: Index

        Returns:
            tuple: (image, label) where label is the image classification.
        """
        try:
            image_path = self.image_paths[idx]
            image = Image.open(image_path).convert("RGB")
            label = self.labels[idx]
            label_num = self.classes.index(label)

            if self.transform:
                image = self.transform(image)
            return image, label_num
        except Exception as e:
            print(f"Error loading image at index {idx}: {e}")
            return None


# Specify your image root path
image_root_path = DATA_PATH.joinpath("cats_vs_dogs/PetImages")
image_paths, labels = load_image_paths_and_labels(image_root_path)

# Create Train, Validation and Test Splits
train_image_paths, test_image_paths, train_labels, test_labels = (
    create_train_test_split(image_paths, labels, test_size=0.2, random_state=123)
)
train_image_paths, validation_image_paths, train_labels, validation_labels = (
    create_train_test_split(
        train_image_paths, train_labels, test_size=0.1, random_state=123
    )
)

# Specify transformations
train_transform = None
test_transform = None
validation_transform = None

ds_train = ImageDataset(train_image_paths, train_labels, transform=train_transform)
ds_validation = ImageDataset(
    validation_image_paths, validation_labels, transform=validation_transform
)
ds_test = ImageDataset(test_image_paths, test_labels, transform=test_transform)
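
The two calls to `create_train_test_split` are applied one after the other, so the second `test_size` is relative to the data that remains after the test set has been removed, not to the full dataset. A small sketch of the arithmetic follows; the helper `split_fractions` is hypothetical and not part of the template.

```python
def split_fractions(test_size: float, validation_size: float) -> dict[str, float]:
    """Overall fractions after a test split followed by a validation split.

    validation_size is relative to what remains after removing the test set.
    """
    remaining = 1.0 - test_size
    return {
        "train": remaining * (1.0 - validation_size),
        "validation": remaining * validation_size,
        "test": test_size,
    }


fractions = split_fractions(test_size=0.2, validation_size=0.1)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
```

With the values used above, roughly 72% of the images end up in the training set, 8% in the validation set, and 20% in the test set.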

**Question**: What is the role of: `label_num = self.classes.index(label)`?

**Question**: Why do we (often) need a training, a validation and a test set?

**Question**: Why do we need transformations, and why do we need different ones for train and test?

YOUR ANSWER HERE

Now we test the `Dataset` object by getting and visualising a sample.

In [None]:
import torchshow as ts

image, label = ds_train[0]
ts.show(image)

For model training we need to batch examples. That's why we need to define a [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader).

We also need to convert the images to [Tensors](https://pytorch.org/docs/stable/tensors.html). We can use the `transform` parameter of our `ImageDataset` class to specify transformations using [torchvision.transforms](https://pytorch.org/vision/0.9/transforms.html).

In [None]:
from torch.utils.data import DataLoader
from torchvision import transforms

# Set random seed for reproducibility
torch.manual_seed(123)

# Define a simple transformation
train_transform = transforms.Compose(
    [
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
    ]
)

# Create the dataset from the training data and a dataloader
ds_train = ImageDataset(train_image_paths, train_labels, transform=train_transform)
dataloader_train = DataLoader(ds_train, batch_size=16, shuffle=True)

images, labels = next(iter(dataloader_train))

ts.show(images)
labels

**Question**: What does `shuffle=True` achieve? Why is it recommended?

**Question**: Why do we use `torch.manual_seed(123)`? What does it do in this example?

YOUR ANSWER HERE

## Inspect the Data

Now you can use the `ImageDataset` or `DataLoader` objects to inspect the dataset.

- **Initial Step**: Avoid touching neural net code initially; focus on inspecting the data thoroughly.
- **Time Investment**: Spend hours scanning thousands of examples to understand their distribution and look for patterns.
- **Identify Issues**: Look for duplicate examples, corrupted images/labels, data imbalances, and biases.
- **Classification Process**: Pay attention to how you yourself classify the data to inform the architecture exploration.
- **Feature Analysis**: Determine if local features or global context is needed.
- **Variation Analysis**: Assess the variation in the data, identify spurious variations for preprocessing.
- **Spatial Consideration**: Evaluate if spatial position matters or if averaging it out is beneficial.
- **Detail and Downsampling**: Consider the importance of detail and the feasibility of downsampling images.
- **Label Noise**: Assess the noise level in the labels.
- **Understand Predictions**: Use network (mis)predictions to understand inconsistencies and data issues (at a later stage!).
- **Quantitative Analysis**: Write simple code to search, filter, and sort data by various attributes.
- **Visualize Distributions**: Visualize distributions and outliers to uncover bugs in data quality or preprocessing.

For now, do at least the following:
- What is the class distribution? Use, for example: [numpy.unique](https://numpy.org/doc/stable/reference/generated/numpy.unique.html).
- How difficult do you think the problem is?
- Are there any obvious issues with the data?
- Do the labels seem accurate?
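
A minimal sketch of a class-distribution check on made-up labels, using `collections.Counter` from the standard library; `numpy.unique(labels, return_counts=True)` yields the same counts. The `toy_labels` list is an assumption standing in for the label list of your own split.

```python
from collections import Counter

# Made-up labels; replace with the label list of your own split.
toy_labels = ["Cat", "Dog", "Dog", "Cat", "Dog", "Cat", "Cat", "Dog", "Dog", "Dog"]

counts = Counter(toy_labels)
total = sum(counts.values())
for class_name, count in sorted(counts.items()):
    print(f"{class_name}: {count} ({count / total:.0%})")
```

Running the same check on the training, validation and test labels separately also shows whether the stratified split preserved the class proportions.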

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# Step 2 - Implement Baselines

In this step we want to implement a training pipeline and evaluate simple baselines to get a feeling for the problem and to test and verify if the pipeline works.

**Reproducibility**

- Fix random seed: Always use a fixed random seed to ensure consistent outcomes in repeated runs.

**Simplification and Initialization**

- Simplify: Disable unnecessary features like data augmentation initially.
- Verify loss at initialization: Ensure loss starts at the expected value.

**Baselines and Metrics**

- Human baseline: Compare model metrics to human-interpretable metrics (e.g., accuracy).
- Input-independent baseline: Train a baseline model with zeroed inputs and compare it to a variant with normal data. There should be a clear difference!

**Overfitting and Visualization**

- Overfit one batch: Overfit a single batch to verify the model can reach the minimum loss.
- Verify decreasing training loss: Ensure training loss decreases when model capacity increases.
- Visualize before the net: Visualize data immediately before feeding it to the network to catch preprocessing issues.
- Visualize prediction dynamics: Track model predictions on a fixed test batch during training to understand training progression.

**Evaluation**

- Add significant digits to your eval: Evaluate on the entire test set for accuracy.
- Visualize: Visualize model inputs and outputs to ensure correctness.
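
One way to see why the size of the evaluation set matters: if accuracy is treated as the mean of independent correct/incorrect outcomes, its standard error shrinks with the square root of the number of samples. The sketch below uses this binomial approximation; the accuracy of 0.9 and the sample sizes are made-up values, and `accuracy_standard_error` is a hypothetical helper.

```python
import math


def accuracy_standard_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy estimate from n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


for n in (32, 500, 5000):
    se = accuracy_standard_error(0.9, n)
    print(f"n={n:5d}: 0.900 +/- {se:.3f}")
```

An estimate from a single batch of 32 images is uncertain by several percentage points, while an estimate from a few thousand images is stable to well under one point.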

**Additional Tips**

- Verify simplifications: Simplify initial setup by turning off data augmentation and complex features to reduce bugs.

We will address some of the steps above. Feel free to do more!

## Reproducibility

PyTorch Lightning provides a function that sets the random seeds of several libraries at once:

In [None]:
import pytorch_lightning as ptl

ptl.seed_everything(123)

## Simple DataLoader

Implement a simple dataloader without fancy transformations.

In [None]:
from torch.utils.data import DataLoader
from torchvision import transforms

# Set random seed for reproducibility
torch.manual_seed(123)

# Define a simple transformation
train_transform = transforms.Compose(
    [
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
    ]
)

# Create the dataset from the training data and a dataloader
ds_train = ImageDataset(train_image_paths, train_labels, transform=train_transform)
dataloader_train = DataLoader(ds_train, batch_size=16, shuffle=True)

images, labels = next(iter(dataloader_train))

## Simple Model

Start with a simple model that is (most likely) correct and should be able to learn something (quickly).

For example, you could implement the following architecture:

- Input Shape: (3, **height**, **width**)
- Convolution: 16 Filters, Kernel-Size 5x5
- Pooling: Stride 2, Kernel-Size 2
- Convolution: 32 Filters, Kernel-Size 5x5
- Global Average Pooling
- FC: 2 neurons (**number of classes**)

Use `ReLU` activation after each convolution.
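
To sanity-check the architecture before implementing it, the spatial size after each layer in the list above can be computed by hand. The sketch assumes a 64 x 64 input, no padding, and stride 1 for the convolutions (the `torch.nn.Conv2d` defaults); the helper name `output_size` is made up for this example.

```python
def output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 64
shapes = [("input", 3, size)]
size = output_size(size, kernel=5)
shapes.append(("conv 5x5, 16 filters", 16, size))
size = output_size(size, kernel=2, stride=2)
shapes.append(("pool 2x2, stride 2", 16, size))
size = output_size(size, kernel=5)
shapes.append(("conv 5x5, 32 filters", 32, size))
# Global average pooling reduces every feature map to a single value.
shapes.append(("global average pooling", 32, 1))

for name, channels, side in shapes:
    print(f"{name:24s} -> ({channels}, {side}, {side})")
```

Because global average pooling always yields one value per channel, the fully connected layer receives 32 features regardless of the input resolution.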

Define a class which inherits from `torch.nn.Module`.


In [None]:
import torchinfo
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, (5, 5))
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.global_avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(32, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.global_avg_pool(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        # YOUR CODE HERE
        raise NotImplementedError()
        return x


net = SmallCNN()

print(net)
print(torchinfo.summary(net, input_size=(1, 3, 64, 64)))

**Question**: Briefly explain what happens with a data point during the forward pass.

YOUR ANSWER HERE

## Define a Training Loop

We use PyTorch Lightning, which greatly simplifies implementing boilerplate code such as training loops.

Tutorial here: https://lightning.ai/pages/community/tutorial/step-by-step-walk-through-of-pytorch-lightning/

We also include additional metrics from [torchmetrics](https://lightning.ai/docs/torchmetrics/stable/) to easily log and calculate accuracy. Adapt `task="binary"` if necessary!

In [None]:
import pytorch_lightning as pl
import torchmetrics


class Classifier(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.loss_fn = nn.CrossEntropyLoss()
        self.accuracy = torchmetrics.Accuracy(task="binary")

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = self.loss_fn(logits, y)
        preds = torch.argmax(logits, dim=1)

        # Update accuracy metric
        acc = self.accuracy(preds, y)
        self.log("train_loss", loss, prog_bar=True)
        self.log("train_acc", acc, prog_bar=True)

        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)

Change the following parameters according to your hardware. As you can see, this greatly simplifies switching hardware!

We want to perform a functional check only. Train the model only for 10 steps.

In [None]:
trainer = pl.Trainer(
    devices=1,
    accelerator="cpu",
    precision="32",
    fast_dev_run=DEV_RUNS,
    max_steps=10,
    enable_checkpointing=False,
    logger=False,
    default_root_dir=DATA_PATH.joinpath("lightning_logs"),
)

net = SmallCNN()
model = Classifier(net)
trainer.fit(model, train_dataloaders=dataloader_train)

In [None]:
print(f"Metrics:  {trainer.logged_metrics}")

**Question**: Does the previous code display useful information? If so, what does it say and why?

**Question**: What is the loss at initialization? Does the value make sense?

YOUR ANSWER HERE

Now we train the model for longer to get a sense of the performance. Adjust the following code accordingly (increase the number of steps the model is training for).

In [None]:
trainer = pl.Trainer(
    devices=1,
    accelerator="cpu",
    precision="32",
    max_steps=1,
    enable_checkpointing=False,
    logger=False,
    default_root_dir=DATA_PATH.joinpath("lightning_logs"),
)

net = SmallCNN()
model = Classifier(net)
trainer.fit(model, train_dataloaders=dataloader_train)

In [None]:
print(f"Metrics:  {trainer.logged_metrics}")

## Learn an Input-Independent Model


Modify the `Dataset` class such that random images, e.g. white noise, are returned. The labels remain unchanged. Then train a model.

**Question**: What kind of loss do you expect if the model works?

YOUR ANSWER HERE

In [None]:
class ImageDatasetRandom(ImageDataset):

    def __getitem__(self, idx: int):
        """
        Args:
            idx: Index

        Returns:
            tuple: (image, label) where label is the image classification.
        """
        try:
            image_path = self.image_paths[idx]
            original_image = np.array(Image.open(image_path).convert("RGB"))
            image_shape = original_image.shape
            random_image = Image.fromarray(
                np.random.randint(0, 256, image_shape, dtype=np.uint8)
            )

            label = self.labels[idx]
            label_num = self.classes.index(label)

            if self.transform:
                random_image = self.transform(random_image)
            return random_image, label_num
        except Exception as e:
            print(f"Error loading image at index {idx}: {e}")
            return None


# Create the dataset and dataloader
ds_train_random = ImageDatasetRandom(
    train_image_paths, train_labels, transform=train_transform
)
dataloader_random = DataLoader(ds_train_random, batch_size=64, shuffle=True)

Verify your work!

In [None]:
image_random, label = ds_train_random[0]
ts.show(image_random)

Now train your model.

In [None]:
trainer = pl.Trainer(
    devices=1,
    accelerator="cpu",
    precision="32",
    fast_dev_run=DEV_RUNS,
    max_steps=100,
    enable_checkpointing=False,
    logger=False,
    default_root_dir=DATA_PATH.joinpath("lightning_logs"),
)

net = SmallCNN()
model = Classifier(net)
trainer.fit(model, train_dataloaders=dataloader_random)

In [None]:
print(f"Metrics:  {trainer.logged_metrics}")

**Question:** Were your expectations met? If not, why?

YOUR ANSWER HERE

### Overfit on one Batch of Data

**Question**: What do you expect?

YOUR ANSWER HERE

In [None]:
trainer = pl.Trainer(
    devices=1,
    accelerator="cpu",
    precision="32",
    fast_dev_run=DEV_RUNS,
    max_steps=100,
    enable_checkpointing=False,
    logger=False,
    default_root_dir=DATA_PATH.joinpath("lightning_logs"),
    # this option limits the training set to one batch, disables shuffle
    overfit_batches=1.0,
)

net = SmallCNN()
model = Classifier(net)

ds_train = ImageDataset(train_image_paths, train_labels, transform=train_transform)
dataloader_train = DataLoader(ds_train, batch_size=32, shuffle=False)
trainer.fit(model, train_dataloaders=dataloader_train)

In [None]:
print(f"Metrics:  {trainer.logged_metrics}")

**Question**: Did it work? Careful, there might be a bug!

YOUR ANSWER HERE

# Step 3 - (Over)Fit

In this step we try to drive the training loss as low as possible.

**Model Selection and Initialization**

- Pick a proven model: Start with a simple, well-established architecture (e.g., ResNet-50 for image classification) rather than creating complex, custom models.
- Use Adam optimizer: Begin with Adam and a learning rate of 3e-4 for its forgiving nature with hyperparameters (or the PyTorch default value).

**Gradual Complexity**

- Add complexity incrementally: Integrate multiple signals or features into your classifier one at a time, ensuring each addition improves performance.

**Learning Rate Management**

- Avoid default learning rate decay: Be cautious with repurposed code and learning rate decay schedules. Initially, disable learning rate decay and maintain a constant learning rate, tuning it later in the project.
    

You can do the following:
- implement your own model
- use a pre-defined model
- use a pre-trained model

## Pre-Trained Model

In the following we will use a pre-trained model and adapt it to our dataset (transfer-learning).

### Load Model

Here we use a pre-trained model. Read the documentation here: [https://pytorch.org/vision/0.8/models.html](https://pytorch.org/vision/0.8/models.html).

**It is important to read how the data is pre-processed for a given pre-trained model. This should be consistent with how you pre-process the data.**
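
For the ImageNet-trained torchvision classification models, this pre-processing typically includes a per-channel normalization of the `[0, 1]`-scaled image with the ImageNet mean and standard deviation. The sketch below applies that formula to a single made-up pixel in plain Python; in a pipeline the same operation is usually done with `torchvision.transforms.Normalize`. The helper `normalize_pixel` is hypothetical, and the exact values and input size should be checked in the documentation of the chosen model.

```python
# Channel statistics documented for torchvision's ImageNet-trained models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: tuple[float, float, float]) -> tuple[float, ...]:
    """Apply (value - mean) / std per channel to an RGB pixel scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


# Made-up pixel: 8-bit values (124, 116, 104) scaled to [0, 1].
pixel = tuple(v / 255 for v in (124, 116, 104))
normalized = normalize_pixel(pixel)
print([round(v, 3) for v in normalized])
```

A pixel close to the ImageNet mean colour maps to values near zero, which is the input range the pre-trained weights were fitted to.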


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Now we adapt the output layer to match our dataset.

In [None]:
net.fc = nn.Sequential(nn.Linear(512, 2))

We can now train the model. 

We also use a `logger` object to log the training process.

Again: Adjust the parameters of the trainer class to your liking.

In [None]:
from pytorch_lightning.loggers import TensorBoardLogger

logger = TensorBoardLogger(
    DATA_PATH.joinpath("lightning_logs"), name="overfit_baseline1"
)

trainer = pl.Trainer(
    devices=1,
    accelerator="cpu",
    precision="32",
    max_steps=100,
    fast_dev_run=DEV_RUNS,
    enable_checkpointing=False,
    logger=logger,
    default_root_dir=DATA_PATH.joinpath("lightning_logs"),
)

model = Classifier(net)

ds_train = ImageDataset(train_image_paths, train_labels, transform=train_transform)
# Shuffle the training data so that batches mix both classes
dataloader_train = DataLoader(ds_train, batch_size=64, shuffle=True)
trainer.fit(model, train_dataloaders=dataloader_train)

In [None]:
print(f"Metrics:  {trainer.logged_metrics}")

View the TensorBoard logs. This may not work in a container unless the TensorBoard ports are exposed.

(You would need to add the following option to `docker run`: `-p 6006-6015:6006-6015`)

In [None]:
%reload_ext tensorboard
%tensorboard --logdir={DATA_PATH.joinpath("lightning_logs")} --host localhost

**Question**: What can you observe? Describe what you see and propose any changes to the trainer class if you see opportunities.

YOUR ANSWER HERE

### Feel Free to Try a Larger Model

Use a larger model and observe the performance.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# Step 4 - Regularization

Regularization is a process to deliberately limit a model's capacity in order to reduce overfitting and to improve generalization.

**Data Collection and Augmentation**

- Get more data: Collect additional real training data for the most effective regularization.
- Data augmentation: Use more aggressive data augmentation techniques.
- Creative augmentation: Explore simulation, hybrid methods, or GANs to expand datasets.

**Model Initialization and Size**

- Pretrain: Utilize pretrained networks when possible.
- Smaller input dimensionality: Remove features with spurious signals and reduce image size if low-level details are not critical.
- Smaller model size: Use domain knowledge to constrain and reduce the size of the network.

**Regularization Techniques**

- Decrease batch size: Smaller batch sizes can act as stronger regularizers due to batch normalization effects.
- Add dropout: Use dropout (including dropout2d for ConvNets) sparingly.
- Weight decay: Increase the weight decay penalty.
- Early stopping: Stop training based on validation loss to avoid overfitting.

**Model Complexity**

- Try a larger model: Consider larger models for potentially better early-stopped performance, despite higher risk of eventual overfitting.

  

You can try the following techniques:

- Weight Decay
- Data Augmentation
- Early Stopping on Validation Set


## Weight Decay

Weight decay is a technique to reduce model complexity by adding a penalty to the magnitude of the weights. It can be implemented by decaying the weights towards 0 after each gradient descent step. 

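Read the documentation of [torch.optim.Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam) and add weight decay to your model.

Weight decay is set in the optimizer via the `weight_decay` argument. Make it configurable by passing the value through the constructor of `Classifier` and using it in `configure_optimizers`.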
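
To see the "decay towards 0" effect in isolation, the following sketch applies a single optimizer step with a zero loss gradient, so the only change to the weights comes from the decay term. It uses plain `torch.optim.SGD` rather than Adam because the arithmetic is easy to verify by hand; the values of `lr` and `weight_decay` are arbitrary illustration values.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -2.0, 4.0]))
w_before = w.detach().clone()

optimizer = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)

# A loss whose gradient with respect to w is exactly zero.
loss = (0.0 * w).sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()

# SGD adds weight_decay * w to the gradient, so each weight shrinks by
# lr * weight_decay * w = 0.05 * w.
print(w_before, w.detach())
```

With `torch.optim.Adam`, the `weight_decay` term is also added to the gradient, but it then passes through Adam's adaptive scaling, so the shrinkage per step is not this simple. `torch.optim.AdamW` applies the decay directly to the weights instead.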

In [None]:
import pytorch_lightning as pl
import torchmetrics


class Classifier(pl.LightningModule):
    def __init__(self, model, weight_decay=0):
        super().__init__()
        self.model = model
        self.loss_fn = nn.CrossEntropyLoss()
        self.accuracy = torchmetrics.Accuracy(task="binary")
        self.weight_decay = weight_decay

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = self.loss_fn(logits, y)
        preds = torch.argmax(logits, dim=1)

        # Update accuracy metric
        acc = self.accuracy(preds, y)
        self.log("train_loss", loss, prog_bar=True)
        self.log("train_acc", acc, prog_bar=True)

        return loss

    def configure_optimizers(self):
        # YOUR CODE HERE
        raise NotImplementedError()

## Data Augmentation

Data augmentation is the process of applying random transformations to the input data before it is processed by the model. This increases the robustness of the model and improves its generalization capabilities.

In [None]:
import torchvision.transforms as transforms
from pytorch_lightning.loggers import TensorBoardLogger


train_transform = transforms.Compose(
    [
        transforms.RandomResizedCrop((128, 128)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
)

val_transform = transforms.Compose(
    [
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
)

# Example usage with ImageDataset class
ds_train = ImageDataset(train_image_paths, train_labels, transform=train_transform)
dataloader_train = DataLoader(ds_train, batch_size=64, shuffle=True)

# Create model instance
model = Classifier(net)

# Create a logger
logger = TensorBoardLogger(
    DATA_PATH.joinpath("lightning_logs"), name="data_augmentation"
)

# Create trainer
trainer = pl.Trainer(
    devices=1,
    accelerator="cpu",
    precision="32",
    max_steps=100,
    fast_dev_run=DEV_RUNS,
    enable_checkpointing=False,
    logger=logger,
    default_root_dir=DATA_PATH.joinpath("lightning_logs"),
)

# Train the model
trainer.fit(model, train_dataloaders=dataloader_train)

## Early Stopping

Early stopping evaluates the model on a separate validation set during training and stops once the monitored validation loss or metric no longer improves.

PyTorch Lightning provides this functionality out of the box: [pytorch_lightning.callbacks.early_stopping.EarlyStopping](https://lightning.ai/docs/pytorch/stable/common/early_stopping.html)

**Make sure to let the model run enough steps such that early stopping is actually stopping the training!**

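Implement a metric for early stopping to monitor. It should be calculated on the validation set. The callback in the cell below monitors `validation_acc`, so log your metric under that name (or change `monitor` accordingly).

Inspect the `Trainer` class and set more appropriate values (e.g. `val_check_interval` and `max_steps`).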
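
The stopping rule itself can be sketched in a few lines of plain Python. This is a simplified illustration of a patience-based rule for a metric that should increase (`mode="max"`), not Lightning's actual implementation; the function name and the validation scores are made up for the example.

```python
def early_stop_index(scores, patience=3, min_delta=0.0):
    """Return the index of the validation check at which training stops, or None."""
    best = float("-inf")
    checks_without_improvement = 0
    for i, score in enumerate(scores):
        if score > best + min_delta:
            # New best value: remember it and reset the counter.
            best = score
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                return i
    return None


# Hypothetical validation accuracies, one per validation check.
val_acc = [0.70, 0.78, 0.81, 0.80, 0.81, 0.79, 0.77]
print(early_stop_index(val_acc, patience=3))
```

In this toy sequence the best value is reached at the third check, and training stops three checks later. If training ends before the counter reaches `patience`, early stopping never triggers, which is why `max_steps` must be large enough.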

In [None]:
from pytorch_lightning.callbacks.early_stopping import EarlyStopping


class Classifier(pl.LightningModule):
    def __init__(self, model, weight_decay=0):
        super().__init__()
        self.model = model
        self.loss_fn = nn.CrossEntropyLoss()
        self.train_accuracy = torchmetrics.Accuracy(task="binary")
        self.validation_accuracy = torchmetrics.Accuracy(task="binary")
        self.test_accuracy = torchmetrics.Accuracy(task="binary")
        self.weight_decay = weight_decay

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = self.loss_fn(logits, y)
        preds = torch.argmax(logits, dim=1)

        # Update accuracy metric
        acc = self.train_accuracy(preds, y)
        self.log("train_loss", loss, prog_bar=True)
        self.log("train_acc", acc, prog_bar=True)

        return loss

    def validation_step(self, batch, batch_idx):
        # YOUR CODE HERE
        raise NotImplementedError()

    def test_step(self, batch, batch_idx):
        # YOUR CODE HERE
        raise NotImplementedError()

    def configure_optimizers(self):
        return torch.optim.Adam(
            self.parameters(), lr=0.001, weight_decay=self.weight_decay
        )


# Define early stopping callback
early_stopping = EarlyStopping(
    monitor="validation_acc", min_delta=0.00, patience=3, mode="max", verbose=True
)

# Create model instance
model = Classifier(net, weight_decay=1e-4)

# Datasets
train_transform = transforms.Compose(
    [
        transforms.RandomResizedCrop((128, 128)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
)

val_transform = transforms.Compose(
    [
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
)


ds_train = ImageDataset(train_image_paths, train_labels, transform=train_transform)
ds_val = ImageDataset(
    validation_image_paths, validation_labels, transform=val_transform
)

dataloader_train = DataLoader(ds_train, batch_size=64, shuffle=True)
dataloader_val = DataLoader(ds_val, batch_size=64, shuffle=False)

# Create a logger
logger = TensorBoardLogger(DATA_PATH.joinpath("lightning_logs"), name="early_stopping")

# Create trainer
trainer = pl.Trainer(
    devices=1,
    accelerator="cpu",
    precision="32",
    max_steps=100,
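    # If DEV_RUNS is truthy, only that many batches are run and Lightning
    # disables loggers and early stopping callbacks.
    fast_dev_run=DEV_RUNS,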
    enable_checkpointing=False,
    logger=logger,
    callbacks=[early_stopping],  # Add the early stopping callback here
    default_root_dir=DATA_PATH.joinpath("lightning_logs"),
)


# Train the model
trainer.fit(model, train_dataloaders=dataloader_train, val_dataloaders=dataloader_val)

**Question**: Compare the training metrics with the validation metrics. What do you observe?

YOUR ANSWER HERE

# Step 5 - Hyper-Parameter Optimization

To optimize hyper-parameters we need the following:
- a parameterized training process (architecture and pre-processing)
- experiment tracking software
- evaluation procedures (such as cross-validation for smaller datasets)


**Hyper-Parameter Tuning can be time consuming!**

Ideally one uses a dedicated library such as [Ray Tune](https://docs.ray.io/en/latest/tune/index.html).

You need to use a validation set or perform cross-validation. Experiment tracking software such as TensorBoard or Weights & Biases is highly recommended.

You can implement a hyper-opt loop if you like. You could test different `weight_decay` values, different model architectures, or different data augmentation techniques.
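
A minimal sketch of the loop structure, using a plain grid search. `train_and_validate` is a hypothetical placeholder that returns a dummy score so the sketch runs on its own; in the notebook it would build a `Classifier`, fit it with a `Trainer`, and return a validation metric. The search space values are arbitrary examples.

```python
from itertools import product

# Assumed search space; extend with architectures or augmentation settings.
search_space = {
    "weight_decay": [0.0, 1e-4, 1e-2],
    "lr": [1e-3, 3e-4],
}


def train_and_validate(config):
    """Hypothetical placeholder: train with `config` and return a validation score."""
    # Dummy score for illustration only; replace with real training and validation.
    return 1.0 - abs(config["weight_decay"] - 1e-4) - abs(config["lr"] - 3e-4)


# Enumerate every combination of the listed values.
configs = [dict(zip(search_space, values)) for values in product(*search_space.values())]

results = [(train_and_validate(config), config) for config in configs]
best_score, best_config = max(results, key=lambda item: item[0])
print(len(configs), best_config)
```

Giving each run its own logger name (for example one that encodes the configuration) keeps the runs distinguishable in TensorBoard.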

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# Step 6 - Squeeze out the Juice!

You can try the following techniques to get even further:

- advanced data augmentation. For example: https://pytorch.org/vision/main/auto_examples/transforms/plot_cutmix_mixup.html#sphx-glr-auto-examples-transforms-plot-cutmix-mixup-py
- model ensembling. Train multiple models and combine their predictions.
- advanced techniques: AdamW Optimizer, Stochastic Depth Regularization
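
One common way to combine predictions is to average the class probabilities of the individual models. The sketch below shows this on hand-written logits from two hypothetical models; in practice the logits would come from forward passes of your trained models on the same batch.

```python
import torch


def ensemble_predict(logits_per_model):
    """Average softmax probabilities across models and return (predictions, probabilities)."""
    probs = torch.stack([torch.softmax(logits, dim=1) for logits in logits_per_model])
    mean_probs = probs.mean(dim=0)
    return mean_probs.argmax(dim=1), mean_probs


# Made-up logits for two samples and two classes.
logits_a = torch.tensor([[2.0, 0.0], [0.0, 0.5]])
logits_b = torch.tensor([[1.0, 0.0], [3.0, 0.0]])

preds, mean_probs = ensemble_predict([logits_a, logits_b])
print(preds, mean_probs)
```

For the second sample the two models disagree; the more confident model dominates the averaged probabilities.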

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# Evaluate your model

We may want to evaluate our model in more detail. In particular, we want to know where the model works well and where it fails. This can give us additional insight into the data and its difficulties.

In [None]:
# Prepare data loaders
ds_test = ImageDataset(test_image_paths, test_labels, transform=val_transform)
dataloader_test = DataLoader(ds_test, batch_size=64, shuffle=False)

trainer.test(model, dataloaders=dataloader_test)

### Confusion-Matrix

Plot a _confusion matrix_. Use:

- [confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)
- [ConfusionMatrixDisplay](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html#sklearn.metrics.ConfusionMatrixDisplay)
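
As a reminder of how the matrix is laid out, here is a small sketch on toy labels (made up for illustration, not results from this notebook). For the exercise you still need to collect the true labels and the model's predictions over the test set.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for two classes (0 and 1); illustration only.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Rows correspond to the true class, columns to the predicted class.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
```

The resulting array can be passed to `ConfusionMatrixDisplay(confusion_matrix=cm).plot()` to draw it. Off-diagonal entries count the misclassifications.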

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Question:** Which classes does the model confuse with each other, and how often?