<h1><center>Laboratory work 5.</center></h1>
<h2><center>PyTorch Custom Datasets Exercises</center></h2>

**Completed:** Last name and First name

**Variant:** #__

<a class="anchor" id="5"></a>

## Content

1. [Task 1. Preparing data](#5.1)
2. [Task 2. Creating a model](#5.2)
3. [Task 3. Training and testing loops](#5.3)
4. [Task 4. Conducting experiments with hyperparameters](#5.4)
5. [Task 5. Conducting experiments with the model's layers](#5.5)
6. [Task 6. Making predictions](#5.6)

In [None]:
# Import torch
import torch
from torch import nn

# Exercises require PyTorch > 1.10.0
print(torch.__version__)

# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

For all tasks below, use our custom Pizza Steak Sushi dataset from the [GitHub repository](https://github.com/radiukpavlo/conducting-experiments/blob/main/data/pizza_steak_sushi.zip).

<a class="anchor" id="5.1"></a>

## <span style="color:red; font-size:1.5em;">Task 1. Preparing data</span>

[Go back to the content](#5)

**Variant 1:** Implement a custom `Dataset` class that loads the food images and applies a standard `ToTensor` transform. Additionally, modify the class to return the image's filename along with the image tensor and its label. Evaluate how this change affects the `DataLoader`'s output structure.

*Technical note:* Modify the `__getitem__` method to return `(image_tensor, label, image_path)`. Ensure the `collate_fn` (if customized) can handle this tuple structure. Libraries: `torch`, `torch.utils.data`, `PIL`, `os`. Metrics: Observe batch structure.
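
To see how an extra string field changes the batch structure, the sketch below uses a hypothetical in-memory dataset (`FakeFoodDataset`, with random tensors and made-up paths instead of real image files). It only illustrates how the default collation treats each element of the tuple; it is not the custom `Dataset` the task asks for.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FakeFoodDataset(Dataset):
    """Hypothetical stand-in: random tensors and invented paths, no files on disk."""
    def __init__(self):
        self.paths = [f"train/pizza/{i}.jpg" for i in range(4)]

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Same return structure as in the task: (image_tensor, label, image_path)
        return torch.rand(3, 64, 64), 0, self.paths[idx]

loader = DataLoader(FakeFoodDataset(), batch_size=2, shuffle=False)
images, labels, paths = next(iter(loader))

# Images are stacked into one tensor, integer labels become a tensor,
# and the string paths stay a plain Python sequence of length batch_size.
print(images.shape, labels, paths)
```

With the default collation no custom `collate_fn` should be needed for this tuple, because strings are passed through unchanged.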

---
**Variant 2:** Create two separate `torchvision.transforms.Compose` pipelines: one for training with basic data augmentation (Random Horizontal Flip, Random Rotation 15 degrees) and one for testing (only Resize and ToTensor). Apply these to the respective `ImageFolder` datasets for the food classification task.

*Technical note:* Use `transforms.RandomHorizontalFlip()`, `transforms.RandomRotation(15)`, `transforms.Resize((64, 64))`, `transforms.ToTensor()`. Apply the training transform to the training `ImageFolder` and the testing transform to the testing `ImageFolder`. Libraries: `torchvision`.

---
**Variant 3:** Calculate the mean and standard deviation of the pixel values across the *entire training set* of the food dataset. Implement a custom transform using `transforms.Normalize` with these calculated values and apply it after `transforms.ToTensor` for both training and testing datasets.

*Technical note:* Iterate through the training dataset once to compute mean/std per channel. Use these values in `transforms.Normalize(mean=[...], std=[...])`. Ensure normalization is applied consistently. Libraries: `torch`, `torchvision`, `numpy`.
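
One possible single-pass computation is sketched below. It assumes batches of shape `(B, 3, H, W)` with values in `[0, 1]`; the helper name `channel_mean_std` and the random stand-in data are illustrative, not part of the assignment.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_mean_std(loader):
    """Per-channel mean and (population) std over every pixel yielded by `loader`."""
    n_pixels = 0
    channel_sum = torch.zeros(3, dtype=torch.float64)
    channel_sq_sum = torch.zeros(3, dtype=torch.float64)
    for images, _ in loader:
        images = images.double()
        # Number of pixels per channel contributed by this batch: B * H * W
        n_pixels += images.numel() // images.shape[1]
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    mean = channel_sum / n_pixels
    std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
    return mean.float(), std.float()

# Stand-in data; replace with the real training DataLoader (ToTensor only).
torch.manual_seed(0)
fake_images = torch.rand(32, 3, 64, 64)
fake_labels = torch.randint(0, 3, (32,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=8)

mean, std = channel_mean_std(loader)
print(mean, std)
```

The resulting values can then be passed to `transforms.Normalize(mean=mean.tolist(), std=std.tolist())`.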

---
**Variant 4:** Implement a custom `Dataset` class that incorporates *on-the-fly* generation of negative samples. For each food image (positive sample), randomly select an image from a *different* food class as a negative sample. The `__getitem__` should return the anchor image, its label, and a negative image from another class.

*Technical note:* The `__getitem__` needs access to the list of all image paths and their labels to sample a negative example efficiently. Return structure: `(anchor_img, anchor_label, negative_img)`. Consider `collate_fn`. Libraries: `torch.utils.data`, `PIL`, `random`.

---
**Variant 5:** Create `DataLoader` instances for train and test sets using a batch size of 16. Experiment with the `num_workers` parameter (e.g., 0, 2, 4) and measure the time it takes to iterate through one epoch of the training data for each setting. Report the findings on data loading speed.

*Technical note:* Use `torch.utils.data.DataLoader(..., batch_size=16, num_workers=N, shuffle=True)`. Use `time.time()` before and after the loop over the DataLoader. Libraries: `torch.utils.data`, `time`. Metrics: Epoch iteration time.

---
**Variant 6:** Modify the data preparation process to handle potential grayscale images within the food dataset. Add a transform step that explicitly converts all images to RGB format before any other transformations are applied, ensuring consistency in tensor channels.

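*Technical note:* Use `PIL.Image.open(path).convert('RGB')` within the `Dataset` loading logic, or add `transforms.Lambda(lambda img: img.convert('RGB'))` as the first step of a `torchvision.transforms` pipeline. `transforms.Grayscale(num_output_channels=3)` also yields three channels, but it discards color from images that are already RGB, so use it only when every input is grayscale. Libraries: `PIL`, `torchvision`.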

---
**Variant 7:** Implement stratified sampling for creating the training and validation `DataLoader`s. Ensure that each batch reflects the overall class distribution (pizza/steak/sushi) of the dataset. Use `sklearn.model_selection.StratifiedShuffleSplit` or a `WeightedRandomSampler`.

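*Technical note:* Use `sklearn` to generate class-preserving train/validation indices and pass them to `SubsetRandomSampler`, or configure `WeightedRandomSampler` with weights based on inverse class frequencies. Note that inverse-frequency weighting balances the classes within each batch rather than reproducing the original class distribution. Libraries: `torch.utils.data`, `sklearn.model_selection`, `numpy`.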

---
**Variant 8:** Add a more aggressive data augmentation technique, Cutout, to the training transforms. Implement or use a library function for Cutout, which randomly masks a square region of the input image. Evaluate its effect visually on sample images.

*Technical note:* Implement Cutout manually by zeroing out a random patch in the tensor, or use libraries like `albumentations`. Add this transform to the training `transforms.Compose` pipeline. Libraries: `torchvision`, `numpy`, potentially `albumentations`.

---
**Variant 9:** Create a custom `Dataset` that returns image pairs: one original image and one heavily augmented version (e.g., strong color jitter, Gaussian blur). This setup is often used in self-supervised learning. The label remains the food class.

*Technical note:* Define two transform pipelines in `__getitem__`: `transform_weak` and `transform_strong`. Return `(weakly_augmented_img, strongly_augmented_img, label)`. Libraries: `torch.utils.data`, `torchvision`.

---
**Variant 10:** Prepare the data using `ImageFolder` but modify it to resize images to a non-square resolution, such as 64x128 pixels. Analyze how this aspect ratio affects visualization and potentially model input layers.

*Technical note:* Use `transforms.Resize((64, 128))` as the resizing transform. Ensure subsequent model layers can handle this input shape. Libraries: `torchvision`.

---
**Variant 11:** Simulate a scenario with missing data. Randomly remove 30% of the image files from the training directory *before* creating the `Dataset`. Then, proceed with creating the `Dataset` and `DataLoader` using the remaining files. Analyze the impact on dataset size.

*Technical note:* Use `os.remove` or `pathlib.Path.unlink` after listing files but before initializing `ImageFolder` or the custom `Dataset`. Keep track of removed files. Libraries: `os`, `pathlib`, `random`. Metrics: `len(train_dataset)`.

---
**Variant 12:** Implement a custom `collate_fn` for the `DataLoader`. This function should take a batch of (image, label) tuples and pad images within the batch to the maximum height and width in that batch, rather than resizing all images to a fixed size beforehand.

*Technical note:* Define `collate_fn` that finds max H/W in the batch, creates zero tensors of that size, and copies each image tensor into the top-left corner. Return padded image batch and label batch. Libraries: `torch`, `torch.utils.data`.
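
The padding logic described in the note can be sketched as follows. The function name `pad_collate` and the two dummy samples are assumptions made for illustration.

```python
import torch

def pad_collate(batch):
    """Zero-pad (C, H, W) images to the largest H and W found in the batch."""
    images, labels = zip(*batch)
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    padded = torch.zeros(len(images), images[0].shape[0], max_h, max_w)
    for i, img in enumerate(images):
        _, h, w = img.shape
        padded[i, :, :h, :w] = img  # copy into the top-left corner
    return padded, torch.tensor(labels)

# Two dummy samples with different spatial sizes
batch = [(torch.ones(3, 40, 50), 0), (torch.ones(3, 64, 32), 2)]
images, labels = pad_collate(batch)
print(images.shape, labels)
```

It would be passed to the loader as `DataLoader(dataset, batch_size=..., collate_fn=pad_collate)`.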

---
**Variant 13:** Create a `Dataset` where `__getitem__` returns the image tensor, its label, and a one-hot encoded version of the label. Ensure the `DataLoader` and subsequent training steps can handle this multi-output structure.

*Technical note:* Use `torch.nn.functional.one_hot(torch.tensor(label), num_classes=N)` within `__getitem__`. The `collate_fn` should stack image tensors, label indices, and one-hot tensors separately. Libraries: `torch`, `torch.utils.data`.

---
**Variant 14:** Apply Mixup augmentation *during data loading*. Modify the `DataLoader` or create a wrapper that takes batches and mixes pairs of samples (images and labels) according to the Mixup alpha parameter before yielding the batch.

*Technical note:* Requires modifying the `DataLoader` iteration loop or using a custom batch sampler/collate function. Sample lambda from `Beta(alpha, alpha)`, mix `batch_X = lam * X1 + (1-lam) * X2`, `batch_y = lam * y1 + (1-lam) * y2`. Libraries: `torch`, `numpy.random`.
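
A minimal sketch of the mixing step, assuming integer class labels that are first converted to one-hot vectors so they can be interpolated. The helper `mixup_batch` is hypothetical, and pairing each sample with a random permutation of the same batch is one common choice, not the only one.

```python
import torch
import torch.nn.functional as F

def mixup_batch(X, y, num_classes, alpha=0.2):
    """Mix each sample with a randomly chosen partner from the same batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(X.size(0))
    y_onehot = F.one_hot(y, num_classes).float()
    X_mixed = lam * X + (1 - lam) * X[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return X_mixed, y_mixed

torch.manual_seed(0)
X = torch.rand(8, 3, 64, 64)
y = torch.randint(0, 3, (8,))
X_mixed, y_mixed = mixup_batch(X, y, num_classes=3)
print(X_mixed.shape, y_mixed[0])
```

The soft labels in `y_mixed` can be passed directly to `nn.CrossEntropyLoss`, which accepts class-probability targets in PyTorch 1.10 and later.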

---
**Variant 15:** Prepare the food dataset to be loaded with different image sizes for training and testing. Use 64x64 for training (with augmentations) and 128x128 for testing (no augmentations). Analyze the potential need for model adjustments during evaluation.

*Technical note:* Define two separate `transforms.Compose` pipelines with different `transforms.Resize()` values. Use them with the respective `ImageFolder` datasets. Libraries: `torchvision`.

---
**Variant 16:** Create a custom `Dataset` that loads images and associated metadata (e.g., a dummy 'calorie_estimate' loaded from a parallel file or dictionary). The `__getitem__` should return `(image, label, metadata)`. Design a `collate_fn` to handle batching this structure.

*Technical note:* Assume metadata exists (e.g., in a CSV or dict mapping filenames to values). Load it in `__getitem__`. The `collate_fn` needs to batch images, labels, and metadata appropriately (e.g., stack tensors, list for metadata). Libraries: `torch.utils.data`, `pandas` (optional).

---
**Variant 17:** Implement data loading where only a subset of the classes is used. Modify the `ImageFolder` instantiation or filter the dataset afterwards to only include 'pizza' and 'steak' images for a binary classification task.

*Technical note:* Option 1: Prepare a directory structure with only pizza/steak folders. Option 2: Load full dataset, then use `torch.utils.data.Subset` with indices filtered based on labels. Libraries: `torchvision`, `torch.utils.data`.

---
**Variant 18:** Use `torchvision.transforms.v2` (if available, or simulate its potential features) to apply geometry-changing augmentations (like rotation) and observe how bounding boxes or segmentation masks (if available conceptually) would need to be transformed simultaneously. (Focus on the transform logic).

*Technical note:* Explore `torchvision.transforms.v2` if installed. Alternatively, conceptually design how a rotation transform would also need to apply the same rotation matrix to corner coordinates of a bounding box. Libraries: `torchvision` (v2 ideally).

---
**Variant 19:** Create `DataLoader`s that use different batch sizes for training and testing. Use a larger batch size (e.g., 64) for training to potentially speed up gradient computation and a smaller batch size (e.g., 32) for testing to reduce memory footprint during evaluation.

*Technical note:* Instantiate `DataLoader` for training with `batch_size=64` and for testing with `batch_size=32`. Ensure `shuffle=True` for training and `shuffle=False` for testing. Libraries: `torch.utils.data`.

---
**Variant 20:** Implement a `Dataset` that performs channel shuffling or permutation as a form of data augmentation. After loading an RGB image and converting it to a tensor, randomly permute the channel order (e.g., RGB -> GRB) before returning it.

*Technical note:* Inside `__getitem__`, after `transforms.ToTensor()`, get the tensor shape `(C, H, W)`. Generate a random permutation of `[0, 1, 2]` and re-index the tensor along the channel dimension: `image_tensor = image_tensor[torch.randperm(3), :, :]`. Libraries: `torch`, `torch.utils.data`.

In [None]:
# Utilize the function below as a helper 

import os
def walk_through_dir(dir_path):
  """Walks through dir_path returning file counts of its contents."""
  for dirpath, dirnames, filenames in os.walk(dir_path):
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

In [None]:
# Setup train and testing paths


In [None]:
# Visualize an image

In [None]:
# Do the image visualization with matplotlib


We've got some images in our folders.

Now we need to make them compatible with PyTorch by:
1. Transforming the data into tensors.
2. Turning the tensor data into a `torch.utils.data.Dataset` and later a `torch.utils.data.DataLoader`.

In [None]:
# Transforming data with torchvision.transforms


In [None]:
# Write transform for turning images into tensors


In [None]:
# Write a function to plot transformed images


### Load image data using `ImageFolder`

In [None]:
# Use ImageFolder to create dataset(s)


In [None]:
# Get class names as a list
class_names = train_data.classes
class_names

In [None]:
# Can also get class names as a dict
class_dict = train_data.class_to_idx
class_dict

In [None]:
# Check the lengths of each dataset
len(train_data), len(test_data)

In [None]:
# Turn train and test Datasets into DataLoaders


In [None]:
# How many batches of images are in our data loaders?


<a class="anchor" id="5.2"></a>

## <span style="color:red; font-size:1.5em;">Task 2. Creating a model</span>

[Go back to the content](#5)

**Variant 1:** Define a simple MLP (Multi-Layer Perceptron) model for food classification. The model should take flattened 64x64x3 images as input, have two hidden layers with ReLU activation (e.g., 128 and 64 units), and an output layer for the 3 food classes.

*Technical note:* Use `nn.Flatten()` first, then `nn.Linear`, `nn.ReLU`, and a final `nn.Linear(..., num_classes)`. Calculate the flattened input size: 64*64*3. Libraries: `torch.nn`.

---
**Variant 2:** Create a CNN model similar to TinyVGG but replace all `nn.MaxPool2d` layers with `nn.AvgPool2d` layers using the same kernel size and stride. Train and evaluate if this change impacts performance on the food classification task.

*Technical note:* Replace `nn.MaxPool2d(kernel_size=2, stride=2)` with `nn.AvgPool2d(kernel_size=2, stride=2)` throughout the convolutional blocks. Keep the rest of the architecture the same. Libraries: `torch.nn`.

---
**Variant 3:** Implement the TinyVGG architecture but add `nn.BatchNorm2d` layers after each `nn.Conv2d` layer (before the activation function). Analyze how batch normalization affects training stability and convergence speed.

*Technical note:* Insert `nn.BatchNorm2d(num_features=out_channels)` after each `nn.Conv2d` layer within the convolutional blocks. The number of features must match the output channels of the preceding conv layer. Libraries: `torch.nn`.

---
**Variant 4:** Design a CNN model with only one convolutional block (Conv2d -> ReLU -> Conv2d -> ReLU -> MaxPool2d) but increase the number of output channels significantly (e.g., 64 and 128) compared to TinyVGG's initial layers. Follow with a classifier head.

*Technical note:* Define a single `nn.Sequential` block with `nn.Conv2d(in_channels=3, out_channels=64, ...)` and `nn.Conv2d(in_channels=64, out_channels=128, ...)`. Calculate the flattened size after pooling for the `nn.Linear` layer. Libraries: `torch.nn`.

---
**Variant 5:** Create a CNN model where the kernel sizes of the convolutional layers are varied. Use a 5x5 kernel in the first conv layer and a 3x3 kernel in the second conv layer within each block. Keep padding appropriate to maintain spatial dimensions where desired.

*Technical note:* Set `kernel_size=5, padding=2` for the first `nn.Conv2d` and `kernel_size=3, padding=1` for the second `nn.Conv2d` in each block (assuming stride=1). Libraries: `torch.nn`.

---
**Variant 6:** Implement a CNN using depthwise separable convolutions. Replace standard `nn.Conv2d` layers with a sequence of a depthwise convolution (`nn.Conv2d` with `groups=in_channels`) followed by a pointwise convolution (`nn.Conv2d` with `kernel_size=1`).

*Technical note:* A block becomes: Depthwise `nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1, groups=in_channels)`, `nn.ReLU()`, Pointwise `nn.Conv2d(in_channels, out_channels, kernel_size=1)`. Compare parameter count to standard CNN. Libraries: `torch.nn`.
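
To make the parameter comparison concrete, the sketch below counts the parameters of one standard 3x3 convolution and of its depthwise separable counterpart. The channel sizes 10 and 20 are chosen for illustration only.

```python
import torch
from torch import nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

in_ch, out_ch = 10, 20

standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

separable = nn.Sequential(
    # Depthwise: one 3x3 filter per input channel
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),
    nn.ReLU(),
    # Pointwise: 1x1 convolution that mixes channels
    nn.Conv2d(in_ch, out_ch, kernel_size=1),
)

x = torch.randn(1, in_ch, 64, 64)
print(standard(x).shape, separable(x).shape)            # same output shape
print(count_params(standard), count_params(separable))  # 1820 vs 320
```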

---
**Variant 7:** Build a CNN model incorporating dropout for regularization. Add `nn.Dropout(p=0.25)` layers after the pooling layers and `nn.Dropout(p=0.5)` before the final `nn.Linear` classifier layer.

*Technical note:* Insert `nn.Dropout(p=0.25)` after `nn.MaxPool2d` layers. Add `nn.Dropout(p=0.5)` after flattening the features and before the final classification `nn.Linear` layer. Libraries: `torch.nn`.

---
**Variant 8:** Design a wider version of TinyVGG. Double the number of output channels in all `nn.Conv2d` layers (e.g., 10 -> 20, 20 -> 40). Analyze the impact on the total number of model parameters and potential performance changes.

*Technical note:* Modify the `out_channels` argument in all `nn.Conv2d` layers and adjust the `in_features` of the `nn.Linear` layer accordingly. Calculate the new parameter count using `sum(p.numel() for p in model.parameters())`. Libraries: `torch.nn`.

---
**Variant 9:** Create a CNN model that uses a different activation function, such as `nn.GELU` or `nn.SiLU` (Swish), instead of `nn.ReLU` throughout the network. Compare training curves and final accuracy with the ReLU baseline.

*Technical note:* Replace all instances of `nn.ReLU()` with `nn.GELU()` or `nn.SiLU()`. Observe potential differences in gradient flow or convergence. Libraries: `torch.nn`.

---
**Variant 10:** Implement a simple Residual Network (ResNet) block within the CNN. Create a block where the input is added to the output of two convolutional layers (potentially with a 1x1 conv in the skip connection if channel sizes differ). Replace a standard block in TinyVGG with this ResNet block.

*Technical note:* Define a custom `ResBlock` module. `output = self.conv2(self.relu(self.conv1(x))) + self.shortcut(x)`. Ensure `self.shortcut` handles dimension matching (identity or 1x1 conv). Integrate this block into the main model sequence. Libraries: `torch.nn`.
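
The formula in the note could be wrapped in a module along these lines. This is a bare-bones sketch: the 3x3 kernels with `padding=1` are assumptions, and refinements such as BatchNorm or a final activation are left out.

```python
import torch
from torch import nn

class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        # Identity when shapes already match, otherwise a 1x1 conv to align channels
        if in_channels == out_channels:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.conv2(self.relu(self.conv1(x))) + self.shortcut(x)

block = ResBlock(10, 20)
out = block(torch.randn(2, 10, 32, 32))
print(out.shape)
```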

---
**Variant 11:** Construct a model using `nn.Sequential` for the feature extractor and `nn.Sequential` for the classifier head separately. Instantiate these within the main model class and connect them in the `forward` method. This promotes modular design.

*Technical note:* Define `self.features = nn.Sequential(...)` containing conv/pool layers and `self.classifier = nn.Sequential(...)` containing flatten/linear layers. `forward(self, x): x = self.features(x); x = self.classifier(x); return x`. Libraries: `torch.nn`.

---
**Variant 12:** Build a CNN model that takes 6-channel input, assuming the data loader provides pairs of images (e.g., original and augmented). Modify the first `nn.Conv2d` layer to accept `in_channels=6`.

*Technical note:* Change the first `nn.Conv2d` layer to `nn.Conv2d(in_channels=6, ...)`. Ensure the `DataLoader` provides batches where images are concatenated along the channel dimension. Libraries: `torch.nn`.

---
**Variant 13:** Implement a CNN with dilated convolutions in the second convolutional block. Use `dilation=2` in the `nn.Conv2d` layers of that block to increase the receptive field without adding parameters or reducing spatial resolution via pooling.

*Technical note:* In the second conv block, modify `nn.Conv2d` layers to include `dilation=2, padding=2` (adjust padding to maintain size with dilation). Libraries: `torch.nn`.

---
**Variant 14:** Create a model with Global Average Pooling (GAP) instead of flattening before the final classifier. Replace the `nn.Flatten()` layer and the potentially large `nn.Linear` layer with `nn.AdaptiveAvgPool2d((1, 1))` followed by `nn.Flatten()` and a `nn.Linear` layer with fewer input features.

*Technical note:* After the last conv/pool block, add `nn.AdaptiveAvgPool2d((1, 1))`, then `nn.Flatten()`, then `nn.Linear(in_features=last_out_channels, out_features=num_classes)`. This significantly reduces parameters in the classifier head. Libraries: `torch.nn`.
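
A small sketch of such a head, assuming the last convolutional block outputs 10 channels (an illustrative value). It also shows that the head no longer depends on the spatial size of the feature map.

```python
import torch
from torch import nn

last_out_channels, num_classes = 10, 3

head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),  # (B, C, H, W) -> (B, C, 1, 1)
    nn.Flatten(),                  # (B, C, 1, 1) -> (B, C)
    nn.Linear(last_out_channels, num_classes),
)

# Feature maps of different spatial sizes give the same output shape
small = head(torch.randn(2, last_out_channels, 13, 13))
large = head(torch.randn(2, last_out_channels, 29, 29))
n_params = sum(p.numel() for p in head.parameters())
print(small.shape, large.shape, n_params)
```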

---
**Variant 15:** Design a CNN model where the stride of the first convolutional layer is increased to 2. Analyze how this initial downsampling affects the feature map sizes throughout the network and the final performance. Adjust padding accordingly.

*Technical note:* Set `stride=2` in the very first `nn.Conv2d` layer. Recalculate the expected input size to the classifier head based on the faster spatial reduction. Use appropriate padding, e.g., `padding=kernel_size//2`. Libraries: `torch.nn`.
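
When recalculating feature map sizes, the standard output-size formula for `nn.Conv2d` can be checked against an actual layer. The helper name `conv_out_size` and the layer settings below are illustrative.

```python
import torch
from torch import nn

def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

conv = nn.Conv2d(3, 10, kernel_size=3, stride=2, padding=1)
out = conv(torch.randn(1, 3, 64, 64))
expected = conv_out_size(64, kernel_size=3, stride=2, padding=1)
print(out.shape, expected)  # spatial size is halved: 64 -> 32
```

The same helper applies to pooling layers and to dilated convolutions, which makes it convenient for deriving the `in_features` of the classifier head.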

---
**Variant 16:** Implement a simple attention mechanism, like a Squeeze-and-Excitation (SE) block, after one of the convolutional blocks in TinyVGG. This block recalibrates channel-wise feature responses.

*Technical note:* Define an `SEBlock` module: GlobalAvgPool -> Linear(C -> C/r) -> ReLU -> Linear(C/r -> C) -> Sigmoid. Multiply the output with the input feature map. Integrate this block after a conv block. Libraries: `torch.nn`.
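
The pipeline in the note might be implemented roughly as below. The reduction ratio `r = 2` is an assumption suited to the small channel counts of TinyVGG.

```python
import torch
from torch import nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global average pool
        self.fc = nn.Sequential(                     # excitation: C -> C/r -> C
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * scale                             # per-channel rescaling

torch.manual_seed(0)
x = torch.randn(2, 10, 16, 16)
out = SEBlock(channels=10)(x)
print(out.shape)
```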

---
**Variant 17:** Build a CNN model intended for regression instead of classification. Change the final `nn.Linear` layer to have an output size of 1 (e.g., predicting a dummy 'calorie' value). Remove any final activation like Softmax. Use an appropriate loss function (e.g., MSELoss) during training.

*Technical note:* Set `out_features=1` in the last `nn.Linear` layer. Ensure the training loop uses a regression loss like `nn.MSELoss`. Labels provided by the DataLoader should be numerical values. Libraries: `torch.nn`.

---
**Variant 18:** Create a CNN model using different padding strategies. Experiment with `padding='same'` (if using recent PyTorch versions or calculating manually) versus specific integer padding values in the `nn.Conv2d` layers to see the effect on output feature map sizes.

*Technical note:* Use `padding='same'` (for stride 1) or calculate `padding = kernel_size // 2`. Compare with fixed padding like `padding=1`. Observe spatial dimensions after each layer. Libraries: `torch.nn`.

---
**Variant 19:** Construct a very deep, narrow CNN. Increase the number of conv blocks (e.g., 4 or 5 sequential blocks) but keep the number of channels per layer relatively small (e.g., 8 or 16). Analyze the trade-offs between depth and width.

*Technical note:* Add more `nn.Sequential` blocks following the TinyVGG pattern but maintain low `out_channels` (e.g., 8, 8, 16, 16, 32, 32...). Be mindful of vanishing gradients; consider adding BatchNorm or ResBlocks if needed. Libraries: `torch.nn`.

---
**Variant 20:** Design a model with multiple classifier heads branching off at different depths. For example, one head after the first conv block and another head after the second (final) conv block. The final prediction could be an average, or losses could be combined during training.

*Technical note:* Define intermediate classifier heads (e.g., Pool -> Flatten -> Linear). Modify the `forward` method to return multiple outputs. Combine losses: `total_loss = loss_final + 0.3 * loss_intermediate`. Libraries: `torch.nn`.

<a class="anchor" id="5.3"></a>

## <span style="color:red; font-size:1.5em;">Task 3. Training and testing loops</span>

[Go back to the content](#5)

**Variant 1:** Modify the `train` function to include learning rate scheduling. Implement a `StepLR` scheduler that reduces the learning rate by a factor of 0.1 every 5 epochs. Pass the scheduler to the function and call `scheduler.step()` after each epoch.

*Technical note:* Import the scheduler with `from torch.optim.lr_scheduler import StepLR`. Create `scheduler = StepLR(optimizer, step_size=5, gamma=0.1)`. Call `scheduler.step()` at the end of the epoch loop in the `train` function. Log the learning rate changes. Libraries: `torch.optim`.

---
**Variant 2:** Implement an early stopping mechanism within the `train` function. Monitor the validation loss (`test_loss`). If the validation loss does not improve for a specified number of epochs (patience, e.g., 3 epochs), stop the training loop early and save the best model state.

*Technical note:* Track `best_test_loss` and `epochs_no_improve`. Inside the epoch loop, check if `test_loss < best_test_loss`. If yes, update best loss and reset counter; save model state. If not, increment counter. If counter reaches patience, `break` the loop. Libraries: `torch`, `copy`.
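
The bookkeeping described above can be isolated in a small helper. The class name `EarlyStopper` and the hard-coded loss sequence are illustrative; in the real loop the state passed in would be `model.state_dict()`.

```python
import copy

class EarlyStopper:
    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_no_improve = 0
        self.best_state = None

    def step(self, test_loss, model_state):
        """Return True when training should stop."""
        if test_loss < self.best_loss:
            self.best_loss = test_loss
            self.epochs_no_improve = 0
            self.best_state = copy.deepcopy(model_state)  # keep the best weights
            return False
        self.epochs_no_improve += 1
        return self.epochs_no_improve >= self.patience

# Simulated validation losses; the stand-in "state" is just the epoch index
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss, {"epoch": epoch}):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss, stopper.best_state)
```

In this simulated run training stops at epoch 4, after three epochs without improvement over the best loss of 0.8, so the later loss of 0.7 is never reached.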

---
**Variant 3:** Enhance the `train_step` function to include gradient clipping. After calculating the loss and before the optimizer step, use `torch.nn.utils.clip_grad_norm_` to clip the gradients of the model parameters by norm (e.g., clip at norm 1.0).

*Technical note:* After `loss.backward()`, add `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)`. Then call `optimizer.step()`. This helps prevent exploding gradients. Libraries: `torch.nn.utils`.
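
The ordering of the calls can be tried out on a toy model. The linear model and the deliberately large inputs below are assumptions chosen only to produce large gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
X, y = torch.randn(8, 4) * 100, torch.randint(0, 3, (8,))

optimizer.zero_grad()
loss = loss_fn(model(X), y)
loss.backward()

# clip_grad_norm_ rescales gradients in place and returns the norm before clipping
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
norm_after = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))

optimizer.step()
print(float(norm_before), float(norm_after))
```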

---
**Variant 4:** Modify the training and testing loops to use mixed-precision training with `torch.cuda.amp`. Wrap the forward pass in `train_step` with `amp.autocast()` and use `amp.GradScaler` for the backward pass and optimizer step.

*Technical note:* Import the module with `from torch.cuda import amp`. Create `scaler = amp.GradScaler()`. In `train_step`: `with amp.autocast(): outputs = model(X); loss = loss_fn(outputs, y)`. Then `scaler.scale(loss).backward()`, `scaler.step(optimizer)`, `scaler.update()`. Use `amp.autocast()` in `test_step` too. In recent PyTorch releases `torch.cuda.amp.autocast` is deprecated in favor of `torch.amp.autocast("cuda")`. Libraries: `torch.cuda.amp`.

---
**Variant 5:** Adapt the `train` function to log training and validation metrics (loss, accuracy) to TensorBoard. Use `torch.utils.tensorboard.SummaryWriter` to write scalars for each metric at every epoch. Visualize the results in TensorBoard.

*Technical note:* Import `SummaryWriter`. Create `writer = SummaryWriter()`. Inside the epoch loop, call `writer.add_scalar('Loss/train', train_loss, epoch)`, `writer.add_scalar('Accuracy/train', train_acc, epoch)`, and similarly for test metrics. Launch TensorBoard to view logs. Libraries: `torch.utils.tensorboard`.

---
**Variant 6:** Change the loss function used in the training and testing loops. Instead of `nn.CrossEntropyLoss`, implement training with `nn.NLLLoss`. Ensure the model's output layer includes `nn.LogSoftmax` as NLLLoss expects log-probabilities.

*Technical note:* Modify the model to end with `nn.LogSoftmax(dim=1)`. Change `loss_fn = nn.CrossEntropyLoss()` to `loss_fn = nn.NLLLoss()`. Ensure label tensors are of type `torch.long`. Libraries: `torch.nn`.

---
**Variant 7:** Implement weighted loss calculation in `train_step` to handle potential class imbalance in the food dataset. Calculate class weights (e.g., inversely proportional to frequency) and pass them to the `nn.CrossEntropyLoss` criterion.

*Technical note:* Calculate weights `w = [w0, w1, w2]` based on class counts. Convert to tensor `weights = torch.tensor(w).to(device)`. Instantiate loss: `loss_fn = nn.CrossEntropyLoss(weight=weights)`. Libraries: `torch.nn`, `sklearn.utils.class_weight` (optional).

---
**Variant 8:** Modify the `test_step` function to calculate and return additional classification metrics beyond accuracy, such as precision, recall, and F1-score (per-class or macro-averaged). Use a library like `torchmetrics` or `scikit-learn`.

*Technical note:* Accumulate all predictions and true labels from the test set. After the loop, use `torchmetrics.classification.MulticlassAccuracy`, `MulticlassPrecision`, `MulticlassRecall`, `MulticlassF1Score` or `sklearn.metrics.classification_report`. Return these metrics from `test_step`. Libraries: `torchmetrics` or `sklearn.metrics`.

---
**Variant 9:** Implement Stochastic Weight Averaging (SWA) within the `train` function. Train normally for a certain number of epochs, then switch to SWA mode where model weights are averaged over several epochs with a cyclical or constant learning rate. Evaluate the SWA model.

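*Technical note:* Use `torch.optim.swa_utils.AveragedModel` and `torch.optim.swa_utils.SWALR`. Wrap the model with `swa_model = AveragedModel(model)`. During the SWA phase, step the `SWALR` scheduler and update the averaged weights with `swa_model.update_parameters(model)` after each epoch. If the model contains BatchNorm layers, recompute their statistics with `torch.optim.swa_utils.update_bn(train_dataloader, swa_model)` before evaluating `swa_model`. Libraries: `torch.optim.swa_utils`.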
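
A compact sketch of the two training phases on a toy linear model is given below. The averaged weights are updated through `AveragedModel.update_parameters(model)`; the values `swa_start = 3`, `swa_lr = 0.05` and the single random batch are illustrative assumptions.

```python
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR

torch.manual_seed(0)
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
X, y = torch.randn(16, 4), torch.randint(0, 3, (16,))

swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
swa_start = 3  # epochs 0-2: normal training, epochs 3-5: SWA phase

for epoch in range(6):
    optimizer.zero_grad()
    loss_fn(model(X), y).backward()
    optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)  # fold current weights into the average
        swa_scheduler.step()

# Evaluate with the averaged model, not the raw one
print(int(swa_model.n_averaged), swa_model(X).shape)
```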

---
**Variant 10:** Refactor the `train` function to include a separate validation loop called periodically (e.g., every N batches or at the end of each epoch) using a dedicated validation `DataLoader`. This is distinct from the final test set evaluation.

*Technical note:* Create a validation `DataLoader` from a subset of the training data. Modify `train` to call a `validation_step` function (similar to `test_step`) using this loader periodically. Use validation metrics for early stopping or hyperparameter tuning. Libraries: `torch.utils.data`.

---
**Variant 11:** Modify the `train_step` to implement label smoothing manually. Instead of passing raw labels to the loss function, create smoothed label tensors (e.g., 0.9 for true class, 0.05 for others). Use KL Divergence loss between model output (LogSoftmax) and smoothed labels.

*Technical note:* Create smoothed labels `y_smooth = torch.full_like(outputs, fill_value=epsilon / (num_classes - 1)); y_smooth.scatter_(1, y.unsqueeze(1), 1.0 - epsilon)`. Use `nn.KLDivLoss(reduction='batchmean')` with `F.log_softmax(outputs, dim=1)`. Libraries: `torch.nn`, `torch.nn.functional`.
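
The label construction from the note can be verified on a tiny example. The helper `smooth_labels` and the hand-written logits are hypothetical; with three classes and `epsilon = 0.1` the targets are 0.9 for the true class and 0.05 for each other class.

```python
import torch
import torch.nn.functional as F
from torch import nn

def smooth_labels(y, num_classes, epsilon=0.1):
    """Build smoothed one-hot targets from integer class labels."""
    y_smooth = torch.full((y.size(0), num_classes), epsilon / (num_classes - 1))
    y_smooth.scatter_(1, y.unsqueeze(1), 1.0 - epsilon)
    return y_smooth

logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
y = torch.tensor([0, 2])

targets = smooth_labels(y, num_classes=3)
# KLDivLoss expects log-probabilities as input and probabilities as target
loss = nn.KLDivLoss(reduction="batchmean")(F.log_softmax(logits, dim=1), targets)
print(targets, loss.item())
```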

---
**Variant 12:** Add functionality to the `train` function to save model checkpoints periodically (e.g., every 5 epochs) and also save the model with the best validation accuracy encountered so far.

*Technical note:* Use `torch.save({'epoch': epoch, 'model_state_dict': model.state_dict(), ...}, PATH)`. Inside the epoch loop, check `if (epoch + 1) % 5 == 0:` to save a checkpoint (with a zero-based `epoch`, `epoch % 5 == 0` would also fire on the very first epoch). Also track `best_val_acc` and save `best_model.pth` whenever `test_acc > best_val_acc`. Libraries: `torch`.

---
**Variant 13:** Implement the training loop using a different optimizer, such as `AdamW`, which includes decoupled weight decay. Compare its performance (convergence speed, final accuracy) against the default Adam or SGD optimizer.

*Technical note:* Import `torch.optim.AdamW`. Replace `optimizer = torch.optim.Adam(...)` with `optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)`. Experiment with `weight_decay` values. Libraries: `torch.optim`.

---
**Variant 14:** Modify the `test_step` to compute and return a confusion matrix for the food classification task. Use `torchmetrics` or `scikit-learn` to generate the matrix based on accumulated predictions and labels.

*Technical note:* Accumulate `all_preds` and `all_labels` tensors. After the loop, use `torchmetrics.classification.MulticlassConfusionMatrix(num_classes=3)` or `sklearn.metrics.confusion_matrix`. Return the matrix tensor. Libraries: `torchmetrics`, `sklearn.metrics`.

---
**Variant 15:** Change the `train` function structure to perform k-fold cross-validation. Split the dataset into k folds, and run the training/validation loop k times, each time using a different fold for validation and the rest for training. Average the results across folds.

*Technical note:* Use `sklearn.model_selection.KFold` to get train/validation indices for each fold. Loop k times, creating `SubsetRandomSampler` and `DataLoader`s for each fold. Store results per fold and average. Libraries: `sklearn.model_selection`, `torch.utils.data`.

---
**Variant 16:** Introduce gradient accumulation into the `train_step`. Modify the loop to perform the forward pass and loss calculation multiple times (e.g., 4 steps) before calling `optimizer.step()`. Divide the loss by accumulation steps before `backward()`.

*Technical note:* Set `accumulation_steps = 4`. Inside batch loop: `loss = loss_fn(...) / accumulation_steps; loss.backward()`. `if (i + 1) % accumulation_steps == 0: optimizer.step(); optimizer.zero_grad()`. Effectively simulates larger batch size. Libraries: `torch`.
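
The claim that accumulation simulates a larger batch can be checked numerically. The sketch below, on a toy linear model with random data, compares the gradient of one batch of 16 with the gradient accumulated over four micro-batches of 4.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 3)
loss_fn = nn.CrossEntropyLoss()
X, y = torch.randn(16, 4), torch.randint(0, 3, (16,))
accumulation_steps = 4

# Reference: gradient from the full batch of 16
model.zero_grad()
loss_fn(model(X), y).backward()
full_grad = model.weight.grad.clone()

# Accumulated: four micro-batches of 4, each loss divided by accumulation_steps
model.zero_grad()
for X_mb, y_mb in zip(X.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    loss = loss_fn(model(X_mb), y_mb) / accumulation_steps
    loss.backward()  # gradients add up in .grad until zero_grad() is called
accumulated_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accumulated_grad, atol=1e-6))
```

The equivalence holds here because the micro-batches have equal size and the loss uses mean reduction. Layers with batch-dependent statistics, such as BatchNorm, would break the exact match.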

---
**Variant 17:** Modify the `train` function to dynamically adjust the batch size based on the epoch number. Start with a smaller batch size (e.g., 16) for the first few epochs and increase it (e.g., to 32, then 64) in later epochs. Requires recreating the DataLoader.

*Technical note:* Re-create `train_dataloader` inside the epoch loop with updated `batch_size` based on `epoch`. Manage shuffling state if needed. Analyze impact on stability and speed. Libraries: `torch.utils.data`.

---
**Variant 18:** Implement a training loop that supports multi-task learning. Assume the model returns two outputs (e.g., class prediction, dummy regression value). Calculate two separate losses and combine them using a weighted sum before backpropagation.

*Technical note:* Model forward returns `out_class, out_reg`. Calculate `loss_class = loss_fn_class(out_class, y_class)` and `loss_reg = loss_fn_reg(out_reg, y_reg)`. Combine: `loss = alpha * loss_class + beta * loss_reg`. Then `loss.backward()`. Libraries: `torch.nn`.
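
One possible shape for such a step, assuming a hypothetical `TwoHeadModel` with a shared body and two heads, a random dummy regression target, and illustrative weights `alpha` and `beta`.

```python
import torch
from torch import nn

torch.manual_seed(0)

class TwoHeadModel(nn.Module):
    """Hypothetical model: shared body, one classification and one regression head."""
    def __init__(self, in_features: int = 8, hidden: int = 16, num_classes: int = 3):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.class_head = nn.Linear(hidden, num_classes)
        self.reg_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        # Squeeze the regression output from (N, 1) to (N,) to match the target.
        return self.class_head(h), self.reg_head(h).squeeze(1)

model = TwoHeadModel()
loss_fn_class = nn.CrossEntropyLoss()
loss_fn_reg = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
alpha, beta = 1.0, 0.5  # illustrative loss weights

X = torch.randn(4, 8)
y_class = torch.randint(0, 3, (4,))
y_reg = torch.randn(4)  # dummy regression target

out_class, out_reg = model(X)
loss_class = loss_fn_class(out_class, y_class)
loss_reg = loss_fn_reg(out_reg, y_reg)
loss = alpha * loss_class + beta * loss_reg  # weighted sum of both tasks

optimizer.zero_grad()
loss.backward()  # one backward pass sends gradients into both heads and the body
optimizer.step()
```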

---
**Variant 19:** Refactor `train_step` and `test_step` to explicitly move data and model to the target `device` (CPU or GPU) at the beginning of the functions, rather than within the batch loop. Ensure the loss function is also on the correct device.

*Technical note:* Add `model.to(device)` before loops (usually done once). In steps: `X, y = X.to(device), y.to(device)`. Ensure `loss_fn` doesn't have internal state needing device transfer (usually okay). Promotes clarity. Libraries: `torch`.

---
**Variant 20:** Enhance the `train` function's result dictionary to store more detailed metrics, such as the learning rate used at each epoch (especially if using a scheduler) and the time taken per epoch.

*Technical note:* Add `results['learning_rate'] = []` and `results['epoch_time'] = []`. Record `optimizer.param_groups[0]['lr']` and epoch duration (`end_time - start_time`) and append to the lists within the epoch loop. Libraries: `time`.
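
A small sketch of the bookkeeping, assuming a `StepLR` scheduler and a dummy update in place of the real `train_step` / `test_step` calls. The learning rate is read before `scheduler.step()` so that the logged value is the one actually used during that epoch.

```python
import time
import torch
from torch import nn

model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Assumed scheduler for illustration: divide the LR by 10 every 2 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

results = {"learning_rate": [], "epoch_time": []}

for epoch in range(4):
    start_time = time.perf_counter()

    # Placeholder for train_step / test_step: a single dummy update.
    optimizer.zero_grad()
    model(torch.randn(2, 4)).sum().backward()
    optimizer.step()

    end_time = time.perf_counter()

    # Log the LR used in this epoch and how long the epoch took.
    results["learning_rate"].append(optimizer.param_groups[0]["lr"])
    results["epoch_time"].append(end_time - start_time)

    scheduler.step()  # changes the LR for the next epoch

print(results["learning_rate"])
```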

In [None]:
# Use the functions below as starting points

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer):
  
  # Put the model in train mode
  model.train()

  # Setup train loss and train accuracy values
  train_loss, train_acc = 0, 0

  # Loop through data loader and data batches
 
    # Send data to target device

    # 1. Forward pass
    
    # 2. Calculate and accumulate loss
    

    # 3. Optimizer zero grad 
    

    # 4. Loss backward 
    

    # 5. Optimizer step
    

    # Calculate and accumulate accuracy metric across all batches
   

  # Adjust metrics to get average loss and average accuracy per batch
  

In [None]:
def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module):
  
  # Put model in eval mode
  model.eval()

  # Setup the test loss and test accuracy values
  test_loss, test_acc = 0, 0

  # Turn on inference context manager
  
    # Loop through DataLoader batches
    
      # Send data to target device
      

      # 1. Forward pass
      

      # 2. Calculate and accumulate loss


      # Calculate and accumulate accuracy

    
  # Adjust metrics to get average loss and accuracy per batch


In [None]:
from tqdm.auto import tqdm

def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
          epochs: int = 5):
  
  # Create results dictionary
  results = {"train_loss": [],
             "train_acc": [],
             "test_loss": [],
             "test_acc": []}

  # Loop through the training and testing steps for a number of epochs
  for epoch in tqdm(range(epochs)):
    # Train step
    train_loss, train_acc = train_step(model=model, 
                                       dataloader=train_dataloader,
                                       loss_fn=loss_fn,
                                       optimizer=optimizer)
    # Test step
    test_loss, test_acc = test_step(model=model, 
                                    dataloader=test_dataloader,
                                    loss_fn=loss_fn)
    
    # Print out what's happening
    print(f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
    )

    # Update the results dictionary
    results["train_loss"].append(train_loss)
    results["train_acc"].append(train_acc)
    results["test_loss"].append(test_loss)
    results["test_acc"].append(test_acc)

  # Return the results dictionary
  return results

<a class="anchor" id="5.4"></a>

## <span style="color:red; font-size:1.5em;">Task 4. Conducting experiments with hyperparameters</span>

[Go back to the content](#5)

**Variant 1:** Conduct an experiment comparing three different learning rates (e.g., 1e-2, 1e-3, 1e-4) for the Adam optimizer. Train the model for 10 epochs for each learning rate, keeping other hyperparameters constant. Plot the training and validation loss curves for comparison.

*Technical note:* Use `torch.optim.Adam`. Run the `train` function three times, changing only the `lr` parameter passed to Adam. Record results dicts. Use Matplotlib to plot losses vs. epochs. Libraries: `torch.optim`, `matplotlib.pyplot`. Metrics: Train/Test Loss, Accuracy curves.

---
**Variant 2:** Investigate the effect of different batch sizes (e.g., 16, 32, 64) on training speed and final model performance. Train the model for 10 epochs for each batch size. Report the average time per epoch and the final test accuracy for each configuration.

*Technical note:* Modify the `DataLoader` instantiation with different `batch_size` values. Run the `train` function three times. Record epoch times and final accuracy. Libraries: `torch.utils.data`, `time`. Metrics: Epoch Time, Test Accuracy.

---
**Variant 3:** Compare the performance of three different optimizers: SGD (with momentum 0.9), Adam, and RMSprop. Use a fixed learning rate (e.g., 1e-3) and train for 15 epochs for each. Plot validation accuracy curves to compare convergence.

*Technical note:* Use `torch.optim.SGD(..., momentum=0.9)`, `torch.optim.Adam`, `torch.optim.RMSprop`. Run `train` function three times, changing the optimizer instance. Plot `results['test_acc']` vs. epochs. Libraries: `torch.optim`, `matplotlib.pyplot`. Metrics: Test Accuracy curve.

---
**Variant 4:** Experiment with the impact of weight decay regularization. Train the Adam optimizer with three different `weight_decay` values (e.g., 0, 1e-4, 1e-3), using a fixed learning rate (1e-3) for 15 epochs each. Compare final validation loss and accuracy.

*Technical note:* Use `torch.optim.Adam(..., weight_decay=wd_value)`. Run `train` three times with different `wd_value`. Compare `results['test_loss'][-1]` and `results['test_acc'][-1]`. Libraries: `torch.optim`. Metrics: Final Test Loss, Final Test Accuracy.

---
**Variant 5:** Evaluate different dropout rates in the classifier head. Modify the model to use dropout probabilities of 0.25, 0.5, and 0.75 before the final linear layer. Train for 15 epochs for each rate and compare validation accuracy.

*Technical note:* Modify the `nn.Dropout(p=...)` layer in the model definition. Re-instantiate the model and train three times. Compare `results['test_acc']`. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 6:** Explore the interaction between learning rate and batch size. Train using combinations: (LR=1e-3, BS=32), (LR=1e-3, BS=64), (LR=5e-4, BS=32), (LR=5e-4, BS=64). Train for 10 epochs each and report the final test accuracy for all four settings.

*Technical note:* Run the `train` function four times, adjusting both the `lr` in the optimizer and `batch_size` in the `DataLoader`. Report final `test_acc`. Libraries: `torch.optim`, `torch.utils.data`. Metrics: Final Test Accuracy.

---
**Variant 7:** Compare different learning rate scheduling strategies. Train the model for 20 epochs using: (a) fixed LR, (b) StepLR (decay by 0.1 every 7 epochs), (c) ReduceLROnPlateau (monitor validation loss). Plot validation accuracy curves.

*Technical note:* Implement cases (a), (b) `StepLR(optimizer, step_size=7, gamma=0.1)`, (c) `ReduceLROnPlateau(optimizer, 'min', patience=3)`. Update scheduler appropriately in the training loop. Plot `test_acc`. Libraries: `torch.optim.lr_scheduler`, `matplotlib.pyplot`. Metrics: Test Accuracy curve.
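
The two schedulers are stepped differently: `StepLR` takes no argument, while `ReduceLROnPlateau` needs the monitored value. The sketch below shows only that difference, assuming throwaway optimizers and a made-up constant `test_loss` that never improves; in the real loop the loss would come from `test_step`.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau, StepLR

def make_optimizer():
    # Throwaway optimizer over a tiny layer, for illustration only.
    return torch.optim.SGD(nn.Linear(4, 3).parameters(), lr=0.1)

# (b) StepLR: multiply the LR by 0.1 every 7 epochs.
opt_b = make_optimizer()
step_sched = StepLR(opt_b, step_size=7, gamma=0.1)

# (c) ReduceLROnPlateau: lower the LR when the monitored loss stops improving.
opt_c = make_optimizer()
plateau_sched = ReduceLROnPlateau(opt_c, mode="min", patience=3)

lrs_b, lrs_c = [], []
for epoch in range(20):
    # The real train_step / test_step calls would go here.
    opt_b.step()
    opt_c.step()
    test_loss = 1.0  # made-up value that never improves

    lrs_b.append(opt_b.param_groups[0]["lr"])
    lrs_c.append(opt_c.param_groups[0]["lr"])

    step_sched.step()              # no argument
    plateau_sched.step(test_loss)  # needs the monitored metric

print(lrs_b[0], lrs_b[7], lrs_b[14])
```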

---
**Variant 8:** Investigate the effect of the number of training epochs. Train the model for 5, 10, 15, and 20 epochs. Plot the final test accuracy achieved at the end of each training duration to observe the trend.

*Technical note:* Run the `train` function four times, changing the `epochs` argument (5, 10, 15, 20). Extract `results['test_acc'][-1]` for each run and plot against the number of epochs. Libraries: `matplotlib.pyplot`. Metrics: Test Accuracy vs. Epochs.

---
**Variant 9:** Experiment with the beta parameters of the Adam optimizer. Try three settings: (beta1=0.9, beta2=0.999) (default), (beta1=0.8, beta2=0.99), (beta1=0.95, beta2=0.9999). Train for 15 epochs each and compare validation loss curves.

*Technical note:* Use `torch.optim.Adam(..., betas=(b1, b2))`. Run `train` three times changing the `betas` tuple. Plot `test_loss` curves. Libraries: `torch.optim`, `matplotlib.pyplot`. Metrics: Test Loss curve.

---
**Variant 10:** Evaluate the impact of using data augmentation versus not using it. Train one model for 15 epochs with the augmented training dataset (e.g., flips, rotations) and another model for 15 epochs with only resizing and ToTensor. Compare final test accuracies.

*Technical note:* Create two `DataLoader` setups: one with augmentation transforms, one without. Run `train` twice using the respective loaders. Compare `results['test_acc'][-1]`. Libraries: `torchvision.transforms`. Metrics: Final Test Accuracy.

---
**Variant 11:** Conduct a small grid search over learning rate and weight decay. Test LR=[1e-3, 5e-4] and WD=[0, 1e-4]. Train for 10 epochs for all 4 combinations. Report the test accuracy for each combination in a table.

*Technical note:* Implement nested loops or a list of configurations. Run `train` for each (LR, WD) pair. Record final `test_acc` and present in a table format. Libraries: `torch.optim`. Metrics: Table of Test Accuracies.

---
**Variant 12:** Explore the `num_workers` parameter in `DataLoader`. Train the model for 5 epochs using `num_workers=0`, `num_workers=2`, and `num_workers=4`. Measure the total training time for each setting and report if more workers speed up training on your system.

*Technical note:* Modify `DataLoader` instantiation with different `num_workers`. Run `train` three times, recording total time using `time.time()`. Report total training times. Libraries: `torch.utils.data`, `time`. Metrics: Total Training Time.

---
**Variant 13:** Compare using `nn.BatchNorm2d` versus not using it in the model. Train one model with BatchNorm layers (as in Task 2, Variant 3) and one without, for 15 epochs each. Plot both training and validation loss curves for comparison.

*Technical note:* Define two model versions: `ModelWithBN` and `ModelWithoutBN`. Train both using the same hyperparameters. Plot `train_loss` and `test_loss` curves for both models on the same graph. Libraries: `torch.nn`, `matplotlib.pyplot`. Metrics: Train/Test Loss curves.

---
**Variant 14:** Experiment with the momentum parameter of the SGD optimizer. Use SGD with momentum values of 0, 0.9, and 0.99. Use a fixed learning rate (e.g., 0.01) and train for 15 epochs each. Compare validation accuracy curves.

*Technical note:* Use `torch.optim.SGD(..., momentum=mom_value)`. Run `train` three times with different `mom_value`. Plot `test_acc` curves. Libraries: `torch.optim`, `matplotlib.pyplot`. Metrics: Test Accuracy curve.

---
**Variant 15:** Assess the impact of image input size. Prepare data and train models with input sizes 32x32, 64x64, and 128x128 (adjusting model if necessary, e.g., classifier input features). Train for 10 epochs each and compare final test accuracy.

*Technical note:* Create datasets/loaders with `transforms.Resize((size, size))`. Modify the model's classifier input size based on the flattened output of the feature extractor for each input size. Train three separate models. Compare final `test_acc`. Libraries: `torchvision.transforms`, `torch.nn`. Metrics: Final Test Accuracy.

---
**Variant 16:** Run an experiment where the training data amount is varied. Use 50%, 75%, and 100% of the available training images. Create subsets of the original training dataset and train the model for 10 epochs on each subset. Compare final test accuracies.

*Technical note:* Use `torch.utils.data.random_split` or manual indexing to create training subsets. Create `DataLoader`s for each subset size. Train three models. Compare final `test_acc`. Libraries: `torch.utils.data`. Metrics: Final Test Accuracy.
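
A sketch of the manual-indexing route, assuming a random `TensorDataset` as a stand-in for the real training `ImageFolder`. A single seeded permutation is sliced for every fraction, so the 50% subset is contained in the 75% subset and the comparison is not blurred by different samples being drawn each time.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Stand-in for the real training dataset (assumption): 40 random "images".
full_train = TensorDataset(torch.randn(40, 3, 8, 8), torch.randint(0, 3, (40,)))

# One fixed shuffle of all indices, reused for every fraction.
generator = torch.Generator().manual_seed(42)
shuffled = torch.randperm(len(full_train), generator=generator).tolist()

loaders = {}
for fraction in (0.5, 0.75, 1.0):
    n_keep = int(len(full_train) * fraction)
    subset = Subset(full_train, shuffled[:n_keep])
    loaders[fraction] = DataLoader(subset, batch_size=8, shuffle=True)

for fraction, loader in loaders.items():
    print(fraction, len(loader.dataset))
```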

---
**Variant 17:** Compare performance with different loss functions: `nn.CrossEntropyLoss` vs. Focal Loss (implement or find a PyTorch implementation). Train for 15 epochs with each loss function and compare validation accuracy, especially for potentially imbalanced classes.

*Technical note:* Implement Focal Loss or use a library. The `train` function already takes the loss as its `loss_fn` argument, so pass the loss function instance there. Run `train` twice, once with `nn.CrossEntropyLoss()` and once with `FocalLoss()`. Compare `test_acc` and potentially per-class metrics. Libraries: `torch.nn`. Metrics: Test Accuracy, Per-class metrics.
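
One way to write a multi-class focal loss is sketched below. It is a simplified version under stated assumptions: it uses only the focusing parameter `gamma` and omits the optional per-class weighting term, and the random logits are illustrative. With `gamma=0` it reduces to ordinary cross-entropy, which gives a quick sanity check.

```python
import torch
import torch.nn.functional as F
from torch import nn

class FocalLoss(nn.Module):
    """Simplified multi-class focal loss: mean of (1 - p_t)^gamma * CE."""
    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Per-sample cross-entropy equals -log(p_t), p_t = probability of the true class.
        ce = F.cross_entropy(logits, targets, reduction="none")
        p_t = torch.exp(-ce)
        # Down-weight easy samples (p_t close to 1) by the factor (1 - p_t)^gamma.
        return ((1.0 - p_t) ** self.gamma * ce).mean()

torch.manual_seed(0)
logits = torch.randn(6, 3)
targets = torch.randint(0, 3, (6,))

ce_value = nn.CrossEntropyLoss()(logits, targets)
focal_value = FocalLoss(gamma=2.0)(logits, targets)
focal_gamma0 = FocalLoss(gamma=0.0)(logits, targets)  # should match cross-entropy
```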

---
**Variant 18:** Explore the effect of the random seed. Run the complete training process (including data shuffling and model initialization) three times using different random seeds (e.g., 42, 123, 999). Report the final test accuracy for each run to assess variability.

*Technical note:* Set `torch.manual_seed(seed_value)` and potentially `numpy.random.seed(seed_value)` at the beginning of each run. Execute the entire training script three times. Report the spread of final `test_acc`. Libraries: `torch`, `numpy`. Metrics: Final Test Accuracy variability.
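
A quick way to confirm that seeding controls model initialization, assuming a single `nn.Linear` layer as a stand-in for the full model (the helper `init_weights` is hypothetical).

```python
import torch
from torch import nn

def init_weights(seed_value: int) -> torch.Tensor:
    """Hypothetical helper: seed, build a layer, return a copy of its weights."""
    torch.manual_seed(seed_value)
    return nn.Linear(4, 3).weight.detach().clone()

same_seed_matches = torch.equal(init_weights(42), init_weights(42))
different_seed_matches = torch.equal(init_weights(42), init_weights(123))
print(same_seed_matches, different_seed_matches)
```

In a full run the seed should be set before the model and the shuffling `DataLoader` are created, so that both initialization and batch order are affected.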

---
**Variant 19:** Investigate different settings for `ReduceLROnPlateau`. Try patience values of 2, 5, and 10. Train for 25 epochs for each setting, monitoring validation loss. Plot the learning rate schedule and validation loss for each run.

*Technical note:* Use `ReduceLROnPlateau(optimizer, 'min', patience=p_value)`. Run `train` three times changing `p_value`. Log LR at each epoch. Plot LR and `test_loss` vs. epochs. Libraries: `torch.optim.lr_scheduler`, `matplotlib.pyplot`. Metrics: LR schedule, Test Loss curve.

---
**Variant 20:** Combine optimal settings found in previous experiments. For instance, use the best optimizer found, the best learning rate, the best batch size, and train for a longer duration (e.g., 25 epochs) with early stopping enabled. Report the final performance achieved.

*Technical note:* Synthesize findings from variants 1-19. Configure the `DataLoader`, model, optimizer, and `train` function call with the identified best hyperparameters. Include early stopping. Report final `test_acc` and `test_loss`. Libraries: All relevant libraries. Metrics: Final Test Loss, Final Test Accuracy.


<a class="anchor" id="5.5"></a>

## <span style="color:red; font-size:1.5em;">Task 5. Conducting experiments with the model's layers</span>

[Go back to the content](#5)

**Variant 1:** Replace the standard `nn.Conv2d` layers in the TinyVGG model with Depthwise Separable Convolutions. Implement the change and train the modified model. Compare the parameter count and test accuracy against the original TinyVGG.

*Technical note:* Implement Depthwise Separable Conv using `nn.Conv2d(groups=in_channels)` followed by `nn.Conv2d(kernel_size=1)`. Replace standard conv blocks. Calculate parameters `sum(p.numel() for p in model.parameters())`. Train and compare `test_acc`. Libraries: `torch.nn`. Metrics: Parameter Count, Test Accuracy.
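
A minimal sketch of the replacement layer and the parameter comparison, assuming 10 input and 10 output channels (an illustrative choice) and a hypothetical module name `DepthwiseSeparableConv`.

```python
import torch
from torch import nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: int = 3, padding: int = 1):
        super().__init__()
        # groups=in_channels makes each filter see only its own input channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels)
        # The 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def count_parameters(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

standard = nn.Conv2d(10, 10, kernel_size=3, padding=1)
separable = DepthwiseSeparableConv(10, 10)

x = torch.randn(1, 10, 64, 64)
print(standard(x).shape, separable(x).shape)
print(count_parameters(standard), count_parameters(separable))
```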

---
**Variant 2:** Experiment with different pooling layer types. Replace all `nn.MaxPool2d` layers in the model with `nn.AvgPool2d` using the same kernel size and stride. Train the modified model and compare its performance (test accuracy) with the max-pooling version.

*Technical note:* Substitute `nn.MaxPool2d(2, 2)` with `nn.AvgPool2d(2, 2)`. Train the model using the same hyperparameters. Compare `results['test_acc']`. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 3:** Introduce `nn.BatchNorm2d` layers after every convolutional layer (before ReLU) in the TinyVGG model if not already present. Train this batch-normalized model and compare its training stability (smoother loss curve) and final accuracy to the original model.

*Technical note:* Insert `nn.BatchNorm2d(out_channels)` after each `nn.Conv2d`. Train both versions. Plot `train_loss` and `test_loss` curves for comparison. Check final `test_acc`. Libraries: `torch.nn`, `matplotlib.pyplot`. Metrics: Loss Curves, Test Accuracy.

---
**Variant 4:** Modify the number of filters (output channels) in the convolutional layers. Try increasing the channels (e.g., 16, 32) or decreasing them (e.g., 8, 16) compared to the baseline. Analyze the impact on parameter count and test accuracy.

*Technical note:* Change `out_channels` in `nn.Conv2d` layers and adjust subsequent layer `in_channels` and classifier `in_features`. Calculate parameters. Train each configuration. Compare parameter counts and final `test_acc`. Libraries: `torch.nn`. Metrics: Parameter Count, Test Accuracy.

---
**Variant 5:** Change the kernel sizes in the convolutional layers. Replace all 3x3 kernels with 5x5 kernels (adjust padding to `padding=2`). Train the model and compare performance and parameter count against the 3x3 kernel version.

*Technical note:* Change `kernel_size=3, padding=1` to `kernel_size=5, padding=2` in all `nn.Conv2d` layers. Recalculate parameters. Train and compare final `test_acc`. Libraries: `torch.nn`. Metrics: Parameter Count, Test Accuracy.

---
**Variant 6:** Implement dropout layers within the convolutional blocks, for example, after the ReLU activation function (`nn.Dropout2d(p=0.2)`). Train the model with this spatial dropout and compare its performance to no dropout or dropout only in the classifier.

*Technical note:* Insert `nn.Dropout2d(p=0.2)` after `nn.ReLU` inside the conv blocks. Train the model. Compare `test_acc` with baseline and classifier-dropout versions. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 7:** Replace the `nn.ReLU` activation function throughout the model with `nn.LeakyReLU(negative_slope=0.1)`. Train the modified model and compare its convergence and final accuracy to the ReLU baseline.

*Technical note:* Substitute all `nn.ReLU()` instances with `nn.LeakyReLU(0.1)`. Train both models. Compare `test_loss` curves and final `test_acc`. Libraries: `torch.nn`, `matplotlib.pyplot`. Metrics: Loss Curves, Test Accuracy.

---
**Variant 8:** Add a residual connection (skip connection) around one of the convolutional blocks in TinyVGG. Ensure dimensions match (using a 1x1 conv in the shortcut if needed). Train and evaluate if this addition improves performance or training stability.

*Technical note:* Define a custom block where `output = block_layers(x) + shortcut(x)`. The `shortcut` might be `nn.Identity()` or `nn.Conv2d(in_channels, out_channels, kernel_size=1)`. Integrate this block. Train and compare `test_acc` and loss curves. Libraries: `torch.nn`. Metrics: Test Accuracy, Loss Curves.
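
A sketch of such a block, assuming a Conv-ReLU-Conv body with 3x3 kernels and a final ReLU applied after the addition (one common arrangement, not the only one). The class name `ResidualBlock` is illustrative.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Conv-ReLU-Conv body with a skip connection added before the final ReLU."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block_layers = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        )
        # Identity when shapes already match, otherwise a 1x1 conv to match channels.
        if in_channels == out_channels:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.block_layers(x) + self.shortcut(x))

x = torch.randn(2, 3, 64, 64)
projecting_block = ResidualBlock(3, 10)   # channel counts differ, so a 1x1 conv is used
identity_block = ResidualBlock(10, 10)    # channel counts match, so Identity is used
out = identity_block(projecting_block(x))
print(out.shape)
```

Because padding keeps the spatial size unchanged, pooling can still follow the block as in the original TinyVGG layout.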

---
**Variant 9:** Modify the classifier head of the model. Instead of a single `nn.Linear` layer after flattening, use two `nn.Linear` layers with a `nn.ReLU` activation in between (e.g., Flatten -> Linear -> ReLU -> Linear -> Output). Compare performance.

*Technical note:* Replace `nn.Linear(in_features, num_classes)` with `nn.Sequential(nn.Linear(in_features, hidden_units), nn.ReLU(), nn.Linear(hidden_units, num_classes))`. Choose `hidden_units` (e.g., 128). Train and compare `test_acc`. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 10:** Implement Global Average Pooling (GAP) instead of `nn.Flatten()` before the classifier head. Remove the flatten layer and adjust the input features of the `nn.Linear` layer to match the number of channels from the last conv block.

*Technical note:* Replace `nn.Flatten()` with `nn.AdaptiveAvgPool2d((1, 1))` followed by `nn.Flatten()`. Change the `nn.Linear` layer to `nn.Linear(in_features=last_conv_out_channels, out_features=num_classes)`. Train and compare `test_acc` and parameter count. Libraries: `torch.nn`. Metrics: Test Accuracy, Parameter Count.
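
The effect on the classifier size can be sketched as below, assuming the last conv block outputs 10 channels on a 16x16 feature map (illustrative numbers; the real shape depends on your model and input size).

```python
import torch
from torch import nn

num_classes = 3
last_conv_out_channels = 10
feature_map = torch.randn(2, last_conv_out_channels, 16, 16)  # assumed conv output

# Baseline head: flatten every spatial position into the linear layer.
flatten_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(last_conv_out_channels * 16 * 16, num_classes),
)

# GAP head: average each channel to one value, then classify.
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(in_features=last_conv_out_channels, out_features=num_classes),
)

def count_parameters(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print(flatten_head(feature_map).shape, gap_head(feature_map).shape)
print(count_parameters(flatten_head), count_parameters(gap_head))
```

A side effect of the GAP head is that the linear layer no longer depends on the spatial size of the feature map.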

---
**Variant 11:** Experiment with different strides in the convolutional layers. Try using `stride=2` in the first convolution of the *second* block (instead of pooling) to downsample. Adjust subsequent layers and compare performance.

*Technical note:* In the second conv block's first `nn.Conv2d`, set `stride=2` and adjust `padding`. Remove the `nn.MaxPool2d` after that block. Recalculate classifier input size. Train and compare `test_acc`. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 12:** Replace `nn.BatchNorm2d` with `nn.LayerNorm`. Since LayerNorm is typically applied over the last dimensions, you might need to reshape or apply it carefully after conv layers (e.g., normalize over C, H, W). Evaluate the impact.

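*Technical note:* This is less common for CNNs. One approach: for feature maps of shape `(N, C, H, W)`, apply `nn.LayerNorm((C, H, W))`, which normalizes each sample over its channel and spatial dimensions and therefore needs `C`, `H`, and `W` to be known in advance. Alternatively, apply it after flattening in the classifier. Compare performance with BatchNorm. Libraries: `torch.nn`. Metrics: Test Accuracy.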

---
**Variant 13:** Introduce dilated convolutions (`dilation=2`) in the *last* convolutional block to increase the receptive field just before classification. Adjust padding to maintain spatial size. Train and compare results.

*Technical note:* In the final conv block, set `dilation=2, padding=2` (for 3x3 kernel) in the `nn.Conv2d` layers. Train the model and compare `test_acc` with the baseline. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 14:** Create a significantly deeper version of TinyVGG by adding two more identical convolutional blocks (Conv-ReLU-Conv-ReLU-Pool). Analyze the effect on performance, training time, and potential overfitting.

*Technical note:* Add two more `nn.Sequential` blocks following the existing pattern. Adjust classifier input size. Be mindful of potential gradient issues in deeper networks without skips/BatchNorm. Train and compare `test_acc`, training time, and train/test loss gap. Libraries: `torch.nn`. Metrics: Test Accuracy, Training Time, Overfitting gap.

---
**Variant 15:** Build a model using only 1x1 convolutional layers after an initial larger kernel convolution. This forms a "Network in Network" style architecture. E.g., Conv(3x3) -> Conv(1x1) -> Conv(1x1) -> Pool -> ...

*Technical note:* Use `nn.Conv2d(..., kernel_size=1)` for most layers. These act like linear transformations across channels. Can be used to reduce/increase channels or add non-linearity between channel interactions. Train and compare with baseline. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 16:** Implement a simple Squeeze-and-Excitation (SE) block after the *first* convolutional block. Compare its impact versus adding it later in the network (as in another variant) or not adding it at all.

*Technical note:* Define `SEBlock`. Insert it after the first main conv block. Train the model. Compare `test_acc` results with baseline and potentially SE block added later. Libraries: `torch.nn`. Metrics: Test Accuracy.
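
One possible `SEBlock` definition is sketched here, assuming a reduction ratio of 2 and 10 channels (both illustrative). The block squeezes each channel to a single value, passes those values through a small bottleneck MLP, and rescales the channels with the resulting sigmoid weights.

```python
import torch
from torch import nn

class SEBlock(nn.Module):
    """Simple Squeeze-and-Excitation block (a sketch, assuming reduction=2)."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # (N, C, H, W) -> (N, C, 1, 1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        weights = self.excite(self.squeeze(x).view(n, c)).view(n, c, 1, 1)
        return x * weights  # rescale each channel, shape is unchanged

x = torch.randn(2, 10, 32, 32)
se = SEBlock(channels=10)
out = se(x)
print(out.shape)
```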

---
**Variant 17:** Remove all pooling layers from the model and rely solely on strided convolutions for spatial downsampling. Replace each `nn.MaxPool2d(2, 2)` with a `nn.Conv2d` having `stride=2` (adjust padding). Compare performance.

*Technical note:* Remove `nn.MaxPool2d`. Use `stride=2` in the second `nn.Conv2d` of each block (or the first, consistently). Adjust padding to manage spatial dimensions. Train and compare `test_acc`. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 18:** Change the number of neurons in the final hidden layer (if using a multi-layer classifier head, as in Variant 9). Experiment with significantly fewer (e.g., 32) or more (e.g., 512) hidden units. Evaluate the impact on accuracy and overfitting.

*Technical note:* If using `Linear -> ReLU -> Linear`, change the `hidden_units` in the first `nn.Linear` and the `in_features` of the second. Train different versions. Compare `test_acc` and train/test loss gap. Libraries: `torch.nn`. Metrics: Test Accuracy, Overfitting gap.

---
**Variant 19:** Use `nn.AdaptiveMaxPool2d((1, 1))` instead of `nn.AdaptiveAvgPool2d((1, 1))` for the Global Pooling layer before the classifier. Compare whether max pooling captures a different signal than average pooling at this stage.

*Technical note:* Replace `nn.AdaptiveAvgPool2d((1, 1))` with `nn.AdaptiveMaxPool2d((1, 1))` before the final `nn.Flatten()` and `nn.Linear`. Train and compare `test_acc`. Libraries: `torch.nn`. Metrics: Test Accuracy.

---
**Variant 20:** Construct an ensemble model *within* the architecture by having parallel convolutional pathways in the first block that are later concatenated or added before feeding into the subsequent layers (similar to the idea of an Inception module).

*Technical note:* In the first block, define parallel `nn.Conv2d` layers (e.g., 1x1, 3x3). Concatenate their outputs along the channel dimension using `torch.cat(..., dim=1)`. Pass the combined tensor to the next layer. Train and compare performance. Libraries: `torch.nn`, `torch`. Metrics: Test Accuracy.
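
A sketch of a two-branch first block, assuming 5 output channels per branch so the concatenated result has 10 channels (illustrative numbers, as is the class name `ParallelConvBlock`).

```python
import torch
from torch import nn

class ParallelConvBlock(nn.Module):
    """Two parallel conv branches (1x1 and 3x3) concatenated along channels."""
    def __init__(self, in_channels: int, branch_channels: int):
        super().__init__()
        self.branch1x1 = nn.Conv2d(in_channels, branch_channels, kernel_size=1)
        # padding=1 keeps the 3x3 branch at the same spatial size as the 1x1 branch.
        self.branch3x3 = nn.Conv2d(in_channels, branch_channels,
                                   kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        combined = torch.cat([self.branch1x1(x), self.branch3x3(x)], dim=1)
        return self.relu(combined)

x = torch.randn(2, 3, 64, 64)
block = ParallelConvBlock(in_channels=3, branch_channels=5)
out = block(x)
print(out.shape)
```

Concatenation only works when every branch produces the same height and width, which is why the padding of each branch has to match its kernel size.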

<a class="anchor" id="5.6"></a>

## <span style="color:red; font-size:1.5em;">Task 6. Making predictions</span>

[Go back to the content](#5)

**Variant 1:** Create a dedicated function `predict_single_image(model, image_path, transform, class_names, device)` that takes a file path to a single food image, loads it, applies the necessary transformations, performs inference using the trained model, and returns the predicted class name (e.g., "pizza"). Test this function on a few sample images from the test set.

*Technical note:* The function should load the image (PIL), apply the test transform (Resize, ToTensor, and Normalize if it was used during training), add a batch dimension (`unsqueeze(0)`), move the tensor to the device, run it through the model in evaluation mode (`model.eval()`), take the predicted index (`torch.argmax`), and map that index to `class_names`. Libraries: `torch`, `torchvision`, `PIL`. Metrics: Correct class name output.

---
**Variant 2:** Write a loop to iterate through the entire test `DataLoader`, make predictions for each batch, and store all predicted labels and corresponding true labels. Calculate the overall accuracy on the test set based on these stored predictions.

*Technical note:* Set `model.eval()`. Use `torch.inference_mode()`. Loop through `test_dataloader`. Collect `y_preds = torch.argmax(model(X.to(device)), dim=1)` and `y_true = y.to(device)`. Use `torch.cat` to accumulate predictions and labels. Calculate accuracy: `(torch.eq(all_y_preds, all_y_true).sum().item() / len(all_y_true)) * 100`. Libraries: `torch`. Metrics: Test Accuracy.

---
**Variant 3:** Calculate and visualize the confusion matrix for the test set predictions. Use the stored predictions and true labels from Variant 2. Employ `torchmetrics` or `scikit-learn` to compute the matrix and `matplotlib`/`seaborn` to display it as a heatmap with labels for pizza, steak, and sushi.

*Technical note:* Use `torchmetrics.classification.MulticlassConfusionMatrix(num_classes=3)` or `sklearn.metrics.confusion_matrix`. Pass the accumulated `all_y_preds.cpu()` and `all_y_true.cpu()`. Use `matplotlib.pyplot.imshow` or `seaborn.heatmap` for visualization, annotating axes with `class_names`. Libraries: `torchmetrics` or `sklearn.metrics`, `matplotlib.pyplot`, `seaborn`. Metrics: Confusion Matrix visualization.

---
**Variant 4:** Calculate the Top-2 accuracy for the test set. For each sample, check if the true label is among the top 2 predicted classes (based on softmax probabilities or logits). Report the overall Top-2 accuracy.

*Technical note:* Inside the test loop with `torch.inference_mode()`, get model outputs (logits). Use `torch.topk(outputs, k=2, dim=1)` to get indices of top 2 predictions. Check if the true label `y` is present in these top 2 indices for each sample. Calculate overall percentage. Libraries: `torch`. Metrics: Top-2 Accuracy.
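
The membership check can be written without a Python loop. The sketch below uses hand-made logits and labels (illustrative values, not model output) so the result is easy to verify by hand: two of the four true labels fall in the top 2.

```python
import torch

# Hand-made logits for 4 samples and 3 classes, with their true labels.
outputs = torch.tensor([[2.0, 1.0, 0.1],
                        [0.1, 0.2, 3.0],
                        [1.0, 0.5, 0.9],
                        [0.3, 2.0, 1.0]])
y = torch.tensor([1, 0, 2, 0])

# Indices of the two highest-scoring classes per sample, shape (N, 2).
top2_indices = torch.topk(outputs, k=2, dim=1).indices

# Compare each row with its label; a sample is a hit if either column matches.
hits = (top2_indices == y.unsqueeze(1)).any(dim=1)
top2_accuracy = hits.float().mean().item() * 100

print(f"Top-2 accuracy: {top2_accuracy:.1f}%")
```

In a real test loop, `hits` would be accumulated across batches (for example by summing `hits.sum()` and the batch sizes) before dividing.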

---
**Variant 5:** Identify and visualize several images from the test set that the model misclassified. Display the image along with its true label and the incorrect label predicted by the model. This helps in understanding failure cases.

*Technical note:* After getting all predictions and labels (Variant 2), find indices where `all_y_preds != all_y_true`. Retrieve the corresponding original images (requires mapping indices back to dataset samples or storing paths/images). Use `matplotlib.pyplot.imshow` to display image, title with "True: {true_label}, Pred: {pred_label}". Libraries: `torch`, `matplotlib.pyplot`, `torch.utils.data`.

---
**Variant 6:** Demonstrate saving and loading the trained model for inference. Save the `state_dict` of your best performing model. Then, in a separate step or script, instantiate the model architecture, load the saved `state_dict`, set the model to evaluation mode, and use it to predict on a sample test image using the function from Variant 1.

*Technical note:* Use `torch.save(model.state_dict(), 'best_model.pth')`. Later: `model = YourModelArchitecture(...)`, `model.load_state_dict(torch.load('best_model.pth'))`, `model.to(device)`, `model.eval()`. Perform prediction. Libraries: `torch`. Metrics: Successful prediction after loading.
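
A round-trip sketch, assuming a tiny stand-in architecture built by a hypothetical `make_model` function and a temporary directory for the file. Passing `map_location` to `torch.load` is an addition here; it lets a checkpoint saved on a GPU be loaded on a CPU-only machine.

```python
import os
import tempfile

import torch
from torch import nn

def make_model() -> nn.Module:
    """Hypothetical stand-in for YourModelArchitecture(...)."""
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))

device = "cuda" if torch.cuda.is_available() else "cpu"

# Save only the learned parameters, not the whole model object.
model = make_model()
save_path = os.path.join(tempfile.mkdtemp(), "best_model.pth")
torch.save(model.state_dict(), save_path)

# Later: rebuild the same architecture, then load the weights into it.
loaded_model = make_model()
loaded_model.load_state_dict(torch.load(save_path, map_location=device))
loaded_model.to(device)
loaded_model.eval()

# Both models should now give identical outputs for the same input.
x = torch.randn(1, 3, 8, 8)
model.eval()
with torch.inference_mode():
    outputs_match = torch.equal(model(x), loaded_model(x.to(device)).cpu())
print(outputs_match)
```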

---
**Variant 7:** Test the trained model's generalization by predicting on 3-5 new images of pizza, steak, or sushi obtained from external sources (e.g., web search). Ensure these images are preprocessed using the same `test_transform` pipeline before feeding them to the model. Report the predictions.

*Technical note:* Download images. Use the `predict_single_image` function (Variant 1) or manually apply loading (PIL), `test_transform`, batching, and inference. Observe if the model correctly classifies these unseen images. Libraries: `torch`, `torchvision`, `PIL`, `requests` (optional for download). Metrics: Qualitative prediction accuracy on new data.

---
**Variant 8:** Analyze the model's confidence scores. For all test set predictions, extract the softmax probability associated with the predicted class. Plot two histograms: one for the confidence scores of correctly classified images and one for incorrectly classified images.

*Technical note:* Get model outputs (logits). Apply `torch.softmax(outputs, dim=1)`. Get the max probability for each prediction: `confidences, preds = torch.max(probabilities, dim=1)`. Separate confidences based on whether `preds == y_true`. Use `matplotlib.pyplot.hist` to plot distributions. Libraries: `torch`, `matplotlib.pyplot`. Metrics: Confidence histograms.

---
**Variant 9:** Implement simple Test-Time Augmentation (TTA). For each image in the test set, make predictions on both the original image and its horizontally flipped version. Average the softmax probability outputs from both versions before determining the final predicted class. Compare accuracy with and without TTA.

*Technical note:* Define a flip transform: `hflip = transforms.RandomHorizontalFlip(p=1.0)`. In the test loop: get prediction `p1` for original `X`. Get prediction `p2` for `hflip(X)`. Average probabilities: `avg_probs = (torch.softmax(p1, dim=1) + torch.softmax(p2, dim=1)) / 2`. Predict class from `avg_probs`. Calculate accuracy and compare to non-TTA accuracy. Libraries: `torch`, `torchvision.transforms`. Metrics: Test Accuracy comparison.
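
A minimal sketch of the averaging step, assuming a tiny stand-in model and a random batch. It swaps in `torch.flip(X, dims=[3])`, which reverses the width dimension of an `(N, C, H, W)` batch, for the `RandomHorizontalFlip(p=1.0)` transform named above, so that the example needs only `torch`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-ins (assumptions): a tiny classifier and one random batch of "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))
X = torch.randn(4, 3, 8, 8)

model.eval()
with torch.inference_mode():
    p1 = model(X)                        # logits for the original images
    p2 = model(torch.flip(X, dims=[3]))  # logits for horizontally flipped images

    # Average probabilities, not logits, then pick the most likely class.
    avg_probs = (torch.softmax(p1, dim=1) + torch.softmax(p2, dim=1)) / 2
    tta_preds = avg_probs.argmax(dim=1)
    plain_preds = p1.argmax(dim=1)       # predictions without TTA, for comparison

print(tta_preds, plain_preds)
```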

---
**Variant 10:** Implement Grad-CAM (Gradient-weighted Class Activation Mapping) to visualize model attention. Select a few correctly classified test images (one for each class). Generate Grad-CAM heatmaps overlaid on the images to show which regions influenced the classification decision.

*Technical note:* Requires accessing intermediate feature maps and gradients. Can use libraries like `pytorch-grad-cam` or implement manually by hooking into the last conv layer, getting gradients w.r.t. feature map, pooling gradients, weighting feature maps. Overlay heatmap using OpenCV or Matplotlib. Libraries: `torch`, `cv2` (optional), `matplotlib.pyplot`, `pytorch-grad-cam` (optional). Metrics: Visual heatmap interpretation.

---
**Variant 11:** Calculate and report detailed per-class performance metrics from the test set predictions. Compute precision, recall, and F1-score specifically for the 'pizza', 'steak', and 'sushi' classes. Identify which class the model performs best/worst on.

*Technical note:* Use `torchmetrics.classification.MulticlassPrecision`, `MulticlassRecall`, and `MulticlassF1Score` with `num_classes=3` and `average='none'`, or `sklearn.metrics.classification_report(..., output_dict=True)`. Extract metrics for each class index. Libraries: `torchmetrics` or `sklearn.metrics`. Metrics: Per-class Precision, Recall, F1-score.
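
To make clear what the library metrics compute, here is a hand-rolled sketch that derives the same per-class values from a confusion matrix using plain `torch`. It is an illustration with made-up predictions, not a replacement for `torchmetrics` or `sklearn`; the class order is assumed to be pizza, steak, sushi.

```python
import torch

class_names = ["pizza", "steak", "sushi"]  # assumed class order
num_classes = len(class_names)

# Hypothetical predictions and ground-truth labels.
preds = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
y_true = torch.tensor([0, 0, 1, 2, 2, 2, 1, 1])

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = torch.bincount(y_true * num_classes + preds,
                    minlength=num_classes ** 2).reshape(num_classes, num_classes)

tp = cm.diag().float()
precision = tp / cm.sum(dim=0).clamp(min=1)   # TP / all predicted as the class
recall = tp / cm.sum(dim=1).clamp(min=1)      # TP / all true members of the class
f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-8)

for i, name in enumerate(class_names):
    print(f"{name}: P={precision[i]:.2f} R={recall[i]:.2f} F1={f1[i]:.2f}")
```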

---
**Variant 12:** Measure the average inference latency per image. Create a loop that loads a single test image, preprocesses it, moves it to the device, and runs `model(image_batch)` N times (e.g., N=100). Record the time for each inference and calculate the average. Ensure GPU synchronization if using CUDA.

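*Technical note:* Load and preprocess one image, add a batch dimension, and move it to the device with `img_tensor.to(device)`. Use `model.eval()` and `torch.inference_mode()`. Loop N times: `start = time.perf_counter(); _ = model(img_tensor); torch.cuda.synchronize()` (if GPU); `end = time.perf_counter(); times.append(end - start)`. On CUDA, also synchronize before starting the timer so that previously queued work is not counted. Calculate `np.mean(times[1:])` (skip the first run, which includes warm-up overhead). Libraries: `torch`, `time`, `numpy`. Metrics: Average inference time (ms/image).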
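
A minimal timing sketch under these assumptions: a stand-in linear model and a random tensor replace the trained model and the real image, and N is reduced to 20 to keep it quick.

```python
import time
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in model and a single preprocessed "image" batch.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 3)).to(device).eval()
img_tensor = torch.rand(1, 3, 64, 64, device=device)

def sync():
    # CUDA kernels run asynchronously; wait for them before reading the clock.
    if device == "cuda":
        torch.cuda.synchronize()

times = []
with torch.inference_mode():
    for _ in range(20):
        sync()
        start = time.perf_counter()
        _ = model(img_tensor)
        sync()
        times.append(time.perf_counter() - start)

# Skip the first measurement, which includes one-off warm-up costs.
avg_ms = 1000 * sum(times[1:]) / len(times[1:])
print(f"average latency: {avg_ms:.3f} ms/image on {device}")
```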

---
**Variant 13:** Identify low-confidence predictions on the test set. After obtaining softmax probabilities for all test samples, find those where the maximum probability (confidence) is below a certain threshold (e.g., 0.6 or 0.7). Display a few of these low-confidence images and their predicted labels.

*Technical note:* Get `confidences, preds = torch.max(torch.softmax(outputs, dim=1), dim=1)`. Find indices where `confidences < threshold`. Retrieve and display corresponding images/labels. Analyze if these are ambiguous images or failures. Libraries: `torch`, `matplotlib.pyplot`. Metrics: Qualitative analysis of low-confidence samples.
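
Selecting the low-confidence indices can be sketched as below, with hand-made logits standing in for real model outputs. If the test `DataLoader` uses `shuffle=False`, the resulting positions can be used to index the test dataset directly (an assumption about how the loader was built).

```python
import torch

# Hypothetical logits for 4 test samples.
outputs = torch.tensor([[3.0, 0.0, 0.0],
                        [0.5, 0.4, 0.3],
                        [0.0, 2.5, 0.1],
                        [1.0, 1.1, 0.9]])
threshold = 0.6

confidences, preds = torch.max(torch.softmax(outputs, dim=1), dim=1)

# Positions of samples whose top probability is below the threshold.
low_idx = torch.nonzero(confidences < threshold).squeeze(1)

for i in low_idx.tolist():
    print(f"sample {i}: predicted class {preds[i].item()}, "
          f"confidence {confidences[i].item():.2f}")
```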

---
**Variant 14:** Compare predictions from models saved at different training stages. Load model checkpoints from an early epoch (e.g., epoch 5) and a late epoch (e.g., final epoch). Run predictions on the entire test set using both loaded models. Compare their overall accuracy and confusion matrices.

*Technical note:* Requires saving checkpoints during training (Task 3, Var 12). Load `state_dict` from `checkpoint_epoch_5.pth` and `checkpoint_final.pth` into separate model instances. Run evaluation loop (Variant 2) for both. Compare accuracy and confusion matrices. Libraries: `torch`. Metrics: Accuracy comparison, Confusion matrix comparison.
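
The load-and-compare flow might look like this sketch. The model, the data, and the "extra training" between the two checkpoints are all simulated (assumptions); only the file names follow the note above.

```python
import os
import tempfile
import torch
from torch import nn

def build_model():
    # Hypothetical stand-in; use the same architecture the checkpoints came from.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))

def evaluate(model, X, y, num_classes=3):
    model.eval()
    with torch.inference_mode():
        preds = model(X).argmax(dim=1)
        acc = (preds == y).float().mean().item()
        cm = torch.bincount(y * num_classes + preds,
                            minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return acc, cm

torch.manual_seed(0)
X, y = torch.rand(12, 3, 8, 8), torch.randint(0, 3, (12,))

with tempfile.TemporaryDirectory() as ckpt_dir:
    early_path = os.path.join(ckpt_dir, "checkpoint_epoch_5.pth")
    final_path = os.path.join(ckpt_dir, "checkpoint_final.pth")

    # Simulate two training stages by saving, perturbing weights, saving again.
    trained = build_model()
    torch.save(trained.state_dict(), early_path)
    with torch.no_grad():
        for p in trained.parameters():
            p.add_(0.1 * torch.randn_like(p))
    torch.save(trained.state_dict(), final_path)

    # Load each checkpoint into its own model instance.
    model_early, model_final = build_model(), build_model()
    model_early.load_state_dict(torch.load(early_path, map_location="cpu"))
    model_final.load_state_dict(torch.load(final_path, map_location="cpu"))

acc_early, cm_early = evaluate(model_early, X, y)
acc_final, cm_final = evaluate(model_final, X, y)
print(f"early: {acc_early:.2f}, final: {acc_final:.2f}")
print(cm_early, cm_final, sep="\n")
```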

---
**Variant 15:** Evaluate the effect of input resolution on prediction accuracy at inference time. Take the trained model (e.g., trained on 64x64). Preprocess the test set images using different resize values (e.g., 32x32, 64x64, 128x128) via `transforms.Resize`. Make predictions for each resolution and report the test accuracy.

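*Technical note:* Create multiple test `DataLoader`s, each with a different `transforms.Resize((size, size))` in its transform pipeline. Run the evaluation loop (Variant 2) for each loader using the *same trained model*. This only works if the classifier head does not depend on the input size (e.g., global average pooling or `nn.AdaptiveAvgPool2d` before the final `nn.Linear`); a head built from `nn.Flatten` followed by a fixed-size `nn.Linear` raises a shape error at any resolution other than the training one. Report accuracy for each test resolution. Libraries: `torch`, `torchvision.transforms`, `torch.utils.data`. Metrics: Test Accuracy vs. Inference Resolution.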
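
A rough sketch of the resolution sweep, assuming a small stand-in model with adaptive pooling so it accepts any input size. To stay dependency-free it resizes an in-memory batch with `torch.nn.functional.interpolate` instead of building separate `DataLoader`s with `transforms.Resize`; the data and labels are random placeholders.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical model; AdaptiveAvgPool2d makes the head resolution-independent.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 3),
).eval()

images = torch.rand(6, 3, 64, 64)
labels = torch.tensor([0, 1, 2, 0, 1, 2])

accuracy_by_size = {}
with torch.inference_mode():
    for size in (32, 64, 128):
        resized = F.interpolate(images, size=(size, size),
                                mode="bilinear", align_corners=False)
        preds = model(resized).argmax(dim=1)
        accuracy_by_size[size] = (preds == labels).float().mean().item()

for size, acc in accuracy_by_size.items():
    print(f"{size}x{size}: accuracy {acc:.2f}")
```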

---
**Variant 16:** Simulate a simple model ensemble prediction. Train the same model architecture twice using different random seeds to get `model_1` and `model_2`. Load both models. For each test sample, get softmax probabilities from both, average them (`(probs1 + probs2) / 2`), and determine the final prediction based on the averaged probabilities. Compare ensemble accuracy to individual model accuracies.

*Technical note:* Train two models. Load both `model_1.eval()`, `model_2.eval()`. In test loop: `out1 = model_1(X); out2 = model_2(X); avg_probs = (torch.softmax(out1, 1) + torch.softmax(out2, 1)) / 2; preds = torch.argmax(avg_probs, 1)`. Calculate accuracy. Compare with individual model accuracies. Libraries: `torch`. Metrics: Ensemble Accuracy, Individual Accuracies comparison.
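
The averaging step can be sketched as follows. The two models here are untrained stand-ins that differ only in their initialization seed (an assumption that replaces the two real training runs), and the data is random.

```python
import torch
from torch import nn

def build_model(seed):
    # Different seeds give different weights, mimicking two training runs.
    torch.manual_seed(seed)
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3)).eval()

model_1, model_2 = build_model(1), build_model(2)

torch.manual_seed(0)
X = torch.rand(6, 3, 8, 8)
y = torch.tensor([0, 1, 2, 0, 1, 2])

with torch.inference_mode():
    probs1 = torch.softmax(model_1(X), dim=1)
    probs2 = torch.softmax(model_2(X), dim=1)
    avg_probs = (probs1 + probs2) / 2
    results = {
        "model_1": (probs1.argmax(dim=1) == y).float().mean().item(),
        "model_2": (probs2.argmax(dim=1) == y).float().mean().item(),
        "ensemble": (avg_probs.argmax(dim=1) == y).float().mean().item(),
    }

for name, acc in results.items():
    print(f"{name}: accuracy {acc:.2f}")
```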

---
**Variant 17:** Identify the 'hardest' examples in the test set based on loss value. During the `test_step` loop, calculate the loss for each individual sample (or batch average). Store the samples (or their indices) with the highest loss values. Visualize the top 5 hardest images.

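*Technical note:* Modify `test_step` or write a separate evaluation loop. Calculate the loss per sample by setting the reduction on the loss itself: create `loss_fn = nn.CrossEntropyLoss(reduction='none')` and call `losses = loss_fn(output, y)`, or use `torch.nn.functional.cross_entropy(output, y, reduction='none')`. The `reduction` argument cannot be passed when calling an existing `nn.CrossEntropyLoss` instance. The result holds one value per sample, so convert it with `.tolist()` (`.item()` works only on single-element tensors) and store `(loss_value, index)` for each sample. Sort by loss descending. Retrieve and display the top N images. Libraries: `torch`, `matplotlib.pyplot`. Metrics: Qualitative analysis of high-loss samples.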
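
A minimal sketch of collecting per-sample losses with dataset indices, assuming a stand-in model, random data, and a loader created with `shuffle=False` so that a running offset maps batch positions back to dataset indices.

```python
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins for the trained model and the test dataset.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3)).eval()
dataset = TensorDataset(torch.rand(10, 3, 8, 8), torch.randint(0, 3, (10,)))
loader = DataLoader(dataset, batch_size=4, shuffle=False)

records = []   # (loss_value, dataset_index) pairs
offset = 0
with torch.inference_mode():
    for X, y in loader:
        # reduction="none" keeps one loss value per sample in the batch.
        losses = F.cross_entropy(model(X), y, reduction="none")
        records.extend((loss, offset + i) for i, loss in enumerate(losses.tolist()))
        offset += len(y)

hardest = sorted(records, reverse=True)[:5]
hardest_indices = [idx for _, idx in hardest]
print(hardest)
```

Each index in `hardest_indices` can then be used as `dataset[idx]` to retrieve the image and label for plotting.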

---
**Variant 18:** Assess prediction robustness to basic image augmentations. Take 5 correctly classified test images. Apply several different augmentations (e.g., rotation 30 degrees, increased brightness, Gaussian blur) to each. Make predictions on these augmented versions and check if the prediction remains the same as the original.

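*Technical note:* Define several augmentation transforms (`transforms.RandomRotation(30)`, `transforms.ColorJitter(brightness=0.5)`, `transforms.GaussianBlur(kernel_size=3)`). Note that `RandomRotation(30)` samples an angle from [-30, 30] and `ColorJitter(brightness=0.5)` samples a brightness factor from [0.5, 1.5], so repeated runs differ; for a fixed 30-degree rotation or a strictly brighter image, use `transforms.functional.rotate(img, 30)` and `transforms.functional.adjust_brightness(img, 1.5)`. Apply each to the sample images. Use `predict_single_image` (Variant 1) on augmented versions. Report consistency/changes in predictions. Libraries: `torchvision.transforms`, `PIL`. Metrics: Prediction consistency under augmentation.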

---
**Variant 19:** Visualize the learned feature space using dimensionality reduction. Pass all test images through the trained model up to the penultimate layer (e.g., the output of `nn.Flatten` or GAP, before the final `nn.Linear`). Collect these feature vectors. Use t-SNE or UMAP to reduce dimensions to 2D and create a scatter plot, coloring points by their *true* class labels.

*Technical note:* Requires modifying the model or using hooks to extract features from an intermediate layer. Collect features for all test samples. Use `sklearn.manifold.TSNE(n_components=2)` or `umap.UMAP(n_components=2)`. Create scatter plot using `matplotlib.pyplot.scatter`, coloring based on true labels. Libraries: `torch`, `sklearn.manifold` or `umap-learn`, `matplotlib.pyplot`, `numpy`. Metrics: 2D feature visualization.
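
Feature collection with a forward hook could be sketched like this. The model layout (and therefore the index of the `nn.Flatten` layer) and the data are assumptions for illustration; the reduction to 2D is left to `TSNE` or `UMAP` as described above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in model: index 3 is nn.Flatten, the penultimate layer.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 3),
).eval()
dataset = TensorDataset(torch.rand(10, 3, 16, 16), torch.randint(0, 3, (10,)))
loader = DataLoader(dataset, batch_size=4, shuffle=False)

collected, all_labels = [], []

def grab_features(module, inputs, output):
    # Store the penultimate-layer output for every forward pass.
    collected.append(output.detach().cpu())

handle = model[3].register_forward_hook(grab_features)
with torch.inference_mode():
    for X, y in loader:
        model(X)
        all_labels.append(y)
    features = torch.cat(collected)    # shape: (num_samples, feature_dim)
    labels = torch.cat(all_labels)
handle.remove()

print(features.shape, labels.shape)
```

The array `features.numpy()` can then be passed to `sklearn.manifold.TSNE(n_components=2).fit_transform(...)`, with `labels.numpy()` used as the color argument of `plt.scatter`.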

---
**Variant 20:** Perform prediction consistency checks. If using dropout, run inference multiple times (e.g., 10 times) on the same test image with `model.train()` (to keep dropout active) but within `torch.inference_mode()`. Analyze the variance in the output probabilities or predicted class. If not using dropout, run inference multiple times with `model.eval()` to confirm deterministic output.

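*Technical note:* For MC Dropout: `model.train()`, `with torch.inference_mode(): outputs = [model(X) for _ in range(10)]`. Analyze `torch.stack(outputs).std(dim=0)` (apply `torch.softmax` first to measure the variance of the probabilities rather than the logits). If the model also contains `nn.BatchNorm2d` layers, `model.train()` switches them to batch statistics and updates their running averages; in that case call `model.eval()` and set only the `nn.Dropout` modules to train mode. For the determinism check: `model.eval()`, run `model(X)` multiple times and assert outputs are identical. Libraries: `torch`. Metrics: Output variance (MC Dropout) or Determinism confirmation.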
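
Both checks can be sketched as follows, assuming a small stand-in model with one dropout layer and a random input. Only the `nn.Dropout` modules are switched to train mode, so any other mode-dependent layers would stay in evaluation mode.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in model with a dropout layer.
model = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 4 * 4, 16), nn.ReLU(),
    nn.Dropout(p=0.5), nn.Linear(16, 3),
)
X = torch.rand(1, 3, 4, 4)

# MC Dropout: everything in eval mode except the dropout layers.
model.eval()
for module in model.modules():
    if isinstance(module, nn.Dropout):
        module.train()

with torch.inference_mode():
    probs = torch.stack([torch.softmax(model(X), dim=1) for _ in range(10)])
    std = probs.std(dim=0)                       # per-class spread over 10 passes
    votes = probs.argmax(dim=2).squeeze(1)       # predicted class in each pass
print("std per class:", std.squeeze(0).tolist())
print("predicted classes:", votes.tolist())

# Determinism check: with dropout disabled, repeated passes must match exactly.
model.eval()
with torch.inference_mode():
    out_a, out_b = model(X), model(X)
    deterministic = torch.equal(out_a, out_b)
print("deterministic in eval mode:", deterministic)
```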