<h1><center>Laboratory work 6.</center></h1>
<h2><center>PyTorch Going Modular Exercises</center></h2>

**Completed:** Last name and First name

**Variant:** #__

<a class="anchor" id="5"></a>

## Outline

1. [Task 1. Modular Data Preparation](#5.1)
2. [Task 2. Modular Model Creation and Configuration](#5.2)
3. [Task 3. Modular Training and Testing Loops](#5.3)
4. [Task 4. Modular Training Script](#5.4)
5. [Task 5. Modular Prediction Script](#5.5)

```python
# Import torch
import torch
from torch import nn

# Exercises require PyTorch > 1.10.0
print(torch.__version__)

# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device
```

For all tasks below, use our custom Pizza Steak Sushi dataset from the [GitHub repository](https://github.com/radiukpavlo/conducting-experiments/blob/main/data/pizza_steak_sushi.zip).

<a class="anchor" id="5.1"></a>

## <span style="color:red; font-size:1.5em;">Task 1. Modular Data Preparation (`data_setup.py`)</span>

[Go back to the outline](#5)

**Variant 1:** Enhance the robustness of `data_setup.py` by implementing a detailed data validation function, `validate_dataset_structure(base_path, required_classes)`. This function should meticulously check if the `base_path` directory exists, if it contains 'train' and 'test' subdirectories, and if each of these subdirectories contains further subdirectories corresponding to every class name in `required_classes` (['pizza', 'steak', 'sushi']). If any part of this expected structure is missing, the function should raise a specific `FileNotFoundError` or `ValueError` with a highly descriptive message pinpointing the exact missing element (e.g., "Error: Test directory 'data/pizza_steak_sushi/test' not found." or "Error: Missing class subfolder 'sushi' inside 'data/pizza_steak_sushi/train'."). Modify `create_dataloaders` to call this validation function at the very beginning, ensuring that data loading proceeds only if the structure is confirmed. Additionally, incorporate Python's `logging` module to report the validation status (e.g., "INFO: Dataset structure validated successfully." or "ERROR: Dataset validation failed. See previous error messages.").

*Technical note:* Utilize `pathlib.Path` for robust path manipulation and checks (`exists()`, `is_dir()`). The validation function should iterate through `['train', 'test']` and `required_classes` performing checks. `create_dataloaders` should have a `try...except` block around the validation call or let exceptions propagate. Configure basic logging in the main script using `logging.basicConfig`. Libraries: `pathlib`, `logging`, `torch`, `torchvision`, `torch.utils.data`. Metrics: Observe script behavior with correctly and incorrectly structured data directories; check log output clarity.
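A minimal sketch of the validation step described above, using only `pathlib` and `logging`. It lets exceptions propagate to the caller; the exact message wording and the choice of `FileNotFoundError` for every case are assumptions, not a required solution.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def validate_dataset_structure(base_path, required_classes):
    """Raise FileNotFoundError naming the first missing directory."""
    base = Path(base_path)
    if not base.is_dir():
        raise FileNotFoundError(f"Error: Base directory '{base}' not found.")
    for split in ("train", "test"):
        split_dir = base / split
        if not split_dir.is_dir():
            raise FileNotFoundError(
                f"Error: {split.capitalize()} directory '{split_dir}' not found."
            )
        for class_name in required_classes:
            if not (split_dir / class_name).is_dir():
                raise FileNotFoundError(
                    f"Error: Missing class subfolder '{class_name}' inside '{split_dir}'."
                )
    # Reached only when every expected directory exists.
    logger.info("Dataset structure validated successfully.")
```

`create_dataloaders` could call this function first and log an error before re-raising, so data loading proceeds only for a valid layout.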

---
**Variant 2:** Augment `data_setup.py` to facilitate rapid prototyping and debugging by adding subset loading capability. Introduce an optional float argument `subset_fraction` (e.g., default 1.0, meaning use all data) and an optional integer `random_seed` for reproducibility to the `create_dataloaders` function signature. If `subset_fraction` is less than 1.0, the function must load only this specified fraction (e.g., 0.1 for 10%) of the *original* training and testing datasets. Crucially, this subsetting must preserve the relative class distribution within both the training and testing subsets. Implement this using `sklearn.model_selection.train_test_split` with the `stratify` option on the list of targets obtained from the initial `ImageFolder` dataset. The function should log the original and subset sizes for clarity.

*Technical note:* First, create `datasets.ImageFolder` instances for the full train/test sets to access `dataset.samples` and `dataset.targets`. Use these targets with `train_test_split` (setting `train_size=subset_fraction`, `random_state=random_seed`, `stratify=targets`) to get indices for the subset. Create `torch.utils.data.Subset` using the original dataset and these indices. Return DataLoaders based on these subsets. Libraries: `torch`, `torchvision`, `torch.utils.data`, `sklearn.model_selection`, `numpy`, `logging`. Metrics: Verify `len(train_dataloader.dataset)` and `len(test_dataloader.dataset)` match the expected subset size; check class distribution in a few batches.
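The index selection can be sketched on its own, independent of `ImageFolder`. The helper name `stratified_subset_indices` is hypothetical; it takes any list of integer targets (such as `dataset.targets`) and returns the indices to pass to `torch.utils.data.Subset`.

```python
from sklearn.model_selection import train_test_split


def stratified_subset_indices(targets, subset_fraction=1.0, random_seed=None):
    """Return sorted indices of a subset that keeps the class proportions."""
    all_indices = list(range(len(targets)))
    if subset_fraction >= 1.0:
        # train_test_split rejects a float train_size of 1.0, so skip the split.
        return all_indices
    subset_indices, _ = train_test_split(
        all_indices,
        train_size=subset_fraction,
        random_state=random_seed,
        stratify=targets,
    )
    return sorted(subset_indices)


# Stand-in targets: 10 samples for each of 3 classes.
example_targets = [0] * 10 + [1] * 10 + [2] * 10
example_indices = stratified_subset_indices(example_targets, 0.5, random_seed=42)
```

The explicit branch for `subset_fraction >= 1.0` matters because the default value of 1.0 would otherwise raise a `ValueError` inside `train_test_split`.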

---
**Variant 3:** Introduce configurable data augmentation strategies within `data_setup.py`. Modify `create_dataloaders` to accept a string argument `augmentation_strategy` with possible values like 'none', 'light', 'moderate', 'heavy'. Define distinct `torchvision.transforms.Compose` pipelines for each strategy. 'none' should only include Resize, ToTensor, and Normalize (if calculated). 'light' could add `RandomHorizontalFlip`. 'moderate' might add `RandomRotation(15)`. 'heavy' could incorporate `ColorJitter` and `RandomAffine`. The chosen strategy should only be applied to the training dataset; the test/validation dataset must consistently use the 'none' pipeline (Resize, ToTensor, Normalize) for stable evaluation. Ensure normalization transform, if used, is applied last in all pipelines.

*Technical note:* Use conditional logic (`if/elif/else`) within `create_dataloaders` to construct `train_transform` based on `augmentation_strategy`. Define `test_transform` separately and consistently. Examples: `light = transforms.Compose([Resize, RandomHorizontalFlip, ToTensor, Normalize])`, `heavy = transforms.Compose([Resize, RandomAffine(degrees=10, translate=(0.1, 0.1)), ColorJitter(brightness=0.2), RandomHorizontalFlip, ToTensor, Normalize])`. Libraries: `torchvision.transforms`, `torch.utils.data`. Evaluation: Train models with different strategies and compare validation accuracy curves.

---
**Variant 4:** Implement automated calculation and application of dataset-specific normalization statistics in `data_setup.py`. Create a utility function `calculate_mean_std(dataloader)` that iterates through *all* batches of the provided training `DataLoader` once, accumulating the sum and sum-of-squares of pixel values across all images for each channel (RGB). From these sums, compute the per-channel mean and standard deviation. Modify `create_dataloaders` to accept a boolean argument `compute_normalize`. If `True`, it should first create a temporary DataLoader for the training set *without* normalization, pass it to `calculate_mean_std`, obtain the stats, and then create the final train and test transforms including `transforms.Normalize` with these computed values. Optionally, cache the computed mean/std values (e.g., in a `.pt` or `.json` file) associated with the dataset path to avoid re-computation on subsequent runs.

*Technical note:* The calculation needs careful handling of batches: accumulate `sum`, `squared_sum`, `num_pixels`. `mean = sum / num_pixels`, `std = sqrt(squared_sum / num_pixels - mean**2)`. Ensure iteration covers the entire dataset. Modify `create_dataloaders` flow: create initial dataset -> temp dataloader -> calculate stats -> create final transforms with stats -> create final dataloaders. Caching involves checking file existence and load/save logic. Libraries: `torch`, `torchvision`, `numpy` (for calculations), `json` or `torch` (for caching), `os`. Metrics: Report computed mean/std values; verify normalization layer uses them.
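The accumulation described above can be sketched as follows. The random tensors stand in for the real un-normalized training loader, and the use of `float64` accumulators is an added precaution rather than a requirement of the task.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def calculate_mean_std(dataloader):
    """Per-channel mean/std over every pixel the loader yields (one pass)."""
    channel_sum = None
    channel_squared_sum = None
    num_pixels = 0
    for images, _ in dataloader:
        images = images.double()  # float64 keeps the running sums accurate
        if channel_sum is None:
            channels = images.shape[1]
            channel_sum = torch.zeros(channels, dtype=torch.float64)
            channel_squared_sum = torch.zeros(channels, dtype=torch.float64)
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_squared_sum += (images ** 2).sum(dim=(0, 2, 3))
        num_pixels += images.shape[0] * images.shape[2] * images.shape[3]
    mean = channel_sum / num_pixels
    # Clamp guards against tiny negative values caused by rounding.
    std = torch.sqrt((channel_squared_sum / num_pixels - mean ** 2).clamp(min=0))
    return mean.float(), std.float()


# Usage with random stand-in data in place of the real training images.
fake_images = torch.rand(10, 3, 8, 8)
fake_labels = torch.zeros(10, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=4)
mean, std = calculate_mean_std(loader)
```

Counting pixels per batch, rather than averaging per-batch means, keeps the result exact even when the last batch is smaller than the others.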

---
**Variant 5:** Enhance `data_setup.py` to reliably create a three-way data split: training, validation, and testing. Modify `create_dataloaders` to accept a `validation_split_ratio` argument (e.g., 0.15 for 15%) and a `random_seed` for deterministic splitting. The function should first load the *original* training dataset specified by `train_dir`. Then, it should split this original training set into new, smaller training and validation sets based on the `validation_split_ratio`, ensuring stratification by class labels using `sklearn.model_selection.train_test_split`. The original test set (from `test_dir`) remains untouched. The function must return three `DataLoader` instances (train, validation, test) and the list of class names.

*Technical note:* Load full training `ImageFolder`. Get indices `list(range(len(train_dataset)))` and targets `train_dataset.targets`. Use `train_test_split` on indices with `test_size=validation_split_ratio`, `random_state=random_seed`, `stratify=train_dataset.targets` to get `train_indices` and `val_indices`. Create `train_subset = Subset(train_dataset, train_indices)` and `val_subset = Subset(train_dataset, val_indices)`. Create DataLoaders for `train_subset`, `val_subset`, and the original `test_dataset`. Libraries: `torch`, `torchvision`, `torch.utils.data.Subset`, `sklearn.model_selection`.

---
**Variant 6:** Address potential class imbalance in the `Pizza Steak Sushi` training data by incorporating `WeightedRandomSampler` into `data_setup.py`. Create a helper function `compute_class_weights(dataset)` that calculates the number of samples per class in the provided `dataset` and returns a weight for each sample, typically calculated as `weight = total_samples / (num_classes * samples_in_class)`. Modify `create_dataloaders` to accept a boolean argument `use_weighted_sampler`. If `True`, compute these sample weights for the training dataset, create a `WeightedRandomSampler` instance using these weights, and configure the training `DataLoader` to use this sampler (remembering to set `shuffle=False` in the DataLoader, as the sampler handles shuffling). The validation and test DataLoaders should not use weighted sampling.

*Technical note:* Calculate class counts from `dataset.targets`. Compute weights per class. Create a tensor `sample_weights` where `sample_weights[i]` is the weight for the class of sample `i`. Instantiate `sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)`. Pass `sampler=sampler` and `shuffle=False` to the training `DataLoader`. Libraries: `torch`, `torchvision`, `torch.utils.data.WeightedRandomSampler`, `numpy`. Evaluation: Train with and without the sampler, compare validation metrics, especially per-class accuracy.
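A small sketch of the weight computation, working directly on a list of targets rather than on a dataset object (the helper name `compute_sample_weights` is hypothetical). It assumes every class index appears at least once.

```python
import torch
from torch.utils.data import WeightedRandomSampler


def compute_sample_weights(targets):
    """One weight per sample: total_samples / (num_classes * samples_in_class)."""
    targets = torch.as_tensor(targets, dtype=torch.long)
    class_counts = torch.bincount(targets)
    class_weights = len(targets) / (len(class_counts) * class_counts.double())
    # Index the per-class weights by each sample's class.
    return class_weights[targets]


# Stand-in for dataset.targets: three samples of class 0, one of class 1.
sample_weights = compute_sample_weights([0, 0, 0, 1])
sampler = WeightedRandomSampler(
    sample_weights, num_samples=len(sample_weights), replacement=True
)
```

The sampler would then be passed to the training `DataLoader` as `sampler=sampler`, with `shuffle` left at `False`.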

---
**Variant 7:** Bolster `data_setup.py` with a pre-emptive image integrity check. Implement a function `verify_image_files(image_folder_path, corrupted_dir_name='_corrupted')` that recursively scans the train and test directories within `image_folder_path`. For each file found, it attempts to open it using `PIL.Image.open()` and immediately calls `img.load()` to force reading the image data. If any exception occurs during this process, it logs a detailed error message including the file path and the exception type. Optionally, if `corrupted_dir_name` is provided, it moves the problematic file to a subdirectory with that name within its original parent folder (e.g., `train/pizza/_corrupted/bad_image.jpg`). The `create_dataloaders` function should execute this verification *before* attempting to instantiate `ImageFolder`, preventing crashes during training due to bad files.

*Technical note:* Use `pathlib.Path(image_folder_path).rglob('*.*')` to find all files. In the loop, check if file extension is common image type. Use a `try...except Exception as e:` block around `img = PIL.Image.open(filepath); img.load()`. Use `logging.error(...)`. Use `shutil.move(filepath, corrupted_path)` for moving files, creating the corrupted directory if needed (`os.makedirs(..., exist_ok=True)`). Libraries: `PIL`, `pathlib`, `logging`, `shutil`, `os`. Metrics: Run on a dataset potentially containing corrupted images; check log output and presence of `_corrupted` directories.

---
**Variant 8:** Explore advanced data loading by allowing `create_dataloaders` in `data_setup.py` to accept a custom `collate_fn` function as an argument. Implement a specific example: `collate_pad(batch)`, which takes a batch of `(image_tensor, label)` tuples where image tensors might have varying sizes (e.g., after resizing with aspect ratio preservation but before cropping). This function should find the maximum height and width in the batch, create zero-padded tensors of shape `(batch_size, C, max_H, max_W)`, copy each image tensor into the top-left corner of its corresponding padded tensor, and stack labels as usual. Pass this `collate_pad` function to the `DataLoader` construction using the `collate_fn` argument. Analyze the implications for model input layers.

*Technical note:* `collate_pad(batch)`: `images, labels = zip(*batch)`. Find `max_h = max(img.shape[1] for img in images)`, `max_w = max(img.shape[2] for img in images)`. Create `padded_images = torch.zeros(len(images), images[0].shape[0], max_h, max_w)`. Loop through images, copy `padded_images[i, :, :img.shape[1], :img.shape[2]] = img`. Stack labels: `labels = torch.tensor(labels)`. Return `padded_images, labels`. Note: TinyVGG expects fixed input size, so this mainly demonstrates `collate_fn` usage unless the model is adapted. Libraries: `torch`, `torch.utils.data`.
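Put together, the steps in the note might look like this sketch. It assumes each image is a `(C, H, W)` tensor and that all images in the batch share the same channel count and dtype.

```python
import torch


def collate_pad(batch):
    """Zero-pad variable-size (C, H, W) images to a common (max_H, max_W)."""
    images, labels = zip(*batch)
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    padded_images = torch.zeros(
        len(images), images[0].shape[0], max_h, max_w, dtype=images[0].dtype
    )
    for i, img in enumerate(images):
        # Copy each image into the top-left corner of its padded slot.
        padded_images[i, :, : img.shape[1], : img.shape[2]] = img
    return padded_images, torch.tensor(labels)
```

It would be passed as `DataLoader(dataset, batch_size=..., collate_fn=collate_pad)`.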

---
**Variant 9:** Deepen understanding of dataset mechanics by replacing the use of `torchvision.datasets.ImageFolder` in `data_setup.py` with a fully custom `Dataset` class named `PizzaSteakSushiDataset`. This class must replicate the core functionality: its `__init__` method should scan the provided directory (`train_dir` or `test_dir`), identify the classes ('pizza', 'steak', 'sushi') based on subfolder names, create a mapping from class names to integer indices (0, 1, 2), build a list of `(image_path, class_index)` samples, and store the required `transform`. The `__len__` method should return the total number of samples, and `__getitem__(self, idx)` must load the image from the path corresponding to `idx`, apply the stored transform, and return the transformed image tensor and its integer label. `create_dataloaders` should then use this custom class.

*Technical note:* `__init__(self, root_dir, transform=None)`: Use `os.listdir` or `pathlib` to find class subfolders. Build `self.class_to_idx`. Iterate through class folders to populate `self.samples = [(path, idx), ...]`. Store `self.transform`. `__len__(self)`: return `len(self.samples)`. `__getitem__(self, idx)`: Get `img_path, label = self.samples[idx]`. Load with `PIL.Image.open(img_path).convert('RGB')`. Apply `self.transform(image)` if it exists. Return transformed image and label. Libraries: `torch`, `torch.utils.data.Dataset`, `PIL`, `os`, `pathlib`.

---
**Variant 10:** Add a utility function `visualize_transformed_batch(dataloader, class_names, num_images=16, save_path=None)` to `data_setup.py`. This function should fetch a single batch from the provided `dataloader`, denormalize the image tensors if a `transforms.Normalize` step was applied (requires knowing the mean/std used), convert them to a displayable format (e.g., HWC, NumPy array), and create a grid visualization of `num_images` from the batch using `matplotlib`. Each image in the grid should be titled with its corresponding class name from the `class_names` list. Optionally, if `save_path` is provided, save the visualization grid to that file path instead of displaying it interactively. Modify `create_dataloaders` to optionally call this function after creating the training DataLoader.

*Technical note:* Need inverse normalization: `img = img * std[:, None, None] + mean[:, None, None]`. Use `torchvision.utils.make_grid(images, nrow=...)` for easy grid creation. Clamp values to [0, 1]. Convert tensor to NumPy `img.permute(1, 2, 0).cpu().numpy()`. Use `plt.imshow()`, `plt.title()`, `plt.axis('off')`. Handle `save_path` with `plt.savefig()`. Need mean/std used in normalization. Libraries: `torch`, `torchvision.utils`, `matplotlib.pyplot`, `numpy`.
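The inverse normalization step can be isolated in a small helper. This is a sketch that assumes `mean` and `std` are the per-channel values originally passed to `transforms.Normalize`; the name `denormalize` is hypothetical.

```python
import torch


def denormalize(images, mean, std):
    """Undo Normalize for a (C, H, W) image or a (B, C, H, W) batch."""
    mean = torch.as_tensor(mean, dtype=images.dtype).view(-1, 1, 1)
    std = torch.as_tensor(std, dtype=images.dtype).view(-1, 1, 1)
    # Broadcasting applies the per-channel values across height and width.
    return (images * std + mean).clamp(0.0, 1.0)
```

The result can be converted for `plt.imshow()` with `img.permute(1, 2, 0).cpu().numpy()`.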

---
**Variant 11:** Integrate RandAugment, a more advanced automated data augmentation technique, into `data_setup.py`. Modify `create_dataloaders` to accept optional integer arguments `randaugment_n` (number of augmentations to apply, e.g., 2) and `randaugment_m` (magnitude of augmentations, e.g., 9). If these are provided (and > 0), include `transforms.RandAugment(num_ops=randaugment_n, magnitude=randaugment_m)` in the training transform pipeline *before* `ToTensor` and normalization. Compare the training dynamics and final performance against simpler augmentation strategies.

*Technical note:* Add `transforms.RandAugment(num_ops=args.randaugment_n, magnitude=args.randaugment_m)` to the `transforms.Compose` list for training data, typically after resizing but before ToTensor. Requires `torchvision >= 0.11`. Libraries: `torchvision.transforms`. Evaluation: Monitor training curves; compare final validation accuracy.

---
**Variant 12:** Simulate a real-world scenario where the dataset is provided as a single `.zip` archive (`pizza_steak_sushi.zip`). Modify `data_setup.py` to include a custom `Dataset` class, `ZipImageDataset`, that can read images directly from the zip archive without fully extracting it to disk. The `__init__` method should open the zip file using `zipfile.ZipFile` and parse the internal file structure to build the list of samples (image paths within the zip, class indices). The `__getitem__` method should use `zip_ref.read(filename)` to get the image bytes, decode them using `io.BytesIO` and `PIL.Image.open`, apply transforms, and return the image and label. `create_dataloaders` should be adapted to use this dataset if the input path points to a zip file.

*Technical note:* `ZipImageDataset.__init__(self, zip_path, transform=None)`: Open `zipfile.ZipFile(zip_path, 'r')`. Use `zip_ref.namelist()` to find image files and infer structure/classes. Build `self.samples`. `__getitem__(self, idx)`: Get `img_name_in_zip, label`. `img_bytes = self.zip_ref.read(img_name_in_zip)`. Load with `PIL.Image.open(io.BytesIO(img_bytes)).convert('RGB')`. Apply transform. Keep the `ZipFile` object open for the lifetime of the dataset, or open it lazily inside `__getitem__` so that each `DataLoader` worker process gets its own handle when `num_workers > 0`. Libraries: `zipfile`, `io`, `PIL`, `torch`, `torch.utils.data.Dataset`.
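The sample-indexing part of `__init__` can be sketched with the standard library alone. This helper (the name `index_zip_samples` and the extension set are assumptions) expects member names of the form `<split>/<class_name>/<file>`, optionally under a common top-level folder.

```python
import zipfile
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_zip_samples(zip_path, split):
    """List (member_name, class_index) pairs for one split inside the archive."""
    members = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for name in zip_ref.namelist():
            path = PurePosixPath(name)
            parts = path.parts
            # Expect ".../<split>/<class_name>/<file>"; skip directory entries.
            if (
                len(parts) >= 3
                and parts[-3] == split
                and not name.endswith("/")
                and path.suffix.lower() in IMAGE_SUFFIXES
            ):
                members.append((name, parts[-2]))
    classes = sorted({class_name for _, class_name in members})
    class_to_idx = {class_name: i for i, class_name in enumerate(classes)}
    samples = [(name, class_to_idx[class_name]) for name, class_name in sorted(members)]
    return samples, class_to_idx
```

A `ZipImageDataset` could store the returned `samples` and read each member's bytes on demand in `__getitem__`.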

---
**Variant 13:** Implement MixUp augmentation directly within the data loading pipeline in `data_setup.py`. This is more advanced than applying it in the training loop. Create a custom `collate_fn`, `mixup_collate(batch, alpha=0.4)`, which takes a standard batch from the Dataset. Inside this function, pair up samples within the batch, sample a mixing coefficient `lambda` from a Beta distribution (`Beta(alpha, alpha)`), and create a new batch where images are `lam * img1 + (1 - lam) * img2` and labels are `lam * label1 + (1 - lam) * label2` (requires labels to be one-hot encoded first, or handle loss function appropriately later). Pass this `mixup_collate` function to the training `DataLoader`.

*Technical note:* `mixup_collate`: Get `images, labels = zip(*batch)` and stack the resulting tuple into a single tensor: `images = torch.stack(images)`. Convert labels to one-hot: `labels_onehot = F.one_hot(torch.tensor(labels), num_classes=N).float()`. Generate `lam = np.random.beta(alpha, alpha)`. Shuffle batch indices: `shuffled_indices = torch.randperm(len(images))`. Mix images: `mixed_images = lam * images + (1 - lam) * images[shuffled_indices]`. Mix labels: `mixed_labels = lam * labels_onehot + (1 - lam) * labels_onehot[shuffled_indices]`. Return `mixed_images, mixed_labels`. The number of classes `N` must be known inside the function. The loss function must handle soft labels (`nn.CrossEntropyLoss` accepts class-probability targets in PyTorch 1.10 and later). Libraries: `torch`, `torch.nn.functional as F`, `numpy`, `torch.utils.data`.
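A sketch of the collate function under these assumptions: all images in the batch have the same shape, `num_classes` is passed explicitly, and `torch.distributions.Beta` is used in place of `np.random.beta` to avoid a NumPy dependency.

```python
import torch
import torch.nn.functional as F


def mixup_collate(batch, alpha=0.4, num_classes=3):
    """Mix each sample with a randomly chosen partner from the same batch."""
    images, labels = zip(*batch)
    images = torch.stack(images)  # tuple of (C, H, W) tensors -> (B, C, H, W)
    labels_onehot = F.one_hot(torch.tensor(labels), num_classes=num_classes).float()
    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    shuffled_indices = torch.randperm(images.shape[0])
    mixed_images = lam * images + (1 - lam) * images[shuffled_indices]
    mixed_labels = lam * labels_onehot + (1 - lam) * labels_onehot[shuffled_indices]
    return mixed_images, mixed_labels
```

To change `alpha` or `num_classes`, wrap the function with `functools.partial` before passing it as `collate_fn` to the training `DataLoader`.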

---
**Variant 14:** Generalize image file handling in `data_setup.py`. Modify the logic (either `ImageFolder`'s `is_valid_file` parameter or the file discovery part of a custom `Dataset`) to accept a list of allowed file extensions provided as an argument (e.g., `allowed_extensions=['.jpg', '.jpeg', '.png', '.bmp', '.gif']`) to `create_dataloaders`. The data loading should robustly identify and attempt to load files with any of these extensions, skipping or logging warnings for files with other extensions found within the class directories.

*Technical note:* Define `is_valid_image_file(filename, extensions)` function. Use `filename.lower().endswith(tuple(extensions))`. Pass this function to `ImageFolder(..., is_valid_file=lambda x: is_valid_image_file(x, args.allowed_extensions))`. Or implement similar filtering logic in a custom `Dataset` when building `self.samples`. Libraries: `os`, `pathlib`, `torchvision.datasets`, `logging`.
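The filter itself is small. This sketch also tolerates extensions given without a leading dot, which is an added convenience rather than part of the task.

```python
def is_valid_image_file(filename, extensions):
    """True if `filename` ends with one of `extensions` (case-insensitive)."""
    normalized = tuple(
        ext.lower() if ext.startswith(".") else f".{ext.lower()}"
        for ext in extensions
    )
    return str(filename).lower().endswith(normalized)
```

Because `ImageFolder` calls `is_valid_file` with a single path argument, the extension list can be bound with a `lambda` or `functools.partial`.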

---
**Variant 15:** Integrate detailed performance profiling for the data loading process within `data_setup.py`. Create a comprehensive function `profile_dataloader(dataloader, num_batches=50, profile_memory=False)` that iterates through `num_batches`. Use `torch.profiler.profile` to capture CPU time, CUDA time (if applicable), and optionally memory usage (`profile_memory=True`). The function should then print a summary table generated by the profiler (`prof.key_averages().table(...)`) focusing on data loading related operations (e.g., workers, transforms, collate). Modify `create_dataloaders` to optionally call this profiler for different `num_workers` settings (0, 2, 4) and report the results.

*Technical note:* Use `with torch.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], profile_memory=profile_memory) as prof:` around the batch iteration loop, including `ProfilerActivity.CUDA` in `activities` only when `torch.cuda.is_available()`. After the loop, print `prof.key_averages().table(...)`. Ensure enough `num_batches` are processed for meaningful results. Call this function within `create_dataloaders` based on an argument like `profile_workers=True`. Libraries: `torch.profiler`, `torch.utils.data`, `time`. Metrics: Profiler table output showing time spent in different operations.

---
**Variant 16:** Implement aspect ratio preserving resize followed by random cropping for training data augmentation in `data_setup.py`. Modify the training transform pipeline: first, use `transforms.Resize(target_size)` where `target_size` is an integer (e.g., 72 for a final 64x64 crop), resizing the shortest edge to `target_size`. Then, apply `transforms.RandomCrop(output_size)` where `output_size` is the desired square dimension (e.g., 64). For testing, use `transforms.Resize(target_size)` followed by `transforms.CenterCrop(output_size)`. Compare this method against simple `transforms.Resize((output_size, output_size))`.

*Technical note:* Training transform: `transforms.Compose([transforms.Resize(72), transforms.RandomCrop(64), ...])`. Testing transform: `transforms.Compose([transforms.Resize(72), transforms.CenterCrop(64), ...])`. Adjust `target_size` and `output_size` as needed. Libraries: `torchvision.transforms`. Evaluation: Observe visual quality of crops; compare model performance.

---
**Variant 17:** Replace `torchvision.transforms` with the `Albumentations` library for data augmentation in `data_setup.py`. Create a custom `Dataset` class (`AlbumentationsDataset`). Define an Albumentations composition pipeline using `A.Compose([...])` including at least three diverse augmentations (e.g., `A.HorizontalFlip(p=0.5)`, `A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.5)`, `A.RandomBrightnessContrast(p=0.3)`). In the `__getitem__` method, load the image as a NumPy array (using `cv2` or `PIL+numpy`), apply the Albumentations pipeline, convert the augmented NumPy array back to a PyTorch tensor (handling channel order C, H, W), and return it with the label. Use this dataset in `create_dataloaders`.

*Technical note:* `import albumentations as A; from albumentations.pytorch import ToTensorV2`. Define `train_transforms = A.Compose([A.Resize(height, width), A.HorizontalFlip(...), ..., ToTensorV2()])`. In `__getitem__`: `image = cv2.imread(img_path); image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB); augmented = train_transforms(image=image); image_tensor = augmented['image']; return image_tensor, label`. Ensure `ToTensorV2` is used for conversion. Libraries: `albumentations`, `cv2`, `numpy`, `torch`, `torch.utils.data.Dataset`.

---
**Variant 18:** Develop a comprehensive dataset reporting function `generate_dataset_report(train_dataset, test_dataset, report_path)` within `data_setup.py`. This function should analyze the provided train and test `Dataset` objects (assuming `ImageFolder` or compatible structure). It must gather statistics like: total number of training images, total number of testing images, number of images per class in training, number of images per class in testing, explicitly list the class names found, and describe the applied transformations (by inspecting the `.transform` attribute, if possible, or requiring it as input). Format this information clearly into a dictionary and save it as a JSON file to `report_path`.

*Technical note:* Access `train_dataset.samples`, `test_dataset.samples`, `train_dataset.classes`, `train_dataset.targets`, `test_dataset.targets`. Count samples and class occurrences. Inspecting the `transform` object might involve checking `transform.transforms` if it is a `Compose` object and getting the class names with `t.__class__.__name__`. Write the collected dictionary inside a context manager so the file is closed reliably: `with open(report_path, 'w') as f: json.dump(report_dict, f, indent=2)`. Libraries: `json`, `os`, `torch.utils.data`, `collections.Counter` (optional).
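One way the report could be assembled is sketched below. It relies only on the `.classes`, `.targets`, and optional `.transform` attributes, so any `ImageFolder`-compatible object works; the dictionary layout and the helper names `describe_transform` and `summarize_split` are assumptions.

```python
import json
from collections import Counter


def describe_transform(transform):
    """Best-effort list of transform class names; empty if there is none."""
    if transform is None:
        return []
    steps = getattr(transform, "transforms", [transform])
    return [step.__class__.__name__ for step in steps]


def summarize_split(dataset):
    """Count images overall and per class for one dataset split."""
    counts = Counter(dataset.targets)
    return {
        "num_images": len(dataset.targets),
        "images_per_class": {
            name: counts.get(index, 0) for index, name in enumerate(dataset.classes)
        },
        "transforms": describe_transform(getattr(dataset, "transform", None)),
    }


def generate_dataset_report(train_dataset, test_dataset, report_path):
    report = {
        "classes": list(train_dataset.classes),
        "train": summarize_split(train_dataset),
        "test": summarize_split(test_dataset),
    }
    with open(report_path, "w") as report_file:
        json.dump(report, report_file, indent=2)
    return report
```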

---
**Variant 19:** Adapt `data_setup.py` and the associated model expectation for grayscale image processing. Add a boolean argument `force_grayscale` to `create_dataloaders`. If `True`, the image transformation pipeline must include `transforms.Grayscale(num_output_channels=1)` immediately after loading the image and *before* any normalization or other transforms. Crucially, if normalization is computed or applied, it must be based on single-channel statistics. Document that the model used in `train.py` (e.g., TinyVGG's first layer) must be modified to accept `input_shape=1` instead of 3 when this option is enabled.

*Technical note:* Conditionally insert `transforms.Grayscale(num_output_channels=1)` into the `Compose` list. If using computed normalization, `calculate_mean_std` needs adaptation for single channel. Standard ImageNet normalization stats are invalid. Remind user to adjust `model_builder.TinyVGG(input_shape=1, ...)` in `train.py`. Libraries: `torchvision.transforms`.

---
**Variant 20:** Implement dataset caching to accelerate repeated script executions, especially when complex transforms are used. Create a function `load_or_create_transformed_dataset(cache_path, dataset_creator_fn, transform)` in `data_setup.py`. This function first checks if `cache_path` (e.g., `data/pizza_steak_sushi_train_transformed.pt`) exists. If yes, it loads the cached data using `torch.load()`. If not, it calls `dataset_creator_fn()` (which returns an initial `Dataset`, e.g., `ImageFolder`), iterates through the entire dataset applying the `transform` to each sample, collects the transformed images and labels into lists or tensors, saves them to `cache_path` using `torch.save()`, and returns the collected data. Wrap the loaded/collected data in a simple custom `Dataset` for the `DataLoader`. Apply this logic within `create_dataloaders`.

*Technical note:* The cache should store a dictionary `{'images': list_of_tensors, 'labels': list_of_labels}`. The custom wrapper dataset `class CachedDataset(Dataset): def __init__(self, images, labels): self.images = images; self.labels = labels; def __len__ ...; def __getitem__ ...`. Need careful management of cache invalidation if transforms or source data change (e.g., include transform details or data hash in cache filename). Libraries: `torch`, `torch.utils.data.Dataset`, `os`, `pickle` (can be alternative to torch.save).

<a class="anchor" id="5.2"></a>

## <span style="color:red; font-size:1.5em;">Task 2. Modular Model Creation and Configuration (`model_builder.py`, `train.py` argparse)</span>

[Go back to the outline](#5)

**Variant 1:** Enhance the `TinyVGG` model in `model_builder.py` by adding configurable dropout for regularization. Introduce a `dropout_rate` float parameter (default 0.5) to the `TinyVGG.__init__` method. Inside the model architecture, insert `nn.Dropout(p=self.dropout_rate)` layers strategically: specifically, after the second `nn.ReLU` in `conv_block_1`, after the second `nn.ReLU` in `conv_block_2`, and just before the final `nn.Linear` layer within the `classifier` block. Update `train.py` to accept a command-line argument `--dropout_rate` using `argparse`, allowing users to experiment with different dropout levels (e.g., 0.0, 0.25, 0.5) to observe its effect on preventing overfitting, particularly noticeable by comparing training vs. validation accuracy curves over epochs.

*Technical note:* Add `self.dropout_rate = dropout_rate` in `__init__`. Insert `nn.Dropout(p=self.dropout_rate)` at specified locations within the `nn.Sequential` definitions. In `train.py`: `parser.add_argument('--dropout_rate', type=float, default=0.5, help='Dropout probability for regularization')`. Pass `dropout_rate=args.dropout_rate` when instantiating `model_builder.TinyVGG`. Libraries: `torch.nn`, `argparse`. Evaluation: Compare train/val accuracy gap for different dropout rates.

---
**Variant 2:** Integrate Batch Normalization (`BatchNorm2d`) into the `TinyVGG` architecture in `model_builder.py` to potentially improve training stability and speed. Modify both `conv_block_1` and `conv_block_2` by inserting an `nn.BatchNorm2d` layer immediately after each `nn.Conv2d` layer and *before* the subsequent `nn.ReLU` activation. The `num_features` for `nn.BatchNorm2d` must match the `out_channels` of the preceding `nn.Conv2d` layer. In `train.py`, add an `argparse` option `--use_batchnorm` (enabled by default) to conditionally include or exclude these BatchNorm layers during model creation, allowing for direct comparison of training dynamics (convergence speed, learning rate sensitivity) with and without BatchNorm. Because a plain `action='store_true'` flag can never be switched off once its default is `True`, parse an explicit true/false value instead, as in the technical note.

*Technical note:* Modify `nn.Sequential` blocks: `nn.Conv2d(...), nn.BatchNorm2d(hidden_units), nn.ReLU(), nn.Conv2d(...), nn.BatchNorm2d(hidden_units), nn.ReLU(), nn.MaxPool2d(...)`. In `train.py`, add `parser.add_argument('--use_batchnorm', default=True, type=lambda x: (str(x).lower() == 'true'))`. Conditionally define the model architecture (e.g., pass `use_bn=args.use_batchnorm` to `TinyVGG` constructor which internally uses it). Libraries: `torch.nn`, `argparse`. Evaluation: Compare loss curves and final accuracy.
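The conditional layer construction and the boolean argument might be sketched as follows. The helper `conv_block`, the `padding=1` setting, and the channel sizes are assumptions for illustration; the actual `TinyVGG` definition in `model_builder.py` may differ.

```python
import argparse

from torch import nn


def conv_block(in_channels, out_channels, use_bn=True):
    """Conv -> (BatchNorm) -> ReLU twice, then a 2x2 max pool."""
    layers = []
    for channels_in in (in_channels, out_channels):
        layers.append(nn.Conv2d(channels_in, out_channels, kernel_size=3, padding=1))
        if use_bn:
            # num_features must equal the preceding conv's out_channels.
            layers.append(nn.BatchNorm2d(out_channels))
        layers.append(nn.ReLU())
    layers.append(nn.MaxPool2d(kernel_size=2))
    return nn.Sequential(*layers)


def str_to_bool(value):
    return str(value).lower() == "true"


parser = argparse.ArgumentParser()
parser.add_argument("--use_batchnorm", type=str_to_bool, default=True)
# An explicit argument list keeps the example independent of sys.argv.
args = parser.parse_args(["--use_batchnorm", "false"])
block = conv_block(3, 10, use_bn=args.use_batchnorm)
```

On Python 3.9 and later, `action=argparse.BooleanOptionalAction` is an alternative that provides `--use_batchnorm` and `--no-use_batchnorm` flags.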

---
**Variant 3:** Expand the model zoo in `model_builder.py` by defining a second, distinct architecture, for example, `SimpleCNN`, consisting of only one convolutional block (Conv2d, ReLU, MaxPool2d) followed by a Flatten and Linear layer. Update `train.py` significantly: add a required `argparse` argument `--model_name` with choices (e.g., `'tinyvgg'`, `'simplecnn'`). Dynamically import or select the corresponding model class from `model_builder.py` based on the `args.model_name` value and instantiate it. Ensure the hyperparameters like `hidden_units` and `output_shape` are correctly passed to the chosen model constructor. This demonstrates modular model selection.

*Technical note:* Define `class SimpleCNN(nn.Module): ...` in `model_builder.py`. In `train.py`: `parser.add_argument('--model_name', type=str, required=True, choices=['tinyvgg', 'simplecnn'])`. Use factory pattern or `if/elif`: `if args.model_name == 'tinyvgg': model = model_builder.TinyVGG(...) elif args.model_name == 'simplecnn': model = model_builder.SimpleCNN(...)`. Adjust `hidden_units` meaning/usage if needed per model. Libraries: `torch.nn`, `argparse`, `model_builder`.

---
**Variant 4:** Make the activation function within the `TinyVGG` model in `model_builder.py` a configurable parameter. Modify `TinyVGG.__init__` to accept an `activation_name` string parameter (default 'relu'). Inside `__init__`, map this string to the corresponding `torch.nn` activation class (e.g., 'relu' -> `nn.ReLU`, 'gelu' -> `nn.GELU`, 'silu' -> `nn.SiLU`). Replace all hardcoded `nn.ReLU()` instances in the convolutional blocks with instances of the selected activation class. In `train.py`, add an `argparse` argument `--activation` with choices like `'relu'`, `'gelu'`, `'silu'` to allow command-line selection of the activation function. Train models with different activations and compare performance.

*Technical note:* Add `activation_name: str = 'relu'` to `__init__`. Create mapping: `activation_map = {'relu': nn.ReLU, 'gelu': nn.GELU, 'silu': nn.SiLU}`. Get class: `act_class = activation_map.get(activation_name)`; since `.get` returns `None` for an unknown name, raise a `ValueError` in that case instead of failing later with an obscure error. Use `act_class()` in `nn.Sequential`, creating a fresh instance for each position. In `train.py`: `parser.add_argument('--activation', type=str, default='relu', choices=['relu', 'gelu', 'silu'])`. Pass `activation_name=args.activation` to `TinyVGG`. Libraries: `torch.nn`, `argparse`. Evaluation: Compare validation accuracy.
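
A minimal sketch of the string-to-activation mapping, assuming a standalone helper rather than the full `TinyVGG` class. The names `ACTIVATION_MAP`, `get_activation`, and `conv_block` are hypothetical.

```python
from torch import nn

ACTIVATION_MAP = {"relu": nn.ReLU, "gelu": nn.GELU, "silu": nn.SiLU}


def get_activation(name: str) -> nn.Module:
    """Return a new activation module for `name`, or raise on unknown names."""
    key = name.lower()
    if key not in ACTIVATION_MAP:
        raise ValueError(f"Unknown activation '{name}'. Choose from {sorted(ACTIVATION_MAP)}.")
    return ACTIVATION_MAP[key]()


def conv_block(in_channels: int, out_channels: int, activation_name: str = "relu") -> nn.Sequential:
    """TinyVGG-style block with a configurable activation after each convolution."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=0),
        get_activation(activation_name),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=0),
        get_activation(activation_name),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )


block = conv_block(3, 10, activation_name="gelu")
```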

---
**Variant 5:** Generalize the optimizer selection in `train.py`. Use `argparse` to add a `--optimizer` argument allowing the user to choose from at least three common optimizers: `'adam'` (default), `'sgd'`, and `'rmsprop'`. Add another argument `--momentum` (default 0.9) specifically for the SGD optimizer. Based on the chosen optimizer string, instantiate the correct class from `torch.optim` (`torch.optim.Adam`, `torch.optim.SGD`, `torch.optim.RMSprop`), passing the model parameters, the parsed `--learning_rate`, and the momentum value only if SGD is selected. This allows easy comparison of optimizer performance.

*Technical note:* `parser.add_argument('--optimizer', type=str, default='adam', choices=['adam', 'sgd', 'rmsprop'])`. `parser.add_argument('--momentum', type=float, default=0.9, help='Momentum for SGD')`. Conditional instantiation: `if args.optimizer == 'adam': optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate) elif args.optimizer == 'sgd': optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate, momentum=args.momentum) elif args.optimizer == 'rmsprop': optimizer = torch.optim.RMSprop(model.parameters(), lr=args.learning_rate)`. Libraries: `torch.optim`, `argparse`.
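
The conditional instantiation can be wrapped in one function so that `train.py` stays short. This is a sketch; `build_optimizer` is a hypothetical helper name and `nn.Linear(4, 3)` stands in for the real model.

```python
import torch
from torch import nn


def build_optimizer(name: str, params, lr: float, momentum: float = 0.9) -> torch.optim.Optimizer:
    """Create the optimizer selected by `name`; momentum is used only for SGD."""
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=momentum)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=lr)
    raise ValueError(f"Unknown optimizer '{name}'. Choose from 'adam', 'sgd', 'rmsprop'.")


model = nn.Linear(4, 3)  # placeholder for TinyVGG
sgd = build_optimizer("sgd", model.parameters(), lr=0.01, momentum=0.5)
adam = build_optimizer("adam", model.parameters(), lr=0.001)
```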

---
**Variant 6:** Introduce L2 regularization (weight decay) as a tunable hyperparameter in `train.py`. Add a command-line argument `--weight_decay` using `argparse`, with a default value (e.g., `1e-4` or `0.0`). Ensure this parsed float value is passed to the `weight_decay` parameter during the instantiation of the selected optimizer (`Adam`, `SGD`, `RMSprop` all support it). Experiment with different values (e.g., 0.0, 1e-5, 1e-4, 1e-3) and observe the impact on model generalization, potentially reducing the gap between training and validation performance.

*Technical note:* `parser.add_argument('--weight_decay', type=float, default=1e-4, help='Weight decay (L2 penalty) for optimizer')`. Pass `weight_decay=args.weight_decay` to the optimizer constructor, e.g., `optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate, weight_decay=args.weight_decay)`. Libraries: `torch.optim`, `argparse`. Evaluation: Analyze training/validation loss curves for signs of reduced overfitting.

---
**Variant 7:** Implement rigorous input argument validation in `train.py` immediately after parsing arguments using `argparse`. Add explicit checks for the validity and sensible ranges of crucial numerical hyperparameters. For instance, verify that `learning_rate` is positive, `batch_size` is greater than 0, `num_epochs` is at least 1, `hidden_units` (if single value) is positive, and `dropout_rate` (if used) is between 0.0 and 1.0 inclusive. If any validation check fails, raise a `ValueError` with a clear, user-friendly message indicating the problematic argument and the valid range or condition. This prevents wasted computation due to invalid settings.

*Technical note:* After `args = parser.parse_args()`, add checks: `if args.learning_rate <= 0: raise ValueError(f"Invalid learning rate: {args.learning_rate}. Must be positive.")`. Add similar checks for `batch_size`, `num_epochs`, `hidden_units`, `dropout_rate`, etc. Libraries: `argparse`.
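
One way to organize these checks is a single `validate_args` function called right after parsing. The function name and the exact messages below are illustrative assumptions; `dropout_rate` is checked only if the parser defines it.

```python
import argparse


def validate_args(args: argparse.Namespace) -> None:
    """Raise ValueError with a descriptive message for out-of-range hyperparameters."""
    if args.learning_rate <= 0:
        raise ValueError(f"Invalid learning rate: {args.learning_rate}. Must be positive.")
    if args.batch_size <= 0:
        raise ValueError(f"Invalid batch size: {args.batch_size}. Must be greater than 0.")
    if args.num_epochs < 1:
        raise ValueError(f"Invalid number of epochs: {args.num_epochs}. Must be at least 1.")
    if args.hidden_units <= 0:
        raise ValueError(f"Invalid hidden units: {args.hidden_units}. Must be positive.")
    dropout_rate = getattr(args, "dropout_rate", None)  # optional argument
    if dropout_rate is not None and not 0.0 <= dropout_rate <= 1.0:
        raise ValueError(f"Invalid dropout rate: {dropout_rate}. Must be in [0.0, 1.0].")


parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=0.001)
parser.add_argument("--batch_size", type=int, default=32)
parser.add_argument("--num_epochs", type=int, default=5)
parser.add_argument("--hidden_units", type=int, default=10)
# An explicit argument list is used so the sketch runs without a command line.
args = parser.parse_args(["--learning_rate", "0.01"])
validate_args(args)  # passes silently for valid settings
```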

---
**Variant 8:** Parameterize the convolutional kernel size in `model_builder.py`. Modify `TinyVGG.__init__` to accept an integer `kernel_size` parameter (default 3). Use this parameter for the `kernel_size` argument in all `nn.Conv2d` layers within the model. Crucially, analyze how changing the kernel size (e.g., to 5 or 7) affects the output feature map dimensions after convolutions and pooling. Recalculate the required `in_features` for the final `nn.Linear` layer based on the chosen `kernel_size`, assuming standard padding (`padding=0` in the original). Update `train.py` to include a `--kernel_size` argument via `argparse` (e.g., default 3, maybe choices [3, 5]).

*Technical note:* Add `kernel_size: int = 3` to `TinyVGG.__init__`. Use `kernel_size=self.kernel_size` in `nn.Conv2d`. Manually trace or calculate output H, W: `W_out = floor((W_in + 2*padding - dilation*(kernel_size - 1) - 1)/stride + 1)`. Recalculate `hidden_units * H_final * W_final` for the Linear layer's `in_features`. Add `parser.add_argument('--kernel_size', type=int, default=3)` to `train.py`. Pass `args.kernel_size`. Libraries: `torch.nn`, `argparse`, `math.floor`.
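
The output-size formula can be applied layer by layer to obtain `in_features`. The sketch below assumes 64x64 inputs, two blocks of two convolutions each with `padding=0` and `stride=1`, and a 2x2 max pool with stride 2 after each block; `conv2d_out` and `tinyvgg_in_features` are hypothetical helper names.

```python
def conv2d_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of Conv2d or MaxPool2d along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def tinyvgg_in_features(image_size: int, hidden_units: int, kernel_size: int) -> int:
    """in_features of the final Linear layer for a two-block TinyVGG-style model."""
    size = image_size
    for _ in range(2):  # two convolutional blocks
        size = conv2d_out(size, kernel_size)           # first conv in the block
        size = conv2d_out(size, kernel_size)           # second conv in the block
        size = conv2d_out(size, kernel_size=2, stride=2)  # 2x2 max pool
    return hidden_units * size * size


features_k3 = tinyvgg_in_features(64, hidden_units=10, kernel_size=3)  # 10 * 13 * 13
features_k5 = tinyvgg_in_features(64, hidden_units=10, kernel_size=5)  # 10 * 10 * 10
```

With `kernel_size=3` the feature map shrinks 64 → 62 → 60 → 30 → 28 → 26 → 13; with `kernel_size=5` it shrinks 64 → 60 → 56 → 28 → 24 → 20 → 10.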

---
**Variant 9:** Allow selection between Max Pooling and Average Pooling in the `TinyVGG` model via configuration. In `model_builder.py`, add a string parameter `pooling_type` (default 'max') to `TinyVGG.__init__`. Replace the hardcoded `nn.MaxPool2d(kernel_size=2, stride=2)` layers with conditional logic: if `pooling_type` is 'max', use `nn.MaxPool2d`, otherwise if it's 'avg', use `nn.AvgPool2d(kernel_size=2, stride=2)`. In `train.py`, add a corresponding `argparse` argument `--pooling` with choices `'max'` and `'avg'` to control the pooling layer type used in the model.

*Technical note:* Add `pooling_type: str = 'max'` to `__init__`. Inside `nn.Sequential` for conv blocks, use a helper function or conditional instantiation: `pool_layer = nn.MaxPool2d(2, 2) if self.pooling_type == 'max' else nn.AvgPool2d(2, 2)`. Add this `pool_layer` to the sequence. Add `parser.add_argument('--pooling', type=str, default='max', choices=['max', 'avg'])` to `train.py`. Pass `pooling_type=args.pooling`. Libraries: `torch.nn`, `argparse`.

---
**Variant 10:** Implement automated logging of the exact configuration used for each training run in `train.py`. After parsing arguments using `argparse`, determine the directory where the model will be saved (using `utils.save_model`'s `target_dir`). Create a text or JSON file within that directory (e.g., `run_config.json`) and save all the parsed command-line arguments (and their values) into this file. This includes learning rate, epochs, batch size, hidden units, optimizer choice, etc. This practice is crucial for reproducibility and tracking which hyperparameters led to specific results.

*Technical note:* Get `args = parser.parse_args()`. Construct save directory path `save_dir = os.path.join("models", args.model_save_name_base)` (this assumes a `--model_save_name_base` argument has been added to the parser). Create directory `os.makedirs(save_dir, exist_ok=True)`. Define config file path `config_path = os.path.join(save_dir, 'run_config.json')`. Use `with open(config_path, 'w') as f: json.dump(vars(args), f, indent=2)`. Libraries: `argparse`, `os`, `json`, `utils`.
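
A short sketch of the configuration dump, written to a temporary directory so it can be run anywhere. `save_run_config` is a hypothetical helper, and `--model_save_name_base` is the assumed base-name argument from the note above.

```python
import argparse
import json
import os
import tempfile


def save_run_config(args: argparse.Namespace, save_dir: str) -> str:
    """Write all parsed arguments to <save_dir>/run_config.json and return the path."""
    os.makedirs(save_dir, exist_ok=True)
    config_path = os.path.join(save_dir, "run_config.json")
    with open(config_path, "w") as f:
        json.dump(vars(args), f, indent=2)
    return config_path


parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=0.001)
parser.add_argument("--num_epochs", type=int, default=5)
parser.add_argument("--model_save_name_base", type=str, default="tinyvgg_run")
args = parser.parse_args([])  # defaults only, for demonstration

with tempfile.TemporaryDirectory() as tmp_dir:
    save_dir = os.path.join(tmp_dir, "models", args.model_save_name_base)
    config_path = save_run_config(args, save_dir)
    with open(config_path) as f:
        loaded_config = json.load(f)  # read back to confirm the round trip
```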

---
**Variant 11:** Integrate learning rate scheduling into the training process configured via `train.py`. Add `argparse` arguments to select a scheduler type (`--scheduler`, choices: 'none', 'step', 'cosine', default 'none'), and associated parameters like `--lr_step_size` (e.g., 10, for StepLR), `--lr_gamma` (e.g., 0.1, for StepLR), and potentially `--lr_eta_min` (e.g., 0, for CosineAnnealingLR). Instantiate the chosen scheduler from `torch.optim.lr_scheduler` after creating the optimizer. Modify the `engine.train` function in `engine.py` to accept the scheduler object and call `scheduler.step()` appropriately (typically at the end of each epoch).

*Technical note:* Add `argparse` args. In `train.py`, after optimizer: `scheduler = None; if args.scheduler == 'step': scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=args.lr_step_size, gamma=args.lr_gamma) elif args.scheduler == 'cosine': scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=args.num_epochs, eta_min=args.lr_eta_min)`. Modify `engine.train` signature `(..., scheduler=None)`. Inside epoch loop: `if scheduler: scheduler.step()`. Libraries: `torch.optim.lr_scheduler`, `argparse`, `engine`.
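
A sketch of the scheduler selection and the per-epoch `scheduler.step()` call. `build_scheduler` is a hypothetical helper, and the loop body stands in for the real `train_step`/`test_step` calls.

```python
import torch
from torch import nn


def build_scheduler(name, optimizer, num_epochs, lr_step_size=10, lr_gamma=0.1, lr_eta_min=0.0):
    """Return a scheduler for 'step' or 'cosine', or None for 'none'."""
    if name == "none":
        return None
    if name == "step":
        return torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_step_size, gamma=lr_gamma)
    if name == "cosine":
        return torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=num_epochs, eta_min=lr_eta_min
        )
    raise ValueError(f"Unknown scheduler '{name}'.")


model = nn.Linear(4, 3)  # placeholder for TinyVGG
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = build_scheduler("step", optimizer, num_epochs=4, lr_step_size=2, lr_gamma=0.1)

lr_history = []
for epoch in range(4):
    lr_history.append(optimizer.param_groups[0]["lr"])  # LR in effect for this epoch
    optimizer.step()  # stands in for one full epoch of training
    if scheduler is not None:
        scheduler.step()  # once per epoch, after the optimizer updates
```

With `lr_step_size=2` and `lr_gamma=0.1`, the learning rate stays at 0.1 for the first two epochs and drops to 0.01 for the next two.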

---
**Variant 12:** Implement gradient clipping as a configurable option in `train.py` to prevent exploding gradients, especially during early training or with high learning rates. Add a `--clip_grad_norm` float argument via `argparse` (e.g., default 1.0, use 0 or negative value to disable). Pass this value to the `engine.train` function. Modify the `train_step` function in `engine.py` to accept this `clip_value`. Inside `train_step`, *after* the `loss.backward()` call and *before* `optimizer.step()`, check if `clip_value > 0`. If so, call `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_value)`.

*Technical note:* Add `parser.add_argument('--clip_grad_norm', type=float, default=1.0, help='Max norm for gradient clipping (0 to disable)')`. Modify `engine.train` signature `(..., clip_value=0)`. Modify `engine.train_step` signature `(..., clip_value=0)`. Inside `train_step`: `loss.backward(); if clip_value > 0: torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value); optimizer.step()`. Pass `clip_value=args.clip_grad_norm` from `train.py` to `engine.train`. Libraries: `torch.nn.utils`, `argparse`, `engine`.
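
The ordering of `backward`, clipping, and `step` can be sketched for a single batch as follows. `backward_and_step` is a hypothetical helper that also returns the post-clipping gradient norm for inspection; the inputs are scaled up only to make large gradients likely.

```python
import torch
from torch import nn


def backward_and_step(model, loss, optimizer, clip_value: float = 0.0) -> float:
    """Backpropagate, optionally clip the global gradient norm, then update weights."""
    optimizer.zero_grad()
    loss.backward()
    if clip_value > 0:  # 0 or a negative value disables clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_value)
    # Global L2 norm of all gradients after (optional) clipping.
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
    optimizer.step()
    return grad_norm.item()


torch.manual_seed(0)
model = nn.Linear(4, 3)  # placeholder for TinyVGG
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
X = 100 * torch.randn(8, 4)  # large inputs to provoke large gradients
y = torch.randint(0, 3, (8,))
loss = nn.CrossEntropyLoss()(model(X), y)
clipped_norm = backward_and_step(model, loss, optimizer, clip_value=1.0)
```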

---
**Variant 13:** Generalize the `TinyVGG` architecture in `model_builder.py` to support a variable number of hidden units per convolutional block. Replace the single `hidden_units: int` parameter in `__init__` with `hidden_units_list: list[int]` (e.g., `[10, 20]`). The length of the list implies the number of blocks (or use it for the first N blocks). Use `hidden_units_list[0]` for the first block's channel count, `hidden_units_list[1]` for the second, etc. Adjust the `in_features` of the final `nn.Linear` layer dynamically based on the channel count of the *last* convolutional block and the resulting feature map size. Update `train.py` `argparse` to accept a list of integers for hidden units using `nargs='+'` (e.g., `--hidden_units 16 32`).

*Technical note:* Change `hidden_units` param. Use `hidden_units_list[0]` in `conv_block_1`, `hidden_units_list[1]` in `conv_block_2`. Recalculate linear `in_features` based on `hidden_units_list[-1]` and spatial dimensions. In `train.py`: `parser.add_argument('--hidden_units', type=int, nargs='+', default=[10, 10])`. Pass `hidden_units_list=args.hidden_units`. Ensure list length consistency or add checks. Libraries: `torch.nn`, `argparse`.

---
**Variant 14:** Simulate transfer learning by adding an option to freeze the feature extractor (convolutional layers) of the `TinyVGG` model during training. Add an `argparse` flag `--freeze_feature_extractor` (`action='store_true'`) to `train.py`. If this flag is set, after instantiating the `TinyVGG` model, iterate through the parameters of `model.conv_block_1` and `model.conv_block_2` and set their `requires_grad` attribute to `False`. Crucially, when creating the optimizer, ensure it only receives the parameters that require gradients (i.e., the classifier parameters). Train the model and observe how only the classifier weights are updated.

*Technical note:* Add `parser.add_argument('--freeze_feature_extractor', action='store_true')`. After `model = ...`: `if args.freeze_feature_extractor: for block in [model.conv_block_1, model.conv_block_2]: for param in block.parameters(): param.requires_grad = False`. Create optimizer: `trainable_params = filter(lambda p: p.requires_grad, model.parameters()); optimizer = torch.optim.Adam(trainable_params, lr=args.learning_rate)`. Libraries: `torch.optim`, `argparse`. Evaluation: Check gradients of frozen layers (should be None); observe faster initial training but potentially lower final accuracy if features need tuning.
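
The freezing logic can be checked on a small stand-in model. `TinyStandIn` is a hypothetical substitute for `TinyVGG` that only mirrors its attribute names (`conv_block_1`, `conv_block_2`, `classifier`); the layer sizes are arbitrary.

```python
import torch
from torch import nn


class TinyStandIn(nn.Module):
    """Small stand-in with the same block names as TinyVGG (sizes are arbitrary)."""

    def __init__(self):
        super().__init__()
        self.conv_block_1 = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU())
        self.conv_block_2 = nn.Sequential(nn.Conv2d(4, 4, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(4, 3))

    def forward(self, x):
        return self.classifier(self.conv_block_2(self.conv_block_1(x)))


def freeze_feature_extractor(model: nn.Module) -> None:
    """Disable gradients for both convolutional blocks."""
    for block in (model.conv_block_1, model.conv_block_2):
        for param in block.parameters():
            param.requires_grad = False


torch.manual_seed(0)
model = TinyStandIn()
freeze_feature_extractor(model)

# The optimizer receives only parameters that still require gradients.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-2)

conv_before = model.conv_block_1[0].weight.clone()
fc_before = model.classifier[1].weight.clone()

loss = nn.CrossEntropyLoss()(model(torch.randn(4, 3, 16, 16)), torch.tensor([0, 1, 2, 0]))
loss.backward()
optimizer.step()
```

After the step, the frozen convolution weights are unchanged and their `.grad` is `None`, while the classifier weights have moved.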

---
**Variant 15:** Enable Automatic Mixed Precision (AMP) training for potential speedup and memory saving on compatible GPUs. Add an `argparse` flag `--use_amp` (`action='store_true'`) to `train.py`. Pass this flag and the device to `engine.train`. Modify `engine.train` to initialize `torch.cuda.amp.GradScaler()` only if `use_amp` is true and `device` is 'cuda'. Pass the scaler to `train_step` and `use_amp` flag to both `train_step` and `test_step`. In `train_step`, wrap the model forward pass and loss calculation with `with torch.cuda.amp.autocast(enabled=use_amp):`. Use `scaler.scale(loss).backward()`, `scaler.step(optimizer)`, and `scaler.update()` instead of direct calls. In `test_step`, only wrap the forward pass and loss calculation in the `autocast` context.

*Technical note:* Add flag to `train.py`. Modify `engine.train`: `scaler = torch.cuda.amp.GradScaler() if use_amp and device == 'cuda' else None`. Modify step signatures. In `train_step`: `with autocast(enabled=use_amp): y_pred = model(X); loss = loss_fn(...)`. Handle scaler logic: `if scaler: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update() else: loss.backward(); optimizer.step()`. In `test_step`: `with autocast(enabled=use_amp): test_pred_logits = model(X); loss = loss_fn(...)`. Note: in recent PyTorch releases `torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler` are deprecated in favor of `torch.amp.autocast('cuda', ...)` and `torch.amp.GradScaler('cuda')`; the older names still work but emit a warning. Libraries: `torch.cuda.amp`, `argparse`, `engine`.

---
**Variant 16:** Improve code structure and extensibility in `model_builder.py` by using an abstract base class. Import `from abc import ABC, abstractmethod` at module level and define `class BaseFoodModel(nn.Module, ABC): @abstractmethod def forward(self, x: torch.Tensor) -> torch.Tensor: pass`. Inheriting from `ABC` is what makes `@abstractmethod` enforceable; without it, a subclass that omits `forward` can still be instantiated. Make `TinyVGG` inherit from this: `class TinyVGG(BaseFoodModel): ...`. Implement another simple model, e.g., `class LinearProbe(BaseFoodModel): def __init__(self, input_shape, output_shape): super().__init__(); self.flatten = nn.Flatten(); self.linear = nn.Linear(input_shape * 64 * 64, output_shape); def forward(self, x): return self.linear(self.flatten(x))` (here `input_shape` is the number of input channels and images are assumed to be 64x64). Refactor `train.py`'s model selection logic (from Variant 3) to instantiate different classes that all inherit from `BaseFoodModel`, demonstrating polymorphism.

*Technical note:* Use `abc` module. Ensure all models implement `forward`. The factory/conditional logic in `train.py` now selects between `TinyVGG`, `LinearProbe`, etc., all guaranteed to have the basic `nn.Module` interface plus the `forward` method. This structure facilitates adding more models later. Libraries: `torch.nn`, `abc`, `argparse`, `model_builder`.
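
A minimal sketch of the abstract base class, assuming 64x64 RGB inputs for `LinearProbe`. The `image_size` parameter is an addition for illustration and is not part of the task statement.

```python
from abc import ABC, abstractmethod

import torch
from torch import nn


class BaseFoodModel(nn.Module, ABC):
    """Common interface: every food model must implement forward()."""

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ...


class LinearProbe(BaseFoodModel):
    """Flatten the image and apply a single linear layer."""

    def __init__(self, input_shape: int, output_shape: int, image_size: int = 64):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear = nn.Linear(input_shape * image_size * image_size, output_shape)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(self.flatten(x))


model: BaseFoodModel = LinearProbe(input_shape=3, output_shape=3)
logits = model(torch.randn(2, 3, 64, 64))
```

Because `BaseFoodModel` mixes in `ABC`, calling `BaseFoodModel()` directly raises a `TypeError`, as does instantiating any subclass that forgets to define `forward`.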

---
**Variant 17:** Implement automatic saving of the best performing model based on validation accuracy during training. Add an `argparse` flag `--save_best_model` (`action='store_true'`) to `train.py`. If enabled, determine the full path for the best model file (e.g., `models/best_tinyvgg_model.pth`). Pass this path to `engine.train`. Inside `engine.train`, initialize `best_val_acc = 0.0`. After each epoch's validation step (`test_step`), compare the current `test_acc` with `best_val_acc`. If the current accuracy is higher, update `best_val_acc` and call `utils.save_model` to save the current model state dict to the specified best model path, overwriting the previous best.

*Technical note:* Add flag to `train.py`. Construct `best_model_path = os.path.join("models", f"best_{args.model_name}.pth")` (assuming `args.model_name`). Modify `engine.train` signature `(..., save_best_path=None)`. Inside epoch loop: `if save_best_path and test_acc > best_val_acc: best_val_acc = test_acc; utils.save_model(model=model, target_dir=os.path.dirname(save_best_path), model_name=os.path.basename(save_best_path)); print(f"INFO: New best model saved to {save_best_path} (Acc: {best_val_acc:.4f})")`. Libraries: `os`, `argparse`, `utils`, `engine`.

---
**Variant 18:** Allow the loss function to be configurable via `train.py`. Add an `argparse` argument `--loss_fn` with choices like `'cross_entropy'` (default) and `'label_smoothing'` (a common regularization technique). If `'label_smoothing'` is chosen, add another argument `--label_smoothing_epsilon` (e.g., default 0.1). Based on the `args.loss_fn` value, instantiate the appropriate loss function (`nn.CrossEntropyLoss()` or a custom/library implementation of Label Smoothing Cross Entropy, passing the epsilon value). Ensure the chosen loss function is passed to the `engine.train` function.

*Technical note:* Add `--loss_fn` and `--label_smoothing_epsilon` args. Since PyTorch 1.10, `nn.CrossEntropyLoss` accepts a `label_smoothing` argument, so a custom `LabelSmoothingLoss` class is optional (a custom version mixes the one-hot target with a uniform distribution over the classes). Conditional instantiation: `if args.loss_fn == 'cross_entropy': loss_fn = nn.CrossEntropyLoss() elif args.loss_fn == 'label_smoothing': loss_fn = nn.CrossEntropyLoss(label_smoothing=args.label_smoothing_epsilon)`. Pass `loss_fn` to `engine.train`. Libraries: `torch.nn`, `argparse`, `engine`.

---
**Variant 19:** Explore architectural efficiency by modifying `TinyVGG` in `model_builder.py` to use depthwise separable convolutions instead of standard `nn.Conv2d`. Replace each standard `nn.Conv2d` (except potentially the very first one if input channels differ) with a sequence of: 1) a depthwise convolution (`nn.Conv2d` with `groups=in_channels`, `out_channels=in_channels`, same kernel size and padding) and 2) a pointwise convolution (`nn.Conv2d` with `kernel_size=1`, `in_channels=in_channels`, `out_channels=out_channels`). Add an `argparse` flag `--use_separable_conv` to `train.py` to switch between the original and the separable convolution versions. Compare the parameter count and potentially training speed/accuracy.

*Technical note:* Modify `nn.Sequential` blocks. E.g., replace `nn.Conv2d(h, h, 3)` with `nn.Conv2d(h, h, 3, padding=0, groups=h), nn.Conv2d(h, h, 1, padding=0)`. Add flag `parser.add_argument('--use_separable_conv', action='store_true')`. Conditionally build model in `train.py` based on flag. Calculate params: `sum(p.numel() for p in model.parameters() if p.requires_grad)`. Libraries: `torch.nn`, `argparse`. Metrics: Parameter count, validation accuracy.
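
The parameter savings can be verified on a single layer. The sketch below compares a standard 3x3 convolution with its depthwise separable counterpart for 10 input and 10 output channels; `separable_conv` and `count_params` are hypothetical helper names.

```python
import torch
from torch import nn


def separable_conv(in_channels: int, out_channels: int, kernel_size: int = 3, padding: int = 0) -> nn.Sequential:
    """Depthwise convolution (one filter per channel) followed by a 1x1 pointwise convolution."""
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, padding=padding, groups=in_channels),
        nn.Conv2d(in_channels, out_channels, kernel_size=1),
    )


def count_params(module: nn.Module) -> int:
    """Number of trainable parameters."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


standard = nn.Conv2d(10, 10, kernel_size=3, padding=0)
separable = separable_conv(10, 10)

x = torch.randn(1, 10, 32, 32)
standard_shape = standard(x).shape    # both variants produce the same output shape
separable_shape = separable(x).shape

standard_params = count_params(standard)    # 10*10*3*3 weights + 10 biases
separable_params = count_params(separable)  # (10*3*3 + 10) + (10*10 + 10)
```

For this layer the standard convolution has 910 parameters and the separable version has 210.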

---
**Variant 20:** Enhance reproducibility by adding robust random seed management in `train.py`. Add an `argparse` argument `--seed` (integer, e.g., default 42). At the *very beginning* of the script's execution (after imports and argument parsing, but before any data loading, model creation, or other operations that use random numbers), use this seed to set the state for Python's `random`, NumPy's `np.random`, and PyTorch's CPU and GPU random number generators (`torch.manual_seed`, `torch.cuda.manual_seed_all`). Additionally, consider setting the cuDNN flags (`torch.backends.cudnn.deterministic = True`, `torch.backends.cudnn.benchmark = False`) for further reproducibility, although this might reduce performance. Log the seed being used.

*Technical note:* Add `parser.add_argument('--seed', type=int, default=42)`. At script start: `import random; import numpy as np; import torch; random.seed(args.seed); np.random.seed(args.seed); torch.manual_seed(args.seed); if torch.cuda.is_available(): torch.cuda.manual_seed_all(args.seed); # Optional deterministic settings: torch.backends.cudnn.deterministic = True; torch.backends.cudnn.benchmark = False`. Log: `print(f"Using random seed: {args.seed}")`. Libraries: `random`, `numpy`, `torch`, `argparse`.
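
The seeding calls can be grouped into one function and checked by drawing random numbers twice. `set_seed` and its `deterministic` switch are hypothetical names used for illustration.

```python
import random

import numpy as np
import torch


def set_seed(seed: int, deterministic: bool = False) -> None:
    """Seed Python, NumPy, and PyTorch RNGs; optionally set the cuDNN determinism flags."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    print(f"Using random seed: {seed}")


set_seed(42)
first_draw = (random.random(), float(np.random.rand()), torch.rand(3))

set_seed(42)  # resetting the seed reproduces the same sequence
second_draw = (random.random(), float(np.random.rand()), torch.rand(3))
```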

<a class="anchor" id="5.3"></a>

## <span style="color:red; font-size:1.5em;">Task 3. Modular Training and Testing Loops (`engine.py`)</span>

[Go back to the content](#5)

**Variant 1:** Enhance the evaluation capabilities within `engine.py` by calculating and returning a more comprehensive set of classification metrics. Modify both `train_step` and `test_step` to compute not only loss and accuracy but also Precision, Recall, and F1-score. Utilize the `torchmetrics` library for robust calculation. Initialize `torchmetrics.Accuracy`, `torchmetrics.Precision`, `torchmetrics.Recall`, and `torchmetrics.F1Score` (specifying `task='multiclass'`, `num_classes`, and an appropriate `average` method like `'macro'` or `'weighted'`). Update metrics within the batch loop using `metric.update(preds, target)`. Compute final scores at the end of each epoch using `metric.compute()` and ensure `metric.reset()` is called. Update the `train` function's return dictionary to include these new metrics (e.g., `results["train_precision"]`, `results["test_f1"]`, etc.).

*Technical note:* `import torchmetrics`. Init metrics: `precision_metric = torchmetrics.Precision(task='multiclass', num_classes=N, average='macro').to(device)`. In steps: `metric.update(torch.softmax(y_pred, dim=1), y)`. At epoch end: `epoch_precision = precision_metric.compute()`. Add keys to `results` dict. Libraries: `torchmetrics` (needs install), `torch`. Metrics: Precision, Recall, F1-score (macro/weighted average).

---
**Variant 2:** Implement robust early stopping functionality within the `train` function in `engine.py` to prevent overfitting and save computation time. Add function parameters `early_stopping_patience` (integer, e.g., 5, number of epochs to wait for improvement) and `early_stopping_delta` (float, e.g., 0.001, minimum improvement required). Monitor the validation loss (`test_loss`). Maintain a counter for epochs without sufficient improvement (`epochs_no_improve`) and track the best validation loss seen so far (`best_val_loss`). If `test_loss` does not decrease by at least `early_stopping_delta` compared to `best_val_loss` for `early_stopping_patience` consecutive epochs, terminate the training loop prematurely using `break`. Log a clear message indicating that early stopping was triggered and the epoch number.

*Technical note:* Initialize `epochs_no_improve = 0`, `best_val_loss = float('inf')`. Inside epoch loop after `test_step`: `if test_loss < best_val_loss - early_stopping_delta: best_val_loss = test_loss; epochs_no_improve = 0 else: epochs_no_improve += 1`. Add check: `if epochs_no_improve >= early_stopping_patience: print(f"INFO: Early stopping triggered at epoch {epoch+1}."); break`. Libraries: `torch`, `logging` (optional).
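
The counter logic can be isolated in a small class and exercised on simulated validation losses. `EarlyStopping` is a hypothetical helper; inside `engine.train` the same state could equally live in local variables.

```python
class EarlyStopping:
    """Track the best validation loss and signal when patience is exhausted."""

    def __init__(self, patience: int = 5, delta: float = 0.001):
        self.patience = patience
        self.delta = delta
        self.best_val_loss = float("inf")
        self.epochs_no_improve = 0

    def step(self, val_loss: float) -> bool:
        """Update state with this epoch's loss; return True if training should stop."""
        if val_loss < self.best_val_loss - self.delta:
            self.best_val_loss = val_loss
            self.epochs_no_improve = 0
        else:
            self.epochs_no_improve += 1
        return self.epochs_no_improve >= self.patience


# Simulated test_loss values: the loss stops improving by at least delta after epoch 2.
simulated_losses = [1.0, 0.8, 0.799, 0.81, 0.805, 0.7]
stopper = EarlyStopping(patience=3, delta=0.01)
stopped_epoch = None
for epoch, test_loss in enumerate(simulated_losses):
    if stopper.step(test_loss):
        stopped_epoch = epoch + 1
        print(f"INFO: Early stopping triggered at epoch {stopped_epoch}.")
        break
```

Here training stops at epoch 5, so the final value 0.7 is never reached; a larger `patience` would have allowed it.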

---
**Variant 3:** Refactor `engine.py` to differentiate between evaluation during training (validation) and final standalone testing. Modify `test_step` to accept an `evaluation_context` string parameter (e.g., 'validation', 'final_test'). Conditionally compute metrics based on this context. For 'validation', compute only loss and accuracy (for speed). For 'final_test', compute the full suite: loss, accuracy, precision, recall, F1-score, and potentially a confusion matrix. Create a new standalone function `evaluate_model(model, dataloader, loss_fn, device, num_classes)` that specifically calls `test_step` with `evaluation_context='final_test'` and returns the detailed metrics dictionary. The main `train` function will continue to call `test_step` with `evaluation_context='validation'`.

*Technical note:* Modify `test_step` signature and add conditional metric calculation using `if evaluation_context == 'final_test': ... compute full metrics ... else: ... compute basic metrics ...`. Implement `evaluate_model` function that sets up metrics (like in Variant 1) and calls `test_step`. Libraries: `torch`, `torchmetrics`.

---
**Variant 4:** Integrate TensorBoard visualization for richer experiment monitoring. Modify the `train` function in `engine.py` to accept an optional `writer` object (an instance of `torch.utils.tensorboard.SummaryWriter`). Inside the main epoch loop, if the `writer` object is provided, use it to log key metrics. Specifically, log `train_loss`, `train_acc`, `test_loss`, and `test_acc` as scalar values using `writer.add_scalar('Train/Loss', train_loss, epoch)`, `writer.add_scalar('Validation/Loss', test_loss, epoch)`, etc. If a learning rate scheduler is used, also log the current learning rate(s) using `writer.add_scalar('LearningRate', optimizer.param_groups[0]['lr'], epoch)`. Ensure the `SummaryWriter` is created and managed in `train.py` and passed appropriately.

*Technical note:* Add `writer: torch.utils.tensorboard.SummaryWriter = None` to `train` signature. In `train.py`, `from torch.utils.tensorboard import SummaryWriter; writer = SummaryWriter(log_dir=f'runs/{experiment_name}')`. Pass `writer` to `engine.train`. Inside `train` loop: `if writer: writer.add_scalar(...)`. Call `writer.close()` after training in `train.py`. Libraries: `torch.utils.tensorboard`.

---
**Variant 5:** Implement gradient accumulation within `train_step` in `engine.py` to simulate larger batch sizes when GPU memory is limited. Add an integer parameter `gradient_accumulation_steps` to the `train_step` function (and pass it down from `train`). Modify the training batch loop: calculate loss for each batch, scale it down by dividing by `gradient_accumulation_steps` (`loss = loss / gradient_accumulation_steps`), call `loss.backward()` to accumulate gradients. Only execute `optimizer.step()` and `optimizer.zero_grad()` once every `gradient_accumulation_steps` batches (or on the last batch if the dataset size isn't divisible). This effectively increases the batch size the optimizer sees without increasing memory usage per step.

*Technical note:* Add `gradient_accumulation_steps: int = 1` parameter. Inside batch loop: `loss = loss_fn(...) / gradient_accumulation_steps; loss.backward(); if (batch_idx + 1) % gradient_accumulation_steps == 0 or (batch_idx + 1) == len(dataloader): optimizer.step(); optimizer.zero_grad()`. Ensure `optimizer.zero_grad()` is only called after `optimizer.step()`. Libraries: `torch.optim`.
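
A sketch of a `train_step` with gradient accumulation on synthetic data. The linear model, the random dataset, and the extra `num_updates` return value are assumptions made for illustration; accuracy tracking is omitted for brevity.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_step(model, dataloader, loss_fn, optimizer, gradient_accumulation_steps: int = 1):
    """One epoch with gradient accumulation; returns (mean loss, number of optimizer updates)."""
    model.train()
    optimizer.zero_grad()
    total_loss, num_updates = 0.0, 0
    for batch_idx, (X, y) in enumerate(dataloader):
        loss = loss_fn(model(X), y)
        total_loss += loss.item()  # record the unscaled loss for reporting
        (loss / gradient_accumulation_steps).backward()  # gradients add up across batches
        is_boundary = (batch_idx + 1) % gradient_accumulation_steps == 0
        is_last_batch = (batch_idx + 1) == len(dataloader)
        if is_boundary or is_last_batch:
            optimizer.step()
            optimizer.zero_grad()  # cleared only after an update
            num_updates += 1
    return total_loss / len(dataloader), num_updates


torch.manual_seed(0)
dataset = TensorDataset(torch.randn(10, 4), torch.randint(0, 3, (10,)))
dataloader = DataLoader(dataset, batch_size=2)  # 5 batches
model = nn.Linear(4, 3)  # placeholder for TinyVGG
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

mean_loss, num_updates = train_step(
    model, dataloader, nn.CrossEntropyLoss(), optimizer, gradient_accumulation_steps=2
)
```

With 5 batches and 2 accumulation steps, the optimizer updates after batches 2, 4, and 5, for 3 updates in total.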

---
**Variant 6:** Augment the `test_step` function in `engine.py` to calculate and return a multi-class confusion matrix. Use `torchmetrics.ConfusionMatrix` initialized with the correct number of classes and task type (`task='multiclass'`). Update the matrix within the batch loop using `conf_mat.update(preds, target)`. At the end of the epoch, compute the final matrix using `conf_mat.compute()` and reset the metric state. Modify the `train` function to optionally receive and store/return the confusion matrix, perhaps only from the final epoch or the epoch with the best validation accuracy. This provides detailed insight into class-specific errors.

*Technical note:* Init `conf_mat = torchmetrics.ConfusionMatrix(task='multiclass', num_classes=N).to(device)`. Update `conf_mat.update(test_pred_logits.argmax(dim=1), y)`. Get `cm_tensor = conf_mat.compute(); conf_mat.reset()`. Return `cm_tensor` along with other metrics from `test_step`. `train` function needs to handle this potentially large tensor in its results dictionary. Libraries: `torchmetrics`, `torch`. Metrics: Confusion Matrix (`torch.Tensor`).

---
**Variant 7:** Add periodic model checkpointing capability to the `train` function in `engine.py`. Introduce an integer parameter `checkpoint_frequency` (default 0, meaning disabled). Inside the main epoch loop, check if `checkpoint_frequency > 0` and if the current epoch number (plus 1) is a multiple of `checkpoint_frequency`. If both conditions are true, call the `utils.save_model` function to save the current model state dict. Construct a unique filename for each checkpoint, incorporating the epoch number (e.g., `model_epoch_{epoch+1}.pth`), and save it to the specified target directory. This allows resuming training or accessing intermediate models.

*Technical note:* Add `checkpoint_frequency: int = 0`, `target_dir: str`, `model_name_base: str` parameters to `train`. Inside epoch loop: `current_epoch = epoch + 1; if checkpoint_frequency > 0 and current_epoch % checkpoint_frequency == 0: checkpoint_name = f"{model_name_base}_epoch_{current_epoch}.pth"; utils.save_model(model, target_dir, checkpoint_name)`. Ensure `target_dir` and `model_name_base` are passed from `train.py`. Libraries: `utils`, `os`.

---
**Variant 8:** Refactor the core logic of `engine.py` using an object-oriented approach for better organization, particularly if the training/testing process becomes more complex. Define abstract base class `EpochRunner(ABC)` with an abstract method `run_epoch()`. Create concrete subclasses `TrainEpochRunner(EpochRunner)` and `TestEpochRunner(EpochRunner)`. Move the contents of the current `train_step` function into `TrainEpochRunner.run_epoch` and `test_step` into `TestEpochRunner.run_epoch`. The `__init__` methods of these classes would store references to the model, dataloader, loss function, optimizer (for train), device, etc. The main `train` function in `engine.py` would then instantiate these runner objects and call their `run_epoch` methods within its loop, simplifying its own structure.

*Technical note:* `from abc import ABC, abstractmethod`. Define classes as described. `train` function becomes: `trainer = TrainEpochRunner(...); tester = TestEpochRunner(...)`. Inside epoch loop: `train_loss, train_acc = trainer.run_epoch(); test_loss, test_acc = tester.run_epoch()`. This promotes separation of concerns. Libraries: `torch`, `abc`.

---
**Variant 9:** Improve user experience during long training runs by adding detailed progress bars using the `tqdm` library. Modify the `train` function in `engine.py`: wrap the main `range(epochs)` iterable with `tqdm(range(epochs), desc='Epochs')` to show overall progress. Furthermore, modify both `train_step` and `test_step` functions: wrap their respective `DataLoader` iterations with `tqdm(dataloader, desc='Training Batch', leave=False)` and `tqdm(dataloader, desc='Testing Batch', leave=False)`. Using `leave=False` ensures the batch progress bars are removed after each epoch, avoiding excessive console output.

*Technical note:* `from tqdm.auto import tqdm`. Apply `tqdm()` wrappers as described. The `desc` argument provides context for each bar. `leave=False` is crucial for nested bars. Libraries: `tqdm`.

---
**Variant 10:** Provide deeper training insights by logging gradient and parameter statistics to TensorBoard from within `engine.py`. Modify `train_step` to accept the `writer` object and a `log_grads_freq` integer argument. Inside the batch loop, check if `writer is not None` and `(batch_idx + 1) % log_grads_freq == 0`. If so, iterate through `model.named_parameters()`. For each parameter `p` that has a gradient (`p.grad is not None`), log its gradient norm using `writer.add_scalar(f'Gradients/{name}', p.grad.norm(), global_step)`. Optionally, also log parameter value norms (`p.data.norm()`) or even histograms of gradients/weights using `writer.add_histogram()`. Calculate `global_step = epoch * len(dataloader) + batch_idx`.

*Technical note:* Pass `writer` and `log_grads_freq` down to `train_step`. Use conditional logging loop. Be mindful that logging histograms frequently can slow down training significantly. Libraries: `torch`, `torch.utils.tensorboard`.

---
**Variant 11:** Increase the flexibility of `engine.py` functions (`train_step`, `test_step`, `train`) to handle PyTorch models that return multiple outputs, such as primary logits plus auxiliary information (e.g., intermediate features for inspection, or outputs from auxiliary heads in some architectures). Modify the steps to check if `model(X)` returns a tuple. If so, assume the first element is the primary output (logits) used for loss calculation and standard metric computation. Allow passing additional outputs back through the return signatures if they need to be logged or processed further in the main `train` loop (e.g., logging norms of auxiliary features).

*Technical note:* In steps: `output = model(X); if isinstance(output, tuple): logits = output[0]; auxiliary_info = output[1:] else: logits = output; auxiliary_info = None`. Calculate loss using `logits`. Modify return values `return train_loss, train_acc, avg_aux_info` if needed. Update `train` function to handle potentially extended return values. Libraries: `torch`.

---
**Variant 12:** Create a dedicated, optimized `predict_step` function within `engine.py` specifically for inference tasks where gradients are not needed. This function should take `model`, `dataloader`, and `device` as input. It must operate entirely within a `with torch.inference_mode():` context manager. Inside the loop over the dataloader, it should perform the forward pass (`model(X)`), obtain predictions (e.g., using `.argmax(dim=1)` for class indices or `.softmax(dim=1)` for probabilities), and collect these predictions along with the corresponding true labels (if available in the dataloader) into lists. Finally, it should return the aggregated lists of all predictions and labels from the dataloader.

*Technical note:* Define `predict_step(model, dataloader, device) -> tuple[list, list]: model.eval(); all_preds = []; all_labels = []; with torch.inference_mode(): for X, y in dataloader: X, y = X.to(device), y.to(device); preds = model(X).argmax(dim=1); all_preds.extend(preds.cpu().tolist()); all_labels.extend(y.cpu().tolist()); return all_preds, all_labels`. Note: adapt for probability output if needed. Libraries: `torch`.

---
**Variant 13:** Implement fine-tuning schedule support within `engine.py`. Modify the `train` function to accept an `unfreeze_epoch` integer argument (default -1, meaning disabled). If `unfreeze_epoch >= 0`, the function should assume certain layers (e.g., specified by name or obtained via `model.conv_block_1`, etc.) should initially be frozen (`param.requires_grad = False`). Training proceeds with only unfrozen layers (e.g., classifier). When the current `epoch` reaches `unfreeze_epoch`, the function must unfreeze the previously frozen layers (`param.requires_grad = True`). Crucially, the optimizer might need to be reset or have the newly unfrozen parameters added to its parameter groups, potentially with a different learning rate.

*Technical note:* Add `unfreeze_epoch: int = -1`. Before loop: `if unfreeze_epoch >= 0: # Freeze layers`. Create optimizer with only trainable params initially. Inside loop: `if epoch == unfreeze_epoch: # Unfreeze layers; # Recreate optimizer or add param groups: optimizer = torch.optim.Adam(model.parameters(), lr=...) # (potentially use lower LR for unfrozen parts)`. Requires careful handling of optimizer state. Libraries: `torch.optim`.

---
**Variant 14:** Replace all standard `print()` statements within `engine.py` with proper logging using Python's built-in `logging` module for better control over output levels and destinations. Configure the logger in the main `train.py` script (e.g., using `logging.basicConfig`) to output messages of level INFO and above to both the console (`StreamHandler`) and a log file (`FileHandler`, e.g., `training.log`). Pass the configured logger instance down to the `engine.train` function and subsequently to `train_step` and `test_step`. Use `logger.info(...)` for standard epoch summaries, `logger.warning(...)` for non-critical issues, and `logger.error(...)` for critical problems.

*Technical note:* `train.py`: `import logging; logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s', handlers=[logging.FileHandler('training.log'), logging.StreamHandler()])`. Get logger: `logger = logging.getLogger(__name__)`. Pass `logger` to `engine.train`. Replace `print(...)` with `logger.info(...)` etc. in `engine.py`. Libraries: `logging`.
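
A minimal sketch of the logger setup described above. The helper name `configure_logging`, the logger name `"train"`, and the temporary log location are assumptions made for the example; a real `train.py` would likely write `training.log` next to its other outputs.

```python
import logging
import os
import tempfile


def configure_logging(log_path: str) -> logging.Logger:
    """Send INFO-and-above records to both the console and a log file."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] %(message)s",
        handlers=[logging.FileHandler(log_path), logging.StreamHandler()],
        force=True,  # replace handlers left over from an earlier call (Python 3.8+)
    )
    return logging.getLogger("train")


# A temporary directory keeps the example from cluttering the working directory.
log_path = os.path.join(tempfile.mkdtemp(), "training.log")
logger = configure_logging(log_path)
logger.info("Epoch 1 finished")            # written to console and file
logger.debug("not recorded at INFO level")  # dropped: below the INFO threshold
```

The returned `logger` is the object that would be passed to `engine.train` and on to `train_step` and `test_step`.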

---
**Variant 15:** Add robustness against numerical instability by implementing NaN (Not a Number) detection in the loss calculation within `engine.py`. Modify `train_step`: immediately after computing the loss (`loss = loss_fn(...)`), check if the loss value is NaN using `torch.isnan(loss)`. If it returns `True`, log a critical error message indicating NaN loss was detected at a specific epoch and batch, optionally save a debug checkpoint (model state, optimizer state, current batch data), and then raise a `RuntimeError("NaN loss detected during training")` to halt the training process immediately, preventing further propagation of invalid values.

*Technical note:* Inside `train_step`: `loss = loss_fn(...) ; if torch.isnan(loss): logger.critical(f"NaN loss detected at Epoch {epoch}, Batch {batch_idx}! Halting."); # Optional: torch.save({'model': model.state_dict(), 'optimizer': optimizer.state_dict(), 'batch_X': X, 'batch_y': y}, 'nan_debug_checkpoint.pth'); raise RuntimeError("NaN loss encountered")`. Requires logger passed to `train_step`. Libraries: `torch`, `logging`.

---
**Variant 16:** Introduce a flexible callback system into `engine.py` to make extending the training loop easier without modifying the core `engine` functions directly. Define an abstract base class `Callback` with methods like `on_train_begin`, `on_epoch_begin`, `on_batch_end`, `on_epoch_end`, `on_train_end`, etc. Modify the `train` function to accept a list of `Callback` objects. Call the appropriate methods of each callback at the corresponding points within the training loop. Re-implement features like TensorBoard logging, early stopping, and model checkpointing as separate `Callback` subclasses (e.g., `TensorBoardCallback`, `EarlyStoppingCallback`, `ModelCheckpointCallback`).

*Technical note:* `class Callback(ABC): ... @abstractmethod def on_epoch_end(self, trainer): pass ...`. `engine.train` signature `(..., callbacks: list[Callback] = None)`. Callbacks loop: `for cb in callbacks: cb.on_epoch_begin(...)`. Implement concrete callbacks inheriting from `Callback`. This is a significant structural change promoting modularity. Libraries: `abc`.
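
One possible shape for the callback mechanism is sketched below. It deviates from the note in one respect: the hooks are no-op defaults rather than `@abstractmethod`, so a subclass only overrides what it needs. The shared `state` dictionary, the `stop_training` key, and the toy `train` loop (which replays a list of test losses instead of calling the real steps) are assumptions for illustration.

```python
from abc import ABC
from typing import Dict, List, Optional


class Callback(ABC):
    """Base class; every hook is a no-op unless a subclass overrides it."""

    def on_train_begin(self, state: Dict) -> None: ...
    def on_epoch_end(self, state: Dict) -> None: ...
    def on_train_end(self, state: Dict) -> None: ...


class EarlyStoppingCallback(Callback):
    """Request a stop after `patience` epochs without a new best test loss."""

    def __init__(self, patience: int = 2) -> None:
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def on_epoch_end(self, state: Dict) -> None:
        if state["test_loss"] < self.best_loss:
            self.best_loss = state["test_loss"]
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                state["stop_training"] = True


def train(test_losses: List[float], callbacks: Optional[List[Callback]] = None) -> int:
    """Toy loop: `test_losses` replaces the real train/test steps. Returns epochs run."""
    callbacks = callbacks or []
    state: Dict = {"stop_training": False}
    for cb in callbacks:
        cb.on_train_begin(state)
    epochs_run = 0
    for epoch, loss in enumerate(test_losses):
        state.update(epoch=epoch, test_loss=loss)
        for cb in callbacks:
            cb.on_epoch_end(state)
        epochs_run += 1
        if state["stop_training"]:
            break
    for cb in callbacks:
        cb.on_train_end(state)
    return epochs_run


epochs_run = train([0.9, 0.7, 0.8, 0.75, 0.6], [EarlyStoppingCallback(patience=2)])
```

With these inputs the loop stops after the fourth epoch, because the loss fails to improve on 0.7 twice in a row.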

---
**Variant 17:** Add precise epoch timing measurement and logging to `engine.py`. Modify the `train` function to record the system time using `time.perf_counter()` just before starting the training step for an epoch and again just after finishing the testing step. Calculate the duration (`epoch_duration = end_time - start_time`). Include this duration in the per-epoch log message or print statement (e.g., "... | Duration: {epoch_duration:.2f} sec"). Also, calculate and log the total training time at the very end of the `train` function.

*Technical note:* `import time`. Inside epoch loop: `epoch_start_time = time.perf_counter(); # train_step call; # test_step call; epoch_end_time = time.perf_counter(); epoch_duration = epoch_end_time - epoch_start_time;`. Modify print/log statement. Calculate total time after the loop. Libraries: `time`.
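
The timing pattern can be tried out independently of the model. In this sketch, `time_epochs` and the `run_epoch` callable are hypothetical stand-ins for the epoch loop in `engine.train` and its `train_step` plus `test_step` calls.

```python
import time
from typing import Callable, List, Tuple


def time_epochs(num_epochs: int, run_epoch: Callable[[int], None]) -> Tuple[List[float], float]:
    """Run `run_epoch` once per epoch and measure per-epoch and total durations."""
    durations = []
    train_start = time.perf_counter()
    for epoch in range(num_epochs):
        epoch_start = time.perf_counter()
        run_epoch(epoch)  # stands in for train_step + test_step
        epoch_duration = time.perf_counter() - epoch_start
        durations.append(epoch_duration)
        print(f"Epoch: {epoch + 1} | Duration: {epoch_duration:.2f} sec")
    total_duration = time.perf_counter() - train_start
    print(f"Total training time: {total_duration:.2f} sec")
    return durations, total_duration


# A short sleep simulates the work done in one epoch.
durations, total_duration = time_epochs(3, lambda epoch: time.sleep(0.01))
```

On a GPU, CUDA kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock gives more faithful epoch timings.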

---
**Variant 18:** Enhance model evaluation in `test_step` within `engine.py` by computing and returning per-class accuracy alongside the overall (average) accuracy. This is crucial for understanding model performance on imbalanced datasets or identifying specific classes the model struggles with. Use `torchmetrics.Accuracy` initialized with `average=None` to get an accuracy value for each class, or manually calculate `TP_i / (TP_i + FN_i)` for each class `i`. Modify the return signature of `test_step` and the results dictionary in `train` to accommodate this list or tensor of per-class accuracies.

*Technical note:* `per_class_acc_metric = torchmetrics.Accuracy(task='multiclass', num_classes=N, average=None).to(device)`. Update metric in batch loop. At epoch end: `per_class_accuracies = per_class_acc_metric.compute(); per_class_acc_metric.reset()`. Return `per_class_accuracies`. Add `results['test_per_class_acc'] = []` and append the tensor/list each epoch. Libraries: `torchmetrics`, `torch`. Metrics: List or Tensor of per-class accuracy values.
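
For the manual route, `TP_i / (TP_i + FN_i)` amounts to counting, for each true class, how many of its samples were predicted correctly. The sketch below uses plain Python lists of class indices instead of tensors; `per_class_accuracy` is a hypothetical helper, and returning `None` for a class with no samples is a design choice of this example.

```python
from typing import List, Optional, Sequence


def per_class_accuracy(
    preds: Sequence[int], labels: Sequence[int], num_classes: int
) -> List[Optional[float]]:
    """Return TP_i / (TP_i + FN_i) per class, or None if class i never occurs."""
    correct = [0] * num_classes  # TP_i
    total = [0] * num_classes    # TP_i + FN_i (all samples whose true class is i)
    for pred, label in zip(preds, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return [c / t if t else None for c, t in zip(correct, total)]


# Tiny hand-made example with three classes.
labels = [0, 0, 1, 1, 1, 2]
preds = [0, 1, 1, 1, 0, 2]
accs = per_class_accuracy(preds, labels, num_classes=3)
```

Here class 0 gets 1 of 2 samples right, class 1 gets 2 of 3, and class 2 gets its single sample right.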

---
**Variant 19:** Implement a learning rate warm-up phase at the beginning of training within `engine.py`. Modify the `train` function to accept `warmup_epochs` (integer, e.g., 3) and `target_lr` (the final learning rate set via argparse). For the first `warmup_epochs`, linearly increase the learning rate applied in the optimizer from a very small value (e.g., `1e-6`) up to `target_lr`. After the warm-up phase, the learning rate should either stay at `target_lr` or be controlled by a learning rate scheduler if one is configured. This requires manually adjusting `optimizer.param_groups[0]['lr']` at the start of each warm-up epoch.

*Technical note:* Add `warmup_epochs: int = 0`, `target_lr: float` parameters. Inside epoch loop: `if epoch < warmup_epochs: current_lr = initial_tiny_lr + (target_lr - initial_tiny_lr) * (epoch + 1) / warmup_epochs; for param_group in optimizer.param_groups: param_group['lr'] = current_lr; elif epoch == warmup_epochs and scheduler is None: # Ensure target LR is set if no scheduler takes over for param_group in optimizer.param_groups: param_group['lr'] = target_lr; # If scheduler exists, it takes over after warmup`. Needs careful integration with schedulers. Libraries: `torch.optim`.
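
The linear ramp from the note can be written as a pure function and checked before wiring it into the optimizer. `warmup_lr` is a hypothetical helper name; the formula is the one given above.

```python
def warmup_lr(epoch: int, warmup_epochs: int, target_lr: float, initial_lr: float = 1e-6) -> float:
    """Learning rate for a 0-based `epoch` under linear warm-up."""
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return target_lr  # warm-up disabled or already finished
    # (epoch + 1) / warmup_epochs reaches 1.0 on the last warm-up epoch.
    return initial_lr + (target_lr - initial_lr) * (epoch + 1) / warmup_epochs


# Learning rates for five epochs with a three-epoch warm-up.
schedule = [warmup_lr(epoch, warmup_epochs=3, target_lr=0.001) for epoch in range(5)]
```

Because of the `epoch + 1` term, the first epoch already runs above `initial_lr` and the last warm-up epoch runs at `target_lr`. In `engine.train` the returned value would be written to every entry of `optimizer.param_groups` at the start of each epoch.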

---
**Variant 20:** Add basic GPU resource monitoring during training if CUDA is used. Modify the `train` function in `engine.py` to periodically check and log GPU memory usage. For example, at the end of each epoch (or less frequently to avoid overhead), if the `device` is 'cuda', use `torch.cuda.memory_allocated()` and `torch.cuda.max_memory_allocated()` (or `torch.cuda.memory_summary()`) to get current and peak GPU memory usage in megabytes. Log these values using the logger instance. This helps identify potential memory leaks or check if the batch size is appropriate for the available GPU memory.

*Technical note:* Pass `logger` and `device` to `train`. At the end of each epoch, check `if torch.device(device).type == 'cuda':` (a plain `device == 'cuda'` comparison can miss `torch.device` objects and strings such as `'cuda:0'`). Get memory: `allocated_mb = torch.cuda.memory_allocated(device) / (1024**2); peak_mb = torch.cuda.max_memory_allocated(device) / (1024**2); logger.info(f"GPU Memory - Allocated: {allocated_mb:.2f} MB, Peak: {peak_mb:.2f} MB")`. Reset peak stats if needed: `torch.cuda.reset_peak_memory_stats(device)`. Libraries: `torch.cuda`, `logging`. Metrics: Logged GPU memory usage (MB).

<a class="anchor" id="5.4"></a>

## <span style="color:red; font-size:1.5em;">Task 4. Modular Training Script (`train.py`)</span>

[Go back to the content](#5)

**Variant 1:** Restructure `train.py` using an object-oriented approach by encapsulating the entire training setup and execution logic within a `Trainer` class. The `Trainer.__init__` method should accept the parsed `argparse` arguments (`args`) and be responsible for setting up all components: instantiating the DataLoaders (using `data_setup`), creating the model (using `model_builder`), defining the loss function, and configuring the optimizer and potentially schedulers based on `args`. The class should have a main `run_training()` method that executes the core training loop by calling `engine.train`, passing the necessary configured objects. The main script execution block (`if __name__ == '__main__':`) will then simply parse arguments, instantiate `Trainer(args)`, and call `trainer.run_training()`. This improves code organization, testability, and encapsulation.

*Technical note:* Define `class Trainer: def __init__(self, args): self.args = args; self._setup_device(); self._setup_dataloaders(); self._setup_model(); self._setup_loss_optim_scheduler(); def _setup...(): ...; def run_training(self): self.results = engine.train(...); self._save_final_model(); def _save_final_model(): ...`. Main block becomes very concise. Libraries: `argparse`, `torch`, `engine`, `model_builder`, `data_setup`, `utils`.

---
**Variant 2:** Implement a K-Fold Cross-Validation training scheme orchestrated by `train.py`. Add `argparse` arguments `--num_folds` (integer K, e.g., 5) and `--run_fold` (integer from 0 to K-1, specifying which fold to execute in this run). Modify the data loading part: load the *entire* dataset (potentially combining original train and test, or just using train). Use `sklearn.model_selection.StratifiedKFold` to generate K pairs of train/validation indices based on the full dataset labels. Conditionally select the indices corresponding to the `args.run_fold`. Create `Subset` datasets for training and validation using these indices. The script then trains and validates the model *only* on this specific fold's data. Running the script K times with different `--run_fold` values (0 to K-1) completes the cross-validation process. Log results specific to the fold being run.

*Technical note:* Load full dataset (e.g., `all_data = ConcatDataset([train_dset, test_dset])`). `ConcatDataset` has no `targets` attribute, so build the label list yourself, e.g., `all_data_targets = train_dset.targets + test_dset.targets` for `ImageFolder` datasets, and `all_data_indices = np.arange(len(all_data))`. Use `skf = StratifiedKFold(n_splits=args.num_folds, shuffle=True, random_state=args.seed)`. Get splits: `splits = list(skf.split(all_data_indices, all_data_targets))`. Select `train_idx, val_idx = splits[args.run_fold]`. Create `train_subset = Subset(all_data, train_idx)`, `val_subset = Subset(all_data, val_idx)`. Use these for DataLoaders. Ensure model/log saving incorporates fold number. Libraries: `sklearn.model_selection`, `torch.utils.data`, `argparse`, `numpy`.

---
**Variant 3:** Integrate advanced experiment tracking using Weights & Biases (`wandb`) directly into `train.py`. Add necessary `argparse` arguments: `--use_wandb` (flag), `--wandb_project` (string, project name), `--wandb_entity` (string, your wandb username/team), `--wandb_run_name` (optional string, specific run name). If `--use_wandb` is set, initialize `wandb` using `wandb.init()` with these arguments, passing the entire `args` dictionary to `wandb.config` for hyperparameter logging. Modify the `engine.train` function (or use a custom callback if implemented) to accept a flag or the `wandb` object, and within the epoch loop, log all relevant metrics (train/val loss, train/val accuracy, precision, recall, F1, learning rate, etc.) using `wandb.log({'metric_name': value, ...}, step=epoch)`.

*Technical note:* `import wandb`. Add args. Check `if args.use_wandb:` initialize: `run = wandb.init(project=args.wandb_project, entity=args.wandb_entity, name=args.wandb_run_name, config=vars(args))`. Modify `engine.train` to potentially accept `wandb_run` object. In loop: `if wandb_run: wandb_run.log(...)`. Ensure `wandb login` is done beforehand or API key is set. Call `wandb.finish()` at the end. Libraries: `wandb` (needs install), `argparse`, `engine`.

---
**Variant 4:** Significantly improve the robustness of `train.py` by adding comprehensive error handling and a checkpoint-based resumption mechanism. Wrap major operational blocks (data loading, model creation, `engine.train` call) in `try...except` blocks, catching specific exceptions like `FileNotFoundError`, `torch.cuda.OutOfMemoryError` (catch `RuntimeError` and check message), `ValueError` (e.g., from argument validation), and generic `Exception`. Log errors clearly using the `logging` module. Add an `argparse` argument `--resume_checkpoint` (path, default None). If this path is provided and exists, load the model state dict, optimizer state dict, the epoch number to resume from, and potentially the LR scheduler state from this checkpoint file before starting the `engine.train` loop (pass `start_epoch` to `engine.train`). Modify `engine.train` to save such comprehensive checkpoints periodically or upon graceful exit.

*Technical note:* Use `try...except FileNotFoundError: logger.error(...) ; sys.exit(1)`. Check for OOM: `except RuntimeError as e: if 'out of memory' in str(e): logger.error(...) else: raise` (a bare `raise` preserves the original traceback). Loading: `if args.resume_checkpoint and os.path.exists(args.resume_checkpoint): ckpt = torch.load(args.resume_checkpoint, map_location=device); model.load_state_dict(...); optimizer.load_state_dict(...); start_epoch = ckpt['epoch'] + 1; if scheduler: scheduler.load_state_dict(...)`. Saving checkpoint requires `{'epoch': epoch, 'model_state_dict': ..., 'optimizer_state_dict': ..., 'scheduler_state_dict': ...}`. Libraries: `torch`, `os`, `sys`, `argparse`, `logging`, `engine`.

---
**Variant 5:** Implement a basic hyperparameter optimization (HPO) capability directly within `train.py` using a grid search strategy. Define dictionaries or lists within the script specifying the grid of hyperparameters to explore, e.g., `param_grid = {'learning_rate': [0.01, 0.005, 0.001], 'hidden_units': [[10, 10], [16, 16]], 'optimizer': ['adam', 'sgd']}`. Use `itertools.product` or nested loops to iterate through every possible combination of these hyperparameters. For each combination, configure the model, optimizer, etc., accordingly, run the complete training process using `engine.train`, record the best validation performance (e.g., accuracy) achieved for that combination, and log the combination and its result to a file (e.g., `hpo_results.csv` or `.json`).

*Technical note:* `from sklearn.model_selection import ParameterGrid` (easier than itertools). `grid = ParameterGrid(param_grid)`. Loop `for params in grid:`. Inside loop, update `args` namespace or config dict with `params`. Run `engine.train`. Store `params` and `best_val_acc` in a list of dicts. Save list to CSV/JSON after loop. Be mindful this can be computationally expensive. Libraries: `itertools` or `sklearn.model_selection`, `json` or `csv`, `argparse` (to maybe set ranges), `engine`.
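
For the `itertools.product` route, the grid expansion might look like the sketch below. `iter_grid` and `run_grid_search` are hypothetical helpers, and the `train_fn` passed at the bottom is a dummy that always returns `0.0` so the snippet runs on its own; in the lab it would wrap `engine.train` and return the best validation accuracy.

```python
import itertools
from typing import Any, Callable, Dict, Iterator, List

param_grid = {
    "learning_rate": [0.01, 0.005, 0.001],
    "hidden_units": [[10, 10], [16, 16]],
    "optimizer": ["adam", "sgd"],
}


def iter_grid(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of the grid's values."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def run_grid_search(
    grid: Dict[str, List[Any]], train_fn: Callable[[Dict[str, Any]], float]
) -> List[Dict[str, Any]]:
    """Call `train_fn` for every combination and collect its score."""
    results = []
    for params in iter_grid(grid):
        results.append({"params": params, "best_val_acc": train_fn(params)})
    return results


# Dummy objective: replace with a function that trains and returns validation accuracy.
results = run_grid_search(param_grid, train_fn=lambda params: 0.0)
```

The grid above expands to 3 x 2 x 2 = 12 runs, and the `results` list can be passed directly to `json.dump` or written row by row with `csv.DictWriter`.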

---
**Variant 6:** Integrate detailed performance profiling into `train.py` using `torch.profiler`. Add an `argparse` flag `--profile_run`. If set, wrap the main call to `engine.train` (or potentially the entire training loop if defined directly in `train.py`) within the `torch.profiler.profile` context manager. Configure the profiler to record both CPU and CUDA activities (`torch.profiler.ProfilerActivity.CPU`, `torch.profiler.ProfilerActivity.CUDA`), include operator shapes (`record_shapes=True`), and potentially track memory (`profile_memory=True`). After the profiled section finishes, print a summary table using `prof.key_averages().table(...)` focusing on top time consumers (CPU and CUDA). Optionally, export the full trace to a file (e.g., `trace.json`) which can be viewed in `chrome://tracing`.

*Technical note:* `import torch.profiler`. Check `if args.profile_run:`. Wrap training call: `with torch.profiler.profile(...) as prof: results = engine.train(...)`. After: `print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20)); prof.export_chrome_trace("training_trace.json")`. Libraries: `torch.profiler`, `argparse`, `engine`.

---
**Variant 7:** Decouple final evaluation from training by creating a dedicated `evaluate.py` script. `train.py` should now focus solely on training the model and saving the best or final checkpoint. Create `evaluate.py`, which takes command-line arguments: `--checkpoint` (path to the saved model `.pth` file), `--test_dir` (path to the test dataset), `--device` ('cpu'/'cuda'), and potentially `--batch_size`. This script should: load the specified model architecture (using `model_builder`, perhaps needing model config saved with checkpoint or passed as args), load the state dict from the checkpoint, setup the test DataLoader (using `data_setup`), run evaluation using a dedicated function from `engine.py` (like `evaluate_model` from Task 3 Var 3, which computes detailed metrics), and print or save the final performance results (accuracy, precision, recall, F1, confusion matrix).

*Technical note:* Create `evaluate.py`. Use `argparse` for inputs. Load model architecture (might need `hidden_units` etc. args or load from config). `model.load_state_dict(torch.load(args.checkpoint)['model_state_dict'])`. Setup test data `_, test_loader, class_names = data_setup.create_dataloaders(test_dir=args.test_dir, ...)` (assuming `create_dataloaders` keeps its `(train_dataloader, test_dataloader, class_names)` return order). Call `eval_results = engine.evaluate_model(model, test_loader, loss_fn, device, len(class_names))`. Print `eval_results`. Libraries: `argparse`, `torch`, `engine`, `model_builder`, `data_setup`, `torchmetrics`.

---
**Variant 8:** Implement support for loading run configurations from external files in `train.py`, promoting cleaner command lines and easier management of complex setups. Add an `argparse` argument `--config_file` (path to a YAML or JSON file). Use libraries like `PyYAML` or `json` to load this file early in the script. The loaded configuration dictionary should contain keys corresponding to the argparse arguments (e.g., `learning_rate`, `num_epochs`, `model_name`). Merge this loaded configuration with the default argparse values and any arguments explicitly provided on the command line (command-line args typically override file configs). The final merged configuration drives the training setup.

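*Technical note:* `import yaml` (needs `pip install pyyaml`) or `import json`. Add `--config_file` arg. Because the path itself arrives on the command line, read it first with `pre_args, _ = parser.parse_known_args()`. Load file: `if pre_args.config_file: with open(pre_args.config_file, 'r') as f: file_config = yaml.safe_load(f)`. Update defaults: `parser.set_defaults(**file_config)`. Then parse for real: `args = parser.parse_args()`; values given explicitly on the command line still win because the file only replaces defaults. Alternatively, load the config, parse args, then update args with file config selectively. Libraries: `PyYAML` or `json`, `argparse`.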
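
The precedence "command line over file over built-in default" can be obtained with a two-stage parse. The sketch below uses JSON to stay within the standard library; the function name `parse_args`, the two example arguments, and the temporary config file are assumptions for illustration.

```python
import argparse
import json
import os
import tempfile
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    # Stage 1: look only for --config_file and ignore everything else.
    pre_parser = argparse.ArgumentParser(add_help=False)
    pre_parser.add_argument("--config_file", type=str, default=None)
    pre_args, _ = pre_parser.parse_known_args(argv)

    # Stage 2: the full parser, inheriting --config_file from stage 1.
    parser = argparse.ArgumentParser(parents=[pre_parser])
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--num_epochs", type=int, default=5)

    if pre_args.config_file:
        with open(pre_args.config_file, "r") as f:
            # File values replace the built-in defaults only.
            parser.set_defaults(**json.load(f))
    return parser.parse_args(argv)


# Write a small config file, then override one of its values on the command line.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as f:
    json.dump({"learning_rate": 0.01, "num_epochs": 20}, f)

args = parse_args(["--config_file", config_path, "--num_epochs", "3"])
```

Here `learning_rate` comes from the file and `num_epochs` from the command line. Note that `set_defaults` accepts keys the parser does not know, so a typo in the config file is silently stored as an extra attribute; validating the keys first is worth considering.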

---
**Variant 9:** Automatically generate and save learning curve plots at the end of training in `train.py`. After the `engine.train` function returns the results dictionary (containing lists of metrics per epoch), extract the `train_loss`, `test_loss` (validation loss), `train_acc`, and `test_acc` lists. Use `matplotlib.pyplot` to create two plots: one for loss (train vs. validation) and one for accuracy (train vs. validation) over epochs. Add labels, titles, and legends. Save these plots as image files (e.g., `loss_curve.png`, `accuracy_curve.png`) into the same directory where the model checkpoint is saved.

*Technical note:* `results = engine.train(...)`. `epochs = range(1, len(results['train_loss']) + 1)`. `plt.figure(); plt.plot(epochs, results['train_loss'], label='Train Loss'); plt.plot(epochs, results['test_loss'], label='Validation Loss'); plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.legend(); plt.title('Loss Curve'); plt.savefig(os.path.join(save_dir, 'loss_curve.png')); plt.close()`. Repeat for accuracy plot. Libraries: `matplotlib.pyplot`, `os`, `engine`.

---
**Variant 10:** Make `train.py` adaptable to different versions or types of datasets following a similar structure. Add an `argparse` argument `--dataset_name` (e.g., default `'pizza_steak_sushi'`). Define a base data directory path (e.g., `BASE_DATA_DIR = 'data'`). Construct the specific `train_dir` and `test_dir` paths dynamically using `os.path.join(BASE_DATA_DIR, args.dataset_name, 'train')` and `os.path.join(BASE_DATA_DIR, args.dataset_name, 'test')`. This allows training on `data/pizza_steak_sushi_v2/` or `data/food10/` (if structured similarly) just by changing the command-line argument, without modifying the script's core data path logic.

*Technical note:* Add `parser.add_argument('--dataset_name', type=str, default='pizza_steak_sushi')`. Set `BASE_DATA_DIR = 'data'`. Define `train_dir = os.path.join(BASE_DATA_DIR, args.dataset_name, 'train')`. `test_dir = os.path.join(BASE_DATA_DIR, args.dataset_name, 'test')`. Pass these paths to `data_setup.create_dataloaders`. Requires data to be organized in `data/<dataset_name>/train/...` and `data/<dataset_name>/test/...`. Libraries: `os`, `argparse`.

---
**Variant 11:** Formalize the final evaluation step on the held-out test set within `train.py`. After the main training loop in `engine.train` completes, identify the best saved model checkpoint (if `--save_best_model` was used and successful) or use the final model state. Load this definitive model state into the model instance (`model.load_state_dict(...)`). Ensure the model is in evaluation mode (`model.eval()`). Run a final evaluation pass using the `test_dataloader` and the dedicated `engine.evaluate_model` function (from Task 3 Var 3) or `engine.test_step` configured for final testing. Log or print these final test set metrics clearly, distinguishing them from the validation metrics reported during training.

*Technical note:* After `engine.train`: Determine `final_model_path` (either last epoch or best saved). `final_ckpt = torch.load(final_model_path); model.load_state_dict(final_ckpt['model_state_dict'])`. `test_results = engine.evaluate_model(model, test_dataloader, loss_fn, device, num_classes)`. Log `logger.info(f"--- Final Test Set Performance --- \n{test_results}")`. Libraries: `torch`, `engine`, `logging`.

---
**Variant 12:** Add functionality to `train.py` to export the trained model to the ONNX (Open Neural Network Exchange) format, suitable for deployment across various platforms and inference engines. Add an `argparse` flag `--export_onnx`. After training is complete, load the desired model state dict (e.g., the best one). Set the model to evaluation mode (`model.eval()`). Create a dummy input tensor with the correct shape (e.g., `(1, 3, 64, 64)`) matching the model's expected input. Use `torch.onnx.export()` to perform the export, providing the model, dummy input, desired output `.onnx` file path (e.g., `model.onnx` in the save directory), and potentially input/output names (`input_names=['input'], output_names=['output']`).

*Technical note:* Add flag `parser.add_argument('--export_onnx', action='store_true')`. After training: `if args.export_onnx: # Load best model state; model.eval(); dummy_input = torch.randn(1, 3, args.image_size, args.image_size).to(device) # Use correct size; onnx_path = os.path.join(save_dir, f"{args.model_name}.onnx"); torch.onnx.export(model, dummy_input, onnx_path, verbose=False, input_names=['input'], output_names=['output'], opset_version=11); logger.info(f"Model exported to ONNX format at: {onnx_path}")`. Libraries: `torch.onnx`, `os`, `torch`, `logging`.

---
**Variant 13:** Implement robust device selection logic in `train.py`. Add an `argparse` argument `--device` with choices `'auto'`, `'cuda'`, `'cpu'` (default `'auto'`). If `'auto'`, the script should attempt to use CUDA if `torch.cuda.is_available()` returns true, otherwise fall back to CPU. If `'cuda'` is explicitly requested but `torch.cuda.is_available()` is false, the script should raise a `RuntimeError` or print an error and exit. If `'cpu'` is requested, it should use the CPU. The determined `torch.device` object should be created early and passed consistently to all modules/functions that need it (data loading, model creation, engine).

*Technical note:* Add `parser.add_argument('--device', type=str, default='auto', choices=['auto', 'cuda', 'cpu'])`. Logic: `if args.device == 'auto': selected_device = 'cuda' if torch.cuda.is_available() else 'cpu'; elif args.device == 'cuda': if not torch.cuda.is_available(): raise RuntimeError("CUDA device requested but not available."); selected_device = 'cuda'; else: selected_device = 'cpu'; device = torch.device(selected_device); logger.info(f"Using device: {device}")`. Pass `device` object. Libraries: `torch`, `argparse`, `logging`.
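
The selection rules are easy to test when CUDA availability is passed in as a plain boolean. In this sketch `resolve_device` is a hypothetical helper; in `train.py` it would be called with `torch.cuda.is_available()` and its result wrapped in `torch.device(...)`.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map the --device choice ('auto', 'cuda', 'cpu') to a concrete device name."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    if requested == "cuda":
        if not cuda_available:
            raise RuntimeError("CUDA device requested but not available.")
        return "cuda"
    if requested == "cpu":
        return "cpu"
    raise ValueError(f"Unknown device option: {requested!r}")


# Behaviour on a machine without a GPU.
auto_choice = resolve_device("auto", cuda_available=False)
```

Keeping the decision in one function means every module receives the same device object instead of re-deriving it.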

---
**Variant 14:** Enable fine-tuning from a pre-existing model checkpoint using `train.py`. Add a `--load_checkpoint` argument that accepts the path to a `.pth` file. Before starting the training loop, check if this argument is provided. If yes, load the state dictionary using `torch.load(args.load_checkpoint)`. Load these weights into the newly created model instance using `model.load_state_dict(state_dict, strict=False)`. Using `strict=False` lets loading proceed when the checkpoint and the model have different sets of keys (e.g., if only the feature extractor weights are being loaded): missing and unexpected keys are skipped instead of raising an error. It does not tolerate shape mismatches, however: if a shared key such as the classifier weight has a different size, `load_state_dict` still raises a `RuntimeError`, so such entries must be removed from the state dict before loading. Add logic to selectively load only parts of the checkpoint where needed (e.g., excluding classifier weights). Log that weights were loaded.

*Technical note:* Add `parser.add_argument('--load_checkpoint', type=str, default=None)`. Before `engine.train`: `if args.load_checkpoint: logger.info(f"Loading weights from checkpoint: {args.load_checkpoint}"); state_dict = torch.load(args.load_checkpoint); # Optional filtering logic here based on key names; model.load_state_dict(state_dict, strict=False)`. Libraries: `torch`, `os`, `argparse`, `logging`.
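
The optional filtering step could be as simple as dropping keys by prefix. In this sketch `filter_state_dict` is a hypothetical helper, the layer names are assumed for illustration, and strings stand in for the weight tensors.

```python
from typing import Any, Dict, Iterable


def filter_state_dict(state_dict: Dict[str, Any], exclude_prefixes: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `state_dict` without keys that start with any given prefix."""
    prefixes = tuple(exclude_prefixes)
    return {key: value for key, value in state_dict.items() if not key.startswith(prefixes)}


# Assumed key names; real ones depend on how the model's layers are named.
checkpoint = {
    "conv_block_1.0.weight": "w1",
    "conv_block_1.0.bias": "b1",
    "classifier.1.weight": "wc",
    "classifier.1.bias": "bc",
}
backbone_only = filter_state_dict(checkpoint, exclude_prefixes=["classifier."])
```

In the real script, `model.load_state_dict(backbone_only, strict=False)` returns the lists of missing and unexpected keys, which are worth logging to confirm that only the intended layers were left uninitialised.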

---
**Variant 15:** Implement basic fault tolerance and resumption capability in `train.py` for long training runs. At the end of *every* epoch within `engine.train`, save a comprehensive checkpoint file named `latest_checkpoint.pth` in the model save directory. This checkpoint must include the epoch number, model state dict, optimizer state dict, and LR scheduler state dict (if used). Add an `argparse` flag `--resume`. If `train.py` is run with `--resume`, it should check for the existence of `latest_checkpoint.pth`. If found, load all saved states (model, optimizer, scheduler) and determine the `start_epoch` (saved epoch + 1). Pass `start_epoch` to `engine.train` so it begins from the correct epoch. If not found, start training from epoch 0 as usual.

*Technical note:* Add flag `parser.add_argument('--resume', action='store_true')`. Modify `engine.train` to save checkpoint every epoch: `torch.save({...}, latest_chkpt_path)`. Before calling `engine.train` in `train.py`: `start_epoch = 0; if args.resume and os.path.exists(latest_chkpt_path): ckpt = torch.load(latest_chkpt_path); model.load_state_dict(...); optimizer.load_state_dict(...); if scheduler: scheduler.load_state_dict(...); start_epoch = ckpt['epoch'] + 1; logger.info(f"Resuming training from epoch {start_epoch}")`. Pass `start_epoch` to `engine.train`. Libraries: `torch`, `os`, `argparse`, `logging`, `engine`.

---
**Variant 16:** Add detailed system and environment information logging at the very beginning of the `train.py` execution. Use the `logging` module to record essential details for reproducibility and debugging: Python version (`platform.python_version()`), PyTorch version (`torch.__version__`), Torchvision version (`torchvision.__version__`), CUDA version if available (`torch.version.cuda`), number of CPU cores (`os.cpu_count()`), and GPU details if CUDA is used (device name via `torch.cuda.get_device_name(0)`, total GPU memory via `torch.cuda.get_device_properties(0).total_memory`). Log this information with INFO level.

*Technical note:* `import platform, os, torch, torchvision, logging`. Configure logger. Log info: `logger.info(f"Python Version: {platform.python_version()}")`, `logger.info(f"PyTorch Version: {torch.__version__}")`, etc. Check `torch.cuda.is_available()` before logging CUDA/GPU details. Libraries: `platform`, `os`, `torch`, `torchvision`, `logging`.

---
**Variant 17:** Centralize default hyperparameter settings by refactoring `train.py` to use a dedicated configuration file, `config.py`. Create `config.py` in the same directory (e.g., `going_modular/`) and define default values within it (e.g., `LEARNING_RATE = 0.001`, `NUM_EPOCHS = 5`, `HIDDEN_UNITS = [10, 10]`). In `train.py`, import these defaults from `config`. When setting up `argparse`, use these imported values as the `default=` for each corresponding argument. This way, defaults are maintained in one place, but users can still override them via command-line arguments.

*Technical note:* Create `config.py` with constants. In `train.py`: `import config`. `parser.add_argument('--learning_rate', type=float, default=config.LEARNING_RATE, ...)`. `parser.add_argument('--num_epochs', type=int, default=config.NUM_EPOCHS, ...)`. The rest of the script uses the parsed `args` as before. Libraries: `argparse`.

---
**Variant 18:** Implement a maximum training time limit in `train.py`. Add an `argparse` argument `--max_training_time_minutes` (integer, default 0 for unlimited). Pass this value to `engine.train`. Inside the `engine.train` function, record the overall training start time using `time.time()`. Within the main epoch loop (e.g., at the beginning or end of each epoch), check the total elapsed time. If `max_training_time_minutes` is positive and the elapsed time (in minutes) exceeds this limit, print/log a message, save a final checkpoint gracefully (using `utils.save_model`), and break the training loop early.

*Technical note:* Add arg to `train.py`. Pass `max_time_mins=args.max_training_time_minutes` to `engine.train`. In `engine.train`: `train_start_time = time.time()`. Inside epoch loop: `elapsed_mins = (time.time() - train_start_time) / 60; if max_time_mins > 0 and elapsed_mins >= max_time_mins: logger.warning(f"Maximum training time ({max_time_mins} mins) reached. Stopping early."); utils.save_model(...); break;`. Libraries: `time`, `argparse`, `logging`, `utils`, `engine`.
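
The time-limit check can be exercised without waiting for real minutes by making the clock injectable. This is a sketch under assumptions: `train_with_time_limit`, `run_epoch`, and the `clock` parameter are hypothetical, and the checkpoint save is reduced to a comment.

```python
import time
from typing import Callable


def train_with_time_limit(
    num_epochs: int,
    run_epoch: Callable[[int], None],
    max_time_mins: float = 0,
    clock: Callable[[], float] = time.time,
) -> int:
    """Run up to `num_epochs` epochs; stop early once the time budget is used up."""
    train_start_time = clock()
    completed = 0
    for epoch in range(num_epochs):
        run_epoch(epoch)  # stands in for train_step + test_step
        completed += 1
        elapsed_mins = (clock() - train_start_time) / 60
        if max_time_mins > 0 and elapsed_mins >= max_time_mins:
            print(f"Maximum training time ({max_time_mins} mins) reached. Stopping early.")
            # A real implementation would save a final checkpoint here.
            break
    return completed


# Fake clock that advances 40 seconds per reading, so a 1-minute limit trips quickly.
fake_seconds = iter(range(0, 1000, 40))
completed = train_with_time_limit(
    10, run_epoch=lambda epoch: None, max_time_mins=1, clock=lambda: next(fake_seconds)
)
```

With the fake clock, the second epoch ends at 80 simulated seconds, which exceeds the one-minute budget, so only two epochs run. Checking at the end of an epoch means the limit can be overshot by at most one epoch's duration.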

---
**Variant 19:** Enhance model analysis in `train.py` by calculating and logging the model's complexity metrics: total number of parameters and estimated Floating Point Operations (FLOPs) or Multiply-Accumulate operations (MACs). After instantiating the model, use a library like `thop` (`pip install thop`) or `ptflops` (`pip install ptflops`). Pass the model and a correctly shaped dummy input tensor to the library's profiling function (e.g., `thop.profile`). Log the returned MACs/FLOPs and parameter count using the logger. This provides valuable context for comparing different model architectures configured via argparse.

*Technical note:* `pip install thop`. `from thop import profile, clever_format`. Create model `model = ... .to(device)`. Create `dummy_input = torch.randn(1, 3, args.image_size, args.image_size).to(device)`. `macs, params = profile(model, inputs=(dummy_input,), verbose=False)`. `macs, params = clever_format([macs, params], "%.3f")`. Log: `logger.info(f"Model Complexity - MACs: {macs}, Params: {params}")`. Libraries: `thop`, `torch`, `logging`, `argparse`. Metrics: MACs/FLOPs, Parameter Count.

---
**Variant 20:** Standardize the output of `train.py` by creating a final summary report in JSON format. After training and any final testing are complete, gather all essential information into a single Python dictionary. This should include: all hyperparameters used (from the final `args` object), the key performance metrics from the final test set evaluation (accuracy, precision, recall, F1), the total training duration, the file path to the saved model checkpoint (best or final), and optionally the system information logged at the start. Save this dictionary to a file named `run_summary.json` within the model's save directory using `json.dump`.

*Technical note:* Collect data throughout script: `hyperparameters = vars(args)`, `final_test_metrics = {...}`, `total_duration = ...`, `saved_model_path = ...`. Combine into `summary_dict = {...}`. Define `summary_path = os.path.join(save_dir, 'run_summary.json')`. `with open(summary_path, 'w') as f: json.dump(summary_dict, f, indent=2)`. Libraries: `json`, `os`, `time`, `argparse`.
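
One possible shape for the summary writer is sketched here. The key names inside `summary_dict` and the helper name `write_run_summary` are assumptions, and the values in the demo call are placeholders rather than real results.

```python
import json
import os
import tempfile


def write_run_summary(save_dir, hyperparameters, final_test_metrics,
                      total_duration_s, saved_model_path):
    """Write run_summary.json into save_dir and return its path."""
    summary_dict = {
        "hyperparameters": hyperparameters,        # e.g. vars(args)
        "final_test_metrics": final_test_metrics,  # accuracy, precision, ...
        "total_training_duration_s": total_duration_s,
        "saved_model_path": saved_model_path,
    }
    os.makedirs(save_dir, exist_ok=True)
    summary_path = os.path.join(save_dir, "run_summary.json")
    with open(summary_path, "w") as f:
        # default=str keeps json.dump from failing on values such as Path objects.
        json.dump(summary_dict, f, indent=2, default=str)
    return summary_path


# Demo with placeholder values (not real training results).
demo_dir = tempfile.mkdtemp()
demo_path = write_run_summary(
    demo_dir,
    hyperparameters={"learning_rate": 0.001, "num_epochs": 5},
    final_test_metrics={"accuracy": 0.0, "precision": 0.0,
                        "recall": 0.0, "f1": 0.0},
    total_duration_s=0.0,
    saved_model_path=os.path.join(demo_dir, "model.pth"),
)
print(demo_path)
```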

<a class="anchor" id="5.5"></a>

## <span style="color:red; font-size:1.5em;">Task 5. Modular Prediction Script (`predict.py`)</span>

[Go back to the content](#5)

**Variant 1:** Generalize `predict.py` to perform inference on an entire directory of images rather than just a single file. Add an `argparse` argument `--input_dir` which accepts a directory path. The script must recursively find all files within this directory that have common image extensions (e.g., '.jpg', '.jpeg', '.png'). For each valid image file found, it should load the image, apply the necessary preprocessing transform, perform inference using the loaded model, determine the predicted class name, and print a distinct output line associating the image filename (or relative path) with its prediction (e.g., `images/test/sushi/img_01.jpg: sushi`). Handle potential non-image files gracefully by skipping them.

*Technical note:* Use `argparse` for `--input_dir`. Use `pathlib.Path(args.input_dir).rglob('*')` to iterate through all files/dirs recursively. Check `if file.is_file() and file.suffix.lower() in ['.jpg', '.jpeg', '.png']:` inside the loop. Call the core prediction logic (load, transform, infer, decode) for each valid file. Print `f"{file}: {predicted_class_name}"`. Libraries: `argparse`, `pathlib`, `torch`, `torchvision`, `PIL`.
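
The file-discovery step can be sketched on its own, separate from the model code. The helper name `find_image_files` is an assumption; sorting the result is an extra choice made here so the output order is reproducible.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_image_files(input_dir):
    """Return all image files under input_dir (recursively), sorted by path."""
    root = Path(input_dir)
    return sorted(
        path for path in root.rglob("*")
        # is_file() skips directories, even ones whose name ends in ".png".
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

In `predict.py` the result would be consumed with something like `for file in find_image_files(args.input_dir):`, calling the prediction logic for each path.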

---
**Variant 2:** Enhance the prediction output of `predict.py` to show the top-K most likely classes along with their confidence scores, instead of only the single highest-scoring class. Add an integer `argparse` argument `--top_k` (defaulting to 3, for example). After obtaining the raw output logits from the model for an input image, apply the `torch.softmax` function to convert them into probabilities. Then, use `torch.topk(probabilities, k=args.top_k)` to efficiently retrieve the values (top probabilities) and indices of the top K predictions. Convert the indices to their corresponding class names (using the known class list) and print the ranked list, like `Rank 1: sushi (0.95), Rank 2: pizza (0.04), Rank 3: steak (0.01)`.

*Technical note:* Add `parser.add_argument('--top_k', type=int, default=3)`. Inside prediction logic: `model.eval(); with torch.inference_mode(): logits = model(input_tensor)`. `probabilities = torch.softmax(logits, dim=1)`. With the class list `class_names = [...]` defined, clamp K to the number of classes, because `torch.topk` raises an error when `k` exceeds the size of the dimension: `k = min(args.top_k, len(class_names))`. `top_probs, top_indices = torch.topk(probabilities, k, dim=1)`. Remove only the batch dimension: `top_probs = top_probs.squeeze(0).tolist(); top_indices = top_indices.squeeze(0).tolist()` (a bare `squeeze()` would also drop the K dimension when `k` is 1 and return scalars instead of lists). Loop `for i in range(k): class_name = class_names[top_indices[i]]; prob = top_probs[i]; print(f"Rank {i+1}: {class_name} ({prob:.4f})")`. Libraries: `argparse`, `torch`.

---
**Variant 3:** Optimize `predict.py` for processing multiple images efficiently by implementing batch inference. This is particularly useful when combined with `--input_dir` (Variant 1). Modify the script to first collect all valid image file paths from the input directory. Then, create a custom `Dataset` that loads and transforms these images. Use a `torch.utils.data.DataLoader` with `batch_size` specified via an argparse argument (e.g., `--batch_size 16`) and `shuffle=False`. Iterate through this DataLoader. For each batch of image tensors, pass the entire batch to the `model` for inference. Process the batch of output logits/predictions, mapping them back to the corresponding image file paths if necessary for reporting.

*Technical note:* Create `class InferenceDataset(Dataset): def __init__(self, image_paths, transform): ... def __len__(self): ... def __getitem__(self, idx): ...` (a map-style `Dataset` needs `__len__` so that the `DataLoader` can determine how many batches to produce). Use `DataLoader(inference_dataset, batch_size=args.batch_size, shuffle=False, num_workers=...)`. Call `model.eval()` and run the loop under `torch.inference_mode()`: `for i, image_batch in enumerate(dataloader):`. Get batch predictions `batch_logits = model(image_batch.to(device))`. Process `batch_logits`. Get corresponding file paths for the batch: `start_idx = i * args.batch_size; end_idx = start_idx + len(image_batch); current_paths = all_image_paths[start_idx:end_idx]`. This index arithmetic is valid only with `shuffle=False` and a dataset built from the same `all_image_paths` list. Print results. Libraries: `argparse`, `torch`, `torch.utils.data`, `torchvision`, `PIL`, `pathlib`.
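
The batch-to-path bookkeeping can be checked in isolation. In this sketch plain lists stand in for the batches a `DataLoader` would yield, and the helper name `paths_for_batch` is hypothetical.

```python
def paths_for_batch(all_image_paths, batch_index, batch_size, batch_len):
    """Return the file paths that belong to one batch of an unshuffled loader."""
    start_idx = batch_index * batch_size
    # Use the actual batch length: the last batch may be smaller than batch_size.
    end_idx = start_idx + batch_len
    return all_image_paths[start_idx:end_idx]


all_image_paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
batch_size = 2
# Stand-in for DataLoader output: batches of 2, 2 and 1 placeholder items.
fake_batches = [[None, None], [None, None], [None]]

for i, image_batch in enumerate(fake_batches):
    current_paths = paths_for_batch(all_image_paths, i, batch_size,
                                    len(image_batch))
    print(i, current_paths)
```

An alternative that avoids index arithmetic entirely is to have `__getitem__` return the path together with the tensor, at the cost of a slightly more complex batch structure.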

---
**Variant 4:** Decouple class names from the `predict.py` script by allowing them to be loaded from an external file. Add a required `argparse` argument `--class_names_file` that points to a simple text file where each line contains one class name, in the order corresponding to the model's output indices (0, 1, 2...). The script must read this file, store the class names in a list (e.g., `['pizza', 'steak', 'sushi']`). When decoding the model's predicted index (e.g., from `argmax` or `topk`), use this loaded list to look up the human-readable class name for display, instead of relying on hardcoded values within the script. Handle potential file reading errors.

*Technical note:* Add `parser.add_argument('--class_names_file', type=str, required=True)`. Read file: `try: with open(args.class_names_file, 'r') as f: class_names = [line.strip() for line in f if line.strip()]; except FileNotFoundError: print(f"Error: Class names file not found at {args.class_names_file}"); sys.exit(1)`. Use `predicted_name = class_names[predicted_index]` for lookup. Libraries: `argparse`, `os`, `sys`.
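
A possible reader for the class names file is sketched here. The function name `load_class_names` is an assumption, and catching `OSError` (the parent class of `FileNotFoundError`) is a slightly broader choice than the note describes, so that permission errors are also reported.

```python
import sys


def load_class_names(class_names_file):
    """Read one class name per line, skipping blank lines."""
    try:
        with open(class_names_file, "r", encoding="utf-8") as f:
            class_names = [line.strip() for line in f if line.strip()]
    except OSError as error:
        print(f"Error: could not read class names file "
              f"{class_names_file}: {error}", file=sys.stderr)
        sys.exit(1)
    if not class_names:
        print(f"Error: class names file {class_names_file} is empty.",
              file=sys.stderr)
        sys.exit(1)
    return class_names
```

The line order in the file must match the model's output indices, so `class_names[predicted_index]` returns the right label.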

---
**Variant 5:** Improve the reliability and consistency of `predict.py` by automatically loading and using the exact same data transformation pipeline that was used during the model's training. Modify the saving mechanism (e.g., in `utils.save_model` or `train.py`) to store the `torchvision.transforms.Compose` object used for training within the model checkpoint dictionary (`.pth` file). Update `predict.py` to: load the checkpoint dictionary, extract the saved `transform` object, and apply *this specific transform* to the input image during preprocessing, rather than defining a potentially mismatched transform within `predict.py` itself. This ensures preprocessing consistency between training and inference.

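*Technical note:* Saving (`train.py`/`utils.py`): `train_transform = ...; torch.save({'model_state_dict': ..., 'transform': train_transform}, path)`. Loading (`predict.py`): `checkpoint = torch.load(args.checkpoint, map_location=device, weights_only=False); model.load_state_dict(checkpoint['model_state_dict']); image_transform = checkpoint['transform']`. The `weights_only=False` argument is needed because PyTorch 2.6 and later default to `weights_only=True`, which refuses to unpickle arbitrary objects such as a `Compose`; use it only for checkpoints you trust. Use `image_transform(image)` for preprocessing. Note: Saving transforms directly has limitations (the pickled object depends on the installed `torchvision` version); saving the definition/parameters might be more robust but complex. Libraries: `torch`, `torchvision`, `argparse`, `utils`.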

---
**Variant 6:** Provide immediate visual confirmation of the prediction by adding an option to display the input image alongside its predicted label. Add an `argparse` flag `--show_image`. If this flag is set when predicting on a single image, after making the prediction and determining the top class name and confidence score, use `matplotlib.pyplot` to display the original input image (before tensor conversion, loaded via PIL). Set the title of the plot to include the predicted class name and its confidence score (e.g., "Predicted: sushi (Confidence: 98.76%)").

*Technical note:* Add `parser.add_argument('--show_image', action='store_true')`. Inside prediction logic (likely after getting `predicted_class_name` and `confidence_score`): `if args.show_image: import matplotlib.pyplot as plt; from PIL import Image; image = Image.open(args.image_path); plt.imshow(image); plt.title(f"Predicted: {predicted_class_name} (Confidence: {confidence_score*100:.2f}%)"); plt.axis('off'); plt.show()`. Libraries: `argparse`, `matplotlib.pyplot`, `PIL`.

---
**Variant 7:** Offer flexibility in obtaining the model weights by allowing `predict.py` to load them either from a local file path or directly from a URL. Add an `argparse` argument `--model_source` with choices `'local'` (default) and `'url'`. The existing `--checkpoint` argument will now represent either the local file path or the URL. If `model_source` is `'url'`, use `torch.hub.load_state_dict_from_url(args.checkpoint, progress=True)` to download and load the state dictionary. If `'local'`, use the standard `torch.load(args.checkpoint)` (potentially extracting the state dict if the whole checkpoint was saved). Then proceed with loading the state dict into the model architecture.

*Technical note:* Add `--model_source` arg. Conditional loading: `if args.model_source == 'url': state_dict = torch.hub.load_state_dict_from_url(args.checkpoint, map_location=device); elif args.model_source == 'local': checkpoint_data = torch.load(args.checkpoint, map_location=device); state_dict = checkpoint_data['model_state_dict'] if isinstance(checkpoint_data, dict) and 'model_state_dict' in checkpoint_data else checkpoint_data; else: # error`. A plain state dict is itself a dict, so test for the key rather than only the type; otherwise a checkpoint that contains just the state dict raises `KeyError`. Load into model: `model.load_state_dict(state_dict)`. Libraries: `argparse`, `torch`, `torch.hub`.
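
The "full checkpoint or bare state dict" decision can be sketched with ordinary dictionaries, since a state dict is itself a dictionary. The helper name `extract_state_dict` and the parameter key used in the demo are hypothetical.

```python
def extract_state_dict(checkpoint_data):
    """Return the state dict from either a full checkpoint or a bare state dict."""
    # A bare state dict is also a dict, so checking the type alone is not
    # enough; look for the wrapper key explicitly.
    if isinstance(checkpoint_data, dict) and "model_state_dict" in checkpoint_data:
        return checkpoint_data["model_state_dict"]
    return checkpoint_data


# Plain dicts stand in for what torch.load would return.
bare_state_dict = {"classifier.1.weight": [0.0]}  # hypothetical parameter key
full_checkpoint = {"model_state_dict": bare_state_dict, "epoch": 3}

print(extract_state_dict(full_checkpoint) is bare_state_dict)
print(extract_state_dict(bare_state_dict) is bare_state_dict)
```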

---
**Variant 8:** Standardize the output format of `predict.py` for easier programmatic parsing by adding an option to output results as JSON. Add an `argparse` argument `--output_json_path` which, if provided, specifies a file path to write the output. If predicting on a single image, the JSON could be a single object: `{"filename": "path/to/img.jpg", "top_prediction": {"class": "sushi", "confidence": 0.95}, "top_k": [...]}`. If predicting on a directory, the output should be a JSON array, with each element being an object representing one image's prediction results. If `--output_json_path` is not provided, print to console as before.

*Technical note:* Add `--output_json_path` arg. Collect results in a list or dict `output_data`. After all predictions: `if args.output_json_path: with open(args.output_json_path, 'w') as f: json.dump(output_data, f, indent=2); else: # Print to console (maybe format JSON nicely too)`. Structure the `output_data` according to single image vs directory input. Libraries: `argparse`, `json`, `os`.
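
One way to organize the JSON output is sketched here. The helper names `make_record` and `emit_results` are assumptions, and so is the shape of the entries inside `"top_k"`, which the task leaves open.

```python
import json


def make_record(filename, ranked):
    """Build one result object.

    ranked is a list of (class_name, confidence) pairs, highest first.
    """
    top_class, top_conf = ranked[0]
    return {
        "filename": filename,
        "top_prediction": {"class": top_class, "confidence": top_conf},
        # Entry shape below is an assumption; the task does not fix it.
        "top_k": [{"class": c, "confidence": p} for c, p in ranked],
    }


def emit_results(records, is_directory, output_json_path=None):
    """Write a JSON array (directory) or single object (one image)."""
    output_data = records if is_directory else records[0]
    if output_json_path:
        with open(output_json_path, "w") as f:
            json.dump(output_data, f, indent=2)
    else:
        # No path given: print nicely formatted JSON to the console.
        print(json.dumps(output_data, indent=2))
    return output_data
```

Confidence values taken from tensors should be converted with `.item()` first, because `json.dump` cannot serialize tensors.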

---
**Variant 9:** Allow filtering of low-confidence predictions in `predict.py`. Add a float `argparse` argument `--confidence_threshold` (e.g., default 0.7). After obtaining the top predicted class and its confidence score (probability), check if the score meets or exceeds this threshold. If it does, print/output the prediction as usual. If the confidence score is *below* the threshold, print or output a specific message indicating uncertainty, such as `"Prediction uncertain: sushi (Confidence: 0.65 < Threshold: 0.70)"` or simply `"Prediction below threshold"`.

*Technical note:* Add `parser.add_argument('--confidence_threshold', type=float, default=0.7)`. Get top class probability `top_prob = torch.softmax(logits, dim=1).max()`. Check `if top_prob.item() >= args.confidence_threshold: # Print normal prediction else: # Print uncertain message`. Libraries: `argparse`, `torch`.
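
The threshold decision itself needs only the class name and its probability, so it can be sketched as a small formatting helper. The function name `format_prediction` and the wording of the "normal" message are assumptions; the uncertain message follows the example in the task.

```python
def format_prediction(class_name, confidence, threshold=0.7):
    """Return the output line for a prediction, flagging low confidence."""
    if confidence >= threshold:
        return f"Prediction: {class_name} (Confidence: {confidence:.2f})"
    return (f"Prediction uncertain: {class_name} "
            f"(Confidence: {confidence:.2f} < Threshold: {threshold:.2f})")


print(format_prediction("sushi", 0.95))
print(format_prediction("sushi", 0.65))
```

In `predict.py` the `confidence` argument would come from `top_prob.item()` and `threshold` from `args.confidence_threshold`.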

---
**Variant 10:** Integrate model interpretability into `predict.py` by generating and displaying a Grad-CAM (Gradient-weighted Class Activation Mapping) heatmap. Add an `argparse` flag `--visualize_gradcam`. If set (and predicting on a single image), use a library like `pytorch-grad-cam` (`pip install grad-cam`). Identify a suitable target convolutional layer in the loaded TinyVGG model (e.g., the last layer of `conv_block_2`). Instantiate the `GradCAM` object, generate the grayscale CAM mask for the predicted class, and use utility functions from the library (or custom code) to overlay this heatmap onto the original image, showing which regions most influenced the model's decision. Display the overlaid image using matplotlib.

*Technical note:* `pip install grad-cam`. Add flag. `from pytorch_grad_cam import GradCAM; from pytorch_grad_cam.utils.image import show_cam_on_image; from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget`. Need to specify target layer: `target_layers = [model.conv_block_2[-1]]` (assuming MaxPool is last, might need adjustment). `cam = GradCAM(model=model, target_layers=target_layers)`. `targets = [ClassifierOutputTarget(predicted_class_index)]`. `grayscale_cam = cam(input_tensor=input_tensor, targets=targets)[0, :]`. Load the original image as a numpy array `rgb_img`, resize it to the spatial size of `input_tensor`, and scale it to `float32` values in [0, 1], which `show_cam_on_image` expects. `visualization = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)`. Display `visualization`. Libraries: `pytorch-grad-cam`, `cv2` (often needed by grad-cam), `matplotlib.pyplot`, `numpy`, `PIL`, `argparse`.

---
**Variant 11:** Enable real-time prediction using a webcam feed in `predict.py`. Add an `argparse` flag `--use_webcam`. If set, use `opencv-python` (`cv2`) to open the default webcam (`cv2.VideoCapture(0)`). Enter a loop that continuously reads frames from the webcam (`cap.read()`). For each frame: convert it from BGR (OpenCV default) to RGB, convert to PIL Image or directly to tensor, apply the necessary preprocessing (resize, ToTensor, normalize), pass it to the model for inference, get the prediction. Use `cv2.putText` to draw the predicted class name and confidence score directly onto the frame being displayed. Show the annotated frame in an OpenCV window (`cv2.imshow`). Include a way to exit the loop (e.g., pressing 'q').

*Technical note:* `pip install opencv-python`. Add flag. `import cv2`. Init `cap = cv2.VideoCapture(0)`. Loop `while True: ret, frame = cap.read(); if not ret: break` (stop when no frame could be read, e.g., the camera is unavailable). Preprocess frame: `rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB); pil_image = Image.fromarray(rgb_frame); input_tensor = transform(pil_image).unsqueeze(0).to(device)`. Predict. Draw text: `cv2.putText(frame, prediction_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)`. Display `cv2.imshow('Webcam Prediction', frame)`. Check for key press: `if cv2.waitKey(1) & 0xFF == ord('q'): break`. After the loop, release the capture and destroy the windows: `cap.release(); cv2.destroyAllWindows()`. Libraries: `opencv-python`, `PIL`, `torch`, `torchvision`, `argparse`.

---
**Variant 12:** Add performance benchmarking capabilities to `predict.py`. Add an `argparse` flag `--benchmark`. If set, measure and report the time taken for key stages when predicting on a single image: 1) Model loading (`torch.load` or equivalent), 2) Image preprocessing (PIL load + transform), 3) Model inference (forward pass). For inference timing, perform a warm-up run first, then run the `model(input_tensor)` call inside a loop (e.g., 10-100 times) and calculate the average inference time per image. Print these timings clearly at the end.

*Technical note:* Add flag. Use `time.perf_counter()`. `t0 = time.perf_counter(); # Load model; t1 = time.perf_counter(); print(f"Model load time: {t1-t0:.4f}s")`. `t0 = time.perf_counter(); # Load image + transform; t1 = time.perf_counter(); print(f"Preprocessing time: {t1-t0:.4f}s")`. Warm-up: `with torch.inference_mode(): _ = model(input_tensor)`. Timing loop: `torch.cuda.synchronize() # if using GPU`; `t0 = time.perf_counter(); for _ in range(N): with torch.inference_mode(): _ = model(input_tensor); torch.cuda.synchronize() # if using GPU`; `t1 = time.perf_counter(); print(f"Avg inference time ({N} runs): {(t1-t0)/N:.6f}s")`. Libraries: `time`, `torch`, `argparse`.
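
The warm-up-then-average timing pattern can be sketched for any callable. The helper name `average_call_time` and the optional `sync` hook are assumptions; on a GPU, `sync` would be `torch.cuda.synchronize`, and `fn` would wrap the `model(input_tensor)` call.

```python
import time


def average_call_time(fn, num_runs=50, warmup_runs=1, sync=None):
    """Return the mean wall-clock time of fn() in seconds over num_runs calls."""
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup_runs):
        fn()
    if sync is not None:
        sync()  # make sure pending asynchronous work has finished
    t0 = time.perf_counter()
    for _ in range(num_runs):
        fn()
    if sync is not None:
        sync()  # wait for the timed work to complete before stopping the clock
    t1 = time.perf_counter()
    return (t1 - t0) / num_runs


# Demo with a cheap stand-in workload instead of a model forward pass.
avg_s = average_call_time(lambda: sum(range(1000)), num_runs=20)
print(f"Avg time (20 runs): {avg_s:.6f}s")
```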

---
**Variant 13:** Make `predict.py` more robust when processing directories (`--input_dir`) by gracefully handling errors for individual files. Wrap the code block responsible for loading and transforming a single image file (within the loop over discovered image paths) inside a `try...except Exception as e:` block. If an exception occurs (e.g., corrupted image file, unsupported format by PIL), instead of crashing the entire script, catch the exception, print a clear error message using `print(f"Error processing file {image_path}: {e}", file=sys.stderr)` or `logging.error(...)`, and then simply continue to the next image file in the directory.

*Technical note:* Inside the loop `for image_path in image_files:` add `try: image = PIL.Image.open(image_path).convert('RGB'); input_tensor = transform(image).unsqueeze(0); # ... predict ... except Exception as e: print(f"Skipping file due to error: {image_path} - {e}", file=sys.stderr)`. This ensures processing continues for other valid images. Libraries: `PIL`, `sys`, `logging` (optional).
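
The skip-on-error loop can be sketched with a generic callable in place of the real load/transform/predict code. The names `predict_on_files` and `predict_one` are hypothetical; collecting the failed paths is an extra convenience beyond what the task requires.

```python
import sys


def predict_on_files(image_paths, predict_one):
    """Run predict_one on each path; report failures and keep going."""
    results = {}
    failed = []
    for image_path in image_paths:
        try:
            results[image_path] = predict_one(image_path)
        except Exception as e:
            # One bad file must not stop the whole run.
            print(f"Skipping file due to error: {image_path} - {e}",
                  file=sys.stderr)
            failed.append(image_path)
    return results, failed
```

Printing a short count of skipped files at the end (for example `len(failed)`) makes silent data problems easier to notice.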

---
**Variant 14:** Enable `predict.py` to load models saved in TorchScript format (`.pt` or `.pth` file saved via `torch.jit.save`). Add an `argparse` flag `--load_scripted_model`. If this flag is set, the script should use `torch.jit.load(args.checkpoint, map_location=device)` to load the model directly, bypassing the need to instantiate the Python model class from `model_builder.py` and then load the state dict. This is useful for deploying models without needing the original Python code. Compare loading time or potential inference speed differences if feasible.

*Technical note:* Add flag `parser.add_argument('--load_scripted_model', action='store_true')`. Conditional loading: `if args.load_scripted_model: model = torch.jit.load(args.checkpoint, map_location=device); else: model = model_builder.TinyVGG(...); model.load_state_dict(...)`. Ensure a TorchScript model was previously created and saved during or after training (e.g., `scripted_model = torch.jit.script(model); torch.jit.save(scripted_model, 'model_scripted.pt')`). Libraries: `torch.jit`, `argparse`, `model_builder`.

---
**Variant 15:** Add functionality to `predict.py` to save an annotated version of the input image. Include an `argparse` argument `--save_annotated_path` which takes a file path for the output image. After predicting on a single input image, load the original image using PIL. Use `PIL.ImageDraw` to draw the predicted class name and confidence score (formatted string) onto the image at a specified position (e.g., top-left corner) with a chosen font and color. Save the modified PIL image to the path specified by `--save_annotated_path`.

*Technical note:* Add `--save_annotated_path` arg. Load image: `image = PIL.Image.open(args.image_path).convert("RGB")`. Create drawing context: `draw = PIL.ImageDraw.Draw(image)`. Define font (optional): `font = PIL.ImageFont.truetype("arial.ttf", 15)` (requires font file or use default). Create text: `text = f"{pred_name} ({confidence:.2f})"`. Draw: `draw.rectangle([(0, 0), (100, 20)], fill="black"); draw.text((5, 5), text, fill="white", font=font)`. Save: `image.save(args.save_annotated_path)`. Libraries: `PIL.Image`, `PIL.ImageDraw`, `PIL.ImageFont`, `argparse`.

---
**Variant 16:** Provide explicit control over the inference device in `predict.py`. Add an `argparse` argument `--device` with choices `'cpu'` and `'cuda'` (default `'cpu'`, or try to auto-detect). Before performing any inference, create the `torch.device` object based on the user's choice. Ensure that the loaded model (`model.to(device)`) and every input tensor (`input_tensor.to(device)`) are explicitly moved to this specified device before the model's forward pass is called. Include error handling if 'cuda' is selected but unavailable.

*Technical note:* Add `parser.add_argument('--device', type=str, default='cpu', choices=['cpu', 'cuda'])`. Validate choice: `if args.device == 'cuda' and not torch.cuda.is_available(): print("Error: CUDA selected but not available.", file=sys.stderr); sys.exit(1)`. After the check (outside the `if` block), create the device: `device = torch.device(args.device)`. Apply: `model.to(device)`. Inside prediction loop: `input_tensor = transform(image).unsqueeze(0).to(device)`. Libraries: `torch`, `argparse`, `sys`.

---
**Variant 17:** Allow `predict.py` to handle models trained with different input image sizes. Add an integer `argparse` argument `--input_image_size` (e.g., default 64). Use this value within the preprocessing `transforms.Compose` pipeline, specifically in the `transforms.Resize((args.input_image_size, args.input_image_size))` step. This makes the script adaptable if the loaded checkpoint (`--checkpoint`) corresponds to a model trained on images resized to 128x128 or 224x224, etc., instead of the default 64x64. Ensure the user provides the size matching the loaded model.

*Technical note:* Add `parser.add_argument('--input_image_size', type=int, default=64)`. Define transform: `image_transform = transforms.Compose([transforms.Resize((args.input_image_size, args.input_image_size)), transforms.ToTensor(), ...])`. Apply this transform. Crucially depends on the loaded model being compatible with this input size. Libraries: `argparse`, `torchvision.transforms`.

---
**Variant 18:** Implement a "directory watch" mode for continuous prediction. Add an `argparse` argument `--watch_directory` specifying a path. If provided, use the `watchdog` library (`pip install watchdog`) to monitor this directory for newly created files. Set up an event handler that triggers when a file is created (`on_created`). Inside the handler, check if the created file is an image file (by extension). If it is, call the core prediction logic on this new file path and print the result. The script should run indefinitely, processing images as they appear in the directory, until manually interrupted (e.g., Ctrl+C).

*Technical note:* `pip install watchdog`. Add `--watch_directory` arg. `from watchdog.observers import Observer; from watchdog.events import FileSystemEventHandler; import time`. Define `class PredictionHandler(FileSystemEventHandler): def on_created(self, event): if not event.is_directory and event.src_path.lower().endswith(('.png', '.jpg', '.jpeg')): print(f"New image detected: {event.src_path}"); predict_single_image(event.src_path, model, transform, device, class_names) # Adapt predict logic`. Setup: `event_handler = PredictionHandler(); observer = Observer(); observer.schedule(event_handler, path=args.watch_directory, recursive=False); observer.start(); try: while True: time.sleep(1); except KeyboardInterrupt: pass; finally: observer.stop(); observer.join()`. Catching `KeyboardInterrupt` lets Ctrl+C end the script without a traceback. Libraries: `watchdog`, `time`, `argparse`, `threading` (optional for non-blocking).

---
**Variant 19:** Enable comparative prediction between two different models in `predict.py`. Add two required `argparse` arguments: `--checkpoint1` and `--checkpoint2`, specifying paths to two different saved model checkpoints (e.g., TinyVGG vs SimpleCNN, or TinyVGG trained with different hyperparameters). Load both models (potentially needing different architectures or configurations loaded based on the checkpoints). For a given input image (or directory of images), preprocess the image once, then run inference using both `model1` and `model2`. Print the predictions (top class, confidence) from both models side-by-side for easy comparison, e.g., `Image: img.jpg | Model 1: pizza (0.85) | Model 2: pizza (0.92)`.

*Technical note:* Add `--checkpoint1`, `--checkpoint2`. Load models: `model1 = load_model_from_checkpoint(args.checkpoint1); model2 = load_model_from_checkpoint(args.checkpoint2)` (need helper function possibly). Preprocess image `input_tensor`. Predict with both models in evaluation mode and without gradient tracking: `model1.eval(); model2.eval(); with torch.inference_mode(): pred1 = model1(input_tensor); pred2 = model2(input_tensor)`. Decode both predictions (apply `torch.softmax(..., dim=1)` to obtain the confidence values). Print formatted comparison string. Libraries: `argparse`, `torch`, `model_builder` (potentially).

---
**Variant 20:** Create a simple, text-based interactive mode for `predict.py`. Add an `argparse` flag `--interactive`. If set, instead of requiring an image path argument, the script should first load the specified model (`--checkpoint` still required). Then, enter a loop that prompts the user: `Enter image path (or type 'quit' to exit): `. Read the user's input using `input()`. If the input is 'quit' (case-insensitive), break the loop. Otherwise, treat the input as a file path, check if it exists and is an image file. If valid, perform prediction on that image and print the results (e.g., top-K predictions). Loop back to prompt again.

*Technical note:* Add `--interactive` flag. Check `if args.interactive:`. Load model once. Enter `while True: user_input = input(...)`. Check `if user_input.lower() == 'quit': break`. Validate path: `if os.path.isfile(user_input) and user_input.lower().endswith(...)`. Call prediction logic. Print results. Add error handling for invalid paths. Libraries: `argparse`, `os.path`, `sys`; `input` is a built-in function and needs no import.
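
The prompt loop can be sketched independently of the model. Here `predict_fn` is a hypothetical callable that takes a path and returns a printable result, and the `read_input` and `write` parameters are assumptions added so the loop can be exercised without a keyboard.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def interactive_loop(predict_fn, read_input=input, write=print):
    """Prompt for image paths until the user types 'quit'."""
    while True:
        user_input = read_input(
            "Enter image path (or type 'quit' to exit): ").strip()
        if user_input.lower() == "quit":
            break
        if not os.path.isfile(user_input):
            write(f"Error: file not found: {user_input}")
            continue
        if not user_input.lower().endswith(IMAGE_EXTENSIONS):
            write(f"Error: not a supported image file: {user_input}")
            continue
        # The model is loaded once outside this loop; only prediction runs here.
        write(f"{user_input}: {predict_fn(user_input)}")
```

In `predict.py` this would be called as `interactive_loop(predict_fn)` after the model is loaded, with `predict_fn` wrapping the existing single-image prediction logic.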