# MTH 4320 / 5320 - Homework 3

## Convolutional Neural Networks and PyTorch


**Deadline**: Nov 7, submit in Canvas

**Points**: 75

## Instructions

* Submit **one** Jupyter notebook file and (optionally) **one** PDF with your handwritten work. (Alternatively, type your solutions in Markdown cells in the notebook.)

* Your notebook file must include text explanations of your work, well-commented code, and the outputs from your code (must be shown as code output in the notebook).

* All mathematical work must be shown for written/typed problems.

## Overview

In this homework you will implement, tune, and compare three CNN approaches for image classification:

1. Custom CNN (from scratch) — a network you design.

2. Transfer Learning CNN — fine-tune a pretrained backbone from `torchvision` (or another model library).

3. Ensemble — combine the Custom and Transfer models at inference time.

You will report training/validation curves, final test accuracy, confusion matrices (raw + normalized), per-class accuracy, and a short written analysis comparing methods.
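The confusion matrices and per-class accuracies can be computed directly from true and predicted labels. Below is one possible NumPy sketch; the function names are illustrative, not from any library. Rows index the true class, so the diagonal of the row-normalized matrix equals the per-class accuracy (recall). If scikit-learn is available, `sklearn.metrics.confusion_matrix(y_true, y_pred, normalize="true")` is an equivalent alternative.

```python
import numpy as np

# Intel Image Classification class names, in the alphabetical order
# that torchvision's ImageFolder assigns to label indices 0..5.
CLASSES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]


def confusion_matrices(y_true, y_pred, num_classes):
    """Return (raw counts, row-normalized) confusion matrices.

    Rows index the true class, columns the predicted class.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    raw = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(raw, (y_true, y_pred), 1)  # accumulates repeated (true, pred) pairs
    row_sums = raw.sum(axis=1, keepdims=True)
    # Divide each row by its total; rows for classes absent from y_true stay 0.
    normalized = np.divide(raw, row_sums, out=np.zeros(raw.shape), where=row_sums > 0)
    return raw, normalized


def per_class_accuracy(raw):
    """Per-class accuracy (recall): correct predictions / true samples per class."""
    totals = raw.sum(axis=1)
    return np.divide(np.diag(raw), totals, out=np.zeros(len(totals)), where=totals > 0)


# Toy example with made-up labels (not real results).
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 5, 1, 2, 3, 2]
raw, norm = confusion_matrices(y_true, y_pred, len(CLASSES))
for name, acc in zip(CLASSES, per_class_accuracy(raw)):
    print(f"{name:>10s}: {acc:.3f}")
```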

> **Not allowed:** fully connected (MLP-only) networks  
> **Required framework:** PyTorch  
> **Recommended:** GPU computing

## Dataset

* **Dataset:** [Intel Image Classification dataset](https://www.kaggle.com/datasets/puneet6060/intel-image-classification), which contains labeled images of six scene classes: `buildings`, `forest`, `glacier`, `mountain`, `sea`, and `street`.  
* **Framework:** PyTorch + torchvision + matplotlib.  
* **Data splits:** Use the provided data splits.
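The Kaggle download contains `seg_train`, `seg_test`, and an unlabeled `seg_pred` folder, but no separate validation split (check your copy). Since validation curves need held-out labeled data, one option (an assumption, not a stated course rule) is to hold out a stratified fraction of the training images while keeping `seg_test` untouched as the test set. The helper name `stratified_split` and the 15% default are illustrative.

```python
import numpy as np


def stratified_split(labels, val_fraction=0.15, seed=0):
    """Split sample indices into (train_idx, val_idx), keeping each class's proportion.

    `labels` is one integer class label per training image (e.g. ImageFolder.targets).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)  # fixed seed -> the same split on every run
    train_idx, val_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_val = int(round(val_fraction * len(idx)))
        val_idx.extend(idx[:n_val].tolist())
        train_idx.extend(idx[n_val:].tolist())
    return np.array(sorted(train_idx)), np.array(sorted(val_idx))
```

With `torchvision.datasets.ImageFolder` pointed at the folder that directly contains the class subfolders, the labels are available as `dataset.targets`, and the returned indices can be passed to `torch.utils.data.Subset`. If training and validation need different transforms, build two `ImageFolder` objects over the same folder and subset each one. Reusing the same seed keeps the validation images identical across all runs, so accuracies in your tables stay comparable.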


## Tasks & Requirements

You will complete two main problems, each requiring at least 10 independent training runs (e.g., different hyperparameters, architectures, or augmentation settings). Record all results, pick your best model, and analyze performance.

Problem 3 will ask you to ensemble the models from Problems 1-2 to hopefully eke out some additional accuracy.
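For the run tables, it can help to log each run as a structured record as soon as it finishes and render the table from that log rather than typing it by hand. A minimal standard-library sketch follows; the `RunRecord` fields are a hypothetical schema, and best-run selection uses validation accuracy so the test set stays untouched until the final evaluation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunRecord:
    run_id: str
    change: str        # the single factor changed relative to the reference run
    rationale: str     # why this change was expected to help
    hparams: Dict[str, object] = field(default_factory=dict)
    best_val_acc: float = 0.0
    checkpoint: str = ""  # path of the saved weights, for reuse in Problem 3


def to_markdown(records: List[RunRecord]) -> str:
    """Render the run log as a Markdown table."""
    lines = [
        "| Run | Change | Rationale | Hyperparameters | Best val acc | Checkpoint |",
        "|---|---|---|---|---|---|",
    ]
    for r in records:
        hp = ", ".join(f"{k}={v}" for k, v in r.hparams.items())
        lines.append(
            f"| {r.run_id} | {r.change} | {r.rationale} | {hp} | {r.best_val_acc:.4f} | {r.checkpoint} |"
        )
    return "\n".join(lines)


def best_run(records: List[RunRecord]) -> RunRecord:
    """Pick the run with the highest validation accuracy (not test accuracy)."""
    return max(records, key=lambda r: r.best_val_acc)
```

In Jupyter, `IPython.display.display(IPython.display.Markdown(to_markdown(runs)))` renders the log as a formatted table in the cell output.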

### Problem 1 — Custom CNN [30 points]
Train and evaluate your own CNN **from scratch** (in PyTorch) on the Intel Image Classification dataset.
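As a possible starting point for the campaign (not a prescribed architecture), here is a small conv/BatchNorm/ReLU/pool network ending in global average pooling and a single linear output layer, plus a reusable epoch loop that returns the mean loss and accuracy needed for the curves. `SmallCNN`, `run_epoch`, and the default width and dropout values are hypothetical choices meant to be tuned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Hypothetical baseline: 4 conv blocks -> global average pool -> dropout -> linear."""

    def __init__(self, num_classes=6, width=32, dropout=0.3):
        super().__init__()
        blocks, in_ch = [], 3
        for i in range(4):
            out_ch = width * 2**i  # channels double per block (32, 64, 128, 256 for width=32)
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves height and width
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)  # fixed-size output for any input resolution
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.head(self.dropout(x))  # raw logits; cross_entropy applies log-softmax


def run_epoch(model, loader, device, optimizer=None):
    """One pass over `loader`: trains if `optimizer` is given, otherwise evaluates.

    Returns (mean loss, accuracy) for logging training/validation curves.
    """
    training = optimizer is not None
    model.train(training)
    total_loss, correct, count = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            loss = F.cross_entropy(logits, y)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            count += y.size(0)
    return total_loss / count, correct / count
```

Calling `run_epoch(model, train_loader, device, optimizer)` and then `run_epoch(model, val_loader, device)` once per epoch, and appending both results to lists, yields the data for the accuracy and loss curves.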

**Requirements:**
- Conduct a systematic hyperparameter tuning campaign with ≥ 10 total training runs.
  - Each run should modify a *single well-motivated factor* (e.g., learning rate, regularization, architecture, or augmentation strategy) and include a brief rationale.
  - Maintain a structured table summarizing all runs, architectures, hyperparameters, and accuracies.
- Identify and report your best-performing configuration, including:
  - training/validation accuracy and loss curves
  - final test accuracy
  - confusion matrices (raw and normalized)
  - per-class accuracy summary
- Provide a paragraph (~1/2 a page) reflecting on your tuning process: what helped, what didn’t, and how you would refine it further.
- Save important checkpoints from all models for use in Problem 3.
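One way to make checkpoints reusable in Problem 3 is to store the weights together with the constructor arguments and metrics needed to rebuild and compare each model. The sketch below assumes every model can be rebuilt by a function you supply (`build_fn`, hypothetical) from the saved `config` dictionary. Keeping `config` and `metrics` to plain Python types keeps the file loadable with `torch.load`'s `weights_only=True` mode, which recent PyTorch releases use by default.

```python
from pathlib import Path

import torch


def save_checkpoint(path, model, config, metrics):
    """Save weights plus the config needed to rebuild the model and its key metrics.

    Keep `config` and `metrics` to plain Python types (str, int, float, bool, lists,
    dicts; convert NumPy scalars with float()) so the file also loads with
    torch.load(..., weights_only=True).
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    torch.save({"state_dict": model.state_dict(), "config": config, "metrics": metrics}, path)


def load_checkpoint(path, build_fn, device="cpu"):
    """Rebuild a model via build_fn(**config), load its weights, and return it in eval mode."""
    ckpt = torch.load(path, map_location=device)
    model = build_fn(**ckpt["config"])
    model.load_state_dict(ckpt["state_dict"])
    return model.to(device).eval(), ckpt["metrics"]
```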

### Problem 2 — Transfer Learning [30 points]

Fine-tune a **pretrained CNN backbone** from `torchvision`.

**Requirements:**
- Replace the classifier head with a new output layer for the six Intel classes.
- Define and justify your fine-tuning strategy (which layers frozen/unfrozen, when, and why).
- Carry out a hyperparameter tuning campaign with ≥ 10 total runs, systematically exploring learning-rate schedules, layer-freezing policies, and/or augmentation strategies.
  - Each experiment must have a clear, written motivation.
  - Maintain a structured table summarizing all runs, backbones, hyperparameters, and accuracies.
- Identify and report your best-performing configuration, including:
  - training/validation accuracy and loss curves  
  - final test accuracy
  - confusion matrices (raw and normalized)
  - per-class accuracy summary
- Provide a paragraph (~1/2 a page) reflecting on your tuning process: what helped, what didn’t, and how you would refine it further.
- Save important checkpoints from all models for use in Problem 3.

### Problem 3 — Model Ensembling and Performance Integration [15 points]

In this final problem, you will combine and evaluate ensembles built from models trained during your hyperparameter tuning campaigns in Problems 1 and 2. Rather than training new networks, you will use your saved checkpoints (best runs or diverse configurations) to study how ensembling affects performance, generalization, and class-wise stability.

The goal is to understand how architectural diversity and independent tuning decisions interact when multiple models vote or average predictions.  

You will test several ensemble strategies, compare them to your best single models, and analyze which factors make ensembles most effective.
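Because ensembling combines predictions rather than weights, a natural first step is to run each saved checkpoint over the test set once and cache its class probabilities. In the minimal sketch below (`collect_probs` is a hypothetical helper name), the loader must use `shuffle=False` so that row *i* refers to the same image for every model. Each model should also receive the preprocessing it was trained with (for example, ImageNet normalization for a pretrained backbone), which may require one loader per model over the same file order.

```python
import torch


@torch.no_grad()
def collect_probs(model, loader, device="cpu"):
    """Return softmax probabilities [N, C] and labels [N] for one model over `loader`.

    `loader` must iterate in a fixed order (shuffle=False) so rows line up across models.
    """
    model.to(device).eval()
    probs, labels = [], []
    for x, y in loader:
        logits = model(x.to(device))
        probs.append(torch.softmax(logits, dim=1).cpu())
        labels.append(y.cpu())
    return torch.cat(probs).numpy(), torch.cat(labels).numpy()
```

Saving each result with `numpy.save` lets you try many ensemble combinations without rerunning any network.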

**Requirements:**
- Build ensembles using models saved from your tuning campaigns in Problems 1 and 2 (do not retrain).  
  - Use at least 5 high-performing checkpoints, mixing both Custom CNN and Transfer Learning models.
- Explore at least two ensemble strategies (e.g., hard vs. soft voting, uniform vs. weighted averaging).
- Evaluate each ensemble on the same held-out test set and report:
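  - training/validation accuracy and loss (since ensembles are not retrained, report these as single values computed from the combined predictions rather than as curves)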
  - final test accuracy
  - confusion matrices (raw and normalized)
  - per-class accuracy summary
- Present a summary table listing ensemble composition, weighting scheme, and performance metrics for all ensemble variants tested.
- Write a discussion (~1/2 page) describing how ensembling affected accuracy and stability across classes, whether model diversity (architecture or hyperparameters) improved results, and which ensemble approach was most effective and why.
- Conclude with a brief reflection (3–5 sentences) on what your findings reveal about model diversity, robustness, and generalization.
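Given cached probability arrays of shape `[N, C]` (one per model, rows aligned to the same test images), the voting and averaging strategies listed above reduce to a few lines of NumPy. `soft_vote` and `hard_vote` are illustrative names, not library functions. Weights are optional; if you use weighted variants, choosing the weights from validation accuracy rather than test accuracy keeps the test set an honest estimate.

```python
import numpy as np


def _weights(n_models, weights):
    """Uniform weights if None, else the given weights normalized to sum to 1."""
    w = np.ones(n_models) if weights is None else np.asarray(weights, dtype=float)
    return w / w.sum()


def soft_vote(prob_list, weights=None):
    """Average class probabilities across models (optionally weighted), then take the argmax."""
    probs = np.stack(prob_list)  # [M, N, C]
    avg = np.tensordot(_weights(len(prob_list), weights), probs, axes=1)  # [N, C]
    return avg.argmax(axis=1)


def hard_vote(prob_list, weights=None):
    """Each model casts a (weighted) vote for its own argmax class.

    Ties resolve to the lowest class index, because argmax returns the first maximum.
    """
    preds = np.stack([p.argmax(axis=1) for p in prob_list])  # [M, N]
    n_models, n_samples = preds.shape
    w = _weights(n_models, weights)
    votes = np.zeros((n_samples, prob_list[0].shape[1]))
    rows = np.arange(n_samples)
    for m in range(n_models):
        votes[rows, preds[m]] += w[m]
    return votes.argmax(axis=1)
```

Soft and hard voting can disagree: a class that no single model ranks first by a wide margin can still win after averaging, which is one concrete thing to look for in your per-class comparison.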

## Checklist

- [ ] Custom CNN, Transfer model, and Ensemble implemented  
- [ ] Curves (acc/loss), confusion matrices (raw + normalized), per‑class accuracy  
- [ ] Final test accuracies reported for all three models  
- [ ] Summary & Reflection (≤ 300 words) written  
- [ ] Seed, versions, and hardware documented  
- [ ] Notebook runs cleanly end‑to‑end
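For the seed, versions, and hardware item, here is a minimal sketch that seeds the common random number generators, selects a GPU when one is available, and collects environment details to print near the top of the notebook. `set_seed` and `environment_report` are hypothetical helper names. Seeding alone does not make GPU training bit-for-bit reproducible; `torch.use_deterministic_algorithms(True)` tightens this at some speed cost (see the PyTorch reproducibility notes for the extra settings it may require).

```python
import platform
import random

import numpy as np
import torch


def set_seed(seed=42):
    """Seed Python, NumPy, and PyTorch (CPU and all GPUs) random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU


def environment_report():
    """Collect library versions and hardware info to document in the notebook."""
    info = {
        "python": platform.python_version(),
        "numpy": np.__version__,
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    try:
        import torchvision
        info["torchvision"] = torchvision.__version__
    except ImportError:
        info["torchvision"] = "not installed"
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


# Use the GPU when present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```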