#### What we were doing last time

We still need to run the experiments for p = 0.3, 0.5, and 0.7.
We also wanted to plot feature maps from different layers of the model; these will serve as the last visualization.
Fill in the text cells for each run as we learn more about its behaviour.

# Dropout in Fully Connected Layers

## Purpose
We test whether applying dropout to the **fully connected layers** reduces overfitting and improves generalization.  
This notebook compares FC dropout performance against the baseline.  

In [1]:
!pip install thop

Collecting thop
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl.metadata (2.7 kB)
Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Installing collected packages: thop
Successfully installed thop-0.1.1.post2209072238


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
%cd /content/drive/MyDrive/regularization-ml/


/content/drive/MyDrive/regularization-ml


In [7]:
import torch
from PIL import Image

from config.paths import PathConfig # Path config

from src.model import MiniCNN, ConvBlock, FCBlock
from src.train import trainModel
from src.data import CustomDataset, load_cifar_10_data, check_data_loading, Loader, class_to_idx
from src.visualizations import plotFmaps_and_activationHist, plotCurves
from src.utils import EarlyStopping, unpickle, loadWeights, readJson, genError, saveHistory, evalModel

In [8]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [9]:
paths = PathConfig("regularization-ml", "regularization-data")
PROJECT_DIR = paths.project
DATA_DIR = paths.data
BASE_DIR = paths.root

In [10]:
# Copy once from Drive
!cp $DATA_DIR/cifar-10-python.tar.gz /content/

# Extract locally
!mkdir -p /content/dataset/
!tar -xvzf /content/cifar-10-python.tar.gz -C /content/dataset/

cifar-10-batches-py/
cifar-10-batches-py/data_batch_4
cifar-10-batches-py/readme.html
cifar-10-batches-py/test_batch
cifar-10-batches-py/data_batch_3
cifar-10-batches-py/batches.meta
cifar-10-batches-py/data_batch_2
cifar-10-batches-py/data_batch_5
cifar-10-batches-py/data_batch_1


## Model Definition
Baseline CNN + dropout in the fully connected layers.  
- Dropout probability tested: p ∈ {0.3, 0.5, 0.7}  
- Other architecture/hyperparameters unchanged.
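
To make the effect of `p` concrete, here is a minimal pure-Python sketch of inverted dropout, the scheme `torch.nn.Dropout` applies in training mode: each unit is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`. The function name `inverted_dropout` and the all-ones stand-in for the 64 hidden units are illustrative assumptions; this is not the code inside `FCBlock`, which is not shown here.

```python
import random


def inverted_dropout(activations, p, rng):
    """Zero each activation with probability p; scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - p)
    return [a * keep_scale if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 64  # hypothetical stand-in for the 64 hidden FC units

for p in (0.3, 0.5, 0.7):
    dropped = inverted_dropout(hidden, p, rng)
    kept = sum(1 for a in dropped if a != 0.0)
    print(f"p={p}: kept {kept}/64 units (expected about {64 * (1 - p):.1f})")
```

On average only `64 * (1 - p)` hidden units stay active per training step, so a larger `p` regularizes more strongly. That is why these runs may need more epochs than the baseline.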

We start with p = 0.3.

### Dropout p = 0.3
We set the dropout probability in the FC block to 0.3.

In [11]:
base_conv_layers = [
    ConvBlock(3, 64, pool=False),
    ConvBlock(64, 64),
    ConvBlock(64, 128, pool=False),
    ConvBlock(128, 128)
]

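base_fc_layers = [
    FCBlock(128, 64, True, 0.3),  # 128 -> 64 hidden units; last argument is the dropout probability p
    torch.nn.Linear(64, 10)       # 10 output logits, one per CIFAR-10 class
]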

drop_3_model = MiniCNN(base_conv_layers, base_fc_layers)

## Dataset
Same setup as baseline (CIFAR-10, normalized, no augmentations).  
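
As a quick sanity check on "normalized", the sketch below computes the range that per-channel standardization can produce. The mean and std values are the commonly quoted CIFAR-10 statistics and are an assumption here; the values actually used in `src.data` are not shown in this notebook.

```python
# Commonly quoted CIFAR-10 per-channel statistics (assumed, not read from src.data).
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)


def normalized_range(mean, std):
    """Return (min, max) reachable by (x - mean) / std for pixel values x in [0, 1]."""
    lows = [(0.0 - m) / s for m, s in zip(mean, std)]
    highs = [(1.0 - m) / s for m, s in zip(mean, std)]
    return min(lows), max(highs)


low, high = normalized_range(CIFAR10_MEAN, CIFAR10_STD)
print(f"Reachable input range: [{low:.3f}, {high:.3f}]")
```

If the loader uses these statistics, a batch containing a fully dark red pixel and a fully bright green pixel spans this whole range. That agrees with the `Input range` line printed by the data-loading check.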


In [12]:
LOCAL_DATA = f"{BASE_DIR}/dataset" # path to cifar-10 dataset
train_data, train_labels, val_data, val_labels, test_data, test_labels = load_cifar_10_data(LOCAL_DATA)

In [13]:
# Creates train, test, and val loaders
train_loader, val_loader, test_loader = Loader(train_data, train_labels, val_data, val_labels, test_data, test_labels)

Starting Data Loading...
✅ CUDA available: Tesla T4
   Memory: 15095 MB
📁 Loading datasets...
✅ Datasets loaded successfully
Training samples: 40000
Validation samples: 10000
Batch size: 64
🔍 Testing data loading...
✅ Train batch shape: torch.Size([64, 3, 32, 32]), Labels: torch.Size([64])
   Input range: [-1.989, 2.126]
   Label range: [0, 9]
✅ Val batch shape: torch.Size([64, 3, 32, 32]), Labels: torch.Size([64])


## Training Setup
Identical to baseline: AdamW, lr=0.01 (the training log below reports `LR: 0.001000`), decay_factor=0.01, lr annealing, batch_size=64; the number of epochs is determined by early stopping. Patience is raised to 13 to account for the slower learning caused by regularization.

Only difference: dropout added after dense layers.
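
The patience rule can be sketched as follows. `PatienceStopper` is a hypothetical stand-in, not the `EarlyStopping` class from `src.utils`. It monitors validation loss, whereas the real utility may monitor validation accuracy instead. The loss values in the demo are made up for illustration.

```python
class PatienceStopper:
    """Hypothetical early-stopping helper: stop after `patience` epochs without improvement."""

    def __init__(self, patience=13, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = PatienceStopper(patience=3)  # small patience so the demo stops quickly
illustrative_val_losses = [1.63, 1.50, 1.27, 1.30, 1.29, 1.28, 1.31]  # made-up values

for epoch, loss in enumerate(illustrative_val_losses, start=1):
    if stopper.step(loss):
        print(f"Stopping at epoch {epoch}; best val loss {stopper.best:.2f}")
        break
```

With dropout, validation metrics tend to improve in smaller and noisier steps. A larger patience such as 13 lowers the risk of stopping during a temporary plateau.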

In [None]:
# Per-epoch metrics are appended to this dict during training
history_drop3 = {"train_loss": [], "val_loss": [], "train_acc": [], "val_acc": []}
model_type = "fc0.3"  # tag used for the weights file and the history file
path = f"{DATA_DIR}/weights"
print(path)
base_model = trainModel(drop_3_model, history_drop3, train_loader, val_loader, model_type, path)
saveHistory(history_drop3, model_type) # Saves the training metadata to a json file

/content/drive/MyDrive/regularization-data/weights
Using device: cuda
--------------------------------------------------
Epoch 1/70
Train Loss: 1.9500, Train Acc: 0.2405
Val Loss: 1.6300, Val Acc: 0.3694
Best Val Acc: 0.3694 | LR: 0.001000
Time: 37.6s
--------------------------------------------------
Epoch 2/70
💾 Saved best model!
Train Loss: 1.6107, Train Acc: 0.3901
Val Loss: 1.4965, Val Acc: 0.4278
Best Val Acc: 0.4278 | LR: 0.001000
Time: 37.3s
--------------------------------------------------
Epoch 3/70
💾 Saved best model!
Train Loss: 1.4143, Train Acc: 0.4765
Val Loss: 1.2697, Val Acc: 0.5272
Best Val Acc: 0.5272 | LR: 0.001000
Time: 37.0s
--------------------------------------------------
Epoch 4/70
Training:  48%|████▊     | 303/625 [00:16<00:15, 20.18it/s, loss=1.4080, batch=300/625]

## Plots and Visualizations

In [None]:
weights_path = f"{DATA_DIR}/weights/fc0.3.pth"
fc3model = loadWeights(drop_3_model, weights_path)

In [None]:
visualizations_folder = f"{DATA_DIR}/visualizations/fc0.3"
plotFmaps_and_activationHist(fc3model, visualizations_folder, val_loader, 1)

In [None]:
plotFmaps_and_activationHist(fc3model, visualizations_folder, val_loader, 2)

In [None]:
plotFmaps_and_activationHist(fc3model, visualizations_folder, val_loader, 3)

In [None]:
fc3_data = readJson("fc0.3")

In [None]:
train_losses = fc3_data["train_loss"]
val_losses = fc3_data["val_loss"]
train_accs = fc3_data["train_acc"]
val_accs = fc3_data["val_acc"]
plotCurves(train_losses, val_losses, train_accs, val_accs, "fc0.3")

In [None]:
genError(visualizations_folder, train_losses, val_losses)
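
For reference, a generalization gap can be computed per epoch as validation loss minus training loss. The sketch below uses the three epochs logged above for the p = 0.3 run. The function `generalization_gap` is an illustrative assumption and may differ from what `genError` in `src.utils` actually computes or plots.

```python
# Losses copied from the first three logged epochs of the p = 0.3 run.
logged_train_losses = [1.9500, 1.6107, 1.4143]
logged_val_losses = [1.6300, 1.4965, 1.2697]


def generalization_gap(train, val):
    """Per-epoch gap, defined here as validation loss minus training loss."""
    if len(train) != len(val):
        raise ValueError("histories must have the same length")
    return [v - t for t, v in zip(train, val)]


gaps = generalization_gap(logged_train_losses, logged_val_losses)
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch}: gap = {gap:+.4f}")
```

The gap is negative in these early epochs. A plausible reason is that dropout is active only while training, so the training loss is measured on a handicapped network. Overfitting would show up later as the gap turning positive and widening.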

In [None]:
evalModel(fc3model, visualizations_folder, test_loader)

### Dropout p = 0.5

In [None]:
# p = 0.5 in the model definition.
base_conv_layers = [
    ConvBlock(3, 64, pool=False),
    ConvBlock(64, 64),
    ConvBlock(64, 128, pool=False),
    ConvBlock(128, 128)
]

base_fc_layers = [
    FCBlock(128, 64, True, 0.5),
    torch.nn.Linear(64, 10)
]

# Note: the name drop_3_model is reused here even though this model uses p = 0.5
drop_3_model = MiniCNN(base_conv_layers, base_fc_layers)