# Lesson 30 activity: hyperparameter optimization

In the demo, we used Optuna to optimize just 2 hyperparameters:
1. Number of convolutional blocks (1-4)
2. Dropout rate (0.2-0.5)

## Your Challenge

Extend the optimization to include **additional hyperparameters**. The goal is to explore how different aspects of the model and training process affect performance.

### Things to Consider

Think about the different "knobs" you can tune in a neural network:

- **Architecture choices**: How wide should the layers be? How many neurons in the fully connected layers?
- **Training dynamics**: What about the learning rate? Could the optimizer itself be a choice?
- **Regularization**: Are there other regularization techniques besides dropout?

### Hints

- Look at Optuna's `suggest_*` methods: `suggest_int()`, `suggest_float()`, `suggest_categorical()`
- Some hyperparameters might need to be passed to `create_cnn()`, while others might be used when creating the optimizer
- Be careful about search space size: every added hyperparameter enlarges the space, so more trials are needed to explore it!
- Consider using `suggest_float(..., log=True)` for hyperparameters that span orders of magnitude
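
To see why the last hint matters, here is a minimal standard-library sketch (not Optuna's implementation) that compares uniform sampling with sampling uniformly in log space, which is what `log=True` requests. The range `1e-5` to `1e-2` is borrowed from the learning-rate example later in this notebook; the helper names are made up for illustration.

```python
import math
import random


def sample_uniform(low: float, high: float, rng: random.Random) -> float:
    '''Draw a value uniformly on a linear scale.'''
    return rng.uniform(low, high)


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    '''Draw a value uniformly in log space, then map it back.'''
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def fraction_below(samples: list[float], threshold: float) -> float:
    '''Fraction of samples that fall below the threshold.'''
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)
low, high = 1e-5, 1e-2
n_samples = 10_000

linear = [sample_uniform(low, high, rng) for _ in range(n_samples)]
log_scale = [sample_log_uniform(low, high, rng) for _ in range(n_samples)]

# Share of samples in the two lowest decades (below 1e-3)
print(f'Linear scale: {fraction_below(linear, 1e-3):.1%} below 1e-3')
print(f'Log scale:    {fraction_below(log_scale, 1e-3):.1%} below 1e-3')
```

On a linear scale only about a tenth of the samples land below `1e-3`, while on a log scale roughly two thirds do, so each order of magnitude gets a similar share of the trials.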

### Suggested Starting Points (pick 1-2 to add)

1. **Learning rate**: What range makes sense? (Hint: think logarithmic scale)
2. **Initial filters**: The demo uses 32 - what if you tried 16, 32, or 64?
3. **FC layer sizes**: Could you optimize the fully connected layer dimensions?
4. **Optimizer choice**: Adam vs SGD vs RMSprop - which works best?
5. **Batch size**: Does this affect final accuracy?
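
If you pick batch size, keep in mind that it also changes how many optimizer updates happen in each epoch, so trials that share the same `n_epochs_per_trial` do not receive the same number of updates. A small sketch of the arithmetic, assuming the 40,000-sample training split this notebook creates (80% of CIFAR-10's 50,000 training images) and a few candidate batch sizes chosen only for illustration:

```python
import math


def updates_per_epoch(n_samples: int, batch_size: int) -> int:
    '''Number of batches (optimizer steps) in one pass over the data.'''
    return math.ceil(n_samples / batch_size)


n_train = 40_000  # 80% of the 50,000 CIFAR-10 training images

for batch_size in [64, 256, 1000]:  # illustrative candidates
    steps = updates_per_epoch(n_train, batch_size)
    print(f'batch_size={batch_size:>4}: {steps} updates per epoch')
```

Because the loaders in this notebook are created once with a fixed `batch_size`, tuning it would mean building the `DataLoader` objects inside `objective()` instead.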

---

## Setup

To run this notebook, install `cifar10_tools` via pip: `pip install cifar10_tools`

**Note**: If you are not working in one of the course deeplearning containers, you will also need to pip install `optuna` and `optuna-dashboard` to run this notebook.

```
pip install optuna optuna-dashboard
```

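The Optuna dashboard can be viewed either via the VS Code extension 'Optuna Dashboard' or via the built-in web server. Start the web server with the command below, replacing the database path with the one your study uses (this activity stores its study at the location set in `storage_path`):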

```
optuna-dashboard sqlite:///data/simple_optimization.db --host 0.0.0.0
```

## Setup

In [None]:
# Standard library imports
from pathlib import Path

# Third-party imports
import matplotlib.pyplot as plt
import numpy as np
import optuna
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Custom package imports
from cifar10_tools.pytorch.training import train_model
from cifar10_tools.pytorch.evaluation import evaluate_model
from cifar10_tools.pytorch.plotting import (
    plot_sample_images,
    plot_learning_curves,
    plot_confusion_matrix
)

# Suppress Optuna info messages (show only warnings and errors)
optuna.logging.set_verbosity(optuna.logging.WARNING)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

In [None]:
# Fixed hyperparameters (some of these could become tunable!)
batch_size = 1000
initial_filters = 32
fc_units_1 = 512
fc_units_2 = 128
use_batch_norm = True  # Not currently read by create_cnn(), which always adds batch norm
learning_rate = 0.001

# Optuna settings
n_trials = 30            # You may want more trials with more hyperparameters
n_epochs_per_trial = 20  # Short training per trial
n_epochs_final = 50      # Longer training for final model

# Storage path for Optuna study - use a NEW name for your experiment!
storage_path = Path('../data/activity_optimization.db')
storage_path.parent.mkdir(parents=True, exist_ok=True)
storage_url = f'sqlite:///{storage_path.resolve()}'

# CIFAR-10 class names
class_names = [
    'airplane', 'automobile', 'bird', 'cat', 'deer', 
    'dog', 'frog', 'horse', 'ship', 'truck'
]

num_classes = len(class_names)

## 1. Load and Prepare Data

In [None]:
# Data directory
data_dir = Path('../data')
data_dir.mkdir(parents=True, exist_ok=True)

# Transform: convert to tensor and normalize RGB channels
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Download and load CIFAR-10
train_dataset = torchvision.datasets.CIFAR10(
    root=data_dir,
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.CIFAR10(
    root=data_dir,
    train=False,
    download=True,
    transform=transform
)

print(f'Training samples: {len(train_dataset)}')
print(f'Test samples: {len(test_dataset)}')

In [None]:
# Preload data onto the device (GPU if available) for faster training
X_train_full = torch.stack([img for img, _ in train_dataset]).to(device)
y_train_full = torch.tensor([label for _, label in train_dataset]).to(device)
X_test = torch.stack([img for img, _ in test_dataset]).to(device)
y_test = torch.tensor([label for _, label in test_dataset]).to(device)

# Split training data into train and validation sets (80/20)
n_train = int(0.8 * len(X_train_full))
indices = torch.randperm(len(X_train_full))

X_train = X_train_full[indices[:n_train]]
y_train = y_train_full[indices[:n_train]]
X_val = X_train_full[indices[n_train:]]
y_val = y_train_full[indices[n_train:]]

print(f'X_train: {X_train.shape}')
print(f'X_val: {X_val.shape}')
print(f'X_test: {X_test.shape}')

In [None]:
# Create DataLoaders
train_tensor_dataset = torch.utils.data.TensorDataset(X_train, y_train)
val_tensor_dataset = torch.utils.data.TensorDataset(X_val, y_val)
test_tensor_dataset = torch.utils.data.TensorDataset(X_test, y_test)

train_loader = DataLoader(train_tensor_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_tensor_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_tensor_dataset, batch_size=batch_size, shuffle=False)

print(f'Training batches: {len(train_loader)}')
print(f'Validation batches: {len(val_loader)}')
print(f'Test batches: {len(test_loader)}')

## 2. Define CNN Architecture

**TODO**: Consider modifying this function to accept additional parameters.

For example, you might want to make `initial_filters`, `fc_units_1`, or `fc_units_2` configurable.

*Think about: What would need to change in the function signature and body?*
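
One part of the body that depends on these values is the size of the flattened feature map that feeds the first fully connected layer. The sketch below reproduces that calculation in plain Python, assuming the same structure as `create_cnn()` (filters double in each block, and each block halves the image size with 2x2 max pooling). `flattened_size` is a hypothetical helper, not part of the notebook.

```python
def flattened_size(n_conv_blocks: int, initial_filters: int, image_size: int = 32) -> int:
    '''Number of features entering the first fully connected layer.'''
    # Filters double with every block: initial_filters, 2x, 4x, ...
    final_channels = initial_filters * 2 ** (n_conv_blocks - 1)
    # Each 2x2 max-pool halves the spatial size
    final_size = image_size // 2 ** n_conv_blocks
    return final_channels * final_size * final_size


for filters in [16, 32, 64]:
    sizes = [flattened_size(blocks, filters) for blocks in range(1, 5)]
    print(f'initial_filters={filters}: {sizes}')
```

If `initial_filters` becomes a function argument, every expression that currently reads the global value, including this calculation, has to use the argument instead.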

In [None]:
def create_cnn(n_conv_blocks: int, dropout_rate: float) -> nn.Sequential:
    '''Create a CNN with configurable architecture.
    
    Args:
        n_conv_blocks: Number of convolutional blocks (1-4)
        dropout_rate: Dropout probability
        
        # TODO: Add more parameters here if needed!
    
    Returns:
        nn.Sequential model
    '''

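    # Note: initial_filters, fc_units_1, fc_units_2 and num_classes are
    # read from the notebook-level settings defined in the Setup cells
    layers = []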
    in_channels = 3  # RGB input
    current_size = 32  # Input image size
    
    for block_idx in range(n_conv_blocks):
        out_channels = initial_filters * (2 ** block_idx)
        
        # Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Pool -> Dropout
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.BatchNorm2d(out_channels))
        layers.append(nn.ReLU())
        
        layers.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.BatchNorm2d(out_channels))
        layers.append(nn.ReLU())
        
        layers.append(nn.MaxPool2d(2, 2))
        layers.append(nn.Dropout(dropout_rate))
        
        in_channels = out_channels
        current_size //= 2
    
    # Calculate flattened size
    final_channels = initial_filters * (2 ** (n_conv_blocks - 1))
    flattened_size = final_channels * current_size * current_size
    
    # Classifier (3 fully connected layers)
    layers.append(nn.Flatten())
    layers.append(nn.Linear(flattened_size, fc_units_1))
    layers.append(nn.ReLU())
    layers.append(nn.Dropout(dropout_rate))
    layers.append(nn.Linear(fc_units_1, fc_units_2))
    layers.append(nn.ReLU())
    layers.append(nn.Dropout(dropout_rate))
    layers.append(nn.Linear(fc_units_2, num_classes))
    
    return nn.Sequential(*layers)

## 3. Optuna Hyperparameter Optimization

### 3.1. Training Function for Trials
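
The training function reports validation accuracy after every epoch so that the study's pruner can stop unpromising trials early. This notebook later configures a `MedianPruner`; the following is a simplified standard-library sketch of the median rule for a maximized metric, not Optuna's actual code. It ignores details such as the minimum number of finished trials, and the accuracy values are invented.

```python
from statistics import median


def should_prune(
    current_history: list[float],
    previous_histories: list[list[float]],
    n_warmup_steps: int = 3
) -> bool:
    '''Return True if the running trial looks worse than the median trial.

    Each history holds one validation accuracy per epoch (higher is better).
    '''
    step = len(current_history) - 1

    # Never prune during the warmup epochs
    if step < n_warmup_steps:
        return False

    # Values that earlier trials reported at this same epoch
    peers = [h[step] for h in previous_histories if len(h) > step]

    if not peers:
        return False

    # Compare the trial's best value so far against the peers' median
    return max(current_history) < median(peers)


previous = [
    [35.0, 48.0, 55.0, 60.0, 63.0],
    [30.0, 42.0, 50.0, 56.0, 61.0],
    [40.0, 52.0, 58.0, 62.0, 66.0]
]

print(should_prune([20.0, 25.0, 28.0, 30.0], previous))  # weak trial
print(should_prune([38.0, 50.0, 57.0, 61.0], previous))  # competitive trial
```

With these made-up numbers the weak trial is pruned at the fourth epoch and the competitive one continues, which is why short, cheap epochs early in a trial can save a lot of compute.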

In [None]:
def train_trial(
    model: nn.Module,
    optimizer: optim.Optimizer,
    criterion: nn.Module,
    train_loader: DataLoader,
    val_loader: DataLoader,
    n_epochs: int,
    trial: optuna.Trial
) -> float:
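    '''Train a model for a single Optuna trial with pruning support.

    Reports validation accuracy (%) to the trial after every epoch, raises
    optuna.TrialPruned if the pruner stops the trial, and otherwise returns
    the best validation accuracy seen across all epochs.
    '''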
    
    best_val_accuracy = 0.0
    
    for epoch in range(n_epochs):

        # Training phase
        model.train()

        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
        
        # Validation phase
        model.eval()
        val_correct = 0
        val_total = 0
        
        with torch.no_grad():
            for images, labels in val_loader:

                outputs = model(images)
                predicted = outputs.argmax(dim=1)
                val_total += labels.size(0)
                val_correct += (predicted == labels).sum().item()
        
        val_accuracy = 100 * val_correct / val_total
        best_val_accuracy = max(best_val_accuracy, val_accuracy)
        
        # Report for pruning
        trial.report(val_accuracy, epoch)
        
        if trial.should_prune():
            raise optuna.TrialPruned()
    
    return best_val_accuracy

### 3.2. Define Objective Function

**TODO**: This is where you add your new hyperparameters!

Currently we optimize:
- `n_conv_blocks`: 1 to 4 convolutional blocks
- `dropout_rate`: 0.2 to 0.5

**Your task**: Add at least one or two more hyperparameters to optimize.

#### Useful Optuna methods:
```python
# For integers (e.g., number of filters, layer sizes)
trial.suggest_int('param_name', low, high)

# For floats (e.g., dropout, learning rate)
trial.suggest_float('param_name', low, high)

# For floats on log scale (great for learning rates!)
trial.suggest_float('param_name', low, high, log=True)

# For categorical choices (e.g., optimizer type)
trial.suggest_categorical('param_name', ['option1', 'option2', 'option3'])
```

#### Example additions you might try:
```python
# Optimize learning rate (log scale is important here!)
lr = trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True)

# Optimize initial filter count
init_filters = trial.suggest_categorical('initial_filters', [16, 32, 64])

# Optimize optimizer choice
optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'SGD', 'RMSprop'])
```
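
If you choose the optimizer as a hyperparameter, the suggested name has to be turned into an actual optimizer object. One possible way to do that, sketched under the assumption that only the learning rate is passed and every other optimizer setting keeps its PyTorch default, is a small lookup table. `build_optimizer` is a hypothetical helper, and the tiny linear model only stands in for the CNN.

```python
import torch.nn as nn
import torch.optim as optim


def build_optimizer(name: str, model: nn.Module, lr: float) -> optim.Optimizer:
    '''Create an optimizer from its name (hypothetical helper).'''
    optimizers = {
        'Adam': optim.Adam,
        'SGD': optim.SGD,
        'RMSprop': optim.RMSprop
    }

    if name not in optimizers:
        raise ValueError(f'Unknown optimizer: {name}')

    return optimizers[name](model.parameters(), lr=lr)


# Stand-in model, just to have some parameters to optimize
model = nn.Linear(4, 2)
optimizer = build_optimizer('SGD', model, lr=0.01)
print(type(optimizer).__name__)
```

Inside `objective()`, the name and learning rate would come from `trial.suggest_categorical()` and `trial.suggest_float()` as in the examples above. Keep in mind that a learning rate range that suits Adam may not suit plain SGD.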

In [None]:
def objective(trial: optuna.Trial) -> float:
    '''Optuna objective function.
    
    TODO: Add more hyperparameters to optimize!
    '''
    
    # === Existing hyperparameters ===
    n_conv_blocks = trial.suggest_int('n_conv_blocks', 1, 4)
    dropout_rate = trial.suggest_float('dropout_rate', 0.2, 0.5)
    
    # === TODO: Add your new hyperparameters here! ===
    # Example:
    # lr = trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True)
    
    
    # Create model
    model = create_cnn(
        n_conv_blocks=n_conv_blocks,
        dropout_rate=dropout_rate
        # TODO: Pass any new architecture parameters here
    ).to(device)
    
    # Create optimizer
    # TODO: If you're optimizing learning rate or optimizer type,
    #       you'll need to modify this section!
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    # Set loss function
    criterion = nn.CrossEntropyLoss()
    
    # Train and return validation accuracy
    try:
        return train_trial(
            model=model,
            optimizer=optimizer,
            criterion=criterion,
            train_loader=train_loader,
            val_loader=val_loader,
            n_epochs=n_epochs_per_trial,
            trial=trial
        )

    except torch.cuda.OutOfMemoryError:

        torch.cuda.empty_cache()
        raise optuna.TrialPruned('CUDA OOM')

### 3.3. Run Optimization

**Note**: With more hyperparameters, you may want to increase `n_trials` for better exploration of the search space.
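
For a rough feel of how quickly the search space grows, the sketch below counts the combinations in a hypothetical search space where each continuous range is coarsened to a handful of values. The hyperparameters and counts are illustrative assumptions. Optuna's default sampler does not enumerate a grid, so this is only an intuition aid, not a rule for choosing `n_trials`.

```python
import math

# Hypothetical number of distinct settings per hyperparameter
demo_space = {
    'n_conv_blocks': 4,  # 1, 2, 3, 4
    'dropout_rate': 4    # e.g. 0.2, 0.3, 0.4, 0.5
}

extended_space = {
    **demo_space,
    'learning_rate': 4,    # e.g. 1e-5, 1e-4, 1e-3, 1e-2
    'initial_filters': 3,  # 16, 32, 64
    'optimizer': 3         # Adam, SGD, RMSprop
}

for name, space in [('demo', demo_space), ('extended', extended_space)]:
    n_combinations = math.prod(space.values())
    print(f'{name}: {n_combinations} combinations')
```

Three extra hyperparameters multiply the coarse count from 16 to 576, which is why the 30 trials used here cover a much smaller share of the extended space.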

In [None]:
%%time
    
# Create Optuna study (maximize validation accuracy)
study = optuna.create_study(
    direction='maximize',
    study_name='activity_optimization',
    storage=storage_url,
    load_if_exists=True,
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=3)
)

# Run optimization
study.optimize(objective, n_trials=n_trials, show_progress_bar=True)

print(f'Best validation accuracy: {study.best_trial.value:.2f}%')
print('\nBest hyperparameters:')

for key, value in study.best_trial.params.items():
    print(f'  {key}: {value}')

print()

### 3.4. Visualize Optimization Results

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Optimization history
axes[0].set_title('Optimization History')

trial_numbers = [t.number for t in study.trials if t.value is not None]
trial_values = [t.value for t in study.trials if t.value is not None]

axes[0].plot(trial_numbers, trial_values, 'ko-', alpha=0.6)
axes[0].axhline(y=study.best_value, color='r', linestyle='--', label=f'Best: {study.best_value:.2f}%')
axes[0].set_xlabel('Trial')
axes[0].set_ylabel('Validation Accuracy (%)')
axes[0].legend()

# Hyperparameter importance
axes[1].set_title('Hyperparameter Importance')
completed_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]

if len(completed_trials) >= 5:
    importance = optuna.importance.get_param_importances(study)
    params = list(importance.keys())
    values = list(importance.values())
    axes[1].barh(params, values, color='steelblue')
    axes[1].set_xlabel('Importance')

else:
    axes[1].text(0.5, 0.5, 'Not enough trials\nfor importance analysis', 
                 ha='center', va='center', transform=axes[1].transAxes)

plt.tight_layout()
plt.show()

## 4. Train Final Model with Best Hyperparameters

**TODO**: Update this section to use any new hyperparameters you added!
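
One possible pattern, sketched here with a made-up `best_params` dictionary, is to read each tuned value with `dict.get()` and fall back to the fixed value from the Setup cell when a hyperparameter was not part of the search. The key names match the examples in section 3.2 but are assumptions about what your study contains.

```python
# Fixed defaults, as defined near the top of the notebook
learning_rate = 0.001
initial_filters = 32

# Stand-in for study.best_trial.params (values invented for illustration)
best_params = {
    'n_conv_blocks': 3,
    'dropout_rate': 0.3,
    'learning_rate': 0.0005
}

# Use the tuned value when present, otherwise the fixed default
final_lr = best_params.get('learning_rate', learning_rate)
final_filters = best_params.get('initial_filters', initial_filters)

print(f'Learning rate: {final_lr}')
print(f'Initial filters: {final_filters}')
```

This keeps the final-training cell working whether or not a given hyperparameter was included in the study.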

In [None]:
# Get best hyperparameters
best_params = study.best_trial.params

print('Best hyperparameters:')
for key, value in best_params.items():
    print(f'  {key}: {value}')

# Create model with best hyperparameters
# TODO: Pass any new parameters you added!
best_model = create_cnn(
    n_conv_blocks=best_params['n_conv_blocks'],
    dropout_rate=best_params['dropout_rate']
).to(device)

# Create optimizer
# TODO: Use best learning rate / optimizer if you optimized those!
best_optimizer = optim.Adam(best_model.parameters(), lr=learning_rate)

# Set loss function
criterion = nn.CrossEntropyLoss()

In [None]:
%%time

# Train for more epochs
history = train_model(
    model=best_model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=best_optimizer,
    epochs=n_epochs_final,
    print_every=10
)

In [None]:
# Plot learning curves
fig, axes = plot_learning_curves(history)
plt.show()

## 5. Evaluate on Test Set

In [None]:
test_accuracy, predictions, true_labels = evaluate_model(best_model, test_loader)
print(f'Test accuracy: {test_accuracy:.2f}%')

In [None]:
# Confusion matrix
fig, ax = plot_confusion_matrix(true_labels, predictions, class_names)
plt.show()

## 6. Reflection Questions

After completing your optimization, answer these questions:

1. **Which hyperparameters did you add?** Why did you choose those?

2. **Which hyperparameter was most important** according to Optuna's importance analysis?

3. **Did adding more hyperparameters improve your best accuracy** compared to the demo's 2-parameter search?

4. **What challenges did you encounter** when expanding the search space?

5. **If you had more time/compute, what else would you try?**

*Your answers here:*
