# Building & Training Neural Networks

## Part 2 Exercises: nn.Module, Loss Functions & Training Loops

Build a banknote forgery detector using the Banknote Authentication dataset.

**Dataset Info:**
- 1372 samples, each extracted from an image of a banknote
- 4 features: the variance, skewness, and kurtosis of the wavelet-transformed image, plus the entropy of the image
- Binary classification: 0 = authentic, 1 = forged

**Instructions:**
- Complete the code in each cell where you see `# YOUR CODE HERE`
- Run the assertion cells to check your work
- Each exercise builds on concepts from the slides

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

print(f"PyTorch version: {torch.__version__}")

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

---
## Loading the Dataset

First, let's load and explore the Banknote Authentication dataset.

In [None]:
# Load the banknote dataset
data = np.loadtxt('../data/data_banknote_authentication.txt', delimiter=',')

X = data[:, :4]  # Features: variance, skewness, kurtosis, entropy
y = data[:, 4]   # Labels: 0 = authentic, 1 = forged

print(f"Dataset shape: {X.shape}")
print(f"Features: variance, skewness, kurtosis, entropy")
print(f"Class distribution: {np.bincount(y.astype(int))}")
print(f"\nFirst 5 samples:")
print(X[:5])

In [None]:
# Split into train (64%), validation (16%), and test (20%) sets:
# hold out 20% for testing, then 20% of the remaining 80% for validation
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.2, random_state=42)

print(f"Train set: {X_train.shape[0]} samples")
print(f"Val set: {X_val.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
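
Because the second split takes 20% of the remaining 80%, the final proportions are 64/16/20 (0.8 × 0.2 = 0.16). A minimal sketch that checks the resulting sizes by running the same two-stage split on an index array with as many rows as the dataset; scikit-learn rounds the held-out fraction up, which is why the counts are not exact percentages:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Index array standing in for the 1372 banknote samples
indices = np.arange(1372)

# Same two-stage split as above: 20% test, then 20% of the rest for validation
temp_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=42)
train_idx, val_idx = train_test_split(temp_idx, test_size=0.2, random_state=42)

sizes = (len(train_idx), len(val_idx), len(test_idx))
print(sizes)  # (877, 220, 275)

# Every sample lands in exactly one split
all_idx = np.concatenate([train_idx, val_idx, test_idx])
print(len(np.unique(all_idx)) == len(indices))  # True
```
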

In [None]:
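# Standardize features to zero mean and unit variance (important for neural networks!).
# Fit the scaler on the training set only, so validation/test statistics don't leak into training.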
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)  # Shape: (N, 1)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.float32).unsqueeze(1)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

print(f"X_train_t shape: {X_train_t.shape}")
print(f"y_train_t shape: {y_train_t.shape}")
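
`StandardScaler` computes `z = (x - mean) / std` per feature, using the mean and (population) standard deviation of whatever it was fitted on. A small sketch on synthetic data (the feature means and scales below are made up, not the banknote statistics) showing that `transform` applies the training-set statistics to other splits:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for 4 features; NOT the real banknote data
X_tr = rng.normal(loc=[1.0, -2.0, 3.0, 0.5], scale=[2.0, 5.0, 0.5, 1.0], size=(200, 4))
X_va = rng.normal(loc=[1.0, -2.0, 3.0, 0.5], scale=[2.0, 5.0, 0.5, 1.0], size=(50, 4))

demo_scaler = StandardScaler().fit(X_tr)

# Statistics come from the training split only (std with ddof=0)
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)

manual_va = (X_va - mu) / sigma
print(np.allclose(demo_scaler.transform(X_va), manual_va))  # True

# The training split itself ends up with mean ~0 and std ~1 per feature
Z_tr = demo_scaler.transform(X_tr)
print(np.allclose(Z_tr.mean(axis=0), 0.0), np.allclose(Z_tr.std(axis=0), 1.0))  # True True
```

The validation split will generally not have exactly zero mean and unit variance after scaling, which is expected: it is transformed as if it were unseen data.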

---
## Exercise 1: Build a Binary Classifier

Create a neural network for binary classification using `nn.Module`.

### 1.1 Define the Model

Create a `BanknoteClassifier` class that inherits from `nn.Module` with:
- Input layer: 4 features
- Hidden layer: 32 neurons with ReLU activation
- Output layer: 1 neuron with no activation (`BCEWithLogitsLoss` applies the sigmoid internally)

In [None]:
class BanknoteClassifier(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        # YOUR CODE HERE
        # Define layers: fc1 (4 -> hidden_size), fc2 (hidden_size -> 1)
        ...
    
    def forward(self, x):
        # YOUR CODE HERE
        # Pass through fc1, apply ReLU, pass through fc2
        ...

# Create model instance
model = BanknoteClassifier(hidden_size=32)
print(model)

In [None]:
# Verification
test_input = torch.randn(5, 4)
test_output = model(test_input)
assert test_output.shape == torch.Size([5, 1]), f"Output shape should be (5, 1), got {test_output.shape}"
print("1.1 Passed!")

### 1.2 Count Parameters

Write a function to count the total number of trainable parameters in the model.

In [None]:
def count_parameters(model):
    """Count total trainable parameters in a model."""
    # YOUR CODE HERE
    # Hint: use model.parameters() and p.numel()
    ...

num_params = count_parameters(model)
print(f"Total parameters: {num_params}")

In [None]:
# Verification
# fc1: 4*32 + 32 = 160, fc2: 32*1 + 1 = 33, total = 193
assert num_params == 193, f"Expected 193 parameters, got {num_params}"
print("1.2 Passed!")
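
The 193 above follows from how `nn.Linear(in_features, out_features)` stores its parameters: a weight matrix of shape `(out_features, in_features)` and a bias of shape `(out_features,)`, computing `x @ W.T + b`. A short sketch that inspects two standalone layers with the same sizes as `fc1` and `fc2` (`layer1`/`layer2` are illustrative names, not part of the exercise):

```python
import torch
import torch.nn as nn

layer1 = nn.Linear(4, 32)   # same shape as fc1
layer2 = nn.Linear(32, 1)   # same shape as fc2

print(layer1.weight.shape, layer1.bias.shape)  # torch.Size([32, 4]) torch.Size([32])
print(layer2.weight.shape, layer2.bias.shape)  # torch.Size([1, 32]) torch.Size([1])

n1 = layer1.weight.numel() + layer1.bias.numel()  # 4*32 + 32 = 160
n2 = layer2.weight.numel() + layer2.bias.numel()  # 32*1 + 1 = 33
print(n1, n2, n1 + n2)  # 160 33 193

# The layer's forward pass matches the explicit affine formula
x = torch.randn(5, 4)
manual = x @ layer1.weight.T + layer1.bias
print(torch.allclose(layer1(x), manual))  # True
```
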

---
## Exercise 2: Training Loop

Implement a complete training loop with loss tracking.

### 2.1 Set up Loss and Optimizer

Create:
- Loss function: `BCEWithLogitsLoss` (combines a sigmoid and binary cross-entropy in a single, more numerically stable step)
- Optimizer: Adam with learning rate 0.001
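
To see why the combined loss is preferred, compare it with applying `torch.sigmoid` and then `nn.BCELoss`. For moderate logits the two agree; for an extreme, confidently wrong logit, `sigmoid` rounds to exactly 1.0 in float32, and `BCELoss` falls back to its documented clamp of `log` outputs at -100. A sketch with hand-picked logits:

```python
import torch
import torch.nn as nn

stable = nn.BCEWithLogitsLoss()
naive = nn.BCELoss()

# Moderate logits: both formulations give the same loss
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])
moderate_logits = torch.tensor([-1.5, 0.3, 2.0, 0.8])
agree = torch.allclose(stable(moderate_logits, targets),
                       naive(torch.sigmoid(moderate_logits), targets))
print(agree)  # True

# Extreme logit with the wrong target: sigmoid(200.0) == 1.0 in float32,
# so BCELoss sees log(0) and clamps it, capping the loss at 100
big_logit = torch.tensor([200.0])
wrong_target = torch.tensor([0.0])
stable_loss = stable(big_logit, wrong_target).item()
naive_loss = naive(torch.sigmoid(big_logit), wrong_target).item()
print(stable_loss)  # 200.0
print(naive_loss)   # 100.0
```

The stable version still reports the true loss (and a useful gradient), while the two-step version silently saturates.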

In [None]:
# Re-initialize model for fresh training
model = BanknoteClassifier(hidden_size=32)

# YOUR CODE HERE
criterion = ...  # BCEWithLogitsLoss
optimizer = ...  # Adam with lr=0.001

print(f"Loss function: {criterion}")
print(f"Optimizer: {optimizer}")

In [None]:
# Verification
assert isinstance(criterion, nn.BCEWithLogitsLoss), "Use BCEWithLogitsLoss"
assert isinstance(optimizer, optim.Adam), "Use Adam optimizer"
print("2.1 Passed!")

### 2.2 Implement the Training Loop

Train the model for 100 epochs:
- Track train and validation loss each epoch
- Remember: `model.train()` for training, `model.eval()` for validation
- Use `torch.no_grad()` during validation

In [None]:
num_epochs = 100
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    # YOUR CODE HERE
    # Training phase
    model.train()
    ...  # zero_grad, forward, loss, backward, step
    train_loss = ...  # Store the loss as a Python float with loss.item()
    
    # Validation phase
    model.eval()
    with torch.no_grad():
        ...  # Forward pass on validation data
        val_loss = ...  # Store the loss as a Python float with loss.item()
    
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    
    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1:3d}: Train Loss = {train_loss:.4f}, Val Loss = {val_loss:.4f}")

print(f"\nFinal Train Loss: {train_losses[-1]:.4f}")
print(f"Final Val Loss: {val_losses[-1]:.4f}")

In [None]:
# Verification
assert len(train_losses) == 100, "Should have 100 epochs of training loss"
assert len(val_losses) == 100, "Should have 100 epochs of validation loss"
assert train_losses[-1] < train_losses[0], "Training loss should decrease"
print("2.2 Passed!")

### 2.3 Plot Training Progress

Visualize the training and validation loss over epochs.

In [None]:
# YOUR CODE HERE
# Plot train_losses and val_losses
plt.figure(figsize=(10, 5))
...  # Plot both losses with labels
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

---
## Exercise 3: Hyperparameter Tuning

Compare different model configurations.

### 3.1 Compare Learning Rates

Train models with different learning rates and compare their convergence.

In [None]:
learning_rates = [0.01, 0.001, 0.0001]
results = {}

for lr in learning_rates:
    # YOUR CODE HERE
    # Create new model, criterion, and optimizer for each learning rate
    model = BanknoteClassifier(hidden_size=32)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = ...  # Adam with current lr
    
    losses = []
    for epoch in range(50):  # Shorter training for comparison
        model.train()
        optimizer.zero_grad()
        outputs = model(X_train_t)
        loss = criterion(outputs, y_train_t)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    
    results[lr] = losses
    print(f"LR = {lr}: Final Loss = {losses[-1]:.4f}")

In [None]:
# Plot comparison
plt.figure(figsize=(10, 5))
for lr, losses in results.items():
    plt.plot(losses, label=f'LR = {lr}')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Learning Rate Comparison')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

### 3.2 Compare Hidden Layer Sizes

Train models with different hidden layer sizes.
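
For this architecture (4 inputs, one hidden layer of size `h`, 1 output) the parameter count grows linearly with `h`: `(4*h + h) + (h*1 + 1) = 6*h + 1`. A quick check of that formula, using an `nn.Sequential` stand-in with the same layers as `BanknoteClassifier`:

```python
import torch.nn as nn


def mlp_param_count(hidden_size):
    # Stand-in for BanknoteClassifier: Linear(4 -> h), ReLU, Linear(h -> 1)
    net = nn.Sequential(nn.Linear(4, hidden_size), nn.ReLU(), nn.Linear(hidden_size, 1))
    return sum(p.numel() for p in net.parameters())


counts = {h: mlp_param_count(h) for h in [16, 32, 64]}
for h, n in counts.items():
    print(h, n, 6 * h + 1)
# 16 97 97
# 32 193 193
# 64 385 385
```

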

In [None]:
hidden_sizes = [16, 32, 64]
size_results = {}

for hidden_size in hidden_sizes:
    # YOUR CODE HERE
    # Create model with current hidden_size
    model = ...  # BanknoteClassifier with hidden_size
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    losses = []
    for epoch in range(50):
        model.train()
        optimizer.zero_grad()
        outputs = model(X_train_t)
        loss = criterion(outputs, y_train_t)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    
    size_results[hidden_size] = losses
    print(f"Hidden Size = {hidden_size}: Final Loss = {losses[-1]:.4f}, Params = {count_parameters(model)}")

In [None]:
# Plot comparison
plt.figure(figsize=(10, 5))
for size, losses in size_results.items():
    plt.plot(losses, label=f'Hidden = {size}')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Hidden Layer Size Comparison')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

---
## Exercise 4: Evaluation

Evaluate your model on the test set and compute metrics.

### 4.1 Train Final Model

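Train a final model with your best hyperparameters from Exercise 3. The cell below uses `hidden_size=32` and `lr=0.001`; if you change `hidden_size`, also update the `BanknoteClassifier(hidden_size=32)` call in the 4.5 verification cell so the saved weights can be loaded.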

In [None]:
# Train final model
final_model = BanknoteClassifier(hidden_size=32)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(final_model.parameters(), lr=0.001)

num_epochs = 100
for epoch in range(num_epochs):
    final_model.train()
    optimizer.zero_grad()
    outputs = final_model(X_train_t)
    loss = criterion(outputs, y_train_t)
    loss.backward()
    optimizer.step()

print(f"Training complete. Final loss: {loss.item():.4f}")

### 4.2 Make Predictions

Get predictions on the test set. Remember:
- Use `model.eval()` and `torch.no_grad()`
- Apply sigmoid to convert logits to probabilities
- Threshold at 0.5 to get binary predictions
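
Since `sigmoid(z) >= 0.5` exactly when `z >= 0`, thresholding probabilities at 0.5 is mathematically the same as thresholding the raw logits at 0 (up to floating-point rounding for logits extremely close to zero). A sketch with a few made-up logits, chosen away from zero:

```python
import torch

# Hypothetical logits, not model outputs
demo_logits = torch.tensor([-2.0, -0.1, 0.05, 0.3, 5.0])
demo_probs = torch.sigmoid(demo_logits)

from_probs = (demo_probs >= 0.5).float()
from_logits = (demo_logits >= 0).float()

print(from_probs)                            # tensor([0., 0., 1., 1., 1.])
print(torch.equal(from_probs, from_logits))  # True
```

Going through `sigmoid` is still useful when you want calibrated-looking probabilities or a threshold other than 0.5.
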

In [None]:
# YOUR CODE HERE
final_model.eval()
with torch.no_grad():
    # Get logits
    logits = ...  # Forward pass on X_test_t
    # Convert to probabilities using sigmoid
    probs = ...  # torch.sigmoid(logits)
    # Convert to binary predictions (threshold = 0.5)
    predictions = ...  # (probs >= 0.5).float()

# Convert to numpy for sklearn metrics
y_true = y_test_t.numpy().flatten()
y_pred = predictions.numpy().flatten()

print(f"Sample predictions: {y_pred[:10]}")
print(f"Sample ground truth: {y_true[:10]}")

### 4.3 Calculate Metrics

Compute accuracy, precision, recall, and F1 score.
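
With forged (1) as the positive class, the metrics come straight from the counts of true/false positives and negatives: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A worked sketch on a small made-up label vector, checked against scikit-learn:

```python
import math

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Small made-up example: 1 = forged, 0 = authentic
demo_true = [1, 1, 1, 1, 0, 0, 0, 0]
demo_pred = [1, 1, 1, 0, 1, 1, 0, 0]

pairs = list(zip(demo_true, demo_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # 3
fp = sum(t == 0 and p == 1 for t, p in pairs)  # 2
fn = sum(t == 1 and p == 0 for t, p in pairs)  # 1
tn = sum(t == 0 and p == 0 for t, p in pairs)  # 2

acc = (tp + tn) / len(pairs)               # 5/8 = 0.625
prec = tp / (tp + fp)                      # 3/5 = 0.6
rec = tp / (tp + fn)                       # 3/4 = 0.75
f1_manual = 2 * prec * rec / (prec + rec)  # 0.9 / 1.35, about 0.667

print(math.isclose(acc, accuracy_score(demo_true, demo_pred)))         # True
print(math.isclose(prec, precision_score(demo_true, demo_pred)))       # True
print(math.isclose(rec, recall_score(demo_true, demo_pred)))           # True
print(math.isclose(f1_manual, f1_score(demo_true, demo_pred)))         # True
```
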

In [None]:
# YOUR CODE HERE
accuracy = ...  # accuracy_score(y_true, y_pred)
precision = ...  # precision_score(y_true, y_pred)
recall = ...  # recall_score(y_true, y_pred)
f1 = ...  # f1_score(y_true, y_pred)

print(f"Test Set Metrics:")
print(f"  Accuracy:  {accuracy:.4f}")
print(f"  Precision: {precision:.4f}")
print(f"  Recall:    {recall:.4f}")
print(f"  F1 Score:  {f1:.4f}")

In [None]:
# Verification
assert accuracy > 0.9, f"Accuracy should be > 90%, got {accuracy:.2%}"
print("4.3 Passed! Your model achieves good performance.")

### 4.4 Plot Confusion Matrix

Visualize the confusion matrix to understand model errors.
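
For binary labels `{0, 1}`, scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so the layout is `[[TN, FP], [FN, TP]]`; this is why the cell below reads true positives from `cm[1, 1]`. A sketch on the same kind of made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = forged, 0 = authentic
demo_true = [1, 1, 1, 1, 0, 0, 0, 0]
demo_pred = [1, 1, 1, 0, 1, 1, 0, 0]

demo_cm = confusion_matrix(demo_true, demo_pred)
print(demo_cm)
# [[2 2]
#  [1 3]]

# Row-major unpacking gives the four counts in this order
tn, fp, fn, tp = demo_cm.ravel()
print(tn, fp, fn, tp)  # 2 2 1 3
```
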

In [None]:
# YOUR CODE HERE
cm = ...  # confusion_matrix(y_true, y_pred)

plt.figure(figsize=(8, 6))
plt.imshow(cm, interpolation='nearest', cmap='Blues')
plt.title('Confusion Matrix')
plt.colorbar()
plt.xlabel('Predicted Label')
plt.ylabel('True Label')

# Add text annotations
for i in range(2):
    for j in range(2):
        plt.text(j, i, str(cm[i, j]), ha='center', va='center', fontsize=20)

plt.xticks([0, 1], ['Authentic (0)', 'Forged (1)'])
plt.yticks([0, 1], ['Authentic (0)', 'Forged (1)'])
plt.tight_layout()
plt.show()

print(f"True Positives (correctly identified forged): {cm[1, 1]}")
print(f"True Negatives (correctly identified authentic): {cm[0, 0]}")
print(f"False Positives (authentic marked as forged): {cm[0, 1]}")
print(f"False Negatives (forged marked as authentic): {cm[1, 0]}")

### 4.5 Save the Model

Save your trained model's state dict.
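
A state dict is an ordered mapping from parameter names (derived from the attribute names assigned in `__init__`) to tensors. It stores weights only, not the class definition, which is why the verification cell rebuilds a `BanknoteClassifier` before calling `load_state_dict`. A round-trip sketch using a hypothetical `TinyNet` and an in-memory buffer instead of a file:

```python
import io

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical one-layer model, used only to illustrate state dicts."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def forward(self, x):
        return self.layer(x)


src = TinyNet()
state = src.state_dict()
print(list(state.keys()))  # ['layer.weight', 'layer.bias']

# Save to an in-memory buffer instead of a .pth file
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)

# Loading needs a fresh instance of the same class: the state dict holds tensors only
dst = TinyNet()
dst.load_state_dict(torch.load(buffer, weights_only=True))

x = torch.randn(2, 4)
with torch.no_grad():
    same = torch.equal(src(x), dst(x))
print(same)  # True
```
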

In [None]:
# YOUR CODE HERE
# Save the model state dict
...  # torch.save(final_model.state_dict(), 'banknote_classifier.pth')

print("Model saved!")

In [None]:
# Verification: Load and test the saved model
loaded_model = BanknoteClassifier(hidden_size=32)
loaded_model.load_state_dict(torch.load('banknote_classifier.pth', weights_only=True))
loaded_model.eval()

with torch.no_grad():
    loaded_outputs = loaded_model(X_test_t)
    loaded_preds = (torch.sigmoid(loaded_outputs) >= 0.5).float()

assert torch.equal(predictions, loaded_preds), "Loaded model should give same predictions"
print("4.5 Passed! Model saved and loaded successfully.")

---
## Bonus Exercises

Great work completing the main exercises! The following bonus sections let you explore more advanced topics and apply what you've learned to a classic machine learning benchmark.

---
## Bonus 1: Add Dropout

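Dropout is a regularization technique that helps prevent overfitting: during training, it randomly zeroes each activation with probability `p` and scales the surviving activations by `1/(1-p)`; in evaluation mode it does nothing. Create an improved model with dropout.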
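
A sketch of `nn.Dropout` on its own, applied to a vector of ones so the zeroing and rescaling are easy to see:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)

drop.train()
out_train_mode = drop(ones)
# Each element is either zeroed or scaled by 1 / (1 - 0.5) = 2
print(sorted(set(out_train_mode.tolist())))  # typically [0.0, 2.0]
print((out_train_mode == 0).float().mean())  # roughly 0.5

drop.eval()
out_eval_mode = drop(ones)
print(torch.equal(out_eval_mode, ones))  # True: identity in eval mode
```

The rescaling keeps the expected activation the same in both modes, so the model needs no adjustment at evaluation time.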

In [None]:
class ImprovedClassifier(nn.Module):
    def __init__(self, hidden_size=32, dropout_rate=0.2):
        super().__init__()
        # YOUR CODE HERE
        # Add dropout layer between hidden and output
        ...
    
    def forward(self, x):
        # YOUR CODE HERE
        ...

# Test the improved model
improved_model = ImprovedClassifier(hidden_size=64, dropout_rate=0.3)
print(improved_model)

---
## Congratulations!

You've completed the main Part 2 exercises. You now have hands-on experience with:

1. **Building Models** - Creating neural networks with `nn.Module`
2. **Training Loops** - Implementing the forward-backward-update cycle
3. **Hyperparameter Tuning** - Comparing learning rates and architectures
4. **Evaluation** - Computing metrics and analyzing model performance
5. **Saving Models** - Persisting trained models for later use

You've built a banknote forgery detector! For one more challenge, Bonus 2 below applies the same ideas to real image data by training a handwritten digit classifier on MNIST with `DataLoader`s.

---
## Bonus 2: MNIST Digit Classifier

Ready for a bigger challenge? MNIST is often called the "Hello World" of machine learning: a dataset of 70,000 handwritten digits (60,000 for training, 10,000 for testing) that has been used to benchmark algorithms for decades.

This exercise brings together everything you've learned:
- Building a multi-class classifier (10 digits)
- Using `DataLoader` for batched training
- Working with image data

**The Challenge:**
- 28x28 grayscale images (784 pixels when flattened)
- 10 classes (digits 0-9)
- Use `CrossEntropyLoss` for multi-class classification
- **Target: >95% accuracy**
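
Compared with the banknote task, two things change: each image must be flattened from `(1, 28, 28)` to 784 values, and `CrossEntropyLoss` expects raw logits of shape `(N, 10)` with integer class indices of shape `(N,)` as targets (not float targets of shape `(N, 1)`). A shape-checking sketch on random tensors, with a single placeholder `nn.Linear` standing in for your model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Random stand-ins for one MNIST batch (no download needed)
images = torch.randn(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))  # int64 class indices, shape (N,)

flat = nn.Flatten()(images)  # keeps the batch dim, flattens the rest
print(flat.shape)  # torch.Size([64, 784])

# Placeholder model: 784 inputs -> 10 raw logits (no softmax)
placeholder = nn.Linear(784, 10)
class_logits = placeholder(flat)
print(class_logits.shape)  # torch.Size([64, 10])

# CrossEntropyLoss applies log-softmax internally
ce = nn.CrossEntropyLoss()(class_logits, labels)
manual = F.nll_loss(F.log_softmax(class_logits, dim=1), labels)
print(torch.allclose(ce, manual))  # True

# Predicted digit = index of the largest logit
pred_digits = class_logits.argmax(dim=1)
print(pred_digits.shape)  # torch.Size([64])
```
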

In [None]:
# Load MNIST dataset
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define transforms: convert to tensor and normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

# Download and load training/test data
train_dataset = datasets.MNIST('../data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('../data', train=False, transform=transform)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Image shape: {train_dataset[0][0].shape}")

In [None]:
# YOUR CODE HERE: Define the MNIST classifier
class MNISTClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Define your layers here
        ...
    
    def forward(self, x):
        # Implement forward pass (hint: flatten the 28x28 image first)
        ...

mnist_model = MNISTClassifier()
print(mnist_model)

In [None]:
# YOUR CODE HERE: Train the MNIST classifier
# Set up loss function and optimizer, then train for a few epochs
# Hint: Use CrossEntropyLoss and iterate over train_loader

...

In [None]:
# YOUR CODE HERE: Evaluate on test set and print accuracy
# Target: >95% accuracy

...
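
When evaluating over a `DataLoader`, one reasonable pattern is to sum correct predictions and sample counts across batches and divide once at the end. Averaging per-batch accuracies instead overweights the last batch whenever it is smaller than `batch_size` (10,000 test images with `batch_size=64` leave a final batch of 16). A sketch with synthetic "logits" (one-hot vectors, not model outputs) arranged so that only the short final batch is wrong:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Synthetic setup: 150 samples, 10 classes; batch_size=64 gives batches of 64, 64, 22
labels = torch.arange(150) % 10
predicted = labels.clone()
predicted[128:] = (labels[128:] + 1) % 10  # make the short final batch all wrong
fake_logits = F.one_hot(predicted, num_classes=10).float()

loader = DataLoader(TensorDataset(fake_logits, labels), batch_size=64, shuffle=False)

correct, total, batch_accs = 0, 0, []
for batch_logits, batch_labels in loader:
    batch_correct = (batch_logits.argmax(dim=1) == batch_labels).sum().item()
    correct += batch_correct
    total += batch_labels.size(0)
    batch_accs.append(batch_correct / batch_labels.size(0))

print(total, correct)                     # 150 128
print(correct / total)                    # about 0.853 (correct overall accuracy)
print(sum(batch_accs) / len(batch_accs))  # about 0.667 (skewed by the short batch)
```
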

In [None]:
# Optional: Visualize some predictions
# Show a grid of test images with predicted vs actual labels

...