# Lab 2: The Very Basic Basics of Neural Networks

## 🎓 SOLUTIONS

**This notebook contains:**
- ✅ Complete solutions to all exercises
- ✅ Teaching notes and common student questions
- ✅ Expected results and benchmarks
- ✅ Additional experiments and extensions
- ✅ Grading rubrics and assessment criteria

## 📋 Teaching Notes

### Learning Objectives
1. Understand neural network structure (input, hidden, output layers)
2. Learn PyTorch's nn.Module paradigm
3. Understand training loop mechanics (forward, loss, backward, step)
4. Appreciate the role of activation functions
5. Develop intuition for hyperparameter selection
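Objective 3 can be previewed before any PyTorch appears. The sketch below is a plain-Python stand-in, not the lab's code: there is no autograd, so the gradients are derived by hand, and it fits a toy one-feature model y = w·x + b with mean squared error. Each step is labeled with the PyTorch call it imitates; the data and learning rate are made up for illustration.

```python
# Plain-Python stand-in for a PyTorch training loop (gradients derived by hand).
# Toy data generated from y = 2x + 1; every value here is illustrative.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0   # the "model parameters"
lr = 0.05         # learning rate
n = len(xs)

for epoch in range(500):
    grad_w, grad_b = 0.0, 0.0                                # optimizer.zero_grad()
    preds = [w * x + b for x in xs]                          # forward: outputs = model(images)
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n  # loss = criterion(outputs, labels)
    for x, p, y in zip(xs, preds, ys):                       # loss.backward(): MSE gradients
        grad_w += 2 * (p - y) * x / n
        grad_b += 2 * (p - y) / n
    w -= lr * grad_w                                         # optimizer.step() (plain SGD)
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}, final loss = {loss:.2e}")
```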

### Time Allocation (90 min lab)
- 0-10 min: Setup and imports
- 10-30 min: Simple model walkthrough
- 30-50 min: Improved models (hidden layer, ReLU)
- 50-90 min: Student experimentation

### Common Student Questions
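1. **"Why zero_grad()?"** → Gradients accumulate by default in PyTorch: each `backward()` call adds to `.grad` rather than overwriting it
2. **"What's the difference between `forward()` and `__call__()`?"** → `__call__()` runs any registered hooks and then calls `forward()`, so always write `model(x)` rather than `model.forward(x)`
3. **"Why `view(-1, 784)`?"** → It flattens each batch from (batch_size, 1, 28, 28) to (batch_size, 784); the lab code uses the equivalent `x.view(x.size(0), -1)`
4. **"What's a good accuracy?"** → >95% is good for this lab
5. **"Why is my loss NaN?"** → Usually the learning rate is too high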
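Question 3 comes down to how `view` fills in a single `-1`: that dimension is inferred from the total number of elements. The helper below is a hypothetical plain-Python imitation of the rule (not PyTorch code), showing why `view(-1, 784)` and `view(x.size(0), -1)` give the same shape, including for the final test batch, which holds 16 images because 10,000 is not a multiple of 32.

```python
from math import prod

def infer_view_shape(old_shape, new_shape):
    """Hypothetical imitation of how Tensor.view resolves a single -1 dimension."""
    numel = prod(old_shape)
    if new_shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in new_shape if d != -1)
    if -1 in new_shape:
        if numel % known != 0:
            raise ValueError(f"shape {new_shape} is invalid for input of size {numel}")
        return tuple(numel // known if d == -1 else d for d in new_shape)
    if known != numel:
        raise ValueError(f"shape {new_shape} is invalid for input of size {numel}")
    return tuple(new_shape)

batch = (32, 1, 28, 28)                          # what the DataLoader yields with batch_size=32
print(infer_view_shape(batch, (-1, 784)))        # (32, 784)
print(infer_view_shape(batch, (batch[0], -1)))   # (32, 784)
last_test_batch = (16, 1, 28, 28)                # 10,000 = 312 * 32 + 16
print(infer_view_shape(last_test_batch, (last_test_batch[0], -1)))  # (16, 784)
```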

### Expected Student Performance
- **A students:** Will achieve >97% accuracy, try multiple architectures, understand trade-offs
- **B students:** Will achieve ~95% accuracy, complete all required exercises
- **C students:** Will achieve ~92% accuracy, may struggle with experimentation
- **Struggling students:** May not complete experimentation section, need help with PyTorch syntax

## Setup

In [None]:
"""
Computer Vision Course - Lab 2: Neural Networks
INSTRUCTOR VERSION

This cell sets up the environment.
Works automatically for both local and Google Colab!
"""

import os
import sys

# Detect environment
IN_COLAB = 'google.colab' in sys.modules

print("=" * 60)
print("Computer Vision - Lab 2 Setup (INSTRUCTOR VERSION)")
print("=" * 60)

if IN_COLAB:
    print("\n🔵 Running on Google Colab")
    print("-" * 60)
    
    if not os.path.exists('computer-vision'):
        print("📥 Cloning repository...")
        !git clone https://github.com/mjck/computer-vision.git
        print("✓ Repository cloned successfully")
    else:
        print("✓ Repository already exists")
    
    %cd computer-vision/labs/lab02_neural_networks
    print(f"✓ Current directory: {os.getcwd()}")
    
    sys.path.insert(0, '/content/computer-vision')
    print("✓ Python path configured")
    
    print("-" * 60)
    print("🟢 Colab setup complete!\n")
    
else:
    print("\n🟢 Running locally")
    print("-" * 60)
    print(f"✓ Current directory: {os.getcwd()}")
    
    repo_root = os.path.abspath('../..')
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    print(f"✓ Repository root: {repo_root}")
    
    print("-" * 60)
    print("🟢 Local setup complete!\n")

print("=" * 60)
print("✅ Environment ready!")
print("=" * 60)

## Import Libraries

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import time

# Import course utilities
try:
    from sdx import *
    print("✓ Imported course utilities (sdx module)")
except ImportError as e:
    print("❌ Error importing sdx module")
    print(f"   {e}")
    raise

# Set device (GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✓ Using device: {device}")

if device.type == 'cuda':
    print(f"  GPU: {torch.cuda.get_device_name(0)}")

print("\n" + "=" * 60)
print("✅ All libraries imported successfully!")
print("=" * 60)

## Loading and Displaying the MNIST Dataset

In [None]:
# Download and load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Extract images and labels as numpy arrays for visualization
train_images = train_dataset.data.numpy()
train_labels = train_dataset.targets.numpy()
test_images = test_dataset.data.numpy()
test_labels = test_dataset.targets.numpy()

print(f"Training set: {len(train_images)} images")
print(f"Test set: {len(test_images)} images")
print(f"Image shape: {train_images[0].shape}")

In [None]:
# Visualizations
cv_imshow(train_images[9])
print(f"Label: {train_labels[9]}")

In [None]:
# Grid view with labels
cv_gridshow(train_images, start=10, stop=35, labels=train_labels)

## Building Neural Networks

### 🎓 Teaching Notes:
- Emphasize that this first model has NO hidden layers
- It is equivalent to multinomial (softmax) logistic regression: one linear layer, with the softmax applied inside `nn.CrossEntropyLoss`
- Expected accuracy: ~92%
- This establishes a baseline
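Concretely, the baseline produces 10 raw scores (logits) per image, and `nn.CrossEntropyLoss` turns them into a loss by applying log-softmax and picking out the true class, which is why `forward()` returns logits with no softmax. A plain-Python sketch of that computation for one example (the logits are invented; subtracting the maximum is the usual trick to avoid overflow):

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed from raw logits."""
    m = max(logits)                                        # subtract max for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]                    # = -log softmax(logits)[target]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

uniform = [0.0] * 10                 # a model with no preference between classes
print(cross_entropy(uniform, 3))     # log(10), about 2.3026
confident = [0.0] * 10
confident[3] = 10.0                  # large logit on the correct class
print(cross_entropy(confident, 3))   # close to 0
print(sum(softmax(confident)))       # probabilities sum to (about) 1
```

So a freshly initialized 10-class model should start with a loss of roughly log(10) ≈ 2.30; a first-batch loss far from that is worth a second look.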

In [None]:
# Simple model: 784 → 10 (no hidden layers)
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(28 * 28, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten
        x = self.fc(x)
        return x

model = SimpleNN().to(device)
print(model)

total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params:,}")
print(f"Expected: {(784 * 10 + 10):,}")  # weights + biases

In [None]:
# Setup training
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Prepare data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [None]:
# Training function (reusable)
def train_model(model, train_loader, criterion, optimizer, epochs=1):
    model.train()
    
    for epoch in range(epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        
        pbar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs}")
        
        # Count batches with enumerate: tqdm only refreshes pbar.n periodically,
        # so pbar.n + 1 is not a reliable divisor for the running average.
        for batch_idx, (images, labels) in enumerate(pbar, start=1):
            images, labels = images.to(device), labels.to(device)
            
            optimizer.zero_grad()               # clear gradients from the previous batch
            outputs = model(images)             # forward pass
            loss = criterion(outputs, labels)   # compute the loss
            loss.backward()                     # backpropagate
            optimizer.step()                    # update the weights
            
            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            pbar.set_postfix({
                'loss': f'{running_loss/batch_idx:.4f}',
                'acc': f'{100*correct/total:.2f}%'
            })
        
        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100 * correct / total
        print(f"Epoch {epoch+1}: Loss = {epoch_loss:.4f}, Accuracy = {epoch_acc:.2f}%")

def evaluate_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    accuracy = 100 * correct / total
    print(f"Test Accuracy: {accuracy:.2f}%")
    return accuracy

def plot_confusion_matrix(model, test_loader):
    from sklearn.metrics import confusion_matrix
    import seaborn as sns
    
    model.eval()
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.numpy())
    
    cm = confusion_matrix(all_labels, all_preds)
    
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=range(10), yticklabels=range(10))
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

In [None]:
# Train simple model
print("\n" + "=" * 60)
print("Training Simple Model (784 → 10)")
print("Expected accuracy: ~92%")
print("=" * 60 + "\n")

train_model(model, train_loader, criterion, optimizer, epochs=1)
simple_acc = evaluate_model(model, test_loader)
plot_confusion_matrix(model, test_loader)

print(f"\n📊 Result: {simple_acc:.2f}% (Target: ~92%)")

## Improved Model with Hidden Layer

### 🎓 Teaching Notes:
- Adding a hidden layer makes the network deeper, but there is still no activation function
- Without an activation, multiple linear layers collapse to one linear layer, so this model is still a linear classifier
- Expected accuracy: about the same as the baseline (~92%), since a linear classifier tops out at roughly 92% on MNIST
- Students may see a small change from the baseline, but it comes from optimization differences, not extra expressive power

In [None]:
class ImprovedNN(nn.Module):
    def __init__(self, hidden_size=128):
        super(ImprovedNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, hidden_size)
        self.fc2 = nn.Linear(hidden_size, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.fc2(x)
        return x

model = ImprovedNN(hidden_size=128).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
print("\n" + "=" * 60)
print("Training Improved Model (784 → 128 → 10)")
print("Expected accuracy: ~92% (still a linear model)")
print("=" * 60 + "\n")

train_model(model, train_loader, criterion, optimizer, epochs=1)
improved_acc = evaluate_model(model, test_loader)
plot_confusion_matrix(model, test_loader)

print(f"\n📊 Result: {improved_acc:.2f}% (Target: ~92%)")
print(f"Improvement: {improved_acc - simple_acc:+.2f}%")

## Model with ReLU Activation

### 🎓 Teaching Notes:
- ReLU introduces non-linearity
- Without it, stacked linear layers collapse to a single linear transformation
- Expected accuracy: ~97%
- **Key point:** This is where students see the real power of deep learning!
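The collapse of stacked linear layers is easy to verify numerically, since W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2) is itself one linear layer. A plain-Python sketch with tiny hand-picked matrices (3 inputs, 2 hidden units, 2 outputs, all values arbitrary) standing in for the 784 → 128 → 10 model:

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # (A @ B)[i][j] = sum over k of A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Two stacked linear layers, 3 -> 2 -> 2, with no activation in between
W1 = [[1.0, -2.0, 0.5], [0.0, 1.0, 3.0]]
b1 = [0.1, -0.4]
W2 = [[2.0, 1.0], [-1.0, 0.5]]
b2 = [0.3, 0.0]

# The equivalent single layer
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

x = [0.7, -1.2, 2.0]
two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)
one_layer = add(matvec(W, x), b)
print(two_layers, one_layer)   # equal up to floating-point rounding
```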

In [None]:
class NNWithReLU(nn.Module):
    def __init__(self, hidden_size=128):
        super(NNWithReLU, self).__init__()
        self.fc1 = nn.Linear(28 * 28, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu(x)  # Non-linearity!
        x = self.fc2(x)
        return x

model = NNWithReLU(hidden_size=128).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
print("\n" + "=" * 60)
print("Training Model with ReLU (784 → 128 → ReLU → 10)")
print("Expected accuracy: ~97%")
print("=" * 60 + "\n")

train_model(model, train_loader, criterion, optimizer, epochs=1)
relu_acc = evaluate_model(model, test_loader)
plot_confusion_matrix(model, test_loader)

print(f"\n📊 Result: {relu_acc:.2f}% (Target: ~97%)")
print(f"Improvement over simple: {relu_acc - simple_acc:+.2f}%")
print(f"Improvement over no activation: {relu_acc - improved_acc:+.2f}%")

## 📊 SOLUTION: Student Experimentation Section

### Multiple Solution Examples

Below are several example solutions showing different approaches students might take.

### Solution 1: Deeper Network

**Teaching Notes:**
- Shows diminishing returns with more layers
- Good students will try this
- Expected: ~97-98%
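Parameter counts are worth checking by hand: an `nn.Linear(in, out)` layer has in·out weights plus out biases, while ReLU and dropout add none. A short sketch applying that formula to the layer widths used in this lab (the helper name is ours, not a PyTorch function):

```python
def linear_params(sizes):
    """Total weights + biases for a stack of nn.Linear layers with the given widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

print(f"{linear_params([784, 10]):,}")                # 7,850   (SimpleNN)
print(f"{linear_params([784, 128, 10]):,}")           # 101,770 (one hidden layer)
print(f"{linear_params([784, 256, 128, 64, 10]):,}")  # 242,762 (DeeperNN)
print(f"{linear_params([784, 256, 128, 10]):,}")      # 235,146 (DropoutNN)
```

The first layer alone accounts for 200,960 of DeeperNN's 242,762 parameters, which is typical of fully connected networks on raw pixels.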

In [None]:
class DeeperNN(nn.Module):
    """Solution 1: Deeper network with 3 hidden layers"""
    def __init__(self):
        super(DeeperNN, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(256, 128)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(128, 64)
        self.relu3 = nn.ReLU()
        self.fc4 = nn.Linear(64, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.relu1(self.fc1(x))
        x = self.relu2(self.fc2(x))
        x = self.relu3(self.fc3(x))
        x = self.fc4(x)
        return x

model = DeeperNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

print("Solution 1: Deeper Network (784→256→128→64→10)")
print(model)
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

train_model(model, train_loader, criterion, optimizer, epochs=1)
deeper_acc = evaluate_model(model, test_loader)
print(f"\n📊 Accuracy: {deeper_acc:.2f}%")

### Solution 2: Using Adam Optimizer

**Teaching Notes:**
- Adam often trains faster than SGD
- Advanced students will discover this
- Typically reaches higher accuracy in same number of epochs
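For the curious: Adam keeps running averages of the gradient (m) and of its square (v), corrects their bias toward zero at startup, and scales each step by m̂ / (√v̂ + ε). Below is a plain-Python sketch of that update rule minimizing the toy function f(w) = (w − 3)². The defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8 match `torch.optim.Adam`'s defaults; lr = 0.1 (instead of the lab's 0.001) and the target function are illustrative choices so the toy converges quickly.

```python
import math

def adam_minimize(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with the Adam update rule (plain-Python sketch)."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g          # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # second moment: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction (m and v start at 0)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
print(adam_minimize(lambda w: 2 * (w - 3), w0=0.0))
```

Because each step is divided by √v̂, its size is roughly lr regardless of how large the raw gradient is, which is one reason Adam is less sensitive to the learning rate than plain SGD.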

In [None]:
class AdamNN(nn.Module):
    """Solution 2: Same architecture but with Adam optimizer"""
    def __init__(self):
        super(AdamNN, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = AdamNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam instead of SGD!

print("Solution 2: Adam Optimizer")
print("Same architecture as before, but using Adam optimizer")
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

train_model(model, train_loader, criterion, optimizer, epochs=1)
adam_acc = evaluate_model(model, test_loader)
print(f"\n📊 Accuracy: {adam_acc:.2f}%")
print(f"Comparison to SGD: {adam_acc - relu_acc:+.2f}%")

### Solution 3: Adding Dropout

**Teaching Notes:**
- Dropout helps prevent overfitting
- Don't expect a big gain here: after one epoch on MNIST the model has barely begun to overfit, so dropout has little to correct
- Good to introduce the concept
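`nn.Dropout(p)` uses inverted dropout: in training mode each activation is zeroed with probability p and the survivors are scaled by 1 / (1 − p), so the expected activation is unchanged; in eval mode it is the identity. A plain-Python sketch of that behavior, where the `training` flag is a stand-in for `model.train()` / `model.eval()`:

```python
import random

def dropout(values, p, training, rng=random):
    """Inverted dropout (sketch): zero each value with probability p during training
    and scale the survivors by 1 / (1 - p); return the input unchanged in eval mode."""
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)            # assumes 0 <= p < 1
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)                 # fixed seed so the run is repeatable
ones = [1.0] * 100_000
train_out = dropout(ones, p=0.2, training=True, rng=rng)
print(sum(train_out) / len(train_out))            # close to 1.0: expected value preserved
print(train_out.count(0.0) / len(train_out))      # close to 0.2: fraction zeroed
print(dropout([1.0, 2.0], p=0.2, training=False)) # [1.0, 2.0]
```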

In [None]:
class DropoutNN(nn.Module):
    """Solution 3: Adding dropout for regularization"""
    def __init__(self):
        super(DropoutNN, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(0.2)
        self.fc2 = nn.Linear(256, 128)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(0.2)
        self.fc3 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

model = DropoutNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

print("Solution 3: With Dropout (20%)")
print(model)
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

train_model(model, train_loader, criterion, optimizer, epochs=1)
dropout_acc = evaluate_model(model, test_loader)
print(f"\n📊 Accuracy: {dropout_acc:.2f}%")

### Solution 4: Training for Multiple Epochs

**Teaching Notes:**
- Simple but effective!
- Shows the value of more training
- Can reach 98%+ with enough epochs
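It helps to translate epochs into optimizer steps. With the 60,000-image training set and `batch_size=32`, one epoch is 1,875 updates, so five epochs give 9,375; the test loader yields 313 batches, the last holding 16 images. A quick check, where `math.ceil` mirrors a DataLoader with the default `drop_last=False` (the helper name is ours):

```python
import math

def batches_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields per pass over the data."""
    return n_samples // batch_size if drop_last else math.ceil(n_samples / batch_size)

train_steps = batches_per_epoch(60_000, 32)
print(train_steps)                     # 1875
print(5 * train_steps)                 # 9375 optimizer.step() calls over 5 epochs
print(batches_per_epoch(10_000, 32))   # 313 (the last batch has 10_000 - 312 * 32 = 16 images)
```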

In [None]:
model = NNWithReLU(hidden_size=128).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

print("Solution 4: Training for 5 Epochs")
print("Same architecture, just more training time")

train_model(model, train_loader, criterion, optimizer, epochs=5)
multi_epoch_acc = evaluate_model(model, test_loader)
plot_confusion_matrix(model, test_loader)

print(f"\n📊 Accuracy after 5 epochs: {multi_epoch_acc:.2f}%")

### Solution 5: Different Activation Functions

**Teaching Notes:**
- Comparison of ReLU, LeakyReLU, Tanh
- ReLU usually wins for this problem
- Good experiment for curious students
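One common explanation for ReLU's edge: its derivative is exactly 1 for positive inputs, while the sigmoid's derivative never exceeds 0.25, so gradients shrink as they pass back through sigmoid layers. A plain-Python sketch of the derivatives of the four activations compared in Solution 5 (the LeakyReLU slope of 0.01 matches PyTorch's default `negative_slope`):

```python
import math

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, negative_slope=0.01):
    return 1.0 if x > 0 else negative_slope

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25, reached at x = 0

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # at most 1.0, reached at x = 0

for x in [-3.0, 0.0, 3.0]:
    print(f"x={x:+.1f}  relu'={relu_grad(x):.2f}  leaky_relu'={leaky_relu_grad(x):.2f}  "
          f"sigmoid'={sigmoid_grad(x):.3f}  tanh'={tanh_grad(x):.3f}")

# Through 5 sigmoid layers, the activation derivatives alone contribute at most 0.25 ** 5
print(0.25 ** 5)   # 0.0009765625
```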

In [None]:
class FlexibleNN(nn.Module):
    """Solution 5: Flexible model to test different activations"""
    def __init__(self, activation='relu'):
        super(FlexibleNN, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'leaky_relu':
            self.activation = nn.LeakyReLU()
        elif activation == 'tanh':
            self.activation = nn.Tanh()
        elif activation == 'sigmoid':
            self.activation = nn.Sigmoid()
        else:
            raise ValueError(f"Unknown activation: {activation!r}")
        
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.activation(self.fc1(x))
        x = self.fc2(x)
        return x

results = {}

for activation_name in ['relu', 'leaky_relu', 'tanh', 'sigmoid']:
    print(f"\nTesting {activation_name}...")
    model = FlexibleNN(activation=activation_name).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    train_model(model, train_loader, criterion, optimizer, epochs=1)
    acc = evaluate_model(model, test_loader)
    results[activation_name] = acc

print("\n" + "=" * 60)
print("Activation Function Comparison:")
for name, acc in sorted(results.items(), key=lambda x: x[1], reverse=True):
    print(f"  {name:15s}: {acc:.2f}%")
print("=" * 60)

### Solution 6: Best Performing Model

**Teaching Notes:**
- Combines best practices
- Should achieve 98%+
- This is what A students should aim for
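BatchNorm1d is the least familiar ingredient here. In training mode it normalizes each feature with the current batch's mean and biased variance, then applies a learnable scale γ and shift β; in eval mode it uses running averages collected during training instead, which is one more reason `model.eval()` matters. A plain-Python sketch of the training-mode forward pass for a single feature, with γ = 1, β = 0 and ε = 1e-5 (PyTorch's default `eps`); the feature values are made up:

```python
import math

def batchnorm_train(column, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm for ONE feature across the batch (sketch)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n   # biased variance (divide by n)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in column]

feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # one hidden unit over a batch of 8
out = batchnorm_train(feature)
mean_out = sum(out) / len(out)
var_out = sum((y - mean_out) ** 2 for y in out) / len(out)
print(mean_out, var_out)   # about 0, and just under 1 because eps slightly shrinks the variance
```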

In [None]:
class BestNN(nn.Module):
    """
    Solution 6: Best performing model combining multiple techniques:
    - Deeper architecture
    - Batch normalization
    - Dropout
    - Adam optimizer
    - Multiple epochs
    """
    def __init__(self):
        super(BestNN, self).__init__()
        self.fc1 = nn.Linear(784, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(0.3)
        
        self.fc2 = nn.Linear(512, 256)
        self.bn2 = nn.BatchNorm1d(256)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(0.3)
        
        self.fc3 = nn.Linear(256, 128)
        self.bn3 = nn.BatchNorm1d(128)
        self.relu3 = nn.ReLU()
        self.dropout3 = nn.Dropout(0.2)
        
        self.fc4 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.dropout1(x)
        
        x = self.fc2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.dropout2(x)
        
        x = self.fc3(x)
        x = self.bn3(x)
        x = self.relu3(x)
        x = self.dropout3(x)
        
        x = self.fc4(x)
        return x

model = BestNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

print("Solution 6: Best Performing Model")
print("Features: Deep architecture + BatchNorm + Dropout + Adam")
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")

print("\nTraining for 5 epochs...")
train_model(model, train_loader, criterion, optimizer, epochs=5)
best_acc = evaluate_model(model, test_loader)
plot_confusion_matrix(model, test_loader)

print(f"\n🏆 Best Model Accuracy: {best_acc:.2f}%")
print(f"Improvement over baseline: {best_acc - simple_acc:+.2f}%")

## 📊 Summary Comparison

### All Results at a Glance

In [None]:
import pandas as pd

# Compile all results
results_summary = pd.DataFrame([
    {'Model': 'Simple (784→10)', 'Parameters': '7,850', 'Epochs': 1, 'Optimizer': 'SGD', 'Accuracy': simple_acc},
    {'Model': 'Hidden Layer (784→128→10)', 'Parameters': '101,770', 'Epochs': 1, 'Optimizer': 'SGD', 'Accuracy': improved_acc},
    {'Model': 'With ReLU', 'Parameters': '101,770', 'Epochs': 1, 'Optimizer': 'SGD', 'Accuracy': relu_acc},
    {'Model': 'Deeper Network', 'Parameters': '242,762', 'Epochs': 1, 'Optimizer': 'SGD', 'Accuracy': deeper_acc},
    {'Model': 'Adam Optimizer', 'Parameters': '101,770', 'Epochs': 1, 'Optimizer': 'Adam', 'Accuracy': adam_acc},
    {'Model': 'With Dropout', 'Parameters': '235,146', 'Epochs': 1, 'Optimizer': 'Adam', 'Accuracy': dropout_acc},
    {'Model': 'Multi-Epoch (5)', 'Parameters': '101,770', 'Epochs': 5, 'Optimizer': 'Adam', 'Accuracy': multi_epoch_acc},
    {'Model': 'Best Model', 'Parameters': '569,226', 'Epochs': 5, 'Optimizer': 'Adam', 'Accuracy': best_acc},
])

results_summary = results_summary.sort_values('Accuracy', ascending=False)
print("\n" + "=" * 80)
print("FINAL RESULTS SUMMARY")
print("=" * 80)
print(results_summary.to_string(index=False))
print("=" * 80)

# Visualize
plt.figure(figsize=(12, 6))
plt.barh(results_summary['Model'], results_summary['Accuracy'])
plt.xlabel('Test Accuracy (%)')
plt.title('Model Comparison - MNIST Classification')
plt.xlim(90, 100)
for i, acc in enumerate(results_summary['Accuracy']):  # don't reuse `model` as a loop variable
    plt.text(acc, i, f' {acc:.2f}%', va='center')
plt.tight_layout()
plt.show()

## 🎓 Grading Rubric

### Total: 100 points

#### Completion (40 points)
- All cells executed: 10 points
- Simple model trained: 10 points
- Hidden layer model trained: 10 points
- ReLU model trained: 10 points

#### Experimentation (30 points)
- Tried at least 3 different architectures: 15 points
- Documented what was tried: 10 points
- Compared results: 5 points

#### Analysis (20 points)
- Explained observations: 10 points
- Discussed what worked/didn't work: 10 points

#### Performance (10 points)
- >92% accuracy: 5 points (baseline)
- >95% accuracy: 7 points (good)
- >97% accuracy: 10 points (excellent)

### Bonus (up to 10 points)
- Achieved >98% accuracy: +5 points
- Implemented novel architecture: +3 points
- Created visualization/analysis: +2 points

### Common Deductions
- Didn't run all cells: -10 points
- No experimentation section: -30 points
- No analysis: -20 points
- Plagiarism: 0 points + academic integrity violation

## 🔍 Expected Student Mistakes

### 1. Forgetting `model.train()` / `model.eval()`
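- **Symptom:** Inconsistent results; dropout and batch normalization misbehave, because both act differently in training and evaluation mode
- **Fix:** Call `model.train()` before training and `model.eval()` before evaluation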

### 2. Not calling `optimizer.zero_grad()`
- **Symptom:** Loss doesn't decrease, gradients explode
- **Fix:** Clear gradients before each backward pass
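A plain-Python toy (not PyTorch; the gradient is derived by hand) shows why forgetting to clear gradients breaks training. It imitates `backward()` by adding into a stored `grad`, the same way PyTorch's `.grad` accumulates. With clearing, the single weight converges to its target; without it, the stale sum keeps pushing the weight back and forth and it never settles.

```python
def train(zero_grad, steps=200, lr=0.1):
    """Fit w so that w * 1.0 is close to 2.0, optionally clearing the gradient each step."""
    w, grad = 0.0, 0.0
    history = []
    for _ in range(steps):
        if zero_grad:
            grad = 0.0                 # what optimizer.zero_grad() does
        grad += 2 * (w * 1.0 - 2.0)    # "backward()": ADDS d/dw (w - 2)^2 into grad
        w -= lr * grad                 # optimizer.step() for plain SGD
        history.append(w)
    return history

with_zero = train(zero_grad=True)
without_zero = train(zero_grad=False)
print(with_zero[-1])                                     # about 2.0
print(min(without_zero[-20:]), max(without_zero[-20:]))  # still swinging widely around 2.0
```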

### 3. Wrong device placement
- **Symptom:** "Expected all tensors to be on the same device"
- **Fix:** Move both model and data to same device

### 4. Learning rate too high
- **Symptom:** Loss becomes NaN, doesn't converge
- **Fix:** Reduce learning rate (try 0.001, 0.0001)
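Why a too-large learning rate ends in NaN: on the toy function f(w) = w², one gradient step gives w ← w − lr·2w = (1 − 2·lr)·w, which shrinks only when 0 < lr < 1. A plain-Python sketch (toy function, made-up learning rates) shows the overshoot doubling every step until the floats overflow to infinity and then become NaN, the same failure that shows up as `loss = nan` in the training loop:

```python
import math

def gradient_descent(lr, steps, w=1.0):
    """Plain gradient descent on f(w) = w * w, whose gradient is 2 * w."""
    for _ in range(steps):
        w = w - lr * (2 * w)
    return w

print(gradient_descent(lr=0.1, steps=100))   # shrinks toward 0
w_big = gradient_descent(lr=1.5, steps=1100)
print(w_big, math.isnan(w_big))              # nan True: |w| doubles, overflows to inf, then inf - inf = nan
```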

### 5. No activation function
- **Symptom:** Adding layers doesn't help
- **Fix:** Add ReLU or other activation
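What an activation buys, in the smallest case: the network below (weights chosen by hand for illustration, not learned) computes |x| = ReLU(x) + ReLU(-x) with two hidden units. No single linear layer a·x + b can do this, because matching |1| = |-1| = 1 forces a = 0 and b = 1, which gives 1 instead of 0 at x = 0. Removing the ReLU from the same weights collapses the output to zero for every input.

```python
def relu(z):
    return max(0.0, z)

def tiny_relu_net(x):
    # Hidden layer: weights [1, -1], biases [0, 0]; output layer: weights [1, 1], bias 0
    h1 = relu(1.0 * x)
    h2 = relu(-1.0 * x)
    return 1.0 * h1 + 1.0 * h2

for x in [-2.0, -0.5, 0.0, 1.5]:
    print(x, tiny_relu_net(x))   # matches abs(x)

# Same weights without the ReLU: x + (-x) = 0 for every input
print([1.0 * x + 1.0 * (-x) for x in [-2.0, 1.5]])   # [0.0, 0.0]
```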

### 6. Wrong input shape
- **Symptom:** "RuntimeError: mat1 and mat2 shapes cannot be multiplied"
- **Fix:** Remember to flatten: `x.view(x.size(0), -1)`

## 📚 Additional Resources for Students

### For struggling students:
- [PyTorch 60-Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
- [3Blue1Brown: But what is a neural network?](https://www.youtube.com/watch?v=aircAruvnKk)

### For advanced students:
- Try Fashion-MNIST dataset
- Implement learning rate scheduling
- Add data augmentation
- Try different architectures (CNN preview)
- Implement early stopping
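Early stopping, from the list above, fits in a few lines. The class below is a hypothetical helper (not part of PyTorch): it tracks the best validation loss seen so far and signals a stop once `patience` epochs pass without an improvement larger than `min_delta`. In the lab it would be called once per epoch with a validation loss; the lab's helpers do not compute one, so that part is assumed, and the loss values here are invented.

```python
class EarlyStopping:
    """Hypothetical helper: stop when the validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0          # improvement: reset the counter
        else:
            self.bad_epochs += 1         # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
val_losses = [0.40, 0.30, 0.25, 0.26, 0.27, 0.24]   # made-up values
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        print(f"Stopping after epoch {epoch}; best loss {stopper.best:.2f}")
        break
```

Here training stops after epoch 5, two epochs after the best loss of 0.25; in practice one would also save the model weights whenever `best` improves and reload them at the end.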

### Office Hours Topics:
- Debugging PyTorch code
- Understanding backpropagation
- Choosing hyperparameters
- Project ideas using neural networks