# Module 3 - Exercise 2: Essential Layers

## Learning Objectives
- Understand and implement dropout for regularization
- Work with embedding layers for categorical data
- Implement skip connections (residual blocks)
- Combine essential layers in a complete model
- Train and evaluate models with various layer types

## Test Framework Setup

In [None]:
# Clone the test repository
!git clone https://github.com/racousin/data_science_practice.git /tmp/tests 2>/dev/null || true

# Import required modules
import sys
sys.path.append('/tmp/tests/tests/python_deep_learning')

# Import the improved test utilities
from test_utils import NotebookTestRunner, create_inline_test
from module3.test_exercise2 import Exercise2Validator, EXERCISE2_SECTIONS

# Create test runner and validator
test_runner = NotebookTestRunner("module3", 2)
validator = Exercise2Validator()

## Environment Setup

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import matplotlib.pyplot as plt
from typing import List, Tuple

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check CUDA availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

## Section 1: Dropout Layer

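Dropout is a regularization technique that, during training, randomly sets each element of its input to 0 with probability p. This helps reduce overfitting. The key point is that dropout behaves differently in training and evaluation modes, so the mode must be set explicitly with `.train()` or `.eval()`.

During training:
- Each element is zeroed independently with probability p
- The remaining elements are scaled by 1/(1-p), so each activation keeps the same expected value

During evaluation:
- No elements are dropped, and the layer passes its input through unchanged
- No scaling is needed, because the training-time scaling already preserved the expected activations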
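To make the training-time behavior concrete, here is a minimal hand-written sketch of inverted dropout: zero each element with probability p, then scale the survivors by 1/(1-p). The function `inverted_dropout` is a hypothetical helper for illustration only; in real code you would use `nn.Dropout`. The last lines compare against `nn.Dropout`, which is in training mode when first created.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def inverted_dropout(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Hypothetical helper illustrating inverted dropout (use nn.Dropout in real code)."""
    if not training or p == 0.0:
        return x                      # evaluation mode: identity, no rescaling
    if p == 1.0:
        return torch.zeros_like(x)    # every element dropped
    # Keep each element independently with probability (1 - p)
    keep_mask = (torch.rand_like(x) >= p).to(x.dtype)
    # Scale the survivors by 1/(1-p) so the expected value matches the input
    return x * keep_mask / (1.0 - p)

x = torch.ones(1000, 100)
y_train = inverted_dropout(x, p=0.5, training=True)
y_eval = inverted_dropout(x, p=0.5, training=False)

print(f"Fraction zeroed:   {(y_train == 0).float().mean().item():.3f}")  # close to 0.5
print(f"Values present:    {torch.unique(y_train).tolist()}")            # [0.0, 2.0]
print(f"Mean after drop:   {y_train.mean().item():.3f}")                 # close to 1.0
print(f"Eval is identity:  {torch.equal(y_eval, x)}")                    # True

# nn.Dropout is in training mode by default and applies the same 1/(1-p) scaling
nn_out = nn.Dropout(p=0.5)(x)
print(f"nn.Dropout values: {torch.unique(nn_out).tolist()}")             # [0.0, 2.0]
```

With p=0.5, every surviving 1.0 becomes 2.0, so the average over many elements stays close to the original value of 1.0.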

In [None]:
# TODO: Create a Dropout layer with probability 0.5
dropout_layer = None

# Create sample input
sample_input = torch.randn(10, 5)

# TODO: Apply dropout in training mode
# Set the layer to training mode and apply dropout
output_train = None

# TODO: Apply dropout in evaluation mode
# Set the layer to evaluation mode and apply dropout
output_eval = None

# Display results
print(f"Input shape: {sample_input.shape}")
print(f"Output train shape: {output_train.shape if output_train is not None else 'Not implemented'}")
print(f"Output eval shape: {output_eval.shape if output_eval is not None else 'Not implemented'}")
if output_train is not None:
    print(f"\nZeros in training output: {(output_train == 0).sum().item()} out of {output_train.numel()}")
if output_eval is not None:
    print(f"Zeros in eval output: {(output_eval == 0).sum().item()} out of {output_eval.numel()}")

In [None]:
# TODO: Create a simple neural network with dropout layers
# The model should have:
# - Linear layer (input_dim=10, output_dim=20)
# - ReLU activation
# - Dropout (p=0.5)
# - Linear layer (input_dim=20, output_dim=10)
# - ReLU activation
# - Dropout (p=0.5)
# - Linear layer (input_dim=10, output_dim=2)

model_with_dropout = None

# Test the model
if model_with_dropout is not None:
    test_input = torch.randn(5, 10)
    model_with_dropout.train()
    output_train = model_with_dropout(test_input)
    model_with_dropout.eval()
    output_eval = model_with_dropout(test_input)
    print(f"Model output shape: {output_train.shape}")
    print(f"Training vs Eval output difference: {(output_train - output_eval).abs().mean().item():.4f}")

In [None]:
# Test Section 1: Dropout Layer
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE2_SECTIONS["Section 1: Dropout Layer"]]
test_runner.test_section("Section 1: Dropout Layer", validator, section_tests, locals())

## Section 2: Embedding Layer

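Embedding layers are essential for working with categorical data, especially in NLP tasks. An embedding layer is a learnable lookup table: it maps each discrete token index (such as a word ID) to a dense vector, and these vectors are trained along with the rest of the model.

Key concepts:
- Vocabulary size: number of unique tokens (valid indices run from 0 to vocab_size - 1)
- Embedding dimension: size of each vector representation
- Learnable parameters: vocab_size × embedding_dim (one row per token)
- Input: a tensor of integer indices; the output adds a trailing embedding_dim dimension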
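A minimal sketch showing that an embedding layer really is a lookup table. Indexing its weight matrix directly, or multiplying one-hot vectors by that matrix, gives the same result as calling the layer. The tiny sizes (`vocab_size=10`, `embedding_dim=4`) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, embedding_dim = 10, 4
emb = nn.Embedding(vocab_size, embedding_dim)

# A batch of 2 sequences, each containing 3 token indices
tokens = torch.tensor([[1, 5, 9],
                       [0, 5, 2]])

out = emb(tokens)
print(out.shape)  # torch.Size([2, 3, 4])

# 1) The layer is just row indexing into its weight matrix
same_as_indexing = torch.equal(out, emb.weight[tokens])

# 2) Equivalently, one-hot vectors multiplied by the weight matrix
one_hot = F.one_hot(tokens, num_classes=vocab_size).float()  # shape (2, 3, 10)
same_as_matmul = torch.allclose(out, one_hot @ emb.weight)

# 3) Parameter count = vocab_size * embedding_dim
num_params = sum(p.numel() for p in emb.parameters())
print(same_as_indexing, same_as_matmul, num_params)  # True True 40
```

The indexing view explains why embeddings are cheap: no matrix multiplication is performed, and only the rows for tokens that appear in a batch receive gradient.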

In [None]:
# TODO: Create an Embedding layer
# - Vocabulary size: 100
# - Embedding dimension: 16
embedding_layer = None

# TODO: Create a tensor of word indices (batch_size=3, sequence_length=5)
# Values must be integer indices from 0 to 99 inclusive (e.g. dtype=torch.long)
word_indices = None

# TODO: Apply the embedding layer to get embedded word vectors
embedded_words = None

# Display results
if embedding_layer is not None:
    print(f"Embedding layer: {embedding_layer}")
if word_indices is not None:
    print(f"Word indices shape: {word_indices.shape}")
    print(f"Sample indices: {word_indices[0]}")
if embedded_words is not None:
    print(f"Embedded words shape: {embedded_words.shape}")
    print(f"Embedding vector for first word: {embedded_words[0, 0, :5]}...")

In [None]:
# TODO: Create a simple text classifier using embeddings
# The model should:
# 1. Use an embedding layer (vocab_size=1000, embedding_dim=32)
# 2. Average the embeddings across the sequence dimension
# 3. Pass through a linear layer to get 3 output classes

class TextClassifier(nn.Module):
    def __init__(self):
        super(TextClassifier, self).__init__()
        # TODO: Initialize layers
        pass
    
    def forward(self, x):
        # TODO: Implement forward pass
        # x shape: (batch_size, sequence_length)
        # 1. Apply embedding
        # 2. Average across sequence dimension
        # 3. Apply linear layer
        pass

# TODO: Create an instance of the text classifier
text_classifier = None

# Test the classifier
if text_classifier is not None:
    test_sequences = torch.randint(0, 1000, (4, 10), dtype=torch.long)
    output = text_classifier(test_sequences)
    print(f"Input shape: {test_sequences.shape}")
    print(f"Output shape: {output.shape}")
    print(f"Output logits: {output[0]}")

In [None]:
# Test Section 2: Embedding Layer
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE2_SECTIONS["Section 2: Embedding Layer"]]
test_runner.test_section("Section 2: Embedding Layer", validator, section_tests, locals())

## Section 3: Skip Connections (Residual Blocks)

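Skip connections, also known as residual connections, make deeper networks easier to train by adding a block's input directly to its output. This shortcut gives gradients a direct path back to earlier layers, which mitigates the vanishing gradient problem.

The key equation: `output = F(x) + x`
where F(x) is the learned transformation (the residual branch) and x is the input carried by the identity shortcut. Differentiating gives `d(output)/dx = I + dF/dx`, so even when dF/dx is small, the identity term I still carries the gradient through. The block below additionally applies a ReLU after the addition, i.e. `output = ReLU(F(x) + x)`.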
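The gradient argument can be checked numerically. The sketch below uses a single `nn.Linear` as a stand-in for the residual branch F (an assumption chosen so the Jacobian has a closed form) and omits the final ReLU. It compares the Jacobian of a plain layer with that of the same layer plus a shortcut, computed with `torch.autograd.functional.jacobian`.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

torch.manual_seed(0)

dim = 4
f = nn.Linear(dim, dim)  # stand-in for the residual branch F(x)

def plain(x):
    return f(x)          # no shortcut: output = F(x)

def residual(x):
    return f(x) + x      # shortcut: output = F(x) + x

x = torch.randn(dim)
W = f.weight.detach().clone()

J_plain = jacobian(plain, x)     # equals W, since d(Wx + b)/dx = W
J_res = jacobian(residual, x)    # equals I + W
print(torch.allclose(J_plain, W), torch.allclose(J_res, torch.eye(dim) + W))  # True True

# If the branch's weights collapse to zero, the plain layer passes no gradient,
# but the shortcut still passes the identity
with torch.no_grad():
    f.weight.zero_()
J_plain_zero = jacobian(plain, x)
J_res_zero = jacobian(residual, x)
print(torch.allclose(J_plain_zero, torch.zeros(dim, dim)),
      torch.allclose(J_res_zero, torch.eye(dim)))  # True True
```

In a deep stack these Jacobians are multiplied together, which is why the `I +` term matters: the product of many small matrices shrinks toward zero, while the identity term keeps a direct path open.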

In [None]:
# TODO: Implement a ResidualBlock
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super(ResidualBlock, self).__init__()
        # TODO: Create two linear layers with ReLU activation
        # Both layers should have input and output dimension = dim
        pass
    
    def forward(self, x):
        # TODO: Implement forward pass with skip connection
        # 1. Save input as identity
        # 2. Apply first linear layer + ReLU
        # 3. Apply second linear layer
        # 4. Add identity to output
        # 5. Apply final ReLU
        pass

# TODO: Create a residual block and test it
residual_block = ResidualBlock(64) if 'ResidualBlock' in locals() else None
test_input = torch.randn(4, 64)

# TODO: Apply the residual block
residual_output = None

if residual_output is not None:
    print(f"Input shape: {test_input.shape}")
    print(f"Output shape: {residual_output.shape}")
    print(f"Mean absolute difference from input: {(residual_output - test_input).abs().mean().item():.4f}")

In [None]:
# TODO: Create a deeper network with multiple residual blocks
class DeepResidualNet(nn.Module):
    def __init__(self):
        super(DeepResidualNet, self).__init__()
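        # TODO: Build a network with:
        # 1. Initial conv layer (1 -> 64 channels, 3x3 kernel)
        # 2. Three residual blocks (dim=64)
        # 3. Global average pooling
        # 4. Final linear layer (64 -> 10 classes)
        # Note: ResidualBlock is built from nn.Linear, which acts on the LAST
        # dimension. The conv output has shape (batch, 64, H, W), so permute it
        # to channels-last (batch, H, W, 64) before the residual blocks, then
        # average over H and W to get a (batch, 64) vector.
        pass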
    
    def forward(self, x):
        # TODO: Implement forward pass
        # x shape: (batch_size, 1, 28, 28)
        pass

# TODO: Create an instance of the deep residual network
deep_residual_net = None

# Test the network
if deep_residual_net is not None:
    test_images = torch.randn(2, 1, 28, 28)
    output = deep_residual_net(test_images)
    print(f"Input shape: {test_images.shape}")
    print(f"Output shape: {output.shape}")
    print(f"Number of parameters: {sum(p.numel() for p in deep_residual_net.parameters())}")

In [None]:
# Test Section 3: Skip Connections
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE2_SECTIONS["Section 3: Skip Connections"]]
test_runner.test_section("Section 3: Skip Connections", validator, section_tests, locals())

## Section 4: Complete Model with Essential Layers

Now let's combine the layers from the previous sections into a complete text-classification model built from an embedding layer, mean pooling over the sequence, dropout, and fully connected layers. We'll train and evaluate it on a small synthetic dataset.
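Mean pooling over the sequence dimension is what lets a single linear layer accept sequences of any length. A minimal sketch, assuming `CompleteModel`'s default sizes (`vocab_size=5000`, `embedding_dim=128`) and two arbitrary sequence lengths:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, embedding_dim = 5000, 128   # matching CompleteModel's defaults
embedding = nn.Embedding(vocab_size, embedding_dim)

short_batch = torch.randint(0, vocab_size, (4, 7))    # 4 sequences of 7 tokens
long_batch = torch.randint(0, vocab_size, (4, 20))    # 4 sequences of 20 tokens

pooled_shapes = []
for batch in (short_batch, long_batch):
    embedded = embedding(batch)          # (batch, seq_len, embedding_dim)
    pooled = embedded.mean(dim=1)        # average over the sequence dimension
    pooled_shapes.append(tuple(pooled.shape))
    print(tuple(embedded.shape), "->", tuple(pooled.shape))
# (4, 7, 128) -> (4, 128)
# (4, 20, 128) -> (4, 128)
```

Both batches end up as `(batch_size, embedding_dim)`, which is why the first linear layer takes `embedding_dim` inputs. The trade-off is that averaging discards word order.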

In [None]:
# TODO: Create a complete model combining all essential layers
class CompleteModel(nn.Module):
    def __init__(self, vocab_size=5000, embedding_dim=128, hidden_dim=256, num_classes=5):
        super(CompleteModel, self).__init__()
        # TODO: Initialize layers
        # 1. Embedding layer
        # 2. Dropout (applied to the pooled embeddings, see forward step 3)
        # 3. Linear layer (embedding_dim -> hidden_dim)
        # 4. ReLU activation
        # 5. Dropout
        # 6. Linear layer (hidden_dim -> num_classes)
        pass
    
    def forward(self, x):
        # TODO: Implement forward pass
        # x shape: (batch_size, sequence_length)
        # 1. Apply embedding
        # 2. Average pool over sequence dimension
        # 3. Apply dropout
        # 4. Apply first linear + ReLU
        # 5. Apply dropout
        # 6. Apply final linear layer
        pass

# TODO: Create an instance of the complete model
complete_model = None

if complete_model is not None:
    print(f"Complete model architecture:\n{complete_model}")
    print(f"\nTotal parameters: {sum(p.numel() for p in complete_model.parameters())}")

In [None]:
# Create synthetic dataset for training
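def create_synthetic_text_data(num_samples=1000, seq_length=20, vocab_size=5000, num_classes=5):
    """Generate random token sequences with random labels."""
    # Random sequences of token indices in [0, vocab_size)
    X = torch.randint(0, vocab_size, (num_samples, seq_length), dtype=torch.long)
    # Random labels, independent of X: there is no real signal to learn, so expect
    # test accuracy near chance (1 / num_classes = 20%), even though the model can
    # still lower its training loss by memorizing the training set
    y = torch.randint(0, num_classes, (num_samples,), dtype=torch.long)
    return X, y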

# Create training and test data
X_train, y_train = create_synthetic_text_data(800)
X_test, y_test = create_synthetic_text_data(200)

# Create data loaders
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Input shape: {X_train[0].shape}")

In [None]:
# TODO: Implement training function
def train_model(model, train_loader, num_epochs=10, learning_rate=0.001):
    """
    Train the model and return training losses.
    
    Args:
        model: The neural network model
        train_loader: DataLoader for training data
        num_epochs: Number of training epochs
        learning_rate: Learning rate for optimizer
    
    Returns:
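        List of per-batch training losses
    """
    # TODO: Implement training loop
    # 1. Create optimizer (Adam) with the given learning_rate
    # 2. Create loss function (CrossEntropyLoss)
    # 3. For each epoch:
    #    - Set model to train mode (enables dropout)
    #    - Iterate through batches
    #    - Zero the gradients (optimizer.zero_grad())
    #    - Forward pass
    #    - Compute loss
    #    - Backward pass (loss.backward())
    #    - Update weights (optimizer.step())
    #    - Track losses: append loss.item() once per batch
    # 4. Return the list of losses
    pass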

# TODO: Train the complete model
training_losses = None

if complete_model is not None and train_model is not None:
    training_losses = train_model(complete_model, train_loader, num_epochs=20)
    
    if training_losses is not None:
        # Plot training losses
        plt.figure(figsize=(10, 5))
        plt.plot(training_losses)
        plt.title('Training Loss Over Time')
        plt.xlabel('Batch')
        plt.ylabel('Loss')
        plt.grid(True)
        plt.show()
        
        print(f"Initial loss: {training_losses[0]:.4f}")
        print(f"Final loss: {training_losses[-1]:.4f}")

In [None]:
# TODO: Evaluate the model
def evaluate_model(model, test_loader):
    """
    Evaluate the model on test data.
    
    Returns:
        Test accuracy (float between 0 and 1)
    """
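    # TODO: Implement evaluation
    # 1. Set model to eval mode (disables dropout)
    # 2. Disable gradient computation (torch.no_grad())
    # 3. Iterate through test batches
    # 4. Compute predictions (argmax of the logits over the class dimension)
    # 5. Calculate accuracy as correct predictions / total samples
    pass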

# TODO: Calculate test accuracy
test_accuracy = None

if complete_model is not None and evaluate_model is not None:
    test_accuracy = evaluate_model(complete_model, test_loader)
    if test_accuracy is not None:
        print(f"Test Accuracy: {test_accuracy:.2%}")

In [None]:
# Test Section 4: Complete Model
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE2_SECTIONS["Section 4: Complete Model"]]
test_runner.test_section("Section 4: Complete Model", validator, section_tests, locals())

## Final Summary

In [None]:
# Display final summary of all tests
test_runner.final_summary()

## Congratulations!

You've successfully completed Module 3 Exercise 2 on Essential Layers! You've learned how to:

1. **Dropout**: Implement and use dropout for regularization
2. **Embeddings**: Work with embedding layers for categorical data
3. **Skip Connections**: Build residual blocks for deeper networks
4. **Complete Models**: Combine multiple layer types in practical applications

These essential layers form the building blocks of modern deep learning architectures. Understanding how to use them effectively is crucial for building robust and performant neural networks.