# Lesson 3B: Neural Networks Practical

<a name="introduction"></a>
## Introduction

In Lesson 3A, we built a neural network from scratch to understand every detail of forward propagation, backpropagation, and gradient descent. We achieved ~95% accuracy on MNIST using our hand-crafted implementation.

Now it's time to see how professional machine learning engineers build neural networks in production. Think of Lesson 3A as learning to build a car engine by hand: you understand every bolt and gasket. Lesson 3B is learning to drive a Formula 1 race car: leveraging the best tools to achieve maximum performance.

**In this lesson, we'll:**
1. Rebuild our neural network using PyTorch (an industry-standard deep learning framework)
2. Explore modern optimization techniques (Adam, learning rate scheduling)
3. Add regularization to reduce overfitting (Dropout, Batch Normalization)
4. Build deeper networks and understand when more layers help
5. Use GPU acceleration, which can speed up large matrix operations by 10-100x depending on the hardware
6. Save and load models for production deployment
7. Achieve >98% accuracy on MNIST (beating our from-scratch version)

**Why PyTorch?**
- Used by researchers and industry leaders (Meta, Tesla, OpenAI)
- Pythonic and intuitive API
- Dynamic computational graphs (easier debugging)
- Excellent for both research and production
- Strong ecosystem (torchvision, torchaudio, etc.)

Let's build production-grade neural networks! 🚀


## Table of Contents
1. [Introduction](#introduction)
2. [Required libraries](#required-libraries)
3. [PyTorch fundamentals](#pytorch-fundamentals)
   - [Tensors: PyTorch's numpy](#tensors-pytorchs-numpy)
   - [Automatic differentiation with autograd](#automatic-differentiation-with-autograd)
   - [GPU acceleration](#gpu-acceleration)
4. [Loading MNIST with PyTorch](#loading-mnist-with-pytorch)
5. [Building neural networks with nn.Module](#building-neural-networks-with-nnmodule)
   - [Simple 2-layer network](#simple-2-layer-network)
   - [Training loop with PyTorch](#training-loop-with-pytorch)
6. [Modern optimization techniques](#modern-optimization-techniques)
   - [Adam optimizer](#adam-optimizer)
   - [Learning rate scheduling](#learning-rate-scheduling)
7. [Regularization techniques](#regularization-techniques)
   - [Dropout](#dropout)
   - [Batch normalization](#batch-normalization)
   - [L2 regularization (weight decay)](#l2-regularization-weight-decay)
8. [Building deeper networks](#building-deeper-networks)
9. [Advanced training techniques](#advanced-training-techniques)
   - [Early stopping](#early-stopping)
   - [Model checkpointing](#model-checkpointing)
   - [TensorBoard logging](#tensorboard-logging)
10. [Production-grade model](#production-grade-model)
11. [Model evaluation and analysis](#model-evaluation-and-analysis)
12. [Saving and loading models](#saving-and-loading-models)
13. [Conclusion](#conclusion)
    - [Key takeaways](#key-takeaways)
    - [Further resources](#further-resources)

<a name="required-libraries"></a>
## Required libraries

We'll use PyTorch as our deep learning framework, along with familiar libraries from previous lessons.

<table style="margin-left:0">
<tr><th align="left">Library</th><th align="left">Purpose</th></tr>
<tr><td>PyTorch</td><td>Deep learning framework for building and training neural networks</td></tr>
<tr><td>torchvision</td><td>Computer vision datasets and transformations</td></tr>
<tr><td>NumPy</td><td>Numerical computing</td></tr>
<tr><td>Matplotlib</td><td>Visualization</td></tr>
<tr><td>Seaborn</td><td>Statistical plots (confusion-matrix heatmap)</td></tr>
<tr><td>Scikit-learn</td><td>Evaluation metrics</td></tr>
</table>

In [None]:
# Standard library imports
import os
import time
from typing import Tuple, List, Dict
import warnings
warnings.filterwarnings('ignore')

# Third party imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# PyTorch imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split

# Torchvision imports
import torchvision
import torchvision.transforms as transforms
from torchvision import datasets

# Scikit-learn imports
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

<a name="pytorch-fundamentals"></a>
## PyTorch fundamentals

Before building neural networks, let's understand PyTorch's core concepts.

<a name="tensors-pytorchs-numpy"></a>
### Tensors: PyTorch's numpy

Tensors are PyTorch's counterpart to NumPy arrays, with two additions: they can live on a GPU, and they can record the operations applied to them for automatic differentiation.

In [None]:
# Creating tensors
x_numpy = np.array([[1, 2], [3, 4]])
x_torch = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

print("NumPy array:")
print(x_numpy)
print("\nPyTorch tensor:")
print(x_torch)
print(f"Shape: {x_torch.shape}")
print(f"Device: {x_torch.device}")  # cpu or cuda
print(f"Data type: {x_torch.dtype}")

# Common operations (similar to NumPy)
print(f"\nMean: {x_torch.mean()}")
print(f"Sum: {x_torch.sum()}")
print("\nMatrix multiplication:")
print(x_torch @ x_torch.T)
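A detail worth knowing when moving between NumPy and PyTorch is whether a conversion copies the data. The sketch below uses only standard calls (`torch.from_numpy`, `torch.tensor`, `Tensor.numpy`). It shows that `torch.from_numpy` and `.numpy()` share memory with their source on the CPU, while `torch.tensor` always makes a copy.

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)

shared = torch.from_numpy(a)   # shares memory with `a` (no copy)
copied = torch.tensor(a)       # always copies the data

a[0] = 10.0                    # mutate the NumPy array in place
print(shared)                  # the change is visible through `shared`
print(copied)                  # `copied` still starts with 1.0

# .numpy() on a CPU tensor also returns a view of the same memory
view = shared.numpy()
view[1] = 20.0
print(a)                       # the original array now holds 20.0 at index 1
```

Sharing memory avoids copies for large arrays, but it also means an in-place edit on one side silently changes the other.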

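<a name="automatic-differentiation-with-autograd"></a>
### Automatic differentiation with autograd

PyTorch's killer feature: automatic gradient computation! No more manual backprop. Tensors created with `requires_grad=True` record the operations applied to them, and calling `.backward()` on a scalar result applies the chain rule to fill in each input's `.grad` attribute.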

In [None]:
# Enable gradient tracking
x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)

# Compute function
z = x**2 + y**3
print(f"z = x² + y³ = {z.item()}")

# Compute gradients automatically!
z.backward()

print(f"\n∂z/∂x = 2x = {x.grad.item()} (expected: {2 * 2.0})")
print(f"∂z/∂y = 3y² = {y.grad.item()} (expected: {3 * 3.0**2})")
print("\n✅ PyTorch computed gradients automatically using the chain rule!")
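One autograd behavior explains a line you will see in every training loop: `.backward()` *adds* new gradients to whatever is already stored in `.grad`. This minimal sketch calls `backward()` twice on the same expression to show the accumulation, then clears the gradient by setting it to `None`, which is what `optimizer.zero_grad()` does by default in recent PyTorch versions.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
first = w.grad.item()

# Second backward pass without clearing: the new gradient is ADDED to the old one
(3 * w).sum().backward()
second = w.grad.item()

# Clear the gradient (as optimizer.zero_grad() does for every parameter), then recompute
w.grad = None
(3 * w).sum().backward()
third = w.grad.item()

print(first, second, third)
```

Forgetting to clear gradients between batches would make each update use the sum of all previous gradients, which is almost never what you want.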

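<a name="gpu-acceleration"></a>
### GPU acceleration

Moving tensors to a GPU takes one call (`.to(device)`), and large matrix operations often run many times faster there. Small operations may see little benefit because of kernel-launch and data-transfer overhead.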

In [None]:
# Check device availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Move tensor to the selected device (GPU if available)
x_cpu = torch.randn(1000, 1000)
x_device = x_cpu.to(device)
print(f"\nOriginal tensor device: {x_cpu.device}")
print(f"Moved tensor device:    {x_device.device}")

# Benchmark: matrix multiplication (a single run is noisy; treat it as a rough estimate)
if torch.cuda.is_available():
    # Warm up GPU (the first CUDA call includes one-time initialization)
    _ = x_device @ x_device
    torch.cuda.synchronize()

    # CPU timing
    start = time.perf_counter()
    result_cpu = x_cpu @ x_cpu
    cpu_time = time.perf_counter() - start

    # GPU timing (CUDA kernels run asynchronously, so synchronize before reading the clock)
    torch.cuda.synchronize()
    start = time.perf_counter()
    result_gpu = x_device @ x_device
    torch.cuda.synchronize()
    gpu_time = time.perf_counter() - start

    print(f"\nCPU time: {cpu_time*1000:.2f} ms")
    print(f"GPU time: {gpu_time*1000:.2f} ms")
    print(f"Speedup: {cpu_time/gpu_time:.1f}x faster on GPU! 🚀")
else:
    print("\nGPU not available. Using CPU for training.")
    print("To use GPU: Install CUDA-enabled PyTorch or use Google Colab.")
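Single-shot timings like the one above are easily distorted by warm-up and noise. A more careful alternative is PyTorch's built-in `torch.utils.benchmark.Timer`, which repeats the statement, performs warm-up, and synchronizes CUDA for you. The helper name `time_matmul` and the sizes below are illustrative choices, not part of the lesson. Note that `Timer` defaults to a single CPU thread, so this sketch passes `torch.get_num_threads()` to keep the CPU side of the comparison fair.

```python
import torch
from torch.utils import benchmark


def time_matmul(device: str, n: int = 512, runs: int = 20) -> float:
    """Hypothetical helper: mean seconds per n x n matmul on `device`."""
    x = torch.randn(n, n, device=device)
    timer = benchmark.Timer(
        stmt="x @ x",
        globals={"x": x},
        num_threads=torch.get_num_threads(),  # default is 1 thread, which understates CPU speed
    )
    return timer.timeit(runs).mean  # Measurement.mean is in seconds


cpu_s = time_matmul("cpu")
print(f"CPU: {cpu_s * 1e3:.3f} ms per matmul")

if torch.cuda.is_available():
    gpu_s = time_matmul("cuda")
    print(f"GPU: {gpu_s * 1e3:.3f} ms per matmul ({cpu_s / gpu_s:.1f}x)")
```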

<a name="loading-mnist-with-pytorch"></a>
## Loading MNIST with PyTorch

PyTorch's `torchvision` makes loading datasets incredibly easy.

In [None]:
# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert to tensor and scale to [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

# Download and load datasets
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Split training into train/val (80/20)
train_size = int(0.8 * len(train_dataset))
val_size = len(train_dataset) - train_size
train_dataset, val_dataset = random_split(train_dataset, [train_size, val_size])

print(f"Training samples:   {len(train_dataset):,}")
print(f"Validation samples: {len(val_dataset):,}")
print(f"Test samples:       {len(test_dataset):,}")

# Create data loaders (efficient batching and shuffling)
batch_size = 128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print(f"\nBatch size: {batch_size}")
print(f"Training batches: {len(train_loader)}")
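To see how `DataLoader` turns a dataset into batches without downloading anything, here is a small sketch on synthetic tensors standing in for MNIST (the 1,000 random "images" are made up for illustration). `len(loader)` equals `ceil(N / batch_size)` unless `drop_last=True`, and the last batch holds the remainder. With the 48,000 MNIST training images above and a batch size of 128, that works out to exactly 375 full batches.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 1,000 fake 1x28x28 images with labels 0-9
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=128, shuffle=True)

# Number of batches is ceil(N / batch_size) when drop_last=False (the default)
print(len(loader), math.ceil(1000 / 128))

# Seven full batches of 128, then one final batch with the remaining 104 samples
batch_sizes = [x.shape[0] for x, _ in loader]
print(batch_sizes)
```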

Let's visualize some samples:

In [None]:
# Get a batch of training data
images, labels = next(iter(train_loader))

# Plot 16 samples
fig, axes = plt.subplots(4, 4, figsize=(10, 10))
fig.suptitle('MNIST Samples (Denormalized for Display)', fontsize=16, fontweight='bold')

for i, ax in enumerate(axes.flat):
    # Denormalize for visualization: x * std + mean
    img = images[i].squeeze() * 0.3081 + 0.1307
    ax.imshow(img, cmap='gray')
    ax.set_title(f'Label: {labels[i].item()}', fontsize=12)
    ax.axis('off')

plt.tight_layout()
plt.show()

print(f"\nImage shape: {images[0].shape}")  # [1, 28, 28] - [channels, height, width]
print(f"Image dtype: {images.dtype}")
print(f"Image device: {images.device}")
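The normalization and the denormalization used for plotting are inverses of each other. This sketch applies the same per-channel formula that `transforms.Normalize` uses, `(x - mean) / std`, to a random stand-in image in `[0, 1]`. A black pixel (0) maps to about -0.42 and a white pixel (1) to about 2.82, and multiplying by `std` and adding `mean` recovers the original values.

```python
import torch

MEAN, STD = 0.1307, 0.3081              # MNIST statistics used in the transform above

torch.manual_seed(0)
pixels = torch.rand(1, 28, 28)          # stand-in for ToTensor() output in [0, 1]

normalized = (pixels - MEAN) / STD      # what Normalize computes for a single channel
restored = normalized * STD + MEAN      # the inverse used for plotting

print(f"normalized range: {normalized.min().item():.3f} to {normalized.max().item():.3f}")
print(torch.allclose(restored, pixels, atol=1e-6))
```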

<a name="building-neural-networks-with-nnmodule"></a>
## Building neural networks with nn.Module

PyTorch models inherit from `nn.Module`. Let's rebuild our 3A network first.

<a name="simple-2-layer-network"></a>
### Simple 2-layer network

In [None]:
class SimpleNet(nn.Module):
    """Simple 2-layer neural network (same architecture as Lesson 3A)."""

    def __init__(self, input_size=784, hidden_size=128, output_size=10):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        """Forward pass."""
        # Flatten image: [batch, 1, 28, 28] → [batch, 784]
        x = x.view(x.size(0), -1)

        # Hidden layer with ReLU
        x = F.relu(self.fc1(x))

        # Output layer: raw logits (nn.CrossEntropyLoss applies log-softmax itself)
        x = self.fc2(x)
        return x


# Create model
model = SimpleNet().to(device)
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")
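Where does the parameter count come from? Each `nn.Linear(in, out)` stores a weight matrix of shape `(out, in)` plus one bias per output. The sketch below rebuilds the two `SimpleNet` layers on their own and checks the hand calculation (784 × 128 + 128) + (128 × 10 + 10) = 101,770 against PyTorch's count.

```python
import torch.nn as nn

# Same layer sizes as SimpleNet: 784 -> 128 -> 10
fc1 = nn.Linear(784, 128)
fc2 = nn.Linear(128, 10)

for name, layer in [("fc1", fc1), ("fc2", fc2)]:
    for pname, p in layer.named_parameters():
        print(f"{name}.{pname}: shape={tuple(p.shape)}, count={p.numel():,}")

# Weight is (out_features, in_features); bias has one entry per output feature
manual = (784 * 128 + 128) + (128 * 10 + 10)
total = sum(p.numel() for layer in (fc1, fc2) for p in layer.parameters())
print(f"manual: {manual:,}  pytorch: {total:,}")
```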

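<a name="training-loop-with-pytorch"></a>
### Training loop with PyTorch

PyTorch training loops follow a standard pattern for each batch: run the forward pass, compute the loss, clear old gradients with `optimizer.zero_grad()`, backpropagate with `loss.backward()`, and update the weights with `optimizer.step()`. Evaluation repeats the forward pass with the model in `eval()` mode and gradient tracking disabled by `torch.no_grad()`.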

In [None]:
def train_epoch(model, loader, criterion, optimizer, device):
    """Train for one epoch."""
    model.train()  # Set to training mode
    total_loss = 0
    correct = 0
    total = 0

    for images, labels in loader:
        # Move data to device
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass
        optimizer.zero_grad()  # Clear gradients
        loss.backward()  # Compute gradients
        optimizer.step()  # Update weights

        # Track metrics
        total_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    return total_loss / len(loader), 100. * correct / total


def evaluate(model, loader, criterion, device):
    """Evaluate model."""
    model.eval()  # Set to evaluation mode
    total_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():  # Disable gradient computation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            total_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    return total_loss / len(loader), 100. * correct / total


print("✅ Training functions defined!")
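The accuracy bookkeeping in both functions relies on `outputs.max(1)`, which returns a `(values, indices)` pair along dimension 1. The indices are the predicted classes, the same result as `argmax(dim=1)`. Here is a small sketch with three hand-picked rows of logits (made up for illustration), two of which match their labels.

```python
import torch

# Logits for 3 samples over 3 classes (made-up numbers)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.3, 0.2],
                       [-0.5, -0.2, 1.5]])
labels = torch.tensor([0, 2, 2])

values, predicted = logits.max(1)            # max along the class dimension
same_as_argmax = torch.equal(predicted, logits.argmax(dim=1))

correct = predicted.eq(labels).sum().item()  # element-wise comparison, then count
accuracy = 100.0 * correct / labels.size(0)

print(predicted.tolist(), same_as_argmax, f"{accuracy:.2f}%")
```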

Now let's train our simple model:

In [None]:
# Training hyperparameters
n_epochs = 20
learning_rate = 0.001

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Training history
history = {
    'train_loss': [], 'train_acc': [],
    'val_loss': [], 'val_acc': []
}

print(f"Training SimpleNet for {n_epochs} epochs...\n")
print(f"{'Epoch':<6} {'Train Loss':<12} {'Train Acc':<12} {'Val Loss':<12} {'Val Acc':<12}")
print("-" * 60)

best_val_acc = 0
for epoch in range(n_epochs):
    # Train and evaluate
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)

    # Save history
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)

    # Print progress
    if (epoch + 1) % 5 == 0:
        print(f"{epoch+1:<6} {train_loss:<12.4f} {train_acc:<12.2f} {val_loss:<12.4f} {val_acc:<12.2f}")

    if val_acc > best_val_acc:
        best_val_acc = val_acc

print(f"\n✅ Training complete! Best validation accuracy: {best_val_acc:.2f}%")

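<a name="modern-optimization-techniques"></a>
## Modern optimization techniques

<a name="adam-optimizer"></a>
### Adam optimizer

Adam (Adaptive Moment Estimation) is one of the most widely used optimizers in deep learning. It keeps running averages of each parameter's gradient (momentum) and of its squared gradient, and divides each step by the square root of the latter, which gives every parameter its own effective learning rate.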
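To make the update rule concrete, here is a sketch that performs a few Adam steps by hand on the toy loss `sum(w²)` (gradient `2w`) and checks the result against `torch.optim.Adam` with the same settings. The variable names `m`, `v`, `m_hat`, `v_hat` are our own. The bias-correction terms `1 - beta**t` compensate for `m` and `v` starting at zero.

```python
import torch

torch.manual_seed(0)
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8

w_ref = torch.randn(5, requires_grad=True)      # updated by torch.optim.Adam
w_manual = w_ref.detach().clone()               # updated by our hand-written rule
opt = torch.optim.Adam([w_ref], lr=lr, betas=(beta1, beta2), eps=eps)

m = torch.zeros_like(w_manual)  # running average of gradients (momentum)
v = torch.zeros_like(w_manual)  # running average of squared gradients

for t in range(1, 4):
    # Reference step on the toy loss sum(w^2)
    opt.zero_grad()
    (w_ref ** 2).sum().backward()
    opt.step()

    # Manual step: same loss, so the gradient is 2w
    g = 2 * w_manual
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w_manual = w_manual - lr * m_hat / (v_hat.sqrt() + eps)

matches = torch.allclose(w_ref.detach(), w_manual, atol=1e-6)
print(matches)
```

Because each step is divided by `sqrt(v_hat)`, parameters with consistently large gradients take relatively smaller steps, and vice versa.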

In [None]:
# Fresh, untrained model so Adam doesn't start from the SGD-trained weights
model_adam = SimpleNet().to(device)

# Adam optimizer (vs SGD before)
optimizer_adam = optim.Adam(model_adam.parameters(), lr=0.001)

# Train with Adam
print("Training with Adam optimizer...\n")
history_adam = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

for epoch in range(15):
    train_loss, train_acc = train_epoch(model_adam, train_loader, criterion, optimizer_adam, device)
    val_loss, val_acc = evaluate(model_adam, val_loader, criterion, device)

    history_adam['train_loss'].append(train_loss)
    history_adam['train_acc'].append(train_acc)
    history_adam['val_loss'].append(val_loss)
    history_adam['val_acc'].append(val_acc)

    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%")

print(f"\nFinal validation accuracy: {val_acc:.2f}%")
print("With the same learning rate, Adam typically converges much faster than plain SGD!")

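<a name="regularization-techniques"></a>
## Regularization techniques

Regularization reduces overfitting. Let's add dropout and batch normalization.

<a name="dropout"></a>
### Dropout

Dropout randomly "drops" (zeroes) neurons during training, which keeps the network from relying on any single neuron and pushes it toward redundant, robust features. Dropout is switched off at evaluation time. The network below also applies batch normalization, which normalizes the output of each hidden linear layer using mini-batch statistics and usually makes training faster and more stable.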
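Two details of `nn.Dropout` are easy to miss. In training mode it zeroes each element with probability `p` and scales the survivors by `1 / (1 - p)` so the expected activation is unchanged. In evaluation mode it does nothing. This sketch shows both behaviors on a tensor of ones, which is why calling `model.train()` and `model.eval()` matters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(10_000)

drop.train()                               # training mode: random zeroing + rescaling
y_train = drop(x)
frac_zero = (y_train == 0).float().mean().item()
kept_values = y_train[y_train != 0].unique()

drop.eval()                                # eval mode: identity function
y_eval = drop(x)

print(f"fraction zeroed: {frac_zero:.3f}")  # close to p = 0.5
print(f"surviving value: {kept_values.tolist()}")  # 1 / (1 - 0.5) = 2.0
print(f"eval output unchanged: {torch.equal(y_eval, x)}")
```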


In [None]:
class RegularizedNet(nn.Module):
    """Network with Dropout and Batch Normalization."""

    def __init__(self, input_size=784, hidden_size=128, output_size=10, dropout_rate=0.5):
        super(RegularizedNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.bn1 = nn.BatchNorm1d(hidden_size)  # Batch normalization
        self.dropout1 = nn.Dropout(dropout_rate)  # Dropout

        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.bn2 = nn.BatchNorm1d(hidden_size)
        self.dropout2 = nn.Dropout(dropout_rate)

        self.fc3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = x.view(x.size(0), -1)

        # Layer 1
        x = self.fc1(x)
        x = self.bn1(x)  # Normalize activations
        x = F.relu(x)
        x = self.dropout1(x)  # Randomly drop neurons

        # Layer 2
        x = self.fc2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.dropout2(x)

        # Output
        x = self.fc3(x)
        return x


model_reg = RegularizedNet().to(device)
print(model_reg)
print(f"\nTotal parameters: {sum(p.numel() for p in model_reg.parameters()):,}")
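`nn.BatchNorm1d` also behaves differently in the two modes. In training mode it normalizes each feature with the current batch's mean and variance and nudges its running estimates toward them (with the default `momentum=0.1`). In eval mode it uses those running estimates instead. The sketch below feeds one synthetic batch (feature mean around 5, standard deviation around 3, chosen for illustration) through a fresh layer to show this.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)                 # 4 features, as if from a Linear(..., 4) layer
x = torch.randn(256, 4) * 3.0 + 5.0    # batch of 256 with per-feature mean ~5, std ~3

bn.train()
y_train = bn(x)
# Training mode: each feature is normalized with this batch's own statistics
batch_mean = y_train.mean(dim=0)
batch_var = ((y_train - batch_mean) ** 2).mean(dim=0)
print(batch_mean, batch_var)           # roughly 0 and 1 per feature

# The forward pass moved the running mean 10% of the way from 0 toward the batch mean
print(bn.running_mean, 0.1 * x.mean(dim=0))

bn.eval()
y_eval = bn(x)   # eval mode: uses running_mean / running_var, not the batch statistics
print(torch.allclose(y_eval, y_train))  # differs: running estimates are far from converged
```

This is why the training and evaluation helpers call `model.train()` and `model.eval()`: forgetting the switch silently changes the predictions.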

Train the regularized model:

In [None]:
optimizer_reg = optim.Adam(model_reg.parameters(), lr=0.001)

print("Training regularized network...\n")
for epoch in range(15):
    train_loss, train_acc = train_epoch(model_reg, train_loader, criterion, optimizer_reg, device)
    val_loss, val_acc = evaluate(model_reg, val_loader, criterion, device)

    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%")

print(f"\nFinal validation accuracy: {val_acc:.2f}%")
# Train accuracy is measured with dropout active, so it can sit below validation accuracy
print("Dropout + BatchNorm help reduce overfitting!")

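<a name="building-deeper-networks"></a>
## Building deeper networks

Let's build a 4-layer network and see if more depth helps. We'll also add learning rate scheduling: `ReduceLROnPlateau` halves the learning rate whenever validation accuracy fails to improve for more than 3 consecutive epochs.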

In [None]:
class DeepNet(nn.Module):
    """Deeper 4-layer network with modern techniques."""

    def __init__(self):
        super(DeepNet, self).__init__()
        self.network = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(0.3),

            nn.Linear(256, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.3),

            nn.Linear(128, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(0.2),

            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.network(x)


model_deep = DeepNet().to(device)
print(model_deep)
print(f"\nTotal parameters: {sum(p.numel() for p in model_deep.parameters()):,}")

# Train deep model
optimizer_deep = optim.Adam(model_deep.parameters(), lr=0.001)
# mode='max' because we monitor accuracy; the `verbose` argument is deprecated in
# recent PyTorch releases, so the current learning rate is printed manually below
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer_deep, mode='max', factor=0.5, patience=3)

print("\nTraining deep network with learning rate scheduling...\n")
for epoch in range(20):
    train_loss, train_acc = train_epoch(model_deep, train_loader, criterion, optimizer_deep, device)
    val_loss, val_acc = evaluate(model_deep, val_loader, criterion, device)

    # Adjust learning rate based on validation accuracy
    scheduler.step(val_acc)

    if (epoch + 1) % 5 == 0:
        current_lr = optimizer_deep.param_groups[0]['lr']
        print(f"Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%, LR: {current_lr:.6f}")

print(f"\nFinal validation accuracy: {val_acc:.2f}%")
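To see exactly when `ReduceLROnPlateau` fires, the sketch below drives it with a hypothetical sequence of validation accuracies instead of real training (a dummy parameter stands in for the model). With `patience=3`, the learning rate stays at 0.001 through three non-improving epochs and is halved on the fourth.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))      # dummy parameter in place of a model
optimizer = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.5, patience=3)

# Hypothetical validation accuracies: improvement stalls after the third epoch
val_accs = [90.0, 91.0, 92.0, 92.0, 92.0, 92.0, 92.0]
lrs = []
for acc in val_accs:
    optimizer.step()          # stands in for a training epoch (no gradients, so no change)
    scheduler.step(acc)       # pass the monitored metric
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs)  # six entries of 0.001, then 0.0005
```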

<a name="model-evaluation-and-analysis"></a>
## Model evaluation and analysis

Let's evaluate the deep network, as it stands after its final training epoch, on the held-out test set:

In [None]:
# Test set evaluation
test_loss, test_acc = evaluate(model_deep, test_loader, criterion, device)
print(f"🎯 Test Accuracy: {test_acc:.2f}%\n")

# Get predictions for confusion matrix
all_preds = []
all_labels = []

model_deep.eval()
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        outputs = model_deep(images)
        _, predicted = outputs.max(1)
        all_preds.extend(predicted.cpu().numpy())
        all_labels.extend(labels.numpy())

# Confusion matrix (confusion_matrix and classification_report were imported at the top)
cm = confusion_matrix(all_labels, all_preds)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=range(10), yticklabels=range(10))
plt.xlabel('Predicted', fontsize=12)
plt.ylabel('True', fontsize=12)
plt.title('Confusion Matrix - Deep PyTorch Network', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nClassification Report:")
print(classification_report(all_labels, all_preds, digits=3))
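The confusion matrix can be read row by row: row `i` counts how samples whose true class is `i` were predicted. Dividing the diagonal by the row sums gives per-class accuracy, which is the same number `classification_report` lists as recall. This sketch checks that on a tiny made-up set of labels and predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Tiny made-up example with 3 classes
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 1, 2]

cm = confusion_matrix(y_true, y_pred)           # rows = true class, columns = predicted
per_class_acc = cm.diagonal() / cm.sum(axis=1)  # correct / total for each true class

print(cm)
print(per_class_acc)
print(recall_score(y_true, y_pred, average=None))  # same values
```

On MNIST, applying `cm.diagonal() / cm.sum(axis=1)` to the `cm` computed above shows which digits the network finds hardest.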

<a name="saving-and-loading-models"></a>
## Saving and loading models

Save trained models for production deployment:

In [None]:
# Save model
model_path = '../models/mnist_deep_pytorch.pth'
os.makedirs('../models', exist_ok=True)

torch.save({
    'model_state_dict': model_deep.state_dict(),
    'optimizer_state_dict': optimizer_deep.state_dict(),
    'test_accuracy': test_acc,
    'architecture': 'DeepNet'
}, model_path)
print(f"✅ Model saved to {model_path}")

# Load model (demonstration)
# map_location lets a checkpoint saved on a GPU be loaded on a CPU-only machine
checkpoint = torch.load(model_path, map_location=device)
model_loaded = DeepNet().to(device)
model_loaded.load_state_dict(checkpoint['model_state_dict'])
model_loaded.eval()  # switch Dropout/BatchNorm to inference behavior before predicting
print(f"✅ Model loaded! Test accuracy was: {checkpoint['test_accuracy']:.2f}%")
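A quick way to trust a checkpoint is to confirm that the restored model produces exactly the same outputs as the original. This self-contained sketch uses a small stand-in `nn.Sequential` model and an in-memory `io.BytesIO` buffer in place of a `.pth` file. It saves and reloads the `state_dict`, puts both models in `eval()` mode, and compares predictions on random inputs.

```python
import io

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Small stand-in architecture; the real lesson uses DeepNet
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


torch.manual_seed(0)
model = make_model()

buffer = io.BytesIO()                      # in-memory file instead of a path on disk
torch.save({'model_state_dict': model.state_dict()}, buffer)
buffer.seek(0)                             # rewind before reading

checkpoint = torch.load(buffer, map_location='cpu')
restored = make_model()                    # same architecture, fresh random weights
restored.load_state_dict(checkpoint['model_state_dict'])

model.eval()
restored.eval()
x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```

Saving the `state_dict` (rather than the whole model object) keeps the checkpoint independent of the class's source file; you only need to rebuild the same architecture before loading.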

<a name="conclusion"></a>
## Conclusion

<a name="key-takeaways"></a>
### Key takeaways

**What we learned:**
1. **PyTorch fundamentals** - Tensors, autograd, GPU acceleration
2. **nn.Module** - Building networks the professional way
3. **Modern optimizers** - Adam typically converges faster than plain SGD
4. **Regularization** - Dropout and BatchNorm reduce overfitting
5. **Deeper networks** - More layers can improve accuracy
6. **Learning rate scheduling** - Lowering the learning rate when progress stalls helps convergence
7. **Production practices** - Model checkpointing, evaluation, deployment

**Performance comparison:**
- **Lesson 3A (from scratch):** ~95% accuracy on MNIST
- **Lesson 3B (PyTorch):** ~98%+ accuracy with modern techniques

**The power of PyTorch:**
- Much faster training, often 10-100x on a GPU for large workloads
- Automatic differentiation (no manual backprop!)
- Production-ready (used by industry leaders)
- Rich ecosystem (torchvision, torchaudio, transformers)

<a name="further-resources"></a>
### Further resources

**Official PyTorch:**
- [PyTorch Documentation](https://pytorch.org/docs/)
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [PyTorch Examples](https://github.com/pytorch/examples)

**Courses:**
- Fast.ai - Practical Deep Learning for Coders
- Stanford CS230 - Deep Learning
- deeplearning.ai - Deep Learning Specialization

**Next steps:**
- Lesson 8a/b: Convolutional Neural Networks (CNNs) for images
- Lesson 8c/d: Recurrent networks (RNNs/LSTMs) for sequences
- Explore transfer learning with pretrained models

---

**🎉 Congratulations! You've built production-grade neural networks with PyTorch!**

You can now build, train, and deploy deep learning models with the same framework used by companies like Meta, Tesla, and OpenAI.
