# Deep Learning with PyTorch

## Introduction

PyTorch is an open-source deep learning framework that provides a seamless path from research to production. It is known for its flexibility, dynamic computation graphs, and intuitive API. This tutorial will explore how to implement advanced neural network models using PyTorch, understand dynamic computation graphs, and create custom layers.

We will delve into the underlying mathematics, provide example code, and explain the processes involved. We will reference key papers and discuss some of the latest developments in this field. Relevant imagery will be included to enhance understanding.

## Table of Contents

1. [Understanding PyTorch](#1)
   - [What is PyTorch?](#1.1)
   - [Dynamic Computation Graphs](#1.2)
2. [Implementing Neural Networks with PyTorch](#2)
   - [Building a Simple Neural Network](#2.1)
   - [Training and Evaluating the Model](#2.2)
3. [Custom Layers and Modules](#3)
   - [Creating Custom Layers](#3.1)
   - [Creating Custom Modules](#3.2)
4. [Advanced Models](#4)
   - [Implementing a Residual Network (ResNet)](#4.1)
   - [Understanding the Mathematics Behind ResNets](#4.2)
5. [Latest Developments in PyTorch](#5)
   - [PyTorch Lightning](#5.1)
   - [TorchScript and JIT Compilation](#5.2)
6. [Conclusion](#6)
7. [References](#7)

<a id="1"></a>
# 1. Understanding PyTorch

PyTorch is a popular open-source deep learning framework developed by Facebook's AI Research lab. It provides a Python package for high-level neural network APIs and is known for its flexibility and speed.

<a id="1.1"></a>
## 1.1 What is PyTorch?

PyTorch is a Python-based scientific computing package that uses the power of graphics processing units (GPUs). It is primarily used for:

- **Tensor computation** (like NumPy) with strong GPU acceleration.
- **Deep neural networks** built on a tape-based autograd system.

PyTorch supports dynamic computation graphs, allowing network behavior to be changed programmatically at runtime. This is particularly useful for tasks where the input size or structure can vary.
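As a small illustration of the NumPy-like tensor computation described above, the sketch below runs a few common operations on whichever device is available. The values and variable names are made up for the example.

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device)
b = torch.tensor([10.0, 20.0], device=device)

scaled = a * b            # broadcasting: b is applied to every row of a
product = a @ a           # matrix multiplication
row_sums = a.sum(dim=1)   # reduction over the column dimension

# Move results back to the CPU before converting to Python lists.
print(scaled.cpu().tolist())
print(product.cpu().tolist())
print(row_sums.cpu().tolist())
```

The same code runs unchanged on CPU and GPU; only the `device` argument decides where the tensors live.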

<a id="1.2"></a>
## 1.2 Dynamic Computation Graphs

Unlike static computation graphs used in frameworks like TensorFlow (prior to version 2.x), PyTorch builds the computation graph dynamically. This allows for more flexibility when building complex architectures.

### Mathematical Background

In traditional static graphs, you define the computation graph once, and it remains constant throughout training. In dynamic graphs, the graph is built on-the-fly during the forward pass.

Consider the function:

\[
y = x^2 + 2x + 1
\]

In PyTorch, the computation graph for this function is created dynamically when you compute \( y \). This allows for constructs like loops and conditionals in your network.

In [None]:
# Dynamic computation graph example in PyTorch
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 2*x + 1

y.backward()
print(f'dy/dx at x=2 is {x.grad}')

**Explanation:**

- We define a tensor `x` with `requires_grad=True` to track computations.
- The computation graph is built dynamically as we compute `y`.
- Calling `y.backward()` computes the gradient of `y` with respect to `x`.
- The gradient `dy/dx` at `x=2` is `6.0`, which matches the analytical derivative `dy/dx = 2x + 2`.
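The text above notes that dynamic graphs allow loops and conditionals. A minimal sketch of that idea follows, using a hypothetical helper `repeated_double` whose number of loop iterations depends on the value of the input. Autograd records only the operations that actually ran.

```python
import torch

def repeated_double(x: torch.Tensor, limit: float = 10.0):
    """Double x until its magnitude reaches `limit` (illustrative helper)."""
    steps = 0
    y = x
    while y.abs() < limit:   # data-dependent loop condition
        y = y * 2
        steps += 1
    return y, steps

x = torch.tensor(1.5, requires_grad=True)
y, steps = repeated_double(x)

# The graph contains exactly `steps` multiplications, so dy/dx = 2 ** steps.
y.backward()
print(f"steps={steps}, y={y.item()}, dy/dx={x.grad.item()}")
```

With `x = 1.5` the loop runs three times (1.5, 3, 6, 12), so `y` is `12.0` and the gradient is `8.0`. A different starting value would produce a different graph without any change to the code.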

<a id="2"></a>
# 2. Implementing Neural Networks with PyTorch

In this section, we'll implement a simple neural network using PyTorch and understand how to train and evaluate it.

<a id="2.1"></a>
## 2.1 Building a Simple Neural Network

We'll create a simple feedforward neural network to classify images from the MNIST dataset.

In [None]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
input_size = 784  # 28x28
hidden_size = 500
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='./data',
                                          train=False,
                                          transform=transforms.ToTensor())

# Data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

# Fully connected neural network class
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

model = NeuralNet(input_size, hidden_size, num_classes).to(device)

**Explanation:**

- We define a neural network with one hidden layer using `nn.Module`.
- The `forward` method defines the forward pass.
- We use ReLU activation and a linear output layer.
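To make the shapes concrete, here is a minimal sketch that pushes a dummy batch through an `nn.Sequential` stand-in with the same layer sizes as `NeuralNet`. The stand-in and the random input are assumptions made so the snippet runs on its own without downloading MNIST.

```python
import torch
from torch import nn

# Stand-in with the same layer sizes as NeuralNet (784 -> 500 -> 10).
net = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))

dummy_batch = torch.randn(4, 784)      # four fake flattened 28x28 images
logits = net(dummy_batch)              # raw, unnormalized class scores
probs = torch.softmax(logits, dim=1)   # each row now sums to 1

# 784*500 + 500 weights and biases in fc1, 500*10 + 10 in fc2.
num_params = sum(p.numel() for p in net.parameters())
print(logits.shape, num_params)
```

Because the output layer is linear, the network returns logits rather than probabilities; applying `softmax` is only needed when probabilities are wanted explicitly.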

<a id="2.2"></a>
## 2.2 Training and Evaluating the Model

In [None]:
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')

# Test the model with test data
model.eval()  # Set model to evaluation mode
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest logit per sample
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the 10000 test images: {100 * correct / total} %')

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')

**Explanation:**

- We use `CrossEntropyLoss` as the loss function and `Adam` as the optimizer.
- In each epoch, we perform the forward pass, compute the loss, backpropagate, and update the weights.
- After training, we evaluate the model on the test dataset.
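The training cell ends by saving `model.state_dict()` to `model.ckpt`. The sketch below shows one way such a checkpoint might be loaded back into a fresh model. It uses a hypothetical `make_net` stand-in and a temporary directory so it does not depend on the trained model or touch the working directory.

```python
import os
import tempfile

import torch
from torch import nn

def make_net() -> nn.Module:
    # Hypothetical stand-in with the same layer sizes as the tutorial's NeuralNet.
    return nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))

trained = make_net()    # pretend this one has been trained
restored = make_net()   # fresh random weights, different from `trained`

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.ckpt")
    torch.save(trained.state_dict(), path)          # save parameters only
    state = torch.load(path, map_location="cpu")    # load them onto the CPU
    restored.load_state_dict(state)                 # copy into the new model

restored.eval()
x = torch.randn(2, 784)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)
```

Saving the `state_dict` rather than the whole model object means the class definition must be available at load time, but the file stays independent of how the code is organized.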

<a id="3"></a>
# 3. Custom Layers and Modules

PyTorch allows you to create custom layers and modules, enabling the implementation of novel architectures.

<a id="3.1"></a>
## 3.1 Creating Custom Layers

Let's create a custom linear layer that normalizes its weights: each row of the weight matrix is rescaled to unit L2 norm on every forward pass.

In [None]:
import math

class CustomLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(CustomLinear, self).__init__()
        # Uninitialized storage; reset_parameters() fills it in below
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.empty(out_features))
        self.reset_parameters()
        
    def reset_parameters(self):
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)
        
    def forward(self, input):
        weight = nn.functional.normalize(self.weight, dim=1)
        return nn.functional.linear(input, weight, self.bias)

**Explanation:**

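- We define a custom linear layer that rescales each row of its weight matrix to unit L2 norm (`nn.functional.normalize` with `dim=1`) before performing the linear operation.
- Because each weight row has at most unit norm, the magnitude of every pre-activation (before the bias is added) is at most the norm of the input, which can help with training stability and convergence. Unlike the standard weight-normalization reparameterization, this layer has no learnable per-row gain.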
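To see what the row-wise normalization in `forward` does, here is a minimal sketch on a hand-picked 2x2 weight matrix. The numbers are chosen only so the norms are easy to check by hand.

```python
import torch
import torch.nn.functional as F

# Rows have L2 norms 5 and 2 respectively.
weight = torch.tensor([[3.0, 4.0], [0.0, 2.0]])

# dim=1 normalizes each row (one row per output feature) to unit length.
unit_rows = F.normalize(weight, dim=1)
row_norms = unit_rows.norm(dim=1)

# Apply the normalized weights as a linear map with a zero bias.
x = torch.tensor([[1.0, 1.0]])
out = F.linear(x, unit_rows, torch.zeros(2))

print(unit_rows.tolist(), row_norms.tolist(), out.tolist())
```

The rows become `[0.6, 0.8]` and `[0.0, 1.0]`, so only the direction of each weight row survives; its original scale no longer affects the output.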

<a id="3.2"></a>
## 3.2 Creating Custom Modules

Custom modules can encapsulate complex layers or blocks. Let's create a custom module that combines convolutional layers with batch normalization and activation.

In [None]:
import torch.nn.functional as F

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super(ConvBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn = nn.BatchNorm2d(out_channels)
        
    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = F.relu(x)
        return x

**Explanation:**

- The `ConvBlock` module performs convolution, batch normalization, and ReLU activation.
- This block can be used to build more complex architectures.
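When stacking blocks like this, it helps to know the spatial size each convolution produces. The sketch below uses a hypothetical helper `conv_out_size`, which implements the usual output-size formula for a convolution with dilation 1, and checks it against `nn.Conv2d` on a random input.

```python
import torch
from torch import nn

def conv_out_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution, assuming dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 3, 32, 32)   # one fake RGB image, 32x32 pixels
y = conv(x)

expected = conv_out_size(32, kernel_size=3, stride=2, padding=1)
print(y.shape, expected)
```

Batch normalization and ReLU keep the shape unchanged, so the output shape of a `ConvBlock` is determined entirely by its convolution. Note that the block's default `padding=0` shrinks the feature map for any kernel larger than 1x1.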

<a id="4"></a>
# 4. Advanced Models

In this section, we'll implement a Residual Network (ResNet) and understand the mathematics behind it.

<a id="4.1"></a>
## 4.1 Implementing a Residual Network (ResNet)

ResNets [[1]](#ref1) are deep neural networks that use skip connections to mitigate the vanishing gradient problem.

In [None]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample
        
    def forward(self, x):
        identity = x
        
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        
        if self.downsample is not None:
            identity = self.downsample(x)
        
        out += identity
        out = self.relu(out)
        
        return out

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)
        
    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels),
            )
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels
        for _ in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        return nn.Sequential(*layers)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        
        return x

# Instantiate the model
model = ResNet(ResidualBlock, [2, 2, 2, 2])  # ResNet-18

**Explanation:**

- The `ResidualBlock` class defines the basic building block of ResNet.
- The `ResNet` class assembles multiple residual blocks to form the network.
- We instantiate ResNet-18 by passing `[2, 2, 2, 2]`, the number of residual blocks in each of the four stages.

<a id="4.2"></a>
## 4.2 Understanding the Mathematics Behind ResNets

### Residual Learning

ResNets aim to learn the residual function \( \mathcal{F}(x) = H(x) - x \), where \( H(x) \) is the desired mapping and \( x \) is the input. The original mapping becomes \( H(x) = \mathcal{F}(x) + x \).

This formulation helps in training deep networks by allowing the gradients to flow directly through the skip connections.

### Identity Mapping

The identity mapping in ResNets allows the network to preserve information and mitigate the vanishing gradient problem. The skip connection adds the input \( x \) to the output of the residual function \( \mathcal{F}(x) \):

\[
\text{Output} = \mathcal{F}(x, \{W_i\}) + x
\]

Where \( \{W_i\} \) are the weights of the residual block.
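The effect of the skip connection on gradient flow can be checked numerically. The sketch below is a toy scalar model, not the network above: each layer uses an assumed residual function F(h) = w·h with a small weight `w`, so the gradient through one plain layer is `w`, while through one residual layer it is `w + 1`.

```python
import torch

def input_gradient(use_skip: bool, depth: int = 10, w: float = 0.1) -> float:
    """Gradient of the final output with respect to the input of a toy stack."""
    x = torch.tensor(1.0, dtype=torch.float64, requires_grad=True)
    h = x
    for _ in range(depth):
        f = w * h                      # toy residual function F(h) = w * h
        h = f + h if use_skip else f   # add the identity only when use_skip is set
    h.backward()
    return x.grad.item()

plain = input_gradient(use_skip=False)   # product of ten factors of 0.1
skip = input_gradient(use_skip=True)     # product of ten factors of 1.1
print(f"plain: {plain:.3e}, with skip connections: {skip:.4f}")
```

Without skip connections the gradient shrinks to roughly `1e-10` after ten layers, while the `+ 1` contributed by each identity path keeps it near `2.59`. Real residual blocks are nonlinear and multidimensional, so this only illustrates the mechanism.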

### Visualization

![ResNet Block](https://pytorch.org/assets/images/resnet.png)

*Figure: A residual block with a skip connection.*

<a id="5"></a>
# 5. Latest Developments in PyTorch

PyTorch continues to evolve, introducing new features and improvements.

<a id="5.1"></a>
## 5.1 PyTorch Lightning

PyTorch Lightning is a lightweight wrapper for PyTorch that helps organize code and reduce boilerplate. It abstracts away much of the training loop, enabling researchers to focus on the model and training logic.

In [None]:
# Install PyTorch Lightning
# !pip install pytorch-lightning

import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super(LitModel, self).__init__()
        self.model = NeuralNet(input_size, hidden_size, num_classes)
        self.criterion = nn.CrossEntropyLoss()
    
    def forward(self, x):
        return self.model(x)
    
    def training_step(self, batch, batch_idx):
        images, labels = batch
        images = images.reshape(-1, 28*28)
        outputs = self.model(images)
        loss = self.criterion(outputs, labels)
        self.log('train_loss', loss)
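        return loss
    
    def validation_step(self, batch, batch_idx):
        # Evaluate on batches served by the data module's val_dataloader()
        images, labels = batch
        images = images.reshape(-1, 28*28)
        loss = self.criterion(self.model(images), labels)
        self.log('val_loss', loss)
    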
    def configure_optimizers(self):
        optimizer = optim.Adam(self.model.parameters(), lr=learning_rate)
        return optimizer

# DataModule for data handling
class MNISTDataModule(pl.LightningDataModule):
    def prepare_data(self):
        torchvision.datasets.MNIST(root='./data', train=True, download=True)
        torchvision.datasets.MNIST(root='./data', train=False, download=True)
    
    def setup(self, stage=None):
        transform = transforms.ToTensor()
        self.train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform)
        self.val_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform)
    
    def train_dataloader(self):
        return torch.utils.data.DataLoader(self.train_dataset, batch_size=batch_size, shuffle=True)
    
    def val_dataloader(self):
        return torch.utils.data.DataLoader(self.val_dataset, batch_size=batch_size)

# Training
mnist_dm = MNISTDataModule()
model = LitModel()
trainer = pl.Trainer(max_epochs=5)
trainer.fit(model, mnist_dm)

**Explanation:**

- `LitModel` defines the model and training logic.
- `MNISTDataModule` handles data preparation and loading.
- `trainer.fit` runs the training loop.

<a id="5.2"></a>
## 5.2 TorchScript and JIT Compilation

TorchScript allows you to serialize and optimize models written in PyTorch, enabling them to run independently from Python. This is useful for deploying models in production.

In [None]:
# Example of TorchScript

# Define a simple model
class MyModule(torch.nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linear = torch.nn.Linear(10, 5)
    
    def forward(self, x):
        return self.linear(x)

model = MyModule()

# Convert to TorchScript
scripted_model = torch.jit.script(model)

# Save the model
scripted_model.save('model.pt')

# Load the model
loaded_model = torch.jit.load('model.pt')

**Explanation:**

- `torch.jit.script` compiles the model to TorchScript.
- The scripted model can be saved to disk and loaded again without the original Python class definition, for example from C++ via `torch::jit::load`.
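One reason to use `torch.jit.script` is that it preserves data-dependent control flow. The following sketch uses a hypothetical module `SignGate` that branches on its input, scripts it, round-trips it through an in-memory buffer (an assumption made to avoid writing a file), and checks that both branches still match the eager model.

```python
import io

import torch
from torch import nn

class SignGate(nn.Module):
    """Hypothetical module whose forward pass branches on the input values."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.linear(x)
        if x.sum() > 0:      # data-dependent branch, compiled by torch.jit.script
            out = out * 2.0
        else:
            out = -out
        return out

eager = SignGate().eval()
scripted = torch.jit.script(eager)

# Serialize to an in-memory buffer and load it back.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)

positive = torch.ones(1, 10)
negative = -torch.ones(1, 10)
with torch.no_grad():
    matches = all(
        torch.allclose(eager(x), loaded(x)) for x in (positive, negative)
    )
print(matches)
```

Passing a file path instead of the buffer works the same way, as in the cell above.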

<a id="6"></a>
# 6. Conclusion

PyTorch is a powerful and flexible deep learning framework that enables rapid experimentation and development of complex models. Its dynamic computation graphs, intuitive API, and extensive community support make it a preferred choice for researchers and practitioners. By understanding how to implement neural networks, create custom layers, and leverage advanced features like TorchScript, you can build efficient and scalable deep learning models.

<a id="7"></a>
# 7. References

1. <a id="ref1"></a>He, K., Zhang, X., Ren, S., & Sun, J. (2016). *Deep Residual Learning for Image Recognition*. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
2. Paszke, A., et al. (2019). *PyTorch: An Imperative Style, High-Performance Deep Learning Library*. [arXiv:1912.01703](https://arxiv.org/abs/1912.01703)
3. PyTorch Official Documentation: [https://pytorch.org/docs/stable/index.html](https://pytorch.org/docs/stable/index.html)
4. PyTorch Lightning Documentation: [https://www.pytorchlightning.ai/](https://www.pytorchlightning.ai/)

---

This notebook provides a comprehensive guide to implementing advanced neural network models using PyTorch. You can run the code cells to see how models are built, trained, and evaluated. Feel free to modify and extend the examples to suit your specific needs.