$$\text{Applied Data Science (CUSP-GX 6001)} $$ 

## Introduction to PyTorch

PyTorch is an open-source machine learning library for building and training neural-network-based deep learning models. <br>
It was developed by Facebook’s AI research group and can be used from Python as well as C++. PyTorch leverages the popularity and flexibility of Python while keeping the convenience and functionality of the original Torch library. (Torch is a scientific computing framework designed to run complex linear algebra tasks efficiently on GPUs, tasks that would otherwise take far longer on CPUs.)


Unlike TensorFlow (from Google), which originally used static computation graphs, PyTorch builds its computation graph dynamically, which allows greater flexibility in building complex architectures (although TensorFlow 2.0 has also adopted dynamic, eager execution). <br>
Similarly, Keras is another deep learning API written in Python, developed at Google and running on top of the TensorFlow machine learning platform. It was designed with a focus on enabling fast experimentation. Starting with TensorFlow 2.0, Keras is bundled with TensorFlow itself.

The major difference between PyTorch and TensorFlow is that PyTorch offers a more flexible environment at the price of slightly less automation. It is a better fit for teams with a deeper understanding of deep learning concepts, and is therefore more widely used in research settings. <br>
In day-to-day work, TensorFlow offers a more concise, simpler API, which keeps projects less bloated and code more compact. A simple training loop takes about five lines of code in PyTorch, while TensorFlow (through Keras) can do it in a single call.
This, together with its broad production support, makes TensorFlow more popular than PyTorch for deployment.



$$\text{Complete ML Workflow in PyTorch tutorial}$$

## 1. Working with Datasets

PyTorch has two primitives for working with data: <b>torch.utils.data.DataLoader</b> and <b>torch.utils.data.Dataset</b>. <br>
Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset and serves the data in batches. <br>
You can use the pre-loaded datasets that ship with PyTorch libraries such as torchvision, or you can use your own custom data.
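
As a minimal sketch of how the two primitives fit together, the snippet below wraps random tensors in a `TensorDataset` and iterates over it with a `DataLoader`. The tensors are stand-ins with MNIST-like shapes, not real data, and the names `images`, `labels` and `batch_shapes` are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 100 random "images" of shape 1x28x28 with labels 0-9
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

dataset = TensorDataset(images, labels)      # stores (sample, label) pairs
loader = DataLoader(dataset, batch_size=64)  # serves them in batches of up to 64

# Collect the shape of every batch the loader yields
batch_shapes = [(X.shape, y.shape) for X, y in loader]
for X_shape, y_shape in batch_shapes:
    print(X_shape, y_shape)
```

Because 100 is not a multiple of 64, the last batch is smaller than the others; passing `drop_last=True` to `DataLoader` would discard it instead.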

In [1]:
import torch
import torchvision
from torchvision import datasets, transforms

In [2]:
# Transformation to the dataset including normalisation and converting data into a pytorch tensor
transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
    ])
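
`transforms.ToTensor()` scales pixel values to the range [0, 1], and `transforms.Normalize(mean, std)` then applies `(x - mean) / std` to each channel. The values 0.1307 and 0.3081 are commonly cited as the mean and standard deviation of the MNIST training pixels. A small sketch of that arithmetic in plain Python, using a hypothetical helper `normalize` that is not part of torchvision:

```python
MEAN, STD = 0.1307, 0.3081  # constants used in the transform above

def normalize(pixel):
    """Apply (x - mean) / std to one pixel value already scaled to [0, 1]."""
    return (pixel - MEAN) / STD

black = normalize(0.0)  # background pixel
white = normalize(1.0)  # full-intensity pixel
print(f"black -> {black:.4f}, white -> {white:.4f}")
```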

Pre-loaded MNIST dataset

MNIST is the de facto “hello world” dataset of deep learning for computer vision. It is a classic dataset of handwritten digit images (0 - 9) used for training various deep learning models. It contains 60,000 training images and 10,000 test images, and each image is 28x28 pixels.

In [3]:
train_dataset = datasets.MNIST('../data', train=True, download=True,
                       transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64)

# train=False selects the 10,000 held-out test images
test_dataset = datasets.MNIST('../data', train=False, download=True,
                       transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64)

Custom-defined dataset. We only need to implement 2 methods, <b>\_\_getitem__</b> and <b>\_\_len__</b>, which return the i-th sample and the length of the dataset respectively.

In [4]:
from torch.utils.data import Dataset

class MyCustomDataset(Dataset):
    def __init__(self, data, labels):
        # initialise: keep the samples and their labels
        self.data = data
        self.labels = labels

    def __getitem__(self, index):
        # get the i-th data element in the dataset (i = index)
        return self.data[index], self.labels[index]

    def __len__(self):
        # size of your dataset
        return len(self.data)
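
To make the skeleton concrete, here is a small sketch of a custom dataset plugged into a `DataLoader`. `SquaresDataset` is a hypothetical toy class invented for this illustration: sample i is the number i and its label is i squared.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the number i, its label is i squared."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __getitem__(self, index):
        # return the i-th sample together with its label
        return self.x[index], self.y[index]

    def __len__(self):
        # number of samples in the dataset
        return len(self.x)

toy = SquaresDataset(10)
toy_loader = DataLoader(toy, batch_size=4)

# Each batch is a pair (samples, labels) stacked into tensors by the loader
batches = [(xb.tolist(), yb.tolist()) for xb, yb in toy_loader]
print(len(toy), batches[0])
```

The `DataLoader` only relies on `__getitem__` and `__len__`, so any class that implements these two methods can be batched in the same way.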

## 2. Creating a Model

We define our neural network by subclassing nn.Module, and initialize the neural network layers in <b>\_\_init__</b>. <br>
Every nn.Module subclass implements the operations on input data in the <b>forward</b> method.

In [5]:
from torch import nn

In [6]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        
        # The input layer is 28*28 image which is flattened to 784
        self.flatten = nn.Flatten()
        
        #Define the hidden layers and output layer.
        self.hidden1 = nn.Linear(784, 128)
        self.hidden2 = nn.Linear(128, 64)
        self.output = nn.Linear(64, 10)
        
        # Define relu activation and softmax output 
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)
        
    def forward(self, x):
        # Pass the input tensor through each of our operations
        x = self.flatten(x)
        x = self.hidden1(x)
        x = self.relu(x)
        x = self.hidden2(x)
        x = self.relu(x)
        x = self.output(x)
        x = self.softmax(x)
        
        return x

In [7]:
print(Network())

Network(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (hidden1): Linear(in_features=784, out_features=128, bias=True)
  (hidden2): Linear(in_features=128, out_features=64, bias=True)
  (output): Linear(in_features=64, out_features=10, bias=True)
  (relu): ReLU()
  (softmax): Softmax(dim=1)
)


This model is a simple neural network with 1 input layer, 2 hidden layers and 1 output layer.
The model looks as follows:

<img src="1.jpeg" width="600">
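
To get a feel for the size of this network, the number of learnable parameters can be worked out by hand: each `nn.Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases. The sketch below does that arithmetic in plain Python; the helper `linear_params` is a hypothetical name used only here.

```python
# Layer sizes of the network above: 784 -> 128 -> 64 -> 10
layer_sizes = [784, 128, 64, 10]

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# One entry per Linear layer: hidden1, hidden2, output
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

In PyTorch the same total can be obtained from a model instance with `sum(p.numel() for p in model.parameters())`.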

This type of model is used to build non-linear classifiers, which are better at predicting the class of a sample when the relationship between the input features and the class labels is non-linear. The inputs are typically numeric features describing the data. <br>
For the purposes of this tutorial, however, we will use an image dataset.

Move the model to the GPU for faster computation. <br>
If no GPU is available, the code stays the same, but all computations are performed on the CPU, which may be slower.

In [8]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Network().to(device)

## 3. Calculating Gradients of the Loss with Backpropagation using PyTorch's Autograd

When training neural networks, the most frequently used algorithm is <b>backpropagation</b>. In this algorithm, each parameter (model weight) is adjusted according to the gradient of the loss function with respect to that parameter. <br>
To compute those gradients, PyTorch has a built-in differentiation engine called <b>torch.autograd</b>. It supports automatic computation of gradients for any computational graph.

Consider a simple example where x is the input, w and b are learnable parameters (which need to be optimized) and y is the true label <br>
$$ z = x \cdot w + b $$

![Gradients](2.png)

In [9]:
x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss_fn = torch.nn.functional.binary_cross_entropy_with_logits
loss = loss_fn(z, y)

The most popular loss function for classification problems is cross-entropy loss. <br>
In the example above we use binary cross-entropy loss because each of the 3 outputs is treated as an independent two-class (0/1) prediction. <br>
For the MNIST dataset, we will use cross-entropy over 10 classes.
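
For the multi-class case, the sketch below shows what `nn.CrossEntropyLoss` computes: it takes raw scores (logits) and integer class labels, and gives the same value as applying log-softmax followed by the negative log-likelihood loss. The tensors `logits` and `targets` are made-up values for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # hypothetical integer class labels

ce = nn.CrossEntropyLoss()(logits, targets)

# Same value computed step by step: log-softmax, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(ce.item(), manual.item())
```

Because the loss applies log-softmax itself, it is meant to receive unnormalized logits rather than probabilities.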

In [10]:
# Check the gradient function
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x000002878B5BB5E0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x000002878B5BB7F0>


PyTorch automatically records every operation (e.g. add / multiply / ReLU) performed on tensors that have requires_grad=True (here w and b), so that it can compute gradients later. <br>
The computed gradients are accumulated (summed) in each parameter's .grad attribute every time backward() is called, so we need to clear them before each new backward pass using zero_grad:

In [11]:
# Choose an optimizer for the toy parameters w and b
optimizer = torch.optim.SGD([w, b], lr=1e-3)
# Clear any previously accumulated gradients
optimizer.zero_grad()
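
A small sketch of why the gradients have to be cleared: calling `backward()` twice without resetting adds the second gradient to the first. The tensor `w_demo` is a hypothetical parameter used only for this demonstration.

```python
import torch

w_demo = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical parameter

(w_demo ** 2).sum().backward()
first = w_demo.grad.clone()          # gradient of sum(w^2) is 2*w

(w_demo ** 2).sum().backward()       # second backward pass without clearing
accumulated = w_demo.grad.clone()    # the new gradient was added to the old one

w_demo.grad.zero_()                  # clear the stored gradient by hand
(w_demo ** 2).sum().backward()
after_reset = w_demo.grad.clone()    # same as after the first pass
print(first, accumulated, after_reset)
```

`optimizer.zero_grad()` performs this clearing for every parameter the optimizer manages.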

To optimize the parameters of the neural network, we need the derivatives of the loss function with respect to those parameters, namely $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$, for fixed values of x and y. PyTorch's autograd calculates these gradients in a single line, loss.backward(), and stores them in w.grad and b.grad.

In [12]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.3140, 0.1237, 0.1319],
        [0.3140, 0.1237, 0.1319],
        [0.3140, 0.1237, 0.1319],
        [0.3140, 0.1237, 0.1319],
        [0.3140, 0.1237, 0.1319]])
tensor([0.3140, 0.1237, 0.1319])
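
The gradients returned by autograd can be checked against the closed-form derivative. For binary cross-entropy with logits averaged over n outputs, the derivative with respect to z is (sigmoid(z) - y) / n, which is also the gradient for b, and the gradient for w is the outer product of x with that vector. The sketch below rebuilds the toy example with a fixed seed (an assumption, so the numbers differ from the output above) and compares both results.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                  # assumption: fixed seed for repeatability
x = torch.ones(5)                     # input tensor
y = torch.zeros(3)                    # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Analytic gradients for the mean-reduced loss
dloss_dz = (torch.sigmoid(z.detach()) - y) / z.numel()
expected_b_grad = dloss_dz
expected_w_grad = torch.outer(x, dloss_dz)

print(torch.allclose(b.grad, expected_b_grad, atol=1e-6))
print(torch.allclose(w.grad, expected_w_grad, atol=1e-6))
```

Since x is a vector of ones, every row of w.grad equals b.grad, which matches the pattern in the printed output above.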


Finally, to update the parameters according to <br>
$$ w_{t+1} = w_t - lr \cdot \frac{\partial loss}{\partial w} $$ and $$ b_{t+1} = b_t - lr \cdot \frac{\partial loss}{\partial b} $$
where lr is the learning rate, we call optimizer.step()

In [13]:
optimizer.step()
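
The update rule can be verified on a tiny example: for plain SGD (no momentum or weight decay), `optimizer.step()` should produce the same values as subtracting `lr * grad` by hand. The names `p`, `optimizer_demo` and `loss_demo` and the learning rate are hypothetical and chosen for illustration.

```python
import torch

torch.manual_seed(0)
lr = 0.1                                   # hypothetical learning rate
p = torch.randn(3, requires_grad=True)     # hypothetical parameter
optimizer_demo = torch.optim.SGD([p], lr=lr)

loss_demo = (p ** 2).sum()
loss_demo.backward()

# Manual update, computed before the optimizer changes p in place
expected = p.detach() - lr * p.grad
optimizer_demo.step()
print(torch.allclose(p.detach(), expected))
```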

## 4. Training Loop

In a single pass of the training loop (one epoch), the model makes predictions on the training dataset, fed to it in batches, and backpropagates the prediction error to adjust the model’s parameters using a user-defined optimizer and loss function.

In [14]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train() # Training mode 
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device) # send input to GPU

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

We also check the out-of-sample validation loss and accuracy.

In [15]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval() # Evaluation mode 
    test_loss, correct, total = 0, 0, 0
    with torch.no_grad(): # no gradient accumulation for validation set
        
        for X, y in dataloader:
            X, y = X.to(device), y.to(device) # send input to GPU
            
            # get prediction
            pred = model(X) 
            
            # get accuracy
            _, predicted = torch.max(pred.data, 1)
            total += y.size(0)
            correct += (predicted == y).sum().item()
            
            # calculate loss
            test_loss += loss_fn(pred, y).item() 
            
    test_loss /= num_batches
    print(f"Test Error: \n Avg loss: {test_loss:>8f} Accuracy: {100 * correct // total:>8f}\n")
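
The accuracy computation in `test` relies on `torch.max(..., 1)`, which returns the largest value in each row together with its index; the index is the predicted class. A minimal sketch with made-up scores (`scores` and `true_labels` are hypothetical):

```python
import torch

# Hypothetical scores for 3 samples and 4 classes
scores = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                       [1.5, 0.2, 0.1, 0.4],
                       [0.0, 0.1, 0.2, 3.0]])
true_labels = torch.tensor([1, 0, 2])

_, predicted = torch.max(scores, 1)   # index of the largest score in each row
correct = (predicted == true_labels).sum().item()
accuracy = 100 * correct / len(true_labels)
print(predicted.tolist(), correct, accuracy)
```

Note that `test` uses `//` (floor division), so the accuracy it prints is truncated to a whole percent; `/` as used here keeps the fractional part.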

In [16]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303717  [    0/60000]
loss: 2.303586  [ 6400/60000]
loss: 2.299951  [12800/60000]
loss: 2.300389  [19200/60000]
loss: 2.302530  [25600/60000]
loss: 2.302290  [32000/60000]
loss: 2.302339  [38400/60000]
loss: 2.301160  [44800/60000]
loss: 2.301150  [51200/60000]
loss: 2.300322  [57600/60000]
Test Error: 
 Avg loss: 2.300294 Accuracy: 11.000000

Epoch 2
-------------------------------
loss: 2.301506  [    0/60000]
loss: 2.301316  [ 6400/60000]
loss: 2.298073  [12800/60000]
loss: 2.297502  [19200/60000]
loss: 2.300217  [25600/60000]
loss: 2.300142  [32000/60000]
loss: 2.300110  [38400/60000]
loss: 2.299390  [44800/60000]
loss: 2.298552  [51200/60000]
loss: 2.297819  [57600/60000]
Test Error: 
 Avg loss: 2.297834 Accuracy: 13.000000

Epoch 3
-------------------------------
loss: 2.299011  [    0/60000]
loss: 2.298731  [ 6400/60000]
loss: 2.296022  [12800/60000]
loss: 2.294166  [19200/60000]
loss: 2.297592  [25600/60000]
loss: 2.297807  [32000

In just 5 epochs, we can see that the cross-entropy loss over the full training set of 60,000 images goes down and the accuracy on the validation set rises from 11% to 21%, which means the neural network is learning something. Increasing the number of epochs or the number of hidden layers should bring further improvement. Note also that nn.CrossEntropyLoss expects raw scores (logits) and applies log-softmax internally, so the explicit softmax at the end of the model above is redundant and likely contributes to the slow decrease in loss. We used a basic shallow neural network, a type of model that is not known to be good at classifying images. Convolutional Neural Networks are normally used for that purpose, but the aim of this tutorial is only to show how to implement basic PyTorch models.

### References :

https://pytorch.org/tutorials/beginner/basics/intro.html <br>
https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html <br>
https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html <br>
https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html <br>