# Introduction

This notebook aims to cover most of the concepts learnt during the course and applies them using PyTorch.

1. TODO: Input processing
   1. Data Augmentation
2. Network Topology
   1. Layers
   2. Activation Function
   3. Loss Function
3. Network Training (Optimization)
   1. Hyperparameter Tuning
   2. Weight Initialization

> Code in this notebook is based on [notebooks from Bootcamp course repo](https://github.com/fawazsammani/The-Complete-Neural-Networks-Bootcamp-Theory-Applications)

# Input Processing

In [1]:
import torch
import torch.nn as nn
import torchvision.datasets as datasets 
import torchvision.transforms as transforms

In [2]:
# Hyperparameters

input_size = 784        # Number of input neurons (image pixels): 28x28 = 784
hidden_size = 400       # Number of neurons in each hidden layer
out_size = 10           # Number of classes (digits 0-9)
epochs = 10             # Number of full passes over the training dataset
batch_size = 100        # Number of samples processed in one iteration
learning_rate = 0.001   # Step size the optimizer uses when updating weights

In [3]:
train_dataset = datasets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = datasets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

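# A notebook cell displays only its last expression, so only the test shape appears in the output
train_dataset.data.shape    # torch.Size([60000, 28, 28])
test_dataset.data.shape     # torch.Size([10000, 28, 28])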

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:11<00:00, 854971.99it/s] 


Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 112406.43it/s]


Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:06<00:00, 236093.75it/s]


Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<?, ?it/s]

Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw






torch.Size([10000, 28, 28])

Checking the shapes shows that the downloaded training dataset has 60,000 images of 28x28 pixels, while the test dataset has 10,000 images.

Now, we use [PyTorch DataLoader](https://pytorch.org/docs/stable/data.html) to load the training and test datasets in batches.

> **Batch Size** is the number of input samples processed in a single **Iteration**

In [4]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
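
To make the relationship between batch size and iterations concrete, here is a small sketch that uses a synthetic dataset of the same size as the MNIST training set, so it runs without any download. The names `fake_dataset` and `fake_loader` are illustrative only and are not used elsewhere in this notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the MNIST training set: 60,000 blank 1x28x28 images with dummy labels
fake_images = torch.zeros(60000, 1, 28, 28, dtype=torch.uint8)
fake_labels = torch.zeros(60000, dtype=torch.long)
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_dataset, batch_size=100, shuffle=True)

# len() of a DataLoader is the number of batches, i.e. iterations per epoch
iterations_per_epoch = len(fake_loader)

# Each iteration yields one batch of images together with the matching labels
first_images, first_labels = next(iter(fake_loader))
print(iterations_per_epoch, first_images.shape, first_labels.shape)
```

With 60,000 samples and a batch size of 100, one epoch takes 60,000 / 100 = 600 iterations.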

# Network Topology

Now we define the network as a class derived from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).

> The hidden layers use the [ReLU activation function](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU), and their weights are set with [Kaiming Normal Weights Initialization](https://pytorch.org/docs/stable/nn.init.html#torch.nn.init.kaiming_normal_)

In [5]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, out_size):
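        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)    # First hidden layer
        self.fc2 = nn.Linear(hidden_size, hidden_size)   # Second hidden layer
        self.fc3 = nn.Linear(hidden_size, out_size)      # Output layer (one raw score per class)
        self.relu = nn.ReLU()                            # Activation applied after fc1 and fc2
        self.init_weights()

    def init_weights(self):
        # Kaiming (He) initialization is designed for layers followed by ReLU
        nn.init.kaiming_normal_(self.fc1.weight)
        nn.init.kaiming_normal_(self.fc2.weight)
        # OPEN: Why are the output layer weights not initialized explicitly?
        #   Note: fc3 is not left uninitialized; nn.Linear applies its own default initialization on construction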

    def forward(self, x):                          
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.relu(out)
        out = self.fc3(out)
        return out
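
As a rough check of what `nn.init.kaiming_normal_` does with its default arguments, the sketch below initializes a standalone layer of the same size as `fc1` and compares the measured standard deviation of its weights against sqrt(2 / fan_in). The standalone `layer` and the fixed seed are assumptions made for the illustration.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the illustration is repeatable

# Same shape as fc1: 784 inputs (fan_in) and 400 outputs
layer = nn.Linear(784, 400)
nn.init.kaiming_normal_(layer.weight)  # defaults: mode='fan_in', gain of sqrt(2)

# With the defaults, weights are drawn from a normal distribution with std = sqrt(2 / fan_in)
expected_std = math.sqrt(2.0 / 784)
measured_std = layer.weight.std().item()
print(expected_std, measured_std)
```

Scaling the spread of the weights by the number of inputs keeps the variance of the activations roughly stable from layer to layer when ReLU is used.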

Now we define the loss function and the optimizer.

In [6]:
# Create an instance of the NN class defined above
net = Net(input_size, hidden_size, out_size)

# Check if CUDA can be used to speed up
# TODO: Can we use OpenVINO?
CUDA = torch.cuda.is_available()
if CUDA:
    net = net.cuda()

# Loss function: nn.CrossEntropyLoss applies log-softmax internally, so the network needs no Softmax layer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
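
The claim that cross-entropy loss already includes Softmax can be checked with a short sketch. It compares `nn.CrossEntropyLoss` on raw scores against an explicit log-softmax followed by negative log-likelihood loss. The random `logits` and the `targets` are made-up values for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up raw network outputs (logits) for a batch of 4 samples and 10 classes
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# Cross-entropy applied directly to the raw scores
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same computation spelled out: log-softmax, then negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```

Both values agree, which is why the network's `forward` returns the output of `fc3` without any Softmax.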

We check the shape of the weight tensor of the first layer. `nn.Linear` stores its weights as (out_features, in_features), hence 400 x 784.

In [7]:
net.fc1.weight.shape

torch.Size([400, 784])
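
Extending the shape check to the whole network, the sketch below lists every parameter tensor and counts the trainable parameters. It rebuilds the same three layers with `nn.Sequential` purely so the snippet runs on its own; the name `layers` is illustrative.

```python
import torch.nn as nn

# Same layer sizes as Net: 784 -> 400 -> 400 -> 10
layers = nn.Sequential(
    nn.Linear(784, 400), nn.ReLU(),
    nn.Linear(400, 400), nn.ReLU(),
    nn.Linear(400, 10),
)

# Each Linear layer has a weight of shape (out_features, in_features) and a bias of shape (out_features,)
for name, param in layers.named_parameters():
    print(name, tuple(param.shape))

total_params = sum(p.numel() for p in layers.parameters())
print(total_params)
```

The total is (784x400 + 400) + (400x400 + 400) + (400x10 + 10) = 478,410 parameters.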

# Network Training

Mathematically, we are trying to find the set of network weights that minimizes the loss, a measure of how far the actual output is from the expected output.

> For each batch of inputs, the outputs are evaluated, and the prediction loss is calculated for each pair (k from 1 to n samples in the batch) of calculated ($y_k$) and expected ($z_k$) output. The aggregated loss is used to [back-propagate](https://towardsdatascience.com/understanding-backpropagation-abcc509ca9d0) the error and adjust the weights. The RMS error below is one example of such an aggregate; this notebook itself uses cross-entropy loss.

$$\sqrt{\frac{\sum_{k=1}^n (y_k - z_k)^2}{n}}$$
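
As a small worked example of the RMS formula, the sketch below evaluates it for four made-up pairs of calculated ($y_k$) and expected ($z_k$) outputs. The values are chosen only for illustration.

```python
import torch

# Made-up calculated outputs y_k and expected outputs z_k for n = 4 samples
y = torch.tensor([2.5, 0.0, 2.0, 8.0])
z = torch.tensor([3.0, -0.5, 2.0, 7.0])

# Squared differences are 0.25, 0.25, 0.0 and 1.0; their mean is 0.375
rms = torch.sqrt(torch.mean((y - z) ** 2))
print(rms.item())
```

The result is sqrt(0.375), about 0.612.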

> Each epoch uses the entire training data. As more epochs complete, the training error is expected to decrease.

In [8]:
#Train the network
for epoch in range(epochs):
    correct_train = 0
    running_loss = 0
    # Each iteration of below for loop extracts 1 batch of training data
    for i, (images, labels) in enumerate(train_loader):   
        # Flatten each batch of images from (batch, 1, 28, 28), i.e. (100, 1, 28, 28) where 1 is the
        # number of channels (grayscale), to (100, 784) to match the input layer
        images = images.view(-1, 28*28)
        if CUDA:
            images = images.cuda()
            labels = labels.cuda()
            
        # FORWARD PASS: Evaluate the network for batch of inputs
        outputs = net(images)       

        # Pick the class with the highest output score as the predicted digit (0 to 9)
        _, predicted = torch.max(outputs.detach(), 1)
        # Count the correct predictions in this batch
        correct_train += (predicted == labels).sum()

        # CALCULATE LOSS: Evaluate the loss using the specified loss function
        loss = criterion(outputs, labels)                 # Cross-entropy between predicted scores and true labels
        running_loss += loss.item()

        # Clear the gradient buffer (we don't want to accumulate gradients)
        optimizer.zero_grad()

        # How are loss and optimizer connected? https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#loss-function
        #   loss.backward() runs autograd over the whole network graph and stores a gradient in each parameter;
        #   the optimizer holds references to the network parameters, so it can then update the weights

        # BACKWARD PASS
        loss.backward()                                   # Backpropagation

        # UPDATE
        optimizer.step()                                  # Update the weights
    
    # TODO: Add validation: https://www.geeksforgeeks.org/training-neural-networks-with-validation-using-pytorch/

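    # Accuracy = correct predictions in this epoch / total training samples, as a percentage;
    # the reported loss is the running loss averaged over the number of batches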
    print('Epoch [{}/{}], Training Loss: {:.3f}, Training Accuracy: {:.3f}%'.format
          (epoch+1, epochs, running_loss/len(train_loader), (100*correct_train.double()/len(train_dataset))))
print("DONE TRAINING!")

Epoch [1/10], Training Loss: 0.239, Training Accuracy: 92.982%
Epoch [2/10], Training Loss: 0.088, Training Accuracy: 97.325%
Epoch [3/10], Training Loss: 0.056, Training Accuracy: 98.290%
Epoch [4/10], Training Loss: 0.040, Training Accuracy: 98.710%
Epoch [5/10], Training Loss: 0.027, Training Accuracy: 99.103%
Epoch [6/10], Training Loss: 0.024, Training Accuracy: 99.160%
Epoch [7/10], Training Loss: 0.019, Training Accuracy: 99.352%
Epoch [8/10], Training Loss: 0.018, Training Accuracy: 99.423%
Epoch [9/10], Training Loss: 0.013, Training Accuracy: 99.552%
Epoch [10/10], Training Loss: 0.015, Training Accuracy: 99.512%
DONE TRAINING!
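
The training accuracy above is measured on data the network has already seen. As a possible next step, here is a minimal sketch of measuring loss and accuracy on held-out data. The `evaluate` helper is hypothetical and not part of the original notebook, and the demo uses a tiny untrained model with random data so that it runs on its own. In this notebook it could presumably be called as `evaluate(net, test_loader, criterion)`, with `device="cuda"` when CUDA is in use.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device="cpu"):
    """Return (mean loss per batch, accuracy in percent) over a data loader."""
    model.eval()  # switch layers such as dropout to inference behaviour
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in loader:
            # Flatten (batch, 1, 28, 28) to (batch, 784), as in the training loop
            images = images.view(images.size(0), -1).to(device)
            labels = labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item()
            # The predicted class is the index of the highest score
            predicted = outputs.argmax(dim=1)
            correct += (predicted == labels).sum().item()
            seen += labels.size(0)
    return total_loss / len(loader), 100.0 * correct / seen


# Stand-in model and random data, used only to show the call
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
demo_data = TensorDataset(torch.rand(200, 1, 28, 28), torch.randint(0, 10, (200,)))
demo_loader = DataLoader(demo_data, batch_size=100, shuffle=False)

demo_loss, demo_acc = evaluate(demo_model, demo_loader, nn.CrossEntropyLoss())
print(demo_loss, demo_acc)
```

Accuracy is computed the same way as in the training loop: the number of correct predictions divided by the number of samples, expressed as a percentage.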
