![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Pytorch_logo.png/800px-Pytorch_logo.png)

<h1>Lab 4: Pytorch Basics</h1>

<h2>Introduction</h2>
As we've seen, we can use Numpy to build single-layer and even multilayer linear neural networks by deriving the gradients by hand, hard-coding them, and training with gradient descent (GD). But what if we want larger and more complicated networks? What if we want fancier loss functions, huge datasets, or more involved training regimes? And what about training on GPUs?<br>
That's a lot to work out EVERY time we want to try something new! Luckily, there are a number of Deep Learning frameworks that do much of the heavy lifting for us.<br>
For this unit we will be using Pytorch, a powerful and widely used Deep Learning framework that lets us do all of the above and MORE.

<h3> Importing the required libraries </h3>
We will mainly use two packages, torch and torchvision.<br>
torch contains most of the Deep Learning functionality, while torchvision contains many computer vision functions designed to work hand in hand with torch.

In [None]:
%load_ext lab_black

In [None]:
import torch
import torchvision
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

<h3> The Pytorch Tensor </h3>
As we've already explored, the "Tensor" is a very useful concept in Machine Learning. You probably noticed that in Numpy our "Tensors" are called "Arrays"; in Pytorch they are simply called Tensors!<br>
Let's do a recap of Numpy arrays and how similar they are to Pytorch tensors.

In [None]:
# Create some "Matrices" as lists of lists

# 3x3
W = [[1, 1, 1], [1.5, 1.5, 1.5], [2, 2, 2]]

# 3x1
x = [[6], [7], [8]]
# 3x1
b = [[1], [1], [1]]

# Variable to store output
# 3x1
y = [[0], [0], [0]]

As we've seen before

In [None]:
# We can transform our list of lists into a "numpy array" by using the function "array"
W_np = np.array(W)

x_np = np.array(x)

# Let's use the function "ones" to create an array of ones!
b_np = np.ones((3, 1))

# Let's now compute Wx + b using these numpy variables!
output = np.matmul(W_np, x_np) + b_np

# print out the result
print("Output:\n", output)
print("Output shape:\n", output.shape)

Now in Pytorch!

In [None]:
# We can transform our list of lists into a "torch tensor" by using the function "FloatTensor"
# Note: "FloatTensor" fixes the datatype of the tensor to a 32bit "float".
# We could also use the function "tensor", but it infers the datatype from the data it is given
# (for example, a list of Python ints becomes an integer tensor). Using "FloatTensor" ensures
# the datatypes match so that we can perform the operations we want.

W_torch = torch.FloatTensor(W)

x_torch = torch.FloatTensor(x)

# Let's use the function "ones" to create a tensor of ones!
b_torch = torch.ones(3, 1)

# Let's now compute Wx + b using these torch tensors!
output = torch.matmul(W_torch, x_torch) + b_torch
output1 = torch.mm(W_torch, x_torch) + b_torch
# print out the result
print("Output:\n", output)
print("Output shape:\n", output.shape)
print()

print(W_torch @ x_torch + b_torch == output)
print((output == output1).all())

Wow! Numpy and Pytorch are remarkably similar, and this is no coincidence! The creators of Pytorch did this intentionally so that existing Numpy skills (Numpy is one of the most widely used Python libraries, with origins going back to 1995!) transfer easily to Pytorch. To aid this transfer there are even functions that convert Pytorch tensors to Numpy arrays and back!
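One detail about these conversions is worth a quick sketch: some of them copy the data while others share memory with the original array. The example below assumes a small float64 array (the variable names are illustrative only) and compares `torch.from_numpy`, which keeps the Numpy datatype and shares memory, with an explicit float32 copy made by `torch.tensor`.

```python
import numpy as np
import torch

# A small float64 Numpy array (Numpy's default floating point type)
np_array = np.arange(6, dtype=np.float64).reshape(2, 3)

# torch.from_numpy keeps the Numpy dtype and shares memory with the array
shared = torch.from_numpy(np_array)

# torch.tensor with an explicit dtype makes an independent float32 copy
copied = torch.tensor(np_array, dtype=torch.float32)

# Modify the Numpy array in place
np_array[0, 0] = 100.0

# The shared tensor sees the change, the copy does not
print("shared:", shared.dtype, shared[0, 0].item())
print("copied:", copied.dtype, copied[0, 0].item())
```

If a tensor and an array must not affect each other, making an explicit copy is the safer choice.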

In [None]:
# Create a random Numpy array
np_array = np.random.random((3, 4))
print("Numpy array:\n", np_array)

# Convert to Pytorch tensor
torch_tensor = torch.FloatTensor(np_array)
print("Pytorch tensor:\n", torch_tensor)

# Convert back to a Numpy array!
np_array2 = torch_tensor.numpy()
print("Numpy array:\n", np_array2)


<h2>On to Pytorch!</h2>
Let's further explore Pytorch and its similarities to Numpy, and then see what new functionality it brings to the table!

<h3> Basic Element-wise Operations </h3>
Let's quickly go back over some basics using Pytorch

In [None]:
# Let's create a 2D Tensor using torch.rand
y = torch.rand(4, 5)
# This creates a 4x5 "Matrix" of random numbers drawn uniformly from [0, 1)
print("Our 2D Tensor:\n", y)

# We can perform normal python scalar arithmetic on Torch tensors
print("\nScalar Multiplication:\n", y * 10)
print("Addition and Square:\n", (y + 1) ** 2)
print("Addition:\n", y + y)
print("Addition and division:\n", y / (y + 1))

# We can use a combination of Torch functions and normal python arithmetic
print("\nPower and square root:\n", torch.sqrt(y ** 2))

# Torch tensors are objects and have methods
print(
    "\nY -\n Min:%.2f\n Max:%.2f\n Standard Deviation:%.2f\n Sum:%.2f"
    % (y.min(), y.max(), y.std(), y.sum())
)

<h3>Tensor Operations</h3>

In [None]:
# Create two 3D Tensors
tensor_1 = torch.rand(3, 3, 3)
tensor_2 = torch.rand(3, 3, 3)

# Add the 2 Tensors
print("Addition:\n", tensor_1 + tensor_2)

# We cannot perform a normal "matrix" multiplication on a 3D tensor
# But we can treat the 3D tensor as a "batch" (like a stack) of 2D tensors
# And perform normal matrix multiplication independently on each pair of 2D matrices
print("Batch Multiplication:\n", torch.bmm(tensor_1, tensor_2))

In [None]:
# Let's create a more interesting tensor
tensor_3 = torch.rand(2, 4, 5)
# We can swap the Tensor dimensions
print("\nThe original Tensor is:\n", tensor_3)
print("With shape:\n", tensor_3.shape)

# transpose will swap the two dimensions it is given
print("The Re-arranged is:\n", tensor_3.transpose(0, 2))
print("With shape:\n", tensor_3.transpose(0, 2).shape)

<h3> Indexing </h3>
Indexing in Pytorch works the same as it does in Numpy. See if you can predict what values will be returned by each indexing expression.
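As a quick check of that claim, here is a minimal sketch that applies the same index expression to a Numpy array and to a tensor built from it. It uses `np.arange` instead of random values (an assumption made purely so the result is easy to predict by hand).

```python
import numpy as np
import torch

# Fill a 2x3x1x4 array with the numbers 0..23 so the values are predictable
np_array = np.arange(24).reshape(2, 3, 1, 4)
tensor = torch.from_numpy(np_array)

# All elements of dim0, 2nd element of dim1, 1st element of dim2, 3rd element of dim3
np_result = np_array[:, 1, 0, 2]
torch_result = tensor[:, 1, 0, 2]

print("Numpy result:", np_result)
print("Pytorch result:", torch_result)
```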

In [None]:
# Create a 4D Tensor
tensor = torch.rand(2, 3, 1, 4)
print("Our Tensor:\n", tensor)

# Select the last element of dim0
print("\nThe last element of dim0:\n", tensor[-1])

# 1st element of dim0
# 2nd element of dim1
print("\nIndexed elements:\n", tensor[0, 1])

# Select all elements of dim0
# The 2nd element of dim1
# The 1st element of dim2
# The 3rd element of dim3
print("\nIndexed elements:\n", tensor[:, 1, 0, 2])

<h3> Describing Tensors </h3> <br>
Let's see how we can view the characteristics of our Tensors.

In [None]:
# Let's create a large 4D Tensor
tensor = torch.rand(3, 5, 3, 2)

# View the Number of elements in every dimension
print("The Tensor's shape is:", tensor.shape)

# In Pytorch shape and size() do the same thing!
print("The Tensor's shape using size() is:", tensor.size())

# View the number of elements in total
print("There are %d elements in total" % tensor.numel())

# View the number of Dimensions
print("There are %d Dimensions" % (tensor.ndim))

<h3> Reshaping </h3> <br>
We can change a Tensor into one with the same size (same number of elements) but a different shape. This works much like it does in Numpy, although some of the function names differ!
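The "same number of elements" condition can be seen in a minimal sketch. It assumes a tensor of 90 elements built with `torch.arange` (chosen only so the element count is obvious) and shows a valid reshape, an inferred dimension, and a reshape that fails because the element counts do not match.

```python
import torch

# 3 * 5 * 3 * 2 = 90 elements
tensor = torch.arange(90).reshape(3, 5, 3, 2)

# Valid: 3 * 30 = 90 elements
reshaped = tensor.reshape(3, 30)

# Valid: -1 is inferred as 90 / 10 = 9
inferred = tensor.reshape(10, -1)
print(reshaped.shape, inferred.shape)

# Invalid: 4 * 30 = 120 elements, so Pytorch raises an error
try:
    tensor.reshape(4, 30)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = err
print("Reshape to 4x30 failed:", mismatch_error is not None)
```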

In [None]:
# Let us reshape our Tensor to a 2D Tensor
print("Reshape to 3x30:\n", tensor.reshape(3, 30))

# We can also use the Flatten method to convert to a 1D Tensor
print("Flatten to a 1D Tensor:\n", tensor.flatten())

# Here the -1 tells Pytorch to infer the size of this dimension from the total number of elements and the other given dimension sizes
# AKA "I don't care about the size of this dimension as long as the first one is 10"
print("Reshape to 10xwhatever:\n", tensor.reshape(10, -1))

<h4>Squeezing and Unsqueezing </h4>
A very common shape-changing operation is to add an "empty" dimension to ensure the shape (specifically the number of dimensions) of the tensor is correct for certain functions. <br>
For example, when we start using Pytorch Neural Network modules, we need to provide the input of the network with a "batch" dimension (we often pass multiple inputs to our network at once) even if we only pass 1 datapoint!
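As a minimal sketch of that batch-dimension use case, the example below takes a single datapoint with 3 features, adds a batch dimension of size 1, and passes it through a linear layer. The sizes and variable names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# One datapoint with 3 features (no batch dimension yet)
single = torch.rand(3)

# Add a batch dimension at the front: shape (3,) -> (1, 3)
batch = single.unsqueeze(0)

# Indexing with None is an equivalent way to add the dimension
also_batch = single[None, :]

# A layer that maps 3 features to 1 output
layer = nn.Linear(3, 1)
output = layer(batch)

print("Input shape:", batch.shape)
print("Output shape:", output.shape)

# Remove the batch dimension again to get the result for our single datapoint
print("Squeezed output shape:", output.squeeze(0).shape)
```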

In [None]:
# Let's create a 2D Tensor
tensor = torch.rand(3, 2)

# View the Number of elements in every dimension
print("The Tensor's shape is:", tensor.shape)

# unsqueeze adds an "empty" dimension (of size 1) to our Tensor at the given index
print("Add an empty dimension at dim2:", tensor.unsqueeze(2).shape)

# Here the "empty" dimension is inserted at index 1 instead
print("Add an empty dimension at dim1:", tensor.unsqueeze(1).shape)

In [None]:
# Let's create a 4D Tensor with a few "empty" dimensions
tensor = torch.rand(1, 3, 1, 2)

# View the Number of elements in every dimension
print("The Tensor's shape is:", tensor.shape)

# squeeze removes an "empty" dimension (of size 1) from our Tensor
print("Remove empty dimension dim2:", tensor.squeeze(2).shape)

# We can remove the other "empty" dimension in the same way
print("Remove empty dimension dim0:", tensor.squeeze(0).shape)

# If we don't specify a dimension, squeeze will remove ALL empty dimensions
print("Remove all empty dimensions:", tensor.squeeze().shape)

In [None]:
someT = torch.rand(1, 3, 1, 2)

<h2> Broadcasting </h2>
Broadcasting also works in Pytorch!
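As a reminder of the rule, here is a minimal sketch: shapes are compared from the last dimension backwards, and two dimension sizes are compatible if they are equal or one of them is 1. The helper `manual_broadcast_shape` is a hypothetical function written for this illustration; `torch.broadcast_shapes` is used to check it.

```python
import torch


def manual_broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule by hand."""
    # Pad the shorter shape with leading 1s so both have the same length
    n_dims = max(len(shape_a), len(shape_b))
    shape_a = (1,) * (n_dims - len(shape_a)) + tuple(shape_a)
    shape_b = (1,) * (n_dims - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(shape_a, shape_b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError("Cannot broadcast %d with %d" % (dim_a, dim_b))
    return tuple(result)


shape1 = (1, 4, 3, 1)
shape2 = (3, 4, 1, 4)

by_hand = manual_broadcast_shape(shape1, shape2)
by_torch = tuple(torch.broadcast_shapes(shape1, shape2))
actual = tuple((torch.rand(shape1) + torch.rand(shape2)).shape)

print("By hand:", by_hand)
print("torch.broadcast_shapes:", by_torch)
print("Shape of the actual sum:", actual)
```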

In [None]:
# Let's create 2 differently shaped 4D Tensors
tensor1 = torch.rand(1, 4, 3, 1)
tensor2 = torch.rand(3, 4, 1, 4)

print("Tensor 1 shape:\n", tensor1.shape)
print("Tensor 2 shape:\n", tensor2.shape)

tensor3 = tensor1 + tensor2

print("The resulting shape is:\n", tensor3.shape)

<h2> Pytorch Autograd </h2>
<h4>Lets see Numpy do this!</h4>
Now on to something that sets Pytorch (and other Deep Learning frameworks) apart from Numpy: auto-differentiable computational graphs! (Don't worry about exactly how this works.)<br>
Remember how we compute the gradients of the parameters (weights) of a model by "backpropagation". First we calculate the gradient of the loss with respect to the model's output, then use the chain rule to find the gradient of the loss with respect to the parameters or the input, and so on, layer by layer, for larger networks. Seems like a pretty repetitive process governed by some well-known rules, right? Well, you know what is good at doing repetitive, well-defined things? Computers!<br>
This "automatic" backpropagation (among other things) is what Pytorch REALLY gives us that makes training Neural Networks easy. So how does it do it? Pytorch keeps track of every operation we perform on tensors that require gradients (unless we tell it not to). It does this by forming a "computational graph" - a tree-like structure of all the operations we perform starting from some initial tensors. When we tell Pytorch to backpropagate from some point, it works backwards through this graph, calculating the gradient of that point with respect to each tensor we flagged as requiring gradients and storing it on that tensor.
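The tracking described above can be observed directly. In this minimal sketch (variable names are illustrative), a tensor produced by a tracked operation carries a `grad_fn` linking it back into the graph, while the same computation inside `torch.no_grad()` is not tracked.

```python
import torch

x = torch.tensor([4.0], requires_grad=True)

# Tracked: Pytorch records the operations that produced y
y = x * 2 + 3
print("Tracked:", y.requires_grad, y.grad_fn)

# Untracked: inside torch.no_grad() no graph is built
with torch.no_grad():
    z = x * 2 + 3
print("Untracked:", z.requires_grad, z.grad_fn)
```

Turning tracking off is useful when we only want a model's outputs and do not intend to backpropagate.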

Let's see an example of this!

In [None]:
# Let's create some tensors. requires_grad tells Pytorch we want to store the gradients for this tensor
# we need to do this if we are working with basic Pytorch tensors
x = torch.FloatTensor([4])
x.requires_grad = True
w = torch.FloatTensor([2])
w.requires_grad = True
b = torch.FloatTensor([3])
b.requires_grad = True

# By performing a simple computation Pytorch will build a computational graph.
y = w * x + b  # y = 2 * x + 3

# It's easy to see that
# dy/dx = w = 2
# dy/dw = x = 4
# dy/db = 1

# Compute gradients via Pytorch's Autograd
y.backward()

# Print out the calculated gradients
# These gradients are the gradients with respect to the point where we backprop'd from - y
# Create your own equation and use the auto backprop to see the partial derivatives!
print("Calculated Gradients")
print("dy/dx", x.grad.item())  # x.grad = dy/dx = 2
print("dy/dw", w.grad.item())  # w.grad = dy/dw = 4
print("dy/db", b.grad.item())  # b.grad = dy/db = 1
# Note: .item() simply returns a 0D Tensor as a Python scalar

<h2>Pytorch nn.Module</h2>
In Pytorch the basic template for creating our models is the "Module" class within torch.nn. To create our own model we inherit from this class (it becomes our "superclass") so that we have access to all of its properties and functions. <br>
Let's create our own subclass of this class!

The two main functions we need to create are the <b>\__init__</b> and <b>forward</b> functions. We've already seen <b>\__init__</b>, so let's look at <b>forward</b>.<br>

The <b>forward</b> function is the only function that we MUST create when we build our class. Pytorch uses this function as the "entry point" to our model, and it is what gets called when we do a forward pass of our model.
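As a minimal sketch of the "entry point" idea: calling the model object itself runs <b>forward</b> for us, plus any extra machinery attached to the module. The class `Doubler` and the list `hook_calls` are hypothetical names for this illustration; `register_forward_hook` is an nn.Module method that runs a callback after a forward pass made by calling the module.

```python
import torch
import torch.nn as nn


class Doubler(nn.Module):
    """Hypothetical module that only defines forward."""

    def forward(self, x):
        return x * 2


model = Doubler()

# Record every forward pass that goes through the module's call machinery
hook_calls = []
model.register_forward_hook(lambda module, inputs, output: hook_calls.append(output))

x = torch.tensor([1.0, 2.0])

# Calling the model runs forward AND the hook
via_call = model(x)

# Calling forward directly gives the same values but skips the hook
via_forward = model.forward(x)

print("Same output:", torch.equal(via_call, via_forward))
print("Number of hook calls:", len(hook_calls))
```

This is one reason to call the model itself rather than its forward function directly.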


In [None]:
class SimpleFunction(nn.Module):
    """
    Simple implementation of an nn.Module subclass
    Takes the input (x) and returns x * 4 + 2
    """

    def __init__(self):
        # pass our class and self to superclass and call the superclass's init function
        super(SimpleFunction, self).__init__()

    def forward(self, x):
        return x * 4 + 2

In [None]:
# Create an instance of our class
simple_function = SimpleFunction()
# Perform a "forward pass" of our class
output = simple_function(10)
print("Class output:", output)
# Note we do NOT need to explicitly call the .forward() function of our class,
# a forward pass of our models is such a common step that Pytorch makes it easier and cleaner for us to do

<h3>A more complicated model </h3><br>
The previous nn.Module class that we created wasn't really an ML "model". Let's create something that we've seen before: a simple linear model.

In [None]:
class LinearModel(nn.Module):
    """
    Takes the input (x) and returns x * w^t + b
    """

    def __init__(self, input_size, output_size):
        # pass our class and self to superclass and call the superclass's init function
        super(LinearModel, self).__init__()
        # nn.Parameter wraps our normal tensors and "tells" Pytorch
        # that they are our nn.Module's model parameters to be optimized
        self.w = nn.Parameter(torch.randn(output_size, input_size))
        self.b = nn.Parameter(torch.randn(1, output_size))

    def forward(self, x):
        return torch.matmul(x, self.w.t()) + self.b

In [None]:
# Create a batch of 10 datapoints each 5D
input_data = torch.randn(10, 5)

# Create an instance of our Model
linear_model = LinearModel(5, 1)

# Perform a forward pass!
output = linear_model(input_data)

print(output.shape)
print(output.detach())
# Note: detach "disconnects" the tensor and returns it with no history of previous calculations

<h3>Pytorch inbuilt Neural Network Layers</h3>
The "Linear layer" is so commonplace that Pytorch already has an implementation of it. In fact, Pytorch has implementations of most layer types, which act as building blocks for our multi-layer models. For now let's just see how we can use Pytorch's linear layer (we will see many more layer types later in the semester!).<br>
<b>Things to know!</b><br>
- Pytorch initialises the weights and biases of its layers in very particular ways (not just from a normal distribution!), usually based on deep learning research. See the documentation for more details.<br>
- Pytorch includes a bias term in its layers by default.
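Both points can be checked with a minimal sketch. It assumes the default initialisation described in the nn.Linear documentation (weights and biases drawn uniformly from ±1/sqrt(number of inputs)), which may differ between Pytorch versions, and it also confirms by hand that the layer computes x * W^t + b.

```python
import math

import torch
import torch.nn as nn

in_features, out_features = 3, 1
layer = nn.Linear(in_features, out_features)

# Assumed default init: uniform in [-1/sqrt(in_features), 1/sqrt(in_features)]
bound = 1 / math.sqrt(in_features)
weights_in_bound = bool((layer.weight.abs() <= bound).all())
bias_in_bound = bool((layer.bias.abs() <= bound).all())
print("Weights within bound:", weights_in_bound)
print("Bias within bound:", bias_in_bound)

# The bias term can be switched off when the layer is created
no_bias_layer = nn.Linear(in_features, out_features, bias=False)
print("Bias of the no-bias layer:", no_bias_layer.bias)

# The layer's output matches the affine transformation computed by hand
x = torch.randn(5, in_features)
manual_output = x @ layer.weight.t() + layer.bias
print("Matches manual computation:", torch.allclose(layer(x), manual_output, atol=1e-6))
```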

In [None]:
# Build a linear layer aka a "fully connected" layer aka a "Perceptron" layer
# nn.Linear(Number of inputs, Number of outputs)
linear = nn.Linear(3, 1)

# Let's have a look at the parameters of this layer
# The "weights" are what is multiplied by the input data
print("w:\n", linear.weight.data)
# The bias is then added on!
print("b:\n", linear.bias.data)

print("w shape:\n", linear.weight.data.shape)
print("b shape:\n", linear.bias.data.shape)
# Note: .data just gives us the raw Tensor without any connection to the computational graph
# - it looks nicer when we print it out
# Note: The operation the linear layer performs is y = x*A^t + b
# where A^t is the transpose of the weights and b is the bias,
# this operation is also known as an "affine transformation"

In [None]:
# Let's have a look at the gradients of these parameters
print("w:\n", linear.weight.grad)
print("b:\n", linear.bias.grad)
# Note: Pytorch initialises the grad of the tensors to "None" NOT 0!
# They only get created after the first backwards pass.

In [None]:
# Create a random data input tensor
data = torch.randn(100, 3)
# Create some noisy target data: the sum of the 3 inputs plus a little Gaussian noise
target = data.sum(1, keepdim=True) + 0.01 * torch.randn(data.shape[0], 1)
print("Input data:\n", data[:10])
print("Output data:\n", target[:10])

Now that everything is set up, let's perform a "forward pass" of our model, aka let's put the data into the model and see what comes out.

In [None]:
# Remember! To perform a forward pass of our model, we just need to "call" our network
# Pytorch's nn.Module class will automatically pass the data to the layer's "forward" function
target_pred = linear(data)
print("Network output:\n", target_pred.data[:10])
print("Network output shape:", target_pred.shape)

<h3>Loss Functions and Optimizers</h3>
Now let's see how Pytorch helps us optimize our model!<br>
<b>Loss functions</b><br>
We've already seen loss functions before and defined our own, but in Pytorch we can pick from a number of pre-defined loss functions (we can also still create our own).
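As a minimal sketch of both options, the example below compares the pre-defined `nn.MSELoss` with a mean squared error written by hand. The prediction and target values are made up for illustration.

```python
import torch
import torch.nn as nn

# Made-up predictions and targets; the errors are 0, -1 and 2
prediction = torch.tensor([[1.0], [2.0], [4.0]])
target = torch.tensor([[1.0], [3.0], [2.0]])

# Pre-defined loss function
builtin_loss = nn.MSELoss()(prediction, target)

# Our own version: the mean of the squared errors, (0 + 1 + 4) / 3
manual_loss = ((prediction - target) ** 2).mean()

print("Built-in MSE:", builtin_loss.item())
print("Manual MSE:", manual_loss.item())
```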

<b>Optimizers</b><br>
This is the object that will be doing the parameter updates for us! Pytorch has a number of different optimizers, some of which we will explore in future labs. For now we will just use our well known Gradient Descent (GD) optimizer.<br>
Note: Most optimizers are just some variant of GD
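To make the update rule concrete, here is a minimal sketch that compares one step of `torch.optim.SGD` with the plain GD update, w_new = w - lr * gradient. The single parameter `w`, the loss w^2, and the learning rate of 0.1 are assumptions chosen so the numbers are easy to follow.

```python
import torch

# A single parameter to optimize
w = torch.tensor([2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Simple loss: L = w^2, so dL/dw = 2w = 4
loss = (w ** 2).sum()
loss.backward()

# The plain GD update computed by hand: 2 - 0.1 * 4 = 1.6
expected = w.detach() - 0.1 * w.grad

# Let the optimizer perform the same update
optimizer.step()

print("Gradient:", w.grad.item())
print("Updated by the optimizer:", w.item())
print("Updated by hand:", expected.item())
```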

In [None]:
# Let's perform a regression with a mean squared error loss
loss_function = nn.MSELoss()

# Let's create a Stochastic Gradient Descent optimizer with a learning rate of 0.01
# (because we pass it the whole dataset at every step, it is just normal GD)
# When we create the optimizer we need to tell it WHAT it needs to optimize, so the first thing
# we pass it is the linear layer's "parameters"
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

We can see from the following scatter plot that the output of our model is NOT the same as our target data. Let's see what the MSE loss is.

In [None]:
# Plotting the first dimension of the input vs the output

# Use the outputs of the model from a few cells ago
plt.scatter(data[:, 0], target_pred.detach())
# Use the Ground Truth data
plt.scatter(data[:, 0], target, marker="x")
plt.legend(["Predictions", "Ground Truth Data"])
plt.xlabel("Inputs")
plt.ylabel("Outputs")

In [None]:
loss = loss_function(target_pred, target)
print('loss:', loss.item())

Let's perform a backward pass of our model to compute the gradients!

In [None]:
# Backward pass.
loss.backward()
# Print out the gradients.
print("dL/dw: ", linear.weight.grad)
print("dL/db: ", linear.bias.grad)
# Note: for every backward pass of the model we must first perform a forward pass,
# as intermediate data in the computational graph is freed during the backward pass to save memory.
# We can tell Pytorch to hold onto this data, but in many cases it needs to be recalculated anyway

Now, finally, tell the optimizer to perform an update step!

In [None]:
# The critical step: update the parameters to reduce the loss
optimizer.step()

# Perform another forward pass of the model to check the new loss
target_pred = linear(data)
loss = loss_function(target_pred, target)
print("loss after 1 optimization step: ", loss.item())

<h3>The Training Loop</h3>
Our loss has gone down!! Let's see how low we can get it to go by constructing a training loop!<br>
For MOST tasks (but not all) a single training iteration in Pytorch can be summarised in the following 5 steps:<br>
- Forward pass of our model with the data.<br>
- Calculate the loss.<br>
- Reset the current stored gradients to 0<br>
- Backpropagate the loss to calculate the new gradients.<br>
- Perform an optimization step.<br>
<br>
We perform these steps over and over until our model has converged or some other point has been reached (depending on the application)
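The reason for the gradient reset step can be shown with a minimal sketch: Pytorch adds new gradients to whatever is already stored on a tensor. The parameter `w` and the toy computation 2 * w are assumptions made for illustration.

```python
import torch

w = torch.tensor([3.0], requires_grad=True)

# First backward pass: d(2w)/dw = 2
(w * 2).sum().backward()
first_grad = w.grad.clone()

# Second backward pass WITHOUT resetting: the new gradient is added to the old one
(w * 2).sum().backward()
accumulated_grad = w.grad.clone()

# Reset the stored gradient, then backpropagate again
w.grad.zero_()
(w * 2).sum().backward()
reset_grad = w.grad.clone()

print("After the first backward pass:", first_grad.item())
print("After a second pass with no reset:", accumulated_grad.item())
print("After resetting and a third pass:", reset_grad.item())
```

In a training loop the optimizer's zero_grad() performs this reset for every parameter it was given.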

In [None]:
# Let's create an empty list to log the loss
loss_logger = []

# Let's perform 1000 iterations over our dataset
for i in range(1000):
    # Perform a forward pass of our data
    target_pred = linear(data)

    # Calculate the loss
    loss = loss_function(target_pred, target)

    # .zero_grad resets the stored gradients
    # If we didn't do this, the new gradients would be added to
    # the gradients from the previous step!
    optimizer.zero_grad()

    # Calculate the new gradients
    loss.backward()

    # Perform an optimization step!
    optimizer.step()

    loss_logger.append(loss.item())


print("loss:", loss.item())

Let's plot the loss!

In [None]:
plt.plot(loss_logger)

In [None]:
# Plotting the first dimension of the input vs the output
plt.scatter(data[:, 0], target_pred.detach())
plt.scatter(data[:, 0], target, marker="x")
plt.legend(["Predictions", "Ground Truth"])
plt.xlabel("Inputs")
plt.ylabel("Outputs")

<h2>Woohoo! We trained our first Pytorch neural network!!</h2>