![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Pytorch_logo.png/800px-Pytorch_logo.png)

<h1>Lab 4: Pytorch Basics</h1>

<h2>Introduction</h2>
As we've seen, we can use Numpy to create single-layer and even multilayer linear neural networks by deriving the gradients by hand, hard-coding them, and training with gradient descent (GD). But what if we want to create larger and more complicated networks? What if we want to use more elaborate loss functions, huge datasets, or more complicated training regimes? And what about training on GPUs?<br>
That's a lot to work out EVERY time we want to try something new! Lucky for us, there are a number of Deep Learning frameworks that can do much of the heavy lifting for us!<br>
For this unit we will be using Pytorch, a powerful and widely used Deep Learning framework that lets us do all of the above and MORE

<h3> Importing the required libraries </h3>
We will use two main packages, torch and torchvision<br>
torch contains most of the Deep Learning functionality, while torchvision is a companion package containing many computer vision functions designed to work hand in hand with torch

In [1]:
import torch
import torchvision
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

<h3> The Pytorch Tensor </h3>
As we've already explored, the "Tensor" is a very useful concept in Machine Learning. You probably noticed that in Numpy our "Tensors" are called "Arrays", but now that we are in Pytorch they are called Tensors!<br>
Let's do a recap of Numpy arrays and how similar they are to Pytorch tensors.

In [2]:
#Create some "Matrices" as lists of lists  

#3x3
W = [[1, 1, 1],
     [1.5, 1.5, 1.5],
     [2, 2, 2]]

#3x1
x = [[6], [7], [8]]
#3x1
b = [[1], [1], [1]]

#Variable to store output
#3x1
y = [[0], [0], [0]]

As we've seen before

In [3]:
#We can transform our list of lists into a "numpy array" by using the function "array"
W_np = np.array(W)

x_np = np.array(x)

#Let's use the function "ones" to create an array of ones!
b_np = np.ones((3, 1))

#Let's now compute Wx + b using these numpy variables!
output = np.matmul(W_np, x_np) + b_np

#print out the result
print("Output:\n", output)
print("Output shape:\n", output.shape)

Output:
 [[22. ]
 [32.5]
 [43. ]]
Output shape:
 (3, 1)


Now in Pytorch!

In [4]:
#We can transform our list of lists into a "torch tensor" by using the function "FloatTensor"
#Note: "FloatTensor" fixes the datatype of the tensor to a 32bit "float". You can also just use the function "tensor",
#but this will infer the datatype from the data it is given. To ensure the datatypes are the same
#(and we can perform the wanted operations) we use "FloatTensor"

W_torch = torch.FloatTensor(W)

x_torch = torch.FloatTensor(x)

#Let's use the function "ones" to create a tensor of ones!
b_torch = torch.ones(3, 1)

#Let's now compute Wx + b using these torch tensors!
output = torch.matmul(W_torch,x_torch) + b_torch

#print out the result
print("Output:\n", output)
print("Output shape:\n", output.shape)

Output:
 tensor([[22.0000],
        [32.5000],
        [43.0000]])
Output shape:
 torch.Size([3, 1])


Wow! Numpy and Pytorch are remarkably similar, and this is no coincidence! The creators of Pytorch did this intentionally to make it easy to transfer existing Numpy skills to Pytorch (Numpy is a Python library that nearly everyone uses, with origins going back to 1995!!). To aid this transfer there are even functions that convert Pytorch tensors to Numpy arrays and back!

In [None]:
#Create a random Numpy array
np_array = np.random.random((3, 4))
print("Numpy array:\n", np_array)

#Convert to Pytorch tensor
torch_tensor = torch.FloatTensor(np_array)
print("Pytorch tensor:\n", torch_tensor)

#Convert back to a Numpy array!
np_array2 = torch_tensor.numpy()
print("Numpy array:\n", np_array2)
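
One detail of these conversions is worth checking for yourself. The sketch below is a minimal illustration (the variable names and values are made up for this example): `torch.from_numpy` keeps the Numpy datatype and shares memory with the original array, while `torch.tensor` makes an independent copy.

```python
import numpy as np
import torch

# Illustrative array; np.ones gives 64bit floats by default
np_array = np.ones((2, 3))

# from_numpy keeps the dtype (float64) and shares memory with the array
shared = torch.from_numpy(np_array)

# torch.tensor copies the data; here we also ask for 32bit floats
copied = torch.tensor(np_array, dtype=torch.float32)

# Change the Numpy array in place
np_array[0, 0] = 5.0

print("Shared dtype:", shared.dtype)
print("Shared sees the change:", shared[0, 0].item())
print("Copy is unaffected:", copied[0, 0].item())
```

If you need the two objects to be independent, making a copy is the safer choice.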

<hr>

<h2>On to Pytorch!</h2>
Let's further explore Pytorch and its similarities to Numpy and then see what new functionalities it brings to the table!!

<h3> Basic Element-wise Operations </h3>
Let's quickly go back over some basics using Pytorch

In [None]:
#Let's create a 2D Tensor using torch.rand
y = torch.rand(4, 5)
#this will create a 4x5 "Matrix" of random numbers between 0 and 1
print("Our 2D Tensor:\n",y)

#We can perform normal python scalar arithmetic on Torch tensors
print("\nScalar Multiplication:\n",y * 10)
print("Addition and Square:\n",(y + 1)**2)
print("Addition:\n",y + y)
print("Addition and division:\n",y / (y + 1))

#We can use a combination of Torch functions and normal python arithmetic
print("\nPower and square root:\n",torch.sqrt(y**2))

#Torch tensors are objects and have functions
print("\nY -\n Min:%.2f\n Max:%.2f\n Standard Deviation:%.2f\n Sum:%.2f" %(y.min(), y.max(), y.std(), y.sum()))

<h3>Tensor Operations</h3>

In [None]:
#Create two 3D Tensors
tensor_1 = torch.rand(3,3,3)
tensor_2 = torch.rand(3,3,3)

#Add the 2 Tensors
print("Addition:\n",tensor_1 + tensor_2)

#We cannot perform a normal "matrix" multiplication on a 3D tensor
#But we can treat the 3D tensor as a "batch" (like a stack) of 2D tensors
#And perform normal matrix multiplication independently on each pair of 2D matrices
print("Batch Multiplication:\n",torch.bmm(tensor_1,tensor_2))
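
To convince ourselves that `torch.bmm` really is just a stack of ordinary matrix multiplications, here is a small sketch that compares it against an explicit Python loop. The shapes (3, 2, 4) and (3, 4, 5) are arbitrary choices for this illustration.

```python
import torch

torch.manual_seed(0)

# A batch of three 2x4 matrices and a batch of three 4x5 matrices
a = torch.rand(3, 2, 4)
b = torch.rand(3, 4, 5)

# Batch matrix multiplication in one call
batched = torch.bmm(a, b)

# The same thing done one pair of matrices at a time
looped = torch.stack([a[i] @ b[i] for i in range(a.shape[0])])

print("Result shape:", batched.shape)
print("Same as looping:", torch.allclose(batched, looped))
```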

In [None]:
#Let's create a more interesting tensor
tensor_3 = torch.rand(2,4,5)
#We can swap the Tensor dimensions
print("\nThe original Tensor is:\n", tensor_3)
print("With shape:\n", tensor_3.shape)

#transpose will swap the two dimensions it is given
print("The re-arranged Tensor is:\n", tensor_3.transpose(0,2))
print("With shape:\n", tensor_3.transpose(0,2).shape)

<h3> Indexing </h3>
Indexing in Pytorch works the same as it does in Numpy. See if you can predict what values will be returned by the indexing

In [5]:
#Create a 4D Tensor
tensor = torch.rand(2,3,1,4)
print("Our Tensor:\n",tensor)

#Select the last element of dim0
print("\nThe last element of dim0:\n",tensor[-1])

#1st element of dim0
#2nd element of dim1
print("\nIndexed elements:\n",tensor[0, 1])

#Select all elements of dim0
#The 2nd element of dim1
#The 1st element of dim2
#The 3rd element of dim3
print("\nIndexed elements:\n",tensor[:, 1, 0, 2])

Our Tensor:
 tensor([[[[0.2851, 0.2097, 0.0510, 0.3854]],

         [[0.2761, 0.9551, 0.0627, 0.6965]],

         [[0.1869, 0.8822, 0.9570, 0.0995]]],


        [[[0.6571, 0.8123, 0.5152, 0.3302]],

         [[0.4748, 0.7013, 0.8448, 0.2806]],

         [[0.6839, 0.4991, 0.8183, 0.0602]]]])

The last element of dim0:
 tensor([[[0.6571, 0.8123, 0.5152, 0.3302]],

        [[0.4748, 0.7013, 0.8448, 0.2806]],

        [[0.6839, 0.4991, 0.8183, 0.0602]]])

Indexed elements:
 tensor([[0.2761, 0.9551, 0.0627, 0.6965]])

Indexed elements:
 tensor([0.0627, 0.8448])


<h3> Describing Tensors </h3> <br>
Let's see how we can view the characteristics of our Tensors

In [None]:
#Let's create a large 4D Tensor
tensor = torch.rand(3, 5, 3, 2)

#View the number of elements in every dimension
print("The Tensor's shape is:", tensor.shape)

#In Pytorch shape and size() do the same thing!
print("The Tensor's shape using size() is:", tensor.size())

#View the number of elements in total
print("There are %d elements in total" % tensor.numel())

#View the number of Dimensions (4 in this case)
print("There are %d Dimensions" %(tensor.ndim))

<h3> Reshaping </h3> <br>
We can change a Tensor to one with the same number of elements but a different shape. This works in a similar fashion to Numpy, but with differently named functions!

In [None]:
#Let us reshape our Tensor to a 2D Tensor
print("Reshape to 3x30:\n", tensor.view(3,30))

#We can also use the flatten method to convert to a 1D Tensor
print("Flatten to a 1D Tensor:\n",tensor.flatten())

#Here the -1 tells Pytorch to put as many elements as it needs here in order to maintain the given dimension sizes
#AKA "I don't care about the size of this dimension as long as the first one is 10"
print("Reshape to 10xwhatever:\n",tensor.view(10,-1))
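
One caveat with `view`: it only works when the requested shape is compatible with how the Tensor is laid out in memory. The sketch below (with a small made-up Tensor) shows `view` failing on a transposed Tensor, while `reshape` copies the data when it has to and succeeds.

```python
import torch

# A small 2x3 Tensor holding the numbers 0..5
t = torch.arange(6).view(2, 3)

# Transposing gives a 3x2 Tensor that is no longer contiguous in memory
t_t = t.transpose(0, 1)
print("Contiguous after transpose?", t_t.is_contiguous())

# view cannot flatten a Tensor with this memory layout
try:
    t_t.view(6)
    view_ok = True
except RuntimeError:
    view_ok = False
print("view worked?", view_ok)

# reshape falls back to copying the data when a view is not possible
flat = t_t.reshape(6)
print("reshape result:", flat)
```

Calling `t_t.contiguous().view(6)` is another way to get the same result.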

<h4>Squeezing and Unsqueezing </h4>
A very common shape-changing operation is to add an "empty" dimension to ensure the shape (specifically the number of dimensions) of the tensor is correct for certain functions. <br>
For example, when we start using Pytorch Neural Network modules, we need to provide the input of the network with a "batch" dimension (we often pass multiple inputs to our network at once) even if we only pass 1 datapoint!

In [None]:
#Let's create a 2D Tensor
tensor = torch.rand(3, 2)

#View the number of elements in every dimension
print("The Tensor's shape is:", tensor.shape)

#unsqueeze adds an "empty" dimension (of size 1) to our Tensor at the given index
print("Add an empty dimension at dim2:", tensor.unsqueeze(2).shape)

#The new dimension can be inserted at any position
print("Add an empty dimension at dim1:", tensor.unsqueeze(1).shape)

In [None]:
#Let's create a 4D Tensor with a few "empty" dimensions
tensor = torch.rand(1, 3, 1, 2)

#View the number of elements in every dimension
print("The Tensor's shape is:", tensor.shape)

#squeeze removes an "empty" dimension (of size 1) from our Tensor
print("Remove empty dimension dim2:", tensor.squeeze(2).shape)

#We can pick which empty dimension to remove
print("Remove empty dimension dim0:", tensor.squeeze(0).shape)

#If we don't specify a dimension, squeeze will remove ALL empty dimensions
print("Remove all empty dimensions:", tensor.squeeze().shape)
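
As a small illustration of the "batch" dimension mentioned above, the sketch below takes a single made-up datapoint with 3 features, adds a batch dimension of size 1, and then removes it again. Indexing with `None` is shown as an equivalent shorthand for `unsqueeze`.

```python
import torch

# A single datapoint with 3 features (values are random)
sample = torch.rand(3)

# Add a "batch" dimension at the front: shape (3,) becomes (1, 3)
batch = sample.unsqueeze(0)

# Indexing with None inserts a dimension in the same way
same_batch = sample[None, :]

# Remove the batch dimension again
restored = batch.squeeze(0)

print("Sample shape:", sample.shape)
print("Batch shape:", batch.shape)
print("Restored shape:", restored.shape)
```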

<h2> Broadcasting </h2>
Broadcasting also works in Pytorch!

In [11]:
#Let's create 2 differently shaped 4D Tensors
tensor1 = torch.rand(1, 4, 3, 1)
tensor2 = torch.rand(3, 4, 1, 4)

print("Tensor 1 shape:\n", tensor1.shape)
print("Tensor 2 shape:\n", tensor2.shape)

tensor3 = tensor1 + tensor2

print("The resulting shape is:\n", tensor3.shape)

Tensor 1 shape:
 torch.Size([1, 4, 3, 1])
Tensor 2 shape:
 torch.Size([3, 4, 1, 4])
The resulting shape is:
 torch.Size([3, 4, 3, 4])
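
Where does that resulting shape come from? Shapes are lined up from the last dimension backwards, and two sizes are compatible if they are equal or one of them is 1 (the size-1 dimension gets "stretched"). The helper `broadcast_shape` below is a hypothetical function written just for this illustration, not part of Pytorch; it applies that rule by hand so we can compare against what Pytorch actually produces.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: work out a broadcast shape by hand."""
    # Pad the shorter shape with leading 1s so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        # Sizes must match, or one of them must be 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError("Shapes cannot be broadcast together")
        # A size-1 dimension is stretched to match the other size
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(result)

by_hand = broadcast_shape((1, 4, 3, 1), (3, 4, 1, 4))
by_torch = tuple((torch.rand(1, 4, 3, 1) + torch.rand(3, 4, 1, 4)).shape)

print("Shape worked out by hand:", by_hand)
print("Shape from Pytorch:", by_torch)
```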


<h2> Pytorch Autograd </h2>
<h4>Let's see Numpy do this!</h4>
Now on to something that sets Pytorch (and other Deep Learning frameworks) apart from Numpy, the auto-differentiable computational graph! (don't worry about exactly how this works)<br>
Remember how we compute the gradients of the parameters (weights) of a model by "backpropagation"? First we calculate the gradient of the loss with respect to the model's output, then use the chain rule to find the gradient of the loss with respect to the parameters or the input, and so on for larger networks. Seems like a pretty repetitive process governed by some well-known rules, right? Well, you know what is good at doing repetitive, well-defined things?! Computers!!<br>
This "automatic" backpropagation (among other things) is what Pytorch REALLY gives us that makes training Neural Networks easy. So how does it do it? Well, first Pytorch keeps track of everything we do (unless we tell it not to)! It does this by forming a "computational graph" - a tree-like structure of all the operations we perform, starting at some initial tensor. When we tell Pytorch to backpropagate from some point, it works backwards through this graph, calculating the gradient of that point with respect to each tensor along the way and storing it for the tensors that require gradients.

Let's see an example of this!

In [6]:
#Let's create some tensors, requires_grad tells Pytorch we want to store the gradients for this tensor
x = torch.FloatTensor([4])
x.requires_grad = True
w = torch.FloatTensor([2])
w.requires_grad = True
b = torch.FloatTensor([3])
b.requires_grad = True

#By performing a computation Pytorch will build a computational graph.
y = w * x + b    # y = 2 * x + 3

#It's easy to see that
#dy/dx = w
#dy/dw = x
#dy/db = 1

#Compute gradients via Pytorch's Autograd
y.backward()

#Print out the calculated gradients
#These are the gradients of y (the point where we backprop'd from) with respect to each tensor
print("Calculated Gradients")
print("dy/dx", x.grad.item())    # x.grad = dy/dx = w = 2
print("dy/dw", w.grad.item())    # w.grad = dy/dw = x = 4
print("dy/db", b.grad.item())    # b.grad = dy/db = 1
#Note: .item() returns the value of a single-element Tensor as a Python scalar

Calculated Gradients
dy/dx 2.0
dy/dw 4.0
dy/db 1.0
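
To see the chain rule at work, here is a sketch with one extra step added to the computation. The squaring step is an addition made purely for illustration. With z = w * x + b and y = z^2, the chain rule gives dy/dw = 2 * z * x, dy/dx = 2 * z * w and dy/db = 2 * z, which we can compare against Autograd.

```python
import torch

# Same values as before, created directly with requires_grad=True
x = torch.tensor([4.0], requires_grad=True)
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

# Two-step computation: z = 2 * 4 + 3 = 11, y = 11^2 = 121
z = w * x + b
y = z ** 2

# Autograd applies the chain rule through both steps
y.backward()

# Hand-derived gradients using the chain rule
z_value = z.item()
print("dy/dx autograd:", x.grad.item(), " by hand:", 2 * z_value * w.item())
print("dy/dw autograd:", w.grad.item(), " by hand:", 2 * z_value * x.item())
print("dy/db autograd:", b.grad.item(), " by hand:", 2 * z_value)
```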


Let's do something a little more interesting and introduce a Pytorch "Linear layer"<br>
This "Linear layer" is one of the basic building blocks of neural networks. A single layer is akin to a linear regression model's $\theta$ parameters. The Pytorch "Linear" class, however, is a bit more complicated than just a Tensor of parameters. It is in fact a subclass of Pytorch's Neural Network module class (nn.Module). By default its Tensors of parameters have requires_grad=True, and it also has the functionality that allows the "backward" pass of the model, as required by Pytorch's Autograd. Finally, as we'll see when we make our own network, it has a function that is called automatically when we pass data to an instance of this class.

In [7]:
# Create some random data tensors of shape (10, 3) and (10, 2).
data = torch.randn(10, 3)
target = torch.randn(10, 2)
print ('Input data:\n', data)
print ('Output data:\n', target)

Input data:
 tensor([[-0.7238, -1.8526,  0.3021],
        [ 0.4872, -0.4817, -0.3063],
        [-0.6852,  1.2065, -0.1375],
        [-1.0599,  1.0035,  1.8523],
        [-0.9994, -0.6717, -0.6844],
        [-0.5757,  0.1258,  0.2069],
        [ 0.9757,  1.2868, -0.4972],
        [-1.0397,  0.8363,  0.4712],
        [-0.2310, -0.3825, -0.7933],
        [-0.2247, -1.1272,  0.4225]])
Output data:
 tensor([[ 1.0236e+00,  9.2055e-01],
        [ 1.5576e-01, -8.6465e-01],
        [-6.0999e-01, -1.4656e+00],
        [ 4.7489e-01, -7.9874e-01],
        [-1.3336e+00,  4.3276e-01],
        [-6.1701e-01, -1.1614e+00],
        [ 4.7162e-01, -7.5106e-01],
        [-1.4227e+00,  1.1371e-01],
        [-1.1863e+00,  3.3933e-01],
        [ 4.9271e-01,  3.8111e-05]])


In [8]:
#Build a linear layer (aka a "fully connected" layer or a "Perceptron" layer)
#nn.Linear(Number of inputs, Number of outputs) 
linear = nn.Linear(3, 2) 

#Let's have a look at the parameters of this layer
#The "weights" are what is multiplied by the input data
print ('w:\n', linear.weight.data)
#The bias is then added on!
print ('b:\n', linear.bias.data)
#Note: Pytorch's Linear Layer includes a bias term by default!
#Note: .data just gives us the raw Tensor without any connection to the computational graph
#- it looks nicer when we print it out
#Note: The operation the linear layer performs is y = x*A^T + b
#where A^T is the transpose of the weights and b is the bias, this is also known as an "affine transformation"

w:
 tensor([[ 0.3592,  0.4478,  0.1253],
        [ 0.0270, -0.1839, -0.5088]])
b:
 tensor([0.1097, 0.5425])
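
We can check the y = x*A^T + b formula ourselves. The sketch below builds its own small layer and random input (a fixed seed is used only so the run is repeatable) and compares the layer's output with the same calculation written out by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A fresh layer and some random input, just for this check
layer = nn.Linear(3, 2)
inputs = torch.randn(10, 3)

# Output from calling the layer
layer_output = layer(inputs)

# The same affine transformation written out by hand
manual_output = torch.matmul(inputs, layer.weight.T) + layer.bias

print("Weight shape:", layer.weight.shape)
print("Output shape:", layer_output.shape)
print("Outputs match:", torch.allclose(layer_output, manual_output, atol=1e-6))
```

Note that the weight has shape (outputs, inputs), which is why the transpose is needed.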


Something to note is that Pytorch initialises the grad of each parameter to "None", NOT 0! The gradient Tensors only get created during the first backward pass.

In [9]:
#Let's have a look at the gradients of these parameters
print ('w:\n', linear.weight.grad)
print ('b:\n', linear.bias.grad)

w:
 None
b:
 None


Let's introduce a few more useful functions of Pytorch! (I hope you're still with us!)<br>
<b>Loss functions</b><br>
We've already seen loss functions before and defined our own, but with Pytorch we can pick from a number of predefined functions (we can also still create our own) <br>
<b>Optimizers</b><br>
The optimizer is the thing that performs the parameter updates. Pytorch has a number of different optimizers, some of which we will explore in future labs. For now we will just use our well-known GD.<br>
Note: Most optimizers are just some variant of GD



In [10]:
#Let's perform a regression with a mean square error loss
loss_function = nn.MSELoss()

#Let's create a Stochastic gradient descent optimizer with a learning rate of 0.01
#(the way we will be using it, with the whole dataset in every step, it is just normal GD) 
#When we create the optimizer we need to tell it WHAT it needs to optimize, so the first thing 
#we pass it is the linear layer's "parameters"
optimizer = torch.optim.SGD(linear.parameters(),lr=0.01) 

Now that everything is set up, let's perform a "forward pass" of our model, aka let's put the data into the model and see what comes out.

In [None]:
#To perform a forward pass of our model, we just need to "call" our network
#Pytorch's nn.Module class will automatically pass the data to the "forward" function of the layer class
#more on this later
target_pred = linear(data)
print("Network output\n", target_pred)

We can see that target_pred is NOT the same as our random target data, let's see what the MSE loss is.

In [None]:
loss = loss_function(target_pred, target)
print('loss:', loss.item())
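
It can help to see what the MSE loss is actually computing. The sketch below uses tiny hand-picked Tensors (not the lab's data) so the numbers are easy to follow: with its default settings, `nn.MSELoss` is the mean of the squared differences over every element.

```python
import torch
import torch.nn as nn

# Tiny made-up predictions and targets
example_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
example_target = torch.tensor([[1.0, 1.0], [1.0, 1.0]])

# Pytorch's built-in MSE loss
builtin_loss = nn.MSELoss()(example_pred, example_target)

# The same thing by hand: differences are 0, 1, 2, 3
# squares are 0, 1, 4, 9 and their mean is 14 / 4
manual_loss = ((example_pred - example_target) ** 2).mean()

print("Built-in MSE:", builtin_loss.item())
print("Manual MSE:", manual_loss.item())
```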

Let's perform a backward pass of our model to compute the gradients!

In [None]:
# Backward pass.
loss.backward()
# Print out the gradients.
print ('dL/dw: ', linear.weight.grad) 
print ('dL/db: ', linear.bias.grad)
#Note: before every backward pass of the model we must first perform a new forward pass,
#as parts of the computational graph are freed during the backward pass to save memory
#Though we can tell Pytorch to hold onto this data (retain_graph=True), in many cases it needs to be recalculated anyway

Now, finally, tell the optimizer to perform an update step!

In [None]:
# The critical step: update the parameters in the direction that reduces the loss
optimizer.step()

#Perform another forward pass of the model to check the new loss
target_pred = linear(data)
loss = loss_function(target_pred, target)
print('loss after 1 step optimization: ', loss.item())
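
So what does `optimizer.step()` actually do? For plain SGD with no momentum or weight decay (the defaults), each parameter is updated as parameter = parameter - lr * gradient. The sketch below is a self-contained check using its own layer and random data rather than the lab's variables.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A fresh layer, data and optimizer, just for this check
layer = nn.Linear(3, 2)
inputs = torch.randn(10, 3)
targets = torch.randn(10, 2)
learning_rate = 0.01
sgd = torch.optim.SGD(layer.parameters(), lr=learning_rate)

# Forward and backward pass to fill in the gradients
step_loss = nn.MSELoss()(layer(inputs), targets)
step_loss.backward()

# Work out the update by hand BEFORE the optimizer changes anything
expected_weight = layer.weight.detach() - learning_rate * layer.weight.grad
expected_bias = layer.bias.detach() - learning_rate * layer.bias.grad

# Let the optimizer perform the update
sgd.step()

print("Weights match:", torch.allclose(layer.weight.detach(), expected_weight))
print("Bias matches:", torch.allclose(layer.bias.detach(), expected_bias))
```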

Our loss has gone down!! As this data is random and the model is small, it probably won't reach 0. Let's see how low we can get it to go by constructing a training loop!

In [None]:
#Let's create an empty list to log the loss
loss_logger = []

#Let's perform 1000 iterations over our dataset
for i in range(1000):
    #Perform a forward pass of our data
    target_pred = linear(data)
    
    #Calculate the loss
    loss = loss_function(target_pred, target)
    
    #.zero_grad clears the stored gradients
    #If we didn't do this the new gradients would be added to the 
    #gradients from the previous step!
    optimizer.zero_grad()
    
    #Calculate the new gradients
    loss.backward()
    
    #Perform an optimization step!
    optimizer.step()

    loss_logger.append(loss.item())
    
    
print("loss:", loss.item())
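
The comment about clearing the gradients is easy to verify. In the sketch below (a toy example with a single made-up weight), calling `backward()` twice without clearing adds the second gradient onto the first, while clearing in between gives the fresh value.

```python
import torch

# A single weight and input; d(w * x)/dw = x = 4
weight = torch.tensor([2.0], requires_grad=True)
value = torch.tensor([4.0])

# First backward pass
(weight * value).sum().backward()
first_grad = weight.grad.item()

# Second backward pass WITHOUT clearing: gradients accumulate
(weight * value).sum().backward()
accumulated_grad = weight.grad.item()

# Clear the stored gradient, then backprop again
weight.grad.zero_()
(weight * value).sum().backward()
fresh_grad = weight.grad.item()

print("After first backward:", first_grad)
print("After second backward (no clearing):", accumulated_grad)
print("After clearing and a new backward:", fresh_grad)
```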

Let's graph out the loss!

In [None]:
plt.plot(loss_logger)

Woohoo! We trained our first Pytorch neural network!!