# COMP4660/8420 Lab 2.3 Appendix - Introduction to PyTorch Basics

Before delving deeper into building neural networks with PyTorch, it is helpful to know some basic manipulations of PyTorch elements. The following examples in task 1 are designed to give you a very brief introduction to operations in PyTorch and will be sufficient for the course. If you are interested in learning more, we encourage you to look at the PyTorch tutorials and experiment further:
http://pytorch.org/tutorials/beginner/pytorch_with_examples.html

## What is PyTorch?

PyTorch is a Python-based scientific computing package targeted at two audiences:
* those who want a replacement for NumPy that can use the power of GPUs
* those who want a deep learning research platform that provides maximum flexibility and speed

To use PyTorch, you need to import the library by writing:

In [None]:
import torch

## 1. Tensors

### 1.1 What are Tensors?

Tensors are similar to NumPy’s ndarrays, with the addition that Tensors can also be
used on a GPU to accelerate computing.

Construct a randomly initialized matrix:

In [None]:
# create a randomly initialized matrix
x = torch.rand(5, 3)
print(x)
# get matrix size
print(x.size())
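
The GPU support mentioned above can be sketched as follows. This is a minimal illustration, not part of the original lab: it assumes nothing about your hardware and falls back to the CPU when no CUDA device is visible, and the names `device` and `x_dev` are chosen here for illustration.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3)
x_dev = x.to(device)     # tensor on the chosen device
print(x_dev.device)      # cpu or cuda:0, depending on the machine
print(x_dev.shape)       # .shape reports the same information as .size()
```

Operations between tensors generally require both operands to live on the same device, which is why the device is usually chosen once and reused.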

### 1.2 Tensors Operations

Tensors support basic operations such as addition, subtraction, multiplication, and division. There are multiple syntaxes for each operation. In the following example, we will take a look at the addition operation.

In [None]:
# Addition: syntax 1
x = torch.rand(5, 3)
y = torch.rand(5, 3)
print(x + y)

In [None]:
# Addition: syntax 2
x = torch.rand(5, 3)
y = torch.rand(5, 3)
print(torch.add(x, y))

In [None]:
# Addition: in-place
x = torch.rand(5, 3)
y = torch.rand(5, 3)
print(y)
y = y.add_(x)
print(y)

Note 1: Any operation that mutates a tensor in-place is post-fixed with an \_. For example: x.copy\_(y) , x.t\_(), will change x.
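
To make the difference concrete, here is a small sketch comparing `add` with `add_`. The tensors of ones are chosen only so that the results are easy to read.

```python
import torch

x = torch.ones(2, 3)
y = torch.ones(2, 3)

z = y.add(x)             # out-of-place: returns a new tensor
y_before = y.clone()     # y still holds ones at this point

w = y.add_(x)            # in-place: y itself now holds the sum
print(y_before)          # all ones
print(y)                 # all twos
print(w.data_ptr() == y.data_ptr())   # the result refers to y's own storage
```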

Indexing is also NumPy-like.

In [None]:
# access the second column
print(x[:, 1])

# access the first row
print(x[0, :])

### 1.3 Tensors <-> NumPy arrays

Tensors can be converted to NumPy’s ndarrays,

In [None]:
# import numpy library
import numpy as np
# create a randomly initialized tensor matrix
x = torch.rand(5, 3)
# convert tensors to numpy array
y = x.numpy()
print(y)

and can be formed by NumPy’s ndarrays.

In [None]:
# create a numpy array
x = np.array(([3,4], [3,5]))
# convert numpy array to tensor
y = torch.from_numpy(x)
print(y)

Note 2: The Torch Tensor and NumPy array will share their underlying memory locations, so changing one will change the other.

In [None]:
# create a tensor of ones
a = torch.ones(5)
b = a.numpy()
# all elements in a add 1
a.add_(1)
# b will also be updated
print(a)
print(b)
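
The sharing also holds in the other direction, for tensors built with `torch.from_numpy`. The sketch below shows this and one way to get an independent copy; using `clone()` for that purpose is a choice made for this illustration.

```python
import numpy as np
import torch

a = np.ones(3)
t = torch.from_numpy(a)      # t shares memory with a
a += 1                       # in-place NumPy update is visible through t
print(t)                     # every element is now 2

independent = t.clone()      # clone() allocates separate memory
a += 1
print(t)                     # every element is now 3
print(independent)           # still 2: no longer tied to a
```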

## 2. Autograd: automatic differentiation

Central to all neural networks in PyTorch is the _autograd_ package. It provides automatic differentiation for all operations on Tensors.

Autograd is a reverse-mode automatic differentiation system. Conceptually, _autograd_ records all of the operations that created the data as you execute them, giving you a directed acyclic graph whose leaves are the input variables and whose roots are the output variables. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

Internally, _autograd_ represents this graph as a graph of Function objects (really expressions), which can be _apply()_-ed to compute the result of evaluating the graph. During the forward pass, _autograd_ simultaneously performs the requested computations and builds up a graph representing the function that computes the gradient (the .grad\_fn attribute of each _Variable_ is an entry point into this graph). When the forward pass is completed, this graph is evaluated in the backward pass to compute the gradients.

### 2.1 Variable (Deprecated)

This class has been deprecated as of version 0.4. All functionality of the Variable class is now supported directly by the Tensor class. The section below is useful for understanding what Variable does (if you are using an old PyTorch version) and what Tensor can now do (the functions apply directly to Tensor).

autograd.Variable is the central class of the package. It wraps a Tensor and supports nearly all operations defined on it. Once you finish your computation, you can call .backward() and have all the gradients computed automatically.

There are three basic attributes in a Variable:

![Variable](images/variable.png)

You can access the raw tensor through the .data attribute, while the gradient with respect to this variable is accumulated into .grad. Each variable also has a .grad\_fn attribute that references the Function that created it. The exception is Variables created by the user, whose grad\_fn is None.

If you want to compute the derivatives, you can call .backward() on a Variable. If the Variable is a scalar (i.e. it holds a single element), you don’t need to pass any arguments to backward(). If it has more elements, you need to specify a gradient argument that is a tensor of matching shape.

In [None]:
from torch.autograd import Variable

# Create variables
x = Variable(torch.Tensor([1]), requires_grad=True)
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([3]), requires_grad=True)

# Define a function
y = w * x + b        	# y = 2 * x + 3

# Compute gradients.
y.backward()         	# equal to y.backward(torch.Tensor([1.0]))

# Print out the gradients.
print('dy/dx: {}'.format(x.grad.data))   # x.grad = 2
print('dy/dw: {}'.format(w.grad.data))   # w.grad = 1
print('dy/db: {}'.format(b.grad.data))   # b.grad = 1
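
The non-scalar case described above can be sketched as follows. This uses the post-0.4 Tensor API directly rather than Variable. Passing a tensor of ones as the gradient argument is an assumption made for this illustration, and it has the same effect as calling `y.sum().backward()`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                        # y has three elements, so it is not a scalar

# The gradient argument must match y's shape; ones weight each element equally.
y.backward(torch.ones_like(y))
print(x.grad)                    # dy_i/dx_i = 2 * x_i, i.e. 2, 4, 6
```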


### 2.2 Gradients

Let’s now look at the following example to understand how gradients are calculated.

In [None]:
import torch
from torch.autograd import Variable

x=torch.Tensor([[1.,2.,3.],[4.,5.,6.]])
x=Variable(x,requires_grad=True)
y=x+2
z=y*y*3
out=z.mean()


An equivalent computation graph of the above code is:
![Computation Graph](images/graph.png)

Now, if you follow the computation direction by using its .grad\_fn attribute, i.e. by printing out the .grad\_fn of each variable as follows,

In [None]:
print(x.grad_fn)
print(y.grad_fn)
print(z.grad_fn)
print(out.grad_fn) 

you will see a graph of computations that looks like this:

    x -> add -> multiply -> mean -> out
    
To see the gradient of out with respect to x, ∂out/∂x, we can do

In [None]:
out.backward()      # equivalent to out.backward(torch.tensor(1.0)), since out is a scalar
print(x.grad)
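
Working the chain rule by hand gives a useful check on this result. Since out is the mean of 3(x_i + 2)^2 over six elements, each partial derivative is ∂out/∂x_i = (1/6) · 6(x_i + 2) = x_i + 2, so x.grad should equal x + 2. The sketch below confirms this with the post-0.4 Tensor API instead of Variable. The name `expected` is introduced here only for the comparison.

```python
import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()

# Hand-derived gradient: d(out)/dx_i = x_i + 2
expected = x.detach() + 2
print(x.grad)
print(torch.allclose(x.grad, expected))
```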

## 3. Neural Networks

Neural networks can be constructed using the torch.nn package. nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.

A typical training procedure for a neural network is as follows:
* Define the neural network that has some learnable parameters (or weights)
* Iterate over a dataset of inputs
* Process input through the network
* Compute the loss (how far is the output from being correct)
* Propagate gradients back into the network’s parameters
* Update the weights of the network, typically using a simple update rule: 
    
        weight = weight - learning_rate * gradient 

### 3.1 Define the network
A template for defining a neural network is:

In [None]:
import torch.nn as nn
import torch.nn.functional as F
# a template for defining a neural network
class net_name(nn.Module):
    def __init__(self):
        super(net_name, self).__init__()
        # add layers here
        self.layer1 = nn.Linear(n_input, n_hidden)  #change nn.Linear if it is not linear 
        self.layer2 = …
        # more layers…

    # define the process of performing forward pass,
    # that is how to return a Variable of output data
    # from a Variable of input data x
    def forward(self, x):
        x = F.some_function1(x)        # calling some functions in torch.nn.functional
        x = F.some_function2(x)        # calling some functions in torch.nn.functional 
        x = self.layer1(x)             # apply pre-defined layer1 
        … …
        return x

# define a neural network using the customised structure
net = net_name()


You just need to define the forward function; the backward function (where gradients are computed) is automatically defined for you by autograd. You can use any of the Tensor operations in the forward function. The input to forward is a Tensor (an autograd.Variable in PyTorch versions before 0.4), and so is the output.

In [None]:
# perform forward pass to get the actual output
output = net(input)

The learnable parameters of a model are returned by net.parameters().
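
As a small illustration of what `parameters()` returns, the sketch below inspects a single `nn.Linear` layer. The layer sizes (4 inputs, 3 outputs) are arbitrary values chosen for this example.

```python
import torch.nn as nn

layer = nn.Linear(4, 3)          # 4 inputs, 3 outputs

# named_parameters() yields (name, tensor) pairs for every learnable tensor.
shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
print(shapes)                    # weight is (3, 4), bias is (3,)

n_params = sum(p.numel() for p in layer.parameters())
print(n_params)                  # 3 * 4 weights + 3 biases = 15
```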

Here as an example, we define a simple neural network with one hidden layer using the above template:

In [None]:
# define a simple neural network with one sigmoid hidden layer
class TwoLayerNet(torch.nn.Module):

    def __init__(self, n_input, n_hidden, n_output):
        super(TwoLayerNet, self).__init__()
        # define linear hidden layer output
        self.hidden = torch.nn.Linear(n_input, n_hidden)
        # define linear output layer output
        self.out = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        """
            In the forward function we define the process of performing
            forward pass, that is to accept a Variable of input
            data, x, and return a Variable of output data, y_pred.
        """
        # get hidden layer input
        h_input = self.hidden(x)
        # define activation function for hidden layer
        h_output = torch.sigmoid(h_input)  # F.sigmoid is deprecated in recent PyTorch versions
        # get output layer output
        y_pred = self.out(h_output)

        return y_pred

# define a neural network using the customised structure
net = TwoLayerNet(input_neurons, hidden_neurons, output_neurons)

### 3.2 Loss Function

A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the input and the target.

For example:


In [None]:
# perform forward pass to get the actual output
output = net(input)

# define loss function
loss_func = nn.MSELoss()

# compute loss
loss = loss_func(output, target)
print(loss)
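
To see what the mean squared error computes, here is a small self-contained sketch with hand-picked numbers. The tensors `output` and `target` below are made-up values for illustration, not results from a real network.

```python
import torch
import torch.nn as nn

output = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 0.0, 5.0])

loss_func = nn.MSELoss()
loss = loss_func(output, target)

# Squared errors are 0, 4 and 4, so the mean is 8 / 3.
print(loss.item())
```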

So, when we call loss.backward(), the loss is differentiated through the whole computational graph, and every leaf Variable in the graph with requires\_grad=True will have the gradient accumulated into its .grad attribute.

### 3.3 Back propagation

To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, otherwise the new gradients will be added to the existing ones.

In [None]:
# clear gradient buffers of all parameters
net.zero_grad()

# perform backward pass: compute gradients of the loss with respect to
# all the learnable parameters of the model.
loss.backward()
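
The accumulation behaviour is easy to observe on a single tensor. The following sketch uses a one-element tensor `w` and the function 3 * w, both chosen only for illustration, and clears the gradient with `zero_()` on the tensor rather than `net.zero_grad()`.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()           # gradient of 3 * w is 3

(3 * w).sum().backward()
second = w.grad.clone()          # 3 + 3 = 6: added to the existing gradient

w.grad.zero_()                   # clear the buffer before the next pass
(3 * w).sum().backward()
third = w.grad.clone()           # back to 3

print(first, second, third)
```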

### 3.4 Update the weights

The simplest update rule used in practice is stochastic gradient descent (SGD):
    
    weight = weight - learning_rate * gradient
    
We can implement this using simple Python code:

In [None]:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
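
In PyTorch 0.4 and later, the same manual update is often written inside `torch.no_grad()` instead of going through `.data`. The sketch below shows one such step on a stand-alone tensor. The loss (sum of squares) and the learning rate of 0.1 are assumptions chosen so that the arithmetic is easy to follow.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (w ** 2).sum()            # gradient is 2 * w, i.e. 2 and -4
loss.backward()

learning_rate = 0.1
with torch.no_grad():            # keep the update itself out of the graph
    w -= learning_rate * w.grad  # weight = weight - learning_rate * gradient
w.grad.zero_()                   # clear the gradient for the next step

print(w)                         # values are now 0.8 and -1.6
```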

However, you may want to use different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, torch.optim, that implements all these methods. Using it is very simple:

In [None]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = loss_func(output, target)
loss.backward()
optimizer.step()    # does the update
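
Putting the pieces of this section together, a complete training run might look like the sketch below. Everything specific in it is an assumption made for illustration: the toy data (y = 2x + 1), the use of `nn.Sequential` in place of a custom class, the hidden size of 8, the learning rate, and the number of epochs.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)                              # make the run repeatable

# Toy regression data (illustrative only): y = 2x + 1
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)   # shape (20, 1)
targets = 2 * inputs + 1

# One sigmoid hidden layer, similar in spirit to the network in 3.1
net = nn.Sequential(nn.Linear(1, 8), nn.Sigmoid(), nn.Linear(8, 1))
loss_func = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.05)

losses = []
for epoch in range(200):
    optimizer.zero_grad()                 # zero the gradient buffers
    output = net(inputs)                  # forward pass
    loss = loss_func(output, targets)     # compute the loss
    loss.backward()                       # backward pass
    optimizer.step()                      # update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])              # the loss should have decreased
```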

### 3.5 Save and load a model

Sometimes, you may want to save the trained model and load it later. There are two approaches for this.

The first (recommended) saves and loads only the model parameters:

In [None]:
torch.save(the_model.state_dict(), PATH)

Then later:

In [None]:
the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))
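
A self-contained round trip of the first approach could look like this sketch. The model (a single `nn.Linear` layer), the temporary directory, and the file name `model_params.pt` are all hypothetical choices for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(2, 1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_params.pt")   # hypothetical file name
    torch.save(model.state_dict(), path)

    restored = nn.Linear(2, 1)                    # same architecture, fresh weights
    restored.load_state_dict(torch.load(path))
    restored.eval()                               # switch to evaluation mode

# Every parameter tensor should match the original exactly.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Note that the architecture has to be rebuilt in code before `load_state_dict` is called, because the file holds only the parameter tensors.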

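The second saves and loads the entire model. Because this pickles the whole object, the saved file is tied to the exact class definition and code layout used when saving: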

In [None]:
torch.save(the_model, PATH)

Then later:

In [None]:
the_model = torch.load(PATH)