# Introduction to pytorch

In [1]:
import torch
import numpy as np

In [None]:
import os
os.getcwd() # it is a good idea to check and document which folder your notebook is running in

## Lists, arrays, and tensors

Adapted from the tutorial at
https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py


Please work through that tutorial as well. 
This worksheet covers slightly different ground, so you will gain from doing both tutorials (but I recommend that you do mine first!) 

In [2]:
data = [[1,2], [3, 4]] # this makes a Python list of lists: this is not a proper array
data # to see what it is

[[1, 2], [3, 4]]

In [3]:
type( data ) # type is a very useful Python function: 
# you should always know what type every variable is in your code

list

You can convert a list of data into a numpy array (I hope you already know how to do this). 

In [4]:
np_data = np.array( data )
np_data

array([[1, 2],
       [3, 4]])

In [5]:
type( np_data )

numpy.ndarray

The type does not give all useful information about a numpy array. Numpy arrays also have attributes, which give their shape and the data type of their elements.  


In [6]:
np_data.shape

(2, 2)

In [7]:
np_data.dtype

dtype('int32')

You should ALWAYS make sure you know the shape of the numpy arrays in your code, and you should also be careful to keep track of the data type of the individual elements. 

If numpy had been part of Python from the beginning, these attributes would be part of the type itself...

### Tensors

The point of this worksheet is to introduce pytorch tensors. These are very similar to numpy arrays, but, as we shall see, you can do some additional things with them. 

In [8]:
torch_data = torch.from_numpy( np_data )
torch_data

tensor([[1, 2],
        [3, 4]], dtype=torch.int32)

We have just created a pytorch tensor from a numpy array.  (There are many other ways to create tensors -- see the other tutorial for some examples.)

We can use a torch tensor just like a numpy array: 

In [9]:
torch_data[0,1] = 17
torch_data

tensor([[ 1, 17],
        [ 3,  4]], dtype=torch.int32)

Let's look at the numpy array from which we created the tensor: 

In [10]:
np_data

array([[ 1, 17],
       [ 3,  4]])

Oh wow! What has happened?  The value in the numpy array has also changed.

This is because `torch.from_numpy` does not copy anything: the torch tensor and the numpy array share the same underlying memory. The torch tensor also has some additional information and properties, which we will now turn to...
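
If you want to see the difference between sharing and copying, here is a small sketch. It contrasts `torch.from_numpy`, which reuses the numpy buffer, with `torch.tensor`, which makes a copy. The names `np_demo`, `shared` and `copied` are made up for this illustration and are not used elsewhere in the worksheet.

```python
import numpy as np
import torch

np_demo = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(np_demo)  # shares memory with np_demo
copied = torch.tensor(np_demo)      # copies the data into new memory

shared[0, 1] = 17   # this write is also visible in np_demo
copied[1, 0] = -5   # this write leaves np_demo untouched

print(np_demo)   # the 2 has become 17, the 3 is unchanged
print(shared)
print(copied)
```

So if you need a tensor that is independent of the original array, make a copy rather than using `from_numpy`.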

## Gradients

(Heavily) adapted from the tutorial at https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

Please look at that tutorial also. 

We will set up a single neuron with tensors. 

There will be three inputs `x=[1,2,3]` and three weights `w`
and one output, which will go through a relu function. 

We will perform the forward pass and the backward pass, and compute the gradients of the output with respect to the weights. 

The pre-activation of the neuron is the weighted sum of the inputs: 

$a = w_1 x_1 + w_2 x_2 + w_3 x_3$

The activation, or output, of the neuron is

$\mathrm{relu}(a) = \max(a, 0)$ 

We need to set all this up with tensors instead of arrays, for reasons which will become clear. 


Now we set up the inputs and the weights. There is one weight for each input. We want to compute the gradient of the output with respect to both the weights and the inputs, and to do this we need to set the tensor attribute `requires_grad` to `True` on each of them. 

In [42]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x

tensor([1., 2., 3.], requires_grad=True)

In [46]:
w = torch.randn_like( x, requires_grad=True )
w

tensor([-0.6987, -0.0979,  0.6506], requires_grad=True)

In [47]:
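a = sum( w * x ) # Python's built-in sum works here because it adds the tensor elements with torch's +; torch.sum( w * x ) is the usual torch way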
a

tensor(1.0571, grad_fn=<AddBackward0>)

In [48]:
y = torch.max( a, torch.tensor(0.0))
y

tensor(1.0571, grad_fn=<MaximumBackward>)

This is the forward pass, which computes the output from the input. 

Now we compute the gradients. 

We could do this by hand: 

$\frac{\partial y}{\partial w_1} = \frac{\partial a}{\partial w_1}\cdot \frac{\partial y}{\partial a} = x_1 \cdot 1 = x_1 $

For these values of $w$ and $x$, $a > 0$, so $\frac{\partial y}{\partial a}=1$. 

If $a < 0 $, $\frac{\partial y}{\partial a} = 0$. 

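But we don't need to do it by hand: we can use autodiff. Because `requires_grad` was set to `True` for `w` and `x`, pytorch recorded every operation in the forward pass, and we can now ask it to work backwards through them. 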

In [49]:
y.backward()

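Nothing seems to have happened - but a lot has happened. During the forward pass, pytorch recorded the computational graph for the calculation of y from x and w. Calling `backward()` worked back through that graph and calculated the gradient of y wrt each leaf tensor that has `requires_grad=True` (here `w` and `x`), storing the result in that tensor's `.grad` attribute. 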

We can look at them like this: 

In [38]:
w.grad

tensor([1., 2., 3.])

In [50]:
x.grad

tensor([-0.6987, -0.0979,  0.6506])

Notice that $\frac{\partial y}{\partial w} = x \cdot \frac{\partial y}{\partial a}$ and 
$\frac{\partial y}{\partial x} = w \cdot \frac{\partial y}{\partial a}$. Here $\frac{\partial y}{\partial a} = 1$, so `w.grad` is equal to `x` and `x.grad` is equal to `w`. 
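
Here is one way you might check the hand-derived formulas against autodiff. This is only a sketch: the weights are fixed to the values printed in the run above so that the check is repeatable, `torch.relu` stands in for the `torch.max` call, and the `_chk` names are made up so that the worksheet's own `x`, `w`, `a` and `y` are not overwritten.

```python
import torch

x_chk = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Assumption: weights copied from the run above, not freshly sampled
w_chk = torch.tensor([-0.6987, -0.0979, 0.6506], requires_grad=True)

a_chk = torch.sum(w_chk * x_chk)   # pre-activation
y_chk = torch.relu(a_chk)          # same as max(a, 0)
y_chk.backward()

# Hand-derived gradients: dy/da is 1 if a > 0, otherwise 0
dy_da = 1.0 if a_chk.item() > 0 else 0.0
hand_w_grad = dy_da * x_chk.detach()
hand_x_grad = dy_da * w_chk.detach()

print(torch.allclose(w_chk.grad, hand_w_grad))  # True
print(torch.allclose(x_chk.grad, hand_x_grad))  # True
```

Try changing the weights so that the pre-activation is negative: both gradients should then be all zeros.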



What about `a`, the pre-activation? Do we have the gradient $\frac{\partial y}{\partial a}$ ?

Let's see: 

In [51]:
a.grad

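
Ah - `a.grad` is `None` (recent versions of pytorch also print a warning here). pytorch has assumed that we do not need the gradients of intermediate results, and it has disposed of them. We could have kept it by calling `a.retain_grad()` before `y.backward()`... but really we were only looking out of curiosity. 

The leaf tensors are the tensors at the start of the computational graph: the ones we created directly, such as `x` and `w`, rather than the ones computed from other tensors, such as `a` and `y`. By default, only leaf tensors with `requires_grad=True` have their `.grad` filled in. 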
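
If you are curious, this is roughly how the gradient of an intermediate result can be kept. It is a sketch with made-up names (`x_demo`, `w_demo`, `a_demo`, `y_demo`) and hand-picked weights, and it uses `torch.relu` in place of `torch.max`. The attribute `is_leaf` tells you whether a tensor was created directly or computed from other tensors.

```python
import torch

x_demo = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w_demo = torch.tensor([0.5, -0.25, 0.5], requires_grad=True)  # hand-picked so that a > 0

a_demo = torch.sum(w_demo * x_demo)  # 0.5 - 0.5 + 1.5 = 1.5
a_demo.retain_grad()                 # ask pytorch to keep the gradient of this non-leaf tensor
y_demo = torch.relu(a_demo)
y_demo.backward()

print(a_demo.is_leaf, w_demo.is_leaf)  # False True
print(a_demo.grad)                     # tensor(1.)
```

Note that `retain_grad()` has to be called before `backward()`, otherwise the gradient has already been thrown away.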

## Points to note

1. Convince yourself that the gradients are correct

2. Try running the calculation again. You will find that the gradients accumulate (that is, the new gradients are added to the existing gradients). 
This is a feature, not a bug!  It is often useful to accumulate gradients from many different calculations, before using them. 

You can reset the gradients to zero (well actually, you can delete them) by setting `x.grad = None` and `w.grad = None` before recomputing. 

3. Notice that the forward pass was built from `torch` operations (arithmetic on tensors and `torch.max`), not numpy functions. This is because each `torch` forward operation has a matching backward function that is used to compute its gradient when working back through the computational graph. 
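
Point 2 can be tried out with a sketch like the one below. The helper `forward` and the `_acc` names are made up for this illustration, and the weights are hand-picked so that the pre-activation is positive. The forward pass is rerun before every `backward()` call, because the computational graph is freed once it has been used.

```python
import torch

x_acc = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w_acc = torch.tensor([0.5, -0.25, 0.5], requires_grad=True)  # hand-picked so that a > 0

def forward(w, x):
    """Single relu neuron: weighted sum followed by max(a, 0)."""
    return torch.relu(torch.sum(w * x))

forward(w_acc, x_acc).backward()
first = w_acc.grad.clone()    # tensor([1., 2., 3.])

forward(w_acc, x_acc).backward()
second = w_acc.grad.clone()   # tensor([2., 4., 6.]): the new gradient was added to the old one

# Delete the stored gradients, then recompute from scratch
w_acc.grad = None
x_acc.grad = None
forward(w_acc, x_acc).backward()
third = w_acc.grad.clone()    # tensor([1., 2., 3.]) again

print(first, second, third)
```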

## Tensor attributes

`x.shape` gives the shape of the tensor: the size of each of its dimensions. Very useful - you always need to know the shapes of the tensors you are using.  

`x.dtype` gives the data type of the tensor elements (usually a number type, such as float32 or int32). float32 is the default for floating-point tensors, and it is the usual choice for fast arithmetic. 

`x.device` is only relevant if you have a gpu: then you will keep needing to transfer tensors between the cpu memory and the gpu memory. If you are just using a cpu, life is much simpler and you can ignore this attribute. 

`x.requires_grad` if this is set to true, then the computational graph will be kept for calculations involving this tensor, so that gradients can be calculated. 

`x.grad` is the gradient that has been calculated for the tensor x after y.backward() has been called, where y is the final result (a single-number tensor) of the calculation. 
Note that y needs to be a single number, because otherwise the derivative of each element of y with respect to x would need to be calculated: don't go there. All our gradient calculations will be for single-valued results. 
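
To see why y needs to be a single number, here is a small sketch with made-up names (`x_vec`, `y_vec`, `y_scalar`). Calling `backward()` with no arguments on a tensor with several elements raises a `RuntimeError`; reducing the result to one number first (here with `.sum()`) makes it work.

```python
import torch

x_vec = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_vec = x_vec * 2.0   # three outputs, not a single number

try:
    y_vec.backward()  # no arguments: only allowed for single-number results
    raised = False
except RuntimeError as err:
    raised = True
    print("backward() failed:", err)

# Reduce to a single number first, then backward() works
y_scalar = (x_vec * 2.0).sum()
y_scalar.backward()
print(x_vec.grad)     # tensor([2., 2., 2.])
```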



In [17]:
sz = x.size()
sz

torch.Size([3])

In [14]:
type( sz )

torch.Size

In [18]:
x.shape

torch.Size([3])

`x.shape` is an alias for `x.size()` -- you can use either. 

There are many ways to create tensors: another is to use the functions `ones_like`, `zeros_like`, or `randn_like` (which we used above to create `w`). `torch.ones_like(x)` creates a new tensor of the same shape and data type as `x`, but filled with 1.0  
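
A quick sketch of the three `_like` functions, using a made-up reference tensor `x_ref`. The values from `randn_like` are random, so only the shape and data type are printed for that one.

```python
import torch

x_ref = torch.tensor([1.0, 2.0, 3.0])

ones = torch.ones_like(x_ref)    # same shape and dtype as x_ref, filled with 1.0
zeros = torch.zeros_like(x_ref)  # same shape and dtype as x_ref, filled with 0.0
noise = torch.randn_like(x_ref)  # same shape and dtype, standard normal samples

print(ones)                      # tensor([1., 1., 1.])
print(zeros)                     # tensor([0., 0., 0.])
print(noise.shape, noise.dtype)  # torch.Size([3]) torch.float32
```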

### Now do the pytorch blitz tutorial

at  https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py