# PyTorch tutorial
*Adapted from [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html "Pytorch tutorial")*

PyTorch is:  
* A deep learning framework (mainly geared towards research)  
* A GPU-accelerated counterpart to NumPy   

In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

## Tensors

Unlike NumPy ndarrays, tensors can also be used on a GPU.

In [2]:
x = torch.ones(2,4, dtype=torch.float)
print(x)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])


In [3]:
x = torch.rand(2,4)
print(x)

tensor([[0.7490, 0.5845, 0.5823, 0.2272],
        [0.8460, 0.0612, 0.4490, 0.4116]])


In [21]:
x = torch.tensor([[1.2, 4],[3, 5.6]])
print(x)

tensor([[1.2000, 4.0000],
        [3.0000, 5.6000]])
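
The tensors created so far live on the CPU. As a minimal sketch of the GPU claim above, a tensor can be moved to another device with ``.to()``. The variable names (``device``, ``x_dev``, ``y_dev``) are illustrative only, and the code falls back to the CPU when no CUDA device is visible, so it should run on any machine.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([[1.2, 4], [3, 5.6]])
x_dev = x.to(device)                      # tensor on `device` (x itself if already there)
y_dev = torch.ones(2, 2, device=device)   # create a tensor directly on the device

z_dev = x_dev + y_dev                     # both operands must live on the same device
z_cpu = z_dev.cpu()                       # bring the result back, e.g. before calling .numpy()
print(z_dev.device, z_cpu)
```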


Tensors can be easily manipulated using the usual operations.

In [28]:
print(x.size())

y = torch.rand(2,2)
print(x + y)
# alternatively
z =  torch.empty(2,2)
torch.add(x,y, out=z)
print(z)

torch.Size([2, 2])
tensor([[1.8530, 4.0379],
        [3.2658, 5.7057]])
tensor([[1.8530, 4.0379],
        [3.2658, 5.7057]])
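
A few more of the usual operations, as a small sketch with fixed values instead of random ones so that the results are reproducible. It compares the operator, functional and in-place forms of addition, and also shows NumPy-style indexing and the matrix product.

```python
import torch

x = torch.tensor([[1.2, 4], [3, 5.6]])
y = torch.ones(2, 2)

s1 = x + y                 # operator syntax
s2 = torch.add(x, y)       # functional syntax
y.add_(x)                  # in-place: methods ending in "_" modify y itself

print(torch.equal(s1, s2), torch.equal(s1, y))
print(x[:, 1])             # NumPy-style indexing: second column
print(x @ x)               # matrix product
```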


In [29]:
print(z)
print(z.size())
batch_size = 1
# reshape a multi-dimensional tensor
z = z.view(-1,batch_size)     # a size of -1 means that this dimension is inferred from the others
print(z.size())

tensor([[1.8530, 4.0379],
        [3.2658, 5.7057]])
torch.Size([2, 2])
torch.Size([4, 1])
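
To make the role of ``-1`` more concrete, here is a small sketch on a tensor with known values (the names ``col`` and ``flat`` are illustrative). It also shows that ``view`` does not copy the data: the reshaped tensor and the original share the same storage.

```python
import torch

z = torch.arange(6.0).view(2, 3)   # values 0..5 arranged in 2 rows and 3 columns
col = z.view(-1, 1)                # -1 is inferred as 6
flat = z.view(-1)                  # 1-D tensor with 6 elements
print(col.size(), flat.size())

# view() does not copy: writing through the view changes z as well.
flat[0] = 100.0
print(z[0, 0])
```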


Converting between NumPy arrays and PyTorch tensors does not copy the data: the two objects share the same memory location, so **modifying** one also changes the other. This holds as long as the tensor is on the CPU.

In [7]:
a = torch.ones(3,2)
print(a)

# from Pytorch to Numpy
b = a.numpy()
print(b)
a.add_(3)
print(a)
print(b)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
[[1. 1.]
 [1. 1.]
 [1. 1.]]
tensor([[4., 4.],
        [4., 4.],
        [4., 4.]])
[[4. 4.]
 [4. 4.]
 [4. 4.]]


In [8]:
# from Numpy to Pytorch
c = torch.from_numpy(b)
print(c)

tensor([[4., 4.],
        [4., 4.],
        [4., 4.]])
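
The sharing also works in the other direction, and it can be avoided when an independent copy is wanted. A minimal sketch, assuming a ``float32`` array (the names ``c`` and ``d`` are illustrative):

```python
import numpy as np
import torch

b = np.ones((3, 2), dtype=np.float32)
c = torch.from_numpy(b)        # shares memory with b
d = torch.tensor(b)            # torch.tensor() copies the data

b += 3                         # in-place change of the NumPy array
print(c)                       # reflects the change
print(d)                       # unaffected by the change
```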


## Automatic differentiation

The ``autograd`` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

In [9]:
x = torch.ones(2,4, dtype=torch.float)
print(x)
print(x.requires_grad)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])
False


In [10]:
x.requires_grad_(True)
print(x)
print(x.requires_grad)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]], requires_grad=True)
True


In [11]:
y = x + 7
print(y)

tensor([[8., 8., 8., 8.],
        [8., 8., 8., 8.]], grad_fn=<AddBackward0>)


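Every tensor produced by a tracked operation has a ``.grad_fn`` attribute that references the ``Function`` that created it (for tensors created directly by the user, such as ``x``, it is ``None``).
Other important methods:  
* ``.backward()``: computes the gradients automatically
* ``.detach()``: returns a tensor that is cut off from the computation history, so future computations on it are not tracked

We can also prevent history tracking by wrapping the code in a ``with torch.no_grad():`` block.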
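
The following sketch puts these pieces together on a small example. The scalar ``out`` is an arbitrary function chosen for illustration, not something taken from the original tutorial; its gradient can be checked by hand.

```python
import torch

x = torch.ones(2, 4, requires_grad=True)
y = x + 7
out = (y * y).mean()      # scalar: mean of (x + 7)^2 over the 8 entries

out.backward()            # fills x.grad with d(out)/dx = 2 * (x + 7) / 8
print(x.grad)             # every entry is 2.0

y_detached = y.detach()   # same values, but cut off from the graph
print(y_detached.requires_grad)

with torch.no_grad():     # nothing computed in this block is tracked
    w = x * 2
print(w.requires_grad)
```

Since every entry of ``x`` is 1, each partial derivative is $2 \cdot 8 / 8 = 2$.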

## Neural networks

We make use of the ``torch.nn`` package. An ``nn.Module`` contains layers and a ``forward(input)`` method that returns the ``output``.

### Checklist for neural networks
1. Define the neural network that has some learnable parameters (or weights)
2. Iterate over a dataset of inputs, and for each input:
   1. Process the input through the network
   2. Compute the loss (how far the output is from being correct)
   3. Propagate gradients back into the network’s parameters
   4. Update the weights of the network, typically with some variant of stochastic gradient descent

#### 1. Example of network definition and initialization

In [12]:
class Net_MNIST(nn.Module):

    def __init__(self, input_dim, hidden_dim, batch_size, output_dim=1):
        super(Net_MNIST, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.output_dim = output_dim
        
        self.fc1 = nn.Linear(self.input_dim, self.hidden_dim)
        self.fc2 = nn.Linear(self.hidden_dim, self.output_dim)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x
    
# create a Net object, i.e. our neural network
input_dim = 28**2   # assuming a 28x28 image as input
hidden_dim = 30
batch_size = 1
output_dim = 10

net = Net_MNIST(input_dim=input_dim, hidden_dim=hidden_dim, batch_size=batch_size, output_dim=output_dim)
print(net)

Net_MNIST(
  (fc1): Linear(in_features=784, out_features=30, bias=True)
  (fc2): Linear(in_features=30, out_features=10, bias=True)
)


For each mini-batch of data:

#### A. Processing input
The network is called like a function on the input batch and returns the output batch.

In [13]:
inp = torch.randn(batch_size, 28, 28)
out = net(inp.view(-1,input_dim)) # Reshaping as the NN wants it

print(out)

tensor([[0.0000, 0.0000, 0.2330, 0.0000, 0.0434, 0.0000, 0.0000, 0.0222, 0.0495,
         0.2554]], grad_fn=<ReluBackward0>)
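
The same call works for larger batches. In the sketch below, ``net`` is a hypothetical ``nn.Sequential`` stand-in with the same layer sizes, used only to keep the example self-contained. Flattening with ``batch.size(0)`` as the first dimension keeps one row per image.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in with the same layer sizes as the network above.
net = nn.Sequential(nn.Linear(28 * 28, 30), nn.ReLU(), nn.Linear(30, 10))

batch = torch.randn(5, 28, 28)           # 5 fake images
flat = batch.view(batch.size(0), -1)     # shape (5, 784): keep the batch axis, flatten the rest
out = net(flat)                          # calling net(...) runs its forward pass
print(out.size())
```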


#### B. Computing loss
There are a number of pre-defined loss functions, but it is also possible to define our own.
Whatever the choice, given the prediction $y_{pred} \in \mathbb{R}^d$ and the ground truth $y_{true} \in \mathbb{R}^d$, the loss $\mathcal{L}$ must return a single real number, i.e.

$$ \mathcal{L} : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$$

In [14]:
target = torch.randn(10)         # a dummy target, for example
target = target.view(1, -1)      # make it the same shape as output
criterion = nn.MSELoss()         # MSE = mean squared error, the usual/typical choice

loss = criterion(out, target)
print(loss)

tensor(1.0010, grad_fn=<MseLossBackward>)
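
As a sketch of what defining our own loss can look like, a plain Python function built from tensor operations is enough. The function ``my_mse`` is a hypothetical name, and it is compared against ``nn.MSELoss`` only as a sanity check.

```python
import torch
import torch.nn as nn

def my_mse(y_pred, y_true):
    """Hand-written mean squared error; returns a 0-dimensional tensor."""
    return ((y_pred - y_true) ** 2).mean()

y_pred = torch.randn(1, 10, requires_grad=True)
y_true = torch.randn(1, 10)

custom = my_mse(y_pred, y_true)
builtin = nn.MSELoss()(y_pred, y_true)
print(custom.item(), builtin.item())

custom.backward()        # works because the loss is built from differentiable tensor operations
print(y_pred.grad.size())
```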


#### C. Backpropagation
The gradients are computed automatically by calling the ``.backward()`` method on the loss.
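
One detail worth knowing is that ``.backward()`` adds to the existing ``.grad`` values instead of overwriting them, which is why gradients are reset before each new step. A minimal sketch on a hand-made tensor ``w`` (a hypothetical toy parameter, not part of the network above):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (w ** 2).sum()
loss.backward()
g1 = w.grad.clone()    # d(loss)/dw = 2 * w
print(g1)

loss = (w ** 2).sum()  # rebuild the graph and backpropagate a second time
loss.backward()
g2 = w.grad.clone()    # the new gradient was added to the previous one
print(g2)

w.grad.zero_()         # reset by hand; an optimizer's zero_grad() serves the same purpose
print(w.grad)
```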

#### D. Update the weights
We need to choose an optimization algorithm; several options are available in ``torch.optim``.

In [18]:
# create the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in the training loop:
optimizer.zero_grad()             # zero the gradient buffers
out = net(inp.view(-1,input_dim))
loss = criterion(out, target)
loss.backward()
optimizer.step()                  # does the update
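
Putting steps A to D together, a complete training loop might look like the sketch below. Everything about the data is an assumption made for illustration: the inputs and targets are random tensors rather than real MNIST samples, and ``net`` is a small ``nn.Sequential`` stand-in with the same layer sizes as above. The number of epochs and the mini-batch size are arbitrary.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins: random "images" and random targets, not real MNIST data.
inputs = torch.randn(64, 28 * 28)
targets = torch.randn(64, 10)

net = nn.Sequential(nn.Linear(28 * 28, 30), nn.ReLU(), nn.Linear(30, 10))
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)   # optim.Adam is one of the alternatives

batch_size = 16
num_batches = inputs.size(0) // batch_size
epoch_losses = []
for epoch in range(20):
    running = 0.0
    for start in range(0, inputs.size(0), batch_size):
        inp = inputs[start:start + batch_size]
        target = targets[start:start + batch_size]

        optimizer.zero_grad()          # clear gradients from the previous step
        out = net(inp)                 # A. process the input
        loss = criterion(out, target)  # B. compute the loss
        loss.backward()                # C. backpropagate
        optimizer.step()               # D. update the weights
        running += loss.item()
    epoch_losses.append(running / num_batches)   # average loss over the epoch

print(epoch_losses[0], epoch_losses[-1])
```

On random targets the network can only memorize the samples, so the printed losses say nothing about generalization; the sketch is only meant to show the order of the calls.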