# PyTorch Tutorial 

<center><img src="img/pytorch-logo.jpeg" width="400" /></center>

* open-source machine learning library written in Python, C++ and CUDA

* has NumPy-like interfaces

* provides two core features: operations with tensors and automatic differentiation
    
* initially developed at Facebook

### Installation: https://pytorch.org/get-started/locally/

On the university Linux server, just run `source activate tensor` to enable the PyTorch environment.

### What are tensors?

A PyTorch Tensor is conceptually the same as a NumPy multidimensional array: by default it
knows nothing about deep learning, computational graphs, or gradients, and is just
a generic n-dimensional array to be used for arbitrary numeric computation.

The biggest difference between a NumPy array and a PyTorch Tensor is that
a PyTorch Tensor can live on either the CPU or a GPU. To run operations on the GPU,
move the Tensor to a CUDA device, for example with `.to(device)`.
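
As a minimal sketch of device placement, the snippet below picks a GPU when PyTorch can see one and otherwise stays on the CPU. The variable names (`t_cpu`, `t_dev`, `result`) are illustrative only.

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t_cpu = torch.ones(2, 3)   # tensors are created on the CPU by default
t_dev = t_cpu.to(device)   # copy on `device` (returns t_cpu itself if it is already there)

print(t_cpu.device, "->", t_dev.device)

# Operations run on the device where their inputs live.
result = (t_dev * 2).sum()
print(result.item())       # .item() returns the value as a plain Python number
```

Writing the code against a `device` variable keeps the same notebook runnable on machines with and without a GPU.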

<img src="img/tensor.jpeg" width="800">

### Basic tensor matrix operations

In [None]:
import torch

# Simple matrix
m1 = torch.ones(3, 4)
print('Matrix m1: \n', m1)
print(m1.shape,"\n")

# Another (random) matrix
m2 = torch.rand(3, 4) # fill a 3x4 matrix with uniform random numbers in the [0, 1) interval
print('Matrix m2: \n', m2)
print(m2.shape,"\n")

# Transpose of a matrix
print('Matrix m2^T: \n', m2.t())
print(m2.t().shape,"\n")

# Matrix operations
m3 = m1*m2      # Not a matrix multiplication! Each value of m1 is multiplied by the matching value of m2 (element-wise product)
print('Matrix m3: \n', m3)
print(m3.shape,"\n")

# Matrix multiplication using torch.mm
m4 = m1.mm(m2.t())
print('Matrix m4: \n', m4)
print(m4.shape,"\n")

# Matrix multiplication using torch.matmul
m5 = m1.matmul(m2.t())
print('Matrix m5: \n', m5)
print(m5.shape,"\n")

# Matrix-vector multiplication: torch.matmul handles it; torch.mv(m5, vec) is an equivalent alternative
vec = torch.tensor([1.0,2.0,3.0])
print(torch.matmul(m5, vec))
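
The following sketch compares three ways to write a matrix-vector product on a small hand-made example. The names `A` and `v` are illustrative, chosen so that they do not clash with the variables above.

```python
import torch

A = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # shape (2, 3)
v = torch.tensor([1.0, 0.0, -1.0])    # shape (3,)

out_mv = torch.mv(A, v)          # matrix-vector product only
out_matmul = torch.matmul(A, v)  # general product, dispatches on the input shapes
out_at = A @ v                   # operator form of torch.matmul

print(out_mv)                    # tensor([-2., -2.])
print(torch.equal(out_mv, out_matmul), torch.equal(out_mv, out_at))
```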

### More operations (power, sum, clamp, ...)

(See even more basic operations here: https://jhui.github.io/2018/02/09/PyTorch-Basic-operations/)

In [None]:
mat = torch.rand(3, 4)
print(mat)

# Takes the power of each element in input
print(mat.pow(3))

# Returns the sum of all elements in the input tensor
print(mat.sum())

# Clamp all elements in input into the range [ min, max ] and return a resulting tensor
print(mat.clamp(0.3,0.7))
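
Because `torch.rand` gives different numbers on every run, the effect of these operations is easier to check on fixed values. The sketch below uses a hand-written tensor `t` (an illustrative name) and also shows the `dim` argument of `sum`, which reduces along one axis instead of over all elements.

```python
import torch

t = torch.tensor([[0.1, 0.5, 0.9],
                  [0.2, 0.4, 0.8]])

cubed = t.pow(3)             # element-wise cube, same shape as t
total = t.sum()              # sum over all elements, about 2.9
col_sums = t.sum(dim=0)      # sum over rows, one value per column
clamped = t.clamp(0.3, 0.7)  # values below 0.3 become 0.3, values above 0.7 become 0.7

print(cubed)
print(total)
print(col_sums)
print(clamped)
```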

### Conversion to NumPy array

In [None]:
import numpy as np

# conversion PyTorch -> NumPy (for a CPU tensor, the array shares memory with the tensor)
a = torch.randn(5)
b = a.numpy()

# conversion NumPy -> PyTorch (again without copying the data)
c = torch.from_numpy(b)

print(a)
print(b)
print(c)
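
One detail worth seeing in action: for CPU tensors, `numpy()` and `torch.from_numpy` do not copy the data, so an in-place change on one side is visible on the other. The sketch below illustrates this; the name `b_copy` is illustrative.

```python
import numpy as np
import torch

a = torch.zeros(3)
b = a.numpy()             # NumPy view on the same memory as `a`
c = torch.from_numpy(b)   # tensor view on that same memory

a.add_(1)                 # in-place update of the tensor
print(b)                  # [1. 1. 1.]
print(c)                  # tensor([1., 1., 1.])

# An explicit copy breaks the link.
b_copy = a.numpy().copy()
a.add_(1)
print(b_copy)             # [1. 1. 1.]
print(b)                  # [2. 2. 2.]
print(np.shares_memory(b, b_copy))  # False
```

When an independent array is needed, calling `.copy()` on the NumPy side (or `.clone()` on the tensor side) avoids surprises.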

### Autograd

A PyTorch Tensor can represent a node in a computational graph. If ``x`` is a
Tensor that has ``x.requires_grad=True``, then after calling ``backward()`` on some
scalar value, ``x.grad`` is another Tensor holding the gradient of that scalar with respect to ``x``.
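
A minimal sketch of this mechanism, using a hand-checkable function (the names `s` and `s2` are illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar built from x: s = sum(x_i^2), so ds/dx_i = 2 * x_i.
s = (x ** 2).sum()
s.backward()
print(x.grad)            # tensor([2., 4., 6.])

# Gradients accumulate across backward() calls:
# d(sum(3 * x_i))/dx_i = 3 is added to the values already stored in x.grad.
s2 = (3 * x).sum()
s2.backward()
print(x.grad)            # tensor([5., 7., 9.])

# Reset the stored gradient before the next, independent computation.
x.grad.zero_()
print(x.grad)            # tensor([0., 0., 0.])
```

This accumulation is the reason training loops clear the gradients after each update.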

Example for simple regression:

In [None]:
# Simple regression example
x = [1., 2., 3., 4., 5.]           # data
y = [10., 20., 30., 40., 50.]      # target values

# Gradients will be computed w.r.t. this tensor (it has "requires_grad=True")
w = torch.tensor([1.],requires_grad=True)

# Number of passes over the full dataset (epochs)
for epoch in range(5):
    
    # Loop on data events and target values
    for x_i, y_i in zip(x, y):
        
        # compute predicted target variable
        y_pred = x_i * w
                
        # compute the squared error for this sample
        loss = (y_pred - y_i) ** 2
        
        # With PyTorch we can automatically compute the derivative of the loss 
        # w.r.t. the tensors that have requires_grad set to True (i.e. weights).
        # compute gradients
        loss.backward()
                        
        print('\t x=%.1f y=%.1f, w=%.2f, dloss/dw=%.1f' % (x_i, y_i, w.item(), w.grad.item()))
                
        # make one step towards the minimum, with learning rate 0.01
        # (torch.no_grad() keeps the update itself out of the computational graph)
        with torch.no_grad():
            w -= 0.01 * w.grad
        
        # clear the gradient after updating the weight (it would accumulate otherwise)
        w.grad.zero_()
        
    print('Loss at epoch #%d: %.6f \n' % (epoch+1, loss.item()))

print('Final: w = %.4f' % w.item())


## Exercise
Now we have the following data. How should the code above be modified? How many epochs are needed for convergence?


In [None]:
x = [1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]           # data
y = [15., 25., 35., 45., 55., 65., 75., 85., 95., 105.] # target values

# FILL CODE HERE #
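
One possible approach, given as a sketch rather than the reference solution: the targets here look like `y = w*x + b` with a non-zero offset, so the sketch assumes a bias `b` is learned alongside `w`. The learning rate and the number of epochs are hypothetical choices (a smaller step because `x` now reaches 10, and a deliberately generous epoch count), not tuned values.

```python
import torch

x = [1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]           # data
y = [15., 25., 35., 45., 55., 65., 75., 85., 95., 105.] # target values

# Assumption: model y = w*x + b, with both parameters tracked by autograd.
w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([0.], requires_grad=True)

lr = 0.005       # hypothetical choice, smaller than in the example above
n_epochs = 500   # hypothetical choice, generous rather than minimal

for epoch in range(n_epochs):
    for x_i, y_i in zip(x, y):
        # squared error for this sample
        loss = (x_i * w + b - y_i) ** 2
        loss.backward()

        # gradient descent step on both parameters, then reset the gradients
        with torch.no_grad():
            w -= lr * w.grad
            b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print('Final: w = %.3f, b = %.3f' % (w.item(), b.item()))
```

Printing the loss every few epochs, as in the example above, is a simple way to judge how many epochs are actually needed.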



NN with tensor and autograd
-------------------------------

We consider a fully-connected ReLU network with one hidden layer and no biases, trained to predict $y$ from $x$ by minimizing the squared Euclidean distance.

The model that we want to build has the following structure:
$$\hat{y}(x) = \text{relu}(x \, w_1) \, w_2,$$
where $x$ and $y$ are the input and output features (of dimension 1000 and 10, respectively), relu is the activation function, and $w_1$ and $w_2$ are weight matrices.

This implementation computes the forward pass using operations on PyTorch Tensors, and uses PyTorch autograd to compute gradients.


In [None]:
%matplotlib inline

import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# requires_grad defaults to False, so no gradients are computed
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    
    # IMPLEMENT NETWORK TRAINING USING PYTORCH
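
For reference, one way the training loop could look is sketched below. It is a self-contained sketch under assumptions, not the only valid solution: the fixed seed and the `losses` list are additions made here for reproducibility and inspection, and the loss is taken as the sum of squared errors over the batch.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only to make the sketch reproducible

dtype = torch.float
device = torch.device("cpu")
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
losses = []           # hypothetical helper list, kept to inspect the training curve
for t in range(500):
    # Forward pass: y_pred = relu(x w1) w2
    y_pred = torch.relu(x.mm(w1)).mm(w2)

    # Sum of squared errors over the batch (a scalar tensor)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Backward pass: fills w1.grad and w2.grad
    loss.backward()

    # Gradient descent step, kept out of the graph, then gradient reset
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

print('first loss: %.1f, last loss: %.6f' % (losses[0], losses[-1]))
```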
    


## Optional: Exercise
Add one more hidden layer with 100 neurons. Use the sigmoid (torch.sigmoid) and relu (torch.relu) activation functions successively.

Note: you might need to increase the number of epochs (this solution is much less effective than the previous one!)
