# Using PyTorch Tensors 

A PyTorch Tensor is conceptually the same as a numpy multidimensional array:
a generic n-dimensional array that can be used for arbitrary numeric computation.
On its own it needs no knowledge of deep learning; tracking gradients through
a computational graph is optional and is covered in section 4.

The biggest difference between a numpy array and a PyTorch Tensor is that
a PyTorch Tensor can run on either CPU or GPU. To run operations on the GPU,
move the Tensor to a CUDA device, for example with `.to(device)` or `.cuda()`.
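
As a minimal sketch of what running on the GPU looks like in practice, the snippet below picks a CUDA device when one is available and falls back to the CPU otherwise, so it should run on any machine. The names `device`, `t`, `t_dev` and `total` are illustrative and not part of the original example.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.ones(2, 3)      # created on the CPU by default
t_dev = t.to(device)      # copy of t on the chosen device (no copy if already there)

# Operations run on the device where their operands live
total = (t_dev * 2).sum()

print(t_dev.device, total.item())
```

Tensors that take part in the same operation must live on the same device, which is why the later examples pass `device=device` when creating tensors.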

(This example is adapted from: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html )

<img src="images/tensor.jpeg" width="800">

### Installation: https://pytorch.org/get-started/locally/

On the university Linux server, run `source activate mlearning` to enable the PyTorch environment.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

## What you should do
- Read and run examples 1-4 below
- Do exercises 5 and 6

## 1. Basic tensor matrix operations

In [None]:
# Simple matrix
m1 = torch.ones(3, 4)
print('Matrix m1: \n', m1)
print(m1.shape,"\n")

# Another (random) matrix
m2 = torch.rand(3, 4) # fill 3x4 matrix with uniform random numbers in the [0, 1) interval
print('Matrix m2: \n', m2)
print(m2.shape,"\n")

# Transpose of a matrix
print('Matrix m2^T: \n', m2.t())
print(m2.t().shape,"\n")

# Matrix operations
m3 = m1*m2      # Not a matrix multiplication! Each element of m1 is multiplied by the corresponding element of m2
print('Matrix m3: \n', m3)
print(m3.shape,"\n")

# Matrix multiplication using torch.mm
m4 = m1.mm(m2.t())
print('Matrix m4: \n', m4)
print(m4.shape,"\n")

# Matrix multiplication using torch.matmul
m5 = m1.matmul(m2.t())
print('Matrix m5: \n', m5)
print(m5.shape,"\n")

# Matrix-vector multiplication: torch.matmul handles it; torch.mv is a dedicated alternative
vec = torch.tensor([1.0,2.0,3.0])
print(torch.matmul(m5, vec))
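
The last comment mentions `torch.mv`. As a small sketch on a fixed matrix (the values of `mat` and `vec` are made up for illustration), the following compares `torch.mv`, `torch.matmul` and the `@` operator, which all compute the same matrix-vector product here.

```python
import torch

mat = torch.tensor([[1., 2., 3.],
                    [4., 5., 6.]])   # shape (2, 3)
vec = torch.tensor([1., 0., -1.])    # shape (3,)

out_mv = torch.mv(mat, vec)          # dedicated matrix-vector product
out_matmul = torch.matmul(mat, vec)  # general product, dispatches on the shapes
out_op = mat @ vec                   # operator form of torch.matmul

print(out_mv)        # tensor([-2., -2.])
print(out_mv.shape)  # torch.Size([2])
```

`torch.mv` only accepts a 2-D matrix and a 1-D vector, so it can catch shape mistakes that `torch.matmul` would silently broadcast.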

## 2. More operations (power, sum, clamp, ...)

(See even more basic operations here: https://jhui.github.io/2018/02/09/PyTorch-Basic-operations/)

In [None]:
mat = torch.rand(3, 4)
print(mat)

# Takes the power of each element in input
print(mat.pow(3))

# Returns the sum of all elements in the input tensor
print(mat.sum())

# Clamp all elements in input into the range [ min, max ] and return a resulting tensor
print(mat.clamp(0.3,0.7))
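
Because `mat` is random, the printed values change on every run. As a sketch that is easier to check by hand, the same operations can be applied to a small fixed tensor (the values in `t` are chosen only for illustration). It also shows the `dim` argument of `sum` and a one-sided `clamp`.

```python
import torch

t = torch.tensor([[0.1, 0.5],
                  [0.9, 0.2]])

cubed = t.pow(3)              # element-wise power, same as t ** 3
total = t.sum()               # sum over all elements -> 0-dim tensor
col_sums = t.sum(dim=0)       # sum over rows, one value per column
clamped = t.clamp(0.3, 0.7)   # values below 0.3 become 0.3, above 0.7 become 0.7
floor_only = t.clamp(min=0.3) # only a lower bound

print(clamped)
print(col_sums)
```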

## 3. Conversion to numpy array

In [None]:
# conversion PyTorch -> numpy
a = torch.randn(5)
b = a.numpy()

# conversion numpy -> PyTorch
c = torch.from_numpy(b)

print(a)
print(b)
print(c)
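
One detail worth knowing about these conversions: for CPU tensors, `numpy()` and `torch.from_numpy()` do not copy the data, so the tensor and the array share the same memory. The sketch below illustrates this with an in-place update; the variable `d` (an independent copy made with `clone()`) is added here for contrast.

```python
import numpy as np
import torch

a = torch.zeros(3)
b = a.numpy()            # numpy view on the same memory as a
c = torch.from_numpy(b)  # tensor view on the same memory as b

d = a.clone().numpy()    # clone() first to get an independent copy

a.add_(1)                # in-place update of a

print(b)  # [1. 1. 1.]  (changed together with a)
print(c)  # tensor([1., 1., 1.])
print(d)  # [0. 0. 0.]  (unaffected copy)
```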

## 4. Example using automatic differentiation (Autograd)

A PyTorch Tensor can represent a node in a computational graph. If ``x`` is a
Tensor that has ``x.requires_grad=True``, then after calling ``backward()`` on a
scalar value computed from ``x``, ``x.grad`` is another Tensor holding the
gradient of that scalar value with respect to ``x``.
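
As a minimal sketch of this mechanism, consider the scalar function $y = x^2 + 3x$ (chosen here only for illustration). Its derivative is $2x + 3$, so at $x = 2$ the gradient should be 7.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # autograd will track operations on x

y = x ** 2 + 3 * x   # scalar value built from x
y.backward()         # compute dy/dx and store it in x.grad

print(x.grad)        # tensor(7.)
```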

Example for simple regression:

In [None]:
# Simple regression example
x = [1., 2., 3., 4., 5.]           # data
y = [10., 20., 30., 40., 50.]      # target values

# Gradients will be calculated w.r.t. this tensor (it has requires_grad=True)
w = torch.tensor([1.],requires_grad=True)

# Number of passes (epochs) over all samples
for epoch in range(5):
    
    # Loop on data events and target values
    for x_i, y_i in zip(x, y):
        
        # compute predicted target variable
        y_pred = x_i * w
                
        # compute the squared error for this sample
        loss = (y_pred - y_i) ** 2
        
        # With PyTorch we can automatically compute the derivative of the loss 
        # w.r.t. the tensors that have requires_grad set to True (i.e. weights).
        # compute gradients
        loss.backward()
                        
        print('\t x=%.1f y=%.1f, w=%.2f, dloss/dw=%.1f' % (x_i, y_i, w.item(), w.grad.item()))
                
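        # make one step towards the minimum, with learning rate 0.01
        # (no_grad keeps the update itself out of the computational graph)
        with torch.no_grad():
            w -= 0.01 * w.grad
        
        # clear gradients after updating weights (backward() accumulates into w.grad)
        w.grad.zero_()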
        
    # loss.item() is the loss of the last sample processed in this epoch
    print('Loss at epoch #%d: %.6f \n' % (epoch+1, loss.item()))

print('Final: w = %.4f' % w.item())
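
The loop above clears the gradient after every update. The reason is that `backward()` adds to the existing `.grad` instead of overwriting it. The sketch below shows this on a toy expression (`2 * w`, chosen only for illustration, whose gradient with respect to `w` is always 2).

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(2.0 * w).sum().backward()
first = w.grad.clone()    # gradient after one backward pass

(2.0 * w).sum().backward()
second = w.grad.clone()   # the second pass was added to the first

w.grad.zero_()            # reset before the next step
after_reset = w.grad.clone()

print(first, second, after_reset)  # tensor([2.]) tensor([4.]) tensor([0.])
```

Without the reset, each update step would use the sum of all gradients computed so far rather than the gradient for the current sample.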


## 5. Exercise: modify the previous example for this new data
Now we have the following data. How should the code above be modified? How many epochs are needed for convergence?


In [None]:
# New data: how should the code above be modified?
x = [1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]           # data
y = [15., 25., 35., 45., 55., 65., 75., 85., 95., 105.] # target values




## 6. Exercise: Simple Neural Network implementation

We consider a fully-connected ReLU network with one hidden layer of 100 neurons and no biases, trained to predict y from x by minimizing squared Euclidean distance.

The model that we want to build has the following structure:
$$\hat{y}(x) = \text{relu}(x.w_1).w_2,$$
where $x$ and $y$ are the input and target features (of dimension 1000 and 10, respectively), $w_1$ and $w_2$ are weight matrices of shape $1000 \times 100$ and $100 \times 10$, and relu is the activation function. N=64 examples are used for training.

This implementation computes the forward pass using operations on PyTorch Tensors, and uses PyTorch autograd to compute gradients.
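
As a hint for the building blocks, and not a solution to the exercise, the sketch below shows two things on small made-up tensors: relu can be written with `clamp(min=0)` from section 2 (or with the built-in `torch.relu`), and the shapes of the two matrix products line up as the formula requires. The names and the tiny dimensions (`n`, `d_in`, `h`, `d_out`) are illustrative only.

```python
import torch

# relu(z) = max(z, 0), applied element-wise
z = torch.tensor([[-1.5, 0.0, 2.0]])
relu_clamp = z.clamp(min=0)
relu_builtin = torch.relu(z)
print(relu_clamp)  # tensor([[0., 0., 2.]])

# Shape check with tiny illustrative dimensions
n, d_in, h, d_out = 4, 5, 3, 2
x_small = torch.randn(n, d_in)
w1_small = torch.randn(d_in, h)
w2_small = torch.randn(h, d_out)

hidden = x_small.mm(w1_small)   # (n, d_in) x (d_in, h)  -> (n, h)
out = hidden.mm(w2_small)       # (n, h)    x (h, d_out) -> (n, d_out)
print(hidden.shape, out.shape)
```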

### Initialization

In [None]:
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU (if you have a GPU !)

# Data dimensions
N = 64      # N: input batch size
D_in = 1000 # D_in: input dimension
H = 100     # H: hidden layer dimension
D_out = 10  # D_out: output dimension

# NN settings
learning_rate = 1e-6
N_epochs = 500

# Create random Tensors to hold input and outputs.
# Default setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

### Train the network using PyTorch

Implement the following:
- The model: $$\hat{y}(x) = \text{relu}(x.w_1).w_2$$
- The loss function: $$\sum_{i=1}^{N}\lVert\hat{y}(x_i) - y_i\rVert^2, \quad N=64$$
- The gradient computation and the weight update
- Validation: at each epoch, test the model on an independently created sample (see below)
- Store the loss values of the training and validation samples for each epoch
- Run the training and validation steps for N_epochs
- Finally, plot the evolution of the loss as a function of the number of epochs.

Draw a conclusion about how well the model generalizes.

In [None]:
# Independent validation sample, on which the model is tested at each epoch
x_val = torch.randn(N, D_in, device=device, dtype=dtype)
y_val = torch.randn(N, D_out, device=device, dtype=dtype)

### Plot the model performance