# PyTorch Tutorial 

<center><img src="img/pytorch-logo.jpeg" width="400" /></center>

* open-source machine learning library written in Python, C++ and CUDA

* has NumPy-like interfaces

* provides two core features: operations with tensors and automatic differentiation
    
* initialy developed at Facebook

### Installation: https://pytorch.org/get-started/locally/

On university linux server just run 'source activate tensor' to enable pytorch environment

### What are tensors?

A PyTorch Tensor is basically the same as a numpy multidimensional array: it does not know
anything about deep learning or computational graphs or gradients, and is just
a generic n-dimensional array to be used for arbitrary numeric computation.

The biggest difference between a numpy array and a PyTorch Tensor is that
a PyTorch Tensor can run on either CPU or GPU. To run operations on the GPU,
just cast the Tensor to a cuda datatype.

<img src="img/tensor.jpeg" width="800">

### Basic tensor matrix operations

In [10]:
import torch  #importer la library

In [2]:
# Simple matrix
m1 = torch.ones(3, 4)
print('Matrix m1: \n', m1)
print(m1.shape,"\n")

Matrix m1: 
 tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
torch.Size([3, 4]) 



In [3]:
# Another (random) matrix
m2 = torch.rand(3, 4) # fill 3x4 matrix with uniform random numbers in [0,1] interval
print('Matrix m2: \n', m2)
print(m2.shape,"\n")

Matrix m2: 
 tensor([[0.9753, 0.2353, 0.8413, 0.3650],
        [0.2697, 0.1559, 0.3020, 0.6852],
        [0.0883, 0.0926, 0.6021, 0.1590]])
torch.Size([3, 4]) 



In [4]:
# Transpose of a matrix
print('Matrix m2^T: \n', m2.t())
print(m2.t().shape,"\n")

Matrix m2^T: 
 tensor([[0.9753, 0.2697, 0.0883],
        [0.2353, 0.1559, 0.0926],
        [0.8413, 0.3020, 0.6021],
        [0.3650, 0.6852, 0.1590]])
torch.Size([4, 3]) 



In [5]:
# Matrix operations
m3 = m1*m2      # Not a matrix multiplication ! Here each value in matrix m1 are multiplied by value m2
print('Matrix m3: \n', m3)
print(m3.shape,"\n")

Matrix m3: 
 tensor([[0.9753, 0.2353, 0.8413, 0.3650],
        [0.2697, 0.1559, 0.3020, 0.6852],
        [0.0883, 0.0926, 0.6021, 0.1590]])
torch.Size([3, 4]) 



In [6]:
# Matrix multiplication using torch.mm
m4 = m1.mm(m2.t())
print('Matrix m4: \n', m4)
print(m4.shape,"\n")

Matrix m4: 
 tensor([[2.4169, 1.4128, 0.9419],
        [2.4169, 1.4128, 0.9419],
        [2.4169, 1.4128, 0.9419]])
torch.Size([3, 3]) 



In [7]:
# Matrix multiplication using torch.matmul
m5 = m1.matmul(m2.t())
print('Matrix m5: \n', m5)
print(m5.shape,"\n")

Matrix m5: 
 tensor([[2.4169, 1.4128, 0.9419],
        [2.4169, 1.4128, 0.9419],
        [2.4169, 1.4128, 0.9419]])
torch.Size([3, 3]) 



In [11]:
# For matrix-vector multiplication there is also the possibility to use torch.mv
vec = torch.tensor([1.0,2.0,3.0])
print(torch.matmul(m5, vec)) #mv ou matmul pour le vecteur

tensor([8.0683, 8.0683, 8.0683])


### More operations (power, sum, clamp, ...)

(See even more basic operations here: https://jhui.github.io/2018/02/09/PyTorch-Basic-operations/)

In [12]:
mat = torch.rand(3, 4)
print(mat)

tensor([[0.8684, 0.1937, 0.2964, 0.9543],
        [0.1710, 0.1893, 0.2003, 0.3167],
        [0.2126, 0.3147, 0.1839, 0.1187]])


In [13]:
# Takes the power of each element in input
print(mat.pow(3))

tensor([[0.6549, 0.0073, 0.0260, 0.8691],
        [0.0050, 0.0068, 0.0080, 0.0318],
        [0.0096, 0.0312, 0.0062, 0.0017]])


In [14]:
# Returns the sum of all elements in the input tensor
print(mat.sum())

tensor(4.0199)


In [15]:
# Clamp all elements in input into the range [ min, max ] and return a resulting tensor
print(mat.clamp(0.3,0.7))

tensor([[0.7000, 0.3000, 0.3000, 0.7000],
        [0.3000, 0.3000, 0.3000, 0.3167],
        [0.3000, 0.3147, 0.3000, 0.3000]])


### Convertion to numpy array

In [17]:
import numpy as np

# convertion PyTorch -> numpy
a = torch.randn(5)
b = a.numpy()

# convertion numpy -> PyTorch
c = torch.from_numpy(b)

print('Convertir PyTorch -> Numpy A:',a)
print('Convertir PyTorch ->Numpy B:',b)
print('Convertir Numpy -> PyTorch C:',c)

Convertir PyTorch -> Numpy A: tensor([-0.1334, -1.1054, -1.2854, -0.9162,  0.9704])
Convertir PyTorch ->Numpy B: [-0.13339636 -1.1054373  -1.285435   -0.9162246   0.97035503]
Convertir Numpy -> PyTorch C: tensor([-0.1334, -1.1054, -1.2854, -0.9162,  0.9704])


### Autograd

A PyTorch Tensor represents a node in a computational graph. If ``x`` is a
Tensor that has ``x.requires_grad=True`` then ``x.grad`` is another Tensor
holding the gradient of ``x`` with respect to some scalar value.

Example for simple regression:

In [18]:
# Simple regression example
x = [1., 2., 3., 4., 5.]           # data
y = [10., 20., 30., 40., 50.]      # target values

# Gradients will be calculated w.r.t this tensors (has "requires_grad=True")
w = torch.tensor([1.],requires_grad=True)

# Number of loops on all sample
for epoch in range(5):
    
    # Loop on data events and target values
    for x_i, y_i in zip(x, y):
        
        # compute predicted target variable
        y_pred = x_i * w #tensor
                
        # compute Mean Squared Error (MSE)
        loss = (y_pred - y_i) ** 2 #tensor
        
        # With PyTorch we can automatically compute the derivative of the loss 
        # w.r.t. the tensors that have requires_grad set to True (i.e. weights).
        # compute gradients
        loss.backward()
                        
        print('\t x=%.1f y=%.1f, w=%.2f, dloss/dw=%.1f' % (x_i, y_i, w.data, w.grad.data))
                
        # make one step towards the local minima, with learning rate 0.01
        w.data -= 0.01 * w.grad.data
        
        # clear gradients after updating weights
        w.grad.data.zero_()
        
    print('Loss at epoch #%d: %.6f \n' % (epoch+1, loss.data[0]))

print('Final: w = %.4f' % (w.data))


	 x=1.0 y=10.0, w=1.00, dloss/dw=-18.0
	 x=2.0 y=20.0, w=1.18, dloss/dw=-70.6
	 x=3.0 y=30.0, w=1.89, dloss/dw=-146.1
	 x=4.0 y=40.0, w=3.35, dloss/dw=-212.9
	 x=5.0 y=50.0, w=5.48, dloss/dw=-226.2
Loss at epoch #1: 511.797760 

	 x=1.0 y=10.0, w=7.74, dloss/dw=-4.5
	 x=2.0 y=20.0, w=7.78, dloss/dw=-17.7
	 x=3.0 y=30.0, w=7.96, dloss/dw=-36.7
	 x=4.0 y=40.0, w=8.33, dloss/dw=-53.5
	 x=5.0 y=50.0, w=8.86, dloss/dw=-56.9
Loss at epoch #2: 32.337894 

	 x=1.0 y=10.0, w=9.43, dloss/dw=-1.1
	 x=2.0 y=20.0, w=9.44, dloss/dw=-4.5
	 x=3.0 y=30.0, w=9.49, dloss/dw=-9.2
	 x=4.0 y=40.0, w=9.58, dloss/dw=-13.5
	 x=5.0 y=50.0, w=9.71, dloss/dw=-14.3
Loss at epoch #3: 2.043265 

	 x=1.0 y=10.0, w=9.86, dloss/dw=-0.3
	 x=2.0 y=20.0, w=9.86, dloss/dw=-1.1
	 x=3.0 y=30.0, w=9.87, dloss/dw=-2.3
	 x=4.0 y=40.0, w=9.89, dloss/dw=-3.4
	 x=5.0 y=50.0, w=9.93, dloss/dw=-3.6
Loss at epoch #4: 0.129104 

	 x=1.0 y=10.0, w=9.96, dloss/dw=-0.1
	 x=2.0 y=20.0, w=9.96, dloss/dw=-0.3
	 x=3.0 y=30.0, w=9.97, dloss/d

## Exercice
Now we have the following data. How should be modified the above code ? How many epochs are needed for convergence ?


In [20]:
x = [1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]           # data
y = [15., 25., 35., 45., 55., 65., 75., 85., 95., 105.] # target values X*10+5

# Gradients will be calculated w.r.t this tensors (has "requires_grad=True")
w = torch.tensor([1.],requires_grad=True)
b = torch.tensor([0.],requires_grad=True)

# Number of loops on all sample
for epoch in range(100):
    
    # Loop on data events and target values
    for x_i, y_i in zip(x, y):
        
        # compute predicted target variable
        y_pred = x_i * w+b #tensor
                
        # compute Mean Squared Error (MSE)
        loss = (y_pred - y_i) ** 2 #tensor
        
        # With PyTorch we can automatically compute the derivative of the loss 
        # w.r.t. the tensors that have requires_grad set to True (i.e. weights).
        # compute gradients
        loss.backward()
                        
        print('\t x=%.1f y=%.1f, w=%.2f, dloss/dw=%.1f' % (x_i, y_i, w.data, w.grad.data))
                
        # make one step towards the local minima, with learning rate 0.01
        w.data -= 0.01 * w.grad.data
        b.data -= 0.01 * w.grad.data #biais
        
        # clear gradients after updating weights
        w.grad.data.zero_()
        b.grad.data.zero_()
        
    print('Loss at epoch #%d: %.6f \n' % (epoch+1, loss.data[0]))

print('Final: w = %.4f' % (w.data))

	 x=1.0 y=15.0, w=1.00, dloss/dw=-28.0
	 x=2.0 y=25.0, w=1.28, dloss/dw=-88.6
	 x=3.0 y=35.0, w=2.17, dloss/dw=-164.0
	 x=4.0 y=45.0, w=3.81, dloss/dw=-215.7
	 x=5.0 y=55.0, w=5.96, dloss/dw=-202.2
	 x=6.0 y=65.0, w=7.99, dloss/dw=-121.2
	 x=7.0 y=75.0, w=9.20, dloss/dw=-33.9
	 x=8.0 y=85.0, w=9.54, dloss/dw=-2.8
	 x=9.0 y=95.0, w=9.56, dloss/dw=-6.5
	 x=10.0 y=105.0, w=9.63, dloss/dw=-1.7
Loss at epoch #1: 0.006979 

	 x=1.0 y=15.0, w=9.65, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.58, dloss/dw=11.0
	 x=3.0 y=35.0, w=9.47, dloss/dw=11.3
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.3
	 x=5.0 y=55.0, w=9.29, dloss/dw=-2.3
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.3
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #2: 0.033335 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=

Loss at epoch #20: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #21: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #22: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.

	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #40: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #41: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #42: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=

	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #60: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #61: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #62: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=

	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #80: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #81: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw=6.6
	 x=2.0 y=25.0, w=9.59, dloss/dw=11.1
	 x=3.0 y=35.0, w=9.48, dloss/dw=11.5
	 x=4.0 y=45.0, w=9.36, dloss/dw=6.6
	 x=5.0 y=55.0, w=9.30, dloss/dw=-2.1
	 x=6.0 y=65.0, w=9.32, dloss/dw=-9.2
	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #82: 0.033382 

	 x=1.0 y=15.0, w=9.66, dloss/dw

	 x=7.0 y=75.0, w=9.41, dloss/dw=-10.0
	 x=8.0 y=85.0, w=9.51, dloss/dw=-6.5
	 x=9.0 y=95.0, w=9.58, dloss/dw=-4.4
	 x=10.0 y=105.0, w=9.62, dloss/dw=-3.7
Loss at epoch #100: 0.033382 

Final: w = 9.6563



NN with tensor and autograd
-------------------------------

We consider a fully-connected ReLU network with one hidden layer and no biases, trained to predict y from x using Euclidean error.

The model that we want to build has the following structure:
$$\hat{y}(x) = \text{relu}(x.w_1).w_2,$$
where $x$ and $y$ are the input and output features (of dimension 1000 and 10, respectively). Here the relu activation function is used and $w_1$ and $w_2$ are weight matrices.

The network consists of a fully-connected ReLU network with one hidden layer and no biases, trained to predict y from x by minimizing squared Euclidean distance.

This implementation computes the forward pass using operations on PyTorch Tensors, and uses PyTorch autograd to compute gradients.


In [28]:
%matplotlib inline

import torch

dtype = torch.float
device = torch.device("cpu")
#device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N = 64
D_in = 1000
H = 100
D_out = 10

# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # IMPLEMENT NETWORK TRAINING USING PYTORCH
    y_pred=x.mm(w1).clamp(min=0).mm(w2) #fonction ReLu activate function
    loss=((y_pred-y)**2).sum()
    
    print(t,loss.data) #Loss function
    
    loss.backward()
    
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data
    
    # clear gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()

0 tensor(34554704.)
1 tensor(31203180.)
2 tensor(32000518.)
3 tensor(30943786.)
4 tensor(25417720.)
5 tensor(16907294.)
6 tensor(9551311.)
7 tensor(5033432.5000)
8 tensor(2769287.7500)
9 tensor(1697355.6250)
10 tensor(1170294.5000)
11 tensor(883521.5625)
12 tensor(706567.7500)
13 tensor(584532.1875)
14 tensor(493372.0938)
15 tensor(421808.0625)
16 tensor(363902.3125)
17 tensor(315946.1875)
18 tensor(275782.1562)
19 tensor(241822.4844)
20 tensor(212982.6875)
21 tensor(188334.4375)
22 tensor(167142.4375)
23 tensor(148788.8438)
24 tensor(132853.5312)
25 tensor(118939.2500)
26 tensor(106746.0156)
27 tensor(96023.2969)
28 tensor(86564.0312)
29 tensor(78197.5078)
30 tensor(70777.4062)
31 tensor(64176.8750)
32 tensor(58292.4102)
33 tensor(53035.7617)
34 tensor(48330.9805)
35 tensor(44107.6719)
36 tensor(40309.8594)
37 tensor(36888.5352)
38 tensor(33800.9062)
39 tensor(31010.8535)
40 tensor(28485.0039)
41 tensor(26190.1406)
42 tensor(24107.6699)
43 tensor(22212.2676)
44 tensor(20486.1602)
45 t

411 tensor(0.0006)
412 tensor(0.0006)
413 tensor(0.0006)
414 tensor(0.0005)
415 tensor(0.0005)
416 tensor(0.0005)
417 tensor(0.0005)
418 tensor(0.0005)
419 tensor(0.0005)
420 tensor(0.0005)
421 tensor(0.0005)
422 tensor(0.0004)
423 tensor(0.0004)
424 tensor(0.0004)
425 tensor(0.0004)
426 tensor(0.0004)
427 tensor(0.0004)
428 tensor(0.0004)
429 tensor(0.0004)
430 tensor(0.0004)
431 tensor(0.0003)
432 tensor(0.0003)
433 tensor(0.0003)
434 tensor(0.0003)
435 tensor(0.0003)
436 tensor(0.0003)
437 tensor(0.0003)
438 tensor(0.0003)
439 tensor(0.0003)
440 tensor(0.0003)
441 tensor(0.0003)
442 tensor(0.0003)
443 tensor(0.0003)
444 tensor(0.0003)
445 tensor(0.0002)
446 tensor(0.0002)
447 tensor(0.0002)
448 tensor(0.0002)
449 tensor(0.0002)
450 tensor(0.0002)
451 tensor(0.0002)
452 tensor(0.0002)
453 tensor(0.0002)
454 tensor(0.0002)
455 tensor(0.0002)
456 tensor(0.0002)
457 tensor(0.0002)
458 tensor(0.0002)
459 tensor(0.0002)
460 tensor(0.0002)
461 tensor(0.0002)
462 tensor(0.0002)
463 tensor(0

## Optional: Exercice
Add one more hidden layer with 100 neurons. Use successively sigmoid (torch.sigmoid) and relu (torch.relu) activation functions.

Note: you might need to increase the numbers of epochs (this solution is much less effective than the previous one !)
