# Python library

First of all, we create a new folder in our project and name it `mydl` (for "My Deep Learning"). Inside this folder, we create a new file named `__init__.py`, which can be empty. Its presence marks the folder as a regular Python package that can be imported.

### The `Sequential` class

Inside the `mydl` folder, we create a new file named `architecture.py`. This file will contain the architecture of our neural network. We start by creating a class named `Sequential` (the only architecture we treat in this course). A `Sequential` object is a list of layers that are executed in sequence: the output of each layer is the input of the next one.

At the moment we just define the initializer. It receives a list `layers` of layers and stores it as an attribute of the object. 

In [1]:
class Sequential:
  
  def __init__(self, layers):
    self.layers = layers

Of course this has no meaning if we don't have `Layer` objects. We will create them in the next section. 

### The `Layer` class

We create a new file named `layers.py` inside the `mydl` folder. This file will contain the definition of the `Layer` class and its subclasses.

The `Layer` class is an abstract class that defines the interface for all layers. For now, we are only interested in initializing the parameters of the layer, which we store in a dictionary.

In [2]:
class Layer:
  
  def __init__(self):
    self.parameters = {}

We create a `Linear` class that inherits from `Layer`. We recall that a linear layer is structured in the following way:
- it receives a tensor $x \in \mathbb{R}^{N \times M_{in}}$ as input. Here, $N$ is the number of samples and $M_{in}$ is the number of input features.
- it is defined by a weight matrix $W \in \mathbb{R}^{M_{in} \times M_{out}}$ and a bias vector $b \in \mathbb{R}^{1 \times M_{out}}$. Here, $M_{out}$ is the number of output features.
- the output of the layer is given by $y = xW + b \in \mathbb{R}^{N \times M_{out}}$, where $b$ is added to every row of $xW$.

Hence, to initialize a `Linear` layer, we need to specify the number of input features (`fan_in`, using the jargon of logic gates) and the number of output features (`fan_out`). Given these two numbers, we can initialize the weight matrix $W$ with random values and the bias vector $b$ with zeros.
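The shapes listed above can be checked with plain tensors before writing the class. This is only a sketch: the sizes `N`, `M_in` and `M_out` are arbitrary values chosen for illustration.

```python
import torch

N, M_in, M_out = 4, 3, 2        # illustrative sizes, not taken from the course

x = torch.randn(N, M_in)        # N samples with M_in features each
W = torch.randn(M_in, M_out)    # weight matrix
b = torch.zeros(1, M_out)       # bias stored as a row vector

y = x @ W + b                   # b is broadcast over the N rows of x @ W
print(y.shape)                  # torch.Size([4, 2])
```

Storing the bias with shape `(1, M_out)` lets broadcasting add it to every sample without an explicit loop.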

In [3]:
import torch

In [4]:
class Linear(Layer):
  def __init__(self, fan_in, fan_out):
    super().__init__()
    self.parameters['W'] = torch.randn((fan_in,fan_out), dtype=torch.float32, requires_grad=False) 
    self.parameters['b'] = torch.zeros((1,fan_out), dtype=torch.float32, requires_grad=False)

*Comment*: the `torch` library can compute the gradients of the loss function with respect to the parameters of the model. This is done by the `autograd` module. The `requires_grad` attribute of a tensor tells PyTorch whether to track the operations on that tensor, so that the gradient of the loss with respect to it can be computed. For tensors created with functions such as `torch.randn` and `torch.zeros`, this attribute is `False` by default. Here, we set it to `False` explicitly to stress that we do not rely on `autograd`, because we want to write the backpropagation algorithm from scratch.

We can check if the code is working. [See the code in this other notebook.](08-testing_the_library.ipynb#importing_the_library)

### The `forward` method

The `forward` method is used to do a forward pass in a single layer of the network and also in the whole network. Let's implement it in the `Layer` class.

In [5]:
class Layer:
  
  def __init__(self):
    self.parameters = {}
    
  def forward(self, x):
    raise NotImplementedError # Raising an error if the forward method is not implemented in the subclass

Let's implement the `forward` method in the `Linear` class. 

In [6]:
class Linear(Layer):
  
  def __init__(self, fan_in, fan_out):
    super().__init__()
    self.parameters['W'] = torch.randn((fan_in,fan_out), dtype=torch.float32, requires_grad=False) 
    self.parameters['b'] = torch.zeros((1,fan_out), dtype=torch.float32, requires_grad=False)
    
  def forward(self, x):
    return x @ self.parameters['W'] + self.parameters['b']

We can also do this for other layers. Let us define the `Sigmoid` activation layer and its forward method.

In [7]:
class Sigmoid(Layer):
  
  def __init__(self):
    super().__init__() 
    
  def forward(self, x):
    return 1/(1+torch.exp(-x))

We are ready to implement the `forward` method in the `Sequential` class. This method will iterate over the layers of the network and apply the `forward` method of each layer. It will return the output of the network.

In [8]:
class Sequential:
  
  def __init__(self, layers):
    self.layers = layers
    
  def forward(self, x):
    for layer in self.layers:
      x = layer.forward(x)
    return x
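A possible smoke test of the forward pass is sketched below. To keep the snippet runnable on its own, the classes are repeated in a simplified form (no `Layer` base class, parameters stored as plain attributes), and the sizes (5 samples, 3 input features, 1 output) are arbitrary.

```python
import torch

# Simplified stand-ins for the classes defined above
class Linear:
  def __init__(self, fan_in, fan_out):
    self.W = torch.randn(fan_in, fan_out)
    self.b = torch.zeros(1, fan_out)

  def forward(self, x):
    return x @ self.W + self.b

class Sigmoid:
  def forward(self, x):
    return 1/(1+torch.exp(-x))

class Sequential:
  def __init__(self, layers):
    self.layers = layers

  def forward(self, x):
    for layer in self.layers:
      x = layer.forward(x)  # the output of one layer feeds the next
    return x

net = Sequential([Linear(3, 4), Sigmoid(), Linear(4, 1), Sigmoid()])
x = torch.randn(5, 3)
y = net.forward(x)
print(y.shape)  # torch.Size([5, 1])
```

Since the last layer is a sigmoid, every entry of `y` lies between 0 and 1.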

We can check if the code is working. [See the code in this other notebook.](08-testing_the_library.ipynb#testing_the_forward_method)

### The `Loss` class 

We create a new file named `losses.py` inside the `mydl` folder. This file will contain the definition of the `Loss` class and its subclasses.

In [9]:
class Loss:
  
  def __init__(self):
    pass # No need to initialize anything
  
  def __call__(self, *args, **kwds):
    raise NotImplementedError

We start with the `MSE` loss, i.e., Mean Squared Error. It is the error used, for example, in linear regression.

In [10]:
class MSE(Loss):
  def __init__(self):
    super().__init__()
  
  def __call__(self, y_pred, y_true):
    return torch.mean((y_pred - y_true)**2)
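As a small worked example of the formula used in `__call__`, consider three predictions of which only the last one is wrong. The numbers are made up for illustration.

```python
import torch

y_pred = torch.tensor([[1.0], [2.0], [3.0]])
y_true = torch.tensor([[1.0], [2.0], [5.0]])

# The squared errors are 0, 0 and 4, so their mean is 4/3
loss_value = torch.mean((y_pred - y_true)**2)
print(loss_value.item())
```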

We can check if the code is working. [See the code in this other notebook.](08-testing_the_library.ipynb#loss)

### The backpropagation algorithm

See the [notes](../notes/08%20-%20Backpropagation.pdf) for the explanation of the backpropagation algorithm.

We start by implementing the backward pass in the `MSE` loss. See the [notes](../notes/09%20-%20Grads%20-%20MSE.pdf) for the computation of the gradient of the mean squared error with respect to the output of the network.

In [11]:
class MSE(Loss):
  
  def __init__(self):
    super().__init__()
  
  def __call__(self, y_pred, y_true):
    return torch.mean((y_pred - y_true)**2)
  
  def backward(self, y_pred, y_true):
    n_entries = y_pred.numel() # torch.mean averages over all entries of the tensor
    return 2*(y_pred - y_true).t()/n_entries # this is dL_dy_pred
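One way to gain confidence in a hand-written gradient is to compare it with the one computed by `autograd`. The sketch below does this for the mean squared error with a single output feature, so that the number of samples equals the number of entries of the prediction. Enabling `requires_grad` here is only for the comparison and is not part of the library.

```python
import torch

torch.manual_seed(0)
y_pred = torch.randn(4, 1, requires_grad=True)  # autograd enabled only for this check
y_true = torch.randn(4, 1)

# Hand-written gradient, transposed as in the backward method
n_samples = y_pred.shape[0]
manual = 2*(y_pred.detach() - y_true).t()/n_samples  # shape (1, 4)

# Reference gradient computed by autograd
loss_value = torch.mean((y_pred - y_true)**2)
loss_value.backward()
reference = y_pred.grad.t()                          # shape (1, 4)

print(torch.allclose(manual, reference))  # True
```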

We can check if the code is working. [See the code in this other notebook.](08-testing_the_library.ipynb#backward_pass_loss)

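For layers, we need to implement the `backward` method. It receives the differential `dL_dy` of the loss with respect to the output of the layer, stores the gradients of the loss with respect to the parameters of the layer in the dictionary `gradL_d`, and returns the differential of the loss with respect to the input of the layer. In the code, differentials are stored transposed with respect to the corresponding tensors: if the output `y` has shape $N \times M_{out}$, then `dL_dy` has shape $M_{out} \times N$.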

In [12]:
class Layer:

  def __init__(self):
    self.parameters = {}
    self.gradL_d = {} # gradients of the loss with respect to the parameters, with the same keys as self.parameters
  
  def forward(self, x):
    """
    Forward pass through the layer.
    """
    raise NotImplementedError  

  def backward(self, dL_dy):
    raise NotImplementedError  

We want to implement the backward pass in the `Linear` layer. See the [notes](../notes/10%20-%20Grads%20|%20Linear%20layer.pdf) for the computation of the gradients of the loss with respect to the weights, the bias, and the inputs of the linear layer.

In [13]:
class Linear(Layer):

  def __init__(self, fan_in, fan_out):
    super().__init__()
    self.parameters['W'] = torch.randn((fan_in,fan_out), dtype=torch.float32, requires_grad=False) 
    self.parameters['b'] = torch.zeros((1,fan_out), dtype=torch.float32, requires_grad=False)
  
  def forward(self, x):
    self.x = x # Storing the input tensor for the backward pass
    self.y = x @ self.parameters['W'] + self.parameters['b'] # Storing the output, read by Sequential.backward when this is the last layer
    return self.y
  
  def backward(self, dL_dy):
    self.gradL_d['W'] = (dL_dy @ self.x).t()
    self.gradL_d['b'] = (dL_dy @ torch.ones(dL_dy.shape[1],1)).t()
    return self.parameters['W'] @ dL_dy # this is dL_dx
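The three formulas in `backward` can be compared with `autograd` on random data. This is a sketch under stated assumptions: the sizes are arbitrary, `dL_dy` is a random upstream differential in the transposed layout $M_{out} \times N$, and the scalar `(y * dL_dy.t()).sum()` is just a device whose gradient with respect to `y` equals `dL_dy.t()`.

```python
import torch

torch.manual_seed(0)
N, M_in, M_out = 5, 3, 2
x = torch.randn(N, M_in, requires_grad=True)   # autograd enabled only for this check
W = torch.randn(M_in, M_out, requires_grad=True)
b = torch.zeros(1, M_out, requires_grad=True)
dL_dy = torch.randn(M_out, N)                  # upstream differential, transposed layout

# Hand-written formulas, as in Linear.backward
with torch.no_grad():
  grad_W = (dL_dy @ x).t()                     # shape (M_in, M_out), same as W
  grad_b = (dL_dy @ torch.ones(N, 1)).t()      # shape (1, M_out), same as b
  dL_dx = W @ dL_dy                            # shape (M_in, N), transposed layout

# Reference gradients from autograd
y = x @ W + b
(y * dL_dy.t()).sum().backward()

print(torch.allclose(grad_W, W.grad, atol=1e-5))
print(torch.allclose(grad_b, b.grad, atol=1e-5))
print(torch.allclose(dL_dx, x.grad.t(), atol=1e-5))
```

Note that `grad_W` and `grad_b` have the same shapes as the parameters, which is what the gradient descent update needs, while `dL_dx` keeps the transposed layout expected by the previous layer.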

Every layer needs a `backward` method, including the `Sigmoid` layer we wrote earlier. See the [notes](../notes/11%20-%20Grads%20-%20Nonlinear%20activations.pdf) for the computation of the gradients of the loss with respect to the inputs of a nonlinear activation layer.

In [14]:
class Sigmoid(Layer):

  def __init__(self):
    super().__init__()
    
  def forward(self, x):
    self.y = 1/(1+torch.exp(-x))
    return self.y
  
  def backward(self, dL_dy):
    return dL_dy * (self.y * (1-self.y)).t()
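The same kind of check can be sketched for the sigmoid, whose derivative is $y(1-y)$ applied entry by entry. As before, the sizes are arbitrary, `dL_dy` is a random upstream differential in the transposed layout, and `autograd` is enabled only for the comparison.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 2, requires_grad=True)  # autograd enabled only for this check
dL_dy = torch.randn(2, 5)                  # upstream differential, transposed layout

y = 1/(1+torch.exp(-x))

# Hand-written formula, as in Sigmoid.backward
y_value = y.detach()
manual = dL_dy * (y_value * (1-y_value)).t()  # shape (2, 5)

# Reference gradient from autograd
(y * dL_dy.t()).sum().backward()

print(torch.allclose(manual, x.grad.t(), atol=1e-6))
```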

### Backward pass in the `Sequential` class

Once we have a `backward` method in all layers, we can implement backpropagation in the `Sequential` class. The method `backward` will store the gradients of the loss with respect to the parameters of the network.

In [15]:
class Sequential:
  
  def __init__(self, layers):
    self.layers = layers
    
  def forward(self, x):
    for layer in self.layers:
      x = layer.forward(x)
    return x
  
  def backward(self, y_true, loss):
    y_pred = self.layers[-1].y # the attribute y (output) of the last layer is the final prediction
    dL_dy = loss.backward(y_pred, y_true) # computing the differential of the loss with respect to the prediction
    for layer in reversed(self.layers):
      dL_dy = layer.backward(dL_dy) # storing all gradients and backpropagating the differential of the loss

Let us check that the backward pass in the `Sequential` class is working. [See the code in this other notebook.](08-testing_the_library.ipynb#backward_pass_sequential)

### The `Optimizer` class

The last ingredient we need to train a neural network is a numerical optimization algorithm. For this reason we define the `Optimizer` class. We create a new file named `optimizers.py` inside the `mydl` folder. This file will contain the definition of the `Optimizer` class and its subclasses (at the moment, only `GD`).

We start by creating a base class `Optimizer`. 

In [16]:
class Optimizer: 
  
  def __init__(self):
    pass
  
  def update(self, network):
    # Implemented in the subclasses: updates the parameters of the model
    # (one iteration of the optimization algorithm).
    raise NotImplementedError

So far, we have studied only the gradient descent algorithm. The `GD` class is initialized with the learning rate `learning_rate`. The `update` method receives the network as input and updates each parameter of the network using the gradient descent update rule, i.e., it subtracts from the parameter the learning rate times the gradient of the loss with respect to that parameter.

In [17]:
class GD(Optimizer):
  
  def __init__(self, learning_rate):
    super().__init__()
    self.learning_rate = learning_rate
    
  def update(self, network):
    for layer in network.layers:
      for key in layer.parameters.keys():
        layer.parameters[key] -= self.learning_rate*layer.gradL_d[key]
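To see the update rule on numbers, here is a sketch of one gradient descent step on the toy loss $L(w) = w^2$. The classes `ToyLayer` and `ToyNetwork` are hypothetical stand-ins that only expose the attributes read by `update` (`layers`, `parameters`, `gradL_d`).

```python
import torch

# Hypothetical stand-ins, not part of the library
class ToyLayer:
  def __init__(self):
    self.parameters = {'w': torch.tensor([1.0])}
    self.gradL_d = {}

class ToyNetwork:
  def __init__(self, layers):
    self.layers = layers

toy_layer = ToyLayer()
network = ToyNetwork([toy_layer])
learning_rate = 0.1

# The gradient of L(w) = w^2 at the current point is 2w
toy_layer.gradL_d['w'] = 2*toy_layer.parameters['w']

# One gradient descent step, with the same rule as in GD.update
for layer in network.layers:
  for key in layer.parameters.keys():
    layer.parameters[key] -= learning_rate*layer.gradL_d[key]

print(toy_layer.parameters['w'])  # tensor([0.8000])
```

Starting from $w = 1$ with learning rate $0.1$, the step gives $1 - 0.1 \cdot 2 = 0.8$, which is closer to the minimizer $w = 0$.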

Let us check if the update method of the `GD` class is working. [See the code in this other notebook.](08-testing_the_library.ipynb#optimizer)

### The `train` method in the `Sequential` class

We have all the ingredients to train a neural network. We can implement the `train` method in the `Sequential` class.

The method accepts the input data `x_train`, the target data `y_train`, the `loss` object used as the loss function, the `optimizer` object used to update the parameters of the network, and the number of epochs `n_epochs`. At each epoch it performs a forward pass, evaluates the loss, runs the backward pass, and updates the parameters.

It returns the loss of the network at each epoch, so that we can plot it and check if the network is learning.

In [18]:
class Sequential:
  
  def __init__(self, layers):
    self.layers = layers
    
  def forward(self, x):
    for layer in self.layers:
      x = layer.forward(x)
    return x
  
  def backward(self, y_true, loss):
    y_pred = self.layers[-1].y
    dL_dy = loss.backward(y_pred, y_true)
    for layer in reversed(self.layers):
      dL_dy = layer.backward(dL_dy)
      
  def train(self, x_train, y_train, loss, optimizer, n_epochs):
    print('Training the network...')
    losses_train = []
    for epoch in range(n_epochs):
      y_pred = self.forward(x_train)
      current_loss = loss(y_pred, y_train)
      losses_train.append(current_loss.item()) # storing a Python float, convenient for plotting
      self.backward(y_train, loss)
      optimizer.update(self)
    print('Training complete.')
    return losses_train
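The structure of the training loop can also be sketched without the classes, on a problem where the answer is known. The example below is an illustration under assumptions: the data follow the made-up rule $y = 2x + 1$, the model is a single linear layer written with plain tensors, and the learning rate and the number of epochs are arbitrary choices.

```python
import torch

torch.manual_seed(0)
x_train = torch.linspace(-1, 1, 20).reshape(-1, 1)  # 20 samples, 1 feature
y_train = 2*x_train + 1                             # synthetic targets

W = torch.randn(1, 1)
b = torch.zeros(1, 1)
learning_rate = 0.1
losses_train = []

for epoch in range(200):
  y_pred = x_train @ W + b                              # forward pass
  losses_train.append(torch.mean((y_pred - y_train)**2).item())
  dL_dy = 2*(y_pred - y_train).t()/y_pred.shape[0]      # backward pass through the loss
  grad_W = (dL_dy @ x_train).t()                        # backward pass through the layer
  grad_b = (dL_dy @ torch.ones(dL_dy.shape[1], 1)).t()
  W -= learning_rate*grad_W                             # gradient descent update
  b -= learning_rate*grad_b

print(losses_train[0], losses_train[-1])
print(W.item(), b.item())
```

If the loop works, the recorded loss decreases towards zero and the parameters approach the values 2 and 1 used to generate the data.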

Let us check if the `train` method is working. [See the code in this other notebook.](08-testing_the_library.ipynb#training)