In [1]:
import torch
import gzip
import pickle
import requests
import os
import numpy as np
import torch.nn as nn
import time
import matplotlib.pyplot as plt
from PIL import Image
import imageio.v2 as imageio
from IPython.display import Markdown, display, Video
from io import BytesIO

assesment_draw_and_fill = False

# Delving deeper into PyTorch

The goal of this assessment is to delve deeper into the fundamentals of PyTorch. Given my background, we will use regression models of the form $f : \mathbb{R} \rightarrow \mathbb{R}$, and we will also start working with images.

As we will see, many of the things we have implemented so far are wrapped, in several ways, in the PyTorch library for ease of use. PyTorch is an excellent library, and many of its ideas were later adopted by TensorFlow or JAX. Perhaps its most distinctive ideas are the `torch.nn` module (adopted by TensorFlow and implemented in Flax, a neural network library built on top of JAX) and its data management tools.

PyTorch has also adopted many strengths of JAX, such as forward-mode automatic differentiation and the ability to turn a function into another function that computes its gradient (`torch.func`). JAX, in turn, has strengths of its own, such as JIT compilation (also adopted by PyTorch) and XLA compilation (which, I believe, is now available in PyTorch as well).

But we should be clear that PyTorch's automatic differentiation engine was inspired by the `autograd` library from the HIPS lab, and one of the fathers of that library was later hired by Google to create JAX.

## Data handling

The nice thing about PyTorch is that it implements several modules that provide convenient transformations for different types of data. Note that these can be used with any machine learning model (including sklearn ones), because they are just a data processing pipeline.

Initially there was only the torchvision module, but nowadays we also have torchaudio and torchtext.

Data handling in pytorch has two main components: a dataset and a dataloader.

A dataset is in charge of all the processes involved in the data handling part of a machine learning pipeline. These include, for example:

* Loading the data from disk into RAM. If the data fits in memory, you might decide to load all of it at once. If the data is huge, you can instead keep references to it (for example, file paths) and load each sample on demand, or keep only a chunk of the data in memory and load new chunks from disk later on.
* Applying transformations to the data: normalization, feature selection, data augmentation, etc. Many of these transformations are already implemented in PyTorch and, depending on the module, are suited to audio, video, images or text. PyTorch also lets you plug in your own transformations as function pointers (any Python callable).

A dataloader is in charge of "asking" the dataset for new data and handing that data to you, so that you can use it in your machine learning pipeline. Useful features of dataloaders include:

* Parallel data loading: when data is expensive to preprocess, you can configure the dataloader to use several worker processes (`num_workers`) that prepare samples in parallel.
* You can also tell the dataloader how many samples you want to receive at a time (`batch_size`, for minibatch gradient descent), whether they should be shuffled (`shuffle`), and so on.
* Since the dataloader "asks" the dataset, it naturally receives, as an argument, the dataset it should operate on.
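To make this division of labour concrete, here is a minimal sketch. `SquaresDataset` is a hypothetical toy class, not part of PyTorch: the dataset only implements `__len__` and `__getitem__`, while the dataloader decides which indices to request and collates the returned samples into batches. I use `num_workers=0` (loading in the main process) so that the example runs anywhere.

```python
import numpy as np
import torch

class SquaresDataset(torch.utils.data.Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""
    def __init__(self, n):
        super().__init__()
        # Everything fits in memory, so we simply keep it in RAM
        self.x = np.arange(n, dtype=np.float32)
        self.t = self.x ** 2

    def __len__(self):
        # Tells the dataloader how many samples exist
        return len(self.x)

    def __getitem__(self, idx):
        # Returns the sample the dataloader asks for
        return self.x[idx], self.t[idx]

loader = torch.utils.data.DataLoader(
    dataset=SquaresDataset(10),
    batch_size=4,    # samples per minibatch
    shuffle=False,   # keep the original order so the batches are predictable
    num_workers=0,   # 0 = main process; > 0 spawns worker processes
)

batches = list(loader)
for xb, tb in batches:
    print(xb, tb)
```

With 10 samples and `batch_size=4`, the last batch holds only the 2 leftover samples; `DataLoader` also accepts `drop_last=True` to discard such an incomplete batch.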

### Datasets

Let's work with a new dataset called MNIST. MNIST consists of 70,000 grayscale images of 28x28 pixels (60,000 for training and 10,000 for testing in its standard split). Each image shows a handwritten digit from 0 to 9.

One of the coolest things about torchvision is that it includes many famous datasets, and MNIST is one of them. See, for example, `https://pytorch.org/vision/stable/datasets.html`.

However, since our goal is to understand how to build our own dataset, let's do it ourselves. As you can see in the documentation (`https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset`), a dataset is a Python class that inherits from `torch.utils.data.Dataset` and overrides two methods: `__getitem__()` and `__len__()`. The former is how the dataloader tells you which sample you need to return, while the latter tells the dataloader how many samples your dataset contains.

One useful concept introduced in torchvision is that of transformations. Transformations are function pointers (plain functions) or classes implementing a `__call__` method, which our dataset can use to modify an image on the fly before returning it from `__getitem__`. In machine learning we sometimes want to preprocess the data just once, up front, but for some tasks (data augmentation, for instance) it is convenient to preprocess it on the fly instead.

When should we use a function pointer and when a class? Simple: use a class when you want the transformation to be configurable (its settings are fixed when the object is created); otherwise a plain function is enough.
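As an illustration, here are two hypothetical transformations (neither comes from torchvision): `invert` is a plain function with nothing to configure, while `AddGaussianNoise` is a class whose noise level is fixed when the object is created and which is then called like a function through `__call__`. A dataset can apply both in exactly the same way.

```python
import numpy as np

def invert(x):
    """Plain-function transformation (assumes pixel values in [0, 1])."""
    return 1.0 - x

class AddGaussianNoise:
    """Configurable transformation: the noise level is chosen at construction time."""
    def __init__(self, std):
        self.std = std

    def __call__(self, x):
        # Returns a new array; the input is left untouched
        return x + np.random.normal(0.0, self.std, size=x.shape)

x = np.zeros((2, 2))
transforms = [invert, AddGaussianNoise(std=0.0)]  # both are called the same way
for t in transforms:
    x = t(x)
print(x)  # std=0.0 adds no noise, so this is a 2x2 array of ones
```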

Let's create our custom MNIST classification dataset. To do so, we need to download the data to a specific directory; since this data fits comfortably in memory, our dataset can then keep all of it in RAM.

Download the data from `http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz` and load it into memory. The file contains a train, a validation and a test split. Join the three splits into a single split for the images and another for the labels, so that in the end you have two variables, `X` and `T`, holding the images and the labels respectively.
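As a sketch of what joining the splits means, here is the same operation on small dummy arrays. In this pickle each image comes as a flat vector of 784 = 28x28 values (the reshape used later relies on this), so the image splits are 2-D arrays stacked row-wise with `np.vstack`, while the label splits are 1-D vectors concatenated with `np.hstack`. The split sizes below are made up for illustration.

```python
import numpy as np

# Dummy stand-ins for the train / validation / test splits (tiny, made-up sizes)
imgs_tr, labels_tr = np.random.rand(5, 784), np.random.randint(0, 10, size=5)
imgs_va, labels_va = np.random.rand(2, 784), np.random.randint(0, 10, size=2)
imgs_te, labels_te = np.random.rand(3, 784), np.random.randint(0, 10, size=3)

# Images are 2-D (samples x features): stack them row-wise
X_all = np.vstack((imgs_tr, imgs_va, imgs_te))
# Labels are 1-D: concatenate them end to end
T_all = np.hstack((labels_tr, labels_va, labels_te))

print(X_all.shape, T_all.shape)  # (10, 784) (10,)
```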

In [2]:
if assesment_draw_and_fill:

    display(Markdown("""
```python
dataset_dir = '/tmp/mnist.pkl.gz'

try:
    with gzip.open(dataset_dir, 'rb') as f:
        ## the file was pickled with Python 2, so Python 3 needs encoding='latin1'
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
except FileNotFoundError as err:
    raise ValueError("Download data from http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz and save it as /tmp/mnist.pkl.gz") from err

## Read and concatenate all data into X and T
X_tr, Y_tr = train_set
X_va, Y_va = valid_set
X_te, Y_te = test_set

X = ... ## check np.vstack and np.hstack
T = ...
```
"""
))

In [3]:
dataset_dir = '/tmp/mnist.pkl.gz'

try:
    with gzip.open(dataset_dir, 'rb') as f:
        ## the file was pickled with Python 2, so Python 3 needs encoding='latin1'
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
except FileNotFoundError as err:
    raise ValueError("Download data from http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz and save it as /tmp/mnist.pkl.gz") from err

## Read and concatenate all data into X and T
X_tr, Y_tr = train_set
X_va, Y_va = valid_set
X_te, Y_te = test_set

X = np.vstack((X_tr,X_te,X_va))
T = np.hstack((Y_tr,Y_te,Y_va))

Now let's write the code for our dataset. Think about how to apply the transformations in the `__getitem__` method before returning the image.

```python
class MNISTDataset(torch.utils.data.Dataset):
    def __init__(self, X, T, transforms = None):
        ## call parent method
        super().__init__()
        self.X = X
        self.T = T
        ...
        
    def __len__(self):
        return len(self.T)
    
    def __getitem__(self, idx):
        X = self.X[idx]
        
        ...
                
        return X, self.T[idx]
```

In [4]:
class MNISTDataset(torch.utils.data.Dataset):
    def __init__(self, X,T, transforms = None):
        ## call parent method
        super().__init__()
        self.X = X
        self.T = T
        self.transforms = transforms
        
    def __len__(self):
        return len(self.T)
    
    def __getitem__(self, idx):
        X = self.X[idx]
        if self.transforms is not None:
            for t in self.transforms:
                X = t(X)
                
        return X, self.T[idx]

Once this is done, we can create a variable holding our dataset.

This version of MNIST stores each image as a flat vector of 784 values, ready to be fed to a fully connected neural network. To visualize the samples, let's use a reshape transformation that turns each vector back into an image. Since MNIST images are 28 by 28 pixels, we reshape into that shape. So, when instantiating our class, pass the reshape function pointer (inside a list) as `transforms`.

```python
def reshape(X):
    return np.reshape(...)

mnist_dataset = MNISTDataset(X = ..., T = ..., transforms = ...)
```


In [5]:
def reshape(X):
    return np.reshape(X,(28,28))

mnist_dataset = MNISTDataset(X = X, T = T, transforms = [reshape])

### Dataloader

To grab images from our dataset we can use a dataloader, which offers the useful features mentioned before. The nicest thing is that a dataloader is an iterable, so we can loop over it to retrieve the full dataset. Once the full dataset has been retrieved, we can iterate over it again.

Let's use a `batch_size` of one, so that each iteration returns a single image, and let's look at the images.

```python
mnist_loader = torch.utils.data.DataLoader(
                                                dataset = mnist_dataset,
                                                batch_size = 1, 
                                                shuffle=True,
                                                num_workers=1,
                                            )
```

In [6]:
if assesment_draw_and_fill:
    display(Markdown("""
```python
mnist_loader = torch.utils.data.DataLoader(
                                                dataset = mnist_dataset,
                                                batch_size = 1, 
                                                shuffle=True,
                                                num_workers=1,
                                            )
fig, ax = plt.subplots(1,1)
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

counter = 0
for ... in ...:
    ax.clear()
    ax.imshow(x[0], cmap = 'gray')
    ax.set_title(f'Class {y.item()}')
    
    ## add frame for video creation
    buf = BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    
    buf.seek(0)
    frame = imageio.imread(buf) 
    writer.append_data(frame)  

    if counter == 10:
        break
        
    counter += 1

writer.close() 
plt.close()

display(Video(data=video_filename, embed=True))
os.remove(video_filename)
```
"""
))

In [7]:
mnist_loader = torch.utils.data.DataLoader(
                                                dataset = mnist_dataset,
                                                batch_size = 1, 
                                                shuffle=True,
                                                num_workers=1,
                                            )
fig, ax = plt.subplots(1,1)
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

counter = 0
for x,y in mnist_loader:
    ax.clear()
    ax.imshow(x[0], cmap = 'gray')
    ax.set_title(f'Class {y.item()}')
    
    ## add frame for video creation
    buf = BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    
    buf.seek(0)
    frame = imageio.imread(buf) 
    writer.append_data(frame)  

    if counter == 10:
        break
        
    counter += 1

writer.close() 
plt.close()

In [8]:
display(Video(data=video_filename, embed=True))
os.remove(video_filename)

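**Important:** In the past, we needed to apply a `transforms.ToTensor()` transformation (from `torchvision.transforms`) in the dataset to convert the dataset's internal data type into torch tensors. However, this conversion is now done implicitly by the dataloader when it assembles each batch: as the cell below shows, the dataset still returns a NumPy array, while the loader yields a tensor.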
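To see what happens under the hood: the conversion is performed by the dataloader's default collate function, which, as far as I know, is exposed publicly as `torch.utils.data.default_collate` in recent PyTorch releases (1.11 and later). A sketch calling it directly on a few fake samples shaped like the ones our MNIST dataset returns:

```python
import numpy as np
import torch
from torch.utils.data import default_collate

# Four fake samples: (28x28 float32 image, integer label)
samples = [(np.full((28, 28), i, dtype=np.float32), np.int64(i)) for i in range(4)]

# Images are stacked along a new leading batch dimension; labels become one tensor
images, labels = default_collate(samples)
print(type(images), images.shape)  # <class 'torch.Tensor'> torch.Size([4, 28, 28])
print(labels)                      # tensor([0, 1, 2, 3])
```

Unlike `ToTensor()`, collation keeps the dtype and layout of the arrays: a `uint8` image would stay `uint8` with values in $[0, 255]$, and no channel dimension is added or moved.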

In [9]:
print(type(x[0]))
print(type(mnist_dataset[0][0]))

<class 'torch.Tensor'>
<class 'numpy.ndarray'>


### Task 1: Implementing two custom transformations

Our goal will be to create different versions of the MNIST dataset that we will use to train convolutional models and to learn models using self-supervision, the core technique behind the pre-training of today's language models (such as your favourite tool, ChatGPT).

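We will create a transformation that rotates images by a random angle (up to a given maximum), and a transformation that randomly masks the image with a black square (in general a rectangle, since its height and width are drawn independently).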

First of all, you must decide whether you want the transformation to be configurable or not. If you want it configurable (for example, so that the size of the mask or the maximum rotation angle can vary), you will need to implement it as a Python class.

I'll make both of them configurable.

In [10]:
if assesment_draw_and_fill:
    display(Markdown("""
```python
def reshape(X):
    return np.reshape(X,(28,28))

class RandomBlankSquare:
    def __init__(self,min_h,max_h, min_v,max_v):
        self.min_h = ...
        self.max_h = ...
        self.min_v = ...
        self.max_v = ...
        
    def __call__(self, x):
        min_h_idx = np.random.randint(..., size=(1,), dtype = int).item()
        max_h_idx = np.random.randint(..., size=(1,), dtype = int).item()

        min_v_idx = np.random.randint(..., size=(1,), dtype = int).item()
        max_v_idx = np.random.randint(..., size=(1,), dtype = int).item()

        x...

        return x

class RandomRotation:
    def __init__(self,max_degree):
        self.max_deg = max_degree
        
    def __call__(self, x):
        rot_deg = ...*2*self.max_deg - self.max_deg 
        
        x = Image.fromarray(x)
        x = x.rotate(...)
        x = np.array(x)
        return x
```
"""
))

In [11]:
def reshape(X):
    return np.reshape(X,(28,28))

class RandomBlankSquare:
    def __init__(self,min_h,max_h, min_v,max_v):
        self.min_h = min_h
        self.max_h = max_h
        self.min_v = min_v
        self.max_v = max_v
        
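    def __call__(self, x):
        ## work on a copy: x may be a view of the dataset's array,
        ## and writing into it would modify the stored image
        x = x.copy()

        ## rectangle starts before (min_v, min_h) and ends between (min_v, min_h) and (max_v, max_h)
        min_h_idx = np.random.randint(0, self.min_h, size=(1,), dtype = int).item()
        max_h_idx = np.random.randint(self.min_h, self.max_h, size=(1,), dtype = int).item()

        min_v_idx = np.random.randint(0, self.min_v, size=(1,), dtype = int).item()
        max_v_idx = np.random.randint(self.min_v, self.max_v, size=(1,), dtype = int).item()

        ## set the selected rows (vertical) and columns (horizontal) to black
        x[min_v_idx:max_v_idx,min_h_idx:max_h_idx] = 0.0

        return x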

class RandomRotation:
    def __init__(self,max_degree):
        self.max_deg = max_degree
        
    def __call__(self, x):
        rot_deg = np.random.random(size=(1,)).item()*2*self.max_deg - self.max_deg 
        
        x = Image.fromarray(x)
        x = x.rotate(rot_deg)
        x = np.array(x)
        return x
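Masking is a transformation that writes into an array, and that deserves some care. Integer indexing of a row and `np.reshape` return views of the dataset's array whenever NumPy can avoid a copy, so an in-place assignment changes the stored data itself; when samples are loaded in the main process (`num_workers=0`), the masks would accumulate in the stored images from one epoch to the next. The sketch below uses a dummy array and two hypothetical helper functions to show the effect, and the usual fix of copying before writing.

```python
import numpy as np

# Dummy "dataset": 3 flat images of 4 pixels each, all ones
data = np.ones((3, 4), dtype=np.float32)

def mask_in_place(x):
    """Hypothetical helper: reshape returns a view, so the write reaches `data`."""
    x = np.reshape(x, (2, 2))
    x[0, 0] = 0.0
    return x

def mask_on_copy(x):
    """Hypothetical helper: copying first leaves `data` untouched."""
    x = np.reshape(x, (2, 2)).copy()
    x[0, 0] = 0.0
    return x

mask_on_copy(data[1])
print(data[1])   # [1. 1. 1. 1.]
mask_in_place(data[0])
print(data[0])   # [0. 1. 1. 1.]
```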
        

Now experiment with these transformations. What happens if you do not reshape first?

```python
mnist_dataset = MNISTDataset(
                                X = X, 
                                T = T, 
                                transforms = [
                                            reshape, 
                                            RandomBlankSquare(min_h = 10, max_h = 20,min_v = 10, max_v = 20),
                                            RandomRotation(70),
                                        ]
                         )


mnist_loader = torch.utils.data.DataLoader(
                                                dataset = mnist_dataset,
                                                batch_size = 1, 
                                                shuffle=True,
                                                num_workers=1,
                                            )
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")
fig, ax = plt.subplots(1,1)

counter = 0
for x,y in mnist_loader:
    ax.clear()  ## remove the previous image before drawing the next one
    ax.imshow(x[0], cmap = 'gray')
    ax.set_title(f'Class {y.item()}')
    
    ## add frame for video creation
    buf = BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    
    buf.seek(0)
    frame = imageio.imread(buf) 
    writer.append_data(frame)  
    
    if counter == 10:
        break

    counter += 1

writer.close()
plt.close()
```

In [12]:
mnist_dataset = MNISTDataset(
                                X = X, 
                                T = T, 
                                transforms = [
                                            reshape, 
                                            RandomBlankSquare(min_h = 10, max_h = 20,min_v = 10, max_v = 20),
                                            RandomRotation(70),
                                        ]
                         )


mnist_loader = torch.utils.data.DataLoader(
                                                dataset = mnist_dataset,
                                                batch_size = 1, 
                                                shuffle=True,
                                                num_workers=1,
                                            )
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")
fig, ax = plt.subplots(1,1)

counter = 0
for x,y in mnist_loader:
    ax.clear()  ## remove the previous image before drawing the next one
    ax.imshow(x[0], cmap = 'gray')
    ax.set_title(f'Class {y.item()}')
    
    ## add frame for video creation
    buf = BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    
    buf.seek(0)
    frame = imageio.imread(buf) 
    writer.append_data(frame)  
    
    if counter == 10:
        break

    counter += 1

writer.close()
plt.close()

In [13]:
display(Video(data=video_filename, embed=True))
os.remove(video_filename)

### Task 2: Creating PyTorch datasets for our regression problems

Recall that in our previous task we had the following regression data.

* task $f: \mathbb{R} \rightarrow \mathbb{R}$
$$
\begin{split}
(x_1,t_1) &= (0,0.2)\\
(x_2,t_2) &= (1,0.5)\\
(x_3,t_3) &= (2,2.8)
\end{split}
$$

* task $f: \mathbb{R} \rightarrow [0,1]$
$$
\begin{split}
(x_1,t_1) &= (-0.13459237,0)\\
(x_2,t_2) &= (-3.3015387,0)\\
(x_3,t_3) &= (0.74481176,0)\\
(x_4,t_4) &= (2.62434536,1)\\
(x_5,t_5) &= (0.38824359,1)\\
(x_6,t_6) &= (0.47182825,1)\\
(x_7,t_7) &= (-0.07296862,1)\\
\end{split}
$$


* task $f: \mathbb{R}^2 \rightarrow [0,1]$
$$
\begin{split}
(x^1_{1},x^1_{2},t^1) &= (0,1,0)\\
(x^2_{1},x^2_{2},t^2) &= (1.5,2.0,0)\\
(x^3_{1},x^3_{2},t^3) &= (2,1,0)\\
(x^4_{1},x^4_{2},t^4) &= (5,3,0)\\
(x^5_{1},x^5_{2},t^5) &= (3,4,1)\\
(x^6_{1},x^6_{2},t^6) &= (4,5,1)\\
(x^7_{1},x^7_{2},t^7) &= (5,1,1)\\
\end{split}
$$

Create three datasets, one for each of these problems, and wrap each of them with its corresponding dataloader.

In [14]:

if assesment_draw_and_fill:
    display(Markdown("""
```python
class RtoR(torch.utils.data.Dataset):
    def __init__(self):
        ## call parent method
        super().__init__()
        self.X = np.array(..., dtype = np.float32).reshape(...)
        self.T = np.array(..., dtype = np.float32).reshape(...)
        
    def __len__(self):
        return ...
    
    def __getitem__(self, idx):
        return ...

class Rto01(torch.utils.data.Dataset):
    def __init__(self):
        ## call parent method
        super().__init__()
        self.X = np.array(..., dtype = np.float32).reshape(...)
        self.T = np.array(..., dtype = np.float32).reshape(...)
    
    def __len__(self):
        return ...
    
    def __getitem__(self, idx):
        return ...

class R2to01NL(torch.utils.data.Dataset):
    def __init__(self):
        ## call parent method
        super().__init__()

        # inputs to our model: points in R^2
        self.X = np.array(..., dtype = np.float32).reshape(...)

        # outputs associated to each input. 
        self.T =  np.array(..., dtype = np.float32).reshape(...)

    def __len__(self):
        return ...
    
    def __getitem__(self, idx):
        return ...
        
rtor_loader = torch.utils.data.DataLoader(
                                            dataset = RtoR(),
                                            batch_size = len(RtoR()), 
                                            shuffle=True,
                                            num_workers=1,
                                        )

rto01_loader = torch.utils.data.DataLoader(
                                            dataset = Rto01(),
                                            batch_size = len(Rto01()), 
                                            shuffle=True,
                                            num_workers=1,
                                         )

r2to01nl_loader = torch.utils.data.DataLoader(
                                            dataset = R2to01NL(),
                                            batch_size = len(R2to01NL()), 
                                            shuffle=True,
                                            num_workers=1,
                                         )
```
"""
))

In [15]:
class RtoR(torch.utils.data.Dataset):
    def __init__(self):
        ## call parent method
        super().__init__()
        self.X = np.array([0,1,2], dtype = np.float32).reshape(3,1)
        self.T = np.array([0.2,0.5,2.8], dtype = np.float32).reshape(3,1)
        
    def __len__(self):
        return len(self.T)
    
    def __getitem__(self, idx):
        return self.X[idx], self.T[idx]

class Rto01(torch.utils.data.Dataset):
    def __init__(self):
        ## call parent method
        super().__init__()
        self.X = np.array([-0.13459237,-3.3015387,0.74481176,2.62434536,0.38824359,0.47182825,-0.07296862], dtype = np.float32).reshape(7,1)
        self.T = np.array([0,0,0,1,1,1,1], dtype = np.float32).reshape(7,1)
    
    def __len__(self):
        return len(self.T)
    
    def __getitem__(self, idx):
        return self.X[idx], self.T[idx]

class R2to01NL(torch.utils.data.Dataset):
    def __init__(self):
        ## call parent method
        super().__init__()

        # inputs to our model: points in R^2
        self.X = np.array([[0,1],
                           [1.5,2.0],
                           [2,1],
                           [5,3],
                           [3,4],
                           [4,5],
                           [5,1]], dtype = np.float32).reshape(7,2)

        # outputs associated to each input. 
        self.T =  np.array([0,0,0,0,1,1,1], dtype = np.float32).reshape(7,1)

    def __len__(self):
        return len(self.T)
    
    def __getitem__(self, idx):
        return self.X[idx], self.T[idx]

In [16]:
rtor_loader = torch.utils.data.DataLoader(
                                            dataset = RtoR(),
                                            batch_size = len(RtoR()), 
                                            shuffle=True,
                                            num_workers=1,
                                        )

rto01_loader = torch.utils.data.DataLoader(
                                            dataset = Rto01(),
                                            batch_size = len(Rto01()), 
                                            shuffle=True,
                                            num_workers=1,
                                         )

r2to01nl_loader = torch.utils.data.DataLoader(
                                            dataset = R2to01NL(),
                                            batch_size = len(R2to01NL()), 
                                            shuffle=True,
                                            num_workers=1,
                                         )
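With `batch_size = len(dataset)`, each pass over a loader yields exactly one batch containing the whole (shuffled) dataset, which is what full-batch gradient descent needs. A self-contained sketch: `PointsDataset` is a hypothetical generic class holding the same three points as the $\mathbb{R} \rightarrow \mathbb{R}$ task.

```python
import numpy as np
import torch

class PointsDataset(torch.utils.data.Dataset):
    """Hypothetical generic in-memory dataset of (input, target) rows."""
    def __init__(self, x, t):
        super().__init__()
        self.x, self.t = x, t

    def __len__(self):
        return len(self.t)

    def __getitem__(self, idx):
        return self.x[idx], self.t[idx]

# Same three points as the R -> R task, as (N, 1) float32 arrays
points = PointsDataset(np.array([[0], [1], [2]], dtype=np.float32),
                       np.array([[0.2], [0.5], [2.8]], dtype=np.float32))
loader = torch.utils.data.DataLoader(points, batch_size=len(points), shuffle=True, num_workers=0)

for epoch in range(2):
    n_batches = 0
    for xb, tb in loader:
        n_batches += 1
        # One (3, 1) batch per epoch; shuffling only changes the row order
        print(epoch, n_batches, xb.flatten().tolist(), tb.flatten().tolist())
```

Because the three task classes differ only in their data, a single generic class like this one could also serve all three tasks.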

## Torch nn module

The `torch.nn` module is perhaps one of the best modules I have used as a machine learning researcher. Why? Basically because you have full flexibility in what you want to do and how to do it, while `torch.nn` handles the boring parts, such as tracking which parameters you want to optimize, moving your model parameters to the GPU, or switching certain layers between training and evaluation behaviour (`model.train()` / `model.eval()`).

So the goal of the `torch.nn` module is to make building machine learning models easier. Again, it is not specific to neural networks: you can use it to implement any machine learning model you want.

As with datasets, `torch.nn` provides a base class, `nn.Module`, from which every model must inherit. In this case we are required to override the `forward` method and, as you'll see, usually (though it is not mandatory) the `__init__` method as well.

We will explore the different features of this module during our lectures.

So, by analogy with what you have done with sklearn, let's create a linear model in PyTorch that can be adapted to different kinds of tasks (e.g. regression or classification) by changing its link and loss functions.

In this case, the basic structure will be something like:

```python
import torch.nn as nn
import torch.nn.functional as F

class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        pass
    def forward(self,x):
        pass
```

#### `__init__` method

The goal of this method is to set up the computational graph that implements your machine learning model, in particular by creating its parameters. As you know, machine learning is about defining a set of operations that implement a function, and adjusting these operations to the task at hand. This adjustment is done through the parameters.

#### `forward` method

The `forward` method is in charge of using the parameters to carry out the set of operations that implements your desired function (computational graph).
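A minimal sketch of the two methods working together. The `Affine` class is a hypothetical example, and it wraps its tensors in `nn.Parameter`, a `torch.nn` feature that the `LinearModel` below does not use: `__init__` creates the learnable parameters, `forward` uses them, and calling the module instance dispatches to `forward` for us.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Hypothetical model computing y = x W + b for a batch of row vectors x."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        # nn.Parameter registers each tensor with the module automatically
        self.W = nn.Parameter(torch.randn(dim_in, dim_out))
        self.b = nn.Parameter(torch.zeros(dim_out))

    def forward(self, x):
        # The operations that implement the function
        return x @ self.W + self.b

affine = Affine(dim_in=3, dim_out=2)
y_aff = affine(torch.ones(5, 3))        # calling the instance runs forward()
print(y_aff.shape)                      # torch.Size([5, 2])
print(len(list(affine.parameters())))   # 2 (W and b)
```

Wrapping tensors in `nn.Parameter` is what lets `nn.Module` track parameters on its own; a hand-written parameter list makes that same bookkeeping explicit.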


As we know, a linear model can be framed by decomposing the modelled function into a functional (linear) part and a link function. Moreover, linear models are mappings $f: \mathcal{X} \rightarrow \mathcal{Y}$ where, usually, $\mathcal{X} \subset \mathbb{R}^n$ and $\mathcal{Y} \subset \mathbb{R}^m$. With this decomposition, the same framework covers many kinds of inputs and outputs; what all these models have in common is that their functional part is linear.

**Disclaimer:** I will use a simplified, didactic implementation of what I am describing here in words. A more correct implementation would use a base class for the functional part and separate classes for the task-specific parts (link function, loss function, etc.), making use of inheritance.

Thus, for any kind of input and output (at least those living in a subset of a real space), a linear model can be expressed, without loss of generality, via the following mappings:

$$
\begin{split}
z &= (x^T \cdot W)^T + b\\
y &= \Phi(z)
\end{split}
$$

where $W \in \mathbb{R}^{n\times m}$ and $b \in \mathbb{R}^{m}$ are the learnable parameters, i.e. the parameters that require gradients, and $\Phi$ is the link function.
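In practice the per-sample formula is applied to a whole minibatch at once: stacking $N$ inputs as the rows of a matrix $X \in \mathbb{R}^{N\times n}$ gives $Z = XW + \mathbf{1}_N b^T$, which is what `torch.matmul(x, W) + b` computes thanks to broadcasting. A quick numerical check with random values:

```python
import torch

torch.manual_seed(0)
N, n, m = 4, 3, 2
xb = torch.randn(N, n)   # N inputs of dimension n, one per row
W = torch.randn(n, m)
b = torch.randn(m)

# Per-sample formula z = (x^T W)^T + b, with x as an (n, 1) column vector
z_loop = torch.stack([
    ((xb[i].reshape(n, 1).T @ W).T + b.reshape(m, 1)).flatten()
    for i in range(N)
])

# Batched version: b is broadcast over the N rows
z_batch = torch.matmul(xb, W) + b

print(torch.allclose(z_loop, z_batch))  # True
```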

Let's create our `LinearModel`. I will add an extra method for computing the loss, since the loss is model specific; other people prefer to compute the loss outside the model.

To train our model, we need to perform the usual five operations:

* Initialization
* Forward
* Backward
* Update
* Zero grad
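A sketch of these five steps on a made-up one-parameter problem: find $w$ such that $2w \approx 6$ by minimizing $L(w) = (2w - 6)^2$, whose gradient is $\partial L / \partial w = 4(2w - 6)$. The update is done inside `torch.no_grad()`, an alternative to editing `p.data` directly; the zero-grad step matters because PyTorch accumulates gradients across `backward()` calls by default.

```python
import torch

# Initialization
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1

for step in range(3):
    # Forward: prediction 2w and squared-error loss
    loss = (2.0 * w - 6.0) ** 2
    # Backward: fills w.grad with dL/dw = 4 * (2w - 6)
    loss.backward()
    # Update: a gradient-descent step, kept out of the autograd graph
    with torch.no_grad():
        w -= lr * w.grad
    # Zero grad: otherwise the next backward() would add to this gradient
    w.grad.zero_()
    print(step, loss.item(), w.item())  # w moves towards 3 (about 2.4, 2.88, 2.976)
```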

Note, however, that to optimize our parameters it is convenient to keep them in a list that we can track. Just think of a model with millions of parameters whose structure is configurable: creating them would require loops and conditional parameter creation depending on user requirements, so we want a single place where all of them are collected. To do so, let's add a method, `_trace_parameter(self, p)`, that appends a parameter to this list, and another method, `get_parameters(self)`, that returns the list.



In [17]:
if assesment_draw_and_fill:
    display(Markdown("""
```python
class LinearModel(nn.Module):
    def __init__(self, dim_in, dim_out, link_function, loss_function):
        super().__init__()
        
        ## create parameters
        self.W = torch.tensor(..., requires_grad = ..., dtype = torch.float32)
        self.b = torch.zeros(..., requires_grad = ..., dtype = torch.float32)
        
        ## trace parameters
        self.parameter_list = []
        self._trace_parameter(...)
        self._trace_parameter(...)
        
        ## link and loss function
        self.link = ...
        self.loss = ...
        
    def forward(self,x, apply_link):
        y = x ... self.W ... self.b
    
        if apply_link:
            y = ...
        
        return y
    
    def compute_loss(self,t,y):
        return ...
    
    def _trace_parameter(self,p):
        self.parameter_list...
    
    def get_parameters(self):
        return ...
```
"""
                    ))

In [18]:
class LinearModel(nn.Module):
    def __init__(self, dim_in, dim_out, link_function, loss_function):
        super().__init__()
        
        ## create parameters
        self.W = torch.tensor(np.random.randn(dim_in,dim_out), requires_grad = True, dtype = torch.float32)
        self.b = torch.zeros(dim_out, requires_grad = True, dtype = torch.float32)
        
        ## trace parameters
        self.parameter_list = []
        self._trace_parameter(self.W)
        self._trace_parameter(self.b)
        
        ## link and loss function
        self.link = link_function
        self.loss = loss_function
        
    def forward(self,x, apply_link):
        y = torch.matmul(x, self.W) + self.b
    
        if apply_link:
            y = self.link(y)
        
        return y
    
    def compute_loss(self,t,y):
        return self.loss(y,t)
    
    def _trace_parameter(self,p):
        self.parameter_list.append(p)
    
    def get_parameters(self):
        return self.parameter_list

Finally, let's create a model instance:

In [19]:
if assesment_draw_and_fill:
    display(Markdown("""
```python
model = LinearModel(dim_in = ..., dim_out = ..., link_function = ..., loss_function = ...)
```
"""))

In [20]:
model = LinearModel(dim_in = 1, dim_out = 1, link_function = torch.sigmoid, loss_function = nn.MSELoss())

Let's now train our model on the $\mathbb{R} \rightarrow \mathbb{R}$ regression task.

In [21]:
if assesment_draw_and_fill:
    display(Markdown("""
```python
## ================= ##
## Training pipeline ##
## ================= ##
## to display learning functions
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

fig, (ax1, ax2) = plt.subplots(1,2)
N_points = ...
x_range = torch.linspace(...,...,N_points).reshape(N_points,1)
loss_acc_list = []

## create your data loader
ror_loader = torch.utils.data.DataLoader(
                                            dataset = RtoR(),
                                            batch_size = len(RtoR()), 
                                            shuffle=True,
                                            num_workers=1,
                                        )

## Model Initialization
def linear(x):
    '''Linear link function.'''
    return ...
model = LinearModel(dim_in = ..., dim_out = ..., link_function = ..., loss_function = ... )

## optimization hyperparameters
epochs = ...
lr = ...
parameters = ...

## training loop
for e in range(epochs):
    loss_acc = 0.0
    for ... in ror_loader:
        
        ## forward
        y = ...
        L = ...
        loss_acc += L.item()
        
        ## backward
        L...
        
        ## update
        for p in parameters:
            p.data = p.data ...
            
        ## zero grad
        for p in ...:
            p...
       
    print(f"On epoch {...} got loss {...}", end = "\r")
    
    ## plot regressed line and loss
    with torch.no_grad():
        ## add loss to loss list
        loss_acc_list.append(loss_acc)
        
        ax1.cla()
        ax2.cla()
        
        ax1.plot(x,t,'+')
        ax1.plot(x_range, model(x_range, apply_link = False))
        ax1.set_ylim([0,3])
        
        ax2.plot(np.arange(e+1), loss_acc_list)
        ax2.set_xlim([0,epochs])
    
        ## add frame for video creation
        buf = BytesIO()
        fig.savefig(buf, format="png", dpi=100)
        
        buf.seek(0)
        frame = imageio.imread(buf) 
        writer.append_data(frame) 

writer.close()
plt.close()

display(Video(data=video_filename, embed=True))
os.remove(video_filename)
```
"""
                    ))

In [22]:
## ================= ##
## Training pipeline ##
## ================= ##
## to display learning functions
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

fig, (ax1, ax2) = plt.subplots(1,2)

N_points = 100
x_range = torch.linspace(-4,4,N_points).reshape(N_points,1)

loss_acc_list = []

## create your data loader
ror_loader = torch.utils.data.DataLoader(
                                            dataset = RtoR(),
                                            batch_size = len(RtoR()), 
                                            shuffle=True,
                                            num_workers=1,
                                        )

## Model Initialization
def linear(x):
    '''Linear (identity) link function.'''
    return x
model = LinearModel(dim_in = 1, dim_out = 1, link_function = linear, loss_function = nn.MSELoss() )

## optimization hyperparameters
epochs = 100
lr = 0.1
parameters = model.get_parameters()

## training loop
for e in range(epochs):
    loss_acc = 0.0
    for x,t in ror_loader:
        
        ## forward
        y = model(x, apply_link = False)
        L = model.compute_loss(t,y)
        loss_acc += L.item()
        
        ## backward
        L.backward()
        
        ## update
        for p in parameters:
            p.data = p.data -lr*p.grad
            
        ## zero grad
        for p in parameters:
            p.grad.zero_()
       
    print(f"On epoch {e} got loss {loss_acc}", end = "\r")
    
    ## plot regressed line and loss
    with torch.no_grad():
        ## add loss to loss list
        loss_acc_list.append(loss_acc)
        
        ax1.cla()
        ax2.cla()
        
        ax1.plot(x,t,'+')
        ax1.plot(x_range, model(x_range, apply_link = False))
        ax1.set_ylim([0,3])
        
        ax2.plot(np.arange(e+1), loss_acc_list)
        ax2.set_xlim([0,epochs])
    
        ## add frame for video creation
        buf = BytesIO()
        fig.savefig(buf, format="png", dpi=100)
        
        buf.seek(0)
        frame = imageio.imread(buf) 
        writer.append_data(frame) 

writer.close()
plt.close()

On epoch 99 got loss 0.22222222387790688

In [23]:
display(Video(data=video_filename, embed=True))
os.remove(video_filename)

Keeping a hand-made parameter list (as `get_parameters()` does) is fine for an initial or toy application. However, as your application grows, you would probably need additional methods to reset this list, or to register the parameters of other classes whose functionality your model uses. The `nn` module already handles all of this for you. In this tutorial, we will cover 9 of its features:
* **`def parameters()`**
* **`def named_parameters()`**
* **`nn.Parameter()`**
* **`def to()`**
* **`def zero_grad()`**
* `def train()`, `def eval()`
* `nn.Sequential`
* `nn.ParameterList`
* `nn.ModuleList`

First, suppose you want to implement an operation that you will reuse inside a bigger model (this is very common, for example, when implementing a residual network or a transformer). We can create this operation as an `nn.Module`. Let's create a Linear layer.

In [24]:
if assesment_draw_and_fill:
    display(Markdown("""
```python    
class Linear(...):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        ## create parameters
        self.W = nn.Parameter(torch.tensor(np.random.randn(...), dtype = torch.float32))
        self.b = nn.Parameter(torch.zeros(..., dtype = torch.float32))
    def forward(self, x):
        return ...
```
"""))

In [25]:
class Linear(nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        ## create parameters
        self.W = nn.Parameter(torch.tensor(np.random.randn(dim_in,dim_out), dtype = torch.float32))
        self.b = nn.Parameter(torch.zeros(dim_out, dtype = torch.float32))
    def forward(self, x):
        return torch.matmul(x,self.W) + self.b
        

Let's now create the linear model again, this time using our new `Linear` module.

In [26]:
if assesment_draw_and_fill:
    display(Markdown("""
```python   
class LinearModel(nn.Module):
    def __init__(self, dim_in, dim_out, link_function, loss_function):
        super().__init__()
        self.layer = Linear(...,...)
        self.link = ...
        self.loss = ...
        
    def forward(self,x, apply_link):
        y = ...
    
        if apply_link:
            y = ...
        
        return ...

    def compute_loss(self,t,y):
        return ...
```
"""
                    ))

In [27]:
class LinearModel(nn.Module):
    def __init__(self, dim_in, dim_out, link_function, loss_function):
        super().__init__()
        self.layer = Linear(dim_in,dim_out)
        self.link = link_function
        self.loss = loss_function
        
    def forward(self,x, apply_link):
        y = self.layer(x)
    
        if apply_link:
            y = self.link(y)
        
        return y

    def compute_loss(self,t,y):
        return self.loss(y,t)

Here is one of my favourite things about this module. Why do we use `nn.Parameter()` in the Linear module rather than a plain torch tensor with `requires_grad=True`? First of all, an `nn.Parameter` is still a torch tensor (it is a subclass of `torch.Tensor`), just like the ones you have used so far, with `requires_grad` set to `True` by default. Second, when it is assigned as an attribute of an `nn.Module`, the module's internal machinery registers it as a learnable parameter. This has two implications:

* The parameter is added to the module's collection of learnable parameters, which we can easily iterate over.
* Any module that uses this submodule (as our `LinearModel` does) also gets these parameters registered as learnable parameters. So if a top-level model uses several submodules (here we use only one), it exposes the parameters of the whole model through a single interface.

Imagine having to write your own `trace_parameter`-style methods that collect every learnable parameter from every submodule and parent module, with logic to merge them correctly so that parent modules can see them. It can obviously be done, but it is boring, tedious, and outside what we want to use PyTorch for, which is modelling functions.
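A minimal sketch of the registration behaviour described above, assuming only `torch`; the classes `WithTensor`, `WithParameter` and `Outer` are hypothetical and exist only for this illustration. A plain tensor with `requires_grad=True` is invisible to `parameters()`, while an `nn.Parameter`, and every parameter of a submodule, is collected automatically.

```python
import torch
import torch.nn as nn


class WithTensor(nn.Module):
    """Hypothetical module storing a plain tensor."""
    def __init__(self):
        super().__init__()
        # tracked by autograd, but NOT registered as a learnable parameter
        self.w = torch.randn(3, requires_grad=True)


class WithParameter(nn.Module):
    """Hypothetical module storing an nn.Parameter."""
    def __init__(self):
        super().__init__()
        # registered automatically when assigned as an attribute
        self.w = nn.Parameter(torch.randn(3))


class Outer(nn.Module):
    """Hypothetical module that only contains a submodule."""
    def __init__(self):
        super().__init__()
        self.inner = WithParameter()


n_tensor = len(list(WithTensor().parameters()))
n_param = len(list(WithParameter().parameters()))
outer_names = [name for name, _ in Outer().named_parameters()]
p = WithParameter().w

print(n_tensor)     # 0
print(n_param)      # 1
print(outer_names)  # ['inner.w']
print(isinstance(p, torch.Tensor), p.requires_grad)  # True True
```

Note how the parent module sees the submodule's parameter under the prefixed name `inner.w`, without any extra code.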

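How do we access these parameters? Easily, through two methods, `parameters()` and `named_parameters()` (there are others, such as `state_dict()`, which also includes non-learnable buffers):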

In [28]:
if assesment_draw_and_fill:
    display(Markdown(
"""
```python
linear_layer = Linear(10,10)

for p in linear_layer.parameters():
    print(p.shape, " ", p.requires_grad)

for n,p in linear_layer.named_parameters():
    print(n , " ", p.shape, " ", p.requires_grad)

linear_layer = LinearModel(10,1, link_function = torch.sigmoid, loss_function = nn.MSELoss())

for p in linear_layer.parameters():
    print(p.shape, " ", p.requires_grad)

for n,p in linear_layer.named_parameters():
    print(n , " ", p.shape, " ", p.requires_grad)
```
"""
))

In [29]:
linear_layer = Linear(10,10)

for p in linear_layer.parameters():
    print(p.shape, " ", p.requires_grad)

for n,p in linear_layer.named_parameters():
    print(n , " ", p.shape, " ", p.requires_grad)

linear_layer = LinearModel(10,1, link_function = torch.sigmoid, loss_function = nn.MSELoss())

for p in linear_layer.parameters():
    print(p.shape, " ", p.requires_grad)

for n,p in linear_layer.named_parameters():
    print(n , " ", p.shape, " ", p.requires_grad)


torch.Size([10, 10])   True
torch.Size([10])   True
W   torch.Size([10, 10])   True
b   torch.Size([10])   True
torch.Size([10, 1])   True
torch.Size([1])   True
layer.W   torch.Size([10, 1])   True
layer.b   torch.Size([1])   True


Let's put all this into practice by creating a logistic regression model for the problem $f:\mathbb{R} \rightarrow [0,1]$. Look carefully at how `def zero_grad()` and `def to()` are used. The first method resets the gradients of all internal parameters (honestly, this is not a widely used feature of `nn.Module`, and you'll see later why). The second is a simple way to choose the device your model runs on. By default, parameters live on your computer's CPU, but if you have a GPU you can call `model.to('cuda')` to move them there; the input tensors must be moved too, which is why the training loop calls `x.to(device)` and `t.to(device)`. In Theano this was a bit more complex (honestly, I do not remember how, since I have not written Theano code for more than 8 years). TensorFlow has moved towards the PyTorch approach, but it used to be done through `tf.Session`, which was quite painful.

Moreover, training a model on multiple GPUs at the same time is also fairly easy: you wrap your model with a PyTorch wrapper such as `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel` (the latter is the recommended one, but it also requires setting up a process group). It has been a couple of years since I last trained models this way, so I need to refresh it before including it in this tutorial.
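To see why a gradient reset is needed at all: `backward()` *accumulates* into `.grad` instead of overwriting it. The sketch below (assuming only `torch`; the single-weight model is just for illustration) shows the accumulation, resets it with `zero_grad()`, and moves model and input with `to()`. Depending on your PyTorch version, `zero_grad()` either fills the gradients with zeros or sets them to `None` (the default in recent releases), so the check accepts both.

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1, bias=False)   # y = w * x, a single weight
x = torch.ones(1, 1)

model(x).sum().backward()
g1 = model.weight.grad.clone()        # dy/dw = x = 1

model(x).sum().backward()             # second backward WITHOUT resetting
g2 = model.weight.grad.clone()        # accumulated: 1 + 1 = 2

model.zero_grad()                     # reset the gradients of all parameters
g_after = model.weight.grad           # zeros or None, depending on the version

# pick a device; the inputs must live on the same device as the model
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
y = model(x.to(device))

print(g1.item(), g2.item())           # 1.0 2.0
print(y.device.type == device)        # True
```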

In [30]:
if assesment_draw_and_fill:
    display(Markdown(
"""
```python
## ================= ##
## Training pipeline ##
## ================= ##
## to display learning functions
fig, (ax1, ax2) = plt.subplots(1,2)

video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

x_range = torch.linspace(-4,4,100).reshape(100,1)
loss_acc_list = []

## create your data loader
rto01_loader = torch.utils.data.DataLoader(
                                            dataset = Rto01(),
                                            batch_size = len(Rto01()), 
                                            shuffle=True,
                                            num_workers=1,
                                         )

## Initialization
model = LinearModel(dim_in = ..., dim_out = ..., link_function = ..., loss_function = ... )

# move model to the desired device
device = 'cpu'
model.to(device) 

## optimization hyperparameters
epochs = ...
lr = ...
# grab parameters easily. Since it returns an iterator we need to keep it in a list because iterators can only be iterated once
parameters = ...

## training loop
for ... in range(...):
    loss_acc = 0.0
    for ... in rto01_loader:
        ## move to device
        x, t = x.to(device), t.to(device)
        
        ## forward
        y = ...
        L = ...
        loss_acc += L.item()
        
        ## backward
        L...
        
        ## update
        for p in parameters:
            p.data = ...
            
        ## zero grad
        model.zero_grad()
        
    print(f"On epoch {e} got loss {loss_acc}", end = "\r")
    
    ## plot regressed line and loss
    with torch.no_grad():
        ## add loss to loss list
        loss_acc_list.append(loss_acc)
        
        ax1.cla()
        ax2.cla()
        
        ax1.plot(x,t,'+')
        ax1.plot(x_range, model(x_range, apply_link = True))
        ax1.set_ylim([-0.5,1.5])
        
        ax2.plot(np.arange(e+1), loss_acc_list)
        ax2.set_xlim([0,epochs])
    
        ## add frame for video creation
        buf = BytesIO()
        fig.savefig(buf, format="png", dpi=100)
        
        buf.seek(0)
        frame = imageio.imread(buf) 
        writer.append_data(frame) 
  

writer.close()
plt.close()

display(Video(data=video_filename, embed=True))
os.remove(video_filename)
```
"""
))

In [31]:
## ================= ##
## Training pipeline ##
## ================= ##
## to display learning functions
fig, (ax1, ax2) = plt.subplots(1,2)

video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

x_range = torch.linspace(-4,4,100).reshape(100,1)
loss_acc_list = []

## create your data loader
rto01_loader = torch.utils.data.DataLoader(
                                            dataset = Rto01(),
                                            batch_size = len(Rto01()), 
                                            shuffle=True,
                                            num_workers=1,
                                         )

## Initialization
model = LinearModel(dim_in = 1, dim_out = 1, link_function = torch.sigmoid, loss_function = nn.BCEWithLogitsLoss() )

# move model to the desired device
device = 'cpu'
model.to(device) 

## optimization hyperparameters
epochs = 100
lr = 0.1
# grab parameters easily. Since it returns an iterator we need to keep it in a list because iterators can only be iterated once
parameters = [p for p in model.parameters()]

## training loop
for e in range(epochs):
    loss_acc = 0.0
    for x,t in rto01_loader:
        ## move to device
        x, t = x.to(device), t.to(device)
        
        ## forward
        y = model(x, apply_link = False)
        L = model.compute_loss(t,y)
        loss_acc += L.item()
        
        ## backward
        L.backward()
        
        ## update
        for p in parameters:
            p.data = p.data -lr*p.grad
            
        ## zero grad
        model.zero_grad()
        
    print(f"On epoch {e} got loss {loss_acc}", end = "\r")
    
    ## plot regressed line and loss
    with torch.no_grad():
        ## add loss to loss list
        loss_acc_list.append(loss_acc)
        
        ax1.cla()
        ax2.cla()
        
        ax1.plot(x,t,'+')
        ax1.plot(x_range, model(x_range, apply_link = True))
        ax1.set_ylim([-0.5,1.5])
        
        ax2.plot(np.arange(e+1), loss_acc_list)
        ax2.set_xlim([0,epochs])
    
        ## add frame for video creation
        buf = BytesIO()
        fig.savefig(buf, format="png", dpi=100)
        
        buf.seek(0)
        frame = imageio.imread(buf) 
        writer.append_data(frame) 

writer.close()
plt.close()

On epoch 99 got loss 0.5143628716468811

In [32]:
display(Video(data=video_filename, embed=True))
os.remove(video_filename)

To finish with this introduction to pytorch we will see the following methods and introduce optimizers:

* `def parameters()`
* `def named_parameters()`
* `nn.Parameter()`
* `def to()`
* `def zero_grad()`
* `def train()`, `def eval()`
* **`nn.Sequential()`**
* **`nn.ParameterList`**
* **`nn.ModuleList`**


To do so, let's work on the regression problem $f:\mathbb{R}^2 \rightarrow [0,1]$ and model it with a deep fully connected neural network. I will implement it in two ways (with `nn.ModuleList` and with `nn.Sequential`) to showcase the different features.

In [33]:
if assesment_draw_and_fill:
    display(Markdown(
"""
```python
def linear_activation(...):
    return ...

class Layer(nn.Module):
    def __init__(self, dim_in, dim_out, act):
        super().__init__()
        ## create the affine sublayer (nn.Linear registers its own weight and bias)
        self.linear = nn.Linear(..., ...)
        self.act = ...
        
    def forward(self, x):
        return ...

class FCModuleList(nn.Module):
    def __init__(self, dim_in, dim_out, neurons_hidden : list, hidden_activations:list, link_function, loss_function):
        super().__init__()

        assert len(neurons_hidden) == len(hidden_activations), "List specifying hidden activations and number of hidden layers must coincide"

        module_list = nn.ModuleList([])

        # input and hidden layers
        for num_neur, act in zip(neurons_hidden, hidden_activations):
            module_list.append(Layer(..., ..., ...))
            dim_in = ...

        # output layer
        o_layer = Layer(..., ..., act = ...)
        module_list.append(o_layer)

        self.layers = module_list
       
        ## Loss and link function
        self.link = ...
        self.loss = ...
        
    def forward(self,x, apply_link):
        for l in self.layers:
            x = ...
        y = x
        if apply_link:
            y = self.link(...)
        
        return ...

    def compute_loss(self,t,y):
        return ...


class FCSequential(nn.Module):
    def __init__(self, dim_in, dim_out, neurons_hidden : list, hidden_activations:list, link_function, loss_function):
        super().__init__()

        assert len(neurons_hidden) == len(hidden_activations), "List specifying hidden activations and number of hidden layers must coincide"

        module_list = []

        # input and hidden layers
        for num_neur, act in zip(neurons_hidden, hidden_activations):
            module_list.append(...)
            dim_in = ...

        # output layer
        o_layer = ...
        module_list.append(o_layer)

        self.layers_forward = nn.Sequential(*module_list)
       
        ## Loss and link function
        self.link = ...
        self.loss = ...
        
    def forward(self,x, apply_link):
        ... = self.layers_forward(x)
    
        if apply_link:
            y = self.link(...)

        return ...

    def compute_loss(self,t,y):
        return ...
```
"""
))

In [34]:
def linear_activation(x):
    return x

class Layer(nn.Module):
    def __init__(self, dim_in, dim_out, act):
        super().__init__()
        ## create the affine sublayer (nn.Linear registers its own weight and bias)
        self.linear = nn.Linear(dim_in, dim_out)
        self.act = act
        
    def forward(self, x):
        return self.act(self.linear(x))

class FCModuleList(nn.Module):
    def __init__(self, dim_in, dim_out, neurons_hidden : list, hidden_activations:list, link_function, loss_function):
        super().__init__()

        assert len(neurons_hidden) == len(hidden_activations), "List specifying hidden activations and number of hidden layers must coincide"

        module_list = nn.ModuleList([])

        # input and hidden layers
        for num_neur, act in zip(neurons_hidden, hidden_activations):
            module_list.append(Layer(dim_in, num_neur, act))
            dim_in = num_neur

        # output layer
        o_layer = Layer(dim_in, dim_out, act = linear_activation)
        module_list.append(o_layer)

        self.layers = module_list
       
        ## Loss and link function
        self.link = link_function
        self.loss = loss_function
        
    def forward(self,x, apply_link):
        for l in self.layers:
            x = l(x)
        y = x
        if apply_link:
            y = self.link(y)
        
        return y

    def compute_loss(self,t,y):
        return self.loss(y,t)


class FCSequential(nn.Module):
    def __init__(self, dim_in, dim_out, neurons_hidden : list, hidden_activations:list, link_function, loss_function):
        super().__init__()

        assert len(neurons_hidden) == len(hidden_activations), "List specifying hidden activations and number of hidden layers must coincide"

        module_list = []

        # input and hidden layers
        for num_neur, act in zip(neurons_hidden, hidden_activations):
            module_list.append(Layer(dim_in, num_neur, act))
            dim_in = num_neur

        # output layer
        o_layer = Layer(dim_in, dim_out, act = linear_activation)
        module_list.append(o_layer)

        self.layers_forward = nn.Sequential(*module_list)
       
        ## Loss and link function
        self.link = link_function
        self.loss = loss_function
        
    def forward(self,x, apply_link):
        y = self.layers_forward(x)
    
        if apply_link:
            y = self.link(y)

        return y

    def compute_loss(self,t,y):
        return self.loss(y,t)

As you can see, `nn.ModuleList` works like a Python list; however, every module you append to it is registered as a submodule, so all of its `nn.Parameter`s are added to the parent's parameter list and you can easily apply gradient descent to them. As you might imagine, `nn.ParameterList` works the same way with lists of parameters. There are similar containers for other Python structures, such as `nn.ModuleDict` and `nn.ParameterDict`.
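A minimal sketch of why these containers matter, assuming only `torch`; the four class names are hypothetical. Layers kept in a plain Python list are invisible to `parameters()`, while `nn.ModuleList`, `nn.ParameterList` and `nn.ModuleDict` register everything they hold:

```python
import torch
import torch.nn as nn


class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: nn.Module does NOT look inside it
        self.layers = [nn.Linear(2, 3), nn.Linear(3, 1)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # same layers, registered as submodules
        self.layers = nn.ModuleList([nn.Linear(2, 3), nn.Linear(3, 1)])


class WithParameterList(nn.Module):
    def __init__(self):
        super().__init__()
        # list of raw parameters, each one registered
        self.scales = nn.ParameterList([nn.Parameter(torch.ones(2)) for _ in range(3)])


class WithModuleDict(nn.Module):
    def __init__(self):
        super().__init__()
        # dict of submodules; the keys become part of the parameter names
        self.heads = nn.ModuleDict({"a": nn.Linear(2, 1), "b": nn.Linear(2, 1)})


n_plain = len(list(PlainList().parameters()))
n_modlist = len(list(WithModuleList().parameters()))
n_paramlist = len(list(WithParameterList().parameters()))
dict_names = [n for n, _ in WithModuleDict().named_parameters()]

print(n_plain)      # 0
print(n_modlist)    # 4 (two weights + two biases)
print(n_paramlist)  # 3
print(dict_names)   # ['heads.a.weight', 'heads.a.bias', 'heads.b.weight', 'heads.b.bias']
```

With `PlainList`, an optimizer built from `parameters()` would silently train nothing, which is an easy bug to miss.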

`nn.Sequential` is a more restrictive version of the previous containers, but a very useful one. It receives a sequence of modules and, when called, applies them one after the other, feeding the output of the first as input to the second, and so on. Hence it lacks the flexibility to decide how data flows through your model: it can only express a single chain, whereas some networks split into branches (for example, skip connections or multiple output heads).
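A sketch of both points, assuming only `torch`; `Residual` is a hypothetical module. An `nn.Sequential` gives the same result as chaining the calls by hand, while a skip connection $y = f(x) + x$ uses the input twice, so it needs its own `forward` (the resulting module can still be placed inside a `Sequential` as one step).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
l1, l2 = nn.Linear(2, 4), nn.Linear(4, 2)

# Sequential feeds the output of each module into the next one
seq = nn.Sequential(l1, nn.ReLU(), l2)
x = torch.randn(5, 2)
chained = l2(torch.relu(l1(x)))
print(torch.allclose(seq(x), chained))  # True


class Residual(nn.Module):
    """Hypothetical block whose graph branches: y = f(x) + x."""
    def __init__(self, f):
        super().__init__()
        self.f = f

    def forward(self, x):
        # x is used twice, which a plain chain of modules cannot express
        return self.f(x) + x


res = Residual(seq)
print(res(x).shape)  # torch.Size([5, 2])
```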


Also, as you can see, PyTorch already implements many of the most popular layers, so we do not need our own Linear layer and can simply use `nn.Linear`.
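One detail worth knowing when switching from our `Linear` to `nn.Linear`: `nn.Linear` stores its weight with shape `(out_features, in_features)`, the transpose of our `W`, and computes `x @ weight.T + bias`. It also uses its own initialisation scheme instead of the standard normal draw used by our `Linear`. A small check, assuming only `torch`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(3, 2)                    # in_features=3, out_features=2
print(lin.weight.shape, lin.bias.shape)  # torch.Size([2, 3]) torch.Size([2])

x = torch.randn(4, 3)
# our Linear computes x @ W + b with W of shape (dim_in, dim_out);
# nn.Linear stores the transpose, so the equivalent W is lin.weight.T
manual = torch.matmul(x, lin.weight.T) + lin.bias
print(torch.allclose(lin(x), manual))    # True
```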

In [35]:
model_A = FCModuleList(
    dim_in = 2, 
    dim_out = 1, 
    neurons_hidden = [10,10], 
    hidden_activations = [torch.relu, torch.relu], 
    link_function = torch.sigmoid, 
    loss_function = nn.BCEWithLogitsLoss()
)

model_B = FCSequential(
    dim_in = 2, 
    dim_out = 1, 
    neurons_hidden = [10,10], 
    hidden_activations = [torch.relu, torch.relu], 
    link_function = torch.sigmoid, 
    loss_function = nn.BCEWithLogitsLoss()
)
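A quick sanity check on the two models just built: both wrap the same three `nn.Linear` layers ($2\to10\to10\to1$), so they should have the same number of learnable scalars, $(2\cdot10+10)+(10\cdot10+10)+(10\cdot1+1)=151$; only their names differ (`layers.0.linear.weight` in `model_A` vs `layers_forward.0.linear.weight` in `model_B`). The helper `count_parameters` is hypothetical, and to keep the snippet self-contained it is checked on an equivalent `nn.Sequential` rather than on the notebook's models:

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Hypothetical helper: total number of learnable scalars in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# same architecture as model_A / model_B: 2 -> 10 -> 10 -> 1
reference = nn.Sequential(
    nn.Linear(2, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 1),
)
print(count_parameters(reference))  # 151
```

In the notebook, `count_parameters(model_A)` and `count_parameters(model_B)` should both return 151 as well, since the activation functions carry no parameters.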

## Torch Optimizers

Before training this last model, we need to talk about torch optimizers. Optimizers are Python classes (in `torch.optim`) that implement different optimization algorithms.

Up to now we have been using vanilla gradient descent, where the update rule is very simple. However, other optimizers keep track of past gradients (for example, momentum keeps a running sum of them) to improve the convergence of the optimization process. This requires somewhat more complex implementations and extra variables to store the internal state. To save us from implementing and debugging these algorithms, PyTorch provides most of them.
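A minimal sketch of that internal state, assuming PyTorch's default SGD settings (`dampening=0`, `nesterov=False`, no weight decay): a hand-written momentum update and `torch.optim.SGD` applied to the toy loss $\sum_i w_i^2$ should give the same weights. The function names are hypothetical; the point is the extra velocity buffer `v` that the optimizer has to keep between steps.

```python
import torch


def manual_sgd_momentum(w0, grad_fn, lr=0.1, momentum=0.9, steps=5):
    """Hypothetical helper: SGD with momentum written by hand."""
    w = w0.clone()
    v = torch.zeros_like(w)       # velocity: the extra state the optimizer keeps
    for _ in range(steps):
        g = grad_fn(w)
        v = momentum * v + g      # exponentially weighted running sum of gradients
        w = w - lr * v            # step along the velocity, not the raw gradient
    return w


def torch_sgd_momentum(w0, lr=0.1, momentum=0.9, steps=5):
    """Same problem solved with torch.optim.SGD on loss = sum(w**2)."""
    w = torch.nn.Parameter(w0.clone())
    opt = torch.optim.SGD([w], lr=lr, momentum=momentum)
    for _ in range(steps):
        loss = (w ** 2).sum()
        loss.backward()
        opt.step()
        opt.zero_grad()
    return w.detach()


w0 = torch.tensor([1.0, -2.0])
w_manual = manual_sgd_momentum(w0, grad_fn=lambda w: 2 * w)  # gradient of sum(w**2)
w_torch = torch_sgd_momentum(w0)
print(torch.allclose(w_manual, w_torch))  # True
```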

All optimizers share a common interface. When creating one, you pass in the list of trainable parameters and the hyperparameters of the learning algorithm, such as the learning rate.

```python
opt = torch.optim.SGD(params = model.parameters(), lr=0.001, momentum=0.9)
```

Another nice thing is that, after calling `backward()`, the update and zero-grad steps become simply:

```python
opt.step()
opt.zero_grad()
```
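The same `backward()` / `step()` / `zero_grad()` pattern works for every optimizer in `torch.optim`, so switching algorithms only changes the constructor line. A sketch under stated assumptions: the toy data $t = 2x + 1$, the hypothetical `train` helper and the learning rates are illustrative choices, not values from this tutorial.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy data (assumption for illustration): t = 2x + 1
x = torch.linspace(-1, 1, 50).reshape(50, 1)
t = 2 * x + 1


def train(make_optimizer, epochs=200):
    """Hypothetical helper: fit nn.Linear(1, 1) with the given optimizer."""
    model = nn.Linear(1, 1)
    loss_fn = nn.MSELoss()
    opt = make_optimizer(model.parameters())
    losses = []
    for _ in range(epochs):
        loss = loss_fn(model(x), t)
        loss.backward()   # compute gradients
        opt.step()        # update every parameter handed to the optimizer
        opt.zero_grad()   # reset gradients before the next backward
        losses.append(loss.item())
    return losses


sgd_losses = train(lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9))
adam_losses = train(lambda p: torch.optim.Adam(p, lr=0.05))
print(f"SGD + momentum final loss: {sgd_losses[-1]:.2e}")
print(f"Adam final loss:           {adam_losses[-1]:.2e}")
```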

In [36]:
if assesment_draw_and_fill:
    display(Markdown(
"""
```python
## to display learning functions
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

fig, (ax1, ax2) = plt.subplots(1,2, figsize = (10,5))

loss_acc_list = []

N_points_domain = 100
thr_prob = 0.5 # use to plot our classification guess
x1, x2 = np.meshgrid(np.linspace(-2,7,N_points_domain, dtype = np.float32),np.linspace(-2,7,N_points_domain, dtype = np.float32))

# reshape for neural network
x_range = torch.from_numpy(
    np.hstack((np.reshape(x1, (N_points_domain**2,1)),np.reshape(x2, (N_points_domain**2,1))))
).float()

# allocate memory to plot decision thresholds
y_range_plot = np.zeros((N_points_domain,N_points_domain), np.float32)

# class colors
color_c0 = 'C0'
color_c1 = 'C1'

## ======= DATA ==========
## create your data loader
r2to01nl_loader = torch.utils.data.DataLoader(
                                            dataset = R2to01NL(),
                                            batch_size = len(R2to01NL()), 
                                            shuffle=False,
                                            num_workers=1,
                                         )

## Initialization
model = FCModuleList(
    dim_in = ..., 
    dim_out = ..., 
    neurons_hidden = ..., 
    hidden_activations = ..., 
    link_function = ..., 
    loss_function = ...
)
# move model to the desired device
device = 'cpu'
model.to(device) 
model.float()

## optimization hyperparameters
epochs = ...
optimizer = torch.optim.SGD(...,..., momentum = 0.9)

for ... in range(...):
    loss_acc = 0.0
    for x, t in r2to01nl_loader:
        
        ## move to your computing platform
        x, t = x.to(device), t.to(device)
        
        ## 2.forward
        y = ...
        L = ...
        
        # 3. Backward
        L...
        loss_acc += L.item()
        
        # 4 Parameter update
        ...
        
        # 5 Zero grad
        ...

        print(f"On epoch {e} got loss {loss_acc}", end = "\r")
        
        with torch.no_grad():
            if e == 0:
                idx_class0 = (t == 0).squeeze()
                idx_class1 = (t == 1).squeeze()
                
            loss_acc_list.append(loss_acc)
            
            ## forward to plot decision thresholds
            y_range = model(x_range.to(device), apply_link = True)

            # reshape back to the grid and convert to numpy for plotting
            y_range = y_range.reshape(N_points_domain, N_points_domain).cpu().numpy()
            
            # start plotting
            ax1.cla()
            ax2.cla()
    
            # plot dataset    
            ax1.plot(x[idx_class0][:,0],x[idx_class0][:,1],'o', color = color_c0, markersize = 8, label = r'observations class 0 (cat)')
            ax1.plot(x[idx_class1][:,0],x[idx_class1][:,1],'*', color = color_c1,markersize = 8, label = r'observations class 1 (dog)')
            ax1.set_xlabel('weight ($x_1$)')
            ax1.set_ylabel('height ($x_2$)')
    
            ## plot prediction probability for class 1 and 0
            idx_range1 = y_range > thr_prob
            idx_range0 = ~idx_range1
    
            y_range_plot[idx_range1] = y_range[idx_range1]
            y_range_plot[idx_range0] = np.nan
    
            contourf1 = ax1.contourf(x1, x2, y_range_plot, levels = [0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1], cmap = "Oranges", alpha = 0.5)
    
            y_range_plot[idx_range0] = y_range[idx_range0]
            y_range_plot[idx_range1] = np.nan
    
            contourf2 = ax1.contourf(x1, x2, y_range_plot, levels = [0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1], cmap = "Blues", alpha = 0.5)
    
            # decision threshold
            contour1 = ax1.contour(x1, x2, y_range, levels = [thr_prob], colors = ["k"], linewidths = 4)
            ax1.clabel(contour1, inline=True, fontsize=8, fmt="%.2f")
    
            ## set legend
            ax1.legend()
    
            ## set contour bar level
            if e == 0:
                cbar2 = fig.colorbar(contourf2, ax=ax1, orientation='vertical')
                cbar2.set_label('Probability of being a cat')
    
                #cbar1 = fig.add_axes([0.92, 0.15, 0.02, 0.7])  # adjust the position [left, bottom, width, height]
                cbar1 = fig.colorbar(contourf1, ax=ax1, orientation='vertical')
                cbar1.set_label('Probability of being a dog')
    
            ax2.plot(np.arange(0,e+1),loss_acc_list)
            ax2.set_xlim([0,epochs])
            ax2.set_xlabel('epochs')
            ax2.set_ylabel('loss')

            ## add frame for video creation
            buf = BytesIO()
            fig.savefig(buf, format="png", dpi=100)
            
            buf.seek(0)
            frame = imageio.imread(buf) 
            writer.append_data(frame) 

writer.close()
plt.close()

display(Video(data=video_filename, embed=True))
os.remove(video_filename)
```
"""    
))

In [37]:
## to display learning functions
video_filename = "/tmp/aux4.mp4"
writer = imageio.get_writer(video_filename, format="FFMPEG", mode="I", fps=1, codec="libx264")

fig, (ax1, ax2) = plt.subplots(1,2, figsize = (10,5))

loss_acc_list = []

N_points_domain = 100
thr_prob = 0.5 # use to plot our classification guess
x1, x2 = np.meshgrid(np.linspace(-2,7,N_points_domain, dtype = np.float32),np.linspace(-2,7,N_points_domain, dtype = np.float32))

# reshape for neural network
x_range = torch.from_numpy(
    np.hstack((np.reshape(x1, (N_points_domain**2,1)),np.reshape(x2, (N_points_domain**2,1))))
).float()

# allocate memory to plot decision thresholds
y_range_plot = np.zeros((N_points_domain,N_points_domain), np.float32)

# class colors
color_c0 = 'C0'
color_c1 = 'C1'

## ======= DATA ==========
## create your data loader
r2to01nl_loader = torch.utils.data.DataLoader(
                                            dataset = R2to01NL(),
                                            batch_size = len(R2to01NL()), 
                                            shuffle=False,
                                            num_workers=1,
                                         )

## Initialization
model = FCModuleList(
    dim_in = 2, 
    dim_out = 1, 
    neurons_hidden = [10,10], 
    hidden_activations = [torch.relu, torch.relu], 
    link_function = torch.sigmoid, 
    loss_function = nn.BCEWithLogitsLoss()
)
# move model to the desired device
device = 'cpu'
model.to(device) 
model.float()

## optimization hyperparameters
epochs = 100
optimizer = torch.optim.SGD(model.parameters(), lr = 0.1, momentum = 0.9)

for e in range(epochs):
    loss_acc = 0.0
    for x, t in r2to01nl_loader:
        
        ## move to your computing platform
        x, t = x.to(device), t.to(device)
        
        ## 2.forward
        y = model(x, apply_link = False)
        L = model.compute_loss(t,y)
        
        # 3. Backward
        L.backward()
        loss_acc += L.item()
        
        # 4 Parameter update
        optimizer.step()
        
        # 5 Zero grad
        optimizer.zero_grad()

        print(f"On epoch {e} got loss {loss_acc}", end = "\r")
        
        with torch.no_grad():
            if e == 0:
                idx_class0 = (t == 0).squeeze()
                idx_class1 = (t == 1).squeeze()
                
            loss_acc_list.append(loss_acc)
            
            ## forward to plot decision thresholds
            y_range = model(x_range.to(device), apply_link = True)

            # reshape back to the grid and convert to numpy for plotting
            y_range = y_range.reshape(N_points_domain, N_points_domain).cpu().numpy()
            
            # start plotting
            ax1.cla()
            ax2.cla()
    
            # plot dataset    
            ax1.plot(x[idx_class0][:,0],x[idx_class0][:,1],'o', color = color_c0, markersize = 8, label = r'observations class 0 (cat)')
            ax1.plot(x[idx_class1][:,0],x[idx_class1][:,1],'*', color = color_c1,markersize = 8, label = r'observations class 1 (dog)')
            ax1.set_xlabel('weight ($x_1$)')
            ax1.set_ylabel('height ($x_2$)')
    
            ## plot prediction probability for class 1 and 0
            idx_range1 = y_range > thr_prob
            idx_range0 = ~idx_range1
    
            y_range_plot[idx_range1] = y_range[idx_range1]
            y_range_plot[idx_range0] = np.nan
    
            contourf1 = ax1.contourf(x1, x2, y_range_plot, levels = [0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1], cmap = "Oranges", alpha = 0.5)
    
            y_range_plot[idx_range0] = y_range[idx_range0]
            y_range_plot[idx_range1] = np.nan
    
            contourf2 = ax1.contourf(x1, x2, y_range_plot, levels = [0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1], cmap = "Blues", alpha = 0.5)
    
            # decision threshold
            contour1 = ax1.contour(x1, x2, y_range, levels = [thr_prob], colors = ["k"], linewidths = 4)
            ax1.clabel(contour1, inline=True, fontsize=8, fmt="%.2f")
    
            ## set legend
            ax1.legend()
    
            ## set contour bar level
            if e == 0:
                cbar2 = fig.colorbar(contourf2, ax=ax1, orientation='vertical')
                cbar2.set_label('Probability of being a cat')
    
                #cbar1 = fig.add_axes([0.92, 0.15, 0.02, 0.7])  # adjust the position [left, bottom, width, height]
                cbar1 = fig.colorbar(contourf1, ax=ax1, orientation='vertical')
                cbar1.set_label('Probability of being a dog')
    
            ax2.plot(np.arange(0,e+1),loss_acc_list)
            ax2.set_xlim([0,epochs])
            ax2.set_xlabel('epochs')
            ax2.set_ylabel('loss')

            ## add frame for video creation
            buf = BytesIO()
            fig.savefig(buf, format="png", dpi=100)
            
            buf.seek(0)
            frame = imageio.imread(buf) 
            writer.append_data(frame) 

writer.close()
plt.close()

On epoch 0 got loss 0.73414146900177



On epoch 99 got loss 0.0005788892158307135

In [38]:
display(Video(data=video_filename, embed=True))
os.remove(video_filename)