# Topology of Deep Neural Networks

This notebook shows how easy it is to use gdeep to reproduce the experiments of the paper [Topology of Deep Neural Networks](https://arxiv.org/pdf/2004.06093.pdf) by Naizat et al. In this work, the authors study how the topology of a dataset evolves as it is mapped through the successive layers of a neural network trained to classify it.

Their main findings can be summarized as follows:

- Neural networks tend to simplify the topology of the dataset across layers.

- This decrease in topological complexity is faster when the activation function is non-homeomorphic, as is the case for ReLU, than when it is a homeomorphism onto its image, such as tanh.
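To see why the type of activation matters, recall that a homeomorphism must in particular be injective. The minimal NumPy check below is our own illustration, not code from gdeep or from the paper, and the helpers `relu` and `sigmoid` are defined locally. It shows ReLU gluing distinct inputs onto 0, while tanh and sigmoid, being strictly increasing, keep distinct inputs distinct.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# ReLU sends every non-positive input to 0: distinct points are glued together,
# so the map is not injective and therefore not a homeomorphism.
print(relu(x))  # [0. 0. 0. 1. 2.]

# tanh and sigmoid are strictly increasing, hence injective with a continuous
# inverse on their image: distinct inputs stay distinct.
print(np.all(np.diff(np.tanh(x)) > 0), np.all(np.diff(sigmoid(x)) > 0))  # True True
```

Gluing points together is what allows a layer to change the topology of the point cloud it receives.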

Here is an illustration from the paper:

![img](./images/topology_accross_layers.png)

The main steps of this tutorial will be as follows:

1. Create the Entangled Tori dataset.
2. Build several fully connected networks, with different activation functions.
3. Train these networks to classify the Entangled Tori dataset.
4. Visualise in TensorBoard the topology (Betti numbers and persistence diagrams) of the dataset embedded in each layer of each network.
5. Study the decrease in topological complexity of the dataset across layers.



## Import relevant libraries

```python
%reload_ext autoreload
%autoreload 2

# deep learning
import torch
from torch.optim import Adam
from torch import nn

# gdeep
from gdeep.data.datasets import DatasetBuilder, DataLoaderBuilder
from gdeep.models import FFNet
from gdeep.visualization import persistence_diagrams_of_activations
from gdeep.data.preprocessors import ToTensorImage
from gdeep.trainer import Trainer
from gdeep.search import Benchmark
from gdeep.search import GiottoSummaryWriter

# ML
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# TDA
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_diagram

# TensorBoard
import tensorboard as tb
```



## Initialize the TensorBoard writer

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move to the `/examples` folder and run the following command:

```
tensorboard --logdir=runs
```

After training, open [TensorBoard](http://localhost:6006/) to see all the visualization results.


```python
writer = GiottoSummaryWriter()
```

## Generate the Entangled Tori dataset and prepare the dataloaders

![img](./images/entangled_tori.png)


```python
from torch.utils.data import RandomSampler

# Build the training, validation and test splits of the Entangled Tori dataset
db = DatasetBuilder(name="EntangledTori")
ds_tr, ds_val, ds_ts = db.build(n_pts=50)

# Each split gets a random sampler over its own dataset
dl_tr, dl_val, dl_ts = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build(
    [{"batch_size": 100, "sampler": RandomSampler(ds_tr)},
     {"batch_size": 100, "sampler": RandomSampler(ds_val)},
     {"batch_size": 100, "sampler": RandomSampler(ds_ts)}]
)
```
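To get a feel for the geometry of this dataset, the sketch below samples two linked tori in 3D with NumPy and gives each torus its own class label. The helpers `sample_torus` and `make_entangled_tori`, the radii and the placement are all our own assumptions for illustration; gdeep's `EntangledTori` generator may differ, and the sketch does not reproduce the meaning of its `n_pts` argument.

```python
import numpy as np


def sample_torus(n_pts, major=1.0, minor=0.3, rng=None):
    """Sample n_pts random points on a torus lying in the xy-plane, centred at 0."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0, 2 * np.pi, n_pts)  # angle around the central axis
    phi = rng.uniform(0, 2 * np.pi, n_pts)    # angle around the tube
    x = (major + minor * np.cos(phi)) * np.cos(theta)
    y = (major + minor * np.cos(phi)) * np.sin(theta)
    z = minor * np.sin(phi)
    return np.stack([x, y, z], axis=1)


def make_entangled_tori(n_pts, major=1.0, minor=0.3, seed=0):
    """Hypothetical generator: two linked tori labelled 0 and 1."""
    rng = np.random.default_rng(seed)
    torus_0 = sample_torus(n_pts, major, minor, rng)
    # Swap the y and z coordinates to lay a second torus in the xz-plane, then shift
    # it by the major radius along x so each core circle passes through the other's centre.
    torus_1 = sample_torus(n_pts, major, minor, rng)[:, [0, 2, 1]]
    torus_1[:, 0] += major
    points = np.concatenate([torus_0, torus_1])
    labels = np.concatenate([np.zeros(n_pts, dtype=int), np.ones(n_pts, dtype=int)])
    return points, labels


points, labels = make_entangled_tori(n_pts=500)
print(points.shape, labels.shape)  # (1000, 3) (1000,)
```

With a major radius of 1 the two core circles stay at distance at least 1 from each other, so with a minor radius of 0.3 the tori interlock without touching: neither class can be separated from the other by a plane.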

## Define models with different activation functions

```python
import torch.nn.functional as F

# Choose the architecture of the fully connected network (layer widths):
# 3 input coordinates, three hidden layers of 5 neurons, 2 output classes
architecture = [3, 5, 5, 5, 2]
# Choose the loss function for training
loss_function = nn.CrossEntropyLoss()
# Choose the set of activation functions to equip the neural network with
activation_string = ["relu", "leakyrelu", "tanh", "sigmoid"]
activation_functions = [F.relu, F.leaky_relu, torch.tanh, torch.sigmoid]
```
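To see what a width list such as `[3, 5, 5, 5, 2]` means in plain PyTorch, here is a hypothetical `SimpleFFNet` that stacks one `nn.Linear` layer per consecutive pair of widths and applies the chosen activation between them. We assume gdeep's `FFNet` is built along similar lines, but its exact details (for instance whether an activation is applied after the last layer) may differ.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SimpleFFNet(nn.Module):
    """Hypothetical stand-in for a fully connected network given by layer widths."""

    def __init__(self, arch, activation=F.relu):
        super().__init__()
        # One linear layer per consecutive pair of widths, e.g. 3->5, 5->5, 5->5, 5->2
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(arch[:-1], arch[1:])
        )
        self.activation = activation

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = self.activation(layer(x))
        # The last layer returns raw logits: nn.CrossEntropyLoss applies log-softmax itself
        return self.layers[-1](x)


model_sketch = SimpleFFNet([3, 5, 5, 5, 2], activation=torch.tanh)
print(model_sketch(torch.randn(4, 3)).shape)                 # torch.Size([4, 2])
print(sum(p.numel() for p in model_sketch.parameters()))    # 92
```

Such a small network (92 trainable parameters) makes it cheap to compute the topology of the data at every layer.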

```python
# Define one model, one TensorBoard writer and one trainer per activation function
models = []
writers = []
trainers = []
for name, activation in zip(activation_string, activation_functions):
    model = FFNet(arch=architecture, activation=activation)
    # Log each model to its own run folder, e.g. runs/FFNetrelu
    writer_model = GiottoSummaryWriter(log_dir="runs/" + model.__class__.__name__ + name)
    trainer = Trainer(model, [dl_tr, dl_ts], loss_function, writer_model)
    models.append(model)
    writers.append(writer_model)
    trainers.append(trainer)
```

## Let's train our models!

You can monitor the training on the TensorBoard page.

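```python
for pipe in trainers:
    pipe.train(
        Adam,                  # optimizer class
        7,                     # number of training epochs
        False,                 # no cross-validation
        {"lr": 0.01},          # optimizer parameters
        {"batch_size": 200},   # dataloader parameters
    )
```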
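Conceptually, each call to `pipe.train` runs a standard training loop. The self-contained sketch below shows such a loop in plain PyTorch with the same hyperparameters (Adam, learning rate 0.01, 7 epochs, batches of 200) on hypothetical toy data. It is an assumption about what happens inside, not gdeep's code: gdeep's `Trainer` also handles TensorBoard logging and evaluation, and its internals may differ.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Hypothetical toy data: 3D points labelled by the sign of their first coordinate
X_toy = torch.randn(1000, 3)
y_toy = (X_toy[:, 0] > 0).long()
toy_loader = DataLoader(TensorDataset(X_toy, y_toy), batch_size=200, shuffle=True)

toy_model = nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 2))
toy_loss = nn.CrossEntropyLoss()
optimizer = Adam(toy_model.parameters(), lr=0.01)  # optimizer class + {"lr": 0.01}

epoch_losses = []
for epoch in range(7):  # 7 epochs
    total = 0.0
    for xb, yb in toy_loader:
        optimizer.zero_grad()
        loss = toy_loss(toy_model(xb), yb)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)
    # Average training loss over the epoch
    epoch_losses.append(total / len(X_toy))
print(epoch_losses)
```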

## For each model, let's plot the topology of the dataset embedded in each layer of the network

We start with the Betti curves, i.e. the Betti numbers as a function of the layer. We draw a random subset of `batch_size` points from the dataset and, for each layer of the network, compute the Betti numbers (here in homology dimensions 0 and 1) of the Vietoris-Rips complex at scale `filtration_value` built on the subset embedded in that layer. The results are plotted in TensorBoard.

```python
from gdeep.visualization import Visualiser

# Choose the size of the subset of the dataset
batch_size = 500

# Extract a random subset of the training set for plotting
plot_loader, _, _ = DataLoaderBuilder((ds_tr,)).build(
    [{"batch_size": batch_size, "sampler": RandomSampler(ds_tr)}])

batch_for_plotting = next(iter(plot_loader))

# For each model, plot the Betti numbers of the subset embedded in each layer
for pipe in trainers:
    vs = Visualiser(pipe)
    vs.plot_betti_numbers_layers(
        homology_dimensions=[0, 1],
        batch=batch_for_plotting,
        filtration_value=0.5,
    )
    del vs
```
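To make the plotted quantity concrete, here is a brute-force computation of the Betti numbers b0 (connected components) and b1 (independent loops) of a Vietoris-Rips complex at a fixed scale, in plain NumPy with coefficients in Z/2. It is our own sketch, not gdeep's implementation: we assume an edge joins two points whenever their distance is at most `eps` and a triangle is added when all three of its edges are present, whereas gdeep's `filtration_value` may follow a different convention (for instance a radius). `gf2_rank` and `vietoris_rips_betti` are hypothetical helpers.

```python
import itertools
import numpy as np


def gf2_rank(matrix):
    """Rank of a 0/1 matrix over Z/2, by Gaussian elimination with XOR."""
    m = (np.asarray(matrix) % 2).astype(np.uint8)
    n_rows, n_cols = m.shape
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot_rows = np.nonzero(m[rank:, col])[0]
        if pivot_rows.size == 0:
            continue
        pivot = rank + pivot_rows[0]
        m[[rank, pivot]] = m[[pivot, rank]]  # move the pivot row into place
        for row in np.nonzero(m[:, col])[0]:
            if row != rank:
                m[row] ^= m[rank]  # addition mod 2
        rank += 1
    return rank


def vietoris_rips_betti(points, eps):
    """(b0, b1) of the Vietoris-Rips complex at scale eps (small point clouds only)."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2) if dists[i, j] <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(n), 3)
        if (i, j) in edge_index and (i, k) in edge_index and (j, k) in edge_index
    ]
    d1 = np.zeros((n, len(edges)), dtype=np.uint8)  # boundary of an edge: its 2 vertices
    for col, (i, j) in enumerate(edges):
        d1[i, col] = d1[j, col] = 1
    d2 = np.zeros((len(edges), len(triangles)), dtype=np.uint8)  # triangle -> 3 edges
    for col, (i, j, k) in enumerate(triangles):
        for e in ((i, j), (i, k), (j, k)):
            d2[edge_index[e], col] = 1
    rank_d1, rank_d2 = gf2_rank(d1), gf2_rank(d2)
    # b0 = dim C0 - rank d1 ; b1 = dim ker d1 - rank d2
    return n - rank_d1, len(edges) - rank_d1 - rank_d2


angles = np.linspace(0, 2 * np.pi, 12, endpoint=False)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
for eps in (0.3, 0.6, 2.1):
    print(eps, vietoris_rips_betti(circle, eps))
# 0.3 (12, 0)   twelve isolated points
# 0.6 (1, 1)    one component with one loop
# 2.1 (1, 0)    the loop is filled in
```

The same point cloud can therefore have very different Betti numbers depending on the scale, which is why the filtration value has to be fixed when comparing layers.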


If you have a bit more time, you can also compute the persistence diagrams of the subset embedded in each layer and plot them in TensorBoard. The computation may take a few minutes.

```python
# For each model, plot the persistence diagrams of the subset embedded in each layer
for pipe in trainers:
    vs = Visualiser(pipe)
    vs.plot_persistence_diagrams(batch_for_plotting, k=0)
    del vs
```
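As an aid for reading the 0-dimensional diagrams, the sketch below computes the finite H0 bars of the Vietoris-Rips filtration directly: every point is born at scale 0, and a component dies at the length of the edge that first merges it into another one, which is Kruskal's minimum-spanning-tree procedure. `h0_persistence` is a hypothetical helper, not part of gdeep or giotto-tda; giotto-tda's `VietorisRipsPersistence` computes such diagrams, and higher-dimensional ones, far more efficiently.

```python
import itertools
import numpy as np


def h0_persistence(points):
    """Finite H0 bars (birth, death) of the Vietoris-Rips filtration of a point cloud."""
    n = len(points)
    parent = list(range(n))  # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges sorted by length: the scale at which each edge enters the complex
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge merges two components: one of them dies
            parent[root_i] = root_j
            bars.append((0.0, length))
    # One component never dies; its infinite bar is left out here
    return bars


two_clusters = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], dtype=float)
print(h0_persistence(two_clusters))  # [(0.0, 1.0), (0.0, 1.0), (0.0, 10.0)]
```

The two tight pairs of points give two short bars, while the gap between the pairs gives one long bar: long bars correspond to features that persist across many scales.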


## Conclusion

As the TensorBoard plots show, the neural networks tend to simplify the topology of the dataset across layers in order to perform classification. This simple observation highlights the importance of understanding, from a topological point of view, the operations performed by deep learning models.