# Topology of Deep Neural Networks

This notebook shows how to use gdeep to reproduce the experiments of the paper [Topology of Deep Neural Networks](https://arxiv.org/pdf/2004.06093.pdf) by Naitzat et al. In this work, the authors study how the topology of a dataset evolves as it is embedded in the successive layers of a neural network trained to classify that dataset.

Their main findings can be summarized as follows:

- Neural networks tend to simplify the topology of the dataset across layers.

- This decrease in topological complexity is more efficient when the activation function is non-homeomorphic, as is the case for ReLU, than when it is a homeomorphism, as for tanh.
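To make "non-homeomorphic" concrete: a homeomorphism has to be injective. Applied coordinate-wise, tanh is strictly increasing and never merges two points, whereas ReLU sends every negative coordinate to 0 and can glue whole regions together. The NumPy sketch below is only an illustration and is not taken from the paper; `count_distinct` is a hypothetical helper. It pushes 200 points of the unit circle through both functions and counts how many distinct points survive.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def count_distinct(points, decimals=9):
    """Number of distinct rows after rounding away floating-point noise."""
    rounded = np.round(points, decimals) + 0.0  # "+ 0.0" turns -0.0 into 0.0
    return len(np.unique(rounded, axis=0))

# 200 points on the unit circle, a shape with a single 1-dimensional hole
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])

n_tanh = count_distinct(np.tanh(circle))  # tanh keeps all points apart
n_relu = count_distinct(relu(circle))     # ReLU collapses the quadrant x < 0, y < 0
print(n_tanh, n_relu)
```

Every point with two negative coordinates lands on the origin under ReLU, so the circle is no longer a circle after the map. This is the kind of change in topology that an injective activation cannot produce on its own.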

Here is an illustration from the paper:

![img](./images/topology_accross_layers.png)

The main steps of this tutorial will be as follows:

1. Create the Entangled Tori dataset.
2. Build several fully connected networks, with different activation functions.
3. Train these networks to classify the Entangled Tori dataset.
4. Visualise in TensorBoard the persistence diagrams of the dataset embedded in each layer of each network.
5. Study the decrease in topological complexity of the dataset across layers.



## Import relevant libraries

In [1]:
%reload_ext autoreload
%autoreload 2

# deep learning
import torch
from torch.optim import Adam, SGD
import numpy as np
from torch import nn
from torch import autograd  

#gdeep
from gdeep.data.datasets import DatasetBuilder, DataLoaderBuilder
from gdeep.models import FFNet
from gdeep.visualisation import persistence_diagrams_of_activations
from gdeep.data.preprocessors import ToTensorImage
from gdeep.trainer import Trainer
from gdeep.search import Benchmark
from gdeep.search import GiottoSummaryWriter



# plot
import plotly.express as px
import pandas as pd


writer = GiottoSummaryWriter()

# ML
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# TDA
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_diagram

#Tensorboard

import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile


No TPUs...


# Initialize the tensorboard writer

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

Then go [here](http://localhost:6006/) after the training to see all the visualisation results.


# Generate the Entangled Tori dataset and prepare the dataloaders

![img](./images/entangled_tori.png)


In [2]:
from torch.utils.data import  RandomSampler
db = DatasetBuilder(name="EntangledTori")
ds_tr, ds_val, ds_ts = db.build(n_pts=50)
# Each sampler draws its indices from the dataset that its own dataloader wraps
dl_tr, dl_val, dl_ts = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build(
    [{"batch_size": 100, "sampler": RandomSampler(ds_tr)},
     {"batch_size": 100, "sampler": RandomSampler(ds_val)},
     {"batch_size": 100, "sampler": RandomSampler(ds_ts)}]
)
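For intuition about what such a dataset looks like, here is a minimal NumPy sketch that samples two interlocked tori and labels each point with the torus it belongs to. It is not gdeep's implementation: the function names, the radii `R` and `r`, and the way `n_pts` is interpreted (points per angle) are all assumptions made for illustration.

```python
import numpy as np

def sample_torus(n_pts, R=2.0, r=0.5):
    """Return an (n_pts**2, 3) grid of points on a torus centred at the origin."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_pts, endpoint=False)
    t, p = np.meshgrid(theta, theta)  # t: angle around the tube, p: angle around the ring
    x = (R + r * np.cos(t)) * np.cos(p)
    y = (R + r * np.cos(t)) * np.sin(p)
    z = r * np.sin(t)
    return np.column_stack([x.ravel(), y.ravel(), z.ravel()])

def linked_tori(n_pts, R=2.0, r=0.5):
    """Two interlocked tori with class labels 0 and 1."""
    first = sample_torus(n_pts, R, r)
    # Second torus: swap y and z so it lies in the xz-plane, then shift it by R
    # along x so that its core circle passes through the hole of the first one
    second = sample_torus(n_pts, R, r)[:, [0, 2, 1]] + np.array([R, 0.0, 0.0])
    points = np.vstack([first, second])
    labels = np.repeat([0, 1], n_pts ** 2)
    return points, labels

points, labels = linked_tori(n_pts=50)
print(points.shape, labels.shape)  # (5000, 3) (5000,)
```

The two classes cannot be separated by a plane, so a classifier has to untangle them layer by layer, which is what makes this family of datasets a good test bed for the paper's question.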

# Define models with different activation functions

In [3]:
import torch.nn.functional as F

# Choose the architecture of the fully connected network
architecture = [3,5,5,5,5,5,2]
# Choose the loss function for training
loss_function = nn.CrossEntropyLoss()
# Choose the set of activation functions to equip the neural network with
activation_string = ["relu", "leakyrelu", "tanh", "sigmoid"]
activation_functions = [F.relu, F.leaky_relu, torch.tanh, torch.sigmoid]
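`nn.CrossEntropyLoss` takes unnormalised scores (logits) and integer class labels, which is why `architecture` ends with two units, one per torus. As a rough guide to what it computes with its default mean reduction, here is a plain NumPy sketch; `cross_entropy` is an illustrative helper, not a PyTorch or gdeep function.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy of logits (n, n_classes) against integer labels (n,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to the correct class of each sample
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.0],   # fairly confident in class 0
                   [0.0, 0.0]])  # undecided between the two classes
targets = np.array([0, 1])
loss = cross_entropy(logits, targets)
print(loss)
```

A classifier that gives equal scores to both classes has a loss of ln 2 ≈ 0.693, which is a handy baseline when reading the training logs.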

In [4]:
# Define one model, one writer and one trainer per activation function
models = []
writers = []
trainers = []
for name, activation in zip(activation_string, activation_functions):
    model_temp = FFNet(arch=architecture, activation=activation)
    # Each run logs to its own folder: runs/<model class name><activation name>
    writer_temp = GiottoSummaryWriter(log_dir='runs/' + model_temp.__class__.__name__ + name)
    trainer_temp = Trainer(model_temp, [dl_tr, dl_ts], loss_function, writer_temp)
    models.append(model_temp)
    writers.append(writer_temp)
    trainers.append(trainer_temp)

# Let's train our models!

You can monitor the training on the TensorBoard page.

In [5]:
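for pipe in trainers:
    pipe.train(
        Adam,                 # optimizer class
        10,                   # number of epochs
        False,                # no cross-validation
        {"lr": 0.01},         # optimizer parameters
        {"batch_size": 200},  # dataloader parameters
    )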

Epoch 1
-------------------------------
Epoch training loss: 0.616096 	Epoch training accuracy: 62.94%                                                            
Time taken for this epoch: 1.00s
Learning rate value: 0.01000000



Cannot store data in the PR curve



Validation results: 
 accuracy: 71.81%,                 Avg loss: 0.551802 

Epoch 2
-------------------------------
Epoch training loss: 0.520029 	Epoch training accuracy: 72.37%                                                
Time taken for this epoch: 1.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 71.96%,                 Avg loss: 0.509592 

Epoch 3
-------------------------------
Epoch training loss: 0.508392 	Epoch training accuracy: 72.53%                                                
Time taken for this epoch: 1.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 72.91%,                 Avg loss: 0.498214 

Epoch 4
-------------------------------
Epoch training loss: 0.505045 	Epoch training accuracy: 72.77%                                                
Time taken for this epoch: 1.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 71.84%,                 Avg loss: 0.511900 

Epoch 5
-------------------------------
Epoc

# For each model, let's plot the persistence diagrams of the dataset embedded in each layer of the network

To see the persistence diagrams, go to the tensorboard page!

In [9]:
from gdeep.visualisation import Visualiser
batch_size = 1000
# Only the training dataloader is kept; one batch of 1000 points feeds the plots
one_batch_dataset, _, _ = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build(
    [{"batch_size": batch_size, "sampler": RandomSampler(ds_tr)},
     {"batch_size": batch_size, "sampler": RandomSampler(ds_val)},
     {"batch_size": batch_size, "sampler": RandomSampler(ds_ts)}])

for pipe in trainers:
    vs = Visualiser(pipe)
    vs.plot_betti_curves_layers(homology_dimensions = [1], 
        batch = next(iter(one_batch_dataset)), 
        k=0)
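The idea behind these plots is that a batch of points is pushed through the network and the output of every layer is kept as a point cloud of its own. The sketch below mimics this with NumPy on a randomly initialised (untrained) network with the same layer sizes as `architecture`. It is not how gdeep extracts activations; the helper name and the choice of leaving the output layer without a non-linearity are assumptions.

```python
import numpy as np

def forward_with_activations(x, weights, biases, activation):
    """Return the input followed by the output of every layer."""
    activations = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if i < len(weights) - 1:  # assumption: no non-linearity on the output layer
            x = activation(x)
        activations.append(x)
    return activations

rng = np.random.default_rng(0)
architecture = [3, 5, 5, 5, 5, 5, 2]
# Random weights stand in for a trained model
weights = [rng.normal(size=(m, n)) for m, n in zip(architecture[:-1], architecture[1:])]
biases = [np.zeros(n) for n in architecture[1:]]

batch = rng.normal(size=(1000, 3))
layers = forward_with_activations(batch, weights, biases, lambda z: np.maximum(z, 0.0))
print([a.shape for a in layers])
```

Each array in `layers` is the same batch seen from a different layer: 1000 points in dimension 3, then 5 (five times), then 2. Persistent homology is computed on each of these clouds separately, and comparing the results layer by layer shows how the topology changes through the network.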





In [None]:


for pipe in trainers:
    vs = Visualiser(pipe)
    vs.plot_persistence_diagrams(next(iter(one_batch_dataset)), k= 0)



one_batch_dataset, _, _ = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build(
    [{"batch_size": batch_size, "sampler": RandomSampler(ds_tr)},
     {"batch_size": batch_size, "sampler": RandomSampler(ds_val)},
     {"batch_size": batch_size, "sampler": RandomSampler(ds_ts)}])

for pipe in trainers:
    vs = Visualiser(pipe)
    vs.betti_plot_layers(homology_dimension = [0,1], batch = next(iter(one_batch_dataset)), k=0)
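A Betti curve records, for each scale, how many topological features of a given dimension are present in the point cloud. In dimension 0 these features are connected components, which can be counted by hand: link every pair of points closer than `epsilon` and count the resulting groups. The sketch below does this with a small union-find. It is only meant to build intuition; `betti_0` is a hypothetical helper, and higher dimensions need a proper persistent homology library such as giotto-tda, used elsewhere in this notebook.

```python
import numpy as np

def betti_0(points, epsilon):
    """Count connected components when points closer than epsilon are linked."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Merge the components of every pair (i < j) within distance epsilon
    for i, j in np.argwhere(np.triu(dists <= epsilon, k=1)).tolist():
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return len({find(i) for i in range(n)})

# Two well-separated clusters of 20 points each
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                   rng.normal(5.0, 0.1, size=(20, 2))])
betti_curve = [betti_0(cloud, eps) for eps in (0.0, 1.0, 10.0)]
print(betti_curve)
```

As the scale grows, the count goes from one component per point, to one per cluster, to a single component. A network that simplifies the topology of the data should show such counts dropping from layer to layer at a fixed scale.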


In [None]:
from gdeep.analysis.interpretability import Interpreter


vs = Visualiser(trainers[0]) 
vs.plot_3d_dataset()

In [None]:
from gtda.homology import VietorisRipsPersistence

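# `activations` is not defined in the cells above: it is assumed to hold the
# per-layer activation tensors, with the stacked entries sharing one shape
data = torch.stack(activations[:-2]).cpu().detach().numpy()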
VR = VietorisRipsPersistence()
VR.fit_transform_plot(data)

In [None]:
# pd.show()  # disabled: pandas has no module-level show() and this call raises AttributeError

In [None]:
ds_tr