# Topology of Deep Neural Networks

This notebook shows how to use gdeep to reproduce the experiments of the paper [Topology of Deep Neural Networks](https://arxiv.org/pdf/2004.06093.pdf) by Naitzat et al. In this work, the authors study how the topology of a dataset evolves as it is embedded in the successive layers of a neural network trained to classify that dataset.

Their main findings can be summarized as follows:

- Neural networks tend to simplify the topology of the dataset across layers.

- This decrease in topological complexity is faster when the activation function is not a homeomorphism, as is the case for ReLU. By contrast, tanh and leaky ReLU (with a nonzero negative slope) are homeomorphisms.
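
To make the second point concrete: a homeomorphism must be injective, and ReLU is not, because it sends every negative input to 0. The following minimal sketch is plain Python and not part of gdeep; the helper names `relu` and `is_injective_on` are illustrative. It checks injectivity on a few sample values and contrasts ReLU with tanh, which is strictly increasing.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def is_injective_on(f, samples) -> bool:
    """Return True if f takes pairwise distinct values on the given samples."""
    values = [f(x) for x in samples]
    return len(set(values)) == len(values)


samples = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

# ReLU collapses all negative samples to 0, so it cannot be a homeomorphism.
print(is_injective_on(relu, samples))       # False
# tanh is strictly increasing, hence injective on these samples.
print(is_injective_on(math.tanh, samples))  # True
```

Collapsing a whole region of the input space to a single value is what allows a ReLU layer to merge or fill in topological features in one step.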

Here is an illustration from the paper:

![illustration](/notebook_images/tda_dl/intro.png)

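The main steps of this tutorial will be as follows:

1. Initialise the tensorboard writer.
2. Import the Entangled Tori dataset and prepare the dataloaders.
3. Define models with different activation functions.
4. Train the models.
5. Visualise the dataset and the persistence diagrams of the activations.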


In [44]:
%reload_ext autoreload
%autoreload 2

# deep learning
import torch
from torch.optim import Adam, SGD
import numpy as np
from torch import nn
from torch import autograd  

#gdeep
from gdeep.data.datasets import DatasetBuilder, DataLoaderBuilder
from gdeep.models import FFNet
from gdeep.visualisation import persistence_diagrams_of_activations
from gdeep.data.preprocessors import ToTensorImage
from gdeep.trainer import Trainer
from gdeep.search import Benchmark



# plot
import plotly.express as px
import pandas as pd
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

# ML
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# TDA
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_diagram

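# Tensorboard
# Workaround often needed when both tensorflow and tensorboard are installed:
# point tf.io.gfile at the tensorboard stub so that the writer can store embeddings
import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile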


# Initialize the tensorboard writer

To analyse the results of your models, you need to start tensorboard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

Then go [here](http://localhost:6006/) after training to see all the visualisation results.



# Import the Entangled Tori dataset and prepare the dataloaders

In [46]:
db = DatasetBuilder(name="EntangledTori")
ds_tr, ds_val, ds_ts = db.build()
dl_tr, dl_val, dl_ts = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build()
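
For intuition about what such a dataset looks like, here is a minimal sketch that samples two linked tori in 3D with the standard library only. It is not gdeep's generator: the function `sample_torus`, the radii, the centre offset, and the 0/1 labels are all assumptions chosen for illustration. One torus lies in the xy-plane, the other in the xz-plane and passes through the hole of the first.

```python
import math
import random


def sample_torus(n, R=2.0, r=0.5, center=(0.0, 0.0, 0.0), flip=False, seed=0):
    """Sample n points on a torus surface with major radius R and minor radius r.

    If flip is True, the torus lies in the xz-plane instead of the xy-plane.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)  # angle around the main circle
        phi = rng.uniform(0.0, 2.0 * math.pi)    # angle around the tube
        x = (R + r * math.cos(phi)) * math.cos(theta)
        y = (R + r * math.cos(phi)) * math.sin(theta)
        z = r * math.sin(phi)
        if flip:
            y, z = z, y  # swap axes so the torus lies in the xz-plane
        points.append((x + center[0], y + center[1], z + center[2]))
    return points


# Two linked tori: the second passes through the hole of the first.
torus_a = sample_torus(200, seed=0)
torus_b = sample_torus(200, center=(2.0, 0.0, 0.0), flip=True, seed=1)

# Hypothetical labelling: class 0 for the first torus, class 1 for the second.
dataset = [(p, 0) for p in torus_a] + [(p, 1) for p in torus_b]
print(len(dataset), dataset[0])
```

Each sample has three coordinates and one of two labels, which is consistent with the input and output widths of the architecture used in this notebook.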

## Define models with different activation functions

In [47]:
import torch.nn.functional as F

# 3 input coordinates, four hidden layers of width 10, 2 output classes
architecture = [3, 10, 10, 10, 10, 2]
loss_function = nn.CrossEntropyLoss()
activation_string = ["relu", "leakyrelu", "tanh", "sigmoid"]
# F.tanh and F.sigmoid emit deprecation warnings; torch.tanh and torch.sigmoid are the suggested replacements
activation_functions = [F.relu, F.leaky_relu, F.tanh, F.sigmoid]
models = []
writers = []
trainers = []
# build one model, one tensorboard writer and one trainer per activation function
for name, activation in zip(activation_string, activation_functions):
    model_temp = FFNet(arch=architecture, activation=activation)
    writer_temp = SummaryWriter(log_dir="runs/" + model_temp.__class__.__name__ + name)
    trainer_temp = Trainer(model_temp, [dl_tr, dl_ts], loss_function, writer_temp)
    models.append(model_temp)
    writers.append(writer_temp)
    trainers.append(trainer_temp)
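
The `architecture` list can be read as a sequence of layer widths. The sketch below is plain Python with illustrative helper names. It assumes that `FFNet` builds one fully connected layer with bias per consecutive pair of widths, which agrees with the `FFNet` summary printed later in this notebook. It lists the layer shapes and counts the trainable parameters.

```python
def layer_shapes(arch):
    """Turn a list of layer widths into (in_features, out_features) pairs."""
    return list(zip(arch[:-1], arch[1:]))


def count_parameters(arch):
    """Count the weights and biases of fully connected layers with the given widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes(arch))


architecture = [3, 10, 10, 10, 10, 2]
print(layer_shapes(architecture))      # [(3, 10), (10, 10), (10, 10), (10, 10), (10, 2)]
print(count_parameters(architecture))  # 392
```

Each of these layers yields one embedding of the dataset, and it is the topology of these successive embeddings that the paper tracks.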








In [48]:
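for pipe in trainers:
    pipe.train(
        Adam,                # optimizer class
        3,                   # number of epochs
        False,               # no cross-validation
        {"lr": 0.01},        # optimizer parameters
        {"batch_size": 32},  # dataloader parameters
    )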

Epoch 1
-------------------------------
Epoch training loss: 0.687144 	Epoch training accuracy: 55.23%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 59.38%,                 Avg loss: 0.668257 

Epoch 2
-------------------------------
Batch training loss:  0.6569196581840515  	Batch training accuracy:  56.25  	[ 2 / 40 ]                     


Cannot store data in the PR curve



Epoch training loss: 0.670238 	Epoch training accuracy: 57.58%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 57.19%,                 Avg loss: 0.653777 

Epoch 3
-------------------------------
Epoch training loss: 0.647436 	Epoch training accuracy: 58.52%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 58.44%,                 Avg loss: 0.626329 

Epoch 1
-------------------------------
Epoch training loss: 0.687558 	Epoch training accuracy: 53.28%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 60.94%,                 Avg loss: 0.666075 

Epoch 2
-------------------------------
Epoch training loss: 0.660901 	Epoch training accuracy: 57.58%                                                
Time taken


nn.functional.tanh is deprecated. Use torch.tanh instead.



Epoch training loss: 0.684465 	Epoch training accuracy: 58.28%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 56.25%,                 Avg loss: 0.670698 

Epoch 2
-------------------------------
Epoch training loss: 0.654094 	Epoch training accuracy: 59.84%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 61.25%,                 Avg loss: 0.636051 

Epoch 3
-------------------------------
Epoch training loss: 0.602042 	Epoch training accuracy: 65.31%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 65.94%,                 Avg loss: 0.588885 

Epoch 1
-------------------------------
Batch training loss:  0.7146308693018827  	Batch training accuracy:  34.375  	[ 11 / 40 ]                     


nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.



Epoch training loss: 0.700545 	Epoch training accuracy: 49.69%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 43.75%,                 Avg loss: 0.709261 

Epoch 2
-------------------------------
Epoch training loss: 0.692876 	Epoch training accuracy: 52.50%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 56.25%,                 Avg loss: 0.683622 

Epoch 3
-------------------------------
Epoch training loss: 0.691344 	Epoch training accuracy: 53.28%                                                
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 59.06%,                 Avg loss: 0.687904 



In [49]:
from gdeep.analysis.interpretability import Interpreter
from gdeep.visualisation import Visualiser

vs = Visualiser(trainers[0]) 
vs.plot_3d_dataset()



In [43]:
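# rebuild the dataloaders with a large batch size (1600) so that a single batch can be passed to the visualiser
one_batch_dataset, _, _ = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build(
    [{"batch_size": 1600}, {"batch_size": 1600}, {"batch_size": 1600}]
)

# compute the persistence diagrams of the activations for each trained model
for pipe in trainers:
    vs = Visualiser(pipe)
    vs.plot_persistence_diagrams(next(iter(one_batch_dataset)))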


nn.functional.tanh is deprecated. Use torch.tanh instead.


nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.



In [45]:
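# register the models and the dataloaders; the list-of-dictionaries format below
# ({"name": ..., "model": ...} and {"name": ..., "dataloader": ...}) is an assumption about what Benchmark expects
models_dicts = [{"name": name, "model": model} for name, model in zip(activation_string, models)]
dataloaders_dicts = [{"name": "entangled_tori", "dataloader": (dl_tr, dl_val, dl_ts)}]

# initialise the benchmarking class; when no k-fold class is specified, KFold with 5 splits is used
bench = Benchmark(models_dicts, dataloaders_dicts, loss_function, writer)

# start the benchmarking
bench.start(SGD, 2, False, {"lr": 0.01}, {"batch_size": 32}, n_accumulated_grads=2)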


Epoch 1
-------------------------------
Epoch training loss: 0.684559 	Epoch training accuracy: 53.67%                                      ]                     
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 51.43%,                 Avg loss: 0.684506 

Epoch 2
-------------------------------
Epoch training loss: 0.667289 	Epoch training accuracy: 55.62%                                                           
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 56.14%,                 Avg loss: 0.668249 

Epoch 3
-------------------------------
Epoch training loss: 0.651055 	Epoch training accuracy: 58.23%                                                           
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 55.14%,                 Avg loss: 0.648084 

Epoch 4
-------------------------------
Epoch training loss: 0.623932 	Epoch training a
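
The benchmarking cell mentions K-fold splitting with 5 splits. As a reminder of what that means, here is a minimal sketch of contiguous, unshuffled K-fold index splitting in plain Python. The function name `k_fold_indices` is hypothetical and this is not the code gdeep runs internally.

```python
def k_fold_indices(n_samples, n_splits=5):
    """Yield (train_indices, validation_indices) for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # spread the remainder over the first `extra` folds
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


for train_idx, val_idx in k_fold_indices(12, n_splits=5):
    print(len(train_idx), val_idx)
```

Every sample is used for validation exactly once, and each model is trained once per fold on the remaining samples.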

In [50]:
# train NN
model = FFNet(arch=[3,10,10,10,10,2])
print(model)
pipe = Trainer(model, (dl_tr, dl_ts), nn.CrossEntropyLoss(), writer)
pipe.train(Adam, 100, False, {"lr":0.01}, {"batch_size":50})

FFNet(
  (linears): ModuleList(
    (0): Linear(in_features=3, out_features=10, bias=True)
    (1): Linear(in_features=10, out_features=10, bias=True)
    (2): Linear(in_features=10, out_features=10, bias=True)
    (3): Linear(in_features=10, out_features=10, bias=True)
    (4): Linear(in_features=10, out_features=2, bias=True)
  )
)
Epoch 1
-------------------------------
Epoch training loss: 0.674135 	Epoch training accuracy: 54.95%                                                           
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 59.86%,                 Avg loss: 0.646908 

Epoch 2
-------------------------------
Epoch training loss: 0.643848 	Epoch training accuracy: 57.15%                                                           
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 62.86%,                 Avg loss: 0.607025 

Epoch 3
-------------------------------
Epoch training l

(0.2513222247362137, 90.85714285714286)

In [51]:
from gdeep.analysis.interpretability import Interpreter
from gdeep.visualisation import Visualiser

vs = Visualiser(pipe)
one_batch_dataset, _, _ = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build(
    [{"batch_size": 1600}, {"batch_size": 1600}, {"batch_size": 1600}]
)

# the diagrams can be seen on tensorboard!
vs.plot_persistence_diagrams(next(iter(one_batch_dataset)))
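
To get a feeling for what a persistence diagram records, the sketch below computes the 0-dimensional part (connected components) of a Vietoris-Rips filtration by hand, using only the standard library. It is an illustration and not what gdeep or giotto-tda run internally; the function name `h0_death_times` is hypothetical. Every point is born at scale 0, and a component dies at the length of the edge that first merges it into another one. These lengths are exactly the edge lengths of a minimum spanning tree, found here with Kruskal's algorithm.

```python
import math


def h0_death_times(points):
    """Return the finite death times of 0-dimensional Vietoris-Rips classes, in increasing order."""
    n = len(points)
    # all pairwise edges, sorted by length
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite deaths; one component never dies


# Two clusters on a line, separated by a gap of 10
cloud = [(0.0,), (1.0,), (2.0,), (12.0,), (13.0,)]
print(h0_death_times(cloud))  # [1.0, 1.0, 1.0, 10.0]
```

The single large death time reflects the two well-separated clusters. A layer that merges the two classes into one component would remove this long-lived point from the diagram.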


In [52]:
vs.plot_3d_dataset()