# Topology of Deep Neural Networks

This notebook shows how easy it is to use gdeep to reproduce the experiments of the paper [Topology of Deep Neural Networks](https://arxiv.org/pdf/2004.06093.pdf) by Naizat et al. In this work, the authors study how the topology of a dataset evolves as it is embedded in the successive layers of a neural network trained to classify that dataset.

Their main findings can be summarized as follows:

- Neural networks tend to simplify the topology of the dataset across layers.

- This decrease in topological complexity is faster when the activation functions are non-homeomorphic, as is the case for ReLU, than when they are homeomorphic, as is the case for tanh and leaky ReLU.
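A homeomorphism of ℝ is a continuous bijection with a continuous inverse, and no homeomorphism can change Betti numbers. A quick numerical way to see why ReLU falls outside this class while tanh does not: ReLU sends every negative input to 0, so it is not even injective, whereas tanh is strictly increasing and is inverted by arctanh. The NumPy sketch below checks this on a grid; the helper `is_injective_on` is a hypothetical name introduced only for illustration, and a finite grid can only refute injectivity, never prove it.

```python
import numpy as np

def relu(x):
    """Elementwise ReLU: max(x, 0)."""
    return np.maximum(x, 0.0)

def is_injective_on(f, xs):
    """True if f sends the distinct grid points xs to distinct values (hypothetical helper)."""
    xs = np.unique(xs)
    return len(np.unique(f(xs))) == len(xs)

xs = np.linspace(-3.0, 3.0, 601)  # evenly spaced grid with negative and positive inputs

print("ReLU injective on grid:", is_injective_on(relu, xs))     # every x <= 0 collapses to 0
print("tanh injective on grid:", is_injective_on(np.tanh, xs))
print("arctanh inverts tanh:", np.allclose(np.arctanh(np.tanh(xs)), xs))
```

Collapsing a whole half-line to a single point is exactly the kind of operation that can merge components or fill in holes, which a homeomorphism cannot do.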

Here is an illustration from the paper:

![illustration](/notebook_images/tda_dl/intro.png)

The main steps of this tutorial will be as follows:

1. Import the Entangled Tori dataset.
2. Build several fully connected networks, with different activation functions.
3. Train these networks to classify the Entangled Tori dataset.
4. Visualise in TensorBoard the persistence diagrams of the dataset embedded in each layer of each network.
5. Study the decrease in topological complexity of the dataset across layers.
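Step 5 needs a way to measure topological complexity. The notebook relies on persistent homology for this (giotto-tda's `VietorisRipsPersistence` is imported in the first cell). As a much cruder but self-contained illustration, the number of connected components (the Betti number β₀) of the Vietoris-Rips complex at a single scale ε can be obtained from the graph joining every pair of points at distance at most ε, because H₀ only depends on that 1-skeleton. The function `betti0_at_scale` is hypothetical; unlike a persistence diagram it looks at one scale only and ignores higher Betti numbers such as the loops and cavities of a torus.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

def betti0_at_scale(points, eps):
    """Betti-0 of the Vietoris-Rips complex of `points` at scale `eps`.

    H0 only depends on the 1-skeleton, i.e. the graph with an edge between
    every pair of points at distance <= eps, so we count its components.
    """
    dist = squareform(pdist(points))   # (n, n) Euclidean distance matrix
    graph = csr_matrix(dist <= eps)    # adjacency matrix of the 1-skeleton
    n_components, _ = connected_components(graph, directed=False)
    return n_components

# Two well-separated Gaussian blobs in the plane
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2)),
    rng.normal(loc=(5.0, 0.0), scale=0.1, size=(50, 2)),
])

for eps in (0.0, 1.0, 10.0):
    print(f"eps={eps}: beta_0 = {betti0_at_scale(points, eps)}")
```

At ε = 0 every point is its own component, at ε = 1 each blob is connected, and at ε = 10 everything merges; a persistence diagram records all of these transitions at once.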



## Import the packages that will be needed for the notebook

In [1]:
%reload_ext autoreload
%autoreload 2

# deep learning
import torch
from torch.optim import Adam, SGD
import numpy as np
from torch import nn
from torch import autograd  

#gdeep
from gdeep.data.datasets import DatasetBuilder, DataLoaderBuilder
from gdeep.models import FFNet
from gdeep.visualisation import persistence_diagrams_of_activations
from gdeep.data.preprocessors import ToTensorImage
from gdeep.trainer import Trainer
from gdeep.search import Benchmark



# plot
import plotly.express as px
import pandas as pd
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

# ML
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# TDA
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_diagram

#Tensorboard

import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile


No TPUs...


## 1. Initialize the TensorBoard writer and import the Entangled Tori dataset

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

After training, open [http://localhost:6006/](http://localhost:6006/) to see all the visualisation results.


In [2]:
from torch.utils.data import RandomSampler

# build the train / validation / test splits of the Entangled Tori dataset
db = DatasetBuilder(name="EntangledTori")
ds_tr, ds_val, ds_ts = db.build(n_pts=50)

# each sampler must draw from its own split
dl_tr, dl_val, dl_ts = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build(
    [{"batch_size": 100, "sampler": RandomSampler(ds_tr)},
     {"batch_size": 100, "sampler": RandomSampler(ds_val)},
     {"batch_size": 100, "sampler": RandomSampler(ds_ts)}]
)
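`DatasetBuilder(name="EntangledTori")` hides how the point cloud is generated. To make the geometry concrete, here is a hypothetical NumPy generator for two linked tori (like two links of a chain), one per class. The radii, the sampling scheme and the names `sample_torus` and `entangled_tori` are assumptions for illustration only; gdeep's own generator, including what its `n_pts` argument controls and whether it adds noise, may differ.

```python
import numpy as np

def sample_torus(n, major=1.0, minor=0.3, rng=None):
    """Sample n points on a torus lying in the xy-plane, centred at the origin."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0, 2 * np.pi, n)  # angle around the central hole
    phi = rng.uniform(0, 2 * np.pi, n)    # angle around the tube
    x = (major + minor * np.cos(phi)) * np.cos(theta)
    y = (major + minor * np.cos(phi)) * np.sin(theta)
    z = minor * np.sin(phi)
    return np.stack([x, y, z], axis=1)

def entangled_tori(n_per_torus=500, seed=0):
    """Hypothetical generator: two linked tori labelled 0 and 1."""
    rng = np.random.default_rng(seed)
    torus_a = sample_torus(n_per_torus, rng=rng)
    # Swap y and z to move the second torus into the xz-plane, then shift it
    # by the major radius so that it passes through the hole of the first one.
    torus_b = sample_torus(n_per_torus, rng=rng)[:, [0, 2, 1]] + np.array([1.0, 0.0, 0.0])
    X = np.vstack([torus_a, torus_b]).astype(np.float32)
    y = np.concatenate([np.zeros(n_per_torus), np.ones(n_per_torus)]).astype(np.int64)
    return X, y

X, y = entangled_tori()
print(X.shape, y.shape)
```

With a major radius of 1 the two core circles stay exactly at distance 1 from each other, so tubes of radius 0.3 never touch. Plotting `X` coloured by `y`, for instance with `px.scatter_3d` from plotly express, shows the two interlocked rings.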

## 2. Choose the architecture and activation functions of the models

In [3]:
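import torch.nn.functional as F

# layer widths: 3 input coordinates, four hidden layers of width 5, 2 output classes
architecture = [3, 5, 5, 5, 5, 2]
loss_function = nn.CrossEntropyLoss()
activation_string = ["relu", "leakyrelu", "tanh", "sigmoid"]
# torch.tanh / torch.sigmoid replace the deprecated F.tanh / F.sigmoid
activation_functions = [F.relu, F.leaky_relu, torch.tanh, torch.sigmoid]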
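`FFNet` is used as a black box in the next cell. Judging from the module printout later in the notebook (a `ModuleList` called `linears`), a plausible minimal equivalent in plain PyTorch looks like the sketch below. That gdeep applies the activation after every hidden layer and returns raw logits from the last one is an assumption here, and the class name `SimpleFFNet` is hypothetical.

```python
import torch
from torch import nn

class SimpleFFNet(nn.Module):
    """Hypothetical stand-in for gdeep's FFNet: a stack of nn.Linear layers
    with `activation` applied after every layer except the last."""

    def __init__(self, arch, activation=torch.relu):
        super().__init__()
        # one Linear layer per consecutive pair of widths in `arch`
        self.linears = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(arch[:-1], arch[1:])
        )
        self.activation = activation

    def forward(self, x):
        for linear in self.linears[:-1]:
            x = self.activation(linear(x))
        # raw logits: nn.CrossEntropyLoss applies log-softmax internally
        return self.linears[-1](x)

model = SimpleFFNet([3, 5, 5, 5, 5, 2], activation=torch.tanh)
logits = model(torch.randn(8, 3))
print(model)
print(logits.shape)
```

Passing the activation as a plain function keeps a single class for all four experiments; only the `activation` argument changes between runs.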









In [4]:
models = []
writers = []
trainers = []
for i in range(len(activation_functions)):
    model_temp = FFNet(arch = architecture, activation = activation_functions[i])
    writer_temp = SummaryWriter(log_dir='runs/' + model_temp.__class__.__name__ + activation_string[i])
    trainer_temp = Trainer(model_temp, [dl_tr, dl_ts], loss_function, writer_temp)
    models.append(model_temp)
    writers.append(writer_temp)
    trainers.append(trainer_temp)

In [5]:
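epochs = 10

# train each of the four networks with Adam (learning rate 0.01)
for pipe in trainers:
    pipe.train(
        Adam,
        epochs,
        False,               # no cross-validation
        {"lr": 0.01},        # optimiser parameters
        {"batch_size": 100}  # dataloader parameters
    )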

Epoch 1
-------------------------------
Epoch training loss: 0.637522 	Epoch training accuracy: 58.57%
Time taken for this epoch: 5.00s
Learning rate value: 0.01000000



Cannot store data in the PR curve



Validation results: 
 accuracy: 64.26%,                 Avg loss: 0.581360 

Epoch 2
-------------------------------
Epoch training loss: 0.531500 	Epoch training accuracy: 68.68%
Time taken for this epoch: 5.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 73.17%,                 Avg loss: 0.476448 

Epoch 3
-------------------------------
Epoch training loss: 0.477901 	Epoch training accuracy: 72.68%
Time taken for this epoch: 5.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 73.03%,                 Avg loss: 0.486837 

Epoch 4
-------------------------------
Epoch training loss: 0.472


nn.functional.tanh is deprecated. Use torch.tanh instead.



Epoch training loss: 0.602180 	Epoch training accuracy: 61.16%
Time taken for this epoch: 5.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 61.94%,                 Avg loss: 0.541753 

Epoch 2
-------------------------------
Epoch training loss: 0.536967 	Epoch training accuracy: 65.11%
Time taken for this epoch: 4.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 65.36%,                 Avg loss: 0.521280 

Epoch 3
-------------------------------
Epoch training loss: 0.508890 	Epoch training accuracy: 65.98% 


nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.



Epoch training loss: 0.671928 	Epoch training accuracy: 55.22%
Time taken for this epoch: 4.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 58.15%,                 Avg loss: 0.625799 

Epoch 2
-------------------------------
Epoch training loss: 0.615248 	Epoch training accuracy: 62.17%                                                             
Time taken for this epoch: 5.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 63.56%,                 Avg loss: 0.606195 

Epoch 3
-------------------------------
Epoch training loss: 0.591221 	Epoch training accuracy: 64.47%
Time taken for this epoch: 5.00s
Learning rate value: 0.01000000
Validation results: 
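Each call to `pipe.train` hides a standard optimisation loop. For readers who want to see what one epoch of such a loop involves, here is a plain-PyTorch sketch on toy data: shuffle into mini-batches, compute the cross-entropy loss, backpropagate, take an Adam step with learning rate 0.01, and track the epoch loss and accuracy that the log above reports. This is not gdeep's `Trainer` code, which also runs validation and writes to TensorBoard; the toy Gaussian data, the small `nn.Sequential` model, the batch size of 50 and the helper `run_epoch` are assumptions chosen to keep the example self-contained.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy stand-in data: two Gaussian clouds in R^3, labelled 0 and 1
X = torch.cat([torch.randn(200, 3) - 2.0, torch.randn(200, 3) + 2.0])
y = torch.cat([torch.zeros(200, dtype=torch.long), torch.ones(200, dtype=torch.long)])
loader = DataLoader(TensorDataset(X, y), batch_size=50, shuffle=True)

model = nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 2))
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def run_epoch():
    """One pass over the data; returns (mean loss, accuracy in %)."""
    model.train()
    total_loss, correct = 0.0, 0
    for xb, yb in loader:
        optimizer.zero_grad()
        logits = model(xb)
        loss = loss_function(logits, yb)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * len(xb)
        correct += (logits.argmax(dim=1) == yb).sum().item()
    return total_loss / len(X), 100.0 * correct / len(X)

history = [run_epoch() for _ in range(10)]
for epoch, (epoch_loss, epoch_acc) in enumerate(history, start=1):
    print(f"Epoch {epoch}: training loss {epoch_loss:.4f}, training accuracy {epoch_acc:.2f}%")
```

Like the "Epoch training" lines in the log, the loss and accuracy here are accumulated while the weights are still changing, so they lag slightly behind the model's state at the end of the epoch.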

In [6]:
from gdeep.analysis.interpretability import Interpreter
from gdeep.visualisation import Visualiser

vs = Visualiser(trainers[0]) 
vs.plot_3d_dataset()

In [7]:
one_batch_dataset, _, _ = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build([{"batch_size":3000}, {"batch_size":3000},{"batch_size":3000}]) 


for pipe in trainers:
    vs = Visualiser(pipe)
    vs.plot_persistence_diagrams(next(iter(one_batch_dataset)))

TypeError: '_SingleProcessDataLoaderIter' object is not subscriptable

In [None]:
# train NN
model = FFNet(arch=[3,10,10,10,10,2])
print(model)
pipe = Trainer(model, (dl_tr, dl_ts), nn.CrossEntropyLoss(), writer)
pipe.train(Adam, 100, False, {"lr":0.01}, {"batch_size":50})

FFNet(
  (linears): ModuleList(
    (0): Linear(in_features=3, out_features=10, bias=True)
    (1): Linear(in_features=10, out_features=10, bias=True)
    (2): Linear(in_features=10, out_features=10, bias=True)
    (3): Linear(in_features=10, out_features=10, bias=True)
    (4): Linear(in_features=10, out_features=2, bias=True)
  )
)
Epoch 1
-------------------------------
Epoch training loss: 0.674135 	Epoch training accuracy: 54.95%                                                           
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 59.86%,                 Avg loss: 0.646908 

Epoch 2
-------------------------------
Epoch training loss: 0.643848 	Epoch training accuracy: 57.15%                                                           
Time taken for this epoch: 0.00s
Learning rate value: 0.01000000
Validation results: 
 accuracy: 62.86%,                 Avg loss: 0.607025 

Epoch 3
-------------------------------
Epoch training l

(0.2513222247362137, 90.85714285714286)

In [None]:
from gdeep.analysis.interpretability import Interpreter
from gdeep.visualisation import Visualiser

vs = Visualiser(pipe)
one_batch_dataset, _, _ = DataLoaderBuilder((ds_tr, ds_val, ds_ts)).build([{"batch_size":1600}, {"batch_size":1600}, {"batch_size":1600}]) 



# the diagrams can be seen on tensorboard!
vs.plot_persistence_diagrams(next(iter(one_batch_dataset)))
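`plot_persistence_diagrams` computes persistence diagrams of the dataset as embedded in each layer. To make "embedded in each layer" concrete, the plain-PyTorch sketch below records the output of every activation module with forward hooks; each recorded tensor is a point cloud with one point per input sample. The helper name `layer_activations`, the toy model and the choice to record post-activation outputs are assumptions, not a description of gdeep's internals. Because hooks attach to modules, a network that applies its activation as a plain function (as the `activation=F.relu` argument of `FFNet` suggests) would instead need hooks on its `nn.Linear` layers, which record pre-activation outputs.

```python
import torch
from torch import nn

def layer_activations(model, x, layer_types=(nn.Tanh, nn.ReLU)):
    """Run `model` on `x` and return the output of every submodule whose type
    is in `layer_types`, in the order they are executed (hypothetical helper)."""
    outputs = []
    hooks = [
        module.register_forward_hook(lambda _mod, _inp, out: outputs.append(out.detach()))
        for module in model.modules()
        if isinstance(module, layer_types)
    ]
    try:
        with torch.no_grad():
            model(x)
    finally:
        for hook in hooks:  # always detach the hooks, even if the forward pass fails
            hook.remove()
    return outputs

model = nn.Sequential(
    nn.Linear(3, 5), nn.Tanh(),
    nn.Linear(5, 5), nn.Tanh(),
    nn.Linear(5, 2),
)
batch = torch.randn(100, 3)  # 100 input points in R^3
acts = layer_activations(model, batch)
print([tuple(a.shape) for a in acts])  # [(100, 5), (100, 5)]
```

Converted with `.numpy()`, these per-layer point clouds can be handed to a persistent-homology tool such as giotto-tda's `VietorisRipsPersistence` to compare their diagrams layer by layer.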


In [52]:
vs.plot_3d_dataset()

In [71]:
model_temp = FFNet(arch=[2, 3, 3])

TypeError: super(type, obj): obj must be an instance or subtype of type

In [69]:
architecture

[3, 10, 10, 10, 10, 2]