# Computation of decision boundary

In this short notebook we describe how to use giotto-deep to compute the decision boundary of a classifier.

The idea of the algorithm is to use gradient descent to push a set of input points towards the decision boundary of the classifier.

More formally, each point is moved along the negative of the following gradient:

$$ \frac{\partial || M(x) - 1/2||^2}{\partial x}$$

where $M$ is the model, whose output is a softmax, and $x$ is the input. If the output is $1/2$, the model is undecided between the two classes at hand: this is the decision boundary.
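
The update above can be sketched in plain PyTorch. This is a minimal illustration and not giotto-deep's implementation: the two-class `toy_model`, its hand-set weights, the learning rate and the number of steps are all assumptions, chosen so that the boundary is the known line $x_0 + x_1 = 0$.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a trained classifier: softmax over the logits
# (s, -s) with s = x0 + x1, so the decision boundary is the line x0 + x1 = 0.
toy_model = nn.Sequential(nn.Linear(2, 2), nn.Softmax(dim=-1))
with torch.no_grad():
    toy_model[0].weight.copy_(torch.tensor([[1.0, 1.0], [-1.0, -1.0]]))
    toy_model[0].bias.zero_()
for p in toy_model.parameters():
    p.requires_grad_(False)  # only the input points are optimised

# Random starting points in the square [-1, 1]^2
x = (torch.rand(100, 2) * 2 - 1).requires_grad_(True)

lr = 0.25  # assumed step size
for _ in range(200):
    # || M(x) - 1/2 ||^2, summed over all points
    loss = ((toy_model(x) - 0.5) ** 2).sum()
    (grad,) = torch.autograd.grad(loss, x)
    with torch.no_grad():
        x -= lr * grad  # gradient-descent step on the points themselves

# Largest remaining distance (up to a constant factor) from the line x0 + x1 = 0
distance_to_boundary = (x[:, 0] + x[:, 1]).detach().abs().max().item()
print(distance_to_boundary)
```

Points that start far from the boundary move slowly, because the softmax saturates and the gradient becomes small there. This is one reason a filtering step is useful after the descent.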

## Content

This is what we are going to do:
 1. build the dataset
 2. build and train the model
 3. visualise the decision boundary
 4. compute the topology of the decision boundary
 5. (extra) lower-level use of the modules

In [None]:
%reload_ext autoreload
%autoreload 2
# deep learning
import torch
from torch.optim import Adam, SGD
import numpy as np
from torch import nn
from gdeep.models import FFNet
from gdeep.data.datasets import DatasetBuilder, DataLoaderBuilder
from gdeep.trainer import Trainer
from torch import autograd

# plot
import plotly.express as px
import pandas as pd
from gdeep.search import GiottoSummaryWriter

# ML
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# TDA
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_diagram


## Initialize the tensorboard writer

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/examples` folder and run the following command:

```
tensorboard --logdir=runs
```

Once your model has been trained, go [here](http://localhost:6006/) to see all the visualization results.

In [None]:
writer = GiottoSummaryWriter()

## Build dataset

We want to test our method on a 3D dataset made of two entangled tori, that is, two consecutive rings in a chain. We expect the decision boundary of the neural network to have a very non-trivial shape.
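
To give a feel for this geometry, here is a rough NumPy sketch of two linked tori. It is not the generator behind `DatasetBuilder(name="DoubleTori")`: the helper `sample_torus`, the radii `R` and `r`, the sample size and the seed are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_torus(n, R, r, rng):
    """Sample n points on a torus lying in the xy-plane.

    R is the major radius, r the minor radius. Angles are drawn uniformly,
    so the points are not exactly uniform with respect to surface area.
    """
    u = rng.uniform(0, 2 * np.pi, n)  # angle around the main ring
    v = rng.uniform(0, 2 * np.pi, n)  # angle around the tube
    x = (R + r * np.cos(v)) * np.cos(u)
    y = (R + r * np.cos(v)) * np.sin(u)
    z = r * np.sin(v)
    return np.stack([x, y, z], axis=1)


rng = np.random.default_rng(0)
R, r, n = 2.0, 0.3, 500  # assumed radii and sample size

# First ring: centred at the origin, lying in the xy-plane
torus_a = sample_torus(n, R, r, rng)

# Second ring: swap the y and z axes so it lies in the xz-plane, then shift it
# by R along x so that it passes through the hole of the first ring
torus_b = sample_torus(n, R, r, rng)[:, [0, 2, 1]] + np.array([R, 0.0, 0.0])

points = np.concatenate([torus_a, torus_b])
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
print(points.shape, labels.shape)
```

Each ring threads the hole of the other without touching it, so any surface separating the two classes has to wind between them.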


In [None]:
bd = DatasetBuilder(name="DoubleTori")
ds_tr, ds_val, _ = bd.build()
# train_indices = list(range(160))
dl = DataLoaderBuilder((ds_tr, ds_val))
dl_tr, dl_val, dl_ts = dl.build()


## Train the model

In giotto-deep, once the model and the datasets are defined, it takes only a couple of lines to start the training.

In [None]:
# define a feed-forward network with layer sizes 3-10-10-2 and train it with SGD
model = FFNet(arch=[3, 10, 10, 2])
print(model)
pipe = Trainer(model, (dl_tr, dl_ts), nn.CrossEntropyLoss(), writer)
pipe.train(SGD, 5, False, {"lr": 0.01}, {"batch_size": 1})


## Visualising the decision boundary

We send the visualization data to TensorBoard, where you can explore the different sections to find the corresponding plots. Note that the interactive 3D decision boundary can be found in the **projector** section.

In [None]:
from gdeep.visualization import Visualiser

vs = Visualiser(pipe)
vs.plot_interactive_model()
db, _, _ = vs.plot_decision_boundary()


## Topology of decision boundary

We use giotto-tda to check the topology of the decision boundary:

In [None]:
# compute the persistent homology of the decision boundary points in `db`
try:
    vr = VietorisRipsPersistence(
        collapse_edges=True,
        max_edge_length=1,
        metric="euclidean",
        n_jobs=-1,
        homology_dimensions=(0, 1, 2),
    )
    diag = vr.fit_transform([db])

    plot_diagram(diag[0]).show()
except ValueError:
    print("Due to the stochasticity of generating the points, none of them survived the filtering, and hence ``db`` was empty")

## Extra: lower-level use of the modules

In this short section, we show how to use the decision boundary calculators directly. You will see how to define the point sampler, initialise it, initialise the boundary computation and run it.

In [None]:
from gdeep.analysis.decision_boundary import (
    QuasihyperbolicDecisionBoundaryCalculator,
    UniformlySampledPoint,
)

n_samples = 100

# freeze the model parameters: no gradients are needed for them
for param in model.parameters():
    param.requires_grad = False

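# define the point sampler: three position ranges, followed by an angle phi
# in [0, 2*pi] and a value in [-1, 1] that is used below as cos(theta)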
point_sample_generator = UniformlySampledPoint(
    [(-2, 4), (-2, 2), (-2, 2), (0, 2 * np.pi), (-1.0, 1.0)], n_samples=n_samples
)
point_sample_tensor = torch.from_numpy(point_sample_generator()).float()

phi = point_sample_tensor[:, -2].reshape(-1, 1)
theta = point_sample_tensor[:, -1].reshape(-1, 1)
theta = torch.acos(theta)

# set up the initial vectors: unit directions given by the angles (theta, phi)
y0 = torch.cat(
    (
        torch.sin(theta) * torch.cos(phi),
        torch.sin(theta) * torch.sin(phi),
        torch.cos(theta),
    ),
    -1,
)

# initialise the decision boundary calculator
g = QuasihyperbolicDecisionBoundaryCalculator(
    model=model,
    initial_points=point_sample_tensor[:, :3],  # the sampled 3D positions
    initial_vectors=y0,
    integrator=None,
)

# run the computations!
g.step(100)
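
The initial vectors built in the cell above are unit directions. Drawing $\phi$ uniformly in $[0, 2\pi]$ and $\cos\theta$ uniformly in $[-1, 1]$ spreads them uniformly over the sphere, because the area element is $\sin\theta\,d\theta\,d\phi = |d(\cos\theta)|\,d\phi$. A small NumPy check of this construction follows; the sample size and seed are assumptions and not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # assumed sample size, large enough for a stable average

phi = rng.uniform(0, 2 * np.pi, n)  # azimuthal angle
cos_theta = rng.uniform(-1.0, 1.0, n)  # uniform in cos(theta), not in theta
theta = np.arccos(cos_theta)  # polar angle

# Same spherical-to-Cartesian conversion as for y0 in the cell above
directions = np.stack(
    [
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ],
    axis=1,
)

norms = np.linalg.norm(directions, axis=1)
mean_direction = directions.mean(axis=0)
print(norms.min(), norms.max(), mean_direction)
```

Every vector has unit length, and the average direction is close to zero, as expected when no direction is favoured. Sampling $\theta$ itself uniformly would instead cluster the directions near the poles.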


### Plotting the decision boundary

Once the points have been moved, we filter them by their loss with respect to $1/2$ and keep for display only those that are very close to the boundary.
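
The filtering idea can be sketched as follows. The exact criterion used by `get_filtered_decision_boundary` is not shown in this notebook, so this sketch assumes that the threshold is applied to the squared deviation of the class probability from $1/2$. The functions `toy_probability` and `filter_near_boundary` are hypothetical helpers, not part of giotto-deep.

```python
import torch


def toy_probability(x):
    """Hypothetical stand-in for a trained model: probability of class 0."""
    return torch.sigmoid(2 * (x[:, 0] + x[:, 1]))


def filter_near_boundary(points, probability_fn, delta):
    """Keep the points whose squared deviation from 1/2 is below delta."""
    loss = (probability_fn(points) - 0.5) ** 2
    return points[loss < delta]


points = torch.tensor(
    [
        [0.0, 0.0],  # exactly on the boundary
        [0.01, -0.02],  # very close to the boundary
        [1.0, 1.0],  # far inside class 0
        [-2.0, 0.5],  # far inside class 1
    ]
)
kept = filter_near_boundary(points, toy_probability, delta=0.01)
print(kept)
```

Only the first two points survive. Points that never reached the boundary during the descent are discarded in the same way.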

In [None]:
# get the points that are close to the decision boundary and remove the outliers
sample_points_boundary = g.get_filtered_decision_boundary(0.01).detach().cpu().numpy()

# add the plot to tensorboard
writer.add_embedding(sample_points_boundary, tag="Decision boundary of entangled tori")
