# Compactification of feature space
Analysing decision boundaries is not an easy task, especially because the feature space is non-compact.

Compact spaces are easier to work with: by the Heine-Borel theorem, the compact subsets of $\mathbb R^n$ are exactly the closed and bounded ones.

We propose here a method to compactify the feature space $\mathbb R^n$ to the projective space $\mathbb RP^n$.

The decision boundary is therefore sampled uniformly in each chart of $\mathbb RP^n$. When the charts are put together, the resulting point cloud (defined abstractly via a dissimilarity matrix `d_final`) can be used to compute the topology of the *compactified* decision boundary.

We believe that the topology so obtained can be further exploited for regularisation purposes.
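
To make the idea of a dissimilarity matrix on $\mathbb RP^n$ concrete, here is a minimal sketch. It is not the `gdeep` implementation and it does not produce `d_final`. It assumes the standard embedding $x \mapsto [x : 1]$ of $\mathbb R^n$ into $\mathbb RP^n$ and the angle between lines as the distance. The helper names `lift_to_sphere` and `projective_distance_matrix` are hypothetical.

```python
import numpy as np

def lift_to_sphere(x):
    """Map points of R^n to unit vectors in R^(n+1) via x -> (x, 1) / ||(x, 1)||."""
    x = np.asarray(x, dtype=float)
    homogeneous = np.hstack([x, np.ones((x.shape[0], 1))])
    return homogeneous / np.linalg.norm(homogeneous, axis=1, keepdims=True)

def projective_distance_matrix(x):
    """Pairwise distances in RP^n: the angle between the lines spanned by the lifts."""
    u = lift_to_sphere(x)
    # |u . v| identifies antipodal points, so the angle lies in [0, pi/2].
    cosines = np.clip(np.abs(u @ u.T), 0.0, 1.0)
    d = np.arccos(cosines)
    np.fill_diagonal(d, 0.0)
    return d

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))      # illustrative points in R^3
d_example = projective_distance_matrix(points)
```

In this metric, two points that are far from the origin in opposite directions end up close to each other, since they approach the same point at infinity. This sketch only covers the chart in which the last homogeneous coordinate is non-zero.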

In [1]:
%reload_ext autoreload
%autoreload 2

# deep learning
import torch
from torch.optim import Adam, SGD
import numpy as np
from torch import nn
from gdeep.models import FFNet
from gdeep.data import TorchDataLoader
from gdeep.pipeline import Pipeline
from torch import autograd  

# plot
import plotly.express as px
import pandas as pd
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

# ML
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# TDA
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_diagram


# Build dataset

We want to test our method on a 3D dataset made of 2 separate blobs. We expect the neural network decision boundary to look like a hyperplane in $\mathbb R^3$.

After compactification, we would expect to find $\mathbb RP^2$ as the final result.
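
The expectation above can be checked numerically on an idealised boundary. The sketch below assumes a hypothetical plane $a \cdot x = b$ (the values of `a` and `b` are chosen only for illustration and are unrelated to the trained network) and lifts its points to homogeneous coordinates on $S^3$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plane a . x = b in R^3 (values chosen only for illustration).
a = np.array([1.0, -2.0, 0.5])
b = 0.7

# Sample points on the plane: a particular solution plus combinations of
# two vectors spanning the orthogonal complement of a.
x0 = b * a / (a @ a)
basis = np.linalg.svd(a.reshape(1, -1))[2][1:]   # two rows orthogonal to a
coeffs = rng.uniform(-5.0, 5.0, size=(200, 2))
plane_points = x0 + coeffs @ basis

# Lift to homogeneous coordinates (x, 1) and normalise onto S^3.
lifted = np.hstack([plane_points, np.ones((200, 1))])
lifted /= np.linalg.norm(lifted, axis=1, keepdims=True)

# Every lifted point is orthogonal to (a, -b), so it lies on a great 2-sphere
# of S^3; identifying antipodal points turns that sphere into RP^2.
normal = np.append(a, -b)
residual = np.abs(lifted @ normal).max()
```

Here `residual` vanishes up to floating-point error, which is the numerical counterpart of the statement that the closure of an affine plane in $\mathbb RP^3$ is a projective plane.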

In [2]:
dl = TorchDataLoader(name="DoubleTori")
dl_tr, dl_ts = dl.build_dataloader(batch_size=1)

In [3]:
# train NN
model = FFNet(arch=[3,10,10,2])
print(model)
pipe = Pipeline(model, (dl_tr, dl_ts), nn.CrossEntropyLoss(), writer)
pipe.train(SGD, 15, batch_size=1, lr=0.01)

FFNet(
  (layer0): Linear(in_features=3, out_features=10, bias=True)
  (layer1): Linear(in_features=10, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)
TOTAL EPOCHS  15
Epoch 1
-------------------------------
Training loss: 0.785355  [160/160]
Time taken for this epoch: 0s
Validation results: 
 Accuracy: 2.0%,                 Avg loss: 0.227097 

Epoch 2
-------------------------------
Training loss: 1.221726  [160/160]
Time taken for this epoch: 0s
Validation results: 
 Accuracy: 5.0%,                 Avg loss: 0.207667 

Epoch 3
-------------------------------
Training loss: 0.321733  [160/160]
Time taken for this epoch: 0s
Validation results: 
 Accuracy: 5.0%,                 Avg loss: 0.203786 

Epoch 4
-------------------------------
Training loss: 1.309438  [160/160]
Time taken for this epoch: 0s
Validation results: 
 Accuracy: 5.0%,                 Avg loss: 0.200171 

Epoch 5
-------------------------------
Training loss: 1.310086  [1

In [4]:
from gdeep.visualisation import Visualiser

vs = Visualiser(pipe)
vs.plot_data_model()
db, _, _ = vs.plot_decision_boundary()


AttributeError: module 'tensorflow._api.v2.io.gfile' has no attribute 'get_filesystem'

# Topology
We check with Giotto-tda that the topology of the decision boundary is indeed that of $\mathbb RP^2$, as expected.
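
As a reference for reading the persistence diagram, the homology of $\mathbb RP^2$ is recalled here. This is a standard fact, not an output of this notebook. The remark on coefficients assumes that `VietorisRipsPersistence` works over $\mathbb Z/p$ for the prime given by its `coeff` parameter (to our understanding, the default is 2).

$$
H_k(\mathbb RP^2;\mathbb Z) =
\begin{cases}
\mathbb Z & k = 0,\\
\mathbb Z/2 & k = 1,\\
0 & k \ge 2,
\end{cases}
\qquad
\dim H_k(\mathbb RP^2;\mathbb Z/2) = 1 \quad \text{for } k = 0, 1, 2.
$$

With $\mathbb Z/2$ coefficients one would therefore look for one long-lived feature in each of the dimensions 0, 1 and 2, whereas over $\mathbb Z/p$ with $p$ odd only the dimension-0 feature survives.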

In [None]:
# compute persistent homology of the sampled decision boundary `db`

vr = VietorisRipsPersistence(collapse_edges=True, max_edge_length=1,
                             metric="euclidean", n_jobs=-1, 
                             homology_dimensions=(0,1,2))
diag = vr.fit_transform([db])

plot_diagram(diag[0])

# Lower-level use of the modules

In this short section, we show how to use the decision boundary calculators directly.

In [None]:
from gdeep.analysis.decision_boundary import QuasihyperbolicDecisionBoundaryCalculator, UniformlySampledPoint

n_samples = 100

for param in model.parameters():
    param.requires_grad = False

point_sample_generator = UniformlySampledPoint([(-2,4),(-2,2),(-2,2),(0,2*np.pi),(-1.,1.)], n_samples=n_samples)
point_sample_tensor = torch.from_numpy(point_sample_generator()).float()

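# second-to-last column: azimuth phi, sampled uniformly in [0, 2*pi)
phi = point_sample_tensor[:,-2].reshape(-1,1)
# last column: sampled uniformly in [-1, 1] and read as cos(theta)
theta = point_sample_tensor[:,-1].reshape(-1,1)
theta = torch.acos(theta)

# unit direction vectors on the sphere S^2, written in spherical coordinates
y0 = torch.cat((torch.sin(theta) * torch.cos(phi),
                torch.sin(theta) * torch.sin(phi),
                torch.cos(theta)), -1)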


g = QuasihyperbolicDecisionBoundaryCalculator(
            model=model,
            initial_points=point_sample_tensor[:,:3],#torch.ones_like(y0).to(dev),#torch.distributions.uniform.Uniform(-10.,10.).sample((n_samples, 3)).to(dev),
            initial_vectors=y0,
            integrator=None #lambda params: torch.optim.Adam(params, )
)
g.step(100)
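
The construction of `y0` in the cell above draws the last coordinate uniformly in $[-1, 1]$ and applies `acos` to it, so it is $\cos\theta$ that is uniform, not $\theta$ itself. The following NumPy sketch reproduces that sampling in isolation, as an illustration that does not use `gdeep`, and checks that the resulting vectors are unit length and show no preferred direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

phi = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)   # azimuth
cos_theta = rng.uniform(-1.0, 1.0, size=n_samples)    # uniform in cos(theta), not in theta
theta = np.arccos(cos_theta)

# Spherical coordinates -> unit vectors in R^3, one row per sample.
directions = np.stack([np.sin(theta) * np.cos(phi),
                       np.sin(theta) * np.sin(phi),
                       np.cos(theta)], axis=-1)

norms = np.linalg.norm(directions, axis=1)
mean_direction = directions.mean(axis=0)
```

Sampling $\theta$ uniformly in $[0, \pi]$ instead would concentrate the directions near the poles, which is why the uniform draw is made on $\cos\theta$.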

In [None]:
sample_points_boundary = g.get_filtered_decision_boundary(0.01).detach().cpu().numpy()
#sample_points_boundary = g.get_decision_boundary().detach().cpu().numpy()

writer.add_embedding(sample_points_boundary,
                     tag='Decision boundary of entangled tori'
                     )
