# Compactification of feature space
Analysing decision boundaries is not an easy task, especially because the feature space is non-compact.

Compact spaces are easier to work with: by the Heine-Borel theorem, the compact subsets of $\mathbb R^n$ are exactly the closed and bounded ones.

We propose here a method to compactify the feature space $\mathbb R^n$ into the projective space $\mathbb RP^n$.
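
To make the construction concrete, here is a minimal sketch of the standard embedding of $\mathbb R^n$ into $\mathbb RP^n$ via homogeneous coordinates, together with the angular distance between projective points and the affine coordinates of a chart. The helper names `to_homogeneous`, `projective_distance` and `chart_coordinates` are hypothetical and written only for illustration; the `Compactification` class used later may proceed differently internally.

```python
import numpy as np

def to_homogeneous(points):
    """Map points of R^n to unit representatives of [x_1 : ... : x_n : 1] on S^n."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])
    return homogeneous / np.linalg.norm(homogeneous, axis=1, keepdims=True)

def projective_distance(u, v):
    """Angle between the lines spanned by unit vectors; values lie in [0, pi/2]."""
    # The absolute value identifies antipodal representatives u and -u.
    cosines = np.clip(np.abs(u @ v.T), 0.0, 1.0)
    return np.arccos(cosines)

def chart_coordinates(u, i):
    """Affine coordinates in the chart U_i = {u_i != 0}: divide by u_i, drop it."""
    scaled = u / u[:, [i]]
    return np.delete(scaled, i, axis=1)

# Illustrative points only (not the dataset used below).
points = np.array([[1.0, 2.0, 3.0], [100.0, 200.0, 300.0], [-1.0, 0.5, 2.0]])
u = to_homogeneous(points)
d = projective_distance(u, u)
```

In this sketch the last chart (index `n`) recovers the original coordinates, while the other charts cover the points "at infinity" that the compactification adds. For $n = 3$ this gives four charts, indexed 0 to 3.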

The decision boundary is therefore sampled uniformly in each chart of $\mathbb RP^n$. When the charts are put together, the resulting point cloud (defined abstractly via a dissimilarity matrix `d_final`) can be used to compute the topology of the *compactified* decision boundary.

We believe that the topology obtained in this way can be further exploited for regularisation purposes.

In [9]:
%reload_ext autoreload
%autoreload 2

# deep learning
import torch
import numpy as np
from torch import nn
from gdeep.create_nets import Net, train_classification_nn, ToPyTorchNN
from gdeep.decision_boundary import UniformlySampledPoint, GradientFlow, Compactification
from torch import autograd  

# plot
import plotly.express as px
import pandas as pd

# ML
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# TDA
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_diagram


# Build dataset

We want to test our method on a 3D dataset made of two separate blobs. We expect the neural network's decision boundary to look like a hyperplane in $\mathbb R^3$.

After compactification, we would expect to find $\mathbb RP^2$ as the final result.

In [5]:
# create or import data
X_temp, y = make_blobs(n_samples=1000, centers=2, n_features=3, random_state=42)
scaler = MinMaxScaler((-2,2))
X = scaler.fit_transform(X_temp)
df_X = pd.DataFrame(X,columns=["x","y","z"])
df_X['label'] = y

px.scatter_3d(df_X, x="x",y="y",z="z",color="label")

In [6]:
# train NN
neural_net = Net(arch=[3,3])
learn = train_classification_nn(neural_net, X=X, y=y, lr=0.001, n_epochs=100, bs=32)
trained_net = learn.model

epoch,train_loss,valid_loss,accuracy,time
0,0.807614,0.86233,0.43,00:00
1,0.807084,0.861372,0.43,00:00
2,0.806386,0.86013,0.43,00:00
3,0.805439,0.858466,0.43,00:00
4,0.804156,0.856259,0.43,00:00
5,0.802443,0.853405,0.43,00:00
6,0.800215,0.849796,0.43,00:00
7,0.797395,0.845341,0.43,00:00
8,0.793916,0.839933,0.43,00:00
9,0.789705,0.833359,0.43,00:00


In [12]:
trained_net_pytorch = ToPyTorchNN(trained_net)

# initialisation of the compactification
cc=Compactification(precision = 0.1,
                    n_samples= 500,
                    epsilon = 0.051,
                    n_features = X.shape[1],
                    n_epochs = 1000,
                    #boundary_tuple = ((-1,1),(-1,1),(-1,1)),
                    neural_net=trained_net_pytorch)

d_final, label_final = cc.create_final_distance_matrix()


In [5]:
cc.plot_chart(0)
cc.plot_chart(1)
cc.plot_chart(2)
cc.plot_chart(3)

# Embedding in 3D
The resulting compactified decision boundary is embedded in 3D via multidimensional scaling (MDS), a method suited to embedding points into $\mathbb R^n$ starting from a pairwise distance matrix.

In [14]:
from sklearn.manifold import MDS

embedding = MDS(n_components=3,dissimilarity="precomputed")
X_transformed = embedding.fit_transform(d_final)
print(embedding.stress_)
df3=pd.DataFrame(X_transformed,columns=["x","y","z"])
df3["label"]=label_final
px.scatter_3d(df3, x="x",y="y",z="z",color="label")

326417.5654923092
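
The raw stress printed above depends on the number of points and on the scale of the distances, so it is hard to interpret on its own. As a rough aid, the sketch below computes a scale-free normalised stress from a dissimilarity matrix and an embedding. The function name `normalised_stress` is hypothetical, and normalising by the squared input dissimilarities is only one of several conventions in use.

```python
import numpy as np

def normalised_stress(dissimilarities, embedding):
    """sqrt( sum (d_ij - e_ij)^2 / sum d_ij^2 ) over pairs i < j,
    where d_ij are the input dissimilarities and e_ij the Euclidean
    distances between the embedded points."""
    dissimilarities = np.asarray(dissimilarities, dtype=float)
    embedding = np.asarray(embedding, dtype=float)
    # Pairwise Euclidean distances of the embedded points.
    diffs = embedding[:, None, :] - embedding[None, :, :]
    embedded = np.sqrt((diffs ** 2).sum(axis=-1))
    # Use each unordered pair once.
    upper = np.triu_indices_from(dissimilarities, k=1)
    residual = ((dissimilarities[upper] - embedded[upper]) ** 2).sum()
    scale = (dissimilarities[upper] ** 2).sum()
    return np.sqrt(residual / scale)

# Toy check: distances that come from an actual 3D configuration embed exactly.
toy_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
toy_diffs = toy_points[:, None, :] - toy_points[None, :, :]
toy_distances = np.sqrt((toy_diffs ** 2).sum(axis=-1))
toy_stress = normalised_stress(toy_distances, toy_points)
```

In this notebook the call would be `normalised_stress(d_final, X_transformed)`. A value close to 0 indicates that the 3D picture preserves the distances well; a large value suggests the plot should be read qualitatively only.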


# Topology
We check with giotto-tda that the topology of the decision boundary is indeed that of $\mathbb RP^2$, as expected.

In [15]:
# check topology from d_final

vr = VietorisRipsPersistence(collapse_edges=True, max_edge_length=1,
                             metric="precomputed", n_jobs=-1, 
                             homology_dimensions=(0,1,2))
diag = vr.fit_transform([d_final])

plot_diagram(diag[0])
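
To read the diagram quantitatively, one can count the features whose lifetime exceeds a threshold. With $\mathbb Z/2$ coefficients (the default `coeff=2` of `VietorisRipsPersistence`), $\mathbb RP^2$ has Betti numbers $(1, 1, 1)$, so one long-lived class would be expected in each of the dimensions 1 and 2, alongside a single connected component. The sketch below assumes the giotto-tda diagram layout with rows `(birth, death, dimension)`; the function name `count_persistent_features`, the threshold and the toy diagram are hypothetical and not results from this notebook.

```python
import numpy as np

def count_persistent_features(diagram, min_persistence):
    """Count, per homology dimension, the rows (birth, death, dim)
    whose lifetime death - birth exceeds min_persistence."""
    diagram = np.asarray(diagram, dtype=float)
    lifetimes = diagram[:, 1] - diagram[:, 0]
    dims = diagram[:, 2].astype(int)
    keep = lifetimes > min_persistence
    return {int(dim): int(np.sum(keep & (dims == dim))) for dim in np.unique(dims)}

# Made-up diagram for illustration only.
toy_diagram = np.array([
    [0.0, 0.05, 0],
    [0.0, 0.90, 0],
    [0.1, 0.80, 1],
    [0.2, 0.22, 1],
    [0.3, 0.70, 2],
])
counts = count_persistent_features(toy_diagram, min_persistence=0.3)
```

In this notebook the call would be `count_persistent_features(diag[0], min_persistence=...)`, with a threshold chosen by inspecting the plotted diagram. Note that the class in dimension 1 is a torsion class, so it would not be expected to appear with coefficients of odd characteristic.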