# Computing decision boundaries for geometric examples in dimensions 2 and 3

In this notebook we use the gradient flow from `gdeep.decision_boundary` to compute the decision boundary of a neural network trained on a binary classification problem.

In [32]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline


import math

seed=42

import numpy as np
from sklearn import datasets

import torch
import torch.nn as nn
import torch.nn.functional as F


from gtda.plotting import plot_point_cloud

import pandas as pd
import plotly.express as px

from gdeep.decision_boundary import *
from gdeep.create_nets import Net
from gdeep.plotting import plot_decision_boundary, plot_activation_contours
from gdeep.create_nets.utility import train_classification_nn
from gdeep.create_data.tori import make_torus_point_cloud, Rotation

## Example 1: Decision boundary for two circles in $\mathbb R^2$

We construct two point sets $A$ and $B$ with the labels $0$ and $1$. They are either concentric or, after translating one of them by `[2,0]`, separated from one another. The cell below uses the concentric configuration.
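The two configurations can be sketched as follows. This is only an illustration: the name `data_separated` and the choice to translate the label-$1$ points (rather than the label-$0$ points) are assumptions, not part of the notebook's pipeline.

```python
import numpy as np
from sklearn import datasets

# Concentric case: make_circles returns an outer circle (label 0)
# and an inner circle (label 1) that share the same centre.
data, label = datasets.make_circles(
    n_samples=1000, noise=0.05, factor=0.3, random_state=42
)

# Separated case (assumption for illustration): translate only the
# label-1 points by [2, 0] so the two circles no longer overlap.
data_separated = data.copy()
data_separated[label == 1] += np.array([2.0, 0.0])
```

Passing `data_separated` instead of `data` to the training cell would then give the separated variant.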

In [22]:
data, label = datasets.make_circles(n_samples=5000, noise=0.05, factor=0.3, random_state=seed)
df=pd.DataFrame(data, columns=["x","y"])
df["label"]=label

In [23]:
px.scatter(df,x="x",y="y",color="label")

In [24]:
# define fully connected NN architecture
circle_detect = Net(0, [2,10,10])


# Train neural net on binary classification task
train_classification_nn(circle_detect, data, label, n_epochs=10)

epoch,train_loss,valid_loss,accuracy,time
0,0.678466,0.668934,0.455,00:00
1,0.618413,0.587176,0.953,00:00
2,0.511491,0.471549,1.0,00:00
3,0.423609,0.397729,1.0,00:00
4,0.371523,0.357968,1.0,00:00
5,0.345541,0.339283,1.0,00:00
6,0.333301,0.330603,1.0,00:00
7,0.327649,0.326662,1.0,00:00
8,0.325293,0.325208,1.0,00:00
9,0.324635,0.324984,1.0,00:00


<fastai.tabular.learner.TabularLearner at 0x7f23561fbd30>

We distribute points uniformly in the box `[(-1,1),(-1,1)]` and push each point `x_i` in the direction opposite to the gradient of `(Net(x_i)-0.5)^2`. This is repeated `n_epochs` times.
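The idea can be sketched in plain PyTorch. This is a minimal sketch and not the implementation behind `GradientFlow`: the function `gradient_flow_sketch`, the step size `lr`, and the stand-in `toy_classifier` (which returns a single class-$1$ probability whose $0.5$ level set is the circle of radius $0.5$) are all assumptions made for illustration.

```python
import torch

def gradient_flow_sketch(f, boundary_tuple, n_samples=1000, n_epochs=300, lr=0.2, seed=0):
    """Move uniformly sampled points towards the level set {f = 0.5}.

    f maps a tensor of shape (n, d) to probabilities of shape (n,).
    """
    gen = torch.Generator().manual_seed(seed)
    low = torch.tensor([b[0] for b in boundary_tuple], dtype=torch.float32)
    high = torch.tensor([b[1] for b in boundary_tuple], dtype=torch.float32)
    # Uniform samples in the box described by boundary_tuple
    x = low + (high - low) * torch.rand(n_samples, len(boundary_tuple), generator=gen)
    x.requires_grad_(True)
    for _ in range(n_epochs):
        # Squared distance of the output from 0.5, summed over all points
        loss = ((f(x) - 0.5) ** 2).sum()
        (grad,) = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x -= lr * grad  # step against the gradient
    return x.detach()

# Hypothetical stand-in for a trained network: f(x) = 0.5 exactly on |x| = 0.5
def toy_classifier(x):
    return torch.sigmoid(0.5 - 2.0 * (x ** 2).sum(dim=1))

points = gradient_flow_sketch(toy_classifier, [(-1, 1), (-1, 1)])
residual = (toy_classifier(points) - 0.5).abs()
```

Points that start exactly at a critical point of the network (here the origin) have zero gradient and do not move, which is one reason to start from many random samples.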

In [37]:
gradient_flow_db = GradientFlow(circle_detect, boundary_tuple=[(-1,1),(-1,1)])
sample_points_boundary = gradient_flow_db.compute_boundary()
plot_decision_boundary(data, label, sample_points_boundary, n_components=2)

We can verify our result by looking at the contour plot of the neural net.

In [7]:
plot_activation_contours(circle_detect)
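The same check can be done numerically on a grid, without plotting. The sketch below is an assumption-laden illustration rather than what `plot_activation_contours` does internally: `level_set_cells` and `toy_classifier` are hypothetical names, and the toy function again has the circle of radius $0.5$ as its $0.5$ level set.

```python
import numpy as np

def level_set_cells(f, boundary_tuple, resolution=200, level=0.5):
    """Return the 2D grid points that sit next to a crossing of f = level."""
    (x_min, x_max), (y_min, y_max) = boundary_tuple
    xs = np.linspace(x_min, x_max, resolution)
    ys = np.linspace(y_min, y_max, resolution)
    xx, yy = np.meshgrid(xs, ys)  # rows index y, columns index x
    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)
    above = (f(grid) > level).reshape(resolution, resolution)
    # A grid point is marked if its right or upper neighbour lies on the other side
    crossing = np.zeros_like(above)
    crossing[:, :-1] |= above[:, :-1] != above[:, 1:]
    crossing[:-1, :] |= above[:-1, :] != above[1:, :]
    return grid[crossing.ravel()]

# Hypothetical stand-in for the network output (probability of class 1)
def toy_classifier(points):
    return 1.0 / (1.0 + np.exp(-(0.5 - 2.0 * (points ** 2).sum(axis=1))))

contour_points = level_set_cells(toy_classifier, [(-1, 1), (-1, 1)])
radii = np.linalg.norm(contour_points, axis=1)
```

Comparing `contour_points` with the points returned by the gradient flow gives a quick sanity check in dimension 2; the grid approach becomes expensive in higher dimensions, which is where the flow is more attractive.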

## Example 2: Decision boundary for two tori in $\mathbb R^3$

We first generate a binary data set consisting of two unentangled tori in $\mathbb R^3$, and a second one consisting of two entangled tori.

The point clouds for the tori and the labels are stored in dictionaries.
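To make the geometry concrete, here is a NumPy sketch of how such a pair of tori could be sampled. It is not the code of `make_torus_point_cloud`: the helper `torus_sketch`, its parameters, and in particular the major radius of $2$ are assumptions, chosen so that an offset of `[2,0,0]` links the two tori while a larger offset such as `[6,0,0]` separates them.

```python
import numpy as np

def torus_sketch(label, n_points, center, rotate_yz=0.0, major_radius=2.0, minor_radius=0.3):
    """Sample n_points * n_points points on a torus, rotate in the y-z plane, then translate."""
    theta, phi = np.meshgrid(
        np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False),
        np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False),
    )
    theta, phi = theta.ravel(), phi.ravel()
    # Standard torus parametrisation around the z-axis
    ring = major_radius + minor_radius * np.cos(theta)
    points = np.stack(
        [ring * np.cos(phi), ring * np.sin(phi), minor_radius * np.sin(theta)], axis=1
    )
    # Rotation by the angle rotate_yz in the y-z plane (about the x-axis)
    c, s = np.cos(rotate_yz), np.sin(rotate_yz)
    rotation = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    points = points @ rotation.T + np.asarray(center, dtype=float)
    labels = np.full(len(points), label)
    return points, labels

# Entangled configuration: the second torus passes through the hole of the first
torus_a, labels_a = torus_sketch(0, 50, [0, 0, 0], rotate_yz=np.pi / 2)
torus_b, labels_b = torus_sketch(1, 50, [2, 0, 0])
cloud = np.concatenate([torus_a, torus_b], axis=0)
labels = np.concatenate([labels_a, labels_b], axis=0)
```

With these assumptions, replacing `[2, 0, 0]` by `[6, 0, 0]` moves the second torus far enough away that the two are no longer linked.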

In [8]:
torus_point_cloud = {'ent': {0: {}, 1: {}}, 'unent': {0: {}, 1: {}}}
torus_labels = {'ent': {0: {}, 1: {}}, 'unent': {0: {}, 1: {}}}

# Generate torus point cloud for unentangled tori
torus_point_cloud['unent'][0], torus_labels['unent'][0] = make_torus_point_cloud(0, 50, 0.0,\
    Rotation(1,2,math.pi/2), np.array([[0,0,0]]), radius=.3)
torus_point_cloud['unent'][1], torus_labels['unent'][1]  = make_torus_point_cloud(1, 50, 0.0,\
    Rotation(1,2,0), np.array([[6,0,0]]), radius=.3)

# Generate torus point cloud for entangled tori
torus_point_cloud['ent'][0], torus_labels['ent'][0] = make_torus_point_cloud(0, 50, 0.0,\
    Rotation(1,2,math.pi/2), np.array([[0,0,0]]), radius=.3)
torus_point_cloud['ent'][1], torus_labels['ent'][1]  = make_torus_point_cloud(1, 50, 0.0,\
    Rotation(1,2,0), np.array([[2,0,0]]), radius=.3)


# Concatenate torus point clouds
tori_point_cloud = {}
tori_labels = {}

for config in ['ent', 'unent']:
    tori_point_cloud[config] = np.concatenate((torus_point_cloud[config][0],\
                                torus_point_cloud[config][1]), axis=0)
    tori_labels[config] = np.concatenate((torus_labels[config][0],\
                                torus_labels[config][1]), axis=0)

# Plot data sets
df_tori = {}

for config in ['ent', 'unent']:
    df_tori[config] = pd.DataFrame(tori_point_cloud[config], columns = ["x", "y", "z"])
    fig = px.scatter_3d(df_tori[config], x="x", y="y", z="z", color=tori_labels[config], title="Tori "+config+"angled")
    fig.show()

In the next step we will train fully connected neural networks on the binary classification task.

In [9]:
# Define neural network architecture
tori_detect_nn = {}
tori_detect_nn['unent'] = Net(0, [3,20,20,20,20])
tori_detect_nn['ent'] = Net(0, [3,20,20,20,20])

# Print the architecture of both neural nets
for config in ['ent', 'unent']:
    print('Architecture of Neural Net for ' + config + 'angled:\n', tori_detect_nn[config])

# Train neural nets on data sets
for config in ['ent', 'unent']:
    print('Training of Neural Net for ' + config + 'angled')
    train_classification_nn(tori_detect_nn[config], tori_point_cloud[config], tori_labels[config], n_epochs=18)

Architecture of Neural Net for entangled:
 Net(
  (layer0): Linear(in_features=3, out_features=20, bias=True)
  (layer1): Linear(in_features=20, out_features=20, bias=True)
  (layer2): Linear(in_features=20, out_features=20, bias=True)
  (layer3): Linear(in_features=20, out_features=20, bias=True)
  (layer4): Linear(in_features=20, out_features=2, bias=True)
)
Architecture of Neural Net for unentangled:
 Net(
  (layer0): Linear(in_features=3, out_features=20, bias=True)
  (layer1): Linear(in_features=20, out_features=20, bias=True)
  (layer2): Linear(in_features=20, out_features=20, bias=True)
  (layer3): Linear(in_features=20, out_features=20, bias=True)
  (layer4): Linear(in_features=20, out_features=2, bias=True)
)
Training of Neural Net for entangled


epoch,train_loss,valid_loss,accuracy,time
0,0.821966,0.732256,0.553,00:00
1,0.56782,0.499527,0.811,00:00
2,0.414597,0.344546,0.996,00:00
3,0.329646,0.319527,1.0,00:00
4,0.317448,0.315672,1.0,00:00
5,0.314795,0.314272,1.0,00:00
6,0.313935,0.313735,1.0,00:00
7,0.313585,0.313495,1.0,00:00
8,0.313425,0.313383,1.0,00:00
9,0.313348,0.31333,1.0,00:00


Training of Neural Net for unentangled


epoch,train_loss,valid_loss,accuracy,time
0,0.814904,0.802261,0.511,00:00
1,0.814904,0.802261,0.511,00:00
2,0.814904,0.802261,0.511,00:00
3,0.814904,0.802261,0.511,00:00
4,0.814904,0.802261,0.511,00:00
5,0.814904,0.802261,0.511,00:00
6,0.814904,0.802261,0.511,00:00
7,0.814904,0.802261,0.511,00:00
8,0.814904,0.802261,0.511,00:00
9,0.814904,0.802261,0.511,00:00


As in the first example, we sample points in a box and let them flow towards the decision boundary using our gradient flow method.

In [10]:
# Apply gradient flow to detect decision boundary
n_samples = 10000

boundary_tuple = {}

boundary_tuple['ent']   = [(-2, 4), (-2, 2), (-2, 2)]
boundary_tuple['unent'] = [(-3, 7), (-2, 2), (-2, 2)]

sample_points_boundary = {}

for config in ['ent', 'unent']:
    sample_points_boundary[config] = gradient_flow(tori_detect_nn[config],\
        boundary_tuple[config], n_samples=n_samples)


In the last step we plot the data set and the computed decision boundary.

In [11]:
for config in ['ent', 'unent']:
    plot_decision_boundary(tori_point_cloud[config], tori_labels[config], sample_points_boundary[config], n_components=3)

IndexError: tuple index out of range

## TODO: Density of decision boundary points

To be written.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance_matrix.html


![](images/density_point_cloud.png)

