# Comparison of sparsification strategies

In this tutorial we describe the use of the Python package [dowker_comparison](https://github.com/blasern/dowker_homology) to calculate the persistent homology of point clouds. It is meant as a reference for comparing the different strategies and examples in [Sparse Dowker Nerves (SDN)](https://arxiv.org/abs/1802.03655) and [Sparse Filtered Nerves (SFN)](https://arxiv.org/abs/1810.02149). The recommended sparsification strategy is implemented in [dowker_homology](https://github.com/mbr085/Sparse-Dowker-Nerves).

In this package we provide classes for calculating persistent homology using different truncation and sparsification methods. We have intrinsic and ambient classes for most methods. All classes follow the naming scheme `Dowker_Intrinsic_Truncation_Sparsification` or `Dowker_Ambient_Truncation_Sparsification` and take the same arguments as their counterparts in [dowker_homology](dowker_homology_tutorial.html).
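
As a rough illustration of the intrinsic/ambient distinction, the sketch below computes when an edge enters each Čech filtration. It assumes the usual Dowker nerve reading: in the intrinsic case the ball centres are restricted to the point cloud itself, while in the ambient case they may lie anywhere in the surrounding Euclidean space. The helper names are hypothetical and not part of `dowker_comparison` or `dowker_homology`, and the packages may use different scaling conventions.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Small made-up point cloud in the plane (illustrative data only)
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
D = cdist(X, X)  # pairwise Euclidean distances


def intrinsic_edge_value(D, i, j):
    """Smallest radius at which some point of the cloud is within
    that radius of both point i and point j (hypothetical helper)."""
    return float(np.min(np.maximum(D[i], D[j])))


def ambient_edge_value(D, i, j):
    """Smallest radius at which two Euclidean balls around points i and j
    intersect; the midpoint of the segment is then a common witness."""
    return float(D[i, j] / 2)


intrinsic = intrinsic_edge_value(D, 0, 1)
ambient = ambient_edge_value(D, 0, 1)
print(intrinsic, ambient)
```

Under these assumptions the ambient value never exceeds the intrinsic one, because every point of the cloud is also a point of the ambient space.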

### Truncation methods
We have implemented truncation methods described by Cavanna, Jahanseir and Sheehy in [A Geometric Perspective on Sparse Filtrations](https://arxiv.org/abs/1506.03797) and Definition 6 in [Sparse Dowker Nerves](https://arxiv.org/abs/1802.03655), which will be termed 'Sheehy'. 
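
The sparse constructions cited above build on a farthest point sampling (greedy permutation) of the point cloud, where each point receives an insertion radius. The following is a minimal sketch of that sampling step only, not the packages' implementation; the function name `greedy_permutation` and its signature are assumptions made for illustration.

```python
import numpy as np


def greedy_permutation(points, first=0):
    """Farthest point sampling (hypothetical helper).

    Returns the order in which points are selected and, for each selected
    point, its insertion radius: the distance to the nearest previously
    selected point (infinity for the first point).
    """
    n = len(points)
    order = [first]
    radii = [np.inf]
    # distance from every point to the nearest already-selected point
    dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(n - 1):
        nxt = int(np.argmax(dist))  # farthest point from the current sample
        order.append(nxt)
        radii.append(float(dist[nxt]))
        # update nearest-sample distances with the newly selected point
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(order), np.array(radii)


# Tiny made-up example: four points on a line
pts = np.array([[0.0], [1.0], [3.0], [10.0]])
order, radii = greedy_permutation(pts)
print(order, radii)
```

The insertion radii are non-increasing along the permutation, which is what makes them usable as a scale at which a point can be dropped from a filtration.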

### Restriction methods
The two restriction methods implemented here come from [Sparse Dowker Nerves](https://arxiv.org/abs/1802.03655) and are termed 'Sheehy' for the method described in Proposition 1 and 'Parent' for the method described in Definition 8.

### Examples
In this tutorial we present two quick examples. For a more complete overview see our [examples](https://github.com/mbr085/Sparse-Dowker-Nerves/tree/master/examples). Note that the examples in that repository may take a long time to run.

## Clifford torus
We start by calculating Čech persistent homology of a point cloud in ℝ<sup>4</sup>. First we import the relevant Python packages, set a random seed and generate the sample data used for the examples below. The data we use are 100 points on the Clifford torus.

In [None]:
import time
start = time.time()

In [None]:
%%time
# import packages
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist
import urllib.request
from io import BytesIO
import dowker_homology as dh
import dowker_comparison as dc
np.random.seed(1)

In [None]:
# generate data
def clifford_torus(N):
    x = np.linspace(0, 2*np.pi, num=N, endpoint=False).reshape(N,1)
    y = np.sqrt(N)*x
    return np.hstack((np.cos(x), np.sin(x), np.cos(y), np.sin(y)))

coords = clifford_torus(100)
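
As a quick sanity check, the sketch below repeats the `clifford_torus` definition from the cell above and verifies that every generated point lies on the product of two unit circles in ℝ<sup>4</sup>. The variable names `first_circle`, `second_circle`, `on_torus` and `norms` are introduced here for illustration only.

```python
import numpy as np


def clifford_torus(N):
    # same construction as in the cell above
    x = np.linspace(0, 2*np.pi, num=N, endpoint=False).reshape(N, 1)
    y = np.sqrt(N)*x
    return np.hstack((np.cos(x), np.sin(x), np.cos(y), np.sin(y)))


coords = clifford_torus(100)

# squared radius in the first and in the second coordinate plane
first_circle = np.sum(coords[:, :2]**2, axis=1)
second_circle = np.sum(coords[:, 2:]**2, axis=1)
on_torus = np.allclose(first_circle, 1) and np.allclose(second_circle, 1)

# each point therefore has Euclidean norm sqrt(2)
norms = np.linalg.norm(coords, axis=1)
print(coords.shape, on_torus)
```

With `N = 100` the second angle is ten times the first, so the sample follows a curve that winds ten times around the second circle while going once around the first.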

In [None]:
%%time
# set parameters
params = {'dimension': 1, 
          'multiplicative_interleaving': 2.5}
# initiate intrinsic dowker homology objects
dowker_intrinsic = dh.Dowker_Intrinsic(**params)
dowker_intrinsic_sheehy_sheehy = dc.Dowker_Intrinsic_Sheehy_Sheehy(**params)
dowker_intrinsic_sheehy_parent = dc.Dowker_Intrinsic_Sheehy_Parent(**params)
# initiate ambient dowker homology objects
dowker_ambient = dh.Dowker_Ambient(**params)
dowker_ambient_sheehy_sheehy = dc.Dowker_Ambient_Sheehy_Sheehy(**params)
dowker_ambient_sheehy_parent = dc.Dowker_Ambient_Sheehy_Parent(**params)

In [None]:
%%time
# calculate persistent homology
dowker_intrinsic.persistence(X=coords)
dowker_intrinsic_sheehy_sheehy.persistence(X=coords)
dowker_intrinsic_sheehy_parent.persistence(X=coords)
dowker_ambient.persistence(X=coords)
dowker_ambient_sheehy_sheehy.persistence(X=coords)
dowker_ambient_sheehy_parent.persistence(X=coords)

The function `summarize_dowker` summarizes, for each object, the cardinalities of the unreduced Čech nerve, of the Čech nerve of the farthest point sample and of the sparse nerve.

In [None]:
# summarize sizes
dh.dowker_functions.summarize_dowker(dowker_intrinsic,
                                     dowker_intrinsic_sheehy_sheehy,
                                     dowker_intrinsic_sheehy_parent,
                                     dowker_ambient,
                                     dowker_ambient_sheehy_sheehy,
                                     dowker_ambient_sheehy_parent)

In this case the parent restriction is slightly smaller than the Sheehy restriction. Note that for the intrinsic case the two restrictions result in homotopy equivalent nerves. In the ambient case they may differ slightly because the parent restriction uses the original filtration values. The truncation and restriction methods presented in Sparse Filtered Nerves produce considerably smaller nerves, but result in different persistent homology with the same interleaving guarantees.

In [None]:
print(np.allclose(dowker_intrinsic_sheehy_sheehy.homology, 
                  dowker_intrinsic_sheehy_parent.homology))

Let us have a look at the resulting persistence diagrams.

In [None]:
%%time
# plot persistent homology
dowker_intrinsic.plot_persistence(title = 'intrinsic')
dowker_intrinsic_sheehy_sheehy.plot_persistence(title = 'intrinsic sheehy sheehy')
dowker_intrinsic_sheehy_parent.plot_persistence(title = 'intrinsic sheehy parent')
dowker_ambient.plot_persistence(title = 'ambient')
dowker_ambient_sheehy_sheehy.plot_persistence(title = 'ambient sheehy sheehy')
dowker_ambient_sheehy_parent.plot_persistence(title = 'ambient sheehy parent')

## Klein bottle
Next we calculate Čech persistent homology of a point cloud in ℝ<sup>3</sup> from the [persistent homology roadmap](https://github.com/n-otter/PH-roadmap) by Nina Otter et al. 

In [None]:
# Download data and transform to numpy array
url = ('https://raw.githubusercontent.com/' +
       'n-otter/PH-roadmap/master/data_sets/' +
       'roadmap_datasets_point_cloud/' +
       'klein_bottle_pointcloud_new_400.txt')

# fetch the raw file contents as a `bytes` object
response = urllib.request.urlopen(url)
data = response.read()
# parse the space-separated coordinates into a numpy array
coords = np.genfromtxt(BytesIO(data), delimiter=" ")
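
The parsing step can be tried without network access. The sketch below feeds a small in-memory byte string to `np.genfromtxt` in the same way as the cell above; the sample values are made up, and the layout (space-separated coordinates, one point per row) is an assumption about the downloaded file.

```python
import numpy as np
from io import BytesIO

# made-up stand-in for the downloaded bytes: two points in three dimensions
sample = b"0.1 0.2 0.3\n0.4 0.5 0.6\n"

# wrap the bytes in a file-like object and parse them into an array
points = np.genfromtxt(BytesIO(sample), delimiter=" ")
print(points.shape)
```

Checking `coords.shape` after the real download is a cheap way to confirm the number of points and the ambient dimension before computing persistent homology.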

In [None]:
%%time
# set parameters
params = {'dimension': 1,
          'multiplicative_interleaving': 1.5, 
          'n_samples': 100}
# initiate intrinsic dowker homology objects
dowker_intrinsic = dh.Dowker_Intrinsic(**params)
dowker_intrinsic_sheehy_sheehy = dc.Dowker_Intrinsic_Sheehy_Sheehy(**params)
dowker_intrinsic_sheehy_parent = dc.Dowker_Intrinsic_Sheehy_Parent(**params)
# initiate ambient dowker homology objects
dowker_ambient = dh.Dowker_Ambient(**params)
dowker_ambient_sheehy_sheehy = dc.Dowker_Ambient_Sheehy_Sheehy(**params)
dowker_ambient_sheehy_parent = dc.Dowker_Ambient_Sheehy_Parent(**params)
# calculate persistent homology
dowker_intrinsic.persistence(X=coords)
dowker_intrinsic_sheehy_sheehy.persistence(X=coords)
dowker_intrinsic_sheehy_parent.persistence(X=coords)
dowker_ambient.persistence(X=coords)
dowker_ambient_sheehy_sheehy.persistence(X=coords)
dowker_ambient_sheehy_parent.persistence(X=coords)

The function `summarize_dowker` summarizes, for each object, the cardinalities of the unreduced Čech nerve, of the Čech nerve of the farthest point sample and of the sparse nerve.

In [None]:
# summarize sizes
dh.dowker_functions.summarize_dowker(dowker_intrinsic,
                                     dowker_intrinsic_sheehy_sheehy,
                                     dowker_intrinsic_sheehy_parent,
                                     dowker_ambient,
                                     dowker_ambient_sheehy_sheehy,
                                     dowker_ambient_sheehy_parent)

In this case the parent restriction was larger than the Sheehy restriction for the intrinsic case and smaller for the ambient case.

In [None]:
print(np.allclose(dowker_intrinsic_sheehy_sheehy.homology, 
                  dowker_intrinsic_sheehy_parent.homology))

Let us have a look at the resulting persistence diagrams.

In [None]:
%%time
# plot persistent homology
dowker_intrinsic.plot_persistence(title = 'intrinsic')
dowker_intrinsic_sheehy_sheehy.plot_persistence(title = 'intrinsic sheehy sheehy')
dowker_intrinsic_sheehy_parent.plot_persistence(title = 'intrinsic sheehy parent')
dowker_ambient.plot_persistence(title = 'ambient')
dowker_ambient_sheehy_sheehy.plot_persistence(title = 'ambient sheehy sheehy')
dowker_ambient_sheehy_parent.plot_persistence(title = 'ambient sheehy parent')

This notebook took less than one minute to run.

In [None]:
total_time = time.time() - start
total_time / 60