# Notebook 1: Shape hunting

In this notebook, we will look at Vietoris-Rips complexes on noisy data and see whether they can detect the shape.

This notebook contains **2 compulsory Exercises.**

In [None]:
!pip install gudhi

In [None]:
#some imports
import gudhi
import numpy as np
import matplotlib.pyplot as plt

Documentation for `gudhi`, just for reference:
http://gudhi.gforge.inria.fr/python/latest/

## Example 1: a noisy circle
Let's start with a noisy data set sampled from a circle. See Notebook 0 for the details on how we generated this data set. We'll use fewer points than we did in Notebook 0 to make things easier.

In [None]:
#make the noisy data
np.random.seed(2022) #feel free to ignore this, it just makes sure everyone gets the same random noise
X = [np.cos(t*2*np.pi) for t in np.arange(0,1,0.1)]
Y = [np.sin(t*2*np.pi) for t in np.arange(0,1,0.1)]
X_noise = np.random.normal(scale=0.05, size=len(X))
Y_noise = np.random.normal(scale=0.05, size=len(Y))
X = np.array(X) + np.array(X_noise)
Y = np.array(Y) + np.array(Y_noise)

In [None]:
#look at it in the plane
plt.scatter(X, Y)
plt.axis('equal')

Now we are going to let gudhi work out the VR-complex persistent homology for this data set. Gudhi has an in-built function for calculating VR-complexes. We use the "zip" function to turn our X and Y vectors into a list of pairs $(x_1,y_1), (x_2,y_2), \ldots $

In [None]:
rips = gudhi.RipsComplex(max_edge_length=10, points=zip(X, Y))
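
As a small illustration of what `zip` produces, here is a sketch with three made-up coordinates (the values are hypothetical and not part of the data set above). Note that `zip` itself returns an iterator, so wrapping it in `list` is what makes the pairs visible.

```python
# Hypothetical coordinates, chosen only for illustration
xs = [1.0, 0.0, -1.0]
ys = [0.0, 1.0, 0.0]

# zip pairs up the i-th entry of xs with the i-th entry of ys
pairs = list(zip(xs, ys))
print(pairs)  # [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
```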

We're now going to turn this complex into a SimplexTree object, which is gudhi's (rather badly named) main way of representing filtered simplicial complexes. We'll specify a maximum dimension so the poor algorithm doesn't have to deal with simplices of every dimension up to 9 (the largest possible on 10 points).

In [None]:
simplex_tree = rips.create_simplex_tree(max_dimension=3)
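
To get a feel for why the cap matters, here is a rough count. It assumes `max_edge_length` is large enough that every subset of points ends up spanning a simplex, which should be the case for 10 points near the unit circle with `max_edge_length=10`. The helper `count_simplices` is a hypothetical name for this illustration, not part of gudhi.

```python
from math import comb

def count_simplices(n_points, max_dimension):
    """Number of simplices of dimension <= max_dimension on n_points vertices,
    assuming every subset of vertices spans a simplex."""
    # A k-simplex has k + 1 vertices, so there are C(n_points, k + 1) of them
    return sum(comb(n_points, k + 1) for k in range(max_dimension + 1))

print(count_simplices(10, 3))  # 385 simplices with the cap
print(count_simplices(10, 9))  # 1023 simplices without it
```

The gap grows very quickly with the number of points, so the cap matters much more on larger data sets. If you want to check the count on the real complex, `simplex_tree.num_simplices()` should report it.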

If you want to look inside the SimplexTree, use get_filtration(). It returns what's called an *iterator*, which is like a list but which you can only access by walking through one step at a time. Let's print out the entries. 

Make sure you understand what you're looking at. Each entry is a simplex (with vertices given by numbers) and a filtration value, that is, the value when it appears for the first time. 

In [None]:
#print every (simplex, filtration value) pair in the filtration
for f in simplex_tree.get_filtration():
    print(f)
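
To make the filtration values concrete, here is a minimal sketch that computes them by hand for three made-up points. It assumes the convention that an edge appears at the distance between its endpoints and that a higher simplex appears as soon as all of its edges are present, which should match the values gudhi reports. The helper name `rips_value` is hypothetical.

```python
from itertools import combinations
from math import dist

# Three made-up points forming a 3-4-5 right triangle
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

def rips_value(simplex):
    """Filtration value of a simplex given as a tuple of vertex indices."""
    if len(simplex) == 1:
        return 0.0  # vertices are present from the start
    # a simplex appears once its longest edge has appeared
    return max(dist(points[i], points[j]) for i, j in combinations(simplex, 2))

# list every simplex on the three points, dimension by dimension
for k in (1, 2, 3):
    for simplex in combinations(range(len(points)), k):
        print(list(simplex), rips_value(simplex))
```

The triangle `[0, 1, 2]` gets the value 5.0, the length of its longest edge. Unlike `get_filtration()`, this loop lists simplices by dimension rather than in order of filtration value.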

Okay, time for persistence. Let's compute it using the SimplexTree's `persistence()` method.

In [None]:
persistence = simplex_tree.persistence()

Let's look at persistence. Each entry has the dimension, followed by the (birth, death) pair. Again, make sure this feels right to you.

In [None]:
persistence
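
One way to read this list is to pull out a single dimension and compute how long each feature lives (death minus birth). The sketch below uses a small made-up list in the same `(dimension, (birth, death))` format rather than the real output, and `lifetimes` is a hypothetical helper name.

```python
# Made-up entries in the same format as the list above
example = [
    (1, (0.7, 1.8)),
    (1, (0.65, 0.68)),
    (0, (0.0, float("inf"))),
    (0, (0.0, 0.6)),
]

def lifetimes(persistence, dimension):
    """Lifetimes (death - birth) of all features in one dimension, longest first."""
    return sorted(
        (death - birth for dim, (birth, death) in persistence if dim == dimension),
        reverse=True,
    )

print(lifetimes(example, 1))  # one long-lived loop and one short-lived one
print(lifetimes(example, 0))  # the infinite entry is a component that never dies
```

Calling `lifetimes(persistence, 1)` on the real output should work the same way, since it only relies on the entry format described above.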

You could plot this manually if you wanted, but gudhi has built-in persistence plotting. 

In [None]:
gudhi.plot_persistence_diagram(persistence, legend=True)
plt.gca().set_aspect('equal')

We can also plot barcodes.

In [None]:
gudhi.plot_persistence_barcode(persistence, legend=True)

Did the barcodes and diagram look like you expected? Did they "find the circle"? 

## Exercise 1: Two circles

By filling in the ... below, make two noisy circles with different centers, plot them and then plot their persistence. Depending on your choice of centers, the persistence may or may not look like what you expected!

**Side note:** If you want to combine two datasets, you can use the np.append() function, like this:

In [None]:
X = [1,1,1]
print("Before:", X)
X_new = np.append(X, [1,2,3])
print("After: ", X_new)

In [None]:
#make the first set of noisy data
X = [np.cos(t*2*np.pi) for t in np.arange(0,1,0.1)]
Y = [np.sin(t*2*np.pi) for t in np.arange(0,1,0.1)]
X_noise = np.random.normal(scale=0.05, size=len(X))
Y_noise = np.random.normal(scale=0.05, size=len(Y))
X1 = np.array(X) + np.array(X_noise)
Y1 = np.array(Y) + np.array(Y_noise)

In [None]:
#make the second set of noisy data with a different center
X = [np.cos(t*2*np.pi) + ... for t in np.arange(0,1,0.1)]
Y = [np.sin(t*2*np.pi) + ... for t in np.arange(0,1,0.1)]
X_noise = np.random.normal(scale=0.05, size=len(X))
Y_noise = np.random.normal(scale=0.05, size=len(Y))
X2 = np.array(X) + np.array(X_noise)
Y2 = np.array(Y) + np.array(Y_noise)

In [None]:
X_together = np.append(... , ...)
Y_together = np.append(... , ...)

In [None]:
#look at it in the plane
plt.scatter(... , ...)
plt.axis('equal')

In [None]:
rips = gudhi.RipsComplex(max_edge_length=10, points=zip(... , ...))
simplex_tree = rips.create_simplex_tree(max_dimension=3)
persistence = ...

In [None]:
gudhi.plot_persistence_diagram(..., legend=True)
plt.gca().set_aspect('equal')

In [None]:
gudhi.plot_persistence_barcode(..., legend=True)

## Exercise 2: Two circles with different radii

**(a)** Write code to make two noisy circles with different centers _and_ different radii. Plot the VR persistence and adjust the noise level, centers and radii until you see two different high persistence points in dimension 1.

**(b)** Take what you did for (a) and increase the scale of the noise being added (i.e. the standard deviation) until you only see one high persistence point (i.e. one circle) in the persistence diagram. It may help to make the radii very different.

*Note: there's no formal definition of "high persistence", so feel free to use your judgement about how far from the diagonal counts as high persistence*

## Challenges
The following challenges are for extra credit for 400-level students. For 600-level students, I require that you attempt at least one of these.
1. Write some code to generate noisy data with $k$ clusters for a $k$ of your choice. Look at the persistence and see if it detects the number of clusters.
2. Write a simple algorithm that counts clusters in 2D data by counting persistence features over a certain persistence threshold. Apply this algorithm to the datasets you generated in (1) and see how often it correctly outputs the number of clusters you designed them to have.
3. Make a 3D data set which is a noisy sample from a hollow sphere and compute its persistence.