# Drawing Cycles with Giotto-TDA

## 1. Introduction and Motivation

Information captured by persistent homology is commonly represented by a persistence diagram or, equivalently, a persistence barcode. Although mathematically sound, these representations are not well suited for exploratory data analysis: they effectively represent the lifespan of each topological feature, but they lose the connection with the original domain space.

The main challenge for a user is then "guessing" whether the topological features captured by persistent homology correspond to the features of interest for their application.

...


## 2. Analysis

### 2a. Dataset

![dataset](./pictures/dataset.png)

We use a sample of the original MNIST dataset, downloaded from [here](http://yann.lecun.com/exdb/mnist/). The dataset provided with this submission includes *** handwritten digits (50 images per digit). The original dataset already provides digits that have been size-normalized and centered in a fixed-size image.

### 2b. Preprocessing

The first step in using persistent homology is to define a filtration on the input data.
Since our goal is to study "meaningful" cycles for the input digits, we cannot use the input scalar function, which associates low values with the background and high values with the digits' pixels.

A natural alternative would be to negate the input scalar function, so that low function values correspond to the pixels characterizing the handwritten digit and high function values correspond to the pixels characterizing the background.

However, we noticed that large flat regions in the input scalar field (the background) can give rise to spurious cycles due to the preprocessing algorithm in TTK. For this reason, we decided to modify the input scalar function by using the distance transform.

![filtrations](pictures/filtrations.png)
Image generated with `./scripts/filtrations.py`
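
As a rough illustration of the idea, and not the code in `./scripts/filtrations.py`, the sketch below builds a distance-based filtration for a toy image in plain Python. It assumes that the digit is separated from the background by a fixed threshold and that the distance is measured (in city-block steps) from each pixel to the nearest digit pixel; the function names, the threshold, and the choice of metric are all hypothetical.

```python
from collections import deque


def binarize(image, threshold):
    """Return a mask that is True on digit (foreground) pixels."""
    return [[value > threshold for value in row] for row in image]


def city_block_distance_transform(mask):
    """City-block distance from every pixel to the nearest foreground pixel.

    Implemented as a multi-source breadth-first search over 4-neighbours.
    Assumes the mask contains at least one foreground pixel.
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Every foreground pixel is a BFS source with distance 0.
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


# Toy 5x5 "digit": a ring of bright pixels around a dark hole.
image = [
    [0, 0, 0, 0, 0],
    [0, 255, 255, 255, 0],
    [0, 255, 0, 255, 0],
    [0, 255, 255, 255, 0],
    [0, 0, 0, 0, 0],
]
filtration = city_block_distance_transform(binarize(image, threshold=127))
```

With this choice the digit pixels receive the lowest value, and the background is no longer one large flat region, since its values grow with the distance from the stroke.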

For each image, persistence diagrams and representative cycles are generated using the [Topology Toolkit (TTK)](https://topology-tool-kit.github.io), in particular a [module](https://github.com/IuricichF/PersistenceCycles) we developed in-house for the efficient computation of representative cycles on 2D and 3D meshes.

Files generated by TTK are in the folder `data/vtk/*`.
VTK files are read with `meshio`.

In [38]:
import sys
!{sys.executable} -m pip install meshio



In [49]:
import meshio
meshio.read("./data/vtk/0/3_pd.vtk")

<meshio mesh object>
  Number of points: 26
  Number of cells:
    line: 13
  Point data: CellDimension, Filtration
  Cell data: Persistence, Type

The persistence diagram is defined as a set of lines representing persistence pairs. 
For a line, the two end-points are geometrically located where the homology class was born/died.

For each line (cell data), the file contains:
- the persistence value associated with the pair (`Persistence`)
- the dimension of the homology class (`Type`)

For each point (point data), the file contains:
- the dimension of the corresponding simplex (`CellDimension`)
- the filtration value (`Filtration`)
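
The layout described above can be turned into (birth, death, dimension) triples, which is the column order giotto-tda uses for persistence diagrams. The following is a minimal sketch on toy arrays, assuming that the `Type` cell field stores the homology dimension and that the smaller of the two endpoint filtration values is the birth; `persistence_pairs` is a hypothetical helper, not part of TTK or meshio.

```python
def persistence_pairs(lines, point_filtration, line_type):
    """Convert diagram edges into (birth, death, dimension) triples.

    lines            -- pairs of point indices, one per persistence pair
    point_filtration -- filtration value of every point
    line_type        -- homology dimension of every line (assumed meaning of `Type`)
    """
    pairs = []
    for (i, j), dim in zip(lines, line_type):
        # Assumption: the endpoint with the lower filtration value is the birth.
        birth, death = sorted((point_filtration[i], point_filtration[j]))
        pairs.append((float(birth), float(death), int(dim)))
    return pairs


# Toy diagram with two pairs (four points, two lines).
lines = [(0, 1), (2, 3)]
point_filtration = [0.0, 3.0, 2.0, 1.0]
line_type = [0, 1]

pairs = persistence_pairs(lines, point_filtration, line_type)
# death - birth should agree with the `Persistence` cell field.
persistence = [death - birth for birth, death, _ in pairs]
```

For a mesh loaded with `meshio.read`, the three inputs would presumably come from `mesh.cells_dict["line"]`, `mesh.point_data["Filtration"]`, and `mesh.cell_data_dict["Type"]["line"]`; this has not been checked against the files in `data/vtk/*`.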

In [48]:
import meshio
meshio.read("./data/vtk/0/3_cycles.vtk")

<meshio mesh object>
  Number of points: 47
  Number of cells:
    line: 47
  Cell data: CycleId, Filtration

The representative cycles are defined as chains of lines. 

For each line the file contains:
- the unique id of the cycle associated with the persistence pair. In practice, edges with the same `CycleId` value belong to the same cycle. Moreover, `CycleId` identifies the persistence pair corresponding to the cycle.
- the persistence value associated with the persistence pair
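
Since edges sharing a `CycleId` form one cycle, a first processing step is to group the edges by id and walk each group to recover an ordered loop of vertices, which is convenient for plotting. The sketch below does this on toy data; it assumes every cycle is a simple closed loop (each vertex touched by exactly two edges), and both helper names are hypothetical.

```python
from collections import defaultdict


def group_edges_by_cycle(lines, cycle_ids):
    """Map each cycle id to the list of edges carrying that id."""
    cycles = defaultdict(list)
    for (i, j), cycle_id in zip(lines, cycle_ids):
        cycles[int(cycle_id)].append((int(i), int(j)))
    return dict(cycles)


def order_cycle(edges):
    """Return the vertices of one simple closed cycle in traversal order."""
    neighbours = defaultdict(list)
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    start = edges[0][0]
    path = [start]
    previous, current = None, start
    # A closed loop with n edges has n vertices, so n - 1 steps suffice;
    # the bound also guards against endless walks on non-simple input.
    for _ in range(len(edges) - 1):
        candidates = [v for v in neighbours[current] if v != previous]
        if not candidates or candidates[0] == start:
            break
        previous, current = current, candidates[0]
        path.append(current)
    return path


# Toy data: a square (cycle 0) and a triangle (cycle 1), edges interleaved.
lines = [(0, 1), (4, 5), (1, 2), (5, 6), (2, 3), (6, 4), (3, 0)]
cycle_ids = [0, 1, 0, 1, 0, 1, 0]

cycles = group_edges_by_cycle(lines, cycle_ids)
loops = {cycle_id: order_cycle(edges) for cycle_id, edges in cycles.items()}
```

The ordered vertex indices can then be used to look up point coordinates and draw each cycle as a closed polyline over the original image.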


### 2c. Visualizing and comparing cycles

COMMENT - You can focus on this part for now. 

The idea is to have and showcase:
- a function for visualizing all cycles overlaid on the original image (I think you can use the original vtk files instead of the json files, which are only good for the web interface)
- a function for visualizing, in a single plot, persistence images (from giotto-tda), the persistence diagram (from my vtk file), and persistence cycles on top of the original image (reusing the previous function)
- a function to compare two datasets. This should create a plot calling the previous visualization on two distinct images. This function should have an input parameter where we can specify whether to highlight the most distinct pixels in the persistence image and the most distinct cycles.
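
One possible building block for the comparison function is sketched below. It reads "most distinct pixels" as the pixels with the largest absolute difference between two equally sized persistence images, which is only one interpretation; the helper name and the parameter `k` are hypothetical, and the images are plain nested lists rather than giotto-tda output.

```python
def most_distinct_pixels(image_a, image_b, k):
    """Return the k (row, col) positions where two images differ the most."""
    differences = [
        (abs(a - b), (r, c))
        for r, (row_a, row_b) in enumerate(zip(image_a, image_b))
        for c, (a, b) in enumerate(zip(row_a, row_b))
    ]
    # Stable sort: ties keep their row-major order.
    differences.sort(key=lambda item: -item[0])
    return [position for _, position in differences[:k]]


# Toy 2x2 persistence images.
image_a = [[0.0, 0.5], [1.0, 0.2]]
image_b = [[0.0, 0.1], [0.2, 0.2]]

highlight = most_distinct_pixels(image_a, image_b, k=2)
```

The returned positions could be passed to the plotting function as the pixels to highlight in both persistence images.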

## 3. Benchmark

## 4. Limitations and Perspectives