<a href="https://colab.research.google.com/github/fbeilstein/topological_data_analysis/blob/master/lecture_9_persistence_homology_complexes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Definition 4.**
**$p$-Cycles $Z_p(\mathcal{K})$** is the set of **$p$-cycles** $z_p$: $Z_p(\mathcal{K}) = \{z_p \in C_p(\mathcal{K}) \mid \partial_p z_p = 0\}$, i.e. the **kernel of $\partial_p$**.



**Definition 5.**
**$p$-Boundaries $B_p(\mathcal{K})$** is the set of **$p$-boundaries** $b_p$:
$$
B_p(\mathcal{K}) = \{b_p \in C_p(\mathcal{K}) | \exists c_{p+1} \in C_{p+1}(\mathcal{K}): \partial_{p+1} c_{p+1} = b_p\},
$$
i.e. the **image of $C_{p+1}(\mathcal{K})$ under $\partial_{p+1}$**.



**Theorem.**
$\partial_{p-1} \circ \partial_p = 0$.
Thus $B_p(\mathcal{K})\triangleleft Z_p(\mathcal{K})$.

$\blacktriangleleft$ See [maunder, nash] and **Definitions 4 and 5**; note that every subgroup of an abelian group is normal. $\blacksquare$
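
As a small worked check of the theorem for $p = 2$, take a single triangle $(v_0;v_1;v_2)$. The computation below assumes the standard alternating-sum convention for the boundary operator with integer coefficients (an assumption here; the convention fixed earlier in these notes may differ in sign or ordering):

$$
\begin{aligned}
\partial_2 (v_0;v_1;v_2) &= (v_1;v_2) - (v_0;v_2) + (v_0;v_1),\\
\partial_1 \partial_2 (v_0;v_1;v_2) &= (v_2 - v_1) - (v_2 - v_0) + (v_1 - v_0) = 0.
\end{aligned}
$$

So the boundary of the triangle is a $1$-cycle, as the inclusion $B_1(\mathcal{K}) \subseteq Z_1(\mathcal{K})$ requires.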


**Definition 6.**
The **$p$-dimensional homology group** of $\mathcal{K}$ is the quotient group $H_p(\mathcal{K}) = Z_p(\mathcal{K}) / B_p(\mathcal{K})$.



**Definition 4** rigorously defines cycles, while **Definition 5** tells us which of them are "filled in," i.e. bound something and so enclose no hole. **Definition 6** then says "consider closed cycles but disregard anything that is filled in," i.e. we are only interested in cycles that surround holes. The difficulty is that the elements of $H_p(\mathcal{K})$ are not just the individual holes; rather, the group is "generated by holes." So we need to "extract a basis" somehow, and this is where **Theorems 2 and 3** come in handy.


**Theorem 2.**
The homology group $H_p(\mathcal{K})$ of a complex $\mathcal{K}$ is a finitely generated abelian group.

$\blacktriangleleft$ See [maunder, nash]. $\blacksquare$


**Theorem 3.**
Let $A$ be a finitely generated (not necessarily free) abelian group. Then there exist a list of primes $p_1, \ldots, p_m$ (not necessarily distinct) and positive integers $s_1, \ldots, s_m$, unique up to the order of the list, such that
$$
A \cong G \oplus \underbrace{\mathbb{Z}_{p_1^{s_1}} \oplus \cdots \oplus \mathbb{Z}_{p_m^{s_m}}}_{T},
$$
where $T$ is called the **torsion subgroup**, $\mathbb{Z}_{p_i^{s_i}}$ are cyclic groups of order $p_i^{s_i}$, and $G$ is a free abelian group of finite rank.
The rank of $G$ is uniquely determined by $A$: if the decomposition has $n$ cyclic summands in total (infinite and finite), the rank of $G$ is $n - m$.

$\blacktriangleleft$ See [hungerford], Theorem 2.6. $\blacksquare$


The procedure is somewhat similar to the decomposition of a number into prime factors.
In practice, it is performed by representing the operators $\partial_p$ as integer matrices and reducing them to the Smith normal form [smith], but here we only outline the theoretical basis.


**Definition 7.**
The rank of $G$ from **Theorem 3** for $A = H_p(\mathcal{K})$ is called the **$p$-th Betti number $\beta_p$** of the simplicial complex $\mathcal{K}$.
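
The Betti numbers can be sketched numerically without a full Smith normal form: over the rationals the torsion part disappears, and $\beta_p = \dim C_p - \operatorname{rank}\partial_p - \operatorname{rank}\partial_{p+1}$. The snippet below is a minimal illustration, not the method used in the article. The helper names `rank` and `betti_numbers`, the example matrices `d1` and `d2`, and the sign convention $\partial_1 (v_i;v_j) = v_j - v_i$ are assumptions made for this example.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of an integer matrix over the rationals (Gaussian elimination)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def betti_numbers(n_vertices, boundaries):
    """boundaries[p - 1] is the matrix of d_p: rows are (p-1)-simplices,
    columns are p-simplices. Returns [beta_0, beta_1, ...]."""
    dims = [n_vertices] + [len(d[0]) for d in boundaries]
    ranks = [0] + [rank(d) for d in boundaries] + [0]  # d_0 = 0, no simplices above the top
    return [dims[p] - ranks[p] - ranks[p + 1] for p in range(len(dims))]


# Triangle on vertices v0, v1, v2; edge columns ordered (v0;v1), (v0;v2), (v1;v2).
d1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]
# The single 2-simplex (v0;v1;v2) has boundary (v1;v2) - (v0;v2) + (v0;v1).
d2 = [[1], [-1], [1]]

hollow = betti_numbers(3, [d1])      # [1, 1]: one component, one hole
filled = betti_numbers(3, [d1, d2])  # [1, 0, 0]: the hole is filled in
```

Rational ranks recover only the free part $G$ of **Theorem 3**; detecting the torsion subgroup $T$ would still require the Smith normal form over the integers.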


Please note: although we have defined Betti numbers for an abstract simplicial complex $\mathcal{K}$, they are inherently connected to its geometric realization $K$.
Thus Betti numbers can be treated as topological characteristics of the polyhedron $|K|$.
Moreover, topology makes no distinction between homeomorphic spaces, so the same characteristics can be assigned to any space $\mathbb{X}$ that is homeomorphic to $|K|$.
This is made precise by the following definition.


**Definition 8.**
A **triangulation** of a topological space $\mathbb{X}$ is a geometric simplicial complex $K$ together with a homeomorphism $f: |K|\to \mathbb{X}$.
If such a $K$ exists, the space $\mathbb{X}$ is called **triangulable**.
The homology groups of a triangulable space $\mathbb{X}$ are defined as $H_p(\mathbb{X}) = H_p(\mathcal{K})$, where $\mathcal{K}$ is the abstract simplicial complex underlying $K$.


**Theorem.**
Homology groups $H_p(\mathbb{X})$ and $H_p(\mathbb{Y})$ of homeomorphic topological spaces are isomorphic for each $p$.

$\blacktriangleleft$ See [vick], Theorem 1.7. $\blacksquare$


This theorem means that the homology groups of a triangulable space (**Definition 8**) are well defined, and that the notion of Betti numbers extends to every topological space that is homeomorphic to some polyhedron $|K|$.

[For visuals please check](https://fbeilstein.github.io/topological_data_analysis/homology_explorer/homology_explorer.html).




### From Data to Simplicial Complexes


So far we have done nothing with our dataset, and it is about time to take the data points into account. Here we present a few methods of building simplicial complexes from data points, thus connecting their positions to the topology of a certain manifold. There are different methods of constructing an abstract simplicial complex from the data points, and the exact choice may depend on the problem and the computational resources at your disposal. Note that **Theorem 1** guarantees that whatever we come up with will have a geometric realization, but its dimension may be higher than that of the original dataset.
Here we start with the two most popular choices (the $\alpha$-complex that we use in the article can be thought of as a variation of the Čech complex).


**Definition 9.**
Let $(M;\rho)$ be a metric space.
Given a finite set of points $x_i$ in $M$ (the dataset) and a real number $\alpha > 0$, the (abstract) **Vietoris-Rips complex** is constructed as follows:
* its abstract vertices $v_i$ are in one-to-one correspondence with $x_i$ from $M$;
* it contains a simplex $\sigma^n = (v_0;\cdots;v_n)$ if and only if for each pair of vertices $v_i$ and $v_j$ the distance between corresponding points in $M$ is $\rho(x_i;x_j) \leq \alpha$.
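
The two conditions of **Definition 9** translate almost literally into code. The sketch below is only an illustration under a few assumptions: the points live in Euclidean space (so $\rho$ is `math.dist`), the function name `vietoris_rips` and the cut-off `max_dim` are hypothetical, and the brute-force enumeration over all vertex subsets is suitable only for tiny datasets.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, alpha, max_dim=2):
    """All simplices (as tuples of vertex indices) of dimension <= max_dim
    whose vertices are pairwise within distance alpha."""
    simplices = []
    for k in range(1, max_dim + 2):  # k vertices span a (k - 1)-simplex
        for s in combinations(range(len(points)), k):
            if all(dist(points[i], points[j]) <= alpha
                   for i, j in combinations(s, 2)):
                simplices.append(s)
    return simplices


# Corners of the unit square: sides have length 1, diagonals sqrt(2).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
small = vietoris_rips(square, alpha=1.0)  # 4 vertices + 4 sides, no triangles
large = vietoris_rips(square, alpha=1.5)  # diagonals appear, so all 4 triangles do too
```

At $\alpha = 1$ the complex is a hollow square with one loop; at $\alpha = 1.5$ every pair of vertices is connected, so every triangle is included automatically.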


**Definition 10.**
Let $(M;\rho)$ be a metric space.
Given a finite set of points $x_i$ in $M$ (the dataset) and a real number $\alpha > 0$, the (abstract) **Čech complex** is constructed as follows:
* its abstract vertices $v_i$ are in one-to-one correspondence with $x_i$ from $M$;
* it contains a simplex $\sigma^n = (v_0;\cdots;v_n)$ if and only if the balls $B(x_i;\alpha)$ of radius $\alpha$ centered at the points $x_i$ have a non-empty common intersection: $\bigcap_{i=0}^n B(x_i;\alpha) \neq \emptyset$.
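
For three points in the plane, the intersection condition can be checked directly: closed balls of radius $\alpha$ have a common point exactly when the smallest circle enclosing the three centers has radius at most $\alpha$. The sketch below assumes Euclidean distance in $\mathbb{R}^2$ and closed balls; the helper names `enclosing_radius` and `cech_triangle` are hypothetical.

```python
from math import dist, sqrt


def enclosing_radius(a, b, c):
    """Radius of the smallest circle containing three planar points."""
    x, y, z = sorted([dist(a, b), dist(b, c), dist(a, c)])  # z is the longest side
    if x * x + y * y <= z * z:
        # Right, obtuse, or degenerate triangle: the longest side is a diameter.
        return z / 2
    s = (x + y + z) / 2
    area = sqrt(s * (s - x) * (s - y) * (s - z))  # Heron's formula
    return x * y * z / (4 * area)                 # circumradius of an acute triangle


def cech_triangle(a, b, c, alpha):
    """True if the closed balls of radius alpha around a, b, c share a point."""
    return enclosing_radius(a, b, c) <= alpha


# Equilateral triangle with side 1; its circumradius is 1 / sqrt(3), about 0.577.
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.5, sqrt(3) / 2)
r = enclosing_radius(a, b, c)
edges_at_055 = all(dist(p, q) <= 2 * 0.55 for p, q in [(a, b), (b, c), (a, c)])
triangle_at_055 = cech_triangle(a, b, c, 0.55)
triangle_at_060 = cech_triangle(a, b, c, 0.60)
```

At $\alpha = 0.55$ every pair of balls overlaps, so all three edges are present, but the triple intersection is empty and the triangle stays hollow. The Vietoris-Rips complex on the same edges would already contain the triangle, which is one concrete way the two constructions differ.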


Computer scientists often prefer the Vietoris-Rips complex (**Definition 9**), as it needs fewer computational resources in higher-dimensional spaces: only pairwise distances have to be checked.
On the other hand, the following **Theorem 4** is a cornerstone of topological data analysis: it connects the Čech complex to the topology of the union of balls from **Definition 10**, which is why the Čech complex is often preferred by physicists.


**Definition.**
Given an open cover $\mathcal{U} = (U_i)_{i \in I}$ of a topological space $\mathbb{X}$, the **nerve** of $\mathcal{U}$ is the abstract simplicial complex $C(\mathcal{U})$ whose vertices are the $U_i$'s and such that
$$
\sigma = (U_{i_0};\cdots;U_{i_k}) \in C(\mathcal{U}) \iff \bigcap_{j=0}^k U_{i_j} \neq \emptyset.
$$
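
The nerve condition is easy to try out on a toy example. The sketch below replaces open sets by finite Python sets, which is an assumption made purely for illustration: a "circle" of six sample points is covered by three overlapping arcs. The function name `nerve` and the labels `U0`, `U1`, `U2` are hypothetical.

```python
from itertools import combinations


def nerve(cover):
    """cover maps a label to a set; returns the simplices of the nerve
    as tuples of labels whose sets have a common element."""
    labels = sorted(cover)
    simplices = []
    for k in range(1, len(labels) + 1):
        for combo in combinations(labels, k):
            if set.intersection(*(cover[name] for name in combo)):
                simplices.append(combo)
    return simplices


# Six sample points 0..5 arranged around a circle, covered by three arcs.
arcs = {
    "U0": {0, 1, 2},
    "U1": {2, 3, 4},
    "U2": {4, 5, 0},
}
circle_nerve = nerve(arcs)
```

Each pair of arcs overlaps in one point, but no point lies in all three, so the nerve is a hollow triangle: three vertices and three edges, with the same single loop as the circle.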


**Theorem 4.**
(Nerve Theorem). Let $\mathcal{U} = (U_i)_{i \in I}$ be a cover of a paracompact space $\mathbb{X}$ by open sets such that the intersection of any subcollection of the $U_i$'s is either empty or contractible. Then $\mathbb{X}$ and the nerve $C(\mathcal{U})$ are homotopy equivalent (a different notion from homology; do not confuse the two), i.e., informally, $\mathbb{X}$ and the nerve $C(\mathcal{U})$ can be continuously deformed into one another. See https://encyclopediaofmath.org/wiki/Homotopy_type

$\blacktriangleleft$ See [hatcher], Corollary 4G.3, or [alexandroff]. $\blacksquare$


**Theorem 4** is very famous, but its statement relies on a few more advanced concepts from topology, such as "paracompactness," "contractibility," and "homotopy equivalence."
To avoid this problem, let us reformulate it in a more convenient form.


**Theorem 5.**
Assume that we are given a finite set of points $x_i$ in $\mathbb{R}^n$ and a real number $\alpha > 0$.
Consider the homology groups $H_p(\mathbb{X})$ of the topological space $\mathbb{X}$ obtained as the union of the closed balls $B(x_i;\alpha)$.
Then the groups $H_p(\mathbb{X})$ are isomorphic to the homology groups of the polyhedron of the geometric realization of the Čech complex of these points with parameter $\alpha$.

$\blacktriangleleft$ This is a special case of [borsuk], Corollary 3, p. 234. $\blacksquare$


Please note that the polyhedron of the Čech complex may be non-homeomorphic to $\mathbb{X}$ even in simple cases and may belong to a higher-dimensional space. [For visuals please check](https://fbeilstein.github.io/topological_data_analysis/persistent_homology_explorer/persistent_homology_explorer.html).


Also note that everything in **Definitions 9 and 10** depends on $\alpha$, so the Betti curves we draw in the article depend on $\alpha$ as well. A manifold is not something given to us but rather hypothesized by us; thus, in practice, it may be reasonable to use complexes other than the Čech complex, especially when they are easier to compute. In this work, we used the $\alpha$-complex, which is homotopy equivalent to the Čech complex, but a different complex could have been used as well.

**Definition 11.**
The **$p$-th Betti curve** is a plot of $\beta_p$ from **Definition 7** versus the parameter $\alpha$ used to construct the simplicial complex (see **Definitions 9 and 10**).
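
A Betti curve can be sampled by rebuilding the complex for several values of $\alpha$ and recomputing $\beta_p$ each time. The sketch below is self-contained and deliberately naive. It uses the Vietoris-Rips complex of **Definition 9** rather than the $\alpha$-complex of the article, computes ranks over the rationals (so torsion is ignored), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import cos, sin, pi, dist


def rank(matrix):
    """Rank over the rationals by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def rips_by_dim(points, alpha, max_dim):
    """Vietoris-Rips simplices grouped by dimension 0..max_dim."""
    n = len(points)
    return [[s for s in combinations(range(n), k)
             if all(dist(points[i], points[j]) <= alpha
                    for i, j in combinations(s, 2))]
            for k in range(1, max_dim + 2)]


def boundary_matrix(faces, simplices):
    """Matrix of the boundary operator with alternating signs."""
    index = {f: row for row, f in enumerate(faces)}
    m = [[0] * len(simplices) for _ in faces]
    for col, s in enumerate(simplices):
        for k in range(len(s)):
            m[index[s[:k] + s[k + 1:]]][col] = (-1) ** k
    return m


def betti(points, alpha, p_max=1):
    """[beta_0, ..., beta_p_max] of the Vietoris-Rips complex at scale alpha."""
    cx = rips_by_dim(points, alpha, p_max + 1)
    ranks = [0] + [rank(boundary_matrix(cx[p - 1], cx[p])) if cx[p] else 0
                   for p in range(1, p_max + 2)]
    return [len(cx[p]) - ranks[p] - ranks[p + 1] for p in range(p_max + 1)]


# Six points on the unit circle: neighbours are ~1 apart, opposite points 2 apart.
hexagon = [(cos(2 * pi * k / 6), sin(2 * pi * k / 6)) for k in range(6)]
alphas = [0.5, 1.2, 1.8, 2.1]
curve = [(a, betti(hexagon, a)) for a in alphas]
# beta values per alpha: [6, 0], [1, 1], [1, 0], [1, 0]
```

The loop of the circle is visible only in a window of $\alpha$: for small $\alpha$ the points are disconnected, and for large $\alpha$ the hole is filled in. Plotting the first and second entries against $\alpha$ on a finer grid gives the $\beta_0$ and $\beta_1$ curves.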


**Definition 11** essentially completes the construction.