# Topological Data Analysis

We'll now experiment with calculating the topology of objects in several MuSpAn domains. A good dataset to start with is ```'Synthetic-Points-Architecture'```, which is designed to contain structures with interesting topology (in this context, loops and connected components).

```python
import muspan as ms
import matplotlib.pyplot as plt

# Load the synthetic example domain and plot its points coloured by cell type
domain = ms.datasets.load_example_domain('Synthetic-Points-Architecture')
ms.visualise.visualise(domain, 'Celltype')
```

Unlike many of the other methods that we've looked at this week, MuSpAn's topological data analysis tools take a single population of objects as input. Let's start by calculating the topology of each cell type individually, using ```ms.topology.vietoris_rips()``` to generate the persistence diagram associated with each point cloud.
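MuSpAn computes these diagrams for you, but the `H_0` part is simple enough to reproduce by hand, which can help when interpreting it. The following is a minimal pure-Python sketch, not MuSpAn's implementation, assuming Euclidean distances and the convention that an edge enters the Vietoris-Rips complex at a filtration value equal to its length. Every point is born at 0; edges are processed in order of increasing length (as in Kruskal's minimum spanning tree algorithm), and each edge that joins two different components records a death at that length. The names `h0_persistence` and `count_components` are hypothetical helpers for illustration.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the H_0 (birth, death) pairs of a Vietoris-Rips filtration.

    Assumes an edge appears at a filtration value equal to its Euclidean length.
    Every point is born at 0; one component never dies (death = inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # This edge merges two components, so one of them dies here
            parent[ri] = rj
            pairs.append((0.0, length))
    # The last surviving component persists forever
    pairs.append((0.0, math.inf))
    return pairs


def count_components(pairs, scale):
    """Number of connected components once all edges of length <= scale are added."""
    return sum(1 for _, death in pairs if death > scale)


if __name__ == "__main__":
    # Two hypothetical, well-separated clusters of three points each
    points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
    pairs = h0_persistence(points)
    print(pairs)
    print(count_components(pairs, 5.0))  # → 2
```

Counting the `H_0` points whose death value exceeds a chosen scale gives the number of connected components at that scale, which is one way to read the number of clusters off a diagram: look for a large gap between the small death values (points merging within a cluster) and the large ones (clusters merging with each other).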

1. Edit the code below to plot the persistence diagrams associated with each cell type.

```python
cell_types = ['A', 'B', 'C', 'D']

for cell_type in cell_types:
    fig, axes = plt.subplots(nrows=1, ncols=2)
    population = ...  # TODO: query the objects whose 'Celltype' label equals cell_type
    # Visualise the points in the left-hand axis
    ms.visualise.visualise(domain, color_by='Celltype', objects_to_plot=population, ax=axes[0])
    # TODO: compute the Vietoris-Rips persistence diagram of `population` and plot it in the right-hand axis (axes[1])
```

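2. Focus on the `H_0` part of the persistence diagrams. Each point represents a distinct "connected component": it is born at 0, when every object is its own component, and dies at the filtration value at which that component merges with another, so points with large death values correspond to well-separated groups. Can you see these components in the data that you've plotted? How do the plots for Celltypes A and B differ from the plot for Celltype C? And how does the Celltype D plot differ?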

3. Try loading the different synthetic datasets and using the Vietoris-Rips filtration to answer the following questions:
    - How many "crypts" are there in the ```'Synthetic-Points-Architecture'``` dataset?
    - How many distinct clusters are there in the ```'Synthetic-Points-Aggregation'``` dataset?
    - How many aggregates are there in the Celltype C population of the ```'Synthetic-Points-Architecture'``` dataset?
    - If you combine the Celltype A and B populations in the ```'Synthetic-Points-Architecture'``` dataset into a single population, how many loops are there? Is this the same as the number of Celltype C aggregates? Can you see why?
    - Can you identify any loops in the Celltype C population in the ```'Synthetic-Points-Exclusion'``` dataset? Can you see why / why not?

4. Load the ```'Xenium-Healthy-Colon'``` dataset. 
    - Using queries, select one cell from the PROG population.
    - Find the area and perimeter of the cell nucleus.
    - Get all transcripts for this cell that lie outside the nucleus.
    - Calculate the Vietoris-Rips persistence diagram for these transcripts. Can you see the loop formed around the hole left by the nucleus?
    - How does the 'death' value of this loop relate to the area and perimeter of the nucleus that you calculated earlier?
        - Hint: divide the death value by sqrt(3).
        - Hint: assuming the nucleus is a circle, what would its radius be, given the area you've found?
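The hints rely on two facts about circles, sketched below under stated assumptions: the nucleus is roughly circular, and the filtration value is the pairwise distance at which an edge appears (the convention implied by the factor of sqrt(3) in the hint; with a ball-radius convention the factor would be sqrt(3)/2 instead). For points densely spread around a circle of radius r, the loop is filled in at roughly the side length of an inscribed equilateral triangle, 2r sin(60°) = sqrt(3) r, so death / sqrt(3) should be close to the radius estimated from the nucleus area or perimeter. All function names here are hypothetical helpers, not MuSpAn functions.

```python
import math


def radius_from_area(area):
    """Radius of a circle with the given area (A = pi * r**2)."""
    return math.sqrt(area / math.pi)


def radius_from_perimeter(perimeter):
    """Radius of a circle with the given perimeter (P = 2 * pi * r)."""
    return perimeter / (2 * math.pi)


def chord_length(radius, angle):
    """Straight-line distance between two points on a circle `angle` radians apart."""
    return 2 * radius * math.sin(angle / 2)


def predicted_loop_death(radius):
    """Approximate H_1 death value for points densely spread around a circle.

    Assumes the filtration value is the pairwise distance. The loop fills in
    once edges spanning 120 degrees of arc (the sides of an inscribed
    equilateral triangle) enter the complex, i.e. at sqrt(3) * radius.
    """
    return chord_length(radius, 2 * math.pi / 3)


if __name__ == "__main__":
    # Hypothetical placeholder measurements; replace with the values you calculated
    area, perimeter = 30.0, 20.0
    print(radius_from_area(area), radius_from_perimeter(perimeter))
    print(predicted_loop_death(radius_from_area(area)))
```

If the two radius estimates disagree noticeably, the nucleus is far from circular, and the death value should not be expected to match either one closely.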
