Pipeline for my work on using persistent homology to cluster single-sample Lioness networks.
- Download data from TCGA using tcga_download.R
- Subset the expression matrix to the L1000 genes.
- Log-transform the expression counts.
- For each sample, run Lioness, transform the scores into probabilities, and artificially set the diagonal to 1. We treat the result as a sort of "correlation similarity" matrix.
- Convert this similarity matrix into a "distance" (dissimilarity) network.
- Run persistent homology using Gudhi.
- Export all figures and persistence diagrams.
- Create distance matrices between the persistence diagrams using the bottleneck distance. (This could possibly be improved by using the Wasserstein distance.)
- Cluster using k-means, generally with k=2.
- Create a design file to be used with Limma in R.
- In R, run Limma and export the t-scores.
- Run a pre-ranked gene set enrichment analysis (GSEA).
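
The log-transform and similarity-to-distance steps above can be sketched in plain Python. This is only an illustration, not the code used in the pipeline. The list does not say which transforms are applied, so the `log2(count + 1)` transform, the `1 - similarity` conversion, and the helper names `log_transform` and `similarity_to_distance` are all assumptions made for the example.

```python
import math


def log_transform(counts, pseudocount=1.0):
    """Return log2(count + pseudocount) for every entry of a matrix.

    Assumption: a log2 transform with a pseudocount of 1; the pipeline
    notes only say that counts are log transformed.
    """
    return [[math.log2(value + pseudocount) for value in row] for row in counts]


def similarity_to_distance(similarity):
    """Convert a square similarity matrix with entries in [0, 1] to distances.

    The diagonal is treated as 1 (as in the pipeline notes), so every node
    is at distance 0 from itself. Assumption: distance = 1 - similarity.
    """
    n = len(similarity)
    distance = []
    for i in range(n):
        if len(similarity[i]) != n:
            raise ValueError("similarity matrix must be square")
        row = []
        for j in range(n):
            # Force the diagonal to 1 regardless of the stored value.
            s = 1.0 if i == j else similarity[i][j]
            if not 0.0 <= s <= 1.0:
                raise ValueError("similarities must lie in [0, 1]")
            row.append(1.0 - s)
        distance.append(row)
    return distance


# Toy data, made up for illustration only.
toy_counts = [[0, 3], [7, 15]]
toy_log = log_transform(toy_counts)

toy_similarity = [
    [0.30, 0.75, 0.25],
    [0.75, 0.60, 0.50],
    [0.25, 0.50, 0.10],
]
toy_distance = similarity_to_distance(toy_similarity)
```

A symmetric matrix with a zero diagonal like `toy_distance` is the kind of input that a Rips complex built from a distance matrix expects, for example through the `distance_matrix` argument of Gudhi's `RipsComplex`.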