
Analyzing MCL clusterings (effects of Inflation value)

Felipe Vaz Peres edited this page May 6, 2024 · 6 revisions

Analyzing MCL clusterings

MCL clustering was performed with this script, using 10 inflation values for each dataset: 1.3, 1.8, 2.3, 2.8, 3.3, 3.8, 4.3, 4.8, 5.3 and 5.8.
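As an illustration of such an inflation sweep, here is a minimal sketch that builds one `mcl` command per inflation value. It is not the script referenced above: the file names, the output naming pattern, and the helper `build_mcl_commands` are hypothetical, and it assumes the network is stored in label (ABC) format so that `--abc` applies.

```python
from pathlib import Path

# The 10 inflation values listed on this page.
INFLATION_VALUES = [1.3, 1.8, 2.3, 2.8, 3.3, 3.8, 4.3, 4.8, 5.3, 5.8]


def build_mcl_commands(network_file, out_dir, inflations=INFLATION_VALUES):
    """Return one mcl argument list per inflation value.

    Assumption: network_file is in ABC (label) format; the output file
    naming pattern is made up for this example.
    """
    commands = []
    for inflation in inflations:
        out_file = Path(out_dir) / f"out.I{inflation}.mcl"
        commands.append(
            ["mcl", str(network_file), "--abc",
             "-I", str(inflation), "-o", str(out_file)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` if `mcl` is installed and on the `PATH`.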

The correlation of gene expression yields a distinct network for each of the 3 datasets, and the resulting clusterings vary with the inflation value and in cluster size.

Note: These results do not indicate a single clustering as the 'best' option, as all clusterings appear to be at least acceptable. They do help in illustrating the relative advantages of each clustering.

I developed the following scripts to plot Cluster Size Distribution and Efficiency Peak.
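For readers who want to reproduce a cluster size distribution themselves, a minimal sketch follows. It is not one of the scripts mentioned above. It assumes the clustering file has one cluster per line with tab-separated gene labels (the layout `mcl` writes in `--abc` mode), and the function names are hypothetical. Plotting is left out; the returned dictionary can be fed to any plotting library.

```python
from collections import Counter


def cluster_sizes(lines):
    """Return the size of each cluster.

    Assumption: one cluster per non-empty line, members separated by tabs.
    """
    return [len(line.strip().split("\t")) for line in lines if line.strip()]


def size_distribution(sizes):
    """Map each cluster size to the number of clusters with that size."""
    return dict(sorted(Counter(sizes).items()))
```

With a real file, `cluster_sizes(open(path))` gives the sizes, and `size_distribution` shows, for example, how many singleton and two-gene clusters appear at a given inflation value.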

Hoang2017

Almost all genes are part of three modules (size ~30k). As inflation increases, the network gradually breaks down.

All clusterings capture a large share of the edge mass (~70 percent) using only ~6 percent of the 'area'.
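To make 'edge mass' and 'area' concrete, the sketch below computes both fractions for a weighted edge list and a clustering. The definitions are assumptions for illustration: mass fraction is taken as intra-cluster edge weight over total edge weight, and area fraction as the number of ordered within-cluster node pairs over all ordered node pairs. The figures on this page may have been computed with a different definition (for example, one averaged per node), so the function `mass_and_area_fraction` is hypothetical and only shows the idea.

```python
def mass_and_area_fraction(edges, clusters):
    """Return (mass_fraction, area_fraction) for a clustering.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    clusters: list of lists of node labels, each node in one cluster.
    Both definitions are simplifying assumptions, see the text above.
    """
    cluster_of = {node: i for i, members in enumerate(clusters)
                  for node in members}
    total = 0.0
    inside = 0.0
    for u, v, w in edges:
        total += w
        # An edge counts as captured when both ends share a cluster.
        if u in cluster_of and cluster_of.get(u) == cluster_of.get(v):
            inside += w
    n = len(cluster_of)
    pairs_inside = sum(len(c) * (len(c) - 1) for c in clusters)
    area = pairs_inside / (n * (n - 1)) if n > 1 else 0.0
    mass = inside / total if total else 0.0
    return mass, area
```

A clustering is efficient in this sense when it captures a high mass fraction with a low area fraction: putting every gene in one cluster captures all of the mass but uses all of the area.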

Correr2020

Almost all genes are part of four modules (size ~35k). As inflation increases, the network gradually breaks down.

All clusterings capture nearly all of the edge mass (~99 percent) using only ~10 percent of the 'area'.

Perlo2022

Almost all genes are part of a single connected component (size ~14k). As inflation increases, the network gradually breaks down, eventually forming clusters of only 1 or 2 genes.

These data show little variation in the cluster structure. The clustering at inflation 1.3 captures nearly all of the edge mass (89 percent) using only 6 percent of the 'area'. The clustering at inflation 5.8 captures 66 percent of the mass using 1.2 percent of the area.