Jeremy edited this page Mar 31, 2025 · 14 revisions

Presentation

LAGOON-MCL has been designed to process large sets of protein sequences and their annotations using sequence similarity networks and clustering, while requiring as few resources as possible (CPUs and RAM) for each process.

Sequence similarity networks (SSNs), or graphs, can be used to visualize the relationships between proteins. Graph clustering algorithms can then be applied to build protein clusters. Because the edges of the network reflect sequence similarity, the resulting clusters can be treated as putative protein families, i.e. clusters made up of sequences with a similar function. LAGOON-MCL builds clusters in two steps: (1) pairwise alignment of all sequences with DIAMOND BLASTp and construction of the SSN, and (2) clustering of the network with the Markov Clustering (MCL) algorithm.
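The two steps can be illustrated with a minimal Python sketch. It is not the LAGOON-MCL code: it assumes the alignments are already available as 12-column BLAST tabular lines (the default tabular layout of DIAMOND), and it replaces the real MCL program with a small dense-matrix version of the algorithm that is only practical for toy inputs. The function names, the `max_evalue` threshold, the self-loop weighting, and the sample hits are all hypothetical.

```python
def parse_edges(lines, max_evalue=1e-5):
    """Turn BLAST tabular lines into undirected SSN edges weighted by bit score."""
    edges = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query == subject or evalue > max_evalue:
            continue  # drop self-hits and weak hits (assumed threshold)
        key = tuple(sorted((query, subject)))
        edges[key] = max(edges.get(key, 0.0), bitscore)  # keep the best hit per pair
    return edges


def normalize(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(edges, inflation=2.0, iterations=50):
    """Toy Markov clustering: alternate expansion and inflation, then read clusters."""
    nodes = sorted({name for pair in edges for name in pair})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), weight in edges.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = weight
    for i in range(n):
        m[i][i] = max(m[i])  # self-loop, weighted like the strongest edge (assumption)
    m = normalize(m)
    for _ in range(iterations):
        # Expansion: square the matrix (flow spreads along two-step walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalize (strong flow wins).
        m = normalize([[value ** inflation for value in row] for row in m])
    # Each row that keeps non-zero flow is an attractor; its columns are the members.
    clusters = {frozenset(nodes[j] for j, value in enumerate(row) if value > 1e-6)
                for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical alignment hits: two groups of similar sequences.
sample = [
    "A\tB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "B\tC\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-45\t180",
    "A\tC\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150",
    "D\tE\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t90",
    "A\tD\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
]
clusters = mcl(parse_edges(sample))
print(clusters)
```

Sequences that have no retained hit never enter the graph in this sketch, so they are not reported as singleton clusters. On real datasets a sparse implementation of MCL is needed, since the dense matrix grows quadratically with the number of sequences.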

LAGOON-MCL then links the annotations of the sequences (e.g. functional, taxonomic, etc.) to the clusters. A homogeneity score is calculated for each cluster (more information here); it indicates how consistent the annotations are within that cluster. Annotations can either be supplied by the user or determined by the pipeline using the Pfam database.
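To give an idea of what such a consistency measure can look like, here is a small Python sketch. The formula is an assumption made for illustration only (the share of annotated sequences that carry the most common label) and is not necessarily the score LAGOON-MCL computes. The function name and the example labels are hypothetical.

```python
from collections import Counter


def majority_label_share(cluster, annotations):
    """Illustrative consistency measure, NOT the LAGOON-MCL formula.

    Returns the fraction of annotated sequences in the cluster that carry
    the most frequent label, or None if no sequence is annotated.
    """
    labels = [annotations[seq] for seq in cluster if seq in annotations]
    if not labels:
        return None
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical cluster: sequence "D" has no annotation and is ignored.
cluster = ["A", "B", "C", "D"]
annotations = {"A": "kinase", "B": "kinase", "C": "transporter"}
score = majority_label_share(cluster, annotations)
```

With this definition, a value of 1 means that every annotated sequence in the cluster shares the same label, and lower values mean more mixed annotations.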

Pipeline
