Mirlo helps automate the construction of data sets for multi-gene phylogenetic analyses. It takes whole proteomes as input, identifies single-copy gene families, aligns the proteins, and evaluates the 'phylogenetic signal' of each alignment.
What mirlo does:
- Identify single-copy gene families (clusters) in an OrthoFinder report.
- Align each single-copy family using MAFFT.
- Construct a phylogenetic tree of each family using PhyML and compute SH-like support values for each branch.
- Compute a 'phylogenetic signal' for each family as the mean of the SH-like support values across all branches of its tree. This step is inspired by Salichos L, Rokas A: Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 2013, 497:327–331. The authors show that genes with higher phylogenetic signal have phylogenies that are more congruent with the species tree. Here, I use SH-like support values instead of bootstrap values because they are much faster to compute.
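The first and last steps in the list above can be sketched in a few lines of Python. This is only an illustration, not mirlo's actual code: the function names are hypothetical, the orthogroup table is assumed to be tab-delimited with one column per species and comma-separated gene names in each cell, and the tree is assumed to be a Newick string in which each internal branch carries its support value as a node label directly after the closing parenthesis.

```python
import csv
import re
from statistics import mean

# Matches a number placed directly after ")" in a Newick string,
# which is where internal-node support values are assumed to sit.
SUPPORT_RE = re.compile(r"\)([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def single_copy_orthogroups(handle):
    """Yield (orthogroup_id, {species: gene}) for single-copy groups.

    Assumed layout: a header row naming one species per column, then one
    row per orthogroup whose cells hold comma-separated gene names.
    """
    reader = csv.reader(handle, delimiter="\t")
    species = next(reader)[1:]  # the first header cell labels the ID column
    for row in reader:
        if not row:
            continue
        # Pad or trim so that every row has exactly one cell per species.
        cells = (row[1:] + [""] * len(species))[:len(species)]
        genes = [[g.strip() for g in cell.split(",") if g.strip()]
                 for cell in cells]
        # Keep the group only if every species contributes exactly one gene.
        if all(len(g) == 1 for g in genes):
            yield row[0], {sp: g[0] for sp, g in zip(species, genes)}


def mean_branch_support(newick):
    """Return the mean of all internal-node support values in a Newick string."""
    values = [float(v) for v in SUPPORT_RE.findall(newick)]
    if not values:
        raise ValueError("no branch support values found")
    return mean(values)
```

The regular expression is a shortcut that avoids a full Newick parser. It would misread trees whose labels are quoted and contain parentheses, so a tree library such as BioPython's Bio.Phylo is the safer choice for real data.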
What mirlo does NOT do:
- It does not edit the alignments.
- It does not construct a phylogenetic tree from the concatenated alignment.
Mirlo is a work in progress.
If you have questions or need help, email me:
Michael Thon mthon@usal.es
- OrthoFinder http://www.stevekellylab.com/software/orthofinder
- ProtTest 3.4 https://code.google.com/p/prottest3/
- PhyML
- mafft
- BioPython
Uncompress the Mirlo distribution file somewhere on your computer.
You should have one file of protein sequences in FASTA format for each species that you plan to include in your analysis. Keep the file names short, because mirlo uses them to label each species in the multiple sequence alignments.
- Run OrthoFinder.
- Run mirlo.py. For more information on the command line parameters, run

  python mirlo.py -h

  You will need the Orthogroups.csv file that was generated by OrthoFinder. Mirlo will output a list of orthologous clusters that have a single gene copy in each species, each followed by a value that represents its phylogenetic signal. From that list you can decide which orthologous groups you want to include in the concatenated alignment. For example, you may want to use the five clusters with the highest phylogenetic signal.
- Use cat_alignments.py to concatenate the selected clusters into one large alignment.
- The concatenated alignment can then be used for phylogenetic analysis.
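To make the selection and concatenation steps concrete, here is a minimal sketch of the idea. It is not the code in cat_alignments.py. The function names are hypothetical, and it assumes that each alignment is FASTA text whose sequence headers are the species labels and that the signal values are already available as a dictionary keyed by cluster ID.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # use the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def top_clusters(signals, n):
    """Return the n cluster IDs with the highest signal, best first."""
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    return [cluster for cluster, _ in ranked[:n]]


def concatenate(alignments):
    """Join a list of {species: aligned_sequence} dicts species by species."""
    species = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != species:
            raise ValueError("every alignment must contain the same species")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences in an alignment must have equal length")
    # Append each species' row from every alignment, in the order given.
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}
```

For example, `top_clusters(signals, 5)` would give the five clusters with the highest phylogenetic signal, and passing their parsed alignments to `concatenate` yields one long aligned sequence per species.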