A set of scripts to help automate the construction of data sets for multi-gene phylogenetic analyses.

mthon/mirlo

Mirlo

Overview

Mirlo helps automate the construction of data sets for multi-gene phylogenetic analyses. It takes whole proteomes as input, identifies single-copy gene families, aligns the proteins, and evaluates the 'phylogenetic signal' of each alignment.
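The single-copy step can be pictured with the minimal Python sketch below. It is not Mirlo's own code. It assumes the tab-separated layout of OrthoFinder's Orthogroups.csv: a header row of species names (the first cell holds no species), then one row per orthogroup whose cells list that species' genes separated by commas. The function name single_copy_families is hypothetical.

```python
import csv
import io


def single_copy_families(orthogroups_text):
    """Return {orthogroup_id: {species: gene_id}} for every family that has
    exactly one gene in each species.

    Assumes a tab-separated table: header = species names after the first
    cell; each row = orthogroup ID followed by comma-separated gene lists.
    """
    reader = csv.reader(io.StringIO(orthogroups_text), delimiter="\t")
    species = next(reader)[1:]
    families = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        og_id = row[0]
        cells = row[1:1 + len(species)]
        # Trailing empty cells may be missing; pad them so every species is checked.
        cells += [""] * (len(species) - len(cells))
        genes = [[g.strip() for g in cell.split(",") if g.strip()] for cell in cells]
        # Single-copy: every species contributes exactly one gene.
        if all(len(g) == 1 for g in genes):
            families[og_id] = {sp: g[0] for sp, g in zip(species, genes)}
    return families
```

Families with a duplicated gene or a missing species are both discarded, which matches the requirement of one gene copy in every species.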

What mirlo does:

  • Identify single-copy gene families (clusters) in an OrthoFinder report.
  • Align each single-copy family using MAFFT.
  • Construct a phylogenetic tree of each family using PhyML and compute SH-like support values for each branch.
  • Compute a 'phylogenetic signal' for each family as the mean of the SH-like support values over all branches of its tree. This step is inspired by Salichos L, Rokas A: Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 2013, 497:327–331. The authors show that genes with stronger phylogenetic signals have phylogenies that are more congruent with the species tree. Here, I use SH-like support values instead of bootstrap values because they are much faster to compute.
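The 'phylogenetic signal' score described above amounts to an average, as in the Python sketch below. This is not Mirlo's implementation. It assumes each tree is a Newick string in which the SH-like support of an internal branch is written as a label right after its closing parenthesis (e.g. the 0.95 in "(A:0.1,B:0.2)0.95:0.03"). The names phylogenetic_signal and rank_families are hypothetical.

```python
import re
from statistics import mean

# A support value is the numeric label that directly follows a ")".
_SUPPORT = re.compile(r"\)\s*(\d+(?:\.\d+)?|\.\d+)")


def phylogenetic_signal(newick):
    """Mean SH-like support over all labelled internal branches of one tree."""
    supports = [float(s) for s in _SUPPORT.findall(newick)]
    if not supports:
        raise ValueError("no internal branch support values found")
    return mean(supports)


def rank_families(trees):
    """Sort {family_id: newick} pairs by decreasing phylogenetic signal."""
    scores = {fam: phylogenetic_signal(nwk) for fam, nwk in trees.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A tree with no labelled internal branches raises an error instead of scoring zero, so such a family is not silently ranked last.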

What mirlo does NOT do:

  • It does not edit the alignments.
  • It does not construct a phylogenetic tree from the concatenated alignment.

Mirlo is a work in progress.

Contact

If you have questions or need help, email me:

Michael Thon mthon@usal.es

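Software Requirements

Mirlo is run with Python. You will also need OrthoFinder, whose output Mirlo reads, and MAFFT and PhyML, which Mirlo uses to align each family and build its tree.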

Installation

Uncompress the Mirlo distribution file somewhere on your computer.

Instructions

You should have one file of protein sequences in FASTA format for each species that you plan to include in your analysis. Keep the file names short, because they are used during the mirlo analysis to label each species in the multiple sequence alignments.
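Before running the analysis, it can help to check how the file names will turn into species labels. The Python sketch below is a hypothetical pre-flight check, not part of Mirlo. It assumes the label is the file name without its extension, and the 10-character limit and the list of FASTA extensions are illustrative assumptions; the README states neither.

```python
from pathlib import Path

# Hypothetical values, for illustration only: the README gives no exact
# length limit or list of accepted extensions.
MAX_LABEL_LEN = 10
FASTA_SUFFIXES = {".fa", ".faa", ".fasta"}


def species_labels(proteome_dir, max_len=MAX_LABEL_LEN):
    """Map each species label to its proteome file.

    Assumes the label is the file name without its extension.
    Raises ValueError for labels that are too long or duplicated.
    """
    labels = {}
    for path in sorted(Path(proteome_dir).iterdir()):
        if path.suffix.lower() not in FASTA_SUFFIXES:
            continue  # ignore non-FASTA files
        label = path.stem
        if len(label) > max_len:
            raise ValueError(f"{path.name}: label {label!r} exceeds {max_len} characters")
        if label in labels:
            raise ValueError(f"{path.name}: label {label!r} is already used by {labels[label].name}")
        labels[label] = path
    return labels
```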

  1. Run OrthoFinder on your protein files.

  2. Run mirlo.py. You can run python mirlo.py -h for more information on the command-line parameters. You will need the Orthogroups.csv file generated by OrthoFinder. Mirlo outputs a list of the orthologous clusters that have a single gene copy in each species, each followed by a value that represents its phylogenetic signal. From that list you can decide which orthologous groups to include in the concatenated alignment; for example, you may want to use the five clusters with the highest phylogenetic signal.

  3. Use cat_alignments.py to concatenate the selected clusters into one large alignment.

  4. The concatenated alignment can then be used for phylogenetic analysis.
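Steps 2 and 3 (picking the top-scoring clusters and joining their alignments) can be sketched in Python as below. This is not the code of cat_alignments.py. It assumes each cluster's alignment is a FASTA file whose headers are the species labels, and the names read_fasta, top_clusters and concatenate are hypothetical. Gap-filling of a missing species never triggers for true single-copy clusters; it only keeps rows the same length if other clusters are mixed in.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, using the first word of each header."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def top_clusters(scores, n=5):
    """Return the IDs of the n clusters with the highest phylogenetic signal."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def concatenate(alignments):
    """Join per-cluster alignments ({species: aligned sequence}) species by species.

    A species absent from a cluster is filled with gaps so all rows keep
    the same length.
    """
    species = sorted({sp for aln in alignments for sp in aln})
    rows = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment differ in length")
        width = lengths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}
```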
