A Python script for mitochondrial supermatrix phylogenomics.
usage: phylomito.py [-h] -i [INPATH] [-o [OUTPATH]]
[-e [EXTENSION [EXTENSION ...]]] [-b [BOOTSTRAP]]
[-p [PROTEIN]] [-g [GENE_TREE]] [-d [DLOOP]]
This program is licensed under GPLv3.
- Save your GenBank files in a folder (for example, ./genebank/) and create a folder for the output (for example, ./outpath/).
- Make sure your GenBank files have the extension '.gbk' or '.gb'.
- Run the command:
python phylomito.py -i ./genebank/ -o ./outpath/
- Your results will be in the ./outpath/ folder. The final tree file will be named all_nuc.phy_phyml_tree.txt by default.
You need PhyML, Clustal Omega, Python 3 and the Biopython library installed to run this program.
This program was tested on a Linux machine.
This program finds all GenBank files (mitogenomes) in a folder and saves each gene that is present in all mitogenomes to a multi-FASTA file.
These files are aligned with Clustal Omega and the alignments are concatenated into a single file (all_nuc.aln, by default) that contains all aligned genes from all mitogenomes.
PhyML uses this file to generate a Maximum Likelihood tree.
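
The gene selection and concatenation steps described above can be sketched in plain Python. This is a minimal illustration, not the code phylomito.py actually runs: the function names shared_genes and concatenate, the dictionary layouts, and the toy sequences are all hypothetical.

```python
def shared_genes(genes_per_genome):
    """Return the gene names present in every mitogenome, sorted.

    genes_per_genome (hypothetical layout): {genome_name: iterable of gene names}
    """
    gene_sets = [set(genes) for genes in genes_per_genome.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


def concatenate(alignments, gene_order):
    """Join per-gene alignments into one supermatrix.

    alignments (hypothetical layout): {gene: {taxon: aligned_sequence}}
    Returns {taxon: concatenated_sequence}.
    """
    supermatrix = {}
    for gene in gene_order:
        columns = alignments[gene]
        # Every row of one alignment must have the same number of columns.
        lengths = {len(seq) for seq in columns.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        for taxon, seq in columns.items():
            supermatrix[taxon] = supermatrix.get(taxon, "") + seq
    return supermatrix


# Toy data, invented for illustration only.
genes_per_genome = {
    "mito1": ["ND1", "COX1", "CYTB"],
    "mito2": ["COX1", "CYTB", "ATP6"],
}
alignments = {
    "COX1": {"mito1": "ATG-CA", "mito2": "ATGGCA"},
    "CYTB": {"mito1": "TTAG", "mito2": "TT-G"},
}

common = shared_genes(genes_per_genome)   # genes found in both genomes
matrix = concatenate(alignments, common)  # one long row per mitogenome
```

Restricting the matrix to genes present in every mitogenome keeps each taxon's row the same length, which is why a gene missing from a single file is dropped from the whole analysis.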
- You can generate an amino acid alignment and phylogeny using the -p (or --protein) flag. The default is a nucleotide alignment and phylogeny.
- Running the program with the -g (or --gene_tree) flag will generate a tree for every gene.
- Using -g along with -d (or --dloop) will generate a tree of the DLOOP region. The supermatrix tree will also include this region. Do not combine the -d flag with the -p flag: the DLOOP region is non-coding, so translating it produces a meaningless amino acid sequence.
- The default number of bootstrap resamples is 100. You can change this with the -b (or --bootstrap) flag. More resamples make the phylogeny take longer to run.
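
The options above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the parser from phylomito.py: the nargs/const settings are guessed from the usage line, and the check_flags helper that rejects -p together with -d is hypothetical (the text above only advises against that combination).

```python
import argparse


def build_parser():
    """Declare only the four options discussed above (assumed settings)."""
    parser = argparse.ArgumentParser(prog="phylomito.py")
    # nargs="?" with const=True lets the flag be given without a value.
    parser.add_argument("-p", "--protein", nargs="?", const=True, default=False)
    parser.add_argument("-g", "--gene_tree", nargs="?", const=True, default=False)
    parser.add_argument("-d", "--dloop", nargs="?", const=True, default=False)
    # 100 resamples is the documented default.
    parser.add_argument("-b", "--bootstrap", nargs="?", type=int,
                        const=100, default=100)
    return parser


def check_flags(args):
    """Hypothetical guard: refuse to translate the non-coding DLOOP region."""
    if args.protein and args.dloop:
        raise ValueError("-d cannot be combined with -p")


# Parse an explicit list here so the example does not read sys.argv.
args = build_parser().parse_args(["-g", "-d", "-b", "500"])
check_flags(args)  # passes: -p was not given
```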
The most common problem is badly formatted GenBank files. The error will look like this:
mitochondria1.gb
mitochondria2.gb
mitochondria3.gb
COX_1 is not a known gene. Replace the CDS gene id with one of the following:
ND1 ND2 COX1 COX2 ATP8 ATP6 ND3 ND4L ND4 ND5 CYTB ND6 COX3
Traceback (most recent call last):
File "/path/phylomito/phylomito.py", line 315, in <module>
main(args)
File "/path/phylomito/phylomito.py", line 70, in main
split_seqs(inpath, outpath, protein, extension, dloop)
File "/path/phylomito/phylomito.py", line 140, in split_seqs
gene_key = gene_dict[header]
KeyError: 'COX_1'
Here, mitochondria3.gb (the last file name printed before the error) is the file where the error was found.
In this case, the solution is to edit your file and replace all occurrences of COX_1 with COX1.
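
To spot such names before running the program, the files can be scanned for gene names that are not in the accepted list. The sketch below is not part of phylomito.py. It assumes the names sit in /gene="..." qualifiers, and it checks every such qualifier, so names on tRNA or rRNA features will be listed too. The helpers unknown_gene_names and suggest and the sample text are hypothetical.

```python
import re

# Accepted names, copied from the error message above.
KNOWN_GENES = ["ND1", "ND2", "COX1", "COX2", "ATP8", "ATP6", "ND3",
               "ND4L", "ND4", "ND5", "CYTB", "ND6", "COX3"]

# Assumption: gene names are stored as /gene="NAME" qualifiers.
GENE_QUALIFIER = re.compile(r'/gene="([^"]+)"')


def unknown_gene_names(genbank_text):
    """Return, sorted, the /gene names in the text that are not accepted."""
    found = set(GENE_QUALIFIER.findall(genbank_text))
    return sorted(name for name in found if name not in KNOWN_GENES)


def suggest(name):
    """Guess the accepted spelling by dropping separators and upper-casing."""
    candidate = re.sub(r"[\s_\-]", "", name).upper()
    return candidate if candidate in KNOWN_GENES else None


# Made-up fragment standing in for the contents of a GenBank file.
sample = '''     CDS             1..1542
                     /gene="COX_1"
     CDS             1600..2283
                     /gene="COX2"
'''

for name in unknown_gene_names(sample):
    print(name, "->", suggest(name))
```

A name with no suggestion (suggest returns None) needs to be matched to the accepted list by hand.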