MITONOTATE

Annotation pipeline for ciliate mitochondrial genomes.

Cite with DOI: 10.5281/zenodo.159532

Please refer to the built-in help message for more information.

 $ perl mitonotate.pl --help

Description of the pipeline

Structural annotation

tRNA genes
- tRNAscan-SE with genetic code 4
rRNA genes
- nhmmer with custom HMM models for SSU and LSU genes
Protein-coding genes
- Prodigal in single mode, genetic code 4

Functional annotation

Protein-coding genes (a.a. sequences) are compared against the following databases:

- Swissprot mitochondrial sequences (curated set supplied with Prokka), by blastp
- MitoCOGs database (Kannan et al. 2014), by blastp
- MFannot-annotated ciliate mitochondrial genomes, by blastp
- Pfam-A domains, by hmmscan

Sequence statistics calculated:

- No. of transmembrane domains, by tmhmm and Scampi2
- Hydropathicity score ("GRAVY" score), with BioPerl
- Sequence length (a.a.)
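
As an illustration of the GRAVY statistic listed above, here is a minimal Python sketch. It is not the BioPerl code the pipeline uses. It assumes the standard Kyte-Doolittle hydropathy scale and assumes that non-standard characters (X, *, gaps) are skipped; the function name gravy is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Return the mean hydropathy of the standard residues in a sequence.

    Assumption: characters outside the 20 standard amino acids are ignored.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard amino acids in sequence")
    return sum(values) / len(values)
```

Positive scores indicate a more hydrophobic protein overall, which is why the score is reported alongside the transmembrane domain counts.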

Ortholog clustering is performed with FastOrtho:

- Reciprocal all-vs-all blastp
- FastOrtho implementation of the OrthoMCL algorithm (using mcl)
- Protein sequences for each ortholog cluster extracted, aligned with muscle
- HMM-HMM search against UniProt20 database with hhblits

Output

Results are combined into a single tab-separated table, sorted by the protein sequence ID. Actual assignment of a product name has to be done manually, especially for putative distant homologs.
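
The idea of merging per-tool results into one table keyed and sorted by protein ID can be sketched in Python as follows. This is only an illustration and not the code in mitonotate.pl. The function name combine_results, the column names, and the "NA" placeholder for missing values are all assumptions.

```python
import csv
import io


def combine_results(per_tool, columns):
    """Merge per-tool results into tab-separated text, one row per protein.

    per_tool maps a column name to a {protein_id: value} dictionary.
    Rows are sorted by protein ID; missing values are written as "NA".
    """
    protein_ids = sorted({pid for table in per_tool.values() for pid in table})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["protein_id"] + list(columns))
    for pid in protein_ids:
        row = [per_tool.get(col, {}).get(pid, "NA") for col in columns]
        writer.writerow([pid] + row)
    return out.getvalue()
```

For example, combine_results({"length": {"p2": 120, "p1": 95}}, ["length"]) gives a header line followed by the rows for p1 and then p2.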

Coordinates of genomic features (CDSs, rRNAs, and tRNAs) are also written in GFF3 format.
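
For readers unfamiliar with GFF3, each feature is one line of nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. The Python sketch below formats such a line. The function name gff3_line and the source value "example" are hypothetical, and the exact attributes written by mitonotate.pl may differ.

```python
def gff3_line(seqid, ftype, start, end, strand, feature_id, phase=".", source="example"):
    """Format one feature as a nine-column GFF3 row.

    Coordinates are 1-based and inclusive. CDS rows need a phase of 0, 1, or 2;
    other feature types use ".".
    """
    if start < 1 or start > end:
        raise ValueError("GFF3 requires 1 <= start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError("strand must be one of + - . ?")
    columns = [
        seqid, source, ftype, str(start), str(end),
        ".",                      # score column left empty
        strand, str(phase),
        f"ID={feature_id}",       # minimal attributes column
    ]
    return "\t".join(columns)
```

A complete GFF3 file also starts with the header line ##gff-version 3.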

Output from intermediate steps is also saved, to allow for troubleshooting.

Configuring the annotation run

Mitonotate requires three configuration files, which tell it where to find the input, where to find the databases, and which steps of the pipeline to run.

Descriptions of the input files are given in the built-in help message.

Cutoff scores for the tRNA (tRNAscan-SE Cove score) and rRNA (nhmmer bitscore) predictions can be set on the command line with the options --cove and --nhmmer_cutoff, respectively.

Templates for each input file are supplied with the script (db_config_template, fasta_list_template, and run_flags_template). It is recommended to build your run by modifying the supplied templates rather than writing your own configuration files from scratch.

Software dependencies

Mitonotate checks your PATH for the required dependencies on startup. In addition, it requires BioPerl to calculate sequence statistics. Please remember to cite the dependencies if you use them!

Requires the following non-Core Perl modules:

- threads
- Log::Message::Simple
- Bio::SeqIO
- Bio::Tools::SeqStats
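
To check ahead of time whether the external tools are on your PATH, something like the following Python sketch may help. It is independent of the check built into mitonotate.pl. The executable names in TOOLS are assumptions based on the programs named in this README; the names the script actually looks for may differ.

```python
import shutil

# Assumed executable names; adjust to match your installation.
TOOLS = [
    "tRNAscan-SE", "nhmmer", "prodigal", "blastp", "hmmscan",
    "tmhmm", "mcl", "muscle", "hhblits",
]


def missing_tools(tools):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling missing_tools(TOOLS) returns an empty list when every assumed executable is found.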

Known issues

MitoCOGs files downloaded from NCBI have DOS-style line endings. When running on Linux/Unix, the files must first be converted with dos2unix or a similar utility.
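
If dos2unix is not available, the conversion can be approximated with a few lines of Python. This is a minimal sketch, not a full replacement for dos2unix: it only replaces CRLF pairs with LF, and it overwrites the file in place. The function names are hypothetical.

```python
def dos_to_unix(data):
    """Replace DOS line endings (CRLF) with Unix line endings (LF) in bytes."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite a file in place with Unix line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(dos_to_unix(data))
```

Working in binary mode avoids any automatic newline translation by Python itself.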

Synteny plots for small genomes

Make synteny plots for a set of contigs or small genomes with synplot.pl. Given a set of Fasta files, each representing a single genome, and GFF3 feature tables with CDS features for those Fasta files, the script runs blastp between each adjacent pair of genomes and draws synteny plots.

Requires the accompanying R script synplot.R in the same folder as the Perl script, and the command Rscript in your PATH.

 $ perl synplot.pl -f genome1.fasta,genome2.fasta,genome3.fasta -g features1.gff,features2.gff,features3.gff -o output_prefix

More information is available in the help message: perl synplot.pl --help.
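
When there are many genomes, the comma-separated arguments can be assembled programmatically. The Python sketch below builds the command shown above from two lists. The -f, -g, and -o options come from this README; the function name synplot_command is hypothetical, and the check that there is one GFF3 file per Fasta file, in the same order, is an assumption about how the script pairs them.

```python
def synplot_command(fastas, gffs, prefix):
    """Build the argument list for a synplot.pl run.

    File names are joined with commas and no spaces, so each list is
    passed to the script as a single command-line argument.
    """
    if len(fastas) != len(gffs):
        raise ValueError("assumed: one GFF3 file per Fasta file, in the same order")
    if len(fastas) < 2:
        raise ValueError("at least two genomes are needed for a comparison")
    return [
        "perl", "synplot.pl",
        "-f", ",".join(fastas),
        "-g", ",".join(gffs),
        "-o", prefix,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the folder containing synplot.pl and synplot.R.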

Support

We regret that no support for the use of this pipeline or the installation of dependencies can be offered for the foreseeable future. Caveat emptor!
