This repo contains a collection of scripts used for the de novo assembly of the Acropora gemmifera transcriptome.
Oldach, M.J. and Vize, P.D., 2018. De novo assembly and annotation of the Acropora gemmifera transcriptome. Marine Genomics, 40, pp.9-12. https://doi.org/10.1016/j.margen.2017.12.007
Please cite the manuscript if you use these scripts.
Matthew J. Oldach
- E-mail: moldach686@gmail.com
- Twitter: @MattOldach
- Website: https://moldach.github.io/
Use the runall script, adapted from Matt MacManes's Oyster River Protocol. It combines scripts for optimizing (eukaryotic) transcriptome assembly using a multi-kmer, multi-assembler approach, then merges those assemblies into one final assembly.
Note: amendments to the Oyster River Protocol may have been made since this pipeline was created and run. If you intend to reproduce the results of the paper, use the runall script as is. If you are using this repo for your own de novo eukaryotic assembly, check the official repo and make any necessary changes to the runall script.
- gene names
- KOG
- KAAS
- GO terms
dammit is a simple de novo transcriptome annotator built by Camille Scott. It was born out of the observations that annotation is mundane and annoying; that all the individual pieces of the process already exist; and that the existing solutions are overly complicated or rely on crappy non-free software (cough cough, meaning BLAST2GO)!
dammit runs a relatively standard annotation protocol for transcriptomes: it begins by building gene models with Transdecoder, and then uses the following protein databases as evidence for annotation:
- Pfam-A: a large collection of protein families, represented by multiple sequence alignments and hidden Markov models (HMMs)
- Rfam: a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures, and covariance models (CMs)
- OrthoDB: a catalog of orthologous protein-coding genes across vertebrates, arthropods, fungi, plants, and bacteria
- UniRef90: built by clustering UniRef100 sequences of 11 or more residues using the CD-HIT algorithm
Follow along with the docs for dammit for this step: https://angus.readthedocs.io/en/2018/dammit_annotation.html or the workshop.
You will now have a number of files. The most important is the <filename>.dammit.fasta file, which will be used in the next steps:
Feed your transcriptome from dammit! into:
- WebMGA for KOG (http://weizhong-lab.ucsd.edu/webMGA/server/kog/)
- KAAS for KEGG (http://www.genome.jp/tools/kaas/)
- Sma3s for GO terms (http://www.bioinfocabd.upo.es/node/11#output)
Append the KOG, KEGG, and GO term information to the FASTA file produced by dammit to obtain the assembled and annotated final product.
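One way to do the appending step is sketched below in Python. This is only an illustration, not the procedure used for the paper: it assumes each tool's result has first been reduced to a two-column, tab-separated table (transcript ID, annotation), which is a hypothetical intermediate format and not the raw output of WebMGA, KAAS, or Sma3s. The function names `load_table` and `annotate_fasta` and the `LABEL=value` header tags are likewise made up for this example.

```python
import csv
import io
from collections import defaultdict


def load_table(handle, label, annotations):
    """Read a two-column TSV (transcript ID, annotation) into `annotations`.

    The two-column layout is an assumption; adapt the parsing to the
    actual output of each annotation tool.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        annotations[row[0]][label].append(row[1])


def annotate_fasta(fasta_lines, annotations):
    """Yield FASTA lines with LABEL=value tags appended to matching headers."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""  # ID is the first header token
            tags = [
                f"{label}={','.join(values)}"
                for label, values in annotations.get(seq_id, {}).items()
            ]
            if tags:
                line = f"{line} {' '.join(tags)}"
        yield line


# Toy data standing in for real files (all IDs and values are placeholders).
annotations = defaultdict(lambda: defaultdict(list))
load_table(io.StringIO("Transcript_1\tKOG0001\n"), "KOG", annotations)
load_table(io.StringIO("Transcript_1\tGO:0008150\n"), "GO", annotations)
fasta = [">Transcript_1 len=6", "ATGCAT", ">Transcript_2 len=3", "ATG"]
print("\n".join(annotate_fasta(fasta, annotations)))
```

With real data, the `io.StringIO` objects would be replaced by open file handles, and the yielded lines written to a new FASTA file rather than printed. Transcripts without any annotation keep their original header unchanged.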
- Impacted by the quality of the sequence
- Iterative and never perfect; can always be improved with new evidence and improved algorithms
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
This code is released under the MIT License - see the LICENSE.md file for details.
- Code in the runall.sh script was adapted from macmanes-lab/Oyster_River_Protocol, which was released under the CC0 1.0 Universal license.