Transcriptome Assembly and Annotation Pipeline

Publication

This repo contains a collection of scripts that were used for the de novo assembly of the Acropora gemmifera transcriptome.

Oldach, M.J. and Vize, P.D., 2018. De novo assembly and annotation of the Acropora gemmifera transcriptome. Marine Genomics, 40, pp.9-12. https://doi.org/10.1016/j.margen.2017.12.007

Please cite the manuscript if you use this code.

Contact information

Matthew J. Oldach

Assembly

Use the runall.sh script, adapted from Matt MacManes's Oyster River Protocol. It combines scripts for optimizing (eukaryotic) transcriptome assembly using a multi-kmer, multi-assembler approach, and then merges those assemblies into one final assembly.
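Because several assemblers and k-mer sizes each produce their own set of contigs, those contigs must be pooled before any merge, and contig IDs from different runs can clash. The Python sketch below illustrates only that pooling step, under stated assumptions: the assembly labels (e.g. "trinity_k25") are hypothetical, and this is not the logic of runall.sh or of the Oyster River Protocol's own merging tool.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def pool_assemblies(assemblies: Dict[str, Iterable[str]]) -> List[str]:
    """Concatenate several assemblies into one list of FASTA lines.

    Each contig ID (first word of its header) is prefixed with the assembly's
    label (hypothetical labels such as "trinity_k25") so that identical IDs
    coming from different runs cannot collide in the pooled file.
    """
    pooled: List[str] = []
    for label, lines in assemblies.items():
        for header, seq in read_fasta(lines):
            contig_id = (header.split() or ["unnamed"])[0]
            pooled.append(f">{label}_{contig_id}")
            pooled.append(seq)  # sequence is written unwrapped on one line
    return pooled
```

To write the result, something like `open("pooled.fasta", "w").write("\n".join(pooled) + "\n")` would do; the output file name is likewise only an illustration.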

Note: amendments to the Oyster River Protocol may have been made since this pipeline was created and run. If your intent is to reproduce the results of the paper, use the runall script as is. If you are using this repo for your own de novo eukaryotic assembly, it is prudent to check the official repo and make any necessary changes to the runall script.

Annotation

gene names
KOG
KAAS
GO terms

dammit is a simple de novo transcriptome annotator built by Camille Scott. It was born out of the observation that annotation is mundane and annoying, that all the individual pieces of the process already exist, and that the existing solutions are overly complicated or rely on crappy non-free software.

(cough cough, Blast2GO!)

dammit runs a relatively standard annotation protocol for transcriptomes: it begins by building gene models with TransDecoder, and then uses the following databases as evidence for annotation:

Pfam-A: a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
Rfam: a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures, and covariance models
OrthoDB: a catalog of orthologous protein-coding genes across vertebrates, arthropods, fungi, plants, and bacteria
UniRef90: built by clustering UniRef100 sequences with 11 or more residues at 90% sequence identity using the CD-HIT algorithm
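Once dammit has finished, a quick way to see how much evidence each search contributed is to tally annotation records by their source. The Python sketch below assumes the GFF3 annotation file that dammit writes alongside <filename>.dammit.fasta and relies only on the generic GFF3 layout (nine tab-separated columns, source in column 2); it makes no assumption about which source names dammit actually uses.

```python
from collections import Counter
from typing import Iterable


def count_sources(gff3_lines: Iterable[str]) -> Counter:
    """Count GFF3 feature records by their source column (column 2)."""
    counts: Counter = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## directives and # comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line (e.g. sequence after a ##FASTA directive)
        counts[fields[1]] += 1
    return counts
```

Calling `count_sources(open(path))` on the GFF3 file and printing `most_common()` gives a per-source summary; `path` is a placeholder for wherever your dammit output lives.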

For this step, follow the dammit documentation or the ANGUS 2018 workshop tutorial: https://angus.readthedocs.io/en/2018/dammit_annotation.html

You will now have a number of output files. The most important is the <filename>.dammit.fasta file, which is used in the next steps.
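Before uploading <filename>.dammit.fasta anywhere, it can help to sanity-check it, for example by counting transcripts and confirming their IDs. This Python sketch treats the first word of each FASTA header as the transcript ID, which is an assumption about how the downstream tools will name your sequences.

```python
from typing import Dict, Iterable


def transcript_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each transcript ID (first word of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # wrapped sequence lines are summed
    return lengths
```

For example, `len(transcript_lengths(open("<filename>.dammit.fasta")))` gives the number of transcripts that will be submitted in the next steps.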

Submit the annotated transcriptome from dammit to:

WebMGA for KOG (http://weizhong-lab.ucsd.edu/webMGA/server/kog/)
KAAS for KEGG (http://www.genome.jp/tools/kaas/)
Sma3s for GO terms (http://www.bioinfocabd.upo.es/node/11#output)
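The results from these web servers come back as plain-text files that must be matched to transcript IDs. As one example, the Python sketch below reads a KEGG Orthology (KO) list of the kind KAAS offers for download, assuming one query per line with the query ID and KO ID separated by a tab, and unassigned queries having no second column (or an empty one). Check the actual file you receive, since this format is an assumption here.

```python
from typing import Dict, Iterable


def parse_ko_list(lines: Iterable[str]) -> Dict[str, str]:
    """Return {query_id: KO} for queries that received a KEGG Orthology (KO) ID."""
    assignments: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Keep only lines with a non-empty second column; others were unassigned.
        if len(fields) >= 2 and fields[1].strip():
            assignments[fields[0]] = fields[1].strip()
    return assignments
```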

Append the KOG, KEGG, and GO term annotations to the FASTA file produced by dammit to obtain the final assembled and annotated transcriptome.
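One way to do the appending step is to add key=value fields to each FASTA header. The Python sketch below is not the repository's own append script. It assumes you have already collected the KOG, KEGG, and GO results into a dictionary keyed by transcript ID (first word of the header); the key names KOG, KEGG, and GO are hypothetical labels.

```python
from typing import Dict, Iterable, List


def append_annotations(
    fasta_lines: Iterable[str],
    annotations: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return FASTA lines whose headers carry extra key=value annotation fields.

    `annotations` maps a transcript ID (first word of the header) to a dict such
    as {"KOG": ..., "KEGG": ..., "GO": ...}. Transcripts without an entry, and
    all sequence lines, are passed through unchanged.
    """
    out: List[str] = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            transcript_id = words[0] if words else ""
            extra = annotations.get(transcript_id, {})
            fields = [f"{key}={value}" for key, value in extra.items()]
            line = " ".join([line] + fields)
        out.append(line)
    return out
```

Several GO terms for one transcript could be joined with commas into a single value, which keeps each header on one line.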

Caveats of Transcriptome Assembly and Annotation

Results depend on the quality of the input sequence data
Iterative and never perfect: an assembly can always be improved with new evidence and improved algorithms

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Licence

This code is released under the MIT License - see the LICENSE.md file for details.

Acknowledgments

Code in the runall.sh script was adapted from macmanes-lab/Oyster_River_Protocol, which was released under the CC0 1.0 Universal license.
