nielshanson edited this page Sep 13, 2013 · 2 revisions

About

The GreenGenes 16S rRNA taxonomic database was updated in May 2013 and has a new home on the web (http://greengenes.secondgenome.com/). However, the format of the database has changed: it no longer comes with a .fasta file that is amenable to BLAST database creation. This script combines the available taxonomy, GenBank sequence references, and sequences to construct such a file.

Common Usage

$ python prepare_gg.py -f gg_13_5.fasta -a gg_13_5_accessions.txt -t gg_13_5_taxonomy.txt -o GREENGENES_gg16S-20XX-XX-XX

where,

  • gg_13_5.fasta is a .fasta file with ggIDs in the headers
  • gg_13_5_accessions.txt is a mapping file linking the ggIDs to external database accessions
  • gg_13_5_taxonomy.txt is a mapping file linking the ggIDs to the 16S rRNA taxonomy
  • GREENGENES_gg16S-20XX-XX-XX is the output .fasta file, ready for BLAST database creation
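
To illustrate the kind of merge the three inputs allow, here is a minimal sketch. It is not the code of prepare_gg.py. It assumes both mapping files are tab-separated with the ggID in the first column, that lines starting with '#' are headers, and that the output header should read "ggID accession taxonomy". These layouts, the function names read_mapping and annotate_fasta, and the sample records are all hypothetical.

```python
import io


def read_mapping(handle, value_column=1):
    """Read a tab-separated mapping file into {ggID: value}.

    Assumption: the ggID is in column 0 and lines starting with '#' are headers.
    """
    mapping = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > value_column:
            mapping[fields[0]] = fields[value_column]
    return mapping


def annotate_fasta(fasta, accessions, taxonomy):
    """Yield FASTA lines whose headers carry the ggID, accession and taxonomy."""
    for line in fasta:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            gg_id = parts[0] if parts else ""
            # Placeholders for IDs missing from a mapping file (a choice made
            # for this sketch, not necessarily what the script does).
            acc = accessions.get(gg_id, "NA")
            tax = taxonomy.get(gg_id, "unclassified")
            yield ">{} {} {}".format(gg_id, acc, tax)
        elif line:
            yield line


# Tiny made-up inputs standing in for the three files.
fasta = io.StringIO(">1111\nACGT\nTTGA\n>2222\nGGCC\n")
accessions = read_mapping(
    io.StringIO("#gg_id\taccession_type\taccession\n1111\tGenbank\tAB000001.1\n"),
    value_column=2,
)
taxonomy = read_mapping(
    io.StringIO("1111\tk__Bacteria; p__Firmicutes\n2222\tk__Archaea\n")
)

result = list(annotate_fasta(fasta, accessions, taxonomy))
print("\n".join(result))
```

Loading both mapping files into dictionaries first keeps the pass over the (much larger) sequence file to a single streaming read.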
