nielshanson edited this page Feb 21, 2014 · 22 revisions

Welcome to the Wiki for the utilities repo. Here we've put some documentation and examples on how to use some of the analytical scripts.

NCBI Submission

Analysis

  • calculate_4mer_freq.py: A script to calculate tetranucleotide (4-mer) frequencies from a set of .fasta files, writing a matrix of all 256 4-mers as a tab-delimited file that is ready for loading into R.
  • fosmid_qc: A program to semi-automate the rather involved process of mapping sequences to their fosmid ends.
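
To make the 4-mer calculation concrete, here is a minimal Python sketch of the idea. It is not the code in calculate_4mer_freq.py: the function names (`read_fasta`, `tetramer_freq`, `freq_table`), the column layout, and the choice to ignore windows containing characters other than A, C, G, or T are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 256 possible 4-mers over the DNA alphabet, in a fixed column order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def tetramer_freq(seq):
    """Return the relative frequency of each 4-mer, in KMERS order.

    Assumption: windows containing anything other than A, C, G, T
    (for example N) are left out of both the counts and the total.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    return [counts[k] / total if total else 0.0 for k in KMERS]


def freq_table(records):
    """Build a tab-delimited table: one row per sequence, one column per 4-mer."""
    rows = ["\t".join(["sequence"] + KMERS)]
    for name, seq in records:
        freqs = tetramer_freq(seq)
        rows.append("\t".join([name] + ["%.6f" % f for f in freqs]))
    return "\n".join(rows)
```

A table built this way has a header row followed by one row per sequence, so in R it could be read with something like `read.table(..., header=TRUE, sep="\t", row.names=1)`.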

Database Preparation

  • prepare_gg.py: The GreenGenes 16S rRNA database was updated in May 2013 and has a new home on the web. However, its format has changed: it no longer comes with a .fasta file that is amenable to BLAST database creation. This script combines the available taxonomy, GenBank, and sequence references to construct such a file.
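
The core of this kind of preparation step is joining sequence records to their taxonomy by ID. The sketch below illustrates that join only; it is not the logic of prepare_gg.py. It assumes a hypothetical two-column, tab-delimited taxonomy file (`id<TAB>lineage`) and a FASTA file whose headers start with the same IDs, and it does not use the GenBank references. The function names and the `unclassified` fallback are invented for the example.

```python
def load_taxonomy(lines):
    """Map sequence ID -> lineage from tab-delimited 'id<TAB>lineage' lines.

    Assumption: blank lines and lines starting with '#' carry no records.
    """
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        seq_id, _, lineage = line.partition("\t")
        taxonomy[seq_id.strip()] = lineage.strip()
    return taxonomy


def annotate_fasta(fasta_lines, taxonomy):
    """Yield FASTA lines with each header rewritten as '>id lineage'.

    IDs with no taxonomy entry are labeled 'unclassified' (a hypothetical
    choice for this sketch).
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            seq_id = (line[1:].split() or [""])[0]
            lineage = taxonomy.get(seq_id, "unclassified")
            yield ">%s %s" % (seq_id, lineage)
        elif line:
            yield line
```

Because the lineage is placed after the first space in the header, BLAST tools should treat the ID as the sequence identifier and the lineage as its description when a database is built from the resulting file.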
