nielshanson edited this page Feb 25, 2014 · 22 revisions

Welcome to the Wiki for the utilities repo. Here we've put some documentation and examples on how to use some of the analytical scripts.

NCBI Submission

Analysis

  • calculate_4mer_freq.py: A script to calculate tetranucleotide (4-mer) frequencies from a set of .fasta files. It writes a matrix covering all 256 possible 4-mers as a tab-delimited file, ready for loading into R.
  • fosmid_qc: A program to semi-automate the rather involved process of mapping sequences to their fosmid-ends.
  • jgi_fosmid_id2plate: A procedure to reconcile the naming difference between the JGI and the GSC.
  • cor_to_network.py: A script that takes a correlation matrix and converts it to network format (node and edge files).
  • SparCC: A few examples of using SparCC to calculate correlations and bootstrapped p-values on OTUs.
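
To make the tetranucleotide calculation concrete, here is a minimal sketch of the idea in Python. It is not the code of calculate_4mer_freq.py: the function names, the `sequence` column label, and the choice to skip windows containing ambiguous bases such as N are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 4**4 = 256 tetranucleotides in a fixed column order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tetramer_freq(seq):
    """Return 256 relative frequencies, one per entry of KMERS.

    Assumption: windows containing anything other than A, C, G, T
    are ignored rather than counted.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    return [counts[k] / total if total else 0.0 for k in KMERS]


def freq_table(records):
    """Build a tab-delimited table: one header row, then one row per sequence."""
    rows = ["\t".join(["sequence"] + KMERS)]
    for name, seq in records:
        freqs = ["%.6f" % f for f in tetramer_freq(seq)]
        rows.append("\t".join([name] + freqs))
    return "\n".join(rows)
```

Passing an open .fasta file handle to `read_fasta` and writing the result of `freq_table` to disk would give a tab-delimited matrix with one row per sequence and one column per 4-mer.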

Database Preparation

  • prepare_gg.py: The GreenGenes 16S rRNA database has been updated (May 2013) and has a new home on the web. However, the format of the database has changed: it no longer comes with a .fasta file that is amenable to BLAST database creation. This script combines the available taxonomy, GenBank, and sequence references to construct such a file.
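
As a rough illustration of the kind of merge described above, the sketch below attaches a taxonomy string to each FASTA header. It is not prepare_gg.py itself. The input layout (a tab-delimited `id<TAB>lineage` taxonomy file and a FASTA file keyed by the same identifiers), the function names, and the `unclassified` fallback are hypothetical, and the GenBank step is left out entirely.

```python
def load_taxonomy(lines):
    """Map sequence id -> lineage from tab-delimited 'id<TAB>lineage' lines.

    Assumption: this two-column layout is hypothetical, not a documented format.
    """
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        seq_id, _, lineage = line.partition("\t")
        taxonomy[seq_id] = lineage
    return taxonomy


def annotate_fasta(fasta_lines, taxonomy):
    """Return FASTA lines whose headers carry the lineage after the id."""
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            # Hypothetical fallback label for ids missing from the taxonomy.
            lineage = taxonomy.get(seq_id, "unclassified")
            out.append(">%s %s" % (seq_id, lineage))
        elif line:
            out.append(line)
    return out
```

A file written from the output of `annotate_fasta` could then be handed to BLAST+, for example with `makeblastdb -in <file> -dbtype nucl`.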
