Home
nielshanson edited this page Feb 25, 2014 · 22 revisions
Welcome to the Wiki for the utilities repo. Here we've put some documentation and examples on how to use some of the analytical scripts.
- generate_ncbi_gss: A script to create .gss and the parallel .pub, .lib, and .cont files from .fasta files containing environmental fosmid-end sequences for submission to dbGSS. For more information see the official dbGSS website at the NCBI.
- gss_to_fasta.py: A script to create .fasta files from .gss files.
- calculate_4mer_freq.py: A script to calculate the tetranucleotide (4-mer) frequency from a set of .fasta files, creating a matrix of all 256 4-mers as a tab-delimited file, ready for loading into R.
- fosmid_qc: A program to semi-automate the rather involved process of mapping sequenced fosmids to their fosmid-ends.
- jgi_fosmid_id2plate: A procedure to reconcile the naming difference between the JGI and the GSC.
- cor_to_network.py: A script that takes a correlation matrix and converts it to network format (node and edge files).
- SparCC: A few examples of using SparCC to calculate correlations and bootstrapped p-values on OTUs.
- prepare_gg.py: The GreenGenes 16S rRNA database has been updated (May 2013) and has a new home on the web. However, the format of the database has changed: it no longer comes with a .fasta file that is amenable to BLAST database creation. This script combines the available taxonomy, GenBank, and sequence references to construct such a file.
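
To give a feel for what calculate_4mer_freq.py produces, here is a minimal sketch of a tetranucleotide frequency table. It is not the script itself: the function names (`read_fasta`, `tetra_freq`, `freq_table`), the choice of relative frequencies rather than raw counts, and the handling of ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 256 possible 4-mers over the DNA alphabet, in a fixed column order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tetra_freq(seq):
    """Return relative 4-mer frequencies for one sequence, in KMERS order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Assumption: windows containing N or other ambiguity codes are ignored.
    total = sum(counts[k] for k in KMERS)
    return [counts[k] / total if total else 0.0 for k in KMERS]


def freq_table(lines):
    """Build tab-delimited rows: a header row, then one row per sequence."""
    rows = ["\t".join(["sequence"] + KMERS)]
    for header, seq in read_fasta(lines):
        rows.append("\t".join([header] + ["%.6f" % f for f in tetra_freq(seq)]))
    return rows
```

Writing the rows returned by `freq_table(open("reads.fasta"))` to a file, one per line, gives a 256-column tab-delimited matrix with a header row; `reads.fasta` is a placeholder file name.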
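
The idea behind cor_to_network.py, turning a correlation matrix into node and edge lists, can be sketched roughly as follows. This is an illustrative version only: the function signature, the `threshold` cutoff on the absolute correlation, and the decision to drop unconnected nodes are assumptions, and the actual script may filter and format its output differently.

```python
def cor_to_network(labels, matrix, threshold=0.5):
    """Convert a symmetric correlation matrix to (nodes, edges).

    labels    -- row/column names, e.g. OTU identifiers
    matrix    -- square list of lists of correlation values
    threshold -- hypothetical cutoff on |r|; pairs below it are dropped
    """
    edges = []
    for i in range(len(labels)):
        # Upper triangle only: each pair once, no self-loops.
        for j in range(i + 1, len(labels)):
            r = matrix[i][j]
            if abs(r) >= threshold:
                edges.append((labels[i], labels[j], r))
    # Keep only nodes that take part in at least one edge, in original order.
    connected = {name for a, b, _ in edges for name in (a, b)}
    nodes = [name for name in labels if name in connected]
    return nodes, edges


labels = ["OTU1", "OTU2", "OTU3"]
matrix = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, -0.6],
    [0.1, -0.6, 1.0],
]
nodes, edges = cor_to_network(labels, matrix)
```

Each edge keeps its signed correlation, so positive and negative associations can be styled differently once the node and edge lists are loaded into a network viewer.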