nielshanson edited this page Feb 21, 2014
·
22 revisions
Welcome to the wiki for the utilities repo. Here you will find documentation and usage examples for some of the analytical scripts.
NCBI Submission
generate_ncbi_gss.py: A script to create .gss and the parallel .pub, .lib, and .cont files from .fasta files containing environmental fosmid-end sequences for submission to dbGSS. For more information see the official dbGSS website at the NCBI.
gss_to_fasta.py: A script to create .fasta files from .gss files.
Analysis
calculate_4mer_freq.py: A script to calculate the tetranucleotide (4-mer) frequency from a set of .fasta files, creating a tab-delimited matrix of all 256 4-mers that is ready for loading into R.
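The following is a minimal sketch of the kind of tetranucleotide counting described for calculate_4mer_freq.py; it is not the script's actual code. The function names (`read_fasta`, `tetra_freq`, `freq_table`), the choice to skip windows containing ambiguous bases such as N, and the normalisation to relative frequencies are all assumptions made for illustration.

```python
from itertools import product

# All 256 possible 4-mers over A, C, G, T, in a fixed column order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: use the first whitespace-delimited token as the name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def tetra_freq(seq):
    """Return relative 4-mer frequencies for seq, ordered as in KMERS."""
    counts = dict.fromkeys(KMERS, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Assumption: windows with non-ACGT characters are ignored.
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in KMERS]


def freq_table(records):
    """Build a tab-delimited table: one header row, then one row per sequence."""
    rows = ["\t".join(["sequence"] + KMERS)]
    for name, seq in records:
        values = ["%.6f" % f for f in tetra_freq(seq)]
        rows.append("\t".join([name] + values))
    return "\n".join(rows)
```

A table written this way could be loaded in R with something like `read.table("freqs.tsv", header=TRUE, sep="\t", row.names=1)`, where the file name is hypothetical.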
fosmid_qc: A program to semi-automate the rather involved process of mapping sequenced fosmids to their fosmid-ends.
cor_to_network.py: A script that takes a correlation matrix and converts it to network format (node and edge files).
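As an illustration of the correlation-to-network idea, here is a minimal sketch that is not taken from cor_to_network.py itself. The function names, the absolute-value `threshold` parameter, and the tab-delimited edge layout (source, target, weight) are assumptions; the real script may select and format edges differently.

```python
def cor_to_network(labels, matrix, threshold=0.5):
    """Convert a symmetric correlation matrix into node and edge lists.

    labels    -- names of the variables, in matrix row/column order
    matrix    -- square list of lists of correlation coefficients
    threshold -- hypothetical cutoff: keep edges with |r| >= threshold
    """
    nodes = list(labels)
    edges = []
    n = len(labels)
    for i in range(n):
        # Only the upper triangle is visited, so each pair appears once
        # and self-correlations on the diagonal are skipped.
        for j in range(i + 1, n):
            r = matrix[i][j]
            if abs(r) >= threshold:
                edges.append((labels[i], labels[j], r))
    return nodes, edges


def format_edges(edges):
    """Render edges as tab-delimited lines with a header row."""
    lines = ["source\ttarget\tweight"]
    for source, target, weight in edges:
        lines.append("%s\t%s\t%.3f" % (source, target, weight))
    return "\n".join(lines)
```

For example, calling `cor_to_network(["a", "b", "c"], m, threshold=0.5)` on a 3x3 matrix `m` returns all three labels as nodes and only those pairs whose correlation magnitude reaches the cutoff as edges.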
Database Preparation
prepare_gg.py: The GreenGenes 16S rRNA database has been updated (May 2013) and has a new home on the web. However, the format of the database has changed: it no longer comes with a .fasta file that is amenable to BLAST database creation. This script combines the available taxonomy, GenBank, and sequence references to construct such a file.