A Python front end for Chris Quince's AmpliconNoise C code
Latest commit 607178c, Jun 12, 2012, Connor McCoy: CHANGELOG

anoisetools

Python package for prepping 454 data for use with AmpliconNoise (Quince et al BMC Bioinformatics 2011, Quince et al Nature Methods 2009):

raw.sff -> anoisetools -> Processed Data

The source for AmpliconNoise is also included.

For flowgram data, we target the original .sff files.

Installation

  1. Ensure that your computer meets the minimum requirements. Currently these are Python 2.7 plus the requirements of AmpliconNoise.

  2. Install BioPython if you don't have it. Note that numpy is not required for this project. If you're not planning to use BioPython for anything else, you can answer "No" when the BioPython installer prompts you about numpy.

  3. Download and install:

    curl -L https://github.com/fhcrc/ampliconnoise/tarball/master | tar xzf -
    cd fhcrc-ampliconnoise-*
    python2.7 setup.py install  # may require sudo
    

    See Installing Python Modules for more information and options.

  4. Build the AmpliconNoise binaries, and ensure they are present in your PATH.
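
A quick way to confirm step 4 is to look each binary up on PATH. The following is a minimal standalone sketch, not part of anoisetools; the binary names in `required` are assumptions used for illustration, so substitute the executables your AmpliconNoise build actually produces.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of executable `name`, or None if it is not found.

    `path` is a PATH-style string; it defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_binaries(names, path=None):
    """Return the subset of `names` that cannot be found on the search path."""
    return [name for name in names if find_on_path(name, path) is None]


if __name__ == "__main__":
    # Assumed binary names, for illustration only.
    required = ["PyroDist", "PyroNoise", "SeqDist", "SeqNoise"]
    missing = missing_binaries(required)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed binaries were found on PATH.")
```

The lookup is written by hand rather than with a library helper so that the same script runs under both Python 2.7 and Python 3.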

Running setup.py installs the anoisetools package, most of which is accessed through the anoise script.

Overview

anoise is called with a subcommand:

anoise [subcommand]

Help can be accessed via anoise -h or anoise <subcommand> -h.

For our analyses, initial preprocessing is two steps:

  • Split the original .sff file into one .sff per barcoded sample
  • Process each sample using wrappers for PyroNoise and SeqNoise

Splitting Sequences

To split an .sff, use anoise split, providing a file with comma-delimited base_path_for_output,barcode,primer records, e.g.:

sample1/sample1,ATAG,TAAATGGCAGTCTAGCAGAARAAG

writes to ./sample1/sample1.sff all sequences that start with the barcode ATAG, followed by the primer TAAATGGCAGTCTAGCAGAARAAG.

Degenerate primers should be specified as such.
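
To illustrate how a barcode map record and a degenerate primer can be matched against a read, here is a minimal sketch written for Python 3. It is not how anoise split is implemented; the function names (`primer_to_regex`, `read_barcode_map`, `assign_sample`) are hypothetical, and it assumes reads are available as plain nucleotide strings. Degenerate bases are expanded using the standard IUPAC ambiguity codes, so the R in the example primer matches either A or G.

```python
import csv
import io
import re

# Standard IUPAC nucleotide ambiguity codes, expressed as regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Expand a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def read_barcode_map(handle):
    """Parse base_path_for_output,barcode,primer records from a file-like object."""
    records = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        base_path, barcode, primer = [field.strip() for field in row]
        pattern = re.compile(barcode.upper() + primer_to_regex(primer))
        records.append((base_path, barcode, pattern))
    return records


def assign_sample(sequence, records):
    """Return the output base path of the first record whose barcode and
    primer match the start of `sequence`, or None if nothing matches."""
    for base_path, _barcode, pattern in records:
        if pattern.match(sequence.upper()):
            return base_path
    return None


if __name__ == "__main__":
    barcode_map = io.StringIO("sample1/sample1,ATAG,TAAATGGCAGTCTAGCAGAARAAG\n")
    records = read_barcode_map(barcode_map)
    read = "ATAG" + "TAAATGGCAGTCTAGCAGAAGAAG" + "ACGTACGT"
    print(assign_sample(read, records))
```

`re.match` anchors at the start of the string, which mirrors the requirement that the barcode and primer open the read.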

If the barcode map is named barcodes.csv, and the full SFF is G0YK51K01.sff, one would call [1]:

anoise split barcodes.csv G0YK51K01.sff

Running PyroNoise and SeqNoise

For each sample in our analyses, we follow a process along the lines of:

#!/bin/sh
MPIARGS="-np 12"
TMP_DIR="."

# Run PyroNoise
# This cleans the flowgrams before truncation and SeqNoise
anoise pyronoise \
  --mpi-args "$MPIARGS" \
  --temp-dir "$TMP_DIR" \
  sample1.sff

anoise truncate "{barcode}" 400 < sample1-pnoise_cd.fa > sample1-pnoise_trunc.fa

# Run SeqNoise
anoise seqnoise \
  --mpi-args "$MPIARGS" \
  --stub sample1 \
  --temp-dir "$TMP_DIR" \
  sample1-pnoise_trunc.fa \
  sample1-pnoise.mapping

Both pyronoise and seqnoise create a temporary directory for processing. If running MPI jobs that span multiple nodes, be sure to set --temp-dir to a location accessible from all nodes.
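
When there are many samples, the three commands of the per-sample script can be generated rather than copied by hand. The sketch below (Python 3, hypothetical helper name `sample_commands`) only builds the shell command lines and does not run them. It uses only the subcommands and options shown in the script above, and it assumes that the intermediate files follow the same naming pattern as in that script (`<stub>-pnoise_cd.fa`, `<stub>-pnoise_trunc.fa`, `<stub>-pnoise.mapping`).

```python
import shlex


def _join(args):
    """Quote each argument for the shell and join them into one command line."""
    return " ".join(shlex.quote(str(arg)) for arg in args)


def sample_commands(stub, mpi_args="-np 12", temp_dir=".", length=400):
    """Return the PyroNoise, truncate and SeqNoise command lines for one sample.

    `stub` is the sample name without extension, e.g. "sample1".
    File names are assumed to follow the pattern used in the example script.
    """
    cleaned = stub + "-pnoise_cd.fa"
    truncated = stub + "-pnoise_trunc.fa"
    mapping = stub + "-pnoise.mapping"

    pyronoise = _join(["anoise", "pyronoise",
                       "--mpi-args", mpi_args,
                       "--temp-dir", temp_dir,
                       stub + ".sff"])
    # truncate reads from stdin and writes to stdout, so use shell redirection.
    truncate = "{0} < {1} > {2}".format(
        _join(["anoise", "truncate", "{barcode}", length]),
        shlex.quote(cleaned),
        shlex.quote(truncated))
    seqnoise = _join(["anoise", "seqnoise",
                      "--mpi-args", mpi_args,
                      "--stub", stub,
                      "--temp-dir", temp_dir,
                      truncated, mapping])
    return [pyronoise, truncate, seqnoise]


if __name__ == "__main__":
    for sample in ["sample1", "sample2"]:
        for command in sample_commands(sample, temp_dir="."):
            print(command)
```

Printing the commands instead of executing them makes it easy to review the output, redirect it into a shell script, or hand each line to a cluster scheduler.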

[1] Note: the split step creates a child process for each sample.