# Notes:
* Transitive alignments are not accurate; find cases where they fail.
* Can two sequences that do not co-occur in a sample (group) ever be merged together? (Command 14)
* Greg didn't get a rabund file at step 23

# 1. About
This notebook runs through the Mothur MiSeq SOP tutorial found here: http://www.mothur.org/wiki/MiSeq_SOP .

# 2. Necessary Files
First, we should download all the files used in the tutorial.  The cell below does just that.  In total, the files are about 65MB and may take a few minutes to download and unzip.  See http://www.mothur.org/wiki/MiSeq_SOP#Logistics for more details.

In [None]:
#####################################################
##### Downloads and unzips all tutorial files #######
#####################################################

import urllib2
import zipfile
import os
from subprocess import call

# make a directory for our tutorial, and jump into it
root="mothur_notebook"
if os.getcwd().split('/')[-1] != root:
    if not os.path.isdir(root):
        os.mkdir(root)
    os.chdir(root)
print "Current working directory is: " + os.getcwd()

# Zips to grab
zips = [('http://www.mothur.org/w/images/5/59/Trainset9_032012.pds.zip','ref/'),
        ('http://www.mothur.org/w/images/9/98/Silva.bacteria.zip',''),
        ('http://www.mothur.org/w/images/d/d6/MiSeqSOPData.zip','')]

# Zip Directory names:
seq='MiSeq_SOP'
silva='Silva.bacteria/silva.bacteria'
train='ref/Trainset9_032012.pds'

if not os.path.isdir("ref"):
    os.mkdir("ref")
    
    
# Grab and unzip the zip files
for url, path in zips:
    target = url.split('/')[-1]
    if not os.path.isfile(target):
        print "Downloading " + target + "...\n"
        resource = urllib2.urlopen(url)
        # write inside a with-block so the file is closed before it is unzipped
        with open(target, 'wb') as handle:
            handle.write(resource.read())
        
        print "Extracting " + target + "...\n"
        zipfile.ZipFile(target).extractall(path)

# 3. MothurMagic
To use Mothur in IPython, we need to load the mothurmagic extension. See https://github.com/SchlossLab/ipython-mothurmagic for more details.

In [7]:
%install_ext https://raw.githubusercontent.com/SchlossLab/ipython-mothurmagic/master/mothurmagic.py
%load_ext mothurmagic



Installed mothurmagic.py. To use it, type:
  %load_ext mothurmagic


# 4. Commands

## 1.  make.contigs(file=stability.files, processors=8)
Joins the forward and reverse reads of each pair into contigs and writes them to a fasta file.
### 1.1 Output Files:
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.contigs.groups     </td><td>  Which sample each sequence belongs to</td></tr>
<tr><td>stability.scrap.contigs.fasta</td><td> 	Contigs thrown out because they don't pass some criterion</td></tr>
<tr><td>stability.scrap.contigs.qual </td><td>	Quality of contigs thrown out because they don't pass some criterion</td></tr>
<tr><td>stability.trim.contigs.fasta </td><td>  Assembled forward and reverse sequences</td></tr>
<tr><td>stability.contigs.qual       </td><td>  Quality of the assembled forward and reverse sequences</td></tr>
<tr><td>stability.contigs.report     </td><td>  Assembly report for each sequence</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
make.contigs(file=stability.files, processors=8)

## 2.  summary.seqs(fasta=stability.trim.contigs.fasta)
Summarizes a fasta file.
### 2.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.summary	</td><td>  For each sequence name, keeps track of length, nbases, homopolymers and dereplication count.</td></tr>
</table>
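
To make the summary columns concrete, here is a minimal Python sketch of the per-sequence statistics. The function names are hypothetical and this is not mothur's code; it only assumes that an ambiguous base is anything other than A, C, G or T.

```python
def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


def summarize(seq):
    """Return (nbases, ambigs, polymer) for one unaligned sequence."""
    seq = seq.upper()
    # Assumption: every character outside ACGT counts as ambiguous.
    ambigs = sum(1 for base in seq if base not in "ACGT")
    return len(seq), ambigs, longest_homopolymer(seq)


print(summarize("ACGTTTTNAC"))  # (10, 1, 4)
```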

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
summary.seqs(fasta=stability.trim.contigs.fasta)

## 3.  screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxambig=0, maxlength=275)
Removes sequences that do not match specified parameters.
### 3.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.fasta</td><td> Sequences that contain 0 ambiguous bases and are at most 275 bases long</td></tr>
<tr><td>stability.trim.contigs.bad.accnos</td><td> List of discarded sequences and the reason each was discarded, i.e. number of ambiguous bases or length</td></tr>
<tr><td>stability.contigs.good.groups</td><td> For each remaining sequence, lists the group to which it belongs</td></tr>
</table>
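
The screening rule can be sketched in a few lines of Python. This is an illustration only: `screen` and its return values are made-up names, and the reasons recorded for rejected sequences are simplified compared with what mothur writes to the accnos file.

```python
def screen(seqs, maxambig=0, maxlength=275):
    """Split {name: sequence} into kept sequences and rejected names with reasons."""
    good, bad = {}, {}
    for name, seq in seqs.items():
        ambigs = sum(1 for base in seq.upper() if base not in "ACGT")
        reasons = []
        if ambigs > maxambig:
            reasons.append("ambigs")
        if len(seq) > maxlength:
            reasons.append("length")
        if reasons:
            bad[name] = reasons
        else:
            good[name] = seq
    return good, bad


good, bad = screen({"a": "ACGT", "b": "ACNT", "c": "A" * 300})
print(sorted(good))  # ['a']
print(sorted(bad))   # ['b', 'c']
```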

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxambig=0, maxlength=275)

## 4.  unique.seqs(fasta=stability.trim.contigs.good.fasta)
Dereplicates identical sequences, keeping track of related sequences.
### 4.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.names</td><td> Tab-delimited file; for each unique sequence, lists all the sequences that are identical to it</td></tr>
<tr><td>stability.trim.contigs.good.unique.fasta</td><td> The unique sequences</td></tr>
</table>
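
Dereplication amounts to grouping reads by their exact sequence. A rough Python sketch, assuming the first read seen for each sequence becomes its representative (`dereplicate` is a hypothetical helper, not part of mothur):

```python
from collections import OrderedDict


def dereplicate(reads):
    """reads: list of (name, sequence). Returns (unique fasta records, names rows)."""
    groups = OrderedDict()
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    # One fasta record per distinct sequence, named after its first read.
    unique = [(names[0], seq) for seq, names in groups.items()]
    # One names row per distinct sequence: representative, then all members.
    names_rows = [(names[0], ",".join(names)) for names in groups.values()]
    return unique, names_rows


unique, names_rows = dereplicate([("r1", "ACGT"), ("r2", "GGCC"), ("r3", "ACGT")])
print(unique)      # [('r1', 'ACGT'), ('r2', 'GGCC')]
print(names_rows)  # [('r1', 'r1,r3'), ('r2', 'r2')]
```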

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
unique.seqs(fasta=stability.trim.contigs.good.fasta)

## 5.  count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)
Creates a table counting the abundance of each sequence in each sample (group).
### 5.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.count_table</td><td> Table where rows are unique sequences and columns are samples. Each cell holds the abundance of that sequence in that sample</td></tr>
</table>
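
As a sketch of how the names and groups information combine into such a table (the data layout and the `count_table` helper are assumptions made for illustration, not mothur's file parsing):

```python
from collections import Counter


def count_table(names_rows, groups):
    """names_rows: [(representative, [member names])]; groups: {read name: sample}."""
    samples = sorted(set(groups.values()))
    table = {}
    for representative, members in names_rows:
        per_sample = Counter(groups[member] for member in members)
        # First column is the total, then one column per sample.
        table[representative] = [len(members)] + [per_sample[s] for s in samples]
    return ["total"] + samples, table


header, table = count_table(
    [("r1", ["r1", "r3", "r4"]), ("r2", ["r2"])],
    {"r1": "F3D0", "r2": "F3D0", "r3": "F3D1", "r4": "F3D1"},
)
print(header)       # ['total', 'F3D0', 'F3D1']
print(table["r1"])  # [3, 1, 2]
```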

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)

## 6.  summary.seqs(fasta=stability.trim.contigs.good.unique.fasta, count=stability.trim.contigs.good.count_table)
Summarizes a fasta file.  In this case, summarizing the list of unique sequences and updating their abundance.
### 6.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.summary </td><td> Summary of the unique sequences only. The only difference from their description in the previous summary is the numSeqs column, which now contains the total number of sequences that each unique sequence represents</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
summary.seqs(fasta=stability.trim.contigs.good.unique.fasta, count=stability.trim.contigs.good.count_table)

## 7.  pcr.seqs(fasta=silva.bacteria/silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=4)
Trims the fasta file (in this case the reference database) to the specified region, i.e. 11894 to 25319.
### 7.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>../silva.bacteria/silva.bacteria.pcr.fasta </td><td> 	The trimmed fasta file (DB)</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
pcr.seqs(fasta=silva.bacteria/silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=4)

## 8. system(mv silva.bacteria/silva.bacteria.pcr.fasta silva.bacteria/silva.v4.fasta)
Renames the file 'silva.bacteria/silva.bacteria.pcr.fasta' to 'silva.bacteria/silva.v4.fasta'.
### 8.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>silva.bacteria/silva.v4.fasta </td><td> 	The renamed file</td></tr>
</table>

In [None]:
%%mothur
system(mv silva.bacteria/silva.bacteria.pcr.fasta silva.bacteria/silva.v4.fasta)

## 9.  align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.bacteria/silva.v4.fasta)
Aligns our unique sequences against the reference database.
### 9.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.align </td><td> 	The reads alignment file (not including DB seqs)</td></tr>
<tr><td>stability.trim.contigs.good.unique.align.report </td><td> 	Contains pairwise alignment information for the sequences and their hits. Ex. alignment length, start in query, start in hit, end in query, etc.</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.bacteria/silva.v4.fasta)

## 10.  summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)
Summarizes a fasta file.  In this case, updates the start & stop columns to reflect the new alignment.
### 10.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.summary</td><td> Updates the start and end values of each sequence within the context of the larger alignment</td></tr>
</table>
 

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

## 11.  screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=1968, end=11550, maxhomop=8)
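Removes sequences that start after position 1968 or end before position 11550, as well as sequences with a homopolymer longer than 8 bases (maxhomop=8).  This ensures that the remaining sequences all cover the same region of the alignment.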
### 11.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.good.count_table</td><td> 	 New table with updated counts that show removed sequences</td></tr>
<tr><td>stability.trim.contigs.good.unique.good.align</td><td> 	 Alignment file without the removed sequences</td></tr>
<tr><td>stability.trim.contigs.good.unique.bad.accnos</td><td>   Sequences that were removed and reason for removal</td></tr>
<tr><td>stability.trim.contigs.good.unique.good.summary</td><td> New summary file without the removed sequences</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=1968, end=11550, maxhomop=8)

## 12.  filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)
Removes gap-only ('-') columns, and any columns containing a '.' from all alignments.
### 12.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.fasta</td><td> Filtered alignment with the gap-only columns and the columns containing a '.' removed</td></tr>
</table>
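
The column filter described above can be sketched as follows. `filter_columns` is a hypothetical helper that mirrors the two options as this notebook describes them (vertical=T drops gap-only columns, trump=. drops any column containing a '.'); it is not mothur's implementation.

```python
def filter_columns(alignment):
    """alignment: list of equal-length aligned strings. Returns the filtered strings."""
    kept = []
    for column in zip(*alignment):
        if "." in column:
            continue  # trump=. : drop the column if any sequence has a '.'
        if all(char == "-" for char in column):
            continue  # vertical=T : drop columns that contain only gaps
        kept.append(column)
    if not kept:
        return ["" for _ in alignment]
    # Transpose the kept columns back into one string per sequence.
    return ["".join(row) for row in zip(*kept)]


print(filter_columns(["..AC-GT", ".AAC-G-", "..AC-GT"]))
# ['ACGT', 'ACG-', 'ACGT']
```

Note that a gap survives in the second sequence: only columns that are gaps in every sequence are removed.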

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)

## 13.  unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)
Filtering in the previous step may have introduced duplicates.  This step dereplicates them and keeps track of their counts.
### 13.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.count_table</td><td> 	 New table with updated counts that show dereplicated sequences</td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.fasta</td><td> 	 New alignment without the replicated sequences</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)

## 14.  pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)
Removes sequences that are likely due to sequencing errors by clustering, within each group (sample), sequences that are within 2bp of each other.
### 14.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta</td><td>Contains the representative sequence of each cluster</td></tr>
<tr><td> stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table </td><td> New counts that represent merged, similar (within 2bp) sequences </td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.F3D0.map  </td><td> Shows the clusters for each sample (group) </td></tr>
<tr><td> ...other samples map files </td><td> ... </td></tr>
<tr><td> stability.trim.contigs.good.unique.good.filter.unique.precluster.F3D9.map </td><td> Shows the clusters for each sample (group) </td></tr>
 
</table>
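
A rough sketch of the merging idea for a single sample, assuming a greedy scheme in which sequences are visited from most to least abundant and each rarer sequence is absorbed by the first more abundant representative within `diffs` mismatches. The function names and the tie-breaking rule are assumptions; mothur's actual procedure may differ in detail.

```python
def mismatches(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pre_cluster(counts, diffs=2):
    """counts: {aligned sequence: abundance} for ONE sample. Returns merged counts."""
    # Visit sequences from most to least abundant (ties broken alphabetically).
    ordered = sorted(counts, key=lambda seq: (-counts[seq], seq))
    representatives = []
    merged = {}
    for seq in ordered:
        for rep in representatives:
            if mismatches(seq, rep) <= diffs:
                merged[rep] += counts[seq]  # absorb the rarer sequence
                break
        else:
            representatives.append(seq)
            merged[seq] = counts[seq]
    return merged


sample = {"AAAAAA": 10, "AAAAAT": 2, "AAAATT": 1, "CCCCCC": 3}
print(pre_cluster(sample))
```

In this toy sample the two rare variants fold into "AAAAAA" (total 13), while "CCCCCC" stays separate with 3.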

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)

## 15.  chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
Removes chimeras from the count file.  The chimeras are still present in the fasta file.
### 15.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table</td><td>New count table not including chimeras in counts</td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.chimeras</td><td>Log of analysis done to find chimeras</td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos</td><td>List of chimeras</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)

## 16.  remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos)
Removes the chimeras listed in the accnos file from the fasta file.
### 16.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta</td><td>  New fasta file without the chimeras </td></tr>
</table>


In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos)

## 17.  classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, reference=ref/trainset9_032012.pds.fasta, taxonomy=ref/trainset9_032012.pds.tax, cutoff=80)
Classifies the sequences into taxa using the training set.
### 17.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy</td><td>Taxonomic classification for each sequence based on the algorithm implemented in classify.seqs</td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.tax.summary</td><td>Summary of how many sequences occur at each lineage. The info is split by taxonomic level, according to the taxonomic tree hierarchy.</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, reference=ref/trainset9_032012.pds.fasta, taxonomy=ref/trainset9_032012.pds.tax, cutoff=80)

## 18.  remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
Removes non-bacterial sequences
### 18.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy</td><td>The new taxonomy without the undesirable taxa. Numbers in the taxonomy indicate the bootstrap confidence (in percent) of each assignment under the Wang method</td></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta   </td><td>  New fasta alignment file without the undesirable taxa     </td></tr>
<tr><td>  stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table    </td><td>   New counts table without the undesirable taxa    </td></tr>
</table>


In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

## 19.  get.groups(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, groups=Mock)
Pulls out sequences that belong to the "Mock" community.  Creates a separate count table for the selected sequences.
### 19.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>  stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta   </td><td>  Contains only sequences that belong to the Mock Community     </td></tr>
<tr><td>  stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table   </td><td>    Count for every sequence from mock   </td></tr>
</table>



In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
get.groups(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, groups=Mock)

## 20.  seq.error(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, reference=MiSeq_SOP/HMP_MOCK.v35.fasta, aligned=F)
Computes the sequencing error rate by comparing the sequences of a sample with known composition (Mock) against their reference sequences.  The error rate is defined as 1 - (sum of bases in query - sum of mismatches to reference) / (sum of bases in query).
### 20.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.summary</td><td>   For every sequence in Mock, finds its matching reference sequence and computes the differences and error </td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.seq   </td><td>   Expanded cigar version of the match information.    </td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.chimera</td><td>?</td></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.seq.forward   </td><td>  ? Shows the distribution of errors per base in the forward reads (info on F and R can be obtained from the initial files)     </td></tr>
<tr><td>    stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.seq.reverse  </td><td>    ? Shows the distribution of errors per base in the reverse reads (info on F and R can be obtained from the initial files)   </td></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.count   </td><td>   ?? Frequency of the number of errors (not including substitutions in the sequences)    </td></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.matrix   </td><td>  Matrix with counts of reqA,refA reqA,refC, reqA,refG, reqA,refT, etc...      </td></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.error.ref   </td><td>    Shows the alignment of the query against the ref.   </td></tr>
</table>
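
The error-rate definition given for this step can be written out directly. This sketch assumes each query has already been aligned to its reference and that both strings have the same length; `error_rate` is a hypothetical helper, not a mothur function.

```python
def error_rate(pairs):
    """pairs: list of (query, reference) aligned strings of equal length."""
    total_bases = sum(len(query) for query, _ in pairs)
    total_mismatches = sum(
        sum(1 for q, r in zip(query, reference) if q != r)
        for query, reference in pairs
    )
    # 1 - (bases - mismatches) / bases, as defined in the text above.
    return 1.0 - (total_bases - total_mismatches) / float(total_bases)


print(error_rate([("ACGT", "ACGA"), ("GGGG", "GGGG")]))  # 0.125
```

One mismatch over eight query bases gives 1 - 7/8 = 0.125.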

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
seq.error(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, reference=MiSeq_SOP/HMP_MOCK.v35.fasta, aligned=F)

## 21.  dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, cutoff=0.20)
Calculates the pairwise distance (mismatches) between all sequences 
### 21.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.dist   </td><td>    List version of the distance matrix between all the sequences from the mock community   </td></tr>
</table>
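
A simplified sketch of a pairwise distance list with a cutoff. It assumes the distance is the fraction of mismatching columns after ignoring columns where both sequences have a gap, and that pairs at or below the cutoff are kept; mothur's own gap handling and boundary treatment are not reproduced here, and `pairwise_distances` is a made-up name.

```python
from itertools import combinations


def pairwise_distances(seqs, cutoff=0.20):
    """seqs: {name: aligned sequence}. Returns [(name_a, name_b, distance)]."""
    rows = []
    for a, b in combinations(sorted(seqs), 2):
        # Ignore columns where both sequences have a gap.
        columns = [(x, y) for x, y in zip(seqs[a], seqs[b])
                   if not (x == "-" and y == "-")]
        if not columns:
            continue
        distance = sum(1 for x, y in columns if x != y) / float(len(columns))
        if distance <= cutoff:
            rows.append((a, b, distance))  # one row per pair, like a column-format file
    return rows


seqs = {"a": "ACGT-ACGTAC", "b": "ACGT-ACGTAA", "c": "TTTT-TTTTTT"}
print(pairwise_distances(seqs))  # only the a/b pair, at distance 0.1
```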


In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, cutoff=0.20)

## 22.  cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table)
Clusters sequences into OTUs using the default (average neighbor) algorithm
### 22.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list   </td><td>List of clusters based on the dist.seqs list</td></tr>
</table>


In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table)

## 23.  make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, label=0.03)
Counts the number of OTUs and their abundance at a specified level (label).
### 23.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.shared   </td><td>    For the chosen label, shows the number of OTUs and their abundance   </td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.Mock.rabund  </td><td> ? Same as above without the header? Didn't show up when Greg ran it </td></tr>
</table>


In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, label=0.03)

## 24.  rarefaction.single(shared=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.shared)
Generates intra-sample rarefaction curves using a re-sampling without replacement approach
### 24.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td> stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.groups.rarefaction   </td><td>Table with average number of clusters with lower and upper confidence intervals for each subset size of the rarefaction process </td></tr>
</table>
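
Resampling without replacement can be illustrated with a small simulation. This is a sketch only: the function name, the fixed seed and the number of iterations are assumptions, and the confidence intervals in the output file are not computed here.

```python
import random


def rarefaction_curve(otu_counts, step=1, iters=100, seed=0):
    """otu_counts: {otu: abundance} for one sample. Returns [(n sampled, mean OTUs)]."""
    pool = [otu for otu, n in sorted(otu_counts.items()) for _ in range(n)]
    rng = random.Random(seed)
    sizes = list(range(step, len(pool) + 1, step))
    totals = [0] * len(sizes)
    for _ in range(iters):
        order = pool[:]
        rng.shuffle(order)  # any prefix of a shuffle is a draw without replacement
        seen = set()
        k = 0
        for i, otu in enumerate(order, 1):
            seen.add(otu)
            if k < len(sizes) and i == sizes[k]:
                totals[k] += len(seen)
                k += 1
    return [(n, totals[j] / float(iters)) for j, n in enumerate(sizes)]


curve = rarefaction_curve({"Otu1": 3, "Otu2": 2, "Otu3": 1})
print(curve[0], curve[-1])  # (1, 1.0) (6, 3.0)
```

The first point is always one OTU and the last point always equals the number of OTUs in the sample; the values in between depend on the random draws.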

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
rarefaction.single(shared=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.shared)

## 25.  remove.groups(count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, groups=Mock)
Removes sequences belonging to a group (in this case, mock) from .fasta, .name, .group, .list, .taxonomy, and .shared files.
### 25.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta   </td><td>  The new fasta file without the mock sequences</td></tr>
<tr><td>    stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table   </td><td>The count table without the mock sequences</td></tr>
<tr><td>stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy    </td><td>The taxonomy file without the mock sequences   </td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
remove.groups(count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, groups=Mock)

## 26.  cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)
Sorts sequences into bins by taxonomy and clusters within those bins
### 26.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>  stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list    </td><td>   Lists the clusters at different cutoff values, their abundance and the sequences they comprise</td></tr>
</table>

In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)

## 27.  make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, label=0.03)
Shows the abundance of each OTU in each sample (group), and for all samples.
### 27.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>  stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.shared    </td><td> At 0.03 (97% similarity), shows the abundance of each OTU in each sample (group)    </td></tr>
<tr><td>  stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.F3D0.rabund    </td><td> The abundance of each OTU in this sample (group)      </td></tr>
<tr><td>  ...Other groups    </td><td>   ....    </td></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.F3D9.rabund   </td><td>  The abundance of each OTU in this sample (group)       </td></tr>
</table>
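
Conceptually, the shared table sums the per-sample counts of every sequence in an OTU. A minimal sketch, assuming the list file has been parsed into a dict of OTU members and the count table into nested dicts (both layouts and the `make_shared` helper are hypothetical):

```python
def make_shared(otus, count_table):
    """otus: {otu: [sequence names]}; count_table: {name: {sample: count}}."""
    shared = {}
    for otu, members in otus.items():
        for name in members:
            for sample, n in count_table[name].items():
                row = shared.setdefault(sample, {})
                row[otu] = row.get(otu, 0) + n  # add this sequence's reads to its OTU
    return shared


shared = make_shared(
    {"Otu1": ["s1", "s2"], "Otu2": ["s3"]},
    {"s1": {"F3D0": 5, "F3D1": 1}, "s2": {"F3D0": 2}, "s3": {"F3D1": 4}},
)
print(shared["F3D0"])  # {'Otu1': 7}
```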




In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, label=0.03)

## 28.  classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, label=0.03)
Classifies OTUs and counts abundance in each sample.
### 28.1 Output Files
<table>
<tr><th>Output File</th><th>Description</th></tr>
<tr><td>   stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.0.03.cons.taxonomy   </td><td>  Map from OTU # to species, with the abundance of that species</td></tr>
<tr><td> stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.0.03.cons.tax.summary</td><td>  Taxonomy of OTUs and their abundance in each group (sample) </td></tr>
</table>




In [None]:
%%mothur
set.dir(input=./MiSeq_SOP)
classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, label=0.03)
