Skip to content
No description, website, or topics provided.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
R
bin
data
perl
rmGaps
src
trimAlign
README

README

This is the README file for MCclassifier

Copyright (C) 2016 Pawel Gajer (pgajer@gmail.com) and Jacques Ravel (jravel@som.umaryland.edu)

Permission to use, copy, modify, and distribute this software and its
documentation with or without modifications and for any purpose and
without fee is hereby granted, provided that any copyright notices
appear in all copies and that both those copyright notices and this
permission notice appear in supporting documentation, and that the
names of the contributors or copyright holders not be used in
advertising or publicity pertaining to distribution of the software
without specific prior permission.

THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
OR PERFORMANCE OF THIS SOFTWARE.


Introduction
************

MC classifier is a super fast 16S rRNA gene fragment species level
classifier. For each 16S rRNA amplicon region a reference set of high quality
sequences known to be found in a given environment is used to generate a higher
order Markov chain models for each taxonomic ranks starting from reference
specie, corresponding genera, up to phylum level. A query sequence is first
classified at the phylum level and then through lower taxonomic ranks. At each
level the most probable model is accepted if the log odds of the probability of a
sequence being generated by the model is above certain threshold. Due to high
speed of MC classifier, the classification can be done on all reads even in a
large project.


Installation
************

Assuming that you are in the top level directory (containing this README file),
run

cd src
make

This will generated an executable 'classify' in the bin subdirectory of the top
level directory. Copy the executable to a directory included in your PATH
variable or add the path of the bin directory to your PATH variable.


Usage
*****

Here is an example of running classify on a fasta file of 10,000 vaginal Illumina
sequences from the V3-V4 region

   classify --rev-comp -i test10k.fa -d vaginal_319_806_rc_MCo7p2 -o mcDir

A count_tbl.pl scrip (located in the bin directory) can be used to generate a
sample x count contingency table

   count_tbl.pl -i mcDir/MC.order7.results.txt -o mcDir/spp.count.tbl.txt



The MC classifier can be run in a mode where all sequences are classified to
the species level by suppressing taxon error thresholding. To run MC classifier
in this mode, use --skip-err-thld flag

   classify --skip-err-thld -i test10k.fa -d vaginal_319_806_rc_MCo7p2 -o mcDir


To get more info about the classifier's options run

   classify -h


To build MC models for a specific environment, please contact the author.
You can’t perform that action at this time.