Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.Sign up
No description, website, or topics provided.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Type||Name||Latest commit message||Commit time|
|Failed to load latest commit information.|
This is the README file for MCclassifier Copyright (C) 2016 Pawel Gajer (firstname.lastname@example.org) and Jacques Ravel (email@example.com) Permission to use, copy, modify, and distribute this software and its documentation with or without modifications and for any purpose and without fee is hereby granted, provided that any copyright notices appear in all copies and that both those copyright notices and this permission notice appear in supporting documentation, and that the names of the contributors or copyright holders not be used in advertising or publicity pertaining to distribution of the software without specific prior permission. THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. Introduction ************ MC classifier is a super fast 16S rRNA gene fragment species level classifier. For each 16S rRNA amplicon region a reference set of high quality sequences known to be found in a given environment is used to generate a higher order Markov chain models for each taxonomic ranks starting from reference specie, corresponding genera, up to phylum level. A query sequence is first classified at the phylum level and then through lower taxonomic ranks. At each level the most probable model is accepted if the log odds of the probability of a sequence being generated by the model is above certain threshold. Due to high speed of MC classifier, the classification can be done on all reads even in a large project. Installation ************ Assuming that you are in the top level directory (containing this README file), run cd src make This will generated an executable 'classify' in the bin subdirectory of the top level directory. Copy the executable to a directory included in your PATH variable or add the path of the bin directory to your PATH variable. Usage ***** Here is an example of running classify on a fasta file of 10,000 vaginal Illumina sequences from the V3-V4 region classify --rev-comp -i test10k.fa -d vaginal_319_806_rc_MCo7p2 -o mcDir A count_tbl.pl scrip (located in the bin directory) can be used to generate a sample x count contingency table count_tbl.pl -i mcDir/MC.order7.results.txt -o mcDir/spp.count.tbl.txt The MC classifier can be run in a mode where all sequences are classified to the species level by suppressing taxon error thresholding. To run MC classifier in this mode, use --skip-err-thld flag classify --skip-err-thld -i test10k.fa -d vaginal_319_806_rc_MCo7p2 -o mcDir To get more info about the classifier's options run classify -h To build MC models for a specific environment, please contact the author.