PhylOTU identifies microbial OTUs from metagenomic data
Perl C++ R
Switch branches/tags
Nothing to show
Pull request Compare This branch is 34 commits behind sharpton:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Welcome to PhylOTU.

This software is still in beta and is subject to change. More specific documentation is being worked up. 
In the meantime, please see the source code for more information ( and and 
contact the author (Thomas Sharpton - with specific inquiries.

Keep an eye out for the PhylOTU manuscript, which is currently in review.

August 2010 TJS


PhylOTU is a software workflow that identifies OTUs directly from metagenomic reads. Because of the 
particular methodology employed in PhylOTU, reads that share no sequence overlap (i.e., cannot be 
aligned to one another) can still be clustered into the same OTU. In addition, reads are not recruited into
top BLAST hits, but are effectively clustered de novo.

Briefly, PhylOTU uses a CMmodel of the 16S locus (via INFERNAL) to align metagenomic reads. The CMmodel
is contrusted using a reference alignment of high-quality, full-length 16S sequences. A phylogenetic tree
is constructed from the multiple sequence alignment of reads and reference sequences via FastTree. The
phylogenetic distance spanning every pair of reads is then calculated from the resulting tree and used 
to create a distance matrix describing pairwise read relationships. This distance matrix is then fed into 
MOTHUR, which implements average-neighbor clustering to identify OTUs. The final output (a list of OTU clusters)
is a file in the matrix subdirectory of the metagenomic sample set with the following typed name:



There are lots of dependencies. Please read carefully to ensure your system is properly configured.

A. Software
PhylOTU is primarily written in Perl 5 and R. Run time dependencies include the following Perl packages:
-Bioperl-live libraries

In addition, you must have the following R library installed:

PhylOTU stitches together various software packages written by other authors. While the software can easily
be modified to accomodate various software suites, it is currently designed to implement the following 
tools, which you will also need to install on your system:

-FastTree (
-BLAST [legacy version] (

Please see the authors' websites for instructions regarding the installation of any of the above software.

B. Reference Data

PhylOTU leverages reference 16S sequence data (high-quality, full-length) to construct an evolutionary model via
the cmbuild function in INFERNAL. 

Because of the particular licensing agreement, we cannot currently distribute the reference data we use in our study
with this software (though we are working with the publishers of the reference data to change this). We recommend 
contacting the Ribosomal Database Project to obtain the reference library described in the RDP v 10 manuscript:

Cole et. al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009 Jan;37(Database issue):D141-5. Epub 2008 Nov 12.

Alternatively, create your own 16S reference alignment and use the following example command to create a cm model:

cmbuild --rf --ere 1.4 <name of model> <reference alignment file>


PhylOTU is run directly at the command line and is implemented via a single command:

perl -i <metagenomic sequence file>

That said, there are many settings that need to be controlled for successful implementation of PhylOTU. Please see the 
source code of for more details at this time. The output of PhylOTU is a database of files in addition to
run-time logs. You may elect to pipe these logs to a file to reduce screen clutter with the follwing command:

perl -i <metagenomic sequence file> > standard.out 2> error.log

Please send all bug reports and inquiries to the author (Thomas Sharpton -