TABLE OF CONTENTS
- EXAMPLE ANALYSES
- GETTING STARTED
- FURTHER READING
- COPYRIGHT AND LICENSE
sowhat automates the SOWH test, a statistical test of phylogenetic topologies using a parametric bootstrap. It works on amino acid, nucleotide, and binary character state datasets.
A peer-reviewed manuscript describing
sowhat is available at Systematic Biology: http://sysbio.oxfordjournals.org/content/early/2015/07/30/sysbio.syv055.abstract
sowhat includes several features that provide flexibility and aid in the interpretation and assessment of SOWH test results, including:
- The test is performed with the adjustment suggested by Susko 2014 (http://dx.doi.org/10.1093/molbev/msu039).
- Partitions, including partitions by codon position, can be used.
- Missing data (gaps in alignment) are propagated from the original dataset to the simulated dataset.
- Confidence intervals are estimated for the p-value, which helps the investigator assess if a sufficient number of bootstrap replicates have been sampled.
sowhat is in active development. Please use with caution. We appreciate hearing about your experience with the program via the issue tracker.
https://github.com/josephryan/sowhat (click the "Download ZIP" button at the bottom of the right column).
sowhat is available in a docker conatiner (thanks to @xqua for troublshooting). To load a container with
sowhat and the required dependencies, use the following:
docker pull shchurch/sowhat
sowhat and documentation, use the following:
perl Makefile.PL make make test sudo make install
To install without root privelages try:
perl Makefile.PL PREFIX=/home/myuser/scripts make make test sudo make install
INSTALL WITH DEPENDENCIES
You can install SOWHAT and all the required dependencies listed above on a clean Ubuntu 15.04
machine with the following commands (executables will be placed in
sudo apt-get update sudo apt-get install -y r-base-core cpanminus unzip gcc git sudo cpanm Statistics::R sudo cpanm JSON sudo Rscript -e "install.packages('ape', dependencies = T, repos='http://cran.rstudio.com/')" cd ~ git clone https://github.com/josephryan/sowhat.git cd `sowhat`/ # To work on the development branch (not recommended) execute: git checkout -b Development origin/Development sudo ./build_3rd_party.sh perl Makefile.PL make make test sudo make install
build_3rd_party.sh installs some dependencies from versions that are cached in
this repository. They may be out of date.
Several test datasets are provided in the
examples/ directory. To run example analyses
on these datasets, execute:
examples.sh and the resulting
test.output/ directory for more on the specifics of
Warning: Some of the examples take time (especially those that use Garli). For a quick example run
make test and see the output in the
1. Alignment (DNA, AA, or binary characters)
Format: non-interleaved PHYLIP format
This can be DNA, amino acid, or binary characters. Often, you would have performed phylogenetic analyses on this alignment and recovered a result that was in conflict with an a priori hypothesis.
2. Constraint tree
Format: Newick format
The constraint tree represents a hypothesis that you would like to compare to the ML tree or some alternative hypothesis. In most cases you will want a tree that is mostly unresolved except for the clade being tested.
For example if your ML tree showed a sister relationship between two taxa 'A' and 'B' and you want to compare this result to topology with a sister relationship between 'A' and 'C,' you would create the following constraint tree:
Note that the relationship
F is unresolved.
3. RAxML model
The only other required parameter when using RAxML is
This option can specify any of the models that are available to RAxML. Running
sowhat with the option
--raxml_model=available will provide a list of all possible models.
Other RAxML parameters (including number of threads) can be specified with the option:
--rax='/usr/local/bin/raxmlHPC-PTHREADS -T 20'
examples.sh for examples of
sowhat samples 1000 bootstrap replicates. This can be adjusted with
--reps=[sample size]. A sufficient sample size can be assessed by checking the reported confidence interval around the p-value.
Examining the results
The results of the SOWH test are included in a file called
sowhat.results.txt, which can be found in the directory specified with the
At the bottom of
sowhat.results.txt is a p-value representing the probability that the test statistic would be observed under the null hypothesis.
A run that has been cut short can be restarted using the
--restart option. In this case the null distribution will be recalculated iteratively using the previously simulated samples in the null distributions. Only the most recent two generations of sequence simulation and tree estimation will be reperformed to prevent any errors from an unfinished tree estimation.
Additional outputs include
- detailed information on the model used for simulating new alignments in the file
- information on the null distribution in
- the trace file for the run is printed to
- program files printed to a directory
sowhat_scratch. Within this directory, the files ending in
...i.0.0represent the initial search of the empirical alignment file.
Results can be printed to a file
sowhat.results.json using the option
Running large analyses
The SOWH test can take a lot of time, especially on datasets where a single tree search can take many hours. Threads can be incorporated into raxml as described above with the
--rax options, which can speed up the tree searches considerably.
In some cases, though, the user may want to further parallelize the
sowhat test. The following option allows a user to run the tree searches on simulated datasets simultaneously, for example on a cluster.
To use this option, you must specify the following options:
--print_tree_scripts --reps=[sample size, default=1000]
The initial two tree searches on the observed data will be performed. Subsequently
sowhat will generate simulated alignments and print a series of scripts to execute the tree searches to the folder
Each of these scripts must be executed externally, and can be run simultaneously. After they have all been completed, the user reruns
sowhat with the following options:
--print_tree_scripts --reps=[same number of reps] --restart
One note: if the inital sample size is too low (the confidence interval around the p-value indicates that the results are not definitive), the user can generate additional tree scripts by rerunning the
sowhat command with the following options:
--print_tree_scripts --reps=[some higher number of reps] --restart
sowhat will not calculate the statistics until the number of tree scripts specified in the number of reps have been executed successfully.
See this page for descriptions of additional options and how to use more complex models.
sowhat --constraint=NEWICK_CONSTRAINT_TREE --aln=PHYLIP_ALIGNMENT --name=NAME_FOR_REPORT --dir=DIR [--debug] [--garli=GARLI_BINARY_OR_PATH_PLUS_OPTIONS] [--garli_conf=PATH_TO_GARLI_CONF_FILE] [--help] [--initial] [--json] [--max] [--raxml_model=MODEL_FOR_RAXML] [--nogaps] [--partition=PARTITION_FILE] [--pb=PB_BINARY_OR_PATH_PLUS_OPTION [--pb_burn=BURNIN_TO_USE_FOR_PB_TREE_SIMULATIONS] [--plot] [--ppred=PPRED_BINARY_OR_PATH_PLUS_OPTIONS] [--print_tree_scripts] [--rax=RAXML_BINARY_OR_PATH_PLUS_OPTIONS] [--reps=NUMBER_OF_REPLICATES] [--resolved] [--rerun] [--restart] [--runs=NUMBER_OF_TESTS_TO_RUN] [--seqgen=SEQGEN_BINARY_OR_PATH_PLUS_OPTIONS] [--treetwo=NEWICK_ALTERNATIVE_TO_CONST_TREE] [--usepb] [--usegarli] [--usegentree=NEWICK_TREE_FOR_SIMULATING_DATA] [--version]
Extensive documentation is embedded inside of
sowhat in POD format and
can be viewed by running any of the following:
`sowhat` --help perldoc `sowhat` man `sowhat` # available after installation
A peer-reviewed manuscript describing
sowhat is available at Systematic Biology:
Church, Samuel H., Joseph F. Ryan, and Casey W. Dunn. "Automation and Evaluation of the SOWH Test with SOWHAT" Systematic Biology 2015 Nov;64(6):1048-58. doi: 10.1093/sysbio/syv055
Also see the file
Goldman, Nick, Jon P. Anderson, and Allen G. Rodrigo. "Likelihood-based tests of topologies in phylogenetics." Systematic Biology 49.4 (2000): 652-670. doi:10.1080/106351500750049752
Swofford, David L., Gary J. Olsen, Peter J. Waddell, and David M. Hillis. Phylogenetic inference. (1996): 407-514. http://www.sinauer.com/molecular-systematics.html
We have tested
sowhat on OS X 10.9, OS X 10.10, Ubuntu Server 10.04 (Amazon ami-d05e75b8), and Ubuntu Desktop 15.04. It will likely work on a variety of other Unix-like operating systems.
The dependencies listed below are required by
sowhat. They must be installed and
available in the appropriate
PATH. If they are not installed already, follow the
installation instructions in the links provided for each tool. We have tested
with the indicated dependency versions. Other versions may be incompatible, and should be
used with caution. These external tools are the result of a considerable amount of work by other investigators, please also cite them when you cite
General system tools:
- Perl, which comes with most operating systems
- The Statistics::R Perl module.
Statistics::Rhas additional requirements, as described at http://search.cpan.org/dist/Statistics-R/README. Use the
local::liboption to install
sudo. Use the boostrap method found at http://search.cpan.org/~haarg/local-lib-2.000004/lib/local/lib.pm for installation information. Once local::lib has been installed, and with R installed, install the Statistics::R package as you would normally. The use local::lib option must be activated in the program as well.
- The IPC::Run Perl module is currently needed for
make testto work correctly (optional).
To use more alternative models, you will need to install the following optional dependency:
To print results to a json file, you will need to install the following optional dependency:
- The JSON Perl module.
COPYRIGHT AND LICENSE
Copyright (C) 2015 Samuel H. Church, Joseph F. Ryan, Casey W. Dunn
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program in the file LICENSE. If not, see http://www.gnu.org/licenses/.