Interested in modeling continuous trait evolution on a phylogeny as a Lévy process? This git repository contains the most recent source code for the method we described in Systematic Biology, found here. We whipped together this guide to respond to questions received thus far. Please feel welcome to email us with questions about the method or the model: mlandis (at) berkeley (dot) edu.
Michael, Josh, and Mason
July 2, 2013
- Method now produces TreeFig compatible .jump_snr.txt output file to show which branches have strong signal for jumps (described in paper).
- SNR is now scaled by inverse square root of the branch length instead of just the branch length.
- Improved jump normal (modelType=3) performance.
From the command line, navigate into the
Before compiling, verify you have the GNU Scientific Library (GSL) installed. For Mac OS X, you can install GSL using Homebrew</a<> as follows:
brew install gsl. For Ubuntu, you can install GSL as follows:
apt-get install libgsl0-dev. Advice for how to accomplish on Windows is welcome.
To compile, issue the command:
g++ -O3 -lgsl *.cpp -o creepy-jerk(For Ubuntu-based Linux, Heroen Verbruggen found the following command was needed: g++ -O3 -o creepy-jerk
gsl-config --cflags -libs. For Debian 7, Joseph Brown used:
g++ -O3 -lgsl -lgslcblas *.cpp -o creepy-jerkwith gcc v4.7.2, and found a 3.5x speedup using
icpc -O3 -lgsl -lgslcblas *.cpp -o creepy-jerkwith Intel (icc v15.0.0) compilers).
You should now find an executable called
creepy-jerkin your current directory.
- Copy the
creepy-jerkexecutable into th
examplesfolder. Here, you will find the tree file, the taxa file, and the three data files used in Landis, Schraiber, and Liang (2013).
- To run an example, execute
- You may find it easier to simply modify these text files than to construct new command lines (next section). NOTE: all examples are configured to run with modelType=3, the compound Poisson process with normally distributed jumps.
- The command-line syntax is as follows:
./creepy-jerk -flag_name1=flag_value1 -flag_name2=flag_value2 .... If a flag is not called, the application runs under the default values given below.
- Example command-line string:
./creepy-jerk -modelType=3 -simName=example -printStdOut=True -printFreqStdOut=1000 -printFreqJump=1000 -printFreqMH=1000 -numCycles=2000000 -outputFilePath=./ -inputFilePath=./ -dataFileName=data.txt -taxaFileName=taxa.txt -treeFileName=tree.txt</code>
Analysis model type: 1=alpha-stable, 3=compound Poisson with normally distributed jumps, 4=variance Gamma, 5=pure Brownian motion (default 3)
Number of MCMC cycles for the analysis (default 10000000)
MCMC parameter sample frequency (default: 10000)
MCMC per-branch jump size sample frequency, indexed by branch's immediately descendant node (default: 10000)
Burn-in period for computing the branch-length-normalized signal-to-noise ratio tree figure (default: 0.25 * numCycles)
Random number generator seed (default: system time, i.e. effectively random)
Standard deviation of the normally distributed jump proposal, adjustable in case of poor mixing (default: 1.0)
Briefly, this is the fraction of the branch length converted into Brownian motion, and is used to prohibit singularities in our formulation of the Levy pdf. Default setting recommended (default: 0.9995).
Name of file containing Newick formatted string with taxa labels and branch lengths (default: "")
Name of file containing list taxa present in tree (default: "")
Name of file containing continuous traits, ordered according to taxaFileName (default: "")
Full filepath of input files (default: "./")
Full filepath of output files (default: "./")
Enable printing of MCMC parameter state to console (default: true)
Console print frequency (default: 1000)
To interpret your MCMC posterior, we recommend using Tracer. The parameters file contains the MCMC chain states sampled at the interval defined by printFreqMH. All fields are tab-delimited, with each column corresponding to a state variable, with those states being (in order):
The MCMC sample's discrete time index
The full model log likelihood (i.e. lnL = lnB + lnJ, see below)
The pure-diffusion model log likelihood
The pure-jump model log likelihood
The variance per unit time, which is a function of the modelType variable.
The kurtosis per unit time, which is a function of the modelType variable
The model parameters, where sigma-modelType is the Brownian motion component rate parameter, and the remaining parameters (if any) are the jump component parameters
####output_filename.jumps.txt and output_filename.jump_snr.txt
The jumps file contains the MCMC chain states sampled at the interval defined by printFreqJump. Each row contains a Newick string with node annotations reporting the sampled sum of trait change drawn from the jump measure (hereafter, jump values). Although the jump values are associated with nodes, they correspond to the jumps occurring along the branch from the node's ancestor to the node itself. Note, the topology of the tree remains constant in the current implementation of creepy-jerk.
The jump_snr file contains the final branch-length-normalized signal-to-noise ratio of the tree's posterior jump values. The color scale is defined such that a value of 0.0 is gray, and the intensity of colors indicates the deviation in this value from 0.0. Finally, these values exclude the burn-in period as defined by the snrBurnIn flag (default: 0.25 * numCycles).
In the convention of the New Hampshire eXtended (.nhx) format, jump values for branches are given as [&j=value] immediately following the taxon label or divergence event.
To quickly view the posterior jumps.txt and jumps_snr.txt output, we recommend using the tree visualization software, FigTree 1.4. Load the jump file by selecting File -> Open.