A specialized surrogate function for phylogenetic likelihoods.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
example Update example. Feb 9, 2017
lcfit_cpp_src Centralize compiler flags. Feb 9, 2017
lcfit_src Added compatibility with GSL2 Mar 16, 2018
lib/cmake Fix for GSL finding Jun 23, 2014
sims
test
theory Add sagemath worksheets for lcfit2. Mar 23, 2017
.gitignore Update .gitignore. Sep 6, 2016
.travis.yml
CMakeLists.txt Centralize compiler flags. Feb 9, 2017
COPYING
Doxyfile Update Doxyfile. Aug 25, 2015
Makefile
README.md Add some usage examples to README.md. Feb 14, 2017
mainpage.dox

README.md

lcfit build status

Likelihood curve fitting by nonlinear least squares.

Building

Compile-time dependencies

Building the library requires CMake.

Compiling the lcfit C library requires the GNU Scientific Library and NLopt. Compiling the lcfit C++ extension library, the lcfit-compare tool, and the test suite requires a C++11-compatible compiler. Additionally, the lcfit-compare tool requires the libraries bpp-core, bpp-seq, and bpp-phyl from Bio++ 2.2.0.

On Debian/Ubuntu:

sudo apt-get install \
    libgsl0-dev \
    libnlopt-dev \
    libbpp-core-dev \
    libbpp-seq-dev \
    libbpp-phyl-dev

Compiling

Run make to obtain static and dynamic libraries. Run make doc to build documentation (requires Doxygen).

To build the lcfit-compare tool required for running the example and simulations, run make lcfit-compare.

Running unit tests

To build and run the test suite, run make test.

Running simulations

nestly is used to build an extensive hierarchy of directories and configuration files to measure the behavior of lcfit when applied to a wide variety of data.

Running the simulations requires Python 2.7, R 3.2.4, and BppSuite 2.2.0 in addition to the dependencies required for the lcfit-compare tool described above. Instructions for installing the Python and R package dependencies and running the simulations can be found in sims/README.md.

Basic usage

Fitting a model

The most straightforward way of fitting a model to an empirical log-likelihood function is via the C API function lcfit_fit_auto() declared in lcfit_select.h. This function accepts a callback function for computing the log-likelihood at a given branch length, an initial model, and the minimum and maximum branch lengths to consider.

As an example, assume you have a function for computing log-likelihood with the following signature:

double my_lnl(tree_t* tree, int node_id, double branch_length);

To adapt this function for lcfit_fit_auto(), you might do something like the following:

typedef struct {
    tree_t* tree;
    int node_id;
} my_lnl_data_t;

double my_lnl_callback(double branch_length, void* data)
{
    my_lnl_data_t* lnl_data = (my_lnl_data_t*) data;
    return my_lnl(lnl_data->tree, lnl_data->node_id, branch_length);
}

You could then use the callback and data struct like this:

my_lnl_data_t my_lnl_data;

// populate my_lnl_data struct
my_lnl_data.tree = ...
my_lnl_data.node_id = ...

// choose initial model
bsm_t lcfit_model = DEFAULT_INIT;  // DEFAULT_INIT defined in lcfit.c

// set bounds
double min_t = ...
double max_t = ...

// fit model and estimate maximum-likelihood branch length
double ml_t = lcfit_fit_auto(&my_lnl_callback, &my_lnl_data, &lcfit_model, min_t, max_t);

In C++, callback data isn't limited to plain datatypes or C-style structs. See the function log_likelihood_callback() in lcfit_cpp_src/lcfit_compare.cc for an example of using a C++ class in the callback function for computing log-likelihoods.

Sampling from the estimated posterior distribution

The lcfit C++ API includes a simple rejection sampler for sampling from the unnormalized posterior given an lcfit model and an exponential prior. It's used as follows:

gsl_rng* rng = gsl_rng_alloc(gsl_rng_default);

bsm_t lcfit_model = ...  // fitted model
double lambda = ...  // exponential distribution rate parameter

lcfit::rejection_sampler sampler(rng, lcfit_model, lambda);

// single sample
double x = sampler.sample();

// multiple samples
std::vector<double> xs = sampler.sample_n(1000);

The rejection sampler class also has a few functions exposed for computing the likelihood, density, and cumulative density at a given branch length. The density and cumulative density functions rely on GSL for numerical integration.