We are building a Python-interface C++ library for phylogenetic variational inference so that you can express interesting parts of your phylogenetic model in Python/TensorFlow/PyTorch/etc and let libsbn handle the tree structure and likelihood computations for you.
- If you are on Linux, install gcc >= 8, which is the default in Debian Buster and available as a package in Ubuntu 18.04.
- If you are on OS X, use a recent version of Xcode and install the command line tools.
Then, install the hmc-clock branch of BEAGLE.
This will require a from-source installation, as in their docs, but you have to do a full
git clone (no
You can see a full installation procedure by taking a look at the conda-beagle Dockerfile.
To install additional dependencies, use the associated conda environment file:
conda env create -f environment.yml
conda activate libsbn
If you want to specify your compiler manually, set the CXX shell variable to your desired compiler command.
The notebooks require R, IRKernel, rpy2 >=3.1.0, and some R packages such as ggplot and cowplot. Do not install R via conda: doing so will install the conda compiler toolchain, which will interfere with our compilation.
For your first build, do
git submodule update
- Respond to interactive prompts about where hmc-clock BEAGLE is installed
conda activate libsbn
After these steps, make will build, run tests, and install the Python packages. This should be the only command you need to run after modifying the code.
The build process will modify the conda environment to point [DY]LD_LIBRARY_PATH to where BEAGLE is installed.
If you get an error about missing BEAGLE, run conda activate libsbn again and the error should go away.
If you want to modify your desired BEAGLE installation location, do
unset BEAGLE_PREFIX and start the steps above again starting at
- (Optional) If you modify the lexer and parser, call make bison. This assumes that you have installed Bison > 3.4 (conda install -c conda-forge bison).
- (Optional) If you modify the test preparation scripts, call make prep. This assumes that you have installed ete3 (conda install -c etetoolkit ete3).
The following two papers will explain what this repository is about:
- Zhang & Matsen IV, NeurIPS 2018. Generalizing Tree Probability Estimation via Bayesian Networks; 👉🏽 blog post.
- Zhang & Matsen IV, ICLR 2019. Variational Bayesian Phylogenetic Inference; 👉🏽 blog post.
Our documentation consists of:
- Online documentation
- Derivations in doc/tex, which explain what's going on in the code.
libsbn is written in C++17.
The associated Python module, vip, targets Python 3.6.
We want the code to be:
- correct, so we write tests
- efficient in an algorithmic sense, so we consider algorithms carefully
- clear to read and understand, so we write code with readers in mind and use code standards
- fast, so we do profiling to find and eliminate bottlenecks
- robust, so we use immutable data structures and safe C++ practices
- simple and beautiful, so we keep the code as minimal and DRY as we can without letting it get convoluted or over-technical
- Prefer a functional style: returning variables versus modifying them in place. Because of return value optimization, this doesn't have a performance penalty.
- RAII. No
- Classic/raw pointers are used as non-owning references only.
- Prefer descriptive variable names and simple coding practices to code comments. If that means having long identifier names, that's fine! If you can't make the code's use and operation inherently obvious, please write documentation.
- TODO comments don't get merged into master. Rather, make an issue on GitHub.
- Always use curly braces for the body of conditionals and loops, even if they are one line.
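The style points above can be illustrated with a minimal sketch. The names `BranchLengths`, `ScaledCopy`, `TotalLength`, and `MakeBranchLengths` are hypothetical and are not part of libsbn; they only show return-by-value, RAII ownership through `std::unique_ptr`, a raw pointer used as a non-owning reference, and braces on every conditional and loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical type for illustration only; not part of libsbn.
struct BranchLengths {
  std::vector<double> values;
};

// Functional style: build and return a new vector instead of modifying the
// input or filling an output parameter. Copy elision or a move makes the
// return cheap.
std::vector<double> ScaledCopy(const std::vector<double>& input, double factor) {
  std::vector<double> result;
  result.reserve(input.size());
  for (const double value : input) {
    result.push_back(value * factor);
  }
  return result;
}

// A raw pointer parameter is a non-owning reference: the function reads
// through it and never deletes it.
double TotalLength(const BranchLengths* lengths) {
  if (lengths == nullptr) {
    return 0.0;
  }
  return std::accumulate(lengths->values.begin(), lengths->values.end(), 0.0);
}

// RAII: ownership lives in a std::unique_ptr, so the memory is released
// automatically when the owner goes out of scope.
std::unique_ptr<BranchLengths> MakeBranchLengths(std::size_t count,
                                                 double initial) {
  assert(initial >= 0.0);  // Branch lengths are assumed non-negative here.
  auto lengths = std::make_unique<BranchLengths>();
  lengths->values.assign(count, initial);
  return lengths;
}
```

A caller would hold the result of `MakeBranchLengths` in a local variable and pass `owner.get()` to `TotalLength`; the caller keeps ownership throughout, and `ScaledCopy(owner->values, 2.0)` leaves the original values untouched.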
The C++ core guidelines are the authority for how to write C++, and we will follow them.
More generally, we use clang-tidy to check our code according to the
.clang-tidy file in the root of the repo.
For issues not covered by these guidelines (especially naming conventions), we will use the Google C++ Style Guide.
There are certainly violations of these guidelines in the code, so fix them when you see them!
Add a test for every new feature.
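As a rough sketch of what pairing a feature with its test can look like, the example below uses plain `assert` from the standard library rather than any particular test framework, since this section does not say which one the project uses. `LeafCount` is a hypothetical feature invented for illustration and is not a libsbn function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical feature: count the leaves of a tree written as a Newick string.
// A tree with L leaves has L - 1 commas in total, so we count commas and add
// one. This assumes that labels contain no commas (for example, no quoted
// labels with commas in them).
int LeafCount(const std::string& newick) {
  if (newick.empty()) {
    return 0;
  }
  const auto comma_count = std::count(newick.begin(), newick.end(), ',');
  return static_cast<int>(comma_count) + 1;
}

// The test that accompanies the feature: it covers the empty input, a single
// leaf, a multifurcating node, and nested subtrees.
void TestLeafCount() {
  assert(LeafCount("") == 0);
  assert(LeafCount("a;") == 1);
  assert(LeafCount("(a,b,c);") == 3);
  assert(LeafCount("((a,b),(c,d));") == 4);
}
```

Keeping the test next to the feature in the same change makes it easy for a reviewer to see which behavior is being promised.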
- Code changes start by raising an issue proposing the changes, which often leads to a discussion
- Make a branch associated with the issue named with the issue number and a description, such as 4-efficiency-improvements for a branch associated with issue #4 about efficiency improvements
- If you have another branch to push for the same issue (perhaps a fresh, alternate start), you can just name them consecutively: 4-2-etc, and so on
- Push code to that branch
- Once the code is ready to merge, open a pull request
- Code review on GitHub
- Squash and merge, closing the issue via the squash and merge commit message
- Delete branch
- Erick Matsen (@matsen): implementation, design, janitorial duties
- Mathieu Fourment (@4ment): implementation of substitution models and BEAGLE likelihoods/gradients, design
- Seong-Hwan Jun (@junseonghwan): implementation of SBN gradients, design
- Cheng Zhang (@zcrabbit): concept, design, algorithms
- Christiaan Swanepoel (@christiaanjs): design
- Matthew Karcher (@mdkarcher): SBN expertise
- Xiang Ji (@xji3): gradient expertise
- Marc Suchard (@msuchard): gradient expertise
If you are citing this library, please cite the NeurIPS and ICLR papers listed above. We require BEAGLE, so please also cite these papers:
- Jaime Huerta-Cepas: several tree traversal functions are copied from ete3
- Thomas Junier: parts of the parser are copied from newick_utils
- The parser driver is derived from the Bison C++ example
In addition to the packages mentioned above we also employ: