Skip to content
Python-interface C++ library for variational Bayesian phylogenetics
C++ Python Terra Yacc LLVM Jupyter Notebook Other
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data
doc
lib
notebooks
src
test
vip
.clang-format
.clang-tidy
.gitignore
.gitmodules
.travis.yml
Dockerfile
LICENSE
Makefile
README.md
SConstruct
environment.yml
pyproject.toml
setup.cfg
setup.py

README.md

libsbn

Docker Cloud Build Status   Travis CI status

We are building a Python-interface C++ library for phylogenetic variational inference so that you can express interesting parts of your phylogenetic model in Python/TensorFlow/PyTorch/etc and let libsbn handle the tree structure and likelihood computations for you.

Dependencies

  • If you are on linux, install gcc >= 8, which is standard in Debian Buster and Ubuntu 18.04
  • If you are on OS X, use a recent version of Xcode and install command line tools

Then, install the hmc-clock branch of BEAGLE. This will require a from-source installation, as in their docs, but you have to do a full git clone (no --depth=1). You can see a full installation procedure by taking a look at the conda-beagle Dockerfile.

To install additional dependencies, use the associated conda environment file:

conda env create -f environment.yml
conda activate libsbn

If you want to specify your compiler manually, set the CC and CXX shell variables to your desired compiler command.

The notebooks require R, IRKernel, rpy2 >=3.1.0, and some R packages such as ggplot and cowplot. Do not install R via conda. Doing so will install the conda compiler toolchain, this will mess up our compilation.

Building

For your first build, do

  • git submodule update
  • scons
  • Respond to interactive prompts about where hmc-clock BEAGLE is installed
  • conda activate libsbn
  • make

After these steps make will build, run tests, and install the Python packages, and this should be the only command you need to run after modifying the code.

The build process will modify the conda environment to point [DY]LD_LIBRARY_PATH to where BEAGLE is installed. If you get an error about missing BEAGLE, just conda activate libsbn again and you should be good. If you want to modify your desired BEAGLE installation location, do unset BEAGLE_PREFIX and start the steps above again starting at scons.

  • (Optional) If you modify the lexer and parser, call make bison. This assumes that you have installed Bison > 3.4 (conda install -c conda-forge bison).
  • (Optional) If you modify the test preparation scripts, call make prep. This assumes that you have installed ete3 (conda install -c etetoolkit ete3).

Understanding

The following two papers will explain what this repository is about:

Our documentation consists of:

Contributing

libsbn is written in C++17.

The associated Python module, vip, is targeting Python 3.6.

Style

We want the code to be:

  1. correct, so we write tests
  2. efficient in an algorithmic sense, so we consider algorithms carefully
  3. clear to read and understand, so we write code with readers in mind and use code standards
  4. fast, so we do profiling to find and eliminate bottlenecks
  5. robust, so we use immutable data structures and safe C++ practices
  6. simple and beautiful, so we keep the code as minimal and DRY as we can without letting it get convoluted or over-technical

Also let's:

  • Prefer a functional style: returning variables versus modifying them in place. Because of return value optimization, this doesn't have a performance penalty.
  • RAII. No new.
  • Classic/raw pointers are used as non-owning references only.
  • Prefer variable names and simple coding practices to code comments. If that means having long identifier names, that's fine! If you can't make the code use and operation inherently obvious, please write documentation.
  • TODO comments don't get merged into master. Rather, make an issue on GitHub.
  • Always use curly braces for the body of conditionals and loops, even if they are one line.

The C++ core guidelines are the authority for how to write C++, and we will follow them. More generally, we use clang-tidy to check our code according to the .clang-tidy file in the root of the repo. For issues not covered by these guidelines (especially naming conventions), we will use the Google C++ Style Guide.

There are certainly violations of these guidelines in the code, so fix them when you see them!

Formatting

C++ gets formatted using clang-format, and Python gets formatted using Black and docformatter. See the Makefile for the invocations.

Tests

Add a test for every new feature.

Git flow

  • Code changes start by raising an issue proposing the changes, which often leads to a discussion
  • Make a branch associated with the issue named with the issue number and a description, such as 4-efficiency-improvements for a branch associated with issue #4 about efficiency improvements
  • If you have another branch to push for the same issue (perhaps a fresh, alternate start), you can just name them consecutively 4-1-blah, 4-2-etc, and so on
  • Push code to that branch
  • Once the code is ready to merge, open a pull request
  • Code review on GitHub
  • Squash and merge, closing the issue via the squash and merge commit message
  • Delete branch

Contributors

  • Erick Matsen (@matsen): implementation, design, janitorial duties
  • Mathieu Fourment (@4ment): implementation of substitution models and BEAGLE likelihoods/gradients, design
  • Seong-Hwan Jun (@junseonghwan): implementation of SBN gradients, design
  • Cheng Zhang (@zcrabbit): concept, design, algorithms
  • Christiaan Swanepoel (@christiaanjs): design
  • Matthew Karcher (@mdkarcher): SBN expertise
  • Xiang Ji (@xji3): gradient expertise
  • Marc Suchard (@msuchard): gradient expertise

Citations

If you are citing this library, please cite the NeurIPS and ICLR papers listed above. We require BEAGLE, so please also cite these papers:

Acknowledgements

  • Jaime Huerta-Cepas: several tree traversal functions are copied from ete3
  • Thomas Junier: parts of the parser are copied from newick_utils
  • The parser driver is derived from the Bison C++ example

In addition to the packages mentioned above we also employ:

You can’t perform that action at this time.