Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

GAPML: GESTALT analysis using penalized maximum likelihood

Associated paper: Feng, et al. (In press). "Estimation of cell lineage trees by maximum-likelihood phylogenetics". Annals of Applied Statistics. Preprint available on BioRxiv.


You will need to have PHYLIP mix installed so that you can call mix on the command line.

We use pip to install things into a python virtual environment. FIRST: if you want tree visualizations from ete3, you will need to do something special since we are using pip (rather than conda). Install PyQt5 (sudo apt-get on linux, brew install on mac. For mac, see Then run pip install PyQt5 in your virtual environment. AFTERWARDS: Install the libraries into your virtual environment through the requirements.txt file.

We use nestly + SCons to run simulations/analyses. You should install scons outside the virtual environment, for a python 2.* or a 3.5+ environment. Then activate the virtual environment and then run scons ___.

You need to create a file in the gestalt folder that provides the paths to the different executables. (mix and bhv distance calculators) For example:

MIX_PATH = "~/my_path/mix" # download from phylip website
BHV_PATH = "~/my_path/gtp.jar" # download from
RSPR_PATH = "~/my_path/rspr" # download from

GESTALT pipeline make data restrict to observing the first K alleles or create tree topologies to fit to to select the best topology given a set of possible topologies

  • --obs-file: Pickle file with observations generated by or
  • --topology-file: Pickle file with candidate topologies from
  • --out-model-file: Name of output file.
  • --log-file: Name of log file
  • --branch-pen-params: Candidate penalty parameters for penalizing differences between branch lengths. We will tune over these using a variant of cross-validation.
  • --target-lam-pen-params: Candidate penalty parameters for penalizing differences between target cut rates. We will tune over these using a variant of cross-validation.
  • --num-penalty-tune-iters: Number of iterations we should spend on tuning the penalty parameters.
  • --num-penalty-tune-splits: Number of data splits to create (typically 3 - 5)
  • --max-fit-splits: Maximum number of data splits to use for fitting and tuning the penalty parameters (typically same value as --num-penalty-tune-splits, but you can use a smaller number to speed up the computation)
  • --num-chad-tune-iters: Number of iterations to spend on tuning the tree topology with subtree prune and regraft (SPR) moves.
  • --num-chad-stop: If we don't change the tree topology for this many steps, then stop tuning the topology.
  • --max-chad-tune-search: Maximum number of SPR moves to consider at each iteration
  • --max-extra-steps: Maximum number of hidden cuts we should consider when approximating the likelihood. (A large number means more computation time.)
  • --max-sum-states: Maximum number of ancestral states to sum over when computing the likelihood. (A large number means more computation time.)
  • --max-iters: Maximum number of iterations for tuning branch lengths and mutation parameters.
  • --num-inits: Number of initializations to try when minimizing penalized log likelihood with respect to the branch lengths and mutation parameters output the fitted tree in newick format

Fitting the trees via alternative methods uses Sanderson 2002, the chronos function in R package ape

How to get things running quickly and how to run analyses in the paper

The full pipelines are specified in the sconscript files. To run it, we use scons. For example: scons simulation_topol_consist/ The sconscript files provides the full pipeline for generating data and analyzing it (see simulation_topol_consist/sconscript as an example) as well as processing data from GESTALT and analyzing it (see analyze_gestalt/sconscript as an example).

Running tests

To run all the tests:

python3 -m unittest

To run specific test module tests/

python3 -m unittest tests.<test_me>

Experimental results

The fitted trees for the two adult fish in the paper is available at gestalt/analyze_gestalt/_output/ADR*_PMLE_tree.json.


No description, website, or topics provided.







No releases published


No packages published

Contributors 4