What can phylogenetic metrics tell us about useful diversity in evolutionary algorithms?

This repository contains all code and supplemental material for "What can phylogenetic metrics tell us about useful diversity in evolutionary algorithms?", originally presented at Genetic Programming in Theory and Practice, 2021. This paper is available as a preprint and in the published conference proceedings.




It is generally accepted that "diversity" is associated with success in evolutionary algorithms. However, diversity is a broad concept that can be measured and defined in a multitude of ways. To date, most evolutionary computation research has measured diversity using the richness and/or evenness of a particular genotypic or phenotypic property. While these metrics are informative, we hypothesize that other diversity metrics are more strongly predictive of success. Phylogenetic diversity metrics are a class of metrics popularly used in biology, which take into account the evolutionary history of a population. Here, we investigate the extent to which 1) these metrics provide different information than those traditionally used in evolutionary computation, and 2) these metrics better predict the long-term success of a run of evolutionary computation. We find that, in most cases, phylogenetic metrics behave meaningfully differently from other diversity metrics. Moreover, our results suggest that phylogenetic diversity is indeed a better predictor of success.

Supplemental information

The supplemental information for this paper is here.


The C++ code to run these experiments requires, at minimum, the two libraries cloned in the commands below: Empirical and EC-ecology-toolbox.


You can compile and run the code used in the paper as follows:

# Clone Empirical
git clone --recursive 

# Clone EC-ecology-toolbox
git clone

# Clone the repo for this project
git clone --recursive

### Complex fitness landscapes

# Compile the executable to run experiments for this project

# Run an experiment. To set parameters, use command line flags
# e.g. to set the selection scheme, run ./ecology_parameter_sweep -SELECTION 2
# To see all options, run ./ecology_parameter_sweep --help

### Exploration diagnostic

# all of the code for the exploration diagnostic lives in the exploration_diagnostic submodule
cd exploration_diagnostic

# the dia_world executable can be configured in the same way as the ecology_parameter_sweep executable


Research overview

Phenotypic diversity vs phylogenetic diversity.

In short, phenotypic diversity measures the diversity of phenotypes in the population at any one point in time. Phylogenetic diversity measures the diversity of evolutionary history represented in a population. We wrote a lot more about building phylogenies in the context of computational evolution in this paper.

As an example, the following figure shows two different phylogenies (ancestry trees). Arrows show parent-child relationships. Each node is a taxonomically unique phenotype (i.e., a phenotype with a unique evolutionary origin). For simplicity, leaf nodes in these diagrams are assumed to be the current set of taxa in the population; in reality, there could be non-leaf nodes corresponding to extant taxa. A) A population with high phenotypic diversity (phenotypic richness = 5) and low phylogenetic diversity (mean pairwise distance = 2). B) A population with low phenotypic diversity (phenotypic richness = 2) and high phylogenetic diversity (mean pairwise distance = 6).

Example of populations with different levels of phenotypic and phylogenetic diversity
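To make the two measurements concrete, here is a minimal Python sketch that reproduces the numbers quoted for panels A and B. The tree shapes are an assumption chosen to match those numbers (a five-leaf star for A, two three-step lineages for B), not a transcription of the figure. The helper names (`ancestors`, `distance`, `mean_pairwise_distance`) are hypothetical and are not part of phylotrackpy or Empirical. Distance is assumed to be the number of parent-child edges between two extant taxa.

```python
from itertools import combinations

def ancestors(parent, node):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def distance(parent, a, b):
    """Count parent-child edges on the path between taxa a and b."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(parent, a))}
    for steps_from_b, n in enumerate(ancestors(parent, b)):
        if n in steps_from_a:  # first shared ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("taxa do not share a common ancestor")

def mean_pairwise_distance(parent, extant):
    """Average edge distance over all pairs of extant taxa."""
    pairs = list(combinations(extant, 2))
    return sum(distance(parent, a, b) for a, b in pairs) / len(pairs)

# Assumed tree A: five extant taxa that all descend directly from one ancestor.
tree_a = {"root": None, "a1": "root", "a2": "root",
          "a3": "root", "a4": "root", "a5": "root"}
extant_a = ["a1", "a2", "a3", "a4", "a5"]

# Assumed tree B: two extant taxa, each three steps from the shared ancestor.
tree_b = {"root": None,
          "x1": "root", "x2": "x1", "x3": "x2",
          "y1": "root", "y2": "y1", "y3": "y2"}
extant_b = ["x3", "y3"]

# Each node is a unique phenotype, so richness is the number of extant taxa.
richness_a, richness_b = len(set(extant_a)), len(set(extant_b))
mpd_a = mean_pairwise_distance(tree_a, extant_a)
mpd_b = mean_pairwise_distance(tree_b, extant_b)
print(richness_a, mpd_a)  # 5 2.0
print(richness_b, mpd_b)  # 2 6.0
```

In the star-shaped tree every pair of taxa is two edges apart, so adding more taxa raises richness without raising mean pairwise distance. In the second tree a single pair separated by a long history yields a high mean pairwise distance despite low richness.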

Research questions

  1. Is phylogenetic diversity meaningfully different from phenotypic diversity in the context of evolutionary computation?

The answer to this question is important. Intuitively, we might think that since these are both types of diversity, they should correlate pretty closely. Given that phylogenetic diversity is more computationally intensive to measure, if we're going to argue that it's something evolutionary computation researchers should pay attention to (spoilers: we are!), we need to show that it is meaningfully different.

  2. Is phylogenetic diversity more informative about outcomes in evolutionary computation than phenotypic diversity?

The importance of this question is more obvious. We know that diversity is centrally linked to the success of evolutionary algorithms. There are hints scattered across the literature that certain types of diversity are more "useful" for solving problems than others. Our goal is for this work to move us towards a better understanding of which types of diversity we should be promoting in evolutionary algorithms.

Study design

We ran 5 selection schemes (random, tournament, fitness sharing, lexicase selection, and Eco-EA) on 5 different problems (one designed to be a clean test environment, and 4 chosen to evoke the messy realities of real problems) and gathered a ludicrous amount of data. Here and in the paper, we focus closely on answering the two specific questions above, to avoid overwhelming ourselves or the reader with a firehose of data. There are many intriguing aspects of this data set that raise further questions, which we look forward to addressing in the future.


Results

  1. Phylogenetic diversity and phenotypic diversity behave differently, to an extent that surprised even us.

  2. Phylogenetic diversity is more predictive of success than phenotypic diversity in the vast majority of cases. The differences are often substantial (check out our effect sizes!).

Caveats/areas for future research

  • Phylogenetic diversity and phenotypic diversity are both broad classes of metrics, and there is substantial variation in how different phylogenetic diversity metrics behave in different contexts.

  • There is clearly variation in all of this over time and by fitness landscape.

Measuring phylogenies

If you want to start measuring phylogenetic diversity in your own research, check out the following phylogeny-tracking libraries:

  • phylotrackpy (Python)
  • Empirical phylogeny tracker (C++) - part of a larger library, but the library is header-only, so you can use just the systematics manager module. Phylotrackpy is a wrapper around this code and is currently better documented, so we recommend reading the phylotrackpy documentation even if you're using the C++ implementation.

Repository contents

  • .github/workflows: GitHub Actions to automatically build supplemental material
  • analysis: Contains Rmarkdown (and compiled html) for all code used to analyze data for this project. This is broken into analysis for the exploration diagnostic and analysis for the other four problems. These files are compiled into the supplemental material
  • book: The supplemental material (this is a pre-compiled backup - the full supplemental material is auto-generated via GitHub Actions)
  • exploration_diagnostic: A submodule containing all code for the exploration diagnostic experiments.
  • scripts: Contains all scripts used to wrangle experiments and data
    • A script to count the number of times a phylogeny rediscovers the same phenotype
    • A script to remove duplicate lines from phylogenies (can happen when phylogeny files from multiple time points are concatenated)
    • The script used to submit experiments for this paper to our computing cluster
    • The script used to pull all of the data into a single file.
  • source: Contains the code used to generate the executable that was run in all complex fitness landscape experiments:
    • ecology_world.hpp: Contains the bulk of the code for the experiment
    • BitSorterMutators.hpp: Handles mutations for the sorting networks problem
    • TaskSet.hpp: Used to build the Logic-9 problem
    • TestcaseSet.hpp: Used for the program synthesis problem
    • org_types.hpp: Used to help wrangle the different representations used for each fitness landscape
    • native: Contains the .cc file that is ultimately compiled.
  • testcases: Contains test cases (from the program synthesis benchmark suite) used for the program synthesis problems.
  • DESCRIPTION: For keeping track of R dependencies
  • LICENSE: The MIT License
  • Makefile: Contains instructions for compiling the executable
  • _bookdown.yml: For the supplemental material
  • _output.yml: For the supplemental material
  • book.bib: For the supplemental material
  • Builds the supplemental material
  • index.Rmd: Contains the beginning of the supplemental material
  • packages.bib: Citations for all the R packages we used
  • style.css: Styling for the supplemental material
  • tail.Rmd: For the supplemental material