Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


For measuring distances between genetic programming trees, both syntactically and via operators.

This is GPDistance, a set of tools in Java and Python for research into distances between trees in genetic programming. It calculates two types of distance:

  • Syntactic distances work by comparing the contents and structures of two trees. For now the syntactic distances are tree-edit distance (TED), tree-alignment distance (TAD), overlap distance (OVD), feature-vector distance (FVD), and normalised compression distance (NCD). The Java implementation of the Zhang-Shasha tree-edit distance algorithm is Copyright (C) 2012 Mateusz Pawlik and Nikolaus Augsten (see It also uses a Java transliteration of the Clojure overlap distance by Lee Spector (see

  • Operator distances work by studying the transition probabilities between pairs of nodes via particular operators. We start with the transition probability (TP) between each pair of nodes. We can calculate the mean first passage time (MFPT), the shortest path (SP), and the minimum number of steps (STEPS). The MFPT is calculated with the help of, Copyright (C) 2012 Sergio J. Rey of PySAL (see, derived from the treatment in John G. Kemeny and J. Laurie Snell, 1976, Finite Markov Chains, Springer-Verlag, Berlin.


You need to get the JScheme library (in fact, it's only needed for peripheral functionality, but I haven't bothered to remove the dependency yet) and the RTED library (which is used for core functionality). Download and copy them to java/libraries:

Then there's a Makefile. Try make to compile everything. It calls into Makefiles in sub-directories to compile them. The Python code requires NumPy and SciPy. Use apt or your favourite package manager, or try pip install numpy scipy, but be warned that installing them is known to cause pain especially on OSX.


See the root Makefile to see some possibilities for what to run. Eg to write matrices with all pairwise distances for the complete space of depth 1, try make completeMatrices1. Larger experiments, such as make completeMatrices2 or make uniformSampleMatricesDepth6 will take a long time, some hours.


There are a few assumptions in the code which would need to be checked before extending it to consider (eg) other mutation operators, other languages, etc.


If you wish to use and cite this work, please cite this earlier paper which used many of the same concepts and methods (a newer publication is in preparation):

McDermott, O'Reilly, Vanneschi, and Veeramachaneni, "How Far Is It From Here to There? A Distance that is Coherent with GP Operators", EuroGP 2011, Springer.


For measuring distances between genetic programming trees, both syntactically and via operators.






No releases published


No packages published