Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Heuristics for the Smallest Grammar Problem (SGP)

The smallest grammar problem is the problem of finding the smallest context-free grammar that generates exactly one given sequence. Approximating the problem with a ratio of less than 8569/8568 is known to be NP-hard. Most work on this problem has focused on finding decent solutions fast (mostly in linear time), rather than on good heuristic algorithms.

This project provides a hybrid of a max-min ant system and a genetic algorithm that in combination with a novel local search outperforms the state of the art on all files of the Canterbury corpus, a standard benchmark suite. Furthermore, this hybrid performs well on a standard DNA corpus.


  • Go version 1.x


Most of the code is released under the Apache 2.0 license (see LICENSE). The only exception is src/qsufsort/qsufsort.go which is a modified version of the file from the Go library and thus released under a BSD-style license (see src/qsufsort/LICENSE). All source code files contain a copyright notice header.


To build everything, use:

A explanation of all command line arguments is provided with:

sgp --help
Usage of sgp:
  -choice="": file containing an encoded choice of yields to start with
  -cpuprofile="": enables CPU profiling
  -csv=false: output in CSV format
  -i="": input file
  -initls=false: save intermediate results of initial local search
  -irr=10: number of IRR runs
  -ls=true: use local search
  -lscandidates=1000: max #candidates considered for bottom-up local search
  -m="hybrid": metaheuristic (genetic, ants, hybrid, none)
  -number=-1: this number is appended to the created files
  -outputlimit=-1: output every grammar of this or smaller size
  -print=false: prints the grammar human readable to stdout
  -rirr=false: randomized IRR
  -seed=-1: seed for pseudo-random number generator
  -sizelimit=-1: stop if a grammar of this size is reached
  -timelimit=0: time limit with unit (e.g. 10s)

Run the hybrid

sgp --i <input file> --timelimit 10s

Any result can easily be checked for correctness with a separate small checker. The checker is an independent programm that has less than 200 lines of code. Thus, the implementation fulfills the requirements of a Certifying Algorithm. The choice of nonterminals is used as a compact way of storing grammars and also plays the role of the witness:

checker --f <input file> --g <grammar to verify> -s <grammar size>


Canterbury corpus:


This project started during my Master's thesis at Saarland University in winter 2012/13. The results are published in the following paper:

An Effective Heuristic for the Smallest Grammar Problem. Florian Benz and Timo Kötzing. GECCO '13: Proceedings of the 15th International Conference on Genetic and Evolutionary Computation


Heuristics for the Smallest Grammar Problem (SGP)







No releases published


No packages published