Permalink
Fetching contributors…
Cannot retrieve contributors at this time
247 lines (174 sloc) 10.3 KB

Module: io.jenetics.prog

Javadoc

Genetic programming (GP) is a technique where a set of genes are modified (evolved) using an evolutionary algorithm. The io.jenetics.prog modules contains a ProgramGene, together with its corresponding ProgramChromosome, which allows to create operation (Op) trees.

The io.jenetics.prog module contains the classes which are needed for doing Genetic programming with the Jenetics library. It introduces a new Chromosome/Gene pair, which are the main entry points.

public interface Op<T> {
    public String name();
    public int arity();
    public T apply(T[] args);
}

The generic type of the Op enforces data-type constraints for the created program tree and makes the implementation a strongly typed GP algorithm.

Operations

When creating a new program tree, it is not necessary to implement own instance of the ProgramGene or ProgramChromosome class. The extension point for own programs is the Op interface.

 final Op<Double> myop = Op.of("myop", 3, v -> v[0]*v[1] + v[2]);

In the example above, a new operation with the "myop" and arity 3 is defined. Whenever the operation is evaluated, the function f(x, y, z) = x*y + z is executed.

Note
The class MathOp in the io.jenetics.prog.op package defines a set of mathematical standard operations/functions.

When creating a new ProgramChromosome we must distinguish two different kind of operations:

  1. Non-terminal operations have an arity greater than zero and form their own sub-tree.

  2. Terminal operations have an arity of zero and form the leaves of a program tree.

There are currently three different types of terminal operations implemented, Var, Const and the EphemeralConst class.

Var

The Var operation defines a variable of a program, which are set from the program arguments.

final ISeq<Op<Double>> terminals = ISeq.of(
    Var.of("x", 0), Var.of("y", 1), Var.of("z", 2)
);

The terminal operation list in the example code above will lead to a program which takes three input parameters, x, y and z, with the argument indices 0, 1 and 2.

Const

The Const operation will always return the same, constant, value when evaluated.

final Op<Double> one = Const.of(1.0);
final Op<Double> pi = Const.of("π", Math.PI);

We can create a constant operation in to flavours, with a value only and with a dedicated name. If a constant has a name, the symbolic name is used, instead of the value, when the program tree is printing.

EphemeralConst

An ephemeral constant is a special constant, which is only constant within an tree. If a new tree is created, a new constant is created, by the Supplier function the ephemeral constant is created with.

final Op<Double> rand1 = EphemeralConst.of(Math::random);
final Op<Double> rand2 = EphemeralConst.of("R", Math::random);

Program trees

Program creation

The ProgramChromosome comes with some factory methods, which lets you easily create program trees with a given depth and a given set of operations and terminals.

final int depth = 5;
final Op<Double> operations = ISeq.of(...);
final Op<Double> terminals = ISeq.of(...);
final ProgramChromosome<Double> program =
    ProgramChromosome.of(5, operations, terminals);

The code snippet above will create a perfect program tree of depth 5. All non-leaf nodes will contain operations, randomly selected from the operations set. Whereas all leaf nodes are filled with operations from the terminals set.

Note
The created program tree is perfect, which means that all leaf nodes have the same depth. If new trees needs to be created during evolution, they will be created with the depth, operations and terminals defined by the template program tree.

The GP Engine is created in the exact same way as the GA Engine.

final Engine<ProgramGene<Double>, Double> engine = Engine
    .builder(Main::error, program)
    .minimizing()
    .alterers(
        new SingleNodeCrossover<>(),
        new Mutator<>())
    .build();

For a detailed description on the correct Engine setup have a look at the Example section.

Program repair

The specialized Alterer classes for the ProgramChromosome guarantees that the program tree after the alter operation is still valid. They obey the tree structure of the chromosome, although they are altering the flattened ProgramGene sequence. General alterers, not written for ProgramChromosome, will most likely destroy the tree property of the altered chromosome. There are essentially two possibility for handling invalid tree chromosomes: 1) marking the chromosome as invalid or 2) try to repair the invalid chromosome. Possibility 1) would be easier, but it would also lead to a possible large number of invalid chromosomes.

Note
Jenetics allows the usage of arbitrary Alterer implementations. Even alterers not implemented for ProgramChromosomes. Chromosomes destroyed by the alterer are repaired.

Examples

Symbolic regression

Symbolic regression involves finding a mathematical expression, in symbolic form, that provides a good, best, or perfect fit between a given finite sampling of values of the independent variables and the associated values of the dependent variables.–John R. Koza, Genetic Programming I

The following example shows how to solve a GP problem with Jenetics. We are trying to find a polynomial (or an arbitrary mathematical function) which approximates a given data set.

Table 1. Table Sample points

x

y

0.00

0.0000

0.10

0.0740

0.20

0.1120.

0.30

0.1380

…​

…​

The sample points has been created with the function f(x) = 4*x^3 - 3*x^2 + x. The knowledge of the creating function makes it easier to compare the quality of the evolved function. For the example we created 21 data points.

Note
The function which created the sample points is not needed in the error function we have to define for the GP. It just let us verify the final, evolved result.

As first step, we have to define the set of allowed non-terminal and the terminal operations the GP is working with. Selecting the right set of operation has a big influence on the performance of the GP. If the operation set is bigger than necessary, we will expand the potential search space, and the execution time for finding a solution. For our polynomial example we will chose the following operations and terminals.

static final ISeq<Op<Double>> OPERATIONS = ISeq.of(
    MathOp.ADD,
    MathOp.SUB,
    MathOp.MUL
);

static final ISeq<Op<Double>> TERMINALS = ISeq.of(
    Var.of("x", 0),
    EphemeralConst.of(() ->
        (double)RandomRegistry.getRandom().nextInt(10))
);

The chosen non-terminal operation set is sufficient to create any polynomial. For the terminal operations, we added a variable "x", with index zero, and an ephemeral int constant. The purpose of the ephemeral constant is to create constant values, which will differ for every tree, but stay constant within a tree.

In the next step define the fitness function for the GP, which will be an error function we will minimize.

// The lookup table where the data points are stored.
static final double[][] SAMPLES = new double[][] {
    {-1.0, -8.0000},
    {-0.9, -6.2460},
    ...
};

static double error(final ProgramGene<Double> program) {
    return Arrays.stream(SAMPLES).mapToDouble(sample -> {
        final double x = sample[0];
        final double y = sample[1];
        final double result = program.eval(x);
        return abs(y - result) + program.size()*0.0001;
    })
    .sum();
}

The error function calculates the sum of the (absolute) difference between the sample value and the value calculated the by the evolved program (ProgramGene). Since we prefer compact programs over complex one, we will add a penalty for the program size (the number of nodes of the program tree).

Caution
The penalty for the tree size must be small enough to not dominate the error function. We still want to find an approximating function and not the smallest possible one.

After we have defined the error function, we need to define the proper Codec.

static final Codec<ProgramGene<Double>, ProgramGene<Double>> CODEC =
    Codec.of(
        Genotype.of(ProgramChromosome.of(
            // Program tree depth.
            5,
            // Chromosome validator.
            ch -> ch.getRoot().size() <= 50,
            OPERATIONS,
            TERMINALS
        )),
        Genotype::getGene
    );

There are two particularities in the definition of the ProgramChromosome:

  1. Since we want to narrow the search space, we are limiting the depth of newly created program trees to 5.

  2. Because of crossover operations performed during evolution, the resulting programs can grow quite big. To prevent an unlimited growth of the program trees , we mark programs with more than 50 nodes as invalid.

Now we are ready to put everything together:

public static void main(final String[] args) {
    final Engine<ProgramGene<Double>, Double> engine = Engine
        .builder(Polynomial::error, CODEC)
        .minimizing()
        .alterers(
            new SingleNodeCrossover<>(),
            new Mutator<>())
        .build();

    final ProgramGene<Double> program = engine.stream()
        .limit(500)
        .collect(EvolutionResult.toBestGenotype())
        .getGene();

    System.out.println(Tree.toString(program));
}

The GP is capable of finding the polynomial which created the sample data. After a few tries, we got the following (correct) output program:

add
├── mul
│   ├── x
│   └── sub
│       ├── 0.0
│       └── mul
│           ├── x
│           └── sub
│               ├── sub
│               │   ├── sub
│               │   │   ├── sub
│               │   │   │   ├── 3.0
│               │   │   │   └── x
│               │   │   └── x
│               │   └── x
│               └── x
└── x

This program can be reduced to 4*x^3 - 3*x^2 + x, which is exactly the polynomial, which created the sample data.