Author: | Ralf Schlatterbeck <rsc@runtux.com> |
---|
News Oct 2023:
- Add Differential Evolution for integer
- Add Negative Assortative Mating option
News 04-2023:
The last build on PyPi was broken for serial installs, it was missing the
mpi_stub.c
needed for the serial version. Parallel installs were still possible so I didn't notice, sorry!Add MPI_Abort to the wrapper, it is called with:
pga.MPI_Abort (errcode)
See below in secion Running with MPI how this can be used to abort the MPI run in case of exception in the
evaluate
method.
News 12-2022: Add regression test and update to new upstream with several bug-fixes. Includes also some bug fixes in wrapper.
News 10-2022: Add user defined datatypes. Example in examples/gp
use
user defined data types to implement genetic programming (we represent
expressions by a tree data structure). This uses the new serialization
API in pgapack to transfer a Python pickle representation to peer MPI
processes. Also incorporate the latest changes in pgapack which
optimizes duplicate checking. This is of interest for large population
sizes in the genetic programming examples. Note that the
gene_difference
method has been renamed to gene_distance
.
News 08-2022: Epsilon-constrained optimization and a crossover variant that preserves permutations (so with integer genes the gene can represent a permutation).
News 03-2022: Attempt to make this installable on Windows. This involves some workaround in the code because the visual compiler does not support certain C99 constructs.
News 01-2022: This version wraps multiple evaluation with NSGA-III (note the additional 'I').
News 12-2021: This version wraps multiple evaluation values from your objective function: Now you can return more than one value to either use it for constraints (that must be fulfilled before the objective is optimized) or for multi-objective optimization with the Nondominated Sorting Genetic Algorithm V.2 (NSGA-II). You can combine both, multi-objective optimization and constraints.
News: This version wraps the Differential Evolution method (that's quite an old method but is newly implemented in pgapack).
PGAPy is a wrapper for PGAPack, the parallel genetic algorithm library (see PGAPack Readme), a powerfull genetic algorithm library by D. Levine, Mathematics and Computer Science Division Argonne National Laboratory. The library is written in C. PGAPy wraps this library for use with Python. The original PGAPack library is already quite old but is one of the most complete and accurate (and fast, although this is not my major concern when wrapping it to python) genetic algorithm implementations out there with a lot of bells and whistles for experimentation. It also has shown a remarkably small number of bugs over the years. It supports parallel execution via the message passing interface MPI in addition to a normal "serial" version. That's why I wanted to use it in Python, too.
To get started you need the PGAPack library, although it now comes bundled with PGApy, to install a parallel version you currently need a pre-installed PGAPack compiled for the MPI library of choice. See Installation section for details.
There currently is not much documentation for PGAPy.
You really, absolutely need to read the documentation that comes
with PGAPack.
The PGAPack user guide is now shipped together with PGAPy. It is
installed together with some examples in share/pgapy, wherever the
Python installer decides to place this in the final installation
(usually /usr/local/share
on Linux).
The original PGAPack library can still be downloaded from the PGAPack ftp site, it is written in ANSI C but will probably not compile against a recent version of MPI. It will also not work with recent versions of PGAPy. Note that this version is not actively maintained. I've started a PGAPack fork on github where I've ported the library to the latest version of the MPI standard and have fixed some minor inconsistencies in the documentation. I've also implemented some new features, notably enhancements in selection schemes, a new replacement strategy called restricted tournament replacement [1], [2], [4] and, more recently, the differential evolution strategy [5], [6]. In addition this version now supports multi objective optimization with NSGA-II [7] and many-objective optimization with NSGA-III [8], [9]. It also supports the Epsilon Constraint method [10].
Note: When using NSGA_III replacement for multi (or many-) objective optimization you need to either
set reference points on the hyperplane intersecting all axes at offset 1. These reference points can be obtained with the convenience function
pga.das_dennis
, it creates a regular set of reference points using an algorithm originally publised by I. Das and J. E. Dennis [12]. These points are then passed as the parameterreference_points
to thePGA
constructor.See
examples/dtlz2.py
for a usage example and the user guide for the bibliographic reference. The function gets the dimensionality of the objective space (num_eval
minusnum_constraint
) and the number of partition to use.Or set reference directions (in the objective space) with the
reference_directions
parameter, number of partitions for these directions with therefdir_partitions
parameter (seedas_dennis
above, this uses Das/Dennis points internally), and a scale factor with the parameterrefdir_scale
.
You can set both, these parameters are not mutually exclusive.
I'm mainly testing pgapy on Linux. But I've recently made it run on Windows, too but I'm not very actively testing on Windows. Let me know if you run it on Windows, sucessfully or not sucessfully.
As mentioned above, you can find my PGAPack fork on github, this repository has the three upstream releases as versions in git and contains some updates concerning support of newer MPI versions and documentation updates. I've also included patches in the git repository of the Debian maintainer of the package, Dirk Eddelbuettel. I'm actively maintaining that branch, adding new features and bug-fixes.
To get you started, I've included some very simple examples in
examples
, e.g., one-max.py
implements the "Maxbit" example
similar to one in the PGAPack documentation. The examples were inspired
by the book "Genetic Algorithms in Python" but are written from scratch
and don't include any code from the book. The examples illustrates
several points:
- Your class implementing the genetic algorithm needs to inherit from pga.PGA (pga is the PGAPy wrapper module).
- You need to define an evaluation function called
evaluate
that returns a sequence of numbers indicating the fitness of the gene given. It gets the parametersp
andpop
that can be used to fetch allele values from the gene using theget_allele
method, for more details refer to the PGAPack documentation. The number of evaluations returned by your function is defined with the constructor parameternum_eval
, the default for this parameter is 1. If your evaluation function does not return multiple evaluations (with the default setting ofnum_eval
) you can either return a one-element sequence or a single return value. - When using multiple evaluations, these can either be used for
constraints (the default) or for multi-objective optimization. In the
latter case, the number of constraints (which by default is one less
than the number of evaluations set with the parameter
num_eval
) must be set to a number that leaves at least two evaluations for objectives. The number of constraints can be set with the parameternum_constraint
. When using multi-objective optimization, you need one of the two replacement-typesPGA_POPREPL_NSGA_II
orPGA_POPREPL_NSGA_III
, set this with thepop_replace_type
parameter. - You can define additional functions overriding built-in functions
of the PGAPack library, illustrated by the example of
print_string
. Note that we could call the original print_string method of our PGA superclass. In the same way you can implement, e.g., your own crossover method. - The constructor of the class needs to define the Gene type, in the examples we use int and bool built-in datatypes.
- The length of the gene needs to be given in the constructor.
- We often want to maximize the numbers returned by our evaluation
function, set the parameter
maximize
to False if you want to minimize. - For non-binary genes we can define an array of init values, each entry containing a sequence with lower and upper bound. The array has to have the length of the gene. Note that the upper bound is included in the range of possible values (unlike the python range operator but compatible with the PGAPack definition).
- In the constructor of the class we can add parameters of the genetic algorithm. Not all parameters of PGAPack are wrapped yet, currently you would need to consult the sourcecode of PGAPy to find out which parameters are wrapped. In the example we define several print options.
- Finally the genetic algorithm is started with the
run
method.
When you extend PGAPy – remember not all functions of PGAPack are wrapped yet and you may need additional functions – you should stick to my naming conventions when making changes. The following naming conventions were used for the wrapper:
- Constants of PGAPack like
PGA_REPORT_STRING
are used as-is in uppercase. These constants can be directly imported from the wrapper module. Not all constants are wrapped so far, if you need more, add them to the constdef array in pgamodule.c and send me a patch. - For methods of the pga.PGA class I've removed the
PGA
prefix used throughout PGAPack and converted the method to lowercase with underscores between uppercase words in the original function name, soPGARun
becomesrun
,PGACheckStoppingConditions
becomescheck_stopping_conditions
. An exception of the lowercase-rule is whenever a name contains "GA" (for "genetic algorithm"), SoPGASetMaxGAIterValue
becomesmax_GA_iter
. - Where possible I've made a single class method where PGAPack needs a
separate function for each datatype, so
PGAGetBinaryAllele
,PGAGetCharacterAllele
,PGAGetIntegerAllele
,PGAGetRealAllele
all becomeget_allele
. Same holds true forset_allele
. - Whenever a name in PGAPack has a "Value" or "Flag" suffix, I've left
this out, so
PGAGetFitnessCmaxValue
becomesfitness_cmax
andPGAGetMutationAndCrossoverFlag
becomesmutation_and_crossover
, the only exception to this rule is for the two functionsPGAGetMutationRealValue
andPGAGetMutationIntegerValue
which becomemutation_value
not justmutation
. - Some fields can take multiple values (they are implemented by ORing
integer constants, in python they are specified as a list or tuple of
constants). These are converted to plural (if not already plural in
PGAPack), e.g.,
PGASetStoppingRuleType
becomesstopping_rule_types
. - Internal method names in the wrapper program have a leading PGA_ – so
the class method
set_allele
is implemented by the C-functionPGA_set_allele
inpgamodule.c
.
PGAPack has a lot of PGASet
and PGAGet
functions for setting
parameters. These are reflected in constructor parameters on the one hand
and in (typically read-only, but see below) properties of a PGA
object on the other hand. The
following table gives an overview of all the original PGAPack names and
the names of the python wrapper. For the PGAPack name I've only listed
the PGASet
function, in many cases there is a corresponding
PGAGet
function. If a corresponding read-only property exists for a
constructor parameter this is indicated in the "Prop" column. In some
cases properties are missing because no corresponding PGAGet
function
is implemented in PGAPack, in other cases returning a numeric value that
has a symbolic constant in PGApy doesn't make much sense.
The properties have the same name as the constructor parameter.
There are Properties that don't have a corresponding constructor
parameter, namely the eval_count
property (returning the count of
function evaluations), the
GA_iter
property that returns the current GA generation, and the
mpi_rank
property that returns the MPI rank of the current process
(this is sorted under PGAGetRank).
In the type
column I'm listing the Python type. If the type is followed by a number,
more than one item of that type is specified (a sequence in Python). Some
entries contain "sym", these are integer values with a symbolic constant,
the value "msym" indicates that several values denoted by a list of
symbolic constants can be given. A special case are the
PGASetRealInitRange
, PGASetRealInitPercent
,
PGASetIntegerInitRange
functions. These take two values for each
allele of the gene. In python this is a sequence of 2-tuples.
Note that this means that you can have different ranges of allowed values
for each allele.
The num_eval
property is special: Due to limitations of the C
programming language, for multiple evaluations in C the first evaluation
is returned as the function return-value of the evaluate
function
and all other parameters are returned in an auxiliary array. PGAPack
specifies the number of auxiliary evaluations to be returned. In Python
the evaluation function can always return a sequence of evaluation
values and the num_eval
is one more than PGAGetNumAuxEval
would
return. The default for num_eval
is 1.
The first two (mandatory) constructor parameters are the type of the gene
(this takes a Python type, e.g., bool
for a binary genome or int
for an integer genome) and the length. Note that the string_length
is
implicitly set with the length
parameter. The string_length
is
also available as the length of the PGA
object using the Python
built-in len
function.
Some properties can now also be set during the run of the optimizer.
These currently are crossover_prob
, epsilon_exponent
,
multi_obj_precision
, p_tournament_prob
, and
uniform_crossover_prob
. Just assign to the member variable of
the optimizer (child of PGA.pga) object.
PGAPack name | Constructor parameter | Type | Prop |
---|---|---|---|
PGASetCrossoverBoundedFlag |
crossover_bounded |
int | yes |
PGASetCrossoverBounceBackFlag |
crossover_bounce_back |
int | yes |
PGASetCrossoverSBXEta |
crossover_SBX_eta |
float | yes |
PGASetCrossoverSBXOncePerString |
crossover_SBX_once_per_string |
int | yes |
PGASetCrossoverProb |
crossover_prob |
float | yes |
PGASetCrossoverType |
crossover_type |
sym | no |
PGASetDEAuxFactor |
DE_aux_factor |
double | yes |
PGASetDECrossoverProb |
DE_crossover_prob |
double | yes |
PGASetDECrossoverType |
DE_crossover_type |
sym | no |
PGASetDEDither |
DE_dither |
double | yes |
PGASetDEDitherPerIndividual |
DE_dither_per_individual |
bool | yes |
PGASetDEJitter |
DE_jitter |
double | yes |
PGASetDENumDiffs |
DE_num_diffs |
int | yes |
PGASetDEProbabilityEO |
DE_probability_EO |
double | yes |
PGASetDEScaleFactor |
DE_scale_factor |
double | yes |
PGASetDEVariant |
DE_variant |
sym | yes |
PGASetEpsilonExponent |
epsilon_exponent |
float | yes |
PGASetEpsilonGeneration |
epsilon_generation |
int | yes |
PGASetEpsilonTheta |
epsilon_theta |
int | yes |
PGAGetEvalCount |
eval_count |
int | yes |
PGASetFitnessCmaxValue |
fitness_cmax |
float | yes |
PGASetFitnessMinType |
fitness_min_type |
sym | yes |
PGASetFitnessType |
fitness_type |
sym | yes |
PGAIntegerSetFixedEdges |
fixed_edges |
no | |
PGAIntegerSetFixedEdges |
fixed_edges_symmetric |
bool | no |
PGAGetGAIterValue |
GA_iter |
int | yes |
PGASetIntegerInitPermute |
integer_init_permute |
int2 | no |
PGASetIntegerInitRange |
init |
no | |
PGASetMaxFitnessRank |
max_fitness_rank |
float | yes |
PGASetMaxGAIterValue |
max_GA_iter |
int | yes |
PGASetMaxNoChangeValue |
max_no_change |
int | no |
PGASetMaxSimilarityValue |
max_similarity |
int | yes |
PGASetMixingType |
mixing_type |
sym | no |
PGASetMultiObjPrecision |
multi_obj_precision |
int | yes |
PGASetMutationAndCrossoverFlag |
mutation_and_crossover |
int | yes |
PGASetMutationBounceBackFlag |
mutation_bounce_back |
int | yes |
PGASetMutationBoundedFlag |
mutation_bounded |
int | yes |
PGASetMutationIntegerValue |
mutation_value |
int | yes |
PGASetMutationOrCrossoverFlag |
mutation_or_crossover |
int | yes |
PGASetMutationPolyEta |
mutation_poly_eta |
float | yes |
PGASetMutationPolyValue |
mutation_poly_value |
float | yes |
PGASetMutationProb |
mutation_prob |
float | yes |
PGASetMutationRealValue |
mutation_value |
float | yes |
PGASetMutationType |
mutation_type |
sym | no |
PGASetNAMWindowSize |
nam_window_size |
int | yes |
PGASetNoDuplicatesFlag |
no_duplicates |
int | no |
PGASetNumAuxEval |
num_eval |
int | yes |
PGASetNumConstraint |
num_constraint |
int | yes |
PGASetNumReplaceValue |
num_replace |
int | yes |
PGASetPopSize |
pop_size |
int | yes |
PGASetPopReplaceType |
pop_replace_type |
sym | no |
PGASetPrintFrequencyValue |
print_frequency |
int | yes |
PGASetPrintOptions |
print_options |
msym | no |
PGASetPTournamentProb |
p_tournament_prob |
float | yes |
PGASetRandomizeSelect |
randomize_select |
int | yes |
PGASetRandomSeed |
random_seed |
int | yes |
PGAGetRank |
mpi_rank |
int | yes |
PGASetRealInitRange |
init |
no | |
PGASetRealInitPercent |
init_percent |
no | |
PGASetReferenceDirections |
refdir_partitions |
int | no |
PGASetReferenceDirections |
refdir_scale |
double | no |
PGASetReferenceDirections |
reference_directions |
no | |
PGASetReferencePoints |
reference_points |
no | |
PGASetRestartFlag |
restart |
int | yes |
PGASetRestartFrequencyValue |
restart_frequency |
int | yes |
PGASetRTRWindowSize |
rtr_window_size |
int | yes |
PGASetSelectType |
select_type |
sym | no |
PGASetStoppingRuleType |
stopping_rule_types |
msym | no |
PGASetStringLength |
string_length |
int | yes |
PGASetSumConstraintsFlag |
sum_constraints |
int | yes |
PGASetTournamentSize |
tournament_size |
int | yes |
PGASetTournamentWithReplacement |
tournament_with_replacement |
int | yes |
PGASetTruncationProportion |
truncation_proportion |
float | yes |
PGASetUniformCrossoverProb |
uniform_crossover_prob |
float | yes |
Note: The mutation_or_crossover and mutation_and_crossover parameters are deprecated, use mixing_type instead!
The following are the methods that can be used during the run of the
genetic search. The run
method is used to start the search. This can
be used, to, e.g., set an allele during hill-climbing in a custom
endofgen
method. Note that some methods only apply to certain gene
types, e.g. the encode_int_
methods can only be used on binary
alleles (they encode an integer value as a binary or gray code
representation into the gene). Other methods take or return different
types depending on the type of gene, e.g. get_allele
or
set_allele
, they call different backend functions depending on the
gene type. With the set_random_seed
method, the random number
generator can be re-seeded. It is usually best to seed the generator
once at (before) the beginning by specifying random_seed
in the
constructor. For further details consult the user guide.
The method get_evaluation
will return a double for a single
evaluation and a tuple of double for multiple evaluations (when num_eval
is >1)
Method | Parameters | Return |
---|---|---|
check_stopping_conditions |
True if stop should occur | |
encode_int_as_binary |
p, pop, frm, to, val | None |
encode_int_as_gray_code |
p, pop, frm, to, val | None |
encode_real_as_binary |
p, pop, frm, to l, u, val | None |
encode_real_as_gray_code |
p, pop, frm, to l, u, val | None |
euclidian_distance |
p1, pop1 p2, pop2 | float |
fitness |
pop | None |
get_allele |
p, pop, index | allele value |
get_best_index |
pop | index of best string |
get_best_report_index |
pop, idx | index of best eval with idx |
get_evaluation |
p, pop | evaluation of p |
get_evaluation_up_to_date |
p, pop | True if up-to-date |
get_fitness |
p, pop | fitness of p (float) |
get_gene |
p, pop | get gene (user data types) |
get_int_from_binary |
p, pop, frm, to | int |
get_int_from_gray_code |
p, pop, frm, to | int |
get_iteration |
deprecated, use GA_iter |
|
get_real_from_binary |
p, pop, frm, to, l, u | float |
get_real_from_gray_code |
p, pop, frm, to, l, u | float |
random01 |
float between 0 and 1 | |
random_flip |
probability | 0 or 1 |
random_gaussian |
mean, stddev | float |
random_interval |
l, r | int between l, r |
random_uniform |
l, r | float between l, r |
run |
None | |
select_next_index |
pop | index selected individual |
set_allele |
p, pop, i, value | None |
set_evaluation |
p, pop, value | None |
set_evaluation_up_to_date |
p, pop, status | None |
set_gene |
p, pop, gen | set gene (user data types) |
set_random_seed |
seed | None (use constructor!) |
PGAPack has the concept of user functions. These allow customization of
different areas of a genetic algorihm. In Python they are implemented as
methods that can be changed in a derived class. One of the methods that
must be implemented in a derived class is the evaluate
function
(although technically it is not a user function in PGAPack). It
interprets the gene and returns an evaluation value or a sequence of
evaluation values if you set the num_eval
constructor parameter.
PGAPack computes a fitness from the raw evaluation value. For some
methods an up-call into the PGA class is possible, for some methods this
is not possible (and in most cases not reasonable). Note that for the
stop_cond
method, the standard check for stopping conditions can be
called with:
self.check_stopping_conditions()
The following table lists the overridable methods with their parameters (for the function signature the first parameter self is omitted). Note that in PGAPack there are additional user functions that are needed for user-defined data types which are currently not exposed in Python. In the function signatures p denotes the index of the individual and pop denotes the population. If more than one individual is specified (e.g., for crossover) these can be followed by a number. For crossover c1 and c2 denote the destination individuals (children). The propability for the mutation method is a floating-point value between 0 and 1. Remember to count the number of mutations that happen, and return that value for the mutation method!
Method | Call Signature | Return Value | Up-Call |
---|---|---|---|
check_duplicate |
p1, pop1, p2, pop2 | True if dupe | no |
stop_cond |
True to stop | no | |
crossover |
p1, p2, p_pop, c1, c2, c_pop | None | no |
endofgen |
None | no | |
evaluate |
p, pop | sequence of float | no |
gene_distance |
p1, pop1, p2, pop2 | float | no |
hash |
p, pop | int | no |
initstring |
p, pop | None | no |
mutation |
p, pop, propability | #mutations | no |
pre_eval |
pop | None | no |
print_string |
file, p, pop | None | yes |
The following PGAPack constants are available:
Constant | Description |
---|---|
PGA_CROSSOVER_EDGE | Edge crossover for permutations |
PGA_CROSSOVER_ONEPT | One-point Crossover |
PGA_CROSSOVER_SBX | Simulated Binary Crossover |
PGA_CROSSOVER_TWOPT | Two-point Crossover |
PGA_CROSSOVER_UNIFORM | Uniform Crossover |
PGA_FITNESSMIN_CMAX | Map fitness by subtracting worst |
PGA_FITNESSMIN_RECIPROCAL | Map fitness via reciprocal |
PGA_FITNESS_NORMAL | Linear normalization of fitness |
PGA_FITNESS_RANKING | Linear fitness ranking |
PGA_FITNESS_RAW | Identity fitness function |
PGA_MUTATION_CONSTANT | Mutation by adding/subtracting constant |
PGA_MUTATION_GAUSSIAN | Mutation by selecting from Gaussian distribution |
PGA_MUTATION_PERMUTE | Mutation swaps two random genes |
PGA_MUTATION_POLY | Polynomial Mutation |
PGA_MUTATION_RANGE | Replace gene with uniform selection from init range |
PGA_MUTATION_UNIFORM | Mutation uniform from interval |
PGA_NEWPOP | Symbolic constant for new population |
PGA_OLDPOP | Symbolic constant for old population |
PGA_POPREPL_BEST | Population replacement best strings |
PGA_POPREPL_NSGA_II | Use NSGA-II replacement for multi-objective opt. |
PGA_POPREPL_NSGA_III | Use NSGA-III replacement for multi-objective opt. |
PGA_POPREPL_PAIRWISE_BEST | Compare same index in old and new population |
PGA_POPREPL_RANDOM_NOREP | Population replacement random no replacement |
PGA_POPREPL_RANDOM_REP | Population replacement random with replacement |
PGA_POPREPL_RTR | Restricted Tournament Replacement |
PGA_REPORT_AVERAGE | Report average evaluation |
PGA_REPORT_HAMMING | Report hamming distance |
PGA_REPORT_OFFLINE | Report offline |
PGA_REPORT_ONLINE | Report online |
PGA_REPORT_STRING | Report the string |
PGA_REPORT_WORST | Report the worst evaluation |
PGA_SELECT_LINEAR | Return individuals in population order |
PGA_SELECT_PROPORTIONAL | Fitness-proportional selection |
PGA_SELECT_PTOURNAMENT | Binary probabilistic tournament selection |
PGA_SELECT_SUS | Stochastic universal selection |
PGA_SELECT_TOURNAMENT | Tournament selection |
PGA_SELECT_TRUNCATION | Truncation selection |
PGA_STOP_MAXITER | Stop on max iterations |
PGA_STOP_NOCHANGE | Stop on max number of generations no change |
PGA_STOP_TOOSIMILAR | Stop when individuals too similar |
The latest version of PGAPy features user defined data types. Just
define your data type and pass it as the second parameter to the
PGA
constructor. The framework will take care of serializing the
data when transmitting via MPI
(if you're running a parallel
version).
If duplicate checking is enabled via the no_duplicates
constructor
parameter, your data type needs to define a __hash__
method (unless
the python default hash method fulfills your requirements).
User defined data types do not use alleles, so the normal get_allele
(and set_allele
) methods are not available. Instead the full
individual can be retrieved with the get_gene
method and set with
the set_gene
method.
With user data types you need to define the following methods:
check_duplicate (self, p1, pop1, p2, pop2)
if you enable duplicate checking with the crossover parameterno_duplicates
. This should return True when the two individuals are duplicates. Useget_gene
to retrieve the genes for the individualsp1
andp2
in populationspop1
andpop2
.crossover (self, p1, p2, ppop, c1, c2, cpop)
for crossover operation, useget_gene
for getting the parent genes for the parentsp1
andp2
in generationppop
and useset_gene
for setting the child genesc1
andc2
in generationcpop
.initstring (self, p, pop)
for initializing the given string, useset_gene
in that method for setting your object as a gene.mutation (self, p, pop, pm)
for the mutation operation. This should return the number of mutations performed. If duplicate checking is enabled, the framework will repeatedly call the mutation operator for mutating a duplicate individual into another individual that is no duplicate. This uses the return value of your mutation method. You will enter an endless loop if your mutation operator does not occasionally return an non-zero number of mutatations performed when duplicate checking is enabled. Thepm
parameter gives the mutation probability. Useget_gene
for retrieving the individual to be mutated and useset_gene
to update this individual after mutation.print_string (self, file, p, pop)
to print a gene object, useget_gene
for retrieving the individual to be printed.
For these methods it is generally a good idea to never modify an
individual in-place: This individual may be repeatedly used in genetic
operations (e.g. mutation and crossover), so when modifying it you will
produce erroneous results for later genetic operations. To copy a data
structure, python's deepcopy
function in the module copy
is
usually used.
In addition to the methods above you may want to define a stopping rule
with a stop_cond
method or override the way a hash is computed using
a hash
method. The default for computing a hash is to call
hash (gene)
where gene is an object of the user defined data type.
Other methods that may be used is an endofgen
method, a
gene_distance
method (e.g., when using Restricted Tournament
Replacement, with PGA_POPREPL_RTR
), or a pre_eval
method.
An example with user defined data types is in examples/gp
: This
implements Genetic Programming with a tree data structure. Note that the
Node
class in gp.py
has a __hash__
method that builds a hash
over the serialization of the tree (which is the same for individuals
with the same tree structure).
As already mentioned, not all functions and constants of PGAPack are wrapped yet – still for many applications the given set should be enough. If you need additional functions, you may want to wrap these and send me a patch.
Please use the Sourceforge Bug Tracker or the Github Bug Tracker and
- give a short description of what you think is the correct behaviour
- give a description of the observed behaviour
- tell me exactly what you did.
- if you can publish your source code this makes it a lot easier to debug for me
Project information and download from Sourceforge main page
or checkout from Github
or directly install via pypi.
PGApy, as the name suggests, supports parallelizing the evaluation function of the genetic algorithm. This uses the Message Passing Interface (MPI) standard.
To install a serial version (without parallel programming using MPI)
you can simply install from pypi using pip
. Alternatively when you
have unpacked or checked out from sources you can install with:
python3 setup.py install --prefix=/usr/local
If you want a parallel version using an MPI (Message-Passing Interface)
library you will have to install a parallel version of PGAPack first.
The easiest way to do this is to use my pgapack debian package builder
from github. Clone this repository, check out the branch master
,
install the build dependencies, they're listed in the file
debian/control
and build the debian packages using:
dpkg-buildpackage -rfakeroot
This builds pgapack debian packages for all supported MPI libraries in
debian, currently these are mpich
, openmpi
, and lam
. In addition
to the MPI libraries a serial version of the pgapack library is also
built. Proceed by installing the package pgapack and the MPI backend
library of choice. If you don't have a preference for an MPI library,
libpgapack-openmpi
is the package that uses the Debians default
preferences of an MPI library.
Once a parallel version of PGAPack is installed, you can install PGApy
as follows: You set environment variables for the PGA_PARALLEL_VARIANT
(one of mpich
, openmpi
, or lam
) and set the PGA_MODULE
to
module_from_parallel_install
. Finally you envoke the setup, e.g.:
export PGA_PARALLEL_VARIANT=openmpi export PGA_MODULE=module_from_parallel_install python3 setup.py install --prefix=/usr/local
Note that the same works with pip install
, i.e., after installation
of a parallel version of PGAPack you can directly install with pip
:
export PGA_PARALLEL_VARIANT=openmpi export PGA_MODULE=module_from_parallel_install pip install pgapy
or alternatively depending on how pip is installed on your system:
python3 -m pip install pgapy
If your MPI library is installed in a different place you should study
the Extension configurations in setup.py
to come up with an
Extension definition that fits your installation. If your installation
is interesting to more people, feel free to submit a patch that adds
your Extension-configuration to the standard setup.py
.
Newer python versions have deprecated installing into the system python
version, even in /usr/local
. You can still build the pgapy package
locally and install using the installer. On Debian Linux you need the
following packages installed:
apt-get install python3-pip python3-dev python3-toml \ python3-build python3-installer python3-venv python3-sphinx netpbm
Then you can build locally and install:
python3 -m build python3 -m installer dist/*.whl
Better yet, clone my releasetool package from github:
git clone https://github.com/schlatterbeck/releasetool.git
into a directory parallel to pgapy and build using:
make dist
This also will generate a Version.py file with the correct version number from git. If you want to install a parallel version you should set the environment variables:
export PGA_PARALLEL_VARIANT=openmpi export PGA_MODULE=module_from_parallel_install
before building.
To run a parallel version with MPI, a parallel version must be installed, see above in section Installation.
For a serial version, PGAPy makes sure that the otimization is aborted
if an exception occurs in the evaluate
function. This is currently not
the case for MPI, because the framework currently does not support
returning information to the rank-0 MPI leader process. A workaround is
as follows: Rename your evaluate
method to _evaluate
and catch
exceptions in a new evaluate
method that wraps _evaluate
.
Call MPI_Abort
if an exception occurs:
import traceback import sys ... def evaluate (self, p, pop): try: return self._evaluate (p, pop) except Exception: # Optionally log exception here print (traceback.format_exc ()) pga.MPI_Abort (1) sys.exit (1)
For testing – preferrably before installation you can build locally:
python3 setup.py build_ext --inplace
After this you have a pga.*.so
file in the local directory. Now you
can run the tests with:
python3 -m pytest test
This runs all the tests and can take a while. Note that the tests run
most of the examples in the examples
directory with different
command line parameters where available. To perform several optimization
runs in a single (Python-) process, we must call MPI_Init
explicitly (and not relying on PGAPack to call it implicitly). This is
because MPI_Init
may be called only once per process. Calling of
MPI_Init
and MPI_Finalize
is handled in a fixture in
test/conftest.py
For the python examples, the coverage can be computed with:
python3 -m pytest --cov examples test
or more verbose including untested lines with:
python3 -m pytest --cov-report term-missing --cov examples test
Performing a coverage analysis for the C code in pgamodule.c
is
currently possible only on Linux – at least, since I'm developing on
Linux this is the architecture where I've found out how to perform
coverage analysis including the C code.
To compile for coverage analysis:
export CFLAGS=-coverage python3 setup.py build_ext --inplace
This will create a file ending in .gcno
under the build
directory,
typically something like build/temp.linux-x86_64-3.9
when using
python3.9
on the x86_64
architecture. Running the tests will
create statistics data files with ending .gcda
. These are data files
for the GNU profiler gcov
. From these, .html
files can be
generated that can be inspected with a browser:
lcov --capture --directory . --output-file coverage.info genhtml coverage.info --output-directory coverage_out
Note that the lcov
program is part of the linux distribution.
The tests can be directly run under MPI. Note that currently the
--with-mpi
option of pytest
is not supported. This option
asumes that the package mpi4py
is used. But pgapy
uses only
calls from pgapack, which in turn calls MPI.
Running under MPI is done using:
mpirun $MPI_OPTIONS python3 -m pytest test
The MPI_OPTIONS
can be, e.g.:
MPI_OPTIONS=--machinefile ~/.mpi-openmpi --np 8
which would use a machine definition file for openmpi in your home directory and eight processes.
Running under MPI is especially useful for determining C code coverage.
Asuming a parallel version of openmpi
is installed, the code can be
compiled with:
PGA_PARALLEL_VARIANT=openmpi PGA_MODULE=module_from_parallel_install export CFLAGS=-coverage python3 setup.py build_ext --inplace
Note that the coverage analysis uses files in the build directory which need to be present before a parallel version can be started. Otherwise each parallel instance would try to create the coverage files resulting in race conditions. Once the coverage files are in place, the coverage framework ensures proper locking so that no two processes write concurrently to the same coverage files.
Creating the coverage files is best achieved by running the tests
without MPI first and then running the same version with a number of
processes under MPI. Running under MPI shows that the serialization and
deserialization code in pgamodule.c
is called.
As of this writing we get:
Lines: 1423 1475 96.5 % Functions: 131 133 98.5 %
[1] | Georges Harik. Finding multiple solutions in problems of bounded difficulty. IlliGAL Report 94002, Illinois Genetic Algorithm Lab, May 1994. |
[2] | Georges R. Harik. Finding multimodal solutions using restricted tournament selection. In Eshelman [3], pages 24–31. |
[3] | Larry J. Eshelman, editor. Proceedings of the 6th International Conference on Genetic Algorithms (ICGA). Morgan Kaufmann, July 1995. |
[4] | Martin Pelikan. Hierarchical Bayesian Optimization Algorithm: Toward a New Generation of Evolutionary Algorithms, volume 170 of Studies in Fuzziness and Soft Computing. Springer, 2005. |
[5] | Rainer Storn and Kenneth Price. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359, December 1997. |
[6] | Kenneth V. Price, Rainer M. Storn, and Jouni A. Lampinen. Differential Evolution: A Practical Approach to Global Optimization. Springer, Berlin, Heidelberg, 2005. |
[7] | Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, April 2002. |
[8] | Kalyanmoy Deb and Himanshu Jain. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Transactions on Evolutionary Computation, 18(4):577–601, August 2014. |
[9] | Himanshu Jain and Kalyanmoy Deb. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach. IEEE Transactions on Evolutionary Computation, 18(4):602–622, August 2014. |
[10] | Tetsuyuki Takahama and Setsuko Sakai. Constrained optimization by the ε constrained differential evolution with an archive and gradient-based mutation. In [11]. |
[11] | IEEE Congress on Evolutionary Computation (CEC). Barcelona, Spain, July 2010. |
[12] | Indraneel Das and J. E. Dennis. Normal-boundary intersection: A new method for generating the pareto surface in nonlinear multicriteria optimization problems. SIAM Journal on Optimization, 8(3):631–657, August 1998. |
Version 2.4: New upstream
- Wrap Negative Assortative Mating
- Allow Differential Evolution for integer genes
- Upstream fixes feature interaction bugs with duplicate avoidance and Differential Evolution and RTR population replacement
- Add option to use Negative Assortative Mating in Royal Road example
Version 2.2.2: Add pyproject.toml
- Add pyproject.toml -- unfortunately it seems that binary modules cannot currently be described in the pyproject.toml, especially not the variant selection via the environment that is currently implemented in setup.py
Version 2.2.1: MPI_Abort
- Add MPI_Abort to the wrapper
- Include
mpi_stub.c
in the release (this is missing if some env variables are set, see above in Installation)
Version 2.2: Module directory
- Put the pga C-module inside a pga module
- Add several python-only modules to pga
- pga.__init__ exports everything to this is compatible
- pga.random includes a python Random class based on the pgapack random number generator
Version 2.1: Regression test
- PGAPack bug-fixes discovered during testing
- Bug-fixes of python wrapper
- Lots of tests with coverage of wrapper C-code > 90%
Version 2.0: User defined data types
- Implement user defined data types, note that your data type can be variable-size, e.g., a tree data structure. The framework takes care of serializing the data type and transmitting it to a remote MPI process if using a parallel version.
- When duplicate checking is enabled with the constructor parameter
no_duplicates
, the underlying pgapack code now uses a hash table. This means the effort is no longer quadratic in the population size but linear. - Example of Genetic Programming (GP) in the
examples/gp
directory - Rename the gene_difference method to gene_distance
Version 1.8: Epsilon-constrained optimization
- Epsilon-constrained optimization
- Precision for printing evals in multi-objective optimization, use this feature for making regression-test work on AMD where a floating-point difference in the 16th or so decimal place made a test fail
- Crossover for permutations
- Version-numbers: try to match pgapack, we might still diverge in the last digit, though
Version 1.2: Many-objective optimization with NSGA-III
- Implement NSGA-III
Version 1.1.6: Polynomial mutation and simulated binary crossover (SBX)
- Simulated binary crossover (SBX)
- Polynomial mutation
Version 1.1.1-1.1.5: Small PGAPack updates, fixes for non-debian
- Fix setup.py for non-debian systems
- Update to latest PGAPack with small changes
Version 1.1: Add multi-objective optimization with NSGA-II
- Wrap latest pgapack version 1.4
- This add multi-objective optimization using the Nondominated Sorting Genetic Algorithm version 2 (NSGA-II) by Deb et. al. This makes use of the previously-introduced option to return more than one value in the objective function. To use the feature you need to set the num_constraint parameter to a value that leave some of the function values returned by your evaluation function as objective function values (and not as constraints). See example in examples/multi.py.
Version 1.0: Add constraint handling
- Wrap latest pgapack version 1.3
- This adds auxiliary evaluations. Now your evaluation function can return multiple floating-point values as a sequence if you set the num_eval parameter >1 in the constructor. Currently additional evaluation values are used for constraint handling. Constraint values are minimized. Once they reach zero or a negative value they no longer count: The sum of all positive constraints is the overall constraint violation. For details see paper by Deb, 2000, see user guide for citation. If you're not using constraints, nothing in your code needs changes.
- This release may change the path an optimization takes. So for the same seed of the random number generator you will get a different result, at least if during the search there are individuals with the same evaluation (and different genetic material). This is due to a change of the sort function in pgapack (it switched to a stable sort from the C standard library).
Version 0.9: Allow installation of parallel version
- Pass argv (or sys.argv) to PGACreate
- Add a stanza to setup.py to allow a parallel installation with a given pgapack variant compiled for an MPI library. This currently needs a pre-installed pgapack debian package.
Version 0.8: Bugfix in real mutation
- Fix a core-dump in the latest pgapack
Version 0.7: Major changes in wrapping
- Now Differential Evolution is implemented, see the minfloat example and the user guide of pgapack.
Version 0.6: Major changes in wrapping
- Now the wrapping uses the standard Python recommendations on how to create a custom class.
- Update documentation
- Rename
fitness_cmax
(fromfitness_cmax_value
) - Better error checking of parameters
Version 0.5: Bug-fix release
- Now the
setup.py
works, previous version had an encoding problem - Wrap some minor new methods
- Bug-fix in PGAPack truncation selection
Version 0.4: Bundle PGAPack
- The PGAPack package is now included as a git submodule. By default we build against this library
- License fixes: The module long shipped a
COPYING
file that includes the 2-clause BSD license. But the headers ofsetup.py
andpgamodule.c
still included another license. This has been corrected.
Version 0.3: Feature enhancements, Bug fixes
Port to Python3, Python2 is still supported, license change.
- C-Code of wrapper updated to support both, Python2 and Python3
- Update documentation
- Fix some memory leaks that could result when errors occurred during some callback methods
- License change: We now have the 2-clause BSD license (similar to the MPICH license of PGAPack), this used to be LGPL.
Version 0.2: Feature enhancements, Bug fixes
64 bit support, more PGAPack functions and attributes wrapped, Readme-update: Sourceforge logo, Changes chapter.
- Bug-fixes for 64 bit architectures
- More functions and attributes of PGAPack wrapped
- Add a build-rule to
setup.py
to allow building for standard-install of PGAPack – this currently needs editing ofsetup.py
– should use autodetect here but this would require that I set up a machine with standard install of PGAPack for testing. - Add Sourceforge logo as required
- Add Changes chapter for automagic releases
- Add the
__module__
string to classPGA
in modulepga
. Now calling::help (pga)
in python works as expected, previously no help-text was given for the included module
Version 0.1: Initial freshmeat announcement
PGAPy is a wrapper for PGAPack, the parallel genetic algorithm library, a powerful genetic algorithm library. PGAPy wraps this library for use with Python. Pgapack is one of the most complete and accurate genetic algorithm implementations out there with a lot of features for experimentation.
- Initial Release