Lecture 2 -- Genetic Algorithms (GA)
===================

Notes by Luca Mariot

Covered topics:

* Basic principles and structure of GA
* Solutions encoding and genotype-phenotype mappings 
* Selection operators: roulette-wheel, tournament selection
* Variation operators: crossover (one-point, two-point and uniform) and mutation (bit-flip)
* Elitism, generational and steady-state GA

References:

* Sean Luke, *Essentials of Metaheuristics*
* David E. Goldberg, *Genetic Algorithms in Search, Optimization and Machine Learning*

Basic Principles and Structure of GA
-----------------------------------------
*Genetic Algorithms* (GA) are a metaheuristic optimization approach loosely based on the principles of evolution theory, in particular *natural selection*. They belong to a broader class of nature-inspired optimization methods collectively called *Evolutionary Computation* (EC) algorithms. John Holland originally introduced GAs to pursue two objectives: the abstraction of the adaptive processes in natural systems, and the design of artificial systems which would take advantage of the adaptive processes abstracted from natural systems. The second goal led Holland to apply GAs to combinatorial optimization problems.

The main idea behind GA is to represent a *population* of candidate solutions (also called *individuals*) as strings of bits (usually called *chromosomes*), and evolve them by means of stochastic variation operators, which are usually called *genetic operators*. The new solutions are successively evaluated against a *fitness function*, which generally represents the objective function to be optimized, and solutions with a higher fitness have a higher chance of being selected for the next iteration. The process is repeated until a certain number of generations is reached, or another termination criterion is met.

Goldberg points out four characteristics featured by GA which make them a more robust than single-state optimization methods:

* GA do not work directly on the parameters of the candidate solutions of a problem, but on a *coding* of the parameters (for example, a fixed-length bitstring representation).
* Instead of iteratively optimizing a single candidate solution, GAs evolve in parallel a *population* of solutions.
* GA use only the value of the fitness function to optimize the solutions. No additional information on the underlying search space (such as derivatives) is required (although, of course, it can help
* GA employ *probabilistic operators* to evolve their solutions, rather than *deterministic operators* such as those used by gradient descent or steepest-ascent hill-climbing.

The traditional genetic operators used by GA are the following:

* *Reproduction* (or *selection*) *operator*. This operator is used to select the individuals in the current population which will reproduce in the next generation. Usually, the value of the fitness function drives the selection process.
* *Crossover* (or *recombination*) *operator*. Crossover recombines the chromosomes representing two or more candidate solutions in order to produce a new chromosome (the offspring).
* *Mutation operator*. The aim of the mutation operator is to introduce random changes in the chromosome representing a candidate solution individual. Typically, a mutation operator flips the value of the bits in a chromosome with low probability.

The high-level functioning of a GA based on the three operators above and a fitness function to be optimized can be summarized as follows:

1. Initialize the population with a random set of candidate solutions represented as bitstrings, and compute their fitness values.
2. Using the reproduction operator, choose a subset of chromosomes from the current population which will reproduce in the next generation.
3. Generate the new offspring from the parents selected in the previous step by applying crossover and mutation operators.
4. Compute the fitness values of the candidate solutions corresponding to the chromosomes in the offspring.
5. Copy the offspring chromosomes in the new population. Optionally, it is possible to copy a few of the most fit solutions from the old population, by using an *elitist strategy*.
6. Return to point 2 until a termination condition is satisfied (e.g. a specific number of generations has been reached).
7. When the termination criterion is met, return the best candidate solution in the population.

In the rest of this lecture, we are going to explain more in detail each genetic operator and the other components characterizing GA. As a practical examples, we will apply GA to solve the combinatorial optimization problems of OneMax and BinInt introduced in the previous lecture.

Solutions encoding and genotype-phenotype mappings
---------------------------------------------------------

As we saw in the last lecture, an optimization problem $\mathcal{P}: \mathcal{I} \rightarrow \mathcal{S}$ can be specified by an *optimization setup* $OptSet: \langle \mathbb{G}, Op, dec, \Theta \rangle$ where $\mathbb{G}$ is the *genotype space*, that can be considered as the set searched by an optimization algorithm when solving a particular instance $I \in \mathcal{I}$ of $\mathcal{P}$. The elements of the genotype space are then mapped to candidate solutions in the solution space $S = \mathcal{P}(I)$ by means of the *decoding function* $dec: \mathbb{G} \rightarrow S$.

In the context of GA, the underlying genotype space is usually the set of binary strings $\mathbb{B}^n = \{0,1\}^n$, whose length $n \in \N$ depends on the size of the candidate solutions set $S$, which in turn is called *phenotype space*. If $S$ is a finite set composed of $N$ elements, then the length of the bitstrings must be at least $n = \lceil \log_2 N \rceil$, in order to have a surjective decoding function. In the case of continuous search spaces, we can still apply this argument as long as we allow to approximate real numbers in a certain interval with finite precision (which is of course the case when we want to use a computer for solving a continuous optimization problem).

The bitstrings of the genotype space are also called *chromosomes*, since, in a way, they encode the "genetic material" necessary to construct a candidate solution. Keeping this biological analogy, *genes* are usually the bits composing a chromosome (or subsets of bits in the chromosome), while an *allele* is the specific value that a gene assumes in a particular chromosome.

The choice of a suitable *genetic encoding* for the candidate solutions as bitstrings is a crucial step in the design a GA, and it highly depends on the optimization problem to be solved. This also leads to determine the decoding function $dec: \mathbb{G} \rightarrow S$ which maps each chromosome $c \in \mathbb{G}$ to a candidate solution $x \in S$. The *fitness function* $fit: \mathbb{G} \rightarrow \mathbb{R}$ assigns to each chromosome a real value used to measure how good the corresponding candidate solution is in solving the optimization problem. Usually, the fitness function simply amounts to the objective function $f: S \rightarrow \mathbb{R}$ applied on the decoded version of the chromosome, i.e.

$$ fit(c) = f(dec(c)) \enspace ,$$

for all chromosomes $c \in \mathbb{G}$. Of course, depending on the specific optimization goal put on the objective function, the fitness should be maximized or minimized accordingly. However, from a semantic abstract level the fitness is usually always maximized. Hence, when the objective function has to be minimized, the fitness function is defined as 

$$ fit(c) = -f(dec(c)) \enspace .$$

Let us see now some examples of solution encodings related to the optimization problem that we introduced in the last lecture, namely OneMax and BinInt.

### Direct binary encoding
When the candidate solutions of an optimization problem lend themselves naturally to a binary representation, one can simply choose to use it also for the chromosomes in the GA. In this case, we have that $\mathbb{G} = S = \{0,1\}^n$ for a certain $n \in \mathbb{N}$, and the decoding mapping $dec: \mathbb{G} \rightarrow S$ is the identity function. The fitness function, consequently, can also be directly identified with the objective function to be optimized.

This is the case both of the OneMax and BinInt problems: recall that we had

* Set of instances: $\mathcal{I} = \mathbb{N}$.
* Family of solution spaces: $\mathcal{S} = \{\mathbb{B}^n : n \in \mathbb{N} \}$, where $\mathbb{B}^n = \{0,1\}^n$.
* Optimization problem: $\mathcal{P}(n) = \mathbb{B}^n$, for all $n \in \mathbb{N}$.
* Objective functions: for the OneMax and BinInt problems, these were respectively

$$ f_{OneMax}(x) = \sum_{i=1}^n x_i \enspace ,$$
$$ f_{BinInt}(x) = \sum_{i=1}^n x_i\cdot 2^{i-1} \enspace ,$$

for all $x \in \mathbb{B}^n$. In both cases, we can thus set $\mathbb{G} = \mathbb{B}^n$, $dec = Id$, while the fitness functions will be respectively $fit_{OneMax}(c) = f_{OneMax}(c)$ and $fit_{BinInt}(c) = f_{BinInt}(c)$

### Gray coding
A gray code is a permutation of the set of bitstrings $\{0,1\}^n$ such that all adjacent bitstrings have Hamming distance $1$. An example of Gray code for $n=3$ is the following one:

$$ 000, 001, 011, 010, 110, 111, 101, 100 \enspace .$$

As it can be seen, each pair of adjacent bitstrings differs in exactly one place. Using a Gray coding for representing the chromosomes in a GA can be useful in those optimization problems where a small change in the fitness function requires a large modification in the chromosome.

Consider for example the situation where the candidate solutions are integer numbers from $0$ to $7$, while the chromosomes are the $3$-bit strings corresponding to the canonical binary representation of these numbers. Assume further that the objective function of $3$ and $4$ are respectively $f(3) =  5$ and $f(4) = 15$. In a maximization problem, the candidate solution $4$ is clearly better than $3$. However, using the canonical binary representation, the chromosomes of $3$ and $4$ are respectively $011$ and $100$, which have Hamming distance $3$. Hence, if the GA is tweaking the chromosome $011$, it should complement all positions in order to obtain the better solution $4$. On the other hand, using Gray coding, $3$ corresponds to $010$ while $4$ to $110$. This would require the GA to complement only the first position in order to obtain an increase in the fitness function.

The decoding function $dec$ to map a Gray-encoded chromosome $c$ to a bitstring in lexicographic order is easy to do. Going from left to right, we copy the first bit in the decoded solution. For all the remaining bits, we compute the XOR between the current bit in the chromosome and the previous bit in the decoded solution. Endoding is similar: starting from a lexicographically-ordered bitstring, again we copy the first bit in the Gray-encoded chromosome. Each subsequent bit in the chromosome is then determined by computing the XOR between the current and the previous bit in the lexicographically-ordered bitstrings.

The code below reports a Python implementation for the Gray encoder and decoder described above.

In [79]:
## Functions for Gray encoding/decoding. Bitstrings are assumed to be lists of 0 and 1

def gray_encoder(bitstring):
    gray_bitstring = [bitstring[0]]    #first bit is copied in the gray-encoded chromosome
    
    for i in range(1, len(bitstring)):
        gray_bitstring.append((bitstring[i-1]+bitstring[i])%2)    #append the XOR of the previous and current bit in the original string
    
    return gray_bitstring

def gray_decoder(chromosome):
    lex_bitstring = [chromosome[0]]    #first bit is copied in the gray-encoded chromosome
    
    for i in range(1, len(chromosome)):
        lex_bitstring.append((lex_bitstring[i-1]+chromosome[i])%2)    #append the XOR of the previous bit in the decoded string and the current bit in the gray-encoded string
    
    return lex_bitstring

## Example of main: for all bitstrings of length n, encode and decode them with the gray encoder/decoder functions
n = 4

print("Original\t", end =" ") 
print("Gray\t", end =" ")
print("Decoded")

for i in range(2**n):
    lex_i = [int(z) for z in list('{0:0b}'.format(i).zfill(n))]
    gray_i = gray_encoder(lex_i)
    dec_i = gray_decoder(gray_i)
    print(lex_i, end =" ") 
    print(gray_i, end =" ")
    print(dec_i)


Original	 Gray	 Decoded
[0, 0, 0, 0] [0, 0, 0, 0] [0, 0, 0, 0]
[0, 0, 0, 1] [0, 0, 0, 1] [0, 0, 0, 1]
[0, 0, 1, 0] [0, 0, 1, 1] [0, 0, 1, 0]
[0, 0, 1, 1] [0, 0, 1, 0] [0, 0, 1, 1]
[0, 1, 0, 0] [0, 1, 1, 0] [0, 1, 0, 0]
[0, 1, 0, 1] [0, 1, 1, 1] [0, 1, 0, 1]
[0, 1, 1, 0] [0, 1, 0, 1] [0, 1, 1, 0]
[0, 1, 1, 1] [0, 1, 0, 0] [0, 1, 1, 1]
[1, 0, 0, 0] [1, 1, 0, 0] [1, 0, 0, 0]
[1, 0, 0, 1] [1, 1, 0, 1] [1, 0, 0, 1]
[1, 0, 1, 0] [1, 1, 1, 1] [1, 0, 1, 0]
[1, 0, 1, 1] [1, 1, 1, 0] [1, 0, 1, 1]
[1, 1, 0, 0] [1, 0, 1, 0] [1, 1, 0, 0]
[1, 1, 0, 1] [1, 0, 1, 1] [1, 1, 0, 1]
[1, 1, 1, 0] [1, 0, 0, 1] [1, 1, 1, 0]
[1, 1, 1, 1] [1, 0, 0, 0] [1, 1, 1, 1]


Selection Operators
----------------------

The selection (or reproduction) operator is adopted in GA to select a subset of candidate individuals in the current population that will reproduce (i.e., over which the variation operators will be applied to create the population for the next iteration). This operator usually employs the fitness values of the individuals in the current population to stochastically drive the selection: intuitively, individuals with a high fitness will have a higher chance of being selected for reproduction. Hence, the selection operator can be thought as a (loose!) metaphor for the principle of natural selection in evolutionary biology. In general, if the selection operator samples individuals from the current population according to a certain probability distribution *with replacement*. Hence, during a single iteration of a GA an individual can be selected multiple times.

Two of the most used selection operators in GA are *roulette-wheel* and *tournament* selection, which we describe below.

### Roulette-wheel Selection
In roulette-wheel selection, the probability distribution through which the operator samples the individuals in the population is proportional to their fitness values. Intuitively, one can picture the procedure of this operator as spinning a roulette wheel, where each individual has a sector whose size depends on its fitness. The higher the fitness of an individual, the bigger its sector on the wheel will be, and thus the larger the probability that the ball will land on it when the wheel is spinned.

Formally, if the $n$ individuals in the population $p_1,\cdots,p_n \in \mathbb{G}$ respectively fitness values $fit(p_1), \cdots , fit(p_n)$, then the probability $Pr(p_i)$ of an individual $p_i$ of being selected through the roulette-wheel selection operator is

$$ Pr(p_i) = \frac{fit(p_i)}{\sum_{i=1}^n fit(p_i)} \enspace .$$

As an example, we report below the Python code for roulette-wheel selection.

In [80]:
## Python function for Roulette-wheel selection. We assume that the population is a list of lists,
## where each list corresponds to a bitstring chromosome. The fitness values of the individuals are
## held in a separate list order-wise, called fitnesses
import random

def roulette_wheel_selection(population, fitnesses):
    
    # Compute the sum of the fitness values in the population
    sumfit = 0
    for i in range(len(fitnesses)):
        sumfit += fitnesses[i]
        
    # Select a random point on the wheel
    landpoint = random.uniform(0, sumfit)
    
    # Find which slot of the wheel the landing point belongs to
    j=0
    partsum = 0.0
    
    while((partsum < landpoint) and (j != len(population)-1)):
        j += 1
        partsum += fitnesses[j]
    
    # Index j now holds the position of the individual to be selected,
    # return the corresponding chromosome in the population
    return population[j]

## Auxiliary test functions

# Generate a random population of n bitstrings of length l
def create_population(n, l):
    
    # n is the number of individuals to be created, l their length
    population = []
    
    for i in range(n):
        chromosome = []
        for j in range(l):
            chromosome.append(random.randint(0,1))
        population.append(chromosome)
    
    return population

# Evaluate the fitness function of the OneMax problem for a single individual
def evaluateOneMax(x):
    fit = 0
    for i in range(len(x)):
        fit += x[i]
    return fit

# Evaluate the fitness function of the OneMax problem for a population of individuals
def evaluateOneMaxPopulation(population):
    fitnesses = []
    for i in range(len(population)):
        fitnesses.append(evaluateOneMax(population[i]))
    return fitnesses


## Example of main to test roulette wheel selection with the OneMax fitness function.

n = 10
l = 6

population = create_population(n, l)
fitnesses = evaluateOneMaxPopulation(population)
print("The population is :", population)
print("Fitness values are:", fitnesses)

selected_individual = roulette_wheel_selection(population, fitnesses)
print("Selected individual: ", selected_individual)


The population is : [[0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 1, 1], [1, 0, 1, 0, 0, 0], [0, 0, 1, 0, 1, 1], [0, 0, 1, 1, 1, 0], [0, 0, 0, 0, 1, 1], [1, 0, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1], [0, 0, 1, 1, 1, 0], [0, 0, 1, 0, 1, 1]]
Fitness values are: [3, 5, 2, 3, 3, 2, 2, 3, 3, 3]
Selected individual:  [1, 0, 0, 0, 1, 0]


### Tournament Selection
The problem with roulette-wheel selection is that individuals with close enough fitness values will have almost identical probabilities of being selected. If these values are also high, then there is a good chance that the GA will never select the fittest individual in the population, thus converging to a suboptimal solution.

A first idea to counter this effect is to use *Stochastic Universal Sampling* (SUS): basically, this amounts to roulette-wheel selection, but with the guarantee that the fittest individual is selected at least once during a single generation. Another idea is the use of *fitness scaling*, to bias the size of the sectors on the wheel in favor of the fittest individuals.

A totally different approach is *tournament selection*, which exploits only the rank ordering information induced by the fitness values in the population. The idea is pretty simple: given a tournament size $t<n$, draw $t$ individuals at random from the population (with replacement). The winner of the tournament which gets selected is the one with the highest fitness value.

Do to its simplicity and resistance to noise introduced by the fitness function, tournament selection is nowadays one of the most used selection operators.

Below we report the Python code for tournament selection.

In [81]:
## Python function for Roulette-wheel selection. As before for roulette-wheel selection, we assume
## that the population is a list of bitstrings, while the fitness values of the individuals are
## held in a separate list order-wise
import random

def tournament_selection(population, fitnesses, tsize):
    
    # Randomly draw with replacement individuals from the population for tsize times,
    # keeping only the individual with highest fitness value
    best_fit = 0.0
    best_pos = 0
    for i in range(tsize):
        draw = random.randint(0,len(population)-1)
        if(fitnesses[draw] > best_fit):
            best_pos = draw
            best_fit = fitnesses[draw]
    
    return population[best_pos]

## Example of main to test tournament selection with the OneMax fitness function.

n = 10
l = 6
tsize = 5

population = create_population(n, l)
fitnesses = evaluateOneMaxPopulation(population)
print("The population is :", population)
print("Fitness values are:", fitnesses)

selected_individual = tournament_selection(population, fitnesses, tsize)
print("Selected individual: ", selected_individual)

The population is : [[0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 1, 0], [0, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0], [0, 1, 1, 0, 1, 0], [1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 1, 1], [0, 0, 1, 0, 1, 1], [0, 0, 0, 0, 1, 0]]
Fitness values are: [3, 5, 2, 2, 3, 3, 0, 3, 3, 1]
Selected individual:  [0, 0, 1, 0, 1, 1]


Variation Operators
----------------------
The variation operators of GA, also called *genetic operators*, create the new chromosomes starting from the individuals selected from the current population through the selection operator. Genetic operators are of two types: *crossover* (or *recombination* operators) and *mutation* operators.

### Crossover Operators
The aim of a crossover operator is to recombine the genotypes of two or more selected individuals to create new offspring chromosomes. The idea underlying crossover is that two candidate solutions with certain fitness values can give rise to a  solution with better fitness by combining their respective chromosomes. In particular, a crossover operator taking $k$ parent chromosomes as inputs can be applied with a certain *crossover probability* $p_c \in [0,1]$. In case crossover is not applied, the selected parents are simply copied in the new population as they are.

Crossover operators can be of various types, depending on the particular optimization problem and the encoding considered for the chromosomes in the GA. Below we describe the three most common crossover operators, namely one-point, two-point and uniform crossover.

#### One-Point Crossover
One-point crossover performs recombination between two parent chromosomes $p_1, p_2 \in \{0,1\}^n$ of length $n$ by first sampling a random cut position $\chi$ between $1$ and $n$. Then, two offspring chromosomes $c_1, c_2 \in \{0,1\}^n$ are created by swapping the bits of $p_1$ and $p_2$ up to the cut position $\chi$, and then by copying the bits of $p_1$ in $c_1$ and of $p_2$ in $c_2$ in the remaining positions.

Below a Python implementation of one-point crossover is provided.

In [82]:
## Function for one-point crossover between two binary strings

def one_point_crossover(parent1, parent2):
    # Copy the parent strings in the offspring strings
    child1 = list(parent1)
    child2 = list(parent2)
    
    # Select the cut point
    cutpoint = random.randint(0,len(parent1)-1)
    
    # Swap the bits in the children up to the cutpoint
    for i in range(0, cutpoint):
        child1[i] = parent2[i]
        child2[i] = parent1[i]
        
    return [child1, child2]

## Test for one-point crossover

l = 6   # length of the chromosomes

parents = create_population(2, l) # create a population of just two individuals
print("Parent 1: ", parents[0])
print("Parent 2: ", parents[1])
children = one_point_crossover(parents[0], parents[1])
print("Child 1: ", children[0])
print("Child 2: ", children[1])

Parent 1:  [1, 0, 1, 1, 0, 1]
Parent 2:  [1, 0, 0, 0, 1, 1]
Child 1:  [1, 0, 0, 1, 0, 1]
Child 2:  [1, 0, 1, 0, 1, 1]


#### Two-Point Crossover
Two-point crossover is based on principle analogous to that of one-point crossover, but instead of swapping the bits of the parents from the first position up to the cut point, it does so in the interval included between *two* cut points. In a sense, two-point crossover can still be considered as a one-point crossover operator, but by interpreting the chromosomes as *rings* instead of strings. Then, two-point crossover consists of swapping two pieces of the rings.

Below we report the Python code for two-point crossover.

In [83]:
## Function for two-point crossover between two binary strings

def two_point_crossover(parent1, parent2):
    # Copy the parent strings in the offspring strings
    child1 = list(parent1)
    child2 = list(parent2)
    
    # Select the two cut points
    cutpoint1 = random.randint(0,len(parent1)-1)
    cutpoint2 = random.randint(0,len(parent1)-1)
    
    # Swap the bits in the interval determined by the two cutpoints
    if(cutpoint1 <= cutpoint2):
        for i in range(cutpoint1, cutpoint2):
            child1[i] = parent2[i]
            child2[i] = parent1[i]
    else:
        for i in range(cutpoint2, cutpoint1):
            child1[i] = parent2[i]
            child2[i] = parent1[i]
        
    return [child1, child2]

## Test for one-point crossover

l = 6   # length of the chromosomes

parents = create_population(2, l) # create a population of just two individuals
print("Parent 1: ", parents[0])
print("Parent 2: ", parents[1])
children = two_point_crossover(parents[0], parents[1])
print("Child 1: ", children[0])
print("Child 2: ", children[1])

Parent 1:  [1, 0, 1, 1, 0, 1]
Parent 2:  [0, 1, 1, 0, 0, 0]
Child 1:  [1, 0, 1, 1, 0, 1]
Child 2:  [0, 1, 1, 0, 0, 0]


#### Uniform Crossover
The idea of one-point and two-point crossover can be of course generalized to $k>2$ cut points. *Uniform crossover*, on the other hand, is based on a different principle: iterates through the bits of the two parents, and for each position swaps the corresponding bits in the children with probability $p$. Usually, this probability depends on the length of the chromosomes, and it is often $p= 1/l$.

Below we report the Python code for uniform crossover

In [84]:
## Function for uniform crossover between two binary strings

def uniform_crossover(parent1, parent2, prob):
    # Copy the parent strings in the offspring strings
    child1 = []
    child2 = []
    
    # Iterates on the bits of the two parents
    for i in range(len(parent1)):
        if(random.uniform(0,1) < prob):
            #Swap the two bits in position i in the children
            child1.append(parent2[i])
            child2.append(parent1[i])
        else:
            #Copy the two bits in the usual order in the children
            child1.append(parent1[i])
            child2.append(parent2[i])
        
    return [child1, child2]

## Test for one-point crossover

l = 6   # length of the chromosomes

parents = create_population(2, l) # create a population of just two individuals
print("Parent 1: ", parents[0])
print("Parent 2: ", parents[1])
children = uniform_crossover(parents[0], parents[1], 1/l) #call uniform crossover with probability p=1/l of swapping bits
print("Child 1: ", children[0])
print("Child 2: ", children[1])

Parent 1:  [1, 1, 0, 1, 1, 0]
Parent 2:  [0, 1, 0, 0, 1, 1]
Child 1:  [1, 1, 0, 1, 1, 0]
Child 2:  [0, 1, 0, 0, 1, 1]


### Mutation Operators
As we said above, crossover takes two or more parent bitstrings and mix them in order to create offspring chromosome. Thus, in a sense, crossover is not inserting new genetic material in the population, but just recombining existing one. Indeed, it can be showed that for the class of *geometric* crossover operators (such as the one-point, two-point, and uniform crossover that we introduced above), the offspring is always included in the *bounding box* containing the two parents. Therefore, the chromosomes produced by crossover applied to the individuals selected by the reproduction operator will be inside the bounding box that includes all individuals in the current population. Crossover can thus be thought as a search operator that "exploits" the current neighborhood of candidate solutions induced by the population.

However, the consequence of applying only crossover operators to create the offspring will make the the population converge to a several copies of the same individual, eventually. In order to avoid this, the GA needs to introduce new *random* genetic material. This is usually accomplished by employing a *mutation operator* on the offspring produced by the crossover operator. Mutation typically works by randomly flipping the bits in the chromosome with a low *mutation probability* $p_m \in [0,1]$.

Below we report the Python code for such kind of mutation operator, which is the most commonly used.

In [85]:
## Function for bit-flip mutation on a binary string

def bitflip_mutation(chromosome, mutprob):
    
    mut_chromosome = list(chromosome)
    
    for i in range(len(chromosome)):
        if(random.uniform(0,1) < mutprob):
            #complement the current bit
            mut_chromosome[i] = (mut_chromosome[i]+1)%2
        
    return mut_chromosome

## Test for bit-flip mutation

l = 6   # length of the chromosomes
mutprob = 0.2

parents = create_population(2, l) # create a population of just two individuals
print("Parent 1: ", parents[0])
print("Parent 2: ", parents[1])

#Perform crossover and then mutation on the children
children = one_point_crossover(parents[0], parents[1]) #call uniform crossover with probability p=1/l of swapping bits
print("Child 1: ", children[0])
print("Child 2: ", children[1])
children[0] = bitflip_mutation(children[0], mutprob)
children[1] = bitflip_mutation(children[1], mutprob)
print("Mutated Child 1: ", children[0])
print("Mutated Child 2: ", children[1])


Parent 1:  [0, 0, 0, 1, 0, 1]
Parent 2:  [1, 1, 1, 1, 0, 0]
Child 1:  [1, 1, 0, 1, 0, 1]
Child 2:  [0, 0, 1, 1, 0, 0]
Mutated Child 1:  [1, 1, 0, 1, 0, 0]
Mutated Child 2:  [0, 0, 1, 1, 0, 0]


Elitism, Generational and Steady-State GA
-----------------------------------------------
So far, we have only surveyed the search operators of GA. After selection, crossover and mutation have been applied, the GA has to integrate the obtained offspring in the population. Thus, the first step is to evaluate the fitness function over all produced offspring chromosomes. Then, in the *generational* version of GA, the population is simply replaced by the new offspring for the next generation. This means that the selection and crossover operators are applied enough times to create a number of offspring chromosomes equal to the current population size, in order to keep the population stable.

Of course, by just replacing the old population with the new, the individual with the best fitness in the old population will not survive to the next generation. Hence, if the best offspring chromosome has a fitness value which is lower than that of the best individual in the old population, the GA will actually produce a worse candidate solution. In order to solve this problem, generational GA are usual coupled with an *elitist strategy*: if the best individual in the new population is not better than the best individual in the old population, then use the latter to overwrite one of the individuals in the new population. In this way, the best individual is always preserved, and its fitness is a monotone non-decreasing function of the number of GA iterations. How the individual in the new population to be replaced by the best of the old population must be selected? Several strategies are possible: for example, one could always overwrite the worst individual in the new population. However, this practice risks making the GA to converge prematurely to an optimal solution, since the diversity of the individuals quickly decreases in the first few iterations. A better strategy, in this sense, is to replace a random individual in the new population with the best one from the old population.

The code below assembles the various operators introdued above in a generational GA for the OneMax problem:

In [86]:
### Function for a generational GA on the OneMax problem. The parameters are:
### - population size
### - length of the bitstrings
### - maximum number of generations to be performed
### - tournament size
### - crossover probability
### - mutation probability

def generational_GA_OneMax(popsize, length, maxgen, tsize, pcross, pmut):
    
    # Generate initial population and evaluate its fitness
    population = create_population(popsize, length)
    fitnesses = evaluateOneMaxPopulation(population)
    #print("The initial population is :", population)
    #print("Fitness values are:", fitnesses)
    
    # Find best individual
    best_fit = max(fitnesses)
    best_pos = fitnesses.index(best_fit)
    
    print("Best initial individual is: ", end=" ")
    print(population[best_pos], end=" ")
    print(", fitness = ", best_fit)
    
    # Main loop: while the maximum number of generations has not been reached
    # and the fitness is not the optimal one, iterate the GA
    i = 0
    while((i<maxgen) and (best_fit < length)):
        
        # Step 1: pick popsize candidate individuals for reproduction with
        # tournament selection
        candidates = []
        for j in range(popsize):
            candidates.append(tournament_selection(population, fitnesses, tsize))
        
        ## Step 2: apply crossover and mutation on the adjacent pairs of candidates to create the offspring
        j = 0
        offspring = []
        while(j<len(candidates)):
            children = []
            if(random.uniform(0,1)<pcross):
                #apply one-point crossover
                children = uniform_crossover(candidates[j], candidates[j+1], 1/length)
            else:
                #copy the parents
                children = [candidates[j], candidates[j+1]]
                
            #mutate the offspring and put them in the new population
            children[0] = bitflip_mutation(children[0], pmut)
            children[1] = bitflip_mutation(children[1], pmut)
            
            offspring.append(children[0])
            offspring.append(children[1])
            
            j += 2
        
        ## Step 3: evaluate fitness of the offspring and apply elitism
        newfitnesses = evaluateOneMaxPopulation(offspring)
        best_fit_off = max(newfitnesses)
        best_pos_off = newfitnesses.index(best_fit_off)
        if(best_fit >= best_fit_off):
            #overwrite random individual with the best from the old population
            pos_replace = random.randint(0,len(offspring)-1)
            offspring[pos_replace] = list(population[best_pos])
            fitnesses[pos_replace] = best_fit
        else:
            #update best individual
            best_pos = best_pos_off
            best_fit = best_fit_off
        
        # Step 4: replace old population with new
        population = offspring
        i+=1
    
    print("Best individual at generation ", end=" ")
    print(i, end=": ")
    print(population[best_pos], end=" ")
    print(", fitness = ", best_fit)
    
## Test for generational GA
popsize = 20
length = 50
maxgen = 1000
tsize = 5
pcross = 1.0
pmut = 0.2

generational_GA_OneMax(popsize, length, maxgen, tsize, pcross, pmut)

Best initial individual is:  [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1] , fitness =  33
Best individual at generation  1000: [0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0] , fitness =  38
