# Problem

Consider the following equation:

**Y = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6**

There are 6 inputs (the x's) and 6 weights (the w's), one weight per input. In this problem, we use a genetic algorithm (GA) to learn how to maximize the function above by adjusting the weights. It should learn that positive inputs should be multiplied by large positive weights, while negative inputs should be multiplied by large negative weights. The weights are not bounded, so the function has no finite maximum. We therefore expect the best score to keep growing over the generations.
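Here is a small worked check of that goal. The helper name `y` is ours and does not come from the notebook. Suppose every weight has the same sign as its input and the same magnitude c. Then Y = c · Σ|xᵢ|. For the inputs used later in this notebook, Σ|xᵢ| = 4 + 2 + 3.5 + 5 + 11 + 4.7 = 30.2, so Y grows linearly with c.

```python
import numpy as np

equation_inputs = np.array([4, -2, 3.5, 5, -11, -4.7])

def y(weights, inputs):
    # Y = w1*x1 + w2*x2 + ... + w6*x6
    return float(np.dot(weights, inputs))

# Weights that share the sign of their input, scaled by c, give Y = c * sum(|x_i|) = c * 30.2
for c in (1, 4, 10):
    weights = c * np.sign(equation_inputs)
    print(c, round(y(weights, equation_inputs), 3))

# Flipping the sign of any single weight lowers Y
flipped = np.sign(equation_inputs)
flipped[0] = -flipped[0]
print(y(flipped, equation_inputs) < y(np.sign(equation_inputs), equation_inputs))
```

The initial population below is drawn from [-4, 4]. No individual in generation 0 can therefore score above 4 × 30.2 = 120.8. Higher scores require the GA to grow the weights beyond that range.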

### Fitness 

As described above, fitness is defined as the value of the equation. In other words, fitness is the sum of the products of each input and its corresponding weight, i.e. the dot product of the input vector and the weight vector. The higher the value, the better the fitness.

In [1]:
import numpy as np

def population_fitness(population, inputs):
    return np.sum(population*inputs, axis=1) # Calculate the given equation for each individual in the population
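Here is a quick sanity check of `population_fitness` on a hand-made population of three individuals. The values are illustrative only. It shows that the row-wise `np.sum(population * inputs, axis=1)` equals the matrix-vector product `population @ inputs`, with one score per individual.

```python
import numpy as np

def population_fitness(population, inputs):
    # Row-wise sum of element-wise products, i.e. one dot product per individual
    return np.sum(population * inputs, axis=1)

inputs = np.array([4, -2, 3.5, 5, -11, -4.7])
population = np.array([
    [1.0, -1.0, 1.0, 1.0, -1.0, -1.0],   # every sign matches its input
    [-1.0, 1.0, -1.0, -1.0, 1.0, 1.0],   # every sign is wrong
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],      # all-zero weights
])
fitness = population_fitness(population, inputs)
print(fitness)                                    # one score per row
print(np.allclose(fitness, population @ inputs))  # same as a matrix-vector product
```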

### Select parents

In this example, we simply always select the fittest individuals as parents for mating. At each step, we find the index of the highest fitness value and copy that individual from the population into the next row of the parents array. We then set the fitness at that index to something very small, so that the same individual is not selected again.

In [2]:
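def select_parents(population, fitness, num_parents):
    # num_parents is a shape tuple: (number of parents, number of weights)
    parents = np.empty(num_parents)
    fitness = np.array(fitness, dtype=float)   # work on a copy so the caller's fitness array is not modified
    for index in range(num_parents[0]):
        best_fitted_idx = np.argmax(fitness)                 # index of the fittest remaining individual
        parents[index, :] = population[best_fitted_idx, :]
        fitness[best_fitted_idx] = -np.inf                   # make sure it is not selected again
        
    return parents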
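The loop above is a form of truncation selection. It returns the fittest individuals in descending order of fitness. An alternative sketch (not the notebook's code) writes the same selection with a single `np.argsort`. Sorting `-fitness` with `kind="stable"` breaks ties by lower index first, which is the order that repeated `np.argmax` calls produce. The two function names below are hypothetical, and both take a plain integer count `k` instead of the shape tuple used in the notebook.

```python
import numpy as np

def select_parents_loop(population, fitness, k):
    # Repeatedly take the fittest remaining individual (same logic as the loop above)
    fitness = np.array(fitness, dtype=float)  # copy, so the caller's array is untouched
    parents = np.empty((k, population.shape[1]))
    for i in range(k):
        best = np.argmax(fitness)
        parents[i] = population[best]
        fitness[best] = -np.inf
    return parents

def select_parents_argsort(population, fitness, k):
    # Stable sort of -fitness: descending fitness, ties kept in index order
    order = np.argsort(-np.asarray(fitness, dtype=float), kind="stable")
    return population[order[:k]]

population = np.arange(12.0).reshape(4, 3)
fitness = np.array([2.0, 9.0, 5.0, 9.0])
print(select_parents_argsort(population, fitness, 3))
print(np.array_equal(select_parents_loop(population, fitness, 3),
                     select_parents_argsort(population, fitness, 3)))
```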

### Breeding (crossover)

Each offspring is a mix of genes taken from two parents, split at the crossover point. Here we use a single crossover point in the middle of the chromosome. The offspring takes the genes before that point from one parent and the genes after it from the other. We iterate over the parents in order of fitness. The fittest parent (index 0) mates with the slightly less fit parent at index 1, then parent 1 mates with parent 2, then 2 with 3, and so on, until the offspring array is full. If we run out of parents, the loop wraps around to the start of the parents array (thanks to the modulo operation), so the least fit parent mates with the fittest one.

In [3]:
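def crossover(parents, num_offsprings):
    # num_offsprings is a shape tuple: (number of offspring, number of weights)
    offsprings = np.empty(num_offsprings)
    crossover_point = num_offsprings[1] // 2   # single crossover point in the middle of the chromosome
    
    for k in range(num_offsprings[0]):
        parent1_idx = k % parents.shape[0]        # first parent
        parent2_idx = (k+1) % parents.shape[0]    # next parent, wrapping around to index 0
        offsprings[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]   # first half from parent 1
        offsprings[k, crossover_point:] = parents[parent2_idx, crossover_point:]     # second half from parent 2
        
    return offsprings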
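The following small, self-contained sketch of the same single-point crossover shows the ring pairing. It takes an integer offspring count rather than the notebook's shape tuple, which is our simplification. Each parent is filled with its own index, so every gene in an offspring shows which parent it came from.

```python
import numpy as np

def crossover(parents, num_offspring):
    num_parents, num_genes = parents.shape
    crossover_point = num_genes // 2
    offspring = np.empty((num_offspring, num_genes))
    for k in range(num_offspring):
        p1 = k % num_parents          # parent contributing the first half
        p2 = (k + 1) % num_parents    # next parent (wraps to 0) contributes the second half
        offspring[k, :crossover_point] = parents[p1, :crossover_point]
        offspring[k, crossover_point:] = parents[p2, crossover_point:]
    return offspring

# Fill parent i with the value i so each offspring gene reveals its source parent
parents = np.repeat(np.arange(3.0)[:, None], 6, axis=1)
print(crossover(parents, 3))
```

With 3 parents and 3 offspring, the last offspring pairs parent 2 with parent 0. This is the wrap-around produced by the modulo.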

### Mutation

Mutation is applied to every offspring: a random value between -1 and 1 is added to the weight at a randomly chosen position. It would be possible to always mutate a fixed position instead, but to encourage more randomness, let's keep it random ;)

In [4]:
def mutation(offsprings):
    for index in range(offsprings.shape[0]):
        random_value = np.random.uniform(-1.0, 1.0)                # scalar drawn from U(-1, 1)
        random_index = np.random.randint(0, offsprings.shape[1])   # random gene position (upper bound exclusive)
        offsprings[index, random_index] += random_value            # mutate the offspring in place
        
    return offsprings
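This is a vectorized sketch of the same mutation rule. It uses NumPy's `Generator` API (`np.random.default_rng`), so runs can be reproduced with a seed. The function name `mutate` and the seed are our own choices, not the notebook's. The check confirms the two properties described above: exactly one gene per offspring changes, and it changes by at most 1 in absolute value.

```python
import numpy as np

def mutate(offspring, rng):
    # Add one U(-1, 1) draw to one randomly chosen gene in each row (in place)
    rows = np.arange(offspring.shape[0])
    genes = rng.integers(0, offspring.shape[1], size=offspring.shape[0])
    offspring[rows, genes] += rng.uniform(-1.0, 1.0, size=offspring.shape[0])
    return offspring

rng = np.random.default_rng(42)
before = np.zeros((5, 6))
after = mutate(before.copy(), rng)
changed_per_row = np.count_nonzero(after != before, axis=1)
print(changed_per_row)                      # genes changed in each offspring
print(np.abs(after - before).max() <= 1.0)  # every change lies within [-1, 1]
```

Each row index appears once in `rows`, so the fancy-indexed `+=` never hits the same cell twice.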

### Evolution

Based on the number of weights in our equation, each individual (*chromosome*) in the population has 6 *genes*, each with a specific *allele* value. Thus, the population is an array of shape (num_individuals, num_weights).

A genetic algorithm is highly random, so offspring may well be less fit than their parents. Because of this, it is best to also keep some of the fittest parents in the new generation for further mating, a strategy known as *elitism*. By doing so, the new generation always preserves the best results found so far. The best score therefore cannot get worse from one generation to the next, which could happen in the worst case if we kept only the offspring.
The new population is made up of the selected parents plus their offspring, and we always pick the fittest individuals. We can therefore specify how many parents take part in mating, and likewise how many offspring are produced, to avoid pointless computation.

Let's keep it 50:50. With 30 individuals, the new generation will consist of 15 parents and 15 offspring.
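The elitism argument can be checked empirically. Below is a compact, self-contained re-implementation of this notebook's loop (selection, midpoint crossover, one-gene mutation), written with NumPy's `Generator` API for reproducibility. The function name `run_ga`, its defaults and the seed are our own illustrative choices, not the notebook's code. The selected parents are copied unchanged into the next population and only offspring are mutated. The best score per generation should therefore never decrease, up to floating-point rounding.

```python
import numpy as np

def run_ga(inputs, num_individuals=30, num_generations=50, seed=0):
    rng = np.random.default_rng(seed)
    inputs = np.asarray(inputs, dtype=float)
    num_weights = inputs.size
    num_parents = num_individuals // 2
    num_offspring = num_individuals - num_parents
    population = rng.uniform(-4, 4, (num_individuals, num_weights))
    best_scores = []
    for _ in range(num_generations):
        fitness = population @ inputs
        best_scores.append(fitness.max())
        # Truncation selection: keep the top num_parents individuals (elitism)
        parents = population[np.argsort(fitness)[::-1][:num_parents]]
        # Single-point crossover at the middle, pairing parent k with parent k+1 (wrapping around)
        cut = num_weights // 2
        offspring = np.empty((num_offspring, num_weights))
        for k in range(num_offspring):
            offspring[k, :cut] = parents[k % num_parents, :cut]
            offspring[k, cut:] = parents[(k + 1) % num_parents, cut:]
        # Mutation: add U(-1, 1) to one random gene per offspring
        genes = rng.integers(0, num_weights, size=num_offspring)
        offspring[np.arange(num_offspring), genes] += rng.uniform(-1.0, 1.0, size=num_offspring)
        # Parents are kept unchanged, offspring fill the rest of the population
        population = np.vstack([parents, offspring])
    return best_scores

scores = run_ga([4, -2, 3.5, 5, -11, -4.7])
# Elitism: the best score should never drop between generations
print(all(b >= a - 1e-9 for a, b in zip(scores, scores[1:])))
```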

In [5]:
equation_inputs = [4, -2, 3.5, 5, -11, -4.7]

NUM_WEIGHTS = len(equation_inputs)
NUM_INDIVIDUALS = 30
POPULATION_SIZE = (NUM_INDIVIDUALS, NUM_WEIGHTS) # 30 individuals in the population, each chromosome with 6 genes
NUM_GENERATIONS = 100
NUM_PARENTS = (int(NUM_INDIVIDUALS / 2), NUM_WEIGHTS) 
NUM_OFFSPRINGS = (NUM_INDIVIDUALS-NUM_PARENTS[0], NUM_WEIGHTS)

In [6]:
history = []
new_population = np.random.uniform(-4, 4, POPULATION_SIZE)

for generation in range(NUM_GENERATIONS+1):
    fitness = population_fitness(new_population, equation_inputs)        # Measure fitness of each chromosome in the population

    best_fitted_idx = np.where(fitness == np.max(fitness))
    best_weights = new_population[best_fitted_idx]
    score = round(np.max(fitness), 3)
    history.append((generation, score, best_weights))
    
    if generation % 10 == 0:
        print('At generation #{}, best achieved score is: {}'.format(generation, score))
        
    parents = select_parents(new_population, fitness, NUM_PARENTS)       # Select parents
    offsprings = crossover(parents, NUM_OFFSPRINGS)                      # Mate selected parents to create offsprings
    mutated_offsprings = mutation(offsprings)                            # Mutate offsprings
    
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = mutated_offsprings

At generation #0, best achieved score is: 59.261
At generation #10, best achieved score is: 111.858
At generation #20, best achieved score is: 145.526
At generation #30, best achieved score is: 169.957
At generation #40, best achieved score is: 200.984
At generation #50, best achieved score is: 227.724
At generation #60, best achieved score is: 259.582
At generation #70, best achieved score is: 300.993
At generation #80, best achieved score is: 330.442
At generation #90, best achieved score is: 361.22
At generation #100, best achieved score is: 394.73


# Analysis and visualization

In [7]:
print('Weights maximizing the function found after 100 generations:')
print(history[-1][2])

Weights maximizing the function found after 100 generations:
[[  9.59637753  -7.03251579  12.79417243   5.46466277 -20.93926471
   -8.47761072]]
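As a consistency check on the printed result, the weights above can be plugged back into the equation. The snippet below uses values copied from the output above. It confirms that every weight has the same sign as its input, and that the dot product reproduces the best score of 394.73 reported at generation 100.

```python
import numpy as np

equation_inputs = np.array([4, -2, 3.5, 5, -11, -4.7])
# Best weights as printed above for generation 100
best_weights = np.array([9.59637753, -7.03251579, 12.79417243,
                         5.46466277, -20.93926471, -8.47761072])

print(np.array_equal(np.sign(best_weights), np.sign(equation_inputs)))  # True
print(round(float(best_weights @ equation_inputs), 3))                  # 394.73
```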


In [8]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool, LabelSet

output_notebook()

In [9]:
generation, score, best_weights = zip(*history)

source = ColumnDataSource({
    'generation': generation, 
    'score': score, 
    'best_weights': [np.round(w, 2).tolist() for w in best_weights]  # round per generation; also works if several individuals tie for best
})
labels_source = ColumnDataSource({
    'generation': generation[::25], 
    'score': score[::25]
})

TOOLTIPS = [
    ('Best attempt:', '@score'),
    ('Generation:', '@generation'),
    ('Best weights:', '@best_weights')
]

hover = HoverTool(
    tooltips=TOOLTIPS,
    mode='vline'
)

labels = LabelSet(x='generation', y='score', text='score', level='glyph', angle=.6,
                  x_offset=0, y_offset=5, source=labels_source)

p = figure(width=800,
           height=500,
           tools=[hover],
           title='Best fitted individual in each generation.')

p.line(x='generation', y='score', source=source, line_width=2)
p.xaxis.axis_label = 'Generation'
p.yaxis.axis_label = 'Score'
p.add_layout(labels)

show(p)