# Genetic Algorithm - Introduction

A genetic algorithm is a type of evolutionary algorithm, a family of optimization methods that also includes evolution strategies (ES). We apply biologically inspired operations (selection, crossover, and mutation) to a problem. In our case we have 7 Zoom participants, some of whom are more important than others. We want to show as many important people as possible, but the cognitive cost grows with every person shown. Who should we show? On larger problems, where exhaustive search is infeasible, this method typically outperforms random search.
Check out this package if you want a reliable, widely used ES solver in Python: https://pypi.org/project/cma/, which implements CMA-ES: https://en.wikipedia.org/wiki/CMA-ES

```python
import random
import numpy as np
import copy
```

We define our problem as a nested list.
Each sublist has three items: the participant's name, their importance, and a flag indicating whether they are shown (0 or 1).

```python
# [who, importance, show]
problem = [
    ["Velko", 1, 0],
    ["Andreas", 2.22, 0],
    ["Otmar", 3, 0],
    ["Hugo", 2, 0],
    ["Christian", 3, 0],
    ["David", 2.5, 0],
    ["Thomas", 1.25, 0],
]
```
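The solver only ever changes the third field of each sublist, so a chromosome is effectively a list of seven 0/1 genes. Below is a minimal sketch for converting between the two views. The helper names `to_genes` and `from_genes` are hypothetical and do not appear in the original notebook. The data is repeated so the block runs on its own.

```python
import copy

# Same data as above: [who, importance, show]
problem = [
    ["Velko", 1, 0],
    ["Andreas", 2.22, 0],
    ["Otmar", 3, 0],
    ["Hugo", 2, 0],
    ["Christian", 3, 0],
    ["David", 2.5, 0],
    ["Thomas", 1.25, 0],
]


def to_genes(solution):
    """Hypothetical helper: extract the 0/1 show flags (the genes) from a nested-list solution."""
    return [person[2] for person in solution]


def from_genes(template, genes):
    """Hypothetical helper: deep copy `template` and overwrite its show flags with `genes`."""
    solution = copy.deepcopy(template)  # leaves the template untouched
    for person, gene in zip(solution, genes):
        person[2] = gene
    return solution


candidate = from_genes(problem, [0, 1, 1, 0, 1, 1, 0])
print([person[0] for person in candidate if person[2] == 1])  # prints ['Andreas', 'Otmar', 'Christian', 'David']
print(to_genes(candidate))  # prints [0, 1, 1, 0, 1, 1, 0]
```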

## Objective

Our objective is relatively straightforward. If a person is shown, we add their
importance to our score. We then subtract a penalty equal to the square of the number of people shown.
The two terms are weighted (0.8 for importance, 0.2 for the penalty) to set the trade-off between showing people and cognitive cost.
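As a formula, let $x_i \in \{0, 1\}$ be the show flag and $w_i$ the importance of participant $i$. The objective is:

$$
f(x) = 0.8 \sum_{i=1}^{7} w_i x_i \;-\; 0.2 \left( \sum_{i=1}^{7} x_i \right)^2
$$

For example, showing only Otmar and Christian ($w = 3$ each) gives $0.8 \cdot 6 - 0.2 \cdot 2^2 = 4.8 - 0.8 = 4.0$. Adding David ($w = 2.5$) gives $0.8 \cdot 8.5 - 0.2 \cdot 3^2 = 6.8 - 1.8 = 5.0$. Because the penalty is quadratic, each extra person costs more than the previous one: going from $k$ to $k+1$ shown people increases the penalty by $0.2\left((k+1)^2 - k^2\right) = 0.2(2k+1)$.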

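```python
def objective(possible_solution):
    """Weighted importance of the shown people minus a quadratic penalty on how many are shown."""
    score = 0
    n_shown = 0
    for person in possible_solution:
        score += person[1] * person[2]  # importance only counts if the person is shown (flag == 1)
        n_shown += person[2]
    return 0.8 * score - 0.2 * n_shown**2
```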

## Solver
This solver is already considerably more complex than a random or exhaustive (full) search, either of which could suffice for a problem this small.
Plenty of libraries implement genetic algorithms; feel free to use those. Comments in the code explain individual elements in more detail.
In general, evolutionary algorithms are built around the notion of a population.
A population consists of chromosomes (also called individuals), and each chromosome is made up of genes. Here there is one gene per participant: their show flag.
Each individual encodes a candidate solution.
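With only seven participants there are $2^7 = 128$ possible selections. A full (exhaustive) search is therefore cheap and gives the exact optimum to check the genetic algorithm against. Below is a minimal sketch that assumes the same importances and objective as above. Both are redefined here (the objective as the hypothetical `objective_bits`, working on plain 0/1 lists) so the block runs on its own.

```python
import itertools

# (who, importance) pairs, same values as the problem definition above
participants = [
    ("Velko", 1), ("Andreas", 2.22), ("Otmar", 3), ("Hugo", 2),
    ("Christian", 3), ("David", 2.5), ("Thomas", 1.25),
]


def objective_bits(genes):
    """Same objective as above, but evaluated on a plain sequence of 0/1 show flags."""
    score = sum(importance * g for (_, importance), g in zip(participants, genes))
    n_shown = sum(genes)
    return 0.8 * score - 0.2 * n_shown ** 2


# Enumerate all 2**7 = 128 combinations of show flags and keep the best one.
best_genes = max(itertools.product([0, 1], repeat=len(participants)), key=objective_bits)
best_names = [name for (name, _), g in zip(participants, best_genes) if g == 1]
print(best_names, round(objective_bits(best_genes), 3))
# prints ['Andreas', 'Otmar', 'Christian', 'David'] 5.376
```

This is the same selection and score that the genetic solver returns in the run at the end of this notebook.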

<img src="figures/genetic_algorithm.png">

We start by creating a random initial population of a certain size. For each individual in this population
we calculate its fitness (in our case the objective defined above; higher is better). We select the fittest
individuals as parents, and from this selection we randomly pick two, which spawn a child. To build the child we
randomly select a cut-off point: genes before this point come from parent A, genes from this point onward come
from parent B (single-point crossover). There are multiple other ways to do this. The important factor is that
which parent a gene comes from is randomized.
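The selection and crossover steps can be sketched on plain 0/1 gene lists. This is a minimal illustration, not the notebook's solver. `select_parents` and `single_point_crossover` are hypothetical names. Truncation selection (keep the top `n_parents` by fitness) is assumed because that is what the solver below does.

```python
import random


def select_parents(scored_population, n_parents):
    """Truncation selection: return the genes of the n_parents fittest (genes, fitness) pairs."""
    ranked = sorted(scored_population, key=lambda pair: pair[1], reverse=True)
    return [genes for genes, _ in ranked[:n_parents]]


def single_point_crossover(parent_a, parent_b, rng=random):
    """Genes before a random cut come from parent_a, the rest from parent_b."""
    # Cut in 1..len-1 so that both parents contribute at least one gene.
    cut = rng.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:]


rng = random.Random(0)
# (genes, fitness) pairs; fitness values follow the objective defined above
scored = [([1, 1, 1, 1, 1, 1, 1], 2.176), ([0, 1, 1, 0, 1, 1, 0], 5.376),
          ([0, 0, 1, 0, 1, 0, 0], 4.0), ([0, 0, 0, 0, 0, 0, 0], 0.0)]
parents = select_parents(scored, n_parents=2)
child = single_point_crossover(*rng.sample(parents, 2), rng=rng)
print(parents)  # prints [[0, 1, 1, 0, 1, 1, 0], [0, 0, 1, 0, 1, 0, 0]]
print(child)
```

Restricting the cut to `1..len-1` is a design choice: a cut at either end would simply copy one parent.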

Afterwards, the child may be mutated. We only use a swap mutation, in which two genes swap places;
other mutations are possible. Mutation keeps diversity in the population and lowers the
probability of getting stuck in a local optimum.
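Below is a sketch of the swap mutation described above, with a bit-flip mutation alongside for comparison. Both are hypothetical helpers operating on 0/1 gene lists. One property is worth noting: swapping two show flags never changes how many people are shown. With swap mutation alone, only crossover can change the count, whereas a bit-flip mutation can change it directly.

```python
import random


def swap_mutation(genes, rng=random):
    """Return a copy of genes with the values at two random positions swapped."""
    mutated = list(genes)
    i, j = rng.sample(range(len(mutated)), 2)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def bit_flip_mutation(genes, rate, rng=random):
    """Return a copy of genes where each gene is flipped (0 <-> 1) with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genes]


rng = random.Random(42)
genes = [0, 1, 1, 0, 1, 1, 0]
swapped = swap_mutation(genes, rng)
flipped = bit_flip_mutation(genes, rate=0.2, rng=rng)
print(swapped, sum(swapped))  # the number of ones (people shown) is unchanged
print(flipped, sum(flipped))  # the number of ones may change
```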

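We repeat this until convergence. Our convergence criterion is an arbitrary choice: we stop as soon as a generation's
average fitness is no longer above 95% of the previous generation's average. Defining convergence well is both challenging
and important.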
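The stopping rule used in the solver below can be isolated into a small function. This sketch uses the hypothetical name `has_converged` and mirrors the solver's check. For contrast, it also shows a patience-based rule (the hypothetical `stalled`), which stops after a fixed number of generations without a new best fitness. That rule is a common alternative and is not what this notebook uses.

```python
def has_converged(previous_average, current_average, ratio=0.95):
    """Solver's rule: stop once the average fitness is no longer above ratio * previous average."""
    return not current_average > ratio * previous_average


def stalled(best_history, patience=5):
    """Alternative rule: stop when the best fitness has not improved in the last `patience` generations."""
    if len(best_history) <= patience:
        return False
    return max(best_history[-patience:]) <= max(best_history[:-patience])


print(has_converged(float("-inf"), 3.2))  # False: the first generation always continues
print(has_converged(4.0, 3.9))  # False: 3.9 is still above 0.95 * 4.0 = 3.8
print(has_converged(4.0, 3.7))  # True
print(stalled([4.0, 5.0, 5.376, 5.376, 5.376], patience=2))  # True: no improvement in the last 2
print(stalled([4.0, 5.0, 5.376], patience=2))  # False
```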

```python
def genetic_solver(problem, population_size=50, n_parents=8, mutation_probability=0.05):
    population = []
    previous_average_fitness = -np.inf
    best_sample = None
    best_sample_score = -np.inf
    n_genes = len(problem)  # one gene (show flag) per participant

    # Define the original population.
    for _ in range(population_size):
        # Python has the annoying thing that appending a list to a list only stores a
        # reference to it, so changing one nested list changes it everywhere it is used.
        # We need to make a deepcopy.
        chromosome = copy.deepcopy(problem)
        for person in chromosome:
            person[2] = random.randint(0, 1)
        population.append((chromosome, objective(chromosome)))

    while True:
        children = []
        average_fitness = 0

        # Sort the population by fitness and select the best individuals as parents.
        population.sort(key=lambda pair: pair[1], reverse=True)
        parent_population = [parent[0] for parent in population[:n_parents]]

        # Make sure the number of children equals the population size, so that it doesn't change.
        while len(children) < population_size:

            # Randomly select two parents.
            parents = random.sample(parent_population, 2)

            # Randomly select a crossover point over the genes (participants), not over the
            # three fields of a sublist. The range 1..n_genes-1 lets both parents contribute.
            cross_over_point = random.randint(1, n_genes - 1)

            # Make a new child/chromosome/individual: genes before the crossover point come
            # from parent A, the rest from parent B. Deep-copy so that mutating the child
            # can never modify the parents (or a previously stored best sample).
            child = copy.deepcopy(parents[0][:cross_over_point] + parents[1][cross_over_point:])

            # With a small probability, swap the show flags of two random genes.
            if random.random() < mutation_probability:
                i, j = random.sample(range(n_genes), 2)
                child[i][2], child[j][2] = child[j][2], child[i][2]

            # Calculate the fitness of this child. Always store the best one seen so far.
            childs_fitness = objective(child)
            if childs_fitness > best_sample_score:
                best_sample_score = childs_fitness
                best_sample = child

            # Save the child and accumulate the average fitness of this generation.
            average_fitness += childs_fitness / population_size
            children.append((child, childs_fitness))

        # Continue until converged. We define convergence as the average fitness no longer
        # being above 95% of the previous generation's average fitness.
        if average_fitness > 0.95 * previous_average_fitness:
            previous_average_fitness = average_fitness
            population = children
        else:
            return (best_sample, best_sample_score)
```

```python
print(genetic_solver(problem))
%timeit genetic_solver(problem)
```

```
([['Velko', 1, 0], ['Andreas', 2.22, 1], ['Otmar', 3, 1], ['Hugo', 2, 0], ['Christian', 3, 1], ['David', 2.5, 1], ['Thomas', 1.25, 0]], 5.376)
47.9 ms ± 8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
