# Genetic algorithms using `numpy`

In this demonstration, we will code up, step by step, a simple GA for optimizing a trivial function with constraints. Exploring this example further would be very useful for your project.

In [None]:
import numpy as np

# Do some IPython black magic (display the value of every expression in a cell, not just the last)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [None]:
SEED = 42
rng = np.random.default_rng(seed=SEED)

## Problem statement

We need to maximize the function $f(\mathbf{x}) = \sum_{i=1}^{6} w_i x_i$ for a given set of weights $w_i$, subject to the constraints $x_i \in [-4,4] \; \forall \; i$. The search domain is therefore $\mathbb{D} := [-4,4]^6$, and the optimization problem can be written succinctly as $\left(\mathbb{D}, \mathbb{R}, f, \geq \right)$.

## Ease of solving the problem

We note that this example is trivial: because the objective function is linear, for a given set of weights we only need to pick either $x_i=-4$ or $x_i=4$ for each component. That's precisely the point, however: since we know the solution to the problem, we can evaluate how the GA performs (as a search algorithm in itself, and with respect to its design parameters).

## Given
The weight vector is $\mathbf{w} = [6,8,-6,3,3,-4]$. For this case, the optimal solution is $\mathbf{x}^* = [4, 4, -4, 4, 4, -4]$, which gives a maximum possible objective value of $f^* = 120$.
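The optimum quoted above can be checked directly. The sketch below assumes nothing beyond the stated weights and bounds; the names `weights`, `bound`, `x_star` and `f_star` are illustrative and not part of the notebook.

```python
import numpy as np

weights = np.array([6, 8, -6, 3, 3, -4], dtype=np.float64)
bound = 4.0  # each x_i lies in [-bound, bound]

# For a linear objective, push each x_i to the bound that matches the sign of w_i
x_star = bound * np.sign(weights)
f_star = weights @ x_star  # equals bound * sum(|w_i|)

print(x_star.tolist())  # [4.0, 4.0, -4.0, 4.0, 4.0, -4.0]
print(f_star)           # 120.0
```

Note that this shortcut only works because $f$ is linear and no weight is zero; the GA itself makes no such assumption.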

In [None]:
N_WEIGHTS = 6 # Helper variable
WEIGHTS = np.array([6,8,-6,3,3,-4], dtype = np.float64) # Optional dtype argument

In [None]:
# Confirm the shape so that we are happy
np.shape(WEIGHTS)

# Representation

How do you want to encode this problem? We need to keep in mind that variation needs to be done on this representation.

In [None]:
# I can define my optimal solution vector now, since I have decided my representation
X_STAR = None # Placeholder: fill in based on representation

Now that we have picked a representation, let's start off the problem. We need to pick a population size (i.e. the number of individuals in the initial generation). Let's pick a nice number, say 10.

In [None]:
# Size of the population
POP_SIZE = 10

Remember! Each member of this population is a vector with shape `(6,)`. Is there a way to efficiently represent/generate/work with this entire population all at one go?

Of course! Put them together as a 2 dimensional array!

In [None]:
# Helper variable holding the (POP_SIZE, N_WEIGHTS) shape of the population
DOFS_IN_POP = (POP_SIZE, N_WEIGHTS)

The population will have `POP_SIZE` chromosomes, where each chromosome has `N_WEIGHTS` genes. Let's generate the initial population that our GA will work on. This population needs to be randomly initialized, say around $0.0$. (Hint: the `numpy.random` module comes to mind.)
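One possible initialization, shown only as a sketch: draw every gene uniformly from an interval centered on $0.0$. The half-width of 1.0 and the names `pop_size`, `n_weights` and `population` are assumptions made for illustration; a normal distribution would serve equally well.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
pop_size, n_weights = 10, 6

# Uniform samples centered on 0.0; the half-width of 1.0 is an arbitrary choice
population = rng.uniform(low=-1.0, high=1.0, size=(pop_size, n_weights))

print(population.shape)  # (10, 6)
```

Each row is one chromosome, so `population[k]` is the k-th individual and whole-population operations become single array expressions.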

In [None]:
curr_population = np.zeros(DOFS_IN_POP) # Placeholder: fill in with a random initialization

In [None]:
curr_population

# Fitness assignment
In this demo, we assign fitness directly using the objective function (without a penalty; we'll deal with constraints later on). However, you can decide what fitness assignment works best for your problem (competitive? informal?...)
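Because the population is a 2D array, the objective can be evaluated for all individuals at once with a matrix-vector product. The following is a minimal sketch under that assumption; `linear_fitness` is a hypothetical helper, not the notebook's `calc_fitness`.

```python
import numpy as np

weights = np.array([6, 8, -6, 3, 3, -4], dtype=np.float64)

def linear_fitness(pop, w):
    """Return f(x) = sum_i w_i x_i for every row x of pop, as a (len(pop),) array."""
    return pop @ w

pop = np.array([[4, 4, -4, 4, 4, -4],
                [0, 0, 0, 0, 0, 0],
                [1, 1, 1, 1, 1, 1]], dtype=np.float64)
fit = linear_fitness(pop, weights)
print(fit.tolist())  # [120.0, 0.0, 10.0]
```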

In [None]:
def calc_fitness(t_pop):
    """ Calculates fitness given the population using global weights
    The fitness function calculates the sum of products between each input and its corresponding weight.
    Returns a (POP_SIZE, ) numpy array
    """       
    # Remember : Fitness is one scalar value per individual
    fitness = np.zeros(t_pop.shape[0]) # Placeholder: fill in
    
    return fitness

In [None]:
# Sanity check
my_fitness = calc_fitness(curr_population)
print(my_fitness.shape)
print(my_fitness)

# Selection for variation

### How do you select parents to spread their genes? 
One way to do it is to select only the *best* parents to mate. This is an example of a **deterministic selection scheme**, wherein we rank the individuals by fitness and consider the best ones.

### How many parents to select for mating? 
This is up to the user. Some people prefer to use heuristics based on the population size for determining the mating pool size. Sometimes it depends on the selection scheme used (for example, in tournament selection, where the $T$ parameter, along with the population size, determines the mating pool size).

For simplicity in this demo, let's fix the number of parents that are selected to mate at every iteration and store it as a variable.
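Ranking by fitness and keeping the best ones is often called truncation selection. A hedged sketch of that idea follows; `select_top_k` and its arguments are illustrative names, and sorting with `np.argsort` is just one way to do the ranking.

```python
import numpy as np

def select_top_k(pop, fitness, k):
    """Truncation selection: keep the k fittest rows of pop."""
    # argsort is ascending, so reverse it and take the first k indices
    best_idx = np.argsort(fitness)[::-1][:k]
    return pop[best_idx], fitness[best_idx]

pop = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
fit = np.array([5.0, -1.0, 9.0, 2.0])
parents, parent_fit = select_top_k(pop, fit, k=2)
print(parent_fit.tolist())  # [9.0, 5.0]
```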


In [None]:
N_MATING = 0 # Fill in a number of your choice > 0 

def select_determinstic(t_pop, t_fitness):
    """ Given current population t_pop, select N_MATING number of parents using a deterministic selection scheme
    based on t_fitness (the fitness of the population) 

    Returns parents and their fitness as a tuple (parent, fitness_of_parents)
    """
    # Fill in details here 
    parents = np.empty((N_MATING, N_WEIGHTS))
    
    return (parents, calc_fitness(parents))

In [None]:
# Sanity check
my_parents, my_par_fitness = select_determinstic(curr_population, my_fitness)
my_parents
my_par_fitness

### Can you show me an example of stochastic selection?

Let's implement Stochastic Universal Sampling, which consists of

* Sampling rate assignment
* Sampling

I will demonstrate this in class as it's a bit more involved.
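As a preview, the two steps can be sketched as below. This is not necessarily the version shown in class: the sampling rates here are fitness-proportional after shifting the fitness to be non-negative (an assumption, needed because our objective can be negative), and `sus_select` is a hypothetical name.

```python
import numpy as np

def sus_select(pop, fitness, n_select, rng):
    """Stochastic Universal Sampling with shifted fitness-proportional rates."""
    # Sampling rate assignment: shift so that the worst individual gets a tiny
    # positive weight (assumption: fitness may be negative)
    shifted = fitness - fitness.min() + 1e-12
    cum_rates = np.cumsum(shifted / shifted.sum())
    cum_rates[-1] = 1.0  # guard against round-off

    # Sampling: n_select equally spaced pointers sharing one random offset
    step = 1.0 / n_select
    pointers = rng.uniform(0.0, step) + step * np.arange(n_select)

    # Each pointer picks the first individual whose cumulative rate exceeds it
    idx = np.searchsorted(cum_rates, pointers, side="right")
    idx = np.minimum(idx, len(pop) - 1)  # safety clamp for a pointer that rounds to 1.0
    return pop[idx], fitness[idx]

rng = np.random.default_rng(seed=0)
pop = np.arange(8, dtype=np.float64).reshape(4, 2)
fit = np.array([1.0, 2.0, 3.0, 10.0])
selected, selected_fit = sus_select(pop, fit, n_select=3, rng=rng)
print(selected_fit)
```

Unlike repeated roulette-wheel spins, a single random offset places all pointers, so an individual whose rate exceeds `1 / n_select` is guaranteed at least one copy.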

# Variation
Variation has two operations: crossover/recombination (produce new offspring from the parents selected in the prior step) and mutation (produce mutated/randomly offset offspring). Let's see each one separately.

### Recombination
For this demonstration, let's do a one-point crossover. This means we naively *mix* the solution vectors: we take some components from one parent and the rest from another parent.

Before proceeding, we need to decide how many offspring vectors to produce. This has to be a reasonable number and should depend on the population size and the number of parents selected for mating. It also corresponds to the $\lambda$ parameter of deterministic selection schemes.

We also need to select $p_c$, the crossover rate---it determines the probability of a crossover happening between two parents.

In our demo, let's fix $p_c$ to 1 (recombination always happens). We also fix the offspring size to $0.5 \cdot {\text{N-MATING}\choose 2}$, which equals 3 when `N_MATING = 4`. This allows every selected parent to propagate their genes. The parent vectors are paired in a fixed fashion (1 mates with 2, 2 with 3, and so on).

Where to crossover? Sometimes we use an additional random integer (respecting array range constraints) that determines at which location a crossover should occur. In our example below, let's perform a *uniform* one-point crossover (uniform in the sense that we always cross over at the same fixed index).
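The pairing and the fixed crossover index described above can be sketched as follows. `one_point_crossover` is a hypothetical helper; wrapping around to the first parent when more offspring than pairs are requested is an assumption of this sketch.

```python
import numpy as np

def one_point_crossover(parents, n_offspring, idx):
    """Child k takes genes [:idx] from parent k and genes [idx:] from parent k+1."""
    n_parents, n_genes = parents.shape
    offspring = np.empty((n_offspring, n_genes))
    for k in range(n_offspring):
        first = parents[k % n_parents]
        second = parents[(k + 1) % n_parents]  # wrap around (an assumption)
        offspring[k, :idx] = first[:idx]
        offspring[k, idx:] = second[idx:]
    return offspring

parents = np.array([[1.0] * 6, [2.0] * 6, [3.0] * 6, [4.0] * 6])
children = one_point_crossover(parents, n_offspring=3, idx=3)
print(children[0].tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```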

In [None]:
N_OFFSPRING =  0 # Fill in how many offspring you want
IDX_CROSSOVER = 9999 # Fill in at which index you want crossover

In [None]:
def crossover(t_parents):
    """ Given a set of parents, combine them and return offspring vectors
    
    Returns the offspring and their fitness
    """
    
    # Create an empty array to hold the offspring
    offspring = np.empty((N_OFFSPRING, N_WEIGHTS))

    # Fill in crossover details
    
    return (offspring, calc_fitness(offspring))

Does crossover create new offspring?

In [None]:
# Sanity check
my_offspring,  my_offspring_fitness = crossover(my_parents)
my_parents
my_offspring

It does! Are the offspring fitter than their parents?

In [None]:
# Check if recombination is useful
my_par_fitness
my_offspring_fitness

They are! This is good news...

### Mutation
For this demonstration, let's add uniform random numbers drawn from $[-0.5, 0.5)$ to the offspring. But let's not do it every time: we will do that with a probability $p_m = 0.5$, sampled for each offspring. Recall that this is the mutation rate parameter.
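A minimal sketch of this mutation rule, assuming that a selected offspring has noise added to every one of its genes (the text leaves this open); `mutate` is an illustrative name.

```python
import numpy as np

def mutate(offspring, p_m, rng):
    """With probability p_m per offspring, add uniform noise from [-0.5, 0.5)."""
    mutated = offspring.copy()
    # One Bernoulli(p_m) draw per row decides whether that offspring mutates
    mask = rng.random(len(offspring)) < p_m
    noise = rng.uniform(-0.5, 0.5, size=offspring.shape)
    mutated[mask] += noise[mask]
    return mutated

rng = np.random.default_rng(seed=1)
offspring = np.zeros((1000, 6))
mutated = mutate(offspring, p_m=0.5, rng=rng)
changed = np.any(mutated != offspring, axis=1)
print(changed.mean())  # close to p_m
```

Copying first keeps the input untouched, which makes the before/after comparison in a sanity check meaningful.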

In [None]:
PM = 0.5 # Mutation rate parameter p_m (probability that an offspring is mutated)
def mutation(t_offspring):
    """ Given a set of offspring, introduce mutation in them
   
    Returns the mutated offspring and their fitness
    """
    # Fill in details
    mutated_offspring = np.empty(t_offspring.shape)
    
    return (mutated_offspring, calc_fitness(mutated_offspring))

Can we see some mutation?

In [None]:
# Sanity checks
my_mut_offspring, my_mut_offspring_fitness = mutation(my_offspring)
my_offspring
my_mut_offspring

# Environmental selection
The final step is environmental selection. This comes in two parts:

1) Imposing any hard constraints, such as discarding individuals that violate the range limits

2) Picking the `POP_SIZE` best individuals and sending them to the next generation



Let's first throw out any individuals that do not adhere to the limits set by our constraints.
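Rejecting infeasible individuals amounts to a row-wise bounds test. The sketch below assumes the $[-4, 4]$ limits from the problem statement; `drop_out_of_bounds` is a hypothetical helper.

```python
import numpy as np

def drop_out_of_bounds(pop, low=-4.0, high=4.0):
    """Keep only the rows of pop whose genes all lie in [low, high]."""
    feasible = np.all((pop >= low) & (pop <= high), axis=1)
    return pop[feasible]

pop = np.array([[0.0, 4.0], [4.1, 0.0], [-4.0, -3.9], [0.0, -5.0]])
kept = drop_out_of_bounds(pop)
print(kept.tolist())  # [[0.0, 4.0], [-4.0, -3.9]]
```

Rejection can shrink the pool, so it is worth checking that enough individuals survive. Repairing violators with `np.clip` instead is a common alternative, though it changes the search behavior near the bounds.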

In [None]:
def hard_constraint(t_total_pop):
    """ Scans the array for individuals violating constraint bounds
    and chucks them out of processing

    t_total_pop includes all vectors (parents + mutated offspring)
    (this is done for you below, so don't worry about it)
    """
    # Fill in details on how you eject solutions not in decision space
    # or search space
    modified_pop = t_total_pop.copy()
    
    return modified_pop

Let's see if the constraint check passes. Let's concatenate all vectors into one big unit and do constraint check.

In [None]:
# Testing constraints 
# Group curr_population and my_mut_offspring here
# (my_parents are already contained in curr_population)
# Stack them on top of one another
total_population = np.vstack((curr_population, my_mut_offspring))

# Verify shape
total_population.shape

total_population = hard_constraint(total_population)

We then pick the top `POP_SIZE` individuals, based on their fitness.
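Since parents and offspring compete together, this is a plus-style (elitist) survivor selection. A sketch assuming the linear objective of this demo; `keep_best` and its arguments are illustrative names.

```python
import numpy as np

def keep_best(pop, weights, pop_size):
    """Return the pop_size fittest rows of pop, best first."""
    fitness = pop @ weights
    order = np.argsort(fitness)[::-1]  # indices sorted by descending fitness
    return pop[order[:pop_size]]

weights = np.array([1.0, -1.0])
pop = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]])
survivors = keep_best(pop, weights, pop_size=2)
print(survivors.tolist())  # [[2.0, 0.0], [1.0, 0.0]]
```

If fewer than `pop_size` individuals are available, the slice simply returns all of them.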

In [None]:
def environmental_selection(t_total_pop):
    """ Calculate total population (after constraint checking) fitness,
    rank accordingly and select only the top POP_SIZE individuals to
    pass on to the next generation
    """ 

    # Fill in details
    
    return t_total_pop

Check whether selecting top 10 works...

In [None]:
print("Before")
total_population.shape
calc_fitness(total_population)
new_population = environmental_selection(total_population)
print("After")
new_population.shape
calc_fitness(new_population)

# Let's hook them all up together!

And add some utilities to help us track progress

In [3]:
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
best_outputs = []
num_generations = 1000
curr_population = rng.uniform(low=-4.0, high=4.0, size=DOFS_IN_POP)
overall_max_fitness = -99999

# Run many iterations
# You should also have another convergence check
for generation in range(num_generations):
    print("Generation : ", generation)

    # Measuring the fitness of each chromosome in the population.
    fitness = calc_fitness(curr_population)

    # print("Fitness")
    # print(fitness)

    max_fitness = np.max(fitness)
    # Keep track of the best fitness seen across all generations so far
    overall_max_fitness = max(overall_max_fitness, max_fitness)

    # The best result in the current iteration.
    print("Best result in current iteration {0} compared to overall {1}".format(max_fitness, overall_max_fitness))
    best_outputs.append(max_fitness)
    
    # Selecting the best parents in the population for mating.
    parents, _ = select_determinstic(curr_population, fitness)
#     parents, _ = select_stochastic(curr_population, fitness)
    
    # print("Parents")
    # print(parents)

    # Generating next generation using crossover.
    offspring_crossed, _ = crossover(parents)

    # print("Crossover")
    # print(offspring_crossed)

    # Adding some variations to the offspring using mutation.
    offspring_mutated, _ = mutation(offspring_crossed)

    # print("Mutation")
    # print(offspring_mutated)

    # Check for constraints
    total_population = np.vstack((curr_population, offspring_mutated))
    total_population = hard_constraint(total_population)

    # Environmental selection
    curr_population = environmental_selection(total_population)
              
# Get the best solution after all generations have finished.
# First, calculate the fitness of each solution in the final generation.
fitness = calc_fitness(curr_population)

# Then return the index of that solution corresponding to the best fitness.
max_idx = np.argmax(fitness)

print("Best solution : ", curr_population[max_idx, :])
print("Best solution fitness : ", fitness[max_idx])

Let's see how our GA performed!

In [None]:
plt.figure(figsize=(12,12))
plt.plot(best_outputs,'-o', lw=3, ms=20, label='from scratch')
plt.xlabel("Iteration")
plt.ylabel("Fitness")
plt.show()

In [None]:
plt.figure(figsize=(12,12))
plt.plot(120-np.array(best_outputs),'-o', lw=3, ms=20, label='from scratch')
plt.xlabel("Iteration")
plt.ylabel("Gap to optimum (120 - best fitness)")
plt.show()

## Yay! Looks like it works, although not very well. Explore and see what can be improved

Acknowledgement: Content from this notebook is drawn from Ahmed Gad's GA implementation, found on his [GitHub](https://github.com/ahmedfgad/GeneticAlgorithmPython).

## We are not done yet!

To have a black box optimizer, we need to wrap all this functionality into something that's abstracted away from the end user (in this case you). What does that mean? Think about:
- Change in the cost function (Knapsack, Rastrigin)
    - Function changes (and along with it discontinuities, ill-conditioning, non-separability...)
    - Dimensionality of the problem changes (n=1,...,100,...1000?)
    - Other concerns
- Change in the encoding (reals, bitvectors)
- ...

That is to say, write something well once and keep on reusing it---functions! Also store associated variables---classes!
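To make the idea concrete, here is one possible skeleton, offered only as a starting point and not as the expected homework solution. Every name in it (`SimpleGA`, `fitness_fn`, `bounds`, and so on) is hypothetical, and it makes simplifying assumptions: midpoint one-point crossover between neighbouring parents, clipping instead of rejection, and a fixed number of generations.

```python
import numpy as np

class SimpleGA:
    """Hypothetical minimal GA wrapper; all names here are illustrative."""

    def __init__(self, fitness_fn, n_genes, bounds, pop_size=10, n_mating=4,
                 p_m=0.5, seed=42):
        self.fitness_fn = fitness_fn  # maps an (n, n_genes) array to an (n,) array
        self.n_genes = n_genes
        self.low, self.high = bounds
        self.pop_size = pop_size
        self.n_mating = n_mating
        self.p_m = p_m
        self.rng = np.random.default_rng(seed)

    def _step(self, pop):
        # Deterministic parent selection: the n_mating fittest individuals
        parents = pop[np.argsort(self.fitness_fn(pop))[::-1][:self.n_mating]]
        # One-point crossover at the midpoint between neighbouring parents
        cut = self.n_genes // 2
        children = np.hstack((parents[:-1, :cut], parents[1:, cut:]))
        # Mutate each child with probability p_m
        mask = self.rng.random(len(children)) < self.p_m
        children[mask] += self.rng.uniform(-0.5, 0.5, size=children.shape)[mask]
        # Repair by clipping (an assumption; rejection is another option)
        children = np.clip(children, self.low, self.high)
        # Environmental selection over parents and offspring together
        total = np.vstack((pop, children))
        return total[np.argsort(self.fitness_fn(total))[::-1][:self.pop_size]]

    def run(self, n_generations):
        pop = self.rng.uniform(self.low, self.high,
                               size=(self.pop_size, self.n_genes))
        for _ in range(n_generations):
            pop = self._step(pop)
        fitness = self.fitness_fn(pop)
        best = np.argmax(fitness)
        return pop[best], fitness[best]

weights = np.array([6, 8, -6, 3, 3, -4], dtype=np.float64)
ga = SimpleGA(lambda pop: pop @ weights, n_genes=6, bounds=(-4.0, 4.0))
best_x, best_f = ga.run(n_generations=200)
print(best_x, best_f)
```

Passing the fitness as a callable is what lets the same class be reused for a different cost function; swapping the encoding would additionally require pluggable variation operators.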

## HW : Wrap this up in a GA class and reuse it in your applications (Project 1)

In [None]:
a = np.sin

In [None]:
a

In [None]:
def fitness(my_func):
    return my_func(2.0)
    
fitness(a)