# Solving Problems by Searching

This notebook serves as supporting material for topics covered in **Chapter 3 - Solving Problems by Searching** and **Chapter 4 - Beyond Classical Search** from the book *Artificial Intelligence: A Modern Approach*. It uses implementations from the [search.py](https://github.com/aimacode/aima-python/blob/master/search.py) module. Let's start by importing the standard-library modules used below.

In [21]:

import random
import bisect
# Needed to hide warnings in the matplotlib sections
import warnings
#from numpy import *
warnings.filterwarnings("ignore")

For visualisations, we use networkx and matplotlib to show the map in the notebook and we use ipywidgets to interact with the map to see how the searching algorithm works. These are imported as required in `notebook.py`.

## CONTENTS

* Genetic Algorithm


## GENETIC ALGORITHM

Genetic algorithms (or GA) are inspired by natural evolution and are particularly useful in optimization and search problems with large state spaces.

Given a problem, algorithms in the domain make use of a *population* of solutions (also called *states*), where each solution/state represents a feasible solution. At each iteration (often called *generation*), the population gets updated using methods inspired by biology and evolution, like *crossover*, *mutation* and *natural selection*.
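
As a rough illustration of the selection step, the sketch below draws individuals with probability proportional to their fitness ("roulette wheel" selection), which is the idea behind the `select` function used later in this notebook. The toy population, the fitness values and the helper name `roulette_select` are made up for the example, and `random.choices` stands in for the notebook's own sampler.

```python
import random
from collections import Counter

# Hypothetical toy population with made-up fitness scores.
toy_population = ["A", "B", "C"]
toy_fitness = [1, 3, 6]

def roulette_select(population, fitnesses, k, rng=random):
    """Pick k individuals, each with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=k)

rng = random.Random(0)  # fixed seed so the run is repeatable
picks = Counter(roulette_select(toy_population, toy_fitness, 10_000, rng))
print(picks)  # "C" should be drawn most often, "A" least often
```

Fitter individuals are chosen more often, but weaker ones still have a chance, which helps keep variation in the population.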

PERMUTATION ENCODING

GA Parameters:

* Population size: we need to define the maximum size of the population. Larger populations have more variation but are computationally more expensive to run.
* Mutation rate: as our population is not very large, we can afford to keep a relatively large mutation rate.
* Termination: the algorithm stops after a predefined number of generations.
* Encoding: N is the size of the chromosomes, and [0, ..., N-1] is the alphabet. Each chromosome is a permutation of the alphabet.

In [22]:
max_population = 100
mutation_rate = 0.07 # about 7% of the chromosomes are mutated
ngen = 100 # maximum number of generations
N = 8 # chromosome size
gene_pool = list(range(N)) # alphabet
#gene_pool.remove(0)
print(gene_pool)

[0, 1, 2, 3, 4, 5, 6, 7]


Great! Now, we need to define the most important metric for the genetic algorithm, i.e. the fitness function. This will simply return the number of positions at which the generated sample matches the target list.

In [23]:
def fitness_fn(sample):
    # initialize fitness to 0
    fitness = 0
    for i in range(len(sample)):
        # increment fitness by 1 for every position that matches the target
        if sample[i] == target[i]:
            fitness += 1
    return fitness
# target
target = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
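
A quick check of this fitness function on two hand-picked samples may help. The snippet restates `fitness_fn` and `target` in compact form so that it runs on its own; the sample chromosomes are chosen for illustration. Because the gene pool is 0 to 7 and the target holds 8 at index 7, the last position can never match, so 7 is the highest score a chromosome of size 8 can reach.

```python
# Restated so the snippet is self-contained; same logic as fitness_fn above.
target = list(range(1, 21))  # [1, 2, ..., 20]

def fitness_fn(sample):
    """Count the positions where sample and target hold the same value."""
    return sum(1 for i in range(len(sample)) if sample[i] == target[i])

best_possible = [1, 2, 3, 4, 5, 6, 7, 0]   # matches target at positions 0..6
identity = [0, 1, 2, 3, 4, 5, 6, 7]        # every gene is one less than the target

print(fitness_fn(best_possible))  # 7
print(fitness_fn(identity))       # 0
```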

To generate `ngen` generations, we run a `for` loop `ngen` times. In each generation we set aside the best individual of the current population (elitism), build the rest of the new population by selection, crossover and mutation, and then print the best individual of the new generation together with its fitness value. Note that this version has no fitness threshold (`f_thres`) check, so it always runs for all `ngen` generations. Let's now write a function to do this.

In [24]:
def genetic_algorithm_stepwise(population, fitness_fn, gene_pool, ngen=1200, pmut=0.1):
    for generation in range(int(ngen)):
        # Elitism: remember the best individual of the current population
        previous_best = max(population, key=fitness_fn)
        # Breed len(population) - 1 children by selection, crossover and mutation
        population = [mutate2(uniform_crossover(*select(2, population, fitness_fn)), pmut) for i in range(len(population)-1)]
        # Carry the previous best over unchanged
        population.append(previous_best)
        # stores the individual genome with the highest fitness in the current population
        current_best = max(population, key=fitness_fn)
        print(f'Current best: {current_best}\t\tGeneration: {generation}\t\tFitness: {fitness_fn(current_best)}\r')
    return max(population, key=fitness_fn)

def init_population(pop_number, gene_pool):
    # a chromosome is a random permutation of the alphabet
    population = []
    for _ in range(pop_number):
        # Shuffle a copy of the gene pool; the whole shuffled list is one individual
        v = gene_pool[:]
        random.shuffle(v)
        population.append(v)
    return population

def select(r, population, fitness_fn):
    fitnesses = map(fitness_fn, population)
    #scaling here
    sampler = weighted_sampler(population, fitnesses)
    return [sampler() for i in range(r)]

def weighted_sampler(seq, weights):
    """Return a random-sample function that picks from seq weighted by weights."""
    totals = []
    for w in weights:
        totals.append(w + totals[-1] if totals else w)
    # bisect.bisect(a, x) returns the insertion position of x in the sorted list a
    return lambda: seq[bisect.bisect(totals, random.uniform(0, totals[-1]))]

def uniform_crossover(x, y):
    # x, y permutations of the alphabet
    n = 0
    child = [-1] * N
    indexes = [0] * N
    # copy the values of x at the positions where indexes[i] == 1 into the same positions of child
    for i in range(N):
        indexes[i] = random.randint(0, 1)
        if indexes[i] == 1:
            child[i] = x[i]
            n += 1
    # the remaining N-n values are copied from y in their relative order, starting from the beginning
    i = 0 # index into y
    k = 0 # index into child
    for t in range(N-n):
        while y[i] in child[:]:
            i += 1
        while child[k] != -1:
            k += 1
        child[k] = y[i]
        i += 1
    return child

def mutate2(x, pmut):
    if random.uniform(0, 1) >= pmut:
        return x
    i, j = random.sample(range(N), 2)
    x[i], x[j] = x[j], x[i]
    return x

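The `genetic_algorithm_stepwise` function defined above is essentially the same as the one defined in `search.py`, with the added functionality of printing out the data of each generation. Here it also keeps the best individual of each generation (elitism) and uses crossover and mutation operators that keep every chromosome a permutation of the alphabet.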
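
With permutation encoding, every child must remain a permutation of the alphabet. The sketch below restates the crossover and swap-mutation ideas in compact form and checks that property on random parents. The names `uniform_crossover_sketch` and `swap_mutation_sketch` are hypothetical and this is not the notebook's code; unlike `mutate2`, the mutation here is always applied and works on a copy.

```python
import random

def uniform_crossover_sketch(x, y, rng=random):
    """Keep a random subset of positions from x, then fill the gaps with the
    missing values in the order in which they appear in y."""
    child = [v if rng.randint(0, 1) == 1 else None for v in x]
    kept = {v for v in child if v is not None}
    fill = iter(v for v in y if v not in kept)
    return [v if v is not None else next(fill) for v in child]

def swap_mutation_sketch(x, rng=random):
    """Return a copy of x with two distinct positions swapped."""
    i, j = rng.sample(range(len(x)), 2)
    x = x[:]
    x[i], x[j] = x[j], x[i]
    return x

rng = random.Random(1)  # fixed seed so the run is repeatable
alphabet = list(range(8))
for _ in range(1000):
    x = rng.sample(alphabet, len(alphabet))
    y = rng.sample(alphabet, len(alphabet))
    child = swap_mutation_sketch(uniform_crossover_sketch(x, y, rng), rng)
    assert sorted(child) == alphabet  # the child is still a permutation
print("all children are valid permutations")
```

A plain position-wise crossover would not give this guarantee, since it can copy the same value from both parents and lose another.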

We have defined all the required functions and variables. Let's now create a new population and test the function we wrote above.

In [25]:
population = init_population(max_population, gene_pool)
#print(population)
solution = genetic_algorithm_stepwise(population, fitness_fn, gene_pool, ngen, mutation_rate)
print("Target: ")
print(target)
print("Solution: ")
print(solution)
print("Fitness: ")
print(fitness_fn(solution))

Current best: [1, 2, 3, 7, 4, 6, 0, 5]		Generation: 0		Fitness: 4
Current best: [1, 2, 3, 7, 4, 6, 0, 5]		Generation: 1		Fitness: 4
Current best: [1, 2, 3, 5, 0, 6, 4, 7]		Generation: 2		Fitness: 4
Current best: [1, 2, 3, 7, 4, 6, 0, 5]		Generation: 3		Fitness: 4
Current best: [1, 2, 3, 7, 5, 6, 0, 4]		Generation: 4		Fitness: 5
Current best: [1, 2, 3, 4, 5, 6, 0, 7]		Generation: 5		Fitness: 6
Current best: [1, 2, 3, 4, 5, 6, 0, 7]		Generation: 6		Fitness: 6
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 7		Fitness: 7
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 8		Fitness: 7
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 9		Fitness: 7
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 10		Fitness: 7
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 11		Fitness: 7
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 12		Fitness: 7
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 13		Fitness: 7
Current best: [1, 2, 3, 4, 5, 6, 7, 0]		Generation: 14		Fitness: 7
Curre

The genetic algorithm was able to converge! With `N = 8` the gene pool does not contain the value 8, so the last position can never match the target and 7 is the highest fitness attainable.
We encourage you to rerun the above cell and play around with `target`, `max_population`, `mutation_rate`, `ngen` and the other parameters to get a better intuition of how the algorithm works. To summarize, if we can define the problem states in a simple array format and if we can create a fitness function to gauge how good or bad our approximate solutions are, there is a high chance that we can get a satisfactory solution using a genetic algorithm.
- There is also a better GUI version of this program `genetic_algorithm_example.py` in the GUI folder for you to play around with.

#### Eight Queens

Let's take a look at a more complicated problem.

In the *Eight Queens* problem, we are tasked with placing eight queens on an 8x8 chessboard without any queen threatening the others (that is, no two queens may share a row, column or diagonal). In its general form the problem is defined as placing *N* queens on an NxN chessboard without any conflicts.

First we need to think about the representation of each solution. We can go the naive route of representing the whole chessboard with the queens' placements on it. That is definitely one way to go about it, but for the purpose of this tutorial we will do something different. We have eight queens, so we will have a gene for each of them. The gene pool will be numbers from 0 to 7, for the different columns. The *position* of the gene in the state will denote the row the particular queen is placed in.

For example, we can have the state "03304577". Here the first gene with a value of 0 means "the queen at row 0 is placed at column 0", for the second gene "the queen at row 1 is placed at column 3" and so forth.
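
To make the encoding concrete, the sketch below draws a state as a text board. The helper name `render_board` is hypothetical. Note that "03304577" repeats some columns, so it is not a permutation; the individuals generated by `init_population` in this notebook are permutations, so their queens never share a column.

```python
def render_board(state):
    """Return one text line per row: 'Q' marks the queen, '.' an empty square.
    state[row] is the column of the queen placed in that row."""
    n = len(state)
    return ["".join("Q" if int(col) == c else "." for c in range(n))
            for col in state]

for line in render_board("03304577"):
    print(line)
# Q.......
# ...Q....
# ...Q....
# Q.......
# ....Q...
# .....Q..
# .......Q
# .......Q
```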

We now need to think about the fitness function. In a graph coloring problem one would count the valid edges; the same thought process can be applied here. Instead of edges, though, we have pairs of queens. If two queens are not threatening each other, we say they are in a "non-attacking" position. We can, therefore, count how many such pairs there are.

Let's dive right in and initialize our population:

In [26]:
population = init_population(100, gene_pool)
print(population[:5])

[[4, 2, 7, 6, 3, 1, 0, 5], [6, 1, 4, 5, 2, 7, 0, 3], [6, 4, 5, 1, 7, 0, 3, 2], [7, 2, 5, 6, 1, 0, 3, 4], [5, 7, 1, 2, 4, 6, 0, 3]]


We have a population of 100 and each individual has 8 genes. The gene pool is the integers from 0 to 7, and each individual is a permutation of them. Above you can see the first five individuals.

Next we need to write our fitness function. Remember, queens threaten each other if they are in the same row, column or diagonal.

Since threats are mutual, we must take care not to count them twice. Therefore, for each queen, we will only check for conflicts with the queens after her.

A gene's value in an individual `q` denotes the queen's column, and the position of the gene denotes its row. Two queens can never share a row, since each gene position is a distinct row, so we only need to check whether two genes have the same value (same column). We also need to check for diagonals. A queen *a* is in the diagonal of another queen, *b*, if the difference of the rows between them is equal to either their difference in columns (for the diagonal on the right of *a*) or equal to the negative difference of their columns (for the left diagonal of *a*). The fitness function is given below.

In [27]:
def fitness(q):
    non_attacking = 0
    for row1 in range(len(q)):
        for row2 in range(row1+1, len(q)):
            col1 = int(q[row1])
            col2 = int(q[row2])
            row_diff = row1 - row2
            col_diff = col1 - col2

            if col1 != col2 and row_diff != col_diff and row_diff != -col_diff:
                non_attacking += 1

    return non_attacking

Note that the best score achievable is 28. That is because for each queen we only check the queens after her. For the first queen we check 7 other queens, for the second queen 6 others and so on. In short, the number of checks we make is the sum 7+6+5+...+1, which is equal to 7\*(7+1)/2 = 28.
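
The pair count and the perfect score can be checked with a short, self-contained sketch. The function `count_non_attacking` is a hypothetical, compact restatement of the fitness function above, written with `itertools.combinations`. The first test state is one of the individuals printed in the run further down, and the second puts every queen on the main diagonal.

```python
from itertools import combinations

def count_non_attacking(q):
    """Count pairs of queens that share neither a column nor a diagonal.
    q[row] is the column of the queen in that row, so rows never clash."""
    return sum(
        1
        for (r1, c1), (r2, c2) in combinations(enumerate(q), 2)
        if c1 != c2 and abs(r1 - r2) != abs(c1 - c2)
    )

n = 8
max_pairs = n * (n - 1) // 2  # number of distinct pairs of queens
print(max_pairs)                                       # 28
print(count_non_attacking([4, 6, 0, 2, 7, 5, 3, 1]))   # 28
print(count_non_attacking(list(range(8))))             # 0
```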

Note that `genetic_algorithm_stepwise` does not use a fitness threshold: it runs for the full `ngen` generations and returns the best individual it has found. Let's see how the genetic algorithm will fare.

In [28]:
solution = genetic_algorithm_stepwise(population, fitness, gene_pool, ngen=100)
print(solution)
#print(fitness(solution))

Current best: [4, 6, 0, 2, 7, 5, 3, 1]		Generation: 0		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 1		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 2		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 3		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 4		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 5		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 6		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 7		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 8		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 9		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 10		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 11		Fitness: 28
Current best: [2, 7, 3, 6, 0, 5, 1, 4]		Generation: 12		Fitness: 28
Current best: [5, 2, 4, 7, 0, 3, 1, 6]		Generation: 13		Fitness: 28
Current best: [4, 2, 0, 6, 1, 7, 5, 3]		Generation: 14		Fi

Above you can see the best individual of each generation and its fitness score. In this run a perfect score of 28 is already reached in generation 0.
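
The stepwise function in this notebook always runs a fixed number of generations. If an early stop is wanted, a threshold check along the following lines could be added. This is only a sketch: the helper name `best_if_reached`, the parameter name `f_thres` as used here and the toy fitness function are assumptions for illustration, not part of the code above.

```python
def best_if_reached(population, fitness_fn, f_thres):
    """Return the fittest individual if its fitness is at least f_thres,
    otherwise None. Passing f_thres=None disables the check."""
    if f_thres is None:
        return None
    best = max(population, key=fitness_fn)
    return best if fitness_fn(best) >= f_thres else None

def toy_fitness(individual):
    """Toy fitness for the example: number of genes equal to their index."""
    return sum(1 for i, g in enumerate(individual) if g == i)

toy_population = [[1, 0, 2, 3], [0, 1, 2, 3], [3, 2, 1, 0]]
print(best_if_reached(toy_population, toy_fitness, 4))  # [0, 1, 2, 3]
print(best_if_reached(toy_population, toy_fitness, 5))  # None
```

Inside the generation loop, such a check could follow the computation of `current_best` and return early when it yields an individual.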

This is where we conclude Genetic Algorithms.

<br>
This concludes the notebook.
Hope you learned something new!