Copyright **`(c)`** 2023 Giovanni Squillero `<giovanni.squillero@polito.it>`  
[`https://github.com/squillero/computational-intelligence`](https://github.com/squillero/computational-intelligence)  
Free for personal or classroom use; see [`LICENSE.md`](https://github.com/squillero/computational-intelligence/blob/master/LICENSE.md) for details.  

# LAB9

Write a local-search algorithm (eg. an EA) able to solve the *Problem* instances 1, 2, 5, and 10 on a 1000-loci genomes, using a minimum number of fitness calls. That's all.

### Deadlines:

* Submission: Sunday, December 3 ([CET](https://www.timeanddate.com/time/zones/cet))
* Reviews: Sunday, December 10 ([CET](https://www.timeanddate.com/time/zones/cet))

Notes:

* Reviews will be assigned  on Monday, December 4
* You need to commit in order to be selected as a reviewer (ie. better to commit an empty work than not to commit)

In [52]:
import random
from random import choices
from copy import copy
import numpy as np
import lab9_lib

# Local Search:
Local search is a heuristic method for solving computationally hard optimization problems. Local search can be used on problems that can be formulated as finding a solution maximizing a criterion among a number of candidate solutions. Local search algorithms move from solution to solution in the space of candidate solutions (the search space) by applying local changes, until a solution deemed optimal is found or a time bound is elapsed.

In [53]:
l = 1000
problems = [1, 2, 5, 10]
half_pop_size = 5
µ = 2 * half_pop_size

## Implementation:
Idea: with the fitness function I can see which pieces of string are important and which are not, and therefore I can preserve the important pieces and throw away the useless ones.

IMPORTANT: PROMOTE DIVERSITY (I can do it in the selection, crossover and mutation)
* distance metric: 
  - how far the individual is from a subset of the population to the whole population
  - from a single individual
* property of the population

3 levels of diversity:
* phenotype
* genotype
* fitness

In [54]:
# invece che fare find distribution potrei fare direttamente una funzione che mi prende inf e sup di ogni serie consecutiva di 1
# oppure fare direttamente l'and bit a bit e contare il numero di 1

# Evaluate diversity
def evaluate_diversity(e1, e2):
    cnt = 0
    for b1, b2 in zip(e1, e2):
        if b1 and b2:
            cnt = cnt + 1 
    return cnt / len(e1)


In [55]:
def init_population():
    return [(choices([0, 1], k=l), 0.0) for _ in range(µ)]

def evaluate_population(population, fitness):
    return [(individual[0], fitness(individual[0])) for individual in population]

def select_with_replacement(population):
    # select a random individual from the population
    # find the individual with 
    # the highest fitness, 
    # the vector with the most distinct distribution of ones compared to the others
    div_matr = np.zeros((len(population), len(population)))
    for i1, p1 in enumerate(population):
        for i2, p2 in enumerate(population):
            if i1 != i2:
                #the matrix is not symmetric since is added only for p2[1]
                div_matr[i1][i2] = (evaluate_diversity(p1[0], p2[0]) + p2[1]) / 2

    # find index of the individuals with highest fitness and highest diversity
    i1, i2 = np.unravel_index(np.argmax(div_matr), div_matr.shape)

    return population[i1], population[i2]

def crossover(parent1, parent2):
    # a two (rand) point crossover for now
    # swapping of two substrings of the same len but in random position
    # I want try to implement a circular translated swapping
    v = parent1[0]
    w = parent2[0]
    c = random.randint(0, l)
    d = random.randint(0, l)
    if c > d:
        c, d = d, c
    if c != d:
        v[c:d], w[c:d] = w[c:d], v[c:d]
    return (v, 0.0), (w, 0.0)

def mutate(individual):
    # bit flip mutation for now
    p = 0.5
    v = individual[0]
    for i in range(l):
        if p >= random.random():
            v[i] = 1 - v[i]
    return individual


In [56]:
def genetic_algorithm(fitness):
    Best = None
    # 1. Initialize population
    population = init_population()
    population = evaluate_population(population, fitness)
    # 2. Repeat
    found = -1
    for i in range(100):
        for p in population:
            if Best is None or p[1] > Best[1]:
                Best = p
                found = fitness._calls
        
        if Best is not None and Best[1]==1:
            break
        
        q = list()
        for _ in range(µ//2):
            # 2.1 Select parents
            parent_a, parent_b = select_with_replacement(population)
            # 2.2 Crossover
            child_a, child_b = crossover(copy(parent_a), copy(parent_b))
            
            # 2.3 Mutate
            mutated_a = mutate(child_a)
            mutated_b = mutate(child_b)
            q.append(mutated_a)
            q.append(mutated_b)
            
        population = evaluate_population(q, fitness)
    
    # 4. Return best individual
    return Best, found

In [57]:
for _ in range(3):
    my_list = list()
    found = -1
    fitness = None
    for prob in problems:
        fitness = lab9_lib.make_problem(prob)
        b, found = genetic_algorithm(fitness)
        my_list.append((prob, b[1], fitness.calls))
    for m in my_list:
        print(f"Problem\t{m[0]}:\t{m[1]:.2%},\tCalls:\t{m[2]},\tBest found at \t{found} fitness calls")
    print("-------------------------------------")

Problem	1:	53.50%,	Calls:	1010,	Best found at 	790 fitness calls
Problem	2:	51.20%,	Calls:	1010,	Best found at 	790 fitness calls
Problem	5:	30.14%,	Calls:	1010,	Best found at 	790 fitness calls
Problem	10:	24.96%,	Calls:	1010,	Best found at 	790 fitness calls
-------------------------------------
Problem	1:	53.30%,	Calls:	1010,	Best found at 	680 fitness calls
Problem	2:	49.20%,	Calls:	1010,	Best found at 	680 fitness calls
Problem	5:	21.06%,	Calls:	1010,	Best found at 	680 fitness calls
Problem	10:	20.64%,	Calls:	1010,	Best found at 	680 fitness calls
-------------------------------------
Problem	1:	53.60%,	Calls:	1010,	Best found at 	500 fitness calls
Problem	2:	51.00%,	Calls:	1010,	Best found at 	500 fitness calls
Problem	5:	21.51%,	Calls:	1010,	Best found at 	500 fitness calls
Problem	10:	16.23%,	Calls:	1010,	Best found at 	500 fitness calls
-------------------------------------


In [58]:
fitness = lab9_lib.make_problem(1)
for n in range(10):
    ind = choices([0, 1], k=1000)
    print(f"{''.join(str(g) for g in ind[0:5])}...: {fitness(ind):.2%}")

print(fitness.calls)

11111...: 51.00%
10010...: 47.20%
11010...: 51.10%
00000...: 49.70%
10110...: 50.00%
00101...: 50.70%
01110...: 50.30%
10100...: 49.40%
10100...: 50.70%
00011...: 49.20%
10


In [59]:
fitness = lab9_lib.make_problem(1)
for n in range(10):
    ind = choices([0, 1], k=50)
    print(f"{''.join(str(g) for g in ind)}: {fitness(ind):.2%}")

print(fitness.calls)

00010000000101001001100101010000110001111100000011: 36.00%
10000110000111000110010001011100111001011100001110: 46.00%
10010001011101111100101010001001111011100000011000: 48.00%
10000111000010010110101001001100010100100110011101: 44.00%
10010001010010111001111100001011111101001100001101: 52.00%
01100000011000110100100001111101111110000110110011: 50.00%
01011010101010101011111001001000001101111100111000: 52.00%
00001111111011101011110001010100001010100011110011: 54.00%
01010111011010100010100011100001000010110010101010: 44.00%
11100011100010011011010000110110110010100011001011: 50.00%
10
