# LAB9
Local-search algorithm able to solve the proposed problem with instances 1, 2, 5, and 10 on 1000-locus genomes,  
using a minimum number of fitness calls and treating the fitness function as a black box.

### Quick summary since the notebook is quite long:
In this notebook you can find these principal sections:
- a first genetic algorithm with tournament selection, mutation, local-search mutation, crossover and elitism
- some improvements:
    - (1) self-adaptive mutation rate and a counter to stop early when there is no improvement **(accepted)**
    - (2) implementation of n-cut crossover **(accepted)**
    - (3) mutation that flips all the bits in a certain range **(discarded)**
    - (4) mutation that swaps 2 bits of an individual **(accepted)**

I kept these sections separate and re-ran the genetic algorithm in each one to show the actual improvements and failures.  
The last improvement is the final form of the genetic algorithm.

In [1]:
from random import random, randint, choice
import numpy as np
from dataclasses import dataclass
from copy import copy

import lab9_lib

### Parameters definition:

In [2]:
GENOME_LENGTH = 1000 # length of each sequence
POPULATION_SIZE = 100 # number of sequences
MAX_GENERATIONS = 1000 # max number of generations
CHAMPIONS_NUMBER = 5 # number of top individuals used a priori as parents for part of the next generation
MUTATION_PROBABILITY = 0.1 # probability of choosing mutation instead of crossover
TOURNAMENT_SIZE = 5 # number of individuals drawn for each tournament

### Basic methods useful for genetic algorithms:

In [3]:
@dataclass
class Individual:
    fitness: float  # value returned by the black-box fitness; None until evaluated
    genotype: list[bool]

def select_parent(pop):
    pool = [choice(pop) for _ in range(TOURNAMENT_SIZE)]
    champion = max(pool, key=lambda i: i.fitness)
    return champion

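def mutate(ind: Individual) -> Individual:
    # copy the genotype list explicitly: copy(ind) is shallow and would share it with the parent
    offspring = Individual(fitness=None, genotype=ind.genotype.copy())
    pos = randint(0, GENOME_LENGTH-1)
    offspring.genotype[pos] = not offspring.genotype[pos]
    return offspring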

def one_cut_xover(ind1: Individual, ind2: Individual) -> Individual:
    cut_point = randint(0, GENOME_LENGTH-1)
    offspring = Individual(fitness=None,
                           genotype=ind1.genotype[:cut_point] + ind2.genotype[cut_point:])
    assert len(offspring.genotype) == GENOME_LENGTH
    return offspring

### My first proposed genetic algorithm
I tried to improve the final best fitness over a basic genetic algorithm (simple crossover and mutation)  
by adding two techniques:
- a local-search mutation that flips all the bits in a small neighborhood around a random position
- a fixed share of each new generation bred from a few "strong" individuals (the champions), with the tournament restricted to them

Note that I did not tune the parameters for each problem instance before launching  
the genetic algorithm: in my opinion that would be cheating, because the fitness would no longer be  
treated as a black box from the start. The only fair alternative would be to add all the fitness calls  
spent on tuning to the final count. Since the problem asks for a high fitness with the minimum number  
of fitness calls, I chose values that made sense for the genome length. In a second version of the  
algorithm (section "Some improvements (1)") I use a self-adaptive mutation rate instead of a tuned one.
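
To make the budget argument concrete, here is a minimal sketch of how tuning evaluations could be charged to the same counter as the final run. The cells in this notebook read the counter from `fitness.calls`; the wrapper below only imitates that idea. `CountingFitness` and `toy_fitness` are hypothetical names that are not part of `lab9_lib`, and the toy function is just a stand-in for the black box.

```python
from random import Random


class CountingFitness:
    """Wrap any fitness function and count every evaluation (hypothetical helper)."""

    def __init__(self, fitness_fn):
        self._fitness_fn = fitness_fn
        self.calls = 0

    def __call__(self, genotype):
        self.calls += 1
        return self._fitness_fn(genotype)


def toy_fitness(genotype):
    # Stand-in for the black box (assumption for this demo): fraction of True bits.
    return sum(genotype) / len(genotype)


rng = Random(0)
fitness = CountingFitness(toy_fitness)

# "Tuning" phase: 30 evaluations spent before the real run starts.
tuning_genomes = [[rng.random() < 0.5 for _ in range(20)] for _ in range(30)]
tuning_scores = [fitness(g) for g in tuning_genomes]
tuning_calls = fitness.calls

# "Final" phase: 10 more evaluations on the same counter.
final_scores = [fitness(g) for g in tuning_genomes[:10]]
total_calls = fitness.calls

print(f"tuning calls: {tuning_calls}, total calls reported: {total_calls}")
```

Because both phases share one wrapped function, the reported total cannot hide the cost of tuning.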

In [4]:
def local_search_mutation(ind: Individual) -> Individual:
    # flip every bit in a window of up to 2 * neighborhood_size + 1 positions around a random locus
    # (the genotype list is copied explicitly because copy(ind) is shallow)
    offspring = Individual(fitness=None, genotype=ind.genotype.copy())
    pos = randint(0, GENOME_LENGTH - 1)
    neighborhood_size = 5
    for i in range(max(0, pos - neighborhood_size), min(GENOME_LENGTH, pos + neighborhood_size + 1)):
        offspring.genotype[i] = not offspring.genotype[i]
    return offspring

In [17]:
def genetic_algorithm(problem_instance):
    fitness = lab9_lib.make_problem(problem_instance)

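    population = [
        Individual(
            genotype=[choice((True, False)) for _ in range(GENOME_LENGTH)],
            fitness=None,  # evaluated right below
        )
        for _ in range(POPULATION_SIZE)
    ]

    for i in population:
        i.fitness = fitness(i.genotype)
    # sort once so that population[:CHAMPIONS_NUMBER] holds the best individuals from generation 0
    population.sort(key=lambda ind: ind.fitness, reverse=True)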

    for generation in range(MAX_GENERATIONS):
        offspring = list()
        for counter in range(POPULATION_SIZE):
            if counter <= CHAMPIONS_NUMBER:
                if random() < MUTATION_PROBABILITY:
                    # mutation of one champion
                    p = select_parent(population[:CHAMPIONS_NUMBER])
                    if random() < 0.5:
                        o = mutate(p)
                    else:
                        o = local_search_mutation(p)
                else:
                    # xover between 2 champions
                    p1 = select_parent(population[:CHAMPIONS_NUMBER])
                    p2 = select_parent(population[:CHAMPIONS_NUMBER])
                    o = one_cut_xover(p1, p2)
            else:
                if random() < MUTATION_PROBABILITY:
                    # mutation
                    p = select_parent(population)
                    if random() < 0.5:
                        o = mutate(p)
                    else:
                        o = local_search_mutation(p)
                else:
                    # xover
                    p1 = select_parent(population)
                    p2 = select_parent(population)
                    o = one_cut_xover(p1, p2)
            offspring.append(o)

        for i in offspring:
            i.fitness = fitness(i.genotype)
        population.extend(offspring)
        population.sort(key=lambda ind: ind.fitness, reverse=True)
        population = population[:POPULATION_SIZE]

    best_individual = population[0]
    print(f"Problem instance = {problem_instance}")
    print(f"- Best fitness: {(best_individual.fitness):.2%}")
    print(f"- Number of calls: {fitness.calls}\n")

problem_instances = [1, 2, 5, 10]
for problem_instance in problem_instances:
    genetic_algorithm(problem_instance)

Problem instance = 1
- Best fitness: 98.00%
- Number of calls: 100100

Problem instance = 2
- Best fitness: 71.00%
- Number of calls: 100100

Problem instance = 5
- Best fitness: 36.79%
- Number of calls: 100100

Problem instance = 10
- Best fitness: 21.20%
- Number of calls: 100100
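
The identical call count on every instance follows directly from the parameters: one evaluation per individual of the initial population, then one per offspring in each generation. A small sketch of that arithmetic (`expected_calls` is a hypothetical helper, not part of the notebook):

```python
POPULATION_SIZE = 100
MAX_GENERATIONS = 1000


def expected_calls(generations_run, population_size=POPULATION_SIZE):
    # initial population + one offspring batch of the same size per generation
    return population_size + generations_run * population_size


print(expected_calls(MAX_GENERATIONS))
```

With no early stopping, every run of this first version costs the same number of fitness calls regardless of the instance.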



### Some improvements (1)
Here I implemented a self-adaptive mutation rate. On average it seems to give an improvement,  
mainly on problem instances 5 and 10.  
I also added a counter that stops the run early once the best fitness has failed to improve for  
`max_non_improvement_gen` generations in total (the counter is never reset), which reduces the number of fitness calls.
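
The update rule can be sketched in isolation as follows. The step of 0.1 and the bounds 0.1 and 1.0 mirror the values used in the cell below; `update_mutation_probability` is a hypothetical name introduced only for this illustration, and the fitness pairs are made-up numbers.

```python
def update_mutation_probability(p, best_mutant_fitness, best_current_fitness,
                                step=0.1, low=0.1, high=1.0):
    """Raise p when the best mutant beats the current best, lower it otherwise."""
    if best_mutant_fitness > best_current_fitness:
        p += step
    else:
        p -= step
    # keep the probability inside [low, high]
    return max(low, min(high, p))


p = 0.5
history = []
# (best mutant fitness, best fitness in the current population): illustrative values
for mutant, current in [(0.6, 0.5), (0.6, 0.6), (0.55, 0.6), (0.7, 0.6)]:
    p = update_mutation_probability(p, mutant, current)
    history.append(round(p, 2))

print(history)
```

A tie counts as a failure, so the rate drifts toward its lower bound during long plateaus and recovers only when a mutant strictly improves on the best individual.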

In [6]:
def genetic_algorithm_adaptive(problem_instance):
    fitness = lab9_lib.make_problem(problem_instance)

    stop_counter = 0
    max_non_improvement_gen = 500
    mutation_probability = 0.5

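    population = [
        Individual(
            genotype=[choice((True, False)) for _ in range(GENOME_LENGTH)],
            fitness=None,  # evaluated right below
        )
        for _ in range(POPULATION_SIZE)
    ]

    for i in population:
        i.fitness = fitness(i.genotype)
    # sort once so that population[0] and population[:CHAMPIONS_NUMBER] are the best from generation 0
    population.sort(key=lambda ind: ind.fitness, reverse=True)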

    for generation in range(MAX_GENERATIONS):
        if stop_counter >= max_non_improvement_gen:
            break
        offspring = []
        offspring_m = []
        offspring_x = []
        for counter in range(POPULATION_SIZE):
            if counter <= CHAMPIONS_NUMBER:
                if random() < mutation_probability:
                    p = select_parent(population[:CHAMPIONS_NUMBER])
                    if random() < 0.5:
                        o_m = mutate(p)
                    else:
                        o_m = local_search_mutation(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population[:CHAMPIONS_NUMBER])
                    p2 = select_parent(population[:CHAMPIONS_NUMBER])
                    o_x = one_cut_xover(p1, p2)
                    offspring_x.append(o_x)
            else:
                if random() < mutation_probability:
                    p = select_parent(population)
                    if random() < 0.5:
                        o_m = mutate(p)
                    else:
                        o_m = local_search_mutation(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population)
                    p2 = select_parent(population)
                    o_x = one_cut_xover(p1, p2)
                    offspring_x.append(o_x)
        
        offspring = offspring_m + offspring_x
        for i in offspring:
            i.fitness = fitness(i.genotype)

        # update adaptive mutation rates based on the success of the mutation
        old_population = copy(population)
        if len(offspring_m) > 0:
            offspring_m.sort(key=lambda ind: ind.fitness, reverse=True)
            mutation_success = (offspring_m[0].fitness > old_population[0].fitness)
            if mutation_success:
                mutation_probability += 0.1
            else:
                mutation_probability -= 0.1
            mutation_probability = max(0.1, min(1.0, mutation_probability))
        
        population.extend(offspring)
        population.sort(key=lambda ind: ind.fitness, reverse=True)
        population = population[:POPULATION_SIZE]

        if (population[0].fitness == old_population[0].fitness):
            stop_counter += 1


    best_individual = population[0]
    print(f"Problem instance = {problem_instance}")
    print(f"- Generation reached: {(generation)}")
    print(f"- Best fitness: {(best_individual.fitness):.2%}")
    print(f"- Number of calls: {fitness.calls}\n")

problem_instances = [1, 2, 5, 10]
for problem_instance in problem_instances:
    genetic_algorithm_adaptive(problem_instance)


Problem instance = 1
- Generation reached: 812
- Best fitness: 97.00%
- Number of calls: 81300

Problem instance = 2
- Generation reached: 571
- Best fitness: 70.60%
- Number of calls: 57200

Problem instance = 5
- Generation reached: 502
- Best fitness: 39.02%
- Number of calls: 50300

Problem instance = 10
- Generation reached: 640
- Best fitness: 29.67%
- Number of calls: 64100



### Some improvements (2)
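Here I added the n-cut crossover, chosen with probability 0.5 instead of the one-cut crossover. In the run below it gives a similar result on instance 1 and a higher best fitness on instances 2, 5 and 10 (single runs, so part of the difference may be noise).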
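
To make the alternation between parents visible, here is a small sketch on 12-locus genomes filled with the letters "A" and "B". It is an illustration under my own assumptions, not the function used in this notebook: `n_cut_demo` is a hypothetical name, and it draws distinct cut points with `random.sample` so that exactly `n` switches occur.

```python
from random import Random


def n_cut_demo(parent_a, parent_b, n, rng):
    """Alternate between two parents at n distinct cut points (illustrative sketch)."""
    length = len(parent_a)
    # distinct cut points strictly inside the genome, so every segment is non-empty
    cut_points = sorted(rng.sample(range(1, length), n))
    parents = (parent_a, parent_b)
    child = []
    current = 0   # index of the parent currently being copied
    previous = 0  # start of the current segment
    for cut in cut_points + [length]:
        child.extend(parents[current][previous:cut])
        current = 1 - current
        previous = cut
    return child, cut_points


rng = Random(42)
a = ["A"] * 12
b = ["B"] * 12
child, cuts = n_cut_demo(a, b, 3, rng)
print(cuts, "".join(child))
```

With 3 cuts the child is made of 4 segments, taken in turn from the first and the second parent.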

In [7]:
def n_cut_xover(ind1: Individual, ind2: Individual, n: int) -> Individual:
    # draw n cut points (duplicates collapse in the set, so there may be fewer distinct cuts)
    cut_points = {randint(1, GENOME_LENGTH - 1) for _ in range(n)}
    
    offspring_genotype = []
    parent_switch = False
    
    for i in range(GENOME_LENGTH):
        # switch the source parent every time a cut point is crossed
        if i in cut_points:
            parent_switch = not parent_switch
        if parent_switch:
            offspring_genotype.append(ind1.genotype[i])
        else:
            offspring_genotype.append(ind2.genotype[i])
    
    offspring = Individual(fitness=None, genotype=offspring_genotype)
    assert len(offspring.genotype) == GENOME_LENGTH
    
    return offspring

In [8]:
def genetic_algorithm_adaptive(problem_instance):
    fitness = lab9_lib.make_problem(problem_instance)

    number_of_cuts = 5
    stop_counter = 0
    max_non_improvement_gen = 500
    mutation_probability = 0.5

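    population = [
        Individual(
            genotype=[choice((True, False)) for _ in range(GENOME_LENGTH)],
            fitness=None,  # evaluated right below
        )
        for _ in range(POPULATION_SIZE)
    ]

    for i in population:
        i.fitness = fitness(i.genotype)
    # sort once so that population[0] and population[:CHAMPIONS_NUMBER] are the best from generation 0
    population.sort(key=lambda ind: ind.fitness, reverse=True)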

    for generation in range(MAX_GENERATIONS):
        if stop_counter >= max_non_improvement_gen:
            break
        offspring = []
        offspring_m = []
        offspring_x = []
        for counter in range(POPULATION_SIZE):
            if counter <= CHAMPIONS_NUMBER:
                if random() < mutation_probability:
                    p = select_parent(population[:CHAMPIONS_NUMBER])
                    if random() < 0.5:
                        o_m = mutate(p)
                    else:
                        o_m = local_search_mutation(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population[:CHAMPIONS_NUMBER])
                    p2 = select_parent(population[:CHAMPIONS_NUMBER])
                    if random() < 0.5:
                        o_x = one_cut_xover(p1, p2)
                    else:
                        o_x = n_cut_xover(p1, p2, number_of_cuts)
                    offspring_x.append(o_x)
            else:
                if random() < mutation_probability:
                    p = select_parent(population)
                    if random() < 0.5:
                        o_m = mutate(p)
                    else:
                        o_m = local_search_mutation(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population)
                    p2 = select_parent(population)
                    if random() < 0.5:
                        o_x = one_cut_xover(p1, p2)
                    else:
                        o_x = n_cut_xover(p1, p2, number_of_cuts)
                    offspring_x.append(o_x)
        
        offspring = offspring_m + offspring_x
        for i in offspring:
            i.fitness = fitness(i.genotype)

        # update adaptive mutation rates based on the success of the mutation
        old_population = copy(population)
        if len(offspring_m) > 0:
            offspring_m.sort(key=lambda ind: ind.fitness, reverse=True)
            mutation_success = (offspring_m[0].fitness > old_population[0].fitness)
            if mutation_success:
                mutation_probability += 0.1
            else:
                mutation_probability -= 0.1
            mutation_probability = max(0.1, min(1.0, mutation_probability))
        
        population.extend(offspring)
        population.sort(key=lambda ind: ind.fitness, reverse=True)
        population = population[:POPULATION_SIZE]

        if (population[0].fitness == old_population[0].fitness):
            stop_counter += 1


    best_individual = population[0]
    print(f"Problem instance = {problem_instance}")
    print(f"- Generation reached: {(generation)}")
    print(f"- Best fitness: {(best_individual.fitness):.2%}")
    print(f"- Number of calls: {fitness.calls}\n")

problem_instances = [1, 2, 5, 10]
for problem_instance in problem_instances:
    genetic_algorithm_adaptive(problem_instance)

Problem instance = 1
- Generation reached: 791
- Best fitness: 96.20%
- Number of calls: 79200

Problem instance = 2
- Generation reached: 589
- Best fitness: 75.00%
- Number of calls: 59000

Problem instance = 5
- Generation reached: 504
- Best fitness: 53.00%
- Number of calls: 50500

Problem instance = 10
- Generation reached: 728
- Best fitness: 34.73%
- Number of calls: 72900



### Some improvements (3)
Here I added a mutation that flips all the bits in a random range. In the run below the best fitness is lower than in (2) on all four instances,  
so I discarded it.
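
One possible reason (an interpretation, not something measured in this notebook) is the size of the change: with two endpoints drawn uniformly at random, the flipped range covers about a third of the genome on average, compared with 1 bit for `mutate` and at most 11 bits for `local_search_mutation`. The sketch below checks this with the closed form $(N^2 - 1) / (3N)$ for the mean distance between two uniform integers in $[0, N-1]$ and a quick simulation; `mean_flipped_bits` is a hypothetical helper.

```python
from random import Random

GENOME_LENGTH = 1000


def mean_flipped_bits(trials, rng, length=GENOME_LENGTH):
    """Average length of the slice [pos1, pos2) over many random draws."""
    total = 0
    for _ in range(trials):
        pos1, pos2 = sorted(rng.randint(0, length - 1) for _ in range(2))
        total += pos2 - pos1
    return total / trials


# exact mean of |X - Y| for X, Y independent and uniform on {0, ..., N - 1}
exact = (GENOME_LENGTH**2 - 1) / (3 * GENOME_LENGTH)
estimate = mean_flipped_bits(20_000, Random(0))
print(f"exact mean: {exact:.1f} bits, simulated mean: {estimate:.1f} bits")
```

A mutation of this size behaves more like a partial restart than a local move, which may explain why it does not help once the population has started to converge.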

In [9]:
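def inversion_mutation_random_range(ind: Individual) -> Individual:
    # copy the genotype explicitly (copy(ind) is shallow) and flip every bit in [pos1, pos2)
    offspring = Individual(fitness=None, genotype=ind.genotype.copy())
    pos1, pos2 = sorted(randint(0, GENOME_LENGTH - 1) for _ in range(2))
    # use `not` so the genotype stays a list of bools (1 - value would turn them into ints)
    offspring.genotype[pos1:pos2] = [not value for value in offspring.genotype[pos1:pos2]]
    return offspring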

In [10]:
def genetic_algorithm_adaptive(problem_instance):
    fitness = lab9_lib.make_problem(problem_instance)

    number_of_cuts = 5
    stop_counter = 0
    max_non_improvement_gen = 500
    mutation_probability = 0.5

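    population = [
        Individual(
            genotype=[choice((True, False)) for _ in range(GENOME_LENGTH)],
            fitness=None,  # evaluated right below
        )
        for _ in range(POPULATION_SIZE)
    ]

    for i in population:
        i.fitness = fitness(i.genotype)
    # sort once so that population[0] and population[:CHAMPIONS_NUMBER] are the best from generation 0
    population.sort(key=lambda ind: ind.fitness, reverse=True)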

    for generation in range(MAX_GENERATIONS):
        if stop_counter >= max_non_improvement_gen:
            break
        offspring = []
        offspring_m = []
        offspring_x = []
        for counter in range(POPULATION_SIZE):
            if counter <= CHAMPIONS_NUMBER:
                if random() < mutation_probability:
                    p = select_parent(population[:CHAMPIONS_NUMBER])
                    mutation_type_p = random()
                    if mutation_type_p <= 0.33:
                        o_m = mutate(p)
                    elif mutation_type_p <= 0.66:
                        o_m = local_search_mutation(p)
                    else:
                        o_m = inversion_mutation_random_range(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population[:CHAMPIONS_NUMBER])
                    p2 = select_parent(population[:CHAMPIONS_NUMBER])
                    if random() < 0.5:
                        o_x = one_cut_xover(p1, p2)
                    else:
                        o_x = n_cut_xover(p1, p2, number_of_cuts)
                    offspring_x.append(o_x)
            else:
                if random() < mutation_probability:
                    p = select_parent(population)
                    mutation_type_p = random()
                    if mutation_type_p <= 0.33:
                        o_m = mutate(p)
                    elif mutation_type_p <= 0.66:
                        o_m = local_search_mutation(p)
                    else:
                        o_m = inversion_mutation_random_range(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population)
                    p2 = select_parent(population)
                    if random() < 0.5:
                        o_x = one_cut_xover(p1, p2)
                    else:
                        o_x = n_cut_xover(p1, p2, number_of_cuts)
                    offspring_x.append(o_x)
        
        offspring = offspring_m + offspring_x
        for i in offspring:
            i.fitness = fitness(i.genotype)

        # update adaptive mutation rates based on the success of the mutation
        old_population = copy(population)
        if len(offspring_m) > 0:
            offspring_m.sort(key=lambda ind: ind.fitness, reverse=True)
            mutation_success = (offspring_m[0].fitness > old_population[0].fitness)
            if mutation_success:
                mutation_probability += 0.1
            else:
                mutation_probability -= 0.1
            mutation_probability = max(0.1, min(1.0, mutation_probability))
        
        population.extend(offspring)
        population.sort(key=lambda ind: ind.fitness, reverse=True)
        population = population[:POPULATION_SIZE]

        if (population[0].fitness == old_population[0].fitness):
            stop_counter += 1


    best_individual = population[0]
    print(f"Problem instance = {problem_instance}")
    print(f"- Generation reached: {(generation)}")
    print(f"- Best fitness: {(best_individual.fitness):.2%}")
    print(f"- Number of calls: {fitness.calls}\n")

problem_instances = [1, 2, 5, 10]
for problem_instance in problem_instances:
    genetic_algorithm_adaptive(problem_instance)

Problem instance = 1
- Generation reached: 798
- Best fitness: 94.20%
- Number of calls: 79900

Problem instance = 2
- Generation reached: 556
- Best fitness: 66.80%
- Number of calls: 55700

Problem instance = 5
- Generation reached: 536
- Best fitness: 41.64%
- Number of calls: 53700

Problem instance = 10
- Generation reached: 639
- Best fitness: 32.19%
- Number of calls: 64000



### Some improvements (4) - final version of the genetic algorithm
Here I added a mutation that simply swaps two bits of an individual.
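
Two properties of this operator are worth keeping in mind: it never changes the number of `True` bits, and it leaves the genome unchanged whenever the two selected bits are equal. A tiny worked example with fixed positions (`swap_bits` is a hypothetical helper, separate from the mutation defined in the next cell):

```python
def swap_bits(genotype, pos1, pos2):
    """Return a copy of the genotype with the bits at pos1 and pos2 exchanged."""
    child = list(genotype)
    child[pos1], child[pos2] = child[pos2], child[pos1]
    return child


genome = [True, False, True, True, False]

swapped_different = swap_bits(genome, 0, 1)  # bits differ: the genome changes
swapped_equal = swap_bits(genome, 0, 2)      # bits are equal: nothing changes

print(swapped_different, sum(swapped_different) == sum(genome))
print(swapped_equal == genome)
```

On a random genome roughly half of the swaps are therefore no-ops that still cost a fitness call.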

In [11]:
def swap_mutation(ind: Individual) -> Individual:
    # copy the genotype explicitly (copy(ind) is shallow), then exchange two random loci
    offspring = Individual(fitness=None, genotype=ind.genotype.copy())
    pos1 = randint(0, GENOME_LENGTH - 1)
    pos2 = randint(0, GENOME_LENGTH - 1)
    offspring.genotype[pos1], offspring.genotype[pos2] = offspring.genotype[pos2], offspring.genotype[pos1]
    return offspring

In [14]:
def genetic_algorithm_adaptive(problem_instance):
    fitness = lab9_lib.make_problem(problem_instance)

    number_of_cuts = 5
    stop_counter = 0
    max_non_improvement_gen = 500
    mutation_probability = 0.5

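    population = [
        Individual(
            genotype=[choice((True, False)) for _ in range(GENOME_LENGTH)],
            fitness=None,  # evaluated right below
        )
        for _ in range(POPULATION_SIZE)
    ]

    for i in population:
        i.fitness = fitness(i.genotype)
    # sort once so that population[0] and population[:CHAMPIONS_NUMBER] are the best from generation 0
    population.sort(key=lambda ind: ind.fitness, reverse=True)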

    for generation in range(MAX_GENERATIONS):
        if stop_counter >= max_non_improvement_gen:
            break
        offspring = []
        offspring_m = []
        offspring_x = []
        for counter in range(POPULATION_SIZE):
            if counter <= CHAMPIONS_NUMBER:
                if random() < mutation_probability:
                    p = select_parent(population[:CHAMPIONS_NUMBER])
                    mutation_type_p = random()
                    if mutation_type_p <= 0.33:
                        o_m = mutate(p)
                    elif mutation_type_p <= 0.66:
                        o_m = local_search_mutation(p)
                    else:
                        o_m = swap_mutation(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population[:CHAMPIONS_NUMBER])
                    p2 = select_parent(population[:CHAMPIONS_NUMBER])
                    if random() < 0.5:
                        o_x = one_cut_xover(p1, p2)
                    else:
                        o_x = n_cut_xover(p1, p2, number_of_cuts)
                    offspring_x.append(o_x)
            else:
                if random() < mutation_probability:
                    p = select_parent(population)
                    mutation_type_p = random()
                    if mutation_type_p <= 0.33:
                        o_m = mutate(p)
                    elif mutation_type_p <= 0.66:
                        o_m = local_search_mutation(p)
                    else:
                        o_m = swap_mutation(p)
                    offspring_m.append(o_m)
                else:
                    p1 = select_parent(population)
                    p2 = select_parent(population)
                    if random() < 0.5:
                        o_x = one_cut_xover(p1, p2)
                    else:
                        o_x = n_cut_xover(p1, p2, number_of_cuts)
                    offspring_x.append(o_x)
        
        offspring = offspring_m + offspring_x
        for i in offspring:
            i.fitness = fitness(i.genotype)

        # update adaptive mutation rates based on the success of the mutation
        old_population = copy(population)
        if len(offspring_m) > 0:
            offspring_m.sort(key=lambda ind: ind.fitness, reverse=True)
            mutation_success = (offspring_m[0].fitness > old_population[0].fitness)
            if mutation_success:
                mutation_probability += 0.1
            else:
                mutation_probability -= 0.1
            mutation_probability = max(0.1, min(1.0, mutation_probability))
        
        population.extend(offspring)
        population.sort(key=lambda ind: ind.fitness, reverse=True)
        population = population[:POPULATION_SIZE]

        if (population[0].fitness == old_population[0].fitness):
            stop_counter += 1


    best_individual = population[0]
    print(f"Problem instance = {problem_instance}")
    print(f"- Generation reached: {(generation)}")
    print(f"- Best fitness: {(best_individual.fitness):.2%}")
    print(f"- Number of calls: {fitness.calls}\n")

problem_instances = [1, 2, 5, 10]
for problem_instance in problem_instances:
    genetic_algorithm_adaptive(problem_instance)

Problem instance = 1
- Generation reached: 825
- Best fitness: 99.20%
- Number of calls: 82600

Problem instance = 2
- Generation reached: 595
- Best fitness: 73.20%
- Number of calls: 59600

Problem instance = 5
- Generation reached: 504
- Best fitness: 54.00%
- Number of calls: 50500

Problem instance = 10
- Generation reached: 722
- Best fitness: 34.14%
- Number of calls: 72300

