---

Some useful packages and libraries:



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors
from collections import deque
import heapq
import unittest
from scipy import stats
import copy as cp
from time import time
import math
import random



---

## Problem 1: Maximizing an Objective Function with a Genetic Algorithm 

Suppose we've lost the index card with our favorite cupcake recipe. We know the ingredients of the cupcakes, but cannot remember the exact amount of each ingredient. We decide to use a genetic algorithm to generate the ingredient amounts. With each iteration of the genetic algorithm, we bake the cupcakes and taste-test them. We achieve our goal and stop running the genetic algorithm when we get to the actual recipe: 

* 1 tsp salt 
* 3 tsp baking powder 
* 2 cups all-purpose flour 
* 1 cup butter 
* 1 cup granulated sugar 
* 4 large eggs
* 1 tsp vanilla extract
* 1 cup buttermilk 

In [None]:
target = [1, 3, 2, 1, 1, 4, 1, 1]

An example starting state for a member of our population might look like: $state = [1, 2, 100, 36, 60, 3, 5, 50]$

### (1a) 

Write an objective function `def recipe_success(state)` that takes a single argument state, and returns the objective function value (fitness) of the state. The objective function should be maximized when a state reaches the target. You could for example define the fitness score of a particular state based on how far away each entry is from the target recipe.

In [None]:
def recipe_success(state):
    # Each ingredient amount is capped at 100, so dividing each absolute
    # difference by 100 normalizes it to the range [0, 1].
    # Fitness is 1.0 for an exact match and shrinks as the total difference grows.
    diff = sum(abs(s - t) / 100 for s, t in zip(state, target))
    fitness = 1 / (1 + diff)
    return fitness
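
As a quick sanity check, the fitness of the example starting state from above can be worked through by hand. This is only an illustrative sketch: `example_state` and `per_ingredient` are names introduced here, and `target` and `recipe_success` are repeated so the snippet runs on its own.

```python
target = [1, 3, 2, 1, 1, 4, 1, 1]

def recipe_success(state):
    # Same definition as in (1a): normalized total distance mapped into (0, 1]
    diff = sum(abs(s - t) / 100 for s, t in zip(state, target))
    return 1 / (1 + diff)

# Example starting state from the problem statement
example_state = [1, 2, 100, 36, 60, 3, 5, 50]

# Absolute distance of each ingredient from the target recipe
per_ingredient = [abs(s - t) for s, t in zip(example_state, target)]
print(per_ingredient)       # [0, 1, 98, 35, 59, 1, 4, 49]
print(sum(per_ingredient))  # 247

# Normalized difference is 247 / 100 = 2.47, so fitness = 1 / 3.47
print(round(recipe_success(example_state), 3))  # 0.288
```

A state far from the target therefore scores well below 1, while the target itself scores exactly 1.0.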

In [None]:
#Used math.isclose with a relative tolerance of 1e-9 because the fitness score is a float.
#Could have written many more test cases, but because the environment is so controlled I only test the two real possibilities.

#Test case 1: State and target have the same length and same values
state = [1, 3, 2, 1, 1, 4, 1, 1]
expected_output = 1.0
assert math.isclose(recipe_success(state), expected_output, rel_tol=1e-9), f"Expected {expected_output}, but got {recipe_success(state)}"
print("Test case 1 passed")

#Test case 2: State and target have the same length and different values
state = [1, 3, 2, 1, 1, 4, 2, 2]
diff_normalized = sum(abs(s - t) / 100 for s, t in zip(state, target))
expected_output = 1 / (1 + diff_normalized)
assert math.isclose(recipe_success(state), expected_output, rel_tol=1e-9), f"Expected {expected_output}, but got {recipe_success(state)}"
print("Test case 2 passed")

### (1b) 

Using our in-class notebook "Lecture 16 - Genetic Algorithms.ipynb" as your guide, write a genetic algorithm that starts with a population of 100 randomly generated "recipes/states/members" and uses the objective function you wrote in **(1a)** to hopefully hit the target after a certain number of generations. 

Key components of your code:
- Generate the initial population randomly from integers between 0 and 100 
- Allow for mutations in your population with an overall probability of mutation set to p = 0.2
- Choose 2 "parents" in the generation of each "child"
- Choose a random split point at which to combine the two "parents"

Run the algorithm for 50 iterations ("generations"). Do you hit your target? --> <span style="color:red">NO</span>
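
The crossover and mutation steps listed above can be illustrated in isolation. The following is a minimal sketch, not the lecture notebook's implementation: `crossover` and `maybe_mutate` are hypothetical helper names, the parents are hand-picked so the result is easy to check, and the mutation here simply replaces one gene with a random integer in 0..100.

```python
import random

def crossover(parent1, parent2, split):
    # Single-point crossover: genes before `split` come from parent1, the rest from parent2
    return parent1[:split] + parent2[split:]

def maybe_mutate(child, p_mutate, rng):
    # With probability p_mutate, overwrite one randomly chosen gene
    # with a random integer in 0..100 (assumed mutation rule for this sketch)
    child = list(child)  # copy so the caller's list is left untouched
    if rng.random() < p_mutate:
        gene = rng.randrange(len(child))
        child[gene] = rng.randint(0, 100)
    return child

# Hand-picked parents: the first half of parent1 and the second half of parent2 match the target
parent1 = [1, 3, 2, 1, 50, 50, 50, 50]
parent2 = [90, 90, 90, 90, 1, 4, 1, 1]

child = crossover(parent1, parent2, split=4)
print(child)  # [1, 3, 2, 1, 1, 4, 1, 1]

# p_mutate = 0.2 as specified in the assignment; at most one gene changes
mutated = maybe_mutate(child, p_mutate=0.2, rng=random.Random(0))
```

In the actual algorithm the split point is drawn at random for every child, so a lucky combination like this one is rare and typically takes many generations to appear.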


In [None]:
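# Settings for the final tuned run. The baseline from the problem statement is a population of 100, p = 0.2, and 50 generations.
initial_population = [[random.randint(0, 100) for _ in range(len(target))] for _ in range(200)]  # range(200) sets the population size
mutation_probability = 0.4  # probability that a child is mutated
fitness_goal = 1.0  # fitness of an exact match to target
n_iter = 500  # number of generations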

class problem:
    def __init__(self, initial_population, objective_function, mutation_probability, fitness_goal):
        self.population = initial_population
        self.objective_function = objective_function
        self.p_mutate = mutation_probability
        self.fitness_goal = fitness_goal
        self.n_pop = len(initial_population)
        self.n_dna = len(initial_population[0])

    def fitness(self):
        performance = [self.objective_function(member) for member in self.population]
        total = sum(performance)
        p_reproduce = [perf / total for perf in performance]
        return p_reproduce

    def reproduce(self, parent1, parent2):
        split = np.random.randint(low=1, high=self.n_dna)
        child = parent1[:split] + parent2[split:]
        return child

    def mutate(self, child):
        gene = np.random.randint(low=0, high=self.n_dna)
        mutation_type = np.random.choice(['increment', 'decrement', 'randomize'])
        if mutation_type == 'increment' and child[gene] < 100:
            child[gene] += 1
        elif mutation_type == 'decrement' and child[gene] > 0:
            child[gene] -= 1
        elif mutation_type == 'randomize':
            child[gene] = np.random.randint(0, 101)
        return child


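def genetic_algorithm(problem, n_iter):
    # Score the initial population so the return values are defined even when n_iter == 0
    performance = [problem.objective_function(member) for member in problem.population]
    best_performance = max(performance)
    best_member = problem.population[performance.index(best_performance)]

    for t in range(n_iter):
        new_generation = []
        # Selection probabilities depend only on the current population,
        # so compute them once per generation instead of once per child
        p_reproduce = problem.fitness()

        for k in range(problem.n_pop):
            # Pick two distinct parents, weighted by fitness
            ind_parents = np.random.choice(range(problem.n_pop), size=2, p=p_reproduce, replace=False)
            parent1, parent2 = problem.population[ind_parents[0]], problem.population[ind_parents[1]]

            # Single-point crossover, then mutate with probability p_mutate
            child = problem.reproduce(parent1, parent2)
            l_mutate = np.random.choice([True, False], p=[problem.p_mutate, 1-problem.p_mutate])
            if l_mutate:
                child = problem.mutate(child)

            new_generation.append(child)

        problem.population = new_generation

        # Best member of the new generation
        performance = [problem.objective_function(member) for member in problem.population]
        best_performance = max(performance)
        best_member = problem.population[performance.index(best_performance)]

        if best_performance >= problem.fitness_goal:
            print(f"Goal achieved at generation {t+1}")
            return best_member, best_performance

    print('Reached maximum number of iterations')
    return best_member, best_performance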



#problem instance and prints
cupcake_problem = problem(initial_population, recipe_success, mutation_probability, fitness_goal)
best_recipe, best_fitness = genetic_algorithm(cupcake_problem, n_iter)
print(f"Output State: {best_recipe}\nFitness: {best_fitness}")


TARGET = [1, 3, 2, 1, 1, 4, 1, 1]


FIRST RUN (50 generations, population 100, mutation probability 0.2):
- Reached maximum number of iterations
- Output State: [5, 1, 10, 2, 9, 0, 13, 13]
- Fitness: 0.6622516556291391

SECOND RUN (500 generations, population 100, mutation probability 0.2):
- Goal achieved at generation 449
- Output State: [1, 3, 2, 1, 1, 4, 1, 1]
- Fitness: 1.0


### (1c)

Report the following:
- How many generations did it take to hit the goal? 
    - <span style="color:red">449 on the second try</span>
- If you change the initial population size to 200, does that change the number of generations it takes to achieve the goal recipe? 
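    - <span style="color:red">Likely yes, although I compared fitness after 50 generations rather than counting generations to the goal. Running 50 iterations with a population of 100 gave fitness values of 0.5 to 0.8, while running 50 iterations with a population of 200 gave fitness values of 0.8 to 0.98, so the larger population gets closer to the target in the same number of generations.</span>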
- If you change the probability of mutation, does that affect the number of generations it takes to achieve the goal recipe? How so?
    - <span style="color:red">For this setup (500 iterations and a population of 200), increasing the mutation probability reduced the number of generations needed to reach the target, but only up to a point. The benefit peaked at a mutation probability of 0.4 (see the table below for averages), after which results slowly got worse because heavier mutation disrupts good solutions. In general the effect depends on the problem landscape, such as the fitness function, population size, and number of iterations. Here, more mutation allowed more exploration, which helped with premature convergence. These numbers are not exact, because I only ran each variation 3 times and only looked at mutation probabilities in increments of 0.2, but the trend is clear.</span>


Outcomes for different mutation probabilities at a population size of 200 and 500 iterations:

| Mutation Probability | Population Size | Fitness (3-Run Average) |
|----------------------|-----------------|-------------------------|
|         0.2          |       200       |          0.993          |
|         0.4          |       200       |          1.000          |
|         0.6          |       200       |          0.955          |
|         0.8          |       200       |          0.928          |
|         1.0          |       200       |          0.914          |
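
A sweep like the one in the table could be automated with a small harness along these lines. This is a hedged sketch rather than the code that produced the numbers above: `run_ga` and `average_fitness` are hypothetical names, the GA is a simplified variant (parents are sampled with replacement and mutation always randomizes one gene), and the seed is an assumption added for repeatability.

```python
import random
from itertools import accumulate

TARGET = [1, 3, 2, 1, 1, 4, 1, 1]

def recipe_success(state):
    # Same objective as in (1a)
    diff = sum(abs(s - t) / 100 for s, t in zip(state, TARGET))
    return 1 / (1 + diff)

def run_ga(pop_size, p_mutate, n_iter, rng):
    """Simplified GA; returns the best fitness of the last generation produced."""
    n_dna = len(TARGET)
    population = [[rng.randint(0, 100) for _ in range(n_dna)] for _ in range(pop_size)]
    best = max(recipe_success(m) for m in population)
    for _ in range(n_iter):
        # Cumulative fitness weights, computed once per generation
        cum_weights = list(accumulate(recipe_success(m) for m in population))
        new_generation = []
        for _ in range(pop_size):
            # Fitness-proportional selection (with replacement, unlike the main code)
            parent1, parent2 = rng.choices(population, cum_weights=cum_weights, k=2)
            split = rng.randint(1, n_dna - 1)  # split point in 1..n_dna-1
            child = parent1[:split] + parent2[split:]
            if rng.random() < p_mutate:
                child[rng.randrange(n_dna)] = rng.randint(0, 100)
            new_generation.append(child)
        population = new_generation
        best = max(recipe_success(m) for m in population)
        if best >= 1.0:  # exact match found, stop early
            break
    return best

def average_fitness(p_mutate, pop_size=200, n_iter=500, n_runs=3, seed=0):
    # Average the final best fitness over n_runs independent runs
    rng = random.Random(seed)
    return sum(run_ga(pop_size, p_mutate, n_iter, rng) for _ in range(n_runs)) / n_runs
```

Calling `average_fitness(p)` for each `p` in `(0.2, 0.4, 0.6, 0.8, 1.0)` would give one row per mutation probability. Because the operators differ from the main implementation, the values should not be expected to match the table exactly, and more than 3 runs per setting would give a more reliable average.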


