# Project 2

Morgan Elder<br>
Ben Vuong<br>
CS 4320

1. Write a brief (no more than a paragraph) note on the importance of the choice of 
parameter settings in evolutionary algorithms. Or argue, with well-articulated, precise 
reasons (one paragraph only), that parameters do not really matter in EC.  

I would say that when working with evolutionary algorithms, diversity is very important for the growth and success of the evolutionary algorithms. There should not be one indiviudual within the population that over takes the other. If that would happen then there would be stagnation within the growth of the algorithms. In order to prevent such an event of happening and encourage growth and diversity, it is important to focus on the choice of parameters settings in evolutionary algorithms. If good parameters were choosen, there will be higher chances or growth and diversity to happen and the inverse would result in a higher chance of stangnation.

# De Jong Function 5: Variation of Shekel's Function 

The objective of this project is to minimize Shekel's function using parameters from De Jong function #5:

$$f(\vec{x}) = \left(0.002 + \sum_{i=1}^{25}\frac{1}{i + (x_1 - a_{1i})^6 + (x_2 - a_{2i})^6}\right)^{-1}$$

where $$
\textbf{a} =
 \begin{pmatrix}
    -32 & -16 & 0 & 16 & 32 & -32 & \ldots & 0 & 16 & 32 \\
    -32 & -32 & -32 & -32 & -32 & -16 & \ldots & 32 & 32 & 32
 \end{pmatrix}$$

The number of dimensions is 2 and the input domain is $-65.536 \le x_i \le 65.536$ for $i=1,2$. 










In [1]:
# import modules
import warnings
import json
from enum import Enum
from typing import TypedDict
import inspect
import numpy as np
import pandas as pd
from scipy import stats

## Optimization Goals

Use the Optimization_Goal class as an enum type in order to define a goal variable as either min or max.

In [2]:
class Optimization_Goal(Enum):
    MINIMIZE = 1
    MAXIMIZE = 2

## Optimization Problems

Genetic algorithms are suited for optimization problems. The Problem class is used to identify subclasses as optimization problems.

In [3]:
class Problem:
    """The Problem class defines the problems contains the definitions of
    of problems. Problems include any information such as the optimization
    goal (min/max), objective function, input dimension, domain/range constraints,
    and more."""

    pass


class De_Jong_Function_5(Problem):
    """De Jong function #5 is variation of Shekel's foxeholes problem involving
    25 local minima, 2 dimensions, and predefined constants.
    This problem is suited for multimodal optimization."""

    dimensions = 2
    goal = Optimization_Goal.MINIMIZE
    upper_bounds = [65.536, 65.536]
    lower_bounds = [-65.536, -65.536]
    constants_array = np.array(
        [
            [
                -32.0,
                -16.0,
                0.0,
                16.0,
                32.0,
                -32.0,
                -16.0,
                0.0,
                16.0,
                32.0,
                -32.0,
                -16.0,
                0.0,
                16.0,
                32.0,
                -32.0,
                -16.0,
                0.0,
                16.0,
                32.0,
                -32.0,
                -16.0,
                0.0,
                16.0,
                32.0,
            ],
            [
                -32.0,
                -32.0,
                -32.0,
                -32.0,
                -32.0,
                -16.0,
                -16.0,
                -16.0,
                -16.0,
                -16.0,
                0.0,
                0.0,
                0.0,
                0.0,
                0.0,
                16.0,
                16.0,
                16.0,
                16.0,
                16.0,
                32.0,
                32.0,
                32.0,
                32.0,
                32.0,
            ],
        ]
    )

    def objective_function(self, X):
        # array representing the 1st to 25th elements of the summation
        array_i = np.arange(1, self.constants_array.shape[1] + 1, step=1)
        results = (
            0.002
            + np.sum(
                1
                / (
                    array_i
                    + (X[:, [0]] - self.constants_array[0]) ** 6
                    + (X[:, [1]] - self.constants_array[1]) ** 6
                ),
                axis=1,
            )
        ) ** -1
        return results

## Genetic Operators

In [4]:
class Genetic_Operator:
    """The Genetic_Operator base class defines the genetic operators used in
    genetic algorithms. These include crossover and mutation."""

    pass

### Selection

In [5]:
class Selection(Genetic_Operator):
    """The Selection class defines the selection operator used in genetic
    algorithms. This includes the selection of parents and the selection of
    survivors."""

    def __init__(
        self,
        selection_type: str = "proportional",
        truncation_percentage: float = 0.2,
        expected_best_copies: float = 1.2,
        tournament_size: int = 2,
    ):
        # dictionary of selection types and their corresponding methods
        selection_types = {
            "proportional": self.proportional,
            "deterministic_tournament": self.deterministic_tournament,
            "linear_ranking": self.linear_ranking,
            "truncation": self.truncation,
        }
        # check if selection type is valid
        if selection_type not in selection_types:
            raise ValueError(f"Invalid selection type: {selection_type}")
        self.selection_type = selection_types[selection_type]

        if selection_type == "truncation":
            if truncation_percentage < 0 or truncation_percentage > 1:
                raise ValueError(
                    f"Invalid truncation percentage: {truncation_percentage}"
                )
            self.truncation_percentage = truncation_percentage

        if selection_type == "deterministic_tournament":
            if tournament_size < 2:
                raise ValueError(f"Invalid tournament size: {tournament_size}")
            self.tournament_size = tournament_size

        if selection_type == "linear_ranking":
            # expected_best_copies must be 1 < x <= 2
            if expected_best_copies <= 1 or expected_best_copies > 2:
                raise ValueError(
                    f"Invalid expected copies of best: {expected_best_copies}"
                )
            self.expected_best_copies = expected_best_copies

    def get_operator_args(self):
        """The get_operator_args method gets the arguments of the selection
        method."""
        return inspect.getfullargspec(self.selection_type).args[1:]

    def proportional(self, population, fitness_values, goal, rng):
        if goal is Optimization_Goal.MAXIMIZE:
            weights = fitness_values / np.sum(fitness_values, axis=0)
        else:
            fitness_values += 0.0001
            weights = (1 / fitness_values) / np.sum(1 / fitness_values, axis=0)

        indexes = np.arange(population.shape[0])
        indexes = rng.choice(indexes, size=population.shape[0], p=weights, replace=True)

        return population[indexes, :]

    def deterministic_tournament(self, population, fitness_values, goal, rng):
        if self.tournament_size > population.shape[0]:
            warnings.warn(
                "Tournament size is greater than population size." 
                + "Tournament size will be set to population size."
            )
            self.tournament_size = population.shape[0]

        indexes = np.arange(population.shape[0])
        tournament = rng.choice(
            indexes, size=(population.shape[0], self.tournament_size), replace=True
        )

        if goal == Optimization_Goal.MAXIMIZE:
            tournament_winner = tournament[
                indexes, np.argmax(fitness_values[tournament], axis=1)
            ]
        elif goal == Optimization_Goal.MINIMIZE:
            tournament_winner = tournament[
                indexes, np.argmin(fitness_values[tournament], axis=1)
            ]

        return population[tournament_winner, :]

    def linear_ranking(self, fitness_values, population, goal, rng):
        if goal == Optimization_Goal.MAXIMIZE:
            # sort the fitness values from smallest to largest
            sorted_indexes = np.argsort(fitness_values)
        elif goal == Optimization_Goal.MINIMIZE:
            # sort the fitness values from largest to smallest
            sorted_indexes = np.argsort(fitness_values)[::-1]

        rank = np.arange(1, population.shape[0] + 1)
        probabilities = self.calc_rank_based_probabilities(rank, population.shape[0])

        selected_indexes = rng.choice(
            sorted_indexes, size=population.shape[0], p=probabilities, replace=True
        )

        return population[selected_indexes, :]

    def calc_rank_based_probabilities(self, rank, population_size):
        expected_worst_copies = 2 - self.expected_best_copies
        probabilities = expected_worst_copies + ((rank - 1) / (population_size - 1)) * (
            self.expected_best_copies - expected_worst_copies
        )
        probabilities /= population_size
        return probabilities

    def truncation(self, population, goal, fitness_values, rng):
        truncation_size = int(self.truncation_percentage * population.shape[0])
        remaining_size = population.shape[0] - truncation_size

        if goal == Optimization_Goal.MAXIMIZE:
            top_indexes = np.argpartition(fitness_values, -truncation_size, axis=0)[
                -truncation_size:
            ]
        elif goal == Optimization_Goal.MINIMIZE:
            top_indexes = np.argpartition(fitness_values, truncation_size, axis=0)[
                :truncation_size
            ]

        fill_bottom_indexes = rng.choice(top_indexes, remaining_size, replace=True)
        return_indexes = np.concatenate((top_indexes, fill_bottom_indexes))

        return population[return_indexes, :]

    def run(self, **kwargs):
        """The run method runs the selection operator."""
        return self.selection_type(**kwargs)

### Cross Over

In [6]:
class Crossover(Genetic_Operator):
    """The Crossover class defines the crossover operator used in genetic
    algorithms."""

    def __init__(self, crossover_type: str = "single_point", rate: float = 0.8):
        crossover_types = {"single_point": self.single_point}
        if crossover_type not in crossover_types:
            raise ValueError(f"Invalid crossover type: {crossover_type}")
        self.crossover_type = crossover_types[crossover_type]
        self.rate = rate

    def single_point(self, population, rng):
        offspring = np.zeros_like(population)
        groups = rng.integers(0, population.shape[0], size=(population.shape[0], 2))
        is_group_selected = rng.uniform(0, 1, size=population.shape[0]) <= self.rate
        for i, group in enumerate(groups):
            if is_group_selected[i]:
                cross_point = self.generate_k_cross_points(population[group], rng)
                offspring[i] = self.swap_at_cross_point(
                    population[group, :], cross_point
                )
            else:
                offspring[i] = population[group[0], :]

        return offspring

    def generate_k_cross_points(self, group, rng, k=1):
        """Generate the location of the cross points."""
        return np.sort(rng.choice(range(1, group.shape[1]), size=k, replace=False))[0]

    def swap_at_cross_point(self, group, cross_point):
        """Return the vector with the elements swapped at the cross point."""
        group[0, cross_point:] = group[1, cross_point:]
        return group[0]

    def run(self, **kwargs):
        """The run method runs the crossover operator."""
        return self.crossover_type(**kwargs)

    def get_operator_args(self):
        """The get_operator_args method gets the arguments of the crossover
        method."""
        return inspect.getfullargspec(self.crossover_type).args[1:]

### Mutation

In [7]:
class Mutation(Genetic_Operator):
    """The Mutation class defines the mutation operator used in genetic
    algorithms."""

    def __init__(self, mutation_type: str = "percent_of_range", rate: float = 0.1):
        mutation_types = {"percent_of_range": self.percent_of_range}
        if mutation_type not in mutation_types:
            raise ValueError(f"Invalid mutation type: {mutation_type}")
        self.mutation_type = mutation_types[mutation_type]
        self.rate = rate

    def percent_of_range(self, population, rng, lower_bounds, upper_bounds):
        """plus or minus a small percent of interval"""

        upper_bounds = np.array(upper_bounds)
        lower_bounds = np.array(lower_bounds)

        mutation_amount = abs(upper_bounds - lower_bounds) * 0.02
        isMutating = rng.uniform(0, 1, population.shape) <= self.rate
        mutation_amount = (
            mutation_amount * rng.choice([-1, 1], size=population.shape) * isMutating
        )

        return population + mutation_amount

    def run(self, **kwargs):
        """The run method runs the mutation operator."""
        return self.mutation_type(**kwargs)

    def get_operator_args(self):
        """The get_operator_args method gets the arguments of the mutation
        method."""
        return inspect.getfullargspec(self.mutation_type).args[1:]

## Stopping Conditions

In [8]:
class Stopping_Condition:
    """The Stopping_Condition base class defines the stopping conditions used
    in genetic algorithms. These include the number of generations and the
    fitness threshold."""

    pass


class Max_Generations(Stopping_Condition):
    """The number_of_generations class defines the stopping condition based on
    the number of generations."""

    def __init__(self, max_generations: int = 100):
        self.max_generations = max_generations

    def is_running(self, generation: int):
        """The is_running method checks if the algorithm should continue running
        based on the number of generations."""
        running = True
        if generation >= self.max_generations:
            running = False
        return running

    def increase_max_generations(self, increase: int = 100):
        """The increase_max_generations method increases the maximum number of
        generations by a specified amount."""
        self.max_generations += increase

## Metrics

In [9]:
class Best_of_Run(TypedDict):
    """The Best_of_Run class defines the best of run metric."""

    generation: int
    solution: np.ndarray
    fitness: float


class Worst_of_Run(TypedDict):
    """The Worst_of_Run class defines the worst of run metric."""

    generation: int
    solution: np.ndarray
    fitness: float


class Best_at_Interval(TypedDict):
    """The Best_at_Interval class defines the best at interval metric."""

    generation: list[int]
    solution: list[np.ndarray]
    fitness: list[float]


class Worst_at_Interval(TypedDict):
    """The Worst_at_Interval class defines the worst at interval metric."""

    generation: list[int]
    solution: list[np.ndarray]
    fitness: list[float]


class Average_at_interval(TypedDict):
    """The Average_at_interval class defines the average at interval metric."""

    generation: list[int]
    average_fitness: list[float]
    standard_deviation: list[float]


class Metrics:
    interval_metrics = ["best_at_interval", "worst_at_interval", "average_at_interval"]

    def __init__(self, metrics: list, interval: int = 10):
        self.metrics = {}
        for metric in metrics:
            self.interval = interval

            match metric:
                case "best_of_run":
                    # create a best of run metric
                    self.metrics["best_of_run"] = {
                        "generation": 0,
                        "solution": np.array([]),
                        "fitness": 0,
                    }
                case "worst_of_run":
                    # create a worst of run metric
                    self.metrics["worst_of_run"] = {
                        "generation": 0,
                        "solution": np.array([]),
                        "fitness": 0,
                    }
                case "best_at_interval":
                    # create a best at interval metric
                    self.metrics["best_at_interval"] = {
                        "generation": [],
                        "solution": [],
                        "fitness": [],
                    }

                case "worst_at_interval":
                    # create a worst at interval metric
                    self.metrics["worst_at_interval"] = {
                        "generation": [],
                        "solution": [],
                        "fitness": [],
                    }

                case "average_at_interval":
                    # create an average at interval metric
                    self.metrics["average_at_interval"] = {
                        "generation": [],
                        "average_fitness": [],
                        "standard_deviation": [],
                    }

                case "_":
                    raise ValueError(f"Invalid metric: {metric}")

    def best_of_run(self, generation, population, fitness_values, goal):
        """record best of run"""
        if goal == Optimization_Goal.MINIMIZE:
            index = np.argmin(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]
            if fitness < self.metrics["best_of_run"]["fitness"] or generation == 0:
                self.metrics["best_of_run"] = {
                    "generation": generation,
                    "solution": solution,
                    "fitness": fitness,
                }
        else:
            index = np.argmax(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]
            if fitness > self.metrics["best_of_run"]["fitness"] or generation == 0:
                self.metrics["best_of_run"] = {
                    "generation": generation,
                    "solution": solution,
                    "fitness": fitness,
                }

    def worst_of_run(self, generation, population, fitness_values, goal):
        """record worst of run"""
        if goal == Optimization_Goal.MINIMIZE:
            index = np.argmax(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]
            if fitness > self.metrics["worst_of_run"]["fitness"] or generation == 0:
                self.metrics["worst_of_run"] = {
                    "generation": generation,
                    "solution": solution,
                    "fitness": fitness,
                }
        else:
            index = np.argmin(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]
            if fitness < self.metrics["worst_of_run"]["fitness"] or generation == 0:
                self.metrics["worst_of_run"] = {
                    "generation": generation,
                    "solution": solution,
                    "fitness": fitness,
                }

    def best_at_interval(self, generation, population, fitness_values, goal):
        """record best at interval"""
        if goal == Optimization_Goal.MINIMIZE:
            index = np.argmin(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]
        else:
            index = np.argmax(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]

        self.metrics["best_at_interval"]["generation"].append(generation)
        self.metrics["best_at_interval"]["solution"].append(solution)
        self.metrics["best_at_interval"]["fitness"].append(fitness)

    def worst_at_interval(self, generation, population, fitness_values, goal):
        """record worst at interval"""
        if goal == Optimization_Goal.MINIMIZE:
            index = np.argmax(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]
        else:
            index = np.argmin(fitness_values)
            fitness = fitness_values[index]
            solution = population[index]

        self.metrics["worst_at_interval"]["generation"].append(generation)
        self.metrics["worst_at_interval"]["solution"].append(solution)
        self.metrics["worst_at_interval"]["fitness"].append(fitness)

    def average_at_interval(self, generation, population, fitness_values, goal):
        """record average at interval"""
        self.metrics["average_at_interval"]["generation"].append(generation)
        self.metrics["average_at_interval"]["average_fitness"].append(
            np.mean(fitness_values)
        )
        self.metrics["average_at_interval"]["standard_deviation"].append(
            np.std(fitness_values)
        )

    def record(self, metric, **kwargs):
        """record a metric"""
        if metric in self.interval_metrics:
            if kwargs["generation"] % self.interval == 0:
                metric_method = getattr(self, metric)
                metric_method(**kwargs)
        else:
            metric_method = getattr(self, metric)
            metric_method(**kwargs)

    def get_metric_args(self, metric):
        """get the arguments of a metric"""
        metric_method = getattr(self, metric)
        return inspect.getfullargspec(metric_method).args[1:]

## Genetic Algorithm

In [10]:
class Genetic_Algorithm:
    """The Genetic_Algorithm class defines the genetic algorithm."""

    def __init__(
        self,
        problem: Problem = None,
        algorithm_steps: list[Genetic_Operator] = [],
        stopping_condition: Stopping_Condition = Max_Generations(),
        seed: int = 1234,
        rng: np.random.Generator = None,
        population_size: int = 100,
        population: np.ndarray = np.array([]),
        metrics: Metrics = Metrics(["best_of_run", "worst_of_run"]),
    ):
        self.generation = 0
        # loop locals and validate types
        for key, value in locals().items():
            match key:
                case "problem":
                    # check if problem is an instance of Problem
                    if not isinstance(value, Problem):
                        raise TypeError(
                            "problem must be an instance of Problem."
                        )
                    self.problem = value
                case "algorithm_steps":
                    # check if algorithm_steps is a list
                    if not isinstance(value, list):
                        raise TypeError(
                            "algorithm_steps must be a list."
                        )
                    # check if algorithm_steps is a list of Genetic_Operator
                    if not all(isinstance(item, Genetic_Operator) for item in value):
                        raise TypeError(
                            "algorithm_steps must be a list of Genetic_Operator."
                        )
                    self.algorithm_steps = value
                case "stopping_condition":
                    # check if stopping_condition is an instance of Stopping_Condition
                    if not isinstance(value, Stopping_Condition):
                        raise TypeError(
                            "stopping_condition must be an instance of Stopping_Condition."
                        )
                    self.stopping_condition = value
                case "population_size":
                    # check if population_size is an int or zero
                    if not isinstance(value, int):
                        raise TypeError(
                            "stopping_condition must be an instance of int."
                        )
                    self.population_size = value
                case "population":
                    if not isinstance(value, np.ndarray):
                        raise TypeError(
                            "population must be an instance of np.ndarray."
                        )

                    if len(value) == 0:
                        self.population = self.generate_dataset()
                    else:
                        self.population = value
                case "seed":
                    # check if seed is an int
                    if not isinstance(value, int):
                        raise TypeError(
                            "seed must be an instance of int."
                        )
                    self.seed = value
                case "rng":
                    if value is None:
                        self.rng = np.random.default_rng(self.seed)
                    # check if rng is an instance of np.random.Generator
                    elif not isinstance(value, np.random.Generator):
                        raise TypeError(
                            "rng must be an instance of np.random.Generator."
                        )
                    else:
                        self.rng = value
                case "metrics":
                    # check if metrics is an instance of Metrics
                    if not isinstance(value, Metrics):
                        raise TypeError(
                            "metrics must be an instance of Metrics."
                        )
                    self.metrics = value

    def generate_dataset(self):
        """Randomly generate a population as a numpy array of size=(population size, dimensions)
        with values from the half-closed interval [min_x, max_x) from a uniform distribution
        """
        # check if dimensions exist and is int
        if not hasattr(self.problem, "dimensions"):
            raise Exception(
                """The problem object should have a dimensions attribute 
defining the size (# of cols) the input vector has"""
            )
        elif not isinstance(self.problem.dimensions, int):
            raise TypeError(
                f"The problem dimensions should be an int. Received {self.problem.dimensions}"
            )

        args = {}

        args["size"] = (self.population_size, self.problem.dimensions)
        if hasattr(self.problem, "lower_bounds"):
            args["low"] = self.problem.lower_bounds
        if hasattr(self.problem, "upper_bounds"):
            args["high"] = self.problem.upper_bounds

        return self.rng.uniform(**args)

    def add_step(self, step: Genetic_Operator):
        """Add a step to the algorithm. The step will be appended to the
        end of the algorithm."""
        # check if step is a Genetic_Operator
        if not isinstance(step, Genetic_Operator):
            raise TypeError(f"step must be a Genetic_Operator. Received {type(step)}")
        self.algorithm_steps.append(step)

    def add_steps(self, steps: list[Genetic_Operator]):
        """Add multiple steps to the algorithm. The steps will be appended to the
        end of the algorithm."""
        # check if steps is a list
        if not isinstance(steps, list):
            raise TypeError(f"steps must be a list. Received {type(steps)}")
        # check if steps is a list of Genetic_Operator
        if not all(isinstance(item, Genetic_Operator) for item in steps):
            raise TypeError(
                f"steps must be a list of Genetic_Operator. Received {type(steps)}"
            )
        self.algorithm_steps.extend(steps)

    def set_problem(self, problem: Problem):
        """Set the problem to be solved. If a problem has already been set,
        it will be overwritten and a warning will be raised."""
        if self.problem is not None:
            #
            warnings.warn(
                f"Problem already set to {self.problem}. Overwriting with {problem}."
            )
        self.problem = problem

    def run(self):
        """Run the genetic algorithm."""
        if self.is_ready():
            self.fitness_values = self.evaluate()

            # run algorithm steps
            while self.is_running():
                for step in self.algorithm_steps:
                    # build arg dict for algorithm step and then run it

                    step_args = self.build_arg_dict_from_names(step.get_operator_args())

                    self.population = step.run(**step_args)
                self.fitness_values = self.evaluate()
                self.record_metrics()

                self.generation += 1

    def record_metrics(self):
        """Record the metrics for the current generation."""
        for metric in self.metrics.metrics.keys():
            metric_args = self.metrics.get_metric_args(metric)
            metric_args = self.build_arg_dict_from_names(metric_args)
            if "generation" not in metric_args:
                metric_args["generation"] = self.generation
            self.metrics.record(metric, **metric_args)

    def build_arg_dict_from_names(self, names):
        """Build an argument dictionary from a list of names."""
        args = {}
        for name in names:
            if name == "self":
                continue
            if hasattr(self, name):
                args[name] = getattr(self, name)
            elif hasattr(self.problem, name):
                args[name] = getattr(self.problem, name)
            elif hasattr(self.stopping_condition, name):
                args[name] = getattr(self.stopping_condition, name)
            else:
                raise AttributeError(f"Could not find attribute {name}")

        return args

    def build_arg_dict_from_func(self, func):
        """Build an argument dictionary for a function."""
        signature = inspect.signature(func)
        # build arg dict for algorithm step
        args = {}
        for param in signature.parameters.values():
            if param.name == "self":
                continue
            args[param.name] = getattr(self, param.name)
        return args

    def is_running(self):
        """Check if the genetic algorithm should continue running."""
        stopping_condition_args = self.build_arg_dict_from_func(
            self.stopping_condition.is_running
        )
        return self.stopping_condition.is_running(**stopping_condition_args)

    def is_ready(self):
        """Check if the genetic algorithm is ready to run."""
        if self.problem is None:
            raise ValueError("No problem has been set.")
        if len(self.algorithm_steps) == 0:
            raise ValueError("No algorithm steps have been set.")
        if self.stopping_condition is None:
            raise ValueError("No stopping condition has been set.")
        return True

    def evaluate(self):
        """Evaluate the population using the objective function."""
        # check if objective function exists
        if not hasattr(self.problem, "objective_function"):
            raise Exception(
                """The problem object should have an objective_function 
attribute defining the objective function to be used"""
            )
        # check if objective function is callable
        if not callable(self.problem.objective_function):
            raise TypeError(
                f"""The problem objective_function should be callable. 
Received {type(self.problem.objective_function)}"""
            )
        # evaluate population
        return self.problem.objective_function(self.population)

    def get_optimum_index_and_value(self):
        """Return the optimum index and value from the population."""
        if self.problem.goal == Optimization_Goal.MAXIMIZE:
            optimum = np.max(self.fitness_values)
            optimum_index = np.argmax(self.fitness_values, axis=0)
        elif self.problem.goal == Optimization_Goal.MINIMIZE:
            optimum = np.min(self.fitness_values)
            optimum_index = np.argmin(self.fitness_values, axis=0)

        return optimum_index, optimum

    def get_worst_index_and_value(self):
        """Return the worst index and value from the population."""
        if self.problem.goal == Optimization_Goal.MAXIMIZE:
            worst = np.min(self.fitness_values)
            worst_index = np.argmin(self.fitness_values, axis=0)
        elif self.problem.goal == Optimization_Goal.MINIMIZE:
            worst = np.max(self.fitness_values)
            worst_index = np.argmax(self.fitness_values, axis=0)

        return worst_index, worst

## Experiment

In [11]:
class Treatment(TypedDict):
    """A treatment is a set of parameters for a genetic algorithm."""

    name: str
    problem: Problem
    algorithm_steps: list[Genetic_Operator]
    stopping_condition: Stopping_Condition
    metrics: list[str]
    interval: int
    population_size: int


class Treament_Factory:
    """A factory for creating treatments."""

    def __init__(
        self,
        problem: Problem = None,
        algorithm_steps: list[Genetic_Operator] = None,
        stopping_condition: Stopping_Condition = None,
        metrics: list[str] = None,
        interval: int = None,
        population_size: int = None,
    ):
        self.problem = problem
        self.algorithm_steps = algorithm_steps
        self.stopping_condition = stopping_condition
        self.metrics = metrics
        self.interval = interval
        self.population_size = population_size

    def set_algorithm_steps(self, algorithm_steps: list[Genetic_Operator]):
        self.algorithm_steps = algorithm_steps

    def set_stopping_condition(self, stopping_condition: Stopping_Condition):
        self.stopping_condition = stopping_condition

    def set_metrics(self, metrics: list[str]):
        self.metrics = metrics

    def set_interval(self, interval: int):
        self.interval = interval

    def set_population_size(self, population_size: int):
        self.population_size = population_size

    def create_treatment(self, name: str):
        """Create a treatment."""
        if self.problem is None:
            raise ValueError("No problem has been set.")
        if len(self.algorithm_steps) == 0:
            raise ValueError("No algorithm steps have been set.")
        if self.stopping_condition is None:
            raise ValueError("No stopping condition has been set.")
        if self.metrics is None:
            raise ValueError("No metrics have been set.")
        if self.interval is None:
            raise ValueError("No interval has been set.")
        if self.population_size is None:
            raise ValueError("No population size has been set.")
        return Treatment(
            name=name,
            problem=self.problem,
            algorithm_steps=self.algorithm_steps,
            stopping_condition=self.stopping_condition,
            metrics=self.metrics,
            interval=self.interval,
            population_size=self.population_size,
        )


class Experiment:
    """An experiment is a collection of runs of a genetic algorithm."""

    def __init__(
        self,
        treatments: list[Treatment],
        runs: int,
        init_seed: int,
        name: str = "experiment",
    ):
        """Initialize the experiment."""
        self.name = name
        self.treatments = treatments
        self.runs = runs
        self.init_results()

        self.set_seeds(init_seed)

    def init_results(self):
        """Initialize the metrics for the experiment."""
        self.results = {}
        for treatment in self.treatments:
            self.results[treatment["name"]] = {}
            for metric in treatment["metrics"]:
                cols = ["generation", "solution", "fitness"]
                self.results[treatment["name"]][metric] = pd.DataFrame(columns=cols)

    def set_seeds(self, init_seed: int):
        if isinstance(init_seed, int):
            rng = np.random.default_rng(init_seed)
            # generate seeds for each run of a treatment
            random_seeds = rng.choice(
                1000000, size=(len(self.treatments), self.runs), replace=False
            )
            self.seeds = {
                treatment["name"]: random_seeds[i].tolist()
                for i, treatment in enumerate(self.treatments)
            }

        # save the seeds to a file
        with open(f"{self.name}_seeds.json", "w") as f:
            json.dump(self.seeds, f)

    def get_seed(self, run: int, treatment_name: str):
        return self.seeds[treatment_name][run]

    def run(self):
        """Run the experiment."""
        for treatment in self.treatments:
            name = treatment["name"]
            for run in range(self.runs):
                seed = self.get_seed(run, name)
                metrics = Metrics(treatment["metrics"], treatment["interval"])
                # create a genetic algorithm
                ga = Genetic_Algorithm(
                    problem=treatment["problem"],
                    algorithm_steps=treatment["algorithm_steps"],
                    stopping_condition=treatment["stopping_condition"],
                    seed=seed,
                    metrics=metrics,
                    population_size=treatment["population_size"],
                )
                # run the genetic algorithm
                ga.run()

                # save the results
                for metric in treatment["metrics"]:
                    self.results[name][metric] = pd.concat(
                        [
                            self.results[name][metric],
                            pd.DataFrame.from_dict(
                                ga.metrics.metrics[metric], orient="index"
                            ).T,
                        ]
                    )

## Setting Experiment Treatments

In [12]:
truncation_percentages = [0.3, 0.5, 0.7]
tournament_sizes = [2, 5, 10]
expected_best_copies = [1.2, 1.4, 1.6]
population_size = 50
mutation_rate = 0.1
crossover_rate = 0.8
max_generations = 30
interval = 10
metrics = ["best_of_run"]

treatment_dict = {
    "truncation": [],
    "tournament": [],
    "linear_ranking": [],
    "proportional": [],
}

# truncation selection
for truncation_percentage in truncation_percentages:
    treatment_factory = Treament_Factory(
        problem=De_Jong_Function_5(),
        stopping_condition=Max_Generations(max_generations),
        population_size=population_size,
        metrics=metrics,
        interval=interval,
    )

    treatment_factory.set_algorithm_steps(
        [
            Selection(
                selection_type="truncation", truncation_percentage=truncation_percentage
            ),
            Crossover(crossover_type="single_point", rate=crossover_rate),
            Mutation(mutation_type="percent_of_range", rate=mutation_rate),
        ]
    )
    treatment_dict["truncation"].append(
        treatment_factory.create_treatment(
            f"Truncation Percentage {truncation_percentage}"
        )
    )

# tournament selection
for tournament_size in tournament_sizes:
    treatment_factory = Treament_Factory(
        problem=De_Jong_Function_5(),
        stopping_condition=Max_Generations(max_generations),
        population_size=population_size,
        metrics=metrics,
        interval=interval,
    )
    treatment_factory.set_algorithm_steps(
        [
            Selection(
                selection_type="deterministic_tournament",
                tournament_size=tournament_size,
            ),
            Crossover(crossover_type="single_point", rate=crossover_rate),
            Mutation(mutation_type="percent_of_range", rate=mutation_rate),
        ]
    )
    treatment_dict["tournament"].append(
        treatment_factory.create_treatment(f"Tournament Size {tournament_size}")
    )

# linear ranking selection
for expected_best_copy in expected_best_copies:
    treatment_factory = Treament_Factory(
        problem=De_Jong_Function_5(),
        stopping_condition=Max_Generations(max_generations),
        population_size=population_size,
        metrics=metrics,
        interval=interval,
    )
    treatment_factory.set_algorithm_steps(
        [
            Selection(
                selection_type="linear_ranking", expected_best_copies=expected_best_copy
            ),
            Crossover(crossover_type="single_point", rate=crossover_rate),
            Mutation(mutation_type="percent_of_range", rate=mutation_rate),
        ]
    )
    treatment_dict["linear_ranking"].append(
        treatment_factory.create_treatment(
            f"Linear Ranking Max Best {expected_best_copy}"
        )
    )


# proportional selection
treatment_factory = Treament_Factory(
    problem=De_Jong_Function_5(),
    stopping_condition=Max_Generations(max_generations),
    population_size=population_size,
    metrics=metrics,
    interval=interval,
)
treatment_factory.set_algorithm_steps(
    [
        Selection(selection_type="proportional"),
        Crossover(crossover_type="single_point", rate=crossover_rate),
        Mutation(mutation_type="percent_of_range", rate=mutation_rate),
    ]
)

treatment_dict["proportional"].append(
    treatment_factory.create_treatment("Proportional")
)

## Executing Experiments

In [13]:
runs = 30
seeds = [22323, 6353, 7834, 98465]
alpha = 0.05

experiments = {}
for name, treatments, seed in zip(
    treatment_dict.keys(), treatment_dict.values(), seeds
):
    experiments[name] = Experiment(
        treatments=treatments, runs=runs, init_seed=seed, name=name
    )
    experiments[name].run()

## Comparison of Treatments

In [14]:
def test_normality(experiment, metric, metric_feature, alpha):
    all_normal = True
    feature_values = {
        treatment["name"]: np.array(
            [experiment.results[treatment["name"]][metric][metric_feature]]
        ).flatten()
        for treatment in experiment.treatments
    }
    for treatment in feature_values.keys():
        stat, p = stats.shapiro(feature_values[treatment])
        if p < alpha:
            all_normal = False
            break

    if all_normal:
        test = "ANOVA"
        print(
            f"""Since all distributions are normal, 
we will use ANOVA to test for differences 
in mean {metric_feature} values"""
        )
    else:
        test = "Kruskal-Wallis"
        print(
            f"""Since at least one distribution is not normal, 
we will use Kruskal-Wallis to test for differences 
in median {metric_feature} values"""
        )
        print()

    return test


def test_differences(experiment, metric, metric_feature, test, alpha):
    has_difference = True
    feature_values = {
        treatment["name"]: np.array(
            [experiment.results[treatment["name"]][metric][metric_feature]]
        ).flatten()
        for treatment in experiment.treatments
    }
    if test == "Kruskal-Wallis":
        # state that the null hypothesis is that all groups have the same median
        print("Null hypothesis: All groups have the same median")
        stat, p = stats.kruskal(*feature_values.values())
    else:
        # state that the null hypothesis is that all groups have the same mean
        print("Null hypothesis: All groups have the same mean")
        stat, p = stats.f_oneway(*feature_values.values())

    print(f"statistic: {stat}")
    print(f"p-value: {p}")
    if p < alpha:
        print("Reject the null hypothesis")
        # what does this mean?
        print("The groups are significantly different")
        print(
            """This means that the selection method
does affect the performance of the algorithm in this case"""
        )
    else:
        print("Fail to reject the null hypothesis")
        # what does this mean?
        print("The groups are not significantly different")
        print(
            """This means that the selection method does
not affect the performance of the algorithm in this case"""
        )
        has_difference = False
    print()

    return has_difference

## Analysis and Results:

Comparing parameter settings for truncation, tournament, and linear rank selection

In [15]:
alpha = 0.05
for name, experiment in experiments.items():
    test = ""
    has_difference = True
    if name == "proportional":
        continue
    print(f"Testing for differences in {name} selection method")
    print()
    test = test_normality(experiment, "best_of_run", "fitness", alpha)
    has_difference = test_differences(experiment, "best_of_run", "fitness", test, alpha)

    if not has_difference:
        agg_params = ["mean", "std", "size"]
        print(
            "Reporting the mean, standard deviation, and number of runs for each treatment"
        )
        print("because the average fitness values are not significantly different")
        for treatment in experiment.treatments:
            print(treatment["name"])
            df = pd.DataFrame(experiment.results[treatment["name"]]["best_of_run"])
            print(df[["generation", "fitness"]].agg(agg_params))
            print()

Testing for differences in truncation selection method

Since at least one distribution is not normal, 
we will use Kruskal-Wallis to test for differences 
in median fitness values

Null hypothesis: All groups have the same median
statistic: 0.5697680097680404
p-value: 0.7521014893593525
Fail to reject the null hypothesis
The groups are not significantly different
This means that the selection method does
not affect the performance of the algorithm in this case

Reporting the mean, standard deviation, and number of runs for each treatment
because the average fitness values are not significantly different
Truncation Percentage 0.3
      generation    fitness
mean    2.133333   5.742873
std     1.795268   4.161589
size   30.000000  30.000000

Truncation Percentage 0.5
      generation    fitness
mean    3.733333   5.949905
std     2.517981   4.646919
size   30.000000  30.000000

Truncation Percentage 0.7
      generation    fitness
mean    4.633333   5.335077
std     3.079222   4.270576


## Also Proportional Selection

In [16]:
# here are the results of proportional selection
print("Proportional Selection")
agg_params = ["mean", "std", "size"]
for treatment in experiments["proportional"].treatments:
    df = pd.DataFrame(experiments["proportional"].results[treatment["name"]]["best_of_run"])
    print(df[["generation", "fitness"]].agg(agg_params))
    print()

Proportional Selection
      generation    fitness
mean    1.633333   8.181372
std     1.828573   5.244602
size   30.000000  30.000000



We, Ben Vuong and Morgan Elder, certify that this report is our own, independent work and that it does not plagiarize, in part or in full, any other work.

$Ben$ $Vuong$<br>
$Morgan$ $Elder$