# **Genetic Algorithm - Phrase Solver**

This notebook shows how to use the genetic algorithm classes to create a training scenario.
It will demonstrate how a genetic algorithm can be used to generate a specific sentence.

For a sentence of length $L$ constructed from $N_C$ possible characters, the number of possible sentences is $(N_C)^L$.

If $N_P$ random sentences are created each generation, it would take approximately $\frac{(N_C)^L}{N_P}$ generations on average to produce a specific sentence by chance.
This number grows quickly with $L$, so the genetic algorithm aims to find the solution more efficiently than random guessing.
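
To get a feel for the scale, the estimate above can be evaluated directly. This is only an illustrative sketch: the function name `brute_force_generations` is hypothetical, and the example values (67 possible characters, a 25-character phrase, 200 sentences per generation) match the settings used later in this notebook.

```python
def brute_force_generations(n_chars: int, length: int, pop_size: int) -> float:
    """Estimate the average number of generations needed by pure random guessing."""
    search_space = n_chars**length  # (N_C)^L possible sentences
    return search_space / pop_size


# Example values matching the run later in this notebook
print(f"{brute_force_generations(67, 25, 200):.2e} generations")
```

For these values the estimate is on the order of $10^{43}$ generations, which is why random guessing is not a practical approach.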

## Code Implementation

### Libraries and Helper Functions

To create the training scenario, we will need to import `GeneticAlgorithm` and `Member`. We will use class inheritance to set things up.

In [1]:
from __future__ import annotations

import logging

import numpy as np

from genetic_algorithm.ga import GeneticAlgorithm
from genetic_algorithm.member import Member

logging.basicConfig(format="%(asctime)s %(message)s", datefmt="[%d-%m-%Y|%H:%M:%S]", level=logging.INFO)
logger = logging.getLogger(__name__)
rng = np.random.default_rng()

### Creating Classes from Member and GeneticAlgorithm

The following class is used to represent an individual which has genetic information. In this case, the genes are different characters, and a sequence of these genes is a chromosome which can be mixed between individuals each generation. The individuals with better chromosomes (i.e. a higher fitness score) are more likely to reproduce and pass on their genes to future generations.

A member's fitness is calculated by counting the characters in its chromosome which match the specified phrase at the same position, and then squaring that count.
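
As a minimal standalone sketch of this calculation (the free function `fitness` is hypothetical and only mirrors what the `calculate_score()` method and `fitness` property of the class below do together):

```python
def fitness(chromosome: str, phrase: str) -> int:
    """Count position-wise character matches and square the count."""
    score = sum(a == b for a, b in zip(chromosome, phrase))
    return score**2


print(fitness("I xm", "I am"))  # 3 of 4 characters match, so this prints 9
```

Squaring the score widens the gap between good and very good guesses: going from 10 to 11 matching characters raises the fitness from 100 to 121.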

To cross over two chromosomes into a new sequence, each gene is taken at random from one of the two parents with equal probability. There is also a small chance, set by the mutation rate, for a gene to mutate into a random character from the gene pool.
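
The idea can be sketched on plain strings, independently of the `Member` class. This is a simplified illustration rather than the implementation used below: the function `crossover` and its parameters are hypothetical, and it uses the standard library `random` module instead of NumPy.

```python
import random


def crossover(
    parent_a: str, parent_b: str, gene_pool: list[str], mutation_rate: int, rng: random.Random
) -> str:
    """Build a child chromosome gene by gene from two equal-length parents."""
    child = []
    for gene_a, gene_b in zip(parent_a, parent_b):
        roll = rng.randrange(100)  # integer in [0, 100)
        if roll < (100 - mutation_rate) / 2:
            child.append(gene_a)  # inherit from parent A
        elif roll < 100 - mutation_rate:
            child.append(gene_b)  # inherit from parent B
        else:
            child.append(rng.choice(gene_pool))  # mutate to a random gene
    return "".join(child)


child = crossover("aaaa", "bbbb", list("abc"), mutation_rate=3, rng=random.Random(0))
print(child)
```

With a mutation rate of 0, every gene in the child comes from one of the two parents.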

Configuring:
- `Member._chromosome`
- `Member._new_chromosome`
- `Member.fitness`
- `Member.crossover()`

In [2]:
class PhraseSolverMember(Member):
    """Member to use in PhraseSolver app."""

    def __init__(self, length: int, gene_pool: list[str]) -> None:
        """Initialise PhraseSolverMember with length of phrase and gene pool.

        :param int length:
            Length of the phrase to solve
        :param list[str] gene_pool:
            List of possible genes to use in the chromosome
        """
        super().__init__()
        self._length = length
        self._gene_pool = gene_pool
        self._chromosome = "".join([self.random_char for _ in range(self._length)])

    @property
    def random_char(self) -> str:
        """Return a random gene from the possible genes."""
        _choice: str = rng.choice(self._gene_pool)
        return _choice

    @property
    def fitness(self) -> int:
        """Return member fitness."""
        return self._score**2

    def calculate_score(self, phrase: str) -> None:
        """Calculate the member's score based on the provided phrase.

        :param str phrase:
            Phrase to compare the member's chromosome against
        """
        self._score = sum([self._chromosome[i] == phrase[i] for i in range(self._length)])

    def crossover(self, parent_a: Member, parent_b: Member, mutation_rate: int) -> None:
        """Crossover the chromosomes of two parents to create a new chromosome.

        :param Member parent_a:
            First parent member
        :param Member parent_b:
            Second parent member
        :param int mutation_rate:
            Mutation rate as a percentage (0-100) for the crossover
        """
        def _choose_gene(roll: int, i: int) -> str:
            # Half of the non-mutated genes come from parent_a
            if roll < (100 - mutation_rate) / 2:
                return parent_a._chromosome[i]
            # The other half of the non-mutated genes come from parent_b
            if roll < (100 - mutation_rate):
                return parent_b._chromosome[i]

            # Chance for a random gene to be selected
            return self.random_char

        prob_array = rng.integers(0, 100, size=self._length)
        indices = np.arange(self._length)
        vectorized_choose_gene = np.vectorize(_choose_gene)
        self._new_chromosome = "".join(vectorized_choose_gene(prob_array, indices))

The next class is used to generate a population of the members belonging to the above class. It is also used to run the algorithm.

Each generation, the fitness of each member is evaluated.
The highest fitness is used to normalise the fitnesses to the range [0, 1].
The normalised fitnesses are then used to select parents for the next generation of individuals via rejection sampling:

1) Select a random member from the population
2) Generate a random number between 0 and 1
3) If this number is less than the normalised fitness of the member, that member is selected as a parent. Otherwise, select a new random member and generate a new number.

This is a memory-efficient way of making members with higher fitnesses more likely to reproduce.
For each individual of the new generation, two different parents are selected this way.
Their chromosomes are mixed to generate a new sequence for that individual.
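
The selection steps above can be sketched as follows. This is a minimal illustration, not the code inside `GeneticAlgorithm`: the function `select_parent` is hypothetical, it works on a plain list of fitness values, and the fallback for an all-zero population is an assumption added so the loop cannot run forever.

```python
import random


def select_parent(fitnesses: list[float], rng: random.Random) -> int:
    """Return the index of a parent chosen by rejection sampling."""
    max_fitness = max(fitnesses)
    if max_fitness <= 0:
        # Assumption: fall back to a uniform choice when no member has any fitness
        return rng.randrange(len(fitnesses))
    while True:
        index = rng.randrange(len(fitnesses))  # 1) pick a random member
        threshold = rng.random()  # 2) random number in [0, 1)
        if threshold < fitnesses[index] / max_fitness:  # 3) accept, or try again
            return index


rng = random.Random(42)
fitnesses = [4, 16, 100]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[select_parent(fitnesses, rng)] += 1
print(counts)  # the fittest member is selected most often
```

Only the fitness values and the current maximum are needed, so no additional selection structure has to be built each generation.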

The program terminates when a member of the population has a chromosome which completely matches the specified phrase.

Configuring:
- `GeneticAlgorithm._population`
- `GeneticAlgorithm._evaluate()`
- `GeneticAlgorithm._analyse()`

In [3]:
class PhraseSolver(GeneticAlgorithm):
    """Simple app to use genetic algorithms to solve an alphanumeric phrase."""

    def __init__(self, members: list[PhraseSolverMember], mutation_rate: int) -> None:
        """Initialise PhraseSolver app.

        :param list[PhraseSolverMember] members:
            List of members to use in the population
        :param int mutation_rate:
            Mutation rate for members
        """
        super().__init__(members, mutation_rate)
        self._phrase: str

    @classmethod
    def create_and_run(
        cls, population_size: int, mutation_rate: int, phrase: str, mem_genes: list[str]
    ) -> PhraseSolver:
        """Create app and run genetic algorithm.

        :param int population_size:
            Size of the population to create
        :param int mutation_rate:
            Mutation rate for the members
        :param str phrase:
            Phrase to solve using the genetic algorithm
        :param list[str] mem_genes:
            List of possible genes to use in the chromosome
        :return PhraseSolver:
            Instance of PhraseSolver with the genetic algorithm run
        """
        ga = cls([PhraseSolverMember(len(phrase), mem_genes) for _ in range(population_size)], mutation_rate)
        ga._phrase = phrase
        logger.info(ga)
        ga.run()
        return ga

    def _evaluate(self) -> None:
        """Evaluate the population."""
        for _member in self._population._members:
            _member.calculate_score(self._phrase)
        self._population.evaluate()

    def _analyse(self) -> None:
        """Analyse best member's chromosome."""
        _gen_text = f"Generation {self._generation:>4}:"

        # Correct phrase found so break out of the loop
        if self._population.best_chromosome == self._phrase:
            logger.info("%s %s \t|| Solved!", _gen_text, self._population.best_chromosome)
            self._running = False
            return

        # Return the closest match and its associated fitness then evolve.
        logger.info(
            "%s %s \t|| Max Fitness: %d",
            _gen_text,
            self._population.best_chromosome,
            self._population.best_fitness,
        )

### Running the Algorithm

The possible genes an individual's chromosome can contain are provided as a list.
The population size and mutation rate are then defined.
A phrase is also chosen for the population members to guess.

In [4]:
population_size = 200
mutation_rate = 3
phrase = "I am a genetic algorithm!"
gene_pool = list("0123456789 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.,!?")
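
One thing worth checking before running: chromosomes are only ever built from characters in the gene pool, so with the classes shown here a phrase containing any other character could never reach the "Solved!" state. A small optional check, using a hypothetical helper `missing_genes`, might look like this:

```python
def missing_genes(phrase: str, gene_pool: list[str]) -> list[str]:
    """Return the characters of the phrase that are absent from the gene pool."""
    available = set(gene_pool)
    return sorted({char for char in phrase if char not in available})


phrase = "I am a genetic algorithm!"
gene_pool = list("0123456789 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.,!?")
assert not missing_genes(phrase, gene_pool), "phrase contains characters outside the gene pool"
```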

Each generation, the best guess is printed until the population eventually converges on the specified phrase.

In [6]:
ga = PhraseSolver.create_and_run(population_size, mutation_rate, phrase, gene_pool)

[19-06-2025|23:49:42] Population Size: 200
Mutation Rate: 3
[19-06-2025|23:49:42] Generation    1: c6WZr?hgF5,edKa8lz0TLq3Sh 	|| Max Fitness: 4
[19-06-2025|23:49:42] Generation    2: yjDp!EfgeyEoSL aa,R1bVCoI 	|| Max Fitness: 16
[19-06-2025|23:49:42] Generation    3: a3rm3BdgO5,JPK8alBo26qu 9 	|| Max Fitness: 25
[19-06-2025|23:49:42] Generation    4: Bxam3Bdge5wJmK8awIorVPCo9 	|| Max Fitness: 49
[19-06-2025|23:49:42] Generation    5: B37m3Bdge5zJPK8alIor6qha9 	|| Max Fitness: 64
[19-06-2025|23:49:42] Generation    6: Ivameavgy698dX4alLRri33E! 	|| Max Fitness: 100
[19-06-2025|23:49:42] Generation    7: Bmam?a genzvm?saMWxrY5hon 	|| Max Fitness: 100
[19-06-2025|23:49:42] Generation    8: IvamnayqenJom? a9L2ri3C7! 	|| Max Fitness: 121
[19-06-2025|23:49:42] Generation    9: Bn6m?a ge5A8Cc alWRrib3EK 	|| Max Fitness: 121
[19-06-2025|23:49:42] Generation   10: B3am!a genJ8CJ alLRrVbCox 	|| Max Fitness: 121
[19-06-2025|23:49:42] Generation   11: I3am!aygenQndP algJrFpCm! 	|| Max Fitness: 196
