# Permutation Flowshop Scheduling Problem (PFSP)

The Permutation Flowshop Scheduling Problem (PFSP) is a well-known combinatorial optimization problem. The problem is defined as follows:

 Given a set of jobs $J = \{J_1, J_2, \ldots, J_n\}$ and a set of machines $M = \{M_1, M_2, \ldots, M_m\}$, where each job $J_i$ consists of $m$ operations, one for each machine, the objective is to find a permutation of the jobs that minimizes the makespan, i.e., the total time it takes to process all jobs on all machines.


---
>Spanakis Panagiotis-Alexios, Undergraduate Student
Department of Management Science and Technology
Athens University of Economics and Business
t8200158@aueb.gr

## Before we start

We first need to go over the dependencies needed for this notebook to function properly. 

We will be using the following libraries:

- numpy: For numerical operations and more efficient matrix operations
- pandas: For data manipulation, more specifically for reading the data from the provided files
- math: For the calculation of the factorial of a number
- collections: For the implementation of the tabu list in the Tabu Search algorithm
- typing: For type hints in the function definitions
- time: For measuring the time taken to find the optimal solution
- optuna: For hyperparameter optimization
- json: For saving the optimal hyperparameters to a json file

To install the required libraries, run the following command:

In [1]:
!pip install -r requirements.txt





## Importing the necessary libraries

We will start by importing the necessary libraries for this notebook.



In [2]:
import numpy as np
import pandas as pd
import math
from collections import deque
from typing import Tuple
from time import time
import optuna
import json

## Reading the data

We will start by reading the data from the provided file `input.csv`. The file contains the processing times for each job on each machine.

Let's create a function that reads the data from the file.

In [3]:
def read_data(data_path: str):
    """
    Load data from the input file into a pandas DataFrame
    :param data_path: the path to the input file
    """
    data = pd.read_csv(data_path, header=None)
    # Create columns J1 - J20 and name the rows M1 to M5
    data.columns = ['J' + str(i) for i in range(1, data.shape[1] + 1)]
    data.index = ['M' + str(i) for i in range(1, data.shape[0] + 1)]
    return data

Now, let's read the data from the file `input.csv` and display the data.

In [4]:
data_path = 'input.csv'
data = read_data(data_path)
data

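| | J1 | J2 | J3 | J4 | J5 | J6 | J7 | J8 | J9 | J10 | J11 | J12 | J13 | J14 | J15 | J16 | J17 | J18 | J19 | J20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 54 | 83 | 15 | 71 | 77 | 36 | 53 | 38 | 27 | 87 | 76 | 91 | 14 | 29 | 12 | 77 | 32 | 87 | 68 | 94 |
| M2 | 79 | 3 | 11 | 99 | 56 | 70 | 99 | 60 | 5 | 56 | 3 | 61 | 73 | 75 | 47 | 14 | 21 | 86 | 5 | 77 |
| M3 | 16 | 89 | 49 | 15 | 89 | 45 | 60 | 23 | 57 | 64 | 7 | 1 | 63 | 41 | 63 | 47 | 26 | 75 | 77 | 40 |
| M4 | 66 | 58 | 31 | 68 | 78 | 91 | 13 | 59 | 49 | 85 | 85 | 9 | 39 | 41 | 56 | 40 | 54 | 77 | 51 | 31 |
| M5 | 58 | 56 | 20 | 85 | 53 | 35 | 53 | 41 | 69 | 13 | 86 | 72 | 8 | 49 | 47 | 87 | 58 | 18 | 68 | 28 |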


Now that we can clearly see the data we are working with, let us convert it into a numpy array for easier manipulation.

In [5]:
def convert_data_to_numpy(data: pd.DataFrame):
    """
    Convert the data from a pandas DataFrame to a numpy array
    :param data: the input data
    """
    return data.to_numpy()

In [6]:
data = convert_data_to_numpy(data)
data

array([[54, 83, 15, 71, 77, 36, 53, 38, 27, 87, 76, 91, 14, 29, 12, 77,
        32, 87, 68, 94],
       [79,  3, 11, 99, 56, 70, 99, 60,  5, 56,  3, 61, 73, 75, 47, 14,
        21, 86,  5, 77],
       [16, 89, 49, 15, 89, 45, 60, 23, 57, 64,  7,  1, 63, 41, 63, 47,
        26, 75, 77, 40],
       [66, 58, 31, 68, 78, 91, 13, 59, 49, 85, 85,  9, 39, 41, 56, 40,
        54, 77, 51, 31],
       [58, 56, 20, 85, 53, 35, 53, 41, 69, 13, 86, 72,  8, 49, 47, 87,
        58, 18, 68, 28]], dtype=int64)

With the data now in a numpy array, we can start working on the permutation flowshop scheduling problem itself.

We will start by defining the objective function for the problem. 
The objective function is the makespan: the time at which the last job in the sequence finishes on the last machine.
It is not simply the sum of the processing times on the last machine, because a job can only start on a machine once it has finished on the previous machine and the machine has finished the previous job, so idle time may occur.

### Variables:

- $m$: Index for machines, $m = 1, 2, \ldots, M$, where $M$ is the total number of machines.
- $j$: Index for jobs, $j = 1, 2, \ldots, N$, where $N$ is the total number of jobs.
- $p_{mj}$: Processing time of job $j$ on machine $m$.
- $seq$: A sequence (permutation) of jobs $j$, indicating the order in which jobs are processed.
- $C_{mj}$: Completion time of job $j$ on machine $m$.
- $C_{\text{max}}$: Makespan, the total time to complete all jobs on all machines, which is the completion time of the last job on the last machine.

### Makespan Calculation:

1. **Initialization**: For the first job in the sequence ($j=1$) and the first machine ($m=1$), the completion time is simply the processing time of that job on that machine.

$$C_{1,seq[1]} = p_{1,seq[1]}$$

2. **First Machine**: For each subsequent job $j$ on the first machine ($m=1$), the completion time is the sum of its processing time and the completion time of the previous job.

$$C_{1,seq[j]} = C_{1,seq[j-1]} + p_{1,seq[j]}, \quad \text{for } j = 2, 3, \ldots, N$$

3. **First Job on Subsequent Machines**: For the first job in the sequence on each subsequent machine $m$, the completion time is the sum of its processing time on the current machine and the completion time on the previous machine.

$$C_{m,seq[1]} = C_{m-1,seq[1]} + p_{m,seq[1]}, \quad \text{for } m = 2, 3, \ldots, M$$

4. **Subsequent Jobs on Subsequent Machines**: For each subsequent job $j$ on each subsequent machine $m$, the completion time is the maximum of the completion time of the previous job on the current machine and the completion time of the current job on the previous machine, plus the processing time of the current job on the current machine.

$$C_{m,seq[j]} = \max(C_{m,seq[j-1]}, C_{m-1,seq[j]}) + p_{m,seq[j]}, \quad \text{for } m = 2, 3, \ldots, M; \, j = 2, 3, \ldots, N$$

5. **Makespan**: The makespan $C_{\text{max}}$ is the completion time of the last job on the last machine.

$$C_{\text{max}} = C_{M,seq[N]}$$
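
To make the recurrence concrete, the sketch below traces it on a hypothetical 2-machine, 3-job instance. The processing times are invented for illustration and are not part of the input data, and the helper name `completion_times` is an assumption of this example. It uses 0-based job indices and treats a missing predecessor as 0, which folds the four cases above into a single `max` expression.

```python
import numpy as np

def completion_times(p: np.ndarray, seq: list) -> np.ndarray:
    """Return the matrix C where C[m, k] is the completion time of the
    k-th job of seq on machine m (0-based indices)."""
    n_machines, n_jobs = p.shape[0], len(seq)
    C = np.zeros((n_machines, n_jobs), dtype=int)
    for m in range(n_machines):
        for k, job in enumerate(seq):
            # Completion of the previous job on this machine (0 if none)
            prev_job_same_machine = C[m, k - 1] if k > 0 else 0
            # Completion of this job on the previous machine (0 if none)
            same_job_prev_machine = C[m - 1, k] if m > 0 else 0
            C[m, k] = max(prev_job_same_machine, same_job_prev_machine) + p[m, job]
    return C

# Hypothetical processing times: rows are machines, columns are jobs.
p = np.array([[3, 2, 4],
              [2, 5, 1]])

for seq in ([0, 1, 2], [1, 0, 2]):
    C = completion_times(p, seq)
    print(seq, "-> makespan", C[-1, -1])
```

On this toy instance the two sequences give makespans of 11 and 10 respectively, which shows that the order of the jobs alone changes the objective value.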



Now that we have defined the objective function, we can implement the function that calculates the makespan for a given sequence of jobs.



In [7]:
def calculate_makespan(data: np.ndarray, solution: np.ndarray) -> float:
    """
    Calculate the makespan of a solution
    :param data: The data in a numpy array
    :param solution: The solution to evaluate
    :return: The calculated makespan
    """
    # Get the number of machines and jobs from the data
    machines = data.shape[0]
    jobs = data.shape[1]

    # Initialize a matrix to store the completion times of each job on each machine
    times = np.zeros((machines, jobs))

    for m in range(machines):
        for j in range(jobs):
            # If we are in the first machine and in the first job
            # We just input the processing time of the job
            if m == 0 and j == 0:
                times[m][j] = data[m][solution[j] - 1]
            # If we are in the first machine but not in the first job
            # We add the processing time of the job to the previous completion time
            elif m == 0:
                times[m][j] = times[m][j - 1] + data[m][solution[j] - 1]
            # If we are in the first job but not in the first machine
            # We add the processing time of the job to the previous completion time
            elif j == 0:
                times[m][j] = times[m - 1][j] + data[m][solution[j] - 1]
            # If we are not in the first job or the first machine
            # We add the processing time of the job to the maximum of the previous job on the machine
            # or the previous machine on the job
            else:
                times[m][j] = max(times[m - 1][j], times[m][j - 1]) + data[m][solution[j] - 1]

    # The makespan is the completion time of the last job on the last machine
    # So the last element of the times matrix will be returned
    return times[-1][-1]

With the makespan function implemented, we can now move on to the algorithm that searches for a good job sequence.

In order to solve the Ta001 problem (the first 20-job, 5-machine instance of Taillard's benchmark set, which is the data loaded above), we will use the Genetic Algorithm (GA) optimization technique.

The Genetic Algorithm (GA) is a search heuristic inspired by Charles Darwin's theory of natural evolution. It reflects the process of natural selection where the fittest individuals are selected for reproduction to produce offspring of the next generation. The algorithm repeatedly modifies a population of individual solutions. At each step, the GA selects individuals from the current population to be parents and uses them to produce the children for the next generation. Over successive generations, the population "evolves" toward an optimal solution.

How the Genetic Algorithm works:

1) **Initialization**: Start with a randomly generated population of $n$ chromosomes (possible solutions for the problem).

2) **Fitness Calculation**: Evaluate the fitness $f(x)$ of each chromosome $x$ in the population. The fitness score is an indication of how good a solution is relative to others.

3) **Selection**: Select parent chromosomes for breeding. Parents are selected according to their fitness. The better the chromosomes are, the more chances they have to be selected for reproduction. This step mimics the "survival of the fittest" principle.

4) **Crossover** (Recombination): Combine the genetic information of two parents to generate new offspring. There are various methods to perform crossover such as single-point crossover, multi-point crossover, and uniform crossover.

5) **Mutation**: Apply random mutations to some offspring at some probability. This step introduces genetic diversity into the population, providing new genetic structures to explore.

6) **Replacement**: Form a new population by selecting individuals from the current population and the offspring. This new generation is then used in the next iteration of the algorithm.

7) **Termination**: Repeat the steps from Fitness Calculation to Replacement until a termination condition is met (e.g., a maximum number of generations is reached, or a satisfactory fitness level has been achieved).

In order to explain why the Genetic Algorithm is an appropriate choice for the PFSP, let us calculate the number of possible solutions for our problem.

More specifically, the number of possible solutions for the PFSP is the factorial of the number of jobs as each job can be scheduled in a different position in the sequence.

The factorial is calculated as follows:

$$n! = n \times (n-1) \times (n-2) \times \ldots \times 2 \times 1$$

Where $n$ is the number of jobs.




In [8]:
# Calculate the number of possible solutions for the PFSP
def calculate_possible_solutions(data: np.ndarray) -> int:
    """
    Calculate the number of possible solutions for the PFSP
    :param data: The data in a numpy array
    :return: The number of possible solutions
    """
    # Get the number of jobs from the data
    jobs = data.shape[1]

    # The number of possible solutions is the factorial of the number of jobs
    return math.factorial(jobs)


# Calculate the number of possible solutions
possible_solutions = calculate_possible_solutions(data)
possible_solutions

2432902008176640000

We can see that the number of possible solutions for the PFSP is very large (about $2.43 \times 10^{18}$ for 20 jobs), even for a small number of jobs. This is due to the combinatorial nature of the problem: any of the $n$ jobs can take the first position, any of the remaining $n-1$ jobs the second, and so on.
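
To put this number in perspective, here is a rough back-of-the-envelope estimate of how long exhaustive enumeration would take. The evaluation rate of one million makespan calculations per second is purely an assumption for illustration; the real rate depends on the machine and the implementation.

```python
import math

n_jobs = 20
n_permutations = math.factorial(n_jobs)

# Assumption: one million makespan evaluations per second (hypothetical rate).
evaluations_per_second = 1_000_000
seconds_per_year = 365 * 24 * 60 * 60

years = n_permutations / (evaluations_per_second * seconds_per_year)
print(f"{n_permutations:.3e} permutations")
print(f"about {years:,.0f} years of exhaustive enumeration")
```

Under that assumption, enumerating every permutation would take on the order of tens of thousands of years, which is why we turn to a metaheuristic instead.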

The Genetic Algorithm is well-suited for combinatorial optimization problems like the PFSP because it can efficiently explore a large search space of possible solutions. By using techniques such as selection, crossover, and mutation, the Genetic Algorithm can find good solutions to complex optimization problems.

While other algorithms such as simple heuristics or local search methods can also be used for the PFSP, the Genetic Algorithm provides a good balance between exploration and exploitation of the search space, making it a popular choice for solving combinatorial optimization problems.


In our case, in order to find the optimal solution for the Ta001 problem, we will implement a **hybrid Genetic Algorithm** that 
also exploits another metaheuristic, the **Tabu Search** algorithm, in order to improve the performance of the Genetic Algorithm.

**Tabu Search**, developed by Fred Glover in 1986, is particularly known for its ability to escape local optima, a common challenge in optimization problems. It achieves this by maintaining a tabu list—a short-term memory of recently visited solutions (or moves that alter the current solution) that are temporarily banned or made "tabu." This prevents the search from cycling back to previously explored solutions, encouraging exploration of new regions of the solution space.

The key components of Tabu Search include:

- **Tabu List**: A list that records certain attributes of the recently visited solutions (or the moves made) to prevent the algorithm from revisiting them. The list has a fixed length (tenure), after which the oldest entries are removed to make room for new ones.

- **Aspiration Criteria**: Conditions under which the tabu status of a solution (or move) can be overridden. Typically, if a move leads to a solution better than any seen so far (even if the move is tabu), it may be accepted.

- **Neighborhood Search**: At each iteration, Tabu Search examines the "neighborhood" of the current solution (solutions reachable from the current solution through a single move) and moves to the best solution in this neighborhood that is not tabu (unless overridden by the aspiration criteria).

- **Diversification and Intensification Strategies**: Mechanisms to explore the solution space broadly (diversification) and to exploit promising regions of the solution space deeply (intensification).
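
The tabu list and the aspiration criterion can be sketched in a few lines. This is a minimal illustration, not the implementation used later in this notebook: the helper `is_move_allowed` and the makespan values are hypothetical, and the `tabu_search` function defined further down skips tabu moves outright without an aspiration check.

```python
from collections import deque

def is_move_allowed(move, candidate_makespan, best_makespan, tabu_list):
    """A move is allowed if it is not tabu, or if it is tabu but yields a
    makespan strictly better than the best seen so far (aspiration)."""
    if move not in tabu_list:
        return True
    return candidate_makespan < best_makespan

tenure = 3
tabu_list = deque(maxlen=tenure)  # the oldest move is dropped automatically

for move in [(0, 1), (2, 5), (1, 4), (3, 6)]:
    tabu_list.append(move)

print(list(tabu_list))                                 # (0, 1) has expired
print(is_move_allowed((2, 5), 1300, 1290, tabu_list))  # tabu, no improvement
print(is_move_allowed((2, 5), 1285, 1290, tabu_list))  # tabu, but aspirated
print(is_move_allowed((0, 1), 1300, 1290, tabu_list))  # no longer tabu
```

Using `deque(maxlen=tenure)` keeps the list at a fixed length without any explicit removal step.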

---

# Hybrid Genetic Algorithm Implementation

Now that we have a good understanding of the Genetic Algorithm and Tabu Search, and why they are suitable for the PFSP, we can proceed with the implementation of the hybrid algorithm to solve our problem.
 
The hybrid Genetic Algorithm with Tabu Search will combine the exploration capabilities of the Genetic Algorithm with the intensification capabilities of Tabu Search to find a good solution to the PFSP.

With the Genetic Algorithm, we will explore the search space of possible solutions by generating and evolving populations of chromosomes. The Genetic Algorithm will use selection, crossover, and mutation operators to create new solutions and improve the population over multiple generations.

With Tabu Search, we will intensify the search by exploring the neighborhood of selected individuals produced by the Genetic Algorithm. Tabu Search will use a tabu list to prevent cycling and encourage exploration of new regions of the solution space. More specifically, we will apply Tabu Search every few generations to a randomly chosen fraction of the population; both the frequency and the fraction are parameters of the algorithm (`tabu_search_frequency` and `tabu_search_percentage`). This improves the quality of the solutions that the Genetic Algorithm recombines.

Having already defined the function that calculates the makespan for a given sequence of jobs, we can now continue to implement our solution.

Let's start by defining the function that initializes the population for the Genetic Algorithm.

In [9]:
def initialize_population(data: np.ndarray, pop_size: int) -> list:
    """
    Initialize the population for the Genetic Algorithm
    :param data: The data in a numpy array
    :param pop_size: The size of the population
    :return: The initialized population
    """
    # Get the number of jobs from the data
    jobs = data.shape[1]

    return [np.random.permutation(jobs) + 1 for _ in range(pop_size)]

Now let's implement the selection operator. We use tournament selection: a few individuals are drawn at random from the population (with replacement, since that is the default of `np.random.choice`), and the one with the best fitness (lowest makespan) wins the tournament.

In [10]:
def tournament_selection(population: list, fitness: list, tournament_size: int) -> np.ndarray:
    """
    Perform tournament selection on the population
    :param population: The population to select from
    :param fitness: The fitness values of the population
    :param tournament_size: The size of the tournament
    :return: The selected individual
    """
    # Choose random indices for the tournament
    selected_indices = np.random.choice(range(len(population)), tournament_size)
    # Get the makespan values of the selected individuals
    selected_fitness = [fitness[i] for i in selected_indices]
    # Select the winner of the tournament
    winner_index = selected_indices[np.argmin(selected_fitness)]
    # Return the winner of the tournament
    return population[winner_index]
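
The tournament size controls the selection pressure: the more contestants per tournament, the more likely it is that a low-makespan individual wins. The sketch below illustrates this on hypothetical fitness values. The population, the helper names and the use of `np.random.default_rng` are assumptions of this example, not part of the notebook's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitness values (makespans) for a population of 50 individuals.
fitness = rng.integers(1280, 1500, size=50)

def tournament_winner(fitness, tournament_size, rng):
    """Return the index of the lowest-makespan individual among
    tournament_size contestants drawn with replacement."""
    contestants = rng.choice(len(fitness), size=tournament_size)
    return contestants[np.argmin(fitness[contestants])]

def mean_winning_makespan(tournament_size, n_tournaments=2000):
    """Average makespan of the winners over many independent tournaments."""
    winners = [tournament_winner(fitness, tournament_size, rng)
               for _ in range(n_tournaments)]
    return float(np.mean(fitness[winners]))

for size in (1, 3, 5):
    print(size, round(mean_winning_makespan(size), 1))
```

A tournament of size 1 is plain uniform random selection, so its average winner is close to the population mean, while larger tournaments push the average winner toward the best individuals.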

Now, let's implement the function that performs the crossover operation on two parent individuals to produce one offspring individual.

In [11]:
def ordered_crossover(parent1: np.ndarray, parent2: np.ndarray) -> np.ndarray:
    """
    Perform ordered crossover on two parents
    :param parent1: The first parent
    :param parent2: The second parent
    :return: The child produced by the crossover
    """
    # Get the parent size
    size = len(parent1)
    # Initialize the child with the same size as the parents
    child = np.full(size, None, dtype=object)
    # Choose two random indices for the crossover
    start, end = sorted(np.random.choice(range(size), size=2, replace=False))
    # Copy the selected part of the first parent to the child
    child[start:end + 1] = parent1[start:end + 1]
    # Get the values that are not in the child from the second parent
    fill_values = [item for item in parent2 if item not in child]
    # Get the indices that need to be filled
    fill_pos = [i for i in range(size) if child[i] is None]
    # Fill the child with the values from the second parent
    for i, value in zip(fill_pos, fill_values):
        child[i] = value
    return child
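
To see what the operator does, it helps to fix the cut points instead of drawing them at random. The sketch below is a deterministic variant written for illustration only. The function name `ordered_crossover_fixed` and the two 8-job parents are hypothetical. Like the function above, it fills the free positions from left to right with the jobs of the second parent, in the order they appear there.

```python
import numpy as np

def ordered_crossover_fixed(parent1, parent2, start, end):
    """Ordered crossover with explicit cut points (inclusive slice)."""
    size = len(parent1)
    child = np.zeros(size, dtype=int)  # 0 marks an empty position
    # Copy the slice [start, end] from the first parent
    child[start:end + 1] = parent1[start:end + 1]
    kept = set(child[start:end + 1].tolist())
    # Jobs of parent2 that are not yet in the child, in parent2's order
    fill_values = [int(job) for job in parent2 if int(job) not in kept]
    # Positions outside the copied slice, from left to right
    fill_pos = [i for i in range(size) if i < start or i > end]
    child[fill_pos] = fill_values
    return child

parent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
parent2 = np.array([8, 7, 6, 5, 4, 3, 2, 1])
child = ordered_crossover_fixed(parent1, parent2, start=2, end=4)
print(child)  # [8 7 3 4 5 6 2 1]
```

The child keeps jobs 3, 4, 5 in the positions they had in the first parent and inherits the relative order of the remaining jobs from the second parent, so it is always a valid permutation.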

Now, let's implement the mutation function. It applies the swap operator to a given solution: it randomly selects two distinct positions and swaps the jobs at those positions. Note that the solution is modified in place.

In [12]:
def swap_mutation(solution: np.ndarray) -> np.ndarray:
    """
    Perform swap mutation on a sequence
    :param solution: The provided solution
    :return: The mutated solution
    """
    # Choose two random indices to swap
    idx1, idx2 = np.random.choice(range(len(solution)), size=2, replace=False)
    # Swap the values at the indices
    solution[idx1], solution[idx2] = solution[idx2], solution[idx1]
    # Return the mutated solution
    return solution

Now that we have implemented the initialization, selection, crossover, and mutation functions for the Genetic Algorithm, we can proceed with the implementation of the Genetic Algorithm itself.

But first let us implement the Tabu Search algorithm in order to use it in the Genetic Algorithm.

Let's also keep in mind that the best known makespan for the Ta001 problem is 1278. We store it as a constant so that the search can stop as soon as this value is reached.

In [13]:
OPTIMAL_MAKESPAN = 1278

In [14]:
def tabu_search(data: np.ndarray, initial_solution: np.ndarray, initial_makespan: int, tabu_list: deque, tenure: int,
                num_iterations: int = 50) -> Tuple[np.ndarray, int]:
    """
    Perform tabu search on the initial solution
    :param data: The data in a numpy array
    :param initial_solution: The initial solution provided
    :param initial_makespan: The makespan of the initial solution
    :param tabu_list: The tabu list to use
    :param tenure: The tenure of the tabu list
    :param num_iterations: The number of iterations to perform
    :return: The Best solution and its makespan
    """
    # Initialize the best solution and its makespan
    best_solution = initial_solution.copy()
    best_makespan = initial_makespan
    current_solution = initial_solution.copy()

    # Define the jobs from the data
    jobs = data.shape[1]

    # Perform the tabu search for the specified number of iterations
    for iteration in range(num_iterations):
        neighborhood = []  # List to hold all neighbors (solutions) and their makespans
        # Generate neighbors by swapping two jobs
        for i in range(jobs):
            for j in range(i + 1, jobs):
                # Check if the move is not tabu else skip
                if (i, j) not in tabu_list:
                    # Copy the current solution
                    neighbor = current_solution.copy()
                    # Swap the two jobs
                    neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
                    # Calculate the makespan of the neighbor
                    neighbor_makespan = calculate_makespan(data, neighbor)
                    # Save the move that was made
                    move = (i, j)
                    # Append the neighbor and its makespan to the neighborhood
                    neighborhood.append((neighbor, neighbor_makespan, move))

        # If no moves are available, break the loop
        if not neighborhood:
            break

        # Select the best move from the neighborhood
        neighborhood.sort(key=lambda x: x[1])
        # Get the best solution, its makespan and the move that was made
        current_solution, current_makespan, current_move = neighborhood[0]

        # Update the best solution and its makespan
        if current_makespan < best_makespan:
            best_solution, best_makespan = current_solution, current_makespan
            # Clear tabu list if a better solution is found
            tabu_list.clear()

            # Check if the optimal solution has been found
            if best_makespan == OPTIMAL_MAKESPAN:
                print(f"Found optimal solution at iteration {iteration + 1}!")
                return best_solution, best_makespan

        # Update the tabu list
        tabu_list.append(current_move)

        # If the tabu list is full, remove the oldest element
        if len(tabu_list) > tenure:
            tabu_list.popleft()

    return best_solution, best_makespan
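
It is worth estimating the cost of one Tabu Search run, since every neighbor requires a full makespan calculation. The short sketch below counts the swap moves for 20 jobs and multiplies by the default number of iterations. The result is an upper bound, because tabu moves are skipped and the search may return early.

```python
from itertools import combinations

n_jobs = 20
num_iterations = 50

# Every unordered pair of positions (i, j) with i < j is one swap move.
swap_moves = list(combinations(range(n_jobs), 2))
evaluations_upper_bound = len(swap_moves) * num_iterations

print(len(swap_moves))          # 190
print(evaluations_upper_bound)  # 9500
```

In general the swap neighborhood contains $n(n-1)/2$ moves, so the cost per iteration grows quadratically with the number of jobs.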

Now that we have implemented the Tabu Search algorithm, we can proceed with the implementation of the Genetic Algorithm.

But let us check the performance of the Tabu Search algorithm first.

In [15]:
# Initialize the tabu list and tenure
tabu_list = deque(maxlen=5)
tenure = 5

# Initialize the initial solution and its makespan
initial_solution = np.random.permutation(data.shape[1]) + 1

# Calculate the initial makespan
initial_makespan = calculate_makespan(data, initial_solution)

# Time the execution of the Tabu Search
start_time = time()

# Perform the Tabu Search
best_solution, best_makespan = tabu_search(data, initial_solution, initial_makespan, tabu_list, tenure)

# Save the execution time
execution_time = time() - start_time
# Print the best solution and its makespan
print(f"Best Solution: {best_solution}")
print(f"Best Makespan: {best_makespan}")
# Print the execution time
print(f"Execution Time: {round(execution_time, 2)} seconds")

Best Solution: [ 8 15  3  4 11 17 14  6  9  5  7 16  1 18 12 19 13 10  2 20]
Best Makespan: 1297.0
Execution Time: 0.55 seconds


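As we can see, in this run the Tabu Search algorithm finds a solution close to the best known makespan of the Ta001 problem (1297 versus 1278), and does so very quickly (around half a second).
Note that the starting permutation is drawn without a fixed random seed, so rerunning the cell can give a different result.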
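
A common way to express "close" is the relative gap to the best known value. The helper below is a small illustrative sketch; the name `relative_gap` is an assumption of this example.

```python
def relative_gap(makespan: float, reference: float) -> float:
    """Relative deviation of makespan from a reference value, in percent."""
    return 100.0 * (makespan - reference) / reference

print(f"{relative_gap(1297, 1278):.2f}%")  # 1.49%
```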

However, we can improve the performance of the Tabu Search algorithm by using it in combination with the Genetic Algorithm.

Let's now implement the Genetic Algorithm with Tabu Search.

The parameters for the Genetic Algorithm are as follows:

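- **Population Size** (`pop_size`): The number of individuals in the population
- **Number of Generations** (`generations`): The maximum number of generations to run
- **Mutation Rate** (`mutation_rate`): The probability that a child undergoes swap mutation
- **Tournament Size** (`tournament_size`): The number of individuals drawn for each tournament selection
- **Tabu Tenure** (`tabu_tenure`): The maximum length of the tabu list
- **Tabu Search Frequency** (`tabu_search_frequency`): Tabu Search is applied once every this many generations
- **Tabu Iterations** (`tabu_iterations`): The number of iterations of each Tabu Search run
- **Tabu Search Percentage** (`tabu_search_percentage`): The probability that each individual is selected for Tabu Search, i.e. the expected fraction of the population it is applied to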

In [16]:
def genetic_algorithm(data: np.ndarray, pop_size: int = 100, generations: int = 100, mutation_rate: float = 0.01,
                      tournament_size: int = 3, tabu_tenure: int = 10, tabu_search_frequency: int = 2,
                      tabu_iterations: int = 50, tabu_search_percentage: float = 0.1) \
        -> Tuple[np.ndarray, int]:
    """
    The Genetic Algorithm for the PFSP problem
    :param data: the data in a numpy array
    :param pop_size: the population size
    :param generations: the number of generations
    :param mutation_rate: the mutation rate
    :param tournament_size: the size of the tournament selection
    :param tabu_tenure: the tabu tenure
    :param tabu_search_frequency: the frequency to apply Tabu Search
    :param tabu_iterations: the number of iterations to perform Tabu Search
    :param tabu_search_percentage: the percentage of the population to apply Tabu Search
    :return: the best solution and its makespan
    """

    # Initialize the population
    population = initialize_population(data, pop_size)
    # Initialize the tabu list as a deque
    tabu_list = deque(maxlen=tabu_tenure)

    # Initialize the best solution and its makespan
    best_sol_overall = None
    best_makespan_overall = float('inf')

    # Begin the genetic algorithm loop
    for generation in range(generations):
        # Calculate the fitness (makespan) of each individual of the population
        fitness = [calculate_makespan(data, individual) for individual in population]

        # Tabu Search integration
        if generation % tabu_search_frequency == 0:
            for i in range(len(population)):
                # Apply TS to (tabu_search_percentage) of the population
                if np.random.rand() < tabu_search_percentage:
                    # Apply tabu search to the individual
                    improved_solution, improved_fitness = tabu_search(data, population[i], fitness[i],
                                                                      tabu_list,
                                                                      tabu_tenure, tabu_iterations)

                    # Check if we have found the optimal solution
                    if improved_fitness == OPTIMAL_MAKESPAN:
                        return improved_solution, improved_fitness

                    # Update the population and the fitness
                    population[i] = improved_solution.copy()
                    fitness[i] = improved_fitness

        # Initialize a new population
        new_population = []
        # Initialize the new fitness
        new_fitness = []

        # Generate the new population
        for _ in range(pop_size):
            # Perform tournament selection
            parent1 = tournament_selection(population, fitness, tournament_size)
            parent2 = tournament_selection(population, fitness, tournament_size)
            # Perform ordered crossover
            child = ordered_crossover(parent1, parent2)
            # Perform mutation if the mutation rate is met
            if np.random.rand() < mutation_rate:
                child = swap_mutation(child)
            # Append the child to the new population
            new_population.append(child)
            # Calculate the fitness of the child
            new_fitness.append(calculate_makespan(data, child))

        # Update the population
        population = new_population.copy()
        # Update the fitness of the population
        fitness = new_fitness.copy()

        # Find the best solution and its makespan
        best_idx = np.argmin(fitness)
        best_solution = population[best_idx]
        best_makespan = fitness[best_idx]
        print(best_solution)

        # Check if the best solution is better than the overall best solution
        if best_makespan < best_makespan_overall:
            best_sol_overall = best_solution.copy()
            best_makespan_overall = best_makespan
            # If the best solution is the optimal solution, break the loop
            if best_makespan_overall == OPTIMAL_MAKESPAN:
                print("Found optimal solution!")
                break

        print(f"Generation {generation + 1}: Best Makespan = {best_makespan}")

    return best_sol_overall, best_makespan_overall

Now that we have implemented the Genetic Algorithm with Tabu Search, we can proceed with running the algorithm to find the optimal solution for the Ta001 problem.

But first, let's define the parameters for the Genetic Algorithm by hand.

---

## Parameters for the Genetic Algorithm

After some initial testing, we have found that the following parameters work well for the Ta001 problem:

In [17]:
# Set the parameters for the Genetic Algorithm
population_size = 50
num_generations = 100
tournament_size = 5
mutation_rate = 0.2
tabu_tenure = 5  # Size of the tabu list
tabu_search_frequency = 3  # Apply TS every 3 generations
tabu_iterations = 50
tabu_search_percentage = 0.1  # The percentage of the population to apply Tabu Search
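
Since Tabu Search dominates the running time, it can be useful to estimate how many times it would be called with these settings. The sketch below is a rough calculation that assumes the run lasts for all generations. In practice the algorithm stops as soon as the best known makespan is reached, so the real count can be much lower.

```python
population_size = 50
num_generations = 100
tabu_search_frequency = 3
tabu_search_percentage = 0.1

# Generations 0, 3, 6, ... trigger Tabu Search (generation % frequency == 0).
ts_generations = len(range(0, num_generations, tabu_search_frequency))
# Each individual is picked independently with probability tabu_search_percentage.
expected_calls = ts_generations * population_size * tabu_search_percentage

print(ts_generations)            # 34
print(f"{expected_calls:.0f}")   # 170
```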

Now let's run the Genetic Algorithm with Tabu Search to find the optimal solution for the Ta001 problem.

In [18]:
# Set a random seed for reproducibility
np.random.seed(42)
# Time the execution of the Genetic Algorithm
start_time = time()
# Run the Genetic Algorithm
best_solution, best_makespan = genetic_algorithm(data=data, pop_size=population_size, generations=num_generations,
                                                 mutation_rate=mutation_rate,
                                                 tournament_size=tournament_size, tabu_tenure=tabu_tenure,
                                                 tabu_iterations=tabu_iterations,
                                                 tabu_search_frequency=tabu_search_frequency,
                                                 tabu_search_percentage=tabu_search_percentage)
# Save the execution time
execution_time = time() - start_time
# Print the best solution and its makespan
print(f"Best Solution: {best_solution}")
print(f"Best Makespan: {best_makespan}")
# Print the execution time
print(f"Execution Time: {round(execution_time, 2)} seconds")

[3 9 15 6 11 13 1 4 19 5 14 17 16 18 8 2 7 10 20 12]
Generation 1: Best Makespan = 1288.0
[3 9 15 6 11 13 1 4 19 5 14 17 16 18 8 2 7 10 20 12]
Generation 2: Best Makespan = 1288.0
[3 9 15 6 11 13 1 4 19 5 14 17 16 18 8 2 7 10 20 12]
Generation 3: Best Makespan = 1288.0
Found optimal solution at iteration 9!
Best Solution: [9 15 6 3 17 5 7 11 13 1 19 18 16 8 4 2 14 10 20 12]
Best Makespan: 1278.0
Execution Time: 5.33 seconds


As we can see, the Genetic Algorithm with Tabu Search is able to find the optimal solution for the Ta001 problem with a makespan of **1278**
in a relatively short amount of time (around 5 seconds).

## Hyperparameter Optimization with Optuna

Optuna is a hyperparameter optimization framework that automates the process of finding the best hyperparameters for a given machine learning model or optimization problem. It uses an adaptive algorithm to efficiently search the hyperparameter space and find the best set of hyperparameters that optimize the objective function.

In this section, we will use Optuna to perform hyperparameter optimization for the Genetic Algorithm with Tabu Search. We will define the hyperparameters to optimize, the objective function to minimize (makespan), and the search space for each hyperparameter.

Let's start by defining the objective function for the hyperparameter optimization.




### Objective Function for Hyperparameter Optimization

The objective function for the hyperparameter optimization is the execution time of the Genetic Algorithm with Tabu Search, that is, the time it takes to find the optimal solution. The goal is to find the set of hyperparameters that minimizes this time.

The hyperparameters we will optimize are:

- Population Size
- Tournament Size
- Mutation Rate
- Tabu Tenure
- Tabu Search Frequency
- Tabu Iterations
- Tabu Search Percentage

We will define the search space for each hyperparameter and use Optuna to find the set of hyperparameters that **minimizes** the execution time.

To do that, we first fix the number of generations for each run of the algorithm at 10.
We will also use 10 trials for the optimization process.

In [24]:
# Set the number of generations for the optimization process
GENERATIONS = 10

In [25]:
def objective(trial):
    # Define the hyperparameter search space
    pop_size = trial.suggest_int('pop_size', 50, 200)
    mutation_rate = trial.suggest_float('mutation_rate', 0.01, 0.5)
    tournament_size = trial.suggest_int('tournament_size', 2, 5)
    tabu_tenure = trial.suggest_int('tabu_tenure', 5, 20)
    tabu_search_frequency = trial.suggest_int('tabu_search_frequency', 1, 10)
    tabu_iterations = trial.suggest_int('tabu_iterations', 30, 70)
    tabu_search_percentage = trial.suggest_float('tabu_search_percentage', 0.05, 0.4)

    # Start measuring execution time
    start_time = time()

    np.random.seed(2)

    # Execute the GA and retrieve the best makespan
    _, best_makespan = genetic_algorithm(data=data, pop_size=pop_size,
                                         generations=GENERATIONS,
                                         mutation_rate=mutation_rate,
                                         tournament_size=tournament_size,
                                         tabu_tenure=tabu_tenure,
                                         tabu_search_frequency=tabu_search_frequency,
                                         tabu_iterations=tabu_iterations,
                                         tabu_search_percentage=tabu_search_percentage)

    # Measure execution time
    execution_time = time() - start_time

    # The goal is to minimize the time taken to find the optimal solution
    return execution_time
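
The objective above returns the elapsed time whether or not a run reached the optimal makespan, so a run that finishes quickly without finding the optimum could still score well. One possible guard, shown here only as a sketch, is to return the elapsed time for successful runs and a large penalty otherwise. The helper `timed_objective`, its `penalty` default, and the two stand-in functions are hypothetical and not part of the notebook; the sketch only assumes that a run returns a `(solution, makespan)` pair, as `genetic_algorithm` does.

```python
from time import perf_counter


def timed_objective(run, target, penalty=1e6):
    """Score one run: elapsed seconds if it reaches `target`, else a large penalty.

    `run` is any zero-argument callable returning (solution, makespan).
    `penalty` is an illustrative assumption, chosen to exceed any realistic runtime.
    """
    start = perf_counter()
    _, best_makespan = run()
    elapsed = perf_counter() - start
    if best_makespan <= target:
        return elapsed
    # Failed runs rank below every successful one; worse makespans rank lower still.
    return penalty + best_makespan


# Hypothetical stand-ins for genetic_algorithm(...), used only to exercise the helper.
def reaches_optimum():
    return [1, 2, 3], 1278.0


def stalls():
    return [3, 2, 1], 1297.0


fast_score = timed_objective(reaches_optimum, target=1278)
slow_score = timed_objective(stalls, target=1278)
print(fast_score < slow_score)
```

Inside `objective`, the call to `genetic_algorithm` could be wrapped in a `lambda` and passed as `run`, with `target=1278` for Ta001. Whether this guard is needed depends on how often runs end without reaching the optimum.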


Now that we have defined the objective function for the hyperparameter optimization, we can proceed with running the optimization process using Optuna.



In [26]:
# Create a study object and optimize the objective function
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=10)  # Run with 10 trials

[I 2024-04-02 19:15:42,038] A new study created in memory with name: no-name-7c5e07f9-63b8-4652-9df6-57e585cc96b9


[9 13 6 3 16 19 14 17 15 4 18 11 5 7 8 2 12 1 10 20]
Generation 1: Best Makespan = 1297.0
[17 15 14 19 13 4 1 11 9 3 2 6 7 16 8 5 18 12 10 20]
Generation 2: Best Makespan = 1297.0
[9 15 17 19 13 4 11 6 14 18 5 7 8 16 10 12 3 2 1 20]
Generation 3: Best Makespan = 1297.0
[9 6 11 13 3 19 14 17 15 4 5 7 8 1 2 16 18 12 10 20]
Generation 4: Best Makespan = 1297.0
[9 15 17 19 13 3 6 14 5 7 8 16 4 11 1 2 18 12 10 20]
Generation 5: Best Makespan = 1297.0
[11 9 6 13 17 19 14 4 3 2 1 15 7 8 5 16 18 12 10 20]
Generation 6: Best Makespan = 1297.0
[15 13 17 6 14 4 11 3 9 7 19 5 8 1 2 16 18 12 10 20]
Generation 7: Best Makespan = 1297.0
[9 17 15 19 13 3 6 14 5 7 16 1 11 2 4 18 8 12 10 20]
Generation 8: Best Makespan = 1297.0


[I 2024-04-02 19:15:59,883] Trial 0 finished with value: 17.844082832336426 and parameters: {'pop_size': 64, 'mutation_rate': 0.136143655964633, 'tournament_size': 5, 'tabu_tenure': 17, 'tabu_search_frequency': 4, 'tabu_iterations': 38, 'tabu_search_percentage': 0.2529253715855299}. Best is trial 0 with value: 17.844082832336426.


[9 13 11 15 6 19 4 3 17 14 5 7 8 2 1 16 18 12 10 20]
Generation 9: Best Makespan = 1297.0
[15 17 9 19 6 14 5 8 1 2 4 13 11 16 3 7 18 12 10 20]
Generation 10: Best Makespan = 1297.0


[I 2024-04-02 19:16:05,781] Trial 1 finished with value: 5.89701247215271 and parameters: {'pop_size': 128, 'mutation_rate': 0.11784030728618715, 'tournament_size': 4, 'tabu_tenure': 6, 'tabu_search_frequency': 5, 'tabu_iterations': 64, 'tabu_search_percentage': 0.21080015514597011}. Best is trial 1 with value: 5.89701247215271.


Found optimal solution at iteration 27!
[15 6 9 5 4 17 19 18 13 16 14 3 11 7 8 10 12 1 2 20]
Generation 1: Best Makespan = 1297.0
[15 6 9 5 11 13 4 17 19 18 16 14 3 7 8 10 12 1 2 20]
Generation 2: Best Makespan = 1297.0
[17 9 14 8 5 19 6 3 7 4 11 1 15 18 13 12 2 16 10 20]
Generation 3: Best Makespan = 1297.0
[6 9 3 17 14 5 7 11 8 4 19 1 15 18 2 13 12 16 10 20]
Generation 4: Best Makespan = 1297.0
[6 3 15 9 1 14 5 7 11 2 4 8 17 19 13 16 18 12 10 20]
Generation 5: Best Makespan = 1297.0
[15 5 9 4 16 1 8 17 2 14 13 3 11 7 6 19 18 12 10 20]
Generation 6: Best Makespan = 1297.0


[I 2024-04-02 19:16:22,553] Trial 2 finished with value: 16.771024465560913 and parameters: {'pop_size': 71, 'mutation_rate': 0.3054681622108537, 'tournament_size': 5, 'tabu_tenure': 11, 'tabu_search_frequency': 6, 'tabu_iterations': 54, 'tabu_search_percentage': 0.21908802873204142}. Best is trial 1 with value: 5.89701247215271.


[15 17 14 9 6 5 4 2 13 1 19 16 3 7 11 8 18 10 12 20]
Generation 7: Best Makespan = 1297.0
[6 9 7 11 3 15 14 5 4 8 17 19 13 2 12 1 18 16 10 20]
Generation 8: Best Makespan = 1297.0
[15 19 7 11 3 17 14 5 8 1 6 9 4 2 13 12 18 16 10 20]
Generation 9: Best Makespan = 1297.0
[15 6 9 4 16 1 8 17 2 14 13 3 11 5 7 19 18 12 10 20]
Generation 10: Best Makespan = 1297.0


[I 2024-04-02 19:16:24,028] Trial 3 finished with value: 1.4739975929260254 and parameters: {'pop_size': 51, 'mutation_rate': 0.449028996840953, 'tournament_size': 4, 'tabu_tenure': 5, 'tabu_search_frequency': 8, 'tabu_iterations': 64, 'tabu_search_percentage': 0.2394570015432429}. Best is trial 3 with value: 1.4739975929260254.


Found optimal solution at iteration 6!
[14 9 6 11 13 4 3 5 7 8 2 12 15 19 17 18 16 1 10 20]
Generation 1: Best Makespan = 1297.0
[15 17 13 19 6 4 3 1 9 5 14 2 12 8 18 16 7 11 10 20]
Generation 2: Best Makespan = 1297.0
[6 9 4 3 11 13 17 14 5 7 8 2 15 19 1 18 12 16 10 20]
Generation 3: Best Makespan = 1297.0
[17 9 13 4 11 6 2 5 7 8 15 19 16 14 1 18 12 3 10 20]
Generation 4: Best Makespan = 1297.0
[9 13 4 11 17 5 6 2 15 7 14 1 16 8 3 19 18 12 10 20]
Generation 5: Best Makespan = 1297.0
[17 9 13 4 11 3 6 15 7 5 14 1 2 8 16 18 19 12 10 20]
Generation 6: Best Makespan = 1297.0
[9 13 15 4 11 6 16 14 5 8 7 3 1 2 19 17 18 12 10 20]
Generation 7: Best Makespan = 1297.0
[15 6 11 13 9 8 7 4 14 17 16 3 2 1 19 5 18 12 10 20]
Generation 8: Best Makespan = 1297.0
[13 9 17 4 11 15 19 14 8 16 3 6 2 7 1 5 18 12 10 20]
Generation 9: Best Makespan = 1297.0


[I 2024-04-02 19:17:02,303] Trial 4 finished with value: 38.27374076843262 and parameters: {'pop_size': 126, 'mutation_rate': 0.2449432361249659, 'tournament_size': 5, 'tabu_tenure': 19, 'tabu_search_frequency': 3, 'tabu_iterations': 39, 'tabu_search_percentage': 0.18912654509213805}. Best is trial 3 with value: 1.4739975929260254.


[15 9 13 11 17 4 6 8 3 2 7 14 1 16 19 5 18 12 10 20]
Generation 10: Best Makespan = 1297.0
[17 15 6 19 14 9 13 1 16 4 5 18 11 8 2 3 7 10 20 12]
Generation 1: Best Makespan = 1288.0
[15 14 9 17 4 8 19 3 18 1 16 7 11 5 6 10 12 2 13 20]
Generation 2: Best Makespan = 1297.0
[9 8 15 3 6 11 5 17 7 4 19 14 18 10 12 16 1 2 13 20]
Generation 3: Best Makespan = 1297.0
[3 9 6 16 17 15 5 1 11 14 7 4 2 10 8 18 12 19 13 20]
Generation 4: Best Makespan = 1297.0
[15 6 8 17 14 19 4 3 9 13 1 16 5 2 11 7 18 12 10 20]
Generation 5: Best Makespan = 1297.0
[9 14 15 6 8 17 19 4 3 13 1 16 5 2 11 7 18 12 10 20]
Generation 6: Best Makespan = 1297.0
[14 17 9 6 8 1 3 16 13 11 7 5 2 15 4 18 19 12 10 20]
Generation 7: Best Makespan = 1297.0
[17 8 9 15 6 19 16 13 1 14 4 2 11 5 3 7 18 12 10 20]
Generation 8: Best Makespan = 1297.0


[I 2024-04-02 19:17:26,523] Trial 5 finished with value: 24.21959161758423 and parameters: {'pop_size': 166, 'mutation_rate': 0.3555748941678539, 'tournament_size': 4, 'tabu_tenure': 6, 'tabu_search_frequency': 8, 'tabu_iterations': 57, 'tabu_search_percentage': 0.1006635212470671}. Best is trial 3 with value: 1.4739975929260254.


[15 8 4 19 9 13 1 14 17 16 11 5 6 3 18 7 12 2 10 20]
Generation 9: Best Makespan = 1297.0
[15 9 14 17 1 4 16 13 11 5 19 6 2 8 3 7 18 12 10 20]
Generation 10: Best Makespan = 1297.0
[15 6 9 14 17 3 8 11 13 1 2 7 18 16 4 19 5 10 20 12]
Generation 1: Best Makespan = 1294.0
[9 15 14 3 6 17 8 11 13 1 2 7 18 16 4 19 5 10 20 12]
Generation 2: Best Makespan = 1291.0
[14 11 13 16 15 6 18 12 9 5 1 17 3 7 8 19 4 2 10 20]
Generation 3: Best Makespan = 1297.0
[15 6 1 16 13 17 18 12 9 5 3 14 7 11 4 19 2 8 10 20]
Generation 4: Best Makespan = 1297.0


[I 2024-04-02 19:17:56,330] Trial 6 finished with value: 29.804608821868896 and parameters: {'pop_size': 110, 'mutation_rate': 0.06041807911585367, 'tournament_size': 5, 'tabu_tenure': 14, 'tabu_search_frequency': 2, 'tabu_iterations': 42, 'tabu_search_percentage': 0.22978943054459544}. Best is trial 3 with value: 1.4739975929260254.


Found optimal solution at iteration 4!


[I 2024-04-02 19:18:00,665] Trial 7 finished with value: 4.334554433822632 and parameters: {'pop_size': 107, 'mutation_rate': 0.45953492859991224, 'tournament_size': 4, 'tabu_tenure': 9, 'tabu_search_frequency': 5, 'tabu_iterations': 36, 'tabu_search_percentage': 0.3850600363230722}. Best is trial 3 with value: 1.4739975929260254.


Found optimal solution at iteration 6!
[15 9 3 1 6 4 19 17 5 14 13 2 12 8 18 16 7 11 10 20]
Generation 1: Best Makespan = 1297.0
[14 9 8 3 1 11 15 17 7 19 13 18 16 6 12 5 4 2 10 20]
Generation 2: Best Makespan = 1297.0
[6 4 19 17 13 1 9 7 11 5 8 14 16 3 15 18 2 12 10 20]
Generation 3: Best Makespan = 1297.0
[15 1 11 9 13 4 6 3 5 16 19 7 14 17 18 12 8 2 10 20]
Generation 4: Best Makespan = 1302.0
[8 3 13 1 11 17 9 5 15 12 14 4 2 16 6 19 7 10 18 20]
Generation 5: Best Makespan = 1324.0
[14 3 1 8 17 2 16 13 4 19 6 15 11 7 5 9 18 12 10 20]
Generation 6: Best Makespan = 1297.0
[14 3 1 8 17 19 6 15 11 7 5 9 18 16 4 13 2 12 10 20]
Generation 7: Best Makespan = 1315.0
[9 8 15 17 1 14 11 18 19 7 6 3 4 2 16 5 13 12 10 20]
Generation 8: Best Makespan = 1297.0


[I 2024-04-02 19:18:23,819] Trial 8 finished with value: 23.152254819869995 and parameters: {'pop_size': 195, 'mutation_rate': 0.4999803339805571, 'tournament_size': 2, 'tabu_tenure': 19, 'tabu_search_frequency': 8, 'tabu_iterations': 41, 'tabu_search_percentage': 0.22721680910699138}. Best is trial 3 with value: 1.4739975929260254.


Found optimal solution at iteration 41!


[I 2024-04-02 19:18:26,958] Trial 9 finished with value: 3.1379992961883545 and parameters: {'pop_size': 173, 'mutation_rate': 0.4568170270557136, 'tournament_size': 5, 'tabu_tenure': 15, 'tabu_search_frequency': 5, 'tabu_iterations': 49, 'tabu_search_percentage': 0.27070520354733413}. Best is trial 3 with value: 1.4739975929260254.


Found optimal solution at iteration 48!


After running the hyperparameter optimization with Optuna, we can retrieve the best set of hyperparameters found by the optimization process.

In [27]:
# Get the best hyperparameters
best_params = study.best_params

# Print the best hyperparameters
print("Best Hyperparameters: ", best_params)

Best Hyperparameters:  {'pop_size': 51, 'mutation_rate': 0.449028996840953, 'tournament_size': 4, 'tabu_tenure': 5, 'tabu_search_frequency': 8, 'tabu_iterations': 64, 'tabu_search_percentage': 0.2394570015432429}


Now let us rerun the Genetic Algorithm with these hyperparameters and check the makespan and the execution time.

In [28]:
# Execute the Genetic Algorithm with the best hyperparameters
np.random.seed(2)

start_time = time()

best_solution, best_makespan = genetic_algorithm(data=data, pop_size=best_params['pop_size'],
                                                 generations=GENERATIONS,
                                                 mutation_rate=best_params['mutation_rate'],
                                                 tournament_size=best_params['tournament_size'],
                                                 tabu_tenure=best_params['tabu_tenure'],
                                                 tabu_search_frequency=best_params['tabu_search_frequency'],
                                                 tabu_iterations=best_params['tabu_iterations'],
                                                 tabu_search_percentage=best_params['tabu_search_percentage'])

execution_time = time() - start_time

# Print the best solution and its makespan
print(f"Best Solution: {best_solution}")
print(f"Best Makespan: {best_makespan}")
print(f"Execution Time: {round(execution_time, 2)} seconds")

Found optimal solution at iteration 6!
Best Solution: [ 9 15  6  5 14  8  1 19 16 17 13  4  3 18  7 11  2 10 20 12]
Best Makespan: 1278.0
Execution Time: 1.57 seconds


In [29]:
# Save the optimal parameters to a json file
with open('optimal_params.json', 'w') as f:
    json.dump(best_params, f)
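
To reuse the saved hyperparameters in a later session, the file can be read back with `json.load`. The sketch below round-trips the parameter values printed above through a temporary file (a temporary directory is an assumption made here so that the example does not overwrite `optimal_params.json`) and checks that nothing changes on the way.

```python
import json
import os
import tempfile

# Best hyperparameters as printed by the study above.
params = {
    'pop_size': 51,
    'mutation_rate': 0.449028996840953,
    'tournament_size': 4,
    'tabu_tenure': 5,
    'tabu_search_frequency': 8,
    'tabu_iterations': 64,
    'tabu_search_percentage': 0.2394570015432429,
}

# Write to a scratch location instead of the notebook's working directory.
path = os.path.join(tempfile.mkdtemp(), 'optimal_params.json')
with open(path, 'w') as f:
    json.dump(params, f)

# Read the parameters back, as a later session would.
with open(path) as f:
    loaded = json.load(f)

print(loaded == params)
```

Integer parameters come back as `int` and floating-point parameters as `float`, so the loaded dictionary can be passed to `genetic_algorithm` in the same way as `best_params`.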

As we can see, the hyperparameter optimization process with Optuna found, among the 10 trials, the set of hyperparameters with the shortest time to reach the optimal solution for the Ta001 problem.

More specifically, with the best hyperparameters found by the optimization process, the Genetic Algorithm with Tabu Search found the optimal solution with a makespan of 1278 in about 1.6 seconds, roughly 3.4 times faster than the initial run of about 5.3 seconds.


> Note: Several different permutations can reach the optimal makespan, so the solution may vary between executions if the random seed is different (but the makespan should remain the same).
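
The point of the note can be illustrated on a toy instance. The sketch below uses a made-up instance with 3 jobs and 2 machines (not Ta001 data, and with 0-based job indices), computes the makespan with the usual flowshop recurrence, and enumerates all permutations to collect those that attain the minimum. The function name `makespan` here is local to the example and is not the notebook's implementation.

```python
from itertools import permutations


def makespan(order, times):
    """Completion time of the last job on the last machine for a given job order."""
    n_machines = len(times[0])
    finish = [0] * n_machines  # finish[k]: time at which machine k becomes free
    for job in order:
        for k in range(n_machines):
            # A job can start on machine k once it has left machine k-1
            # and machine k has finished the previous job.
            ready = finish[k - 1] if k > 0 else 0
            finish[k] = max(finish[k], ready) + times[job][k]
    return finish[-1]


# Toy processing times: times[job][machine] (illustrative values only).
times = [(3, 2), (1, 4), (2, 1)]

scores = {order: makespan(order, times) for order in permutations(range(len(times)))}
best = min(scores.values())
optimal_orders = [order for order, value in scores.items() if value == best]

print(best)            # 8
print(optimal_orders)  # [(1, 0, 2), (1, 2, 0)]
```

Two different job orders reach the same minimum makespan of 8, which is the situation the note describes for Ta001.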

# Conclusion

In this notebook, we have implemented a hybrid Genetic Algorithm with Tabu Search to solve the Permutation Flowshop Scheduling Problem (PFSP). We have defined the objective function for the PFSP, implemented the Genetic Algorithm with selection, crossover, and mutation operators, and integrated Tabu Search to improve the performance of the Genetic Algorithm. We have also performed hyperparameter optimization using Optuna to find the best set of hyperparameters for the Genetic Algorithm with Tabu Search.

The Genetic Algorithm with Tabu Search was able to find the optimal solution for the Ta001 problem with a makespan of 1278 in a relatively short amount of time. The hyperparameter optimization process using Optuna helped us find the best set of hyperparameters that minimized the time taken to find the optimal solution.

Let's sum up all the answers to the questions of the assignment.

1) Calculate the number of possible solutions to the problem: 
**20! = 2432902008176640000**

2) Explain which type of algorithm best suits its solution (exact, heuristics, metaheuristics): 
**Metaheuristics** are the best choice for solving the PFSP because the search space is very large (20! permutations for this instance), which makes exhaustive enumeration impractical. Metaheuristics like the **Genetic Algorithm** and **Tabu Search** can explore the search space efficiently and find good solutions to the PFSP.

3) Choose an algorithm to solve it and develop it in the programming language of your choice: 
I have chosen to implement a **hybrid Genetic Algorithm with Tabu Search** to solve the PFSP in Python. The implementation can be found in the `final.py` file in the repository. When executed, it writes a file named `output.csv` containing the optimal solution, its makespan, and the execution time.

4) List the optimal solution you found and its makespan (total completion time): 
**makespan** = **1278**, 
**solution** = [9, 15, 6, 5, 14, 8, 1, 19, 16, 17, 13, 4, 3, 18, 7, 11, 2, 10, 20, 12]
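
The count in answer 1 can be checked directly with `math.factorial`. The sketch below also estimates how long exhaustive enumeration would take, under the purely illustrative assumption of one million makespan evaluations per second, which supports the choice of metaheuristics in answer 2.

```python
import math

n_jobs = 20
n_permutations = math.factorial(n_jobs)
print(f"{n_permutations:,}")  # 2,432,902,008,176,640,000

# Assumed evaluation rate (illustrative, not measured in the notebook).
evaluations_per_second = 1_000_000
seconds_per_year = 365 * 24 * 3600

years = n_permutations / evaluations_per_second / seconds_per_year
print(f"about {years:,.0f} years of exhaustive enumeration")
```

Under this assumption, enumerating every permutation would take tens of thousands of years, whereas the hybrid algorithm reaches the optimum in seconds.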
