# Genetic Algorithm for Graph Coloring

This notebook explores the application of genetic algorithms to solve the graph coloring problem. The graph coloring problem is a classic NP-hard problem in computational complexity theory, where the goal is to assign colors to vertices of a graph such that no two adjacent vertices share the same color. The objective is to minimize the total number of colors used.

## Introduction to Genetic Algorithms

Genetic Algorithms (GAs) are a subclass of evolutionary algorithms inspired by the process of natural selection. They are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operations like mutation, crossover, and selection.

### Key Components of Genetic Algorithms

1. **Population**: A set of candidate solutions (individuals) to the optimization problem.
2. **Chromosome/Individual**: A representation of a solution to the problem, typically encoded as a string of genes.
3. **Fitness Function**: A function that evaluates how good a solution is relative to the problem's objectives.
4. **Selection**: The process of choosing individuals from the population to create offspring for the next generation.
5. **Crossover**: The process of combining parts of two parent solutions to create one or more children.
6. **Mutation**: Random changes to individual solutions that introduce diversity into the population.
7. **Elitism**: The strategy of retaining the best solutions from one generation to the next.

### The Genetic Algorithm Process

1. **Initialization**: Create an initial population of random individuals.
2. **Evaluation**: Calculate the fitness of each individual in the population.
3. **Selection**: Select individuals from the population to be parents based on their fitness.
4. **Reproduction**: Create new individuals through crossover and mutation operations.
5. **Replacement**: Form a new population by replacing some or all of the old population with the new individuals.
6. **Termination**: Repeat steps 2-5 until a termination condition is met (e.g., maximum number of generations or satisfactory fitness level).
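The six steps above can be sketched as a generic loop. This is a minimal illustration under stated assumptions, not the implementation used later in this notebook: `run_ga`, the `toy_*` helpers, and all parameter names are hypothetical, and the toy problem (minimizing the number of 1-bits in a bit string) is chosen only because its optimum is obvious.

```python
import random

def run_ga(fitness, random_individual, crossover, mutate,
           pop_size=20, generations=50, elite=2, target=0, seed=0):
    """Generic minimizing GA loop (illustrative sketch, lower fitness is better)."""
    rng = random.Random(seed)
    # 1. Initialization: random individuals
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Evaluation: sort so the fittest (lowest score) come first
        population.sort(key=fitness)
        # 6. Termination: stop early once the target fitness is reached
        if fitness(population[0]) <= target:
            break
        # 3. Selection: the better half of the population become parents
        parents = population[:pop_size // 2]
        # 4. Reproduction: crossover followed by mutation
        children = []
        while len(children) < pop_size - elite:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        # 5. Replacement: keep the elite, replace everyone else
        population = population[:elite] + children
    return min(population, key=fitness)

# Hypothetical toy problem: minimize the number of 1-bits
N_BITS = 12

def toy_fitness(bits):
    return sum(bits)

def toy_random(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]

def toy_crossover(a, b, rng):
    point = rng.randint(1, N_BITS - 1)  # single-point crossover
    return a[:point] + b[point:]

def toy_mutate(bits, rng):
    flipped = bits[:]
    i = rng.randrange(N_BITS)
    flipped[i] = 1 - flipped[i]  # flip one random bit
    return flipped

best = run_ga(toy_fitness, toy_random, toy_crossover, toy_mutate)
print(best, toy_fitness(best))
```

Because the two best individuals are carried over unchanged, the best fitness in this sketch can never get worse from one generation to the next.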

## Graph Coloring Problem

### Problem Definition

Given an undirected graph G = (V, E) where V is the set of vertices and E is the set of edges, the graph coloring problem involves assigning colors to each vertex such that no two adjacent vertices share the same color. The objective is to minimize the number of colors used.

### Applications

Graph coloring has numerous real-world applications, including:

- **Scheduling**: Assigning time slots to tasks while avoiding conflicts.
- **Frequency Assignment**: Allocating radio frequencies to transmitters to minimize interference.
- **Register Allocation**: Optimizing the use of CPU registers in compiler design.
- **Map Coloring**: Ensuring adjacent regions on a map have different colors for better visualization.

### Complexity

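Finding a coloring with the minimum number of colors is NP-hard: no polynomial-time algorithm is known that solves every instance optimally, and none exists unless P = NP. This makes the problem a natural candidate for metaheuristics such as genetic algorithms, which give up the guarantee of optimality in exchange for good solutions in reasonable time.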
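To make the cost of exact search concrete, the sketch below finds the chromatic number of a tiny graph by brute force. It is only an illustration: the function name `chromatic_number_bruteforce` and the 5-cycle example are assumptions introduced here. For `k` colors and `n` vertices it may examine up to `k ** n` assignments, which is why this is hopeless for graphs with hundreds of vertices.

```python
from itertools import product

def chromatic_number_bruteforce(num_vertices, edges):
    """Return (k, coloring) for the smallest k that admits a conflict-free coloring."""
    for k in range(1, num_vertices + 1):
        # Up to k ** num_vertices candidate colorings are tried for each k
        for coloring in product(range(k), repeat=num_vertices):
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k, list(coloring)
    # Only reached for an empty graph with no vertices
    return num_vertices, list(range(num_vertices))

# A cycle on 5 vertices: odd cycles cannot be 2-colored
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
k, coloring = chromatic_number_bruteforce(5, cycle5)
print(k, coloring)  # 3 [0, 1, 0, 1, 2]
```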

## Modeling the Graph Coloring Problem for Genetic Algorithms

### Representation

For the graph coloring problem, we use the following representation:

- **Chromosome**: A list of integers where each position represents a vertex and the value represents the color assigned to that vertex.
- **Example**: For a graph with 5 vertices, a chromosome [0, 1, 0, 2, 1] means vertex 0 has color 0, vertex 1 has color 1, and so on.
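As a small illustration of this encoding, the snippet below checks the example chromosome against a graph. The edge list is a hypothetical 5-vertex graph chosen for this example, and `count_conflicts` is a helper name introduced here, not part of the implementations that follow.

```python
def count_conflicts(chromosome, edges):
    """Number of edges whose two endpoints received the same color."""
    return sum(1 for u, v in edges if chromosome[u] == chromosome[v])

# Hypothetical example graph on vertices 0..4
edges = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]

chromosome = [0, 1, 0, 2, 1]   # vertex i has color chromosome[i]
print(count_conflicts(chromosome, edges), len(set(chromosome)))  # 0 3

# Recoloring vertex 2 with color 1 clashes with its neighbor, vertex 1
bad = [0, 1, 1, 2, 1]
print(count_conflicts(bad, edges))  # 1
```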

### Fitness Function

The fitness function should capture both the number of colors used and whether the coloring is valid (no adjacent vertices share the same color). In our implementations, we'll see two different approaches to the fitness function.

## Overview of Genetic Algorithm Approaches

This notebook explores two distinct approaches for applying genetic algorithms to the graph coloring problem:

1. **Naive Approach**: This approach ensures that all solutions in the population are conflict-free from the start. It focuses on maintaining valid solutions throughout the algorithm's execution.
2. **Two-Phase Approach**: This approach divides the problem into two phases. The first phase focuses on finding a valid coloring, while the second phase minimizes the number of colors used.

## Naive Approach

The naive approach is designed to ensure that all solutions in the population remain conflict-free throughout the algorithm. This is achieved by carefully designing the initialization, crossover, and mutation processes to avoid conflicts.

### Problem Modeling in the Naive Approach

In the naive approach, we model the graph coloring problem with the following elements:

1. **Representation**: Each individual is a list where the index is the vertex and the value is its color.
2. **Constraint**: All solutions must be valid colorings (no adjacent vertices share the same color).
3. **Objective**: Minimize the number of colors used while maintaining validity.

### Key Algorithmic Decisions

1. **Initial Population Generation**: We generate individuals that are conflict-free by construction: vertices are colored one at a time, each receiving a random color not already used by its previously colored neighbors.
2. **Fitness Function**: Since all individuals are valid, the fitness function simply counts the number of unique colors used.
3. **Genetic Operators**:
   - **Selection**: Rank-based truncation selection: individuals are sorted by fitness and the `intermediate_size` best (those using the fewest colors) are kept as parents.
   - **Crossover**: Designed to produce offspring that remain conflict-free.
   - **Mutation**: Modifies the coloring while ensuring no conflicts are introduced.

### Advantages and Limitations

- **Advantage**: Every generation contains only valid solutions, simplifying the fitness evaluation.
- **Limitation**: The constraint of maintaining validity can restrict the exploration of the solution space, potentially missing better solutions that might be reachable through temporarily invalid colorings.
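The limitation is easy to see on a small case: recombining two valid colorings does not, in general, give a valid coloring. The sketch below uses a hypothetical 4-vertex path graph and two valid parents for which every single-point cut produces a conflict, so a validity-preserving crossover has no choice but to reject all children.

```python
def is_valid(coloring, edges):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

path_edges = [(0, 1), (1, 2), (2, 3)]  # path graph 0-1-2-3 (hypothetical example)
parent1 = [0, 1, 0, 1]
parent2 = [1, 0, 1, 0]

# Try every possible single-point cut
children = [parent1[:point] + parent2[point:] for point in range(1, len(parent1))]
for child in children:
    print(child, is_valid(child, path_edges))
# [0, 0, 1, 0] False
# [0, 1, 1, 0] False
# [0, 1, 0, 0] False
```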

### Key Features of the Naive Approach

- **Initialization**: The population is initialized with valid colorings, ensuring no conflicts from the start.
- **Crossover**: The crossover operator is designed to produce offspring that are also conflict-free.
- **Mutation**: The mutation operator modifies solutions while maintaining their validity.

This approach prioritizes maintaining valid solutions, which can limit the exploration of the solution space. As a result, it may converge to local optima, especially for complex graphs.

In [None]:
import random
import time
import numpy as np

class GeneticGraphColoringNoConflict:
    def __init__(self, adjacency_matrix):
        self.graph = adjacency_matrix
        self.num_nodes = len(adjacency_matrix)
        self.max_colors = self.num_nodes
        self.best_coloring = None
        self.best_colors_used = self.num_nodes



    # Function to check if there are conflicts in a solution
    def has_conflicts(self, individual):
        for i in range(self.num_nodes):
            for j in range(i + 1, self.num_nodes):
                if self.graph[i][j] == 1 and individual[i] == individual[j]:
                    return True
        return False



    # Fitness function: considers only the number of colors used since we don't accept conflicts when generating solutions
    def fitness(self, individual):
        return len(set(individual))



    # Function to generate a conflict-free solution for use in the initial population generation
    def generate_valid_individual(self):
        individual = [-1] * self.num_nodes
        for node in range(self.num_nodes):
            neighbor_colors = {individual[neighbor] for neighbor in range(self.num_nodes) if self.graph[node][neighbor] == 1 and individual[neighbor] != -1}
            available_colors = [c for c in range(self.max_colors) if c not in neighbor_colors]
            if available_colors:
                individual[node] = random.choice(available_colors)
            else:
                # If no color is available, add a new one
                individual[node] = max(neighbor_colors, default=-1) + 1
        return individual



    # Generate the initial population
    def initialize_population(self, size):
        population = []
        while len(population) < size:
            individual = self.generate_valid_individual()
            population.append(individual)
        return population


    # Rank selection: sort solutions by fitness and take the best ones (intermediate_size)
    def rank_selection(self, population):
        population.sort(key=lambda ind: self.fitness(ind))
        return population[:self.intermediate_size]


    # Crossover that only generates valid children (conflict-free solutions)
    def crossover(self, parent1, parent2):
        # Try up to 5 crossover points to obtain two conflict-free children
        for _ in range(5):
            # Cut point in 1..num_nodes-1, so each child takes genes from both parents
            point = random.randint(1, self.num_nodes - 1)
            child1 = parent1[:point] + parent2[point:]
            child2 = parent2[:point] + parent1[point:]
            if not self.has_conflicts(child1) and not self.has_conflicts(child2):
                return child1, child2
        # If failed, return unchanged parents
        return parent1[:], parent2[:]



    # Mutation that only generates valid individuals (conflict-free)
    def mutate(self, individual):
        for _ in range(5):  # Try 5 valid mutations, each time attempting with a new node for mutation
            mutated = individual[:]
            node = random.randint(0, self.num_nodes - 1)
            neighbor_colors = {mutated[neighbor] for neighbor in range(self.num_nodes) if self.graph[node][neighbor] == 1}
            available_colors = [c for c in range(self.max_colors) if c not in neighbor_colors]
            if available_colors:
                mutated[node] = random.choice(available_colors)
                if not self.has_conflicts(mutated):
                    return mutated
        return individual[:]  # If mutation fails, keep the original individual



    # Main genetic algorithm
    def run(self, pop_size, intermediate_size, crossover_rate, mutation_rate, max_iter, stagnation_threshold):
        self.intermediate_size = intermediate_size
        # Initialize population
        population = self.initialize_population(pop_size)

        stagnation = 0
        start_time = time.time()

        for iteration in range(max_iter):

            # 1. Selection
            intermediate = self.rank_selection(population)
            offspring = intermediate[:]

            # 2. Crossover
            for _ in range(len(intermediate) // 2):
                if random.random() < crossover_rate:
                    p1, p2 = random.sample(intermediate, 2)
                    c1, c2 = self.crossover(p1, p2)
                    offspring.extend([c1, c2])

            # 3. Mutation
            for i in range(len(offspring)):
                if random.random() < mutation_rate:
                    offspring[i] = self.mutate(offspring[i])

            # 4. Population update
            offspring.sort(key=lambda ind: self.fitness(ind))
            population = offspring[:pop_size]

            # 5. Update the best individual
            current_best = population[0]
            current_fitness = self.fitness(current_best)

            if current_fitness < self.best_colors_used:
                self.best_coloring = current_best[:]
                self.best_colors_used = current_fitness
                stagnation = 0
            else:
                stagnation += 1

            if stagnation >= stagnation_threshold:
                break

        end_time = time.time()
        return self.best_coloring, self.best_colors_used, end_time - start_time

# Reading DIMACS file

def read_dimacs_graph(file_path):
    with open(file_path, 'r') as f:
        lines = f.readlines()

    edges = []
    num_nodes = 0
    for line in lines:
        if line.startswith('p'):
            _, _, n, _ = line.strip().split()
            num_nodes = int(n)
        elif line.startswith('e'):
            _, u, v = line.strip().split()
            edges.append((int(u) - 1, int(v) - 1))

    adjacency_matrix = np.zeros((num_nodes, num_nodes), dtype=int)
    for u, v in edges:
        adjacency_matrix[u, v] = 1
        adjacency_matrix[v, u] = 1

    print("Number of detected vertices:", num_nodes)
    print("Detected edges:", edges[:10])
    return adjacency_matrix


### Hyperparameter Analysis for the Naive Approach

In genetic algorithms, hyperparameters significantly impact performance. Let's examine the key hyperparameters in our naive approach:

1. **Population Size (`pop_size`)**:
   - Controls the diversity of solutions explored in parallel.
   - Larger populations provide more genetic diversity but increase computational cost.
   - For graph coloring, a population size of 100-200 often provides a good balance.

2. **Intermediate Size (`intermediate_size`)**:
   - Determines how many top individuals survive for breeding in each generation.
   - Typically set to about 30-50% of the population size.
   - Smaller values increase selection pressure and may lead to premature convergence.

3. **Crossover Rate (`crossover_rate`)**:
   - Controls the probability of performing crossover between two parents.
   - High values (0.7-0.9) generally work well for graph coloring problems.
   - Too low values may lead to insufficient exploration of the search space.

4. **Mutation Rate (`mutation_rate`)**:
   - Controls the probability of mutating an individual.
   - Should be relatively low (0.01-0.2) to avoid disrupting good solutions.
   - For graph coloring, slightly higher values can help escape local optima.

5. **Stagnation Threshold**:
   - Determines how many generations without improvement are allowed before termination.
   - Acts as an early stopping mechanism to save computational resources.
   - Values of 50-100 generations are common for medium-sized graphs.
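These parameters interact. Reading the `run` loop of `GeneticGraphColoringNoConflict` above, each generation starts from the `intermediate_size` selected parents and adds two children for each of `intermediate_size // 2` crossover attempts that passes the `crossover_rate` check. The short calculation below works this out; the helper names are introduced here for illustration only.

```python
def expected_offspring(intermediate_size, crossover_rate):
    """Expected length of `offspring` before truncation to pop_size."""
    attempts = intermediate_size // 2
    return intermediate_size + 2 * attempts * crossover_rate

def max_offspring(intermediate_size):
    """Largest possible length of `offspring` (every crossover attempt fires)."""
    return intermediate_size + 2 * (intermediate_size // 2)

print(expected_offspring(50, 0.8))  # 90.0
print(max_offspring(50))            # 100
```

With `intermediate_size=50` and `crossover_rate=0.8`, the population therefore holds about 90 individuals on average after the first generation and never more than 100, so in that implementation `pop_size` mostly determines the size of the initial population unless `intermediate_size` is raised as well.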

In [None]:
def main():
    file_path = "/content/r250.5.col.txt"
    adjacency_matrix = read_dimacs_graph(file_path)

    ggc = GeneticGraphColoringNoConflict(adjacency_matrix)
    best_coloring, min_colors, exec_time = ggc.run(
        pop_size=100,
        intermediate_size=50,
        crossover_rate=0.8,
        mutation_rate=0.1,
        max_iter=1000,
        stagnation_threshold=100
    )

    print("Best coloring:", best_coloring)
    print("Minimum number of colors used:", min_colors)
    print("Execution time:", exec_time, "seconds")

if __name__ == "__main__":
    main()

## Two-Phase Approach

The two-phase approach separates the graph coloring problem into two distinct objectives, allowing for more focused optimization in each phase.

### Problem Modeling in the Two-Phase Approach

The two-phase approach models the graph coloring problem differently:

1. **Representation**: Same as in the naive approach - a list where the index is the vertex and the value is its color.

2. **Two Distinct Phases**:
   - **Phase 1**: Focus on finding a valid coloring (eliminating all conflicts) without concern for the number of colors.
   - **Phase 2**: Starting from valid solutions, focus on minimizing the number of colors used.

3. **Phase-Specific Fitness Functions**:
   - **Phase 1**: Fitness is proportional to the number of conflicts; the number of colors is ignored.
   - **Phase 2**: For valid solutions, fitness is the number of colors used; solutions with conflicts receive a large penalty proportional to the number of conflicts.

4. **Phase-Specific Genetic Operators**:
   - Selection adapts to the current phase's fitness criteria.
   - Mutation strategies differ between phases to prioritize either conflict resolution or color minimization.
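The effect of switching fitness functions can be seen on three candidate colorings of a small graph. This is a minimal sketch: `phase_fitness`, the penalty constant, and the example graph are assumptions made for illustration, and lower values are better. The penalty is set to one more than the number of vertices so that any invalid coloring scores worse than any valid one.

```python
def phase_fitness(chromosome, edges, phase):
    """Phase-dependent fitness to be minimized (illustrative sketch)."""
    conflicts = sum(1 for u, v in edges if chromosome[u] == chromosome[v])
    if phase == 1:
        return conflicts                      # only validity matters
    penalty = len(chromosome) + 1             # exceeds any possible color count
    if conflicts > 0:
        return penalty * conflicts            # invalid: always worse than valid
    return len(set(chromosome))               # valid: count the colors

# Hypothetical graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
a = [0, 1, 2, 3]   # valid, 4 colors
b = [0, 1, 2, 0]   # valid, 3 colors
c = [0, 0, 1, 0]   # one conflict on edge (0, 1)

print([phase_fitness(x, edges, 1) for x in (a, b, c)])  # [0, 0, 1]
print([phase_fitness(x, edges, 2) for x in (a, b, c)])  # [4, 3, 5]
```

In phase 1 the two valid colorings tie, while in phase 2 the one with fewer colors wins and the invalid coloring ranks last.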

### Implementation Strategy

1. **Population Initialization**: Create a diverse initial population without enforcing validity constraints.
2. **Phase 1 Evolution**: Run the GA until a valid coloring is found, focusing only on conflict elimination.
3. **Phase Transition**: Once a valid solution is found, use it as a seed for the second phase.
4. **Phase 2 Evolution**: Continue evolution with a new fitness function that focuses on minimizing colors while maintaining validity.
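The control flow of these four steps can be sketched with a deliberately stripped-down search. To keep the example short, the population is reduced to a single individual improved by random recoloring, so this is an assumption-laden stand-in for the full genetic algorithm rather than the implementation below; `two_phase_sketch` and its parameters are hypothetical names.

```python
import random

def conflicts(coloring, edges):
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

def two_phase_sketch(num_vertices, edges, num_colors, steps=2000, seed=0):
    rng = random.Random(seed)
    # Initialization: one random individual, validity is not enforced
    current = [rng.randrange(num_colors) for _ in range(num_vertices)]

    # Phase 1: accept recolorings that do not increase the conflict count
    for _ in range(steps):
        if conflicts(current, edges) == 0:
            break
        candidate = current[:]
        candidate[rng.randrange(num_vertices)] = rng.randrange(num_colors)
        if conflicts(candidate, edges) <= conflicts(current, edges):
            current = candidate
    if conflicts(current, edges) > 0:
        return current, False  # no valid coloring found, phase 2 is skipped

    # Phase transition: the valid coloring seeds phase 2.
    # Phase 2: accept only valid recolorings that do not use more colors
    for _ in range(steps):
        candidate = current[:]
        candidate[rng.randrange(num_vertices)] = rng.choice(sorted(set(current)))
        if conflicts(candidate, edges) == 0 and len(set(candidate)) <= len(set(current)):
            current = candidate
    return current, True

# Hypothetical 5-vertex graph with maximum degree 3, so 4 colors always suffice
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)]
coloring, valid = two_phase_sketch(5, edges, num_colors=4)
print(coloring, valid, len(set(coloring)))
```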

### Advantages and Limitations

- **Advantage**: More flexible exploration of the solution space, because temporarily invalid colorings are allowed during the search; this can lead to better solutions on complex graphs.
- **Advantage**: Clear separation of concerns allows for specialized operators in each phase.
- **Limitation**: More complex implementation and potentially longer runtime due to the two-phase structure.

### Key Features of the Two-Phase Approach

1. **Phase 1: Find a Valid Coloring**
   - The algorithm focuses on eliminating conflicts between adjacent vertices.
   - The fitness function prioritizes conflict resolution, ignoring the number of colors used.

2. **Phase 2: Minimize Colors**
   - Once a valid coloring is achieved, the algorithm shifts to minimizing the number of colors.
   - The fitness function penalizes the use of additional colors while maintaining validity.

This approach is aimed at complex graphs with high edge density, where allowing temporarily invalid colorings gives a more flexible exploration of the solution space.

In [None]:
import numpy as np
import random
import time


def read_dimacs_graph(file_path):
    """Reads a graph in DIMACS format and returns its adjacency matrix."""
    with open(file_path, 'r') as file:
        lines = file.readlines()

    edges = []
    num_nodes = 0

    for line in lines:
        line = line.strip()
        if line.startswith('p'):  # Problem line
            parts = line.split()
            num_nodes = int(parts[2])
        elif line.startswith('e'):  # Edge line
            parts = line.split()
            # DIMACS vertices are 1-indexed, convert to 0-indexed
            node1, node2 = int(parts[1]) - 1, int(parts[2]) - 1
            edges.append((node1, node2))

    # Create the adjacency matrix
    adjacency_matrix = np.zeros((num_nodes, num_nodes), dtype=int)

    # Fill the adjacency matrix
    for node1, node2 in edges:
        adjacency_matrix[node1, node2] = 1
        adjacency_matrix[node2, node1] = 1  # Undirected graph

    return adjacency_matrix

################################################################################

class TwoPhaseGeneticAlgorithmGraphColoring:
    def __init__(self, adjacency_matrix, population_size=100, max_generations=500,
                 crossover_rate=0.8, mutation_rate=0.2, tournament_size=3, elitism=True):
        """
        Initialize the genetic algorithm for graph coloring.

        Args:
            adjacency_matrix: Adjacency matrix of the graph
            population_size: Size of the population
            max_generations: Maximum number of generations
            crossover_rate: Crossover rate
            mutation_rate: Mutation rate
            tournament_size: Size of tournament for selection
            elitism: If True, preserves the best individual from each generation
        """
        self.adjacency_matrix = adjacency_matrix
        self.num_nodes = adjacency_matrix.shape[0]
        self.population_size = population_size
        self.max_generations = max_generations
        self.crossover_rate = crossover_rate
        self.mutation_rate = mutation_rate
        self.tournament_size = tournament_size
        self.elitism = elitism

        # Calculate the maximum degree of the graph to size the initial color palette
        degrees = np.sum(adjacency_matrix, axis=1)
        # Cast to a plain int so colors stay Python ints rather than NumPy scalars
        self.max_degree = int(np.max(degrees))
        # Any graph can be colored with at most max_degree + 1 colors (greedy bound)
        self.initial_max_colors = self.max_degree + 1

        # Current phase: 1 = find valid coloring, 2 = minimize colors
        self.current_phase = 1

    def initialize_population(self):
        """Initialize a population of random solutions."""
        population = []
        for _ in range(self.population_size):
            # Each individual is a list where the index represents the vertex
            # and the value represents the color (from 0 to initial_max_colors-1)
            individual = [random.randint(0, self.initial_max_colors - 1) for _ in range(self.num_nodes)]
            population.append(individual)
        return population

    def calculate_conflicts(self, individual):
        """Calculate the number of conflicts in a coloring."""
        conflicts = 0
        for i in range(self.num_nodes):
            for j in range(i + 1, self.num_nodes):
                if self.adjacency_matrix[i, j] == 1 and individual[i] == individual[j]:
                    conflicts += 1
        return conflicts

    def calculate_fitness(self, individual):
        """
        Calculate the fitness value of an individual based on the current phase.

        Phase 1: Focus solely on eliminating conflicts
        Phase 2: Minimize the number of colors while maintaining a valid coloring
        """
        conflicts = self.calculate_conflicts(individual)

        # Count the colors used
        num_colors_used = len(set(individual))

        # Fitness function based on phase
        if self.current_phase == 1:
            # Phase 1: Eliminating conflicts is the priority
            fitness = conflicts * 1000
        else:
            # Phase 2: The coloring is valid, minimize the number of colors
            if conflicts > 0:
                # If conflicts appear in phase 2, penalize them heavily
                fitness = conflicts * 10000
            else:
                fitness = num_colors_used

        return fitness

    def is_valid_coloring(self, individual):
        """Check if the coloring respects the constraint that no adjacent vertices have the same color."""
        return self.calculate_conflicts(individual) == 0

    def tournament_selection(self, population, fitnesses):
        """Select an individual using tournament selection."""
        selected_indices = random.sample(range(len(population)), self.tournament_size)
        selected_fitness = [fitnesses[i] for i in selected_indices]
        # Select the individual with the best fitness (lowest value)
        return population[selected_indices[selected_fitness.index(min(selected_fitness))]]

    def crossover(self, parent1, parent2):
        """
        Perform crossover between two parents to create two children.
        Uses single-point crossover.
        """
        if random.random() < self.crossover_rate:
            crossover_point = random.randint(1, self.num_nodes - 1)
            child1 = parent1[:crossover_point] + parent2[crossover_point:]
            child2 = parent2[:crossover_point] + parent1[crossover_point:]
            return child1, child2
        else:
            return parent1.copy(), parent2.copy()

    def mutation(self, individual):
        """
        Apply mutation to an individual based on the current phase.
        """
        mutated = individual.copy()

        # In phase 1, allow more diversity to resolve conflicts
        if self.current_phase == 1:
            max_color = max(individual)

            for i in range(self.num_nodes):
                if random.random() < self.mutation_rate:
                    # Collect the colors of all neighbors of this vertex
                    neighbor_colors = set()
                    for j in range(self.num_nodes):
                        if self.adjacency_matrix[i, j] == 1:
                            neighbor_colors.add(individual[j])
                    has_conflict = individual[i] in neighbor_colors

                    # If this vertex has a conflict, choose a non-conflicting color
                    if has_conflict:
                        existing_colors = set(individual)
                        # Try to find an existing color that no neighbor uses
                        valid_colors = existing_colors - neighbor_colors

                        if valid_colors:
                            mutated[i] = random.choice(list(valid_colors))
                        else:
                            # Use a new color
                            mutated[i] = max_color + 1
                    else:
                        # No conflict, normal mutation
                        if random.random() < 0.3:  # 30% chance to change
                            mutated[i] = random.randint(0, max_color)
        else:
            # In phase 2, try to reduce the number of colors
            used_colors = set(individual)
            color_frequency = {color: individual.count(color) for color in used_colors}
            min_color = min(used_colors)

            for i in range(self.num_nodes):
                if random.random() < self.mutation_rate:
                    current_color = individual[i]

                    # Identify colors used by neighbors
                    neighbor_colors = set()
                    for j in range(self.num_nodes):
                        if self.adjacency_matrix[i, j] == 1:
                            neighbor_colors.add(individual[j])

                    # Find the lowest color not used by neighbors
                    valid_colors = []
                    for color in range(min_color, max(used_colors) + 1):
                        if color not in neighbor_colors:
                            valid_colors.append(color)

                    if valid_colors:
                        # Prefer existing colors with high frequency
                        valid_existing_colors = [c for c in valid_colors if c in used_colors]
                        if valid_existing_colors:
                            # Choose an existing color to reduce color diversity
                            weights = [color_frequency.get(c, 0) for c in valid_existing_colors]
                            total = sum(weights)
                            if total > 0:
                                weights = [w/total for w in weights]
                                # np.random.choice returns a NumPy scalar; keep genes as plain ints
                                mutated[i] = int(np.random.choice(valid_existing_colors, p=weights))
                            else:
                                mutated[i] = random.choice(valid_existing_colors)
                        else:
                            # Otherwise use the lowest available color
                            mutated[i] = min(valid_colors)

        return mutated

    def optimize_colors(self, individual):
        """
        Optimize the coloring by reducing color numbers if possible.
        Reindexes colors consecutively.
        """
        unique_colors = set(individual)
        color_map = {old_color: new_color for new_color, old_color in enumerate(sorted(unique_colors))}
        return [color_map[color] for color in individual]

    def local_search(self, individual):
        """
        Improve the solution by locally searching to reduce conflicts (phase 1)
        or the number of colors (phase 2).
        """
        improved = individual.copy()

        if self.current_phase == 1:
            # In phase 1, try to resolve conflicts
            for i in range(self.num_nodes):
                # Check if this vertex has conflicts
                has_conflict = False
                for j in range(self.num_nodes):
                    if self.adjacency_matrix[i, j] == 1 and improved[i] == improved[j]:
                        has_conflict = True
                        break

                if has_conflict:
                    # Collect neighbor colors
                    neighbor_colors = set()
                    for j in range(self.num_nodes):
                        if self.adjacency_matrix[i, j] == 1:
                            neighbor_colors.add(improved[j])

                    # Find a non-conflicting color
                    current_colors = set(improved)
                    for color in range(len(current_colors) + 1):
                        if color not in neighbor_colors:
                            improved[i] = color
                            break
        else:
            # In phase 2, try to reduce colors
            colors_used = set(improved)

            # Sort colors by frequency of use (start with least frequent)
            color_freq = {color: improved.count(color) for color in colors_used}
            sorted_colors = sorted(color_freq.items(), key=lambda x: x[1])

            # For each color (starting with least frequent)
            for color, _ in sorted_colors:
                # For each vertex with this color
                for i in range(self.num_nodes):
                    if improved[i] == color:
                        # Collect neighbor colors
                        neighbor_colors = set()
                        for j in range(self.num_nodes):
                            if self.adjacency_matrix[i, j] == 1:
                                neighbor_colors.add(improved[j])

                        # Try existing colors with higher frequency
                        for other_color, freq in reversed(sorted_colors):
                            if other_color != color and other_color not in neighbor_colors:
                                improved[i] = other_color
                                break

        return improved

    def run(self, verbose=True):
        """
        Run the two-phase genetic algorithm.

        Returns:
            best_individual: Best solution found
            num_colors: Number of colors used in the best solution
            is_valid: True if the coloring is valid, False otherwise
        """
        start_time = time.time()

        # Phase 1: Find a valid coloring
        self.current_phase = 1
        print("Phase 1: Finding a valid coloring...")

        # Initialize population
        population = self.initialize_population()

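        best_individual = None
        best_fitness = float('inf')
        best_phase1_individual = None
        # Defaults so the post-loop check is defined even if no generation runs
        num_colors = 0
        is_valid = False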

        for generation in range(self.max_generations):
            # Calculate fitness for each individual
            fitnesses = [self.calculate_fitness(ind) for ind in population]

            # Find the best individual
            current_best_index = fitnesses.index(min(fitnesses))
            current_best = population[current_best_index]
            current_best_fitness = fitnesses[current_best_index]

            if current_best_fitness < best_fitness:
                best_fitness = current_best_fitness
                best_individual = current_best.copy()

                # Optimize the coloring
                best_individual = self.optimize_colors(best_individual)
                conflicts = self.calculate_conflicts(best_individual)
                num_colors = len(set(best_individual))
                is_valid = conflicts == 0

                if verbose and (generation % 10 == 0 or generation == self.max_generations - 1):
                    elapsed_time = time.time() - start_time
                    print(f"Generation {generation}: Best fitness = {best_fitness}, "
                          f"Conflicts = {conflicts}, Colors = {num_colors}, "
                          f"Valid = {is_valid}, Elapsed time = {elapsed_time:.2f}s")

                # If we found a valid coloring, move to phase 2
                if is_valid:
                    best_phase1_individual = best_individual.copy()
                    break

            # Create the new generation
            new_population = []

            # Elitism: keep the best individual
            if self.elitism:
                new_population.append(current_best)

            # Create the rest of the population through selection, crossover, and mutation
            while len(new_population) < self.population_size:
                parent1 = self.tournament_selection(population, fitnesses)
                parent2 = self.tournament_selection(population, fitnesses)

                child1, child2 = self.crossover(parent1, parent2)

                child1 = self.mutation(child1)
                child1 = self.local_search(child1)

                child2 = self.mutation(child2)
                child2 = self.local_search(child2)

                new_population.append(child1)
                if len(new_population) < self.population_size:
                    new_population.append(child2)

            population = new_population

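        # If phase 1 did not find a valid coloring, return the best attempt.
        # Checking best_phase1_individual avoids relying on is_valid/num_colors,
        # which are only assigned inside the improvement branch of the loop.
        if best_phase1_individual is None:
            print("Phase 1 did not find a valid coloring.")
            num_colors = len(set(best_individual)) if best_individual is not None else 0
            return best_individual, num_colors, False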

        # Phase 2: Minimize the number of colors
        self.current_phase = 2
        print("\nPhase 2: Minimizing the number of colors...")

        # Reinitialize the population with the best individual from phase 1
        population = [best_phase1_individual.copy() for _ in range(self.population_size)]
        # Apply slight mutations to each individual except the first to ensure diversity
        for i in range(1, self.population_size):
            population[i] = self.mutation(population[i])
            population[i] = self.local_search(population[i])
            population[i] = self.optimize_colors(population[i])

        best_fitness = self.calculate_fitness(best_phase1_individual)
        best_individual = best_phase1_individual.copy()
        best_num_colors = len(set(best_individual))

        for generation in range(self.max_generations):
            # Calculate fitness for each individual
            fitnesses = [self.calculate_fitness(ind) for ind in population]

            # Find the best individual
            current_best_index = fitnesses.index(min(fitnesses))
            current_best = population[current_best_index]
            current_best_fitness = fitnesses[current_best_index]

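            # Optimize the coloring, then re-score it so the comparison below
            # uses the fitness of the optimized individual rather than a stale value
            current_best = self.optimize_colors(current_best)
            current_best_fitness = self.calculate_fitness(current_best)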

            if current_best_fitness <= best_fitness:  # <= to favor solutions with fewer colors
                best_fitness = current_best_fitness

                current_conflicts = self.calculate_conflicts(current_best)
                current_num_colors = len(set(current_best))

                # Only update if the coloring is valid and uses fewer colors
                if current_conflicts == 0 and current_num_colors <= best_num_colors:
                    best_individual = current_best.copy()
                    best_num_colors = current_num_colors

                    if verbose and (generation % 10 == 0 or generation == self.max_generations - 1):
                        elapsed_time = time.time() - start_time
                        print(f"Generation {generation}: Colors = {best_num_colors}, "
                              f"Valid = {current_conflicts == 0}, Elapsed time = {elapsed_time:.2f}s")

            # Create the new generation
            new_population = []

            # Elitism: keep the best individual
            if self.elitism:
                new_population.append(current_best)

            # Create the rest of the population
            while len(new_population) < self.population_size:
                parent1 = self.tournament_selection(population, fitnesses)
                parent2 = self.tournament_selection(population, fitnesses)

                child1, child2 = self.crossover(parent1, parent2)

                child1 = self.mutation(child1)
                child1 = self.local_search(child1)
                child1 = self.optimize_colors(child1)

                child2 = self.mutation(child2)
                child2 = self.local_search(child2)
                child2 = self.optimize_colors(child2)

                new_population.append(child1)
                if len(new_population) < self.population_size:
                    new_population.append(child2)

            population = new_population

        # Optimize the final coloring
        best_individual = self.optimize_colors(best_individual)
        num_colors = len(set(best_individual))
        is_valid = self.is_valid_coloring(best_individual)

        elapsed_time = time.time() - start_time
        if verbose:
            print(f"\nResults after {generation+1} generations of phase 2:")
            print(f"Number of colors: {num_colors}")
            print(f"Valid coloring: {is_valid}")
            print(f"Total execution time: {elapsed_time:.2f} seconds")

        return best_individual, num_colors, is_valid

### Hyperparameter Analysis for the Two-Phase Approach

The two-phase genetic algorithm has several hyperparameters that significantly influence its performance:

1. **Population Size (`population_size`)**:
   - Determines the number of individuals (potential solutions) in each generation.
   - Larger populations provide greater genetic diversity but require more computational resources.
   - In practice, values between 50 and 200 work well for medium-sized graphs.

2. **Maximum Generations (`max_generations`)**:
   - The maximum number of iterations for each phase.
   - Higher values allow for more thorough exploration but increase runtime.
   - For complex graphs, values of 300-500 generations per phase may be needed.

3. **Crossover Rate (`crossover_rate`)**:
   - Probability of performing crossover between two parents.
   - Typical values range from 0.7 to 0.9 for graph coloring problems.
   - Higher values encourage exploration of new solution combinations.

4. **Mutation Rate (`mutation_rate`)**:
   - Probability of mutating an individual after crossover.
   - For graph coloring, relatively high values (0.1 to 0.3) can help escape local optima.
   - Phase 1 may benefit from higher mutation rates to find valid colorings quickly.

5. **Tournament Size (`tournament_size`)**:
   - Number of individuals randomly selected for each tournament selection.
   - Smaller values (2-5) reduce selection pressure, promoting diversity.
   - Larger values increase selection pressure, leading to faster convergence but potentially to local optima.

6. **Elitism (`elitism`)**:
   - Whether to preserve the best individual from each generation.
   - Usually set to `True` to prevent losing good solutions.
   - Critical for maintaining progress, especially during the transition between phases.

Balancing these hyperparameters is crucial for achieving good results. For larger graphs, increasing population size and maximum generations while using elitism tends to provide better solutions at the cost of higher computational time.
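
To make the effect of `tournament_size` on selection pressure concrete, here is a minimal sketch of tournament selection under minimization. The helper names `tournament_select` and `mean_selected_fitness` are illustrative assumptions and are not necessarily how the class's own `tournament_selection` method is written. The loop at the end prints the average fitness of selected individuals for a few tournament sizes; lower averages indicate stronger pressure toward the current best.

```python
import random

def tournament_select(population, fitnesses, tournament_size, rng=random):
    """Pick `tournament_size` distinct individuals at random and return
    the one with the lowest fitness (lower is better here)."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = min(contenders, key=lambda i: fitnesses[i])
    return population[winner]

def mean_selected_fitness(fitnesses, tournament_size, trials=2000, seed=0):
    """Average fitness of the winners over many tournaments."""
    rng = random.Random(seed)
    population = list(range(len(fitnesses)))  # individuals are just indices here
    picks = [
        tournament_select(population, fitnesses, tournament_size, rng)
        for _ in range(trials)
    ]
    return sum(fitnesses[i] for i in picks) / trials

# Toy population: 50 individuals with fitness values 0..49 (hypothetical data)
toy_fitnesses = list(range(50))
for k in (2, 5, 10):
    print(f"tournament_size={k}: mean selected fitness = "
          f"{mean_selected_fitness(toy_fitnesses, k):.2f}")
```

Larger tournaments select near-best individuals more often, which speeds up convergence but shrinks the pool of parents that actually reproduce.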

In [None]:
if __name__ == "__main__":
    # Load a graph from a DIMACS file
    file_path = "/content/r250.5.col.txt"
    adjacency_matrix = read_dimacs_graph(file_path)

    # Genetic algorithm parameters
    population_size = 100
    max_generations = 300  # For each phase
    crossover_rate = 0.8
    mutation_rate = 0.2
    tournament_size = 3
    elitism = True

    # Create and run the genetic algorithm
    ga = TwoPhaseGeneticAlgorithmGraphColoring(
        adjacency_matrix,
        population_size=population_size,
        max_generations=max_generations,
        crossover_rate=crossover_rate,
        mutation_rate=mutation_rate,
        tournament_size=tournament_size,
        elitism=elitism
    )

    best_solution, num_colors, is_valid = ga.run(verbose=True)

    print("\nBest solution found:")
    print(f"Coloring: {best_solution}")
    print(f"Number of colors: {num_colors}")
    print(f"Valid coloring: {is_valid}")

## Comparison of Approaches

### Naive Approach
- **Advantages**: Guarantees valid solutions from the start, making it simpler to implement and analyze.
- **Disadvantages**: Limited exploration of the solution space, which may lead to suboptimal results for complex graphs.
- **Best Use Case**: Suitable for simpler graphs with relatively low chromatic numbers.

### Two-Phase Approach
- **Advantages**: Separates conflicting objectives, allowing for more effective optimization in each phase. Often finds solutions with fewer colors.
- **Disadvantages**: Requires more computational effort due to the two-phase structure.
- **Best Use Case**: Ideal for complex graphs with high edge density, where minimizing the number of colors used is critical.

## Experimental Analysis and Performance Evaluation

To thoroughly evaluate the effectiveness of both approaches, we should conduct experiments on various benchmark graphs from the DIMACS collection. We can measure performance based on the following criteria:

1. **Solution Quality**: Number of colors used in the final solution.
2. **Validity**: Whether the coloring respects all edge constraints.
3. **Computational Efficiency**: Execution time required to find a solution.
4. **Convergence Rate**: How quickly the algorithm approaches a good solution.
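
The first three criteria can be measured with a few small helpers. This is a sketch that assumes the graph is a symmetric 0/1 adjacency matrix (a list of lists or a NumPy array) and the coloring is a sequence with one color per vertex; `count_conflicts`, `evaluate_coloring`, and `timed` are hypothetical names introduced for illustration, separate from the class methods used earlier.

```python
import time

def count_conflicts(adjacency_matrix, coloring):
    """Count edges whose two endpoints share a color (each edge counted once)."""
    n = len(coloring)
    return sum(
        1
        for u in range(n)
        for v in range(u + 1, n)
        if adjacency_matrix[u][v] and coloring[u] == coloring[v]
    )

def evaluate_coloring(adjacency_matrix, coloring):
    """Report solution quality and validity for a single coloring."""
    conflicts = count_conflicts(adjacency_matrix, coloring)
    return {
        "num_colors": len(set(coloring)),
        "conflicts": conflicts,
        "valid": conflicts == 0,
    }

def timed(fn, *args, **kwargs):
    """Run `fn` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage on a triangle, which needs three colors
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
report, elapsed = timed(evaluate_coloring, triangle, [0, 1, 2])
print(report, f"{elapsed:.6f}s")
```

Convergence rate can be tracked in the same spirit by recording the best color count per generation and plotting it against the generation index.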

### Experiment Design

For a fair comparison, we should run both algorithms with equivalent computational resources (same number of fitness evaluations) on the same set of benchmark instances. We should also conduct multiple runs with different random seeds to account for the stochastic nature of genetic algorithms.
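
A multi-seed experiment could be organized roughly as follows. This sketch assumes each algorithm is wrapped in a callable `solve(seed)` that returns the same `(coloring, num_colors, is_valid)` tuple as `ga.run`; the names `run_trials`, `summarize`, and `placeholder_solver` are hypothetical, and the placeholder produces made-up numbers purely so the example runs on its own.

```python
import random
import statistics

def run_trials(solve, seeds):
    """Call `solve(seed)` once per seed and keep (num_colors, is_valid)."""
    results = []
    for seed in seeds:
        _, num_colors, is_valid = solve(seed)
        results.append((num_colors, is_valid))
    return results

def summarize(results):
    """Aggregate color counts over the runs that produced a valid coloring."""
    valid_counts = [colors for colors, ok in results if ok]
    return {
        "runs": len(results),
        "valid_runs": len(valid_counts),
        "best_colors": min(valid_counts) if valid_counts else None,
        "mean_colors": statistics.mean(valid_counts) if valid_counts else None,
        "stdev_colors": statistics.stdev(valid_counts) if len(valid_counts) > 1 else None,
    }

def placeholder_solver(seed):
    """Stand-in for a real GA run; returns arbitrary values, not real results."""
    rng = random.Random(seed)
    num_colors = rng.randint(65, 70)
    return [], num_colors, True

print(summarize(run_trials(placeholder_solver, range(10))))
```

In the notebook, `solve` would seed the random number generator and then construct and run the GA. Whether `random`, `numpy.random`, or both need seeding depends on how the classes draw random numbers, which is an assumption to verify against the implementation.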

### Expected Results

- **Sparse Graphs**: For graphs with low edge density, the naive approach may be more efficient, as valid solutions are easier to maintain.
- **Dense Graphs**: For graphs with high edge density, the two-phase approach is expected to find better solutions by separating the conflict resolution from color minimization.
- **Large Graphs**: As the graph size increases, the two-phase approach will likely demonstrate better scalability due to its more flexible exploration of the solution space.
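
To place a benchmark instance on the sparse-to-dense spectrum, edge density can be computed directly from the adjacency matrix. The sketch below assumes a symmetric 0/1 matrix without self-loops; `edge_density` is a hypothetical helper name.

```python
def edge_density(adjacency_matrix):
    """Fraction of possible undirected edges that are present: 2|E| / (n(n-1))."""
    n = len(adjacency_matrix)
    if n < 2:
        return 0.0
    edges = sum(
        1
        for u in range(n)
        for v in range(u + 1, n)
        if adjacency_matrix[u][v]
    )
    return 2 * edges / (n * (n - 1))

# A 3-vertex path has 2 of the 3 possible edges
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(f"density = {edge_density(path):.3f}")
```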

### Future Improvements

Several enhancements could further improve the performance of both approaches:

1. **Adaptive Mutation Rates**: Dynamically adjust mutation rates based on population diversity.
2. **Hybrid Algorithms**: Combine genetic algorithms with local search methods like tabu search or simulated annealing.
3. **Parallel Implementation**: Utilize parallel computing to evaluate fitness functions concurrently.
4. **Knowledge-Based Initialization**: Use graph-specific properties (like vertex degree) to guide the initial population creation.
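
As one possible form of knowledge-based initialization, a largest-degree-first greedy coloring can seed part of the initial population with a valid individual. This is a sketch of that standard heuristic, not part of the notebook's classes; it assumes a symmetric 0/1 adjacency matrix, and `greedy_coloring_by_degree` is a hypothetical name.

```python
def greedy_coloring_by_degree(adjacency_matrix):
    """Color vertices in order of decreasing degree, giving each vertex the
    smallest color not already used by one of its colored neighbors."""
    n = len(adjacency_matrix)
    degrees = [sum(1 for v in range(n) if adjacency_matrix[u][v]) for u in range(n)]
    order = sorted(range(n), key=lambda u: degrees[u], reverse=True)

    coloring = [-1] * n  # -1 marks a vertex that has not been colored yet
    for u in order:
        used = {
            coloring[v]
            for v in range(n)
            if adjacency_matrix[u][v] and coloring[v] != -1
        }
        color = 0
        while color in used:
            color += 1
        coloring[u] = color
    return coloring

# Star graph: vertex 0 is adjacent to vertices 1, 2, and 3
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(greedy_coloring_by_degree(star))
```

Seeding only a few individuals this way and leaving the rest random is one way to keep diversity in the population while still starting from a conflict-free coloring.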

## Conclusion

In this notebook, we've explored two different genetic algorithm approaches to solve the graph coloring problem. Both approaches have their strengths and weaknesses, making them suitable for different types of graphs and constraints.

The naive approach ensures valid solutions throughout the algorithm's execution, making it simpler but potentially limiting its exploration capabilities. In contrast, the two-phase approach separates the conflicting objectives of finding a valid coloring and minimizing the number of colors, allowing for more flexible exploration but requiring a more complex implementation.

Genetic algorithms, when properly tuned and adapted to the problem structure, can provide high-quality approximate solutions to the graph coloring problem, especially for large and complex graphs where exact methods become computationally infeasible.