# Introduction

This work presents a constrained combinatorial optimization approach to the **Sports League Assignment Problem** using **Genetic Algorithms (GAs)**. The objective is to allocate a fixed pool of professional players into a set of 5 structurally valid teams in such a way that the **standard deviation of the teams' average skill ratings** is minimized, promoting competitive balance across the league.

Each player is defined by three attributes: **position** (one of `GK`, `DEF`, `MID`, `FWD`), **skill rating** (a numerical measure of ability), and **cost** (in million euros). A valid solution must satisfy the following **hard constraints**:

- Each team must consist of exactly **7 players**, with a specific positional structure: **1 GK, 2 DEF, 2 MID, and 2 FWD**
- Each team must have a **total cost ≤ 750 million €**
- Each player must be assigned to **exactly one team** (no overlaps)
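To make the three hard constraints concrete, here is a minimal feasibility check. It is a sketch under stated assumptions, not the project's own `LeagueSolution` logic. The hypothetical encoding stores a league as a list of teams, each team a list of player indices, with every player a dict holding `"position"` and `"cost"` keys. It also assumes the pool contains exactly 5 × 7 = 35 players, so "exactly one team" means every player index appears once.

```python
from collections import Counter

# Hypothetical encoding (assumption, not necessarily the project's LeagueSolution):
# players[i] is a dict with "position" and "cost" (million EUR) keys, and a
# league is a list of teams, each team a list of player indices.
REQUIRED_POSITIONS = Counter({"GK": 1, "DEF": 2, "MID": 2, "FWD": 2})
NUM_TEAMS = 5
TEAM_SIZE = 7
MAX_TEAM_COST = 750.0


def is_valid_league(teams, players):
    """Return True if the league satisfies all three hard constraints."""
    if len(teams) != NUM_TEAMS:
        return False
    # Each player in exactly one team. Assumes the pool has exactly
    # NUM_TEAMS * TEAM_SIZE players, so every index must appear exactly once.
    all_ids = sorted(pid for team in teams for pid in team)
    if all_ids != list(range(len(players))):
        return False
    for team in teams:
        # Exactly 7 players with the 1 GK / 2 DEF / 2 MID / 2 FWD structure.
        if len(team) != TEAM_SIZE:
            return False
        if Counter(players[pid]["position"] for pid in team) != REQUIRED_POSITIONS:
            return False
        # Total team cost within the 750 million EUR budget.
        if sum(players[pid]["cost"] for pid in team) > MAX_TEAM_COST:
            return False
    return True
```

Checking constraints in this order lets cheap structural checks reject a candidate before any cost sums are computed.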

The **search space** is therefore highly constrained and discrete, and infeasible configurations are explicitly excluded from the solution space. The optimization objective is to identify league configurations where teams are not only valid but also **skill-balanced**, quantified by the **standard deviation of average skill ratings across teams**, which serves as the **fitness function** (to be minimized).
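The fitness function can be sketched in a few lines. This is an illustrative version, not the project's implementation: `league_fitness` is a hypothetical name. Teams are assumed to be lists of player indices and `skills` a sequence of skill ratings indexed by player. Whether the project uses the population (`ddof=0`) or sample (`ddof=1`) standard deviation is not stated. NumPy's default population form is assumed here.

```python
import numpy as np


def league_fitness(teams, skills):
    """Standard deviation of the teams' average skill ratings (lower is better).

    Hypothetical helper: teams is a list of lists of player indices and
    skills[pid] is the skill rating of player pid.
    """
    team_means = [np.mean([skills[pid] for pid in team]) for team in teams]
    # np.std defaults to the population standard deviation (ddof=0); this is an assumption.
    return float(np.std(team_means))
```

For example, two teams with average ratings of 80 and 84 give a fitness of 2.0, while a perfectly balanced league scores 0.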

To address this, we implement a domain-adapted **Genetic Algorithm framework** featuring:

- A custom **representation** based on team-to-player mappings
- Validity-preserving **mutation** and **crossover** operators
- Multiple **selection mechanisms**
- Optional **elitism** and population-level diversity handling
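The project's own operators, such as `mutate_swap_constrained`, are imported later but not shown in this section. The hypothetical `swap_same_position` below is one way a mutation can preserve validity. It assumes each team is a list of player indices and each player is a dict with `"position"` and `"cost"` keys. The operator exchanges two players of the same position between two different teams. A like-for-like swap cannot break the positional structure or the one-team-per-player rule, so only the budget constraint has to be re-checked.

```python
import random


def swap_same_position(teams, players, max_cost=750.0, rng=random, max_tries=20):
    """Swap two same-position players between two different teams.

    Hypothetical operator (assumption, not the project's mutate_swap_constrained).
    Returns a new league; the input league is never modified.
    """
    for _ in range(max_tries):
        a, b = rng.sample(range(len(teams)), 2)  # two distinct teams
        i = rng.randrange(len(teams[a]))
        pos = players[teams[a][i]]["position"]
        # A valid team always contains at least one player of each position.
        candidates = [j for j, pid in enumerate(teams[b]) if players[pid]["position"] == pos]
        j = rng.choice(candidates)
        new_teams = [list(t) for t in teams]
        new_teams[a][i], new_teams[b][j] = new_teams[b][j], new_teams[a][i]
        # Only the two affected teams can have changed cost.
        if all(sum(players[pid]["cost"] for pid in new_teams[k]) <= max_cost for k in (a, b)):
            return new_teams
    return [list(t) for t in teams]  # no feasible swap found: unchanged copy
```

Bounding the retries with `max_tries` keeps the operator from looping indefinitely when the budget is tight. In that case it falls back to returning an unchanged copy.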

This report provides a formal problem definition, details the design of the solution encoding and operators, and presents empirical results comparing different GA configurations. The overall objective is to evaluate how well GA-based metaheuristics can navigate this complex constrained search space and evolve solutions that both satisfy domain constraints and optimize league balance.

In addition to Genetic Algorithms, this project implements and evaluates alternative optimization strategies, such as **Hill Climbing** and **Simulated Annealing**, which operate directly on discrete, constrained search spaces. These algorithms trade off exploration, exploitation, and convergence speed in different ways. Benchmarking all approaches on the same problem lets us compare their effectiveness and robustness on a complex constrained task such as the Sports League Assignment. The comparison also makes the results easier to interpret and clarifies the strengths and limitations of population-based versus local-search heuristics.
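The main difference between the two local searches is the acceptance rule. Hill Climbing only accepts non-worsening moves. Simulated Annealing uses the Metropolis criterion, which sometimes accepts a worse move with probability exp(-Δ/T) so the search can escape local optima. The sketch below is generic and hypothetical. It is not the project's `simulated_annealing` from `evolution`. The neighbour function, the starting temperature `t0`, and the geometric cooling factor `alpha` are all assumptions.

```python
import math
import random


def accept_move(current_fitness, candidate_fitness, temperature, rng=random):
    """Metropolis acceptance rule for a minimization problem."""
    delta = candidate_fitness - current_fitness
    if delta <= 0:
        return True  # improvements and ties are always accepted
    if temperature <= 0:
        return False  # at zero temperature this reduces to hill climbing
    return rng.random() < math.exp(-delta / temperature)


def simulated_annealing_sketch(initial, neighbour, fitness, t0=1.0, alpha=0.95,
                               steps=1000, rng=random):
    """Generic SA loop with geometric cooling (assumed schedule); tracks the best-so-far."""
    current, current_fit = initial, fitness(initial)
    best, best_fit = current, current_fit
    t = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_fit = fitness(candidate)
        if accept_move(current_fit, candidate_fit, t, rng):
            current, current_fit = candidate, candidate_fit
            if current_fit < best_fit:
                best, best_fit = current, current_fit
        t *= alpha  # geometric cooling
    return best, best_fit
```

In the league setting, `neighbour` would be a validity-preserving mutation and `fitness` the standard deviation of team average skills. As the temperature decays, the loop behaves increasingly like Hill Climbing.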

## Cell 1: Setup and Critical Data Loading

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
import os
from pathlib import Path

from solution import LeagueSolution, LeagueHillClimbingSolution, LeagueSASolution
from evolution import genetic_algorithm, hill_climbing, simulated_annealing
from parameters import ProblemParameters, AlgorithmParameters
from operators import (
    # Base Mutations
    mutate_swap,
    mutate_team_shift,
    mutate_shuffle_team,
    # New/Adapted Mutations
    mutate_swap_constrained,
    mutate_targeted_player_exchange,
    mutate_shuffle_within_team_constrained,
    # Base Crossovers
    crossover_one_point,
    crossover_uniform,
    # New/Adapted Crossovers
    crossover_one_point_prefer_valid,
    crossover_uniform_prefer_valid,
    # Selection Operators
    selection_ranking,
    selection_tournament_variable_k,
    selection_boltzmann
)

# Define relative paths for results
# Determine the project root directory.
# __file__ is not defined when this code runs inside a Jupyter kernel,
# so fall back to the current working directory in that case.
try:
    notebook_dir = Path(__file__).resolve().parent
except NameError:
    notebook_dir = Path.cwd()
project_root = notebook_dir.parent.parent  # Adjust to reach the project root

# Define results directories
RESULTS_DIR = project_root / "results"
PHASE_1_DIR = RESULTS_DIR / "phase_1_sp"
IMAGES_DIR = PHASE_1_DIR / "images"
DATA_DIR = PHASE_1_DIR / "data"

# Ensure the directories exist
for dir_path in [RESULTS_DIR, PHASE_1_DIR, IMAGES_DIR, DATA_DIR]:
    if not dir_path.exists():
        dir_path.mkdir(parents=True)
        print(f"[{time.strftime('%Y-%m-%d %H:%M:%S')}] Created directory: {dir_path}")

# Initialize problem and algorithm parameters
problem_params = ProblemParameters()
algo_params = AlgorithmParameters()

# Display the parameters for reference
print("\nProblem parameters:")
print(problem_params)
print("\nAlgorithm parameters:")
print(algo_params)

# Load player data
# Relative path to the data file
data_file = project_root / "data" / "players.csv"
players_df = pd.read_csv(data_file, sep=";")
# No index_col is set, so the first column is kept as a regular data column
players_data = players_df.values.tolist()  # Convert to a list of lists rather than a list of dicts

## Cell 2: Hill Climbing Implementation