#### Overview

In this section, we set up the environment for training and optimizing a machine learning model using an XGBoost classifier (`XGBClassifier`) and **Bayesian Search**. We will:

1. **Import Essential Libraries** – Load key Python libraries for data handling, model training, evaluation, and hyperparameter tuning.
2. **Load and Prepare the Dataset** – Read the dataset from a CSV file, remove unnecessary columns, and split the data into features (`X`) and target labels (`y`).
3. **Perform Data Splitting** – Divide the dataset into training and testing sets so that performance can be measured on data the model has not seen during training.
4. **Define the Hyperparameter Search Space** – Specify a range of values for key hyperparameters of the XGBoost model to optimize performance.
5. **Optimize Model with Bayesian Search** – Utilize Bayesian optimization via `BayesSearchCV` to efficiently search for the best hyperparameters.
6. **Evaluate the Model** – Assess the model's performance using accuracy and a classification report.

The entire process will be logged with **Rich Console** to enhance readability and provide real-time updates.
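
Step 4 is easier to follow with the idea behind a log-uniform prior in view, which the search space below uses for `learning_rate` via `Real(0.01, 0.3, prior="log-uniform")`. The following is a minimal standard-library sketch, not scikit-optimize's implementation: the helper names `sample_uniform` and `sample_log_uniform` and the 0.05 threshold are assumptions made for illustration. It compares how often each sampling scheme lands in the low end of the range.

```python
import math
import random


def sample_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value uniformly from [low, high]."""
    return rng.uniform(low, high)


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
low, high = 0.01, 0.3  # Same bounds as the learning_rate range
n_draws = 10_000
threshold = 0.05  # Illustrative cut-off for a "small" learning rate

uniform_draws = [sample_uniform(low, high, rng) for _ in range(n_draws)]
log_uniform_draws = [sample_log_uniform(low, high, rng) for _ in range(n_draws)]

# Share of draws that fall below the threshold under each scheme
share_uniform = sum(d < threshold for d in uniform_draws) / n_draws
share_log_uniform = sum(d < threshold for d in log_uniform_draws) / n_draws

print(f"uniform:     {share_uniform:.2%} of draws below {threshold}")
print(f"log-uniform: {share_log_uniform:.2%} of draws below {threshold}")
```

Sampling in log space spends a larger share of draws on small values, which is usually the reason to prefer it for parameters such as the learning rate that span more than one order of magnitude.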

In [None]:
# Importing libraries
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from skopt import BayesSearchCV
from skopt.space import Integer, Real, Categorical
from rich.console import Console
import numpy as np

# Initialize rich console
console: Console = Console()

# Load dataset
with console.status("[green]Loading Data...") as status:
    df: pd.DataFrame = pd.read_csv("../data/csv/dataset.csv")
    status.update("Data Loaded Successfully!")

# Drop non-training columns
df: pd.DataFrame = df.drop(["date", "home_team", "away_team"], axis=1)
X: pd.DataFrame = df.drop("winning_team", axis=1)
y: pd.Series = df["winning_team"]

# Split dataset
console.print(
    "Splitting the dataset into training and testing sets...", style="bold cyan"
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define hyperparameter search space
param_space: dict[str, Integer | Real | Categorical] = {
    "n_estimators": Integer(100, 1000),  # Number of boosting rounds
    "max_depth": Integer(3, 15),  # Depth of trees
    "learning_rate": Real(0.01, 0.3, prior="log-uniform"),  # Step size shrinkage
    "subsample": Real(0.5, 1.0),  # Fraction of samples for training each tree
    "colsample_bytree": Real(0.5, 1.0),  # Fraction of features for each tree
    "gamma": Real(0, 5),  # Minimum loss reduction for further partition
    "reg_alpha": Real(0, 10),  # L1 regularization term on weights
    "reg_lambda": Real(0, 10),  # L2 regularization term on weights
    "min_child_weight": Integer(1, 10),  # Minimum sum of instance weight in a child
    "objective": Categorical(["binary:logistic"]),  # Binary classification
    "eval_metric": Categorical(["logloss", "auc", "error"]),  # Candidate evaluation metrics
}

# Bayesian optimization
console.print("Starting Bayesian Hyperparameter Tuning...", style="bold yellow")
bayes_search: BayesSearchCV = BayesSearchCV(
    XGBClassifier(random_state=42),
    param_space,
    n_iter=30,  # Number of evaluations
    cv=5,  # 5-fold cross-validation
    n_jobs=-1,
    random_state=42,
)

# Train the model with hyperparameter tuning
with console.status(
    "[yellow]Optimizing hyperparameters... please wait.[/yellow]"
) as status:
    bayes_search.fit(X_train, y_train)
    status.update("[green]Hyperparameter tuning complete![/green]")

# Get the best model and parameters
best_params: dict[str, int | float | str] = bayes_search.best_params_
best_model: XGBClassifier = bayes_search.best_estimator_
console.print("Best Hyperparameters:", best_params, style="bold green")

# Evaluate the best model
y_pred: np.ndarray = best_model.predict(X_test)
accuracy: float = accuracy_score(y_test, y_pred)
console.print(
    f"\n[bold green]Best Model Accuracy: {accuracy * 100:.2f}%[/bold green]",
    style="bold",
)
console.print(
    "\n[bold magenta]Classification Report:[/bold magenta]",
    classification_report(y_test, y_pred),
)
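
To make the two evaluation outputs above concrete, here is a minimal sketch of what accuracy and per-class precision, recall, and F1 measure. It is written in plain Python for illustration and is not scikit-learn's implementation; the function names `toy_accuracy` and `per_class_report` and the `home`/`away` labels are hypothetical and do not come from the dataset.

```python
from collections import Counter


def toy_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_report(
    y_true: list[str], y_pred: list[str]
) -> dict[str, dict[str, float]]:
    """Precision, recall, F1, and support for every label that appears."""
    true_count = Counter(y_true)  # How often each label really occurs
    pred_count = Counter(y_pred)  # How often each label was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    report: dict[str, dict[str, float]] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": float(true_count[label]),
        }
    return report


# Hypothetical toy labels, used only to exercise the functions
y_true_toy = ["home", "home", "away", "away", "home", "away"]
y_pred_toy = ["home", "away", "away", "away", "home", "home"]

print(f"accuracy: {toy_accuracy(y_true_toy, y_pred_toy):.3f}")
for label, metrics in per_class_report(y_true_toy, y_pred_toy).items():
    print(label, {name: round(value, 3) for name, value in metrics.items()})
```

Accuracy alone can hide a model that favours one class, which is why the per-class figures are reported alongside it.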