# XGBoost Optimization with FCVOpt

This notebook demonstrates how to tune an `XGBClassifier` on the `eye_movements` dataset using `FCVOpt`.
We will walk through the entire workflow: importing libraries, defining a metric, loading data,
setting up the cross-validation objective, specifying a hyperparameter search space,
running the optimizer, and inspecting the best result.


## 1. Import libraries

Import the numerical, machine learning, and FCVOpt utilities used throughout this notebook.


In [None]:
import random
import numpy as np
import torch
from sklearn.datasets import fetch_openml
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier
from fcvopt.configspace import ConfigurationSpace
from ConfigSpace import Float, Integer, Categorical
from fcvopt.optimizers.fcvopt import FCVOpt
from fcvopt.crossvalidation.sklearn_cvobj import XGBoostCVObjEarlyStopping


## 2. Define metric and helper functions

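FCVOpt minimizes a loss, so ROC AUC (where higher is better) is turned into a loss by taking the square root of
`1 - AUC`. A perfect ranking gives a loss of 0, and a chance-level ranking (AUC = 0.5) gives about 0.707. A small
helper function seeds Python's, NumPy's, and PyTorch's global random number generators to help make runs reproducible.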
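To get a feel for the scale of this loss, the dependency-free sketch below computes ROC AUC as the Mann-Whitney pair-ranking statistic (the fraction of positive/negative pairs ranked correctly, which matches `roc_auc_score` for binary labels) and applies the same `sqrt(1 - AUC)` transform. The function names `auc_mann_whitney` and `sqrt_auc_loss` are illustrative and not part of the notebook.

```python
import math

def auc_mann_whitney(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

def sqrt_auc_loss(y_true, scores):
    """Same transform as the notebook's metric: sqrt(1 - AUC). Lower is better."""
    return math.sqrt(1.0 - auc_mann_whitney(y_true, scores))

y = [0, 0, 1, 1]
print(sqrt_auc_loss(y, [0.1, 0.2, 0.8, 0.9]))  # 0.0: perfect ranking
print(sqrt_auc_loss(y, [0.5, 0.5, 0.5, 0.5]))  # about 0.707: uninformative scores, AUC = 0.5
print(sqrt_auc_loss(y, [0.9, 0.8, 0.2, 0.1]))  # 1.0: fully reversed ranking
```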


In [None]:
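def metric(y_true, y_pred):
    # Loss to minimize: sqrt(1 - ROC AUC). y_pred holds predicted
    # probabilities rather than hard labels (see needs_proba=True below).
    return np.sqrt(1 - roc_auc_score(y_true, y_pred))

def set_seed(seed):
    # Seed Python's, NumPy's, and PyTorch's global random number generators.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(0)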
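A quick way to confirm what seeding buys you is to reseed and check that the same draws come back. This sketch drops PyTorch to stay lightweight. Note that `np.random.seed` only controls NumPy's legacy global generator; generators created with `np.random.default_rng()` and no seed, or estimators with their own `random_state`, are not affected by it.

```python
import random
import numpy as np

def set_seed(seed):
    # Same idea as the notebook's helper, without PyTorch.
    random.seed(seed)
    np.random.seed(seed)

set_seed(0)
first = (random.random(), np.random.rand(3).tolist())
set_seed(0)
second = (random.random(), np.random.rand(3).tolist())
print(first == second)  # True: reseeding replays the same draws
```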


## 3. Load the eye_movements dataset

Retrieve the `eye_movements` dataset (OpenML ID 44157). With `as_frame=True`, the features `X` are returned as a
pandas DataFrame and the labels `y` as a pandas Series. `X.head()` displays the first five rows for a quick look at
the features.


In [None]:
DATA_ID = 44157  # OpenML id for eye_movements
X, y = fetch_openml(data_id=DATA_ID, return_X_y=True, as_frame=True, parser='auto')
X.head()
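XGBoost's scikit-learn classifier expects class labels encoded as consecutive integers starting at 0. Whether the OpenML labels already satisfy this, or whether `XGBoostCVObjEarlyStopping` encodes them internally, is not stated here, so the sketch below is only a hedged check: it counts the classes and builds a `0..K-1` mapping. It runs on a hypothetical toy label list; with the real data you would pass `list(y)`. The helper `encode_labels` is illustrative.

```python
from collections import Counter

def encode_labels(labels):
    """Map each distinct label to an integer 0..K-1 (sorted order) and count each class."""
    classes = sorted(set(labels))
    mapping = {c: i for i, c in enumerate(classes)}
    encoded = [mapping[c] for c in labels]
    return encoded, mapping, Counter(labels)

# Hypothetical toy labels standing in for `y` (for example, string categories).
toy_y = ["1", "0", "0", "1", "1"]
encoded, mapping, counts = encode_labels(toy_y)
print(mapping)  # {'0': 0, '1': 1}
print(encoded)  # [1, 0, 0, 1, 1]
print(counts)   # Counter({'1': 3, '0': 2})
```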


## 4. Create cross-validation objective

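`XGBoostCVObjEarlyStopping` wraps an `XGBClassifier` and evaluates a configuration with K-fold cross-validation plus
early stopping. In this setup there are 10 folds (`n_splits=10`, `n_repeats=1`), each model may grow up to 2000 trees
(`n_estimators=2000`), and boosting stops once the validation score has not improved for 50 consecutive rounds
(`early_stopping_rounds=50`). The object's `cvloss` method is what FCVOpt receives as its objective; it returns the
averaged validation loss for a given configuration.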
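The averaging behind a cross-validated loss can be sketched as follows. This is a toy stand-in, not the library's implementation: it uses plain shuffled K-fold splits (the real object may stratify by class for binary tasks), and `toy_cvloss`, its `fit_and_score` callback, and the `fold_ids` argument are all hypothetical. `fold_ids` illustrates scoring only some of the folds, which is the kind of saving fold-based optimizers aim for.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed=0):
    """Shuffle row indices once and split them into n_splits nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_splits)

def toy_cvloss(fit_and_score, n_samples, n_splits, fold_ids=None, seed=0):
    """Average validation loss over the requested folds (all folds by default).

    fit_and_score(train_idx, valid_idx) is a hypothetical callback that trains
    on train_idx and returns the loss measured on valid_idx.
    """
    folds = kfold_indices(n_samples, n_splits, seed)
    fold_ids = range(n_splits) if fold_ids is None else fold_ids
    losses = []
    for k in fold_ids:
        valid_idx = folds[k]
        # Training rows are every fold except the held-out one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        losses.append(fit_and_score(train_idx, valid_idx))
    return float(np.mean(losses))

# Dummy scorer that reports the validation-fold size, so each of the 10 folds gives 10.0.
print(toy_cvloss(lambda tr, va: float(len(va)), n_samples=100, n_splits=10))  # 10.0
```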


In [None]:
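cvobj = XGBoostCVObjEarlyStopping(
    estimator=XGBClassifier(
        n_estimators=2000,        # upper bound; early stopping decides how many rounds are used
        tree_method='approx',
        enable_categorical=True,  # let XGBoost handle pandas categorical columns natively
        n_jobs=-1                 # use all available CPU cores
    ),
    X=X,
    y=y,
    loss_metric=metric,
    needs_proba=True,             # pass predicted probabilities, not hard labels, to `metric`
    n_splits=10,                  # 10-fold cross-validation...
    n_repeats=1,                  # ...performed once
    holdout=False,
    task='binary-classification',
    early_stopping_rounds=50,     # stop after 50 rounds without improvement
    rng_seed=0,                   # random seed for the CV objective
)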
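`early_stopping_rounds` acts as a patience parameter: boosting stops once the validation score has not improved for that many consecutive rounds, and the best round seen is recorded. The rule can be sketched on a precomputed list of per-round validation losses. This is a simplified illustration, not XGBoost's callback code, and `early_stopping_round` is a hypothetical name.

```python
def early_stopping_round(val_losses, early_stopping_rounds):
    """Return (best_round, stop_round) for a sequence of per-round validation losses.

    Training stops at the first round that is `early_stopping_rounds` rounds past
    the best round so far; if that never happens, it runs to the last round.
    """
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= early_stopping_rounds:
            return best_round, i
    return best_round, len(val_losses) - 1

losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.44, 0.30]
print(early_stopping_round(losses, early_stopping_rounds=3))  # (2, 5)
```

The late improvement at round 6 is never seen, which is the trade-off of a small patience value; the notebook's `early_stopping_rounds=50` with up to 2000 trees leaves far more room.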


## 5. Define hyperparameter search space

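Specify ranges for key XGBoost hyperparameters in a `ConfigurationSpace`. Hyperparameters declared with `log=True`
are searched on a logarithmic scale, which suits ranges spanning several orders of magnitude, such as
`learning_rate` and the regularization terms `reg_alpha`, `reg_lambda`, and `gamma`. FCVOpt samples and proposes new
configurations within these bounds.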


In [None]:
config = ConfigurationSpace(seed=1234)
config.add([
    Float('learning_rate', bounds=(1e-5, 0.95), log=True),
    Integer('max_depth', bounds=(1, 12), log=True),
    Integer('max_leaves', bounds=(2, 1024), log=True),
    Float('reg_alpha', bounds=(1e-8, 100), log=True),
    Float('reg_lambda', bounds=(1e-8, 100), log=True),
    Float('gamma', bounds=(1e-8, 100), log=True),
    Float('subsample', bounds=(0.1, 1.0)),
    Float('colsample_bytree', bounds=(0.1, 1.0)),
    Categorical('grow_policy', ['depthwise', 'lossguide']),
])
config.generate_indices()
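For a `Float` with `log=True`, random sampling is uniform in the logarithm of the value. The sketch below reproduces that idea with NumPy for the `learning_rate` range: about 40% of draws fall below `1e-3` (two of the roughly five decades spanned), whereas a linear-scale uniform draw would put only about 0.1% there. It covers floats only; the function name `sample_log_uniform` is illustrative and not ConfigSpace's API.

```python
import math
import numpy as np

def sample_log_uniform(low, high, size, seed=0):
    """Draw values uniformly in log space between low and high."""
    rng = np.random.default_rng(seed)
    return np.exp(rng.uniform(math.log(low), math.log(high), size=size))

samples = sample_log_uniform(1e-5, 0.95, size=100_000)

# Share of draws below 1e-3, compared with the share of the log-range below 1e-3.
frac_small = float(np.mean(samples < 1e-3))
expected = (math.log(1e-3) - math.log(1e-5)) / (math.log(0.95) - math.log(1e-5))
print(round(frac_small, 3), round(expected, 3))
```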


## 6. Instantiate the optimizer

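Create the `FCVOpt` optimizer from the cross-validation objective and the search space; `n_folds` is read from the
CV splitter. Instead of running full cross-validation for every candidate, FCVOpt evaluates configurations on a
subset of the folds: `fold_selection_criterion='variance_reduction'` selects which fold to evaluate next, and
`fold_initialization='stratified'` controls how folds are assigned to the initial configurations.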


In [None]:
opt = FCVOpt(
    obj=cvobj.cvloss,
    n_folds=cvobj.cv.get_n_splits(),
    n_repeats=1,
    fold_selection_criterion='variance_reduction',
    fold_initialization='stratified',
    config=config,
    save_iter=10,
    save_dir=None,
    verbose=1,
    n_jobs=-1,
)
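The exact rule behind `fold_initialization='stratified'` is not described in this notebook. One plausible reading, stated purely as an assumption, is that the initial configurations are spread across folds as evenly as possible so that no fold is over-represented early on. The hypothetical helper below implements that balanced assignment; FCVOpt's actual rule may differ.

```python
import numpy as np

def balanced_fold_assignment(n_configs, n_folds, seed=0):
    """Assign each initial configuration a fold so fold counts differ by at most one.

    Hypothetical illustration only; not FCVOpt's implementation.
    """
    rng = np.random.default_rng(seed)
    assignment = []
    # Walk through a shuffled fold order, reshuffling after each full pass.
    while len(assignment) < n_configs:
        assignment.extend(rng.permutation(n_folds).tolist())
    return assignment[:n_configs]

# With 5 initial configurations and 10 folds, every configuration gets a different fold.
print(balanced_fold_assignment(n_configs=5, n_folds=10))
```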


## 7. Run the optimization

Run 20 optimization iterations, beginning with 5 random initial configurations (`n_init=5`) that explore the space
before the optimizer's surrogate model starts proposing promising candidates.


In [None]:
results = opt.run(n_iters=20, n_init=5)


## 8. Inspect the best result

After optimization, print the best loss (`opt.f_inc`) and the corresponding hyperparameter configuration
(`opt.inc_config`) found by FCVOpt.


In [None]:
print('Best loss:', opt.f_inc)
print('Best configuration:')
print(opt.inc_config)
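Because the loss is `sqrt(1 - AUC)`, a loss value can be mapped back to the AUC scale with `AUC = 1 - loss**2`. One caveat: if the reported loss is an average over folds (or a model-based estimate), this inversion is only approximate, since squaring does not commute with averaging. By Jensen's inequality the inverted value is at least the mean fold AUC, as the second half of the sketch shows with made-up fold AUCs. `loss_to_auc` is an illustrative helper, not part of FCVOpt.

```python
import math

def loss_to_auc(loss):
    """Invert the notebook's loss: loss = sqrt(1 - AUC), so AUC = 1 - loss**2."""
    return 1.0 - loss ** 2

print(loss_to_auc(0.0))             # 1.0: perfect ranking
print(loss_to_auc(math.sqrt(0.5)))  # about 0.5: chance level

# Hypothetical per-fold AUCs: inverting the mean loss overstates the mean AUC slightly.
fold_aucs = [0.80, 0.90]
mean_loss = sum(math.sqrt(1 - a) for a in fold_aucs) / len(fold_aucs)
print(loss_to_auc(mean_loss), sum(fold_aucs) / len(fold_aucs))
```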
