<div style = "background-color:   rgba(0, 169, 143, 0.1); padding: 10px;">

### Bayesian Optimization (the StratifiedKFold setup below requires a classification target)

</div>

The Hyperopt documentation for `fmin`, its main optimization function, can be found [here](https://github.com/hyperopt/hyperopt/wiki/FMin).

---

**Hyperopt** is a Python library for hyperparameter optimization. Its main algorithm, Tree of Parzen Estimators (TPE), is a form of Bayesian optimization. **Bayesian optimization** is a probabilistic, model-based approach to the global optimization of black-box functions that are expensive to evaluate. In machine learning, the **black-box function** is typically a model's performance on a given task (the objective), viewed as a function of its hyperparameters.
- **Objective Function (Black-Box Function)**: In machine learning, the objective function evaluates the performance of a model for a given set of hyperparameters. The goal is to find the hyperparameters that maximize or minimize it. Hyperopt's `fmin` always minimizes, so a score that should be maximized (such as R²) is returned as its negative.
- **Bayesian Optimization**: Bayesian optimization uses a surrogate model, a probabilistic approximation of the true objective function that predicts the distribution of the objective across the hyperparameter space. In classical Bayesian optimization the surrogate is usually a Gaussian process; Hyperopt's TPE instead models densities over the hyperparameters (see **Optimization Algorithm**). The surrogate is updated iteratively as new evaluations of the objective function are obtained.
- **Acquisition Function**: The acquisition function is a criterion used to decide where to sample the objective function next. It balances exploration (sampling in regions where uncertainty is high) and exploitation (sampling in regions where the objective function is likely to be optimal). Common acquisition functions include Probability of Improvement (PI), Expected Improvement (EI), and Upper Confidence Bound (UCB).
- **Trials and History**: Bayesian optimization is an iterative process. Each iteration consists of evaluating the objective function at a specific set of hyperparameters. These evaluations are referred to as trials. The history of trials is used to update the probabilistic model and make informed decisions about where to sample next.
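- **Hyperparameter Space**: The hyperparameter space is the range of values that each hyperparameter can take. In Hyperopt you define it with `hp` expressions that set each hyperparameter's type and range, e.g. `hp.choice` (categorical), `hp.uniform` and `hp.loguniform` (continuous), and `hp.quniform` (discrete, quantized).
- **Optimization Algorithm**: Hyperopt provides random search (`rand.suggest`), Tree of Parzen Estimators (`tpe.suggest`), and adaptive TPE (`atpe.suggest`). TPE starts with a number of random trials and then, instead of modeling the objective directly, splits past trials into a "good" group (the lowest losses) and a "bad" group, fits a density l(x) to the hyperparameters of the good trials and g(x) to the rest, and proposes the candidate that maximizes l(x)/g(x). In the TPE formulation this is equivalent to maximizing Expected Improvement.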
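To make the TPE idea concrete, here is a minimal one-dimensional sketch of a single TPE suggestion step in plain Python. It is a simplification, not Hyperopt's implementation: the quantile `gamma`, the fixed kernel `bandwidth`, the number of candidates, and the helper names `kde` and `tpe_suggest` are illustrative assumptions. Past trials are split into good and bad groups, a Gaussian kernel density is fit to each, and the candidate with the largest l(x)/g(x) is proposed.

```python
import math
import random

def kde(x, points, bandwidth):
    """Gaussian kernel density estimate at x from the sample `points` (illustrative helper)."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

def tpe_suggest(history, gamma=0.25, bandwidth=0.5, n_candidates=50, rng=None):
    """Propose the next x from past (x, loss) pairs, TPE-style (simplified 1-D sketch)."""
    if rng is None:
        rng = random.Random(0)
    ordered = sorted(history, key=lambda t: t[1])            # lowest loss first
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]                  # trials that define l(x)
    bad = [x for x, _ in ordered[n_good:]]                   # trials that define g(x)
    # Draw candidates from l(x): pick a good point, then perturb it with Gaussian noise
    candidates = [rng.choice(good) + rng.gauss(0, bandwidth) for _ in range(n_candidates)]
    if not bad:                                              # too few trials to form g(x)
        return candidates[0]
    # Return the candidate with the largest ratio l(x) / g(x); epsilon avoids division by zero
    return max(candidates,
               key=lambda x: kde(x, good, bandwidth) / (kde(x, bad, bandwidth) + 1e-12))

# Usage: 20 random evaluations of the toy loss (x - 3)^2 on [-10, 10], then one TPE step
rng = random.Random(42)

def toy_loss(x):
    return (x - 3) ** 2

history = [(x, toy_loss(x)) for x in (rng.uniform(-10, 10) for _ in range(20))]
print(f"next x to evaluate: {tpe_suggest(history, rng=rng):.2f}")
```

Because candidates are drawn near the good trials and scored against the bad ones, the proposal concentrates where low losses were seen while the noise keeps some exploration.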

---

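There are four key elements for Hyperopt, each corresponding to an argument of `fmin`:
- The **Space** over which to search (`space`)
- The **Objective Function** to minimize (`fn`)
- The **Database** in which to store all the point evaluations of the search (`trials`, usually a `Trials` object)
- The **Search Algorithm** to use (`algo`, e.g. `tpe.suggest`)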

```python
# Space: the same space as the Random Search
space = param_grid_RS
```

```python
from sklearn.model_selection import StratifiedKFold

# Set up 5-fold stratified cross-validation.
# StratifiedKFold keeps the class proportions of y in every fold, so it only accepts
# binary or multiclass targets; a continuous (regression) target raises a ValueError.
kfold = StratifiedKFold(n_splits=5,        # split the training dataset into 5 folds
                        shuffle=True,      # shuffle before splitting; samples within each split are not shuffled
                        random_state=0)    # make the split reproducible
```
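Stratification is what ties this splitter to classification: each test fold keeps the same class proportions as the full target. A small sketch with a made-up, imbalanced binary target (8 negatives, 2 positives; `X_demo` and `y_demo` are illustrative):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up imbalanced binary target: 80% class 0, 20% class 1
X_demo = np.zeros((10, 1))
y_demo = np.array([0] * 8 + [1] * 2)

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X_demo, y_demo)):
    # Each test fold keeps the 80/20 ratio: 4 samples of class 0 and 1 of class 1
    print(fold, np.bincount(y_demo[test_idx]))
```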

```python
from hyperopt import tpe, STATUS_OK, Trials, hp, fmin, space_eval
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Objective function: Hyperopt calls it with one sampled set of hyperparameters (params)
def objective(params):

    xgbreg = XGBRegressor(random_state=0, **params)   # random_state=0 makes results reproducible; **params unpacks the sampled hyperparameters
    score = cross_val_score(estimator=xgbreg,         # the model to fit on each fold
                            X=X_train,                # training feature matrix
                            y=y_train,                # training target
                            cv=kfold,                 # cross-validation splitting strategy
                            scoring='r2',             # R² on each held-out fold
                            n_jobs=-1).mean()         # n_jobs=-1 evaluates folds in parallel; .mean() averages the k fold scores

    # fmin minimizes, so the loss is the negative R² score
    loss = -score

    # Dictionary with the information Hyperopt records for this evaluation
    return {'loss': loss, 'params': params, 'status': STATUS_OK}
```

```python
# Optimize
trials = Trials()                        # the "database": stores the information for each evaluation
best = fmin(fn=objective,                # fmin searches for the hyperparameters with the smallest loss
            space=space,                 # the search space of the hyperparameters
            algo=tpe.suggest,            # search algorithm; Hyperopt offers random search, TPE, and adaptive TPE; TPE is used here
            max_evals=48,                # maximum number of evaluations
            trials=trials)               # record every evaluation in the Trials object
```

```text
  0%|          | 0/48 [00:00<?, ?trial/s, best loss=?]
job exception: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
```
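This `ValueError` is raised by `StratifiedKFold`, not by Hyperopt: stratification needs discrete class labels, and `y_train` here is a continuous regression target (the model is `XGBRegressor` scored with `'r2'`). For regression, a plain `KFold` with the same arguments is the usual replacement. A minimal sketch on a made-up continuous target reproduces the error and shows the alternative; `X_cont`, `y_cont` and `kfold_reg` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Made-up continuous target, like a regression y_train
rng = np.random.default_rng(0)
X_cont = rng.normal(size=(20, 3))
y_cont = rng.normal(size=20)

# StratifiedKFold rejects a continuous target when the folds are generated
try:
    list(StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X_cont, y_cont))
except ValueError as err:
    print("StratifiedKFold:", err)

# KFold splits on sample positions only, so it works for regression targets
kfold_reg = KFold(n_splits=5, shuffle=True, random_state=0)
print([len(test_idx) for _, test_idx in kfold_reg.split(X_cont, y_cont)])
```

In this notebook that would mean using `KFold` instead of `StratifiedKFold` in the cross-validation cell; the objective function itself would not need to change.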