**NOTE: This notebook is written for the Google Colab platform. However, it can also be run (possibly with minor modifications) as a standard Jupyter notebook.** 



In [None]:
#@title -- Installation of Packages -- { display-mode: "form" }
import sys
!{sys.executable} -m pip install hyperopt
!{sys.executable} -m pip install git+https://github.com/michalgregor/class_utils.git

In [None]:
#@title -- Import of Necessary Packages -- { display-mode: "form" }
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from hyperopt import fmin, hp, space_eval, tpe
from hyperopt.pyll.base import scope
from sklearn.model_selection import cross_validate

In [None]:
#@title -- Downloading Data -- { display-mode: "form" }
from class_utils.download import download_file_maybe_extract
download_file_maybe_extract("https://www.dropbox.com/s/u8u7vcwy3sosbar/titanic.zip?dl=1",
                            directory="data/titanic")

# also create a directory for storing any outputs
import os
os.makedirs("output", exist_ok=True)

## Bayesian Hyperparameter Optimization

We already know how to use `scikit-learn` to set up and train a simple model. We also know that models usually have hyperparameters: parameters that are not learned but need to be set in some way beforehand or, ideally, selected by some kind of hyperparameter optimization method.

Of course, training a model is often very expensive and has to be repeated for many different sets of hyperparameters while looking for the optimal configuration. This is why Bayesian optimization is often used for hyperparameter tuning – as we have already discussed, it is an approach geared towards being able to optimize the objective function with as few actual queries to its value as possible.

In this notebook we are going to show a practical approach to Bayesian hyperparameter optimization using a popular package called `hyperopt`.

### Loading and Preprocessing the Data

As usual, we will start by loading and preprocessing data. We will make use of the well-known [Titanic](https://www.kaggle.com/c/titanic) dataset. Since, at this point, there is no need to go over the loading and preprocessing in detail, the code of the next cell is hidden.



In [None]:
#@title -- Loading and preprocessing: X_train, Y_train, X_test, Y_test -- { display-mode: "form" }
df = pd.read_csv("data/titanic/train.csv")
df_train, df_test = train_test_split(df, test_size=0.25,
                     stratify=df["Survived"], random_state=4)

# we split the columns into categorical and numeric inputs and the output
categorical_inputs = ["Pclass", "Sex", "Embarked"]
numeric_inputs = ["Age", "SibSp", 'Parch', 'Fare']
output = ["Survived"]

# we create our preprocessing pipeline
input_preproc = make_column_transformer(
    (make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OrdinalEncoder(categories='auto')),
     categorical_inputs),
    
    (make_pipeline(
        SimpleImputer(),
        StandardScaler()),
     numeric_inputs)
)

# we fit the pipeline on the train set and then apply it to both train and test
X_train = input_preproc.fit_transform(df_train[categorical_inputs + numeric_inputs])
Y_train = df_train[output]

X_test = input_preproc.transform(df_test[categorical_inputs + numeric_inputs])
Y_test = df_test[output]

### Bayesian Optimization

The first thing that we will need to do before we apply Bayesian optimization will, of course, be to define the objective function that the method is to minimize.

Given that our goal is to find hyperparameters with which our model will achieve the best results, the input arguments will be the hyperparameters. We will use these to set up a model (a decision tree based on class `DecisionTreeClassifier`).

The performance of the model will then be evaluated using $k$-fold cross-validation: the training data is split into $k$ folds and, in each of $k$ rounds, one fold is held out for testing while the remaining $k - 1$ folds are used for training. Once every fold has served as the test set exactly once, the final score is obtained by averaging the results of the individual rounds.
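
To make the averaging step concrete, the following is a minimal pure-Python sketch of $k$-fold cross-validation. The helper names `k_fold_indices` and `majority_accuracy` and the toy labels are illustrative assumptions, not part of `scikit-learn`; the `cross_validate` function used in this notebook does the same job for us (and, for classifiers with an integer `cv`, additionally stratifies the folds by class).

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs, using each of k contiguous folds once for testing."""
    # the first (n_samples % k) folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def majority_accuracy(labels, train_idx, test_idx):
    """Toy 'model': always predict the most common label of the training folds."""
    train_labels = [labels[i] for i in train_idx]
    prediction = max(set(train_labels), key=train_labels.count)
    return mean(1.0 if labels[i] == prediction else 0.0 for i in test_idx)

toy_labels = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]  # made-up labels for illustration

# one score per round, then the average over all rounds
fold_scores = [majority_accuracy(toy_labels, train_idx, test_idx)
               for train_idx, test_idx in k_fold_indices(len(toy_labels), k=3)]
cv_score = mean(fold_scores)

print("per-fold scores:", fold_scores)
print("mean score:", cv_score)
```

Each sample lands in the test fold exactly once, so the mean is computed over predictions for the whole training set, just from models that never saw the sample they are scored on.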



In [None]:
def objective(params):
    model = DecisionTreeClassifier(**params)
    
    score = cross_validate(model, X_train, Y_train,
                           scoring='f1_macro',
                           cv=10, n_jobs=10)['test_score'].mean()
    print("Score {:.3f} params {}".format(score, params))

    # minus because we want the score to be as high as
    # possible, but the objective function is to be minimized
    return -score

As our next step, we will need to set up the search space, i.e. to specify our method's hyperparameters and to determine what values they can take. Let us start by displaying the docstring of class `DecisionTreeClassifier`.



In [None]:
?DecisionTreeClassifier

---
### Task 1: Setting up the Search Space

**Use the next cell to define the search space `space` of decision tree hyperparameters.** 

---
To set up the space, create a dictionary of the following form:

```
space = {
    # categorical variable:
    'cat_var': hp.choice("cat_var", ["opt1", "opt2", "opt3"]),

    # a uniformly distributed integer:
    'int_var': scope.int(hp.quniform("int_var", 1, 15, 1)),

    # a uniformly distributed real number:
    'float_var': hp.uniform('float_var', 0.2, 1.0),
}
```
Further options and more detailed documentation of how to define such parameter spaces can be found at [hyperopt's wiki](https://github.com/hyperopt/hyperopt/wiki/FMin#21-parameter-expressions).



In [None]:
space = {
    
    
    # ---
    
    
}

### Running the Optimization

Next, we can run the optimization itself. We'll specify the objective function, the search space, the maximum number of evaluations of the objective function and the algorithm. We will be using `tpe`, i.e. the Tree-structured Parzen Estimator. This approach is better at dealing with high-dimensional spaces than Gaussian processes are, but the basic aim remains the same.



In [None]:
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100
        )

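Function `fmin` will return the best solution found. Note that for `hp.choice` parameters it reports the index of the selected option rather than the option itself, so we decode the result using function `space_eval`, which will yield a representation that we can use when creating our model.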



In [None]:
best_params = space_eval(space, best)

### Retraining the Model with the Best Hyperparameters

Now that we have identified the best set of hyperparameters, we will use them to retrain the model: this time using the entire training set.



In [None]:
model = DecisionTreeClassifier(**best_params)
model.fit(X_train, Y_train)

### Testing

And finally, we are ready to test the model on the test set. We will display the confusion matrix and our standard metrics.
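
As a reminder of what these metrics measure, here is a small pure-Python sketch that derives them from the four cells of a binary confusion matrix. The function `binary_metrics` and the toy label lists are illustrative assumptions; the notebook itself relies on the `scikit-learn` functions `accuracy_score`, `precision_score` and `recall_score`.

```python
def binary_metrics(actual, predicted):
    """Return (accuracy, precision, recall) for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # fall back to 0.0 when a denominator is empty
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# made-up labels for illustration
toy_actual = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

toy_accuracy, toy_precision, toy_recall = binary_metrics(toy_actual, toy_predicted)
print("Accuracy = {}".format(toy_accuracy))
print("Precision = {}".format(toy_precision))
print("Recall = {}".format(toy_recall))
```

Precision asks how many of the predicted survivors actually survived, while recall asks how many of the actual survivors the model managed to find.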



In [None]:
y_test = model.predict(X_test)

In [None]:
cm = pd.crosstab(Y_test.values.reshape(-1), y_test,
                 rownames=['actual'],
                 colnames=['predicted'])
print(cm)

In [None]:
print("Accuracy = {}".format(accuracy_score(Y_test, y_test)))
print("Precision = {}".format(precision_score(Y_test, y_test)))
print("Recall = {}".format(recall_score(Y_test, y_test)))

The performance should be better than with default hyperparameters (at least on average – it is difficult to say anything reasonable about one particular run because of all the stochasticity). We can verify whether this is the case.



In [None]:
def_model = DecisionTreeClassifier()
def_model.fit(X_train, Y_train)
y_test = def_model.predict(X_test)

print("Accuracy = {}".format(accuracy_score(Y_test, y_test)))
print("Precision = {}".format(precision_score(Y_test, y_test)))
print("Recall = {}".format(recall_score(Y_test, y_test)))

---
### Task 2: Optimizing XGBoost's Hyperparameters

**Now try to apply the same procedure to a different classification method: the `XGBClassifier` from package `xgboost`. In particular, you will need to redefine the function `objective`, so that it uses the new model, and the search space `space`, so that it corresponds to the hyperparameters of the new method.** 

---


In [None]:
from xgboost import XGBClassifier