# Bayesian Optimization

@author: Kareem
@date: 28.04.2019
    
In the following, I'd like to demonstrate the Bayesian optimization process, so that we can better handle continuous values during model optimization. This implementation relies on this [GitHub package](https://github.com/fmfn/BayesianOptimization), so to get started, install it:

```
pip install bayesian-optimization
```

## Loading Data

Let's load one of the default scikit-learn datasets (breast cancer) and then build different classifiers to evaluate how well this implementation works.

```python
#### Load Breast Cancer Data ####
from sklearn.datasets import load_breast_cancer
dataset = load_breast_cancer()
X = dataset['data']
y = dataset['target']

## Split into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, shuffle=True)
```

## Handling Continuous Values

The package supports continuous parameters out of the box, so let's build a logistic regression classifier right away and see what accuracies we can get.

Note that the `BayesianOptimization` object always takes the objective function (i.e. the function to be maximized) as a constructor argument. To minimize a function instead, return its negative.
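As an illustration of the negation trick, the sketch below wraps a toy loss so that maximizing the wrapper minimizes the loss. `negate` and `squared_error` are hypothetical names for this example, not part of the package; the only package behavior assumed is that the objective is called with keyword arguments named after the keys of `pbounds`.

```python
def negate(loss_fn):
    """Wrap a function to be minimized so that it can be maximized instead."""
    def objective(**params):
        # The optimizer passes parameters as keyword arguments (pbounds keys)
        return -loss_fn(**params)
    return objective

def squared_error(x):
    """Toy loss with its minimum at x = 2."""
    return (x - 2.0) ** 2

objective = negate(squared_error)
print(objective(x=0.0))  # -4.0

# A crude grid search stands in for the optimizer: the maximizer of the
# negated loss is the minimizer of the original loss.
grid = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
best_x = max(grid, key=lambda x: objective(x=x))
print(best_x)  # 2.0
```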

```python
#### Build Classifier ####
from sklearn.linear_model import LogisticRegression

def black_box_function(C):
    """Function we wish to maximize: test accuracy of a logistic regression."""
    classifier = LogisticRegression(C=C, random_state=0)
    classifier.fit(X_train, y_train)
    return classifier.score(X_test, y_test)  # score() returns the mean accuracy
```

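`C` is searched over the range $[0, 1]$. Note that `LogisticRegression` requires `C` to be strictly positive, so evaluating exactly at the lower bound $0$ would raise an error; a small positive lower bound such as $10^{-3}$ avoids this.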


```python
#### Build the Optimization Space ####
from bayes_opt import BayesianOptimization

# Bounded region of parameter space
pbounds = {'C': (0, 1)}

optimizer = BayesianOptimization(
    f=black_box_function,  # function to maximize
    pbounds=pbounds,       # bounds of each dimension to explore
    random_state=0)
optimizer.maximize(
    init_points=5,  # number of random exploration steps
    n_iter=10)      # number of guided (acquisition-driven) steps

print(optimizer.max)  # best score and parameter set
for i, res in enumerate(optimizer.res):  # full optimization history
    print("Iteration {}: \n\t{}".format(i, res))
```

| iter | target | C |
|---:|---:|---:|
| 1 | 0.9561 | 0.5488 |
| 2 | 0.9561 | 0.7152 |
| 3 | 0.9561 | 0.6028 |
| 4 | 0.9561 | 0.5449 |
| 5 | 0.9561 | 0.4237 |
| 6 | 0.9386 | 0.02578 |
| 7 | 0.9561 | 1.0 |
| 8 | 0.9561 | 1.0 |
| 9 | 0.9561 | 0.8043 |
| 10 | 0.9561 | 0.7758 |
| 11 | 0.9561 | 1.0 |
| 12 | 0.9561 | 0.6563 |
| 13 | 0.9561 | 1.0 |
| 14 | 0.9561 | 0.8092 |
| 15 | 0.9561 | 0.5683 |

## Handling Discrete Values

The package does not handle discrete values natively, so we round them and convert them to integers inside the black-box function. This is the approach recommended by the package authors.
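The rounding step can also be factored out of each objective. The sketch below uses a hypothetical wrapper `with_integer_params` (not part of the package) and a stand-in function in place of a real model; the input values are taken from iteration 1 of the random-forest run below. Keep in mind that Python's built-in `round()` rounds halves to the nearest even integer, so `round(2.5)` is `2`.

```python
def with_integer_params(fn, int_names):
    """Hypothetical helper: round the named parameters to ints, then call fn."""
    def wrapped(**params):
        for name in int_names:
            params[name] = int(round(params[name]))
        return fn(**params)
    return wrapped

def fake_model_score(n_estimators, max_depth):
    # Stand-in for a real model; a real objective must return a float score.
    # Here it just echoes its (now integer) arguments.
    return n_estimators, max_depth

objective = with_integer_params(fake_model_score, ["n_estimators", "max_depth"])
print(objective(n_estimators=62.26, max_depth=15.98))  # (62, 16)
```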

```python
#### Handling Discrete Values ####
from sklearn.ensemble import RandomForestClassifier

def black_box_function(n_estimators, max_depth, max_features):
    ## handle discrete: the optimizer proposes floats, the model needs ints
    n_estimators = int(round(n_estimators))
    max_depth = int(round(max_depth))
    max_features = int(round(max_features))
    ## fail early if the conversion did not produce ints
    assert isinstance(n_estimators, int)
    assert isinstance(max_depth, int)
    assert isinstance(max_features, int)
    #### Build Classifier ####
    classifier = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                        max_features=max_features, random_state=0, verbose=0, n_jobs=-1)
    classifier.fit(X_train, y_train)
    return classifier.score(X_test, y_test)  # accuracy on the test split

pbounds = {'n_estimators': (5, 100),
           'max_depth': (5, 25),
           'max_features': (5, 20)}

optimizer = BayesianOptimization(f=black_box_function, pbounds=pbounds, random_state=0)
optimizer.maximize(init_points=5, n_iter=10)

print(optimizer.max)
for i, res in enumerate(optimizer.res):
    print("Iteration {}: \n\t{}".format(i, res))
```

| iter | target | max_depth | max_features | n_estimators |
|---:|---:|---:|---:|---:|
| 1 | 0.9737 | 15.98 | 15.73 | 62.26 |
| **2** | **0.9825** | **15.9** | **11.35** | **66.36** |
| 3 | 0.9825 | 13.75 | 18.38 | 96.55 |
| 4 | 0.9737 | 12.67 | 16.88 | 55.25 |
| 5 | 0.9737 | 16.36 | 18.88 | 11.75 |
| 6 | 0.9298 | 5.0 | 5.0 | 5.0 |
| 7 | 0.9474 | 5.0 | 5.0 | 100.0 |
| 8 | 0.9649 | 25.0 | 5.0 | 100.0 |
| 9 | 0.9561 | 24.86 | | |

## Categorical Values

Categorical values are handled much like discrete values; the only extra ingredient is a reference list. First, we encode the categories as integers, as `LabelEncoder` would (the order matters here), and then the black-box function maps the rounded integer back to the corresponding entry of the reference list.
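One side effect of rounding is worth checking: with bounds `(0, 2)` for three categories, the two end categories each cover only half as much of the search interval as the middle one. The sketch below measures this and compares it with an alternative of our own (not something the package prescribes): use bounds `(0, len(categories))`, take the floor, and clamp the upper edge. All helper names are hypothetical.

```python
import math

kernels = ['linear', 'rbf', 'sigmoid']  # reference list; order defines the encoding

def decode_round(x, categories):
    """Scheme used in this notebook: round x to the nearest index."""
    return categories[int(round(x))]

def decode_floor(x, categories):
    """Alternative: bounds (0, len(categories)), floor, clamp the upper edge."""
    index = min(math.floor(x), len(categories) - 1)
    return categories[index]

def share_per_category(decode, lo, hi, categories, steps=10000):
    """Fraction of an evenly spaced grid over [lo, hi] mapped to each category."""
    counts = {c: 0 for c in categories}
    for i in range(steps):
        x = lo + (hi - lo) * (i + 0.5) / steps
        counts[decode(x, categories)] += 1
    return {c: n / steps for c, n in counts.items()}

print(share_per_category(decode_round, 0, 2, kernels))
# {'linear': 0.25, 'rbf': 0.5, 'sigmoid': 0.25}
print(share_per_category(decode_floor, 0, 3, kernels))  # roughly one third each
```

Whichever decoding you choose, the same rule must be used when converting the best parameters back for interpretation.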

```python
#### Handling Categorical values ####
from sklearn.svm import SVC
from bayes_opt import BayesianOptimization

def black_box_function(C, kernel, degree, gamma):
    # handle categorical: round the float proposal to an index ...
    kernel = int(round(kernel))
    # fail early if the conversion did not produce an int
    assert isinstance(kernel, int)
    kernel = kernels[kernel]  # ... and unpack it from the reference list

    # degree only affects the 'poly' kernel, so the kernels below ignore it
    degree = int(round(degree))
    assert isinstance(degree, int)

    classifier = SVC(C=C, kernel=kernel, degree=degree, gamma=gamma, random_state=0, verbose=0)
    classifier.fit(X_train, y_train)
    return classifier.score(X_test, y_test)  # accuracy on the test split

# reference list
kernels = ['linear', 'rbf', 'sigmoid']  # poly kernel is removed due to computational expense
# encoded list
kernels_encoded = [i for i in range(len(kernels))]

pbounds = {'C': (1e-3, 1),  # SVC requires C > 0
           'kernel': (0, len(kernels) - 1),
           'degree': (1, 3),
           'gamma': (0, 1)}

optimizer = BayesianOptimization(f=black_box_function, pbounds=pbounds, random_state=0)
optimizer.maximize(init_points=5, n_iter=10)

print(optimizer.max)
for i, res in enumerate(optimizer.res):
    print("Iteration {}: \n\t{}".format(i, res))
```

## Remarks

* Interestingly, the method does not seem to need many guided iterations: in the runs logged above, the best score is already reached within the first 5 (random) steps.
* We would still need a function that converts the parameters back to their true values (rounded integers, category names) for interpretability.
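A minimal sketch of such a conversion function follows. `to_true_values` is a hypothetical helper; it assumes the raw parameters come as a plain dict shaped like `optimizer.max['params']` and that it applies the same rounding rules as the black-box functions above. The example values are taken from iteration 2 of the random-forest log (which shows them rounded to four significant digits).

```python
def to_true_values(params, int_params=(), categorical=None):
    """Map raw optimizer parameters back to the values the model actually used.

    int_params:  names of parameters that were rounded to ints.
    categorical: {name: reference_list} for parameters decoded by rounding.
    """
    categorical = categorical or {}
    true_params = dict(params)  # copy; leave the optimizer's dict untouched
    for name in int_params:
        true_params[name] = int(round(params[name]))
    for name, categories in categorical.items():
        true_params[name] = categories[int(round(params[name]))]
    return true_params

raw = {'max_depth': 15.9, 'max_features': 11.35, 'n_estimators': 66.36}
print(to_true_values(raw, int_params=['max_depth', 'max_features', 'n_estimators']))
# {'max_depth': 16, 'max_features': 11, 'n_estimators': 66}

raw_svc = {'C': 0.5, 'kernel': 1.2, 'degree': 2.7, 'gamma': 0.1}  # illustrative
print(to_true_values(raw_svc, int_params=['degree'],
                     categorical={'kernel': ['linear', 'rbf', 'sigmoid']}))
# {'C': 0.5, 'kernel': 'rbf', 'degree': 3, 'gamma': 0.1}
```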