# Optimization techniques Lab. 6: Bayesian Optimization
## Introduction
**Goal.** The goal of this lab is to study the behavior of Bayesian optimization on a regression problem and a classification problem.
Bayesian optimization is a probabilistic approach based on Bayes' theorem, $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$. Briefly, we combine the prior information, $P(A)$, with the evidence from sampled evaluations (initially random samples), $P(B|A)$, to build and update a surrogate function that guides the search.
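
As a small numeric illustration of the formula above, the following sketch computes $P(A|B)$ for a two-hypothesis case. The function name `posterior` and the probabilities are made up for illustration and are not part of the lab code.

```python
def posterior(prior_a: float, like_b_given_a: float, like_b_given_not_a: float) -> float:
    """Return P(A|B) using Bayes' theorem.

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A)).
    """
    evidence = like_b_given_a * prior_a + like_b_given_not_a * (1.0 - prior_a)
    return like_b_given_a * prior_a / evidence


# Hypothetical numbers: P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.1
print(round(posterior(0.3, 0.8, 0.1), 4))  # 0.24 / 0.31 -> 0.7742
```

Observing $B$ raises the belief in $A$ from the prior 0.3 to about 0.77, because $B$ is much more likely under $A$ than under its complement.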

**Getting started.** The following cells contain the implementation of the methods that we will use throughout this lab, together with utilities. 


In [None]:
import itertools
from collections import Counter

import matplotlib.pyplot as plt

from warnings import catch_warnings, simplefilter
from numpy import mean
from scipy.optimize import OptimizeResult
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier
from skopt import gp_minimize
from skopt.space import Integer
from skopt.utils import use_named_args

Classifier
---
## Questions:
- Try different ranges of hyperparameters. How do the results change?
- Does the model influence the choice of the hyperparameters?

In [None]:
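from pathlib import Path

# make sure the output directory exists before saving any figure
Path('out').mkdir(exist_ok=True)

# generate a 2D classification dataset with three clusters
X, y = make_blobs(n_samples=500, centers=3, n_features=2)

# plot the samples, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', s=20)
plt.savefig('out/blobs.svg')
plt.close()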

In [None]:
from skopt.space import Real

# define the models and, for each one, the hyperparameter search spaces to try
# each case is a tuple: (name, model class, list of search spaces)
cases = [
    ('random-forest-classifier', RandomForestClassifier, [
        [
            Integer(2, 3, name='n_estimators'),
            Integer(1, 2, name='max_features')
        ],
        [
            Integer(1, 5, name='n_estimators'),
            Integer(1, 2, name='max_features')
        ],
        [
            Integer(1, 4, name='n_estimators'),
            Integer(1, 4, name='max_features')
        ],

    ]),
    ('radius-neighbors-classifier', RadiusNeighborsClassifier, [
        [
            Integer(3, 7, name='radius'),
            Integer(1, 3, name='p')
        ],
        [
            Integer(5, 10, name='radius'),
            Integer(1, 3, name='p')
        ],
    ]),
    ('k-neighbors-classifier', KNeighborsClassifier, [
        [
            Integer(2, 3, name='n_neighbors'),
            Integer(1, 2, name='p')
        ],
        [
            Integer(1, 5, name='n_neighbors'),
            Integer(1, 2, name='p')
        ],
        [
            Integer(1, 4, name='n_neighbors'),
            Integer(1, 4, name='p')
        ],
    ]),
]

In [None]:
for (n, m, search_spaces) in cases:
    for search_space in search_spaces:
        model = m()


        # define the function used to evaluate a given configuration
        @use_named_args(search_space)
        def evaluate_model(**params) -> float:
            # apply the sampled hyperparameters to the model
            model.set_params(**params)
            # calculate 5-fold cross validation
            with catch_warnings():
                # ignore generated warnings
                simplefilter("ignore")
                scores = cross_val_score(model, X, y, cv=5, n_jobs=-1, scoring='accuracy')
                # calculate the mean of the scores
                estimate = mean(scores)
                return 1.0 - estimate


        # perform optimization
        result: OptimizeResult = gp_minimize(evaluate_model, search_space)

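        # count how many times each configuration was evaluated
        counts = Counter(str(x) for x in result.x_iters)
        # enumerate every configuration in the (integer) search space
        xs = [str([a, b]) for (a, b) in itertools.product(
            range(search_space[0].low, 1 + search_space[0].high),
            range(search_space[1].low, 1 + search_space[1].high)
        )]
        ys = [counts.get(x, 0) for x in xs]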

        # bar chart of evaluation counts; the best configuration is highlighted in green
        fig, ax = plt.subplots()
        bs = ax.bar(xs, ys)
        bs[xs.index(str([result.x[0], result.x[1]]))].set_color('green')
        ax.set_title(f'accuracy = {(1.0 - result.fun) * 100:.2f}%\n' + ', '.join(dim.name for dim in search_space))
        ax.tick_params('x', labelrotation=90)
        fig.savefig(
            f'out/{n} - [{(search_space[0].low, search_space[0].high)},{(search_space[1].low, search_space[1].high)}].svg')
        plt.close(fig)


# BONUS

The classifier experiments above show the effect of hyperparameter tuning.
Now change the acquisition function in the regression problem and add a slack variable as a hyperparameter. How does this variable affect the optimization problem?
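
As a starting point for this question, here is a minimal pure-Python sketch of one common acquisition function, expected improvement (EI) for minimization. It assumes that the slack variable is the exploration offset usually written `xi`; this is an interpretation, not something the lab states. The function name `expected_improvement`, the candidate values, and `best` are hypothetical. In scikit-optimize, `gp_minimize` accepts `acq_func` (for example `"EI"`, `"PI"`, `"LCB"`) and `xi` arguments, which may be the simplest way to run the same experiment on the regression problem.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.01) -> float:
    """Expected improvement (minimization) of one candidate point.

    mu, sigma: surrogate mean and standard deviation at the candidate.
    best: lowest objective value observed so far.
    xi: slack/exploration offset (assumed meaning of the "slack variable").
    """
    if sigma <= 0.0:
        # no uncertainty: this sketch assigns zero expected improvement
        return 0.0
    improvement = best - mu - xi  # xi raises the bar a candidate must clear
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return improvement * cdf + sigma * pdf


best = 0.12  # hypothetical best objective value so far
candidates = {
    "confident": (0.10, 0.01),  # (mu, sigma): slightly better mean, low uncertainty
    "uncertain": (0.20, 0.10),  # worse mean, high uncertainty
}
for xi in (0.0, 0.01, 0.1):
    scores = {name: expected_improvement(mu, sigma, best, xi)
              for name, (mu, sigma) in candidates.items()}
    # print the slack value, the preferred candidate, and both EI scores
    print(xi, max(scores, key=scores.get), scores)
```

With these made-up numbers, the confident candidate has the higher EI at `xi = 0`, while at `xi = 0.1` the uncertain candidate is preferred: a larger slack pushes the search toward exploration.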