# Bayesian Optimization Example: Boston Housing Dataset

## Useful Resources
 - [Scikit-Learn](http://scikit-learn.org/)
 - [Scikit-Optimize](https://github.com/scikit-optimize/scikit-optimize) 
 - [GPyOpt](https://gpyopt.readthedocs.io/en/latest/)
 - [GPyOpt GitHub](https://github.com/SheffieldML/GPyOpt)
 - [fmfn/BayesianOptimization](https://github.com/fmfn/BayesianOptimization)
 - [Taking the Human Out of the Loop: A Review of Bayesian Optimization](https://ieeexplore.ieee.org/document/7352306/)
 - [Practical Bayesian Optimization of Machine Learning Algorithms](https://arxiv.org/abs/1206.2944)
 - [Evaluating Hyperparameter Optimization Strategies](https://blog.sigopt.com/posts/evaluating-hyperparameter-optimization-strategies)
 - [A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning](https://arxiv.org/abs/1012.2599)
 
## Introduction
Bayesian optimization is a strategy for the global optimization of black-box functions: the goal is to find a minimum or maximum of a function f(x) over a bounded domain X. It builds a probabilistic model of f(x) from the evaluations made so far and uses that model to decide where in X to evaluate the function next.

## General Theory
#### Objective:
Find the global maximizer (or minimizer) of a function $f(\textbf{x})$:
$$\textbf{x}^{*} = \text{arg} \max_{\textbf{x} \in \chi } f(\textbf{x})$$
$\chi$ is the search space of interest; its dimensions can be categorical, conditional, or both.

#### Strategy 
- The objective function is unknown
- Treat it as a random function
- Place a prior over it
- The prior captures beliefs about how the function behaves
- Gather observations and update the prior to obtain the posterior
- Choose the next query point based on the posterior

![](https://ieeexplore.ieee.org/mediastore_new/IEEE/content/media/5/7360840/7352306/shahr1-2494218-large.gif)[Taking the Human Out of the Loop: A Review of Bayesian Optimization](https://ieeexplore.ieee.org/document/7352306/)
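
The strategy above can be sketched in a few lines. This is a minimal illustration, not the procedure any particular library uses internally: it assumes a Gaussian process as the prior, expected improvement as the rule for picking the next point, and a made-up one-dimensional `toy_objective`. It minimizes rather than maximizes, to match the `gp_minimize` convention used later in this notebook.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def toy_objective(x):
    """Hypothetical 1-D function standing in for an expensive f(x)."""
    return np.sin(3 * x) + 0.5 * x ** 2


def expected_improvement(candidates, gp, best_y, xi=0.01):
    """Expected improvement (minimization) of each candidate under the GP posterior."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero at observed points
    improvement = best_y - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt_1d(f, bounds, n_init=3, n_iter=12, seed=0):
    rng = np.random.default_rng(seed)
    low, high = bounds
    X = rng.uniform(low, high, size=(n_init, 1))  # initial random queries
    y = np.array([f(x[0]) for x in X])
    candidates = np.linspace(low, high, 500).reshape(-1, 1)

    for _ in range(n_iter):
        # Fitting conditions the GP prior on all observations (the posterior).
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                      normalize_y=True, random_state=seed)
        gp.fit(X, y)
        # The next query point is the candidate with the highest expected improvement.
        ei = expected_improvement(candidates, gp, y.min())
        x_next = candidates[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))

    best = np.argmin(y)
    return X[best, 0], y[best]


x_best, y_best = bayes_opt_1d(toy_objective, bounds=(-2.0, 2.0))
print(f"best x = {x_best:.3f}, f(x) = {y_best:.3f}")
```

Each pass through the loop spends extra computation on refitting the model, which is the trade-off the rest of this notebook discusses.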


## Summary
- Finds the min/max with relatively few evaluations
- Costs more computation per iteration to determine the next point to try
- Well suited to expensive functions such as training ML models

## What can $f$ be?
Bayesian optimization is best suited to objective functions that are costly to evaluate. Choosing each new query point requires updating the model with all previous evaluations, so every iteration costs more computation than in simpler search methods. In exchange, it often needs fewer iterations to find the maximum or minimum, which reduces how many times an expensive step, such as training a neural net, has to run and can lower the overall computational cost.

### Random Search
![Random Search](https://daks2k3a4ib2z.cloudfront.net/59235ff882b78a59a72fa9bd/593477f37fa7db0d44d42510_tumblr_inline_o7181jRDUR1toi3ym_540.gif)

### Grid Search
![Grid Search](https://daks2k3a4ib2z.cloudfront.net/59235ff882b78a59a72fa9bd/593477f0c5b12e2f0b26ec3a_tumblr_inline_o7181iRIMT1toi3ym_540.gif)

### Bayesian Optimization
![Bayesian Optimization](https://daks2k3a4ib2z.cloudfront.net/59235ff882b78a59a72fa9bd/593477fa4beb0a0d64a26806_tumblr_inline_o7181mi1eT1toi3ym_540.gif)
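
To make the contrast concrete, here is a small sketch of how grid search and random search pick their points. The two-parameter `toy_loss` is made up for illustration. Neither method looks at earlier results when choosing the next point, which is the step Bayesian optimization adds.

```python
import numpy as np


def toy_loss(a, b):
    """Hypothetical loss surface with its minimum at (0.3, 0.7)."""
    return (a - 0.3) ** 2 + (b - 0.7) ** 2


# Grid search: evaluate every combination on a fixed 5 x 5 lattice (25 evaluations).
grid = np.linspace(0.0, 1.0, 5)
grid_points = [(a, b) for a in grid for b in grid]
grid_best = min(grid_points, key=lambda p: toy_loss(*p))

# Random search: the same budget of 25 evaluations, drawn uniformly at random.
rng = np.random.default_rng(0)
random_points = rng.uniform(0.0, 1.0, size=(25, 2))
random_best = min(random_points, key=lambda p: toy_loss(*p))

print("grid best:  ", grid_best, toy_loss(*grid_best))
print("random best:", tuple(random_best), toy_loss(*random_best))
```

The grid can only land on lattice values, so its best point here is (0.25, 0.75) rather than the true minimum.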


In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from math import sqrt

## Load dataset

In [None]:
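# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2.
from sklearn.datasets import load_boston
boston = load_boston()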

## Import scikit models

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import BayesianRidge
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    boston.data, boston.target, train_size=0.9, test_size=0.1)

## Model to Optimize

In [None]:
regr = RandomForestRegressor(n_jobs=-1)

## Parameters to Optimize

In [None]:
from skopt.space import Integer
from skopt.space import Categorical

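# Search space: one dimension per hyperparameter, named after the estimator argument.
# 'auto' (all features, for regressors) is only accepted by older scikit-learn;
# it was deprecated in 1.1 and removed in 1.3.
space = [Integer(1, 200, name='n_estimators'),
         Categorical(('auto', 'sqrt', 'log2'), name='max_features'),
         Integer(2, 100, name='min_samples_split'),
         Integer(1, 100, name='min_samples_leaf')]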

## Objective

In [None]:
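from sklearn.model_selection import cross_val_score
from skopt.utils import use_named_args


@use_named_args(space)
def objective(**params):
    # Score each candidate by cross-validated MAE on the training data only,
    # so the test set stays untouched for the final evaluation.
    regr.set_params(**params)
    scores = cross_val_score(regr, X_train, y_train, cv=5,
                             scoring='neg_mean_absolute_error')

    return -np.mean(scores)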

## Optimization

In [None]:
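from skopt import gp_minimize

# Run 20 evaluations of the objective, guided by a Gaussian-process surrogate.
res_gp = gp_minimize(objective, space, n_calls=20, random_state=0)

# Best (lowest) objective value found
res_gp.fun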

In [None]:
res_gp.x

In [None]:
n_estimators = res_gp.x[0]
max_features = res_gp.x[1]
min_samples_split = res_gp.x[2]
min_samples_leaf = res_gp.x[3]

regr = RandomForestRegressor(n_jobs=-1, n_estimators=n_estimators, max_features=max_features,
                             min_samples_split=min_samples_split, min_samples_leaf=min_samples_leaf)
regr.fit(X_train, y_train)

predicted = regr.predict(X_test)

mae = mean_absolute_error(y_test, predicted)
mse = mean_squared_error(y_test, predicted)
rmse = sqrt(mse)
print('MAE:', mae, '\tMSE:', mse, '\tRMSE:', rmse)
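
For reference, the three metrics printed above can be computed by hand. The arrays below are made-up values chosen so the arithmetic is easy to follow:

```python
import numpy as np

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred                 # [ 1.,  0., -2., -1.]
mae_manual = np.mean(np.abs(errors))     # (1 + 0 + 2 + 1) / 4 = 1.0
mse_manual = np.mean(errors ** 2)        # (1 + 0 + 4 + 1) / 4 = 1.5
rmse_manual = np.sqrt(mse_manual)        # sqrt(1.5), about 1.225

print('MAE:', mae_manual, '\tMSE:', mse_manual, '\tRMSE:', rmse_manual)
```

MSE and RMSE penalize large errors more heavily than MAE, so the two can rank models differently.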

## Plot Results

In [None]:
training_size = ('%0.1f' % (100 - (len(y_test)/len(boston.target) * 100)))
class_type = str(regr).split('(')[0]
label1 = ('MAE   = {}'.format('%8.3f' % mae))
label2 = (class_type + '\nTraining size = ' + training_size + '%')

plt.figure(dpi=250)
plt.plot([min(boston.target), max(boston.target)], [
         min(boston.target), max(boston.target)], ls="--", c="g")
plt.plot(y_test, predicted, 'o', markersize=1.5)
plt.xlabel("Actual Price")
plt.ylabel("Predicted Price")
legend1 = plt.legend([label1], loc='lower right',
                     markerscale=0, fontsize=6, handlelength=0)
plt.legend([label2], loc='upper left',
           markerscale=0, fontsize=6, handlelength=0)
plt.gca().add_artist(legend1)
plt.show()

## What to do now? 
- Try a different classifier/regressor
- Try with a different dataset provided by scikit-learn
- Try out a different library
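
As a starting point for the second suggestion, here is a sketch that swaps in the diabetes dataset bundled with scikit-learn and fits an untuned baseline. The choice of dataset and the fixed `n_estimators=50` are assumptions for illustration. The same search space and objective pattern from this notebook could then be applied to it.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Bundled regression dataset, so no download is needed.
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, train_size=0.9, test_size=0.1, random_state=0)

# Untuned baseline to compare against after optimizing the hyperparameters.
baseline = RandomForestRegressor(n_estimators=50, random_state=0)
baseline.fit(X_train, y_train)

baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
print('Baseline MAE:', baseline_mae)
```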