# 02 - Bayesian Optimization

In this notebook, we'll look at how to apply Bayesian optimization to `fastai2` tabular problems.

## What is Bayesian Optimization?

* Form of hyper-parameter tuning
* Repository for today: [BayesianOptimization](https://github.com/fmfn/BayesianOptimization)

Bayesian optimization works by constructing a posterior distribution over functions (a Gaussian process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not, as seen in the picture below.

![](https://camo.githubusercontent.com/2f66986b9b375058dcaede2e7c3dd2b8db4abc9d/68747470733a2f2f6769746875622e636f6d2f666d666e2f426179657369616e4f7074696d697a6174696f6e2f7261772f6d61737465722f6578616d706c65732f626f5f6578616d706c652e706e67)
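To make the idea concrete, here is a minimal pure-Python sketch of the loop described above: fit a Gaussian process to the points seen so far, then pick the next point by maximizing an acquisition function. Everything in it is an assumption chosen for illustration (the RBF kernel, the upper-confidence-bound acquisition, the grid search, and the hypothetical `toy_objective`); the `bayes_opt` library has its own kernel and acquisition settings, which may differ.

```python
import math
import random

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at x_new given observations."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def toy_objective(x):
    """Hypothetical stand-in for an expensive training run; peak at x = 2."""
    return -(x - 2.0) ** 2

def bayes_opt_1d(f, low, high, n_init=3, n_iter=10, kappa=2.0, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]  # random warm-up points
    ys = [f(x) for x in xs]
    grid = [low + (high - low) * i / 100 for i in range(101)]
    for _ in range(n_iter):
        y_bar = sum(ys) / len(ys)
        centered = [y - y_bar for y in ys]  # center targets for the zero-mean GP

        def ucb(x):
            # Upper confidence bound: high mean (exploit) or high std (explore)
            mean, std = gp_posterior(xs, centered, x)
            return mean + kappa * std

        x_next = max(grid, key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

best_x, best_y = bayes_opt_1d(toy_objective, -5.0, 5.0)
print(best_x, best_y)
```

Each iteration spends one evaluation of the objective, which is the point of the method: when a single evaluation is a full training run, choosing the next candidate carefully is cheaper than trying many candidates blindly.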

In [0]:
!pip install fastai2

First let's get the items we need:

In [0]:
from fastai2.tabular.all import *

In [3]:
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

And now let's install the `bayesian-optimization` library:

In [0]:
!pip install bayesian-optimization -q

In [0]:
from bayes_opt import BayesianOptimization

## Bayesian Optimization

When working with `BayesianOptimization`, the training logic needs to live in a single function (here `fit_with`) that accepts the hyper-parameters being tuned and returns the value we want to maximize:

In [0]:
def fit_with(lr:float, wd:float, dp:float):
  # Create a Learner (`dls` is the DataLoaders object built further down)
  learn = tabular_learner(dls, layers=[200,100], metrics=accuracy, embed_p=dp, wd=wd)

  # Train for 3 epochs with the progress bar hidden
  with learn.no_bar():
    learn.fit_one_cycle(3, lr)

  # Validate and return the accuracy; validate() returns [loss, accuracy]
  acc = float(learn.validate()[1])

  return acc

Let's extend this to tune the learning rate, weight decay, embedding dropout, number of layers, and layer sizes:

In [0]:
def fit_with(lr:float, wd:float, dp:float, n_layers:float, layer_1:float, layer_2:float, layer_3:float):

  print(lr, wd, dp)
  # The optimizer only proposes floats, so truncate to ints where needed
  if int(n_layers) == 2:
    layers = [int(layer_1), int(layer_2)]
  elif int(n_layers) == 3:
    layers = [int(layer_1), int(layer_2), int(layer_3)]
  else:
    layers = [int(layer_1)]

  learn = tabular_learner(dls, layers=layers, metrics=accuracy, embed_p=float(dp), wd=float(wd))

  # Hide both the progress bar and the per-epoch log while training
  with learn.no_bar(), learn.no_logging():
    learn.fit(5, lr=float(lr))

  # validate() returns [loss, accuracy]
  acc = float(learn.validate()[1])

  return acc

Let's try it out. First we build the `DataLoaders`:

In [0]:
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
y_names = 'salary'
y_block = CategoryBlock()
splits = RandomSplitter()(range_of(df))

In [0]:
to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,
                   y_names=y_names, y_block=y_block, splits=splits)

In [0]:
dls = to.dataloaders(bs=512)

We'll declare our hyper-parameters:

In [0]:
hps = {'lr': (1e-05, 1e-01),
       'wd': (4e-4, 0.4),
       'dp': (0.01, 0.5),
       'n_layers': (1, 3),
       'layer_1': (50, 200),
       'layer_2': (100, 1000),
       'layer_3': (200, 2000)}
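One consequence of these bounds is worth checking. Because `fit_with` truncates `n_layers` with `int`, a value drawn from `(1, 3)` only maps to 3 layers when it is exactly `3.0`. The sketch below simulates uniform draws to show this; `layer_count` and `count_choices` are hypothetical helpers, and the widened upper bound of `3.999` is only an illustrative alternative, not what this notebook ran.

```python
import random

def layer_count(n_layers: float) -> int:
    """Truncate the proposed float the same way fit_with does."""
    return int(n_layers)

def count_choices(low: float, high: float, n_samples: int = 10_000, seed: int = 0) -> dict:
    """Count how often uniform draws from [low, high] map to 1, 2 or 3 layers."""
    rng = random.Random(seed)
    counts = {1: 0, 2: 0, 3: 0}
    for _ in range(n_samples):
        counts[layer_count(rng.uniform(low, high))] += 1
    return counts

original = count_choices(1, 3)      # bounds used in this notebook
widened = count_choices(1, 3.999)   # hypothetical alternative, not what was run
print(original)
print(widened)
```

With the original bounds, three-layer networks are only tried if the optimizer proposes the boundary value itself, so `layer_3` has almost no influence on the search.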

And now we build the optimizer:

In [0]:
optim = BayesianOptimization(
    f = fit_with, # our fit function
    pbounds = hps, # our hyper parameters to tune
    verbose = 2, # 2 prints every step, 1 only when a new maximum is observed, 0 is silent
    random_state=1
)

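And now we can search! Note that `maximize` first evaluates `init_points` random points (5 by default) before running the `n_iter` guided steps, so the log has 15 rows: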

In [65]:
%time optim.maximize(n_iter=10)

|   iter    |  target   |    dp     |  layer_1  |  layer_2  |  layer_3  |    lr     | n_layers  |    wd     |
-------------------------------------------------------------------------------------------------------------
0.014684121522803134 0.07482958046651729 0.21434078230426126
|  1        |  0.8398   |  0.2143   |  158.0    |  100.1    |  744.2    |  0.01468  |  1.185    |  0.07483  |
0.06852509784467198 0.3512957275818218 0.1793247562510934
|  2        |  0.8383   |  0.1793   |  109.5    |  584.9    |  954.6    |  0.06853  |  1.409    |  0.3513   |
0.014047289990137426 0.32037752964274446 0.02341992066698382
|  3        |  0.8371   |  0.02342  |  150.6    |  475.6    |  1.206e+0 |  0.01405  |  1.396    |  0.3204   |
0.0894617202837497 0.016006291379859792 0.4844481721025048
|  4        |  0.8279   |  0.4844   |  97.01    |  723.1    |  1.778e+0 |  0.08946  |  1.17     |  0.01601  |
0.0957893741197487 0.27687409473460917 0.09321690558663875
|  5        |  0.8193   |  0.09322  |  181.7    |  188.5    |  958.0    |  0.09579  |  2.066    |  0.2769   |
0.010278165724320144 0.09525641811550664 0.039180893394315415
|  6        |  0.8395   |  0.03918  |  169.8    |  995.0    |  206.4    |  0.01028  |  1.351    |  0.09526  |
0.0721697277771627 0.035039066808375346 0.1710613610736308
|  7        |  0.8404   |  0.1711   |  57.63    |  100.2    |  1.93e+03 |  0.07217  |  2.868    |  0.03504  |
0.01869849193249724 0.3818035749775107 0.38248534044728827
|  8        |  0.8322   |  0.3825   |  69.48    |  104.2    |  1.936e+0 |  0.0187   |  2.247    |  0.3818   |
0.01926217459495932 0.09147910397966807 0.12524115091042773
|  9        |  0.8366   |  0.1252   |  57.32    |  188.9    |  206.0    |  0.01926  |  2.35     |  0.09148  |
0.06351398588490466 0.04037129713639955 0.24616817310336075
|  10       |  0.8325   |  0.2462   |  52.86    |  992.5    |  1.125e+0 |  0.06351  |  1.886    |  0.04037  |
0.041377735497824156 0.06620056537013629 0.32676614140212673
|  11       |  0.8352   |  0.3268   |  54.51    |  114.5    |  1.363e+0 |  0.04138  |  1.901    |  0.0662   |
0.0758754095037137 0.26350352166402036 0.10013698396219092
|  12       |  0.8342   |  0.1001   |  195.5    |  166.4    |  236.0    |  0.07588  |  1.276    |  0.2635   |
0.09389439425982331 0.19684544787061198 0.15110795801308013
|  13       |  0.8331   |  0.1511   |  52.28    |  999.6    |  1.971e+0 |  0.09389  |  1.358    |  0.1968   |
0.05628262681087503 0.1085257299138718 0.26772478114338355
|  14       |  0.8348   |  0.2677   |  152.2    |  105.5    |  755.5    |  0.05628  |  2.52     |  0.1085   |
0.053434289309203895 0.16031684438917504 0.38333589370592197
|  15       |  0.8315   |  0.3833   |  191.2    |  972.2    |  674.7    |  0.05343  |  2.787    |  0.1603   |
CPU times: user 1min 11s, sys: 2.01 s, total: 1min 13s
Wall time: 1min 12s


We can grab the best results:

In [66]:
print(optim.max)

{'target': 0.8404483795166016, 'params': {'dp': 0.1710613610736308, 'layer_1': 57.63154958927875, 'layer_2': 100.1567384765859, 'layer_3': 1930.4092799350558, 'lr': 0.0721697277771627, 'n_layers': 2.868052690189961, 'wd': 0.035039066808375346}}


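And with a few conversions (truncating the floats with `int`, as `fit_with` does) we see:

* The best number of layers was 2
* The first layer had a size of 57
* The second layer had a size of 100

The remaining hyper-parameters (`lr`, `wd`, and `dp`) can be read directly from the result.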
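The conversions can be sketched as a small helper that mirrors the branching in `fit_with`. The name `decode_params` is hypothetical, and the dictionary is copied from the printed `optim.max` so the snippet runs on its own.

```python
# Copied from the printed `optim.max` so the example is self-contained
best = {
    'target': 0.8404483795166016,
    'params': {
        'dp': 0.1710613610736308,
        'layer_1': 57.63154958927875,
        'layer_2': 100.1567384765859,
        'layer_3': 1930.4092799350558,
        'lr': 0.0721697277771627,
        'n_layers': 2.868052690189961,
        'wd': 0.035039066808375346,
    },
}

def decode_params(params: dict) -> dict:
    """Hypothetical helper: turn the optimizer's floats into usable settings."""
    n_layers = int(params['n_layers'])
    sizes = [int(params['layer_1']), int(params['layer_2']), int(params['layer_3'])]
    # Same rule as fit_with: 2 or 3 layers if requested, otherwise a single layer
    layers = sizes[:n_layers] if n_layers in (2, 3) else sizes[:1]
    return {'layers': layers, 'lr': params['lr'], 'wd': params['wd'], 'dp': params['dp']}

config = decode_params(best['params'])
print(config['layers'])  # [57, 100]
```

The resulting `config` holds the values one would pass to `tabular_learner` and `fit` to retrain with the best settings found.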