This notebook is going to be an adaptation of Zach Mueller's a walk with fastai. Where he creates a Bayesian optimisation on this dataset.

https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Tabular%20Notebooks/02_Bayesian_Optimization.ipynb

installing/updating fastai

In [2]:
!pip install -Uqq fastai

lets import all the tabular functions,given by fastai

In [3]:
from fastai.tabular.all import *

Loading data into a dataframe from path. Path is a urllib object which has a lot of added functionalities than just having a text path

In [5]:
path = Path('../input/adult-census-income')

In [6]:
df = pd.read_csv(path/'adult.csv')

Installing Bayesian optimisation package

In [8]:
!pip install bayesian-optimization -q

You should consider upgrading via the '/opt/conda/bin/python3.7 -m pip install --upgrade pip' command.[0m


In [9]:
from bayes_opt import BayesianOptimization

Now we create a fit_with function for training on hyperparameters and giving us the accuracy back.

In [12]:
def fit_with(lr:float, wd:float, dp:float, n_layers:float, layer_1:float, layer_2:float, layer_3:float):
    print(lr, wd, dp)
    if int(n_layers) == 2:
        layers = [int(layer_1), int(layer_2)]
    elif int(n_layers) == 3:
        layers = [int(layer_1), int(layer_2), int(layer_3)]
    else:
        layers = [int(layer_1)]
    config = tabular_config(embed_p = float(dp), ps = float(dp))
    learn = tabular_learner(dls, layers=layers, metrics=accuracy, config=config)
    
    with learn.no_bar() and learn.no_logging():
        learn.fit(5, lr=float(lr))
    
    acc = float(learn.validate()[1])
    
    return acc

In [15]:
df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education.num',
       'marital.status', 'occupation', 'relationship', 'race', 'sex',
       'capital.gain', 'capital.loss', 'hours.per.week', 'native.country',
       'income'],
      dtype='object')

creating a TabularPandas

In [22]:

cat_names = ['workclass', 'education', 'marital.status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education.num']
procs = [Categorify, FillMissing, Normalize]
y_names = 'income'
y_block = CategoryBlock()
splits = RandomSplitter()(range_of(df))

In [23]:
to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,
                   y_names=y_names, y_block=y_block, splits=splits)

what dls does is that a dataloader simply gives us data in batches.

In [24]:
dls = to.dataloaders(bs=512)

declaring our hyperparameters

In [25]:
hps = {
    'lr' : (1e-05, 1e-01),
    'wd' : (4e-4,0.4),
    'dp' : (0.01, 0.5),
    'n_layers' : (1,3),
    'layer_1' : (50, 200),
    'layer_2' : (100,1000),
    'layer_3' : (200,2000)
}

we declare our optimisers

In [27]:
optim = BayesianOptimization(
    f = fit_with,
    pbounds = hps,
    verbose = 2,
    random_state=1
)

In [28]:
%time optim.maximize(n_iter=10)

|   iter    |  target   |    dp     |  layer_1  |  layer_2  |  layer_3  |    lr     | n_layers  |    wd     |
-------------------------------------------------------------------------------------------------------------
0.014684121522803134 0.07482958046651729 0.21434078230426126


| [0m 1       [0m | [0m 0.8374  [0m | [0m 0.2143  [0m | [0m 158.0   [0m | [0m 100.1   [0m | [0m 744.2   [0m | [0m 0.01468 [0m | [0m 1.185   [0m | [0m 0.07483 [0m |
0.06852509784467198 0.3512957275818218 0.1793247562510934


| [95m 2       [0m | [95m 0.8398  [0m | [95m 0.1793  [0m | [95m 109.5   [0m | [95m 584.9   [0m | [95m 954.6   [0m | [95m 0.06853 [0m | [95m 1.409   [0m | [95m 0.3513  [0m |
0.014047289990137426 0.32037752964274446 0.02341992066698382


| [95m 3       [0m | [95m 0.8418  [0m | [95m 0.02342 [0m | [95m 150.6   [0m | [95m 475.6   [0m | [95m 1.206e+0[0m | [95m 0.01405 [0m | [95m 1.396   [0m | [95m 0.3204  [0m |
0.0894617202837497 0.016006291379859792 0.4844481721025048


| [0m 4       [0m | [0m 0.8318  [0m | [0m 0.4844  [0m | [0m 97.01   [0m | [0m 723.1   [0m | [0m 1.778e+0[0m | [0m 0.08946 [0m | [0m 1.17    [0m | [0m 0.01601 [0m |
0.0957893741197487 0.27687409473460917 0.09321690558663875


| [0m 5       [0m | [0m 0.8334  [0m | [0m 0.09322 [0m | [0m 181.7   [0m | [0m 188.5   [0m | [0m 958.0   [0m | [0m 0.09579 [0m | [0m 2.066   [0m | [0m 0.2769  [0m |
0.06580610506722293 0.27427945458024644 0.23907480958937974


| [0m 6       [0m | [0m 0.8352  [0m | [0m 0.2391  [0m | [0m 110.1   [0m | [0m 602.4   [0m | [0m 1.153e+0[0m | [0m 0.06581 [0m | [0m 1.993   [0m | [0m 0.2743  [0m |
0.09321474912357043 0.29488785857225375 0.4336194216942654


| [0m 7       [0m | [0m 0.8007  [0m | [0m 0.4336  [0m | [0m 162.8   [0m | [0m 109.5   [0m | [0m 736.4   [0m | [0m 0.09321 [0m | [0m 2.257   [0m | [0m 0.2949  [0m |
0.057910440131609765 0.16129406478624328 0.4828208474809104


| [0m 8       [0m | [0m 0.8349  [0m | [0m 0.4828  [0m | [0m 59.94   [0m | [0m 147.0   [0m | [0m 1.301e+0[0m | [0m 0.05791 [0m | [0m 2.722   [0m | [0m 0.1613  [0m |
0.06825546229572553 0.38961448701474927 0.16830582168120725


| [0m 9       [0m | [0m 0.8388  [0m | [0m 0.1683  [0m | [0m 156.9   [0m | [0m 531.9   [0m | [0m 248.2   [0m | [0m 0.06826 [0m | [0m 2.559   [0m | [0m 0.3896  [0m |
0.0002796637130848967 0.0448370970701597 0.22630281827604312


| [0m 10      [0m | [0m 0.8374  [0m | [0m 0.2263  [0m | [0m 126.3   [0m | [0m 625.4   [0m | [0m 1.711e+0[0m | [0m 0.000279[0m | [0m 2.636   [0m | [0m 0.04484 [0m |
0.06552505896942336 0.08103743930918762 0.36112697236765107


| [0m 11      [0m | [0m 0.8378  [0m | [0m 0.3611  [0m | [0m 147.7   [0m | [0m 194.1   [0m | [0m 356.8   [0m | [0m 0.06553 [0m | [0m 2.663   [0m | [0m 0.08104 [0m |
0.0005465734116890944 0.17802142541867974 0.46329843491833883


| [0m 12      [0m | [0m 0.8216  [0m | [0m 0.4633  [0m | [0m 175.7   [0m | [0m 106.8   [0m | [0m 1.81e+03[0m | [0m 0.000546[0m | [0m 2.017   [0m | [0m 0.178   [0m |
0.0404514572663943 0.31234441498440596 0.4911455437346324


| [0m 13      [0m | [0m 0.8326  [0m | [0m 0.4911  [0m | [0m 79.71   [0m | [0m 936.3   [0m | [0m 947.6   [0m | [0m 0.04045 [0m | [0m 1.032   [0m | [0m 0.3123  [0m |
0.029415674202259722 0.005868544291961274 0.31814795484259445


| [0m 14      [0m | [0m 0.8398  [0m | [0m 0.3181  [0m | [0m 174.3   [0m | [0m 660.7   [0m | [0m 794.0   [0m | [0m 0.02942 [0m | [0m 1.182   [0m | [0m 0.005869[0m |
0.09277135723767846 0.23871276504022906 0.23800477408617662


| [0m 15      [0m | [0m 0.8314  [0m | [0m 0.238   [0m | [0m 85.34   [0m | [0m 272.8   [0m | [0m 374.2   [0m | [0m 0.09277 [0m | [0m 1.858   [0m | [0m 0.2387  [0m |
CPU times: user 3min 4s, sys: 5.76 s, total: 3min 10s
Wall time: 1min 35s


In [29]:
print(optim.max)

{'target': 0.8418304920196533, 'params': {'dp': 0.02341992066698382, 'layer_1': 150.57012652676033, 'layer_2': 475.57432213041426, 'layer_3': 1205.6416912023528, 'lr': 0.014047289990137426, 'n_layers': 1.3962029781697576, 'wd': 0.32037752964274446}}
