# How to provide initial points to the Optimizer and BayesSearchCV classes

Iaroslav Shcherbatyi, June 2018.

In [1]:
import numpy as np
np.random.seed(123)

%matplotlib inline
import matplotlib.pyplot as plt

## Problem statement

Sometimes, for a black-box optimization problem $$x^* = \arg \min_{x \in X} f(x)$$ you are in one of two situations. Either you have points $x_i \in X, i = 1, \dots, k$ that you want to evaluate first because you have good reason to expect good objective values from them, or you already have points $x_j \in X, j = 1, \dots, m$ whose objective values $f(x_j) \in \mathbb{R}$ are known and you would like to use them to initialize your optimizer.


## Providing initial points for the `Optimizer` to be evaluated

To do this, pass the points via the `x0` argument of the `Optimizer` constructor. An example is given below.

In [16]:
from skopt import Optimizer
opt = Optimizer(dimensions=[(0.0, 1.0)], x0=[[0.0], [0.5], [1.0]])

print("Parallel mode:")

# you can use the parallel interface
# notice that the 3 initial points are suggested first
print("Suggested points:", opt.ask(4))

opt = Optimizer(dimensions=[(0.0, 1.0)], x0=[[0.0], [0.5], [1.0]])

print("Sequential mode:")

# or sequential - it all works!
for i in range(5):
    r = opt.ask()
    opt.tell(r, i/2.0)
    print('Suggested point', r)

Parallel mode:
Suggested points: [[0.0], [0.5], [1.0], [0.043422101727152]]
Sequential mode:
Suggested point [0.0]
Suggested point [0.5]
Suggested point [1.0]
Suggested point [0.5652137360964778]
Suggested point [0.05901570160283677]


**Important**: specifying `x0` does not change the number of points that the algorithm tries at random. To adjust this number, set `n_initial_points` accordingly. That is, the total number of initialization iterations, including `x0`, is `n_initial_points + len(x0)`.
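The accounting above can be illustrated with a minimal sketch in plain Python. `ToyOptimizer` is a hypothetical stand-in written for this illustration, not skopt code: it only mimics the order in which points are handed out, and where a real optimizer would consult its surrogate model it simply keeps sampling at random.

```python
import random


class ToyOptimizer:
    """Hypothetical ask-only loop that serves user-supplied points first."""

    def __init__(self, bounds, x0=(), n_initial_points=10, seed=0):
        self.low, self.high = bounds
        self.pending = [list(p) for p in x0]  # x0 points not yet suggested
        self.n_initial_points = n_initial_points
        self.n_random = 0                     # random points suggested so far
        self.rng = random.Random(seed)

    def ask(self):
        """Return a (phase, point) pair."""
        if self.pending:
            # user-supplied points come first, in the order given
            return "x0", self.pending.pop(0)
        point = [self.rng.uniform(self.low, self.high)]
        if self.n_random < self.n_initial_points:
            self.n_random += 1
            return "random", point
        # a real optimizer would query its model here; the toy does not
        return "model", point


x0 = [[0.0], [0.5], [1.0]]
n_initial_points = 2
opt = ToyOptimizer((0.0, 1.0), x0=x0, n_initial_points=n_initial_points)

results = [opt.ask() for _ in range(7)]
phases = [phase for phase, _ in results]
points = [point for _, point in results]

print(phases)
# ['x0', 'x0', 'x0', 'random', 'random', 'model', 'model']
```

With three `x0` points and `n_initial_points = 2`, the first five suggestions belong to the initialization phase, which matches `n_initial_points + len(x0)`.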

## Two ways to provide already evaluated points to `Optimizer`

In some situations you have already evaluated the objective at some points and would like to use those values to "bootstrap" the algorithm. One way is to simply `tell` these points to the `Optimizer` instance right after it is instantiated, as in the cell below:

In [23]:
from skopt import Optimizer

opt = Optimizer(dimensions=[(0.0, 1.0)])

opt.tell(x=[[0.0], [0.5]], y=[0.0, 0.1])

          fun: 0.0
    func_vals: array([0. , 0.1])
       models: []
 random_state: <mtrand.RandomState object at 0x7f85980f2bd0>
        space: Space([Real(low=0.0, high=1.0, prior='uniform', transform='normalize')])
        specs: None
            x: [0.0]
      x_iters: [[0.0], [0.5]]

Another way is to provide the evaluated points via the `xy0` argument of the `Optimizer` constructor. This way you do not need an extra call to the `tell` method.

In [16]:
from skopt import Optimizer

Optimizer(dimensions=[(0.0, 1.0)], xy0=[
    [[0.0], 0.0],
    [[1.0], 0.1],
])

<skopt.optimizer.optimizer.Optimizer at 0x7f6865cec9b0>
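The two approaches above take the same data in different shapes: `tell` receives the points and the objective values as two separate lists, while `xy0` receives a list of `[point, value]` pairs. The following is a small sketch in plain Python (no skopt calls; the names `xs`, `ys` and `pairs` are illustrative) of converting between the two layouts.

```python
# evaluated points stored as [point, objective value] pairs
xy0 = [
    [[0.0], 0.0],
    [[1.0], 0.1],
]

# split the pairs into separate lists of points and objective values
xs = [point for point, _ in xy0]
ys = [value for _, value in xy0]
print(xs)  # [[0.0], [1.0]]
print(ys)  # [0.0, 0.1]

# and back: zip the separate lists into [point, value] pairs
pairs = [[point, value] for point, value in zip(xs, ys)]
print(pairs == xy0)  # True
```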

You can also combine both, providing some points that are already evaluated together with points that are still to be evaluated:

In [2]:
from skopt import Optimizer

opt = Optimizer(
    dimensions=[(0.0, 1.0)],
    xy0=[
        [[0.0], 0.0],
        [[1.0], 0.1],
    ],
    x0=[
        [0.5],
        [0.7],
    ],
)

print(opt.ask(5))

[[0.5], [0.7], [0.038396889140708386], [0.7224627817380277], [0.5105514975437424]]


## Specifying initial points for `BayesSearchCV`

You can use `optimizer_kwargs` to specify which parameter values of your pipeline the optimizer should try first. To do this, convert each dictionary of parameter values to a list with the `dimensions_aslist` helper from `skopt.utils`.

In [3]:
from skopt import BayesSearchCV
from skopt.utils import dimensions_aslist

from sklearn.svm import SVC
from sklearn.datasets import load_iris

x0 = [
    {'C': 0.2, 'gamma': 0.1},
    {'C': 0.4, 'gamma': 0.2}
]

x0 = [dimensions_aslist(v) for v in x0]

# this class is only used to print the parameter values passed to the estimator
class MyModel(SVC):
    def fit(self, *args, **kwargs):
        print('C=', self.C, 'gamma=', self.gamma)
        # return the fitted estimator, as scikit-learn expects from `fit`
        return super().fit(*args, **kwargs)

X, y = load_iris(return_X_y=True)

bcv = BayesSearchCV(
    estimator=MyModel(),
    search_spaces={
        'C': (0.1, 10.0),
        'gamma': (0.1, 10.0),
    },
    n_iter=6,
    optimizer_kwargs={
        'x0': x0,  # initialization done here!
    },
    cv=3  # 3 folds are used for cross-validation, hence each output line is repeated 3 times
)

bcv.fit(X, y)

C= 0.2 gamma= 0.1
C= 0.2 gamma= 0.1
C= 0.2 gamma= 0.1
C= 0.4 gamma= 0.2
C= 0.4 gamma= 0.2
C= 0.4 gamma= 0.2
C= 3.857733300809835 gamma= 0.27385219988077025
C= 3.857733300809835 gamma= 0.27385219988077025
C= 3.857733300809835 gamma= 0.27385219988077025
C= 7.870432808673247 gamma= 0.18827444863361564
C= 7.870432808673247 gamma= 0.18827444863361564
C= 7.870432808673247 gamma= 0.18827444863361564
C= 1.1523854167021892 gamma= 1.4253617857743606
C= 1.1523854167021892 gamma= 1.4253617857743606
C= 1.1523854167021892 gamma= 1.4253617857743606
C= 6.50979320286485 gamma= 4.140545264744949
C= 6.50979320286485 gamma= 4.140545264744949
C= 6.50979320286485 gamma= 4.140545264744949
C= 7.870432808673247 gamma= 0.18827444863361564


BayesSearchCV(cv=3, error_score='raise',
       estimator=MyModel(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False),
       fit_params=None, iid=True, n_iter=6, n_jobs=1, n_points=1,
       optimizer_kwargs={'x0': [[0.2, 0.1], [0.4, 0.2]]},
       pre_dispatch='2*n_jobs', random_state=None, refit=True,
       return_train_score=False, scoring=None,
       search_spaces={'C': (0.1, 10.0), 'gamma': (0.1, 10.0)}, verbose=0)
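The `optimizer_kwargs` entry in the output above shows each dictionary turned into a plain list, `[0.2, 0.1]` and `[0.4, 0.2]`. The sketch below reproduces that result in plain Python under the assumption that values are listed in sorted order of the parameter names. `point_to_list` is a hypothetical stand-in written for this illustration, not the skopt helper, so check the skopt source for the exact ordering rule in your version.

```python
def point_to_list(point):
    """Hypothetical stand-in: list a point's values in sorted-key order."""
    return [point[name] for name in sorted(point)]


x0 = [
    {'gamma': 0.1, 'C': 0.2},  # key order inside the dict does not matter
    {'C': 0.4, 'gamma': 0.2},
]

x0_as_lists = [point_to_list(p) for p in x0]
print(x0_as_lists)  # [[0.2, 0.1], [0.4, 0.2]]
```

Whatever the rule is, the order of values in each list has to match the order in which the optimizer sees the dimensions of `search_spaces`, otherwise a value meant for `C` would be tried as `gamma`.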