# Random vs. Grid vs. Bayesian Optimization (Simulator)

Using the Pico W microcontroller hosting a web server.

## Setup

Imports and the main class.

### Imports

In [1]:
%load_ext autoreload
%autoreload 2 # just some IPython magic to recognize changes to installed packages
import pandas as pd
from self_driving_lab_demo import SelfDrivingLabDemoLight

### SelfDrivingLabDemo

We'll instantiate the class and verify some of the functionality described in the random
search tutorial ([`2.0-random-search.ipynb`](2.0-random-search.ipynb)).

#### Instantiation

Now, we instantiate the `SelfDrivingLabDemo` class with `autoload=True` so that records
a target to optimize against. This involves selecting a set of target measurements as the "true" input values (i.e. the input
brightness and RGB values that define the target spectrum) based on a random seed,
setting the LED to those values, and then recording the spectrum intensities.

> Note: Instantiating with autoload=True will light the LED.

In [2]:
sdl = SelfDrivingLabDemoLight(autoload=True, simulation=True)

#### Functionality

We can do similar things to what was done in `2.0-random-search.ipynb`. For example, getting
random inputs, observing the sensor data, and evaluating the objective function.

In [3]:
[sdl.get_random_inputs(), sdl.get_random_inputs()]

[(69, 39, 77), (62, 8, 87)]

In [4]:
sdl.observe_sensor_data(*sdl.get_random_inputs())

{'ch410': 0.02274872739601002,
 'ch440': 0.15840405464904034,
 'ch470': 0.6992673713051308,
 'ch510': 1.4646967639614292,
 'ch550': 0.4523257779036664,
 'ch583': 0.06687319505248851,
 'ch620': 1.2196073083010566,
 'ch670': 0.011441290712745047}

In [5]:
sdl.evaluate(*sdl.get_random_inputs())

{'ch410': 0.040401293996706895,
 'ch440': 1.1412718813578129,
 'ch470': 5.1223132396124,
 'ch510': 0.8321234426266353,
 'ch550': 0.22516060230242996,
 'ch583': 0.03844316636923058,
 'ch620': 0.7185291510796704,
 'ch670': 0.008465238424509755,
 'mae': 0.6131167372101403,
 'rmse': 1.1924949962431428,
 'frechet': 3.211667942966358}

We can also turn the LED off.

In [11]:
sdl.clear()

## Optimization

While there are great numerical tutorials comparing [grid search vs. random search vs.
Bayesian optimization](https://towardsdatascience.com/grid-search-vs-random-search-vs-bayesian-optimization-2e68f57c3c46), here, we'll compare these three search methods in a way that perhaps you've never seen before,
namely a self-driving laboratory demo!

### Setup

We define our optimization task parameters and take care of imports.

### Optimization Task Parameters

We'll use 125 iterations repeated 5 times. The use of 125 iterations instead of something
"cleaner" like 50 or 100 is due to constraints of doing uniform (full-factorial) grid
search. $n^d$ number of points are required for uniform grid search, where $n$ and $d$
represent number of points per dimension (`n_pts_per_dim`) and number of dimensions
(`3`), respectively.

In [7]:
num_iter = 5 ** 3
num_repeats = 5
SEEDS = range(10, 10 + num_repeats)

We also instantiate multiple `SelfDrivingLabDemo` instances, each with their own
unique target spectrum, and then turn off the LED.

In [14]:
sdls = [SelfDrivingLabDemoLight(autoload=True, simulation=True, target_seed=seed) for seed in SEEDS]
sdls[0].clear()

Notice that the target_data is different for each.

In [15]:
df = pd.DataFrame([sdl.target_results for sdl in sdls])
df.loc[:, sdl.channel_names] # sort columns by wavelength

Unnamed: 0,ch410,ch440,ch470,ch510,ch550,ch583,ch620,ch670
0,0.03895,1.017894,4.564305,0.527746,0.135327,0.040214,1.521026,0.013703
1,0.028921,0.744123,3.339486,1.026408,0.296912,0.0372,0.200806,0.004277
2,0.022989,0.240644,1.072494,1.763603,0.541681,0.064652,0.399184,0.005994
3,0.047319,0.995106,4.45815,1.699501,0.500214,0.077158,1.382109,0.014404
4,0.03606,0.867966,3.890597,0.77567,0.216703,0.045551,1.325242,0.012382




### Imports

We'll be using `scikit-learn`'s `ParameterGrid` for grid search, `self_driving_lab_demo`'s built-in
`get_random_inputs` for random search, and `ax-platform`'s Gaussian Process Expected
Improvement (GPEI) model for Bayesian
optimization. To help with defining the grid search space, we will also use the
`bounds` and `parameters` class property of `SelfDrivingLabDemo` for convenience. Note
that 89 is the upper limit for RGB values instead of 255 since 255 is very bright.

In [16]:
import numpy as np
from tqdm.notebook import trange, tqdm
from sklearn.model_selection import ParameterGrid
from ax.service.ax_client import AxClient

In [17]:
sdls[0].bounds

{'R': [0, 89], 'G': [0, 89], 'B': [0, 89]}

In [18]:
sdls[0].parameters

[{'name': 'R', 'type': 'range', 'bounds': [0, 89]},
 {'name': 'G', 'type': 'range', 'bounds': [0, 89]},
 {'name': 'B', 'type': 'range', 'bounds': [0, 89]}]

### Grid Search

First, we need to define our parameter grid. We'll divide up the 3-dimensional parameter
space as evenly as possible (see `num_pts_per_dim` below).

In [19]:
param_grid = {}
num_pts_per_dim = round(num_iter ** (1 / len(sdl.bounds)))
for name, bnd in sdl.bounds.items():
    param_grid[name] = np.linspace(bnd[0], bnd[1], num=num_pts_per_dim)
    if isinstance(bnd[0], int):
        param_grid[name] = np.round(param_grid[name]).astype(int)
print(f"num_pts_per_dim: {num_pts_per_dim}")

num_pts_per_dim: 5


Notice how many distinct values are along each dimension.

In [20]:
param_grid

{'R': array([ 0, 22, 44, 67, 89]),
 'G': array([ 0, 22, 44, 67, 89]),
 'B': array([ 0, 22, 44, 67, 89])}

After assembling the full grid, notice that the total number of points is $5^3 = 125$.

In [21]:
grid = list(ParameterGrid(param_grid))
print("grid:\n", grid[0:4], "...", grid[-1:])
print("\nNumber of grid points: ", len(grid))

grid:
 [{'B': 0, 'G': 0, 'R': 0}, {'B': 0, 'G': 0, 'R': 22}, {'B': 0, 'G': 0, 'R': 44}, {'B': 0, 'G': 0, 'R': 67}] ... [{'B': 89, 'G': 89, 'R': 89}]

Number of grid points:  125


Now, we can start the actual search. The grid search locations are fixed
for each of the repeat optimization campaigns; however the observed sensor data will be
stochastic and the target spectrum is different for each repeat run. An alternative approach to setting a
fixed budget and varying the target solution would be to see how many iterations it takes to meet a criteria for the
objective function similar to [this post](https://towardsdatascience.com/grid-search-vs-random-search-vs-bayesian-optimization-2e68f57c3c46); however, a fixed budget seems more characteristic of a real chemistry
or materials optimization campaign due to limits on funding, time, and other resources:
(i.e. we'll search until we find what we're looking for, until we run out of
resources, or until we decide it's no longer worth the expense, whichever comes first).

In [22]:
grid_data = [
    [
        sdl.evaluate(pt["R"], pt["G"], pt["B"])
        for pt in grid
    ]
    for sdl in tqdm(sdls)
]
sdls[0].clear()

  0%|          | 0/5 [00:00<?, ?it/s]

### Random Search

Now, let's perform random search as we did before in
[`2.0-random-search.ipynb`](2.0-random-search.ipynb), storing the inputs and outputs as we go.

In [23]:
%%time
random_inputs = []
random_data = []
for _ in tqdm(range(num_repeats)):
    random_input = []
    random_datum = []
    for i in range(num_iter):
        random_input.append(sdl.get_random_inputs())
        random_datum.append(sdl.evaluate(*random_input[i]))
    random_inputs.append(random_input)
    random_data.append(random_datum)
sdls[0].clear()

  0%|          | 0/5 [00:00<?, ?it/s]

CPU times: total: 922 ms
Wall time: 902 ms


### Bayesian Optimization

Now, we'll use an optimization algorithm that learns from prior information. Once a
small set of initialization points have been evaluated, the algorithm will leverage the
previously observed information to intelligently select the next point to evaluate. The
selected point will be a trade-off between exploiting the highest performance and
exploring uncertain regions (i.e. exploitation/exploration trade-off). We'll also use
a discretized Frechet distance in place of mean absolute error as a more robust
comparison between discrete distributions.

In [24]:
%%time
bo_results = []
objective_name = "frechet"

for sdl in tqdm(sdls):
    def evaluation_function(parameters):
        data = sdl.evaluate(
            parameters["R"],
            parameters["G"],
            parameters["B"],
        )
        return data[objective_name]

    ax_client = AxClient()
    ax_client.create_experiment(
        parameters=sdl.parameters,
        objective_name=objective_name,
        minimize=True,
    )

    for i in range(num_iter):
        trial_parameters, trial_index = ax_client.get_next_trial()
        raw_data = evaluation_function(trial_parameters)
        ax_client.complete_trial(trial_index=trial_index, raw_data=raw_data)

    best_parameters, values = ax_client.get_best_parameters()
    experiment = ax_client.experiment
    model = ax_client.generation_strategy.model

    bo_results.append((best_parameters, values, experiment, model))

best_parameters, values, experiment, model = zip(*bo_results)
sdls[0].clear()

  0%|          | 0/5 [00:00<?, ?it/s]

[INFO 09-07 16:21:05] ax.service.utils.instantiation: Inferred value type of ParameterType.INT for parameter R. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict.
[INFO 09-07 16:21:05] ax.service.utils.instantiation: Inferred value type of ParameterType.INT for parameter G. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict.
[INFO 09-07 16:21:05] ax.service.utils.instantiation: Inferred value type of ParameterType.INT for parameter B. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict.
[INFO 09-07 16:21:05] ax.service.utils.instantiation: Created search space: SearchSpace(parameters=[RangeParameter(name='R', parameter_type=INT, range=[0, 89]), RangeParameter(name='G', parameter_type=INT, range=[0, 89]), RangeParameter(name='B', parameter_type=INT, r

CPU times: total: 56min 16s
Wall time: 9min 30s


### Analysis

Now that we've run our three optimizations, let's compare the performance in tabular
form and visually.

### Preparing the data

In [25]:
grid_obj = [[g[objective_name] for g in gd] for gd in grid_data]
random_obj = [[r[objective_name] for r in rd] for rd in random_data]
bayesian_obj = [exp.fetch_data().df["mean"].tolist() for exp in experiment]

In [26]:
obj = np.array([grid_obj, random_obj, bayesian_obj])
obj.shape

(3, 5, 125)

### Tabular

In [27]:
avg_obj = np.mean(np.minimum.accumulate(obj, axis=2), axis=1)
std_obj = np.std(avg_obj, axis=1)
avg_obj.shape

(3, 125)

In [28]:
np.mean(random_obj)

1.4655333535241746

In [29]:
best_avg_obj = np.min(avg_obj, axis=1)
best_avg_obj

array([0.37914942, 0.25684419, 0.01071572])

### Best Objective vs. Iteration

In [30]:
names = ["grid", "random", "bayesian"]
df = pd.DataFrame({
    **{f"{n}_{objective_name}": m for n, m in zip(names, avg_obj)},
    **{f"{n}_std": s for n, s in zip(names, std_obj)},
})


In [31]:
obj_df = pd.melt(df.reset_index(), id_vars=["index"], value_vars = [f"grid_{objective_name}", f"random_{objective_name}", f"bayesian_{objective_name}"], var_name="method", value_name=objective_name)

std_df = pd.melt(df.reset_index(), id_vars=["index"], value_vars = ["grid_std", "random_std", "bayesian_std"], var_name="method", value_name="std")

obj_df.loc[:, "method"] = obj_df.loc[:, "method"].apply(lambda x: x.replace(f"_{objective_name}", ""))
std_df.loc[:, "method"] = std_df.loc[:, "method"].apply(lambda x: x.replace("_std", ""))

In [32]:
results_df = obj_df.merge(std_df, on=["method", "index"]).rename(columns=dict(index="iteration"))
results_df

Unnamed: 0,iteration,method,frechet,std
0,0,grid,3.603228,0.733952
1,1,grid,3.210319,0.733952
2,2,grid,2.817409,0.733952
3,3,grid,2.406640,0.733952
4,4,grid,2.084905,0.733952
...,...,...,...,...
370,120,bayesian,0.010716,0.311105
371,121,bayesian,0.010716,0.311105
372,122,bayesian,0.010716,0.311105
373,123,bayesian,0.010716,0.311105


### Visualization
As we might expect, Bayesian optimization outperforms random search while grid and
random search are on par with each other.

In [33]:
# import plotly.express as px
from self_driving_lab_demo.utils.plotting import line

fig = line(
    data_frame=results_df,
    x="iteration",
    y=objective_name,
    error_y="std",
    error_y_mode="band",
    color="method",
)
max_y = (results_df[objective_name] + results_df["std"]).max()
fig.update_yaxes(range=[0.0, max_y*1.02])
fig

#### Example Output

![pico-grid-random-bayesian-simulator](pico-grid-random-bayesian-simulator.png)