# Bayesian Optimization using OPTIMA

## Single outcome
### First, we generate some data

Let's create an `experimental_data(temp, conc)` function that simulates the yield of a chemical reaction based on temperature and concentration. The yield is a function of these two variables, and we will use Bayesian Optimization to find the position of the maximum in as few experiments as possible.

In the following block, we simply plot this function to see what it looks like (in a real experiment, you would not know its shape in advance). We also create an experimental data set of measurements that have already been made: they are the white circles in the plot, while the red cross marks the true maximum.

In [21]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

def experimental_data(temp, conc):
    """
    Simulate the reaction yield as a function of temperature and concentration.
    The yield is a broad Gaussian peak centred at (50, 0.50) minus a second Gaussian
    centred at (45, 0.45) that is narrower in temperature, which shifts the maximum.
    The function is not based on any real experimental data and is purely for demonstration purposes.
    """
    main_peak = np.exp(-((temp - 50) ** 2) / 1000) * np.exp(-((conc - 0.50) ** 2) / 0.05)
    dip = 0.9 * np.exp(-((temp - 45) ** 2) / 100) * np.exp(-((conc - 0.45) ** 2) / 0.05)
    return main_peak - dip

def generate_data(N=100):
    temp = np.linspace(0, 100, N)
    conc = np.linspace(0, 1, N)
    data = np.meshgrid(temp, conc)
    temp, conc = data[0].flatten(), data[1].flatten()
    exp_data = experimental_data(temp, conc)
    df = pd.DataFrame({'Temperature': temp, 'Concentration': conc, 'Yield': exp_data})
    return df

df = generate_data(500)

# Prepare data for 3D surface plot
unique_temps = np.unique(df['Temperature'])
unique_concs = np.unique(df['Concentration'])
Z = df.pivot(index='Concentration', columns='Temperature', values='Yield').values
# find the maximum yield and its position
max_yield = np.max(Z)
max_pos = np.unravel_index(np.argmax(Z), Z.shape)
max_temp = unique_temps[max_pos[1]]
max_conc = unique_concs[max_pos[0]]

# Create the contour plot
fig = go.Figure(data=[go.Contour(
    z=Z,
    x=unique_temps,
    y=unique_concs,
    colorscale='Viridis',
    contours=dict(coloring='heatmap', size=0.1),
    hovertemplate='Temperature: %{x}<br>Concentration: %{y}<br>Yield: %{z}<extra></extra>'
)])

# Update layout with legend at the top
fig.update_layout(
    title='Experimental Yield we want to determine the maximum of',
    xaxis_title='Temperature',
    yaxis_title='Concentration',
    legend=dict(
        x=0.5,  # Center the legend horizontally
        y=1.1,  # Place the legend above the plot
        orientation='h',  # Horizontal orientation
        xanchor='center',
        yanchor='bottom'
    )
)

# Add a marker for the maximum yield
fig.add_trace(go.Scatter(
    x=[max_temp],
    y=[max_conc],
    mode='markers',
    marker=dict(size=10, color='red', symbol='x'),
    name='Max Yield that we want to determine through experiments',
    customdata=[[max_yield]],  # Add max_yield as custom data
    hovertemplate='Max Yield:<br>Temperature: %{x}<br>Concentration: %{y}<br>Yield: %{customdata[0]}<extra></extra>'
))

# Create some sample data: a 2x2 grid, i.e. the four corners of the domain
df_sample = generate_data(2)

# Add the experimental data on the plot
fig.add_trace(go.Scatter(
    x=df_sample['Temperature'],
    y=df_sample['Concentration'],
    mode='markers',
    marker=dict(size=10, color='white', symbol='circle',
                line=dict(width=2, color='black')),
    name='Experimental measurements we have so far',
))

fig.show()

# Save the experimental data to a CSV file
df_sample.to_csv('experimental_data.csv', index=False)

### Bayesian Optimization

Now, we will use the OPTIMA package to perform Bayesian Optimization.
First, we load the data in the right format.

In [22]:
from optima.bo import BOExperiment, read_experimental_data

# there is only one outcome here, in the last column (-1)
features, outcomes = read_experimental_data('experimental_data.csv', out_pos=[-1])
print(f"Features:\n{features}")
print(f"Outcomes:\n{outcomes}")

Features:
{'Temperature': {'type': 'float', 'data': [0.0, 100.0, 0.0, 100.0], 'range': [np.float64(0.0), np.float64(100.0)]}, 'Concentration': {'type': 'float', 'data': [0.0, 0.0, 1.0, 1.0], 'range': [np.float64(0.0), np.float64(1.0)]}}
Outcomes:
{'Yield': {'type': 'float', 'data': [0.0005530843449776, 0.0005530843701466, 0.0005530843667414, 0.0005530843701476]}}


The ranges and types of the features are determined automatically from the data (see the output printed above). To change them, either edit the corresponding fields in the `features` dict or provide a `ranges` dict in the format `{'feature_name': [min_value, max_value]}`. Any range that is not provided in `features` or in `ranges` is determined from the data.

In [23]:
ranges = {'Temperature': [-10,100]}
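
The fallback described above can be pictured with a small sketch: for each feature, use the user-supplied range when there is one, otherwise keep an existing `range` field, otherwise take the minimum and maximum of the observed data. The helper `resolve_ranges` is hypothetical and written only for illustration, and the order of precedence is an assumption; it is not OPTIMA's implementation, which may differ in its details.

```python
def resolve_ranges(features, ranges=None):
    """Hypothetical helper (not part of OPTIMA): pick a [min, max] range per feature."""
    ranges = ranges or {}
    resolved = {}
    for name, info in features.items():
        if name in ranges:
            # an explicit entry in `ranges` wins (assumed precedence)
            resolved[name] = list(ranges[name])
        elif 'range' in info:
            # otherwise keep a range already stored with the feature
            resolved[name] = list(info['range'])
        else:
            # otherwise fall back to the span of the observed data
            resolved[name] = [min(info['data']), max(info['data'])]
    return resolved

# same layout as the `features` dict printed above, reduced to the 'data' field
example_features = {
    'Temperature': {'data': [0.0, 100.0, 0.0, 100.0]},
    'Concentration': {'data': [0.0, 0.0, 1.0, 1.0]},
}
resolved = resolve_ranges(example_features, ranges={'Temperature': [-10, 100]})
print(resolved)
```

Here only `Temperature` is overridden, so `Concentration` keeps the range found in the data.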

Now, let's create the BOExperiment object. This object contains the data, the features, and the model. The model is a Gaussian Process with a Matern kernel.

In [None]:
bo = BOExperiment(
    features=features, 
    outcomes=outcomes,
    # ranges=ranges,
    N = 1, # number of new points to generate
    maximize=True, # we want to maximize the response
    outcome_constraints=None,
    fixed_features=None, # fixed features are not used here
    # but they can be added as fixed_features = {'Temperature': 50, 'Concentration': 0.5}
    feature_constraints=None, # feature constraints are not used here
    # but they can be added as 
    # feature_constraints = ['Concentration + Temperature <= 200']
    optim = 'sobol', # 'sobol' generates the new points quasi-randomly (Sobol sequence)
    # to actually optimize, use optim = 'bo'
)
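
For readers unfamiliar with the Matern kernel mentioned above, here is a minimal sketch of its 5/2 form, written with only the standard library. The smoothness value 5/2, the unit output scale, and the length scales are assumptions chosen for illustration; nothing here states which settings OPTIMA actually uses.

```python
import math

def matern52(x1, x2, lengthscales):
    """Matern 5/2 covariance between two points, with one length scale per feature."""
    # scaled Euclidean distance: each feature difference is divided by its length scale
    r = math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales)))
    s = math.sqrt(5.0) * r
    return (1.0 + s + s * s / 3.0) * math.exp(-s)

# illustrative length scales (assumed): 20 in temperature, 0.2 in concentration
ls = (20.0, 0.2)
k_same = matern52((50.0, 0.5), (50.0, 0.5), ls)  # identical points
k_near = matern52((50.0, 0.5), (55.0, 0.5), ls)  # 5 temperature units apart
k_far = matern52((50.0, 0.5), (90.0, 0.5), ls)   # 40 temperature units apart
print(k_same, k_near, k_far)
```

The covariance is 1 for identical points and decays with distance. Per-feature length scales matter here because temperature spans 0 to 100 while concentration spans 0 to 1.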

In [25]:
bo


BOExperiment(
    N=1,
    maximize={'Yield': True},
    outcome_constraints=None,
    feature_constraints=None,
    optim=sobol
)

Input data:

   Temperature  Concentration     Yield
0          0.0            0.0  0.000553
1        100.0            0.0  0.000553
2          0.0            1.0  0.000553
3        100.0            1.0  0.000553
        

In [26]:
new_points = bo.suggest_next_trials()
print(f"New points to sample:\n{new_points}")
fig = bo.plot_model()
fig.show()

New points to sample:
   Temperature  Concentration  Predicted_Yield
0    29.918209       0.479172         0.000553
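
To give an idea of what a Sobol-based suggestion looks like, the sketch below draws quasi-random points with SciPy's `scipy.stats.qmc` module and scales them to the feature ranges used here. This is an independent illustration, not OPTIMA's internal code; the number of points is an arbitrary choice, and the points differ from run to run because the sequence is scrambled.

```python
from scipy.stats import qmc

# 2 features: Temperature in [0, 100] and Concentration in [0, 1]
sampler = qmc.Sobol(d=2, scramble=True)
unit_points = sampler.random_base2(m=3)  # 2**3 = 8 points in the unit square
points = qmc.scale(unit_points, l_bounds=[0.0, 0.0], u_bounds=[100.0, 1.0])

for temperature, concentration in points:
    print(f"Temperature: {temperature:7.3f}  Concentration: {concentration:.3f}")
```

Compared with plain uniform random numbers, a Sobol sequence tends to cover the domain more evenly, which is useful for the first few experiments before a model is available.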


Now let's do the optimization. The first 6 iterations generate quasi-random (Sobol) points; after that, we switch to the Bayesian Optimization algorithm to find the maximum of the function. The algorithm uses the Gaussian Process model to predict the function value at each point and the expected improvement criterion to select the next point to evaluate.

In [None]:
for i in range(30):  # let's do 30 iterations
    if i == 6:
        bo.optim = 'bo'  # switch to BO after the first 6 Sobol iterations (i = 0..5)
    # ask for the next point to sample
    new = bo.suggest_next_trials(with_predicted=False)
    newT = new['Temperature'].values
    newC = new['Concentration'].values
    # perform an experiment to measure the response at these points
    # here we just simulate the response using the experimental data function
    # in a real experiment, you would measure the response at these points
    # and add the new points to the experimental data
    response = experimental_data(newT, newC)
    # add the new points to the experimental data
    bo.update_experiment(params   = {'Temperature':newT, 'Concentration':newC}, 
                         outcomes = {'Yield': response})
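
The expected improvement criterion mentioned above has a simple closed form when the model prediction at a point is Gaussian with mean `mu` and standard deviation `sigma`. The sketch below implements the textbook formula for maximization using only the standard library; the function name and the numbers are ours, and OPTIMA's actual acquisition function may be a variant of it.

```python
from statistics import NormalDist

def expected_improvement(mu, sigma, best):
    """Textbook expected improvement over `best` for a Gaussian prediction (maximization)."""
    if sigma <= 0.0:
        # no uncertainty: the improvement is known exactly
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    std_normal = NormalDist()
    return (mu - best) * std_normal.cdf(z) + sigma * std_normal.pdf(z)

best_so_far = 0.80
# a confident prediction just below the best vs. an uncertain one further below
ei_confident = expected_improvement(mu=0.79, sigma=0.01, best=best_so_far)
ei_uncertain = expected_improvement(mu=0.70, sigma=0.20, best=best_so_far)
print(ei_confident, ei_uncertain)
```

The uncertain point gets the larger value even though its predicted mean is lower, which is how the criterion balances exploring unknown regions against exploiting known good ones.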

Now let's plot the model:

In [34]:
bo.plot_model()

In [29]:
print("Best parameters from BO:")
print(bo.get_best_parameters())
print("Expected best parameters:")
print(pd.DataFrame({'Temperature': max_temp, 'Concentration': max_conc, 'Yield':max_yield}, index=[0]))

Best parameters from BO:
   Temperature  Concentration     Yield
0    61.204647       0.504263  0.820285
Expected best parameters:
   Temperature  Concentration     Yield
0    61.322645       0.503006  0.820258


And the convergence plot: we see here that the maximum was found after the 9th iteration. You can vary the number of random points to see how it affects the convergence.

In [31]:
bo.plot_optimization_trace(optimum=max_yield)
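
The quantity shown in a convergence plot of this kind is typically the best outcome observed so far after each experiment. As a rough sketch, it can be computed from the list of measured yields with a running maximum. The yield values and the tolerance below are made up for illustration, and this is not necessarily how `plot_optimization_trace` is implemented.

```python
from itertools import accumulate

# made-up yields, in the order the experiments were run (illustrative only)
yields = [0.0006, 0.12, 0.05, 0.41, 0.38, 0.79, 0.74, 0.82, 0.81]
optimum = 0.82    # known here only because the data are simulated
tolerance = 0.01

# running maximum: best yield seen up to and including each experiment
best_so_far = list(accumulate(yields, max))

# first experiment (1-based) whose running best is within `tolerance` of the optimum
first_hit = next(
    (i for i, best in enumerate(best_so_far, start=1) if optimum - best <= tolerance),
    None,
)
print(best_so_far)
print(first_hit)
```

In a real campaign the optimum is unknown, so the trace is usually judged by whether it has stopped improving rather than by its distance to a reference value.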