# Function 6

## Function Description

You’re optimising a cake recipe using a black-box function with five ingredient inputs, for example flour, sugar, eggs, butter and milk. Each recipe is evaluated with a combined score based on flavour, consistency, calories, waste and cost, where each factor contributes negative points as judged by an expert taster. This means the total score is negative by design. 

To frame this as a maximisation problem, your goal is to bring that score as close to zero as possible or, equivalently, to maximise the negative of the total sum.

## Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel, ConstantKernel

## Data

In [2]:
# Initialize the dataset (Function 6)
df_init = pd.DataFrame({
    "x1": [0.728186105, 0.242384347, 0.72952261, 0.770620242, 0.618812299,
           0.784958094, 0.145110786, 0.945069068, 0.125720155, 0.757594355,
           0.536796903, 0.957739669, 0.629307895, 0.021735308, 0.439344264,
           0.258905574, 0.432165933, 0.782879823, 0.921776198, 0.126678917],

    "x2": [0.15469257, 0.844099972, 0.7481062, 0.114403744, 0.331802137,
           0.910682349, 0.896684598, 0.288459051, 0.862724692, 0.355831415,
           0.308780907, 0.235668572, 0.803483678, 0.42808424, 0.698923833,
           0.793677708, 0.715617813, 0.536335859, 0.931871216, 0.291470301],

    "x3": [0.732551669, 0.577809099, 0.679774641, 0.046779932, 0.187287868,
           0.708120104, 0.896322235, 0.978805764, 0.028544332, 0.0165229,
           0.411879285, 0.09914585, 0.811408439, 0.835939437, 0.426820222,
           0.642113905, 0.341819103, 0.443283557, 0.41487637, 0.064528477],

    "x4": [0.693996509, 0.679021284, 0.356552279, 0.648324285, 0.756238474,
           0.959225429, 0.726271537, 0.961655587, 0.246605272, 0.434207205,
           0.388225177, 0.156805934, 0.045613186, 0.489488659, 0.109476085,
           0.196673464, 0.704999881, 0.859699826, 0.595057266, 0.680514603],

    "x5": [0.056401311, 0.501952888, 0.671053683, 0.273549053, 0.328834798,
           0.004911496, 0.236271991, 0.598015936, 0.751206241, 0.112433044,
           0.522528304, 0.071317373, 0.110624462, 0.511081735, 0.877888468,
           0.593103177, 0.614961845, 0.010325991, 0.735625686, 0.892819191],

    "y": [-0.714264948, -1.209955245, -1.67219994, -1.536057709, -0.829236552,
          -1.247048927, -1.233786381, -1.694343442, -2.571169632, -1.309116353,
          -1.144784851, -1.912677143, -1.622838952, -1.356682109, -2.018425399,
          -1.702557841, -1.294246965, -0.935756555, -2.155767764, -1.746882087]
})
new_data = [
    (0.063639, 0.162438, 0.357843, 0.944475, 0.041057, -1.02314603833499),  # week 1
    (0.379296, 0.053792, 0.951098, 0.967612, 0.029988, -0.975870838600801),  # week 2
    (0.220644, 0.918626, 0.225872, 0.998594, 0.068566, -1.43039848098564),  # week 3
    (0.067099, 0.003709, 0.897369, 0.262107, 0.016609, -1.45149633863228),  # week 4
    (0.436673, 0.337150, 0.438363, 0.700006, 0.102384, -0.353070583393954),  # week 5
    (0.444383, 0.381786, 0.666648, 0.843050, 0.205667, -0.196025191696168),  # week 6
    (0.352036, 0.305168, 0.546117, 0.855492, 0.315926, -0.479012990125332),  # week 7
    (0.494493, 0.420382, 0.679832, 0.804002, 0.151804, -0.231698049175702),  # week 8
    (0.399756, 0.403500, 0.735563, 0.666335, 0.037895, -0.301043542660978),  # week 9
    (0.474866, 0.411705, 0.597222, 0.698148, 0.101653, -0.157178787074027),  # week 10
    (0.490841, 0.352466, 0.653892, 0.648082, 0.164898, -0.231719160224005),  # week 11
    (0.437498, 0.498382, 0.601102, 0.688545, 0.141083, -0.415319449923454),  # week 12
    (0.487483, 0.345913, 0.660707, 0.772383, 0.026845, -0.192394957383667),  # week 13
]
df_new = pd.DataFrame(new_data, columns=["x1", "x2", "x3", "x4", "x5", "y"])
df_all = pd.concat([df_init, df_new], ignore_index=True)
# Extract input (X) and output (y)
X_check = df_all[["x1", "x2", "x3", "x4", "x5"]].values  # shape (20, 5)
y_check = df_all["y"].values.reshape(-1, 1)  # shape (20, 1)

print("Dataset shape:", X_check.shape, y_check.shape)
print(df_all.tail())

# For later use in model training
X_init = df_all[["x1", "x2", "x3", "x4", "x5"]].to_numpy()
y_raw = df_all["y"].to_numpy()

Dataset shape: (33, 5) (33, 1)
          x1        x2        x3        x4        x5         y
28  0.399756  0.403500  0.735563  0.666335  0.037895 -0.301044
29  0.474866  0.411705  0.597222  0.698148  0.101653 -0.157179
30  0.490841  0.352466  0.653892  0.648082  0.164898 -0.231719
31  0.437498  0.498382  0.601102  0.688545  0.141083 -0.415319
32  0.487483  0.345913  0.660707  0.772383  0.026845 -0.192395


## Optimisation Model

In [3]:
# --- Adjustable parameters ---
n_candidates = 20000  # number of random candidate points to explore
nu = 2.5  # smoothness parameter for Matern kernel
noise_level = 1.0  # assumed noise (for WhiteKernel)
length_scale = 0.3  # initial length scale for Matern
kappa = 1.0  # exploration parameter for UCB (higher = more exploration)
random_state = 42  # reproducibility

# --- Define kernel and GP model ---
kernel = ConstantKernel(1.0, (1e-2, 1e2)) * Matern(length_scale=length_scale, nu=nu) + WhiteKernel(
    noise_level=noise_level)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=random_state)

# --- Fit GP to initial data ---
gp.fit(X_init, y_raw)

# --- Generate candidate points uniformly in [0,1]^5 ---
X_candidates = np.random.rand(n_candidates, 5)

# --- Predict mean and std for each candidate ---
mean, std = gp.predict(X_candidates, return_std=True)

# --- Convert mean/std back to original y scale ---
mean_orig = mean * np.std(y_raw) + np.mean(y_raw)
std_orig = std * np.std(y_raw)

# --- Compute UCB acquisition function (in original scale) ---
ucb = mean_orig + kappa * std_orig  # for maximization

# --- Get top 5 candidates ---
top_idx = np.argsort(ucb)[-5:][::-1]
top_candidates = X_candidates[top_idx]
top_ucb_values = ucb[top_idx]
top_pred_y = mean_orig[top_idx]

# --- Display results ---
df_top = pd.DataFrame(top_candidates, columns=["x1", "x2", "x3", "x4", "x5"])
df_top["Pred_y"] = top_pred_y
df_top["UCB_value"] = top_ucb_values

print("\nTop 5 candidate points (highest UCB):")
print(df_top)
print("\nBest guess (highest UCB):")
print(df_top.iloc[0])


Top 5 candidate points (highest UCB):
         x1        x2        x3        x4        x5    Pred_y  UCB_value
0  0.476959  0.293031  0.678857  0.756301  0.130248 -1.270497  -1.220860
1  0.376799  0.273158  0.633744  0.711137  0.095047 -1.305826  -1.247652
2  0.497171  0.376446  0.592753  0.873298  0.194631 -1.304371  -1.250890
3  0.582629  0.334631  0.642448  0.759214  0.051953 -1.303408  -1.252184
4  0.551698  0.351824  0.760460  0.708541  0.153464 -1.319204  -1.261317

Best guess (highest UCB):
x1           0.476959
x2           0.293031
x3           0.678857
x4           0.756301
x5           0.130248
Pred_y      -1.270497
UCB_value   -1.220860
Name: 0, dtype: float64
