# Function 7

## Function Description

You’re tasked with optimising an ML model by tuning six hyperparameters, for example learning rate, regularisation strength or number of hidden layers. The function you’re maximising is the model’s performance score (such as accuracy or F1), but since the relationship between inputs and output isn’t known, it’s treated as a black-box function. 

Because this is a commonly used model, you might benefit from researching best practices or literature to guide your initial search space. Your goal is to find the combination of hyperparameters that yields the highest possible performance.

## Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel, ConstantKernel

## Data

In [2]:
# Initialize the dataset (Function 7)
df_init = pd.DataFrame({
    "x1": [0.272623822, 0.543002577, 0.090832248, 0.118866975, 0.630217641,
           0.764919173, 0.057895542, 0.195251881, 0.642302982, 0.789942554,
           0.528497328, 0.722615215, 0.075664921, 0.942450839, 0.148647021,
           0.817112389, 0.417626293, 0.726285664, 0.319810433, 0.879871278,
           0.54124078, 0.226347921, 0.68685257, 0.175977537, 0.881646737,
           0.066610511, 0.93246638, 0.846866965, 0.806282084, 0.476823129],
    "x2": [0.324495362, 0.924693904, 0.661529382, 0.61505494, 0.838096896,
           0.255882917, 0.491672219, 0.079226651, 0.836874547, 0.195545005,
           0.457424355, 0.011812837, 0.334502119, 0.377439621, 0.033943363,
           0.548168231, 0.064099981, 0.464895805, 0.520097588, 0.39796199,
           0.631403143, 0.115025814, 0.041017208, 0.624416497, 0.204450188,
           0.528045066, 0.488811888, 0.142429172, 0.324122375, 0.34094195],
    "x3": [0.89710881, 0.341567459, 0.065930911, 0.905816385, 0.680013052,
           0.609084224, 0.247422224, 0.554580462, 0.021792691, 0.575623326,
           0.36009569, 0.063645913, 0.132732737, 0.486122326, 0.72880565,
           0.103347578, 0.245668774, 0.924570514, 0.290677754, 0.003634565,
           0.031902046, 0.824749655, 0.007573011, 0.295541984, 0.414474359,
           0.816095195, 0.258607742, 0.06066859, 0.726076013, 0.014335232],
    "x4": [0.832951152, 0.646485849, 0.258577008, 0.855300304, 0.73189509,
           0.218079042, 0.218118436, 0.17056682, 0.10148801, 0.073659187,
           0.362045506, 0.165173106, 0.608312361, 0.228791083, 0.316066456,
           0.124369546, 0.559040796, 0.807245395, 0.876706678, 0.95699064,
           0.44998156, 0.945383725, 0.285008996, 0.469552759, 0.420384678,
           0.961017137, 0.956243436, 0.756292128, 0.148712132, 0.880139563],
    "x5": [0.154062685, 0.718440327, 0.963452851, 0.413631429, 0.526736715,
           0.322942769, 0.42042833, 0.014944176, 0.683070828, 0.259049174,
           0.816890978, 0.079244146, 0.918385918, 0.082631747, 0.021769383,
           0.728234821, 0.191531384, 0.635438395, 0.495034689, 0.264513733,
           0.798652816, 0.905311528, 0.691568484, 0.09776977, 0.264915015,
           0.086509334, 0.19042781, 0.552398295, 0.719376401, 0.998654696],
    "x6": [0.795863623, 0.343132664, 0.640265398, 0.585235628, 0.348429213,
           0.095793655, 0.730969843, 0.10703171, 0.6924164, 0.051099864,
           0.637476366, 0.359951657, 0.822330786, 0.711957551, 0.516917757,
           0.449673612, 0.254640923, 0.143417874, 0.6190825, 0.114869241,
           0.633704291, 0.951013915, 0.655542897, 0.728141081, 0.730660187,
           0.777788216, 0.519851758, 0.081306087, 0.362883978, 0.079664019],
    "y":  [0.604432696, 0.562753067, 0.007503237, 0.061424303, 0.273046801,
           0.083746572, 1.364968304, 0.092644955, 0.017869599, 0.033564936,
           0.073516304, 0.206309698, 0.008825634, 0.268400317, 0.611525528,
           0.014798183, 0.274892508, 0.066763247, 0.042118355, 0.002701465,
           0.018209073, 0.007016028, 0.100506611, 0.475395516, 0.675141631,
           0.516457219, 0.00377748, 0.003134333, 0.021342523, 0.095411159]
})
new_data = [
    (0.05969, 0.49062, 0.44211, 0.221421, 0.37082, 0.623604, 1.89516263652999),  # week 1
    (0.019217, 0.482437, 0.580522, 0.144951, 0.419265, 0.593028, 1.48054536890545),  # week 2
    (0.025015, 0.417002, 0.436375, 0.284062, 0.476013, 0.566835, 1.49135444841371),  # week 3
    (0.026324, 0.490783, 0.514007, 0.244984, 0.333210, 0.737171, 1.88126737867531),  # week 4
    (0.034315, 0.236644, 0.418646, 0.112691, 0.314479, 0.636966, 2.26599434707379),  # week 5
    (0.055358, 0.225496, 0.333946, 0.117042, 0.158492, 0.768427, 1.41191883436077),  # week 6
    (0.042796, 0.270499, 0.523765, 0.193193, 0.301244, 0.762064, 2.49771512713392),  # week 7
    (0.001992, 0.357589, 0.422128, 0.180666, 0.321347, 0.748367, 2.09593695799762),  # week 8
    (0.108398, 0.227876, 0.509489, 0.097279, 0.358231, 0.826052, 1.8991103201826),  # week 9
    (0.001190, 0.186782, 0.525036, 0.212273, 0.310239, 0.595689, 2.65150382741171),  # week 10
    (0.000986, 0.229214, 0.607506, 0.341165, 0.334450, 0.699692, 2.50440355703788),  # week 11
    (0.051607, 0.202300, 0.557931, 0.235196, 0.255594, 0.544375, 2.52608780108705),  # week 12
    (0.001190, 0.186782, 0.525036, 0.212273, 0.310239, 0.595689, 2.65150382741171),  # week 13
]
df_new = pd.DataFrame(new_data, columns=["x1", "x2", "x3", "x4", "x5", "x6", "y"])
df_all = pd.concat([df_init, df_new], ignore_index=True)
# Extract input (X) and output (y)
X_check = df_all[["x1", "x2", "x3", "x4", "x5", "x6"]].values  # shape (n, 6)
y_check = df_all["y"].values.reshape(-1, 1)  # shape (n, 1)

print("Dataset shape:", X_check.shape, y_check.shape)
print(df_all.tail())

# For later use in model training
X_init = df_all[["x1", "x2", "x3", "x4", "x5", "x6"]].to_numpy()
y_raw = df_all["y"].to_numpy()

Dataset shape: (43, 6) (43, 1)
          x1        x2        x3        x4        x5        x6         y
38  0.108398  0.227876  0.509489  0.097279  0.358231  0.826052  1.899110
39  0.001190  0.186782  0.525036  0.212273  0.310239  0.595689  2.651504
40  0.000986  0.229214  0.607506  0.341165  0.334450  0.699692  2.504404
41  0.051607  0.202300  0.557931  0.235196  0.255594  0.544375  2.526088
42  0.001190  0.186782  0.525036  0.212273  0.310239  0.595689  2.651504


## Optimisation Model

In [3]:
# --- Adjustable parameters ---
n_candidates = 20000  # number of random candidate points to explore
nu = 2.5  # smoothness parameter for Matern kernel
noise_level = 1.0  # assumed noise (for WhiteKernel)
length_scale = 0.2  # initial length scale for Matern
kappa = 1.0  # exploration parameter for UCB (higher = more exploration)
random_state = 42  # reproducibility

# --- Define kernel and GP model ---
kernel = ConstantKernel(1.0, (1e-2, 1e2)) * Matern(length_scale=length_scale, nu=nu) + WhiteKernel(
    noise_level=noise_level)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=random_state)

# --- Fit GP to initial data ---
gp.fit(X_init, y_raw)

# --- Generate candidate points uniformly in [0,1]^6 ---
X_candidates = np.random.rand(n_candidates, 6)

# --- Predict mean and std for each candidate ---
mean, std = gp.predict(X_candidates, return_std=True)

# --- Convert mean/std back to original y scale ---
mean_orig = mean * np.std(y_raw) + np.mean(y_raw)
std_orig = std * np.std(y_raw)

# --- Compute UCB acquisition function (in original scale) ---
ucb = mean_orig + kappa * std_orig  # for maximization

# --- Get top 5 candidates ---
top_idx = np.argsort(ucb)[-5:][::-1]
top_candidates = X_candidates[top_idx]
top_ucb_values = ucb[top_idx]
top_pred_y = mean_orig[top_idx]

# --- Display results ---
df_top = pd.DataFrame(top_candidates, columns=["x1", "x2", "x3", "x4", "x5", "x6"])
df_top["Pred_y"] = top_pred_y
df_top["UCB_value"] = top_ucb_values

print("\nTop 5 candidate points (highest UCB):")
print(df_top)
print("\nBest guess (highest UCB):")
print(df_top.iloc[0])


Top 5 candidate points (highest UCB):
         x1        x2        x3        x4        x5        x6    Pred_y  \
0  0.025598  0.226841  0.517254  0.279975  0.265219  0.675452  3.279032   
1  0.065647  0.322461  0.606360  0.335063  0.192871  0.684458  2.910550   
2  0.042777  0.088511  0.485700  0.150081  0.282249  0.653322  2.920398   
3  0.053865  0.179907  0.460770  0.231220  0.209749  0.742968  2.897710   
4  0.087502  0.181909  0.652997  0.281382  0.161291  0.700308  2.796760   

   UCB_value  
0   3.381169  
1   3.105875  
2   3.090583  
3   3.053802  
4   3.003088  

Best guess (highest UCB):
x1           0.025598
x2           0.226841
x3           0.517254
x4           0.279975
x5           0.265219
x6           0.675452
Pred_y       3.279032
UCB_value    3.381169
Name: 0, dtype: float64


