## Importing Libraries and Checking the spacier Version
In this initial setup, we import `pandas` for data manipulation, `numpy` for numerical work, and `sys`. We use `sys` to add the parent directory to the Python path so that our custom `spacier` package can be imported. We then import the `model` and `spacier` modules from `spacier.ml` and print the version reported by `spacier`, which confirms that we are using the intended release for this analysis. `spacier` is a custom library tailored for this project. In this notebook it provides the candidate-selection strategies demonstrated below: random sampling and several Bayesian optimization acquisition functions.

In [1]:
import pandas as pd
import numpy as np
import sys
sys.path.append('../')
from spacier.ml import model, spacier

print("spacier: ", spacier.__version__)

spacier:  0.0.5


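## Data Preparation (Binh and Korn Function)

We draw 10,000 points uniformly at random from the square [-5, 15] × [-5, 15], using a fixed seed for reproducibility. The first 10 points form the labeled training set `df_X`. The remaining 9,990 points form the candidate pool `df_pool_X`, whose index is reset so that it runs from 0 to 9,989. The targets in `df` are the two Binh and Korn objectives, 4·x1² + 4·x2² and (x1 − 5)² + (x2 − 5)², with their signs flipped so that larger values are better for both. Unlike the standard constrained benchmark (defined on 0 ≤ x1 ≤ 5, 0 ≤ x2 ≤ 3), this setup samples a wider domain and applies no constraints.
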
In [2]:
num_samples = 10000
x_min, x_max = -5, 15


np.random.seed(0)
x_samples = np.random.uniform(x_min, x_max, (num_samples, 2))

X = pd.DataFrame(x_samples, columns=["x1", "x2"])

df_X = X.iloc[:10]
df_pool_X = X.iloc[10:].reset_index(drop=True)

# Objectives: Binh and Korn functions, negated so that larger is better
df = pd.DataFrame({
    "y1": -(4 * df_X["x1"]**2 + 4 * df_X["x2"]**2),
    "y2": -((df_X["x1"] - 5)**2 + (df_X["x2"] - 5)**2)
})
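To turn this setup into a full active-learning loop, the selected candidates must be evaluated and moved from the pool into the training set. The sketch below shows one way to do that with pandas. `binh_korn_neg` and `add_to_training` are hypothetical helpers, not part of spacier, and they assume that the indices returned by the samplers are row positions in `df_pool_X`.

```python
import pandas as pd


def binh_korn_neg(X):
    """Negated, unconstrained Binh and Korn objectives; larger is better."""
    x1, x2 = X["x1"], X["x2"]
    return pd.DataFrame({
        "y1": -(4 * x1**2 + 4 * x2**2),
        "y2": -((x1 - 5)**2 + (x2 - 5)**2),
    })


def add_to_training(df_X, df, df_pool_X, new_index):
    """Hypothetical helper: label the selected pool rows and move them
    from the pool into the training set."""
    new_X = df_pool_X.iloc[new_index]
    df_X = pd.concat([df_X, new_X], ignore_index=True)
    df = pd.concat([df, binh_korn_neg(new_X)], ignore_index=True)
    # Drop the selected rows and renumber the pool from 0 again.
    df_pool_X = df_pool_X.drop(df_pool_X.index[new_index]).reset_index(drop=True)
    return df_X, df, df_pool_X
```

For example, `df_X, df, df_pool_X = add_to_training(df_X, df, df_pool_X, new_index)` could follow any of the selection cells. Because the pool is renumbered, the next round's indices refer to the updated pool.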

## Random Sampling

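This section demonstrates random sampling, which selects candidates from the pool uniformly at random and ignores both the model and the observed targets. Because it uses no information about the objectives, it is a common baseline against which the model-based strategies below can be compared. Here, `spacier.Random(df_X, df, df_pool_X).sample(10)` returns the row indices in `df_pool_X` of 10 randomly chosen candidates.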

In [3]:
new_index = spacier.Random(df_X, df, df_pool_X).sample(10)
print(new_index)

Number of candidates :  9990
[8432, 6361, 996, 6319, 1660, 1081, 3807, 2785, 1465, 8228]
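The idea behind random sampling fits in a few lines. Below is a minimal sketch using NumPy's `Generator.choice` to draw distinct pool indices without replacement. It illustrates the technique only; it is not spacier's implementation, `random_sample` is a hypothetical name, and it will not reproduce the indices printed above.

```python
import numpy as np


def random_sample(n_candidates, k, seed=None):
    """Pick k distinct pool indices uniformly at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no candidate is selected twice.
    return rng.choice(n_candidates, size=k, replace=False).tolist()


print(random_sample(9990, 10, seed=0))
```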


## Uncertainty Sampling

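In this part, we use uncertainty sampling, a standard active-learning strategy that selects the candidates whose predictions the model is least certain about. With a Gaussian process surrogate (here `"sklearn_GP"`, presumably a scikit-learn Gaussian process, fitted to `y1` only), uncertainty is typically measured by the predictive standard deviation, so the method favors points far from the existing training data. This improves the model efficiently by spending evaluations where it knows least. However, it ignores the predicted values themselves, so it explores the space rather than optimizing `y1`.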

In [4]:
new_index = spacier.BO(df_X, df, df_pool_X, "sklearn_GP", ["y1"]).uncertainty(10)
print(new_index)

Number of training data :  10
Number of candidates :  9990
[6873, 1017, 8530, 9456, 8781, 9741, 39, 3954, 2999, 4314]
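To make "least certain" concrete, here is a minimal NumPy sketch of a Gaussian process posterior followed by top-k selection on the predictive standard deviation. It assumes a zero-mean GP with a fixed RBF kernel and does not fit kernel hyperparameters, whereas scikit-learn's `GaussianProcessRegressor` optimizes them during `fit` by default. With scikit-learn you would obtain the same quantity from `predict(X, return_std=True)`. All function names here are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel with unit prior variance."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(X_train, y_train, X_cand, length_scale=1.0, noise=1e-6):
    """Posterior mean and std of a zero-mean GP with a fixed kernel."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand, length_scale)  # shape (n_train, n_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))


def uncertainty_sample(X_train, y_train, X_cand, k, length_scale=1.0):
    """Indices of the k candidates with the largest predictive std."""
    _, std = gp_posterior(X_train, y_train, X_cand, length_scale)
    return np.argsort(-std, kind="stable")[:k].tolist()
```

Applied to this notebook, a call might look like `uncertainty_sample(df_X.to_numpy(), df["y1"].to_numpy(), df_pool_X.to_numpy(), 10, length_scale=5.0)`. The length scale is an arbitrary choice, so the selected indices need not match spacier's.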


## Probability of Improvement (PI)

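Probability of Improvement (PI) is a Bayesian optimization acquisition function. In its classic form, it scores each candidate by the probability that its predicted value exceeds the current best observation. In this call, however, `PI` receives a target interval for each objective (`[[-20, 0]]` for `y1`), so the spacier implementation appears to score candidates by the predicted probability that `y1` falls inside that interval.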

In [5]:
new_index = spacier.BO(df_X, df, df_pool_X, "sklearn_GP", ["y1"]).PI([[-20, 0]], 10)
print(new_index)

Number of training data :  10
Number of candidates :  9990
[5777, 8172, 2485, 7238, 1271, 5306, 3508, 1727, 9205, 2292]
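Both readings of PI reduce to evaluating a normal CDF on the GP's predicted mean and standard deviation. The sketch below implements the interval-probability score that the `[[-20, 0]]` argument suggests, and the classic "improve on the best" score for comparison. The interval interpretation is our assumption about spacier, and the function names are hypothetical. The normal CDF is built from `math.erf` to avoid a SciPy dependency.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def pi_in_range(mean, std, lower, upper):
    """Probability that a Gaussian prediction lies in [lower, upper]."""
    mean = np.asarray(mean, dtype=float)
    std = np.maximum(np.asarray(std, dtype=float), 1e-12)  # avoid division by 0
    return norm_cdf((upper - mean) / std) - norm_cdf((lower - mean) / std)


def pi_classic(mean, std, best, xi=0.0):
    """Classic PI for maximization: P(y > best + xi)."""
    std = np.maximum(np.asarray(std, dtype=float), 1e-12)
    return norm_cdf((np.asarray(mean, dtype=float) - best - xi) / std)


def top_k(scores, k):
    """Indices of the k highest scores."""
    return np.argsort(-np.asarray(scores), kind="stable")[:k].tolist()
```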


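The next cell extends PI to two objectives: `y2` is added alongside `y1`, and one interval is given per objective (`[-20, 10]` for `y1` and `[-40, -20]` for `y2`). Each candidate is then presumably scored by the probability that both predictions fall within their respective intervals, which makes PI a tool for finding points that satisfy several target ranges at once. Since `y1` is never positive, its upper bound of 10 acts as 0 in practice.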

In [6]:
new_index = spacier.BO(df_X, df, df_pool_X, "sklearn_GP", ["y1", "y2"]).PI([[-20, 10], [-40, -20]], 10)
new_index

Number of training data :  10
Number of candidates :  9990


[8571, 2292, 1331, 3630, 2374, 946, 1249, 9036, 7857, 5452]
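A minimal sketch of the multi-objective version, assuming each objective has its own independent Gaussian prediction so that the joint probability is the product of the per-objective interval probabilities. Whether spacier combines objectives this way is an assumption, and `joint_pi_in_ranges` is a hypothetical name.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def joint_pi_in_ranges(mean, std, ranges):
    """mean, std: arrays of shape (n_candidates, n_objectives).
    ranges: one [lower, upper] pair per objective, as in the PI call above.
    Assumes independent predictions, so per-objective probabilities multiply."""
    mean = np.asarray(mean, dtype=float)
    std = np.maximum(np.asarray(std, dtype=float), 1e-12)
    bounds = np.asarray(ranges, dtype=float)  # shape (n_objectives, 2)
    lower, upper = bounds[:, 0], bounds[:, 1]
    per_objective = norm_cdf((upper - mean) / std) - norm_cdf((lower - mean) / std)
    return per_objective.prod(axis=1)
```

A candidate predicted to sit exactly on the upper edge of both intervals scores 0.5 × 0.5 = 0.25, which shows how quickly the joint probability shrinks as objectives are added.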

## Expected Improvement (EI)

Expected Improvement (EI) is another Bayesian optimization acquisition function. It scores each candidate by the expected amount by which its value would exceed the best observation so far. Because this expectation depends on both the predicted mean and the predictive uncertainty, EI balances exploration (sampling uncertain, uncharted regions) with exploitation (sampling near regions already known to be good).

In [7]:
new_index = spacier.BO(df_X, df, df_pool_X, "sklearn_GP", ["y1"]).EI(10)
print(new_index)

Number of training data :  10
Number of candidates :  9990
[5950, 508, 2266, 5068, 5160, 6834, 5738, 9397, 659, 88]
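For a Gaussian prediction, EI has a closed form. The sketch below implements the standard maximization formula, (μ − best − ξ)·Φ(z) + σ·φ(z) with z = (μ − best − ξ)/σ, given the GP's predicted mean μ and standard deviation σ. The exploration margin `xi` and the function name are our assumptions; spacier's exact convention is not shown in this notebook.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def expected_improvement(mean, std, best, xi=0.0):
    """EI for maximization: E[max(y - best - xi, 0)] with y ~ N(mean, std^2)."""
    mean = np.asarray(mean, dtype=float)
    std = np.maximum(np.asarray(std, dtype=float), 1e-12)  # avoid division by 0
    improvement = mean - best - xi
    z = improvement / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))          # Phi(z)
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)  # phi(z)
    return improvement * cdf + std * pdf
```

For `y1` here, `best` would be the largest observed value, `df["y1"].max()`. The first term rewards a high predicted mean (exploitation) and the second rewards large uncertainty (exploration).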


## Upper Confidence Bound (UCB)

The Upper Confidence Bound (UCB) acquisition function scores each candidate by an optimistic estimate: the predicted mean plus a multiple of the predictive standard deviation. That multiplier controls the trade-off between exploring uncertain regions and exploiting regions already predicted to be good. In this run, 8 of the 10 candidates selected by UCB also appear in the EI batch above.


In [8]:
new_index = spacier.BO(df_X, df, df_pool_X, "sklearn_GP", ["y1"]).UCB(10)
print(new_index)

Number of training data :  10
Number of candidates :  9990
[5738, 508, 2266, 6834, 5160, 9946, 5068, 6342, 5950, 659]
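A minimal sketch of the UCB score, μ + κ·σ, showing how the multiplier κ flips the choice between a well-predicted candidate and an uncertain one. The default `kappa=2.0` is an arbitrary choice for illustration; spacier's default is not shown here, and the function names are hypothetical.

```python
import numpy as np


def ucb(mean, std, kappa=2.0):
    """Upper confidence bound: optimistic estimate mean + kappa * std."""
    return np.asarray(mean, dtype=float) + kappa * np.asarray(std, dtype=float)


def top_k(scores, k):
    """Indices of the k highest scores."""
    return np.argsort(-np.asarray(scores), kind="stable")[:k].tolist()


mean = np.array([0.0, 1.0])  # candidate 1 has the better prediction...
std = np.array([2.0, 0.0])   # ...but candidate 0 is far more uncertain
print(top_k(ucb(mean, std, kappa=0.0), 1))  # pure exploitation: prints [1]
print(top_k(ucb(mean, std, kappa=1.0), 1))  # uncertainty now wins: prints [0]
```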


## Expected Hypervolume Improvement (EHVI)

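Expected Hypervolume Improvement (EHVI) is a multi-objective Bayesian optimization acquisition function. The hypervolume is the volume of objective space dominated by the current Pareto front and bounded by a reference point, and EHVI selects the candidates expected to increase it the most. This makes it well suited to trade-offs between conflicting objectives, such as `y1` and `y2` here, whose optima lie at different points ((0, 0) and (5, 5), respectively). The `standardization=True` option presumably standardizes the objectives before modeling. EHVI is also considerably more expensive than the single-objective criteria: `%%time` reports about 23 s for this cell.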

In [9]:
%%time
new_index = spacier.BO(df_X, df, df_pool_X, "sklearn_GP", ["y1", "y2"], standardization=True).EHVI(10)
print(new_index)

Number of training data :  10
Number of candidates :  9990
[1618, 1669, 1821, 9453, 5018, 1661, 665, 7744, 2374, 6616]
CPU times: total: 22.8 s
Wall time: 22.9 s
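To make the hypervolume concrete, below is a minimal two-objective sketch: an exact 2-D hypervolume for maximization and a Monte Carlo estimate of EHVI. It assumes independent Gaussian predictions for the two objectives and a user-chosen reference point. This is not spacier's implementation, whose reference point and estimation method are not shown here, and the function names are hypothetical.

```python
import numpy as np


def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives maximized), bounded below
    by the reference point `ref`. Dominated points contribute nothing."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    pts = pts[(pts[:, 0] > ref[0]) & (pts[:, 1] > ref[1])]
    # Sweep from the largest first objective downward, adding horizontal strips.
    pts = pts[np.argsort(-pts[:, 0], kind="stable")]
    area, top = 0.0, ref[1]
    for y1, y2 in pts:
        if y2 > top:
            area += (y1 - ref[0]) * (y2 - top)
            top = y2
    return area


def ehvi_monte_carlo(mean, std, observed, ref, n_samples=1000, seed=0):
    """Monte Carlo EHVI per candidate, assuming independent Gaussian
    predictions for the two objectives (mean, std: shape (n_cand, 2))."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    observed = np.asarray(observed, dtype=float).reshape(-1, 2)
    base = hypervolume_2d(observed, ref)
    scores = np.empty(len(mean))
    for i in range(len(mean)):
        # Sample plausible outcomes and average the hypervolume gain.
        samples = mean[i] + std[i] * rng.standard_normal((n_samples, 2))
        gains = [hypervolume_2d(np.vstack([observed, s]), ref) - base for s in samples]
        scores[i] = np.mean(gains)
    return scores
```

The reference point should be worse than every objective value of interest, for example slightly below the observed minimum of each objective. Even this toy version costs candidates × samples hypervolume evaluations, which hints at why EHVI is slower than the single-objective criteria. Practical implementations typically replace the sampling loop with exact formulas.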
