# Advanced Tuning: Bayesian Optimization with Gaussian Processes (tune_bayes)

This notebook demonstrates **Bayesian optimization** for hyperparameter tuning using Gaussian Process surrogates.

## Key Benefits:
- **Sample-efficient**: Typically needs far fewer evaluations than grid search, so it suits expensive model fits
- **Probabilistic surrogate**: Models uncertainty in performance
- **Acquisition functions**: Balance exploration against exploitation
- **Sequential**: Each evaluation informs the next choice

## Bayesian Optimization Algorithm:
1. **Initial phase**: Random sampling (build surrogate)
2. **Fit GP**: Model performance surface with uncertainty
3. **Acquisition**: Find point maximizing expected improvement/utility
4. **Evaluate**: Test the proposed configuration
5. **Update GP**: Incorporate new observation
6. **Repeat** until budget exhausted
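
The six steps above can be sketched as a small loop. This is an illustrative toy, not the `tune_bayes` implementation: the 1D `objective`, the fixed RBF length scale, the unit prior variance, and the candidate grid are all assumptions chosen to keep the example short.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """GP posterior mean and std at x_new (constant mean, unit prior variance)."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    y_mean = y_obs.mean()
    mu = y_mean + K_s.T @ np.linalg.solve(K, y_obs - y_mean)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for minimization: expected amount by which we beat best - xi."""
    imp = best - xi - mu
    z = imp / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

def objective(x):
    """Hypothetical stand-in for an expensive cross-validated RMSE."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(15.0 * x)

def bayes_opt(n_initial=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=n_initial)            # 1. initial random phase
    y_obs = objective(x_obs)
    for _ in range(n_iter):                                   # 6. repeat until budget is spent
        mu, sigma = gp_posterior(x_obs, y_obs, candidates)    # 2. fit GP
        ei = expected_improvement(mu, sigma, y_obs.min())     # 3. acquisition
        x_next = candidates[np.argmax(ei)]
        y_next = objective(x_next)                            # 4. evaluate
        x_obs = np.append(x_obs, x_next)                      # 5. update observations
        y_obs = np.append(y_obs, y_next)
    return x_obs, y_obs

x_obs, y_obs = bayes_opt()
print(f"Best x: {x_obs[np.argmin(y_obs)]:.3f}, best value: {y_obs.min():.4f}")
```

A real implementation would also fit the kernel hyperparameters and optimize the acquisition function over a continuous space rather than a fixed candidate grid.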

## Acquisition Functions:
- **Expected Improvement (EI)**: How much better than current best?
- **Probability of Improvement (PI)**: Chance of beating current best?
- **Upper Confidence Bound (UCB)**: Optimistic estimate with exploration bonus
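
For reference, the three acquisition functions can be written out in a few lines. The sketch below uses the minimization convention (lower RMSE is better), so the "upper" confidence bound becomes a lower bound; `mu` and `sigma` stand for the GP's predicted mean and standard deviation at a candidate point. The function names and the two example candidates are illustrative assumptions, not part of `py_tune`.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected amount by which the candidate beats (best - xi)."""
    if sigma <= 0.0:
        return max(best - xi - mu, 0.0)
    z = (best - xi - mu) / sigma
    return (best - xi - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def probability_of_improvement(mu, sigma, best, xi=0.01):
    """Probability that the candidate beats (best - xi)."""
    if sigma <= 0.0:
        return 1.0 if mu < best - xi else 0.0
    return norm_cdf((best - xi - mu) / sigma)

def lower_confidence_bound(mu, sigma, kappa=2.576):
    """Optimistic estimate for minimization; smaller means more promising."""
    return mu - kappa * sigma

# Two hypothetical candidates, with a current best RMSE of 1.0:
best = 1.0
safe = {"mu": 0.95, "sigma": 0.01}    # slightly better, almost certain
risky = {"mu": 1.05, "sigma": 0.20}   # worse on average, very uncertain

for name, c in (("safe", safe), ("risky", risky)):
    print(
        name,
        f"EI={expected_improvement(c['mu'], c['sigma'], best):.4f}",
        f"PI={probability_of_improvement(c['mu'], c['sigma'], best):.4f}",
        f"LCB={lower_confidence_bound(c['mu'], c['sigma']):.4f}",
    )
```

In this example PI ranks the safe candidate first, while EI and the confidence bound both favor the uncertain one, which illustrates why PI is usually described as the most exploitative of the three.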

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import time
warnings.filterwarnings('ignore')

# py-tidymodels imports
from py_workflows import workflow
from py_parsnip import linear_reg, boost_tree, svm_rbf
from py_rsample import vfold_cv
from py_yardstick import metric_set, rmse, mae, r_squared
from py_tune import (
    tune, grid_regular, tune_grid,
    tune_bayes, control_bayes
)

print("✓ All imports successful")

## Load and Prepare Data

In [None]:
# Load data
df = pd.read_csv('../_md/__data/preem.csv')
df['date'] = pd.to_datetime(df['date'])

# Report the date range before the column is dropped
print(f"Date range: {df['date'].min()} to {df['date'].max()}")

df = df.drop(columns=['date'])  # Drop date to avoid patsy categorical issues
print(f"Dataset shape: {df.shape}")

df.head()

In [None]:
# Define formula
FORMULA = "target ~ totaltar + mean_med_diesel_crack_input1_trade_month_lag2 + mean_nwe_hsfo_crack_trade_month_lag1 + mean_nwe_lsfo_crack_trade_month"

print(f"Formula: {FORMULA}")

## 1. Elastic Net with Expected Improvement

### 1.1 Setup

In [None]:
# Elastic Net workflow
wf_elasticnet = (
    workflow()
    .add_formula(FORMULA)
    .add_model(
        linear_reg(
            penalty=tune(),
            mixture=tune()
        )
    )
)

# Define parameter space
param_info = {
    'penalty': {'range': (0.001, 10.0), 'trans': 'log'},
    'mixture': {'range': (0.0, 1.0)}
}

print("Parameter space:")
for param, info in param_info.items():
    trans = info.get('trans', 'none')
    print(f"  {param}: {info['range']} (trans: {trans})")
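
How `py_tune` applies `'trans': 'log'` internally is not shown in this notebook. A plausible reading, assumed here, is that the parameter is searched uniformly on the log scale and mapped back afterwards. The helper `sample_param` below is hypothetical and only illustrates why that matters for a penalty spanning four orders of magnitude.

```python
import math
import random

def sample_param(spec, rng):
    """Draw one value from a spec shaped like the param_info entries (hypothetical helper)."""
    low, high = spec['range']
    if spec.get('trans') == 'log':
        # Uniform in log space, then mapped back to the original scale
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    return rng.uniform(low, high)

param_info = {
    'penalty': {'range': (0.001, 10.0), 'trans': 'log'},
    'mixture': {'range': (0.0, 1.0)}
}

rng = random.Random(42)
log_draws = [sample_param(param_info['penalty'], rng) for _ in range(1000)]
lin_draws = [rng.uniform(0.001, 10.0) for _ in range(1000)]

share_log = sum(d < 0.1 for d in log_draws) / len(log_draws)
share_lin = sum(d < 0.1 for d in lin_draws) / len(lin_draws)
print(f"Share of draws below 0.1: log scale {share_log:.2f}, linear scale {share_lin:.2f}")
```

On the log scale roughly half of the draws fall below 0.1 (two of the four decades), whereas linear sampling almost never visits that region.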

In [None]:
# Create CV folds
cv_folds = vfold_cv(df, v=5)
print(f"Created {len(cv_folds)} CV folds")

### 1.2 Baseline: Grid Search

In [None]:
# Grid search for comparison
grid = grid_regular(param_info, levels=10)  # 100 combinations

print(f"Running grid search with {len(grid)} combinations...")
start_time = time.time()

grid_results = tune_grid(
    wf_elasticnet,
    resamples=cv_folds,
    grid=grid,
    metrics=metric_set(rmse, mae)
)

grid_time = time.time() - start_time
grid_best_rmse = grid_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]

print(f"✓ Grid search: {grid_time:.1f}s, best RMSE: {grid_best_rmse:.4f}")

### 1.3 Bayesian Optimization with Expected Improvement

In [None]:
# Configure Bayesian optimization
bayes_ctrl = control_bayes(
    n_initial=5,           # Random initial points to build GP
    n_iter=20,             # Bayesian optimization iterations
    acquisition='ei',      # Expected Improvement
    kappa=2.576,           # Exploration parameter (for UCB)
    xi=0.01,               # Exploration parameter (for EI/PI)
    verbose=True
)

print("Bayesian Optimization Configuration:")
print(f"  Initial random points: {bayes_ctrl.n_initial}")
print(f"  Bayesian iterations: {bayes_ctrl.n_iter}")
print(f"  Total evaluations: {bayes_ctrl.n_initial + bayes_ctrl.n_iter}")
print(f"  Acquisition function: {bayes_ctrl.acquisition}")
print(f"  Exploration (xi): {bayes_ctrl.xi}")

In [None]:
# Run Bayesian optimization
print("\nRunning Bayesian optimization with Expected Improvement...\n")
start_time = time.time()

bayes_results = tune_bayes(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    metrics=metric_set(rmse, mae),
    control=bayes_ctrl
)

bayes_time = time.time() - start_time
print(f"\n✓ Bayesian optimization complete in {bayes_time:.1f} seconds")

In [None]:
# Compare results
bayes_best_rmse = bayes_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]
n_bayes_evals = len(bayes_results.grid)
n_grid_evals = len(grid)

print("=" * 60)
print("COMPARISON: Grid Search vs Bayesian Optimization")
print("=" * 60)
print(f"\nGrid search:")
print(f"  Evaluations: {n_grid_evals}")
print(f"  Time: {grid_time:.1f}s")
print(f"  Best RMSE: {grid_best_rmse:.4f}")
print(f"\nBayesian Optimization:")
print(f"  Evaluations: {n_bayes_evals}")
print(f"  Time: {bayes_time:.1f}s")
print(f"  Best RMSE: {bayes_best_rmse:.4f}")
print(f"\nEfficiency:")
print(f"  Speedup: {grid_time / bayes_time:.2f}x")
print(f"  Evaluation reduction: {(1 - n_bayes_evals/n_grid_evals)*100:.1f}%")
print(f"  Performance ratio: {bayes_best_rmse / grid_best_rmse:.4f}")

if bayes_best_rmse <= grid_best_rmse * 1.01:
    print(f"\n✓ Bayesian optimization matched grid search (within 1%) with {n_grid_evals / n_bayes_evals:.0f}x fewer evaluations!")

### 1.4 Visualize Bayesian Optimization Progress

In [None]:
# Extract search trajectory
bayes_metrics = bayes_results.metrics
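rmse_values = bayes_metrics[bayes_metrics['metric'] == 'rmse'].groupby('.config')['value'].mean()
# groupby sorts labels as strings; re-sort by the numeric suffix so rows follow evaluation order
rmse_values = rmse_values.iloc[np.argsort([int(c.split('_')[1]) for c in rmse_values.index])]
configs = [int(c.split('_')[1]) for c in rmse_values.index]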

# Separate initial random phase from Bayesian phase
n_initial = bayes_ctrl.n_initial
initial_idx = configs[:n_initial]
bayes_idx = configs[n_initial:]
initial_rmse = rmse_values.iloc[:n_initial].values
bayes_rmse = rmse_values.iloc[n_initial:].values

# Plot convergence
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# RMSE over iterations (split random vs Bayesian)
ax1.scatter(initial_idx, initial_rmse, c='gray', s=80, alpha=0.6, label='Random initial')
ax1.scatter(bayes_idx, bayes_rmse, c='blue', s=80, alpha=0.6, label='Bayesian')
ax1.plot(configs, np.minimum.accumulate(rmse_values.values), 'r-', linewidth=2, label='Best so far')
ax1.axhline(grid_best_rmse, color='green', linestyle='--', label='Grid search best')
ax1.axvline(n_initial, color='gray', linestyle=':', alpha=0.5, label='End of random phase')
ax1.set_xlabel('Iteration')
ax1.set_ylabel('RMSE')
ax1.set_title('Bayesian Optimization Convergence')
ax1.legend(loc='upper right')
ax1.grid(True, alpha=0.3)

# Parameter space exploration
penalty_values = bayes_results.grid['penalty'].values
mixture_values = bayes_results.grid['mixture'].values
scatter = ax2.scatter(penalty_values[:n_initial], mixture_values[:n_initial], 
                     c='gray', s=100, alpha=0.6, edgecolors='black', linewidth=0.5, label='Random')
scatter = ax2.scatter(penalty_values[n_initial:], mixture_values[n_initial:], 
                     c=bayes_rmse, cmap='viridis', s=100, edgecolors='black', linewidth=0.5, label='Bayesian')
ax2.set_xscale('log')
ax2.set_xlabel('Penalty (log scale)')
ax2.set_ylabel('Mixture')
ax2.set_title('Parameter Space Exploration')
ax2.legend()
plt.colorbar(scatter, ax=ax2, label='RMSE')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✓ Left: Sequential improvement over iterations")
print("✓ Right: GP guides search to promising regions (dark = better)")
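
The red "best so far" line is a running minimum over the RMSE trajectory. As a minimal standalone illustration, the same quantity can be computed with the standard library; the RMSE values below are made up for the example and are not results from this notebook.

```python
from itertools import accumulate

# Hypothetical RMSE per evaluation, in the order the configurations were tried
rmse_by_iteration = [0.52, 0.61, 0.48, 0.50, 0.47, 0.49, 0.45]

# Running minimum: the best RMSE seen up to and including each evaluation
best_so_far = list(accumulate(rmse_by_iteration, min))

for i, (value, best) in enumerate(zip(rmse_by_iteration, best_so_far), start=1):
    print(f"eval {i}: rmse={value:.2f}, best so far={best:.2f}")
```

The curve can only stay flat or drop, so long flat stretches indicate evaluations that explored without improving on the incumbent.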

## 2. Comparing Acquisition Functions

### 2.1 Probability of Improvement (PI)

In [None]:
# Probability of Improvement
pi_ctrl = control_bayes(
    n_initial=5,
    n_iter=20,
    acquisition='pi',  # Probability of Improvement
    xi=0.01,
    verbose=False
)

print("Running Bayesian optimization with Probability of Improvement...")
start_time = time.time()

pi_results = tune_bayes(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    metrics=metric_set(rmse),
    control=pi_ctrl
)

pi_time = time.time() - start_time
pi_best_rmse = pi_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]

print(f"✓ PI: {pi_time:.1f}s, best RMSE: {pi_best_rmse:.4f}")

### 2.2 Upper Confidence Bound (UCB)

In [None]:
# Upper Confidence Bound
ucb_ctrl = control_bayes(
    n_initial=5,
    n_iter=20,
    acquisition='ucb',  # Upper Confidence Bound
    kappa=2.576,        # 99% confidence interval
    verbose=False
)

print("Running Bayesian optimization with Upper Confidence Bound...")
start_time = time.time()

ucb_results = tune_bayes(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    metrics=metric_set(rmse),
    control=ucb_ctrl
)

ucb_time = time.time() - start_time
ucb_best_rmse = ucb_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]

print(f"✓ UCB: {ucb_time:.1f}s, best RMSE: {ucb_best_rmse:.4f}")

### 2.3 Compare Acquisition Functions

In [None]:
print("=" * 60)
print("ACQUISITION FUNCTION COMPARISON")
print("=" * 60)
print(f"\nExpected Improvement (EI):     RMSE = {bayes_best_rmse:.4f}, {bayes_time:.1f}s")
print(f"Probability of Improvement (PI): RMSE = {pi_best_rmse:.4f}, {pi_time:.1f}s")
print(f"Upper Confidence Bound (UCB):   RMSE = {ucb_best_rmse:.4f}, {ucb_time:.1f}s")

print("\nCharacteristics:")
print("  EI: Balanced exploration/exploitation (default choice)")
print("  PI: More exploitative (prefers immediate improvement)")
print("  UCB: More exploratory (optimistic exploration bonus)")

## 3. XGBoost with Bayesian Optimization

Test on a larger, four-dimensional parameter space.

In [None]:
# XGBoost workflow (4D parameter space)
wf_xgb = (
    workflow()
    .add_formula(FORMULA)
    .add_model(
        boost_tree(
            trees=tune(),
            tree_depth=tune(),
            learn_rate=tune(),
            min_n=tune()
        ).set_mode("regression").set_engine("xgboost")
    )
)

# 4D parameter space
xgb_param_info = {
    'trees': {'range': (50, 300), 'type': 'int'},
    'tree_depth': {'range': (3, 10), 'type': 'int'},
    'learn_rate': {'range': (0.001, 0.3), 'trans': 'log'},
    'min_n': {'range': (2, 40), 'type': 'int'}
}

print("XGBoost parameter space (4D):")
for param, info in xgb_param_info.items():
    print(f"  {param}: {info['range']}")

In [None]:
# Run Bayesian optimization on XGBoost
xgb_bayes_ctrl = control_bayes(
    n_initial=10,  # More initial points for 4D space
    n_iter=30,     # More iterations
    acquisition='ei',
    verbose=True
)

print("\nRunning Bayesian optimization on XGBoost (4D)...\n")
start_time = time.time()

xgb_bayes_results = tune_bayes(
    wf_xgb,
    resamples=cv_folds,
    param_info=xgb_param_info,
    metrics=metric_set(rmse, mae, r_squared),
    control=xgb_bayes_ctrl
)

xgb_bayes_time = time.time() - start_time
print(f"\n✓ XGBoost Bayesian optimization complete in {xgb_bayes_time:.1f} seconds")

In [None]:
# Show best XGBoost configurations
print("Top 5 XGBoost configurations:")
xgb_bayes_results.show_best(metric="rmse", n=5, maximize=False)

In [None]:
# Compare with grid search equivalence
n_xgb_evals = len(xgb_bayes_results.grid)
equivalent_grid_size = 5 ** 4  # 5 levels for 4 parameters

print(f"\nXGBoost efficiency:")
print(f"  Bayesian Optimization: {n_xgb_evals} evaluations")
print(f"  Equivalent grid (5^4): {equivalent_grid_size} evaluations")
print(f"  Reduction: {(1 - n_xgb_evals/equivalent_grid_size)*100:.1f}%")
print(f"\n✓ The grid grows exponentially with dimension; the Bayesian budget does not have to")
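
The comparison above rests on simple arithmetic: a regular grid with `levels` values per parameter needs `levels ** n_params` evaluations. The short sketch below tabulates this against a fixed Bayesian budget of 40 evaluations (10 initial + 30 iterations, as configured above); `grid_size` is an illustrative helper, not a `py_tune` function.

```python
def grid_size(levels, n_params):
    """Number of combinations in a regular grid."""
    return levels ** n_params

bayes_budget = 10 + 30  # n_initial + n_iter from the XGBoost run above

for n_params in (2, 4, 6):
    size = grid_size(5, n_params)
    reduction = (1 - bayes_budget / size) * 100
    print(f"{n_params} params: grid = {size:>6} evals, reduction vs {bayes_budget} evals = {reduction:.1f}%")
```

With only two parameters a 5-level grid (25 points) is smaller than this budget, so the reduction is negative; the advantage appears as dimensions are added.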

## 4. Advanced: Custom Exploration Parameters

### 4.1 Effect of xi (EI exploration)

In [None]:
# Test different xi values
xi_values = [0.001, 0.01, 0.1, 0.5]
xi_results = {}

print("Testing different xi values (EI exploration parameter)...\n")

for xi in xi_values:
    print(f"xi = {xi}...")
    ctrl = control_bayes(n_initial=5, n_iter=15, acquisition='ei', xi=xi, verbose=False)
    
    results = tune_bayes(
        wf_elasticnet,
        resamples=cv_folds,
        param_info=param_info,
        metrics=metric_set(rmse),
        control=ctrl
    )
    
    best_rmse = results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]
    xi_results[xi] = best_rmse
    print(f"  → Best RMSE: {best_rmse:.4f}\n")

print("\nxi comparison:")
print("=" * 50)
for xi, xi_rmse in xi_results.items():  # avoid shadowing the imported rmse metric
    print(f"xi = {xi:6.3f}:  RMSE = {xi_rmse:.4f}")

print("\n✓ Lower xi = more exploitation (greedy)")
print("✓ Higher xi = more exploration (optimistic)")
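
To see why a larger `xi` pushes EI toward exploration, it helps to score two fixed candidates under different `xi` values. The sketch below uses a standalone EI for minimization and two hypothetical candidates; none of the numbers come from this notebook's data.

```python
import math

def expected_improvement(mu, sigma, best, xi):
    """EI for minimization with exploration margin xi (sigma must be > 0)."""
    z = (best - xi - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - xi - mu) * cdf + sigma * pdf

def pick_by_ei(candidates, best, xi):
    """Return the name of the candidate with the highest EI."""
    return max(candidates, key=lambda name: expected_improvement(*candidates[name], best, xi))

# Hypothetical candidates as (predicted mean, predicted std); current best RMSE is 1.0
candidates = {
    "near_incumbent": (0.90, 0.02),  # modest, near-certain gain
    "unexplored": (1.00, 0.15),      # no expected gain, high uncertainty
}

for xi in (0.001, 0.01, 0.1, 0.5):
    print(f"xi = {xi}: EI prefers {pick_by_ei(candidates, best=1.0, xi=xi)}")
```

A larger `xi` demands a bigger improvement before it counts, which discounts small, safe gains and leaves uncertain regions as the only way to clear the bar.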

### 4.2 Effect of kappa (UCB exploration)

In [None]:
# Test different kappa values
kappa_values = [1.0, 2.576, 5.0]  # 68%, 99%, aggressive
kappa_results = {}

print("Testing different kappa values (UCB exploration parameter)...\n")

for kappa in kappa_values:
    print(f"kappa = {kappa}...")
    ctrl = control_bayes(n_initial=5, n_iter=15, acquisition='ucb', kappa=kappa, verbose=False)
    
    results = tune_bayes(
        wf_elasticnet,
        resamples=cv_folds,
        param_info=param_info,
        metrics=metric_set(rmse),
        control=ctrl
    )
    
    best_rmse = results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]
    kappa_results[kappa] = best_rmse
    print(f"  → Best RMSE: {best_rmse:.4f}\n")

print("\nkappa comparison:")
print("=" * 50)
for kappa, kappa_rmse in kappa_results.items():  # avoid shadowing the imported rmse metric
    print(f"kappa = {kappa:5.3f}:  RMSE = {kappa_rmse:.4f}")

print("\n✓ Lower kappa = more exploitation (mean-focused)")
print("✓ Higher kappa = more exploration (uncertainty bonus)")
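
The effect of `kappa` can be illustrated the same way. For a minimization metric such as RMSE, the confidence-bound score is assumed here to be `mu - kappa * sigma` (lower is more promising); whether `py_tune` uses exactly this form is not shown in the notebook. The two candidates are hypothetical.

```python
def confidence_bound(mu, sigma, kappa):
    """Optimistic RMSE estimate for minimization: mean minus kappa standard deviations."""
    return mu - kappa * sigma

def pick_by_bound(candidates, kappa):
    """Return the name of the candidate with the lowest optimistic estimate."""
    return min(candidates, key=lambda name: confidence_bound(*candidates[name], kappa))

# Hypothetical candidates as (predicted mean, predicted std)
candidates = {
    "well_known": (0.80, 0.05),  # good mean, little uncertainty
    "uncertain": (1.00, 0.15),   # worse mean, large uncertainty
}

for kappa in (1.0, 2.576, 5.0):
    scores = {name: confidence_bound(mu, sigma, kappa) for name, (mu, sigma) in candidates.items()}
    print(f"kappa = {kappa}: picks {pick_by_bound(candidates, kappa)} "
          f"(scores: {', '.join(f'{n}={s:.3f}' for n, s in scores.items())})")
```

In this example the choice flips once `kappa` exceeds 2.0, the point where the uncertainty bonus outweighs the 0.2 gap between the two means.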

## 5. Summary and Best Practices

### When to use Bayesian Optimization:
- ✓ **Expensive models** (slow to train, e.g., deep learning)
- ✓ **Limited budget** (can only afford 20-50 evaluations)
- ✓ **Continuous parameters** (the GP surrogate models smooth, continuous spaces well)
- ✓ **Black-box optimization** (no gradient information)
- ✓ **Sequential tuning** (evaluations cannot be run in parallel anyway)

### Acquisition Function Selection:

**Expected Improvement (EI)**:
- Default choice for most cases
- Balanced exploration/exploitation
- Robust across different problems

**Probability of Improvement (PI)**:
- More exploitative
- Good when you have a target performance
- Faster convergence, may miss global optimum

**Upper Confidence Bound (UCB)**:
- More exploratory
- Good for exploration-heavy tasks
- Principled uncertainty-based exploration

### Configuration Guidelines:

**n_initial** (random phase):
- 2D: 5-10 points
- 3-4D: 10-20 points
- 5+D: 20-30 points
- Rule of thumb: 2-5 × number of dimensions

**n_iter** (Bayesian phase):
- 20-50: Quick search
- 50-100: Standard
- 100+: Thorough search

**xi** (for EI/PI):
- 0.001: Greedy (exploitation)
- 0.01: Balanced (default)
- 0.1-0.5: Exploratory

**kappa** (for UCB):
- 1.0: Conservative (68% CI)
- 2.576: Standard (99% CI)
- 5.0+: Aggressive exploration
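
The confidence levels quoted for `kappa` come from the standard normal distribution: `kappa` is the number of standard deviations, and the level is the two-sided coverage of that interval. A quick check with the standard library (the helper names are illustrative):

```python
from statistics import NormalDist

def kappa_to_coverage(kappa):
    """Two-sided coverage of a +/- kappa standard deviation interval."""
    nd = NormalDist()
    return nd.cdf(kappa) - nd.cdf(-kappa)

def coverage_to_kappa(coverage):
    """Number of standard deviations needed for a given two-sided coverage."""
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)

for kappa in (1.0, 2.576, 5.0):
    print(f"kappa = {kappa}: coverage = {kappa_to_coverage(kappa) * 100:.4f}%")

print(f"95% coverage corresponds to kappa = {coverage_to_kappa(0.95):.3f}")
```

This also gives a way to pick intermediate settings, for example a `kappa` near 1.96 for a 95% interval.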

### Expected Performance:
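- **Sample-efficient**: usually needs fewer evaluations than grid or random search
- **Often 5-10x fewer evaluations** than grid search (4x in the Elastic Net example above: 25 vs 100 planned evaluations)
- **Usually finds a near-optimal configuration**, though the global optimum is not guaranteed
- **Overhead**: GP fitting (negligible for expensive models)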

In [None]:
# Final summary
print("\n" + "=" * 70)
print("FINAL SUMMARY: tune_bayes()")
print("=" * 70)
print(f"\nDataset: {df.shape[0]} observations")
print(f"\nElastic Net (2D space):")
print(f"  Grid search: {n_grid_evals} evals, {grid_time:.1f}s, RMSE={grid_best_rmse:.4f}")
print(f"  Bayesian Optimization: {n_bayes_evals} evals, {bayes_time:.1f}s, RMSE={bayes_best_rmse:.4f}")
print(f"  Speedup: {grid_time / bayes_time:.2f}x")
print(f"  Efficiency: {(1 - n_bayes_evals/n_grid_evals)*100:.0f}% fewer evaluations")
print(f"\nXGBoost (4D space):")
print(f"  Bayesian Optimization: {n_xgb_evals} evals vs {equivalent_grid_size} grid evals")
print(f"  Reduction: {(1 - n_xgb_evals/equivalent_grid_size)*100:.0f}%")
print(f"\nKey advantages:")
print("  ✓ Most sample-efficient hyperparameter optimization")
print("  ✓ Gaussian Process models performance surface + uncertainty")
print("  ✓ Sequential: each evaluation informs next choice")
print("  ✓ Three acquisition functions (EI, PI, UCB)")
print("  ✓ Scales to moderate dimensions (4-10 parameters)")
print("  ✓ Ideal for expensive models (deep learning, large ensembles)")
print("\n✓ State-of-the-art hyperparameter optimization")
print("✓ Industry standard for expensive black-box optimization")