# Advanced Tuning: Simulated Annealing (tune_sim_anneal)

This notebook demonstrates **simulated annealing** for hyperparameter optimization.

## Key Benefits:
- **Global optimization**: Escapes local optima via probabilistic acceptance
- **Efficient exploration**: Focuses on promising regions over time
- **Continuous spaces**: Excellent for continuous hyperparameters
- **Fewer evaluations**: More efficient than grid search for large spaces

## Simulated Annealing Algorithm:
1. Start with a random (or provided) initial configuration
2. Generate a neighbor by perturbing the current parameters
3. Always accept better neighbors
4. Accept worse neighbors with probability exp(-Δ/T), where Δ > 0 is the increase in the loss (e.g. RMSE)
5. Decrease the temperature T over iterations (cooling)
6. Stop after the maximum number of iterations, or after a set number of iterations without improvement
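
The six steps above can be sketched on a toy one-dimensional problem. This is a minimal illustration, not the internals of `tune_sim_anneal`: the names `simulated_annealing`, `bumpy`, and `step` are hypothetical, and the Gaussian perturbation and exponential cooling are assumed choices.

```python
import math
import random

def simulated_annealing(loss, x0, lower, upper, *, initial_temp=2.0,
                        cooling_rate=0.95, max_iter=200, no_improve=50,
                        step=0.5, seed=0):
    """Minimize `loss` on [lower, upper] with a basic annealing loop."""
    rng = random.Random(seed)
    current, current_loss = x0, loss(x0)          # Step 1: initial configuration
    best, best_loss = current, current_loss
    temp = initial_temp
    stale = 0  # iterations since the best loss last improved

    for _ in range(max_iter):
        # Step 2: perturb the current point and clip it to the bounds
        candidate = min(upper, max(lower, current + rng.gauss(0.0, step)))
        candidate_loss = loss(candidate)
        delta = candidate_loss - current_loss  # > 0 means the candidate is worse

        # Steps 3-4: always accept improvements; accept worse moves
        # with probability exp(-delta / T)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = candidate, candidate_loss

        if current_loss < best_loss:
            best, best_loss = current, current_loss
            stale = 0
        else:
            stale += 1

        temp *= cooling_rate       # Step 5: exponential cooling (assumed schedule)

        if stale >= no_improve:    # Step 6: early stopping
            break

    return best, best_loss


def bumpy(x):
    """Toy loss with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


start = 4.0
best_x, best_loss = simulated_annealing(bumpy, x0=start, lower=-5.0, upper=5.0)
print(f"start loss={bumpy(start):.3f}, best x={best_x:.3f}, best loss={best_loss:.3f}")
```

The best point is tracked separately from the current point, so an accepted worse move never loses the best configuration seen so far.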

## Temperature & Acceptance:
- **High temperature**: Accept most moves (exploration)
- **Low temperature**: Accept almost only improvements (exploitation)
- **Cooling schedule**: Controls exploration → exploitation transition
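
To make the effect of temperature concrete, the short sketch below evaluates the acceptance probability exp(-Δ/T) for one fixed loss increase at three temperatures. The helper name `acceptance_probability` and the example values are illustrative assumptions.

```python
import math

def acceptance_probability(delta, temp):
    """Probability of accepting a move that changes the loss by `delta`.

    Improvements (delta <= 0) are always accepted; worse moves are
    accepted with probability exp(-delta / temp).
    """
    return 1.0 if delta <= 0 else math.exp(-delta / temp)

delta = 0.1  # candidate loss is worse by 0.1
for temp in (2.0, 0.5, 0.05):
    print(f"T={temp}: accept with p={acceptance_probability(delta, temp):.3f}")
```

For the same loss increase of 0.1, the probability falls from about 0.95 at T=2.0 to about 0.82 at T=0.5 and about 0.14 at T=0.05.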

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import time
warnings.filterwarnings('ignore')

# py-tidymodels imports
from py_workflows import workflow
from py_parsnip import linear_reg, boost_tree, svm_rbf
from py_rsample import vfold_cv
from py_yardstick import metric_set, rmse, mae, r_squared
from py_tune import (
    tune, grid_regular, tune_grid,
    tune_sim_anneal, control_sim_anneal
)

print("✓ All imports successful")

## Load and Prepare Data

In [None]:
# Load data
df = pd.read_csv('../_md/__data/preem.csv')
# Convert and save date range before dropping
df['date'] = pd.to_datetime(df['date'])
date_min, date_max = df['date'].min(), df['date'].max()
df = df.drop(columns=['date'])  # Drop date to avoid patsy categorical issues

print(f"Dataset shape: {df.shape}")
print(f"Date range: {date_min} to {date_max}")

df.head()

In [None]:
# Define formula
FORMULA = "target ~ totaltar + mean_med_diesel_crack_input1_trade_month_lag2 + mean_nwe_hsfo_crack_trade_month_lag1 + mean_nwe_lsfo_crack_trade_month"

print(f"Formula: {FORMULA}")

## 1. Linear Regression with Regularization

### 1.1 Setup

In [None]:
# Elastic Net workflow (2D parameter space)
wf_elasticnet = (
    workflow()
    .add_formula(FORMULA)
    .add_model(
        linear_reg(
            penalty=tune(),   # L1/L2 regularization strength
            mixture=tune()    # L1/L2 mixture (0=ridge, 1=lasso)
        )
    )
)

# Define parameter space
param_info = {
    'penalty': {'range': (0.001, 10.0), 'trans': 'log'},
    'mixture': {'range': (0.0, 1.0)}  # No transformation
}

print("Parameter space:")
for param, info in param_info.items():
    print(f"  {param}: {info['range']} (trans: {info.get('trans', 'none')})")
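
One plausible way a neighbor could be generated from a parameter space like this is to perturb each parameter by noise scaled to its range, working on the log scale for parameters marked `trans: 'log'`. The sketch below is an assumption for illustration only: `propose_neighbor`, `demo_space`, and the `scale` argument are hypothetical and do not describe how `tune_sim_anneal` actually proposes candidates.

```python
import math
import random

def propose_neighbor(current, space, rng, scale=0.1):
    """Perturb each parameter by Gaussian noise sized relative to its range.

    Parameters marked trans='log' are perturbed on the log10 scale.
    """
    neighbor = {}
    for name, info in space.items():
        low, high = info['range']
        use_log = info.get('trans') == 'log'
        value = current[name]
        if use_log:
            low, high, value = math.log10(low), math.log10(high), math.log10(value)
        value += rng.gauss(0.0, scale * (high - low))
        value = min(high, max(low, value))  # clip to the bounds
        neighbor[name] = 10 ** value if use_log else value
    return neighbor

# Same ranges as the elastic net space above, under a separate (hypothetical) name
demo_space = {
    'penalty': {'range': (0.001, 10.0), 'trans': 'log'},
    'mixture': {'range': (0.0, 1.0)},
}
rng = random.Random(42)
point = {'penalty': 1.0, 'mixture': 0.5}
for _ in range(3):
    point = propose_neighbor(point, demo_space, rng)
    print(point)
```

Perturbing `penalty` on the log scale means a step from 0.001 to 0.01 is as likely as a step from 1 to 10, which matches how regularization strength usually affects a model.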

In [None]:
# Create CV folds
cv_folds = vfold_cv(df, v=5)
print(f"Created {len(cv_folds)} CV folds")

### 1.2 Baseline: Grid Search

In [None]:
# Grid search for comparison
grid = grid_regular(param_info, levels=10)  # 100 combinations

print(f"Running grid search with {len(grid)} combinations...")
start_time = time.time()

grid_results = tune_grid(
    wf_elasticnet,
    resamples=cv_folds,
    grid=grid,
    metrics=metric_set(rmse, mae)
)

grid_time = time.time() - start_time
grid_best_rmse = grid_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]

print(f"✓ Grid search: {grid_time:.1f}s, best RMSE: {grid_best_rmse:.4f}")

### 1.3 Simulated Annealing with Exponential Cooling

In [None]:
# Configure simulated annealing
ctrl_exp = control_sim_anneal(
    initial_temp=2.0,           # Starting temperature
    cooling_schedule='exponential',  # T = T0 * rate^iteration
    cooling_rate=0.95,          # Decay factor
    max_iter=50,                # Maximum iterations
    no_improve=15,              # Stop if no improvement for 15 iterations
    restart_after=None,         # No restarts
    verbose=True
)

print("Simulated Annealing Configuration:")
print(f"  Initial temperature: {ctrl_exp.initial_temp}")
print(f"  Cooling schedule: {ctrl_exp.cooling_schedule}")
print(f"  Cooling rate: {ctrl_exp.cooling_rate}")
print(f"  Max iterations: {ctrl_exp.max_iter}")
print(f"  Early stopping: {ctrl_exp.no_improve} iterations without improvement")

In [None]:
# Run simulated annealing
print("\nRunning simulated annealing with exponential cooling...\n")
start_time = time.time()

sa_exp_results = tune_sim_anneal(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    metrics=metric_set(rmse, mae),
    control=ctrl_exp
)

sa_exp_time = time.time() - start_time
print(f"\n✓ Simulated annealing complete in {sa_exp_time:.1f} seconds")

In [None]:
# Compare results
sa_best_rmse = sa_exp_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]
n_sa_evals = len(sa_exp_results.grid)
n_grid_evals = len(grid)

print("=" * 60)
print("COMPARISON: Grid Search vs Simulated Annealing")
print("=" * 60)
print(f"\nGrid search:")
print(f"  Evaluations: {n_grid_evals}")
print(f"  Time: {grid_time:.1f}s")
print(f"  Best RMSE: {grid_best_rmse:.4f}")
print(f"\nSimulated Annealing:")
print(f"  Evaluations: {n_sa_evals}")
print(f"  Time: {sa_exp_time:.1f}s")
print(f"  Best RMSE: {sa_best_rmse:.4f}")
print(f"\nEfficiency:")
print(f"  Speedup: {grid_time / sa_exp_time:.2f}x")
print(f"  Evaluation reduction: {(1 - n_sa_evals/n_grid_evals)*100:.1f}%")
print(f"  Performance ratio: {sa_best_rmse / grid_best_rmse:.4f}")

if sa_best_rmse <= grid_best_rmse * 1.01:
    print("\n✓ SA found comparable or better solution with fewer evaluations!")

### 1.4 Visualize Search Progress

In [None]:
# Extract search trajectory
sa_metrics = sa_exp_results.metrics
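rmse_values = sa_metrics[sa_metrics['metric'] == 'rmse'].groupby('.config')['value'].mean()
# groupby sorts '.config' labels as strings, so reorder by iteration number
order = np.argsort([int(c.split('_')[1]) for c in rmse_values.index])
rmse_values = rmse_values.iloc[order]
configs = [int(c.split('_')[1]) for c in rmse_values.index]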

# Plot convergence
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# RMSE over iterations
ax1.plot(configs, rmse_values.values, 'o-', alpha=0.6, label='Evaluated')
ax1.plot(configs, np.minimum.accumulate(rmse_values.values), 'r-', linewidth=2, label='Best so far')
ax1.axhline(grid_best_rmse, color='green', linestyle='--', label='Grid search best')
ax1.set_xlabel('Iteration')
ax1.set_ylabel('RMSE')
ax1.set_title('Simulated Annealing Convergence')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Parameter space exploration
penalty_values = sa_exp_results.grid['penalty'].values
mixture_values = sa_exp_results.grid['mixture'].values
scatter = ax2.scatter(penalty_values, mixture_values, c=rmse_values.values, 
                     cmap='viridis', s=100, edgecolors='black', linewidth=0.5)
ax2.set_xscale('log')
ax2.set_xlabel('Penalty (log scale)')
ax2.set_ylabel('Mixture')
ax2.set_title('Parameter Space Exploration')
plt.colorbar(scatter, ax=ax2, label='RMSE')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✓ Left plot: best-so-far RMSE (red) never increases; individual evaluations may get worse")
print("✓ Right plot: Parameter space exploration (darker = better)")

## 2. Different Cooling Schedules

### 2.1 Linear Cooling

In [None]:
# Linear cooling: T = T0 - rate * iteration
ctrl_linear = control_sim_anneal(
    initial_temp=2.0,
    cooling_schedule='linear',
    cooling_rate=0.04,  # Decrease by 0.04 each iteration (T reaches 0 at iteration 50)
    max_iter=50,
    no_improve=15,
    verbose=False
)

print("Running simulated annealing with linear cooling...")
start_time = time.time()

sa_linear_results = tune_sim_anneal(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    metrics=metric_set(rmse),
    control=ctrl_linear
)

sa_linear_time = time.time() - start_time
print(f"✓ Linear cooling: {sa_linear_time:.1f}s, {len(sa_linear_results.grid)} evaluations")

### 2.2 Logarithmic Cooling

In [None]:
# Logarithmic cooling: T = T0 / (1 + rate * log(1 + iteration))
ctrl_log = control_sim_anneal(
    initial_temp=2.0,
    cooling_schedule='logarithmic',
    cooling_rate=0.5,
    max_iter=50,
    no_improve=15,
    verbose=False
)

print("Running simulated annealing with logarithmic cooling...")
start_time = time.time()

sa_log_results = tune_sim_anneal(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    metrics=metric_set(rmse),
    control=ctrl_log
)

sa_log_time = time.time() - start_time
print(f"✓ Logarithmic cooling: {sa_log_time:.1f}s, {len(sa_log_results.grid)} evaluations")

### 2.3 Compare Cooling Schedules

In [None]:
# Extract best RMSE for each schedule
exp_best = sa_exp_results.select_best(metric="rmse", maximize=False)
linear_best = sa_linear_results.select_best(metric="rmse", maximize=False)
log_best = sa_log_results.select_best(metric="rmse", maximize=False)

exp_rmse = sa_exp_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]
linear_rmse = sa_linear_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]
log_rmse = sa_log_results.show_best(metric="rmse", n=1, maximize=False)['mean'].values[0]

print("=" * 60)
print("COOLING SCHEDULE COMPARISON")
print("=" * 60)
print(f"\nExponential:  RMSE = {exp_rmse:.4f}, {len(sa_exp_results.grid):3d} evals, {sa_exp_time:5.1f}s")
print(f"Linear:       RMSE = {linear_rmse:.4f}, {len(sa_linear_results.grid):3d} evals, {sa_linear_time:5.1f}s")
print(f"Logarithmic:  RMSE = {log_rmse:.4f}, {len(sa_log_results.grid):3d} evals, {sa_log_time:5.1f}s")

print("\nCharacteristics:")
print("  Exponential: Fast initial cooling, good for quick convergence")
print("  Linear: Steady cooling, balanced exploration/exploitation")
print("  Logarithmic: Slow cooling, thorough exploration (longer runs)")
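
The three schedules can be compared directly by evaluating the formulas quoted in the cell comments above with the same settings (`initial_temp=2.0`, rates 0.95, 0.04, and 0.5). The `temperature` helper is a hypothetical sketch, and the small positive floor is an assumption added so the linear schedule never reaches exactly zero.

```python
import math

def temperature(schedule, t0, rate, iteration):
    """Temperature at a given iteration for each cooling schedule."""
    if schedule == 'exponential':
        temp = t0 * rate ** iteration
    elif schedule == 'linear':
        temp = t0 - rate * iteration
    elif schedule == 'logarithmic':
        temp = t0 / (1.0 + rate * math.log(1.0 + iteration))
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return max(temp, 1e-8)  # assumed floor so the temperature stays positive

settings = {'exponential': 0.95, 'linear': 0.04, 'logarithmic': 0.5}
for it in (0, 10, 25, 49):
    row = ", ".join(
        f"{name}={temperature(name, 2.0, rate, it):.3f}"
        for name, rate in settings.items()
    )
    print(f"iter {it:2d}: {row}")
```

With these settings the linear schedule reaches zero at iteration 50, which is exactly `max_iter`. The logarithmic schedule is still well above the other two at the end of the run, which is why it suits longer searches.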

## 3. XGBoost with Simulated Annealing

Test on a higher-dimensional (4D) parameter space.

In [None]:
# XGBoost workflow (4D parameter space)
wf_xgb = (
    workflow()
    .add_formula(FORMULA)
    .add_model(
        boost_tree(
            trees=tune(),
            tree_depth=tune(),
            learn_rate=tune(),
            min_n=tune()
        ).set_mode("regression").set_engine("xgboost")
    )
)

# 4D parameter space
xgb_param_info = {
    'trees': {'range': (50, 300), 'type': 'int'},
    'tree_depth': {'range': (3, 10), 'type': 'int'},
    'learn_rate': {'range': (0.001, 0.3), 'trans': 'log'},
    'min_n': {'range': (2, 40), 'type': 'int'}
}

print("XGBoost parameter space (4D):")
for param, info in xgb_param_info.items():
    print(f"  {param}: {info['range']}")

In [None]:
# Run simulated annealing on XGBoost
xgb_ctrl = control_sim_anneal(
    initial_temp=1.5,
    cooling_schedule='exponential',
    cooling_rate=0.93,
    max_iter=40,
    no_improve=12,
    verbose=True
)

print("\nRunning simulated annealing on XGBoost (4D)...\n")
start_time = time.time()

xgb_sa_results = tune_sim_anneal(
    wf_xgb,
    resamples=cv_folds,
    param_info=xgb_param_info,
    metrics=metric_set(rmse, mae, r_squared),
    control=xgb_ctrl
)

xgb_sa_time = time.time() - start_time
print(f"\n✓ XGBoost SA complete in {xgb_sa_time:.1f} seconds")

In [None]:
# Show best XGBoost configurations
print("Top 5 XGBoost configurations:")
xgb_sa_results.show_best(metric="rmse", n=5, maximize=False)

In [None]:
# Compare with the size of an equivalent regular grid
n_xgb_evals = len(xgb_sa_results.grid)
equivalent_grid_size = 5 ** 4  # 5 levels for 4 parameters

print(f"\nXGBoost efficiency:")
print(f"  Simulated Annealing: {n_xgb_evals} evaluations")
print(f"  Equivalent grid (5^4): {equivalent_grid_size} evaluations")
print(f"  Reduction: {(1 - n_xgb_evals/equivalent_grid_size)*100:.1f}%")
print(f"\n✓ SA's evaluation budget stays fixed as the number of parameters grows")
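
The comparison above rests on simple arithmetic: a regular grid with a fixed number of levels grows exponentially with the number of tuned parameters, while the annealing budget is capped by `max_iter`. A small sketch, with `grid_size` as a hypothetical helper:

```python
def grid_size(levels, n_params):
    """Number of combinations in a regular grid with `levels` values per parameter."""
    return levels ** n_params

sa_budget = 40  # max_iter used for the XGBoost run
for n_params in range(1, 7):
    size = grid_size(5, n_params)
    print(f"{n_params} parameters: grid = {size:>6} combinations, SA budget = {sa_budget} iterations")
```

The fixed budget does not guarantee that annealing finds the grid's best configuration. It only bounds how many model fits are spent searching.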

## 4. Advanced Features

### 4.1 Restart Mechanism

In [None]:
# Simulated annealing with restarts
ctrl_restart = control_sim_anneal(
    initial_temp=2.0,
    cooling_schedule='exponential',
    cooling_rate=0.95,
    max_iter=60,
    no_improve=20,
    restart_after=10,  # Restart from best after 10 iterations without improvement
    verbose=True
)

print("Running SA with restart mechanism...\n")

sa_restart_results = tune_sim_anneal(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    metrics=metric_set(rmse),
    control=ctrl_restart
)

print("\n✓ SA with restarts allows escaping local optima")

### 4.2 Custom Initial Point

In [None]:
# Start from domain knowledge or prior results
initial_point = {
    'penalty': 1.0,  # Start with moderate regularization
    'mixture': 0.5   # Start with equal L1/L2 mix
}

print(f"Running SA from custom initial point: {initial_point}\n")

sa_custom_results = tune_sim_anneal(
    wf_elasticnet,
    resamples=cv_folds,
    param_info=param_info,
    initial=initial_point,  # Provide starting point
    metrics=metric_set(rmse),
    control=control_sim_anneal(max_iter=30, verbose=False)
)

print("\n✓ Custom starting points can leverage domain knowledge")
print("✓ Useful for refinement after grid search")

## 5. Summary and Best Practices

### When to use Simulated Annealing:
- ✓ **Large continuous parameter spaces** (expensive to grid search)
- ✓ **High-dimensional** spaces (3+ parameters)
- ✓ **Non-convex** optimization landscapes (many local optima)
- ✓ **Budget-constrained** tuning (limited evaluations)

### Cooling Schedule Selection:
- **Exponential**: Fast convergence, good for most cases
- **Linear**: Balanced, predictable behavior
- **Logarithmic**: Thorough exploration, longer runs

### Configuration Guidelines:

**initial_temp**:
- 0.5-1.0: Low (less exploration, faster convergence)
- 1.0-2.0: Standard (moderate exploration)
- 3.0-5.0: High (more exploration, escapes local optima)

**cooling_rate**:
- Exponential: 0.90-0.99 (higher = slower cooling)
- Linear: 0.01-0.1 (larger = faster cooling)
- Logarithmic: 0.1-1.0

**max_iter**:
- 30-50: Quick search
- 50-100: Standard
- 100+: Thorough search

**restart_after**:
- None: No restarts (faster)
- 10-20: Moderate (escape local optima)
- Use restarts when the landscape has many local optima

### Expected Performance:
- **Roughly 2-10x faster** than grid search in high dimensions, depending on grid size and iteration budget
- **Usually finds near-optimal solutions** with fewer evaluations, though without a guarantee
- **Stochastic**: Results vary between runs (set a seed for reproducibility)
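
The point about seeding can be illustrated with a stand-in for a stochastic tuner. This sketch uses only Python's `random` module. `toy_search` is hypothetical, and how `tune_sim_anneal` itself accepts a seed is not shown in this notebook.

```python
import random

def toy_search(seed, n_iter=30):
    """Stand-in for a stochastic tuner: best of `n_iter` random draws of a toy loss."""
    rng = random.Random(seed)  # local generator, so the seed fully determines the run
    return min((rng.uniform(-2.0, 2.0) - 0.7) ** 2 for _ in range(n_iter))

# Same seed: identical result. Different seeds: results vary.
print("seed 1 twice:", toy_search(1), toy_search(1))
print("seeds 1-5:  ", [round(toy_search(s), 5) for s in range(1, 6)])
```

Reporting results across several seeds, rather than from a single run, gives a fairer picture of how a stochastic search performs.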

In [None]:
# Final summary
print("\n" + "=" * 70)
print("FINAL SUMMARY: tune_sim_anneal()")
print("=" * 70)
print(f"\nDataset: {df.shape[0]} observations")
print(f"\nElastic Net (2D space):")
print(f"  Grid search: {n_grid_evals} evals, {grid_time:.1f}s, RMSE={grid_best_rmse:.4f}")
print(f"  Simulated Annealing: {n_sa_evals} evals, {sa_exp_time:.1f}s, RMSE={sa_best_rmse:.4f}")
print(f"  Speedup: {grid_time / sa_exp_time:.2f}x")
print(f"\nXGBoost (4D space):")
print(f"  Simulated Annealing: {n_xgb_evals} evals vs {equivalent_grid_size} grid evals")
print(f"  Reduction: {(1 - n_xgb_evals/equivalent_grid_size)*100:.0f}%")
print(f"\nKey advantages:")
print("  ✓ Efficient for continuous parameter spaces")
print("  ✓ Escapes local optima via probabilistic acceptance")
print("  ✓ Scales to high-dimensional problems")
print("  ✓ Flexible cooling schedules for different scenarios")
print("  ✓ Can start from prior knowledge (warm start)")
print("\n✓ Excellent alternative to exhaustive grid search")
print("✓ Combine with Bayesian optimization for best results")