# Hyperparameter Tuning for Time Series Forecasting

This tutorial demonstrates how to optimize forecaster parameters using sktime's tuning capabilities.

**Duration:** ~10 minutes

## Learning objectives

By the end of this tutorial, you will be able to:
- Use tuners in sktime for hyperparameter optimization
- Tune parameters of simple models
- Optimize complex pipeline compositions
- Perform cross-validation of tuned models

## 1. Introduction to Tuners in sktime

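sktime's tuners, such as `ForecastingGridSearchCV` and `ForecastingRandomizedSearchCV`, wrap a forecaster and search a parameter space by scoring each candidate on temporal cross-validation splits. A tuner is itself a forecaster: once fitted, it predicts with the best candidate it found.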

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.split import ExpandingWindowSplitter
from sktime.utils.plotting import plot_series
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error

# Load data
y = load_airline()
print(f"Dataset: {y.shape[0]} observations from {y.index[0]} to {y.index[-1]}")

# Split data for final evaluation
y_train = y.iloc[:-12]
y_test = y.iloc[-12:]

print(f"Training: {len(y_train)} observations")
print(f"Test: {len(y_test)} observations")

# Plot the data
plot_series(y_train, y_test, labels=["Training", "Test"], title="Airline Dataset")
plt.show()

## 2. Tuning a Simple Model

Let's start by tuning the parameters of an exponential smoothing model.

In [None]:
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.model_selection import ForecastingGridSearchCV

# Define the base forecaster
base_forecaster = ExponentialSmoothing(sp=12)

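# Define parameter grid
# Note: damped_trend=True is only valid together with a trend component, so the
# candidates with trend=None and damped_trend=True cannot be fitted. With the
# tuner's default error_score they should be recorded as NaN, not abort the search.
param_grid = {
    "trend": [None, "add", "mul"],
    "seasonal": [None, "add", "mul"],
    "damped_trend": [True, False],
}

# Count the candidates: the grid is the Cartesian product of all value lists
n_combinations = 1
for values in param_grid.values():
    n_combinations *= len(values)
print(f"Parameter grid combinations: {n_combinations}")
print(f"Parameters to tune: {list(param_grid.keys())}")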

# Set up cross-validation
cv = ExpandingWindowSplitter(
    initial_window=60,
    step_length=12,
    fh=[1, 3, 6, 12]
)

print(f"\nCV setup: {cv.get_n_splits(y_train)} folds")
print(f"Forecast horizons: {cv.fh}")
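
To make the splitter settings concrete, the sketch below enumerates expanding-window folds in plain Python. It is a simplified re-implementation for illustration, not sktime's code: the helper `expanding_window_folds` is hypothetical, and it assumes that the first training window covers `initial_window` points, that each later window grows by `step_length`, and that a fold is kept only if its furthest horizon still lies inside the series.

```python
def expanding_window_folds(n_timepoints, initial_window, step_length, fh):
    """Return (train_positions, test_positions) pairs for expanding-window CV.

    Hypothetical helper: a simplified stand-in for an expanding window
    splitter, using 0-based integer positions.
    """
    folds = []
    train_end = initial_window  # exclusive end of the training window
    # Keep a fold only while the furthest horizon is still inside the series
    while train_end - 1 + max(fh) <= n_timepoints - 1:
        cutoff = train_end - 1  # position of the last training observation
        train = list(range(0, train_end))
        test = [cutoff + h for h in fh]
        folds.append((train, test))
        train_end += step_length
    return folds


# Same settings as the tutorial: 132 training months, horizons 1, 3, 6, 12
folds = expanding_window_folds(132, initial_window=60, step_length=12, fh=[1, 3, 6, 12])
for i, (train, test) in enumerate(folds):
    print(f"fold {i}: train 0..{train[-1]} ({len(train)} points), test {test}")
```

Every training window starts at the first observation, so later folds train on more history. The test positions always lie strictly after the cutoff, which keeps the evaluation free of look-ahead.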

### 2.1 Perform Grid Search

In [None]:
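from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError

# Create grid search tuner
tuner = ForecastingGridSearchCV(
    forecaster=base_forecaster,
    cv=cv,
    param_grid=param_grid,
    # Pass a metric object; symmetric=False selects plain MAPE rather than sMAPE
    scoring=MeanAbsolutePercentageError(symmetric=False),
    n_jobs=1,  # Set to -1 for parallel processing
    refit=True
)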

print("Running grid search...")
print("This may take a moment...")

# Fit the tuner
tuner.fit(y_train)

print("Grid search completed!")
print(f"Best parameters: {tuner.best_params_}")
print(f"Best score (MAPE): {tuner.best_score_:.4f}")
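
Conceptually, grid search scores every candidate on every fold and keeps the candidate with the best average. The following is a minimal sketch of that loop with a toy forecaster on made-up data; `toy_forecast`, `mape`, and `grid_search` are hypothetical names, and a real tuner additionally handles refitting, parallelism, and failed fits.

```python
from itertools import product


def toy_forecast(history, horizon, strategy, window):
    """Toy forecaster (hypothetical): repeat the last value or a recent mean.

    The same value is predicted for every horizon.
    """
    if strategy == "last":
        return history[-1]
    recent = history[-window:]
    return sum(recent) / len(recent)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def grid_search(series, param_grid, initial_window, step_length, fh):
    """Score every parameter combination on expanding-window folds."""
    names = list(param_grid)
    scores = {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        train_end = initial_window
        while train_end - 1 + max(fh) < len(series):
            history = series[:train_end]
            actual = [series[train_end - 1 + h] for h in fh]
            predicted = [toy_forecast(history, h, **params) for h in fh]
            fold_scores.append(mape(actual, predicted))
            train_end += step_length
        # Average over folds gives the candidate's cross-validated score
        scores[values] = sum(fold_scores) / len(fold_scores)
    best_values = min(scores, key=scores.get)
    return dict(zip(names, best_values)), scores


# Made-up data: upward trend plus a repeating four-step pattern
series = [100 + 2 * t + [0, 5, -3, 8][t % 4] for t in range(40)]
grid = {"strategy": ["last", "mean"], "window": [2, 4, 8]}
best_params, scores = grid_search(series, grid, initial_window=20, step_length=4, fh=[1, 2])
print(best_params)
for values, score in sorted(scores.items(), key=lambda item: item[1]):
    print(values, round(score, 4))
```

The cost grows with the product of the grid size and the number of folds, which is why the size of the grid matters for more expensive forecasters.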

### 2.2 Analyze Tuning Results

In [None]:
# Get detailed results
results_df = pd.DataFrame(tuner.cv_results_)

# sktime names the score column after the metric (mean_test_<MetricName>) and
# keeps each candidate's settings as a dict in the "params" column, so look the
# score column up by prefix and expand the dicts into one column per parameter.
score_col = next(c for c in results_df.columns if c.startswith("mean_test_"))
params_df = pd.DataFrame(results_df["params"].tolist(), index=results_df.index)
# Cast to str so that None becomes its own category in groupby
params_df = params_df.astype(str).add_prefix("param_")
results_df = pd.concat([params_df, results_df[score_col].astype(float)], axis=1)

print("Top 5 parameter combinations:")
print(results_df.nsmallest(5, score_col))

# Visualize the average score per parameter value (marginal effect)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
panels = [
    ("param_trend", "Effect of Trend Parameter"),
    ("param_seasonal", "Effect of Seasonal Parameter"),
    ("param_damped_trend", "Effect of Damped Trend"),
]
for ax, (column, title) in zip(axes, panels):
    # Candidates that failed to fit have NaN scores and are skipped by mean()
    results_df.groupby(column)[score_col].mean().plot(kind="bar", ax=ax, title=title)
    ax.set_ylabel("Mean MAPE")
    ax.tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()
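
The bar charts show marginal effects: the average score over all candidates that share one parameter value. A small sketch of that aggregation on made-up scores follows (the records and the helper `marginal_means` are hypothetical). It skips NaN scores, which is how failed candidates would typically appear.

```python
import math
from collections import defaultdict


def marginal_means(records, param):
    """Average score per value of one parameter, ignoring NaN scores."""
    grouped = defaultdict(list)
    for params, score in records:
        if not math.isnan(score):
            grouped[str(params[param])].append(score)
    return {value: sum(s) / len(s) for value, s in grouped.items()}


# Made-up (params, score) records; NaN marks a candidate that failed to fit
records = [
    ({"trend": None, "damped_trend": False}, 0.10),
    ({"trend": None, "damped_trend": True}, float("nan")),
    ({"trend": "add", "damped_trend": False}, 0.06),
    ({"trend": "add", "damped_trend": True}, 0.08),
]
print(marginal_means(records, "trend"))
print(marginal_means(records, "damped_trend"))
```

A marginal mean can hide interactions: in this toy data `damped_trend=True` averages the same as `False` only because its failed candidate was dropped, while within `trend="add"` the undamped variant is clearly better.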

## 3. Tuning Pipeline Compositions

Now let's tune a more complex pipeline with transformations.

In [None]:
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.transformations.series.detrend import Detrender
from sktime.transformations.compose import TransformerPipeline

# Create a pipeline with transformations
transformer_pipeline = TransformerPipeline([
    ("boxcox", BoxCoxTransformer()),
    ("detrend", Detrender())
])

pipeline_forecaster = TransformedTargetForecaster([
    ("transformer", transformer_pipeline),
    ("forecaster", ExponentialSmoothing(sp=12))
])

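# Define pipeline parameter grid
# Nested parameters are addressed as <step name>__<parameter>, with one "__"
# per level of nesting; pipeline_forecaster.get_params() lists the valid keys.
pipeline_param_grid = {
    # BoxCox parameters (step "boxcox" inside the "transformer" step)
    "transformer__boxcox__method": ["mle", "pearsonr"],

    # Forecaster parameters
    # Note: seasonal="mul" requires strictly positive data, which the detrended
    # series is not, and damped_trend=True requires a trend. Such candidates
    # cannot be fitted and should be scored as NaN under the default error_score.
    "forecaster__trend": [None, "add"],
    "forecaster__seasonal": ["add", "mul"],
    "forecaster__damped_trend": [True, False],
}

n_pipeline_combinations = 1
for values in pipeline_param_grid.values():
    n_pipeline_combinations *= len(values)
print(f"Pipeline parameter combinations: {n_pipeline_combinations}")
print("Parameters to tune:")
for param, values in pipeline_param_grid.items():
    print(f"  {param}: {values}")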
### 3.1 Tune the Pipeline

In [None]:
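from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError

# Create pipeline tuner
pipeline_tuner = ForecastingGridSearchCV(
    forecaster=pipeline_forecaster,
    cv=cv,
    param_grid=pipeline_param_grid,
    # Pass a metric object; symmetric=False selects plain MAPE rather than sMAPE
    scoring=MeanAbsolutePercentageError(symmetric=False),
    n_jobs=1,
    refit=True
)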

print("Running pipeline tuning...")
pipeline_tuner.fit(y_train)

print("Pipeline tuning completed!")
print(f"\nBest pipeline parameters:")
for param, value in pipeline_tuner.best_params_.items():
    print(f"  {param}: {value}")
print(f"\nBest score (MAPE): {pipeline_tuner.best_score_:.4f}")

## 4. Comparing Tuned Models

Let's compare the performance of our tuned models.

In [None]:
# Forecast the 12 held-out months
fh = np.arange(1, 13)

# Create baseline model (fixed configuration, no tuning)
baseline = ExponentialSmoothing(trend="add", seasonal="mul", sp=12)
baseline.fit(y_train)
y_pred_baseline = baseline.predict(fh=fh)

# Get predictions from tuned models (refit=True refits the best candidate on y_train)
y_pred_tuned = tuner.predict(fh=fh)
y_pred_pipeline = pipeline_tuner.predict(fh=fh)

# Calculate performance metrics
mape_baseline = mean_absolute_percentage_error(y_test, y_pred_baseline)
mape_tuned = mean_absolute_percentage_error(y_test, y_pred_tuned)
mape_pipeline = mean_absolute_percentage_error(y_test, y_pred_pipeline)

print("Model Comparison:")
print(f"Baseline (no tuning):    {mape_baseline:.4f}")
print(f"Tuned model:             {mape_tuned:.4f}")
print(f"Tuned pipeline:          {mape_pipeline:.4f}")

print(f"\nImprovement over baseline:")
print(f"Tuned model:    {((mape_baseline - mape_tuned) / mape_baseline * 100):+.1f}%")
print(f"Tuned pipeline: {((mape_baseline - mape_pipeline) / mape_baseline * 100):+.1f}%")

# Plot results
plot_series(
    y_train.iloc[-24:], y_test, 
    y_pred_baseline, y_pred_tuned, y_pred_pipeline,
    labels=["Training", "Actual", "Baseline", "Tuned", "Tuned Pipeline"],
    title="Comparison of Tuned Models"
)
plt.legend()
plt.show()
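
For reference, MAPE averages the absolute errors relative to the actual values, and the improvement figures printed above are relative reductions in MAPE. A short worked sketch with made-up numbers follows (the values and helper names are illustrative assumptions, not results from the airline data):

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_score, new_score):
    """Percentage reduction in error relative to the baseline."""
    return (baseline_score - new_score) / baseline_score * 100


actual = [400.0, 500.0]
baseline_pred = [360.0, 550.0]  # off by 10% on both points
tuned_pred = [380.0, 525.0]     # off by 5% on both points

mape_baseline = mape(actual, baseline_pred)
mape_tuned = mape(actual, tuned_pred)
print(f"baseline MAPE: {mape_baseline:.3f}")
print(f"tuned MAPE:    {mape_tuned:.3f}")
print(f"improvement:   {relative_improvement(mape_baseline, mape_tuned):+.1f}%")
```

Because MAPE divides by the actual values, it is undefined when an actual value is zero and weights errors on small values more heavily.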

## 5. Cross-validation of Tuned Models

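Let's perform a more robust evaluation using cross-validation over several cutoffs instead of a single train/test split. Note that these folds overlap the data used for tuning, so the scores of the tuned models are likely to be somewhat optimistic; nested cross-validation (Section 6) avoids this.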

In [None]:
# Set up evaluation CV (different from tuning CV)
eval_cv = ExpandingWindowSplitter(
    initial_window=72,
    step_length=6,
    fh=[1, 6, 12]
)

print(f"Evaluation CV: {eval_cv.get_n_splits(y)} folds")

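# Models to evaluate
# The fitted best candidate is stored in best_forecaster_; clone it so that
# evaluate works on a fresh, unfitted copy
models = {
    "Baseline": ExponentialSmoothing(trend="add", seasonal="mul", sp=12),
    "Tuned_Simple": tuner.best_forecaster_.clone(),
    "Tuned_Pipeline": pipeline_tuner.best_forecaster_.clone()
}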

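# Evaluate all models
# evaluate expects metric objects and names its result columns
# test_<MetricName>, so rename them to the snake_case names used in the
# analysis below
from sktime.performance_metrics.forecasting import (
    MeanAbsoluteError,
    MeanAbsolutePercentageError,
)

eval_metrics = [MeanAbsolutePercentageError(symmetric=False), MeanAbsoluteError()]
column_names = {
    "test_MeanAbsolutePercentageError": "test_mean_absolute_percentage_error",
    "test_MeanAbsoluteError": "test_mean_absolute_error",
}

print("\nRunning cross-validation evaluation...")
cv_results = {}

for name, model in models.items():
    print(f"Evaluating {name}...")
    result = evaluate(
        forecaster=model,
        y=y,
        cv=eval_cv,
        scoring=eval_metrics,
        return_data=False
    )
    cv_results[name] = result.rename(columns=column_names)

print("Cross-validation completed!")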

### 5.1 Analyze Cross-validation Results

In [None]:
# Analyze CV results
print("Cross-validation Results:")
print("=" * 40)

summary_stats = []
for name, results in cv_results.items():
    if isinstance(results, dict):
        # Convert to DataFrame if needed
        results_df = pd.DataFrame(results)
    else:
        results_df = results
    
    print(f"\n{name}:")
    
    # Calculate summary statistics
    summary = results_df.describe()
    print(summary)
    
    # Store for comparison
    mean_mape = results_df['test_mean_absolute_percentage_error'].mean()
    std_mape = results_df['test_mean_absolute_percentage_error'].std()
    summary_stats.append({
        'Model': name,
        'Mean_MAPE': mean_mape,
        'Std_MAPE': std_mape,
        'CV_Score': mean_mape  # For ranking
    })

# Create comparison table
comparison_df = pd.DataFrame(summary_stats)
comparison_df = comparison_df.sort_values('CV_Score')

print("\n\nMODEL RANKING (by Cross-validation MAPE):")
print("=" * 45)
for i, row in comparison_df.iterrows():
    print(f"{row['Model']:15}: {row['Mean_MAPE']:6.4f} ± {row['Std_MAPE']:6.4f}")

# Select the model with the lowest mean CV MAPE (no significance test is run)
best_model = comparison_df.iloc[0]['Model']
print(f"\nBest model: {best_model}")

# Plot CV results
fig, ax = plt.subplots(figsize=(10, 6))

models_list = []
mape_values = []

for name, results in cv_results.items():
    if isinstance(results, dict):
        results_df = pd.DataFrame(results)
    else:
        results_df = results
    
    mapes = results_df['test_mean_absolute_percentage_error']
    models_list.extend([name] * len(mapes))
    mape_values.extend(mapes)

# Create box plot
cv_plot_df = pd.DataFrame({'Model': models_list, 'MAPE': mape_values})
cv_plot_df.boxplot(column='MAPE', by='Model', ax=ax)
ax.set_title('Cross-validation MAPE Distribution')
ax.set_xlabel('Model')
ax.set_ylabel('MAPE')
plt.suptitle('')  # Remove automatic title
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
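
When ranking models by cross-validated scores, it helps to compare the gap between the means with the spread across folds. A minimal sketch with made-up fold scores follows (all numbers and the name `fold_scores` are hypothetical). `statistics.stdev` is the sample standard deviation, the same convention as the pandas `.std()` default.

```python
from statistics import mean, stdev

# Made-up per-fold MAPE values for two models
fold_scores = {
    "Baseline": [0.050, 0.062, 0.048, 0.071, 0.055],
    "Tuned": [0.047, 0.060, 0.049, 0.066, 0.052],
}

summary = {name: (mean(s), stdev(s)) for name, s in fold_scores.items()}
for name, (avg, spread) in sorted(summary.items(), key=lambda item: item[1][0]):
    print(f"{name:10}: {avg:.4f} ± {spread:.4f}")

# Compare the gap between the means with the fold-to-fold spread
gap = summary["Baseline"][0] - summary["Tuned"][0]
largest_spread = max(spread for _, spread in summary.values())
print(f"gap between means: {gap:.4f}, largest fold std: {largest_spread:.4f}")
if gap < largest_spread:
    print("The difference is within the fold-to-fold variation.")
```

A gap that is small relative to the spread is weak evidence that one model is better, which is the situation the box plot makes visible.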

## 6. Advanced Tuning Strategies

Beyond grid search, there are other tuning approaches.

In [None]:
print("Advanced Tuning Strategies:")
print("=" * 30)

print("\n1. RANDOMIZED SEARCH:")
print("   • Samples random combinations from parameter space")
print("   • More efficient for large parameter spaces")
print("   • Good for continuous parameters")

print("\n2. BAYESIAN OPTIMIZATION:")
print("   • Uses previous evaluations to guide search")
print("   • Often needs fewer evaluations than grid/random search")
print("   • Available through scikit-optimize integration (ForecastingSkoptSearchCV)")

print("\n3. NESTED CROSS-VALIDATION:")
print("   • Inner loop: parameter tuning")
print("   • Outer loop: model evaluation")
print("   • Reduces the optimistic bias of performance estimates")

# Demonstrate randomized search concept
from sktime.forecasting.model_selection import ForecastingRandomizedSearchCV
from scipy.stats import uniform

print("\n\nExample: Randomized Search Setup")
print("(Not executed due to time constraints)")

# Example parameter distribution for randomized search
random_param_dist = {
    "trend": [None, "add", "mul"],
    "seasonal": [None, "add", "mul"],
    "damped_trend": [True, False],
    # For models with continuous parameters:
    # "alpha": uniform(0.01, 0.99),  # Uniform distribution between 0.01 and 1.0
}

print(f"Randomized search would sample from: {random_param_dist}")

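# Example setup (not executed)
# from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError
# random_tuner = ForecastingRandomizedSearchCV(
#     forecaster=base_forecaster,
#     cv=cv,
#     param_distributions=random_param_dist,
#     n_iter=10,  # Number of random samples (the space above has only 18 combinations)
#     scoring=MeanAbsolutePercentageError(symmetric=False),
#     random_state=42
# )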

print("\nAdvantages of each approach:")
print("Grid Search:      Exhaustive, good for small parameter spaces")
print("Random Search:    Efficient, good for large/continuous spaces")
print("Bayesian Opt:     Intelligent, good for expensive evaluations")

## 7. Best Practices for Hyperparameter Tuning

Key guidelines for effective parameter optimization.

In [None]:
print("Hyperparameter Tuning Best Practices:")
print("=" * 38)

print("\n1. SEPARATE VALIDATION:")
print("   ✓ Use different CV for tuning vs. final evaluation")
print("   ✓ Never tune on your final test set")
print("   ✓ Consider nested CV for unbiased estimates")

print("\n2. PARAMETER SPACE DESIGN:")
print("   ✓ Start with wide ranges, then narrow down")
print("   ✓ Use domain knowledge to guide ranges")
print("   ✓ Consider parameter interactions")

print("\n3. COMPUTATIONAL EFFICIENCY:")
print("   ✓ Use parallel processing (n_jobs=-1)")
print("   ✓ Start with coarse grid, refine iteratively")
print("   ✓ Consider early stopping for expensive models")

print("\n4. VALIDATION STRATEGY:")
print("   ✓ Ensure CV reflects real-world usage")
print("   ✓ Use multiple metrics for comprehensive evaluation")
print("   ✓ Check for overfitting to validation set")

print("\n5. RESULT INTERPRETATION:")
print("   ✓ Analyze parameter importance")
print("   ✓ Check for parameter stability")
print("   ✓ Consider confidence intervals")

# Practical example of parameter analysis
print("\n\nPractical Tips:")
print("=" * 15)

print("\nParameter Stability Check:")
# Get top 3 parameter combinations
if hasattr(tuner, 'cv_results_'):
    results_df = pd.DataFrame(tuner.cv_results_)
    # The score column is named after the metric (mean_test_<MetricName>)
    score_col = next(c for c in results_df.columns if c.startswith("mean_test_"))
    results_df[score_col] = results_df[score_col].astype(float)
    top_3 = results_df.nsmallest(3, score_col)

    print("Top 3 parameter combinations:")
    for i, (_, row) in enumerate(top_3.iterrows()):
        score_diff = row[score_col] - results_df[score_col].min()
        params = row['params']  # dict with the candidate's parameter settings
        print(f"{i+1}. Score: {row[score_col]:.4f} (+{score_diff:.4f})")
        print(f"   Parameters: trend={params['trend']}, "
              f"seasonal={params['seasonal']}, "
              f"damped={params['damped_trend']}")

    # max() and min() skip NaN scores from candidates that failed to fit
    score_range = results_df[score_col].max() - results_df[score_col].min()
    print(f"\nScore range: {score_range:.4f}")
    if score_range < 0.001:
        print("→ Parameters have small impact, focus on other aspects")
    else:
        print("→ Parameter choice matters, tuning is worthwhile")

print("\nCommon Pitfalls to Avoid:")
print("❌ Tuning on test set")
print("❌ Using same CV for tuning and evaluation")
print("❌ Over-interpreting small performance differences")
print("❌ Ignoring computational cost vs. performance trade-offs")
print("❌ Not validating final model on truly unseen data")
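
Nested cross-validation, listed earlier as a way to obtain less biased estimates, repeats the tuning inside every outer fold so that the outer test points never influence the parameter choice. A compact sketch with a toy forecaster on made-up data follows (all names are hypothetical, and one-step-ahead absolute error stands in for the metric):

```python
def forecast(history, window):
    """Toy forecaster: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def cv_error(series, window, first_cutoff, step):
    """Mean one-step-ahead absolute error over expanding-window folds."""
    errors = []
    for end in range(first_cutoff, len(series), step):
        errors.append(abs(series[end] - forecast(series[:end], window)))
    return sum(errors) / len(errors)


def nested_cv(series, windows, outer_start, outer_step, inner_start, inner_step):
    """Tune on each outer training set only, then score on the outer test point."""
    outer_errors, chosen = [], []
    for end in range(outer_start, len(series), outer_step):
        train = series[:end]
        # Inner loop: pick the window using training data only
        best = min(windows, key=lambda w: cv_error(train, w, inner_start, inner_step))
        chosen.append(best)
        # Outer loop: evaluate the tuned forecaster on the unseen point
        outer_errors.append(abs(series[end] - forecast(train, best)))
    return sum(outer_errors) / len(outer_errors), chosen


# Made-up data: trend plus a repeating three-step pattern
series = [50 + t + [0, 4, -2][t % 3] for t in range(36)]
score, chosen = nested_cv(series, windows=[1, 3, 6], outer_start=24, outer_step=3,
                          inner_start=12, inner_step=2)
print(f"nested CV error: {score:.3f}")
print(f"window chosen per outer fold: {chosen}")
```

The outer score estimates the performance of the whole procedure, tuning included, rather than that of one fixed parameter setting.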

## Summary

In this tutorial, you learned:

1. **Tuning Basics**: Using `ForecastingGridSearchCV` for parameter optimization
2. **Simple Model Tuning**: Optimizing exponential smoothing parameters
3. **Pipeline Tuning**: Tuning complex compositions with transformations
4. **Model Comparison**: Evaluating tuned vs. baseline models
5. **Cross-validation**: Robust evaluation of tuned models
6. **Advanced Strategies**: Randomized search and Bayesian optimization concepts
7. **Best Practices**: Guidelines for effective hyperparameter tuning

## Key Takeaways

- **Separate Validation**: Never tune on your final test set
- **Computational Efficiency**: Balance thoroughness with computational cost
- **Parameter Analysis**: Understand which parameters matter most
- **Robust Evaluation**: Use cross-validation for reliable performance estimates
- **Practical Impact**: Consider whether tuning improvements are meaningful

## Next Steps

- Explore "Probabilistic Forecasting" to extend tuning to uncertainty quantification
- Learn "Global Forecasting" for advanced model architectures
- Try "Ensemble Methods" to combine multiple tuned models