# Pipelines in Time Series Forecasting

This tutorial demonstrates how to build robust forecasting pipelines in sktime to avoid data leakage and ensure reproducibility.

**Duration:** ~10 minutes

## Learning objectives

By the end of this tutorial, you will be able to:
- Understand the motivation for using pipelines
- Build pipelines with target transformations
- Create pipelines with exogenous variable transformations
- Compose complex pipelines with multiple transformation types
- Use `get_params` and `set_params` for pipeline inspection and tuning

## 1. Motivation for Pipelines

Pipelines are essential for:
- **Avoiding Data Leakage**: Transformations are fit only on training data
- **Reproducibility**: Consistent preprocessing across different datasets
- **Maintainability**: Clear separation of preprocessing and modeling steps
- **Parameter Tuning**: Unified interface for optimizing all components
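
The data-leakage point can be illustrated without any forecasting library. The following is a minimal sketch with a made-up series and hypothetical helper names (`fit_scaler`, `scale`); it only shows that statistics learned on the full series differ from those learned on the training window, which is the kind of mismatch a pipeline is meant to prevent.

```python
from statistics import mean, pstdev

# Hypothetical upward-trending series: 8 training points followed by 4 test points
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0]
train, test = series[:8], series[8:]


def fit_scaler(values):
    """Return the (mean, std) a standardizing transformer would learn."""
    return mean(values), pstdev(values)


def scale(values, params):
    """Standardize values with previously learned (mean, std)."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Leaky: statistics are computed on train + test, so test values shape the scaling
leaky_params = fit_scaler(series)
# Leak-free: statistics are computed on the training window only, then reused
clean_params = fit_scaler(train)

print("mean learned with leakage:   ", leaky_params[0])
print("mean learned on train only:  ", clean_params[0])
print("test scaled without leakage: ", [round(v, 2) for v in scale(test, clean_params)])
```

A pipeline applies the leak-free variant by construction: every transformer is fit inside `fit` on the data passed there and is only applied, not refit, at prediction time.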

In [None]:
import matplotlib.pyplot as plt

from sktime.datasets import load_airline, load_longley
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
from sktime.utils.plotting import plot_series

# Load datasets
y = load_airline()
y_longley, X_longley = load_longley()  # returns (y, X); there is no return_X_y argument

print("Datasets loaded:")
print(f"Airline: {y.shape} (univariate)")
print(f"Longley: y={y_longley.shape}, X={X_longley.shape} (with exogenous variables)")

# Split airline data
y_train = y.iloc[:-12]
y_test = y.iloc[-12:]

# Split Longley data
split_point = -4
y_longley_train, y_longley_test = (
    y_longley.iloc[:split_point],
    y_longley.iloc[split_point:],
)
X_longley_train, X_longley_test = (
    X_longley.iloc[:split_point],
    X_longley.iloc[split_point:],
)

## 2. Target Transformations with Pipelines

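Let's start with pipelines that transform the target variable. `TransformedTargetForecaster` applies its transformers to `y` before the forecaster is fit; at prediction time the inverse transformations are applied in reverse order, so forecasts are returned on the original scale.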

In [None]:
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.naive import NaiveForecaster
from sktime.transformations.compose import TransformerPipeline
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.transformations.series.detrend import Detrender

# Create a target transformation pipeline
target_transformer = TransformerPipeline(
    [("boxcox", BoxCoxTransformer(method="mle")), ("detrend", Detrender())]
)

# Create forecaster with transformed target
forecaster_with_target_transform = TransformedTargetForecaster(
    [
        ("transform", target_transformer),
        # Seasonal naive: strategy="last" with sp=12 repeats the value from 12 months earlier
        ("forecaster", NaiveForecaster(strategy="last", sp=12)),
    ]
)

print(f"Pipeline created: {forecaster_with_target_transform}")

### 2.1 Fit and Predict with Target Transformation

In [None]:
# Fit the pipeline
forecaster_with_target_transform.fit(y_train)

# Make predictions (transformations are automatically inverted)
y_pred_transformed = forecaster_with_target_transform.predict(fh=range(1, 13))

# Compare with simple forecaster
forecaster_simple = NaiveForecaster(strategy="last", sp=12)  # seasonal naive baseline
forecaster_simple.fit(y_train)
y_pred_simple = forecaster_simple.predict(fh=range(1, 13))

# Calculate performance
mape_simple = mean_absolute_percentage_error(y_test, y_pred_simple)
mape_transformed = mean_absolute_percentage_error(y_test, y_pred_transformed)

print(f"MAPE - Simple: {mape_simple:.2%}")
print(f"MAPE - With Target Transform: {mape_transformed:.2%}")
print(f"Improvement: {((mape_simple - mape_transformed) / mape_simple * 100):.1f}%")

# Plot results
plot_series(
    y_train.iloc[-24:],
    y_test,
    y_pred_simple,
    y_pred_transformed,
    labels=["Training", "Actual", "Simple", "Target Transformed"],
    title="Target Transformation Pipeline",
)
plt.legend()
plt.show()

## 3. Exogenous Variable Transformations

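Now let's create pipelines that transform exogenous variables. `ForecastingPipeline` applies its transformers to the exogenous data `X` before passing it to the forecaster, in both `fit` and `predict`. `TabularToSeriesAdaptor` wraps a scikit-learn transformer such as `StandardScaler` so that it can be used as a pipeline step.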

In [None]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.compose import ForecastingPipeline
from sktime.transformations.series.adapt import TabularToSeriesAdaptor

# Create exogenous variable transformation pipeline
X_transformer = TabularToSeriesAdaptor(
    transformer=StandardScaler()  # Standardize exogenous variables
)

# Create forecasting pipeline with X transformation
forecaster_with_X_transform = ForecastingPipeline(
    [("X_transform", X_transformer), ("forecaster", AutoARIMA(suppress_warnings=True))]
)

print(f"Pipeline with X transformation: {forecaster_with_X_transform}")

### 3.1 Fit and Predict with Exogenous Transformations

In [None]:
# Fit pipeline with exogenous variables
forecaster_with_X_transform.fit(y_longley_train, X=X_longley_train)

# Make predictions
fh_longley = range(1, len(y_longley_test) + 1)
y_pred_X_transformed = forecaster_with_X_transform.predict(
    fh=fh_longley, X=X_longley_test
)

# Compare with non-transformed
forecaster_no_transform = AutoARIMA(suppress_warnings=True)
forecaster_no_transform.fit(y_longley_train, X=X_longley_train)
y_pred_no_transform = forecaster_no_transform.predict(fh=fh_longley, X=X_longley_test)

# Calculate performance
mape_no_transform = mean_absolute_percentage_error(y_longley_test, y_pred_no_transform)
mape_X_transformed = mean_absolute_percentage_error(
    y_longley_test, y_pred_X_transformed
)

print(f"MAPE - No X Transform: {mape_no_transform:.2%}")
print(f"MAPE - X Transformed: {mape_X_transformed:.2%}")
print(
    f"Improvement: {((mape_no_transform - mape_X_transformed) / mape_no_transform * 100):.1f}%"
)

# Plot results
plot_series(
    y_longley_train.iloc[-8:],
    y_longley_test,
    y_pred_no_transform,
    y_pred_X_transformed,
    labels=["Training", "Actual", "No Transform", "X Transformed"],
    title="Exogenous Variable Transformation Pipeline",
)
plt.legend()
plt.show()

## 4. Complex Composition: Both Target and Exogenous Transformations

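Let's create a pipeline that transforms both the target and exogenous variables. It nests a `ForecastingPipeline`, which transforms the exogenous data `X`, inside a `TransformedTargetForecaster`, which transforms the target `y`, so that both sets of transformations live in one estimator.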

In [None]:
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.difference import Differencer

# Create comprehensive transformation pipeline

# 1. Target transformations
y_transformer = TransformerPipeline(
    [("boxcox", BoxCoxTransformer(method="mle")), ("diff", Differencer(lags=1))]
)

# 2. Exogenous transformations with dimensionality reduction
X_transformer_advanced = TabularToSeriesAdaptor(
    transformer=PCA(n_components=2)  # Reduce to 2 components
)

# 3. Base forecasting pipeline with X transformation
base_pipeline = ForecastingPipeline(
    [
        ("X_transform", X_transformer_advanced),
        ("forecaster", AutoARIMA(suppress_warnings=True)),
    ]
)

# 4. Complete pipeline with both transformations
complete_pipeline = TransformedTargetForecaster(
    [("y_transform", y_transformer), ("pipeline", base_pipeline)]
)

print(f"Complete pipeline: {complete_pipeline}")

### 4.1 Fit and Evaluate Complete Pipeline

In [None]:
# Fit complete pipeline
complete_pipeline.fit(y_longley_train, X=X_longley_train)

# Make predictions
y_pred_complete = complete_pipeline.predict(fh=fh_longley, X=X_longley_test)

# Calculate performance
mape_complete = mean_absolute_percentage_error(y_longley_test, y_pred_complete)

print("Performance Comparison:")
print(f"No Transform:      {mape_no_transform:.2%}")
print(f"X Transform Only:  {mape_X_transformed:.2%}")
print(f"Complete Pipeline: {mape_complete:.2%}")

# Plot all results
plot_series(
    y_longley_train.iloc[-8:],
    y_longley_test,
    y_pred_no_transform,
    y_pred_X_transformed,
    y_pred_complete,
    labels=["Training", "Actual", "No Transform", "X Transform", "Complete Pipeline"],
    title="Pipeline Comparison",
)
plt.legend()
plt.show()

## 5. Pipeline Inspection with get_params and set_params

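Pipelines expose the parameters of every nested component through `get_params` and `set_params`. Nested parameters are addressed by joining step names and parameter names with double underscores.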

In [None]:
# Inspect pipeline parameters
params = complete_pipeline.get_params()

print("Pipeline Parameters (first 10):")
for key, value in list(params.items())[:10]:
    print(f"{key}: {value}")

print(f"\nTotal parameters: {len(params)}")

# Show parameter structure
print("\nParameter Structure (key patterns):")
key_patterns = {key.split("__")[0] for key in params}
for pattern in sorted(key_patterns):
    # Count the top-level key itself plus everything nested under it
    matching_keys = [
        k for k in params if k == pattern or k.startswith(f"{pattern}__")
    ]
    print(f"{pattern}: {len(matching_keys)} parameters")

### 5.1 Modifying Pipeline Parameters

In [None]:
# Find specific parameters we can modify
print("Modifiable Parameters Examples:")

# Nested parameters are keyed as <step_name>__<parameter>, so filter on the
# step names chosen when the pipeline was built (not on estimator class names)
diff_params = [k for k in params if k.startswith("y_transform__diff__")]
pca_params = [k for k in params if k.startswith("pipeline__X_transform__transformer__")]
arima_params = [k for k in params if k.startswith("pipeline__forecaster__")]

print(f"\nDifferencing parameters: {diff_params[:3]}...")  # Show first 3
print(f"PCA parameters: {pca_params[:3]}...")  # Show first 3
print(f"ARIMA parameters: {arima_params[:3]}...")  # Show first 3

# Modify some parameters
pca_key = "pipeline__X_transform__transformer__n_components"
new_params = {pca_key: 3}  # Change PCA components

# set_params modifies the estimator in place and returns it, so clone first
# to get a new, unfitted pipeline and leave complete_pipeline untouched
modified_pipeline = complete_pipeline.clone().set_params(**new_params)

print("\nModified PCA components to 3")
print(f"New parameter value: {modified_pipeline.get_params()[pca_key]}")

### 5.2 Parameter Naming Convention

In [None]:
# Demonstrate parameter naming convention
print("Parameter Naming Convention in sktime Pipelines:")
print("\nFormat: <step_name>__<parameter_name>")
print("For nested pipelines: <step1>__<step2>__<parameter>")

# Show examples from our pipeline
example_params = [
    "y_transform__boxcox__method",
    "y_transform__diff__lags",
    "pipeline__X_transform__transformer__n_components",
    "pipeline__forecaster__suppress_warnings",
]

print("\nExamples from our pipeline:")
for param in example_params:
    if param in params:
        print(f"{param}: {params[param]}")
    else:
        # Try to find similar parameter
        similar = [k for k in params if param.split("__")[-1] in k]
        if similar:
            print(f"{param}: Not found, similar: {similar[0]}")
        else:
            print(f"{param}: Not found")

## 6. Advanced Pipeline Patterns

Let's explore some advanced pipeline construction patterns.

In [None]:
# Pattern 1: Outlier handling before transformation
from sktime.transformations.series.impute import Imputer
from sktime.transformations.series.outlier_detection import HampelFilter

# HampelFilter replaces detected outliers with NaN by default,
# so impute them before the Box-Cox step
robust_pipeline = TransformedTargetForecaster(
    [
        ("outlier_detection", HampelFilter(window_length=5)),
        ("impute", Imputer(method="linear")),
        ("boxcox", BoxCoxTransformer()),
        ("forecaster", NaiveForecaster(strategy="last", sp=12)),
    ]
)

print(f"Robust pipeline: {robust_pipeline}")

# Pattern 2: Feature selection pipeline
from sklearn.feature_selection import SelectKBest, f_regression

feature_selection_pipeline = ForecastingPipeline(
    [
        ("scale", TabularToSeriesAdaptor(StandardScaler())),
        ("select", TabularToSeriesAdaptor(SelectKBest(f_regression, k=2))),
        ("forecaster", AutoARIMA(suppress_warnings=True)),
    ]
)

print(f"Feature selection pipeline: {feature_selection_pipeline}")


# Pattern 3: Configuration-driven pipeline factory
def create_forecasting_pipeline(transformer_config, forecaster_config):
    """Build a forecaster, wrapped in target transformations if any are configured."""
    transformers = []

    if transformer_config.get("boxcox", False):
        transformers.append(("boxcox", BoxCoxTransformer()))

    if transformer_config.get("detrend", False):
        transformers.append(("detrend", Detrender()))

    if transformer_config.get("diff_lags"):
        transformers.append(("diff", Differencer(lags=transformer_config["diff_lags"])))

    if transformers:
        target_transform = TransformerPipeline(transformers)
        return TransformedTargetForecaster(
            [
                ("transform", target_transform),
                ("forecaster", forecaster_config["forecaster"]),
            ]
        )
    else:
        return forecaster_config["forecaster"]


# Create different pipeline configurations
configs = [
    {
        "name": "Simple",
        "transformer": {"boxcox": False, "detrend": False},
        "forecaster": {"forecaster": NaiveForecaster(strategy="last", sp=12)},
    },
    {
        "name": "BoxCox",
        "transformer": {"boxcox": True, "detrend": False},
        "forecaster": {"forecaster": NaiveForecaster(strategy="last", sp=12)},
    },
    {
        "name": "Full",
        "transformer": {"boxcox": True, "detrend": True, "diff_lags": 1},
        "forecaster": {"forecaster": NaiveForecaster(strategy="last", sp=12)},
    },
]

pipelines = {}
for config in configs:
    pipelines[config["name"]] = create_forecasting_pipeline(
        config["transformer"], config["forecaster"]
    )

print("\nCreated pipeline variants:")
for name, pipeline in pipelines.items():
    print(f"{name}: {type(pipeline).__name__}")

## 7. Pipeline Best Practices

Here are key best practices for building effective pipelines:

In [None]:
print("Pipeline Best Practices:")

print("\n1. DATA LEAKAGE PREVENTION:")
print("   ✓ Always fit transformers on training data only")
print("   ✓ Use pipelines to ensure consistent train/test preprocessing")
print("   ✓ Never use future information in transformations")

print("\n2. TRANSFORMATION ORDER:")
print("   ✓ Outlier detection → Variance stabilization → Detrending → Differencing")
print("   ✓ Feature engineering before dimensionality reduction")
print("   ✓ Scaling after feature creation")

print("\n3. PARAMETER MANAGEMENT:")
print("   ✓ Use get_params() for inspection")
print("   ✓ Use set_params() for tuning")
print("   ✓ Follow naming convention: <step>__<param>")

print("\n4. ROBUSTNESS:")
print("   ✓ Handle missing values appropriately")
print("   ✓ Consider outlier detection for noisy data")
print("   ✓ Test pipelines on different data splits")

print("\n5. MAINTAINABILITY:")
print("   ✓ Use descriptive step names")
print("   ✓ Create reusable pipeline factories")
print("   ✓ Document transformation rationale")

# Demonstrate validation
print("\n6. VALIDATION EXAMPLE:")

# Compare the pipeline variants on the airline hold-out year
validation_results = {}
for name, pipeline in pipelines.items():
    try:
        # Fit on training data
        pipeline.fit(y_train)
        # Predict on test data
        pred = pipeline.predict(fh=range(1, 13))
        # Calculate error
        mape = mean_absolute_percentage_error(y_test, pred)
        validation_results[name] = mape
        print(f"   {name}: MAPE = {mape:.2%}")
    except Exception as e:
        print(f"   {name}: Error - {str(e)[:50]}...")

if validation_results:
    best_pipeline = min(validation_results.items(), key=lambda x: x[1])
    print(f"\n   Best pipeline: {best_pipeline[0]} (MAPE: {best_pipeline[1]:.2%})")

## Summary

In this tutorial, you learned:

1. **Pipeline Motivation**: Why pipelines are essential for robust forecasting
2. **Target Transformations**: Using `TransformedTargetForecaster` for y transformations
3. **Exogenous Transformations**: Using `ForecastingPipeline` for X transformations
4. **Complex Composition**: Combining both target and exogenous transformations
5. **Parameter Management**: Using `get_params` and `set_params` for inspection and tuning
6. **Advanced Patterns**: Factory functions and configuration-driven pipelines
7. **Best Practices**: Guidelines for building robust and maintainable pipelines

## Key Takeaways

- **Data Leakage Prevention**: Pipelines ensure transformations are fit on training data only
- **Reproducibility**: Consistent preprocessing across different datasets and experiments
- **Unified Interface**: Single point for parameter tuning and model management
- **Composability**: Easy to combine different transformation and forecasting components
- **Maintainability**: Clear separation of concerns and reusable patterns

## Next Steps

- Learn "Cross-validation and Metrics" for robust pipeline evaluation
- Explore "Hyperparameter Tuning" to optimize pipeline parameters
- Try "Probabilistic Forecasting" to extend pipelines with uncertainty quantification