# Forecasting with Exogenous Variables

This tutorial shows how to forecast with exogenous (external) variables in sktime.

**Duration:** ~10 minutes

## Learning objectives

By the end of this tutorial, you will be able to:
- Work with datasets containing exogenous variables
- Use `all_estimators` to find forecasters that support exogenous variables
- Apply AutoREG for forecasting with exogenous variables
- Recognize where modern forecasters such as Chronos and reduction-based models fit in

## 1. Loading Dataset with Exogenous Variables

Let's load a dataset that includes both the target variable and exogenous variables.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sktime.datasets import load_longley
from sktime.utils.plotting import plot_series

# Load the Longley dataset (annual US macroeconomic data with several variables).
# load_longley returns the target series and the exogenous DataFrame directly.
y, X = load_longley()

print(f"Target variable (y) shape: {y.shape}")
print(f"Exogenous variables (X) shape: {X.shape}")
print("\nTarget variable (Employment):")
print(y.head())
print("\nExogenous variables:")
print(X.head())

## 2. Exploring the Dataset

In [None]:
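# Plot the target variable on an explicitly sized axis
# (plot_series creates its own figure unless an axis is passed in)
fig, ax = plt.subplots(figsize=(12, 4))
plot_series(y, ax=ax, title="Employment (Target Variable)")
plt.show()

# Plot the first four exogenous variables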
fig, axes = plt.subplots(2, 2, figsize=(15, 8))
axes = axes.ravel()

for i, col in enumerate(X.columns[:4]):
    plot_series(X[col], ax=axes[i], title=f"{col}")

plt.tight_layout()
plt.show()

## 3. Finding Forecasters that Support Exogenous Variables

Let's use `all_estimators` to discover which forecasters support exogenous variables.

In [None]:
from sktime.registry import all_estimators

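# Get all forecasters that make use of exogenous variables.
# The tag "ignores-exogeneous-X" is False for forecasters that use X. Newer
# sktime releases may expose this under a different tag name (for example
# "capability:exogenous"), so check the tag list for your installed version.
forecasters_with_X = all_estimators(
    estimator_types="forecaster",
    filter_tags={"ignores-exogeneous-X": False},
    return_names=False
)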

print(f"Found {len(forecasters_with_X)} forecasters supporting exogenous variables:")
for i, forecaster_class in enumerate(forecasters_with_X[:10]):  # Show first 10
    print(f"{i+1}. {forecaster_class.__name__}")

if len(forecasters_with_X) > 10:
    print(f"... and {len(forecasters_with_X) - 10} more")

## 4. Preparing Data for Forecasting

Let's split our data and prepare it for forecasting.

In [None]:
# Split data: use first 12 observations for training, last 4 for testing
split_point = -4
y_train, y_test = y.iloc[:split_point], y.iloc[split_point:]
X_train, X_test = X.iloc[:split_point], X.iloc[split_point:]

print(f"Training period: {y_train.index[0]} to {y_train.index[-1]} ({len(y_train)} observations)")
print(f"Test period: {y_test.index[0]} to {y_test.index[-1]} ({len(y_test)} observations)")

# Plot training and test data
plot_series(y_train, y_test, labels=["Training", "Test"], title="Train/Test Split")
plt.show()

## 5. Forecasting with AutoREG

In this tutorial, `AutoREG` is an alias for an automatic ARIMA forecaster: `StatsForecastAutoARIMA` if its dependencies are installed, otherwise `AutoARIMA`. Both select the ARIMA order automatically and can include exogenous variables as regressors.

In [None]:
# sktime may only report a missing soft dependency when the estimator is
# constructed, so the constructor call sits inside the try block with the import.
try:
    from sktime.forecasting.statsforecast import StatsForecastAutoARIMA as AutoREG
    forecaster_autoreg = AutoREG()
except ImportError:  # also catches ModuleNotFoundError
    from sktime.forecasting.arima import AutoARIMA as AutoREG
    forecaster_autoreg = AutoREG()
    print("StatsForecast is not available; using AutoARIMA instead")

print(f"Forecaster: {forecaster_autoreg}")

### 5.1 Fit the Model with Exogenous Variables

In [None]:
# Fit the forecaster with both y and X
forecaster_autoreg.fit(y_train, X=X_train)
print("AutoREG model fitted with exogenous variables!")

### 5.2 Make Predictions with Exogenous Variables

In [None]:
# Generate forecasts; X_test supplies the exogenous values for the forecast period
fh = np.arange(1, len(y_test) + 1)  # relative horizon: 1 to 4 steps ahead
y_pred_autoreg = forecaster_autoreg.predict(fh=fh, X=X_test)

print(f"Forecast shape: {y_pred_autoreg.shape}")
print(f"Predictions: {y_pred_autoreg.values}")
print(f"Actual values: {y_test.values}")

### 5.3 Visualize Results

In [None]:
# Plot the results
plot_series(
    y_train, y_test, y_pred_autoreg,
    labels=["Training", "Actual", "AutoREG Forecast"],
    title="AutoREG Forecast with Exogenous Variables"
)
plt.legend()
plt.show()

## 6. Comparison: With vs Without Exogenous Variables

Let's compare forecast accuracy with and without exogenous variables. With only four test observations, treat the result as illustrative rather than conclusive.

In [None]:
# Fit the same model without exogenous variables
forecaster_no_X = AutoREG()
forecaster_no_X.fit(y_train)  # No X provided
y_pred_no_X = forecaster_no_X.predict(fh=fh)  # No X for prediction

# Calculate performance metrics
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error, mean_squared_error

mape_with_X = mean_absolute_percentage_error(y_test, y_pred_autoreg)
mape_without_X = mean_absolute_percentage_error(y_test, y_pred_no_X)
rmse_with_X = np.sqrt(mean_squared_error(y_test, y_pred_autoreg))
rmse_without_X = np.sqrt(mean_squared_error(y_test, y_pred_no_X))

print("Performance Comparison:")
print(f"With Exogenous Variables:    MAPE = {mape_with_X:.2%}, RMSE = {rmse_with_X:.2f}")
print(f"Without Exogenous Variables: MAPE = {mape_without_X:.2%}, RMSE = {rmse_without_X:.2f}")
print("\nRelative error reduction from adding exogenous variables (negative = worse):")
print(f"MAPE: {((mape_without_X - mape_with_X) / mape_without_X * 100):.1f}%")
print(f"RMSE: {((rmse_without_X - rmse_with_X) / rmse_without_X * 100):.1f}%")
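
To make the two metrics concrete, here is a minimal NumPy sketch of how MAPE and RMSE are computed, assuming the non-symmetric MAPE definition. The helper names `mape` and `rmse` and all numbers are hypothetical and chosen only so the arithmetic is easy to follow; they are not results from the Longley data.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical actuals and two hypothetical forecasts
actual = [100.0, 200.0, 400.0]
with_X = [110.0, 180.0, 400.0]      # absolute pct errors: 10%, 10%, 0%
without_X = [120.0, 160.0, 380.0]   # absolute pct errors: 20%, 20%, 5%

mape_a, mape_b = mape(actual, with_X), mape(actual, without_X)
print(f"MAPE with X: {mape_a:.2%}, without X: {mape_b:.2%}")
print(f"RMSE with X: {rmse(actual, with_X):.2f}, without X: {rmse(actual, without_X):.2f}")
print(f"Relative MAPE reduction: {(mape_b - mape_a) / mape_b:.1%}")
```

MAPE is scale-free, which makes it convenient for comparing across series, while RMSE penalizes large individual errors more heavily.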

## 7. Advanced Example: Modern Forecasters

Modern forecasters such as Chronos, a pretrained time series model, are also available through the sktime interface. The cell below only checks whether the class can be imported; running the model requires additional soft dependencies. Not every such model uses exogenous data, so check a forecaster's tags before passing `X`.

In [None]:
# Check whether sktime's Chronos interface is present in this installation.
# Note: this only tests the import. Fitting the model needs extra soft
# dependencies (such as torch and transformers) and downloads model weights.

try:
    from sktime.forecasting.chronos import ChronosForecaster
    print("ChronosForecaster is available in this sktime version.")
    print("Pretrained models like this can capture complex patterns without manual feature engineering.")
except ImportError:
    print("ChronosForecaster is not available in this environment.")
    print("For production use, consider forecasters like:")
    print("- LightGBM-based forecasters for complex feature interactions")
    print("- Deep learning models for non-linear patterns")
    print("- Ensemble methods for robust predictions")

# Show some other advanced forecasters that work well with exogenous variables
from sktime.forecasting.compose import make_reduction
from sklearn.ensemble import RandomForestRegressor

# Create a reduction-based forecaster
rf_forecaster = make_reduction(
    RandomForestRegressor(n_estimators=100, random_state=42),
    window_length=3,
    strategy="recursive"
)

print(f"\nReduction-based forecaster created: {rf_forecaster}")
print("This approach can naturally handle exogenous variables and complex patterns.")

## 8. Key Considerations for Exogenous Variables

When working with exogenous variables, keep these important points in mind:

In [None]:
# Demonstrate key considerations
print("Key Considerations for Exogenous Variables:")
print("\n1. Future Values Required:")
print(f"   - X_train shape: {X_train.shape}")
print(f"   - X_test shape: {X_test.shape}")
print("   - You need future values of X to make forecasts!")

print("\n2. Index Alignment:")
print(f"   - y_test index: {list(y_test.index)}")
print(f"   - X_test index: {list(X_test.index)}")
print("   - Indices must match exactly")

print("\n3. Variable Selection:")
print(f"   - Current X variables: {list(X.columns)}")
print("   - Consider correlation with target and forecasting horizon")

# Show correlation with target
correlations = X_train.corrwith(y_train).sort_values(key=abs, ascending=False)
print(f"\nCorrelations with target variable:")
for var, corr in correlations.items():
    print(f"   {var}: {corr:.3f}")
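
The first two considerations can be turned into a small pre-flight check before calling `predict`. The helper below is a hypothetical sketch (`align_future_X` is not part of sktime) that verifies the future exogenous data covers every forecast step and contains no missing values. The index years and `GNP` values are made up for illustration.

```python
import pandas as pd


def align_future_X(X_future, forecast_index):
    """Return X_future restricted to forecast_index, or raise if it cannot cover it."""
    missing = forecast_index.difference(X_future.index)
    if len(missing) > 0:
        raise ValueError(f"X is missing rows for: {list(missing)}")
    aligned = X_future.loc[forecast_index]
    if aligned.isna().any().any():
        raise ValueError("X contains missing values in the forecast period")
    return aligned


# Hypothetical forecast period and future exogenous values
forecast_index = pd.Index([1959, 1960, 1961, 1962])
X_future = pd.DataFrame(
    {"GNP": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.Index([1958, 1959, 1960, 1961, 1962]),
)

# Case 1: X covers the whole forecast period
X_ok = align_future_X(X_future, forecast_index)
print(X_ok)

# Case 2: X stops too early, so the check raises
try:
    align_future_X(X_future.iloc[:3], forecast_index)
    error_message = None
except ValueError as err:
    error_message = str(err)
    print(f"Check failed: {error_message}")
```

Failing early with a clear message is usually easier to debug than an index error raised from inside a forecaster.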

## Summary

In this tutorial, you learned:

1. **Exogenous Variables**: How to work with datasets containing external variables
2. **Forecaster Discovery**: Using `all_estimators` to find suitable forecasters
3. **AutoREG Usage**: Applying autoregressive models with exogenous variables
4. **Performance Comparison**: Measuring the impact of exogenous variables
5. **Advanced Approaches**: Understanding modern forecasting capabilities
6. **Key Considerations**: Important points for successful implementation

## Important Takeaways

- **Future Values**: You need future values of exogenous variables to forecast
- **Index Alignment**: Target and exogenous variables must have matching indices
- **Variable Selection**: Not all exogenous variables improve forecasting performance
- **Model Choice**: Different forecasters handle exogenous variables differently

## Next Steps

- Explore the "Transformations" tutorial to learn about feature engineering
- Try the "Pipelines" tutorial for advanced preprocessing workflows
- Check the "Hyperparameter Tuning" tutorial for optimizing forecaster performance