---
title: "Advanced Time Series Forecasting"
format: html
---

# 📈 Advanced Time Series Forecasting
## Portfolio Project 2: Ensemble Forecasting, STL Decomposition, Quantile Regression & Conformal Prediction

---

### What This Notebook Covers (Beyond Basics)
| Topic | Technique |
|---|---|
| Decomposition | STL (Seasonal-Trend decomposition using Loess) |
| Quantile regression | GBR with quantile loss for prediction intervals |
| Conformal prediction | Split-conformal prediction intervals (distribution-free) |
| Stacking ensemble | Meta-learner on top of Ridge / RF / GB base models |
| Walk-forward CV | Expanding-window cross-validation for time series |
| Feature selection | Recursive Feature Elimination with cross-validation |

### Dataset
**UCI Appliances Energy Prediction** (synthetic replica, 10-min, ~20 k rows)  
Reference: https://archive.ics.uci.edu/dataset/330

---


In [None]:
# ─── 1. Imports ─────────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import seaborn as sns
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFECV
import warnings
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
print('✓ All imports loaded.')

In [None]:
# ─── 2. Synthetic energy data (UCI-structure) ───────────
def gen_energy(n=20050, seed=7):
    rng = np.random.default_rng(seed)
    dates = pd.date_range('2016-01-11', periods=n, freq='10min')
    hour = dates.hour + dates.minute/60
    dow = dates.dayofweek.values

    T1 = 20 + 5*np.sin(2*np.pi*(hour-6)/24) + rng.normal(0, 1.5, n)
    T2 = T1 + rng.normal(0.5, 0.8, n)
    T3 = T1 + rng.normal(-0.3, 1.0, n)
    H1 = 55 - 0.8*T1 + rng.normal(0, 4, n)
    H2 = H1 + rng.normal(0, 2, n)
    H3 = H1 + rng.normal(1, 3, n)
    wind = np.clip(rng.exponential(3, n), 0, 25)
    pres = 1013 + rng.normal(0, 5, n)

    # Target: weekday/weekend level shift + daily cycles (the sin**2 term repeats every 12 h)
    app = (30 + 12*np.sin(2*np.pi*(hour-7)/24)**2
           + 8*(dow < 5).astype(float)
           + 3*np.sin(2*np.pi*np.arange(n)/144)   # 1-day cycle in samples
           + 0.3*T1 + 0.1*H1 - 0.05*wind
           + rng.normal(0, 6, n))
    app = np.clip(app, 5, 250)

    df = pd.DataFrame({
        'date': dates, 'Appliances': app.round(1),
        'T1': T1.round(2), 'T2': T2.round(2), 'T3': T3.round(2),
        'H1': H1.round(2), 'H2': H2.round(2), 'H3': H3.round(2),
        'Wind': wind.round(2), 'Pressure': pres.round(2)
    })
    return df.set_index('date')


df = gen_energy()
print(df.shape)
df.head()
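
A quick way to confirm which cycles a synthetic signal contains is to look at its autocorrelation at the candidate lags. The sketch below is a standalone illustration on a toy series; the helper `autocorr` and the toy amplitudes are assumptions made for this example and are not part of `gen_energy`. It also illustrates a property that matters for the target above: a squared sine repeats twice as often as the sine itself, so a `sin(...)**2` term with a 24-hour argument contributes a 12-hour cycle (72 samples at 10-minute resolution).

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D array at a positive integer lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)
t = np.arange(144 * 30)                              # 30 days of 10-min samples
daily = 3 * np.sin(2 * np.pi * t / 144)              # repeats every 144 samples
half_daily = 3 * np.sin(2 * np.pi * t / 144) ** 2    # sin^2 repeats every 72 samples
noise = rng.normal(0, 1, t.size)

acf_daily_144 = autocorr(daily + noise, 144)   # one full period apart: strongly positive
acf_daily_72 = autocorr(daily + noise, 72)     # half a period apart: strongly negative
acf_half_72 = autocorr(half_daily + noise, 72) # squared sine is already periodic at 72

print(f"daily sine   : lag 144 = {acf_daily_144:.2f}, lag 72 = {acf_daily_72:.2f}")
print(f"squared sine : lag 72  = {acf_half_72:.2f}")
```

A positive autocorrelation at lag 72 for the squared term is the reason a 144-sample seasonal profile of the target can show two bumps per day.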

In [None]:
# 1. STL Decomposition (Seasonal-Trend decomposition using Loess)

# STL separates a time series into Trend, Seasonal, and Residual components
# using iteratively reweighted Loess smoothing. This cell implements a
# simplified version: a Loess trend with robustness reweighting, followed by
# periodic averaging of the detrended signal to extract the seasonal component.
# It omits the alternating seasonal/trend inner loop of full STL.

# ─── 3. STL-style decomposition (manual Loess trend) ─────
def loess_smooth(y, window_frac=0.1, poly=1, iters=3):
    """Local linear (Loess) smoothing with iterative bisquare reweighting.

    `poly` is kept for call compatibility; only a degree-1 (linear) fit is implemented.
    """
    n = len(y)
    win = max(int(window_frac * n) | 1, 3)   # `| 1` forces an odd window length
    half = win // 2
    smoothed = np.zeros(n)
    weights = np.ones(n)

    for _ in range(iters):
        for i in range(n):
            lo, hi = max(0, i-half), min(n, i+half+1)
            x_loc = np.arange(lo, hi)
            d = np.abs(x_loc - i)
            u = d / (d.max() + 1e-10)
            # Tricube kernel
            w = (1 - u**3)**3 * weights[lo:hi]
            # Weighted least squares
            X_mat = np.column_stack([np.ones(hi-lo), x_loc - i])
            W_mat = np.diag(w)
            try:
                beta = np.linalg.solve(X_mat.T @ W_mat @ X_mat,
                                       X_mat.T @ W_mat @ y[lo:hi])
                smoothed[i] = beta[0]
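            except np.linalg.LinAlgError:
                # Singular system: fall back to a weighted (or plain) local mean
                smoothed[i] = (np.average(y[lo:hi], weights=w)
                               if w.sum() > 0 else np.mean(y[lo:hi]))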

        # Update weights (bisquare on residuals)
        resid = y - smoothed
        med_r = np.median(np.abs(resid))
        u_r = resid / (6 * med_r + 1e-10)
        weights = np.where(np.abs(u_r) < 1, (1 - u_r**2)**2, 0)
    return smoothed


# Use a 20-day subset for speed (2880 samples at 10-min resolution = 20 days)
sub = df['Appliances'].values[:2880].copy()

print('Computing Loess trend … (this may take ~10-20 s)')
trend = loess_smooth(sub, window_frac=0.15, poly=1, iters=2)

# Extract seasonal: average the detrended signal over each 144-sample period
detrended = sub - trend
n_periods = len(sub) // 144
seasonal = np.zeros(144)
for p in range(n_periods):
    seasonal += detrended[p*144:(p+1)*144]
seasonal /= n_periods
# Tile seasonal to full length
seasonal_full = np.tile(seasonal, n_periods + 1)[:len(sub)]

residual = sub - trend - seasonal_full

fig, axes = plt.subplots(4, 1, figsize=(16, 9), sharex=True)
labels_data = [('Original', sub, '#4c72b0'),
               ('Trend',    trend, '#c44e52'),
               ('Seasonal', seasonal_full, '#55a868'),
               ('Residual', residual, '#8172b2')]
for ax, (label, data, color) in zip(axes, labels_data):
    ax.plot(data, lw=0.8, color=color)
    ax.set_ylabel(label, fontsize=10)
    ax.set_title(label, fontsize=11)
axes[-1].set_xlabel('Sample Index (10-min intervals)')
plt.suptitle(
    'STL-Style Decomposition: Appliances Energy (20 days)', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
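
The core step inside the Loess loop is a tricube-weighted straight-line fit, evaluated at the centre of the window. The following is a minimal standalone sketch of that single step, not the notebook's exact routine: `tricube` and `local_linear_fit` are hypothetical helper names, and the kernel is scaled by `half_width + 1` (an assumption of this sketch) so that the outermost points keep a small non-zero weight. It also avoids building the full diagonal weight matrix by broadcasting the weights instead.

```python
import numpy as np

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u**3) ** 3

def local_linear_fit(y, center, half_width):
    """Tricube-weighted straight-line fit around `center`; returns the fitted value there."""
    lo, hi = max(0, center - half_width), min(len(y), center + half_width + 1)
    x = np.arange(lo, hi) - center            # centred, so the intercept is the fit at `center`
    w = tricube(x / (half_width + 1))         # +1 keeps the edge points at non-zero weight
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    XtW = X.T * w                             # same as X.T @ diag(w), without the m x m matrix
    beta = np.linalg.solve(XtW @ X, XtW @ y[lo:hi])
    return float(beta[0])

# A straight line is reproduced exactly; noise around it is averaged out
line = 2.0 * np.arange(50) + 1.0
fit_line = local_linear_fit(line, center=20, half_width=5)

rng = np.random.default_rng(1)
noisy = line + rng.normal(0, 1.0, line.size)
fit_noisy = local_linear_fit(noisy, center=20, half_width=10)
print(f"exact line: {fit_line:.3f}   noisy line: {fit_noisy:.3f}   truth: {line[20]:.3f}")
```

Because a degree-1 local fit reproduces any straight line exactly, it does not flatten a steadily rising or falling trend the way a plain moving average does near the series boundaries.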

In [None]:
# 2. Feature Engineering + Recursive Feature Elimination (RFECV)


# ─── 4. Comprehensive feature engineering ────────────────
df2 = df.copy()
# Temporal
df2['hour'] = df2.index.hour
df2['dow'] = df2.index.dayofweek
df2['is_weekend'] = (df2['dow'] >= 5).astype(int)
df2['month'] = df2.index.month
df2['hour_sin'] = np.sin(2*np.pi*df2['hour']/24)
df2['hour_cos'] = np.cos(2*np.pi*df2['hour']/24)
df2['dow_sin'] = np.sin(2*np.pi*df2['dow']/7)
df2['dow_cos'] = np.cos(2*np.pi*df2['dow']/7)

# Lags
for lag in [1, 2, 3, 6, 12, 24, 144]:
    df2[f'App_lag{lag}'] = df2['Appliances'].shift(lag)

# Rolling stats over past values only: shift by one step first so the
# current target value never leaks into its own features
past = df2['Appliances'].shift(1)
for w in [6, 12, 24, 144]:
    df2[f'App_rmean_{w}'] = past.rolling(w).mean()
    df2[f'App_rstd_{w}'] = past.rolling(w).std()
    df2[f'App_rmax_{w}'] = past.rolling(w).max()
    df2[f'App_rmin_{w}'] = past.rolling(w).min()

# Interactions
df2['T1_H1'] = df2['T1'] * df2['H1']
df2['T_range'] = df2[['T1', 'T2', 'T3']].max(
    axis=1) - df2[['T1', 'T2', 'T3']].min(axis=1)
df2['H_range'] = df2[['H1', 'H2', 'H3']].max(
    axis=1) - df2[['H1', 'H2', 'H3']].min(axis=1)

df2.dropna(inplace=True)
TARGET = 'Appliances'
X_all = df2.drop(columns=[TARGET])
y_all = df2[TARGET]
print(f'Feature matrix: {X_all.shape}')

# RFECV with GBR: select the best feature subset.
# Fit on the first 80% only (the training span used later) so the
# held-out test rows do not influence feature selection.
print('Running RFECV … (expanding-window CV)')
tscv = TimeSeriesSplit(n_splits=5)
gbr_small = GradientBoostingRegressor(
    n_estimators=100, max_depth=4, random_state=0)
rfecv = RFECV(gbr_small, step=3, cv=tscv,
              scoring='neg_root_mean_squared_error', min_features_to_select=8)
n_fit = int(0.80 * len(X_all))
rfecv.fit(X_all.values[:n_fit], y_all.values[:n_fit])

selected = X_all.columns[rfecv.support_].tolist()
print(f'\nSelected {len(selected)} features:')
print(selected)
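
Rolling statistics of the target are only valid predictors when the window stops before the row being predicted. The toy series below (made-up values, chosen so the arithmetic is easy to follow) contrasts a window that includes the current row with one that is shifted by one step first. It is a small illustration of the pattern, under the assumption that the target at time t is unknown when the features for time t are built.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="target")

leaky = s.rolling(2).mean()            # window ends AT row t, so it contains s[t]
safe = s.shift(1).rolling(2).mean()    # window ends at row t-1: past values only

demo = pd.DataFrame({"target": s, "leaky_rmean_2": leaky, "safe_rmean_2": safe})
print(demo)

# Changing the current target must not change its own (safe) feature
s_changed = s.copy()
s_changed.iloc[4] = 999.0
leaky_changed = s_changed.rolling(2).mean()
safe_changed = s_changed.shift(1).rolling(2).mean()
print("leaky feature moved:", leaky_changed.iloc[4] != leaky.iloc[4])
print("safe feature moved :", safe_changed.iloc[4] != safe.iloc[4])
```

The same check (perturb the target at one row and see whether that row's features move) is a cheap leakage test for any engineered feature.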

In [None]:
# ─── 5. RFECV ranking visualisation ──────────────────────
ranking = pd.Series(rfecv.ranking_, index=X_all.columns).sort_values()

fig, ax = plt.subplots(figsize=(12, 6))
colors = ['#55a868' if r == 1 else '#4c72b0' for r in ranking.values]
ranking.plot(kind='barh', ax=ax, color=colors, edgecolor='white')
ax.set_title('RFECV Feature Ranking (green = selected)', fontsize=13)
ax.set_xlabel('Elimination Rank (1 = kept)')
plt.tight_layout()
plt.show()

# CV score curve. With step=3, RFECV scores only every third feature count
# (plus the minimum), so rebuild the x-axis to match cv_results_
fig, ax = plt.subplots(figsize=(10, 4))
n_total, n_min = X_all.shape[1], rfecv.min_features_to_select
n_feats = sorted(set(range(n_total, n_min, -rfecv.step)) | {n_min})
ax.plot(n_feats, -rfecv.cv_results_['mean_test_score'],
        'o-', lw=1.5, color='steelblue', ms=4)
ax.axvline(len(selected), color='crimson', ls='--',
           lw=1.2, label=f'Selected: {len(selected)}')
ax.set_title('RFECV: RMSE vs Number of Features')
ax.set_xlabel('Number of Features')
ax.set_ylabel('Mean CV RMSE')
ax.legend()
plt.tight_layout()
plt.show()

In [None]:
# 3. Walk-Forward Expanding-Window Cross-Validation

# ─── 6. Temporal train/test + walk-forward CV ────────────
X_sel = X_all[selected].values
y = y_all.values

# Chronological 80/20 split
split = int(0.80 * len(X_sel))
X_train, X_test = X_sel[:split], X_sel[split:]
y_train, y_test = y[:split],     y[split:]

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Walk-forward CV scores
tscv = TimeSeriesSplit(n_splits=5)
models_wf = {
    'Ridge': Ridge(alpha=10),
    'RF':    RandomForestRegressor(n_estimators=150, max_depth=12, random_state=0, n_jobs=-1),
    'GBR':   GradientBoostingRegressor(n_estimators=300, max_depth=5, learning_rate=0.05, random_state=0),
}

from sklearn.model_selection import cross_val_score

print('Walk-forward CV results:')
wf_scores = {}
for name, model in models_wf.items():
    scores = cross_val_score(model, X_train_s, y_train,
                             cv=tscv, scoring='neg_root_mean_squared_error')
    wf_scores[name] = -scores   # sklearn returns negated RMSE; flip the sign
    print(
        f'  {name:8s}: RMSE = {(-scores).mean():.2f} ± {scores.std():.2f}  (folds: {[f"{s:.2f}" for s in -scores]})')
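
To make the expanding-window scheme concrete, the short sketch below prints the index sets that `TimeSeriesSplit` produces on eight time-ordered samples. The toy array is an assumption for illustration; the splitter is the same class used above.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_toy = np.arange(8).reshape(-1, 1)       # 8 time-ordered samples
folds = list(TimeSeriesSplit(n_splits=3).split(X_toy))

for k, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {k}: train={train_idx.tolist()}  test={test_idx.tolist()}")
# fold 1: train=[0, 1]  test=[2, 3]
# fold 2: train=[0, 1, 2, 3]  test=[4, 5]
# fold 3: train=[0, 1, 2, 3, 4, 5]  test=[6, 7]
```

Each training window ends where its test block begins, and the first block (indices 0 and 1 here) is never used for testing. That second property matters for any routine that expects every sample to receive exactly one out-of-fold prediction.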

In [None]:
# 4. Stacking Ensemble: Meta-Learner

# ─── 7. Stacking regressor ────────────────────────────────
base_estimators = [
    ('ridge', Ridge(alpha=10)),
    ('rf',    RandomForestRegressor(n_estimators=150,
     max_depth=12, random_state=0, n_jobs=-1)),
    ('gbr',   GradientBoostingRegressor(n_estimators=300,
     max_depth=5, learning_rate=0.05, random_state=0)),
]

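from sklearn.model_selection import KFold

# StackingRegressor builds its meta-features with cross_val_predict, which
# needs every training row to appear in exactly one test fold. TimeSeriesSplit
# never tests the first block, so passing it here raises
# "cross_val_predict only works for partitions". Contiguous (unshuffled)
# K-fold blocks are a valid partition; note that base models for early blocks
# are then trained on later data, so the meta-features are not strictly
# walk-forward.
stacker = StackingRegressor(
    estimators=base_estimators,
    final_estimator=RidgeCV(alphas=[0.1, 1, 10, 100]),
    cv=KFold(n_splits=5, shuffle=False),
    passthrough=True     # pass original features to meta-learner too
)

print('Fitting stacking ensemble …')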
stacker.fit(X_train_s, y_train)
stack_preds = stacker.predict(X_test_s)

# Also get individual test predictions
ind_preds = {}
for name, model in models_wf.items():
    model.fit(X_train_s, y_train)
    ind_preds[name] = model.predict(X_test_s)

# Metrics
all_preds = {**ind_preds, 'Stacking': stack_preds}
print('\n{:<12} {:>8} {:>8} {:>8}'.format('Model', 'RMSE', 'MAE', 'R²'))
print('-'*40)
for name, p in all_preds.items():
    rmse = np.sqrt(mean_squared_error(y_test, p))
    mae = mean_absolute_error(y_test, p)
    r2 = r2_score(y_test, p)
    print(f'{name:<12} {rmse:8.2f} {mae:8.2f} {r2:8.3f}')
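
`StackingRegressor` relies on `cross_val_predict`, which only accepts splitters that partition the training rows. When strictly walk-forward meta-features are wanted, one option is to build the out-of-fold predictions by hand. The following is a minimal sketch of that idea on toy data, not the notebook's pipeline: the data-generating formula, the two base models, and all sizes are assumptions chosen to keep it short and fast.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(0, 0.3, n)   # toy target
X_tr, X_te, y_tr, y_te = X[:480], X[480:], y[:480], y[480:]          # chronological split

base_models = {
    "ridge": Ridge(alpha=1.0),
    "rf": RandomForestRegressor(n_estimators=50, max_depth=6, random_state=0),
}

# Out-of-fold predictions: each row is predicted by models fitted on earlier rows only
oof = np.full((len(X_tr), len(base_models)), np.nan)
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X_tr):
    for j, model in enumerate(base_models.values()):
        fitted = clone(model).fit(X_tr[train_idx], y_tr[train_idx])
        oof[test_idx, j] = fitted.predict(X_tr[test_idx])

# The first block is never a test fold, so it has no meta-features: drop it
has_oof = ~np.isnan(oof).any(axis=1)
meta = Ridge(alpha=1.0).fit(oof[has_oof], y_tr[has_oof])

# Refit the base models on the full training span, then stack their test predictions
test_meta = np.column_stack([
    clone(m).fit(X_tr, y_tr).predict(X_te) for m in base_models.values()
])
stack_pred = meta.predict(test_meta)
rmse = float(np.sqrt(np.mean((y_te - stack_pred) ** 2)))
print(f"rows with meta-features: {has_oof.sum()} / {len(X_tr)},  test RMSE: {rmse:.3f}")
```

The trade-off is that the meta-learner sees fewer rows (the first block is discarded) in exchange for meta-features that never use future information.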

In [None]:
# ─── 8. Forecast comparison plot ──────────────────────────
fig, ax = plt.subplots(figsize=(16, 5))
idx = slice(0, 600)   # first 600 test points
ax.plot(y_test[idx], lw=1.2, color='black', label='Actual')
colors_map = {'Ridge': '#4c72b0', 'RF': '#55a868',
              'GBR': '#c44e52', 'Stacking': 'gold'}
for name, p in all_preds.items():
    lw = 2.0 if name == 'Stacking' else 0.9
    alph = 1.0 if name == 'Stacking' else 0.5
    ax.plot(p[idx], lw=lw, color=colors_map[name], alpha=alph, label=name)
ax.set_title('Stacking vs Individual Models: Test Set Forecast', fontsize=13)
ax.set_ylabel('Appliances Energy (Wh)')
ax.set_xlabel('Test Sample Index')
ax.legend()
plt.tight_layout()
plt.show()

In [None]:
# 5. Quantile Regression: Prediction Intervals

# ─── 9. Quantile GBR: lower / median / upper bounds ──────
quantiles = [0.05, 0.5, 0.95]
q_preds = {}

for q in quantiles:
    qgbr = GradientBoostingRegressor(
        n_estimators=300, max_depth=5, learning_rate=0.05,
        loss='quantile', quantile=q, random_state=0
    )
    qgbr.fit(X_train_s, y_train)
    q_preds[q] = qgbr.predict(X_test_s)
    print(f'Quantile {q:.2f} model trained.')

# Coverage check
lower, median, upper = q_preds[0.05], q_preds[0.5], q_preds[0.95]
covered = ((y_test >= lower) & (y_test <= upper)).mean()
width = (upper - lower).mean()
print(
    f'\n90% Prediction Interval → Coverage: {covered*100:.1f}%  |  Mean Width: {width:.2f} Wh')
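
The `loss='quantile'` option trains each model on the pinball loss, which penalises under- and over-prediction asymmetrically. The sketch below is a standalone illustration with simulated data; `pinball_loss` is a hypothetical helper written for this example. It shows why minimising that loss targets a quantile: among constant predictions, the one with the lowest pinball loss at level 0.9 lands at the empirical 0.9 quantile.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    # Under-prediction (diff > 0) costs q per unit, over-prediction costs (1 - q)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

rng = np.random.default_rng(0)
y = rng.normal(0, 1, 2001)

# Score constant predictions on a grid and find the one with the lowest loss
q = 0.9
candidates = np.arange(-3, 3, 0.01)
losses = [pinball_loss(y, c, q) for c in candidates]
best = float(candidates[int(np.argmin(losses))])

print(f"loss-minimising constant: {best:.2f}")
print(f"empirical 0.9 quantile  : {np.quantile(y, q):.2f}")
```

With q = 0.9, predicting too low costs nine times as much per unit as predicting too high, which pushes the fitted curve up until roughly 90% of the observations lie below it.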

In [None]:
# ─── 10. Quantile prediction interval plot ────────────────
fig, ax = plt.subplots(figsize=(16, 5))
idx = slice(0, 500)
ax.fill_between(range(500), lower[idx], upper[idx],
                color='steelblue', alpha=0.15, label='90% PI (Quantile)')
ax.plot(median[idx], lw=1.2, color='steelblue', label='Median (q=0.5)')
ax.plot(y_test[idx], lw=1.0, color='black', alpha=0.7, label='Actual')
ax.plot(stack_preds[idx], lw=1.0, color='gold', ls='--', label='Stacking')
ax.set_title('Quantile Regression: 90% Prediction Interval', fontsize=13)
ax.set_ylabel('Appliances Energy (Wh)')
ax.set_xlabel('Test Sample Index')
ax.legend(loc='upper right')
plt.tight_layout()
plt.show()

In [None]:
# 6. Conformal Prediction: Distribution-Free Coverage Guarantee

# ─── 11. Split-conformal prediction interval ──────────────
# Split-conformal needs calibration residuals from data the model has NOT
# been fitted on, so hold out the last 15% of the training data and refit
# a clone of the stacking model on the remaining 85%.
from sklearn.base import clone

cal_start = int(0.85 * len(X_train_s))
X_cal = X_train_s[cal_start:]
y_cal = y_train[cal_start:]

conf_model = clone(stacker).fit(X_train_s[:cal_start], y_train[:cal_start])
conf_preds = conf_model.predict(X_test_s)

# Calibration residuals (absolute)
cal_preds = conf_model.predict(X_cal)
cal_resids = np.abs(y_cal - cal_preds)

# For 90% coverage (α = 0.10), take the ceil((n + 1)(1 - α)) / n empirical
# quantile of the absolute residuals, i.e. roughly their 90th percentile
alpha = 0.10
q_level = np.ceil((1 - alpha) * (len(cal_resids) + 1)) / len(cal_resids)
q_level = min(q_level, 1.0)
conf_width = np.quantile(cal_resids, q_level, method='higher')

conf_lower = conf_preds - conf_width
conf_upper = conf_preds + conf_width

# Coverage on test set
conf_covered = ((y_test >= conf_lower) & (y_test <= conf_upper)).mean()
print(f'Conformal α={alpha:.2f}:')
print(f'  Quantile level used: {q_level:.4f}')
print(f'  Interval half-width: ±{conf_width:.2f} Wh')
print(
    f'  Actual test coverage: {conf_covered*100:.1f}%  (target: {(1-alpha)*100:.0f}%)')

In [None]:
# ─── 12. Conformal vs Quantile interval comparison ───────
fig, axes = plt.subplots(2, 1, figsize=(16, 8), sharex=True)
idx = slice(0, 500)

# Conformal
axes[0].fill_between(range(500), conf_lower[idx], conf_upper[idx],
                     color='#c44e52', alpha=0.15, label=f'Conformal PI (cov={conf_covered*100:.1f}%)')
axes[0].plot(conf_preds[idx], lw=1.2, color='#c44e52', label='Stacking pred (85% fit)')
axes[0].plot(y_test[idx], lw=1.0, color='black', alpha=0.7, label='Actual')
axes[0].set_title(
    'Conformal Prediction Interval (distribution-free)', fontsize=12)
axes[0].set_ylabel('Energy (Wh)')
axes[0].legend(loc='upper right')

# Quantile
axes[1].fill_between(range(500), lower[idx], upper[idx],
                     color='#4c72b0', alpha=0.15, label=f'Quantile PI (cov={covered*100:.1f}%)')
axes[1].plot(median[idx], lw=1.2, color='#4c72b0', label='Quantile median')
axes[1].plot(y_test[idx], lw=1.0, color='black', alpha=0.7, label='Actual')
axes[1].set_title('Quantile Regression Prediction Interval', fontsize=12)
axes[1].set_ylabel('Energy (Wh)')
axes[1].set_xlabel('Test Sample Index')
axes[1].legend(loc='upper right')

plt.suptitle('Prediction Interval Comparison', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
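
The split-conformal recipe can be checked end to end on simulated data where the exchangeability assumption holds by construction. The sketch below is a standalone illustration, not the notebook's pipeline: the linear model, the heavy-tailed noise, the split sizes, and the helper name `conformal_halfwidth` are all assumptions made for this example. It fits on one part, calibrates on a second, and measures coverage on a third.

```python
import numpy as np

def conformal_halfwidth(abs_resids, alpha):
    """k-th smallest absolute calibration residual, k = ceil((n + 1)(1 - alpha))."""
    n = len(abs_resids)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:                      # too few calibration points for this alpha
        return float("inf")
    return float(np.sort(abs_resids)[k - 1])

rng = np.random.default_rng(0)
n = 3500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=3, size=n)   # heavy-tailed noise

# Three disjoint parts: fit / calibrate / test
fit, cal, test = slice(0, 1000), slice(1000, 1500), slice(1500, None)

beta, *_ = np.linalg.lstsq(X[fit], y[fit], rcond=None)   # the model sees only the fit part
half = conformal_halfwidth(np.abs(y[cal] - X[cal] @ beta), alpha=0.10)

pred = X[test] @ beta
coverage = float(np.mean(np.abs(y[test] - pred) <= half))
print(f"half-width = {half:.2f}, test coverage = {coverage:.3f} (target 0.90)")
```

The coverage lands near the target even though the noise is far from Gaussian, because the procedure only ranks residuals. For time series, where residuals drift over time, the same check on a chronological test block is the practical way to see how far the guarantee degrades.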

In [None]:
# ─── 13. Summary metrics table ────────────────────────────
summary = pd.DataFrame({
    'Method':        ['Quantile (5/95)', 'Conformal (split)'],
    'Target Cover':  ['90%', '90%'],
    'Actual Cover':  [f'{covered*100:.1f}%', f'{conf_covered*100:.1f}%'],
    'Mean Width':    [f'{(upper-lower).mean():.2f}', f'{(conf_upper-conf_lower).mean():.2f}'],
    'Assumption':    ['Model-based (pinball loss), no coverage guarantee', 'Distribution-free (exchangeability)']
})
print(summary.to_string(index=False))
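
Both interval methods are judged on the same two numbers: how often the interval contains the observation, and how wide it is on average. A small helper makes that comparison reusable for any pair of bounds. `interval_report` is a hypothetical name introduced for this sketch, and the inputs are toy values.

```python
import numpy as np

def interval_report(y_true, lower, upper):
    """Empirical coverage and mean width of prediction intervals [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return {"coverage": float(inside.mean()),
            "mean_width": float((upper - lower).mean())}

# Toy values: two of the four observations fall inside their interval
report = interval_report(y_true=[1, 2, 3, 4], lower=[0, 0, 4, 3], upper=[2, 1, 5, 5])
print(report)   # {'coverage': 0.5, 'mean_width': 1.5}
```

Coverage alone is not enough to rank methods: an interval that is wide everywhere reaches any coverage target, so the narrower interval at comparable coverage is the more informative one.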

---
## Summary & Portfolio Takeaways

| Technique | Value |
|---|---|
| **STL Decomposition** | Separates trend, daily seasonality, and noise; useful for guiding feature engineering |
| **RFECV** | Prunes 40+ engineered features down to ~15 most predictive, reducing overfitting risk |
| **Walk-forward CV** | Honest evaluation respecting temporal ordering; exposes fold-level variance |
| **Stacking Ensemble** | Combines the base models' point predictions; the meta-learner learns how much weight to give each one |
| **Quantile Regression** | Adaptive-width intervals that widen during high-uncertainty periods |
| **Conformal Prediction** | Distribution-free coverage guarantee under exchangeability; for time series the guarantee holds only approximately |

These techniques apply to **energy management systems**, **demand forecasting**, and other applications requiring **calibrated uncertainty quantification**.
