# 01 — MVP Forecasting with Yahoo Finance Data

**Goal:** Build a **minimum viable forecasting pipeline** with:
- Data from Yahoo Finance (the energy-sector ETF `XLE`, used as a freely available stand-in for electricity futures)
- Baseline ARIMA model
- Machine Learning model (LightGBM)
- Metrics & visual comparison

**Why this matters for hedge funds:**
- Demonstrates ability to structure time series forecasting projects
- Shows competence in classical + ML approaches
- Provides a baseline for extension with real energy data (ENTSO‑E, EEX, etc.)


## 0) Setup & Imports

In [None]:
# Install dependencies (core packages pinned; lightgbm, yfinance, matplotlib left unpinned)
# %pip install numpy==1.26.4 scipy==1.13.1 scikit-learn==1.5.1 pmdarima==2.0.4 lightgbm yfinance matplotlib

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error
import yfinance as yf

from pmdarima import auto_arima
import lightgbm as lgb

plt.rcParams['figure.figsize'] = (12,5)
plt.rcParams['axes.grid'] = True


## 1) Helper functions

In [None]:
def add_time_features(df, ts_col="Date"):
    df = df.copy()
    df[ts_col] = pd.to_datetime(df[ts_col])
    df["year"] = df[ts_col].dt.year
    df["month"] = df[ts_col].dt.month
    df["dayofweek"] = df[ts_col].dt.dayofweek
    return df

def make_lags(df, y_col="Close", lags=(1,5,22)):
    df = df.copy()
    for L in lags:
        df[f"lag_{L}"] = df[y_col].shift(L)
    return df

def ts_train_test_split(df, test_size=0.2):
    n = len(df)
    split = int(n*(1-test_size))
    return df.iloc[:split].copy(), df.iloc[split:].copy()

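def metrics(y_true, y_pred):
    # Coerce to flat float arrays so pandas objects with different indexes cannot misalign
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mae = mean_absolute_error(y_true, y_pred)
    rmse = sqrt(mean_squared_error(y_true, y_pred))
    # MAPE is undefined when y_true contains zeros (not a concern for prices)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    return {"MAE": mae, "RMSE": rmse, "MAPE%": mape}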
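
The lag helper relies on `shift`, which is worth seeing on a tiny example. The sketch below uses a made-up six-row frame (the values and the lags `(1, 3)` are illustrative, not from the notebook) to show that each lag column only carries past values and that the first `max(lags)` rows become incomplete.

```python
import pandas as pd

# Toy price series (hypothetical values, not market data)
toy = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
})

# Same operation as make_lags: shift(L) moves values L rows down,
# so the row at time t holds the value observed at time t - L.
for L in (1, 3):
    toy[f"lag_{L}"] = toy["Close"].shift(L)

# Rows without a full lag history contain NaN and are dropped before modelling
n_missing = int(toy[["lag_1", "lag_3"]].isna().any(axis=1).sum())
toy_clean = toy.dropna().reset_index(drop=True)
print(toy_clean)
```

Because every feature at row `t` comes from rows before `t`, a model trained on these columns sees no same-day or future prices.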


## 2) Download data from Yahoo Finance

In [None]:
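TICKER = "XLE"  # Proxy energy ETF
df = yf.download(TICKER, start="2015-01-01")
# Recent yfinance releases can return (field, ticker) MultiIndex columns even for
# a single ticker; keep only the field level so df["Close"] is a plain column.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
df = df.reset_index()[["Date", "Close"]].dropna().reset_index(drop=True)
df.head()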


## 3) Quick EDA

In [None]:
ax = df.plot(x="Date", y="Close", title=f"{TICKER} Close Price")
ax.set_xlabel("Date"); ax.set_ylabel("Price")
plt.show()

df.describe().T


## 4) Feature engineering

In [None]:
df_feat = add_time_features(df, ts_col="Date")
df_feat = make_lags(df_feat, y_col="Close", lags=(1,5,22))
df_feat = df_feat.dropna().reset_index(drop=True)

train, test = ts_train_test_split(df_feat, test_size=0.2)
y_tr, y_te = train["Close"].values, test["Close"].values

feature_cols = [c for c in df_feat.columns if c.startswith("lag_")] + ["month","dayofweek","year"]
X_tr, X_te = train[feature_cols], test[feature_cols]
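
A chronological split is only useful if the training period really ends before the test period begins. The following is a minimal sanity check on a toy frame; `assert_chronological` is a hypothetical helper introduced here for illustration, not part of the notebook.

```python
import pandas as pd

def assert_chronological(train, test, ts_col="Date"):
    """Return True if every training timestamp precedes every test timestamp."""
    if train[ts_col].max() >= test[ts_col].min():
        raise ValueError("train and test periods overlap")
    return True

# Toy frame of 10 business days (hypothetical data)
dates = pd.date_range("2024-01-01", periods=10, freq="B")
frame = pd.DataFrame({"Date": dates, "Close": range(10)})

# Same rule as ts_train_test_split with test_size=0.2
split = int(len(frame) * (1 - 0.2))
train_toy, test_toy = frame.iloc[:split], frame.iloc[split:]

ok = assert_chronological(train_toy, test_toy)

# A shuffled split would be rejected
shuffled = frame.sample(frac=1.0, random_state=0)
try:
    assert_chronological(shuffled.iloc[:split], shuffled.iloc[split:])
    shuffled_rejected = False
except ValueError:
    shuffled_rejected = True
```

In the notebook, the same check could be run as `assert_chronological(train, test)` right after the split.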


## 5) Baseline ARIMA

In [None]:
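# Note: this is one multi-step forecast over the whole test window, made without
# seeing any test data. LightGBM below uses lagged actual prices as features, so it
# is effectively a one-step-ahead forecast; the two scores are not directly comparable.
arima = auto_arima(y_tr, seasonal=False, suppress_warnings=True)
arima_fc = arima.predict(n_periods=len(test))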

m_arima = metrics(y_te, arima_fc)
print("ARIMA metrics:", m_arima)

plt.figure()
plt.plot(test["Date"], y_te, label="Actual")
plt.plot(test["Date"], arima_fc, label="ARIMA")
plt.legend(); plt.title(f"ARIMA vs Actual — {TICKER}"); plt.show()
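
A naive persistence forecast (tomorrow equals today) is a useful reference point for daily prices, since any model that uses yesterday's actual value should at least match it. The sketch below is one possible implementation on toy numbers; `persistence_forecast` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def persistence_forecast(y_train, y_test):
    """One-step-ahead naive forecast: each prediction is the previous actual value."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    # First prediction uses the last training value, the rest use prior test actuals
    return np.concatenate([y_train[-1:], y_test[:-1]])

# Toy values (hypothetical, for illustration only)
y_train_toy = np.array([100.0, 101.0, 103.0])
y_test_toy = np.array([102.0, 104.0, 105.0])

naive_fc = persistence_forecast(y_train_toy, y_test_toy)
naive_mae = float(np.mean(np.abs(y_test_toy - naive_fc)))
print(naive_fc, naive_mae)
```

In the notebook this could be scored with `metrics(y_te, persistence_forecast(y_tr, y_te))` and reported next to the model metrics.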


## 6) Machine Learning with LightGBM

In [None]:
lgbm = lgb.LGBMRegressor(
    n_estimators=800, learning_rate=0.02,
    # Row subsampling only takes effect when subsample_freq > 0
    subsample=0.9, subsample_freq=1, colsample_bytree=0.9,
    random_state=42
)
lgbm.fit(X_tr, y_tr)
lgbm_fc = lgbm.predict(X_te)

m_lgbm = metrics(y_te, lgbm_fc)
print("LightGBM metrics:", m_lgbm)

plt.figure()
plt.plot(test["Date"], y_te, label="Actual")
plt.plot(test["Date"], lgbm_fc, label="LightGBM")
plt.legend(); plt.title(f"LightGBM vs Actual — {TICKER}"); plt.show()


## 7) Compare ARIMA vs LightGBM

In [None]:
plt.figure()
plt.plot(test["Date"], y_te, label="Actual")
plt.plot(test["Date"], arima_fc, label="ARIMA")
plt.plot(test["Date"], lgbm_fc, label="LightGBM")
plt.legend(); plt.title(f"Forecast Comparison — {TICKER}")
plt.show()

print("ARIMA:", m_arima)
print("LightGBM:", m_lgbm)
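
The two metric dictionaries are easier to read side by side as a table. This sketch assumes dictionaries shaped like the output of `metrics`; the numbers and the names `model_a` and `model_b` are placeholders for illustration, not results from this notebook.

```python
import pandas as pd

# Placeholder scores (NOT real results); in the notebook, pass
# {"ARIMA": m_arima, "LightGBM": m_lgbm} instead.
example_scores = {
    "model_a": {"MAE": 2.0, "RMSE": 3.0, "MAPE%": 4.0},
    "model_b": {"MAE": 1.0, "RMSE": 1.5, "MAPE%": 2.0},
}

# One row per model, one column per metric, sorted so the lowest RMSE comes first
summary = pd.DataFrame(example_scores).T.sort_values("RMSE")
best_model = summary.index[0]
print(summary.round(3))
```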


## 8) Next steps

- Add **exogenous variables** (e.g., natural gas `NG=F`, weather).
- Implement **walk‑forward** validation (see notebook 02).
- Tune hyperparameters (Optuna/Bayesian search).
- Add an LSTM comparison (see notebook 03).
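
As a rough illustration of the walk-forward idea mentioned above, the sketch below generates expanding-window train/test index pairs. It is one possible scheme under assumed parameters (`n_folds`, `test_size`), not necessarily the approach used in notebook 02; scikit-learn's `TimeSeriesSplit` provides similar functionality.

```python
import numpy as np

def walk_forward_splits(n_samples, n_folds=3, test_size=2):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        start = first_test_start + k * test_size
        # Train on everything before `start`, test on the next `test_size` rows
        yield np.arange(0, start), np.arange(start, start + test_size)

folds = list(walk_forward_splits(n_samples=10, n_folds=3, test_size=2))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {test_idx.tolist()}")
```

Each fold would refit the model on `train_idx` and score it on `test_idx`, and the per-fold metrics can then be averaged.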
