# SCL-Alpha: ML-Driven Supply-Chain & Logistics Alpha

**End-to-end walkthrough:** data → features → models → backtest → metrics

**Universe:** UPS · FDX · XPO · CHRW · JBHT · UNP · CSX · MATX · GXO · EXPD  
**Models:** Ridge Regression · Random Forest · XGBoost  
**Target:** 5-day forward return (regression)  
**Backtest:** Walk-forward (expanding window, weekly rebalance)

---

### Prerequisites
```
pip install -r requirements.txt
```
Copy `.env.example` to `.env`, then add your free FRED API key from https://fred.stlouisfed.org/docs/api/api_key.html

> **Run from the project root** (`scl-alpha/`), not from inside `notebooks/`.

## 0 · Setup

In [None]:
# ── Standard path fix: run from project root ──────────────────────────────
import sys, os
from pathlib import Path

# Make sure we can import from src/
ROOT = Path(os.getcwd())
if ROOT.name == "notebooks":          # if you launched from notebooks/ dir
    ROOT = ROOT.parent
    os.chdir(ROOT)
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

print(f"Project root: {ROOT}")

# ── Load environment variables (.env → FRED_API_KEY) ──────────────────────
from dotenv import load_dotenv
load_dotenv()

# ── Silence verbose warnings in a notebook context ────────────────────────
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams["figure.dpi"] = 120

print(f"pandas  {pd.__version__}")
print(f"numpy   {np.__version__}")
print("Setup complete ✓")

---
## 1 · Pull Data

We download:
- **OHLCV prices** from Yahoo Finance via `yfinance` (cached as parquet)
- **Macro series** from FRED (VIX, diesel prices, ISM PMI, 10Y yield, jobless claims)
- **Fama-French factors** from Ken French's data library (used as benchmark returns)

> First run takes ~30 seconds; subsequent runs are instant (cached).

In [None]:
from src.config import TICKERS, ALL_SYMBOLS, DATA_START, DATA_END
from src.data_pull import pull_all

print(f"Ticker universe ({len(TICKERS)} stocks): {TICKERS}")
print(f"Benchmarks: IYT, SPY")
print(f"Date range : {DATA_START} → {DATA_END}")
print()

raw = pull_all()
print(f"\nRaw panel shape: {raw.shape}")
print(f"Columns: {list(raw.columns)}")
raw.head(3)

In [None]:
# ── Basic data quality checks ─────────────────────────────────────────────
print("=== Date range per ticker ===")
summary = (
    raw.reset_index()
    .groupby("ticker")["date"]
    .agg(["min", "max", "count"])
    .rename(columns={"min": "first", "max": "last", "count": "rows"})
)
print(summary.to_string())

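print("\n=== Null counts ===")
# An empty Series still renders as a non-empty string, so test emptiness explicitly
null_counts = raw.isnull().sum()
null_counts = null_counts[null_counts > 0]
print(null_counts.to_string() if not null_counts.empty else "No nulls.")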

---
## 2 · Feature Engineering

All features are computed **at time t using only information available up to time t** (no look-ahead bias).

| Group | Features |
|---|---|
| Price returns | `ret_1d`, `ret_5d`, `ret_20d` |
| Volatility | `vol_20d` |
| Momentum / volume | `rsi_14`, `volume_ratio_20d` |
| Relative strength | `ret_vs_iyt_5d`, `ret_vs_spy_5d` |
| Macro | `vix_level`, `vix_change_5d`, `diesel_change_4w`, `ism_pmi`, `dgs10`, `claims_change_4w` |
| Advanced | `rolling_beta_60d`, `vol_regime`, `momentum_rank`, `mean_reversion_5d` |
| **Target** | `target_ret_5d_fwd` (5-day forward return; NOT a feature) |
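
To make the "information available up to time t" rule concrete, here is a minimal sketch of one backward-looking feature next to the forward-looking target. It is not the code in `src/features.py`; the toy prices, the `adj_close` column layout, and the percent units are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy prices for one ticker: 10 business days, rising by 1 each day.
dates = pd.bdate_range("2024-01-01", periods=10)
prices = pd.DataFrame(
    {"ticker": "UPS", "adj_close": np.arange(100.0, 110.0)},
    index=pd.Index(dates, name="date"),
)

by_ticker = prices.groupby("ticker")["adj_close"]

# Backward-looking feature: return over the previous 5 days, known at time t.
prices["ret_5d"] = (prices["adj_close"] / by_ticker.shift(5) - 1) * 100

# Forward-looking target: return over the next 5 days. shift(-5) pulls a future
# price back to row t, so this column must never be used as a feature.
prices["target_ret_5d_fwd"] = (by_ticker.shift(-5) / prices["adj_close"] - 1) * 100
```

The first five rows of `ret_5d` and the last five rows of `target_ret_5d_fwd` are NaN, which is why later cells drop rows with missing features or target before training.
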

In [None]:
from src.features import build_features, FEATURE_COLUMNS

panel = build_features(raw)

print(f"Panel shape after feature engineering: {panel.shape}")
print(f"\nFeature columns ({len(FEATURE_COLUMNS)}):")
for i, col in enumerate(FEATURE_COLUMNS, 1):
    print(f"  {i:2d}. {col}")

print(f"\nTarget in FEATURE_COLUMNS: {'target_ret_5d_fwd' in FEATURE_COLUMNS}  (should be False)")

panel[FEATURE_COLUMNS + ['target_ret_5d_fwd']].describe().T.round(3)

In [None]:
from src.plots import plot_target_distribution

target = panel["target_ret_5d_fwd"].dropna()
print(f"Target rows: {len(target)}")
print(f"Mean: {target.mean():.3f}%   Std: {target.std():.3f}%")
print(f"Skew: {target.skew():.3f}    Kurt: {target.kurtosis():.3f}")

fig = plot_target_distribution(target)
plt.show()

In [None]:
from src.plots import plot_correlation_heatmap

# Correlation of a subset of features
corr_cols = ["ret_1d", "ret_5d", "ret_20d", "vol_20d", "rsi_14",
             "vix_level", "ism_pmi", "rolling_beta_60d", "target_ret_5d_fwd"]
corr_cols = [c for c in corr_cols if c in panel.columns]

fig = plot_correlation_heatmap(panel[corr_cols].dropna(), title="Feature Correlations (incl. Target)")
plt.show()

---
## 3 · Train Models (Quick Holdout Test)

We first train each model on the **first 80%** of dates and evaluate on the remaining **20%**. This is **not** the final backtest; it is a quick out-of-sample sanity check that the models learn something.

> The real evaluation is the **walk-forward backtest** in Section 4.
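
The sanity check reports MAE, RMSE, IC, and hit rate. The notebook does not show how `compute_ml_metrics` defines them, so the following is only a sketch under common conventions: IC is assumed to be the Spearman rank correlation between predictions and realised returns, and hit rate the share of sign agreements. The function name `ml_metrics_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd

def ml_metrics_sketch(y_true, y_pred):
    """Hypothetical stand-in for compute_ml_metrics; the definitions are assumptions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    # Spearman IC computed as the Pearson correlation of the ranks.
    ic = pd.Series(y_pred).rank().corr(pd.Series(y_true).rank())
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "ic": float(ic),
        # Share of observations where predicted and realised signs agree.
        "hit_rate": float(np.mean(np.sign(y_pred) == np.sign(y_true))),
    }

m = ml_metrics_sketch([1.0, -2.0, 3.0, -1.0], [0.5, -1.0, 2.0, 0.5])
```

For return forecasts, a small positive IC with a hit rate slightly above 0.5 is a more realistic expectation than a low MAE, since the target is dominated by noise.
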

In [None]:
from src.model import make_model, train_model, predict, get_feature_importance
from src.metrics import compute_ml_metrics

# ── Train / test split on dates (NOT random!) ─────────────────────────────
all_dates = panel.index.get_level_values("date").unique().sort_values()
split_idx = int(len(all_dates) * 0.80)
train_cutoff = all_dates[split_idx]

train_mask = panel.index.get_level_values("date") < train_cutoff
test_mask  = panel.index.get_level_values("date") >= train_cutoff

feature_cols = [c for c in FEATURE_COLUMNS if c in panel.columns]
train_data = panel[train_mask].dropna(subset=feature_cols + ["target_ret_5d_fwd"])
test_data  = panel[test_mask].dropna(subset=feature_cols + ["target_ret_5d_fwd"])

X_train, y_train = train_data[feature_cols], train_data["target_ret_5d_fwd"]
X_test,  y_test  = test_data[feature_cols],  test_data["target_ret_5d_fwd"]

print(f"Train: {len(X_train)} rows ({X_train.index.get_level_values('date').min().date()} → {X_train.index.get_level_values('date').max().date()})")
print(f"Test : {len(X_test)} rows ({X_test.index.get_level_values('date').min().date()} → {X_test.index.get_level_values('date').max().date()})")

model_results = {}

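# Validate on the most recent 10% of training dates, not on the test set, so the
# test metrics stay out-of-sample even if train_model uses X_val for early stopping.
train_dates = X_train.index.get_level_values("date")
val_cutoff = all_dates[int(split_idx * 0.90)]
fit_mask, val_mask = train_dates < val_cutoff, train_dates >= val_cutoff

for model_name in ["ridge", "rf", "xgboost"]:
    mdl = make_model(model_name)
    mdl = train_model(mdl, X_train[fit_mask], y_train[fit_mask],
                      X_val=X_train[val_mask], y_val=y_train[val_mask])
    preds = predict(mdl, X_test)
    metrics = compute_ml_metrics(y_test.values, preds)
    model_results[model_name] = {"model": mdl, "preds": preds, "metrics": metrics}
    print(f"\n{model_name.upper():10s}  MAE={metrics['mae']:.3f}  RMSE={metrics['rmse']:.3f}  "
          f"IC={metrics['ic']:.3f}  HitRate={metrics['hit_rate']:.3f}")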

In [None]:
from src.plots import plot_feature_importance

# Show feature importance for each model
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

for ax, (name, res) in zip(axes, model_results.items()):
    imp = get_feature_importance(res["model"], feature_names=feature_cols)
    top = imp.head(10).sort_values()
    # Build colours from the sorted series so each bar gets the colour of its own sign
    colors = ["#2ca02c" if v >= 0 else "#d62728" for v in top]
    top.plot.barh(ax=ax, color=colors)
    ax.set_title(f"{name.upper()}: Top 10 Features", fontsize=12)
    ax.set_xlabel("Importance / |Coeff|")
    ax.grid(True, axis="x", alpha=0.3)

plt.tight_layout()
plt.show()

---
## 4 · Walk-Forward Backtest

This is the **main evaluation**. The model is retrained every 13 weeks using an **expanding window** (all data up to the signal date). Predictions are generated for the following week's positions.
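
The retraining cadence can be sketched as a schedule over the weekly rebalance dates. This is an illustration, not the logic inside `run_backtest`: the function name `walk_forward_schedule` and the 52-week warm-up (`min_train`) are assumptions.

```python
import pandas as pd

def walk_forward_schedule(rebalance_dates, retrain_every=13, min_train=52):
    """Yield (signal_date, train_end, retrain) for an expanding-window backtest.

    min_train is a hypothetical warm-up length, not taken from the notebook.
    """
    dates = pd.DatetimeIndex(rebalance_dates).sort_values()
    train_end = None
    for i, signal_date in enumerate(dates[min_train:], start=min_train):
        # Refit on the first usable date, then every `retrain_every` rebalances.
        retrain = (i - min_train) % retrain_every == 0
        if retrain:
            train_end = signal_date  # expanding window: all history up to here
        yield signal_date, train_end, retrain

weekly = pd.date_range("2020-01-03", periods=80, freq="W-FRI")
schedule = list(walk_forward_schedule(weekly))
```

Between refits the most recently trained model keeps producing weekly signals, so `train_end` never exceeds the signal date. Rows whose 5-day target window ends after `train_end` have no known label yet and should be excluded from the fit.
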

**Anti-overfitting safeguards applied:**
- Walk-forward (no future data ever seen during training)
- Transaction costs: 5 bps + 2 bps slippage per trade
- Top-K=3 positions (concentrated enough to test signal, diversified enough to reduce noise)
- Only go long if predicted return > 0 (otherwise sit in cash)
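
The last three safeguards reduce to a few lines of arithmetic. The sketch below assumes equal weights of `1/top_k`, cash for any unfilled slot, and costs charged on the sum of absolute weight changes; `run_backtest` may define weights and turnover differently, and both function names are hypothetical.

```python
import pandas as pd

def target_weights(preds, top_k=3):
    """Equal-weight the top_k positive predictions; unallocated weight stays in cash."""
    longs = preds[preds > 0].nlargest(top_k)
    weights = pd.Series(0.0, index=preds.index)
    weights.loc[longs.index] = 1.0 / top_k
    return weights

def net_return_pct(gross_pct, old_w, new_w, cost_bps=5.0, slippage_bps=2.0):
    """Subtract trading costs, charged on the fraction of the book that trades."""
    turnover = (new_w - old_w).abs().sum()
    cost_pct = turnover * (cost_bps + slippage_bps) / 100.0  # 1 bp = 0.01%
    return gross_pct - cost_pct

preds = pd.Series({"UPS": 1.2, "FDX": -0.4, "XPO": 0.8, "CSX": 0.1, "UNP": 0.5})
w = target_weights(preds)
cash = pd.Series(0.0, index=preds.index)
net = net_return_pct(0.50, cash, w)  # moving from cash to fully invested
```

Here UPS, XPO, and UNP each receive one third of the book, and a 0.50% gross week becomes 0.43% net after 7 bps of costs on 100% turnover. If every prediction is negative, all weights are zero and the strategy sits in cash.
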

In [None]:
from src.backtest import run_backtest, buy_and_hold_baseline
from src.config import TICKERS, TOP_K, TRANSACTION_COST_BPS, SLIPPAGE_BPS

# ── Run backtest for all three models ─────────────────────────────────────
bt_results = {}
for model_name in ["ridge", "rf", "xgboost"]:
    print(f"\n{'─'*60}")
    print(f"Running walk-forward backtest: {model_name.upper()}")
    bt_results[model_name] = run_backtest(
        panel,
        model_name=model_name,
        top_k=TOP_K,
        cost_bps=TRANSACTION_COST_BPS,
        slippage_bps=SLIPPAGE_BPS,
    )

print("\nDone.")

In [None]:
from src.metrics import compute_trading_metrics

print("\n=== Walk-Forward Backtest Summary ===")
print(f"{'Model':12s} {'Periods':>8s} {'CAGR%':>8s} {'Sharpe':>8s} {'MaxDD%':>8s} {'HitRate':>9s}")
print("-" * 60)

for name, df in bt_results.items():
    if len(df) == 0:
        print(f"{name:12s}  NO RESULTS")
        continue
    m = compute_trading_metrics(df["net_return_pct"], df["cumulative_return"])
    print(f"{name:12s} {len(df):>8d} {m['cagr']*100:>8.2f} {m['sharpe']:>8.2f} "
          f"{m['max_drawdown']*100:>8.2f} {m['hit_rate']:>9.3f}")
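
The summary columns can be reproduced from the per-period net returns alone. The sketch below is not `compute_trading_metrics`; it assumes weekly periods (52 per year), returns expressed in percent, a zero risk-free rate, and drawdown measured against the running peak including the starting capital.

```python
import numpy as np
import pandas as pd

def trading_metrics_sketch(net_return_pct, periods_per_year=52):
    """Hypothetical stand-in for compute_trading_metrics; conventions are assumptions."""
    r = pd.Series(net_return_pct, dtype=float) / 100.0
    equity = (1.0 + r).cumprod()               # growth of 1 unit of capital
    years = len(r) / periods_per_year
    cagr = equity.iloc[-1] ** (1.0 / years) - 1.0
    # Annualised Sharpe ratio with the risk-free rate assumed to be zero.
    sharpe = r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)
    peak = equity.cummax().clip(lower=1.0)     # include the starting capital
    max_drawdown = (equity / peak - 1.0).min()
    return {
        "cagr": float(cagr),
        "sharpe": float(sharpe),
        "max_drawdown": float(max_drawdown),
        "hit_rate": float((r > 0).mean()),
    }

m = trading_metrics_sketch([10.0, -50.0, 20.0, 10.0])
```

In this toy series the equity curve peaks at 1.10 and falls to 0.55, so the maximum drawdown is -50% even though three of the four periods are positive.
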

In [None]:
from src.plots import plot_equity_curve
from src.backtest import buy_and_hold_baseline

# ── Benchmark: IYT buy-and-hold over the same period ──────────────────────
# "best" here means the model with the most backtest periods (longest date range),
# not the best-performing one
best_model_name = max(bt_results, key=lambda k: len(bt_results[k]))
bt_df = bt_results[best_model_name]

try:
    bench = buy_and_hold_baseline(
        panel,
        ticker="IYT",
        start=bt_df.index[0],
        end=bt_df.index[-1],
    )
except KeyError:
    bench = None
    print("IYT not found in panel; plotting strategy only")

# Align benchmark to strategy dates
bench_aligned = None
if bench is not None:
    common_dates = bt_df.index.intersection(bench.index)
    if len(common_dates) > 0:
        bench_aligned = bench.loc[common_dates, "cumulative_return"]
    else:
        bench_aligned = bench["cumulative_return"]

fig = plot_equity_curve(
    strategy_cum=bt_df["cumulative_return"],
    benchmark_cum=bench_aligned,
    title=f"Walk-Forward Backtest: {best_model_name.upper()} vs IYT Buy & Hold",
)
plt.show()

final_val = bt_df["cumulative_return"].iloc[-1]
print(f"Strategy final value: ${10_000 * final_val:,.0f}  (started at $10,000)")

In [None]:
import seaborn as sns
from matplotlib.ticker import StrMethodFormatter

# ── Compare all three models on a single chart ────────────────────────────
fig, ax = plt.subplots(figsize=(13, 5))

colors_map = {"ridge": "#1f77b4", "rf": "#2ca02c", "xgboost": "#d62728"}
for name, df in bt_results.items():
    if len(df) == 0:
        continue
    # Rebase each curve so every model starts at $10,000
    cum = df["cumulative_return"] / df["cumulative_return"].iloc[0] * 10_000
    ax.plot(cum.index, cum.values, label=name.upper(), color=colors_map[name], linewidth=1.8)

ax.axhline(10_000, color="gray", linestyle="--", linewidth=0.8, label="Break-even")
ax.set_title("All Models: Growth of $10,000", fontsize=14, fontweight="bold")
ax.set_ylabel("Portfolio Value ($)")
ax.yaxis.set_major_formatter(StrMethodFormatter("${x:,.0f}"))
ax.legend()
ax.grid(True, alpha=0.3)
sns.despine()
fig.tight_layout()
plt.show()

---
## 5 · Signal Heatmap

Visualize what the model predicted for each stock each week. Green = bullish signal, red = bearish.
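
A heatmap needs a ticker-by-date grid, while the signals are collected in long format (one row per date and ticker). The sketch below shows that reshaping step on made-up numbers; what `plot_signal_heatmap` does internally is not shown in this notebook, so treat the pivot and the symmetric colour limit as assumptions.

```python
import pandas as pd

# Long-format signals, shaped like the table built in the next cell (values are made up).
signals = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-05", "2024-01-12", "2024-01-12"]),
    "ticker": ["UPS", "FDX", "UPS", "FDX"],
    "predicted_ret": [0.8, -0.3, -0.1, 0.6],
})

# One row per ticker, one column per rebalance date.
grid = signals.pivot(index="ticker", columns="date", values="predicted_ret")

# A symmetric colour limit keeps zero at the midpoint of a red-to-green colormap.
vmax = grid.abs().max().max()
```

The grid can then be drawn with, for example, `plt.imshow(grid, cmap="RdYlGn", vmin=-vmax, vmax=vmax)`, so that green and red map to positive and negative predictions of equal magnitude.
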

In [None]:
from src.plots import plot_signal_heatmap
from src.model import predict
from src.backtest import get_rebalance_dates
from src.config import TICKERS

# Reuse the Ridge model trained on the first 80% of dates in Section 3 and
# predict on the remaining dates to show an out-of-sample signal history
feature_cols = [c for c in FEATURE_COLUMNS if c in panel.columns]
best_mdl = model_results["ridge"]["model"]  # Ridge chosen for clarity, not performance

signal_rows = []
rebal_dates = get_rebalance_dates(panel[panel.index.get_level_values("ticker").isin(TICKERS)])
rebal_dates = rebal_dates[rebal_dates >= train_cutoff]  # test period only

for sig_date in rebal_dates:
    mask = (
        (panel.index.get_level_values("date") == sig_date) &
        (panel.index.get_level_values("ticker").isin(TICKERS))
    )
    sig_data = panel[mask].dropna(subset=feature_cols)
    if len(sig_data) == 0:
        continue
    preds = predict(best_mdl, sig_data[feature_cols])  # pass only the training features
    tickers_at_date = sig_data.index.get_level_values("ticker")
    for t, p in zip(tickers_at_date, preds):
        signal_rows.append({"date": sig_date, "ticker": t, "predicted_ret": p})

signals_df = pd.DataFrame(signal_rows)
print(f"Signal table shape: {signals_df.shape}")

if len(signals_df) > 0:
    fig = plot_signal_heatmap(signals_df, title="Ridge Regression: Weekly Predicted Returns (Test Period)")
    plt.show()
else:
    print("No signals to display.")

---
## 6 · Risk & Pitfall Checks

Every quant model should pass these sanity checks before going live.
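
One of these checks, the look-ahead test, rests on a simple invariant: a feature value at time t must not change when all rows after t are deleted. The sketch below demonstrates the idea on synthetic prices with one causal feature and one deliberately leaky feature; the function names are hypothetical and not part of `src/`.

```python
import numpy as np
import pandas as pd

def ret_5d(close):
    """Causal feature: uses only the current and earlier prices."""
    return close / close.shift(5) - 1

def leaky_ret(close):
    """Deliberately broken feature: peeks at the next day's price."""
    return close.shift(-1) / close - 1

def is_causal(feature_fn, close, cut):
    """True if the feature value at position `cut` ignores all later rows."""
    full = feature_fn(close).iloc[cut]
    truncated = feature_fn(close.iloc[: cut + 1]).iloc[cut]
    return bool(np.isclose(full, truncated, equal_nan=True))

rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(size=50).cumsum())

causal_ok = is_causal(ret_5d, close, cut=25)
leaky_ok = is_causal(leaky_ret, close, cut=25)
```

The key detail is that the feature is recomputed on the truncated input. Comparing two slices of an already-computed feature table would pass trivially, whatever the feature does.
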

In [None]:
from src.features import FEATURE_COLUMNS

issues = []
ok = []

# 1. Target not in feature list
if "target_ret_5d_fwd" not in FEATURE_COLUMNS:
    ok.append("✓ Target 'target_ret_5d_fwd' is NOT in FEATURE_COLUMNS (no leakage)")
else:
    issues.append("✗ Target IS in FEATURE_COLUMNS: DATA LEAKAGE!")

# 2. No future data in features
# Rebuild features from raw data truncated at a midpoint date and compare with the
# full-history value. Slicing `panel` itself would only compare stored values with
# themselves. Assumes `raw` carries a "date" index level like `panel`.
sample_ticker = TICKERS[0]
try:
    from src.features import build_features
    ticker_dates = panel.xs(sample_ticker, level="ticker").sort_index().index
    midpoint_date = ticker_dates[len(ticker_dates) // 2]

    raw_trunc = raw[raw.index.get_level_values("date") <= midpoint_date]
    panel_trunc = build_features(raw_trunc)

    ret_full  = panel.xs(sample_ticker, level="ticker")["ret_5d"].get(midpoint_date)
    ret_trunc = panel_trunc.xs(sample_ticker, level="ticker")["ret_5d"].get(midpoint_date)

    if ret_full is not None and ret_trunc is not None:
        if abs(float(ret_full) - float(ret_trunc)) < 1e-6:
            ok.append(f"✓ ret_5d at {midpoint_date.date()} unchanged when future data is removed (no look-ahead)")
        else:
            issues.append("✗ ret_5d CHANGED when future data was removed: LOOK-AHEAD BIAS!")
    else:
        ok.append("- Look-ahead check skipped (feature not present at midpoint date)")
except Exception as e:
    ok.append(f"- Look-ahead check skipped ({e})")

# 3. Transaction costs reduce returns
best_bt = bt_results.get("ridge", pd.DataFrame())
if len(best_bt) > 0 and "gross_return_pct" in best_bt.columns and "net_return_pct" in best_bt.columns:
    avg_cost = (best_bt["gross_return_pct"] - best_bt["net_return_pct"]).mean()
    if avg_cost >= 0:
        ok.append(f"✓ Costs correctly reduce returns (avg cost per period: {avg_cost:.4f}%)")
    else:
        issues.append("✗ Costs are INCREASING returns: cost logic is wrong!")

# 4. Turnover is reasonable
if len(best_bt) > 0 and "turnover_frac" in best_bt.columns:
    avg_turnover = best_bt["turnover_frac"].mean()
    if 0 <= avg_turnover <= 1:
        ok.append(f"✓ Average turnover per period: {avg_turnover:.2%} (reasonable)")
    else:
        issues.append(f"✗ Turnover {avg_turnover:.2%} outside [0,1]!")

print("=" * 60)
print("PITFALL CHECKLIST")
print("=" * 60)
for item in ok:
    print(item)
for item in issues:
    print(item)
if not issues:
    print("\n✓ All checks passed.")
else:
    print(f"\n✗ {len(issues)} issue(s) found. Fix before trusting results!")

---
## 7 · Cumulative Return Comparison (Tickers)

In [None]:
from src.plots import plot_cumulative_returns

fig = plot_cumulative_returns(
    prices_long=panel,
    tickers=TICKERS[:6],  # first 6 for readability
    title="Cumulative Price Returns: Supply-Chain Universe",
)
plt.show()

---
## 8 · Save Models

Retrain each model on the full dataset and save it to disk so the Streamlit app can load it.

In [None]:
from src.model import make_model, train_model, save_model

# Retrain on the full dataset before saving
full_data = panel.dropna(subset=feature_cols + ["target_ret_5d_fwd"])
X_full = full_data[feature_cols]
y_full = full_data["target_ret_5d_fwd"]

for name in ["ridge", "rf", "xgboost"]:
    mdl = make_model(name)
    mdl = train_model(mdl, X_full, y_full)
    path = save_model(mdl, name, suffix="_full")
    print(f"Saved: {path}")

print("\nAll models saved ✓")

---
## 9 · Next Steps

| Step | Description |
|---|---|
| 🚀 **Run the App** | `streamlit run app/app.py` in the terminal |
| 🧪 **Run Tests** | `pytest tests/ -v` |
| 📦 **Deploy** | Push to GitHub → Railway auto-deploys via Dockerfile |
| 🔧 **Tune** | Adjust `TOP_K`, `TRANSACTION_COST_BPS`, or model hyper-params in `src/config.py` |
| 📈 **Extend** | Add new features, new tickers, or try a classification approach |

### Biases to Keep Monitoring
- **Look-ahead bias**: features may only use data available at or before the signal date
- **Survivorship bias**: this universe was hand-picked from currently listed names; add delisted tickers for production
- **Overfitting**: IC on the test period should remain positive across multiple runs
- **Transaction costs**: always verify that net returns do not exceed gross returns at reasonable cost assumptions

---
_Generated by SCL-Alpha blueprint; see `ml-quant-trading-blueprint.md` for full methodology._