# P-ML2 — LightGBM Baseline Model

**Goal:** Train a LightGBM regressor to predict 1-day-ahead BTC/USDT log-returns
using the 12-feature set identified in P-ML1 (F7). Evaluate with purged sequential
walk-forward (5 folds, purge=1 bar) to get honest OOS metrics.

**Success criterion (from roadmap):** OOS IC > 0.03 consistently across folds.
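
The splitter used below (`ml.validation.purged_wf_splits`) is not shown in this notebook. As a rough illustration of what a purged rolling walk-forward split can look like, here is a minimal sketch; the function name `sketch_purged_splits` and its windowing rule are assumptions, not the repo's implementation. Under these assumptions it happens to reproduce the train sizes printed in §4 (178, then 267) for 1075 rows.

```python
import numpy as np

def sketch_purged_splits(n_samples, n_splits, train_frac, purge_bars):
    """Hypothetical stand-in for purged_wf_splits (not the repo's code)."""
    # Assumption: the data is cut into n_splits + 1 equal blocks; block 0 is train-only.
    test_size = n_samples // (n_splits + 1)
    # Assumption: rolling train window sized so train : test = train_frac : (1 - train_frac).
    train_window = int(test_size * train_frac / (1.0 - train_frac))
    for k in range(1, n_splits + 1):
        test_start = k * test_size
        train_start = max(0, test_start - train_window)
        # Purge: drop the last `purge_bars` train rows, whose labels overlap the test set.
        train_end = test_start - purge_bars
        yield (np.arange(train_start, train_end),
               np.arange(test_start, test_start + test_size))

splits_sketch = list(sketch_purged_splits(1075, 5, 0.6, 1))
for tr, te in splits_sketch:
    print(f"train=[{tr[0]}, {tr[-1]}] n={len(tr)}  test=[{te[0]}, {te[-1]}] n={len(te)}")
```

The one-bar gap between the last train index and the first test index is what keeps a 1-bar-ahead label from leaking test information into training.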

## §1 — Config

In [1]:
import sys
from pathlib import Path

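# Notebooks have no __file__; the repo root is assumed to be the parent of the
# current working directory (same result as the former Path("__file__") trick).
repo_root = Path.cwd().resolve().parent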
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

plt.rcParams.update({
    "figure.dpi":        120,
    "axes.spines.top":   False,
    "axes.spines.right": False,
    "font.size":         9,
})

SINCE      = "2022-01-01"   # 3 years → ~1075 usable daily bars
UNTIL      = "2025-01-01"
HORIZON    = 1              # 1-day-ahead prediction
N_SPLITS   = 5
TRAIN_FRAC = 0.6
PURGE      = 1              # = label horizon

# 12-feature set from F7 (deduplicated oscillator + volatility groups)
FEATURES = [
    "bar_ret",       # current-bar return (mean-reversion signal, IC=-0.081 @ 1h)
    "bb_zscore",     # z-score within Bollinger bands
    "rsi",           # RSI(14)
    "macd_hist_norm",# MACD histogram / close
    "atr_pct",       # ATR / close (volatility)
    "bb_width",      # (upper-lower) / mid (band squeeze)
    "upper_wick",    # upper shadow / close  (IC=0.165 @ 1d!)
    "lower_wick",    # lower shadow / close
    "hl_range",      # (high-low) / close (intra-day range)
    "vol_log_chg",   # log(volume / prev_volume)
    "di_diff",       # (+DI - -DI) / 100
    "adx",           # ADX / 100
]

print(f"Dataset: {SINCE} → {UNTIL} | timeframe: 1d | horizon: {HORIZON} bar")
print(f"Walk-forward: {N_SPLITS} folds, train_frac={TRAIN_FRAC}, purge={PURGE}")
print(f"Feature set: {len(FEATURES)} features")

Dataset: 2022-01-01 → 2025-01-01 | timeframe: 1d | horizon: 1 bar
Walk-forward: 5 folds, train_frac=0.6, purge=1
Feature set: 12 features


## §2 — Data

In [2]:
from data.fetch import fetch_ohlcv
from ml.features import build_feature_matrix
from ml.labels import forward_return

df = fetch_ohlcv(timeframe="1d", since=SINCE, until=UNTIL)
print(f"{len(df):,} daily bars  |  {df.index[0].date()} → {df.index[-1].date()}")

feats = build_feature_matrix(df)
label = forward_return(df, horizon=HORIZON)
comb  = pd.concat([feats, label], axis=1).dropna()
X_all = comb[feats.columns]
y_all = comb[label.name]
X     = X_all[FEATURES]

log_ret = np.log(df["close"] / df["close"].shift(1)).dropna()

print(f"\nUsable rows after warm-up: {len(X)}")
print(f"Label — mean: {y_all.mean():.6f}  std: {y_all.std():.6f}")
print(f"Direction split: up={(y_all > 0).mean()*100:.1f}%  down={(y_all < 0).mean()*100:.1f}%")

# Price chart with regime shading
fig, ax = plt.subplots(figsize=(14, 3))
df["close"].plot(ax=ax, color="steelblue", linewidth=1.0)
ax.set_title("BTC/USDT daily close — 2022–2024 (bear → recovery → bull)", fontsize=10)
ax.set_ylabel("USDT")
ax.axvspan("2022-01-01", "2022-11-21", alpha=0.07, color="red",   label="bear")
ax.axvspan("2022-11-21", "2023-12-31", alpha=0.07, color="gray",  label="recovery")
ax.axvspan("2024-01-01", "2025-01-01", alpha=0.07, color="green", label="bull")
ax.legend(fontsize=8)
plt.tight_layout()
plt.show()

1,097 daily bars  |  2022-01-01 → 2025-01-01

Usable rows after warm-up: 1075
Label — mean: 0.000923  std: 0.028373
Direction split: up=50.2%  down=49.8%


## §3 — Persistence baseline

In [3]:
# Persistence: predict next day = today's return (ret_lag1 = bar_ret)
# IC of persistence = Spearman(y[t-1], y[t]) = lag-1 rank autocorrelation of returns
# A negative value therefore indicates 1-day mean reversion, not persistence.

from ml.validation import purged_wf_splits

splits = list(purged_wf_splits(len(X), N_SPLITS, TRAIN_FRAC, purge_bars=PURGE))

persist_ics = []
for tr, te in splits:
    persistence = X.iloc[te]["bar_ret"].values  # today's return as prediction
    rho, _      = stats.spearmanr(persistence, y_all.iloc[te].values)
    persist_ics.append(rho)

print("Persistence baseline (predict next = current return):")
for i, (ic, (tr, te)) in enumerate(zip(persist_ics, splits)):
    print(f"  Fold {i+1}  [{X.index[te[0]].date()} → {X.index[te[-1]].date()}]  IC={ic:+.4f}")
print(f"  Mean IC: {np.mean(persist_ics):+.4f}  |  ICIR: {np.mean(persist_ics)/np.std(persist_ics):.3f}")

Persistence baseline (predict next = current return):
  Fold 1  [2022-07-20 → 2023-01-14]  IC=-0.0183
  Fold 2  [2023-01-15 → 2023-07-12]  IC=-0.1510
  Fold 3  [2023-07-13 → 2024-01-07]  IC=-0.0685
  Fold 4  [2024-01-08 → 2024-07-04]  IC=-0.0909
  Fold 5  [2024-07-05 → 2024-12-30]  IC=-0.0072
  Mean IC: -0.0672  |  ICIR: -1.290
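
To illustrate why the persistence IC is just a lag-1 rank autocorrelation, here is a small sketch on synthetic data. The AR(1) coefficient, series length, and noise scale are illustrative assumptions and have nothing to do with the BTC series; `rank_corr` is a hypothetical helper that computes a tie-free Spearman correlation with NumPy only.

```python
import numpy as np

def rank_corr(a, b):
    """Spearman correlation via ranks (valid when there are no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(0)
phi, n = -0.2, 5000                     # assumed AR(1) coefficient and length
eps = rng.normal(0.0, 0.02, size=n)     # assumed daily noise scale

# Simulate y[t] = phi * y[t-1] + eps[t]; phi < 0 means mean reversion.
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# "Persistence" predicts y[t] with y[t-1], so its IC is the lag-1 rank autocorrelation.
persistence_ic = rank_corr(y[:-1], y[1:])
pearson_lag1 = float(np.corrcoef(y[:-1], y[1:])[0, 1])
print(f"persistence IC={persistence_ic:+.3f}  lag-1 autocorr={pearson_lag1:+.3f}")
```

On a mean-reverting series the persistence IC comes out negative, which is the pattern seen in all five folds above.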


## §4 — LightGBM purged walk-forward

In [4]:
from ml.models import LGBMForecaster

fold_results = []
all_preds    = []
all_actuals  = []

for i, (tr, te) in enumerate(splits):
    model = LGBMForecaster()
    model.fit(X.iloc[tr], y_all.iloc[tr])
    preds  = model.predict(X.iloc[te])
    actual = y_all.iloc[te].values

    rho, pval = stats.spearmanr(preds, actual)
    hit_rate  = (np.sign(preds) == np.sign(actual)).mean()
    rmse      = np.sqrt(np.mean((preds - actual) ** 2))

    fold_results.append({
        "fold":        i + 1,
        "test_start":  X.index[te[0]].date(),
        "test_end":    X.index[te[-1]].date(),
        "n_train":     len(tr),
        "n_test":      len(te),
        "IC":          rho,
        "IC_pval":     pval,
        "hit_rate":    hit_rate,
        "RMSE":        rmse,
        "model":       model,
    })
    all_preds.extend(preds)
    all_actuals.extend(actual)

    print(f"Fold {i+1}  [{X.index[te[0]].date()} → {X.index[te[-1]].date()}]  "
          f"train={len(tr)}  IC={rho:+.4f}  hit={hit_rate:.3f}  RMSE={rmse:.6f}")

ics  = [r["IC"] for r in fold_results]
hits = [r["hit_rate"] for r in fold_results]
print(f"\nMean IC:   {np.mean(ics):+.4f}  |  Std IC: {np.std(ics):.4f}")
print(f"ICIR:      {np.mean(ics)/np.std(ics):.3f}")
print(f"Mean hit:  {np.mean(hits):.3f}")
print(f"\nOverall OOS IC (pooled): ", end="")
rho_all, _ = stats.spearmanr(all_preds, all_actuals)
print(f"{rho_all:+.4f}")

Fold 1  [2022-07-20 → 2023-01-14]  train=178  IC=-0.0739  hit=0.514  RMSE=0.032534
Fold 2  [2023-01-15 → 2023-07-12]  train=267  IC=+0.0443  hit=0.564  RMSE=0.032309
Fold 3  [2023-07-13 → 2024-01-07]  train=267  IC=-0.2236  hit=0.413  RMSE=0.023746
Fold 4  [2024-01-08 → 2024-07-04]  train=267  IC=+0.0491  hit=0.486  RMSE=0.028649
Fold 5  [2024-07-05 → 2024-12-30]  train=267  IC=-0.0390  hit=0.508  RMSE=0.028433

Mean IC:   -0.0486  |  Std IC: 0.0995
ICIR:      -0.488
Mean hit:  0.497

Overall OOS IC (pooled): -0.0173
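
With only five folds, the ICIR depends noticeably on how the standard deviation is computed. The sketch below recomputes the summary from the per-fold ICs printed above, using both the population std (`ddof=0`, the `np.std` default used in this notebook) and the sample std (`ddof=1`), and adds a one-sample t-statistic for the mean IC. `ic_summary` is a hypothetical helper, and the inputs are the rounded printed values, so the last digit may differ slightly from the cell output.

```python
import numpy as np

def ic_summary(ics, ddof=0):
    """Mean, std and ICIR of a list of per-fold ICs."""
    ics = np.asarray(ics, dtype=float)
    mean, std = ics.mean(), ics.std(ddof=ddof)
    return {"mean": mean, "std": std, "icir": mean / std}

lgbm_ics = [-0.0739, 0.0443, -0.2236, 0.0491, -0.0390]   # per-fold ICs printed above

pop = ic_summary(lgbm_ics, ddof=0)    # population std, as in the notebook
samp = ic_summary(lgbm_ics, ddof=1)   # sample std, more conservative with 5 folds

# t-statistic of the mean IC against zero (uses the sample std).
t_stat = samp["mean"] / (samp["std"] / np.sqrt(len(lgbm_ics)))

print(f"ICIR (ddof=0): {pop['icir']:+.3f}")
print(f"ICIR (ddof=1): {samp['icir']:+.3f}")
print(f"t-stat of mean IC: {t_stat:+.3f}")
```

The t-statistic has magnitude below 1, so on five folds the mean IC is not distinguishable from zero; the instability across folds is the more robust observation.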


## §5 — Per-fold IC bar chart

In [5]:
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# IC per fold
labels  = [f"F{r['fold']}\n{str(r['test_start'])[5:]}" for r in fold_results]
ic_vals = [r["IC"] for r in fold_results]
p_vals  = [r["IC_pval"] for r in fold_results]
colors  = ["#2ecc71" if v > 0 else "#e74c3c" for v in ic_vals]
edgecolors = ["black" if p < 0.05 else "none" for p in p_vals]

bars = axes[0].bar(labels, ic_vals, color=colors, edgecolor=edgecolors, linewidth=1.2)
axes[0].axhline(0,     color="black", linewidth=0.7)
axes[0].axhline( 0.03, color="gray", linewidth=0.7, linestyle="--", label="±0.03 target")
axes[0].axhline(-0.03, color="gray", linewidth=0.7, linestyle="--")
axes[0].set_title("OOS IC per fold\n(black border = p<0.05)", fontsize=9)
axes[0].set_ylabel("Spearman IC")
axes[0].legend(fontsize=7)

# Hit rate per fold
hit_vals = [r["hit_rate"] for r in fold_results]
h_colors = ["#2ecc71" if v > 0.5 else "#e74c3c" for v in hit_vals]
axes[1].bar(labels, hit_vals, color=h_colors)
axes[1].axhline(0.5, color="black", linewidth=0.7, linestyle="--", label="random (50%)")
axes[1].set_title("Directional hit rate per fold", fontsize=9)
axes[1].set_ylabel("Hit rate")
axes[1].set_ylim(0.35, 0.65)
axes[1].legend(fontsize=7)

# Persistence IC vs LightGBM IC
x = np.arange(N_SPLITS)
w = 0.35
axes[2].bar(x - w/2, persist_ics, w, label="Persistence", color="#95a5a6", alpha=0.85)
axes[2].bar(x + w/2, ic_vals,     w, label="LightGBM",   color="#3498db", alpha=0.85)
axes[2].axhline(0, color="black", linewidth=0.7)
axes[2].set_xticks(x)
axes[2].set_xticklabels([f"F{i+1}" for i in range(N_SPLITS)])
axes[2].set_title("LightGBM vs persistence IC", fontsize=9)
axes[2].set_ylabel("IC")
axes[2].legend(fontsize=7)

plt.tight_layout()
plt.show()

## §6 — Feature importance (aggregate across folds)

In [6]:
# Average feature importance across all 5 fold models
fi_all = pd.DataFrame({
    f"fold{r['fold']}": r["model"].feature_importance
    for r in fold_results
})
fi_mean = fi_all.mean(axis=1).sort_values(ascending=False)
fi_std  = fi_all.std(axis=1).reindex(fi_mean.index)

fig, axes = plt.subplots(1, 2, figsize=(14, 4))

# Mean importance bar chart
axes[0].barh(fi_mean.index[::-1], fi_mean.values[::-1],
             xerr=fi_std.values[::-1], color="steelblue", alpha=0.85, capsize=3)
axes[0].set_title("Mean feature importance (LightGBM gain, ±1 std across folds)", fontsize=9)
axes[0].set_xlabel("Mean gain importance")

# Per-fold importance heatmap
fi_norm = fi_all.div(fi_all.sum()).reindex(fi_mean.index)  # normalise each fold
im = axes[1].imshow(fi_norm.values, cmap="YlOrRd", aspect="auto")
axes[1].set_xticks(range(N_SPLITS))
axes[1].set_xticklabels([f"F{i+1}" for i in range(N_SPLITS)], fontsize=8)
axes[1].set_yticks(range(len(fi_mean)))
axes[1].set_yticklabels(fi_mean.index, fontsize=7)
axes[1].set_title("Feature importance per fold (normalised)", fontsize=9)
plt.colorbar(im, ax=axes[1])

plt.tight_layout()
plt.show()

print("Mean feature importance (sorted):")
print(fi_mean.to_string(float_format="{:.1f}".format))

Mean feature importance (sorted):
vol_log_chg      185.0
adx              173.6
upper_wick       169.2
bb_width         149.2
hl_range         143.8
bar_ret          140.0
di_diff          125.2
macd_hist_norm   120.4
bb_zscore        108.6
atr_pct          107.2
lower_wick        94.2
rsi               75.8


## §7 — OOS equity curve

In [7]:
# Build equity curve from stitched OOS predictions
# Position: sign(prediction) = +1 / -1 / 0

equity_pieces   = []
equity_pieces_p = []  # persistence

bar_ret_daily = np.log(df["close"] / df["close"].shift(1)).reindex(X.index)

for r, (tr, te) in zip(fold_results, splits):
    model      = r["model"]
    preds      = model.predict(X.iloc[te])
    ret        = bar_ret_daily.iloc[te].values

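    # LightGBM: position = sign of prediction, lagged 1 bar so that the signal
    # formed at close t earns the close-to-close return of bar t+1.
    # Note: `ret` is a log return, so 1 + pos * ret only approximates compounding.
    pos        = np.sign(preds)
    eq_raw     = (1 + np.roll(pos, 1) * ret)
    eq_raw[0]  = 1.0   # discard the wrapped-around position introduced by np.roll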
    equity     = pd.Series(np.cumprod(eq_raw), index=X.index[te])
    equity_pieces.append(equity / equity.iloc[0])

    # Persistence baseline
    pos_p      = np.sign(X.iloc[te]["bar_ret"].values)
    eq_raw_p   = (1 + np.roll(pos_p, 1) * ret)
    eq_raw_p[0] = 1.0
    equity_p   = pd.Series(np.cumprod(eq_raw_p), index=X.index[te])
    equity_pieces_p.append(equity_p / equity_p.iloc[0])


def stitch(pieces):
    out, anchor = [], 1.0
    for p in pieces:
        s = p / p.iloc[0] * anchor
        anchor = float(s.iloc[-1])
        out.append(s)
    return pd.concat(out)


oos_eq  = stitch(equity_pieces)
oos_eq_p = stitch(equity_pieces_p)

# Buy-and-hold over same OOS period
bah = df["close"].reindex(oos_eq.index)
bah = bah / bah.iloc[0]

fig, ax = plt.subplots(figsize=(14, 4))
oos_eq.plot(  ax=ax, label="LightGBM (sign)",   color="#3498db", linewidth=1.5)
oos_eq_p.plot(ax=ax, label="Persistence (sign)", color="#95a5a6", linewidth=1.2, linestyle="--")
bah.plot(     ax=ax, label="Buy-and-Hold",       color="darkorange", linewidth=1.0, linestyle=":")

for _, (tr, te) in zip(fold_results, splits):
    ax.axvline(X.index[te[0]], color="lightgray", linewidth=0.5, linestyle="--")

ax.axhline(1, color="black", linewidth=0.4)
ax.set_title("Stitched OOS equity — LightGBM vs persistence vs buy-and-hold", fontsize=11)
ax.set_ylabel("Equity (normalised)")
ax.legend(fontsize=9)
plt.tight_layout()
plt.show()

from backtesting import compute_metrics
for label_name, eq in [("LightGBM", oos_eq), ("Persistence", oos_eq_p), ("Buy-and-Hold", bah)]:
    m = compute_metrics(eq)
    print(f"{label_name:<14} Return={m['total_return']*100:+.1f}%  "
          f"Sharpe={m['sharpe_ratio']:+.3f}  MaxDD={m['max_drawdown']*100:.1f}%")

LightGBM       Return=-32.4%  Sharpe=-0.075  MaxDD=-76.8%
Persistence    Return=-13.6%  Sharpe=+0.126  MaxDD=-47.3%
Buy-and-Hold   Return=+299.6%  Sharpe=+1.379  MaxDD=-35.4%
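
The equity cell compounds `1 + position * log_return`, which treats log returns as simple returns. A minimal sketch of the exact version converts log returns with `np.expm1` before compounding. `equity_from_signals` and the toy price path are illustrative assumptions; the metrics reported above were not recomputed with it.

```python
import numpy as np

def equity_from_signals(pos, log_ret):
    """Compound a +1/-1/0 signal against simple returns derived from log returns."""
    pos = np.asarray(pos, dtype=float)
    simple_ret = np.expm1(np.asarray(log_ret, dtype=float))   # exp(r) - 1
    # The position held during bar t is the signal formed at bar t-1 (flat on bar 0).
    held = np.concatenate(([0.0], pos[:-1]))
    return np.cumprod(1.0 + held * simple_ret)

# Toy path: +10%, -10%, +20% close-to-close moves.
close = np.array([100.0, 110.0, 99.0, 118.8])
log_ret = np.concatenate(([0.0], np.diff(np.log(close))))
pos = np.array([1.0, -1.0, 1.0, 1.0])   # long, short, long (last signal is unused)

eq_exact = equity_from_signals(pos, log_ret)

# Same signal with the log-return approximation used in the cell above.
held = np.concatenate(([0.0], pos[:-1]))
eq_approx = np.cumprod(1.0 + held * log_ret)

print("exact :", np.round(eq_exact, 4))
print("approx:", np.round(eq_approx, 4))
```

For daily moves of a few percent the gap per bar is small, but it accumulates over roughly 900 OOS bars, so the exact form is the safer default for reported returns.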


## §8 — IC regime analysis: why does fold 3 fail?

In [8]:
# Scatter: predicted vs actual return per fold
fig, axes = plt.subplots(1, N_SPLITS, figsize=(18, 3))
fig.suptitle("Predicted vs actual 1d return — per fold", fontsize=10)

for ax, r, (tr, te) in zip(axes, fold_results, splits):
    model  = r["model"]
    preds  = model.predict(X.iloc[te])
    actual = y_all.iloc[te].values
    regime = "bear" if r["fold"] == 1 else "recovery" if r["fold"] <= 3 else "bull"
    color  = "#e74c3c" if regime == "bear" else "#95a5a6" if regime == "recovery" else "#2ecc71"

    ax.scatter(preds, actual, alpha=0.3, s=10, color=color)
    lim = max(abs(actual).max(), abs(preds).max()) * 1.1
    ax.set_xlim(-lim, lim)
    ax.set_ylim(-lim, lim)
    ax.axhline(0, color="black", linewidth=0.4)
    ax.axvline(0, color="black", linewidth=0.4)
    ax.plot([-lim, lim], [-lim, lim], color="gray", linewidth=0.5, linestyle=":")
    ax.set_title(f"F{r['fold']} ({regime})\nIC={r['IC']:+.3f}", fontsize=8)
    ax.set_xlabel("pred", fontsize=7)
    if ax is axes[0]:
        ax.set_ylabel("actual", fontsize=7)

plt.tight_layout()
plt.show()

# What features drove predictions in fold 3 vs fold 2?
print("Feature importance comparison — Fold 2 (recovery, IC=+0.04) vs Fold 3 (recovery→bull, IC=-0.22):")
fi_comp = pd.DataFrame({
    "Fold2_IC+0.04": fold_results[1]["model"].feature_importance,
    "Fold3_IC-0.22": fold_results[2]["model"].feature_importance,
}).sort_values("Fold2_IC+0.04", ascending=False)
print(fi_comp.to_string())

Feature importance comparison — Fold 2 (recovery, IC=+0.04) vs Fold 3 (recovery→bull, IC=-0.22):
                Fold2_IC+0.04  Fold3_IC-0.22
bb_width                  256            242
hl_range                  245            228
vol_log_chg               228            192
bb_zscore                 187             69
upper_wick                174            153
bar_ret                   149             69
adx                       145            178
di_diff                   143            147
macd_hist_norm            128            145
lower_wick                113            121
atr_pct                   104             90
rsi                       102             80


## §9 — All-34-feature model vs 12-feature model

In [9]:
ics_34 = []
for tr, te in splits:
    m = LGBMForecaster()
    m.fit(X_all.iloc[tr], y_all.iloc[tr])
    rho, _ = stats.spearmanr(m.predict(X_all.iloc[te]), y_all.iloc[te])
    ics_34.append(rho)

print("IC comparison — 12 features vs all 34 features:")
print(f"{'Fold':<6} {'IC-12':>8} {'IC-34':>8} {'Better':>10}")
print("-" * 36)
for i, (ic12, ic34) in enumerate(zip(ics, ics_34)):
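    # NB: "Better" means larger |IC| (signal strength regardless of sign),
    # not higher signed IC; a more negative IC can therefore be tagged as better.
    better = "12-feat" if abs(ic12) > abs(ic34) else "34-feat"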
    print(f"  {i+1:<4} {ic12:>+8.4f} {ic34:>+8.4f} {better:>10}")
print(f"  {'Mean':<4} {np.mean(ics):>+8.4f} {np.mean(ics_34):>+8.4f}")

IC comparison — 12 features vs all 34 features:
Fold      IC-12    IC-34     Better
------------------------------------
  1     -0.0739  -0.0701    12-feat
  2     +0.0443  +0.0454    34-feat
  3     -0.2236  -0.1698    12-feat
  4     +0.0491  +0.0112    12-feat
  5     -0.0390  -0.1209    34-feat
  Mean  -0.0486  -0.0608
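
The "Better" column above ranks by |IC|. Because the success criterion is a signed threshold (IC > 0.03), it can help to see the same comparison by signed IC as well. This sketch only re-ranks the ICs printed in the table; nothing is retrained.

```python
# Per-fold ICs copied from the table above.
ic12 = [-0.0739, 0.0443, -0.2236, 0.0491, -0.0390]
ic34 = [-0.0701, 0.0454, -0.1698, 0.0112, -0.1209]

# Ranking used in the cell above: larger magnitude wins.
by_abs = ["12-feat" if abs(a) > abs(b) else "34-feat" for a, b in zip(ic12, ic34)]
# Alternative ranking: higher signed IC wins.
by_signed = ["12-feat" if a > b else "34-feat" for a, b in zip(ic12, ic34)]

for fold, (m_abs, m_signed) in enumerate(zip(by_abs, by_signed), start=1):
    print(f"Fold {fold}: by |IC| -> {m_abs:<8} by signed IC -> {m_signed}")
```

The two rankings disagree on folds 1, 3 and 5, so the fold-level "winner" depends on the convention; the mean ICs (−0.0486 vs −0.0608) are the less ambiguous comparison.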


## §10 — Conclusions (Finding F8)

### Per-fold OOS results

| Fold | Period | Regime | IC | Hit rate | IC p<0.05? |
|---|---|---|---|---|---|
| 1 | Jul 2022 – Jan 2023 | Bear | −0.0739 | 0.514 | No |
| 2 | Jan 2023 – Jul 2023 | Recovery | +0.0443 | 0.564 | No |
| 3 | Jul 2023 – Jan 2024 | Recovery→Bull | **−0.2236** | 0.413 | **Yes** |
| 4 | Jan 2024 – Jul 2024 | Bull | +0.0491 | 0.486 | No |
| 5 | Jul 2024 – Dec 2024 | Bull | −0.0390 | 0.508 | No |

**Aggregate:** Mean IC = −0.0486 | ICIR = −0.488 | Pooled OOS IC = −0.0173

**Equity (stitched OOS):** LightGBM −32.4% (Sharpe −0.075, MaxDD −76.8%) vs B&H +299.6%

### Finding F8 — LightGBM IC is regime-sensitive and sign-unstable

1. **IC sign flips across regimes.** The working hypothesis is that the model leans on
   mean-reversion features (F7: bar_ret, bb_zscore, rsi all have negative IC at 1h) that
   work in ranging/bear markets but invert in strong trends. Fold 3 (recovery→bull,
   Jul 2023 – Jan 2024) shows IC = −0.2236 (statistically significant, p < 0.05): the
   predictions carry information in that fold, but with the wrong sign. The other folds
   fit the regime story only loosely (fold 1, bear, is also negative; fold 4, bull, is
   positive), and none of them is individually significant.

2. **Sign instability destroys equity.** The per-fold |IC| is non-trivial
   (mean |IC| = 0.086, |mean IC| = 0.049), but because the sign alternates across
   folds, a naive sign-based position loses money (−32.4% vs B&H +299.6%).

3. **Persistence is competitive.** The persistence baseline (predict next = current)
   achieves Mean IC = −0.0672, ICIR = −1.29. Its IC is negative in all five folds, so
   it is more sign-consistent than LightGBM (ICIR = −0.488); the negative sign indicates
   mean reversion at the 1d lag. LightGBM's mean IC has the same sign, which suggests it
   may be learning a less stable variant of the same effect.

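4. **Feature importance is stable on average, less so per fold.** By mean importance
   across folds, `vol_log_chg` (185), `adx` (173.6), `upper_wick` (169.2) and `bb_width`
   (149.2) rank highest, consistent with the F7 IC analysis. Individual folds differ:
   in folds 2 and 3 `bb_width` and `hl_range` lead, and between those two folds
   `bb_zscore` drops from 187 to 69 and `bar_ret` from 149 to 69 (§8).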

5. **12 vs 34 features:** comparable OOS IC (−0.049 vs −0.061), and neither meets the
   0.03 target. The curated 12-feature set does no worse than the full set with fewer
   degrees of freedom, which suits the small training windows used here (178–267 bars
   per fold).
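
Finding 2 turns on which absolute value is taken. As a quick arithmetic check using only the per-fold ICs from the table above, the sketch below computes the magnitude of the mean IC, the mean of the per-fold magnitudes, and the number of folds that clear the roadmap threshold of 0.03.

```python
import numpy as np

fold_ics = np.array([-0.0739, 0.0443, -0.2236, 0.0491, -0.0390])  # from the table above

abs_of_mean = abs(fold_ics.mean())       # signs cancel before taking the magnitude
mean_of_abs = np.abs(fold_ics).mean()    # average signal strength, ignoring sign
n_pass = int((fold_ics > 0.03).sum())    # folds meeting the success criterion

print(f"|mean IC| = {abs_of_mean:.3f}")
print(f"mean |IC| = {mean_of_abs:.3f}")
print(f"folds with IC > 0.03: {n_pass} of {len(fold_ics)}")
```

The gap between the two magnitudes is a compact way to quantify sign instability: if the sign were stable, they would coincide.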

### Next steps

- **P-ML3 — Regime detection:** Add a regime classifier (bull/bear/ranging) as a
  meta-feature or as a fold-selection mechanism. In bull regimes, flip the model's
  signal (or use a separate bull-regime model). This directly addresses the IC
  sign-flip observed in F8.
- **P7 — Optuna:** Tune LightGBM hyperparameters per regime using the `optimize_fn`
  hook in `walk_forward()`. May improve hit rate without addressing the sign issue.
- **Alternative framing:** Model volatility (squared returns) rather than direction.
  Volatility clustering (GARCH-type, F7) is confirmed; a volatility forecast does not
  suffer from the directional sign flip and is actionable for position sizing.
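
For the volatility framing in the last bullet, the target could be as simple as the squared forward log-return. The sketch below is one possible label definition; `forward_sq_log_return` is a hypothetical helper, not part of `ml.labels`, and it mirrors the 1-bar horizon used in this notebook only by assumption.

```python
import numpy as np

def forward_sq_log_return(close, horizon=1):
    """Squared log-return from bar t to bar t + horizon; NaN where the future is unknown."""
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    log_close = np.log(np.asarray(close, dtype=float))
    label = np.full(log_close.shape, np.nan)
    label[:-horizon] = (log_close[horizon:] - log_close[:-horizon]) ** 2
    return label

close = np.array([100.0, 110.0, 99.0])       # toy prices, for illustration only
vol_label = forward_sq_log_return(close, horizon=1)
print(vol_label)
```

The label is non-negative by construction, and the trailing `horizon` rows are NaN, so they would be dropped by the same `dropna()` step used when building the direction dataset.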