## Walk-Forward Cross-Validation (Rolling & Expanding)

We evaluate whether the signal is **stable out-of-sample** by repeatedly training on a past window and testing on the next window.

We use two setups:
- **Rolling window**: fixed training length, fixed test length (train window “moves” forward)
- **Expanding window**: training set grows over time, fixed test length

In [1]:
import polars as pl
import numpy as np
from datetime import datetime, timedelta

# Machine Learning Libraries 
import torch
import torch.nn as nn
import torch.optim as optim

# Visualization
import altair as alt

# Project modules (packaged)
from adausdt_qml import research, binance, models

sym = "ADAUSDT"
time_interval = "16h"
max_lags = 4
forecast_horizon = 1

def set_seed(seed: int = 99):
    import random, os
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(99)

start_date = datetime(2023, 2, 8, 0, 0)
end_date   = datetime(2026, 2, 8, 0, 0)


In [2]:
ts = research.load_ohlc_timeseries_range(sym, time_interval, start_date, end_date)

Loading ADAUSDT: 100%|██████████| 1097/1097 [00:20<00:00, 53.32day/s]


In [3]:
# target
ts = ts.with_columns(
    (pl.col("close") / pl.col("close").shift(forecast_horizon))
    .log()
    .alias("close_log_return")
)

target = "close_log_return"
lr = pl.col(target)

# Create direction column
ts = ts.with_columns(
    (pl.col("close_log_return") > 0).cast(pl.Int8).alias("return_dir")
)

# features
ts = ts.with_columns(
    [
        # One lagged copy of the target per lag, up to max_lags
        lr.shift(forecast_horizon * lag).alias(f"{target}_lag_{lag}")
        for lag in range(1, max_lags + 1)
    ]
).drop_nulls()
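
To make the row alignment concrete, here is a minimal plain-Python sketch of the same construction on made-up prices. The `shift` helper and the price list are illustrative assumptions, not part of the project code. The helper only mimics how a positive column shift pushes values down by `k` rows, so the feature on each row comes strictly from the past.

```python
import math

def shift(values, k):
    """Mimic a positive column shift: move values down k rows, padding with None."""
    return [None] * k + values[: len(values) - k]

# Hypothetical closing prices, oldest first (illustrative only).
close = [100.0, 110.0, 99.0, 99.0, 108.9]

prev_close = shift(close, 1)
log_return = [
    None if p is None else math.log(c / p)
    for c, p in zip(close, prev_close)
]
# Direction label: 1 for a strictly positive return, else 0 (a flat bar counts as 0).
return_dir = [None if r is None else int(r > 0) for r in log_return]
# Lag-1 feature: the previous bar's return, already known when the current bar starts.
lag_1 = shift(log_return, 1)

# Keep only rows where both target and feature are defined (the role of drop_nulls).
rows = [
    (r, d, l)
    for r, d, l in zip(log_return, return_dir, lag_1)
    if r is not None and l is not None
]
for r, d, l in rows:
    print(f"target={r:+.4f} dir={d} lag_1={l:+.4f}")
```

Each lag costs one row at the start of the series, which is why the null rows are dropped before training.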

In [4]:
target = 'close_log_return'
features = [f'{target}_lag_1']

### Step 1 — Convert Polars data into PyTorch tensors

The model is trained in PyTorch, while the dataset is stored as a Polars DataFrame.  
These helpers convert feature columns and the target column into the correct tensor shapes.

In [5]:
# -----------------------
# Helpers (Polars -> Torch)
# -----------------------
def pl_to_torch_X(df: pl.DataFrame, features: list[str]) -> torch.Tensor:
    return torch.tensor(df.select(features).to_numpy(), dtype=torch.float32)

def pl_to_torch_y(df: pl.DataFrame, target: str) -> torch.Tensor:
    return torch.tensor(df.select(target).to_numpy(), dtype=torch.float32).reshape(-1, 1)

### Step 2 — Training loop (same setup as the baseline notebook)

To keep the comparison fair, we train using the **same configuration** used in the original model training notebook:
- Linear model  
- **L1Loss**  
- **Adam** optimizer  
- `lr = 0.0005`  
- `epochs = 5000`  
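
With the single feature selected above (`close_log_return_lag_1`), this configuration amounts to fitting a one-weight linear model under mean absolute error. As a sketch, write $r_t$ for the log return of bar $t$ and $N$ for the number of training rows (these symbols are introduced here for illustration and do not appear in the notebook):

```latex
\hat{r}_t = w\, r_{t-1} + b,
\qquad
\mathcal{L}(w, b) = \frac{1}{N} \sum_{t=1}^{N} \left| r_t - \hat{r}_t \right|
```

`nn.Linear` includes the bias term $b$ by default, and `nn.L1Loss` averages the absolute errors by default.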

In [6]:
# -----------------------
# Train 
# -----------------------
def train_model_match_notebook(
    X_train: torch.Tensor,
    y_train: torch.Tensor,
    input_features: int,
    *,
    no_epochs: int = 5000,
    lr: float = 0.0005,
    verbose: bool = False
) -> nn.Module:
    model = nn.Linear(input_features, 1)
    criterion = nn.L1Loss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    model.train()
    for epoch in range(no_epochs):
        y_hat = model(X_train)
        loss = criterion(y_hat, y_train)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if verbose and (epoch + 1) % 500 == 0:
            print(f"Epoch [{epoch+1}/{no_epochs}], Loss: {loss.item():.6f}")

    return model

### Step 3 — Define the out-of-sample metric (EV)

For each test window, we compute:

- `signal = sign(prediction)`
- `trade_return = signal × realized_return`
- **EV = mean(trade_return)**

If EV is consistently positive across folds, it suggests a persistent edge.
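
As a worked example of the metric, here is a minimal plain-Python sketch on made-up numbers. The `sign` and `expected_value` helpers and the four-bar window are hypothetical and only mirror the three steps listed above.

```python
def sign(x: float) -> int:
    """Return -1, 0, or +1; a prediction of exactly 0 means no position."""
    return (x > 0) - (x < 0)

def expected_value(predictions, realized_returns):
    """Mean per-bar return from trading in the predicted direction."""
    trade_returns = [sign(p) * r for p, r in zip(predictions, realized_returns)]
    return sum(trade_returns) / len(trade_returns)

# Hypothetical predictions and realized log returns for a 4-bar test window.
predictions      = [0.002, -0.001,  0.003, -0.004]
realized_returns = [0.010,  0.005, -0.002, -0.015]

# Trade returns are +0.010, -0.005, -0.002, +0.015, so the mean is 0.018 / 4.
ev = expected_value(predictions, realized_returns)
print(f"EV = {ev:.4f}")  # EV = 0.0045
```

Only the sign of the prediction matters here, not its size, and the sketch, like the metric as defined, ignores trading costs.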

In [7]:
# -----------------------
# Eval Profitability (EV)
# -----------------------
def eval_profitability(model: nn.Module, X_test: torch.Tensor, y_test: torch.Tensor) -> float:
    model.eval()
    with torch.no_grad():
        y_hat = model(X_test)

    signal = torch.sign(y_hat)
    trade_log_return = signal * y_test
    return trade_log_return.mean().item()

def eval_model_profitability_pl_match_notebook(
    df_train: pl.DataFrame,
    df_test: pl.DataFrame,
    features: list[str],
    target: str,
    *,
    no_epochs: int = 5000,
    lr: float = 0.0005,
    verbose: bool = False
) -> float:
    X_train = pl_to_torch_X(df_train, features)
    y_train = pl_to_torch_y(df_train, target)
    X_test  = pl_to_torch_X(df_test, features)
    y_test  = pl_to_torch_y(df_test, target)

    model = train_model_match_notebook(
        X_train, y_train,
        input_features=len(features),
        no_epochs=no_epochs,
        lr=lr,
        verbose=verbose
    )

    return eval_profitability(model, X_test, y_test)

### Step 4 — Rolling window validation

We train on a fixed window (e.g., ~1 year) and test on the next fixed window (e.g., ~3 months).  
Then we move forward by `step` and repeat.
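
The fold boundaries can be sketched independently of the model. The generator below is a simplified illustration with toy sizes. `rolling_folds` is a hypothetical helper name, and the ranges are half-open row indices.

```python
def rolling_folds(n, train_window, test_window, step=None):
    """Yield (train_start, train_end, test_start, test_end) as half-open row ranges."""
    if step is None:
        step = test_window
    for train_start in range(0, n - train_window - test_window + 1, step):
        train_end = train_start + train_window
        # The test window starts on the row right after the training window.
        yield train_start, train_end, train_end, train_end + test_window

# Toy sizes (hypothetical): 20 rows, train on 8, test on 4, step by 4.
folds = list(rolling_folds(n=20, train_window=8, test_window=4))
for i, (tr0, tr1, te0, te1) in enumerate(folds, start=1):
    print(f"fold {i}: train rows [{tr0}, {tr1}) -> test rows [{te0}, {te1})")
```

With `step` equal to `test_window`, the test windows tile the data without overlap, while consecutive training windows overlap by `train_window - step` rows.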

In [8]:
# -----------------------
# Rolling Window CV
# -----------------------
def eval_rolling_window_cv_pl(
    df: pl.DataFrame,
    features: list[str],
    target: str,
    train_window: int,
    test_window: int,
    step: int | None = None,
    *,
    no_epochs: int = 5000,
    lr: float = 0.0005,
    verbose: bool = False
) -> pl.DataFrame:
    if step is None:
        step = test_window

    n = len(df)
    rows = []
    fold = 0

    for train_start in range(0, n - train_window - test_window + 1, step):
        train_end = train_start + train_window
        test_start = train_end

        df_train = df.slice(train_start, train_window)
        df_test  = df.slice(test_start,  test_window)

        if len(df_test) < test_window:
            break

        fold += 1

        ev = eval_model_profitability_pl_match_notebook(
            df_train, df_test, features, target,
            no_epochs=no_epochs, lr=lr, verbose=verbose
        )

        rows.append({
            "fold": fold,
            "train_start": df_train["datetime"][0],
            "train_end": df_train["datetime"][-1],
            "test_start": df_test["datetime"][0],
            "test_end": df_test["datetime"][-1],
            "ev": ev,
        })

    return pl.DataFrame(rows)

### Step 5 — Expanding window validation

Here we start with an initial training window (e.g., ~1 year).  
After each fold, the training data **expands**, which mimics a strategy that keeps learning from accumulating history.
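
The fold layout for this scheme can be sketched on toy sizes as follows. `expanding_folds` is a hypothetical helper name used only for illustration, and the ranges are half-open row indices.

```python
def expanding_folds(n, start_train, test_window, step=None):
    """Yield (train_start, train_end, test_start, test_end) as half-open row ranges."""
    if step is None:
        step = test_window
    train_end = start_train
    # Stop as soon as a full test window no longer fits in the data.
    while train_end + test_window <= n:
        # Training always starts at row 0; only its end moves forward.
        yield 0, train_end, train_end, train_end + test_window
        train_end += step

# Toy sizes (hypothetical): 20 rows, initial training set of 8, test on 4.
folds = list(expanding_folds(n=20, start_train=8, test_window=4))
for i, (tr0, tr1, te0, te1) in enumerate(folds, start=1):
    print(f"fold {i}: train rows [{tr0}, {tr1}) ({tr1 - tr0} rows) -> test rows [{te0}, {te1})")
```

The training set grows by `step` rows per fold (8, 12, 16 in this toy case), so later folds are fit on more data than earlier ones.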

In [9]:
# -----------------------
# Expanding Window CV
# -----------------------
def eval_expanding_window_cv_pl(
    df: pl.DataFrame,
    features: list[str],
    target: str,
    start_train: int,
    test_window: int,
    step: int | None = None,
    *,
    no_epochs: int = 5000,
    lr: float = 0.0005,
    verbose: bool = False
) -> pl.DataFrame:
    if step is None:
        step = test_window

    n = len(df)
    rows = []
    fold = 0
    train_end = start_train

    while True:
        test_start = train_end
        test_end = test_start + test_window
        if test_end > n:
            break

        df_train = df.slice(0, train_end)
        df_test  = df.slice(test_start, test_window)

        fold += 1

        ev = eval_model_profitability_pl_match_notebook(
            df_train, df_test, features, target,
            no_epochs=no_epochs, lr=lr, verbose=verbose
        )

        rows.append({
            "fold": fold,
            "train_start": df_train["datetime"][0],
            "train_end": df_train["datetime"][-1],
            "test_start": df_test["datetime"][0],
            "test_end": df_test["datetime"][-1],
            "ev": ev,
        })

        train_end += step

    return pl.DataFrame(rows)

### Step 6 — Run walk-forward validation on 16h bars

We approximate:
- **1 year ≈ 548 bars** (16h bars: 365 × 24 / 16 = 547.5, which the code rounds to 548)
- **3 months ≈ 137 bars**

Then we run:
- Rolling CV (train=1y, test=3m)
- Expanding CV (start train=1y, test=3m)
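
The bar counts and the resulting number of rolling folds follow from simple arithmetic, sketched below. `HOURS_PER_BAR` and `n_rolling_folds` are hypothetical names introduced for this illustration. The value `n = 1641` is the bar count printed in the cell output of this notebook.

```python
HOURS_PER_BAR = 16

bars_per_year = 365 * 24 / HOURS_PER_BAR           # 547.5
bars_per_quarter = (365 / 4) * 24 / HOURS_PER_BAR  # 136.875

# Python's round() sends exact halves to the nearest even integer, so 547.5 -> 548.
bars_1y = round(bars_per_year)
bars_3m = round(bars_per_quarter)
print(bars_1y, bars_3m)  # 548 137

def n_rolling_folds(n, train_window, test_window, step):
    """Number of complete rolling folds that fit in n rows."""
    usable = n - train_window - test_window
    return 0 if usable < 0 else usable // step + 1

print(n_rolling_folds(1641, bars_1y, bars_3m, bars_3m))  # 7
```

With these sizes, the expanding scheme also fits 7 complete test windows after the initial year, so both schemes are evaluated on the same number of folds.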

In [10]:
# -----------------------
# Usage 
# -----------------------
ts = ts.drop_nulls()

target = "close_log_return"
features = [f"{target}_lag_1"]  # change to your chosen set

n = len(ts)
print("n bars:", n)

bars_1y = int(round(365 * 24 / 16))
bars_3m = int(round((365/4) * 24 / 16))
print("~bars_1y:", bars_1y, "| ~bars_3m:", bars_3m)

train_window = bars_1y
test_window  = bars_3m
step = test_window

rw_results = eval_rolling_window_cv_pl(
    ts, features, target,
    train_window=train_window,
    test_window=test_window,
    step=step,
    no_epochs=5000,
    lr=0.0005,
    verbose=False
)

ew_results = eval_expanding_window_cv_pl(
    ts, features, target,
    start_train=bars_1y,
    test_window=bars_3m,
    step=bars_3m,
    no_epochs=5000,
    lr=0.0005,
    verbose=False
)

rw_results, rw_results["ev"].mean(), ew_results, ew_results["ev"].mean()

n bars: 1641
~bars_1y: 548 | ~bars_3m: 137


(shape: (7, 6)
 ┌──────┬──────────────┬──────────────┬─────────────────────┬─────────────────────┬───────────┐
 │ fold ┆ train_start  ┆ train_end    ┆ test_start          ┆ test_end            ┆ ev        │
 │ ---  ┆ ---          ┆ ---          ┆ ---                 ┆ ---                 ┆ ---       │
 │ i64  ┆ datetime[μs] ┆ datetime[μs] ┆ datetime[μs]        ┆ datetime[μs]        ┆ f64       │
 ╞══════╪══════════════╪══════════════╪═════════════════════╪═════════════════════╪═══════════╡
 │ 1    ┆ 2023-02-11   ┆ 2024-02-11   ┆ 2024-02-11 16:00:00 ┆ 2024-05-12 08:00:00 ┆ 0.003741  │
 │      ┆ 08:00:00     ┆ 00:00:00     ┆                     ┆                     ┆           │
 │ 2    ┆ 2023-05-13   ┆ 2024-05-12   ┆ 2024-05-13 00:00:00 ┆ 2024-08-11 16:00:00 ┆ -0.001707 │
 │      ┆ 16:00:00     ┆ 08:00:00     ┆                     ┆                     ┆           │
 │ 3    ┆ 2023-08-13   ┆ 2024-08-11   ┆ 2024-08-12 08:00:00 ┆ 2024-11-11 00:00:00 ┆ -0.002329 │
 │      ┆ 00:00:00     ┆ 

### Walk-Forward Results (AR(1) model)


#### Rolling Window (1y train / 3m test)

- Mean EV ≈ **0.000196**
- 7 folds

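The rolling mean EV is positive but very small, and the per-fold EV changes sign: fold 1 is positive (0.003741) while folds 2 and 3 are negative (-0.001707 and -0.002329).  
This suggests a weak directional edge that is not stable across regimes.  
Rolling validation is the more demanding of the two setups because each model is fit on only the most recent year of data, and here the signal shows limited robustness.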

#### Expanding Window (start 1y train / 3m test)

- Mean EV ≈ **0.001004**
- 7 folds

The expanding window produces a larger mean EV (0.001004 versus 0.000196 for the rolling window).  
This is consistent with the model benefiting from more historical data, but the dispersion across folds remains high, so the difference should not be over-interpreted.

---

### Interpretation

Although the mean EV is positive in both cases, the magnitude is extremely small relative to the variability observed across folds.

This implies:

- The AR(1) structure captures **some mild serial dependency**
- The effect is **economically weak**
- The signal is **not statistically strong enough** to support the conclusion of a persistent edge

In short:  
The AR(1) effect may exist, but it is too small and unstable to be considered a robust trading signal on its own.
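
One way to make "small relative to the variability across folds" explicit is to compare the mean fold EV with its standard error. The sketch below is an illustration, not part of the original analysis. `summarize_fold_evs` is a hypothetical helper, and the input is limited to the three rolling-fold EVs visible in the truncated output, so the printed numbers do not represent the full 7-fold result. With so few folds, a t-statistic like this is a rough gauge at best.

```python
import math
import statistics

def summarize_fold_evs(evs):
    """Return (mean, sample std, t-statistic of the mean against zero)."""
    n = len(evs)
    mean = statistics.mean(evs)
    std = statistics.stdev(evs)            # sample standard deviation (n - 1)
    t_stat = mean / (std / math.sqrt(n))   # mean divided by its standard error
    return mean, std, t_stat

# Only the first three rolling-fold EVs are visible in the truncated output;
# they are used purely to illustrate the calculation.
visible_rolling_evs = [0.003741, -0.001707, -0.002329]

mean, std, t_stat = summarize_fold_evs(visible_rolling_evs)
print(f"mean={mean:.6f} std={std:.6f} t={t_stat:.2f}")
```

Passing the full `rw_results["ev"]` and `ew_results["ev"]` columns as lists to the same helper would give the corresponding figures for all folds.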