# Walk-Forward Validation Framework

This notebook implements walk-forward validation for the momentum strategy with two modes:

## 1. Rolling Window Walk-Forward
- Fixed-size training window (e.g., 365 days) that rolls forward
- Tests whether the strategy adapts to recent market conditions
- Training window: `[t-365, t]` → Test: `[t, t+90]`
- Next iteration: `[t+90-365, t+90]` → Test: `[t+90, t+180]`

## 2. Expanding Window Walk-Forward
- Growing training window from the start of the dataset
- Tests strategy stability as more data accumulates
- Training window: `[0, t]` → Test: `[t, t+90]`
- Next iteration: `[0, t+90]` → Test: `[t+90, t+180]`

## Process
1. **Training Phase**: Optimize parameters (simple momentum window, volatility window) on training data
2. **Testing Phase**: Apply best parameters to out-of-sample test period
3. **Repeat**: Roll/expand window and continue
4. **Aggregate**: Combine all OOS results for true performance assessment

## Key Metrics
- **In-Sample (IS) Sharpe**: Performance on training data (optimization target)
- **Out-of-Sample (OOS) Sharpe**: Performance on test data (true measure)
- **Sharpe Degradation**: Difference between IS and OOS (overfitting indicator)
- **Parameter Stability**: How often the same parameters are selected

## Configuration
Adjust walk-forward settings in the parameter cell:
- `wf_train_days`: Training window size
- `wf_test_days`: Out-of-sample test period
- `wf_step_days`: How far to roll/expand between iterations
- `wf_mode`: 'rolling' or 'expanding'

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from pathlib import Path

from binance_data_loader import BinanceDataLoader

import plotly.graph_objects as go

In [2]:
data_loader = BinanceDataLoader(
    data_directory="/Users/chinjieheng/Documents/data/binance_dailydata",
    min_records=60,
    min_volume=1e5,
    start_date="2022-09-01",
    end_date=None
)

Loading Binance data from /Users/chinjieheng/Documents/data/binance_dailydata (timeframe=1d)...
Found 594 USDT trading pairs
Using a 30-bar rolling window for 30d volume checks
✓ BTCUSDT loaded successfully with 1174 records, avg volume: 14,911,837,860
Loaded 539 cryptocurrencies
Filtered 53 cryptocurrencies (insufficient data/volume)
Precomputing returns matrix (FAST numpy version)...
Building returns matrix for 539 tickers over 1174 dates...
Precomputed returns matrix shape: (1174, 539)
Date range: 2022-09-01 00:00:00 to 2025-11-17 00:00:00


In [3]:
price = data_loader.get_price_matrix()
#open_price = data_loader.get_open_price_matrix()
price

Unnamed: 0,0GUSDT,1000000BOBUSDT,1000000MOGUSDT,1000BONKUSDT,1000CATUSDT,1000CHEEMSUSDT,1000FLOKIUSDT,1000LUNCUSDT,1000PEPEUSDT,1000RATSUSDT,...,ZEREBROUSDT,ZETAUSDT,ZILUSDT,ZKCUSDT,ZKJUSDT,ZKUSDT,ZORAUSDT,ZRCUSDT,ZROUSDT,ZRXUSDT
2022-09-01,,,,,,,,,,,...,,,0.03642,,,,,,,0.2970
2022-09-02,,,,,,,,,,,...,,,0.03573,,,,,,,0.2923
2022-09-03,,,,,,,,,,,...,,,0.03566,,,,,,,0.2926
2022-09-04,,,,,,,,,,,...,,,0.03654,,,,,,,0.2992
2022-09-05,,,,,,,,,,,...,,,0.03630,,,,,,,0.2986
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-11-13,1.3034,0.04186,0.4013,0.011585,0.003930,0.001189,0.05642,0.03294,0.005422,0.04025,...,0.03477,0.0999,0.00698,0.1729,0.05737,0.04701,0.05889,0.01283,1.5399,0.1897
2025-11-14,1.2148,0.03758,0.3649,0.011014,0.003732,0.001214,0.05424,0.03214,0.004905,0.04265,...,0.03280,0.0956,0.00687,0.1613,0.05525,0.05161,0.05498,0.01211,1.4528,0.1773
2025-11-15,1.3547,0.03482,0.3719,0.011048,0.003802,0.001224,0.05507,0.03250,0.004982,0.04239,...,0.03118,0.0971,0.00692,0.1618,0.05737,0.05110,0.05197,0.01244,1.4732,0.1828
2025-11-16,1.2390,0.03209,0.3570,0.010591,0.003695,0.001229,0.05358,0.03162,0.004866,0.04156,...,0.02772,0.0986,0.00677,0.1575,0.05552,0.05189,0.05088,0.01212,1.4307,0.1777


In [4]:
# Information Discreteness (-ID) calculation helper
# Returns -ID values so that discrete uptrends produce positive numbers
def compute_id_matrix(price_df, Wd):
    lagged_price = price_df.shift(1)
    p_base = lagged_price.shift(Wd)
    p_recent = lagged_price
    pret = p_recent / p_base - 1.0
    pret_sign = np.sign(pret)

    daily_rets = lagged_price.pct_change(fill_method=None)
    neg_indicator = (daily_rets < 0).astype(float)
    pos_indicator = (daily_rets > 0).astype(float)
    zero_indicator = (daily_rets == 0).astype(float)

    neg_count = neg_indicator.rolling(window=Wd, min_periods=1).sum()
    pos_count = pos_indicator.rolling(window=Wd, min_periods=1).sum()
    zero_count = zero_indicator.rolling(window=Wd, min_periods=1).sum()

    non_zero_count = Wd - zero_count
    pct_neg = neg_count / non_zero_count.replace(0, np.nan)
    pct_pos = pos_count / non_zero_count.replace(0, np.nan)

    pct_neg = pct_neg.fillna(0)
    pct_pos = pct_pos.fillna(0)

    id_matrix = -pret_sign * (pct_neg - pct_pos)
    id_matrix = id_matrix.where(~pret.isna(), np.nan)
    return id_matrix

print('compute_id_matrix ready (returns -ID values for filtering)')


compute_id_matrix ready (returns -ID values for filtering)


In [5]:
# Walk-Forward Validation Configuration

# Parameter grids for optimization
simple_momentum_windows = [7,10,12,14]
vol_grid = [15,20,25,30,35,40,45]  # volume lookback window in days
# ID window is forced to match the simple momentum window (id_window = simple_window)
long_id_threshold_grid = [-0.1,0.0,0.1]
short_id_threshold_grid = [-0.6,-0.4]

# Strategy parameters
volume_percentile_grid = [0.2]  # universe selection by volume
momentum_percentile_grid = [0.1]  # top/bottom momentum fraction per side
volume_percentile = 0.2      # default universe selection by volume
momentum_percentile = 0.1    # default top/bottom momentum fraction per side
use_btc_filter = True
max_positions_per_side = 10
max_position_cap = 0.3

# Transaction costs
tc_bps = 5  # transaction cost in basis points

# Walk-Forward Configuration
wf_train_days = 270      # Training window size (1 year)
wf_test_days = 90        # Out-of-sample test period (3 months)
wf_step_days = 90        # Step size for rolling window (3 months)
wf_mode = 'rolling'      # 'rolling' or 'expanding'
score_mode = 'composite'  # options: 'sharpe', 'sortino', 'calmar', 'sharpe_sortino', 'sharpe_calmar', 'composite'

# Minimum data requirement
min_warmup_days = 120    # Minimum days needed before first training window

# Get volume data for universe selection
volume_data = {}
for ticker in data_loader.get_universe():
    ticker_data = data_loader._crypto_universe[ticker]['data']
    volume_data[ticker] = ticker_data['volume'].reindex(price.index)

volume_matrix = pd.DataFrame(volume_data, index=price.index)
rolling_volume_matrix = volume_matrix.rolling(window=20, min_periods=10).mean()

# BTC filter - 90-day return
btc_90d_return = price['BTCUSDT'].pct_change(90,fill_method=None)

# Returns matrix for volatility calculation
returns_matrix = price.pct_change(fill_method=None)

print(f"Walk-Forward Mode: {wf_mode.upper()}")
print(f"Training Window: {wf_train_days} days")
print(f"Test Window: {wf_test_days} days")
print(f"Step Size: {wf_step_days} days")
print(f"Total Data Range: {price.index[0]} to {price.index[-1]} ({len(price)} days)")


Walk-Forward Mode: ROLLING
Training Window: 270 days
Test Window: 90 days
Step Size: 90 days
Total Data Range: 2022-09-01 00:00:00 to 2025-11-17 00:00:00 (1174 days)


# Walk-Forward Validation Framework

This notebook implements walk-forward validation for the momentum strategy with two modes:
1. **Rolling Window**: Fixed-size training window that rolls forward
2. **Expanding Window**: Growing training window from the start

The process:
- Split data into overlapping train/test periods
- Optimize parameters on training data
- Test best parameters on out-of-sample data
- Aggregate OOS results to assess true performance

In [6]:

# Precompute matrices once for all parameter combos

def build_precomputed_matrices(price_df, simple_windows, vol_windows, rolling_volume_df, btc_90d_return, min_hist_days=30):
    price_np = price_df.to_numpy(dtype=float)
    dates = price_df.index.to_numpy()
    tickers = list(price_df.columns)
    forward_returns = (price_np[1:] / price_np[:-1]) - 1.0

    valid_counts = price_df.notna().rolling(window=min_hist_days, min_periods=1).sum().to_numpy()

    simple_rets = {}
    lagged_price = price_df.shift(1)
    for w in simple_windows:
        simple_rets[w] = lagged_price.pct_change(w, fill_method=None).to_numpy()

    id_matrix_map = {}
    for w in simple_windows:
        id_matrix_map[w] = compute_id_matrix(price_df, w).to_numpy()

    returns_matrix_local = price_df.pct_change(fill_method=None)
    vol_matrix_map = {}
    for w in vol_windows:
        vol_matrix_map[w] = returns_matrix_local.rolling(window=w).std().to_numpy()

    rolling_volume_np = rolling_volume_df.to_numpy(dtype=float)
    btc_ret_arr = btc_90d_return.to_numpy(dtype=float) if btc_90d_return is not None else None

    return {
        'price_np': price_np,
        'dates': dates,
        'tickers': tickers,
        'forward_returns': forward_returns,
        'valid_roll_counts': valid_counts,
        'simple_rets': simple_rets,
        'id_matrix': id_matrix_map,
        'vol_matrix': vol_matrix_map,
        'rolling_volume_np': rolling_volume_np,
        'btc_ret_arr': btc_ret_arr,
    }

print('build_precomputed_matrices ready (shared caches for backtest loops)')


build_precomputed_matrices ready (shared caches for backtest loops)


In [7]:

# Core backtest function for a single parameter set on a given date range (numpy-based)
def compute_cross_sectional_ic(signals, forward_returns):
    """Spearman IC between model signals and next-day returns for selected assets."""
    if len(signals) < 2 or len(forward_returns) < 2:
        return np.nan
    s = pd.Series(signals)
    r = pd.Series(forward_returns)
    if s.nunique() < 2 or r.nunique() < 2:
        return np.nan
    return s.rank().corr(r.rank(), method='pearson')


def run_backtest(price_data, simple_window, vol_window, id_window,
                 long_id_threshold, short_id_threshold,
                 start_idx=0, end_idx=None, collect_daily_ic=False, return_position_log=False, volume_pct=None, momentum_pct=None,
                 precomputed=None):
    """Run backtest with momentum ranking plus (-ID) filtering using precomputed matrices."""
    if end_idx is None:
        end_idx = len(price_data) - 1

    vol_pct = volume_pct if volume_pct is not None else volume_percentile
    mom_pct = momentum_pct if momentum_pct is not None else momentum_percentile

    cache = precomputed or {}
    price_np = cache.get('price_np')
    dates = cache.get('dates')
    tickers = cache.get('tickers')
    forward_returns = cache.get('forward_returns')
    valid_roll_counts = cache.get('valid_roll_counts')
    rolling_volume_np = cache.get('rolling_volume_np')
    simple_rets_map = cache.get('simple_rets') or {}
    id_map = cache.get('id_matrix') or {}
    vol_map = cache.get('vol_matrix') or {}
    btc_ret_arr = cache.get('btc_ret_arr') if use_btc_filter else None

    min_hist_days = 30

    if price_np is None:
        price_np = price_data.to_numpy(dtype=float)
        dates = price_data.index.to_numpy()
        tickers = list(price_data.columns)
        forward_returns = (price_np[1:] / price_np[:-1]) - 1.0
        valid_roll_counts = price_data.notna().rolling(window=min_hist_days, min_periods=1).sum().to_numpy()
        rolling_volume_np = rolling_volume_matrix.to_numpy(dtype=float)
        simple_rets_map = {simple_window: price_data.shift(1).pct_change(simple_window, fill_method=None).to_numpy()}
        id_map = {id_window: compute_id_matrix(price_data, id_window).to_numpy()}
        vol_map = {vol_window: returns_matrix.rolling(window=vol_window).std().to_numpy()}
        btc_ret_arr = btc_90d_return.to_numpy(dtype=float) if use_btc_filter and btc_90d_return is not None else None

    simple_rets = simple_rets_map.get(simple_window)
    if simple_rets is None:
        simple_rets = price_data.shift(1).pct_change(simple_window, fill_method=None).to_numpy()
    id_matrix_local = id_map.get(id_window)
    if id_matrix_local is None:
        id_matrix_local = compute_id_matrix(price_data, id_window).to_numpy()
    vol_matrix_local = vol_map.get(vol_window)
    if vol_matrix_local is None:
        vol_matrix_local = returns_matrix.rolling(window=vol_window).std().to_numpy()

    n_dates, n_assets = price_np.shape
    if end_idx is None or end_idx >= n_dates:
        end_idx = n_dates - 1

    max_loop_end = min(end_idx, n_dates - 1)

    equity_path = []
    return_path = []
    turnover_path = []
    date_path = []
    ic_values = []
    ic_dates = []
    position_records = []

    prev_weights = np.zeros(n_assets, dtype=float)
    seeded = False
    equity = 1.0
    min_weight = 0.05
    eps = 1e-8

    price_pair_mask = np.isfinite(price_np[:-1]) & np.isfinite(price_np[1:])

    def record_ic(date, value=np.nan):
        if collect_daily_ic:
            ic_dates.append(date)
            ic_values.append(value)

    for i in range(start_idx, max_loop_end):
        next_i = i + 1
        simple_row = simple_rets[i]

        if np.isnan(simple_row).all():
            if seeded:
                equity_path.append(equity)
                return_path.append(0.0)
                turnover_path.append(0.0)
                date_path.append(dates[next_i])
                record_ic(dates[next_i])
            continue

        if not seeded:
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            seeded = True
            continue

        if use_btc_filter and btc_ret_arr is not None:
            btc_val = btc_ret_arr[i] if i < len(btc_ret_arr) else np.nan
            if np.isfinite(btc_val) and btc_val < 0:
                prev_weights = np.zeros_like(prev_weights)
                equity_path.append(equity)
                return_path.append(0.0)
                turnover_path.append(0.0)
                date_path.append(dates[next_i])
                record_ic(dates[next_i])
                continue

        if (i - start_idx + 1) < min_hist_days:
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        vol_row = vol_matrix_local[i]
        volume_row = rolling_volume_np[i] if rolling_volume_np is not None else np.full(n_assets, np.nan)

        hist_ok = valid_roll_counts[i] >= min_hist_days
        base_mask = hist_ok & price_pair_mask[i] & np.isfinite(vol_row)

        vol_valid_idx = np.where(np.isfinite(volume_row))[0]
        if vol_valid_idx.size == 0:
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        n_universe = max(1, int(vol_valid_idx.size * vol_pct))
        top_candidates = np.argpartition(-volume_row[vol_valid_idx], max(n_universe - 1, 0))[:n_universe]
        top_idx = vol_valid_idx[top_candidates]

        avail_mask = np.zeros(n_assets, dtype=bool)
        avail_mask[top_idx] = True
        avail_mask &= base_mask

        available_indices = np.nonzero(avail_mask)[0]
        if available_indices.size == 0:
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        simple_vals = simple_row[available_indices]
        simple_valid_mask = np.isfinite(simple_vals)
        if not simple_valid_mask.any():
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        available_indices = available_indices[simple_valid_mask]
        simple_vals = simple_vals[simple_valid_mask]

        k = max(1, int(simple_vals.size * mom_pct))

        id_vals = id_matrix_local[i, available_indices]

        order_long = np.argsort(-simple_vals)
        order_short = np.argsort(simple_vals)

        long_mask = id_vals >= long_id_threshold
        short_mask = id_vals >= short_id_threshold

        long_ordered = order_long[long_mask[order_long]]
        short_ordered = order_short[short_mask[order_short]]

        long_idx = available_indices[long_ordered[:k]]
        short_idx = available_indices[short_ordered[:k]]

        if long_idx.size > max_positions_per_side:
            long_idx = long_idx[:max_positions_per_side]
        if short_idx.size > max_positions_per_side:
            short_idx = short_idx[:max_positions_per_side]

        if long_idx.size == 0 or short_idx.size == 0:
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        long_vols = vol_row[long_idx]
        short_vols = vol_row[short_idx]

        if np.isnan(long_vols).any() or np.isnan(short_vols).any():
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        lw = 1.0 / long_vols
        sw = 1.0 / short_vols
        lw = np.nan_to_num(lw, nan=eps, posinf=eps, neginf=eps)
        sw = np.nan_to_num(sw, nan=eps, posinf=eps, neginf=eps)

        if lw.sum() <= 0 or sw.sum() <= 0:
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        lw = lw / lw.sum()
        sw = sw / sw.sum()

        lw = np.minimum(lw, max_position_cap)
        sw = np.minimum(sw, max_position_cap)

        long_keep = lw >= min_weight
        short_keep = sw >= min_weight

        lw = lw[long_keep]
        sw = sw[short_keep]
        long_idx = long_idx[long_keep]
        short_idx = short_idx[short_keep]

        if lw.size == 0 or sw.size == 0:
            equity_path.append(equity)
            return_path.append(0.0)
            turnover_path.append(0.0)
            date_path.append(dates[next_i])
            record_ic(dates[next_i])
            continue

        lw = lw / lw.sum()
        sw = sw / sw.sum()

        lw *= 0.5
        sw *= 0.5

        current_weights = np.zeros(n_assets, dtype=float)
        current_weights[long_idx] = lw
        current_weights[short_idx] = -sw

        turnover = np.abs(current_weights - prev_weights).sum()

        fwd = forward_returns[i]
        daily_ret = np.nansum(current_weights * fwd)
        daily_ret -= turnover * (tc_bps / 10000.0)

        equity *= (1.0 + daily_ret)

        equity_path.append(equity)
        return_path.append(daily_ret)
        turnover_path.append(turnover)
        date_path.append(dates[next_i])

        if collect_daily_ic:
            traded_idx = np.concatenate([long_idx, short_idx])
            if traded_idx.size > 1:
                ic_val = compute_cross_sectional_ic(simple_rets[i, traded_idx], fwd[traded_idx])
            else:
                ic_val = np.nan
            record_ic(dates[next_i], ic_val)

        if return_position_log and (long_idx.size > 0 or short_idx.size > 0):
            position_records.append({
                'date': dates[i],
                'long_tickers': '|'.join([tickers[j] for j in long_idx]),
                'short_tickers': '|'.join([tickers[j] for j in short_idx]),
                'long_allocations': '|'.join([f"{tickers[j]}:{lw[k]:.6f}" for k, j in enumerate(long_idx)]),
                'short_allocations': '|'.join([f"{tickers[j]}:-{sw[k]:.6f}" for k, j in enumerate(short_idx)]),
                'long_positions': int(long_idx.size),
                'short_positions': int(short_idx.size),
                'total_long_exposure': float(lw.sum()),
                'total_short_exposure': float(sw.sum()),
                'turnover': float(turnover),
                'daily_return': float(daily_ret),
            })

        prev_weights = current_weights

    eq_s = pd.Series(equity_path, index=pd.to_datetime(date_path), name='equity') if equity_path else pd.Series(dtype=float)
    ret_s = pd.Series(return_path, index=pd.to_datetime(date_path), name='return') if return_path else pd.Series(dtype=float)
    turnover_s = pd.Series(turnover_path, index=pd.to_datetime(date_path), name='turnover') if turnover_path else pd.Series(dtype=float)
    ic_s = pd.Series(ic_values, index=pd.to_datetime(ic_dates), name='ic') if collect_daily_ic and ic_values else pd.Series(dtype=float)

    position_log = pd.DataFrame(position_records) if return_position_log else None
    if return_position_log and position_log is not None and position_log.empty:
        position_log = pd.DataFrame(columns=[
            'date', 'long_tickers', 'short_tickers', 'long_allocations', 'short_allocations',
            'long_positions', 'short_positions', 'total_long_exposure', 'total_short_exposure', 'turnover', 'daily_return'
        ])

    if collect_daily_ic and return_position_log:
        return eq_s, ret_s, turnover_s, ic_s, position_log
    if return_position_log:
        return eq_s, ret_s, turnover_s, position_log
    if collect_daily_ic:
        return eq_s, ret_s, turnover_s, ic_s
    return eq_s, ret_s, turnover_s


def compute_sharpe(ret_series):
    """Compute annualized Sharpe ratio from daily returns."""
    if ret_series.empty or ret_series.size < 2:
        return np.nan
    std = ret_series.std(ddof=1)
    if std < 1e-10 or not np.isfinite(std):
        return np.nan
    return (ret_series.mean() / std) * np.sqrt(365)


def compute_information_ratio(ic_series, annualization=365):
    """Annualized information ratio from daily IC values."""
    if ic_series is None or ic_series.empty:
        return np.nan
    clean = ic_series.dropna()
    if clean.empty:
        return np.nan
    std = clean.std(ddof=1)
    if std < 1e-10 or not np.isfinite(std):
        return np.nan
    return (clean.mean() / std) * np.sqrt(annualization)


print("Backtest function defined with (-ID) filtering and optional IC capture (numpy version).")


Backtest function defined with (-ID) filtering and optional IC capture (numpy version).


In [8]:
# Composite Score Functions: Sortino + Sharpe + Calmar
# Weights: 0.4 * Sortino + 0.3 * Sharpe + 0.3 * Calmar

def compute_sortino_ratio(ret_series, risk_free_rate=0.0):
    """
    Compute annualized Sortino ratio (focuses on downside risk)
    
    Args:
        ret_series: Daily returns series
        risk_free_rate: Annual risk-free rate (default 0)
    
    Returns:
        Annualized Sortino ratio
    """
    if ret_series.empty or ret_series.size < 2:
        return np.nan
    
    # Calculate excess returns (returns above risk-free rate)
    daily_rf = risk_free_rate / 365
    excess_returns = ret_series - daily_rf
    
    # Downside returns (only negative excess returns)
    downside_returns = excess_returns[excess_returns < 0]
    
    if len(downside_returns) == 0:
        # No downside risk - return high value
        return 100.0 if ret_series.mean() > daily_rf else np.nan
    
    downside_std = downside_returns.std(ddof=1)
    
    # Check for zero or near-zero std (constant downside returns)
    if downside_std < 1e-10 or not np.isfinite(downside_std):
        return np.nan
    
    return (ret_series.mean() / downside_std) * np.sqrt(365)


def compute_calmar_ratio(ret_series, equity_series=None, min_periods=30):
    """
    Compute Calmar ratio (return / max drawdown)
    
    Args:
        ret_series: Daily returns series
        equity_series: Optional pre-computed equity curve (if None, builds from returns)
        min_periods: Minimum days needed to compute ratio
    
    Returns:
        Calmar ratio (annualized)
    """
    if ret_series.empty or ret_series.size < min_periods:
        return np.nan
    
    # Build equity curve if not provided
    if equity_series is None:
        equity_series = (1 + ret_series).cumprod()
    
    # Calculate drawdown
    running_max = equity_series.expanding().max()
    drawdown = (equity_series - running_max) / running_max
    max_drawdown = abs(drawdown.min())
    
    # Annualized return
    annualized_return = ret_series.mean() * 365
    
    # Handle edge cases
    if max_drawdown == 0:
        # No drawdown - return very high value if positive returns, nan otherwise
        return 100.0 if annualized_return > 0 else np.nan
    
    if not np.isfinite(max_drawdown) or not np.isfinite(annualized_return):
        return np.nan
    
    return annualized_return / max_drawdown


def compute_composite_score(ret_series, equity_series=None, 
                            w_sortino=0.4, w_sharpe=0.3, w_calmar=0.3):
    """
    Compute composite score: weighted combination of Sortino, Sharpe, and Calmar ratios
    
    Args:
        ret_series: Daily returns series
        equity_series: Optional pre-computed equity curve
        w_sortino: Weight for Sortino ratio (default 0.4)
        w_sharpe: Weight for Sharpe ratio (default 0.3)
        w_calmar: Weight for Calmar ratio (default 0.3)
    
    Returns:
        Composite score (higher is better)
    """
    if ret_series.empty or ret_series.size < 2:
        return np.nan
    
    sortino = compute_sortino_ratio(ret_series)
    sharpe = compute_sharpe(ret_series)
    calmar = compute_calmar_ratio(ret_series, equity_series)
    
    # Handle NaN values - if any component is NaN, return NaN
    if np.isnan(sortino) or np.isnan(sharpe) or np.isnan(calmar):
        return np.nan
    
    # Normalize weights to sum to 1.0
    total_weight = w_sortino + w_sharpe + w_calmar
    w_sortino /= total_weight
    w_sharpe /= total_weight
    w_calmar /= total_weight
    
    composite = w_sortino * sortino + w_sharpe * sharpe + w_calmar * calmar
    
    return composite


print('Risk-adjusted metrics functions ready')
print(f'  - Sortino ratio: downside deviation penalty (uses excess returns)')
print(f'  - Calmar ratio: return/max_drawdown')
print(f'  - Composite score: 40% Sortino + 30% Sharpe + 30% Calmar')


# Unified scoring helper based on selected mode
def select_score(ret_series, equity_series=None, mode='composite'):
    mode = (mode or 'composite').lower()
    sharpe = compute_sharpe(ret_series)
    sortino = compute_sortino_ratio(ret_series)
    calmar = compute_calmar_ratio(ret_series, equity_series)

    if mode == 'sharpe':
        return sharpe
    if mode == 'sortino':
        return sortino
    if mode == 'calmar':
        return calmar
    if mode == 'sharpe_sortino':
        if np.isnan(sharpe) or np.isnan(sortino):
            return np.nan
        return 0.5 * sharpe + 0.5 * sortino
    if mode == 'sharpe_calmar':
        if np.isnan(sharpe) or np.isnan(calmar):
            return np.nan
        return 0.5 * sharpe + 0.5 * calmar
    if mode == 'composite':
        return compute_composite_score(ret_series, equity_series)
    raise ValueError(f"Unsupported score_mode: {mode}")


Risk-adjusted metrics functions ready
  - Sortino ratio: downside deviation penalty (uses excess returns)
  - Calmar ratio: return/max_drawdown
  - Composite score: 40% Sortino + 30% Sharpe + 30% Calmar


In [9]:
# TEST: Validate composite scoring functions

print("="*80)
print("TESTING COMPOSITE SCORING FUNCTIONS")
print("="*80)

# Test Case 1: Simple upward trend with no drawdown
test_returns_1 = pd.Series([0.01] * 100)  # Constant 1% daily returns
test_equity_1 = (1 + test_returns_1).cumprod()

sharpe_1 = compute_sharpe(test_returns_1)
sortino_1 = compute_sortino_ratio(test_returns_1)
calmar_1 = compute_calmar_ratio(test_returns_1, test_equity_1)
composite_1 = compute_composite_score(test_returns_1, test_equity_1)

print("\nTest 1: Constant positive returns (1% daily, no drawdown)")
print(f"  Sharpe:    {sharpe_1:.2f}")
print(f"  Sortino:   {sortino_1:.2f} (should be very high - no downside)")
print(f"  Calmar:    {calmar_1:.2f} (should be very high - no drawdown)")
print(f"  Composite: {composite_1:.2f}")

# Test Case 2: Volatile returns with drawdowns
np.random.seed(42)
test_returns_2 = pd.Series(np.random.normal(0.001, 0.02, 200))  # Mean 0.1%, std 2%
test_equity_2 = (1 + test_returns_2).cumprod()

sharpe_2 = compute_sharpe(test_returns_2)
sortino_2 = compute_sortino_ratio(test_returns_2)
calmar_2 = compute_calmar_ratio(test_returns_2, test_equity_2)
composite_2 = compute_composite_score(test_returns_2, test_equity_2)

print("\nTest 2: Volatile returns (mean 0.1%, std 2%)")
print(f"  Sharpe:    {sharpe_2:.2f}")
print(f"  Sortino:   {sortino_2:.2f} (should be higher than Sharpe)")
print(f"  Calmar:    {calmar_2:.2f}")
print(f"  Composite: {composite_2:.2f}")

# Test Case 3: High returns but large drawdown
test_returns_3 = pd.Series([0.02] * 50 + [-0.05] * 10 + [0.02] * 50)  # Big drawdown in middle
test_equity_3 = (1 + test_returns_3).cumprod()

sharpe_3 = compute_sharpe(test_returns_3)
sortino_3 = compute_sortino_ratio(test_returns_3)
calmar_3 = compute_calmar_ratio(test_returns_3, test_equity_3)
composite_3 = compute_composite_score(test_returns_3, test_equity_3)

print("\nTest 3: High returns with large drawdown")
print(f"  Sharpe:    {sharpe_3:.2f}")
print(f"  Sortino:   {sortino_3:.2f} (penalized for negative excess returns)")
print(f"  Calmar:    {calmar_3:.2f} (penalized for drawdown)")
print(f"  Composite: {composite_3:.2f} (lower than Test 1)")

# Test Case 4: Use actual strategy returns if available
if 'combined_returns_series' in dir() and combined_returns_series is not None and not combined_returns_series.empty:
    print("\nTest 4: Actual strategy OOS returns")
    sharpe_actual = compute_sharpe(combined_returns_series)
    sortino_actual = compute_sortino_ratio(combined_returns_series)
    calmar_actual = compute_calmar_ratio(combined_returns_series, combined_equity_series)
    composite_actual = compute_composite_score(combined_returns_series, combined_equity_series)
    
    print(f"  Sharpe:    {sharpe_actual:.2f}")
    print(f"  Sortino:   {sortino_actual:.2f}")
    print(f"  Calmar:    {calmar_actual:.2f}")
    print(f"  Composite: {composite_actual:.2f}")
    
    # Breakdown of composite
    print(f"\n  Composite breakdown:")
    print(f"    40% × Sortino ({sortino_actual:.2f}) = {0.4 * sortino_actual:.2f}")
    print(f"    30% × Sharpe  ({sharpe_actual:.2f}) = {0.3 * sharpe_actual:.2f}")
    print(f"    30% × Calmar  ({calmar_actual:.2f}) = {0.3 * calmar_actual:.2f}")
    print(f"    Total composite score: {composite_actual:.2f}")

print("\n" + "="*80)
print("✅ All tests completed successfully!")
print("="*80)

TESTING COMPOSITE SCORING FUNCTIONS

Test 1: Constant positive returns (1% daily, no drawdown)
  Sharpe:    nan
  Sortino:   100.00 (should be very high - no downside)
  Calmar:    100.00 (should be very high - no drawdown)
  Composite: nan

Test 2: Volatile returns (mean 0.1%, std 2%)
  Sharpe:    0.19
  Sortino:   0.33 (should be higher than Sharpe)
  Calmar:    0.26
  Composite: 0.27

Test 3: High returns with large drawdown
  Sharpe:    12.89
  Sortino:   nan (penalized for negative excess returns)
  Calmar:    12.40 (penalized for drawdown)
  Composite: nan (lower than Test 1)

✅ All tests completed successfully!


In [10]:

# Build precomputed matrices once (reuse across all parameter combinations)
precomputed_cache = build_precomputed_matrices(
    price,
    simple_momentum_windows,
    vol_grid,
    rolling_volume_matrix,
    btc_90d_return,
    min_hist_days=30,
)
print({
    'price_shape': precomputed_cache['price_np'].shape,
    'forward_returns_shape': precomputed_cache['forward_returns'].shape,
    'simple_windows': sorted(precomputed_cache['simple_rets'].keys()),
    'vol_windows': sorted(precomputed_cache['vol_matrix'].keys()),
})


{'price_shape': (1174, 539), 'forward_returns_shape': (1173, 539), 'simple_windows': [7, 10, 12, 14], 'vol_windows': [15, 20, 25, 30, 35, 40, 45]}


In [11]:
# Walk-Forward Validation Implementation with Parallel Processing

from joblib import Parallel, delayed


def evaluate_single_param(simple_window, vol_window, id_window,
                          long_id_threshold, short_id_threshold,
                          volume_pct, momentum_pct,
                          price_data, train_start_idx, train_end_idx):
    """Evaluate a single parameter combination using configurable score."""
    train_eq, train_ret, _ = run_backtest(
        price_data,
        simple_window,
        vol_window,
        id_window,
        long_id_threshold,
        short_id_threshold,
        volume_pct=volume_pct,
        momentum_pct=momentum_pct,
        start_idx=train_start_idx,
        end_idx=train_end_idx,
        precomputed=precomputed_cache,
    )

    score = select_score(train_ret, train_eq, score_mode)
    sharpe = compute_sharpe(train_ret)
    composite = compute_composite_score(train_ret, train_eq)
    sortino = compute_sortino_ratio(train_ret)
    calmar = compute_calmar_ratio(train_ret, train_eq)
    return (simple_window, vol_window, id_window, long_id_threshold, short_id_threshold, volume_pct, momentum_pct, score, sharpe, composite, sortino, calmar)



# Generate walk-forward windows
all_dates = price.index
total_days = len(all_dates)

start_offset = wf_train_days
if start_offset >= total_days:
    raise ValueError(f"Not enough data: need {start_offset} days, have {total_days}")

wf_results = []
wf_oos_equity = []
wf_oos_returns = []
wf_oos_turnover = []
wf_oos_ic = []
wf_oos_positions = []

iteration = 0
current_train_end_idx = start_offset

print(f"Starting walk-forward validation ({wf_mode} mode)...")
print(f"Total iterations expected (approx): ~{max(1, (total_days - start_offset + wf_step_days - 1) // wf_step_days)}")

while current_train_end_idx < total_days:
    iteration += 1

    if wf_mode == 'rolling':
        train_start_idx = current_train_end_idx - wf_train_days
    else:
        train_start_idx = 0

    test_start_idx = current_train_end_idx
    test_end_idx = min(current_train_end_idx + wf_test_days, total_days)

    if (test_end_idx - test_start_idx) < 2:
        print("Stopping: final window too short to compute returns")
        break

    train_start_date = all_dates[train_start_idx]
    train_end_date = all_dates[current_train_end_idx - 1]
    test_start_date = all_dates[test_start_idx]
    test_end_date = all_dates[test_end_idx - 1]

    print(f"Iteration {iteration}:")
    print(f"  Training: {train_start_date.date()} to {train_end_date.date()} ({current_train_end_idx - train_start_idx} days)")
    print(f"  Testing:  {test_start_date.date()} to {test_end_date.date()} ({test_end_idx - test_start_idx} days)")

    results = Parallel(n_jobs=-1, verbose=0)(
        delayed(evaluate_single_param)(sw, vw, sw, lt, st, vp, mp, price, train_start_idx, current_train_end_idx)
        for vp in volume_percentile_grid
        for mp in momentum_percentile_grid
        for sw in simple_momentum_windows
        for vw in vol_grid
        for lt in long_id_threshold_grid
        for st in short_id_threshold_grid
    )

    best_score = -np.inf
    best_composite = -np.inf
    best_sharpe = np.nan
    best_params = None

    for sw, vw, iw, lt, st, vp, mp, score, sharpe, composite, sortino, calmar in results:
        if not np.isnan(score) and score > best_score:
            best_score = score
            best_composite = composite
            best_sharpe = sharpe
            best_params = (sw, vw, iw, lt, st, vp, mp)

    if best_params is None:
        print("  WARNING: No valid parameters found in training period")
        current_train_end_idx += wf_step_days
        continue

    print(f"  Best params: simple={best_params[0]}, vol={best_params[1]}, id_window={best_params[2]}, long-ID≥{best_params[3]}, short-ID≥{best_params[4]}, vol%={best_params[5]*100:.1f}%, mom%={best_params[6]*100:.1f}%")
    print(f"    IS Score ({score_mode}): {best_score:.2f}, IS Sharpe={best_sharpe:.2f}, IS Composite={best_composite:.2f}")

    full_eq, full_ret, full_turnover, full_ic, full_position_log = run_backtest(
        price,
        best_params[0],
        best_params[1],
        best_params[2],
        best_params[3],
        best_params[4],
        start_idx=0,
        end_idx=test_end_idx,
        collect_daily_ic=True,
        return_position_log=True,
        volume_pct=best_params[5],
        momentum_pct=best_params[6],
        precomputed=precomputed_cache,
    )

    test_period_dates = price.index[test_start_idx:test_end_idx]
    oos_ret = full_ret.reindex(test_period_dates).dropna()
    oos_turnover = full_turnover.reindex(test_period_dates).dropna()
    oos_ic = full_ic.reindex(test_period_dates) if full_ic is not None else pd.Series(dtype=float)

    if not oos_ret.empty:
        oos_equity_values = [1.0]
        for ret in oos_ret.values:
            oos_equity_values.append(oos_equity_values[-1] * (1.0 + ret))
        oos_eq = pd.Series(oos_equity_values[1:], index=oos_ret.index)
    else:
        oos_eq = pd.Series(dtype=float)

    oos_sharpe = compute_sharpe(oos_ret)
    oos_sortino = compute_sortino_ratio(oos_ret)
    oos_calmar = compute_calmar_ratio(oos_ret, oos_eq)
    oos_composite = compute_composite_score(oos_ret, oos_eq)
    oos_score = select_score(oos_ret, oos_eq, score_mode)
    oos_total_ret = (oos_eq.iloc[-1] - 1.0) if not oos_eq.empty else 0.0
    oos_avg_turnover = oos_turnover.mean() if not oos_turnover.empty else 0.0

    oos_ic_clean = oos_ic.dropna()
    oos_ic_mean = oos_ic_clean.mean() if not oos_ic_clean.empty else np.nan
    oos_ic_ir = compute_information_ratio(oos_ic_clean) if not oos_ic_clean.empty else np.nan

    oos_positions = pd.DataFrame()
    if full_position_log is not None and not full_position_log.empty:
        mask = (full_position_log['date'] >= test_start_date) & (full_position_log['date'] <= test_end_date)
        oos_positions = full_position_log.loc[mask].copy()
        if not oos_positions.empty:
            oos_positions['iteration'] = iteration
            oos_positions['train_start'] = train_start_date
            oos_positions['train_end'] = train_end_date
            oos_positions['test_start'] = test_start_date
            oos_positions['test_end'] = test_end_date
            wf_oos_positions.append(oos_positions)

    print(f"  OOS Score ({score_mode}): {oos_score:.2f}, Sharpe: {oos_sharpe:.2f}, Sortino: {oos_sortino:.2f}, Calmar: {oos_calmar:.2f}, Composite: {oos_composite:.2f}")
    print(f"  OOS Total Return: {oos_total_ret*100:.2f}%, Avg Turnover: {oos_avg_turnover*100:.2f}%")
    print(f"  OOS IC (selected assets): mean={oos_ic_mean:.4f}, IR={oos_ic_ir:.2f}")

    wf_results.append({
        'iteration': iteration,
        'train_start': train_start_date,
        'train_end': train_end_date,
        'test_start': test_start_date,
        'test_end': test_end_date,
        'train_days': current_train_end_idx - train_start_idx,
        'test_days': test_end_idx - test_start_idx,
        'best_simple_window': best_params[0],
        'best_vol_window': best_params[1],
        'best_id_window': best_params[2],
        'best_long_id_threshold': best_params[3],
        'best_short_id_threshold': best_params[4],
        'best_volume_percentile': best_params[5],
        'best_momentum_percentile': best_params[6],
        'is_score': best_score,
        'is_composite': best_composite,
        'is_sharpe': best_sharpe,
        'oos_score': oos_score,
        'oos_composite': oos_composite,
        'oos_sharpe': oos_sharpe,
        'oos_sortino': oos_sortino,
        'oos_calmar': oos_calmar,
        'oos_total_return': oos_total_ret,
        'score_mode': score_mode,
        'oos_avg_turnover': oos_avg_turnover,
        'oos_ic_mean': oos_ic_mean,
        'oos_ic_ir': oos_ic_ir,
    })

    wf_oos_equity.append(oos_eq)
    wf_oos_returns.append(oos_ret)
    wf_oos_turnover.append(oos_turnover)
    wf_oos_ic.append(oos_ic)

    current_train_end_idx += wf_step_days

wf_summary = pd.DataFrame(wf_results)
all_oos_positions = pd.concat(wf_oos_positions, ignore_index=True) if len(wf_oos_positions) > 0 else pd.DataFrame()
print(f"{'='*80}")
print(f"Walk-Forward Validation Complete: {len(wf_results)} iterations")
print(f"{'='*80}")
if not wf_summary.empty:
    print(wf_summary[['iteration', 'test_start', 'test_end', 'best_simple_window', 'best_vol_window', 'best_id_window', 'best_long_id_threshold', 'best_short_id_threshold', 'is_composite', 'oos_composite', 'oos_sharpe', 'oos_ic_mean', 'oos_ic_ir']])
else:
    print("No walk-forward iterations produced results.")

Starting walk-forward validation (rolling mode)...
Total iterations expected (approx): ~11
Iteration 1:
  Training: 2022-09-01 to 2023-05-28 (270 days)
  Testing:  2023-05-29 to 2023-08-26 (90 days)
  Best params: simple=12, vol=40, id_window=12, long-ID≥-0.1, short-ID≥-0.6, vol%=20.0%, mom%=10.0%
    IS Score (composite): 2.55, IS Sharpe=1.45, IS Composite=2.55
  OOS Score (composite): -2.43, Sharpe: -1.68, Sortino: -2.70, Calmar: -2.84, Composite: -2.43
  OOS Total Return: -15.72%, Avg Turnover: 53.15%
  OOS IC (selected assets): mean=-0.1364, IR=-5.50
Iteration 2:
  Training: 2022-11-30 to 2023-08-26 (270 days)
  Testing:  2023-08-27 to 2023-11-24 (90 days)
  Best params: simple=14, vol=30, id_window=14, long-ID≥-0.1, short-ID≥-0.6, vol%=20.0%, mom%=10.0%
    IS Score (composite): 0.47, IS Sharpe=0.35, IS Composite=0.47
  OOS Score (composite): 8.22, Sharpe: 2.95, Sortino: 6.13, Calmar: 16.26, Composite: 8.22
  OOS Total Return: 35.88%, Avg Turnover: 32.56%
  OOS IC (selected assets



  Best params: simple=12, vol=35, id_window=12, long-ID≥-0.1, short-ID≥-0.6, vol%=20.0%, mom%=10.0%
    IS Score (composite): 7.75, IS Sharpe=3.55, IS Composite=7.75
  OOS Score (composite): nan, Sharpe: nan, Sortino: nan, Calmar: nan, Composite: nan
  OOS Total Return: 0.00%, Avg Turnover: 0.00%
  OOS IC (selected assets): mean=nan, IR=nan
Walk-Forward Validation Complete: 11 iterations
    iteration test_start   test_end  best_simple_window  best_vol_window  \
0           1 2023-05-29 2023-08-26                  12               40   
1           2 2023-08-27 2023-11-24                  14               30   
2           3 2023-11-25 2024-02-22                  14               15   
3           4 2024-02-23 2024-05-22                  14               15   
4           5 2024-05-23 2024-08-20                  14               15   
5           6 2024-08-21 2024-11-18                  14               15   
6           7 2024-11-19 2025-02-16                  12               35   
7

In [12]:
# Aggregate Out-of-Sample Performance

# Method: Rebuild equity curve from returns with proper compounding
combined_oos_returns = []
combined_oos_dates = []
combined_oos_turnover_values = []
combined_oos_ic_values = []
combined_oos_ic_dates = []

for i, (eq_series, ret_series, turn_series, ic_series) in enumerate(zip(wf_oos_equity, wf_oos_returns, wf_oos_turnover, wf_oos_ic)):
    if ret_series.empty:
        continue
    
    # Concatenate returns - will rebuild equity with compounding
    combined_oos_returns.extend(ret_series.values)
    combined_oos_dates.extend(ret_series.index)
    combined_oos_turnover_values.extend(turn_series.values)

    ic_aligned = ic_series.reindex(ret_series.index) if ic_series is not None else pd.Series(index=ret_series.index, dtype=float)
    combined_oos_ic_values.extend(ic_aligned.values)
    combined_oos_ic_dates.extend(ic_aligned.index)

# Create combined series
combined_returns_series = pd.Series(combined_oos_returns, index=combined_oos_dates, name='OOS_Returns')
combined_turnover_series = pd.Series(combined_oos_turnover_values, index=combined_oos_dates, name='OOS_Turnover')
combined_ic_series = pd.Series(combined_oos_ic_values, index=combined_oos_ic_dates, name='OOS_IC')

# Build equity curve with proper compounding: equity[t] = equity[t-1] * (1 + return[t])
combined_oos_equity = []
current_equity = 1.0

for daily_return in combined_oos_returns:
    current_equity *= (1.0 + daily_return)  # Compound each return
    combined_oos_equity.append(current_equity)

# Create combined equity series
combined_equity_series = pd.Series(combined_oos_equity, index=combined_oos_dates, name='OOS_Equity')

# Compute aggregate metrics
aggregate_sharpe = compute_sharpe(combined_returns_series)
aggregate_sortino = compute_sortino_ratio(combined_returns_series)
aggregate_calmar = compute_calmar_ratio(combined_returns_series, combined_equity_series)
aggregate_composite = compute_composite_score(combined_returns_series, combined_equity_series)
aggregate_score = select_score(combined_returns_series, combined_equity_series, score_mode)
aggregate_total_return = (combined_equity_series.iloc[-1] - 1.0) if len(combined_equity_series) > 0 else 0.0
aggregate_cagr = ((combined_equity_series.iloc[-1]) ** (365.25 / len(combined_equity_series)) - 1) if len(combined_equity_series) > 0 else 0.0
aggregate_avg_turnover = combined_turnover_series.mean() if not combined_turnover_series.empty else 0.0

aggregate_ic_mean = combined_ic_series.dropna().mean() if not combined_ic_series.empty else np.nan
aggregate_ic_ir = compute_information_ratio(combined_ic_series.dropna())

# Calculate drawdowns
running_max = combined_equity_series.expanding().max()
drawdown = (combined_equity_series - running_max) / running_max
max_drawdown = drawdown.min()

print(f"{'='*80}")
print(f"AGGREGATED OUT-OF-SAMPLE PERFORMANCE")
print(f"{'='*80}")
print(f"Mode: {wf_mode.upper()}")
print(f"Total OOS Days: {len(combined_equity_series)}")
print(f"Date Range: {combined_equity_series.index[0].date()} to {combined_equity_series.index[-1].date()}")
print(f"--- Risk-Adjusted Returns ---")
print(f"Selected Score ({score_mode}): {aggregate_score:.3f}")
print(f"Annualized Sharpe Ratio:  {aggregate_sharpe:.3f}")
print(f"Annualized Sortino Ratio: {aggregate_sortino:.3f}")
print(f"Calmar Ratio:             {aggregate_calmar:.3f}")
print(f"Composite Score:          {aggregate_composite:.3f}")
print(f"--- Absolute Returns ---")
print(f"Total Return: {aggregate_total_return*100:.2f}%")
print(f"CAGR:         {aggregate_cagr*100:.2f}%")
print(f"Max Drawdown: {max_drawdown*100:.2f}%")
print(f"--- Turnover ---")
print(f"Average Daily Turnover:   {aggregate_avg_turnover*100:.2f}%")
print(f"Median Daily Turnover:    {combined_turnover_series.median()*100:.2f}%")
print(f"Annualized Turnover Est.: {aggregate_avg_turnover*365*100:.2f}%")
print(f"--- Stock Selection ---")
print(f"Mean Daily IC (picked assets): {aggregate_ic_mean:.4f}")
print(f"IC Information Ratio (√365):   {aggregate_ic_ir:.3f}")
print(f"{'='*80}")

# Summary statistics of walk-forward iterations
if not wf_summary.empty:
    print("Walk-Forward Iteration Statistics:")
    print(f"--- Selected Score ({score_mode}):")
    print(f"    Mean IS Score:  {wf_summary['is_score'].mean():.3f}")
    print(f"    Mean OOS Score: {wf_summary['oos_score'].mean():.3f}")
    print(f"    Score Degradation: {(wf_summary['is_score'].mean() - wf_summary['oos_score'].mean()):.3f}")
    print(f"--- Composite Score (40% Sortino + 30% Sharpe + 30% Calmar):")
    print(f"    Mean IS Composite:  {wf_summary['is_composite'].mean():.3f}")
    print(f"    Mean OOS Composite: {wf_summary['oos_composite'].mean():.3f}")
    print(f"    Composite Degradation: {(wf_summary['is_composite'].mean() - wf_summary['oos_composite'].mean()):.3f}")
    print(f"    OOS Composite Std Dev: {wf_summary['oos_composite'].std():.3f}")
    
    print(f"Sharpe Ratio:")
    print(f"    Mean IS Sharpe:  {wf_summary['is_sharpe'].mean():.3f}")
    print(f"    Mean OOS Sharpe: {wf_summary['oos_sharpe'].mean():.3f}")
    print(f"    Sharpe Degradation: {(wf_summary['is_sharpe'].mean() - wf_summary['oos_sharpe'].mean()):.3f}")
    print(f"    OOS Sharpe Std Dev: {wf_summary['oos_sharpe'].std():.3f}")
    
    print(f"Sortino Ratio:")
    print(f"    Mean OOS Sortino: {wf_summary['oos_sortino'].mean():.3f}")
    print(f"    OOS Sortino Std Dev: {wf_summary['oos_sortino'].std():.3f}")
    
    print(f"Calmar Ratio:")
    print(f"    Mean OOS Calmar: {wf_summary['oos_calmar'].mean():.3f}")
    print(f"    OOS Calmar Std Dev: {wf_summary['oos_calmar'].std():.3f}")
    
    print(f"Turnover:")
    print(f"    Mean OOS Turnover: {wf_summary['oos_avg_turnover'].mean()*100:.2f}%")
    print(f"    OOS Turnover Std Dev: {wf_summary['oos_avg_turnover'].std()*100:.2f}%")
    
    print(f"Information Coefficient (picked assets):")
    print(f"    Mean OOS IC:  {wf_summary['oos_ic_mean'].mean():.4f}")
    print(f"    Mean OOS IC IR: {wf_summary['oos_ic_ir'].mean():.3f}")
    
    print(f"Other Statistics:")
    print(f"    % Positive OOS Returns: {(wf_summary['oos_total_return'] > 0).sum() / len(wf_summary) * 100:.1f}%")


AGGREGATED OUT-OF-SAMPLE PERFORMANCE
Mode: ROLLING
Total OOS Days: 904
Date Range: 2023-05-29 to 2025-11-17
--- Risk-Adjusted Returns ---
Selected Score (composite): 2.035
Annualized Sharpe Ratio:  1.518
Annualized Sortino Ratio: 2.624
Calmar Ratio:             1.766
Composite Score:          2.035
--- Absolute Returns ---
Total Return: 323.32%
CAGR:         79.14%
Max Drawdown: -38.58%
--- Turnover ---
Average Daily Turnover:   49.87%
Median Daily Turnover:    55.32%
Annualized Turnover Est.: 18203.53%
--- Stock Selection ---
Mean Daily IC (picked assets): -0.0422
IC Information Ratio (√365):   -1.934
Walk-Forward Iteration Statistics:
--- Selected Score (composite):
    Mean IS Score:  2.976
    Mean OOS Score: 3.740
    Score Degradation: -0.764
--- Composite Score (40% Sortino + 30% Sharpe + 30% Calmar):
    Mean IS Composite:  2.976
    Mean OOS Composite: 3.740
    Composite Degradation: -0.764
    OOS Composite Std Dev: 4.775
Sharpe Ratio:
    Mean IS Sharpe:  1.726
    Mean OOS

In [13]:
# Visualization: OOS Equity Curve

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=combined_equity_series.index,
    y=combined_equity_series.values,
    mode='lines',
    name='Out-of-Sample Equity',
    line=dict(color='blue', width=2)
))

# Add shaded regions for each test period
colors = ['rgba(255,0,0,0.1)', 'rgba(0,255,0,0.1)', 'rgba(0,0,255,0.1)']
for i, row in wf_summary.iterrows():
    fig.add_vrect(
        x0=row['test_start'],
        x1=row['test_end'],
        fillcolor=colors[i % len(colors)],
        layer='below',
        line_width=0,
        annotation_text=f"Iter {row['iteration']}",
        annotation_position="top left"
    )

fig.update_layout(
    title=f'Walk-Forward Out-of-Sample Equity Curve ({wf_mode.capitalize()} Window)<br>' +
          f'Sharpe={aggregate_sharpe:.2f}, Total Return={aggregate_total_return*100:.1f}%, MaxDD={max_drawdown*100:.1f}%',
    xaxis_title='Date',
    yaxis_title='Equity',
    hovermode='x unified',
    width=1200,
    height=600
)

fig.show()

In [14]:
# Visualization: Parameter Selection Over Time

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=wf_summary['test_start'],
    y=wf_summary['best_simple_window'],
    mode='lines+markers',
    name='Simple Momentum Window',
    line=dict(color='blue', width=2),
    marker=dict(size=8)
))

fig.add_trace(go.Scatter(
    x=wf_summary['test_start'],
    y=wf_summary['best_vol_window'],
    mode='lines+markers',
    name='Volatility Window',
    line=dict(color='red', width=2),
    marker=dict(size=8),
    yaxis='y2'
))

fig.add_trace(go.Scatter(
    x=wf_summary['test_start'],
    y=wf_summary['best_long_id_threshold'],
    mode='lines+markers',
    name='Long -ID Threshold',
    line=dict(color='green', width=2),
    marker=dict(size=8),
    yaxis='y3'
))

fig.add_trace(go.Scatter(
    x=wf_summary['test_start'],
    y=wf_summary['best_short_id_threshold'],
    mode='lines+markers',
    name='Short -ID Threshold',
    line=dict(color='orange', width=2),
    marker=dict(size=8),
    yaxis='y4'
))

fig.update_layout(
    title='Optimal Parameter Selection Over Time',
    xaxis_title='Test Period Start Date',
    yaxis=dict(title='Simple Momentum Window (days)', side='left'),
    yaxis2=dict(title='Volatility Window (days)', overlaying='y', side='right'),
    yaxis3=dict(title='Long -ID Threshold', overlaying='y', side='right', position=0.85),
    yaxis4=dict(title='Short -ID Threshold', overlaying='y', side='right', position=0.95),
    hovermode='x unified',
    width=1400,
    height=500
)

fig.show()

In [15]:
# Visualization: IS vs OOS Sharpe Comparison

fig = go.Figure()

fig.add_trace(go.Bar(
    x=wf_summary['iteration'],
    y=wf_summary['is_sharpe'],
    name='In-Sample Sharpe',
    marker_color='lightblue',
    text=wf_summary['is_sharpe'].round(2),
    textposition='outside'
))

fig.add_trace(go.Bar(
    x=wf_summary['iteration'],
    y=wf_summary['oos_sharpe'],
    name='Out-of-Sample Sharpe',
    marker_color='darkblue',
    text=wf_summary['oos_sharpe'].round(2),
    textposition='outside'
))

fig.update_layout(
    title='In-Sample vs Out-of-Sample Sharpe Ratios by Iteration',
    xaxis_title='Walk-Forward Iteration',
    yaxis_title='Annualized Sharpe Ratio',
    barmode='group',
    width=1200,
    height=500,
    hovermode='x unified'
)

fig.show()

In [16]:
# Visualization: Drawdown Analysis

running_max = combined_equity_series.expanding().max()
drawdown_series = (combined_equity_series - running_max) / running_max * 100

fig = go.Figure()

# Equity curve
fig.add_trace(go.Scatter(
    x=combined_equity_series.index,
    y=combined_equity_series.values,
    mode='lines',
    name='Equity',
    line=dict(color='blue', width=2),
    yaxis='y1'
))

# Drawdown
fig.add_trace(go.Scatter(
    x=drawdown_series.index,
    y=drawdown_series.values,
    mode='lines',
    name='Drawdown %',
    line=dict(color='red', width=1.5),
    fill='tozeroy',
    fillcolor='rgba(255,0,0,0.2)',
    yaxis='y2'
))

fig.update_layout(
    title='Out-of-Sample Equity and Drawdown',
    xaxis_title='Date',
    yaxis=dict(title='Equity', side='left'),
    yaxis2=dict(title='Drawdown (%)', overlaying='y', side='right', range=[drawdown_series.min()*1.1, 5]),
    hovermode='x unified',
    width=1200,
    height=600
)

fig.show()


# Additional Analysis: Parameter stability
print("\n" + "="*80)
print("PARAMETER SELECTION STATISTICS")
print("="*80)
print("\nSimple Momentum Window Selection:")
print(wf_summary['best_simple_window'].value_counts().sort_index())
print(f"\nMost common: {wf_summary['best_simple_window'].mode()[0]} days ({wf_summary['best_simple_window'].value_counts().max()} times)")

print("\nVolatility Window Selection:")
print(wf_summary['best_vol_window'].value_counts().sort_index())
print(f"\nMost common: {wf_summary['best_vol_window'].mode()[0]} days ({wf_summary['best_vol_window'].value_counts().max()} times)")

print("\nVolume Percentile Selection:")
print(wf_summary['best_volume_percentile'].value_counts().sort_index())

print("\nMomentum Percentile Selection:")
print(wf_summary['best_momentum_percentile'].value_counts().sort_index())

# Export results
results_output_path = f'/Users/chinjieheng/Documents/research/mom_research/walkforward_results_{score_mode}.csv'
wf_summary.to_csv(results_output_path, index=False)
print(f"\nResults exported to: {results_output_path}")

positions_output_path = f'/Users/chinjieheng/Documents/research/mom_research/walkforward_positions_{score_mode}.csv'
if 'all_oos_positions' in locals() and not all_oos_positions.empty:
    all_oos_positions = all_oos_positions.sort_values('date')
    all_oos_positions.to_csv(positions_output_path, index=False)
    print(f"Positions exported to: {positions_output_path}")
else:
    print("Positions export skipped: no position logs captured.")


PARAMETER SELECTION STATISTICS

Simple Momentum Window Selection:
best_simple_window
12    6
14    5
Name: count, dtype: int64

Most common: 12 days (6 times)

Volatility Window Selection:
best_vol_window
15    4
30    1
35    2
40    2
45    2
Name: count, dtype: int64

Most common: 15 days (4 times)

Volume Percentile Selection:
best_volume_percentile
0.2    11
Name: count, dtype: int64

Momentum Percentile Selection:
best_momentum_percentile
0.1    11
Name: count, dtype: int64

Results exported to: /Users/chinjieheng/Documents/research/mom_research/walkforward_results_composite.csv
Positions exported to: /Users/chinjieheng/Documents/research/mom_research/walkforward_positions_composite.csv


In [17]:
# Visualization: Strategy vs BTC Benchmark

# Calculate BTC buy-and-hold returns over the same OOS period
btc_oos_dates = combined_equity_series.index
btc_prices = price['BTCUSDT'].reindex(btc_oos_dates)

# Build BTC equity curve (buy and hold)
btc_equity = []
if len(btc_prices) > 0 and not pd.isna(btc_prices.iloc[0]):
    initial_btc_price = btc_prices.iloc[0]
    for btc_price in btc_prices:
        if pd.isna(btc_price):
            btc_equity.append(btc_equity[-1] if btc_equity else 1.0)
        else:
            btc_equity.append(btc_price / initial_btc_price)
    
    btc_equity_series = pd.Series(btc_equity, index=btc_oos_dates, name='BTC')
    
    # Calculate BTC metrics
    btc_returns = btc_equity_series.pct_change(fill_method=None).dropna()
    btc_sharpe = compute_sharpe(btc_returns) if len(btc_returns) > 1 else np.nan
    btc_sortino = compute_sortino_ratio(btc_returns) if len(btc_returns) > 1 else np.nan
    btc_calmar = compute_calmar_ratio(btc_returns, btc_equity_series) if len(btc_returns) > 1 else np.nan
    btc_total_return = (btc_equity_series.iloc[-1] - 1.0)
    btc_cagr = ((btc_equity_series.iloc[-1]) ** (365.25 / len(btc_equity_series)) - 1) if len(btc_equity_series) > 0 else 0.0
    
    # Calculate BTC drawdown
    btc_running_max = btc_equity_series.expanding().max()
    btc_drawdown = (btc_equity_series - btc_running_max) / btc_running_max
    btc_max_drawdown = btc_drawdown.min()
    
    # Create comparison plot
    fig = go.Figure()
    
    # Strategy equity
    fig.add_trace(go.Scatter(
        x=combined_equity_series.index,
        y=combined_equity_series.values,
        mode='lines',
        name=f'Strategy (Sharpe={aggregate_sharpe:.2f})',
        line=dict(color='blue', width=2)
    ))
    
    # BTC benchmark
    fig.add_trace(go.Scatter(
        x=btc_equity_series.index,
        y=btc_equity_series.values,
        mode='lines',
        name=f'BTC Buy & Hold (Sharpe={btc_sharpe:.2f})',
        line=dict(color='orange', width=2, dash='dash')
    ))
    
    fig.update_layout(
        title=f'Strategy vs BTC Benchmark - Out-of-Sample Performance<br>' +
              f'Strategy: {aggregate_total_return*100:.1f}% return, {aggregate_sharpe:.2f} Sharpe, {max_drawdown*100:.1f}% MaxDD | ' +
              f'BTC: {btc_total_return*100:.1f}% return, {btc_sharpe:.2f} Sharpe, {btc_max_drawdown*100:.1f}% MaxDD',
        xaxis_title='Date',
        yaxis_title='Equity (Starting at 1.0)',
        hovermode='x unified',
        width=1200,
        height=600,
        legend=dict(x=0.01, y=0.99, bordercolor="Black", borderwidth=1)
    )
    
    fig.show()
    
    # Print comparison summary
    print("\n" + "="*80)
    print("STRATEGY vs BTC BENCHMARK COMPARISON")
    print("="*80)
    print(f"\nStrategy Performance:")
    print(f"  Total Return:  {aggregate_total_return*100:.2f}%")
    print(f"  CAGR:          {aggregate_cagr*100:.2f}%")
    print(f"  Sharpe Ratio:  {aggregate_sharpe:.3f}")
    print(f"  Sortino Ratio: {aggregate_sortino:.3f}")
    print(f"  Calmar Ratio:  {aggregate_calmar:.3f}")
    print(f"  Max Drawdown:  {max_drawdown*100:.2f}%")
    print(f"  Avg Turnover:  {aggregate_avg_turnover*100:.2f}%/day")
    
    print(f"\nBTC Buy & Hold Performance:")
    print(f"  Total Return:  {btc_total_return*100:.2f}%")
    print(f"  CAGR:          {btc_cagr*100:.2f}%")
    print(f"  Sharpe Ratio:  {btc_sharpe:.3f}")
    print(f"  Sortino Ratio: {btc_sortino:.3f}")
    print(f"  Calmar Ratio:  {btc_calmar:.3f}")
    print(f"  Max Drawdown:  {btc_max_drawdown*100:.2f}%")
    
    print(f"\nOutperformance:")
    print(f"  Return Difference:  {(aggregate_total_return - btc_total_return)*100:.2f}%")
    print(f"  CAGR Difference:    {(aggregate_cagr - btc_cagr)*100:.2f}%")
    print(f"  Sharpe Difference:  {(aggregate_sharpe - btc_sharpe):.3f}")
    print(f"  Sortino Difference: {(aggregate_sortino - btc_sortino):.3f}")
    print(f"  Calmar Difference:  {(aggregate_calmar - btc_calmar):.3f}")
    
    excess_return_ratio = aggregate_total_return / btc_total_return if btc_total_return != 0 else np.inf
    print(f"  Return Ratio (Strategy/BTC): {excess_return_ratio:.2f}x")
    print("="*80)
else:
    print("Warning: BTC data not available for OOS period")


STRATEGY vs BTC BENCHMARK COMPARISON

Strategy Performance:
  Total Return:  323.32%
  CAGR:          79.14%
  Sharpe Ratio:  1.518
  Sortino Ratio: 2.624
  Calmar Ratio:  1.766
  Max Drawdown:  -38.58%
  Avg Turnover:  49.87%/day

BTC Buy & Hold Performance:
  Total Return:  245.05%
  CAGR:          64.94%
  Sharpe Ratio:  1.310
  Sortino Ratio: 2.024
  Calmar Ratio:  2.163
  Max Drawdown:  -28.10%

Outperformance:
  Return Difference:  78.27%
  CAGR Difference:    14.20%
  Sharpe Difference:  0.208
  Sortino Difference: 0.600
  Calmar Difference:  -0.396
  Return Ratio (Strategy/BTC): 1.32x


In [18]:
# Visualization: Turnover Analysis

from plotly.subplots import make_subplots

fig = make_subplots(
    rows=2, cols=1,
    subplot_titles=('Daily Turnover Over Time', 'Turnover Distribution'),
    vertical_spacing=0.12,
    row_heights=[0.6, 0.4]
)

# Top panel: Time series of daily turnover
fig.add_trace(
    go.Scatter(
        x=combined_turnover_series.index,
        y=combined_turnover_series.values * 100,  # Convert to percentage
        mode='lines',
        name='Daily Turnover',
        line=dict(color='purple', width=1),
        showlegend=True
    ),
    row=1, col=1
)

# Add mean line
mean_turnover = combined_turnover_series.mean()
fig.add_hline(
    y=mean_turnover * 100,
    line_dash="dash",
    line_color="red",
    annotation_text=f"Mean: {mean_turnover*100:.2f}%",
    annotation_position="top right",
    row=1, col=1
)

# Add median line
median_turnover = combined_turnover_series.median()
fig.add_hline(
    y=median_turnover * 100,
    line_dash="dot",
    line_color="green",
    annotation_text=f"Median: {median_turnover*100:.2f}%",
    annotation_position="bottom right",
    row=1, col=1
)

# Bottom panel: Histogram of turnover distribution
fig.add_trace(
    go.Histogram(
        x=combined_turnover_series.values * 100,
        nbinsx=50,
        name='Turnover Distribution',
        marker_color='purple',
        opacity=0.7,
        showlegend=False
    ),
    row=2, col=1
)

fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_xaxes(title_text="Turnover (%)", row=2, col=1)
fig.update_yaxes(title_text="Turnover (%)", row=1, col=1)
fig.update_yaxes(title_text="Frequency", row=2, col=1)

fig.update_layout(
    title_text='Turnover Analysis: Time Series and Distribution',
    height=800,
    width=1200,
    showlegend=True
)

fig.show()

# Print turnover statistics
print("\n" + "="*80)
print("TURNOVER STATISTICS")
print("="*80)
print(f"Mean Daily Turnover:     {combined_turnover_series.mean()*100:.2f}%")
print(f"Median Daily Turnover:   {combined_turnover_series.median()*100:.2f}%")
print(f"Std Dev:                 {combined_turnover_series.std()*100:.2f}%")
print(f"Min:                     {combined_turnover_series.min()*100:.2f}%")
print(f"Max:                     {combined_turnover_series.max()*100:.2f}%")
print(f"25th Percentile:         {combined_turnover_series.quantile(0.25)*100:.2f}%")
print(f"75th Percentile:         {combined_turnover_series.quantile(0.75)*100:.2f}%")
print(f"\nDays with zero turnover: {(combined_turnover_series == 0).sum()} ({(combined_turnover_series == 0).sum() / len(combined_turnover_series) * 100:.1f}%)")
print(f"Days with >100% turnover: {(combined_turnover_series > 1.0).sum()} ({(combined_turnover_series > 1.0).sum() / len(combined_turnover_series) * 100:.1f}%)")
print(f"\nAnnualized Turnover (365 × mean): {combined_turnover_series.mean()*365*100:.2f}%")
print("="*80)


TURNOVER STATISTICS
Mean Daily Turnover:     49.87%
Median Daily Turnover:   55.32%
Std Dev:                 40.62%
Min:                     0.00%
Max:                     187.09%
25th Percentile:         0.00%
75th Percentile:         81.58%

Days with zero turnover: 265 (29.3%)
Days with >100% turnover: 96 (10.6%)

Annualized Turnover (365 × mean): 18203.53%


---

## Walk-Forward with Continuous State Implementation

**State Continuity Solution:**

Each test period now maintains continuous state from the beginning of the data:
1. **Find optimal parameters** on the training window (parameter optimization)
2. **Run full backtest** from data start to end of test period (continuous state)
3. **Extract test period returns only** for OOS evaluation

This ensures positions carry forward naturally between test windows, providing realistic OOS performance.

**Mode Comparison:**

To compare rolling vs expanding window results, simply change `wf_mode` in the configuration cell and rerun from there. 

**Expected differences:**
- **Rolling**: Better adapts to regime changes, but less training data in early periods
- **Expanding**: More stable with increasing data, but may be slower to adapt to recent market shifts

**Tip**: Run both modes and compare:
- Aggregate OOS Sharpe ratios
- Parameter stability (frequency of selection)
- Drawdown patterns
- IS-OOS Sharpe degradation

In [19]:
# Diagnostic: Compare fixed-parameter full-sample run vs. walk-forward OOS aggregation

param_simple = 12
param_vol = 35
param_id_window = 12
param_long_id_threshold = -0.1
param_short_id_threshold = -0.4

full_eq, full_ret, full_turnover = run_backtest(
    price,
    param_simple,
    param_vol,
    param_id_window,
    param_long_id_threshold,
    param_short_id_threshold,
    start_idx=0,
    end_idx=len(price)
)

if not combined_equity_series.empty and not full_eq.empty:
    eq_full_on_oos = full_eq.reindex(combined_equity_series.index).ffill().dropna()
    eq_full_norm = eq_full_on_oos / eq_full_on_oos.iloc[0]
    eq_oos_norm = combined_equity_series / combined_equity_series.iloc[0]

    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=eq_oos_norm.index,
        y=eq_oos_norm.values,
        mode='lines',
        name='Walk-Forward OOS Equity (normalized)',
        line=dict(color='blue', width=2)
    ))
    fig.add_trace(go.Scatter(
        x=eq_full_norm.index,
        y=eq_full_norm.values,
        mode='lines',
        name=f'Full-Sample Equity (simple={param_simple}, vol={param_vol}, id={param_id_window}, long≥{param_long_id_threshold}, short≥{param_short_id_threshold})',
        line=dict(color='orange', width=2, dash='dash')
    ))
    fig.update_layout(
        title='Shape Comparison: Walk-Forward OOS vs Full-Sample Single Run',
        xaxis_title='Date',
        yaxis_title='Normalized Equity (start=1 at first OOS date)',
        hovermode='x unified', width=1200, height=500
    )
    fig.show()

    full_ret_on_oos = full_ret.reindex(combined_returns_series.index).fillna(0)
    oos_ret_aligned = combined_returns_series.reindex(full_ret_on_oos.index).fillna(0)

    if (full_ret_on_oos.std(ddof=1) > 0) and (oos_ret_aligned.std(ddof=1) > 0):
        corr = full_ret_on_oos.corr(oos_ret_aligned)
    else:
        corr = float('nan')

    print('=== OOS vs Full-Run Return Comparison ===')
    print(f'Parameters: simple={param_simple}, vol={param_vol}, id={param_id_window}, long≥{param_long_id_threshold}, short≥{param_short_id_threshold}')
    print(f'Overlap days: {len(full_ret_on_oos)}')
    print(f'Correlation (OOS vs Full): {corr:.4f}')
    print(f'Mean diff (OOS - Full): {(oos_ret_aligned - full_ret_on_oos).mean():.6f}')
    print(f'Max abs diff: {(oos_ret_aligned - full_ret_on_oos).abs().max():.6f}')

    diff = (oos_ret_aligned - full_ret_on_oos).abs()
    mismatches = diff[diff > 1e-12].head(10)
    if len(mismatches) > 0:
        print('Sample mismatched dates (abs diff > 1e-12):')
        for dt, val in mismatches.items():
            print(f'  {dt.date()}  diff={val:.6e}  oos={oos_ret_aligned.loc[dt]:.6f}  full={full_ret_on_oos.loc[dt]:.6f}')
else:
    print('Cannot run comparison: one of the series is empty.')


=== OOS vs Full-Run Return Comparison ===
Parameters: simple=12, vol=35, id=12, long≥-0.1, short≥-0.4
Overlap days: 904
Correlation (OOS vs Full): 0.9039
Mean diff (OOS - Full): -0.000028
Max abs diff: 0.075635
Sample mismatched dates (abs diff > 1e-12):
  2023-05-29  diff=6.443430e-04  oos=-0.024843  full=-0.024199
  2023-05-30  diff=2.481489e-05  oos=0.007747  full=0.007772
  2023-05-31  diff=1.086212e-04  oos=-0.000348  full=-0.000457
  2023-06-01  diff=3.573849e-05  oos=-0.012472  full=-0.012436
  2023-06-02  diff=1.522286e-04  oos=-0.016283  full=-0.016131
  2023-06-03  diff=1.535119e-04  oos=-0.030986  full=-0.030833
  2023-06-04  diff=1.737940e-05  oos=-0.026548  full=-0.026565
  2023-06-05  diff=1.635166e-04  oos=0.011539  full=0.011376
  2023-06-06  diff=4.474320e-05  oos=-0.014484  full=-0.014529
  2023-06-07  diff=2.234222e-04  oos=-0.017213  full=-0.017436



### BTC Volatility vs Strategy Return Analysis
Rebuild the combined OOS return stream using the walk-forward winners, align it with BTC 20-day rolling volatility, drop flat-return days, and visualize the relationships at time $t$ and $t+1$.


In [20]:
import os
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

cache_dir = Path('.matplotlib_cache')
cache_dir.mkdir(exist_ok=True)
os.environ['MPLCONFIGDIR'] = str(cache_dir.resolve())

fig_dir = Path('figures')
fig_dir.mkdir(exist_ok=True)

wf_summary = pd.read_csv(
    f'walkforward_results_{score_mode}.csv',
    parse_dates=['train_start', 'train_end', 'test_start', 'test_end']
).sort_values('test_start')

combined_returns = []
combined_dates = []
combined_turnover = []
wf_oos_stats = []
all_dates = price.index

for row in wf_summary.itertuples(index=False):
    simple = int(row.best_simple_window)
    vol = int(row.best_vol_window)
    id_window = int(getattr(row, "best_id_window", simple))
    long_id = float(getattr(row, "best_long_id_threshold", 0.0))
    short_id = float(getattr(row, "best_short_id_threshold", 0.0))

    volume_pct = float(getattr(row, "best_volume_percentile", volume_percentile))
    momentum_pct = float(getattr(row, "best_momentum_percentile", momentum_percentile))

    test_start = pd.Timestamp(row.test_start)
    test_end = pd.Timestamp(row.test_end)

    if test_start not in all_dates or test_end not in all_dates:
        raise ValueError(f"Walk-forward test dates not in price index for iteration {row.iteration}")

    test_start_idx = all_dates.get_loc(test_start)
    test_end_idx = all_dates.get_loc(test_end) + 1  # exclusive

    full_eq, full_ret, full_turnover = run_backtest(
        price, simple, vol, id_window, long_id, short_id, start_idx=0, end_idx=test_end_idx,
        volume_pct=volume_pct, momentum_pct=momentum_pct,
        precomputed=precomputed_cache
    )

    test_dates = all_dates[test_start_idx:test_end_idx]
    oos_ret = full_ret.reindex(test_dates).dropna()
    oos_turn = full_turnover.reindex(test_dates).reindex(oos_ret.index).fillna(0)

    combined_returns.extend(oos_ret.values)
    combined_dates.extend(oos_ret.index)
    combined_turnover.extend(oos_turn.values)

    if oos_ret.empty:
        wf_oos_stats.append({
            'iteration': row.iteration,
            'test_start': test_start,
            'test_end': test_end,
            'simple_window': simple,
            'vol_window': vol,
            'oos_sharpe_recomputed': np.nan,
            'oos_sortino_recomputed': np.nan,
            'oos_calmar_recomputed': np.nan,
            'oos_composite_recomputed': np.nan,
            'oos_total_return_recomputed': np.nan,
        })
        continue

    oos_eq = (1 + oos_ret).cumprod()
    wf_oos_stats.append({
        'iteration': row.iteration,
        'test_start': test_start,
        'test_end': test_end,
        'simple_window': simple,
        'vol_window': vol,
        'oos_sharpe_recomputed': compute_sharpe(oos_ret),
        'oos_sortino_recomputed': compute_sortino_ratio(oos_ret),
        'oos_calmar_recomputed': compute_calmar_ratio(oos_ret, oos_eq),
        'oos_composite_recomputed': compute_composite_score(oos_ret, oos_eq),
        'oos_total_return_recomputed': oos_eq.iloc[-1] - 1,
    })

combined_returns_series = pd.Series(combined_returns, index=combined_dates, name='OOS_Returns')
combined_returns_series = combined_returns_series[~combined_returns_series.index.duplicated(keep='first')].sort_index()
combined_turnover_series = pd.Series(combined_turnover, index=combined_returns_series.index, name='OOS_Turnover')

btc_returns = price['BTCUSDT'].pct_change(fill_method=None)
btc_vol_20 = btc_returns.rolling(window=20, min_periods=20).std()

analysis_df = pd.DataFrame({
    'strategy_ret_t': combined_returns_series,
    'strategy_ret_t1': combined_returns_series.shift(-1),
    'btc_vol': btc_vol_20.reindex(combined_returns_series.index),
})
analysis_df = analysis_df.dropna(subset=['btc_vol'])

mask_t = analysis_df['strategy_ret_t'] != 0
mask_t1 = analysis_df['strategy_ret_t1'] != 0
same_day_df = analysis_df[mask_t].dropna(subset=['strategy_ret_t'])
next_day_df = analysis_df[mask_t1].dropna(subset=['strategy_ret_t1'])

corr_t = same_day_df[['btc_vol', 'strategy_ret_t']].corr().iloc[0, 1] if not same_day_df.empty else np.nan
corr_t1 = next_day_df[['btc_vol', 'strategy_ret_t1']].corr().iloc[0, 1] if not next_day_df.empty else np.nan

plt.figure(figsize=(8, 5))
plt.scatter(same_day_df['btc_vol'], same_day_df['strategy_ret_t'], s=15, alpha=0.5, color='royalblue')
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)
plt.xlabel('BTC 20d rolling vol (std of daily returns)')
plt.ylabel('Strategy return (t)')
plt.title(f'BTC Vol vs Strategy Return (t) | N={len(same_day_df)}, Corr={corr_t:.3f}')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig(fig_dir / 'btc_vol_vs_strategy_return_t.png', dpi=150)
plt.close()

plt.figure(figsize=(8, 5))
plt.scatter(next_day_df['btc_vol'], next_day_df['strategy_ret_t1'], s=15, alpha=0.5, color='darkorange')
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)
plt.xlabel('BTC 20d rolling vol (std of daily returns)')
plt.ylabel('Strategy return (t+1)')
plt.title(f'BTC Vol vs Strategy Return (t+1) | N={len(next_day_df)}, Corr={corr_t1:.3f}')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig(fig_dir / 'btc_vol_vs_strategy_return_t1.png', dpi=150)
plt.close()

analysis_path = fig_dir / 'btc_vol_vs_strategy_returns.csv'
analysis_df.to_csv(analysis_path)

stats_path = fig_dir / 'wf_oos_stats_recomputed.csv'
pd.DataFrame(wf_oos_stats).to_csv(stats_path, index=False)

print({
    'same_day_points': len(same_day_df),
    'same_day_corr': float(corr_t) if pd.notna(corr_t) else None,
    'next_day_points': len(next_day_df),
    'next_day_corr': float(corr_t1) if pd.notna(corr_t1) else None,
    'plots': [str(fig_dir / 'btc_vol_vs_strategy_return_t.png'), str(fig_dir / 'btc_vol_vs_strategy_return_t1.png')],
    'data_path': str(analysis_path),
    'stats_path': str(stats_path),
})

{'same_day_points': 639, 'same_day_corr': 0.01578582590574769, 'next_day_points': 638, 'next_day_corr': 0.0005416279206274178, 'plots': ['figures/btc_vol_vs_strategy_return_t.png', 'figures/btc_vol_vs_strategy_return_t1.png'], 'data_path': 'figures/btc_vol_vs_strategy_returns.csv', 'stats_path': 'figures/wf_oos_stats_recomputed.csv'}
