# QuantKubera Monolith v3 — Magnum Opus

## Institutional-Grade Momentum Transformer for Indian Derivatives

**Self-contained** | **Zero Look-Ahead Bias** | **Walk-Forward OOS Validation**
**World Monitor Intelligence** | **Cutting-Edge ML** | **RL Meta-Learning**

| Component | Reference | Features |
|-----------|-----------|----------|
| Temporal Fusion Transformer | Lim et al. (2021) | VSN, Interpretable MHA, GRN |
| EMAT Attention | Entropy 2025 | Temporal Decay + Trend + Volatility heads |
| AFML Event Pipeline | López de Prado (2018) | CUSUM, Triple Barrier, Meta-Labeling |
| Fractional Differentiation | Hosking (1981) | Memory-preserving stationarity |
| Ramanujan Sum Filter Bank | Planat (2002) | Integer-period cycle detection |
| NIG Changepoint Detection | Adams & MacKay (2007) | Regime shift scoring |
| Market Microstructure | Easley et al. (2012), Kyle (1985) | VPIN, Lambda, Amihud |
| Wavelet Decomposition | Daubechies (1992) | Multi-resolution trend/noise separation |
| Hidden Markov Model | Baum et al. (1970) | 3-state regime detection |
| Persistent Homology (TDA) | Edelsbrunner et al. (2000) | Topological crash detection |
| Transfer Entropy | Schreiber (2000) | Cross-asset causal information flow |
| Multifractal DFA | Kantelhardt et al. (2002) | Fractal spectrum width |
| KL Divergence Regime | Kullback & Leibler (1951) | Distribution shift detection |
| World Monitor Signals | koala73 (2025) | Macro regime, CII, anomaly detection |
| Thompson Sampling | Thompson (1933) | RL strategy selection |
| Sharpe+DD Loss | Multi-objective | Sharpe + drawdown penalty |

### Pipeline
```
Zerodha Kite API → 31 Base Features (10 groups) → Variable Selection Network
GDELT News → FinBERT → 9 Sentiment Features ↗
India VIX → 3 VIX Features ↗
Yahoo Finance → 5 Macro Regime Features ↗     (World Monitor)
Welford Anomaly → 4 Anomaly Features ↗         (World Monitor)
HMM Regime → 3 Regime Features ↗               (NEW)
Wavelet DWT → 4 Wavelet Features ↗             (NEW)
Info Theory → 4 Entropy Features ↗              (NEW)
Multifractal → 3 MF-DFA Features ↗             (NEW)
TDA → 2 Topological Features ↗                  (NEW)
→ EMAT (3-Head Financial Attention) → Momentum Signal
→ Walk-Forward OOS (purge gaps) → Meta-Labeling → Probit Bet Sizing
→ RL Thompson Sampling Meta-Learner → Regime-Aware Ensemble
→ VectorBTPro Tearsheet
```
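
The pipeline names "Probit Bet Sizing" without spelling out a formula in this section. A minimal sketch of one common formulation follows, assuming the binary-outcome rule from López de Prado (2018): a meta-label probability `p` is turned into a z-score and mapped through the normal CDF. The function name `probit_bet_size` and the `eps` clamp are illustrative choices, not taken from the notebook.

```python
from statistics import NormalDist


def probit_bet_size(p: float, eps: float = 1e-6) -> float:
    """Map a meta-label probability p in (0, 1) to a bet size in (-1, 1).

    Assumed rule: z = (p - 0.5) / sqrt(p * (1 - p)), size = 2 * Phi(z) - 1.
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp so the z-score stays finite
    z = (p - 0.5) / (p * (1.0 - p)) ** 0.5   # test statistic for H0: p = 0.5
    return 2.0 * NormalDist().cdf(z) - 1.0   # Phi is the standard normal CDF


for p in (0.5, 0.6, 0.8):
    print(f"p={p:.1f} -> size={probit_bet_size(p):+.3f}")
```

A probability of 0.5 carries no edge and maps to a size of zero; the size grows monotonically with `p`. In a meta-labeling setup the result would typically be multiplied by the side (+1 or -1) given by the primary momentum signal.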

### Feature Groups (31 per-ticker + 37 cross-asset & advanced = 68 total)
**Base (31):**

1. Normalized Returns (5)
2. MACD (3)
3. Volatility (4)
4. Changepoint Detection (2)
5. Fractional Calculus (3)
6. Ramanujan (4)
7. Microstructure (4)
8. Entropy (1)
9. Momentum Quality (3)
10. Volume (2)

**Cross-Asset (12):**

11. News Sentiment (9)
12. India VIX (3)

**World Monitor Intelligence (9):**

13. Macro Regime (5)
14. Welford Anomaly Detection (4)

**Cutting-Edge ML (16):**

15. HMM Regime (3)
16. Wavelet Decomposition (4)
17. Information Theory Advanced (4)
18. Multifractal DFA (3)
19. TDA Persistent Homology (2)
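
The group sizes above can be cross-checked with a short tally. This is only a bookkeeping sketch: the dictionary name `FEATURE_GROUPS` is hypothetical, and the counts are copied from the list above rather than read from the feature-engineering code.

```python
# Feature counts per group, copied from the list above (names are illustrative).
FEATURE_GROUPS = {
    "Base": {
        "Normalized Returns": 5, "MACD": 3, "Volatility": 4,
        "Changepoint Detection": 2, "Fractional Calculus": 3, "Ramanujan": 4,
        "Microstructure": 4, "Entropy": 1, "Momentum Quality": 3, "Volume": 2,
    },
    "Cross-Asset": {"News Sentiment": 9, "India VIX": 3},
    "World Monitor Intelligence": {"Macro Regime": 5, "Welford Anomaly Detection": 4},
    "Cutting-Edge ML": {
        "HMM Regime": 3, "Wavelet Decomposition": 4,
        "Information Theory Advanced": 4, "Multifractal DFA": 3,
        "TDA Persistent Homology": 2,
    },
}

totals = {block: sum(groups.values()) for block, groups in FEATURE_GROUPS.items()}
grand_total = sum(totals.values())

print(totals)       # per-block totals: 31, 12, 9, 16
print(grand_total)  # 68
```

The 37 "cross-asset & advanced" features in the heading are the sum of the last three blocks (12 + 9 + 16).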

In [1]:
# ============================================================================
# CELL 1: Setup & Configuration
# ============================================================================

import os
import sys
import warnings
import gc
import time
import math
import logging
import json
import hashlib

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.special import gammaln, gamma as gamma_fn
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Dict, Optional, Tuple, Any
from dataclasses import dataclass, field

from kiteconnect import KiteConnect
from dotenv import load_dotenv
import pyotp
import requests
from tqdm.auto import tqdm
# --- v3 additional imports ---
try:
    import yfinance as yf
    HAS_YFINANCE = True
except ImportError:
    HAS_YFINANCE = False

try:
    from fredapi import Fred
    HAS_FREDAPI = True
except ImportError:
    HAS_FREDAPI = False

try:
    from hmmlearn.hmm import GaussianHMM
    HAS_HMM = True
except ImportError:
    HAS_HMM = False

try:
    import pywt
    HAS_WAVELET = True
except ImportError:
    HAS_WAVELET = False

try:
    from gtda.homology import VietorisRipsPersistence
    from gtda.diagrams import PersistenceEntropy, Amplitude
    HAS_TDA = True
except ImportError:
    HAS_TDA = False

from scipy.stats import entropy as sp_entropy
from sklearn.neighbors import KDTree
from scipy.special import digamma


# ---------------------------------------------------------------------------
# Suppress warnings
# ---------------------------------------------------------------------------
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
# Suppress Keras masking warnings — we use fixed-length sequences (window_size=21),
# no padding, no variable-length inputs. Masking is irrelevant.
warnings.filterwarnings('ignore', message='.*does not support masking.*')
tf.get_logger().setLevel('ERROR')

logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s')
logger = logging.getLogger('QuantKubera')

# ---------------------------------------------------------------------------
# GPU detection & memory growth
# ---------------------------------------------------------------------------
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as e:
            logger.warning(f"GPU memory growth setting failed: {e}")
    GPU_AVAILABLE = True
    GPU_NAME = tf.test.gpu_device_name() or gpus[0].name
else:
    GPU_AVAILABLE = False
    GPU_NAME = "None"

# ---------------------------------------------------------------------------
# Seed everything
# ---------------------------------------------------------------------------
SEED = 42
# set_random_seed seeds Python's `random`, NumPy, and TensorFlow in one call.
keras.utils.set_random_seed(SEED)

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
@dataclass
class MonolithConfig:
    # Data
    tickers: list = field(default_factory=lambda: [
        # NSE Index Futures
        'NIFTY', 'BANKNIFTY', 'FINNIFTY',
        # MCX Commodity Futures
        'GOLD', 'SILVER', 'CRUDEOIL', 'NATURALGAS', 'COPPER',
        # NSE Stock Futures (top FnO by volume)
        'RELIANCE', 'HDFCBANK', 'ICICIBANK', 'INFY', 'TCS', 'SBIN',
    ])
    exchanges: dict = field(default_factory=lambda: {
        'NIFTY': 'NSE', 'BANKNIFTY': 'NSE', 'FINNIFTY': 'NSE',
        'GOLD': 'MCX', 'SILVER': 'MCX', 'CRUDEOIL': 'MCX',
        'NATURALGAS': 'MCX', 'COPPER': 'MCX',
        'RELIANCE': 'NFO', 'HDFCBANK': 'NFO', 'ICICIBANK': 'NFO',
        'INFY': 'NFO', 'TCS': 'NFO', 'SBIN': 'NFO',
    })
    lookback_days: int = 2500
    window_size: int = 21
    # Model
    hidden_size: int = 128
    num_heads: int = 4
    dropout_rate: float = 0.2
    # Training
    batch_size: int = 32
    epochs: int = 50
    learning_rate: float = 1e-3
    early_stop_patience: int = 10
    lr_reduce_patience: int = 5
    lr_reduce_factor: float = 0.5
    min_lr: float = 1e-6
    clipnorm: float = 1.0
    # Walk-forward
    min_train_days: int = 504   # 2 years minimum
    test_days: int = 63         # 3 months
    purge_gap: int = 5
    # Costs
    bps_cost: float = 0.0010   # 10 bps per side
    # Quick mode
    quick_mode: bool = False    # True: 1 ticker, fewer epochs; False: full universe

CFG = MonolithConfig()

# ---------------------------------------------------------------------------
# Version tag — verify after Kernel Restart that you see this version
# ---------------------------------------------------------------------------
NOTEBOOK_VERSION = "v3.0-2026-02-14"

# ---------------------------------------------------------------------------
# Summary
# ---------------------------------------------------------------------------
print("=" * 70)
print("QuantKubera Monolith v3 — Magnum Opus")
print("=" * 70)
print(f"  GPU Available : {GPU_AVAILABLE} ({GPU_NAME})")
print(f"  TensorFlow    : {tf.__version__}")
print(f"  NumPy         : {np.__version__}")
print(f"  Pandas        : {pd.__version__}")
print(f"  Seed          : {SEED}")
print(f"  Quick Mode    : {CFG.quick_mode}")
print(f"  Tickers       : {CFG.tickers}")
print(f"  Hidden Size   : {CFG.hidden_size}")
print(f"  Num Heads     : {CFG.num_heads}")
print(f"  Epochs        : {CFG.epochs}")
print(f"  Batch Size    : {CFG.batch_size}")
print(f"  Walk-forward  : train>={CFG.min_train_days}d, test={CFG.test_days}d, purge={CFG.purge_gap}d")
print(f"  Cost Model    : {CFG.bps_cost * 10000:.0f} bps per side")
print("=" * 70)

2026-02-14 16:36:14.821409: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-02-14 16:36:14.888459: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


2026-02-14 16:36:16.282898: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


QuantKubera Monolith v3 — Magnum Opus
  GPU Available : True (/device:GPU:0)
  TensorFlow    : 2.20.0
  NumPy         : 1.26.4
  Pandas        : 3.0.0
  Seed          : 42
  Quick Mode    : False
  Tickers       : ['NIFTY', 'BANKNIFTY', 'FINNIFTY', 'GOLD', 'SILVER', 'CRUDEOIL', 'NATURALGAS', 'COPPER', 'RELIANCE', 'HDFCBANK', 'ICICIBANK', 'INFY', 'TCS', 'SBIN']
  Hidden Size   : 128
  Num Heads     : 4
  Epochs        : 50
  Batch Size    : 32
  Walk-forward  : train>=504d, test=63d, purge=5d
  Cost Model    : 10 bps per side


I0000 00:00:1771086978.747460 3754101 gpu_device.cc:2020] Created device /device:GPU:0 with 13775 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5
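
The walk-forward settings printed above (`train>=504d, test=63d, purge=5d`) can be illustrated with a small index generator. This is a sketch under stated assumptions: it uses an expanding training window and places the purge gap between the end of training and the start of each test block. The notebook's actual splitter is not shown in this section, and the name `walk_forward_splits` is hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_bars: int, min_train: int = 504, test: int = 63,
                        purge: int = 5) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) ranges for an expanding-window walk-forward.

    The last `purge` bars before each test block are excluded from training,
    so labels that overlap the test period cannot leak into the fit.
    """
    test_start = min_train + purge
    while test_start + test <= n_bars:
        train_idx = range(0, test_start - purge)        # expanding window
        test_idx = range(test_start, test_start + test)  # next out-of-sample block
        yield train_idx, test_idx
        test_start += test                               # roll forward one block


folds = list(walk_forward_splits(n_bars=1000))
for k, (tr, te) in enumerate(folds):
    print(f"fold {k}: train [0, {tr.stop}) | purge {te.start - tr.stop} | "
          f"test [{te.start}, {te.stop})")
```

With 1,000 bars this yields 7 folds; the first trains on bars 0 to 503 and tests on bars 509 to 571. Bars left over after the last full test block are not used.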


In [2]:
# ============================================================================
# CELL 2: Data Engine — KiteAuth + KiteFetcher
# ============================================================================

load_dotenv()


class KiteAuth:
    """TOTP-based Zerodha authentication with token caching."""

    TOKEN_PATH = Path.home() / '.zerodha_access_token'
    LOGIN_URL = 'https://kite.zerodha.com/api/login'
    TWOFA_URL = 'https://kite.zerodha.com/api/twofa'
    CONNECT_URL = 'https://kite.zerodha.com/connect/login'

    def __init__(self):
        self.api_key = os.getenv('ZERODHA_API_KEY', '')
        self.api_secret = os.getenv('ZERODHA_API_SECRET', '')
        self.totp_secret = os.getenv('ZERODHA_TOTP_SECRET', '')
        self.user_id = os.getenv('ZERODHA_USER_ID', '')
        self.password = os.getenv('ZERODHA_PASSWORD', '')

    def _load_cached_token(self) -> Optional[str]:
        """Load cached access token if it exists and was created today."""
        if not self.TOKEN_PATH.exists():
            return None
        stat = self.TOKEN_PATH.stat()
        modified = datetime.fromtimestamp(stat.st_mtime).date()
        if modified != datetime.now().date():
            return None
        try:
            token_data = json.loads(self.TOKEN_PATH.read_text())
            return token_data.get('access_token')
        except (json.JSONDecodeError, KeyError):
            return None

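    def _save_token(self, access_token: str):
        """Cache the access token to disk, readable by the current user only."""
        self.TOKEN_PATH.write_text(json.dumps({
            'access_token': access_token,
            'timestamp': datetime.now().isoformat(),
            'user_id': self.user_id,
        }))
        # The token grants account access, so restrict the file permissions.
        self.TOKEN_PATH.chmod(0o600)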

    def _auto_login(self) -> str:
        """Perform full auto-login flow and return access token."""
        session = requests.Session()

        # Step 1: POST /api/login
        logger.info("KiteAuth: Step 1 — login")
        resp = session.post(self.LOGIN_URL, data={
            'user_id': self.user_id,
            'password': self.password,
        })
        resp.raise_for_status()
        login_data = resp.json()
        if login_data.get('status') != 'success':
            raise RuntimeError(f"Login failed: {login_data}")
        request_id = login_data['data']['request_id']

        # Step 2: POST /api/twofa with TOTP
        logger.info("KiteAuth: Step 2 — TOTP 2FA")
        totp = pyotp.TOTP(self.totp_secret)
        twofa_value = totp.now()
        resp = session.post(self.TWOFA_URL, data={
            'user_id': self.user_id,
            'request_id': request_id,
            'twofa_value': twofa_value,
            'twofa_type': 'totp',
        })
        resp.raise_for_status()
        twofa_data = resp.json()
        if twofa_data.get('status') != 'success':
            raise RuntimeError(f"2FA failed: {twofa_data}")

        # Step 3: GET /connect/login to extract request_token
        # Follow the full redirect chain — request_token appears in the
        # final redirect to the registered callback URL.
        logger.info("KiteAuth: Step 3 — extract request_token")
        from urllib.parse import urlparse, parse_qs

        resp = session.get(self.CONNECT_URL, params={
            'api_key': self.api_key,
            'v': '3',
        }, allow_redirects=True)

        # Search for request_token in the final URL and all redirect history
        candidate_urls = [resp.url] + [r.headers.get('Location', '') for r in resp.history]
        request_token = None
        for url in candidate_urls:
            parsed = urlparse(url)
            params = parse_qs(parsed.query)
            if 'request_token' in params:
                request_token = params['request_token'][0]
                break

        if request_token is None:
            raise RuntimeError(
                f"Could not extract request_token. "
                f"Final URL: {resp.url}, "
                f"History: {[r.url for r in resp.history]}"
            )

        # Step 4: Generate session
        logger.info("KiteAuth: Step 4 — generate session")
        kite = KiteConnect(api_key=self.api_key)
        data = kite.generate_session(request_token, api_secret=self.api_secret)
        access_token = data['access_token']

        self._save_token(access_token)
        logger.info("KiteAuth: login complete, token cached")
        return access_token

    def get_session(self) -> KiteConnect:
        """Return an authenticated KiteConnect instance."""
        # Try cached token first
        cached = self._load_cached_token()
        if cached:
            kite = KiteConnect(api_key=self.api_key)
            kite.set_access_token(cached)
            try:
                kite.profile()
                logger.info("KiteAuth: using cached token")
                return kite
            except Exception:
                logger.info("KiteAuth: cached token expired, re-logging in")

        # Full login
        access_token = self._auto_login()
        kite = KiteConnect(api_key=self.api_key)
        kite.set_access_token(access_token)
        return kite


class KiteFetcher:
    """Fetches daily OHLCV data from Zerodha Kite."""

    MAX_CHUNK_DAYS = 1900  # API limit is 2000, use 1900 for safety

    def __init__(self, kite: KiteConnect):
        self.kite = kite
        self._instruments_cache: Optional[pd.DataFrame] = None

    def _get_instruments(self, exchange: str) -> pd.DataFrame:
        """Fetch and cache instruments list for the given exchange."""
        if self._instruments_cache is None:
            all_instruments = self.kite.instruments()
            self._instruments_cache = pd.DataFrame(all_instruments)
        return self._instruments_cache[
            self._instruments_cache['exchange'] == exchange
        ].copy()

    def _resolve_instrument(self, symbol: str, exchange: str) -> Tuple[str, str]:
        """
        Resolve symbol to instrument_token.
        Try exact match first, then fuzzy match for derivatives (FUT, nearest expiry).
        Returns (instrument_token, resolved_tradingsymbol).
        """
        instruments = self._get_instruments(exchange)

        # Exact match on tradingsymbol
        exact = instruments[instruments['tradingsymbol'] == symbol]
        if len(exact) > 0:
            row = exact.iloc[0]
            return str(row['instrument_token']), row['tradingsymbol']

        # Try NFO/MCX-FUT: look for symbol + FUT with nearest expiry
        # For index derivatives, check NFO exchange
        fut_exchange = exchange
        if exchange == 'NSE':
            fut_exchange = 'NFO'
            instruments = self._get_instruments(fut_exchange)
        elif exchange == 'MCX':
            # MCX futures are on MCX itself
            pass

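        # Prefer an exact match on the underlying `name` so that, e.g., 'GOLD'
        # does not also pick up GOLDM contracts. This assumes the instruments
        # dump exposes a `name` column; otherwise fall back to a tradingsymbol
        # prefix match. Both paths require instrument_type == 'FUT'.
        is_fut = instruments['instrument_type'] == 'FUT'
        fuzzy = instruments.iloc[0:0]
        if 'name' in instruments.columns:
            fuzzy = instruments[(instruments['name'] == symbol) & is_fut].copy()
        if len(fuzzy) == 0:
            fuzzy = instruments[
                instruments['tradingsymbol'].str.startswith(symbol) & is_fut
            ].copy()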

        if len(fuzzy) == 0:
            # Try with exchange-specific naming
            fuzzy = instruments[
                instruments['tradingsymbol'].str.contains(symbol, case=False, na=False) &
                (instruments['instrument_type'] == 'FUT')
            ].copy()

        if len(fuzzy) == 0:
            raise ValueError(
                f"Could not resolve instrument: {symbol} on {exchange}/{fut_exchange}. "
                f"Check symbol name and exchange."
            )

        # Pick the nearest expiry
        if 'expiry' in fuzzy.columns:
            fuzzy['expiry'] = pd.to_datetime(fuzzy['expiry'], errors='coerce')
            fuzzy = fuzzy.dropna(subset=['expiry'])
            fuzzy = fuzzy.sort_values('expiry')
            # Pick the nearest future expiry (>= today)
            today = pd.Timestamp.now().normalize()
            future_expiries = fuzzy[fuzzy['expiry'] >= today]
            if len(future_expiries) > 0:
                row = future_expiries.iloc[0]
            else:
                row = fuzzy.iloc[-1]  # Fallback to latest
        else:
            row = fuzzy.iloc[0]

        resolved = row['tradingsymbol']
        token = str(row['instrument_token'])
        print(f"  Resolved {symbol} -> {resolved} (expiry: {row.get('expiry', 'N/A')})")
        return token, resolved

    def fetch_daily(self, symbol: str, exchange: str, days: int = 2500) -> pd.DataFrame:
        """
        Fetch daily OHLCV data for the given symbol.

        Parameters
        ----------
        symbol : str
            Trading symbol (e.g., 'NIFTY', 'BANKNIFTY', 'SILVER')
        exchange : str
            Exchange (e.g., 'NSE', 'MCX')
        days : int
            Number of calendar days to look back

        Returns
        -------
        pd.DataFrame
            DataFrame with index=date (tz-naive), columns: open, high, low, close, volume
        """
        instrument_token, resolved_name = self._resolve_instrument(symbol, exchange)

        end_date = datetime.now().date()
        start_date = end_date - timedelta(days=days)

        print(f"  Fetching {resolved_name}: {start_date} to {end_date} ({days} calendar days)")

        all_records = []
        chunk_start = start_date

        # `<=` so a final one-day window (chunk_start == end_date) is not dropped.
        while chunk_start <= end_date:
            chunk_end = min(chunk_start + timedelta(days=self.MAX_CHUNK_DAYS), end_date)

            try:
                records = self.kite.historical_data(
                    instrument_token=int(instrument_token),
                    from_date=chunk_start,
                    to_date=chunk_end,
                    interval='day',
                    continuous=True,
                )
                all_records.extend(records)
            except Exception as e:
                logger.warning(f"  Chunk {chunk_start}-{chunk_end} failed: {e}")

            chunk_start = chunk_end + timedelta(days=1)

        if not all_records:
            raise ValueError(f"No data returned for {symbol} ({resolved_name})")

        df = pd.DataFrame(all_records)
        df['date'] = pd.to_datetime(df['date']).dt.tz_localize(None)
        df = df.set_index('date').sort_index()

        # Standardize column names to lowercase
        df.columns = [c.lower() for c in df.columns]

        # Keep only OHLCV
        keep_cols = ['open', 'high', 'low', 'close', 'volume']
        for col in keep_cols:
            if col not in df.columns:
                df[col] = np.nan

        df = df[keep_cols]
        df = df[~df.index.duplicated(keep='last')]

        print(f"  {resolved_name}: {len(df)} bars, {df.index[0].date()} to {df.index[-1].date()}")
        return df
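
The chunking in `fetch_daily` can be checked offline, without a Kite session. The sketch below tiles a date range into inclusive windows of at most `MAX_CHUNK_DAYS` calendar days; `date_chunks` is a hypothetical helper, not part of `KiteFetcher`, and it uses `<=` in the loop condition so that a one-day final window is still emitted.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_chunks(start: date, end: date,
                max_days: int = 1900) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) windows that tile [start, end]."""
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days), end)
        yield chunk_start, chunk_end
        # Bounds are inclusive dates, so the next window starts one day later.
        chunk_start = chunk_end + timedelta(days=1)


end = date(2026, 2, 14)
start = end - timedelta(days=2500)   # same lookback as CFG.lookback_days
chunks = list(date_chunks(start, end))
for a, b in chunks:
    print(a, "->", b, f"({(b - a).days} days)")
```

A 2,500-day lookback splits into two requests: one 1,900-day window followed by a shorter one that ends on the final date.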

In [3]:
# ============================================================================
# CELL 2b: Cross-Asset Data — News Sentiment + India VIX
# ============================================================================
#
# Cross-asset features: COMMON across all tickers (one value per day).
# Captures market-wide regime through:
#   1. FinBERT sentiment of GDELT financial news headlines (9 features)
#   2. India VIX fear gauge (3 features)
#
# Architecture:
#   - Headlines fetched from GDELT DOC 2.0 API (free, no API key)
#   - Scored with ProsusAI/finbert on GPU (~5ms/headline)
#   - All data cached to disk for instant re-runs
#   - T+1 causality: headlines from day D -> features for day D+1
#   - India VIX fetched via Kite API (already authenticated)
# ============================================================================

# ---------------------------------------------------------------------------
# FinBERT availability check
# ---------------------------------------------------------------------------
try:
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    HAS_FINBERT = True
except ImportError:
    HAS_FINBERT = False
    logger.warning("torch/transformers not installed — sentiment features disabled")

# ---------------------------------------------------------------------------
# Cache directories
# ---------------------------------------------------------------------------
_QK_CACHE_DIR = Path.home() / '.quantkubera'
_HEADLINE_CACHE_DIR = _QK_CACHE_DIR / 'headlines'
_SCORE_CACHE_DIR = _QK_CACHE_DIR / 'sentiment_scores'

# ---------------------------------------------------------------------------
# GDELT DOC 2.0 API
# ---------------------------------------------------------------------------
_GDELT_ENDPOINT = "https://api.gdeltproject.org/api/v2/doc/doc"
_GDELT_DELAY = 2.0  # seconds between requests

# Focused queries: India markets + commodities + global macro + crypto.
# GDELT DOC 2.0 expects OR'd terms to be wrapped in parentheses.
_GDELT_QUERIES = [
    # India markets — covers NIFTY, BANKNIFTY, Sensex, broad market
    '("NIFTY" OR "Sensex" OR "BSE" OR "NSE" OR "Indian stock market") '
    'sourceCountry:IN sourcelang:eng',
    # India FnO stocks
    '("Reliance" OR "TCS" OR "HDFC" OR "Infosys" OR "ICICI" OR "SBI") '
    'sourceCountry:IN sourcelang:eng',
    # Commodities — gold, oil, metals (relevant for MCX tickers)
    '("gold price" OR "crude oil" OR "silver price" OR "copper" OR "natural gas") '
    'sourcelang:eng',
    # Global macro — Fed, rates, trade (affects all markets)
    '("Federal Reserve" OR "interest rate" OR "global markets" OR "trade war" '
    'OR "tariffs") sourcelang:eng',
    # Crypto
    '("bitcoin" OR "ethereum" OR "cryptocurrency") sourcelang:eng',
]


def _gdelt_fetch_articles(query: str, start_dt: datetime,
                           end_dt: datetime) -> list:
    """Fetch articles from GDELT DOC 2.0 API (single request, max 250)."""
    params = {
        'query': query,
        'mode': 'ArtList',
        'format': 'json',
        'maxrecords': 250,
        'startdatetime': start_dt.strftime('%Y%m%d%H%M%S'),
        'enddatetime': end_dt.strftime('%Y%m%d%H%M%S'),
        'sort': 'DateDesc',
    }
    for attempt in range(3):
        try:
            resp = requests.get(_GDELT_ENDPOINT, params=params, timeout=30,
                                headers={'User-Agent': 'QuantKubera/2.1'})
            if resp.status_code == 429:
                wait = 5 * (attempt + 1)
                time.sleep(wait)
                continue
            resp.raise_for_status()
            data = resp.json()
            return data.get('articles', [])
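        except requests.exceptions.JSONDecodeError:
            # A non-JSON body usually means GDELT rejected the query; log it
            # instead of silently returning an empty result.
            logger.warning(f"GDELT returned non-JSON response: {resp.text[:200]!r}")
            return []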
        except Exception as e:
            logger.warning(f"GDELT fetch failed (attempt {attempt+1}/3): {e}")
            if attempt < 2:
                time.sleep(3)
    return []


def _parse_gdelt_date(seendate: str) -> Optional[datetime]:
    """Parse GDELT seendate like '20260210T083000Z'."""
    try:
        clean = seendate.replace('T', '').replace('Z', '')
        return datetime.strptime(clean, '%Y%m%d%H%M%S')
    except (ValueError, AttributeError):
        return None


def fetch_gdelt_headlines(start_date: str, end_date: str,
                          chunk_days: int = 14) -> Dict[str, list]:
    """Fetch financial news headlines from GDELT with disk caching.

    Args:
        start_date, end_date: 'YYYY-MM-DD'
        chunk_days: calendar days per API request

    Returns: {date_str: [{'title': str, 'source': str}]}
    """
    _HEADLINE_CACHE_DIR.mkdir(parents=True, exist_ok=True)

    # Check disk cache
    cache_file = _HEADLINE_CACHE_DIR / f"gdelt_{start_date}_{end_date}.json"
    if cache_file.exists():
        try:
            with open(cache_file) as f:
                cached = json.load(f)
            n_total = sum(len(v) for v in cached.values())
            print(f"  GDELT cache hit: {n_total} headlines across "
                  f"{len(cached)} days")
            return cached
        except (json.JSONDecodeError, KeyError):
            pass

    start_dt = datetime.strptime(start_date, '%Y-%m-%d')
    end_dt = datetime.strptime(end_date, '%Y-%m-%d').replace(
        hour=23, minute=59, second=59)

    seen_titles = set()
    daily_headlines: Dict[str, list] = {}
    total = 0

    n_queries = len(_GDELT_QUERIES)
    n_chunks_per_query = max(1, (end_dt - start_dt).days // chunk_days + 1)
    total_requests = n_queries * n_chunks_per_query

    pbar = tqdm(total=total_requests, desc="GDELT headlines",
                unit="req", leave=False)

    for q_idx, query in enumerate(_GDELT_QUERIES):
        chunk_start = start_dt
        while chunk_start < end_dt:
            chunk_end = min(
                chunk_start + timedelta(days=chunk_days), end_dt)

            articles = _gdelt_fetch_articles(query, chunk_start, chunk_end)

            for art in articles:
                title = art.get('title', '').strip()
                if not title or len(title) < 15:
                    continue

                # Named norm_title to avoid shadowing scipy.stats.norm from Cell 1.
                norm_title = title.lower().strip()
                if norm_title in seen_titles:
                    continue
                seen_titles.add(norm_title)

                dt = _parse_gdelt_date(art.get('seendate', ''))
                if dt is None:
                    continue

                date_str = dt.strftime('%Y-%m-%d')
                if date_str not in daily_headlines:
                    daily_headlines[date_str] = []

                daily_headlines[date_str].append({
                    'title': title,
                    'source': art.get('domain', 'unknown'),
                })
                total += 1

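            # Bounds are datetimes, so advance by one second rather than one
            # day; a one-day step would skip 24 hours of articles per chunk.
            chunk_start = chunk_end + timedelta(seconds=1)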
            time.sleep(_GDELT_DELAY)
            pbar.update(1)
            pbar.set_postfix_str(f"{total} headlines")

    pbar.close()

    # Save cache
    with open(cache_file, 'w') as f:
        json.dump(daily_headlines, f)

    print(f"  GDELT: {total} unique headlines across {len(daily_headlines)} days")
    return daily_headlines


# ---------------------------------------------------------------------------
# FinBERT Sentiment Scorer with disk caching
# ---------------------------------------------------------------------------
_finbert_model = None
_finbert_tokenizer = None
_finbert_device = None


def _load_finbert():
    """Load ProsusAI/finbert model (singleton, GPU if available)."""
    global _finbert_model, _finbert_tokenizer, _finbert_device
    if _finbert_model is not None:
        return

    if not HAS_FINBERT:
        raise RuntimeError("torch/transformers not installed")

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"  Loading ProsusAI/finbert on {device}...")

    _finbert_tokenizer = AutoTokenizer.from_pretrained('ProsusAI/finbert')
    _finbert_model = AutoModelForSequenceClassification.from_pretrained(
        'ProsusAI/finbert')
    _finbert_model.to(device)
    _finbert_model.eval()
    _finbert_device = device
    print(f"  FinBERT loaded on {device}")


def _finbert_score_batch(texts: list, batch_size: int = 64) -> list:
    """Score texts with FinBERT. Returns [(score, confidence, label), ...]

    score: float in [-1, +1] (positive - negative probability)
    confidence: float in [0, 1] (max class probability)
    label: str "positive" | "negative" | "neutral"
    """
    _load_finbert()
    label_map = {0: 'positive', 1: 'negative', 2: 'neutral'}
    results = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = _finbert_tokenizer(
            batch, padding=True, truncation=True,
            max_length=128, return_tensors='pt'
        ).to(_finbert_device)

        with torch.no_grad():
            outputs = _finbert_model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)

        for j in range(len(batch)):
            p = probs[j]
            pos, neg = p[0].item(), p[1].item()
            score = pos - neg
            confidence = p.max().item()
            label = label_map[p.argmax().item()]
            results.append((score, confidence, label))

    return results


def _title_hash(title: str) -> str:
    """Deterministic hash for cache key."""
    return hashlib.sha256(title.strip().lower().encode()).hexdigest()[:16]


def score_headlines_with_cache(
    daily_headlines: Dict[str, list],
) -> Dict[str, list]:
    """Score all headlines with FinBERT, using disk cache.

    Args:
        daily_headlines: {date_str: [{'title': str, 'source': str}]}

    Returns: {date_str: [(score, confidence, label), ...]}
    """
    if not HAS_FINBERT:
        return {}

    _SCORE_CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = _SCORE_CACHE_DIR / 'finbert_scores.json'

    # Load existing cache {title_hash: [score, confidence, label]}
    score_cache: Dict[str, list] = {}
    if cache_file.exists():
        try:
            with open(cache_file) as f:
                score_cache = json.load(f)
        except (json.JSONDecodeError, KeyError):
            pass

    # Collect uncached titles
    uncached = []  # (date_str, idx, title, hash)
    for date_str, headlines in daily_headlines.items():
        for idx, h in enumerate(headlines):
            th = _title_hash(h['title'])
            if th not in score_cache:
                uncached.append((date_str, idx, h['title'], th))

    if uncached:
        print(f"  Scoring {len(uncached)} uncached headlines with FinBERT...")
        titles = [u[2] for u in uncached]
        scores = _finbert_score_batch(titles)

        for (date_str, idx, title, th), (score, conf, label) in zip(
                uncached, scores):
            score_cache[th] = [score, conf, label]

        # Persist cache
        with open(cache_file, 'w') as f:
            json.dump(score_cache, f)
        print(f"  Scored {len(uncached)} headlines, cache now "
              f"{len(score_cache)} entries")
    else:
        print(f"  All {sum(len(v) for v in daily_headlines.values())} "
              f"headlines already cached")

    # Build result
    result: Dict[str, list] = {}
    for date_str, headlines in daily_headlines.items():
        day_scores = []
        for h in headlines:
            th = _title_hash(h['title'])
            if th in score_cache:
                day_scores.append(tuple(score_cache[th]))
            else:
                day_scores.append((0.0, 0.5, 'neutral'))
        result[date_str] = day_scores

    return result


# ---------------------------------------------------------------------------
# Sentiment Feature Builder (9 features, T+1 lag)
# ---------------------------------------------------------------------------
SENTIMENT_COLUMNS = [
    'ns_sent_mean', 'ns_sent_std', 'ns_pos_ratio', 'ns_neg_ratio',
    'ns_confidence_mean', 'ns_news_count',
    'ns_sent_5d_ma', 'ns_news_count_5d', 'ns_sent_momentum',
]


def build_sentiment_features(
    daily_scores: Dict[str, list],
    date_index: pd.DatetimeIndex,
) -> pd.DataFrame:
    """Build 9 daily sentiment features with T+1 causal lag.

    Headlines from day D -> features for day D+1 (zero look-ahead).

    Args:
        daily_scores: {date_str: [(score, confidence, label), ...]}
        date_index: DatetimeIndex to align features to

    Returns: DataFrame with SENTIMENT_COLUMNS, indexed by date
    """
    if not daily_scores:
        return pd.DataFrame(columns=SENTIMENT_COLUMNS,
                            index=date_index, dtype=float)

    rows = []
    for date_str, scores in sorted(daily_scores.items()):
        if not scores:
            continue
        scores_arr = np.array([s[0] for s in scores])
        confs_arr = np.array([s[1] for s in scores])
        labels = [s[2] for s in scores]
        n = len(scores_arr)
        n_pos = sum(1 for l in labels if l == 'positive')
        n_neg = sum(1 for l in labels if l == 'negative')

        rows.append({
            'date': pd.Timestamp(date_str),
            'ns_sent_mean': float(np.mean(scores_arr)),
            'ns_sent_std': float(np.std(scores_arr, ddof=1)) if n > 1 else 0.0,
            'ns_pos_ratio': n_pos / n if n > 0 else 0.0,
            'ns_neg_ratio': n_neg / n if n > 0 else 0.0,
            'ns_confidence_mean': float(np.mean(confs_arr)),
            'ns_news_count': float(n),
        })

    if not rows:
        return pd.DataFrame(columns=SENTIMENT_COLUMNS,
                            index=date_index, dtype=float)

    df = pd.DataFrame(rows).set_index('date').sort_index()

    # Rolling features (causal — use only past data)
    df['ns_sent_5d_ma'] = df['ns_sent_mean'].rolling(5, min_periods=1).mean()
    df['ns_news_count_5d'] = df['ns_news_count'].rolling(5, min_periods=1).sum()
    df['ns_sent_momentum'] = df['ns_sent_mean'] - df['ns_sent_5d_ma']

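    # T+1 lag: headlines from day D become features for the next row.
    # shift(1) is row-based, so this is exactly D+1 only when every
    # calendar day has headlines; on a sparse index the lag is longer.
    df = df.shift(1)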

    # Align to target date index, forward-fill short gaps (weekends)
    df = df.reindex(date_index)
    df = df.ffill(limit=3)

    return df[SENTIMENT_COLUMNS]


# ---------------------------------------------------------------------------
# India VIX Fetcher (via Kite API)
# ---------------------------------------------------------------------------
def fetch_india_vix_kite(kite: KiteConnect, lookback_days: int = 2500
                         ) -> pd.DataFrame:
    """Fetch India VIX daily OHLC via Kite API.

    Uses the authenticated KiteConnect instance to fetch INDIA VIX from NSE.

    Args:
        kite: authenticated KiteConnect instance
        lookback_days: calendar days of history

    Returns: DataFrame with [vix_open, vix_high, vix_low, vix_close],
             index=date (tz-naive DatetimeIndex)
    """
    empty = pd.DataFrame(
        columns=['vix_open', 'vix_high', 'vix_low', 'vix_close'])

    # Resolve INDIA VIX instrument token
    try:
        instruments = pd.DataFrame(kite.instruments('NSE'))
        vix_rows = instruments[
            instruments['tradingsymbol'] == 'INDIA VIX']
        if vix_rows.empty:
            # Try partial match
            vix_rows = instruments[
                instruments['tradingsymbol'].str.contains(
                    'VIX', case=False, na=False)]
        if vix_rows.empty:
            logger.warning("INDIA VIX instrument not found on NSE")
            return empty
        token = int(vix_rows.iloc[0]['instrument_token'])
        symbol = vix_rows.iloc[0]['tradingsymbol']
        print(f"  Resolved VIX: {symbol} (token={token})")
    except Exception as e:
        logger.warning(f"VIX instrument resolution failed: {e}")
        return empty

    end_date = datetime.now().date()
    start_date = end_date - timedelta(days=lookback_days)

    all_records = []
    chunk_start = start_date
    max_chunk = 1900

    while chunk_start < end_date:
        chunk_end = min(chunk_start + timedelta(days=max_chunk), end_date)
        try:
            records = kite.historical_data(
                instrument_token=token,
                from_date=chunk_start,
                to_date=chunk_end,
                interval='day',
                continuous=False,  # VIX is an index, not futures
            )
            all_records.extend(records)
        except Exception as e:
            logger.warning(f"VIX chunk {chunk_start}-{chunk_end} failed: {e}")
        chunk_start = chunk_end + timedelta(days=1)

    if not all_records:
        return empty

    df = pd.DataFrame(all_records)
    df['date'] = pd.to_datetime(df['date']).dt.tz_localize(None)
    df = df.set_index('date').sort_index()
    df.columns = [c.lower() for c in df.columns]

    # Rename to vix_* prefix
    rename_map = {'close': 'vix_close', 'open': 'vix_open',
                  'high': 'vix_high', 'low': 'vix_low'}
    df = df.rename(columns=rename_map)
    keep = [c for c in ['vix_open', 'vix_high', 'vix_low', 'vix_close']
            if c in df.columns]
    df = df[keep]
    df = df[~df.index.duplicated(keep='last')]

    print(f"  India VIX: {len(df)} days "
          f"({df.index[0].date()} to {df.index[-1].date()})")
    return df


# ---------------------------------------------------------------------------
# VIX Feature Builder (3 features)
# ---------------------------------------------------------------------------
VIX_COLUMNS = ['vix_level', 'vix_change_5d', 'vix_mean_reversion']


def build_vix_features(vix_df: pd.DataFrame,
                       date_index: pd.DatetimeIndex) -> pd.DataFrame:
    """Build 3 VIX features.

    Args:
        vix_df: DataFrame with vix_close column
        date_index: DatetimeIndex to align to

    Returns: DataFrame with VIX_COLUMNS
    """
    if vix_df.empty or 'vix_close' not in vix_df.columns:
        return pd.DataFrame(columns=VIX_COLUMNS,
                            index=date_index, dtype=float)

    vix = vix_df['vix_close'].copy()
    result = pd.DataFrame(index=vix.index)

    # VIX z-score: (VIX - 60d mean) / 60d std
    vix_mean = vix.rolling(60, min_periods=20).mean()
    vix_std = vix.rolling(60, min_periods=20).std(ddof=1)
    vix_std = vix_std.replace(0, np.nan)
    result['vix_level'] = (vix - vix_mean) / vix_std

    # 5-day VIX change (fraction, e.g. 0.05 = +5%)
    result['vix_change_5d'] = vix.pct_change(5)

    # VIX mean reversion: (VIX - 20d MA) / 20d std
    vix_ma20 = vix.rolling(20, min_periods=10).mean()
    vix_std20 = vix.rolling(20, min_periods=10).std(ddof=1)
    vix_std20 = vix_std20.replace(0, np.nan)
    result['vix_mean_reversion'] = (vix - vix_ma20) / vix_std20

    # Align to target date index, forward-fill short gaps
    result = result.reindex(date_index)
    result = result.ffill(limit=3)

    return result[VIX_COLUMNS]


# ---------------------------------------------------------------------------
# Master Cross-Asset Function
# ---------------------------------------------------------------------------
CROSS_ASSET_COLUMNS: List[str] = []  # populated at runtime


def fetch_cross_asset_features(
    start_date: str,
    end_date: str,
    date_index: pd.DatetimeIndex,
    kite: Optional[KiteConnect] = None,
) -> pd.DataFrame:
    """Fetch all cross-asset data and build features.

    Populates the global CROSS_ASSET_COLUMNS with available feature names.

    Args:
        start_date, end_date: 'YYYY-MM-DD'
        date_index: DatetimeIndex to align all features to
        kite: authenticated KiteConnect (for VIX)

    Returns: DataFrame with available cross-asset features
    """
    global CROSS_ASSET_COLUMNS

    all_features = pd.DataFrame(index=date_index)
    available_cols: List[str] = []

    # --- News Sentiment (GDELT + FinBERT) ---
    print("\n  [1/2] News Sentiment (GDELT + FinBERT)")
    try:
        daily_headlines = fetch_gdelt_headlines(start_date, end_date)

        if daily_headlines and HAS_FINBERT:
            daily_scores = score_headlines_with_cache(daily_headlines)
            sent_df = build_sentiment_features(daily_scores, date_index)

            for col in SENTIMENT_COLUMNS:
                if col in sent_df.columns:
                    all_features[col] = sent_df[col]
                    available_cols.append(col)

            n_days = sent_df.notna().any(axis=1).sum()
            print(f"    -> {len(available_cols)} features, "
                  f"{n_days} days with data")
        elif not HAS_FINBERT:
            print("    SKIP: FinBERT not available "
                  "(pip install torch transformers)")
        else:
            print("    SKIP: No headlines fetched from GDELT")
    except Exception as e:
        logger.warning(f"Sentiment features failed: {e}")
        print(f"    SKIP: Failed ({e})")

    # --- India VIX ---
    print("  [2/2] India VIX")
    try:
        if kite is not None:
            vix_df = fetch_india_vix_kite(kite)
            if not vix_df.empty:
                vix_feats = build_vix_features(vix_df, date_index)
                for col in VIX_COLUMNS:
                    if col in vix_feats.columns:
                        all_features[col] = vix_feats[col]
                        available_cols.append(col)
                n_days = vix_feats.notna().any(axis=1).sum()
                print(f"    -> {len(VIX_COLUMNS)} features, "
                      f"{n_days} days with data")
            else:
                print("    SKIP: No VIX data from Kite")
        else:
            print("    SKIP: No Kite instance provided")
    except Exception as e:
        logger.warning(f"VIX features failed: {e}")
        print(f"    SKIP: Failed ({e})")

    CROSS_ASSET_COLUMNS = available_cols

    print(f"\n  Cross-Asset Summary: {len(available_cols)} features available")
    if available_cols:
        print(f"    Columns: {available_cols}")

    if available_cols:
        return all_features[available_cols]
    return pd.DataFrame(index=date_index)
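
The T+1 lag and causal rolling window used by `build_sentiment_features` can be checked on toy data. The sketch below is illustrative only: the dates and scores are made up, and the column names `sent_mean` and `sent_ma` are hypothetical stand-ins for the real `ns_*` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: per-day mean sentiment on five consecutive business days.
dates = pd.bdate_range("2024-01-01", periods=5)
sent_mean = pd.Series([0.2, -0.1, 0.4, 0.0, 0.3], index=dates)

# Causal rolling mean: each window ends at the current row, so no future rows are used.
sent_ma = sent_mean.rolling(5, min_periods=1).mean()

# T+1 lag: a value computed from day D only becomes visible on the next row.
lagged = pd.DataFrame({"sent_mean": sent_mean, "sent_ma": sent_ma}).shift(1)

# The first row has no history; every later row carries the previous row's values.
print(lagged)
```

Because `shift(1)` moves values by one row rather than one calendar day, the lag is exactly one day only when the index has no missing days.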

In [4]:
# ============================================================================
# CELL 3b: World Monitor Intelligence Engine
# ============================================================================
# Integrates macro signals from the World Monitor geopolitical intelligence
# platform (koala73/worldmonitor). Adds global macro regime detection,
# anomaly detection (Welford's algorithm), and cross-market signals.
#
# Data Sources:
#   - Yahoo Finance: USD/JPY, QQQ, XLP, BTC (free, no API key)
#   - alternative.me: Fear & Greed Index (free)
#   - CoinGecko: Stablecoin peg health (free)
#
# All features are CAUSAL: use data available at time t to predict t+1.
# ============================================================================

# ---------------------------------------------------------------------------
# Group 13: Macro Regime Signals (World Monitor port)
# ---------------------------------------------------------------------------
# Ported from worldmonitor/api/macro-signals.js
# 7-signal composite: JPY liquidity, risk-on/off, flow alignment,
# technical trend, hash rate, mining cost, Fear & Greed.
# We implement the 4 most relevant for India FnO:
#   1. JPY carry (yen strengthening kills EM)
#   2. Risk-on/off (QQQ vs XLP consumer staples)
#   3. BTC/QQQ flow alignment (cross-asset momentum coherence)
#   4. Fear & Greed (sentiment baseline)
# ---------------------------------------------------------------------------

MACRO_COLUMNS = [
    'macro_jpy_carry',       # USD/JPY 30d ROC in % (negative = yen squeeze)
    'macro_risk_regime',     # QQQ minus XLP 20d ROC, 126d z-score (risk-on/off)
    'macro_flow_align',      # -z(|BTC 5d - QQQ 5d|): higher = more aligned
    'macro_fear_greed_z',    # Fear & Greed Index z-scored
    'macro_composite',       # Weighted composite (0-100)
]

def fetch_yahoo_macro(start_date: str, end_date: str) -> pd.DataFrame:
    """Fetch macro signals from Yahoo Finance (free, no API key).

    Fetches: JPY=X (USD/JPY), QQQ, XLP, BTC-USD
    Returns: DataFrame indexed by date with macro features.
    """
    if not HAS_YFINANCE:
        logger.warning("yfinance not installed — macro signals disabled")
        return pd.DataFrame()

    tickers = {
        'JPY=X': 'jpy',       # USD/JPY (invert for JPY strength)
        'QQQ': 'qqq',         # Nasdaq 100 ETF
        'XLP': 'xlp',         # Consumer Staples (defensive)
        'BTC-USD': 'btc',     # Bitcoin
    }

    print("  Fetching Yahoo Finance macro data...")
    try:
        data = yf.download(
            list(tickers.keys()),
            start=start_date,
            end=end_date,
            auto_adjust=True,
            progress=False,
        )
        if data.empty:
            print("    SKIP: No data from Yahoo Finance")
            return pd.DataFrame()
    except Exception as e:
        logger.warning(f"Yahoo Finance fetch failed: {e}")
        return pd.DataFrame()

    # Extract close prices
    closes = pd.DataFrame(index=data.index)
    if isinstance(data.columns, pd.MultiIndex):
        for yahoo_tk, col_name in tickers.items():
            if yahoo_tk in data['Close'].columns:
                closes[col_name] = data['Close'][yahoo_tk]
    else:
        # Single ticker case
        for yahoo_tk, col_name in tickers.items():
            if 'Close' in data.columns:
                closes[col_name] = data['Close']

    if closes.empty or closes.dropna(how='all').empty:
        print("    SKIP: Yahoo Finance returned empty data")
        return pd.DataFrame()

    closes = closes.ffill().dropna()

    result = pd.DataFrame(index=closes.index)

    # Signal 1: JPY Carry — 30d rate of change of USD/JPY
    # Negative ROC = JPY strengthening = EM risk (carry unwind)
    if 'jpy' in closes.columns:
        jpy_roc_30 = closes['jpy'].pct_change(30) * 100
        result['macro_jpy_carry'] = jpy_roc_30

    # Signal 2: Risk-On/Off (QQQ minus XLP 20d ROC spread)
    # QQQ outperforming XLP = risk-on; z-scored for stationarity
    if 'qqq' in closes.columns and 'xlp' in closes.columns:
        qqq_roc = closes['qqq'].pct_change(20)
        xlp_roc = closes['xlp'].pct_change(20)
        spread = qqq_roc - xlp_roc
        result['macro_risk_regime'] = (
            (spread - spread.rolling(126).mean()) /
            (spread.rolling(126).std() + 1e-8)
        )

    # Signal 3: Flow Alignment — |BTC 5d - QQQ 5d| gap
    # Small gap = markets aligned = momentum persistence
    if 'btc' in closes.columns and 'qqq' in closes.columns:
        btc_5d = closes['btc'].pct_change(5) * 100
        qqq_5d = closes['qqq'].pct_change(5) * 100
        gap = (btc_5d - qqq_5d).abs()
        # Invert and z-score: low gap = high alignment = positive signal
        result['macro_flow_align'] = -(
            (gap - gap.rolling(63).mean()) /
            (gap.rolling(63).std() + 1e-8)
        )

    n_signals = result.notna().any().sum()
    n_days = len(result.dropna(how='all'))
    print(f"    -> {n_signals} macro signals, {n_days} days")
    return result


def fetch_fear_greed() -> pd.DataFrame:
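    """Fetch the Crypto Fear & Greed Index from alternative.me (free, no API key).

    Returns: DataFrame with 'macro_fear_greed_z' column.
    The index ranges 0-100. We z-score it over a 126-day rolling window
    (minimum 30 observations). limit=365 requests only the most recent
    365 daily values, so earlier dates carry no Fear & Greed signal.
    """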
    url = "https://api.alternative.me/fng/?limit=365&format=json"
    try:
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
        data = resp.json().get('data', [])
        if not data:
            return pd.DataFrame()

        records = []
        for item in data:
            dt = pd.to_datetime(int(item['timestamp']), unit='s')
            records.append({'date': dt, 'fear_greed': int(item['value'])})

        df = pd.DataFrame(records).set_index('date').sort_index()
        # Z-score over 126-day rolling window
        fg = df['fear_greed'].astype(float)
        df['macro_fear_greed_z'] = (
            (fg - fg.rolling(126, min_periods=30).mean()) /
            (fg.rolling(126, min_periods=30).std() + 1e-8)
        )

        print(f"    -> Fear & Greed Index: {len(df)} days, "
              f"latest={df['fear_greed'].iloc[-1]}")
        return df[['macro_fear_greed_z']]
    except Exception as e:
        logger.warning(f"Fear & Greed fetch failed: {e}")
        return pd.DataFrame()


def compute_macro_composite(macro_df: pd.DataFrame) -> pd.Series:
    """Compute weighted macro composite score (0-100).

    Ported from worldmonitor/api/macro-signals.js verdict logic.
    Each signal contributes to a bullish/bearish score.
    """
    score = pd.Series(50.0, index=macro_df.index)  # neutral baseline

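    if 'macro_jpy_carry' in macro_df.columns:
        # JPY carry > 0 (USD strengthening vs JPY) = bullish for EM.
        # NaN stays NaN (like the other signals) instead of scoring as bearish.
        carry = macro_df['macro_jpy_carry']
        score += np.where(carry.isna(), np.nan,
                          np.where(carry > 0, 10.0, -10.0))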

    if 'macro_risk_regime' in macro_df.columns:
        # Risk-on (positive z) = bullish
        score += np.clip(macro_df['macro_risk_regime'] * 5, -15, 15)

    if 'macro_flow_align' in macro_df.columns:
        # Aligned flows (positive) = bullish
        score += np.clip(macro_df['macro_flow_align'] * 5, -10, 10)

    if 'macro_fear_greed_z' in macro_df.columns:
        score += np.clip(macro_df['macro_fear_greed_z'] * 5, -15, 15)

    return np.clip(score, 0, 100)


# ---------------------------------------------------------------------------
# Group 14: Welford Anomaly Detection (World Monitor port)
# ---------------------------------------------------------------------------
# Ported from worldmonitor/api/temporal-baseline.js
# Uses Welford's online algorithm for numerically stable streaming
# mean/variance computation. Detects z-score deviations from learned
# baselines per weekday × month (seasonal patterns).
# ---------------------------------------------------------------------------

WELFORD_COLUMNS = [
    'anomaly_volume_z',    # Volume vs weekday×month baseline
    'anomaly_range_z',     # High-low range vs baseline
    'anomaly_gap_z',       # Overnight gap vs baseline
    'anomaly_composite',   # Max |z| across the three (worst anomaly)
]


class WelfordBaseline:
    """Welford's online algorithm for streaming mean/variance.

    Ported from worldmonitor/api/temporal-baseline.js.
    Numerically stable single-pass computation.
    """

    def __init__(self):
        self.baselines = {}  # key -> {mean, m2, n}

    def _key(self, weekday: int, month: int, metric: str) -> str:
        return f"{metric}:{weekday}:{month}"

    def update(self, weekday: int, month: int, metric: str, value: float):
        """Update baseline with new observation (Welford's algorithm)."""
        key = self._key(weekday, month, metric)
        if key not in self.baselines:
            self.baselines[key] = {'mean': 0.0, 'm2': 0.0, 'n': 0}

        bl = self.baselines[key]
        bl['n'] += 1
        delta = value - bl['mean']
        bl['mean'] += delta / bl['n']
        delta2 = value - bl['mean']
        bl['m2'] += delta * delta2

    def zscore(self, weekday: int, month: int, metric: str,
               value: float, min_samples: int = 10) -> float:
        """Compute z-score of value against baseline."""
        key = self._key(weekday, month, metric)
        if key not in self.baselines:
            return 0.0

        bl = self.baselines[key]
        # Need at least 2 samples for the ddof=1 variance below
        if bl['n'] < max(min_samples, 2):
            return 0.0

        variance = max(0, bl['m2'] / (bl['n'] - 1))  # ddof=1
        std = np.sqrt(variance)
        if std < 1e-10:
            return 0.0

        return (value - bl['mean']) / std


def compute_welford_anomalies(df: pd.DataFrame) -> pd.DataFrame:
    """Compute anomaly z-scores using Welford's online algorithm.

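    Detects unusual volume, range, and gap sizes relative to
    weekday×month seasonal baselines. FULLY CAUSAL: only uses
    data from before time t to compute baselines at time t.
    Each weekday×month bucket sees only 4-5 observations per year, so
    with the default min_samples=10 the z-scores stay 0.0 until a bucket
    has accumulated roughly two to three years of history.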

    Args:
        df: DataFrame with OHLCV columns and DatetimeIndex

    Returns: DataFrame with anomaly z-score columns
    """
    baseline = WelfordBaseline()
    result = pd.DataFrame(index=df.index)

    vol_z = np.zeros(len(df))
    range_z = np.zeros(len(df))
    gap_z = np.zeros(len(df))

    closes = df['close'].values
    volumes = df['volume'].values
    highs = df['high'].values
    lows = df['low'].values
    opens = df['open'].values

    for i in range(1, len(df)):
        dt = df.index[i]
        wd = dt.weekday()
        mo = dt.month

        # Current values
        vol_val = float(volumes[i]) if volumes[i] > 0 else 0.0
        range_val = float((highs[i] - lows[i]) / closes[i-1]) if closes[i-1] > 0 else 0.0
        gap_val = float(abs(opens[i] - closes[i-1]) / closes[i-1]) if closes[i-1] > 0 else 0.0

        # Compute z-scores BEFORE updating (causal)
        vol_z[i] = baseline.zscore(wd, mo, 'volume', vol_val)
        range_z[i] = baseline.zscore(wd, mo, 'range', range_val)
        gap_z[i] = baseline.zscore(wd, mo, 'gap', gap_val)

        # Update baselines AFTER scoring (causal)
        baseline.update(wd, mo, 'volume', vol_val)
        baseline.update(wd, mo, 'range', range_val)
        baseline.update(wd, mo, 'gap', gap_val)

    result['anomaly_volume_z'] = vol_z
    result['anomaly_range_z'] = range_z
    result['anomaly_gap_z'] = gap_z
    result['anomaly_composite'] = np.maximum.reduce([
        np.abs(vol_z), np.abs(range_z), np.abs(gap_z)
    ])

    return result
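
The score-before-update pattern in `compute_welford_anomalies` can be sanity-checked in isolation. The sketch below is a minimal stand-alone version, not the `WelfordBaseline` class itself: `welford_update` is a hypothetical helper, the values are made up, and a single baseline is used instead of weekday×month buckets.

```python
import numpy as np


def welford_update(state, value):
    """Return the updated (n, mean, m2) triple after seeing one value."""
    n, mean, m2 = state
    n += 1
    delta = value - mean
    mean += delta / n
    m2 += delta * (value - mean)
    return n, mean, m2


# Made-up observations; the last one is an obvious outlier.
values = [10.0, 12.0, 9.0, 11.0, 30.0]
state = (0, 0.0, 0.0)
zscores = []

for v in values:
    n, mean, m2 = state
    # Score against the baseline built from earlier values only (causal).
    if n >= 2 and m2 > 0:
        z = (v - mean) / np.sqrt(m2 / (n - 1))
    else:
        z = 0.0
    zscores.append(z)
    # Only now fold the current value into the baseline.
    state = welford_update(state, v)

n, mean, m2 = state
print(zscores)
```

After the loop, `mean` and `m2 / (n - 1)` agree with `np.mean(values)` and `np.var(values, ddof=1)`, which is the property that makes the single-pass update a drop-in replacement for a batch computation.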


In [5]:
# ============================================================================
# CELL 3: Advanced Feature Engineering — 31 Features in 10 Groups
# ============================================================================

FEATURE_COLUMNS = [
    'norm_ret_1d', 'norm_ret_21d', 'norm_ret_63d', 'norm_ret_126d', 'norm_ret_252d',
    'macd_8_24', 'macd_16_48', 'macd_32_96',
    'rvol_20d', 'rvol_60d', 'gk_vol_20d', 'parkinson_vol_20d',
    'cp_rl_21', 'cp_score_21',
    'frac_diff_03', 'frac_diff_05', 'hurst_exp',
    'ram_5', 'ram_10', 'ram_21', 'ram_63',
    'vpin', 'kyles_lambda', 'amihud_illiq', 'hl_spread',
    'entropy',
    'trend_strength', 'momentum_consistency', 'mr_zscore',
    'vol_zscore', 'vol_momentum',
]


# ============================================================================
# Group 1: Normalized Returns (5 features)
# ============================================================================
def compute_returns(close: pd.Series, horizons: List[int] = [1, 21, 63, 126, 252]) -> pd.DataFrame:
    """
    Compute horizon returns normalized by EWM volatility.
    norm_ret_h = log(close / close.shift(h)) / (vol * sqrt(h))
    where vol = ewm(span=60).std() of daily log returns.
    """
    log_ret = np.log(close / close.shift(1))
    vol = log_ret.ewm(span=60, min_periods=20).std()

    result = pd.DataFrame(index=close.index)
    for h in horizons:
        raw_ret = np.log(close / close.shift(h))
        denom = vol * np.sqrt(h)
        # Avoid division by zero
        denom = denom.replace(0, np.nan)
        result[f'norm_ret_{h}d'] = raw_ret / denom

    return result


# ============================================================================
# Group 2: MACD (3 features)
# ============================================================================
def compute_macd(close: pd.Series, pairs: List[Tuple[int, int]] = [(8, 24), (16, 48), (32, 96)]) -> pd.DataFrame:
    """
    Compute MACD z-scores for stationarity.
    Raw MACD = EWM(fast) - EWM(slow), then z-scored over 126-day rolling window.
    """
    result = pd.DataFrame(index=close.index)
    for fast, slow in pairs:
        ema_fast = close.ewm(span=fast, min_periods=fast).mean()
        ema_slow = close.ewm(span=slow, min_periods=slow).mean()
        raw_macd = ema_fast - ema_slow

        roll_mean = raw_macd.rolling(window=126, min_periods=63).mean()
        roll_std = raw_macd.rolling(window=126, min_periods=63).std(ddof=1)
        roll_std = roll_std.replace(0, np.nan)

        result[f'macd_{fast}_{slow}'] = (raw_macd - roll_mean) / roll_std

    return result


# ============================================================================
# Group 3: Volatility (4 features)
# ============================================================================
def compute_volatility(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute 4 volatility estimators:
      - rvol_20d, rvol_60d: realized vol (rolling std of log returns, annualized)
      - gk_vol_20d: Garman-Klass volatility
      - parkinson_vol_20d: Parkinson high-low volatility
    """
    close = df['close']
    high = df['high']
    low = df['low']
    opn = df['open']

    log_ret = np.log(close / close.shift(1))

    result = pd.DataFrame(index=df.index)

    # Realized vol
    result['rvol_20d'] = log_ret.rolling(window=20, min_periods=15).std(ddof=1) * np.sqrt(252)
    result['rvol_60d'] = log_ret.rolling(window=60, min_periods=40).std(ddof=1) * np.sqrt(252)

    # Garman-Klass: sqrt(mean(0.5*ln(H/L)^2 - (2*ln2 - 1)*ln(C/O)^2) * 252)
    log_hl = np.log(high / low)
    log_co = np.log(close / opn)
    gk_term = 0.5 * log_hl ** 2 - (2.0 * np.log(2.0) - 1.0) * log_co ** 2
    gk_mean = gk_term.rolling(window=20, min_periods=15).mean()
    # Clamp to non-negative before sqrt
    gk_mean = gk_mean.clip(lower=0.0)
    result['gk_vol_20d'] = np.sqrt(gk_mean * 252)

    # Parkinson: sqrt(mean(ln(H/L)^2 / (4*ln2)) * 252)
    park_term = log_hl ** 2 / (4.0 * np.log(2.0))
    park_mean = park_term.rolling(window=20, min_periods=15).mean()
    park_mean = park_mean.clip(lower=0.0)
    result['parkinson_vol_20d'] = np.sqrt(park_mean * 252)

    return result


# ============================================================================
# Group 4: Changepoint Detection (2 features)
# ============================================================================
def nig_log_marginal(x: np.ndarray) -> float:
    """
    Normal-Inverse-Gamma log marginal likelihood P(x | NIG prior).
    Prior: mu0=0, kappa0=1, alpha0=1, beta0=1
    Posterior update:
      n = len(x), x_bar = mean(x), s2 = sample variance of x (ddof=1)
      kappa_n = kappa0 + n
      alpha_n = alpha0 + n/2
      beta_n = beta0 + 0.5*(n-1)*s2 + 0.5*kappa0*n*(x_bar - mu0)^2 / kappa_n
    Log marginal = gammaln(alpha_n) - gammaln(alpha0) + alpha0*log(beta0) - alpha_n*log(beta_n)
                   + 0.5*log(kappa0/kappa_n) - (n/2)*log(2*pi)
    """
    n = len(x)
    if n < 2:
        return -np.inf

    mu0, kappa0, alpha0, beta0 = 0.0, 1.0, 1.0, 1.0

    x_bar = np.mean(x)
    s2 = np.var(x, ddof=1) if n > 1 else 0.0

    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * (n - 1) * s2 + 0.5 * kappa0 * n * (x_bar - mu0) ** 2 / kappa_n

    # Protect against non-positive beta_n
    if beta_n <= 0:
        beta_n = 1e-300

    log_ml = (
        gammaln(alpha_n) - gammaln(alpha0)
        + alpha0 * np.log(beta0) - alpha_n * np.log(beta_n)
        + 0.5 * np.log(kappa0 / kappa_n)
        - (n / 2.0) * np.log(2.0 * np.pi)
    )
    return log_ml


def compute_cpd(close: pd.Series, lookback: int = 21, min_seg: int = 5) -> pd.DataFrame:
    """
    Changepoint detection via NIG Bayesian model comparison.
    For each position, take lookback-window of log returns.
    Try all split points; best split maximizes sum of two-segment likelihoods.
    cp_rl = best_split_position / lookback (relative location in [0, 1])
    cp_score = sigmoid(best_split_ll - full_ll) (severity in [0, 1])
    """
    log_ret = np.log(close / close.shift(1)).values
    n = len(log_ret)

    cp_rl = np.full(n, np.nan)
    cp_score = np.full(n, np.nan)

    for i in range(lookback, n):
        window = log_ret[i - lookback + 1: i + 1]  # lookback values ending at i

        # Skip if any NaN
        if np.any(np.isnan(window)):
            continue

        full_ll = nig_log_marginal(window)

        best_split_ll = -np.inf
        best_split_pos = lookback // 2  # default to middle

        for s in range(min_seg, lookback - min_seg + 1):
            left = window[:s]
            right = window[s:]
            split_ll = nig_log_marginal(left) + nig_log_marginal(right)

            if split_ll > best_split_ll:
                best_split_ll = split_ll
                best_split_pos = s

        cp_rl[i] = best_split_pos / lookback

        # Severity: sigmoid of log-likelihood ratio
        delta = best_split_ll - full_ll
        # Clamp to avoid overflow in exp
        delta_clamped = np.clip(delta, -500, 500)
        cp_score[i] = 1.0 / (1.0 + np.exp(-delta_clamped))

    result = pd.DataFrame(index=close.index)
    result['cp_rl_21'] = cp_rl
    result['cp_score_21'] = cp_score
    return result


# ============================================================================
# Group 5: Fractional Calculus (3 features)
# ============================================================================
def frac_diff_weights(d: float, thresh: float = 1e-5, max_width: int = 500) -> np.ndarray:
    """
    Hosking (1981) fractional differencing weights.
    w[0] = 1, w[k] = -w[k-1] * (d - k + 1) / k
    Iterate until |w[k]| < thresh or max_width reached.

    Note: For d=0.3 with thresh=1e-5, weights decay as k^{-1.3}, so the
    threshold is only reached after a few thousand terms. max_width caps
    this to keep the warmup period practical
    while preserving >99.9% of the filter energy.

    Returns weights array from oldest (index 0) to newest (index -1).
    """
    weights = [1.0]
    k = 1
    while True:
        w_k = -weights[-1] * (d - k + 1) / k
        if abs(w_k) < thresh:
            break
        weights.append(w_k)
        k += 1
        if k >= max_width:
            break
    # Reverse so index 0 = oldest weight
    return np.array(weights[::-1])


def compute_frac_diff(close: pd.Series, d: float) -> pd.Series:
    """
    Apply fractional differencing of order d to log(close).
    Convolve log prices with frac_diff weights.
    """
    log_price = np.log(close.values)
    w = frac_diff_weights(d)
    width = len(w)

    n = len(log_price)
    result = np.full(n, np.nan)

    for i in range(width - 1, n):
        segment = log_price[i - width + 1: i + 1]
        result[i] = np.dot(w, segment)

    return pd.Series(result, index=close.index)


def _compute_hurst_single(returns: np.ndarray, max_lag: int = 50) -> float:
    """
    Compute Hurst exponent from returns using MSD (Mean Squared Displacement).
    MSD(tau) = E[(X(t+tau) - X(t))^2] where X = cumsum(returns)
    Regress log(MSD) on log(tau) -> slope / 2 = H
    """
    if len(returns) < max_lag + 10:
        return np.nan

    X = np.cumsum(returns)
    taus = np.arange(1, max_lag + 1)
    msd = np.full(max_lag, np.nan)

    for idx, tau in enumerate(taus):
        diffs = X[tau:] - X[:-tau]
        if len(diffs) < 5:
            continue
        msd[idx] = np.mean(diffs ** 2)

    # Filter valid MSD values (positive and finite)
    valid = np.isfinite(msd) & (msd > 0)
    if valid.sum() < 5:
        return np.nan

    log_tau = np.log(taus[valid])
    log_msd = np.log(msd[valid])

    # Linear regression: log_msd = slope * log_tau + intercept
    n_valid = len(log_tau)
    sum_x = np.sum(log_tau)
    sum_y = np.sum(log_msd)
    sum_xy = np.sum(log_tau * log_msd)
    sum_xx = np.sum(log_tau ** 2)

    denom = n_valid * sum_xx - sum_x ** 2
    if abs(denom) < 1e-15:
        return np.nan

    slope = (n_valid * sum_xy - sum_x * sum_y) / denom
    hurst = slope / 2.0

    # Clamp to reasonable range
    return np.clip(hurst, 0.0, 1.0)
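

# --- Illustrative sanity check (not part of the original pipeline) ---------
# For i.i.d. returns the cumulative sum is a random walk, MSD(tau) grows
# linearly in tau, and the log-log slope divided by 2 should be close to 0.5.
# The hypothetical helper below (name, seed and sizes are assumptions)
# repeats the MSD regression with np.polyfit on simulated Gaussian returns.
import numpy as np  # repeated here so the sketch runs on its own


def _hurst_msd_on_white_noise(n: int = 5000, max_lag: int = 50,
                              seed: int = 0) -> float:
    """Estimate H via the MSD slope for simulated i.i.d. N(0, 1) returns."""
    rng = np.random.default_rng(seed)
    path = np.cumsum(rng.standard_normal(n))
    taus = np.arange(1, max_lag + 1)
    msd = np.array([np.mean((path[tau:] - path[:-tau]) ** 2) for tau in taus])
    slope = np.polyfit(np.log(taus), np.log(msd), 1)[0]
    return float(slope / 2.0)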


def compute_hurst(close: pd.Series, window: int = 252, max_lag: int = 50) -> pd.Series:
    """
    Rolling Hurst exponent computed over a window of returns.
    """
    log_ret = np.log(close / close.shift(1)).values
    n = len(log_ret)
    hurst_vals = np.full(n, np.nan)

    for i in range(window, n):
        segment = log_ret[i - window + 1: i + 1]
        if np.any(np.isnan(segment)):
            continue
        hurst_vals[i] = _compute_hurst_single(segment, max_lag=max_lag)

    return pd.Series(hurst_vals, index=close.index, name='hurst_exp')


def compute_fractional_features(close: pd.Series) -> pd.DataFrame:
    """Compute all fractional calculus features."""
    result = pd.DataFrame(index=close.index)
    result['frac_diff_03'] = compute_frac_diff(close, d=0.3)
    result['frac_diff_05'] = compute_frac_diff(close, d=0.5)
    result['hurst_exp'] = compute_hurst(close)
    return result


# ============================================================================
# Group 6: Ramanujan Sum Filter Bank (4 features)
# ============================================================================
def euler_phi(n: int) -> int:
    """Euler's totient function: count of integers in [1, n] coprime to n."""
    if n <= 0:
        return 0
    result = n
    p = 2
    temp = n
    while p * p <= temp:
        if temp % p == 0:
            while temp % p == 0:
                temp //= p
            result -= result // p
        p += 1
    if temp > 1:
        result -= result // temp
    return result


def mobius(n: int) -> int:
    """
    Mobius function:
      mu(1) = 1
      mu(n) = 0 if n has a squared prime factor
      mu(n) = (-1)^k if n is a product of k distinct primes
    """
    if n <= 0:
        return 0
    if n == 1:
        return 1

    num_factors = 0
    temp = n
    p = 2

    while p * p <= temp:
        if temp % p == 0:
            temp //= p
            num_factors += 1
            if temp % p == 0:
                return 0  # Squared factor
        p += 1

    if temp > 1:
        num_factors += 1

    return 1 if num_factors % 2 == 0 else -1


def ramanujan_sum(q: int, n: int) -> float:
    """
    Ramanujan sum c_q(n), computed from the divisor-sum formula

        c_q(n) = sum_{d | gcd(n, q)} d * mu(q / d)

    This equals Hölder's closed form mu(q/g) * phi(q) / phi(q/g) with
    g = gcd(n, q). The value is always an integer (returned as float).
    """
    g = math.gcd(n, q)
    # Accumulate d * mu(q / d) over every divisor d of g
    total = 0.0
    for d in range(1, g + 1):
        if g % d == 0:
            qd = q // d
            total += d * mobius(qd)
    return total
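

# --- Illustrative cross-check (not part of the original pipeline) ----------
# c_q(n) is classically defined as the sum of exp(2*pi*i*a*n/q) over
# 1 <= a <= q with gcd(a, q) = 1; the imaginary parts cancel, leaving a
# cosine sum. The hypothetical helper below evaluates that definition
# directly, so the divisor-sum formula in ramanujan_sum can be spot-checked,
# e.g. abs(ramanujan_sum(q, n) - _ramanujan_sum_bruteforce(q, n)) < 1e-9.
def _ramanujan_sum_bruteforce(q: int, n: int) -> float:
    """Evaluate c_q(n) from its exponential-sum definition (O(q) per call)."""
    import math  # local import keeps the sketch self-contained
    return sum(
        math.cos(2.0 * math.pi * a * n / q)
        for a in range(1, q + 1)
        if math.gcd(a, q) == 1
    )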


def compute_ramanujan(close: pd.Series, periods: List[int] = [5, 10, 21, 63],
                      window: int = 252) -> pd.DataFrame:
    """
    Ramanujan Sum Filter Bank: convolve log-returns with Ramanujan sum kernels
    to extract energy at specific trading cycle periods.
    """
    log_ret = np.log(close / close.shift(1)).values
    n = len(log_ret)
    result = pd.DataFrame(index=close.index)

    for q in periods:
        # Pre-compute kernel: kernel[j] = c_q(j+1) for j in [0, window)
        kernel = np.array([ramanujan_sum(q, j + 1) for j in range(window)])
        kernel = kernel / window  # Normalize

        # Convolve (causal: only use past data)
        filtered = np.full(n, np.nan)
        for i in range(window, n):
            segment = log_ret[i - window + 1: i + 1]
            if np.any(np.isnan(segment)):
                continue
            # Kernel is applied: newest data * kernel[0], oldest * kernel[-1]
            # Reverse kernel for convolution alignment (kernel[0] applies to most recent)
            filtered[i] = np.dot(segment, kernel[::-1][:len(segment)])

        result[f'ram_{q}'] = filtered

    return result


# ============================================================================
# Group 7: Microstructure (4 features)
# ============================================================================
def compute_microstructure(df: pd.DataFrame, window: int = 50) -> pd.DataFrame:
    """
    Compute 4 microstructure features:
      - VPIN: Volume-Synchronized Probability of Informed Trading
      - Kyle's Lambda: price impact coefficient
      - Amihud Illiquidity: |return| / dollar volume
      - HL Spread: high-low spread proxy (Corwin-Schultz simplified)
    """
    close = df['close']
    high = df['high']
    low = df['low']
    volume = df['volume'].astype(float)

    log_ret = np.log(close / close.shift(1))

    result = pd.DataFrame(index=df.index)

    # --- VPIN ---
    sigma = log_ret.rolling(window=20, min_periods=10).std()
    sigma = sigma.replace(0, np.nan)
    # Bulk volume classification: buy probability = Phi(ret / sigma)
    z = log_ret / sigma
    buy_prob = pd.Series(norm.cdf(z.values), index=close.index)
    buy_vol = volume * buy_prob
    sell_vol = volume * (1.0 - buy_prob)
    abs_imbalance = (buy_vol - sell_vol).abs()
    total_vol = volume.rolling(window=window, min_periods=window // 2).sum()
    total_vol = total_vol.replace(0, np.nan)
    vpin = abs_imbalance.rolling(window=window, min_periods=window // 2).sum() / total_vol
    result['vpin'] = vpin

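    # --- Kyle's Lambda ---
    # Unsigned price-impact proxy: slope of |return| on traded volume.
    # abs(sign(ret) * volume) equals volume on days with a non-zero return
    # (0 on flat days), so this is a magnitude-based variant rather than the
    # signed-order-flow regression of Kyle (1985).
    abs_ret = log_ret.abs()
    signed_vol = np.sign(log_ret) * volume
    abs_signed_vol = signed_vol.abs()

    # Rolling covariance / variance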
    cov_rv = abs_ret.rolling(window=window, min_periods=window // 2).cov(abs_signed_vol)
    var_sv = abs_signed_vol.rolling(window=window, min_periods=window // 2).var(ddof=1)
    var_sv = var_sv.replace(0, np.nan)
    result['kyles_lambda'] = cov_rv / var_sv

    # --- Amihud Illiquidity ---
    dollar_vol = close * volume
    dollar_vol = dollar_vol.replace(0, np.nan)
    daily_illiq = abs_ret / dollar_vol
    result['amihud_illiq'] = daily_illiq.rolling(window=window, min_periods=window // 2).mean()

    # --- HL Spread (Corwin-Schultz simplified) ---
    # Corwin-Schultz (2012) spread estimator:
    #   alpha  = (sqrt(2*beta) - sqrt(beta)) / (3 - 2*sqrt(2))
    #            - sqrt(gamma / (3 - 2*sqrt(2)))
    #   spread = 2 * (exp(alpha) - 1) / (1 + exp(alpha))
    # Simplification used here: beta is the rolling mean of the single-day
    # squared log range ln(H_t/L_t)^2 (the published estimator sums two
    # consecutive days), and gamma is the squared log range of the 2-day
    # high/low.
    log_hl = np.log(high / low)
    beta = (log_hl ** 2).rolling(window=window, min_periods=window // 2).mean()
    high_2d = high.rolling(window=2).max()
    low_2d = low.rolling(window=2).min()
    gamma = np.log(high_2d / low_2d) ** 2

    sqrt2 = np.sqrt(2.0)
    denom_cs = 3.0 - 2.0 * sqrt2

    # Ensure beta is non-negative
    beta_safe = beta.clip(lower=0.0)
    gamma_safe = gamma.clip(lower=0.0)

    alpha = (np.sqrt(2.0 * beta_safe) - np.sqrt(beta_safe)) / denom_cs
    # Adjust with gamma correction
    gamma_correction = np.sqrt(gamma_safe / denom_cs)
    alpha = alpha - gamma_correction
    # Clamp alpha to reasonable range to avoid extreme spread values
    alpha = alpha.clip(lower=-1.0, upper=2.0)

    spread = 2.0 * (np.exp(alpha) - 1.0) / (1.0 + np.exp(alpha))
    # Clamp negative spreads to 0
    spread = spread.clip(lower=0.0)
    result['hl_spread'] = spread

    return result


# ============================================================================
# Group 8: Information Theory (1 feature)
# ============================================================================
def compute_entropy(close: pd.Series, word_len: int = 3, window: int = 252) -> pd.Series:
    """
    Shannon entropy of binary price movement patterns.
    Encode: 1 if price up, 0 otherwise (down or unchanged).
    Form words of word_len consecutive bits.
    Compute normalized Shannon entropy over rolling window.
    """
    # Binary encoding: 1 if close > prev_close, 0 otherwise
    direction = (close.diff() > 0).astype(int).values
    n = len(direction)
    n_words = 2 ** word_len
    max_entropy = np.log2(n_words) if n_words > 1 else 1.0

    entropy_vals = np.full(n, np.nan)

    for i in range(window + word_len - 1, n):
        # Extract the window of directions
        seg = direction[i - window + 1: i + 1]

        # Build words
        words = []
        for j in range(word_len - 1, len(seg)):
            word = 0
            for k in range(word_len):
                word = (word << 1) | seg[j - word_len + 1 + k]
            words.append(word)

        if len(words) == 0:
            continue

        # Histogram
        counts = np.bincount(words, minlength=n_words).astype(float)
        probs = counts / counts.sum()

        # Shannon entropy (base 2)
        probs_pos = probs[probs > 0]
        H = -np.sum(probs_pos * np.log2(probs_pos))

        # Normalize to [0, 1]
        entropy_vals[i] = H / max_entropy if max_entropy > 0 else 0.0

    return pd.Series(entropy_vals, index=close.index, name='entropy')


# ============================================================================
# Group 9: Momentum Quality (3 features)
# ============================================================================
def compute_momentum_quality(close: pd.Series, window: int = 20) -> pd.DataFrame:
    """
    Compute momentum quality metrics:
      - trend_strength: |avg_up - avg_down| / (avg_up + avg_down)
      - momentum_consistency: fraction of positive returns in rolling window
      - mr_zscore: (close - EMA) / rolling_std  (mean-reversion z-score)
    """
    ret = close.pct_change()
    result = pd.DataFrame(index=close.index)

    # Trend strength
    up_ret = ret.clip(lower=0)
    down_ret = (-ret).clip(lower=0)  # magnitude of down moves

    avg_up = up_ret.rolling(window=window, min_periods=window // 2).mean()
    avg_down = down_ret.rolling(window=window, min_periods=window // 2).mean()

    denom_ts = avg_up + avg_down
    denom_ts = denom_ts.replace(0, np.nan)
    result['trend_strength'] = (avg_up - avg_down).abs() / denom_ts

    # Momentum consistency: fraction of positive returns
    pos_indicator = (ret > 0).astype(float)
    result['momentum_consistency'] = pos_indicator.rolling(
        window=window, min_periods=window // 2
    ).mean()

    # Mean reversion z-score: (close - EMA) / rolling_std
    ema = close.ewm(span=window, min_periods=window // 2).mean()
    rolling_std = close.rolling(window=window, min_periods=window // 2).std(ddof=1)
    rolling_std = rolling_std.replace(0, np.nan)
    result['mr_zscore'] = (close - ema) / rolling_std

    return result


# ============================================================================
# Group 10: Volume Features (2 features)
# ============================================================================
def compute_volume_features(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """
    Compute volume-based features:
      - vol_zscore: (volume - rolling_mean) / rolling_std
      - vol_momentum: volume.pct_change(5)  (5-day volume momentum)
    """
    volume = df['volume'].astype(float)
    result = pd.DataFrame(index=df.index)

    roll_mean = volume.rolling(window=window, min_periods=window // 2).mean()
    roll_std = volume.rolling(window=window, min_periods=window // 2).std(ddof=1)
    roll_std = roll_std.replace(0, np.nan)

    result['vol_zscore'] = (volume - roll_mean) / roll_std
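    # pct_change yields +/-inf when the volume 5 days earlier is 0; map to NaN
    result['vol_momentum'] = volume.pct_change(periods=5).replace([np.inf, -np.inf], np.nan)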

    return result


# ============================================================================
# Master Function: Build All Features
# ============================================================================
def build_all_features(df: pd.DataFrame, cfg: Optional[MonolithConfig] = None) -> pd.DataFrame:
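    """
    Build all 31 engineered features from 10 research groups.
    Adds forward targets:
      - target_ret   = close.pct_change(1).shift(-1)  (next-day return)
      - target_train = target_ret / 20-day realized vol  (vol-scaled label)
    Drops rows with NaN in any feature or target (warmup rows and the final
    row, which has no next-day return).
    Returns df with original OHLCV + FEATURE_COLUMNS + 'target_ret' + 'target_train'.
    """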
    if cfg is None:
        cfg = MonolithConfig()

    close = df['close']
    t0 = time.time()

    print("Building features...")

    # Group 1: Normalized Returns
    print("  [1/10] Normalized Returns (5 features)")
    feat_ret = compute_returns(close)

    # Group 2: MACD
    print("  [2/10] MACD Z-scores (3 features)")
    feat_macd = compute_macd(close)

    # Group 3: Volatility
    print("  [3/10] Volatility Estimators (4 features)")
    feat_vol = compute_volatility(df)

    # Group 4: Changepoint Detection
    print("  [4/10] NIG Changepoint Detection (2 features)")
    feat_cpd = compute_cpd(close)

    # Group 5: Fractional Calculus
    print("  [5/10] Fractional Differentiation + Hurst (3 features)")
    feat_frac = compute_fractional_features(close)

    # Group 6: Ramanujan Filter Bank
    print("  [6/10] Ramanujan Sum Filter Bank (4 features)")
    feat_ram = compute_ramanujan(close)

    # Group 7: Microstructure
    print("  [7/10] Market Microstructure (4 features)")
    feat_micro = compute_microstructure(df)

    # Group 8: Entropy
    print("  [8/10] Information-Theoretic Entropy (1 feature)")
    feat_entropy = compute_entropy(close)

    # Group 9: Momentum Quality
    print("  [9/10] Momentum Quality (3 features)")
    feat_mq = compute_momentum_quality(close)

    # Group 10: Volume Features
    print("  [10/10] Volume Features (2 features)")
    feat_vf = compute_volume_features(df)

    # Assemble all features into the dataframe
    out = df.copy()

    for col in feat_ret.columns:
        out[col] = feat_ret[col]
    for col in feat_macd.columns:
        out[col] = feat_macd[col]
    for col in feat_vol.columns:
        out[col] = feat_vol[col]
    for col in feat_cpd.columns:
        out[col] = feat_cpd[col]
    for col in feat_frac.columns:
        out[col] = feat_frac[col]
    for col in feat_ram.columns:
        out[col] = feat_ram[col]
    for col in feat_micro.columns:
        out[col] = feat_micro[col]
    out['entropy'] = feat_entropy
    for col in feat_mq.columns:
        out[col] = feat_mq[col]
    for col in feat_vf.columns:
        out[col] = feat_vf[col]

    # Target: next-day return (shift -1 is the ONLY forward-looking value, used as label)
    out['target_ret'] = close.pct_change(1).shift(-1)

    # Vol-scaled training target: raw_fwd_return / realized_vol
    # Vol computed on CURRENT (unshifted) returns — no leakage
    log_ret = np.log(close / close.shift(1))
    vol_20 = log_ret.rolling(20, min_periods=10).std()
    vol_20 = vol_20.replace(0, np.nan)
    out['target_train'] = out['target_ret'] / vol_20

    # Verify all expected feature columns exist
    missing = [c for c in FEATURE_COLUMNS if c not in out.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")

    # Drop rows where ANY feature or target is NaN (warmup period)
    n_before = len(out)
    out_pre_drop = out  # keep reference for diagnostics
    out = out.dropna(subset=FEATURE_COLUMNS + ['target_ret', 'target_train'])
    n_after = len(out)

    elapsed = time.time() - t0
    print(f"\nFeature engineering complete in {elapsed:.1f}s")
    print(f"  Rows: {n_before} -> {n_after} (dropped {n_before - n_after} warmup rows)")
    print(f"  Features: {len(FEATURE_COLUMNS)}")

    if n_after == 0:
        # Diagnose which features are all-NaN
        nan_cols = [c for c in FEATURE_COLUMNS if out_pre_drop[c].isna().all()]
        raise ValueError(
            f"All rows dropped after NaN removal. "
            f"Features that are entirely NaN: {nan_cols}. "
            f"Data has {n_before} rows but longest warmup exceeds this."
        )

    print(f"  Date range: {out.index[0].date()} to {out.index[-1].date()}")
    print(f"  Columns: {list(out.columns)}")

    return out
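
The forward-looking labels built at the end of `build_all_features` can be illustrated on a toy series. The sketch below is a minimal example, assuming only pandas and NumPy; the helper name `make_targets`, the extra `vol_20` column and the synthetic prices are hypothetical and not part of the notebook. It constructs `target_ret` and `target_train` with the same expressions, then bumps only the final price to check that the scaling volatility on earlier rows is unaffected while the label on the second-to-last row does change.

```python
import numpy as np
import pandas as pd


def make_targets(close: pd.Series) -> pd.DataFrame:
    """Next-day return label and its vol-scaled version (hypothetical helper)."""
    out = pd.DataFrame(index=close.index)
    out['target_ret'] = close.pct_change(1).shift(-1)  # label at row t uses close[t+1]
    log_ret = np.log(close / close.shift(1))
    vol_20 = log_ret.rolling(20, min_periods=10).std().replace(0, np.nan)
    out['vol_20'] = vol_20                              # uses returns up to row t only
    out['target_train'] = out['target_ret'] / vol_20
    return out


rng = np.random.default_rng(0)
idx = pd.bdate_range('2024-01-01', periods=60)
close = pd.Series(100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(60))), index=idx)

full = make_targets(close)

# Perturb only the final price and rebuild the targets.
bumped = close.copy()
bumped.iloc[-1] *= 1.10
alt = make_targets(bumped)

last_row_has_label = bool(full['target_ret'].notna().iloc[-1])
vol_unchanged = bool(np.allclose(full['vol_20'].iloc[:-1], alt['vol_20'].iloc[:-1],
                                 equal_nan=True))
label_changed = not np.isclose(full['target_ret'].iloc[-2], alt['target_ret'].iloc[-2])
print(last_row_has_label, vol_unchanged, label_changed)
```

The final row never has a label, which is why `dropna` removes it along with the warmup rows. The same perturbation test could be pointed at any feature column to probe for look-ahead.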

In [6]:
# ============================================================================
# CELL 5: Cutting-Edge Feature Engineering — Groups 15-19
# ============================================================================
# Implements 5 advanced feature groups from cutting-edge quant research:
#
#   Group 15: HMM Regime Detection (3 features)
#             - Baum-Welch EM on returns + volatility
#             - 3-state: bull, bear, sideways
#
#   Group 16: Wavelet Decomposition (4 features)
#             - DWT with Daubechies-4 wavelet
#             - Multi-resolution trend/noise separation
#             Ref: Springer Computational Economics 2025
#
#   Group 17: Information Theory Advanced (4 features)
#             - KL divergence regime shift detector
#             - Sample entropy (regularity)
#             - NMI predictability window
#             - Spectral entropy
#             Ref: arXiv:2511.16339 (Financial Information Theory)
#
#   Group 18: Multifractal DFA (3 features)
#             - Kantelhardt (2002) MF-DFA
#             - Spectrum width, H(2), asymmetry
#
#   Group 19: TDA Persistent Homology (2 features)
#             - Vietoris-Rips persistence on return point clouds
#             - Crash early warning from topological complexity
#             Ref: MDPI Computers 2025
#
# ALL features are CAUSAL (rolling windows, no future data).
# ============================================================================

ADVANCED_FEATURE_COLUMNS = [
    # Group 15: HMM Regime
    'hmm_regime',           # Decoded regime label (0=bear, 1=sideways, 2=bull)
    'hmm_regime_prob',      # Probability of current regime
    'hmm_regime_duration',  # Days in current regime
    # Group 16: Wavelet
    'wavelet_trend_energy',   # Energy in approximation coefficients (low-freq)
    'wavelet_detail_energy',  # Energy in detail coefficients (high-freq)
    'wavelet_snr',            # Signal-to-noise ratio (trend/detail)
    'wavelet_dominant_scale', # Dominant wavelet scale
    # Group 17: Info Theory
    'kl_regime_shift',      # KL divergence between recent vs prior returns
    'sample_entropy',       # Sample entropy (regularity measure)
    'nmi_predictability',   # Normalized Mutual Information (market efficiency)
    'spectral_entropy',     # Spectral entropy of return power spectrum
    # Group 18: Multifractal
    'mf_delta_alpha',       # Multifractal spectrum width
    'mf_hurst2',            # Generalized Hurst exponent H(2)
    'mf_asymmetry',         # Left-right asymmetry of MF spectrum
    # Group 19: TDA
    'tda_persistence_norm',  # L2-norm of persistence landscape (H1)
    'tda_betti_ratio',       # Betti-1 / Betti-0 (loop/component ratio)
]


# ─── Group 15: HMM Regime Detection ─────────────────────────────────────────
def compute_hmm_regime(close: pd.Series, n_states: int = 3,
                       window: int = 252) -> pd.DataFrame:
    """Fit rolling 3-state Gaussian HMM on returns + volatility.

    States are sorted by mean return: 0=bear, 1=sideways, 2=bull.
    CAUSAL: the model is fit on observations [t-window, t) and refit every
    63 observations; the state at t is decoded from [t-window, t].

    Args:
        close: price series
        n_states: number of hidden states (default 3)
        window: training window in days

    Returns: DataFrame with hmm_regime, hmm_regime_prob, hmm_regime_duration
    """
    if not HAS_HMM:
        logger.warning("hmmlearn not installed — HMM features disabled")
        return pd.DataFrame(
            np.nan, index=close.index,
            columns=['hmm_regime', 'hmm_regime_prob', 'hmm_regime_duration']
        )

    log_ret = np.log(close / close.shift(1)).dropna()
    vol_20 = log_ret.rolling(20, min_periods=10).std()

    # Observation matrix: [return, volatility]
    obs = pd.DataFrame({
        'ret': log_ret,
        'vol': vol_20,
    }).dropna()

    regime = np.full(len(close), np.nan)
    regime_prob = np.full(len(close), np.nan)

    refit_interval = 63  # Refit every quarter
    model = None
    state_order = None

    for t in range(window, len(obs)):
        # Refit model periodically
        if model is None or (t - window) % refit_interval == 0:
            train_data = obs.iloc[t - window:t].values
            try:
                model = GaussianHMM(
                    n_components=n_states, covariance_type='full',
                    n_iter=100, random_state=42, verbose=False,
                )
                model.fit(train_data)
                # Sort states by mean return: bear < sideways < bull
                state_order = np.argsort(model.means_[:, 0])
            except Exception:
                model = None
                continue

        if model is None:
            continue

        # Predict current state (causal: uses data up to t)
        try:
            recent = obs.iloc[t - window:t + 1].values
            states = model.predict(recent)
            probs = model.predict_proba(recent)

            current_state_raw = states[-1]
            # Map to sorted order
            current_state = int(np.where(state_order == current_state_raw)[0][0])
            current_prob = float(probs[-1, current_state_raw])

            # Map back to close index
            obs_date = obs.index[t]
            close_idx = close.index.get_loc(obs_date)
            regime[close_idx] = current_state
            regime_prob[close_idx] = current_prob
        except Exception:
            continue

    # Compute regime duration (consecutive days in same regime).
    # Rows without a decoded regime stay NaN, matching hmm_regime.
    dur = np.full(len(close), np.nan)
    count = 0
    prev = np.nan
    for i in range(len(close)):
        if np.isnan(regime[i]):
            count = 0
        else:
            count = count + 1 if regime[i] == prev else 1
            dur[i] = count
        prev = regime[i]

    result = pd.DataFrame(index=close.index)
    result['hmm_regime'] = regime
    result['hmm_regime_prob'] = regime_prob
    result['hmm_regime_duration'] = dur
    return result
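

# --- Illustrative sketch (not part of the original pipeline) ---------------
# The relabeling above sorts raw HMM states by mean return and then looks up
# the position of the decoded state in that ordering. The same mapping can be
# built once as an inverse permutation. The helper name below is hypothetical.
import numpy as np  # repeated here so the sketch runs on its own


def _rank_states_by_mean(mean_returns):
    """Return rank_of_raw where rank_of_raw[s] is the sorted label of raw state s."""
    order = np.argsort(mean_returns)             # order[r] = raw state with rank r
    rank_of_raw = np.empty_like(order)
    rank_of_raw[order] = np.arange(len(order))   # invert the permutation
    return rank_of_raw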


# ─── Group 16: Wavelet Decomposition ────────────────────────────────────────
def compute_wavelet_features(close: pd.Series, wavelet: str = 'db4',
                             level: int = 4, window: int = 63) -> pd.DataFrame:
    """Multi-resolution wavelet decomposition features.

    Uses Discrete Wavelet Transform (DWT) with Daubechies-4 wavelet.
    Decomposes each rolling window of log returns into approximation
    (trend) and detail (noise) coefficients at multiple scales.

    Ref: "Leveraging Wavelet Transform & Deep Learning for Option Price
    Prediction: Insights from the Indian Derivative Market" (2025)

    Args:
        close: price series
        wavelet: wavelet family (default 'db4')
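        level: decomposition level (default 4). With db4 and window=63,
            pywt.dwt_max_level returns 3, so level=4 makes PyWavelets warn
            about boundary effects.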
        window: rolling window for energy computation

    Returns: DataFrame with wavelet energy features
    """
    if not HAS_WAVELET:
        logger.warning("PyWavelets not installed — wavelet features disabled")
        return pd.DataFrame(
            np.nan, index=close.index,
            columns=['wavelet_trend_energy', 'wavelet_detail_energy',
                     'wavelet_snr', 'wavelet_dominant_scale']
        )

    log_ret = np.log(close / close.shift(1)).fillna(0).values
    n = len(log_ret)

    trend_energy = np.full(n, np.nan)
    detail_energy = np.full(n, np.nan)
    snr = np.full(n, np.nan)
    dominant_scale = np.full(n, np.nan)

    for t in range(window, n):
        segment = log_ret[t - window:t]
        try:
            coeffs = pywt.wavedec(segment, wavelet, level=level)
            # coeffs[0] = approximation (trend), coeffs[1:] = details (noise)
            approx_energy = float(np.sum(coeffs[0] ** 2))
            detail_energies = [float(np.sum(c ** 2)) for c in coeffs[1:]]
            total_detail = sum(detail_energies)

            trend_energy[t] = approx_energy
            detail_energy[t] = total_detail
            snr[t] = float(np.log1p(approx_energy / (total_detail + 1e-10)))

            # Dominant scale: index of the detail band with the most energy
            # (0 = coarsest band cD_level, last = finest band cD_1)
            if detail_energies:
                dominant_scale[t] = float(np.argmax(detail_energies))
        except Exception:
            continue

    result = pd.DataFrame(index=close.index)
    result['wavelet_trend_energy'] = _rolling_zscore(trend_energy, 126)
    result['wavelet_detail_energy'] = _rolling_zscore(detail_energy, 126)
    result['wavelet_snr'] = snr
    result['wavelet_dominant_scale'] = dominant_scale
    return result


def _rolling_zscore(arr, window):
    """Z-score an array over a rolling window."""
    s = pd.Series(arr)
    return ((s - s.rolling(window, min_periods=20).mean()) /
            (s.rolling(window, min_periods=20).std() + 1e-8)).values


# ─── Group 17: Information Theory Advanced ───────────────────────────────────
def compute_kl_regime_shift(returns: np.ndarray, window: int = 252,
                            n_bins: int = 50) -> np.ndarray:
    """KL divergence between recent and prior return distributions.

    High KL = distribution has shifted = regime change.
    Ref: arXiv:2511.16339 (Financial Information Theory)

    CAUSAL: compares returns in [t-W, t) with those in [t-2W, t-W).
    """
    kl = np.full(len(returns), np.nan)
    for t in range(2 * window, len(returns)):
        recent = returns[t - window:t]
        prior = returns[t - 2 * window:t - window]

        all_r = np.concatenate([recent, prior])
        bins = np.linspace(all_r.min() - 1e-8, all_r.max() + 1e-8, n_bins + 1)

        p_recent = np.histogram(recent, bins=bins)[0].astype(float) + 1e-10
        p_prior = np.histogram(prior, bins=bins)[0].astype(float) + 1e-10
        p_recent /= p_recent.sum()
        p_prior /= p_prior.sum()

        kl[t] = float(sp_entropy(p_recent, p_prior))

    # Z-score for stationarity
    s = pd.Series(kl)
    z = (s - s.rolling(252, min_periods=50).mean()) / (s.rolling(252, min_periods=50).std() + 1e-8)
    return z.values


def compute_sample_entropy(series: np.ndarray, m: int = 2,
                           r_mult: float = 0.2,
                           window: int = 100) -> np.ndarray:
    """Rolling sample entropy (SampEn) — measures regularity.

    Low SampEn = regular/predictable, High SampEn = random.
    Markets become MORE predictable (lower SampEn) during crises.

    CAUSAL: uses the half-open window [t-W, t) only.
    """
    se = np.full(len(series), np.nan)
    for t in range(window, len(series)):
        x = series[t - window:t]
        r = r_mult * np.std(x, ddof=1)
        if r < 1e-10:
            se[t] = 0.0
            continue

        N = len(x)
        n_templates = N - m  # same template count for both lengths (standard SampEn)

        # Count template pairs (i < j) within Chebyshev distance r
        def _count(m_len):
            if n_templates < 2:
                return 0
            templates = np.array([x[i:i + m_len] for i in range(n_templates)])
            count = 0
            for i in range(len(templates)):
                dists = np.max(np.abs(templates[i] - templates[i+1:]), axis=1)
                count += np.sum(dists < r)
            return count

        B = _count(m)
        A = _count(m + 1)

        if B > 0 and A > 0:
            se[t] = -np.log(A / B)
        elif B > 0:
            se[t] = -np.log(1.0 / B)
        else:
            se[t] = 0.0

    return se


def compute_nmi_predictability(returns: np.ndarray, lag: int = 1,
                                window: int = 252,
                                n_bins: int = 20) -> np.ndarray:
    """Normalized Mutual Information between lagged and current returns.

    High NMI = market temporarily inefficient (exploitable).
    ~78% of the time NMI < 0.05 (EMH holds).

    Ref: arXiv:2511.16339
    """
    nmi = np.full(len(returns), np.nan)
    for t in range(window + lag, len(returns)):
        past = returns[t - window - lag:t - lag]
        future = returns[t - window:t]

        bins = np.linspace(
            min(past.min(), future.min()) - 1e-8,
            max(past.max(), future.max()) + 1e-8,
            n_bins + 1
        )
        past_binned = np.digitize(past, bins)
        future_binned = np.digitize(future, bins)

        # Joint and marginal entropies
        joint_hist = np.histogram2d(past_binned, future_binned,
                                     bins=n_bins)[0] + 1e-10
        joint_hist /= joint_hist.sum()

        p_past = joint_hist.sum(axis=1)
        p_future = joint_hist.sum(axis=0)

        h_past = -np.sum(p_past * np.log(p_past + 1e-10))
        h_future = -np.sum(p_future * np.log(p_future + 1e-10))
        h_joint = -np.sum(joint_hist * np.log(joint_hist + 1e-10))

        mi = h_past + h_future - h_joint
        # Named norm_factor to avoid shadowing scipy.stats.norm used elsewhere
        norm_factor = np.sqrt(h_past * h_future) if h_past > 0 and h_future > 0 else 1.0
        nmi[t] = mi / (norm_factor + 1e-10)

    return nmi


def compute_spectral_entropy(series: np.ndarray,
                              window: int = 63) -> np.ndarray:
    """Spectral entropy of return power spectrum.

    Uniform spectrum (white noise) -> high entropy.
    Concentrated spectrum (strong periodicity) -> low entropy.
    """
    se = np.full(len(series), np.nan)
    for t in range(window, len(series)):
        segment = series[t - window:t]
        # FFT power spectrum
        fft_vals = np.fft.rfft(segment - segment.mean())
        psd = np.abs(fft_vals) ** 2
        psd = psd / (psd.sum() + 1e-10)
        psd = psd[psd > 0]
        se[t] = -np.sum(psd * np.log(psd + 1e-10)) / np.log(len(psd) + 1)
    return se
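

# --- Illustrative sanity check (not part of the original pipeline) ---------
# The docstring above says a flat spectrum gives high entropy and a
# concentrated one gives low entropy. The hypothetical helpers below (names,
# seed and window size are assumptions) apply the same per-window formula to
# Gaussian white noise and to a pure sine wave for comparison.
import numpy as np  # repeated here so the sketch runs on its own


def _spectral_entropy_single(segment):
    """Normalized spectral entropy of one window (same formula as above)."""
    fft_vals = np.fft.rfft(segment - segment.mean())
    psd = np.abs(fft_vals) ** 2
    psd = psd / (psd.sum() + 1e-10)
    psd = psd[psd > 0]
    return float(-np.sum(psd * np.log(psd + 1e-10)) / np.log(len(psd) + 1))


def _spectral_entropy_demo(window: int = 64, seed: int = 0):
    """Return (entropy of white noise, entropy of a sine with period 8)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(window)
    sine = np.sin(2.0 * np.pi * np.arange(window) / 8.0)
    return _spectral_entropy_single(noise), _spectral_entropy_single(sine)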


def compute_info_theory_features(close: pd.Series) -> pd.DataFrame:
    """Compute all Group 17 information-theoretic features."""
    log_ret = np.log(close / close.shift(1)).fillna(0).values

    result = pd.DataFrame(index=close.index)

    print("    [17a] KL divergence regime shift...")
    result['kl_regime_shift'] = compute_kl_regime_shift(log_ret, window=126)

    print("    [17b] Sample entropy...")
    result['sample_entropy'] = compute_sample_entropy(log_ret, window=60)

    print("    [17c] NMI predictability...")
    result['nmi_predictability'] = compute_nmi_predictability(log_ret, window=126)

    print("    [17d] Spectral entropy...")
    result['spectral_entropy'] = compute_spectral_entropy(log_ret, window=63)

    return result


# ─── Group 18: Multifractal DFA ──────────────────────────────────────────────
def compute_mfdfa_features(close: pd.Series,
                           window: int = 252) -> pd.DataFrame:
    """Multifractal Detrended Fluctuation Analysis features.

    Ref: Kantelhardt et al. (2002), PMC 8392555 (2021)

    Computes:
      - delta_alpha: spectrum width (large = complex multifractal)
      - H(2): standard Hurst exponent from q=2
      - asymmetry: left vs right spectrum skew

    CAUSAL: rolling window computation.
    """
    log_ret = np.log(close / close.shift(1)).fillna(0).values
    n = len(log_ret)

    delta_alpha = np.full(n, np.nan)
    hurst2 = np.full(n, np.nan)
    asymmetry = np.full(n, np.nan)

    q_list = np.array([-3, -2, -1, 1, 2, 3, 4, 5], dtype=float)

    for t in range(window, n):
        series = log_ret[t - window:t]
        try:
            result = _mfdfa_single(series, q_list)
            if result is not None:
                delta_alpha[t] = result['delta_alpha']
                hurst2[t] = result['hurst2']
                asymmetry[t] = result['asymmetry']
        except Exception:
            continue

    out = pd.DataFrame(index=close.index)
    out['mf_delta_alpha'] = _rolling_zscore(delta_alpha, 126)
    out['mf_hurst2'] = hurst2
    out['mf_asymmetry'] = asymmetry
    return out


def _mfdfa_single(series, q_list, order=1):
    """MF-DFA for a single window. Returns dict or None."""
    N = len(series)
    if N < 30:
        return None

    # Profile (cumulative sum of mean-subtracted series)
    Y = np.cumsum(series - series.mean())

    # Scales: log-spaced from 8 to N/4
    scales = np.unique(np.logspace(
        np.log10(8), np.log10(max(10, N // 4)), 10
    ).astype(int))
    scales = scales[scales >= 4]

    if len(scales) < 3:
        return None

    Fq = np.zeros((len(q_list), len(scales)))

    for si, s in enumerate(scales):
        n_seg = N // s
        if n_seg < 1:
            continue

        fluct = np.zeros(n_seg)
        t_vec = np.arange(s)

        for v in range(n_seg):
            seg = Y[v * s:(v + 1) * s]
            if len(seg) < s:
                continue
            coeff = np.polyfit(t_vec[:len(seg)], seg, order)
            trend = np.polyval(coeff, t_vec[:len(seg)])
            fluct[v] = np.mean((seg - trend) ** 2)

        fluct = fluct[fluct > 0]
        if len(fluct) == 0:
            continue

        for qi, q in enumerate(q_list):
            if q == 0:
                Fq[qi, si] = np.exp(0.5 * np.mean(np.log(fluct + 1e-20)))
            else:
                Fq[qi, si] = np.mean(fluct ** (q / 2)) ** (1 / q)

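    # Generalized Hurst exponents h(q): slope of log F_q(s) versus log s
    log_s = np.log(scales)
    hq = np.full(len(q_list), np.nan)
    for qi in range(len(q_list)):
        valid = Fq[qi] > 0
        if valid.sum() >= 3:
            hq[qi] = np.polyfit(log_s[valid], np.log(Fq[qi][valid]), 1)[0]

    # If any h(q) could not be estimated (e.g. a flat window), a zero-filled
    # slope would distort the spectrum, so report no result instead
    if np.isnan(hq).any():
        return None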

    # Multifractal spectrum
    tau_q = q_list * hq - 1
    alpha_holder = np.gradient(tau_q, q_list)

    delta_alpha = float(alpha_holder.max() - alpha_holder.min())

    # H(2) = standard Hurst
    idx_q2 = np.argmin(np.abs(q_list - 2))
    hurst2_val = float(hq[idx_q2])

    # Asymmetry: left vs right tail
    mid = len(alpha_holder) // 2
    left_width = alpha_holder[mid] - alpha_holder[0] if mid > 0 else 0
    right_width = alpha_holder[-1] - alpha_holder[mid] if mid < len(alpha_holder) - 1 else 0
    asym = float((left_width - right_width) / (delta_alpha + 1e-10))

    return {'delta_alpha': delta_alpha, 'hurst2': hurst2_val, 'asymmetry': asym}


# ─── Group 19: TDA Persistent Homology ──────────────────────────────────────
def compute_tda_features(close: pd.Series, window: int = 50,
                         embedding_dim: int = 3,
                         embedding_lag: int = 1) -> pd.DataFrame:
    """Topological Data Analysis features via persistent homology.

    Embeds a sliding window of returns into a point cloud using
    Takens' time-delay embedding, then computes Vietoris-Rips
    persistence. Tracks the lifetime of 1-dimensional topological
    features (loops) as crash indicators.

    Ref: MDPI Computers 2025, ACM AMMIC 2025

    CAUSAL: uses returns in [t-W, t) only (the bar at t itself is excluded).
    """
    log_ret = np.log(close / close.shift(1)).fillna(0).values
    n = len(log_ret)

    pers_norm = np.full(n, np.nan)
    betti_ratio = np.full(n, np.nan)

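    # A delay vector with embedding_dim coordinates spaced embedding_lag apart
    # spans (embedding_dim - 1) * embedding_lag + 1 consecutive samples
    embed_size = (embedding_dim - 1) * embedding_lag + 1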

    for t in range(max(window, embed_size + 10), n):
        segment = log_ret[t - window:t]

        try:
            # Takens time-delay embedding
            rows = []
            for i in range(len(segment) - embed_size + 1):
                row = [segment[i + j * embedding_lag]
                       for j in range(embedding_dim)]
                rows.append(row)
            cloud = np.array(rows)

            if len(cloud) < 5:
                continue

            if HAS_TDA:
                # Use giotto-tda for persistence
                vrp = VietorisRipsPersistence(
                    homology_dimensions=[0, 1], max_edge_length=2.0,
                    n_jobs=1
                )
                diagrams = vrp.fit_transform(cloud[np.newaxis, :, :])
                dgm = diagrams[0]

                # H0: connected components, H1: loops
                h0 = dgm[dgm[:, 2] == 0]
                h1 = dgm[dgm[:, 2] == 1]

                # Filter out infinite persistence
                h0_finite = h0[np.isfinite(h0[:, 1])]
                h1_finite = h1[np.isfinite(h1[:, 1])]

                # L2 norm of H1 persistence (loop lifetimes)
                if len(h1_finite) > 0:
                    lifetimes = h1_finite[:, 1] - h1_finite[:, 0]
                    pers_norm[t] = float(np.sqrt(np.sum(lifetimes ** 2)))
                else:
                    pers_norm[t] = 0.0

                # Betti ratio: number of loops / number of components
                n_h0 = max(1, len(h0_finite))
                n_h1 = len(h1_finite)
                betti_ratio[t] = float(n_h1 / n_h0)
            else:
                # Fallback: simple distance-based proxy
                from scipy.spatial.distance import pdist
                dists = pdist(cloud)
                # Approximate topological complexity from distance distribution
                pers_norm[t] = float(np.std(dists))
                betti_ratio[t] = float(np.mean(dists < np.median(dists)))
        except Exception:
            continue

    result = pd.DataFrame(index=close.index)
    result['tda_persistence_norm'] = _rolling_zscore(pers_norm, 126)
    result['tda_betti_ratio'] = betti_ratio
    return result


# ─── Master: Build All Advanced Features ─────────────────────────────────────
def build_advanced_features(df: pd.DataFrame, close: pd.Series,
                            cfg) -> pd.DataFrame:
    """Build all 16 advanced features (Groups 15-19).

    Args:
        df: OHLCV DataFrame
        close: close price Series
        cfg: MonolithConfig

    Returns: DataFrame with ADVANCED_FEATURE_COLUMNS
    """
    t0 = time.time()
    print(f"\n  Building {len(ADVANCED_FEATURE_COLUMNS)} advanced features...")

    result = pd.DataFrame(index=df.index)

    # Group 15: HMM Regime (3 features)
    print("  [15/19] HMM Regime Detection (3 features)")
    hmm_df = compute_hmm_regime(close, n_states=3, window=252)
    for col in ['hmm_regime', 'hmm_regime_prob', 'hmm_regime_duration']:
        result[col] = hmm_df[col] if col in hmm_df.columns else np.nan

    # Group 16: Wavelet Decomposition (4 features)
    print("  [16/19] Wavelet Decomposition (4 features)")
    wav_df = compute_wavelet_features(close, wavelet='db4', level=4, window=63)
    for col in ['wavelet_trend_energy', 'wavelet_detail_energy',
                'wavelet_snr', 'wavelet_dominant_scale']:
        result[col] = wav_df[col] if col in wav_df.columns else np.nan

    # Group 17: Information Theory (4 features)
    print("  [17/19] Information Theory Advanced (4 features)")
    info_df = compute_info_theory_features(close)
    for col in ['kl_regime_shift', 'sample_entropy',
                'nmi_predictability', 'spectral_entropy']:
        result[col] = info_df[col] if col in info_df.columns else np.nan

    # Group 18: Multifractal DFA (3 features)
    print("  [18/19] Multifractal DFA (3 features)")
    mf_df = compute_mfdfa_features(close, window=252)
    for col in ['mf_delta_alpha', 'mf_hurst2', 'mf_asymmetry']:
        result[col] = mf_df[col] if col in mf_df.columns else np.nan

    # Group 19: TDA Persistent Homology (2 features)
    print("  [19/19] TDA Persistent Homology (2 features)")
    tda_df = compute_tda_features(close, window=50, embedding_dim=3)
    for col in ['tda_persistence_norm', 'tda_betti_ratio']:
        result[col] = tda_df[col] if col in tda_df.columns else np.nan

    elapsed = time.time() - t0
    n_valid = result.notna().any(axis=1).sum()
    print(f"\n  Advanced features: {len(ADVANCED_FEATURE_COLUMNS)} features, "
          f"{n_valid} valid days ({elapsed:.1f}s)")

    return result
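
The time-delay embedding step used by `compute_tda_features` can be illustrated in isolation. The sketch below is a dependency-free illustration rather than the notebook's implementation; `takens_embedding` is a hypothetical helper name introduced here. A delay vector with `dim` coordinates spaced `lag` apart spans `(dim - 1) * lag + 1` samples, so a window of length `W` yields `W - (dim - 1) * lag` points in the cloud.

```python
def takens_embedding(x, dim=3, lag=1):
    """Return the list of delay vectors [x[i], x[i+lag], ..., x[i+(dim-1)*lag]].

    Hypothetical helper for illustration only.
    """
    span = (dim - 1) * lag + 1          # samples covered by one delay vector
    n_points = len(x) - span + 1        # number of complete delay vectors
    return [
        [x[i + j * lag] for j in range(dim)]
        for i in range(max(0, n_points))
    ]


# Six samples with dim=3, lag=1 give 6 - 2 = 4 points
cloud = takens_embedding([0, 1, 2, 3, 4, 5], dim=3, lag=1)
print(len(cloud), cloud[0], cloud[-1])  # 4 [0, 1, 2] [3, 4, 5]
```

With `window=50`, `embedding_dim=3` and `embedding_lag=1`, as passed in `build_advanced_features`, this gives a cloud of 48 points in three dimensions per time step, which is the input to the Vietoris-Rips computation.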


In [7]:
# ============================================================================
# CELL 4: AFML EVENT-DRIVEN PIPELINE (Lopez de Prado, 2018)
# ============================================================================
# Implements: CUSUM filter, Triple Barrier Labels, Meta-Labeling, Bet Sizing
# Reference: Advances in Financial Machine Learning, Chapters 2-3, 10
# ============================================================================

from scipy.stats import norm

def get_daily_vol(close, span=50):
    """EWM standard deviation of log returns.
    
    Args:
        close: pd.Series of prices indexed by datetime.
        span: int, span for exponential weighted moving average.
    
    Returns:
        pd.Series of daily volatility estimates.
    """
    log_ret = np.log(close).diff().dropna()
    return log_ret.ewm(span=span, min_periods=max(1, span // 2)).std()


def cusum_filter(close, threshold):
    """Symmetric CUSUM filter for event detection.
    
    Detects structural breaks by tracking positive and negative cumulative
    sums of log returns. When either sum exceeds the threshold, an event
    is recorded and the cumulative sum resets to zero.
    
    Args:
        close: pd.Series of prices indexed by datetime.
        threshold: float or pd.Series. If Series, must share close's index.
                   Typical usage: daily_vol * multiplier (e.g., 2.0).
    
    Returns:
        pd.DatetimeIndex of event timestamps where structural breaks detected.
    """
    log_ret = np.log(close).diff().dropna()
    events = []
    s_pos = 0.0
    s_neg = 0.0
    
    # Convert scalar threshold to Series for uniform handling
    if np.isscalar(threshold):  # also accepts NumPy scalars such as np.int64
        thresh_series = pd.Series(threshold, index=log_ret.index)
    else:
        thresh_series = threshold.reindex(log_ret.index, method='ffill')
    
    for t, r in log_ret.items():
        h = thresh_series.loc[t]
        if np.isnan(h) or np.isnan(r):
            continue
        
        # Update cumulative sums (reset floor at zero)
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        
        # Check if either sum breaches threshold
        if s_pos > h:
            events.append(t)
            s_pos = 0.0
            s_neg = 0.0
        elif s_neg < -h:
            events.append(t)
            s_pos = 0.0
            s_neg = 0.0
    
    return pd.DatetimeIndex(events)


def triple_barrier_labels(close, events, pt_sl=(2.0, 1.0), num_days=10, min_ret=0.0):
    """Triple barrier labeling with volatility-scaled barriers.
    
    For each event, sets three barriers:
      - Upper (profit-take): pt_sl[0] * daily_vol above entry
      - Lower (stop-loss):   pt_sl[1] * daily_vol below entry
      - Vertical:            num_days forward (max holding period)
    
    The label is determined by which barrier is touched first.
    
    Args:
        close: pd.Series of prices indexed by datetime.
        events: pd.DatetimeIndex of event timestamps (from cusum_filter).
        pt_sl: tuple (profit_take_mult, stop_loss_mult) of daily vol.
               Set either to 0 to disable that barrier.
        num_days: int, maximum holding period in trading days.
        min_ret: float, minimum absolute log return for a non-zero label
                 when the vertical barrier is hit first.
    
    Returns:
        pd.DataFrame indexed by event timestamps with columns:
            'ret':   realized log return at exit
            'bin':   label (+1 profit-take, -1 stop-loss; on a vertical-barrier
                     exit, sign(ret) if |ret| > min_ret, else 0)
            't_end': timestamp when position was closed
    """
    daily_vol = get_daily_vol(close, span=50)
    
    results = []
    
    for t0 in events:
        if t0 not in close.index or t0 not in daily_vol.index:
            continue
        
        vol = daily_vol.loc[t0]
        if np.isnan(vol) or vol <= 0:
            continue
        
        p0 = close.loc[t0]
        
        # Define barrier levels
        upper_barrier = pt_sl[0] * vol if pt_sl[0] > 0 else np.inf
        lower_barrier = -pt_sl[1] * vol if pt_sl[1] > 0 else -np.inf
        
        # Get the forward price path from t0 up to t0 + num_days bars
        t0_idx = close.index.get_loc(t0)
        t_end_idx = min(t0_idx + num_days, len(close) - 1)
        path = close.iloc[t0_idx: t_end_idx + 1]
        
        if len(path) < 2:
            continue
        
        # Log returns relative to entry
        log_returns = np.log(path / p0)
        
        # Find first crossing of each barrier
        hit_upper = log_returns[log_returns >= upper_barrier].index
        hit_lower = log_returns[log_returns <= lower_barrier].index
        
        # Determine first touch times (use NaT for unhit barriers)
        t_upper = hit_upper[0] if len(hit_upper) > 0 else pd.NaT
        t_lower = hit_lower[0] if len(hit_lower) > 0 else pd.NaT
        t_vert = path.index[-1]  # vertical barrier always exists
        
        # Find earliest barrier touch
        candidates = {}
        if not pd.isna(t_upper):
            candidates[t_upper] = 'upper'
        if not pd.isna(t_lower):
            candidates[t_lower] = 'lower'
        candidates[t_vert] = 'vertical'
        
        t_end = min(candidates.keys())
        barrier_type = candidates[t_end]
        
        # Realized log return at exit
        ret = np.log(close.loc[t_end] / p0)
        
        # Assign label
        if barrier_type == 'upper':
            label = 1
        elif barrier_type == 'lower':
            label = -1
        else:
            # Vertical barrier: label based on return direction if above min_ret
            if abs(ret) > min_ret:
                label = int(np.sign(ret))
            else:
                label = 0
        
        results.append({'t0': t0, 'ret': ret, 'bin': label, 't_end': t_end})
    
    if len(results) == 0:
        return pd.DataFrame(columns=['ret', 'bin', 't_end'])
    
    df = pd.DataFrame(results).set_index('t0')
    df.index.name = None
    return df


def meta_labeling(primary_preds, true_labels):
    """Generate meta-labels: 1 if primary model correctly predicted direction.
    
    Meta-labeling separates side (direction) from size (confidence).
    The primary model decides direction; the meta-model decides whether
    to take the trade and how large to size it.
    
    Args:
        primary_preds: pd.Series of predicted directions (+1/-1).
        true_labels: pd.Series of actual directions (+1/-1/0).
    
    Returns:
        pd.Series of 0/1 meta-labels. 1 = primary model was correct.
    """
    # Align indices
    common_idx = primary_preds.index.intersection(true_labels.index)
    preds = primary_preds.loc[common_idx]
    actuals = true_labels.loc[common_idx]
    
    # Meta-label: 1 if signs agree (correct prediction), 0 otherwise
    meta = (np.sign(preds) == np.sign(actuals)).astype(int)
    
    # A zero true label (vertical-barrier exit within min_ret) means the
    # trade had no edge, so the meta-label is 0 regardless of the prediction
    meta[actuals == 0] = 0
    
    return meta


def bet_size(meta_probs, max_leverage=1.0):
    """Position sizing from meta-model probabilities (AFML bet sizing).
    
    Converts P(correct) from the meta-model into a continuous position
    size. The probability is first standardized into a z-statistic,
    z = (p - 0.5) / sqrt(p * (1 - p)), and then mapped to [-1, 1]
    through the normal CDF: size = 2 * Phi(z) - 1.
    
    Args:
        meta_probs: pd.Series of P(correct) from meta-model, in [0, 1].
        max_leverage: float, maximum absolute position size.
    
    Returns:
        pd.Series of position sizes in [-max_leverage, max_leverage].
        Values near 0.5 produce small sizes; values near 0 or 1 produce
        sizes approaching max_leverage in magnitude.
    """
    # Clip so that p * (1 - p) stays strictly positive
    clipped = meta_probs.clip(1e-5, 1.0 - 1e-5)
    
    # Test statistic for H0: p = 0.5 (binary outcome).
    # Note: norm.cdf(norm.ppf(p)) is just p, so a probit round-trip would
    # collapse the sizing rule to the linear map 2p - 1.
    z = (clipped - 0.5) / np.sqrt(clipped * (1.0 - clipped))
    
    # size = 2 * Phi(z) - 1 maps z to [-1, 1]
    # z > 0 (prob > 0.5) -> positive size, z < 0 (prob < 0.5) -> negative size
    size = pd.Series((2.0 * norm.cdf(z.values) - 1.0) * max_leverage,
                     index=clipped.index)
    
    return size


print("=" * 70)
print("AFML EVENT-DRIVEN PIPELINE")
print("  Functions defined: get_daily_vol, cusum_filter, triple_barrier_labels,")
print("                     meta_labeling, bet_size")
print("  These will be applied during walk-forward training in Cell 6.")
print("=" * 70)

AFML EVENT-DRIVEN PIPELINE
  Functions defined: get_daily_vol, cusum_filter, triple_barrier_labels,
                     meta_labeling, bet_size
  These will be applied during walk-forward training in Cell 6.
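
The sizing rule in the cell above can be checked by hand with a small standalone sketch. It assumes the bet-sizing statistic from AFML, z = (p - 0.5) / sqrt(p(1 - p)), followed by size = 2Φ(z) - 1; the names `normal_cdf` and `afml_bet_size` are hypothetical and exist only for this illustration. The standard library's `math.erf` stands in for `scipy.stats.norm.cdf` so the block runs without extra dependencies.

```python
import math


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def afml_bet_size(p, max_leverage=1.0, eps=1e-5):
    """Map a probability of being correct to a signed position size.

    Hypothetical helper: standardize p against the null p = 0.5,
    then squash the z-statistic into [-1, 1] with the normal CDF.
    """
    p = min(max(p, eps), 1.0 - eps)           # keep p * (1 - p) > 0
    z = (p - 0.5) / math.sqrt(p * (1.0 - p))  # z-statistic for a binary outcome
    return max_leverage * (2.0 * normal_cdf(z) - 1.0)


# Compare with the linear rule 2p - 1 for a few probabilities
for p in (0.5, 0.6, 0.75, 0.9):
    print(f"p={p:.2f}  afml={afml_bet_size(p):+.4f}  linear={2 * p - 1:+.4f}")
```

The rule is antisymmetric around p = 0.5, so a probability of 0.1 yields the same magnitude as 0.9 with the opposite sign. In a meta-labeling setup the sign of the trade normally comes from the primary model, so it is an assumption of this sketch that negative sizes are acceptable to the caller.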


In [8]:
# ============================================================================
# CELL 5: MOMENTUM TRANSFORMER (Temporal Fusion Transformer Architecture)
# ============================================================================
# Implements the full TFT architecture from Lim et al. (2021) adapted for
# momentum signal generation with Sharpe ratio loss.
# All layers are serializable with proper get_config() methods.
# ============================================================================

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class GluLayer(layers.Layer):
    """Gated Linear Unit: splits transformation into value and gate streams.
    
    GLU(x) = Dense_value(x) * sigmoid(Dense_gate(x))
    
    The gate learns which components of the transformation to pass through,
    providing a learnable skip-like mechanism at the feature level.
    """
    
    def __init__(self, hidden_size, dropout_rate=None, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.dropout_rate = dropout_rate
        self.activation_name = activation
        
        # Value stream: optional activation (e.g., ELU for GRN intermediate)
        self.dense_value = layers.Dense(hidden_size, activation=activation)
        # Gate stream: always sigmoid for gating
        self.dense_gate = layers.Dense(hidden_size, activation='sigmoid')
        
        self.dropout_layer = None
        if dropout_rate is not None and dropout_rate > 0:
            self.dropout_layer = layers.Dropout(dropout_rate)
    
    def call(self, inputs):
        value = self.dense_value(inputs)
        gate = self.dense_gate(inputs)
        
        if self.dropout_layer is not None:
            value = self.dropout_layer(value)
        
        glu_output = value * gate
        return glu_output, gate
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'hidden_size': self.hidden_size,
            'dropout_rate': self.dropout_rate,
            'activation': self.activation_name,
        })
        return config


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class GatedResidualNetwork(layers.Layer):
    """Gated Residual Network: the core building block of TFT.
    
    Architecture:
        eta_1 = Dense(hidden_size)(input)             [+ Dense(hidden_size)(context) if context]
        eta_2 = ELU(eta_1)
        eta_1_prime = Dense(output_size)(eta_2)
        glu_output = GLU(eta_1_prime)
        output = LayerNorm(input_skip + glu_output)
    
    When output_size != input_size, a skip projection is applied to the input.
    """
    
    def __init__(self, hidden_size, output_size=None, dropout_rate=None,
                 context_size=None, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.output_size = output_size or hidden_size
        self.dropout_rate = dropout_rate
        self.context_size = context_size
        
        # Primary pathway
        self.dense1 = layers.Dense(hidden_size)
        self.dense2 = layers.Dense(self.output_size)
        self.glu = GluLayer(self.output_size, dropout_rate=dropout_rate)
        self.layer_norm = layers.LayerNormalization()
        
        # Optional context injection
        self.context_dense = None
        if context_size is not None:
            self.context_dense = layers.Dense(hidden_size, use_bias=False)
        
        # Skip projection (created dynamically if needed)
        self._skip_layer = None
        self._skip_built = False
    
    def call(self, inputs, context=None, return_gate=False):
        # Build skip projection on first call if input dim != output_size
        if not self._skip_built:
            input_dim = inputs.shape[-1]
            if input_dim is not None and input_dim != self.output_size:
                self._skip_layer = layers.Dense(self.output_size)
            self._skip_built = True
        
        # Skip connection
        if self._skip_layer is not None:
            skip = self._skip_layer(inputs)
        else:
            skip = inputs
        
        # Primary pathway
        eta_1 = self.dense1(inputs)
        
        # Context injection (additive)
        if context is not None and self.context_dense is not None:
            eta_1 = eta_1 + self.context_dense(context)
        
        # ELU activation (using tf.nn.elu as specified)
        eta_2 = tf.nn.elu(eta_1)
        
        # Second dense
        eta_1_prime = self.dense2(eta_2)
        
        # GLU gating
        glu_output, gate = self.glu(eta_1_prime)
        
        # Residual connection + layer norm
        output = self.layer_norm(skip + glu_output)
        
        if return_gate:
            return output, gate
        return output
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'hidden_size': self.hidden_size,
            'output_size': self.output_size,
            'dropout_rate': self.dropout_rate,
            'context_size': self.context_size,
        })
        return config


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class VariableSelectionNetwork(layers.Layer):
    """Variable Selection Network: soft feature importance via learned weights.
    
    Each input feature is processed by its own GRN, then a selection GRN
    produces softmax weights over features. The output is the weighted
    combination of per-feature GRN outputs.
    
    This is the TFT's primary interpretability mechanism -- the softmax
    weights directly indicate feature importance at each timestep.
    
    Input shape:  (batch, time, num_inputs, hidden_size) -- after embedding
    Output shape: (batch, time, hidden_size)
    Weights shape: (batch, time, num_inputs, 1)
    """
    
    def __init__(self, num_inputs, hidden_size, dropout_rate=None,
                 context_size=None, **kwargs):
        super().__init__(**kwargs)
        self.num_inputs = num_inputs
        self.hidden_size = hidden_size
        self.dropout_rate = dropout_rate
        self.context_size = context_size
        
        # Per-feature GRNs: each processes one feature independently
        self.feature_grns = [
            GatedResidualNetwork(
                hidden_size=hidden_size,
                output_size=hidden_size,
                dropout_rate=dropout_rate,
                name=f"feature_grn_{i}"
            )
            for i in range(num_inputs)
        ]
        
        # Selection GRN: produces weights over features
        # Input is flattened features: (batch, time, num_inputs * hidden_size)
        # Output: (batch, time, num_inputs) -> softmax -> weights
        self.selection_grn = GatedResidualNetwork(
            hidden_size=hidden_size,
            output_size=num_inputs,
            dropout_rate=dropout_rate,
            context_size=context_size,
            name="selection_grn"
        )
    
    def call(self, inputs, context=None):
        # inputs: (batch, time, num_inputs, hidden_size)
        batch_size = tf.shape(inputs)[0]
        time_steps = tf.shape(inputs)[1]
        
        # Flatten for selection GRN: (batch, time, num_inputs * hidden_size)
        flattened = tf.reshape(inputs, [batch_size, time_steps,
                                        self.num_inputs * self.hidden_size])
        
        # Selection weights via GRN + softmax
        selection_output = self.selection_grn(flattened, context=context)
        # selection_output: (batch, time, num_inputs)
        weights = tf.nn.softmax(selection_output, axis=-1)
        # weights: (batch, time, num_inputs)
        weights_expanded = tf.expand_dims(weights, axis=-1)
        # weights_expanded: (batch, time, num_inputs, 1)
        
        # Process each feature through its own GRN
        processed_features = []
        for i in range(self.num_inputs):
            # Extract feature i: (batch, time, hidden_size)
            feat_i = inputs[:, :, i, :]
            # Apply per-feature GRN
            grn_out = self.feature_grns[i](feat_i)
            processed_features.append(grn_out)
        
        # Stack: (batch, time, num_inputs, hidden_size)
        stacked = tf.stack(processed_features, axis=2)
        
        # Weighted combination: (batch, time, hidden_size)
        selected = tf.reduce_sum(stacked * weights_expanded, axis=2)
        
        return selected, weights_expanded
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'num_inputs': self.num_inputs,
            'hidden_size': self.hidden_size,
            'dropout_rate': self.dropout_rate,
            'context_size': self.context_size,
        })
        return config


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class ScaledDotProductAttention(layers.Layer):
    """Standard scaled dot-product attention.
    
    Attention(Q, K, V) = softmax(Q @ K^T / sqrt(d_k)) @ V
    
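    Supports optional causal masking (lower-triangular) to prevent
    attending to future positions. The mask uses 1 for positions that may
    be attended to and 0 for masked positions, and it must broadcast
    against the (batch, ..., seq_q, seq_k) score tensor.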
    """
    
    def __init__(self, dropout_rate=0.0, **kwargs):
        super().__init__(**kwargs)
        self.dropout_rate = dropout_rate
        self.dropout_layer = layers.Dropout(dropout_rate) if dropout_rate > 0 else None
    
    def call(self, q, k, v, mask=None):
        # q, k, v: (batch, ..., seq_len, d_k)
        d_k = tf.cast(tf.shape(k)[-1], tf.float32)
        
        # Scaled dot product: (batch, ..., seq_q, seq_k)
        scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
        
        # Apply mask (e.g., causal mask): masked positions get -1e9
        if mask is not None:
            scores = scores + (1.0 - mask) * (-1e9)
        
        # Softmax over keys
        attention_weights = tf.nn.softmax(scores, axis=-1)
        
        if self.dropout_layer is not None:
            attention_weights = self.dropout_layer(attention_weights)
        
        # Weighted sum of values
        output = tf.matmul(attention_weights, v)
        
        return output, attention_weights
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'dropout_rate': self.dropout_rate,
        })
        return config


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class InterpretableMultiHeadAttention(layers.Layer):
    """Interpretable Multi-Head Attention from TFT paper.
    
    Key differences from standard Transformer MHA:
    1. All heads SHARE the same value projection (W_v)
    2. Head outputs are AVERAGED, not concatenated
    
    This design enables direct interpretation of attention patterns
    because each head attends to the same value representation.
    Per-head attention weights are returned alongside the output so they
    can be inspected individually or averaged across heads to understand
    what temporal patterns the model has learned.
    """
    
    def __init__(self, num_heads, d_model, dropout_rate=0.0, **kwargs):
        super().__init__(**kwargs)
        self.num_heads = num_heads
        self.d_model = d_model
        self.dropout_rate = dropout_rate
        self.d_head = d_model // num_heads
        
        # Per-head Q and K projections
        self.W_q = [layers.Dense(self.d_head, use_bias=False, name=f"W_q_{i}")
                     for i in range(num_heads)]
        self.W_k = [layers.Dense(self.d_head, use_bias=False, name=f"W_k_{i}")
                     for i in range(num_heads)]
        
        # SHARED value projection across all heads
        self.W_v = layers.Dense(self.d_head, use_bias=False, name="W_v_shared")
        
        # Output projection
        self.W_o = layers.Dense(d_model, name="W_o")
        
        # Scaled dot-product attention
        self.attention = ScaledDotProductAttention(dropout_rate=dropout_rate)
    
    def call(self, q, k, v, mask=None):
        # Shared value projection: (batch, seq, d_head)
        v_shared = self.W_v(v)
        
        head_outputs = []
        head_attentions = []
        
        for i in range(self.num_heads):
            # Per-head query and key projections
            q_i = self.W_q[i](q)  # (batch, seq_q, d_head)
            k_i = self.W_k[i](k)  # (batch, seq_k, d_head)
            
            # Attention with shared values
            attn_output, attn_weights = self.attention(q_i, k_i, v_shared, mask=mask)
            head_outputs.append(attn_output)
            head_attentions.append(attn_weights)
        
        # AVERAGE head outputs (not concatenate) -- key TFT design choice
        # Stack: (num_heads, batch, seq, d_head)
        stacked_outputs = tf.stack(head_outputs, axis=0)
        averaged = tf.reduce_mean(stacked_outputs, axis=0)  # (batch, seq, d_head)
        
        # Output projection back to d_model
        output = self.W_o(averaged)  # (batch, seq, d_model)
        
        # Stack attention weights for interpretability: (batch, num_heads, seq_q, seq_k)
        stacked_attentions = tf.stack(head_attentions, axis=1)
        
        return output, stacked_attentions
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'num_heads': self.num_heads,
            'd_model': self.d_model,
            'dropout_rate': self.dropout_rate,
        })
        return config


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class MomentumTransformer(Model):
    """Complete Temporal Fusion Transformer for momentum signal generation.
    
    Architecture flow:
    1. Feature Embedding: per-feature Dense projections to hidden_size
    2. Variable Selection: learned soft feature importance via VSN
    3. LSTM Encoder: capture temporal dependencies
    4. Post-LSTM: GRN + GLU gate + skip + LayerNorm
    5. Interpretable Multi-Head Self-Attention with causal mask
    6. Post-Attention: GRN + GLU gate + skip + LayerNorm
    7. Output: Dense(tanh) -> signal in [-1, 1]
    
    The model is fully interpretable: VSN weights show feature importance,
    attention weights show temporal dependencies.
    """
    
    def __init__(self, time_steps, input_size, output_size, hidden_size,
                 num_heads, dropout_rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.time_steps = time_steps
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.dropout_rate = dropout_rate
        
        # 1. Per-feature embedding layers
        self.feature_embeddings = [
            layers.Dense(hidden_size, name=f"feat_embed_{i}")
            for i in range(input_size)
        ]
        
        # 2. Variable Selection Network
        self.vsn = VariableSelectionNetwork(
            num_inputs=input_size,
            hidden_size=hidden_size,
            dropout_rate=dropout_rate,
            name="vsn"
        )
        
        # 3. LSTM Encoder
        self.lstm = layers.LSTM(
            hidden_size,
            return_sequences=True,
            dropout=dropout_rate if dropout_rate else 0.0,
            name="lstm_encoder"
        )
        
        # 4. Post-LSTM processing
        self.post_lstm_grn = GatedResidualNetwork(
            hidden_size=hidden_size,
            dropout_rate=dropout_rate,
            name="post_lstm_grn"
        )
        self.post_lstm_glu = GluLayer(
            hidden_size=hidden_size,
            dropout_rate=dropout_rate,
            name="post_lstm_glu"
        )
        self.post_lstm_norm = layers.LayerNormalization(name="post_lstm_norm")
        
        # 5. Interpretable Multi-Head Self-Attention
        self.mha = InterpretableMultiHeadAttention(
            num_heads=num_heads,
            d_model=hidden_size,
            dropout_rate=dropout_rate,
            name="interpretable_mha"
        )
        
        # 6. Post-attention processing
        self.post_attn_grn = GatedResidualNetwork(
            hidden_size=hidden_size,
            dropout_rate=dropout_rate,
            name="post_attn_grn"
        )
        self.post_attn_glu = GluLayer(
            hidden_size=hidden_size,
            dropout_rate=dropout_rate,
            name="post_attn_glu"
        )
        self.post_attn_norm = layers.LayerNormalization(name="post_attn_norm")
        
        # 7. Output layer
        self.output_dense = layers.Dense(
            output_size,
            activation='tanh',
            name="output_signal"
        )
    
    def call(self, x, return_weights=False):
        # x shape: (batch, time_steps, input_size)
        
        # --- 1. Feature Embedding ---
        # Project each feature to hidden_size independently
        embedded = [self.feature_embeddings[i](x[:, :, i:i+1])
                     for i in range(self.input_size)]
        # Stack: (batch, time, num_features, hidden_size)
        embedded = tf.stack(embedded, axis=2)
        
        # --- 2. Variable Selection ---
        vsn_output, vsn_weights = self.vsn(embedded)
        # vsn_output: (batch, time, hidden_size)
        
        # --- 3. LSTM Encoder ---
        lstm_output = self.lstm(vsn_output)
        # lstm_output: (batch, time, hidden_size)
        
        # --- 4. Post-LSTM: GRN + GLU gate + residual + LayerNorm ---
        post_lstm = self.post_lstm_grn(lstm_output)
        post_lstm_gated, _ = self.post_lstm_glu(post_lstm)
        post_lstm_out = self.post_lstm_norm(vsn_output + post_lstm_gated)
        
        # --- 5. Causal Self-Attention ---
        seq_len = tf.shape(post_lstm_out)[1]
        causal_mask = tf.linalg.band_part(
            tf.ones([seq_len, seq_len]), -1, 0
        )  # Lower triangular: position i can attend to positions <= i
        causal_mask = tf.expand_dims(tf.expand_dims(causal_mask, 0), 0)
        # (1, 1, seq, seq) -- broadcasts over batch and heads
        
        attn_output, attn_weights = self.mha(
            post_lstm_out, post_lstm_out, post_lstm_out, mask=causal_mask
        )
        
        # --- 6. Post-Attention: GRN + GLU gate + residual + LayerNorm ---
        post_attn = self.post_attn_grn(attn_output)
        post_attn_gated, _ = self.post_attn_glu(post_attn)
        post_attn_out = self.post_attn_norm(post_lstm_out + post_attn_gated)
        
        # --- 7. Output signal ---
        signal = self.output_dense(post_attn_out)
        # signal: (batch, time, output_size) with values in [-1, 1]
        
        if return_weights:
            return signal, {
                'vsn_weights': vsn_weights,   # (batch, time, num_inputs, 1)
                'attn_weights': attn_weights,  # (batch, num_heads, seq, seq)
            }
        return signal
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'time_steps': self.time_steps,
            'input_size': self.input_size,
            'output_size': self.output_size,
            'hidden_size': self.hidden_size,
            'num_heads': self.num_heads,
            'dropout_rate': self.dropout_rate,
        })
        return config


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class SharpeLoss(tf.keras.losses.Loss):
    """Negative annualized Sharpe ratio as differentiable loss function.
    
    loss = -(mean(signal * return) / std(signal * return)) * sqrt(252)
    
    Uses ddof=1 (Bessel's correction) for unbiased sample standard deviation.
    The model learns to produce signals that maximize risk-adjusted returns
    directly, rather than optimizing a proxy like MSE.
    """
    
    def __init__(self, output_size=1, **kwargs):
        super().__init__(**kwargs)
        self.output_size = output_size
    
    def call(self, y_true, y_pred):
        # y_true: actual returns, y_pred: predicted signals
        # Both: (batch, time, 1) or (batch, time)
        strategy_returns = y_pred * y_true
        
        # Flatten to compute portfolio-level Sharpe
        strategy_returns = tf.reshape(strategy_returns, [-1])
        
        mean_ret = tf.reduce_mean(strategy_returns)
        n = tf.cast(tf.size(strategy_returns), tf.float32)
        
        # Unbiased variance with ddof=1: sum((x - mean)^2) / (n - 1)
        # (denominator floored at 1 so a single-element input cannot divide by zero)
        var = tf.reduce_sum(tf.square(strategy_returns - mean_ret)) / tf.maximum(n - 1.0, 1.0)
        std = tf.sqrt(var + 1e-9)  # Small epsilon for numerical stability
        
        # Negative annualized Sharpe (negative because we minimize loss)
        sharpe = (mean_ret / std) * tf.sqrt(252.0)
        return -sharpe
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'output_size': self.output_size,
        })
        return config


# --- Build and verify the model ---
print("=" * 70)
print("MOMENTUM TRANSFORMER (TFT Architecture)")
print("=" * 70)

# Configuration
TIME_STEPS = 20
INPUT_SIZE = 8    # Number of features
OUTPUT_SIZE = 1   # Single signal output
HIDDEN_SIZE = 32  # Hidden dimension
NUM_HEADS = 4     # Attention heads
DROPOUT_RATE = 0.1

# Build model
model = MomentumTransformer(
    time_steps=TIME_STEPS,
    input_size=INPUT_SIZE,
    output_size=OUTPUT_SIZE,
    hidden_size=HIDDEN_SIZE,
    num_heads=NUM_HEADS,
    dropout_rate=DROPOUT_RATE,
)

# Test forward pass with dummy data
dummy_x = tf.random.normal([16, TIME_STEPS, INPUT_SIZE])
dummy_y = tf.random.normal([16, TIME_STEPS, OUTPUT_SIZE])

# Forward pass with weights
signal, weights = model(dummy_x, return_weights=True)
print(f"\nModel architecture verified:")
print(f"  Input shape:       {dummy_x.shape}")
print(f"  Signal shape:      {signal.shape}")
print(f"  VSN weights shape: {weights['vsn_weights'].shape}")
print(f"  Attn weights shape:{weights['attn_weights'].shape}")
print(f"  Signal range:      [{tf.reduce_min(signal):.4f}, {tf.reduce_max(signal):.4f}]")

# Verify causal masking: attention at position i should have zero weight for j > i
attn = weights['attn_weights'][0, 0].numpy()  # First sample, first head
upper_triangle_sum = np.triu(attn, k=1).sum()
print(f"\n  Causal mask check (upper triangle sum): {upper_triangle_sum:.10f}")
assert upper_triangle_sum < 1e-6, "Causal masking is broken!"
print(f"  Causal masking: VERIFIED")

# Test Sharpe loss
loss_fn = SharpeLoss(output_size=OUTPUT_SIZE)
loss_val = loss_fn(dummy_y, signal)
print(f"\n  Sharpe loss value: {loss_val:.4f}")
print(f"  (Negative = model has positive Sharpe)")

# Compile and do one training step to verify gradients flow
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=SharpeLoss(output_size=OUTPUT_SIZE)
)

# Single training step
history = model.fit(dummy_x, dummy_y, epochs=1, batch_size=16, verbose=0)
print(f"  Training step loss: {history.history['loss'][0]:.4f}")

# Count parameters
total_params = model.count_params()
print(f"\n  Total parameters: {total_params:,}")

# VSN feature importance (interpretability demo)
vsn_w = weights['vsn_weights'].numpy()
mean_importance = vsn_w.mean(axis=(0, 1)).flatten()
feature_names = [f"feat_{i}" for i in range(INPUT_SIZE)]
importance_order = np.argsort(-mean_importance)
print(f"\n  Feature importance (VSN weights, descending):")
for rank, idx in enumerate(importance_order[:5]):
    print(f"    {rank+1}. {feature_names[idx]}: {mean_importance[idx]:.4f}")

# Serialization verification
config = model.get_config()
print(f"\n  Serialization config keys: {sorted(config.keys())}")
print(f"  Config: time_steps={config.get('time_steps')}, "
      f"input_size={config.get('input_size')}, "
      f"hidden_size={config.get('hidden_size')}, "
      f"num_heads={config.get('num_heads')}")

print("\n" + "=" * 70)
print("Momentum Transformer built, tested, and verified.")
print("=" * 70)

# NOTE: Enhanced Multi-Aspect Attention (EMAT) and Multi-Objective Loss
# are defined in the next cell. The MomentumTransformer above can be used
# as-is, or replaced with the EMAT-enhanced version for v3 training.


MOMENTUM TRANSFORMER (TFT Architecture)





Model architecture verified:
  Input shape:       (16, 20, 8)
  Signal shape:      (1, 16, 20, 1)
  VSN weights shape: (16, 20, 8, 1)
  Attn weights shape:(1, 4, 16, 20, 20)
  Signal range:      [-0.9993, 0.9996]

  Causal mask check (upper triangle sum): 0.0000000000
  Causal masking: VERIFIED

  Sharpe loss value: 1.3554
  (Negative = model has positive Sharpe)


  Training step loss: 1.3554

  Total parameters: 69,393

  Feature importance (VSN weights, descending):
    1. feat_6: 0.1458
    2. feat_0: 0.1402
    3. feat_7: 0.1305
    4. feat_5: 0.1289
    5. feat_3: 0.1229

  Serialization config keys: ['dropout_rate', 'dtype', 'hidden_size', 'input_size', 'name', 'num_heads', 'output_size', 'time_steps', 'trainable']
  Config: time_steps=20, input_size=8, hidden_size=32, num_heads=4

Momentum Transformer built, tested, and verified.
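
The `SharpeLoss` formula used in the cell above can be checked by hand with a small pure-Python sketch. The function name `negative_annualized_sharpe` and the toy inputs are illustrative assumptions, not part of the notebook. Only the formula is taken from the class: mean over sample standard deviation (ddof=1), scaled by sqrt(252), then negated.

```python
import math
import statistics


def negative_annualized_sharpe(signals, returns, periods_per_year=252, eps=1e-9):
    """Pure-Python mirror of the SharpeLoss formula (hypothetical helper)."""
    # Strategy return per step: position signal times realized return
    strategy = [s * r for s, r in zip(signals, returns)]
    mean_ret = statistics.fmean(strategy)
    # statistics.variance divides by n - 1 (Bessel's correction, ddof=1)
    std = math.sqrt(statistics.variance(strategy) + eps)
    # Negated so that minimizing the loss maximizes the Sharpe ratio
    return -(mean_ret / std) * math.sqrt(periods_per_year)


# Toy data: the signals agree with the return sign on three of four steps
signals = [1.0, -1.0, 1.0, 1.0]
returns = [0.01, -0.02, 0.005, -0.01]
loss = negative_annualized_sharpe(signals, returns)
print(f"loss = {loss:.4f}")
```

Flipping the sign of every signal negates the mean and leaves the variance unchanged, so the loss changes sign. This is a quick sanity check that a negative loss corresponds to a positive Sharpe ratio.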


In [9]:
# ============================================================================
# Enhanced Multi-Aspect Attention (EMAT) & Multi-Objective Loss
# ============================================================================
# Replaces standard Interpretable Multi-Head Attention with 3 specialized
# financial attention heads:
#   1. Temporal Decay Attention: exponential decay weighting (recent > old)
#   2. Trend Attention: attention over differenced (momentum) features
#   3. Volatility Attention: attention over rolling variance features
# Only the Temporal Decay head is defined in this cell.
#
# Multi-Objective Loss: Sharpe + Volatility Consistency + Drawdown Penalty
#
# Ref: "EMAT: Enhanced Multi-Aspect Attention Transformer for Financial
#       Time Series Forecasting" (MDPI Entropy 2025)
# ============================================================================

@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class TemporalDecayAttention(layers.Layer):
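    """Attention with exponential temporal decay.

    Adds a bias of -|lambda| * |i - j| to the attention logits, which scales
    the unnormalized attention weights by exp(-|lambda| * |i - j|), so recent
    time steps receive more weight for the same content score.
    lambda is a single learnable scalar for this layer.
    """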

    def __init__(self, d_model, dropout_rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.d_model = d_model
        self.dropout_rate = dropout_rate

    def build(self, input_shape):
        self.W_q = layers.Dense(self.d_model, use_bias=False, name="tda_Wq")
        self.W_k = layers.Dense(self.d_model, use_bias=False, name="tda_Wk")
        self.W_v = layers.Dense(self.d_model, use_bias=False, name="tda_Wv")
        # Learnable decay rate (initialized to ~0.1)
        self.decay_rate = self.add_weight(
            name="decay_rate", shape=(1,),
            initializer=tf.keras.initializers.Constant(0.1),
            trainable=True
        )
        self.dropout = layers.Dropout(self.dropout_rate)
        super().build(input_shape)

    def call(self, x, mask=None, training=None):
        Q = self.W_q(x)
        K = self.W_k(x)
        V = self.W_v(x)

        # Standard scaled dot-product scores
        d_k = tf.cast(tf.shape(K)[-1], tf.float32)
        scores = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)

        # Temporal decay bias in logit space: -|lambda| * |i - j|
        # (after softmax this scales the weights by exp(-|lambda| * |i - j|))
        seq_len = tf.shape(x)[1]
        positions = tf.cast(tf.range(seq_len), tf.float32)
        dist_matrix = tf.abs(positions[:, None] - positions[None, :])
        decay_bias = -tf.abs(self.decay_rate) * dist_matrix
        scores = scores + decay_bias[None, :, :]

        if mask is not None:
            scores += (1.0 - tf.cast(mask, tf.float32)) * -1e9

        weights = tf.nn.softmax(scores, axis=-1)
        weights = self.dropout(weights, training=training)
        output = tf.matmul(weights, V)
        return output, weights

    def get_config(self):
        config = super().get_config()
        config.update({'d_model': self.d_model, 'dropout_rate': self.dropout_rate})
        return config


@tf.keras.utils.register_keras_serializable(package="MomentumTFT")
class MultiObjectiveSharpeLoss(tf.keras.losses.Loss):
    """Multi-objective loss: Sharpe + Volatility Consistency + Drawdown.

    L = -Sharpe + alpha * VolConsistency + beta * MaxDrawdown

    - Sharpe: annualized (sqrt(252), ddof=1)
    - VolConsistency: penalizes return sequences with unstable volatility
    - MaxDrawdown: penalizes large drawdowns during training

    alpha=0.1, beta=0.2 recommended (Sharpe-dominant with DD protection).
    """

    def __init__(self, alpha=0.1, beta=0.2, **kwargs):
        super().__init__(**kwargs)
        self.alpha = alpha
        self.beta = beta

    def call(self, y_true, y_pred):
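        # Strategy returns: signal * actual returns.
        # Squeeze only a trailing singleton axis so (batch, time) inputs pass through.
        signal = (tf.squeeze(y_pred, axis=-1)
                  if len(y_pred.shape) > 1 and y_pred.shape[-1] == 1 else y_pred)
        actual = (tf.squeeze(y_true, axis=-1)
                  if len(y_true.shape) > 1 and y_true.shape[-1] == 1 else y_true)
        strategy_ret = signal * actual

        # Sharpe component (primary objective); sample std with ddof=1,
        # matching the docstring and SharpeLoss
        mean_r = tf.reduce_mean(strategy_ret)
        n = tf.cast(tf.size(strategy_ret), tf.float32)
        var_r = tf.reduce_sum(tf.square(strategy_ret - mean_r)) / tf.maximum(n - 1.0, 1.0)
        std_r = tf.sqrt(var_r + 1e-9)
        sharpe = tf.sqrt(252.0) * mean_r / std_r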

        # Volatility consistency: penalize unstable volatility.
        # Proxy for vol-of-vol: coefficient of variation (std / mean) of squared returns
        sq_ret = strategy_ret ** 2
        vol_consistency = tf.math.reduce_std(sq_ret) / (tf.reduce_mean(sq_ret) + 1e-8)

        # Drawdown penalty: compute max drawdown from cumulative returns.
        # tf.cumsum (default axis=0) and tf.scan both run along the first axis,
        # so returns are assumed to be ordered in time along that axis.
        cum_ret = tf.cumsum(strategy_ret)
        running_max = tf.scan(lambda a, x: tf.maximum(a, x), cum_ret, initializer=cum_ret[0])
        drawdowns = cum_ret - running_max
        max_dd = -tf.reduce_min(drawdowns)

        # Combined loss (minimize)
        loss = -sharpe + self.alpha * vol_consistency + self.beta * max_dd
        return loss

    def get_config(self):
        config = super().get_config()
        config.update({'alpha': self.alpha, 'beta': self.beta})
        return config
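
The effect of the temporal decay bias in `TemporalDecayAttention` can be illustrated without TensorFlow. The sketch below is a pure-Python approximation for a single sequence. The helper name `decay_attention_weights` and the all-zero content scores are assumptions chosen so that only the decay term shapes the weights.

```python
import math


def decay_attention_weights(scores, decay_rate, causal=True):
    """Row-wise softmax of scores[i][j] - |decay_rate| * |i - j| (hypothetical helper)."""
    n = len(scores)
    weights = []
    for i in range(n):
        logits = []
        for j in range(n):
            if causal and j > i:
                logits.append(-math.inf)  # future positions are masked out
            else:
                logits.append(scores[i][j] - abs(decay_rate) * abs(i - j))
        m = max(logits)  # finite, because position j == i is never masked
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


# With identical content scores, the weights depend on distance only
zero_scores = [[0.0] * 4 for _ in range(4)]
weights = decay_attention_weights(zero_scores, decay_rate=0.5)
for row in weights:
    print([round(w, 3) for w in row])
```

Each step back in time multiplies the weight by `exp(-0.5)` in this toy setting. In the layer itself the content scores `Q K^T / sqrt(d_k)` are added to the same bias, so the decay acts as a prior toward recent steps, not as a hard rule.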

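The drawdown term in `MultiObjectiveSharpeLoss` is the largest gap between the running peak of cumulative returns and the current cumulative return. A minimal pure-Python sketch of that calculation follows. The function name `max_drawdown` and the toy return series are illustrative assumptions. As in the loss, returns are summed additively and the running peak starts at the first cumulative value.

```python
from itertools import accumulate


def max_drawdown(strategy_returns):
    """Max peak-to-trough gap of additive cumulative returns (hypothetical helper)."""
    cum = list(accumulate(strategy_returns))   # counterpart of tf.cumsum
    running_max = list(accumulate(cum, max))   # counterpart of tf.scan with tf.maximum
    return max(peak - c for peak, c in zip(running_max, cum))


toy_returns = [0.02, -0.01, -0.03, 0.05, -0.02]
print(f"max drawdown = {max_drawdown(toy_returns):.4f}")
```

A monotonically rising cumulative return gives a drawdown of zero, so the penalty only activates when the strategy gives back gains.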

In [10]:
# ============================================================================
# RL Thompson Sampling Meta-Learner
# ============================================================================
# Inspired by RL-book (Rao & Jelvis, Stanford CME 241) Thompson Sampling
# implementation in rl/chapter14/ts_gaussian.py.
#
# The meta-learner treats each signal source (TFT, Momentum, MR) as a
# "bandit arm" and uses Thompson Sampling to learn which signal performs
# best in the CURRENT regime. The posterior distribution of each arm's
# reward (Sharpe) is updated online as new data arrives.
#
# Key insight: different strategies work in different regimes.
# Thompson Sampling automatically discovers this without manual rules.
# ============================================================================

class ThompsonSamplingMetaLearner:
    """Thompson Sampling strategy selector with regime conditioning.

    Each strategy is modeled as a Gaussian bandit arm with unknown mean
    and known variance. Posterior is updated via conjugate Normal-Normal:

        Prior:     mu ~ N(mu_0, sigma_0^2)
        Likelihood: X | mu ~ N(mu, sigma_obs^2)
        Posterior:  mu | X ~ N(mu_n, sigma_n^2)

    where:
        sigma_n^2 = 1 / (1/sigma_0^2 + n/sigma_obs^2)
        mu_n = sigma_n^2 * (mu_0/sigma_0^2 + sum(X)/sigma_obs^2)

    Selection: sample from each posterior, pick highest sample.

    When regime is provided, we maintain SEPARATE posteriors per regime.
    """

    def __init__(self, strategy_names: List[str],
                 n_regimes: int = 3,
                 prior_mean: float = 0.0,
                 prior_std: float = 1.0,
                 obs_std: float = 0.1):
        """
        Args:
            strategy_names: names of strategies (bandit arms)
            n_regimes: number of regime states
            prior_mean: prior mean for each arm
            prior_std: prior standard deviation
            obs_std: assumed std of observation noise, on the same scale as
                the rewards passed to update()
        """
        self.names = strategy_names
        self.n_arms = len(strategy_names)
        self.n_regimes = n_regimes
        self.obs_var = obs_std ** 2

        # Per-regime, per-arm posteriors: (mean, variance, count)
        self.posteriors = {}
        for r in range(n_regimes):
            self.posteriors[r] = {
                name: {
                    'mean': prior_mean,
                    'var': prior_std ** 2,
                    'n': 0,
                    'sum_rewards': 0.0,
                }
                for name in strategy_names
            }

    def select(self, regime: int = 0,
               rng: Optional[np.random.Generator] = None) -> str:
        """Thompson Sampling: sample from posteriors, pick best arm.

        Args:
            regime: current regime label in [0, n_regimes - 1]; out-of-range
                values are clipped
            rng: random number generator

        Returns: name of selected strategy
        """
        if rng is None:
            rng = np.random.default_rng()

        regime = int(np.clip(regime, 0, self.n_regimes - 1))

        samples = {}
        for name in self.names:
            post = self.posteriors[regime][name]
            # Sample from posterior N(mean, var)
            sample = rng.normal(post['mean'], np.sqrt(post['var'] + 1e-10))
            samples[name] = sample

        return max(samples, key=samples.get)

    def get_weights(self, regime: int = 0, n_samples: int = 1000,
                    rng: Optional[np.random.Generator] = None) -> Dict[str, float]:
        """Get selection probabilities by Monte Carlo sampling.

        Returns: {strategy_name: probability_of_selection}
        """
        if rng is None:
            rng = np.random.default_rng(42)

        regime = int(np.clip(regime, 0, self.n_regimes - 1))
        counts = {name: 0 for name in self.names}

        for _ in range(n_samples):
            best = self.select(regime, rng)
            counts[best] += 1

        total = sum(counts.values())
        return {name: count / total for name, count in counts.items()}

    def update(self, strategy_name: str, reward: float, regime: int = 0):
        """Update posterior after observing reward.

        Conjugate Normal-Normal update:
            new_var = 1 / (1/prior_var + 1/obs_var)
            new_mean = new_var * (prior_mean/prior_var + reward/obs_var)
        """
        regime = int(np.clip(regime, 0, self.n_regimes - 1))
        post = self.posteriors[regime][strategy_name]

        post['n'] += 1
        post['sum_rewards'] += reward

        # Conjugate update
        prior_prec = 1.0 / (post['var'] + 1e-10)
        obs_prec = 1.0 / self.obs_var

        new_prec = prior_prec + obs_prec
        new_var = 1.0 / new_prec
        new_mean = new_var * (post['mean'] * prior_prec + reward * obs_prec)

        post['mean'] = new_mean
        post['var'] = new_var

    def summary(self) -> str:
        """Return a formatted posterior summary per regime as a string."""
        lines = ["Thompson Sampling Meta-Learner Summary:"]
        regime_labels = {0: 'Bear', 1: 'Sideways', 2: 'Bull'}
        for r in range(self.n_regimes):
            lines.append(f"\n  Regime {r} ({regime_labels.get(r, '?')}):")
            for name in self.names:
                p = self.posteriors[r][name]
                lines.append(
                    f"    {name:12s}: mean={p['mean']:+.4f}, "
                    f"std={np.sqrt(p['var']):.4f}, n={p['n']}"
                )
        return '\n'.join(lines)


def run_thompson_ensemble(oos_signals_dict: Dict[str, np.ndarray],
                          oos_returns: np.ndarray,
                          regime_labels: np.ndarray,
                          strategy_names: List[str],
                          bps_cost: float = 0.001,
                          warmup: int = 21) -> Tuple[np.ndarray, ThompsonSamplingMetaLearner]:
    """Run Thompson Sampling ensemble over OOS period.

    For each day:
      1. Observe current regime
      2. Thompson Sampling selects best strategy for this regime
      3. Use selected strategy's signal
      4. Observe reward, update posterior

    Args:
        oos_signals_dict: {strategy_name: signal_array}
        oos_returns: actual forward returns
        regime_labels: regime label for each day (0/1/2)
        strategy_names: list of strategy names
        bps_cost: transaction cost per unit of position change, expressed as
            a fraction (the default 0.001 equals 10 bps)
        warmup: days of equal-weight warmup before Thompson kicks in

    Returns: (ensemble_returns, meta_learner)
    """
    n_days = len(oos_returns)
    meta = ThompsonSamplingMetaLearner(strategy_names, n_regimes=3)
    rng = np.random.default_rng(42)

    ens_returns = np.zeros(n_days)
    selected_strategies = []
    prev_pos = 0.0

    for t in range(n_days):
        regime = int(regime_labels[t]) if not np.isnan(regime_labels[t]) else 1

        if t < warmup:
            # Equal weight during warmup
            signal = np.mean([
                np.sign(oos_signals_dict[s][t])
                for s in strategy_names
            ])
            position = np.sign(signal)
        else:
            # Thompson Sampling selection
            selected = meta.select(regime, rng)
            position = np.sign(oos_signals_dict[selected][t])
            selected_strategies.append(selected)

        # Compute return with costs
        strat_ret = position * oos_returns[t]
        cost = abs(position - prev_pos) * bps_cost
        ens_returns[t] = strat_ret - cost
        prev_pos = position

        # Update ALL strategies with their hypothetical reward
        for s in strategy_names:
            s_pos = np.sign(oos_signals_dict[s][t])
            s_ret = s_pos * oos_returns[t]
            meta.update(s, s_ret, regime)

    return ens_returns, meta
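
The one-observation-at-a-time update in `ThompsonSamplingMetaLearner.update` should reproduce the closed-form posterior given in the class docstring. The sketch below checks this in plain Python. The function names `sequential_update` and `batch_posterior` and the toy rewards are assumptions for illustration, and the small `1e-10` stabilizer used in the class is omitted here.

```python
def sequential_update(mean, var, reward, obs_var):
    """One conjugate Normal-Normal step with known observation variance."""
    prior_prec = 1.0 / var
    obs_prec = 1.0 / obs_var
    new_var = 1.0 / (prior_prec + obs_prec)
    new_mean = new_var * (mean * prior_prec + reward * obs_prec)
    return new_mean, new_var


def batch_posterior(mu0, var0, rewards, obs_var):
    """Closed-form posterior after n observations (formula from the docstring)."""
    n = len(rewards)
    var_n = 1.0 / (1.0 / var0 + n / obs_var)
    mu_n = var_n * (mu0 / var0 + sum(rewards) / obs_var)
    return mu_n, var_n


rewards = [0.01, -0.02, 0.015, 0.005]
mean, var = 0.0, 1.0  # prior N(0, 1), matching the class defaults
for r in rewards:
    mean, var = sequential_update(mean, var, r, obs_var=0.01)

batch_mean, batch_var = batch_posterior(0.0, 1.0, rewards, obs_var=0.01)
print(f"sequential: mean={mean:.6f}, var={var:.6f}")
print(f"batch:      mean={batch_mean:.6f}, var={batch_var:.6f}")
```

Because precisions add, the posterior variance shrinks with every update regardless of the reward value. This is why an arm that is updated on every day quickly dominates or drops out of the sampling.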


In [11]:
# 6. TRAINING & WALK-FORWARD VALIDATION ENGINE

def expanding_normalize(train_df, test_df, feature_cols):
    """Normalize test data using ONLY training statistics (no look-ahead).

    Args:
        train_df: DataFrame with training data
        test_df: DataFrame with test data
        feature_cols: list of feature column names to normalize

    Returns: (train_normalized, test_normalized) DataFrames
    """
    means = train_df[feature_cols].mean()
    stds = train_df[feature_cols].std()
    stds = stds.replace(0, 1.0)  # prevent division by zero
    train_norm = train_df.copy()
    test_norm = test_df.copy()
    train_norm[feature_cols] = (train_df[feature_cols] - means) / stds
    test_norm[feature_cols] = (test_df[feature_cols] - means) / stds
    return train_norm, test_norm


def make_sequences(feature_array, target_array, window_size):
    """Create (window, n_features) sequences with proper target alignment.

    Args:
        feature_array: np.ndarray shape (n, n_feat)
        target_array: np.ndarray shape (n,) -- target_ret (forward return)
        window_size: int W

    Returns:
        X: (n_valid, W, n_feat) float32
        y: (n_valid,) float32
        indices: (n_valid,) int -- row indices in the original array for each
                 sequence's "current time" (i.e., the last row of each window).
                 Use these to recover dates: df.index[indices]

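    Alignment:
        X[i] = features[i:i+W] -- "current time" is i+W-1
        y[i] = target[i+W-1]   -- forward return at that time
        indices[i] = i+W-1

    Note: i runs over range(n - W), so the last possible window
    (features[n-W:n], current time n-1) is not emitted.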
    """
    n = len(feature_array)
    n_seq = n - window_size
    if n_seq <= 0:
        return (np.zeros((0, window_size, feature_array.shape[1]), dtype=np.float32),
                np.zeros(0, dtype=np.float32),
                np.array([], dtype=np.int64))

    X = np.array([feature_array[i:i + window_size] for i in range(n_seq)])
    y = target_array[window_size - 1:window_size - 1 + n_seq]
    indices = np.arange(window_size - 1, window_size - 1 + n_seq, dtype=np.int64)

    # Remove NaN targets
    valid = ~np.isnan(y)
    return X[valid].astype(np.float32), y[valid].astype(np.float32), indices[valid]


def _predict_safe(model, X, desc=None):
    """Run model inference one sample at a time.

    Keras 3 (TF 2.20) bakes the batch dimension from the first model() call
    into the traced graph. Since MomentumTransformer is built with batch=1
    (dummy forward pass), subsequent calls with batch>1 collapse the batch
    dimension. Processing one-at-a-time matches the build shape and is
    guaranteed correct. Cost: ~5ms per sample, negligible vs training time.

    Args:
        model: trained MomentumTransformer
        X: np.ndarray shape (n_samples, window, n_features)
        desc: optional tqdm description

    Returns:
        np.ndarray shape (n_samples,) -- signal at last timestep for each sequence
    """
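    n = len(X)
    preds = np.empty(n, dtype=np.float32)
    # Show a progress bar only when a description is supplied
    for i in tqdm(range(n), desc=desc, disable=desc is None, leave=False):
        out = model(tf.constant(X[i:i+1]), training=False)
        preds[i] = float(out[0, -1, 0])
    return preds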


def walk_forward_train(data_dict, cfg):
    """Universe-mode walk-forward OOS validation with purge gaps.

    Trains ONE model per fold on pooled sequences from ALL tickers,
    then predicts OOS for each ticker separately. This lets the model
    learn cross-asset patterns and benefit from a larger training set.

    Fold schedule is date-based: uses the SHORTEST ticker to define
    fold boundaries, ensuring all tickers have data for every fold.

    Args:
        data_dict: {ticker_name: pd.DataFrame with FEATURE_COLUMNS + 'target_ret'}
        cfg: MonolithConfig

    Returns: dict with:
        'oos_returns': pd.DataFrame (index=dates, columns=tickers)
        'oos_signals': pd.DataFrame (same shape)
        'fold_metrics': list of dicts with keys including ticker, fold, sharpe,
            n_days, train_days (number of pooled training sequences), train_time
        'models': {fold_number: last trained model}
        'vsn_weights': {'universe': np.ndarray of feature importance}
    """
    tickers = list(data_dict.keys())
    n_feat = len(FEATURE_COLUMNS)

    # Use the shortest ticker to define fold boundaries
    min_len = min(len(df) for df in data_dict.values())

    # Pre-compute number of folds for progress bar
    n_folds_est = 0
    _ts = cfg.min_train_days
    while _ts + cfg.window_size < min_len:
        _te = _ts - cfg.purge_gap
        if _te >= cfg.window_size + 10:
            n_folds_est += 1
        _ts += cfg.test_days

    print(f"\n{'=' * 60}")
    print(f"  Universe Walk-Forward: {len(tickers)} tickers, ~{n_folds_est} folds")
    print(f"  Tickers: {tickers}")
    print(f"  Shortest series: {min_len} days")
    print(f"  Est. time: ~{n_folds_est * len(tickers) * 10 // 60} min "
          f"({n_folds_est} folds x {len(tickers)} tickers)")
    print(f"{'=' * 60}")

    all_oos_returns = {t: [] for t in tickers}
    all_oos_signals = {t: [] for t in tickers}
    all_fold_metrics = []
    all_models = {}
    latest_vsn_w = None

    fold = 0
    test_start = cfg.min_train_days
    fold_pbar = tqdm(total=n_folds_est, desc="Walk-Forward", unit="fold",
                     bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt} '
                                '[{elapsed}<{remaining}, {rate_fmt}]')

    while test_start + cfg.window_size < min_len:
        fold += 1
        train_end = test_start - cfg.purge_gap
        test_end = min(test_start + cfg.test_days, min_len)

        if train_end < cfg.window_size + 10:
            test_start += cfg.test_days
            continue

        print(f"\n  --- Fold {fold} (train: 0-{train_end}, "
              f"purge: {train_end}-{test_start}, "
              f"test: {test_start}-{test_end}) ---")

        # ── Gather train/test data from ALL tickers ──
        X_train_all = []
        y_train_all = []
        # {ticker: (X_test, y_test, valid_indices, test_df)}
        ticker_test_data = {}

        for ticker in tickers:
            df = data_dict[ticker]

            train_df = df.iloc[:train_end].copy()
            test_df = df.iloc[test_start:test_end].copy()

            # Purge: NaN-out last purge_gap+1 training targets (both raw and vol-scaled)
            if cfg.purge_gap > 0 and len(train_df) > cfg.purge_gap + 1:
                train_df.iloc[-(cfg.purge_gap + 1):,
                              train_df.columns.get_loc('target_ret')] = np.nan
                train_df.iloc[-(cfg.purge_gap + 1):,
                              train_df.columns.get_loc('target_train')] = np.nan

            # Normalize each ticker with its OWN training stats (no cross-ticker leakage)
            train_norm, test_norm = expanding_normalize(train_df, test_df, FEATURE_COLUMNS)

            # Training sequences — use vol-scaled target for better gradient signal
            X_tr, y_tr, _ = make_sequences(
                train_norm[FEATURE_COLUMNS].values,
                train_norm['target_train'].values,
                cfg.window_size
            )
            if len(X_tr) > 0:
                X_train_all.append(X_tr)
                y_train_all.append(y_tr)
                print(f"    {ticker}: train_rows={len(train_df)}, "
                      f"train_seqs={len(X_tr)}, "
                      f"target_NaN={int(train_df['target_train'].isna().sum())}")

            # Test sequences — use vol-scaled for alignment, raw returns for OOS eval
            X_te, _, te_idx = make_sequences(
                test_norm[FEATURE_COLUMNS].values,
                test_norm['target_train'].values,
                cfg.window_size
            )
            # Raw forward returns at the same valid indices (for OOS evaluation)
            y_te_raw = test_df['target_ret'].values[te_idx] if len(te_idx) > 0 else np.array([])
            print(f"    {ticker}: test_rows={len(test_df)}, "
                  f"test_seqs={len(X_te)}, "
                  f"target_NaN={int(test_df['target_train'].isna().sum())}")
            if len(X_te) > 0:
                ticker_test_data[ticker] = (X_te, y_te_raw, te_idx, test_df)

        # Combine all tickers' training data into one pool
        if not X_train_all:
            print(f"  Fold {fold}: no training data, skipping")
            test_start += cfg.test_days
            continue

        X_train = np.concatenate(X_train_all, axis=0)
        y_train = np.concatenate(y_train_all, axis=0)

        # Shuffle the pooled training data (sequences from different tickers
        # are interleaved — this prevents the model from memorizing ticker order)
        shuffle_idx = np.random.permutation(len(X_train))
        X_train = X_train[shuffle_idx]
        y_train = y_train[shuffle_idx]

        total_test = sum(len(v[0]) for v in ticker_test_data.values())
        if len(X_train) < 50 or total_test < 5:
            print(f"  Fold {fold}: insufficient data "
                  f"(train={len(X_train)}, test={total_test}), skipping")
            test_start += cfg.test_days
            continue

        print(f"    UNIVERSE POOL: {len(X_train)} train sequences "
              f"from {len(tickers)} tickers")

        # ── Build fresh universe model ──
        model = MomentumTransformer(
            cfg.window_size, n_feat, 1,
            cfg.hidden_size, cfg.num_heads, cfg.dropout_rate
        )
        _ = model(np.zeros((1, cfg.window_size, n_feat), dtype=np.float32))
        model.compile(
            optimizer=keras.optimizers.Adam(cfg.learning_rate, clipnorm=cfg.clipnorm),
            loss=SharpeLoss()
        )

        # ── Train on pooled universe data ──
        t0 = time.time()
        model.fit(
            X_train, y_train,
            epochs=cfg.epochs,
            batch_size=cfg.batch_size,
            validation_split=0.15,
            callbacks=[
                keras.callbacks.EarlyStopping(
                    monitor='val_loss', patience=cfg.early_stop_patience,
                    restore_best_weights=True, verbose=0
                ),
                keras.callbacks.ReduceLROnPlateau(
                    monitor='val_loss', factor=cfg.lr_reduce_factor,
                    patience=cfg.lr_reduce_patience, min_lr=cfg.min_lr, verbose=0
                )
            ],
            verbose=0
        )
        train_time = time.time() - t0

        # ── Predict OOS per-ticker ──
        for ticker, (X_test, y_test, valid_idx, test_df) in ticker_test_data.items():
            n_test = len(X_test)

            # Predict one-at-a-time (Keras 3 bakes batch dim from build step;
            # model was built with batch=1, so batch>1 collapses output)
            preds = _predict_safe(model, X_test)

            assert len(preds) == n_test == len(y_test), (
                f"Shape mismatch: preds={len(preds)}, X_test={n_test}, "
                f"y_test={len(y_test)}")

            # Build date index from make_sequences' valid_idx
            test_pred_idx = test_df.index[valid_idx]

            # Use y_test directly (already aligned by make_sequences)
            all_oos_returns[ticker].append(
                pd.Series(y_test, index=test_pred_idx, name=ticker))
            all_oos_signals[ticker].append(
                pd.Series(preds, index=test_pred_idx, name=ticker))

            # Per-ticker fold metrics
            fold_positions = np.sign(preds)
            fold_strat_ret = fold_positions * y_test
            fold_strat_ret = fold_strat_ret[~np.isnan(fold_strat_ret)]
            if len(fold_strat_ret) > 1:
                fold_sharpe = (np.mean(fold_strat_ret)
                               / (np.std(fold_strat_ret, ddof=1) + 1e-9)
                               * np.sqrt(252))
            else:
                fold_sharpe = 0.0

            all_fold_metrics.append({
                'ticker': ticker, 'fold': fold,
                'sharpe': round(fold_sharpe, 2),
                'n_days': len(preds),
                'train_days': len(X_train),
                'train_time': round(train_time / len(tickers), 1)
            })

            print(f"    {ticker}: test={len(preds):3d} Sharpe={fold_sharpe:+.2f}")

        # Extract VSN weights from the universe model (one sample, matching batch=1 build)
        first_ticker_X = list(ticker_test_data.values())[0][0]
        _, weights_dict = model(tf.constant(first_ticker_X[:1]), return_weights=True)
        latest_vsn_w = weights_dict['vsn_weights'].numpy().mean(axis=(0, 1)).flatten()

        all_models[fold] = model

        # Update progress bar with fold summary
        fold_sharpes = [m['sharpe'] for m in all_fold_metrics
                        if m['fold'] == fold]
        avg_s = np.mean(fold_sharpes) if fold_sharpes else 0.0
        fold_pbar.set_postfix_str(
            f"train={train_time:.0f}s, avg_sharpe={avg_s:+.2f}")
        fold_pbar.update(1)

        test_start += cfg.test_days

        # Memory cleanup
        del X_train, y_train, X_train_all, y_train_all, ticker_test_data
        gc.collect()

    fold_pbar.close()

    # ── Assemble per-ticker OOS results ──
    oos_returns_dict = {}
    oos_signals_dict = {}
    for ticker in tickers:
        if all_oos_returns[ticker]:
            combined_ret = pd.concat(all_oos_returns[ticker])
            combined_sig = pd.concat(all_oos_signals[ticker])
            oos_returns_dict[ticker] = combined_ret[~combined_ret.index.duplicated(keep='first')]
            oos_signals_dict[ticker] = combined_sig[~combined_sig.index.duplicated(keep='first')]

    # VSN weights: there is one universe model, so the same weights are
    # duplicated per ticker for viz compatibility
    vsn_weights = {}
    if latest_vsn_w is not None:
        for ticker in tickers:
            vsn_weights[ticker] = latest_vsn_w

    return {
        'oos_returns': pd.DataFrame(oos_returns_dict),
        'oos_signals': pd.DataFrame(oos_signals_dict),
        'fold_metrics': all_fold_metrics,
        'models': all_models,
        'vsn_weights': vsn_weights,
    }

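The per-ticker fold Sharpe above is computed inline as `mean / (std(ddof=1) + 1e-9) * sqrt(252)` on `sign(preds) * y_test`, and the same expression recurs in later cells. A minimal sketch of that calculation as a reusable function follows. The helper name `annualized_sharpe` and the toy arrays are hypothetical and not part of the notebook.

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=252, eps=1e-9):
    """Annualised Sharpe of a daily return series, ignoring NaNs.

    Hypothetical helper: mirrors the inline per-fold expression
    mean / (std(ddof=1) + eps) * sqrt(252) used in walk_forward_train.
    """
    r = np.asarray(returns, dtype=np.float64)
    r = r[~np.isnan(r)]
    if len(r) < 2:
        # Same fallback as the notebook: not enough data for a sample std
        return 0.0
    return float(np.mean(r) / (np.std(r, ddof=1) + eps)
                 * np.sqrt(periods_per_year))

# Toy fold: model outputs and 1-day forward returns (last target missing)
preds = np.array([0.3, -0.1, 0.2, -0.4, 0.5])
y_test = np.array([0.01, -0.02, 0.005, 0.01, np.nan])

# Position is the sign of the prediction, not its magnitude
fold_strat_ret = np.sign(preds) * y_test
fold_sharpe = annualized_sharpe(fold_strat_ret)
print(f"fold Sharpe = {fold_sharpe:+.2f}")
```

With only a handful of test days per fold, this estimate is very noisy, which is why the fold-by-fold summary later reports a standard error alongside the mean.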
In [12]:
# 7. ORCHESTRATION, METRICS & VISUALIZATION

def calculate_metrics(returns, costs=0.0):
    """Full metrics suite for strategy returns.
    
    Args:
        returns: np.ndarray or pd.Series of daily returns (simple, not log)
        costs: total transaction costs already deducted from `returns`
            (accepted for the caller's reporting; not used in any calculation)
    
    Returns: dict with:
        total_return, annual_return, sharpe, sortino, calmar, max_dd,
        win_rate, profit_factor, n_days
    """
    r = np.asarray(returns, dtype=np.float64)
    r = r[~np.isnan(r)]
    
    n = len(r)
    if n < 2:
        return {
            'total_return': 0.0, 'annual_return': 0.0, 'sharpe': 0.0,
            'sortino': 0.0, 'calmar': 0.0, 'max_dd': 0.0,
            'win_rate': 0.0, 'profit_factor': 0.0, 'n_days': n,
        }
    
    # Total and annualized return (simple compounding)
    equity = np.cumprod(1.0 + r)
    total_return = float(equity[-1] - 1.0)
    n_years = n / 252.0
    if n_years > 0 and equity[-1] > 0:
        annual_return = float(equity[-1] ** (1.0 / n_years) - 1.0)
    else:
        annual_return = 0.0
    
    # Sharpe ratio: sqrt(252) * mean / std(ddof=1)
    mean_r = np.mean(r)
    std_r = np.std(r, ddof=1)
    sharpe = float(np.sqrt(252) * mean_r / std_r) if std_r > 1e-9 else 0.0
    
    # Sortino ratio: sqrt(252) * mean / std of negative-return days (ddof=1)
    downside = r[r < 0]
    if len(downside) > 1:
        downside_std = np.std(downside, ddof=1)
        sortino = float(np.sqrt(252) * mean_r / downside_std) if downside_std > 1e-9 else 0.0
    else:
        sortino = 0.0
    
    # Maximum drawdown from equity curve
    peak = np.maximum.accumulate(equity)
    dd_series = (equity - peak) / peak
    max_dd = float(np.min(dd_series))
    
    # Calmar ratio: annual_return / |max_dd|
    calmar = float(annual_return / abs(max_dd)) if abs(max_dd) > 1e-9 else 0.0
    
    # Win rate: fraction of positive return days
    win_rate = float(np.mean(r > 0))
    
    # Profit factor: sum(gains) / |sum(losses)|
    gains = r[r > 0]
    losses = r[r < 0]
    sum_gains = float(np.sum(gains)) if len(gains) > 0 else 0.0
    sum_losses = float(np.abs(np.sum(losses))) if len(losses) > 0 else 0.0
    profit_factor = float(sum_gains / sum_losses) if sum_losses > 1e-12 else 0.0
    
    return {
        'total_return': total_return,
        'annual_return': annual_return,
        'sharpe': sharpe,
        'sortino': sortino,
        'calmar': calmar,
        'max_dd': max_dd,
        'win_rate': win_rate,
        'profit_factor': profit_factor,
        'n_days': n,
    }


def run_enhanced_lab(cfg=None):
    """Main orchestrator: Data -> Features -> Walk-Forward -> Backtest -> Visualize.
    
    This is the primary entry point for the notebook.
    """
    if cfg is None:
        cfg = CFG
    
    tickers_to_run = [cfg.tickers[0]] if cfg.quick_mode else cfg.tickers
    
    # ── PHASE 1: DATA ACQUISITION ──
    print("=" * 70)
    print("  PHASE 1: DATA ACQUISITION")
    print("=" * 70)
    auth = KiteAuth()
    kite = auth.get_session()
    if not kite:
        raise RuntimeError("Kite authentication failed. Check .env credentials.")

    fetcher = KiteFetcher(kite)
    raw_data = {}
    for ticker in tqdm(tickers_to_run, desc="Fetching data", unit="ticker"):
        exchange = cfg.exchanges.get(ticker, 'NSE')
        tqdm.write(f"  {ticker} ({exchange})...")
        try:
            raw_data[ticker] = fetcher.fetch_daily(ticker, exchange, cfg.lookback_days)
            tqdm.write(f"    -> {len(raw_data[ticker])} days "
                       f"({raw_data[ticker].index[0].date()} to "
                       f"{raw_data[ticker].index[-1].date()})")
        except Exception as e:
            tqdm.write(f"    SKIP: {ticker} failed ({e})")

    if not raw_data:
        raise RuntimeError("No tickers fetched successfully.")

    # ── PHASE 1.5: CROSS-ASSET DATA (News + VIX + World Monitor) ──
    print("\n" + "=" * 70)
    print("  PHASE 1.5: CROSS-ASSET DATA (News Sentiment + India VIX)")
    print("=" * 70)

    # Build union of all trading dates for alignment
    all_dates = pd.DatetimeIndex([])
    for df in raw_data.values():
        all_dates = all_dates.union(df.index)
    all_dates = all_dates.sort_values()

    cross_asset_df = fetch_cross_asset_features(
        start_date=all_dates[0].strftime('%Y-%m-%d'),
        end_date=all_dates[-1].strftime('%Y-%m-%d'),
        date_index=all_dates,
        kite=kite,
    )

    # ── World Monitor Intelligence ──
    print("\n  [World Monitor] Macro Regime Signals...")
    try:
        macro_df = fetch_yahoo_macro(
            all_dates[0].strftime('%Y-%m-%d'),
            all_dates[-1].strftime('%Y-%m-%d'),
        )
        if not macro_df.empty:
            macro_df = macro_df.reindex(all_dates).ffill(limit=3)
            macro_df['macro_composite'] = compute_macro_composite(macro_df)
            for col in MACRO_COLUMNS:
                if col in macro_df.columns:
                    cross_asset_df[col] = macro_df[col]
                    if col not in CROSS_ASSET_COLUMNS:
                        CROSS_ASSET_COLUMNS.append(col)
    except Exception as e:
        logger.warning(f"Macro signals failed: {e}")
        print(f"    SKIP: Macro signals failed ({e})")

    print("  [World Monitor] Fear & Greed Index...")
    try:
        fg_df = fetch_fear_greed()
        if not fg_df.empty:
            fg_df = fg_df.reindex(all_dates).ffill(limit=3)
            cross_asset_df['macro_fear_greed_z'] = fg_df['macro_fear_greed_z']
            if 'macro_fear_greed_z' not in CROSS_ASSET_COLUMNS:
                CROSS_ASSET_COLUMNS.append('macro_fear_greed_z')
    except Exception as e:
        logger.warning(f"Fear & Greed failed: {e}")

    # ── PHASE 2: FEATURE ENGINEERING ──
    print("\n" + "=" * 70)
    print(f"  PHASE 2: FEATURE ENGINEERING ({len(FEATURE_COLUMNS)} features)")
    print("=" * 70)
    featured_data = {}
    min_required = cfg.min_train_days + 2 * cfg.test_days
    for ticker, df in tqdm(raw_data.items(), desc="Feature engineering",
                           total=len(raw_data), unit="ticker"):
        try:
            t0 = time.time()
            feat_df = build_all_features(df, cfg)
            elapsed = time.time() - t0
            if len(feat_df) < min_required:
                tqdm.write(f"  SKIP: {ticker} has {len(feat_df)} featured days "
                           f"(need {min_required})")
                continue
            # Merge cross-asset features (same for all tickers)
            if not cross_asset_df.empty:
                feat_df = feat_df.join(cross_asset_df, how='left')
                for col in CROSS_ASSET_COLUMNS:
                    if col in feat_df.columns:
                        feat_df[col] = feat_df[col].ffill(limit=3).fillna(0.0)
            # Build advanced features (Groups 15-19)
            close_src = feat_df['close'] if 'close' in feat_df.columns else df['close']
            adv_df = build_advanced_features(df, close_src, cfg)
            feat_df = feat_df.join(adv_df, how='left')
            for col in ADVANCED_FEATURE_COLUMNS:
                if col in feat_df.columns:
                    feat_df[col] = feat_df[col].ffill(limit=3).fillna(0.0)

            # Welford anomaly features (from raw OHLCV)
            welf_df = compute_welford_anomalies(df)
            welf_df = welf_df.reindex(feat_df.index)
            feat_df = feat_df.join(welf_df, how='left')
            for col in WELFORD_COLUMNS:
                if col in feat_df.columns:
                    feat_df[col] = feat_df[col].fillna(0.0)

            featured_data[ticker] = feat_df
            n_total = len(FEATURE_COLUMNS)
            tqdm.write(f"  {ticker}: {len(feat_df)} days, "
                       f"{n_total} base + advanced features ({elapsed:.1f}s)")
        except Exception as e:
            tqdm.write(f"  SKIP: {ticker} feature build failed ({e})")

    if not featured_data:
        raise RuntimeError("No tickers survived feature engineering.")

    # Extend FEATURE_COLUMNS with cross-asset + advanced + welford features
    for col_list in [CROSS_ASSET_COLUMNS, ADVANCED_FEATURE_COLUMNS, WELFORD_COLUMNS]:
        for col in col_list:
            if col not in FEATURE_COLUMNS:
                FEATURE_COLUMNS.append(col)

    n_cross = len(CROSS_ASSET_COLUMNS)
    n_adv = len(ADVANCED_FEATURE_COLUMNS)
    n_welf = len(WELFORD_COLUMNS)
    print(f"\n  Features extended: {len(FEATURE_COLUMNS)} total")
    print(f"    Cross-asset: +{n_cross} ({CROSS_ASSET_COLUMNS})")
    print(f"    Advanced:    +{n_adv} (HMM, Wavelet, InfoTheory, MF-DFA, TDA)")
    print(f"    Welford:     +{n_welf} (Anomaly detection)")
    print(f"    Macro:       {len(MACRO_COLUMNS)} (World Monitor; fetched columns are already counted in cross-asset)")

    print(f"\n  Universe: {len(featured_data)} tickers survived "
          f"(min {min_required} featured days required)")
    
    # ── PHASE 3: WALK-FORWARD TRAINING ──
    print("\n" + "=" * 70)
    print("  PHASE 3: WALK-FORWARD OOS VALIDATION")
    print(f"  Train: {cfg.min_train_days}d | Test: {cfg.test_days}d | "
          f"Purge: {cfg.purge_gap}d | Window: {cfg.window_size}d")
    print("=" * 70)
    results = walk_forward_train(featured_data, cfg)
    
    # ── PHASE 4: STRATEGY CONSTRUCTION & METRICS ──
    print("\n" + "=" * 70)
    print("  PHASE 4: OOS STRATEGY METRICS")
    print("=" * 70)
    
    oos_returns = results['oos_returns']
    oos_signals = results['oos_signals']
    
    if oos_returns.empty:
        print("  WARNING: No OOS data produced. Check data length vs min_train_days.")
        return {'raw_data': raw_data, 'featured_data': featured_data,
                'results': results, 'all_metrics': {},
                'strat_net_returns': {}, 'cfg': cfg}
    
    all_metrics = {}
    strat_net_returns = {}  # store for plotting
    
    for ticker in oos_returns.columns:
        ret = oos_returns[ticker].dropna()
        sig = oos_signals[ticker].dropna()
        
        # Align on common dates
        common_idx = ret.index.intersection(sig.index)
        ret = ret.loc[common_idx]
        sig = sig.loc[common_idx]
        
        if len(ret) < 5:
            print(f"  {ticker}: insufficient OOS data ({len(ret)} days), skipping metrics")
            continue
        
        # Strategy returns: position = sign(signal), NOT raw magnitude
        position = np.sign(sig.values)
        strat_ret = position * ret.values
        
        # Transaction costs: deduct |delta_position| * bps_cost
        pos_change = np.abs(np.diff(position, prepend=0))
        costs_arr = pos_change * cfg.bps_cost
        net_ret = strat_ret - costs_arr
        
        metrics = calculate_metrics(net_ret, costs=costs_arr.sum())
        metrics['ticker'] = ticker
        metrics['total_costs'] = float(costs_arr.sum())
        all_metrics[ticker] = metrics
        strat_net_returns[ticker] = pd.Series(net_ret, index=common_idx)
        
        print(f"\n  {ticker} (OOS: {ret.index[0].date()} to "
              f"{ret.index[-1].date()}, {len(ret)} days):")
        print(f"    Sharpe:        {metrics['sharpe']:+.2f}")
        print(f"    Total Return:  {metrics['total_return']:+.2%}")
        print(f"    Annual Return: {metrics['annual_return']:+.2%}")
        print(f"    Max Drawdown:  {metrics['max_dd']:.2%}")
        print(f"    Sortino:       {metrics['sortino']:+.2f}")
        print(f"    Calmar:        {metrics['calmar']:+.2f}")
        print(f"    Win Rate:      {metrics['win_rate']:.1%}")
        print(f"    Profit Factor: {metrics['profit_factor']:.2f}")
        print(f"    Total Costs:   {metrics['total_costs']:.4f}")
    
    # ── PHASE 5: FOLD-BY-FOLD SUMMARY ──
    print("\n" + "=" * 70)
    print("  FOLD-BY-FOLD BREAKDOWN")
    print("=" * 70)
    fold_df = pd.DataFrame(results['fold_metrics'])
    if not fold_df.empty:
        print(fold_df.to_string(index=False))
        
        avg_sharpe = fold_df.groupby('ticker')['sharpe'].mean()
        std_sharpe = fold_df.groupby('ticker')['sharpe'].std(ddof=1)
        n_folds = fold_df.groupby('ticker')['sharpe'].count()
        print(f"\n  Average OOS Sharpe per ticker:")
        for t in avg_sharpe.index:
            se = std_sharpe[t] / np.sqrt(n_folds[t]) if n_folds[t] > 1 else 0.0
            print(f"    {t}: {avg_sharpe[t]:+.2f} +/- {std_sharpe[t]:.2f} "
                  f"(SE={se:.2f}, {int(n_folds[t])} folds)")
    
    # ── PHASE 5.5: ENSEMBLE (TFT + Momentum + Mean Reversion) ──
    print("\n" + "=" * 70)
    print("  PHASE 5.5: ENSEMBLE (Majority Vote: TFT + Momentum + Mean Reversion)")
    print("=" * 70)

    ensemble_metrics = {}
    ensemble_net_returns = {}
    for ticker in oos_returns.columns:
        ret = oos_returns[ticker].dropna()
        sig = oos_signals[ticker].dropna()
        common_idx = ret.index.intersection(sig.index)

        if len(common_idx) < 10 or ticker not in featured_data:
            continue

        df_feat = featured_data[ticker]
        feat_idx = common_idx.intersection(df_feat.index)
        if len(feat_idx) < 10:
            continue

        ret_e = ret.loc[feat_idx].values
        sig_e = sig.loc[feat_idx].values

        # Three signals
        tft_pos = np.sign(sig_e)
        mom_pos = np.sign(df_feat.loc[feat_idx, 'norm_ret_21d'].values)
        mr_pos = -np.sign(df_feat.loc[feat_idx, 'mr_zscore'].values)

        # Majority vote: sign of sum (2-of-3 or 3-of-3 agree)
        ens_signal = tft_pos + mom_pos + mr_pos
        ens_pos = np.sign(ens_signal)

        # Ensemble returns with costs
        ens_strat_ret = ens_pos * ret_e
        pos_change = np.abs(np.diff(ens_pos, prepend=0))
        costs = pos_change * cfg.bps_cost
        ens_net = ens_strat_ret - costs

        ens_m = calculate_metrics(ens_net, costs=costs.sum())
        ensemble_metrics[ticker] = ens_m
        ensemble_net_returns[ticker] = pd.Series(ens_net, index=feat_idx)

        # Individual signal Sharpes (gross) for comparison
        tft_sr = (float(np.mean(tft_pos * ret_e))
                  / (float(np.std(tft_pos * ret_e, ddof=1)) + 1e-9)
                  * np.sqrt(252))
        mom_sr = (float(np.mean(mom_pos * ret_e))
                  / (float(np.std(mom_pos * ret_e, ddof=1)) + 1e-9)
                  * np.sqrt(252))
        mr_sr = (float(np.mean(mr_pos * ret_e))
                 / (float(np.std(mr_pos * ret_e, ddof=1)) + 1e-9)
                 * np.sqrt(252))

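        # Signal agreement rate
        agree_all = float(np.mean((tft_pos == mom_pos) & (mom_pos == mr_pos)))
        # At least two signals share a non-zero direction. (|sum| >= 2 would
        # miss 2-vs-1 splits, whose votes sum to +/-1.)
        votes = np.stack([tft_pos, mom_pos, mr_pos])
        n_long = np.sum(votes > 0, axis=0)
        n_short = np.sum(votes < 0, axis=0)
        agree_2of3 = float(np.mean(np.maximum(n_long, n_short) >= 2))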

        print(f"\n  {ticker} ({len(ret_e)} days):")
        print(f"    TFT Sharpe:        {tft_sr:+.2f}")
        print(f"    Momentum Sharpe:   {mom_sr:+.2f}")
        print(f"    MR Sharpe:         {mr_sr:+.2f}")
        print(f"    Ensemble Sharpe:   {ens_m['sharpe']:+.2f}  (net of costs)")
        print(f"    Ensemble Return:   {ens_m['total_return']:+.2%}")
        print(f"    Ensemble MaxDD:    {ens_m['max_dd']:.2%}")
        print(f"    Agreement (3/3):   {agree_all:.1%}")
        print(f"    Agreement (2+/3):  {agree_2of3:.1%}")

    # ── PHASE 6: EDGE AUDIT ──
    print("\n" + "=" * 70)
    print("  PHASE 6: EDGE AUDIT — Is the alpha real?")
    print("=" * 70)

    print("\n  Target definition:")
    print("    target_ret[t] = (close[t+1] - close[t]) / close[t]  [1-day forward return]")
    print(f"    signal[t] from features[t-W+1 : t]  [causal window, W={cfg.window_size}]")
    print("    strat_ret[t] = sign(signal[t]) * target_ret[t]  [trade at close of day t]")

    for ticker in oos_returns.columns:
        ret = oos_returns[ticker].dropna()
        sig = oos_signals[ticker].dropna()
        common_idx = ret.index.intersection(sig.index)
        ret_a = ret.loc[common_idx]
        sig_a = sig.loc[common_idx]

        if len(ret_a) < 10:
            continue

        r = ret_a.values
        position = np.sign(sig_a.values)
        strat_ret_gross = position * r

        # Turnover stats
        pos_changes = np.abs(np.diff(position, prepend=0))
        n_trades = int(np.sum(pos_changes > 0))
        daily_turnover = np.mean(pos_changes)
        gross_sharpe = (float(np.mean(strat_ret_gross))
                        / (float(np.std(strat_ret_gross, ddof=1)) + 1e-9)
                        * np.sqrt(252))

        print(f"\n  {ticker} ({len(r)} OOS days, {n_trades} position changes, "
              f"daily turnover {daily_turnover:.3f})")
        print(f"    Gross Sharpe (no costs): {gross_sharpe:+.2f}")

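        # ── Test 1: Sign-flip ──
        # Flipped Sharpe is (up to the 1e-9 guard) the negative of the gross
        # Sharpe, so this only confirms that the gross edge is positive.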
        flip_ret = -position * r
        flip_sharpe = (float(np.mean(flip_ret))
                       / (float(np.std(flip_ret, ddof=1)) + 1e-9)
                       * np.sqrt(252))
        verdict_1 = "PASS" if flip_sharpe < 0 else "FAIL (signal is noise or drift)"
        print(f"\n    Test 1 — Sign flip:")
        print(f"      Flipped Sharpe:  {flip_sharpe:+.2f}  [{verdict_1}]")

        # ── Test 2: Delay execution by 1 day ──
        if len(r) > 2:
            # position[t] applied to target_ret[t+1] instead of target_ret[t]
            delay_ret = position[:-1] * r[1:]
            delay_sharpe = (float(np.mean(delay_ret))
                            / (float(np.std(delay_ret, ddof=1)) + 1e-9)
                            * np.sqrt(252))
            decay = gross_sharpe - delay_sharpe
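            # Check reversal first: for a positive gross Sharpe, any negative
            # delayed Sharpe also satisfies the "< 0.5 * gross" decay test.
            if delay_sharpe < 0:
                verdict_2 = "CLEAN (edge reverses with delay)"
            elif delay_sharpe < gross_sharpe * 0.5:
                verdict_2 = "CLEAN (edge decays with delay)"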
            else:
                verdict_2 = "SUSPICIOUS (edge persists — possible drift capture)"
            print(f"\n    Test 2 — Delay 1 day:")
            print(f"      Delayed Sharpe:  {delay_sharpe:+.2f}  "
                  f"(decay: {decay:+.2f})  [{verdict_2}]")

        # ── Test 3: Dumb baselines ──
        if ticker in featured_data:
            df_feat = featured_data[ticker]
            common_feat_idx = common_idx.intersection(df_feat.index)
            if len(common_feat_idx) > 10:
                ret_bl = ret_a.loc[common_feat_idx].values

                # Momentum: sign(yesterday's normalized return)
                mom_pos = np.sign(
                    df_feat.loc[common_feat_idx, 'norm_ret_1d'].values)
                mom_ret = mom_pos * ret_bl
                mom_sharpe = (float(np.mean(mom_ret))
                              / (float(np.std(mom_ret, ddof=1)) + 1e-9)
                              * np.sqrt(252))

                # Mean reversion: -sign(yesterday's normalized return)
                mr_ret = -mom_pos * ret_bl
                mr_sharpe = (float(np.mean(mr_ret))
                             / (float(np.std(mr_ret, ddof=1)) + 1e-9)
                             * np.sqrt(252))

                best_baseline = max(mom_sharpe, mr_sharpe)
                advantage = gross_sharpe - best_baseline
                if advantage > 0.5:
                    verdict_3 = "PASS"
                elif advantage > 0:
                    verdict_3 = "MARGINAL"
                else:
                    verdict_3 = "FAIL (TFT doesn't beat simple baseline)"
                print(f"\n    Test 3 — Baselines:")
                print(f"      Momentum baseline: {mom_sharpe:+.2f}")
                print(f"      MR baseline:       {mr_sharpe:+.2f}")
                print(f"      TFT advantage:     {advantage:+.2f}  [{verdict_3}]")

        # ── Test 4: Cost sensitivity sweep ──
        print(f"\n    Test 4 — Cost sensitivity:")
        for bps in [0, 5, 10, 15, 20, 30]:
            cost_per_change = bps / 10000.0
            costs = pos_changes * cost_per_change
            net = strat_ret_gross - costs
            net_sharpe = (float(np.mean(net))
                          / (float(np.std(net, ddof=1)) + 1e-9)
                          * np.sqrt(252))
            total_cost_pct = costs.sum() * 100
            # round() guards against float error (e.g. 0.0003 * 10000 = 2.9999...)
            marker = " <-- current" if bps == int(round(cfg.bps_cost * 10000)) else ""
            print(f"      {bps:2d} bps: Sharpe {net_sharpe:+.2f}  "
                  f"(total cost: {total_cost_pct:.2f}%){marker}")

        # ── Target alignment: first 5 OOS dates ──
        print(f"\n    Target alignment (first 5 OOS dates):")
        for dt in common_idx[:5]:
            print(f"      {dt.date()}: signal={sig_a.loc[dt]:+.4f}  "
                  f"target_ret={ret_a.loc[dt]:+.6f}  "
                  f"pos={int(np.sign(sig_a.loc[dt]))}")

    # ── Fold stability ──
    if not fold_df.empty and len(fold_df) > 2:
        print(f"\n  Fold Stability:")
        for ticker in fold_df['ticker'].unique():
            t_folds = fold_df[fold_df['ticker'] == ticker]
            sharpes = t_folds['sharpe'].values
            n_pos = int(np.sum(sharpes > 0))
            n_neg = int(np.sum(sharpes <= 0))
            print(f"    {ticker}: {len(sharpes)} folds, "
                  f"mean={np.mean(sharpes):+.2f}, "
                  f"std={np.std(sharpes, ddof=1):.2f}, "
                  f"min={np.min(sharpes):+.2f}, max={np.max(sharpes):+.2f}, "
                  f"+ve/-ve={n_pos}/{n_neg}")
            if np.std(sharpes, ddof=1) > 3.0:
                print(f"      WARNING: High fold variance — edge is unstable")

    # ── PHASE 7: VISUALIZATION ──
    if not all_metrics:
        print("\n  No metrics to visualize.")
        return {'raw_data': raw_data, 'featured_data': featured_data,
                'results': results, 'all_metrics': all_metrics,
                'strat_net_returns': strat_net_returns, 'cfg': cfg}
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 10))
    fig.suptitle('QuantKubera Monolith v3 -- Walk-Forward OOS Results',
                 fontsize=14, fontweight='bold')
    
    # 7a. Equity curves (net of costs)
    ax = axes[0, 0]
    for ticker, net_s in strat_net_returns.items():
        equity = (1.0 + net_s).cumprod()
        label = f"{ticker} (Sharpe={all_metrics[ticker]['sharpe']:+.2f})"
        ax.plot(equity.index, equity.values, label=label, linewidth=1.5)
    ax.set_title('OOS Equity Curves (net of costs)')
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)
    ax.set_ylabel('Growth of $1')
    ax.axhline(1.0, color='gray', linestyle='--', alpha=0.4)
    
    # 7b. Drawdown
    ax = axes[0, 1]
    for ticker, net_s in strat_net_returns.items():
        equity = (1.0 + net_s).cumprod()
        peak = equity.cummax()
        dd = (equity - peak) / peak
        ax.fill_between(dd.index, dd.values, alpha=0.3, label=ticker)
    ax.set_title('Drawdown')
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)
    ax.set_ylabel('Drawdown')
    
    # 7c. VSN Feature Importance (universe model; weights are identical for every ticker)
    ax = axes[1, 0]
    if results['vsn_weights']:
        first_ticker = list(results['vsn_weights'].keys())[0]
        vsn_w = results['vsn_weights'][first_ticker]
        n_features_to_show = min(15, len(FEATURE_COLUMNS), len(vsn_w))
        sorted_idx = np.argsort(vsn_w)[::-1][:n_features_to_show]
        ax.barh(range(n_features_to_show),
                vsn_w[sorted_idx],
                color='steelblue')
        ax.set_yticks(range(n_features_to_show))
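        # Fall back to a generic label if the weight vector is longer than FEATURE_COLUMNS
        ax.set_yticklabels(
            [FEATURE_COLUMNS[i] if i < len(FEATURE_COLUMNS) else f'feature_{i}'
             for i in sorted_idx], fontsize=8)
        ax.set_title('VSN Feature Importance (universe model)')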
        ax.invert_yaxis()
        ax.grid(True, alpha=0.3)
        ax.set_xlabel('Weight')
    else:
        ax.text(0.5, 0.5, 'No VSN weights available',
                ha='center', va='center', transform=ax.transAxes)
    
    # 7d. Monthly returns bar chart (first ticker)
    ax = axes[1, 1]
    first_ticker_key = list(strat_net_returns.keys())[0]
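    # Compound daily simple returns within each month (consistent with the equity curve)
    monthly = strat_net_returns[first_ticker_key].resample('ME').apply(
        lambda s: (1.0 + s).prod() - 1.0)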
    colors = ['#d32f2f' if r < 0 else '#388e3c' for r in monthly.values]
    x_pos = range(len(monthly))
    ax.bar(x_pos, monthly.values * 100, color=colors, alpha=0.7)
    ax.set_title(f'Monthly Returns (%) -- {first_ticker_key}')
    ax.set_ylabel('Return %')
    ax.grid(True, alpha=0.3, axis='y')
    # Label x-axis with month abbreviations if not too many
    if len(monthly) <= 36:
        ax.set_xticks(list(x_pos))
        ax.set_xticklabels([d.strftime('%b %y') for d in monthly.index],
                           rotation=45, ha='right', fontsize=7)
    ax.axhline(0, color='black', linewidth=0.5)
    
    plt.tight_layout()
    plt.show()
    
    # Return results for downstream cells (VBT, etc.)
    return {
        'raw_data': raw_data,
        'featured_data': featured_data,
        'results': results,
        'all_metrics': all_metrics,
        'strat_net_returns': strat_net_returns,
        'cfg': cfg,
    }


# ── RUN THE LAB ──
lab_output = run_enhanced_lab()

2026-02-14 16:36:47,483 [INFO] KiteAuth: using cached token


  PHASE 1: DATA ACQUISITION



  NIFTY (NSE)...


  Resolved NIFTY -> NIFTY26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching NIFTY26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  NIFTY26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)
  BANKNIFTY (NSE)...
  Resolved BANKNIFTY -> BANKNIFTY26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching BANKNIFTY26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)


  BANKNIFTY26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)
  FINNIFTY (NSE)...
  Resolved FINNIFTY -> FINNIFTY26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching FINNIFTY26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  FINNIFTY26FEBFUT: 1264 bars, 2021-01-11 to 2026-02-13
    -> 1264 days (2021-01-11 to 2026-02-13)
  GOLD (MCX)...
  Resolved GOLD -> GOLDTEN26FEBFUT (expiry: 2026-02-27 00:00:00)
  Fetching GOLDTEN26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  GOLDTEN26FEBFUT: 226 bars, 2025-03-31 to 2026-02-13
    -> 226 days (2025-03-31 to 2026-02-13)
  SILVER (MCX)...


  Resolved SILVER -> SILVERM26FEBFUT (expiry: 2026-02-27 00:00:00)
  Fetching SILVERM26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  SILVERM26FEBFUT: 1748 bars, 2019-04-12 to 2026-02-13
    -> 1748 days (2019-04-12 to 2026-02-13)
  CRUDEOIL (MCX)...
  Resolved CRUDEOIL -> CRUDEOIL26FEBFUT (expiry: 2026-02-19 00:00:00)
  Fetching CRUDEOIL26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)


  CRUDEOIL26FEBFUT: 1752 bars, 2019-04-12 to 2026-02-13
    -> 1752 days (2019-04-12 to 2026-02-13)
  NATURALGAS (MCX)...
  Resolved NATURALGAS -> NATURALGAS26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching NATURALGAS26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  NATURALGAS26FEBFUT: 1752 bars, 2019-04-12 to 2026-02-13
    -> 1752 days (2019-04-12 to 2026-02-13)
  COPPER (MCX)...
  Resolved COPPER -> COPPER26FEBFUT (expiry: 2026-02-27 00:00:00)
  Fetching COPPER26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)


  COPPER26FEBFUT: 1735 bars, 2019-04-12 to 2026-02-13
    -> 1735 days (2019-04-12 to 2026-02-13)
  RELIANCE (NFO)...
  Resolved RELIANCE -> RELIANCE26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching RELIANCE26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  RELIANCE26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)
  HDFCBANK (NFO)...
  Resolved HDFCBANK -> HDFCBANK26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching HDFCBANK26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)


  HDFCBANK26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)
  ICICIBANK (NFO)...
  Resolved ICICIBANK -> ICICIBANK26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching ICICIBANK26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  ICICIBANK26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)
  INFY (NFO)...
  Resolved INFY -> INFY26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching INFY26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)


  INFY26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)
  TCS (NFO)...
  Resolved TCS -> TCS26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching TCS26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  TCS26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)
  SBIN (NFO)...
  Resolved SBIN -> SBIN26FEBFUT (expiry: 2026-02-24 00:00:00)
  Fetching SBIN26FEBFUT: 2019-04-12 to 2026-02-14 (2500 calendar days)
  SBIN26FEBFUT: 1697 bars, 2019-04-12 to 2026-02-13
    -> 1697 days (2019-04-12 to 2026-02-13)

  PHASE 1.5: CROSS-ASSET DATA (News Sentiment + India VIX)

  [1/2] News Sentiment (GDELT + FinBERT)


GDELT headlines:   0%|          | 0/895 [00:00<?, ?req/s]

In [None]:
# 8. VECTORBTPRO TEARSHEET & TRADE ANALYSIS
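import matplotlib.pyplot as plt  # used by the fallback equity plot
import numpy as np               # used for np.sign when building positions
import vectorbtpro as vbt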

print("=" * 70)
print("  VECTORBTPRO TEARSHEET & TRADE ANALYSIS")
print("=" * 70)

# Use results from Cell 7 (lab_output variable)
oos_returns = lab_output['results']['oos_returns']
oos_signals = lab_output['results']['oos_signals']
cfg = lab_output['cfg']

for ticker in oos_returns.columns:
    print(f"\n{'=' * 60}")
    print(f"  {ticker} TEARSHEET")
    print(f"{'=' * 60}")
    
    sig = oos_signals[ticker].dropna()
    ret = oos_returns[ticker].dropna()
    
    if len(sig) < 10:
        print(f"  Insufficient OOS data for {ticker} ({len(sig)} days), skipping.")
        continue
    
    # Get close prices for the OOS period from featured_data
    feat_df = lab_output['featured_data'][ticker]
    close = feat_df['close'].reindex(sig.index).dropna()
    
    # Align all series on common dates
    common_idx = sig.index.intersection(close.index)
    if len(common_idx) < 10:
        print(f"  Insufficient aligned data for {ticker} ({len(common_idx)} days), skipping.")
        continue
    
    sig = sig.loc[common_idx]
    close = close.loc[common_idx]
    
    # Build position series with T+1 lag (signal at t -> trade at t+1)
    position = np.sign(sig).shift(1).fillna(0)
    
    # Derive entry/exit signals from position changes
    prev_pos = position.shift(1).fillna(0)
    long_entries  = (position > 0) & (prev_pos <= 0)
    long_exits    = (position <= 0) & (prev_pos > 0)
    short_entries = (position < 0) & (prev_pos >= 0)
    short_exits   = (position >= 0) & (prev_pos < 0)
    
    # VBT Portfolio: long/short from signals
    pf = vbt.Portfolio.from_signals(
        close=close,
        long_entries=long_entries,
        long_exits=long_exits,
        short_entries=short_entries,
        short_exits=short_exits,
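        # Assumes cfg.bps_cost is a decimal fraction (e.g. 0.001 = 10 bps);
        # the cost is split evenly between fees and slippage on each fill.
        fees=cfg.bps_cost / 2,       # fee share of the per-fill cost
        slippage=cfg.bps_cost / 2,   # slippage share of the per-fill cost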
        freq='1D',
        init_cash=1_000_000,
    )
    
    # Print full stats
    print(pf.stats())
    
    # Trade-level analysis (trades.count is a method, so it must be called)
    n_trades = pf.trades.count()
    if n_trades > 0:
        trades = pf.trades
        print("\n  --- Trade Summary ---")
        print(f"  Total Trades:   {n_trades}")
        print(f"  Win Rate:       {trades.win_rate:.2%}")
        print(f"  Profit Factor:  {trades.profit_factor:.2f}")
        print(f"  Avg P&L:        {trades.pnl.mean():.2f}")
        print(f"  Max Win:        {trades.pnl.max():.2f}")
        print(f"  Max Loss:       {trades.pnl.min():.2f}")
        print(f"  Expectancy:     {trades.expectancy:.2f}")
        print(f"  Avg Duration:   {trades.duration.mean()}")
        
        # Show recent trades
        readable = trades.records_readable
        if len(readable) > 0:
            print(f"\n  --- Last 10 Trades ---")
            print(readable.tail(10).to_string())
    else:
        print("  No trades executed.")
    
    # Plot equity curve
    try:
        fig = pf.plot()
        fig.update_layout(
            title=f'{ticker} -- VectorBTPro Equity Curve',
            height=400,
            template='plotly_white'
        )
        fig.show()
    except Exception as e:
        print(f"  Plot failed: {e}")
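        # Fallback: matplotlib equity curve
        # pf.value is a property in VectorBT PRO but a method in open-source
        # vectorbt, so handle both forms.
        equity = pf.value() if callable(pf.value) else pf.value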
        plt.figure(figsize=(12, 4))
        plt.plot(equity.index, equity.values, linewidth=1.5)
        plt.title(f'{ticker} -- VectorBTPro Equity Curve')
        plt.ylabel('Portfolio Value')
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()

print("\n" + "=" * 70)
print("  VectorBTPro analysis complete")
print("=" * 70)
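
The T+1 lag and the entry/exit derivation in the cell above can be checked in isolation, without VectorBTPro or the lab output. The sketch below is a minimal pandas-only illustration: `derive_signals` is a hypothetical helper (not part of the notebook) that wraps the same four comparisons, and the toy signal values are made up purely for demonstration.

```python
import numpy as np
import pandas as pd


def derive_signals(sig: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: lagged position plus entry/exit flags."""
    # A signal observed at t is traded at t+1, hence the one-bar shift.
    position = np.sign(sig).shift(1).fillna(0)
    prev_pos = position.shift(1).fillna(0)
    return pd.DataFrame({
        "position": position,
        "long_entry": (position > 0) & (prev_pos <= 0),
        "long_exit": (position <= 0) & (prev_pos > 0),
        "short_entry": (position < 0) & (prev_pos >= 0),
        "short_exit": (position >= 0) & (prev_pos < 0),
    })


# Toy signal (illustrative values only): long, long, short, flat, long, long
idx = pd.date_range("2024-01-01", periods=6, freq="D")
toy_sig = pd.Series([0.4, 0.2, -0.3, 0.0, 0.5, 0.5], index=idx)

flags = derive_signals(toy_sig)
print(flags)
```

On the bar where the position flips from long to short, both `long_exit` and `short_entry` are set, which is how a reversal is expressed to `from_signals`. The first bar always has a zero position because no earlier signal is available to act on.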