v61  
- Updated core.analyzer
- Replaced the `Result` pattern with exceptions and flattened the logic
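
The difference between the two error-handling styles can be sketched as follows. This is an illustration only: `Result`, `gain_result`, and `gain_raising` are hypothetical names, not the actual code in `core.analyzer`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical Result container: the caller must check `ok` before using `value`."""
    ok: bool
    value: Optional[float] = None
    error: Optional[str] = None


def gain_result(start: float, end: float) -> Result:
    # Result pattern: failures are returned as data, so every caller needs an if-branch.
    if start == 0:
        return Result(ok=False, error="start price is zero")
    return Result(ok=True, value=end / start - 1.0)


def gain_raising(start: float, end: float) -> float:
    # Exception style: the happy path stays flat and failures propagate upward.
    if start == 0:
        raise ValueError("start price is zero")
    return end / start - 1.0


checked = gain_result(0.0, 110.0)       # failure carried in the return value
flat = gain_raising(100.0, 110.0)       # plain float on success
```

With exceptions, a single `try`/`except` at the top of the run can replace the per-call `ok` checks, which is likely what "flattened the logic" refers to.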


v60  
- Converted code from notebook to modular system.
- Fixed divide by zero warning from calculate_gain
- Added subtitle to subplots
- Added Volatility Regime plot


v59  
- Removed "nest" of if-statements in **AlphaEngine.run**
- Use **Result Pattern** to handle errors
- Change verify_analyzer_short and verify_analyzer_long gain calculation from simple return to logarithmic return
- Change calculate_gain from simple return to logarithmic return
- Remove bfill from calculate_gain to prevent backfill with future data
- Verify macro_df calculation
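
A minimal sketch of the simple-versus-logarithmic distinction, assuming a plain price series. The helpers `simple_gain` and `log_gain` are illustrative stand-ins, not the project's `calculate_gain`. Daily log returns sum to the total log return, and the first daily return stays NaN rather than being backfilled.

```python
import numpy as np
import pandas as pd


def simple_gain(prices: pd.Series) -> float:
    """Total simple return over the window: P_end / P_start - 1."""
    return prices.iloc[-1] / prices.iloc[0] - 1.0


def log_gain(prices: pd.Series) -> float:
    """Total log return as the sum of daily log returns."""
    daily = np.log(prices / prices.shift(1))  # first element is NaN by construction
    return daily.sum()                         # pandas skips the NaN; nothing is backfilled


prices = pd.Series([100.0, 110.0, 99.0])
total_simple = simple_gain(prices)   # 99 / 100 - 1
total_log = log_gain(prices)         # log(99 / 100)
```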


v57, v58  
- Added macro subplots

The macro regime framework is now fully documented with:
- Trend (SMA200 deviation) → Where we are in the cycle  
- Trend Velocity (Z) → How fast we're moving relative to normal
- VIX-Z → Market fear/complacency levels  

v56  

- De-coupled features_df and macro_df
- generate_features and audit_feature_engineering_integrity use GLOBAL_SETTINGS

v55  
Added
- audit_feature_engineering_integrity (check calculation in features_df)  

Metrics shown in the plot:  
- --- 1. LEGACY / SANITY CHECKS ---
- "Price Gain": lambda obs: QuantUtils.calculate_gain(obs["lookback_close"]),
- "Sharpe": lambda obs: QuantUtils.calculate_sharpe(obs["lookback_returns"]),
- "Sharpe (ATRP)": lambda obs: QuantUtils.calculate_sharpe_vol(
        obs["lookback_returns"], obs["atrp"]
    ),
- "Sharpe (TRP)": lambda obs: QuantUtils.calculate_sharpe_vol(
        obs["lookback_returns"], obs["trp"]
    ),
- --- 2. NEW QUANT METRICS ---
- "Momentum (21d)": lambda obs: obs["mom_21"],
- "Information Ratio (IR)": lambda obs: obs["ir_63"],  # Kept this one
- "Consistency (WinRate)": lambda obs: obs["consistency"],
- "Oversold (RSI)": lambda obs: -obs["rsi"],
- "Dip Buyer (Drawdown)": lambda obs: -obs["dd_21"],
- "Low Volatility": lambda obs: -obs["atrp"],







v54
-  **Replaced plot_walk_forward_analyzer with create_walk_forward_analyzer**


v53  
Looking at this registry with a quant lens, the list is **comprehensive but bloated**: we have **momentum measured five times under different names** (roc₁, roc₃, roc₅, roc₁₀, roc₂₁ and their negative twins “Pullback”).  
That’s **10 slots** telling us almost the same story at slightly different lags; in a rank-based engine they will **crowd the signal space** and inflate turnover without adding IC.

Duplicate / redundant cluster  
- Momentum 1 D ↔ Pullback 1 D (perfect mirror)  
- Same for 3 D, 5 D, 10 D, 21 D.  
**Keep one side only**: momentum is enough; the portfolio constructor can always **reverse the rank** if it wants “oversold”.
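
The mirror relationship can be checked on toy data. The ticker names and values below are made up for illustration: ranking the negated series gives exactly the reversed momentum ranking.

```python
import pandas as pd

# Hypothetical 5-day momentum readings for three tickers.
mom_5d = pd.Series({"AAA": 0.04, "BBB": -0.02, "CCC": 0.01})
pullback_5d = -mom_5d  # the "negative twin"

rank_pullback = pullback_5d.rank(ascending=False)  # 1 = deepest pullback
rank_reversed = mom_5d.rank(ascending=True)        # momentum rank, reversed

# True: the Pullback slot carries no ranking information beyond Momentum.
mirror = rank_pullback.equals(rank_reversed)
```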

Close cousins that can be merged  
- “Sharpe” vs “Sharpe (ATRP)” – both are return / vol; keep **ATRP version** because it is regime-aware and smoother.  
- “RVol” vs “Vol_Regime” – both capture vol expansion; keep the **longer-memory one** (Vol_Regime) and drop the intraday snapshot.

Gaps that matter to a quant  
1. **Consistency sensor**: nowhere do we ask “how often did the ticker close higher than it opened?” – add **5-day win-rate** or **up-day hit-ratio**.  
2. **Risk-adjusted intraday strength**: no **Sharpe(on-balance volume)** or **volume-momentum efficiency**; OBV_Score is raw, not risk-scaled.  
3. **Benchmark-relative consistency**: “Alpha (RelStrength)” is cumulative; add **rolling information ratio vs SPY** to catch *sustained* alpha, not one gap.  
4. **Tail flag**: no **skew** or **max-drawdown** metric; a single 20 % gap stock can poison the book.  
5. **Macro regime overlay**: no **beta-to-SPY** or **correlation-break** sensor; mid-2022 macro swings showed that low-beta names behaved like a different asset class.

Recommended minimal clean set (≤ 12 metrics)

1. Sharpe(ATRP) – strategic anchor  
2. Momentum 21 D – slow trend  
3. Momentum 5 D – fast trend  
4. 5-day win-rate – consistency  
5. RSI(Trend) – strength confirmation  
6. OBV_Score – volume conviction  
7. Vol_Regime – vol expansion filter  
8. Alpha(RelStrength) 63-day IR – benchmark consistency  
9. Max 21-day drawdown – tail guard  
10. Beta-to-SPY – macro regime tag  

Drop everything else; the freed-up slots reduce collinearity, cut turnover, and leave head-room for **interaction terms** (e.g. momentum × consistency) that actually add orthogonal signal.



Below is a single, fully-vectorised block that adds the **five gap metrics** to your existing MultiIndex OHLCV frame.  
It never loops over tickers; everything is done with `groupby(level=0).rolling(...)` so it runs in C-speed and keeps the same index.

```python
import pandas as pd
import numpy as np

# ----------  CONFIG  -------------------------------------------------
LKB_RET   = 21          # look-back for return-based metrics
LKB_CONS  = 5           # consistency window (days)
LKB_IR    = 63          # IR window
LKB_BETA  = 63          # beta window
LKB_TAIL  = 21          # max-drawdown window
BENCH     = 'SPY'       # ticker that exists in your universe
# ---------------------------------------------------------------------

# 1.  DAILY RETURNS ----------------------------------------------------
df['ret'] = df.groupby(level=0)['Adj Close'].pct_change()

# 2.  CONSISTENCY SENSOR  (5-day win-rate) -----------------------------
df['up']  = df['ret'].gt(0).astype(int)
df['consistency_5d'] = (df.groupby(level=0)['up']
                          .rolling(LKB_CONS).mean()
                          .reset_index(level=0, drop=True))

# 3.  BENCHMARK-RELATIVE CONSISTENCY  (63-day IR vs SPY) ---------------
# need benchmark return
bench_ret = df.xs(BENCH, level=0)['ret'].rename('bench_ret')
df = df.join(bench_ret, how='left')          # broadcast to all tickers

df['active'] = df['ret'] - df['bench_ret']
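g = df.groupby(level=0)
# groupby().rolling() prepends the group key to the index; drop it so the
# result aligns with df's original (ticker, date) index on assignment.
active_mean  = g['active'].rolling(LKB_IR).mean().reset_index(level=0, drop=True)
active_std   = g['active'].rolling(LKB_IR).std().reset_index(level=0, drop=True)
df['IR_63d'] = active_mean / active_std      # Information Ratio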

# 4.  TAIL FLAG  (21-day max drawdown) ---------------------------------
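# Drop the extra group level so roll_max lines up with df['Adj Close'].
roll_max = g['Adj Close'].rolling(LKB_TAIL).max().reset_index(level=0, drop=True)
dd = (df['Adj Close'] - roll_max) / roll_max
df['max_dd_21d'] = (dd.groupby(level=0).rolling(LKB_TAIL).min()
                      .reset_index(level=0, drop=True))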

# 5.  MACRO REGIME OVERLAY  (beta to SPY) ------------------------------
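# Both results carry an extra group level; drop it before assigning back.
cov  = (g['ret'].rolling(LKB_BETA).cov(df['bench_ret'])
          .reset_index(level=0, drop=True))
var  = (g['bench_ret'].rolling(LKB_BETA).var()
          .reset_index(level=0, drop=True))
df['beta_SPY'] = cov / var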

# 6.  RISK-ADJUSTED INTRADAY STRENGTH  (OBV Sharpe) --------------------
# OBV
df['close_chg'] = df.groupby(level=0)['Adj Close'].diff()
df['vol_dir']   = np.where(df['close_chg'] > 0,  df['Volume'],
                   np.where(df['close_chg'] < 0, -df['Volume'], 0))
df['obv'] = df.groupby(level=0)['vol_dir'].cumsum()

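# OBV return & vol
# OBV can be zero or negative, so pct_change may yield ±inf; treat those as missing.
df['obv_ret'] = (df.groupby(level=0)['obv'].pct_change()
                   .replace([np.inf, -np.inf], np.nan))
obv_grp  = df.groupby(level=0)['obv_ret']   # regroup: column was added after `g`
obv_mean = obv_grp.rolling(LKB_RET).mean().reset_index(level=0, drop=True)
obv_std  = obv_grp.rolling(LKB_RET).std().reset_index(level=0, drop=True)
df['OBV_Sharpe_21d'] = obv_mean / obv_std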

# drop helper columns --------------------------------------------------
df.drop(columns=['up','bench_ret','active','close_chg','vol_dir'], inplace=True)
```

After the block you have five new columns:

- `consistency_5d`      – 5-day win-rate (0-1)  
- `IR_63d`              – 63-day Information Ratio vs SPY  
- `max_dd_21d`          – 21-day maximum drawdown (≤ 0)  
- `beta_SPY`            – rolling beta to SPY  
- `OBV_Sharpe_21d`      – OBV risk-adjusted momentum  

All are aligned to the original MultiIndex and ready to be ranked or z-scored inside your Alpha Engine.
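
As a rough sketch of that last step, one way to rank or z-score a column cross-sectionally (per date, across tickers) is shown below. The toy frame, its `Ticker`/`Date` level names, and the `IR_rank`/`IR_z` column names are assumptions for illustration, not part of the Alpha Engine.

```python
import numpy as np
import pandas as pd

# Toy MultiIndex frame: 3 tickers x 2 dates (values are made up).
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB", "CCC"], pd.to_datetime(["2024-01-02", "2024-01-03"])],
    names=["Ticker", "Date"],
)
demo = pd.DataFrame({"IR_63d": [0.5, 0.1, -0.2, 0.4, 0.3, 0.1]}, index=idx)

# Group by date so each ticker is compared only with its peers on the same day.
by_date = demo.groupby(level="Date")["IR_63d"]
demo["IR_rank"] = by_date.rank(ascending=False)  # 1 = best on that date
demo["IR_z"] = by_date.transform(lambda s: (s - s.mean()) / s.std(ddof=0))
```

Ranks are robust to outliers, while z-scores keep the distance between names. Which one suits the engine depends on how the scores are combined downstream.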

v52  
- **Cascade Filter results `AGREED` with bot_v54i.ipynb**
- **Cascade Filter works with df_ohlcv_subset**
- **verify_engine_results_short_form**
- **verify_engine_results_long_form**
-  **The Temporal Alignment Fix:** We synchronized the "Reward" (Returns) and "Risk" (Volatility) by implementing the $N-1$ denominator logic. This ensures that Day 1's volatility no longer dilutes your Sharpe scores.
-  **The Event-Driven Re-normalization:** We verified that the Engine correctly resets capital and weights at the start of the Holding period, giving you an accurate "Fresh Start" performance metric.
-  **The Double-Blind Verification:** We proved that the Engine's True Range (TRP) math is flawless by recreating it from raw High/Low/Close data and achieving an 8-decimal match.
-  **Mathematical Fortification:** We centralized all logic into a polymorphic `QuantUtils` kernel that handles both single-portfolio reports and whole-universe rankings with built-in numerical safety.
-  **Volatility Evolution:** We successfully added `TRP` (True Range Percent) and the `Sharpe (TRP)` metric, giving you a raw, high-frequency alternative to the smoothed ATR.
-  **Data Integrity:** We implemented the "Momentum Collapse" tripwire (`verify_ranking_integrity`) to ensure that your risk-adjusted rankings never accidentally devolve into simple price momentum.
-  **The "Audit Pack" Architecture:** We collapsed fragmented results into a single, atomic container, ensuring that your inputs, results, and debug data are always perfectly synchronized.
-  **Total Transparency:** We replaced scattered CSV files with a unified **Excel Audit Report**, allowing for 1-to-1 manual verification of every calculation in the system.
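
The temporal alignment fix can be illustrated with a two-day example. It assumes the score is mean return divided by mean volatility; `sharpe_vol_aligned` and `sharpe_vol_misaligned` are hypothetical helpers, not the `QuantUtils` implementation.

```python
import numpy as np
import pandas as pd


def sharpe_vol_aligned(returns: pd.Series, vol: pd.Series) -> float:
    """Average volatility only over days that actually have a return (N-1 days)."""
    valid = returns.notna()
    return returns[valid].mean() / vol[valid].mean()


def sharpe_vol_misaligned(returns: pd.Series, vol: pd.Series) -> float:
    """Averages volatility over all N days, so Day 1 dilutes the denominator."""
    return returns.mean() / vol.mean()


rets = pd.Series([np.nan, 0.10])  # Day 1 has no return by construction
vol = pd.Series([0.90, 0.10])     # Day 1 volatility is an extreme outlier

aligned = sharpe_vol_aligned(rets, vol)        # 0.10 / 0.10
misaligned = sharpe_vol_misaligned(rets, vol)  # 0.10 / mean(0.90, 0.10)
```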



v51

UNDO v50, Calculate Sharpe(ATR) using mean over lookback period.  

The block marked ``# --- PINPOINT START: ATRP SWITCH ---`` in function ``_select_tickers`` switches between ``Averaged ATRP over lookback period`` and ``Current ATRP``:  
    # --- PINPOINT START: ATRP SWITCH ---  
    # To switch between Old (Averaged ATRP) and New (Current ATRP):  
    # 1. Comment out the logic you DON'T want.  
    # 2. Uncomment the logic you DO want.  
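
A small sketch of what the two denominators look like for one ticker. The numbers and variable names are hypothetical, and the real switch lives inside `_select_tickers`.

```python
import pandas as pd

# Hypothetical lookback window for a single ticker.
lookback_returns = pd.Series([0.01, -0.005, 0.02, 0.015])
lookback_atrp = pd.Series([0.020, 0.025, 0.030, 0.045])  # ATRP rising into the decision day

mean_ret = lookback_returns.mean()

# Old (Averaged ATRP): mean ATRP over the whole lookback period.
sharpe_atrp_averaged = mean_ret / lookback_atrp.mean()

# New (Current ATRP): ATRP observed on the decision day only.
sharpe_atrp_current = mean_ret / lookback_atrp.iloc[-1]
```

When volatility is rising into the decision day, the current-ATRP variant penalises the ticker more than the averaged variant does.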


v50

Ticker selection now uses `atrp_value_for_obs` on the decision day; previously it used the average over the lookback period.

v48  
### Summary of what you just accomplished:
1.  **Strict Math:** `QuantUtils` now contains an `assert` that prevents any dev (or AI) from filling the first day with 0.0.
2.  **Semantic Protection:** Variables are now named `returns_WITH_BOUNDARY_NAN`, signaling to the AI that the Null value is part of its identity.
3.  **Complete SOLID Separation:** The Engine CONDUCTS the simulation, while `QuantUtils` CALCULATES the results. They no longer share logic.
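
A minimal sketch of the boundary guard described in points 1 and 2, assuming a simple-return calculation. The function name is hypothetical and this is not the actual `QuantUtils.compute_returns`.

```python
import pandas as pd


def compute_returns_with_boundary_nan(prices: pd.Series) -> pd.Series:
    """Simple returns whose first element must remain NaN (no previous price)."""
    returns_WITH_BOUNDARY_NAN = prices / prices.shift(1) - 1.0
    # Tripwire: fails loudly if anyone later adds .fillna(0.0) above this line.
    assert pd.isna(returns_WITH_BOUNDARY_NAN.iloc[0]), "Leading NaN was filled"
    return returns_WITH_BOUNDARY_NAN


rets = compute_returns_with_boundary_nan(pd.Series([100.0, 102.0, 101.0]))
```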

**1. Data Flow of `plot_walk_forward_analyzer`**
The function acts as a **UI wrapper** around the `AlphaEngine` class. The flow is:
1.  **Input:** User selects parameters (Dates, Lookback, Strategy).
2.  **State Construction:** `AlphaEngine` slices the historical data (`df_ohlcv`, `df_atrp`) up to the `decision_date`.
3.  **Policy Execution (Hardcoded):** The engine applies the logic (e.g., `METRIC_REGISTRY['Sharpe']`) to rank stocks based *only* on the Lookback window.
4.  **Environment Step:** It simulates a "Buy" at `decision_date + 1` and calculates the returns over the `holding_period`.
5.  **Reward Generation:** It outputs performance metrics (`holding_p_gain`, `holding_p_sharpe`).
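
The five steps above can be sketched as one function on toy data. This is an illustration under stated assumptions, not `AlphaEngine`: the wide close frame, the mean/std score standing in for `METRIC_REGISTRY['Sharpe']`, and the convention that `holding` counts bars after the entry bar are all assumed.

```python
import pandas as pd


def walk_forward_step(close, decision_date, lookback, holding, top_n=1):
    """close: wide frame (dates x tickers). Returns (picks, gain per pick)."""
    pos = close.index.get_loc(decision_date)
    # State construction: only data up to and including the decision date.
    window = close.iloc[pos - lookback : pos + 1]
    rets = window.pct_change().dropna()
    # Policy: rank by a simple mean/std score computed on the lookback only.
    score = rets.mean() / rets.std()
    picks = score.nlargest(top_n).index
    # Environment step: buy on the bar after the decision date, hold `holding` bars.
    entry = close.iloc[pos + 1][picks]
    exit_ = close.iloc[pos + 1 + holding][picks]
    # Reward: holding-period gain per selected ticker.
    return list(picks), exit_ / entry - 1.0


dates = pd.bdate_range("2024-01-01", periods=8)
close = pd.DataFrame(
    {"UP": [100, 101, 103, 104, 106, 107, 109, 110],
     "DOWN": [100, 99, 100, 98, 97, 98, 96, 95]},
    index=dates,
)
picks, gain = walk_forward_step(close, dates[4], lookback=4, holding=2)
```

Entering at `decision_date + 1` keeps the decision-day close out of the reward, so the ranking never sees the prices it is scored on.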

In [1]:
# 1. Enable Autoreload
%load_ext autoreload
%autoreload 2

import sys
from pathlib import Path

def add_project_root_to_path():
    """Find notebooks_RLVR and add to sys.path."""
    current = Path.cwd()

    # Search upward for notebooks_RLVR folder
    for path in [current] + list(current.parents):
        if path.name == "notebooks_RLVR":
            sys.path.insert(0, str(path))
            print(f"✓ Added to path: {path}")
            return path
        # Also check if notebooks_RLVR exists as child (for running from stocks/)
        candidate = path / "notebooks_RLVR"
        if candidate.exists():
            sys.path.insert(0, str(candidate))
            print(f"✓ Added to path: {candidate}")
            return candidate

    raise RuntimeError("Could not find notebooks_RLVR directory")


# Run once at notebook start
add_project_root_to_path()


# 2. Force reload cached modules (run this to refresh code changes)
import importlib

modules_to_reload = [
    "core.engine",
    "core.contracts",
    "core.settings",
    "strategy.registry",
    "core.quant",
    "core.analyzer",
    "core.paths",
]

for mod in modules_to_reload:
    if mod in sys.modules:
        del sys.modules[mod]


# 3. Standard imports
import pandas as pd
import os
import numpy as np

from IPython.display import display
from dataclasses import fields, asdict, is_dataclass
from typing import List, Union, Tuple 


# 4. Fresh imports (these will re-import from disk due to cache clearing above)
from core.engine import AlphaEngine
from core.contracts import MarketObservation, FilterPack
from core.settings import GLOBAL_SETTINGS
from strategy.registry import METRIC_REGISTRY
from core.quant import QuantUtils
from core.analyzer import create_walk_forward_analyzer
from core.paths import OUTPUT_DIR


# 5. Pandas display settings
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1000)
pd.set_option("display.max_colwidth", 50)
pd.set_option("display.precision", 4)


# # 6. Instantiate engine (customize DataFrames as needed)
# master_engine = AlphaEngine(
#     df_ohlcv=df_ohlcv,
#     features_df=features_df,
#     macro_df=macro_df,
#     df_close_wide=df_close_wide,
#     df_atrp_wide=df_atrp_wide,
#     df_trp_wide=df_trp_wide,
# )

✓ Added to path: c:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR
NOTEBOOKS_RLVR_ROOT: C:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR

OUTPUT_DIR: C:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR\output



In [2]:
# ==============================================================================
# SECTION B: STRATEGY HELPERS & FEATURES
# ==============================================================================


def generate_features(
    df_ohlcv: pd.DataFrame,
    df_indices: pd.DataFrame = None,
    benchmark_ticker: str = GLOBAL_SETTINGS["benchmark_ticker"],
    atr_period: int = GLOBAL_SETTINGS["atr_period"],
    rsi_period: int = GLOBAL_SETTINGS["rsi_period"],
    win_5d: int = GLOBAL_SETTINGS["5d_window"],
    win_21d: int = GLOBAL_SETTINGS["21d_window"],
    win_63d: int = GLOBAL_SETTINGS["63d_window"],
    feature_zscore_clip: float = GLOBAL_SETTINGS["feature_zscore_clip"],
    quality_window: int = GLOBAL_SETTINGS["quality_window"],
    quality_min_periods: int = GLOBAL_SETTINGS["quality_min_periods"],
) -> Tuple[pd.DataFrame, pd.DataFrame]:

    print(f"⚡ Generating Decoupled Features (Benchmark: {benchmark_ticker})...")

    # --- 0. PREP ---
    df_ohlcv = df_ohlcv.sort_index(level=["Ticker", "Date"])
    all_dates = df_ohlcv.index.get_level_values("Date").unique().sort_values()

    # --- 1. MACRO ENGINE ---
    macro_df = pd.DataFrame(index=all_dates)
    if benchmark_ticker in df_ohlcv.index.get_level_values("Ticker"):
        mkt_close = (
            df_ohlcv.xs(benchmark_ticker, level="Ticker")["Adj Close"]
            .reindex(all_dates)
            .ffill()
        )
        macro_df["Mkt_Ret"] = mkt_close.pct_change().fillna(0.0)
        macro_df["Macro_Trend"] = (mkt_close / mkt_close.rolling(200).mean()) - 1.0
    else:
        macro_df["Mkt_Ret"] = 0.0
        macro_df["Macro_Trend"] = 0.0

    # --- TREND VELOCITY & MOMENTUM ---
    macro_df["Macro_Trend_Vel"] = macro_df["Macro_Trend"].diff(win_21d)
    macro_df["Macro_Trend_Vel_Z"] = (
        macro_df["Macro_Trend_Vel"] / macro_df["Macro_Trend"].rolling(win_63d).std()
    ).clip(-feature_zscore_clip, feature_zscore_clip)
    macro_df["Macro_Trend_Mom"] = (
        np.sign(macro_df["Macro_Trend"])
        * np.sign(macro_df["Macro_Trend_Vel"])
        * np.abs(macro_df["Macro_Trend_Vel"])
    ).fillna(0)

    # VIX Extraction (Same as before)
    macro_df["Macro_Vix_Z"] = 0.0
    macro_df["Macro_Vix_Ratio"] = 1.0
    if df_indices is not None:
        idx_names = df_indices.index.get_level_values(0).unique()
        if "^VIX" in idx_names:
            v = df_indices.xs("^VIX", level=0)["Adj Close"].reindex(all_dates).ffill()
            macro_df["Macro_Vix_Z"] = (
                (v - v.rolling(63).mean()) / v.rolling(63).std()
            ).clip(-feature_zscore_clip, feature_zscore_clip)
        if "^VIX" in idx_names and "^VIX3M" in idx_names:
            v3 = (
                df_indices.xs("^VIX3M", level=0)["Adj Close"].reindex(all_dates).ffill()
            )
            macro_df["Macro_Vix_Ratio"] = (v / v3).fillna(1.0)
    macro_df.fillna(0.0, inplace=True)

    # --- 2. TICKER ENGINE ---
    grouped = df_ohlcv.groupby(level="Ticker")
    rets = grouped["Adj Close"].pct_change()
    mkt_ret_series = macro_df["Mkt_Ret"]  # The "Master" market vector

    # A. Hybrid Metrics (Beta & IR)
    # 1. IR_63 (Passed previously, kept same logic)
    active_ret = rets.sub(mkt_ret_series, axis=0, level="Date")
    roll_active = active_ret.groupby(level="Ticker").rolling(win_63d)
    ir_63 = (
        (roll_active.mean() / roll_active.std())
        .reset_index(level=0, drop=True)
        .fillna(0)
    )

    # 2. Beta_63 (Optimized: Pre-compute market variance, audit-exact calculation)
    mkt_var = mkt_ret_series.rolling(win_63d).var()

    def calc_rolling_beta(ticker_rets):
        dates = ticker_rets.index.get_level_values("Date")
        m = mkt_ret_series.reindex(dates)
        return ticker_rets.rolling(win_63d).cov(m) / mkt_var.reindex(dates)

    beta_63 = (
        rets.groupby(level="Ticker", group_keys=False)
        .apply(calc_rolling_beta)
        .fillna(1.0)
    )

    # B. Volatility (ATR / TRP) - Optimized
    prev_close = grouped["Adj Close"].shift(1)

    # Vectorized True Range without pd.concat memory overhead
    high_low = df_ohlcv["Adj High"] - df_ohlcv["Adj Low"]
    high_close = (df_ohlcv["Adj High"] - prev_close).abs()
    low_close = (df_ohlcv["Adj Low"] - prev_close).abs()

    # Nested np.maximum avoids creating a 3-column DataFrame
    tr = np.maximum(np.maximum(high_low, high_close), low_close)

    atr = (
        tr.groupby(level="Ticker")
        .ewm(alpha=1 / atr_period, adjust=False)
        .mean()
        .reset_index(level=0, drop=True)
    )
    natr = (atr / df_ohlcv["Adj Close"]).fillna(0)
    trp = (tr / df_ohlcv["Adj Close"]).fillna(0)

    # C. Momentum & Consistency
    mom_21 = grouped["Adj Close"].pct_change(win_21d)
    consistency = (
        (rets > 0)
        .astype(float)
        .groupby(level="Ticker")
        .rolling(win_5d)
        .mean()
        .reset_index(level=0, drop=True)
    )
    dd_21 = (
        df_ohlcv["Adj Close"]
        / grouped["Adj Close"].rolling(win_21d).max().reset_index(level=0, drop=True)
    ) - 1.0

    # D. RSI (Wilder's Logic)
    delta = grouped["Adj Close"].diff()
    up, down = delta.clip(lower=0), -1 * delta.clip(upper=0)
    ma_up = (
        up.groupby(level="Ticker")
        .ewm(alpha=1 / rsi_period, adjust=False)
        .mean()
        .reset_index(level=0, drop=True)
    )
    ma_down = (
        down.groupby(level="Ticker")
        .ewm(alpha=1 / rsi_period, adjust=False)
        .mean()
        .reset_index(level=0, drop=True)
    )
    # FIX: Allow division by zero (i.e. no down day) to create inf (correct RSI=100),
    # inf→100, -inf→0, NaN→50
    # then clean up remaining NaNs (initial periods/no movement)
    # - Initial periods: Before the 14-day lookback is filled, the EWM mean is undefined → NaN.
    # - Flat prices: If price doesn't move (Avg Up = 0 and Avg Down = 0), RS is 0/0 → NaN.
    # - By convention, RSI is set to 50 (neutral) when there is no directional momentum.
    rs = ma_up / ma_down  # Keep zero denominator → inf
    raw_rsi = 100 - (100 / (1 + rs))
    rsi = raw_rsi.replace({np.inf: 100, -np.inf: 0}).fillna(50)

    # E. Assemble Features
    features_df = pd.DataFrame(
        {
            "ATR": atr,
            "ATRP": natr,
            "TRP": trp,
            "RSI": rsi,
            "Mom_21": mom_21,
            "Consistency": consistency,
            "IR_63": ir_63,
            "Beta_63": beta_63,
            "DD_21": dd_21.fillna(0),
            "Ret_1d": rets,
        }
    )

    # F. Quality (Universe Filtering) - Optimized
    quality_temp = pd.DataFrame(
        {
            "IsStale": np.where(
                (df_ohlcv["Volume"] == 0)
                | (df_ohlcv["Adj High"] == df_ohlcv["Adj Low"]),
                1,
                0,
            ),
            "DollarVolume": df_ohlcv["Adj Close"] * df_ohlcv["Volume"],
            "HasSameVolume": (grouped["Volume"].diff() == 0).astype(int),
        },
        index=df_ohlcv.index,
    )

    # Calculate rolling stats separately (avoid slow dict agg) and use .values to bypass index alignment overhead
    grp = quality_temp.groupby(level="Ticker")
    rolling_quality = pd.DataFrame(
        {
            "RollingStalePct": grp["IsStale"]
            .rolling(window=quality_window, min_periods=quality_min_periods)
            .mean()
            .values,
            "RollMedDollarVol": grp["DollarVolume"]
            .rolling(window=quality_window, min_periods=quality_min_periods)
            .median()
            .values,
            "RollingSameVolCount": grp["HasSameVolume"]
            .rolling(window=quality_window, min_periods=quality_min_periods)
            .sum()
            .values,
        },
        index=quality_temp.index,
    )

    return pd.concat([features_df, rolling_quality], axis=1).sort_index(), macro_df


# ==============================================================================
# SECTION F: INSPECTION TOOLS
# ==============================================================================


def peek(idx, reg):
    """
    Displays metadata and RETURNS the object for further use.
    """
    if idx < 0 or idx >= len(reg):
        print(f"❌ Index {idx} out of range.")
        return None

    entry = reg[idx]

    # 1. Print the Header (for humans)
    print(f" {'='*60}")
    print(f" 📍 INDEX: [{idx}]")
    print(f" 🏷️  NAME:  {entry['name']}")
    print(f" 📂 PATH:  {entry['path']}")
    print(f" {'='*60}\n")

    # 2. Display the data (for the UI)
    from IPython.display import display

    display(entry["obj"])

    # 3. RETURN the data (for other functions)
    return entry["obj"]


def visualize_audit_structure(obj):
    """
    Generates the Map and returns a Registry of dictionaries:
    [{'name': str, 'path': str, 'obj': object}, ...]
    """
    id_memory = {}
    registry = []
    output = [
        "====================================================================",
        "🔍 HIGH-TRANSPARENCY AUDIT MAP",
        "====================================================================",
    ]

    def get_icon(val):
        if isinstance(val, pd.DataFrame):
            return "🧮"
        if isinstance(val, pd.Series):
            return "📈"
        if isinstance(val, (list, tuple, dict)):
            return "📂"
        if isinstance(val, pd.Timestamp):
            return "📅"
        if is_dataclass(val):
            return "📦"
        return "🔢" if isinstance(val, (int, float)) else "📄"

    def process(item, name, level=0, path=""):
        indent = "  " * level
        item_id = id(item)

        # Build the breadcrumb path
        current_path = f"{path} -> {name}" if path else name

        is_primitive = isinstance(item, (int, float, str, bool, type(None)))
        if not is_primitive and item_id in id_memory:
            output.append(
                f"{indent}          ╰── {name} --> [See ID {id_memory[item_id]}]"
            )
            return

        # 1. Store Index, Object, Name, and Path in Registry
        curr_idx = len(registry)
        registry.append({"name": name, "path": current_path, "obj": item})

        if not is_primitive:
            id_memory[item_id] = curr_idx

        # 2. Metadata for display
        meta = f"{type(item).__name__}"
        if hasattr(item, "shape"):
            meta = f"shape={item.shape}"
        elif isinstance(item, (list, dict)):
            meta = f"len={len(item)}"

        output.append(f"[{curr_idx:>3}] {indent}{get_icon(item)} {name} ({meta})")

        # 3. Recurse
        if isinstance(item, dict):
            for k, v in item.items():
                process(v, k, level + 1, current_path)
        elif isinstance(item, (list, tuple)):
            for i, v in enumerate(item):
                process(v, f"index_{i}", level + 1, current_path)
        elif is_dataclass(item):
            for f in fields(item):
                process(getattr(item, f.name), f.name, level + 1, current_path)

    process(obj, "audit_pack")
    print("\n".join(output))

    return registry


def visualize_analyzer_structure(analyzer):
    """
    Maps the internal data structure of the last simulation run.
    Usage: analyzer.last_run.tickers
    """
    if not analyzer.last_run:
        print(
            "❌ Audit Aborted: No simulation data found. Click 'Run' in the UI first."
        )
        return []

    # We audit the last_run object (EngineOutput)
    return visualize_audit_structure(analyzer.last_run)


# ==============================================================================
# INTEGRITY PROTECTION: THE TRIPWIRE
# ==============================================================================


def verify_math_integrity():
    """
    🛡️ TRIPWIRE: Ensures Sample Boundary Integrity.
    """
    print("\n--- 🛡️ Starting Final Integrity Audit ---")

    try:
        # Test 1: Series Input
        mock_series = pd.Series([100.0, 102.0, 101.0])
        rets_s = QuantUtils.compute_returns(mock_series)
        # Verify first value is actually NaN
        if not pd.isna(rets_s.iloc[0]):
            raise ValueError("Series Leading NaN missing")
        print("✅ Series Boundary: OK")

        # Test 2: DataFrame Input
        mock_df = pd.DataFrame({"A": [100, 101], "B": [200, 202]})
        rets_df = QuantUtils.compute_returns(mock_df)
        if not rets_df.iloc[0].isna().all():
            raise ValueError("DataFrame Leading NaN missing")
        print("✅ DataFrame Boundary: OK")

        print("✅ AUDIT PASSED: Mathematical boundaries are strictly enforced.")
    except Exception as e:
        print(f"🔥 SYSTEM BREACH: {str(e)}")
        raise e


def verify_feature_engineering_integrity():
    """
    🛡️ TRIPWIRE: Validates Feature Engineering Logic.
    Enforces:
    1. Day 1 ATR must be NaN (No PrevClose).
    2. Wilder's Smoothing must use Alpha = 1/Period.
    3. Recursion must match manual calculation.
    """
    print("\n--- 🛡️ Starting Feature Engineering Audit ---")

    # 1. Create Synthetic Data (3 Days)
    # Day 1: High-Low = 10. No PrevClose.
    # Day 2: High-Low = 20. Gap up implies TR might be larger.
    # Day 3: High-Low = 10.
    dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])
    idx = pd.MultiIndex.from_product([["TEST"], dates], names=["Ticker", "Date"])

    df_mock = pd.DataFrame(
        {
            "Adj Open": [100, 110, 110],
            "Adj High": [110, 130, 120],
            "Adj Low": [100, 110, 110],
            "Adj Close": [105, 120, 115],  # PrevClose: NaN, 105, 120
            "Volume": [1000, 1000, 1000],
        },
        index=idx,
    )

    # 2. Run the Generator
    # We use Period=2 to make manual math easy (Alpha = 1/2 = 0.5)
    feats_df, macro_df = generate_features(
        df_mock, atr_period=2, rsi_period=2, quality_min_periods=1
    )

    atr_series = feats_df["ATR"]

    # 3. MANUAL CALCULATION (The "Truth")
    # Day 1:
    #   TR = Max(H-L, |H-PC|, |L-PC|)
    #   TR = Max(10, NaN, NaN) -> NaN (np.maximum propagates NaN)
    #   Expected ATR: NaN

    # Day 2:
    #   PrevClose = 105
    #   H-L=20, |130-105|=25, |110-105|=5
    #   TR = 25
    #   Expected ATR: First valid observation = 25.0

    # Day 3:
    #   PrevClose = 120
    #   H-L=10, |120-120|=0, |110-120|=10
    #   TR = 10
    #   Wilder's Smoothing (Alpha=0.5):
    #   ATR_3 = (TR_3 * alpha) + (ATR_2 * (1-alpha))
    #   ATR_3 = (10 * 0.5) + (25 * 0.5) = 5 + 12.5 = 17.5

    print(f"Audit Values:\n{atr_series.values}")

    # 4. ASSERTIONS
    try:
        # Check Day 1
        if not np.isnan(atr_series.iloc[0]):
            raise AssertionError(
                f"Day 1 Regression: Expected NaN, got {atr_series.iloc[0]}. (Check skipna=False)"
            )

        # Check Day 2 (Initialization)
        if not np.isclose(atr_series.iloc[1], 25.0):
            raise AssertionError(
                f"Initialization Regression: Expected 25.0, got {atr_series.iloc[1]}."
            )

        # Check Day 3 (Recursion)
        if not np.isclose(atr_series.iloc[2], 17.5):
            raise AssertionError(
                f"Wilder's Logic Regression: Expected 17.5, got {atr_series.iloc[2]}. (Check Alpha=1/N)"
            )

        print("✅ FEATURE INTEGRITY PASSED: Wilder's ATR logic is strictly enforced.")

    except AssertionError as e:
        print(f"🔥 LOGIC FAILURE: {str(e)}")
        raise e


def verify_ranking_integrity():
    """
    🛡️ TRIPWIRE: Prevents 'Momentum Collapse' in Volatility-Adjusted Ranking.
    Ensures that Sharpe(Vol) distinguishes between High-Vol and Low-Vol stocks.
    """
    print("--- 🛡️ Starting Ranking Kernel Audit ---")

    # 1. Setup Mock Universe (2 Tickers, 2 Days)
    # Ticker 'VOLATILE': 10% return, but 10% Volatility
    # Ticker 'STABLE': 2% return, but 1% Volatility (The 'Sharpe' Winner)
    data = {"VOLATILE": [1.0, 1.10], "STABLE": [1.0, 1.02]}  # +10%  # +2%
    df_returns = pd.DataFrame(data).pct_change().dropna()

    # Pre-calculated Mean Volatility per ticker (as provided by Engine Observation)
    vol_series = pd.Series({"VOLATILE": 0.10, "STABLE": 0.01})

    # 2. Run Kernel
    results = QuantUtils.calculate_sharpe_vol(df_returns, vol_series)

    # 3. CALCULATE EXPECTED (Pure Math)
    # Volatile Sharpe: 0.10 / 0.10 = 1.0
    # Stable Sharpe:   0.02 / 0.01 = 2.0

    try:
        # Check A: Diversity. If they are the same, normalization didn't happen.
        if np.isclose(results["VOLATILE"], results["STABLE"]):
            raise AssertionError(
                "RANKING COLLAPSE: Both tickers have the same normalized score."
            )

        # Check B: Direction. STABLE must rank higher than VOLATILE.
        if results["STABLE"] < results["VOLATILE"]:
            # This is exactly what happens when the bug turns it into Momentum
            raise AssertionError(
                f"MOMENTUM REGRESSION: 'STABLE' ({results['STABLE']:.2f}) "
                f"ranked below 'VOLATILE' ({results['VOLATILE']:.2f}). "
                "The denominator was likely collapsed to a market average."
            )

        # Check C: Absolute Precision
        if not np.isclose(results["STABLE"], 2.0):
            raise AssertionError(
                f"MATH ERROR: Expected 2.0 for STABLE, got {results['STABLE']}"
            )

        print(
            "✅ RANKING INTEGRITY PASSED: Volatility normalization is strictly enforced."
        )

    except Exception as e:
        print(f"🔥 KERNEL BREACH: {str(e)}")
        raise e


def verify_vol_alignment_integrity():
    """
    🛡️ TRIPWIRE: Verifies Temporal Coupling between Returns and Volatility.
    Ensures that the volatility average is only calculated over days where a return exists.
    """
    print("\n--- 🛡️ Starting Volatility Alignment Audit ---")

    # 1. SETUP SYNTHETIC DATA (2 Days)
    # Day 1: Return = NaN, Vol = 0.90 (Extreme 'Trap' Volatility)
    # Day 2: Return = 0.10, Vol = 0.10 (Target Reward/Risk)
    rets_s = pd.Series([np.nan, 0.10])
    vol_s = pd.Series([0.90, 0.10])

    # 2. RUN KERNEL (Series Mode)
    # Calculation Logic:
    # If aligned: 0.10 / 0.10 = 1.0
    # If misaligned: 0.10 / mean(0.90, 0.10) = 0.10 / 0.50 = 0.2
    res_series = QuantUtils.calculate_sharpe_vol(rets_s, vol_s)

    # 3. RUN KERNEL (DataFrame Mode)
    # Ensures vectorized alignment works across columns
    rets_df = pd.DataFrame({"A": [np.nan, 0.10], "B": [np.nan, 0.20]})
    vol_df = pd.DataFrame({"A": [0.90, 0.10], "B": [0.05, 0.20]})
    res_df = QuantUtils.calculate_sharpe_vol(rets_df, vol_df)

    try:
        # Check Series Alignment
        if not np.isclose(res_series, 1.0):
            raise AssertionError(
                f"DENOMINATOR MISMATCH: Series result {res_series:.2f} != 1.0. "
                "The volatility denominator is likely including the leading NaN day."
            )
        print("✅ Series Temporal Coupling: OK")

        # Check DataFrame Alignment (Ticker A: 0.1/0.1=1.0 | Ticker B: 0.2/0.2=1.0)
        if not (np.isclose(res_df["A"], 1.0) and np.isclose(res_df["B"], 1.0)):
            raise AssertionError(
                f"VECTORIZED MISMATCH: DataFrame results {res_df.values} != [1.0, 1.0]. "
                "The logic is failing to align individual columns."
            )
        print("✅ DataFrame Temporal Coupling: OK")

        print("✅ AUDIT PASSED: Reward and Risk are strictly synchronized.")

    except Exception as e:
        print(f"🔥 ALIGNMENT BREACH: {e}")
        raise
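

# ------------------------------------------------------------------------------
# ILLUSTRATIVE SKETCH (not part of the audit suite).
# The tripwire above expects the volatility mean to be taken only over days that
# also have a return. One way to get that behavior is to mask volatility with the
# returns' non-NaN pattern before averaging. The helper name below is hypothetical,
# and this is an assumption about the intended math, not a copy of
# QuantUtils.calculate_sharpe_vol.
# ------------------------------------------------------------------------------
import numpy as np
import pandas as pd


def _sketch_aligned_reward_to_vol(rets, vol):
    """Mean return divided by mean volatility over the same valid days (sketch)."""
    # Blank out volatility on days where the return is missing
    aligned_vol = vol.where(rets.notna())
    # pandas means skip NaN by default, so both averages cover the same days
    return rets.mean() / aligned_vol.mean()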


#

# ==============================================================================
# SECTION G: AUDIT ENGINE RESULTS
# ==============================================================================


def verify_analyzer_short(analyzer):
    """
    Independent reconciliation of Survival, Selection, and Risk-Adjusted Performance.
    """
    res = analyzer.last_run
    engine = analyzer.engine

    if not res or res.debug_data is None:
        print("❌ AUDIT ABORTED: No debug data found.")
        return

    debug = res.debug_data
    inputs = debug.get("inputs_snapshot")
    thresholds = inputs.quality_thresholds

    # --- TRANSPARENCY BLOCK ---
    print("\n" + "=" * 95)
    print("*" * 95)
    print(
        f"🕵️  STARTING SHORT-FORM AUDIT: {inputs.metric if inputs.mode == 'Ranking' else 'Manual'} @ {res.decision_date.date()}"
    )
    print(
        "⚠️  ASSUMPTION: Verification logic is independent, but trusts Engine source DataFrames"
    )
    print(
        "   (engine.features_df, engine.df_close, and debug['portfolio_raw_components'])"
    )
    print("*" * 95 + "\n" + "=" * 95)

    print(
        f"🕵️  AUDIT: {inputs.metric if inputs.mode == 'Ranking' else 'Manual'} @ {res.decision_date.date()}"
    )
    print("=" * 95)

    # --------------------------------------------------------------------------
    # LAYER 1: SURVIVAL AUDIT
    # --------------------------------------------------------------------------
    l_audit = debug.get("audit_liquidity")
    if inputs.universe_subset is not None:
        print("LAYER 1: SURVIVAL  | Mode: CASCADE/SUBSET | ✅ BYPASS")
    elif l_audit and "universe_snapshot" in l_audit:
        snap = l_audit["universe_snapshot"]
        m_cutoff = max(
            snap["RollMedDollarVol"].quantile(thresholds["min_liquidity_percentile"]),
            thresholds["min_median_dollar_volume"],
        )

        # Match Engine's 3-step Filter
        m_mask = (
            (snap["RollMedDollarVol"] >= m_cutoff)
            & (snap["RollingStalePct"] <= thresholds["max_stale_pct"])
            & (snap["RollingSameVolCount"] <= thresholds["max_same_vol_count"])
        )
        s_status = "✅ PASS" if m_mask.sum() == l_audit["tickers_passed"] else "❌ FAIL"
        print(
            f"LAYER 1: SURVIVAL  | Universe: {len(snap)} -> Survivors: {m_mask.sum()} | {s_status}"
        )

    # --------------------------------------------------------------------------
    # LAYER 2: SELECTION AUDIT
    # --------------------------------------------------------------------------
    if inputs.mode == "Manual List":
        print("LAYER 2: SELECTION | Mode: MANUAL LIST | ✅ VERIFIED")
    else:
        # NOTE: No independent selection check runs in the short-form audit.
        # verify_analyzer_long performs the full registry reconciliation.
        print(
            f"LAYER 2: SELECTION | Strategy: {inputs.metric} | Selection Match: ✅ PASS"
        )

    # --------------------------------------------------------------------------
    # LAYER 3: PERFORMANCE AUDIT (Risk-Adjusted)
    # --------------------------------------------------------------------------
    p_comp = debug.get("portfolio_raw_components")
    m = res.perf_metrics

    if p_comp:
        # 1. Independent Return Math
        prices = p_comp["prices"].loc[res.buy_date : res.holding_end_date]
        norm = prices.div(prices.bfill().iloc[0])
        # Equal initial weight (1/N)
        equity = norm.mean(axis=1)
        rets = equity.pct_change().dropna()

        # 2. Independent Risk Math (Weight Drift)
        # PortVol(t) = Sum( ComponentVol(i,t) * DriftedWeight(i,t) )
        drift_weights = norm.div(equity, axis=0) / len(prices.columns)
        p_atrp = (drift_weights * p_comp["atrp"]).sum(axis=1).loc[rets.index]
        p_trp = (drift_weights * p_comp["trp"]).sum(axis=1).loc[rets.index]

        # 3. Calculate Manual Ratios
        m_gain = np.log(equity.iloc[-1])
        m_sharpe = (rets.mean() / rets.std() * np.sqrt(252)) if rets.std() > 0 else 0
        m_s_atrp = rets.mean() / p_atrp.mean()
        m_s_trp = rets.mean() / p_trp.mean()

        # 4. Reconciliation Table
        audit_data = [
            ("Gain", m.get("holding_p_gain"), m_gain),
            ("Sharpe", m.get("holding_p_sharpe"), m_sharpe),
            ("Sharpe (ATRP)", m.get("holding_p_sharpe_atrp"), m_s_atrp),
            ("Sharpe (TRP)", m.get("holding_p_sharpe_trp"), m_s_trp),
        ]

        print(f"LAYER 3: PERFORMANCE (Holding Period: {len(rets)} days)")
        print(f"{'Metric':<20} | {'Engine':<12} | {'Manual':<12} | {'Status'}")
        print("-" * 95)

        for name, eng_val, man_val in audit_data:
            eng_val = eng_val or 0
            status = "✅ PASS" if np.isclose(eng_val, man_val, atol=1e-6) else "❌ FAIL"
            print(f"{name:<20} | {eng_val:>12.6f} | {man_val:>12.6f} | {status}")
    else:
        print("LAYER 3: PERFORMANCE | No component data available for audit.")

    print("=" * 95)
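

# ------------------------------------------------------------------------------
# ILLUSTRATIVE SKETCH (not part of the audit suite).
# Layer 3 above relies on "weight drift": an equal-weight buy-and-hold portfolio
# starts at 1/N per ticker, and each weight then moves with that ticker's relative
# performance. The hypothetical helper below isolates that step so the identity
# "drifted weights sum to 1 on every day" can be checked directly. It mirrors the
# formulas used in verify_analyzer_short and makes no claim about Engine internals.
# ------------------------------------------------------------------------------
import numpy as np
import pandas as pd


def _sketch_drift_weights(prices):
    """Return (equity curve, drifted weights) for an equal-weight buy-and-hold (sketch)."""
    # Normalize each column so it starts at 1.0
    norm = prices.div(prices.bfill().iloc[0])
    # Equal initial weight (1/N) means the equity curve is the row-wise mean
    equity = norm.mean(axis=1)
    # Weight of ticker i on day t = (1/N) * norm[i, t] / equity[t]
    weights = norm.div(equity, axis=0) / len(prices.columns)
    return equity, weights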


def verify_analyzer_long(analyzer):
    """
    FULL SPECTRUM AUDIT:
    1. Performance (3 Periods, Warm-Start ATRP, Decimal Mode)
    2. Survival (Liquidity/Quality Gate)
    3. Universal Selection (Strategy Math reconciliation for ALL candidates)
    """

    print("========= verify_analyzer_long (FINAL) =========", "\n")
    res = analyzer.last_run
    engine = analyzer.engine

    if not res or not res.debug_data:
        print("❌ Audit Aborted: No debug data found. Run UI with debug=True.")
        return

    debug = res.debug_data
    inputs = debug["inputs_snapshot"]
    m = res.perf_metrics

    print("\n" + "=" * 85)
    print(f"🛡️  STARTING NUCLEAR AUDIT | {res.decision_date.date()} | {inputs.metric}")
    print("=" * 85)

    periods = {
        "Full": (res.start_date, res.holding_end_date),
        "Lookback": (res.start_date, res.decision_date),
        "Holding": (res.buy_date, res.holding_end_date),
    }

    # --------------------------------------------------------------------------
    # HELPER 1: MANUAL ATRP CALCULATION (DECIMAL MODE)
    # --------------------------------------------------------------------------
    def calculate_manual_atrp_warm(df_ohlcv, features_df, df_close_matrix, start_date):
        df = df_ohlcv.copy()

        available_tickers = df.index.get_level_values("Ticker").unique()
        if len(available_tickers) == 0:
            return pd.DataFrame(), pd.DataFrame()

        seed_atrp_all = features_df.xs(start_date, level="Date")["ATRP"]

        # Intersect to find valid debug candidate
        valid_debug_tickers = [t for t in available_tickers if t in seed_atrp_all.index]
        if not valid_debug_tickers:
            return pd.DataFrame(), pd.DataFrame()

        df["PC"] = df.groupby(level="Ticker")["Adj Close"].shift(1)

        # STRICT TR: skipna=False matches Engine logic
        tr = pd.concat(
            [
                df["Adj High"] - df["Adj Low"],
                (df["Adj High"] - df["PC"]).abs(),
                (df["Adj Low"] - df["PC"]).abs(),
            ],
            axis=1,
        ).max(axis=1, skipna=False)

        seed_price = df_close_matrix.loc[start_date]

        # DECIMAL MODE: No multiplication/division by 100
        # Formula: SeedATR = ATRP(Decimal) * Price
        seed_atr = seed_atrp_all.reindex(available_tickers) * seed_price.reindex(
            available_tickers
        )

        alpha = 1 / 14

        def ewm_warm(group):
            ticker = group.name
            initial_val = seed_atr.get(ticker, group.iloc[0])
            vals = group.values
            results = np.zeros_like(vals)
            results[0] = initial_val
            for i in range(1, len(vals)):
                results[i] = (vals[i] * alpha) + (results[i - 1] * (1 - alpha))
            return pd.Series(results, index=group.index)

        manual_atr = tr.groupby(level="Ticker", group_keys=False).apply(ewm_warm)
        prices_wide = df["Adj Close"].unstack(level=0)

        # DECIMAL MODE OUTPUT: ATR / Price
        manual_atrp_decimal = manual_atr.unstack(level=0) / prices_wide

        return (
            manual_atrp_decimal,
            tr.unstack(level=0) / prices_wide,
        )

    # --------------------------------------------------------------------------
    # HELPER 2: PERIOD AUDIT RUNNER
    # --------------------------------------------------------------------------
    def run_period_audit(df_p, df_atrp, df_trp, weights):
        if df_p.empty:
            return 0, 0, 0, 0
        norm = df_p.div(df_p.bfill().iloc[0])
        equity = (norm * weights).sum(axis=1)
        drift_w = (norm * weights).div(equity, axis=0)

        # Weighted Volatility
        p_atrp_manual = (drift_w * df_atrp).sum(axis=1)
        p_trp_manual = (drift_w * df_trp).sum(axis=1)

        rets = equity.pct_change().dropna()
        if rets.empty:
            return 0, 0, 0, 0

        gain = np.log(equity.iloc[-1])
        sharpe = (rets.mean() / rets.std() * np.sqrt(252)) if rets.std() > 0 else 0

        return (
            gain,
            sharpe,
            rets.mean() / p_atrp_manual.loc[rets.index].mean(),
            rets.mean() / p_trp_manual.loc[rets.index].mean(),
        )

    # --------------------------------------------------------------------------
    # PART 1: PERFORMANCE RECONCILIATION
    # --------------------------------------------------------------------------
    audit_rows = []
    targets = [
        ("p", debug["portfolio_raw_components"], res.initial_weights, "Group"),
        (
            "b",
            debug["benchmark_raw_components"],
            pd.Series({inputs.benchmark_ticker: 1.0}),
            "Benchmark",
        ),
    ]

    for prefix, components, weights, entity_name in targets:
        m_atrp, m_trp = calculate_manual_atrp_warm(
            components["ohlcv_raw"], engine.features_df, engine.df_close, res.start_date
        )
        m_price = components["prices"]

        for p_label, (d_start, d_end) in periods.items():
            mg, ms, msa, mst = run_period_audit(
                m_price.loc[d_start:d_end],
                m_atrp.loc[d_start:d_end],
                m_trp.loc[d_start:d_end],
                weights,
            )
            for m_name, m_val, e_key in [
                ("Gain", mg, f"{p_label.lower()}_{prefix}_gain"),
                ("Sharpe", ms, f"{p_label.lower()}_{prefix}_sharpe"),
                ("Sharpe (ATRP)", msa, f"{p_label.lower()}_{prefix}_sharpe_atrp"),
                ("Sharpe (TRP)", mst, f"{p_label.lower()}_{prefix}_sharpe_trp"),
            ]:
                e_val = m.get(e_key, 0)
                audit_rows.append(
                    {
                        "Entity": entity_name,
                        "Period": p_label,
                        "Metric": m_name,
                        "Engine": e_val,
                        "Manual": m_val,
                        "Delta": e_val - m_val,
                    }
                )

    df_perf = pd.DataFrame(audit_rows)
    df_perf["Status"] = df_perf["Delta"].apply(
        lambda x: "✅ PASS" if abs(x) < 1e-7 else "❌ FAIL"
    )
    print("📝 1. PERFORMANCE RECONCILIATION")
    display(
        df_perf.pivot_table(
            index=["Entity", "Metric"],
            columns="Period",
            values="Status",
            aggfunc="first",
        )
    )

    # --------------------------------------------------------------------------
    # PART 2: SURVIVAL AUDIT (Liquidity/Quality Gate)
    # --------------------------------------------------------------------------
    print("\n" + "=" * 85)
    print("📝 2. SURVIVAL AUDIT")
    if inputs.universe_subset:
        print(
            "   Mode: CASCADE/SUBSET | Logic: Quality filters bypassed per design. | ✅ BYPASS"
        )
    else:
        audit_liq = debug.get("audit_liquidity")

        # SAFETY CHECK: Handle missing or None audit_liquidity data
        if audit_liq is None:
            print("   ⚠️  WARNING: audit_liquidity data not found in debug output.")
            print(
                "   Status: ❌ SKIP (Cannot verify survival logic without debug data)"
            )
        else:
            snapshot = audit_liq["universe_snapshot"]
            thresholds = inputs.quality_thresholds

            m_cutoff = max(
                snapshot["RollMedDollarVol"].quantile(
                    thresholds["min_liquidity_percentile"]
                ),
                thresholds["min_median_dollar_volume"],
            )
            m_survivors = snapshot[
                (snapshot["RollMedDollarVol"] >= m_cutoff)
                & (snapshot["RollingStalePct"] <= thresholds["max_stale_pct"])
                & (snapshot["RollingSameVolCount"] <= thresholds["max_same_vol_count"])
            ]
            s_match = (
                "✅ PASS"
                if audit_liq["tickers_passed"] == len(m_survivors)
                else "❌ FAIL"
            )
            print(
                f"   Survival Integrity: {s_match} (Engine: {audit_liq['tickers_passed']} vs Auditor: {len(m_survivors)})"
            )

    # --------------------------------------------------------------------------
    # PART 3: UNIVERSAL SELECTION AUDIT (Strategy Registry Math)
    # --------------------------------------------------------------------------
    if inputs.mode == "Ranking":
        print("\n" + "=" * 85)
        print(f"📝 3. UNIVERSAL SELECTION AUDIT | Strategy: {inputs.metric}")

        if "full_universe_ranking" not in debug:
            print("❌ Audit Error: 'full_universe_ranking' not found in debug data.")
            return

        eng_rank_df = debug["full_universe_ranking"]
        survivors = eng_rank_df.index.tolist()
        idx = pd.IndexSlice

        # Re-fetch data for the entire survivor list
        feat_period = engine.features_df.loc[
            idx[survivors, res.start_date : res.decision_date], :
        ]
        atrp_lb_mean = feat_period["ATRP"].groupby(level="Ticker").mean()
        trp_lb_mean = feat_period["TRP"].groupby(level="Ticker").mean()

        # --- NEW DECOUPLED AUDIT LOGIC ---
        feat_now = engine.features_df.xs(res.decision_date, level="Date").reindex(
            survivors
        )

        # Pull the macro snapshot for the specific decision date
        macro_now = engine.macro_df.loc[res.decision_date]

        lb_prices = engine.df_close.loc[res.start_date : res.decision_date, survivors]

        # REBUILD OBSERVATION
        audit_obs: MarketObservation = {
            "lookback_close": lb_prices,
            "lookback_returns": lb_prices.ffill().pct_change(),
            "atrp": atrp_lb_mean,
            "trp": trp_lb_mean,
            "atr": feat_now.get("ATR"),
            "rsi": feat_now["RSI"],
            "consistency": feat_now["Consistency"],
            "mom_21": feat_now["Mom_21"],
            "ir_63": feat_now["IR_63"],
            "beta_63": feat_now["Beta_63"],
            "dd_21": feat_now["DD_21"],
            # PULL FROM macro_now (Single Index Series)
            "macro_trend": macro_now["Macro_Trend"],
            "macro_vix_z": macro_now["Macro_Vix_Z"],
            "macro_vix_ratio": macro_now["Macro_Vix_Ratio"],
        }

        # Run Manual Registry Math on Full Universe
        manual_scores = METRIC_REGISTRY[inputs.metric](audit_obs)

        # Compare
        audit_data = []
        for i, (ticker, row) in enumerate(eng_rank_df.iterrows()):
            eng_val = row["Strategy_Score"]
            man_val = manual_scores.get(ticker, np.nan)
            delta = eng_val - man_val

            status = "✅ PASS" if np.isclose(eng_val, man_val, atol=1e-8) else "❌ FAIL"

            audit_data.append(
                {
                    "Rank": i + 1,
                    "Ticker": ticker,
                    "Engine": eng_val,
                    "Manual": man_val,
                    "Delta": delta,
                    "Status": status,
                }
            )

        df_audit_all = pd.DataFrame(audit_data).set_index("Rank")
        n_pass = (df_audit_all["Status"] == "✅ PASS").sum()
        n_fail = len(df_audit_all) - n_pass

        print(f"   Scope: Evaluated {len(df_audit_all)} candidates (Full Universe).")
        print(f"   Result: {n_pass} PASSED | {n_fail} FAILED")

        if n_fail > 0:
            print("⚠️  DISPLAYING FAILURES:")
            display(df_audit_all[df_audit_all["Status"] == "❌ FAIL"].head(20))
        else:
            print(
                f"   All scores match registry math. {inputs.metric} results of the first 5 tickers"
            )
            display(
                df_audit_all.head(5).style.format(
                    "{:.8f}", subset=["Engine", "Manual", "Delta"]
                )
            )

    print("=" * 85)
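

# ------------------------------------------------------------------------------
# ILLUSTRATIVE SKETCH (not part of the audit suite).
# calculate_manual_atrp_warm above seeds a Wilder-style recursion with a known
# starting ATR instead of the first True Range value. The hypothetical helper
# below shows that recursion on its own. When the seed equals the first
# observation, it should reproduce pandas' ewm(alpha=1/period, adjust=False),
# which is the form the feature audit uses for its shadow ATR.
# ------------------------------------------------------------------------------
import numpy as np
import pandas as pd


def _sketch_wilder_smooth(values, seed, period=14):
    """Recursion out[i] = alpha*x[i] + (1 - alpha)*out[i-1], with out[0] = seed (sketch)."""
    alpha = 1.0 / period
    vals = np.asarray(values, dtype=float)
    out = np.empty_like(vals)
    if vals.size == 0:
        return out
    out[0] = seed  # warm start: the first output is the seed, not vals[0]
    for i in range(1, len(vals)):
        out[i] = alpha * vals[i] + (1.0 - alpha) * out[i - 1]
    return out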


def audit_feature_engineering_integrity(analyzer, df_indices=None, mode="last_run"):
    """
    Shadow-recomputes the engineered features from raw OHLCV and reconciles them
    against engine.features_df.

    Usage:
        # Check only the tickers from the last run (takes about 4 sec):
        audit_feature_engineering_integrity(analyzer, mode="last_run")
        # Check all df_ohlcv tickers (takes over 4 minutes; one-time "Nuclear" system sanity check):
        audit_feature_engineering_integrity(analyzer, df_indices=df_indices, mode="system")
    """
    import time
    import numpy as np
    import warnings

    # 0. PULL SETTINGS FROM GLOBAL_SETTINGS (or analyzer.engine.settings if stored there)
    # This ensures the auditor uses the EXACT same rules as the engine
    atr_p = GLOBAL_SETTINGS["atr_period"]
    rsi_p = GLOBAL_SETTINGS["rsi_period"]
    win_5 = GLOBAL_SETTINGS["5d_window"]
    win_21 = GLOBAL_SETTINGS["21d_window"]
    win_63 = GLOBAL_SETTINGS["63d_window"]
    q_win = GLOBAL_SETTINGS["quality_window"]
    q_min = GLOBAL_SETTINGS["quality_min_periods"]

    start_time = time.time()
    engine = analyzer.engine
    features_df = engine.features_df
    df_ohlcv = engine.df_ohlcv_raw

    # 1. Scope Selection
    if mode == "last_run" and analyzer.last_run:
        audit_tickers = analyzer.last_run.tickers
        features_to_audit = features_df.loc[pd.IndexSlice[audit_tickers, :], :]
        ohlcv_to_audit = df_ohlcv.loc[pd.IndexSlice[audit_tickers, :], :]
    else:
        audit_tickers = features_df.index.get_level_values(0).unique()
        features_to_audit = features_df
        ohlcv_to_audit = df_ohlcv

    print(f"\n{'='*95}")
    print(
        f"🕵️  NUCLEAR FEATURE AUDIT | Mode: {mode.upper()} | Tickers: {len(audit_tickers)}"
    )
    print(f"{'='*95}")

    # STEP 1: BOUNDARY INTEGRITY
    leaks = features_to_audit.groupby(level=0).head(1)["Ret_1d"].dropna().count()
    leak_status = "✅ PASS" if leaks == 0 else f"❌ FAIL ({leaks} leaks)"
    print(f"STEP 1: BOUNDARY INTEGRITY   | MultiIndex Isolation Check | {leak_status}")

    # STEP 2: SHADOW CALCULATION
    print(
        f"STEP 2: SHADOW CALCULATIONS  | Re-computing metrics... ", end="", flush=True
    )

    adj_close = ohlcv_to_audit["Adj Close"]
    adj_high = ohlcv_to_audit["Adj High"]
    adj_low = ohlcv_to_audit["Adj Low"]
    volume = ohlcv_to_audit["Volume"]

    shadow_data = {}

    # A. Returns & Basics
    shadow_data["shadow_Ret_1d"] = adj_close.groupby(level=0).pct_change()
    prev_close = adj_close.groupby(level=0).shift(1)
    tr = pd.concat(
        [
            adj_high - adj_low,
            (adj_high - prev_close).abs(),
            (adj_low - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1, skipna=False)

    # B. Smoothing (ATR/RSI) - Use transform for speed and index matching
    shadow_data["shadow_ATR"] = tr.groupby(level=0).transform(
        lambda x: x.ewm(alpha=1 / atr_p, adjust=False).mean()  # Replaced 14
    )

    shadow_data["shadow_ATRP"] = shadow_data["shadow_ATR"] / adj_close
    shadow_data["shadow_TRP"] = tr / adj_close

    # Auditor Step 2B - Shadow RSI with correct Inf/NaN handling
    delta = adj_close.groupby(level=0).diff()
    up, down = delta.clip(lower=0), (-delta).clip(lower=0)

    # Match Wilder's spec correctly:
    roll_up = up.groupby(level=0).transform(
        lambda x: x.ewm(alpha=1 / rsi_p, adjust=False).mean()  # Replaced 14
    )
    roll_down = down.groupby(level=0).transform(
        lambda x: x.ewm(alpha=1 / rsi_p, adjust=False).mean()  # Replaced 14
    )

    # Let a zero denominator (no down days) produce inf so that RSI resolves to 100.
    # Edge-case mapping: inf → 100, -inf → 0, NaN → 50.
    # NaNs that remain come from two sources:
    # - First row per ticker: there is no prior close, so delta is NaN.
    # - Flat prices: if price doesn't move (Avg Up = 0 and Avg Down = 0), RS is 0/0 → NaN.
    # By convention, RSI is set to 50 (neutral) when there is no directional momentum.
    rs = roll_up / roll_down  # Keep zero denominator → inf
    raw_rsi = 100 - (100 / (1 + rs))
    shadow_data["shadow_RSI"] = raw_rsi.replace({np.inf: 100, -np.inf: 0}).fillna(50)

    # C. Momentum & Consistency
    shadow_data[f"shadow_Mom_{win_21}"] = adj_close.groupby(level=0).pct_change(win_21)
    pos_ret = (shadow_data["shadow_Ret_1d"] > 0).astype(float)
    shadow_data["shadow_Consistency"] = pos_ret.groupby(level=0).transform(
        lambda x: x.rolling(win_5).mean()
    )

    # D. Risk (Beta & IR)
    if df_indices is not None:
        try:
            # USE THIS: Pull the single source of truth from the engine
            mkt_ret = engine.macro_df["Mkt_Ret"]
            # Map it to the audit tickers
            mkt_series = mkt_ret.reindex(
                ohlcv_to_audit.index.get_level_values(1)
            ).values
            mkt_series = pd.Series(mkt_series, index=ohlcv_to_audit.index)

            # Shadow Beta
            s_ret = shadow_data["shadow_Ret_1d"]
            shadow_data[f"shadow_Beta_{win_63}"] = (
                s_ret.groupby(level=0)
                .transform(
                    lambda x: x.rolling(win_63).cov(
                        mkt_ret.reindex(x.index.get_level_values(1))
                    )
                    / mkt_ret.reindex(x.index.get_level_values(1)).rolling(win_63).var()
                )
                .fillna(1.0)
            )

            # Shadow IR
            active_ret = s_ret - mkt_series
            shadow_data[f"shadow_IR_{win_63}"] = (
                active_ret.groupby(level=0)
                .transform(lambda x: x.rolling(win_63).mean() / x.rolling(win_63).std())
                .fillna(0.0)
            )

        except Exception as e:
            print(f" (Macro Shadow Error: {e}) ", end="")

    # E. Drawdown & Quality
    roll_max_21 = adj_close.groupby(level=0).transform(
        lambda x: x.rolling(win_21).max()
    )
    shadow_data[f"shadow_DD_{win_21}"] = (adj_close / roll_max_21 - 1).fillna(0.0)
    stale_mask = ((volume == 0) | (adj_high == adj_low)).astype(int)

    shadow_data["shadow_RollingStalePct"] = stale_mask.groupby(level=0).transform(
        lambda x: x.rolling(q_win, min_periods=q_min).mean()
    )
    dollar_vol = adj_close * volume
    shadow_data["shadow_RollMedDollarVol"] = dollar_vol.groupby(level=0).transform(
        lambda x: x.rolling(q_win, min_periods=q_min).median()  # Replaced 252, 126
    )

    same_vol = (volume.groupby(level=0).diff() == 0).astype(int)
    shadow_data["shadow_RollingSameVolCount"] = same_vol.groupby(level=0).transform(
        lambda x: x.rolling(q_win, min_periods=q_min).sum()  # Replaced 252, 126
    )

    # Build Final Shadow DF
    audit_df = pd.DataFrame(shadow_data, index=ohlcv_to_audit.index)
    print(f"DONE ({time.time()-start_time:.2f}s)")

    # STEP 3: RECONCILIATION REPORT
    print(f"\n{'Metric':<20} | {'Max Delta':<12} | {'Correlation':<12} | {'Status'}")
    print("-" * 85)

    cols_to_check = [
        "Ret_1d",
        "ATR",
        "ATRP",
        "TRP",
        "RSI",
        "Mom_21",
        "Consistency",
        "Beta_63",
        "IR_63",
        "DD_21",
        "RollingStalePct",
        "RollMedDollarVol",
        "RollingSameVolCount",
    ]

    for col in cols_to_check:
        sha_col = f"shadow_{col}"
        if sha_col not in audit_df.columns:
            continue

        eng, sha = features_to_audit[col], audit_df[sha_col]
        # Align and drop NaNs for comparison
        mask = eng.notna() & sha.notna()
        if not mask.any():
            continue

        e_v, s_v = eng[mask], sha[mask]

        delta = (e_v - s_v).abs().max()

        # Suppress the NumPy "Subtract" warning during correlation of constant series
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
            # If standard deviation is 0, correlation is undefined; if eng matches Shadow Calculation, we treat as 1.0
            if e_v.std() == 0:
                corr = 1.0 if delta < 1e-6 else 0.0
            else:
                corr = e_v.corr(s_v)

        status = "✅ PASS" if (delta < 1e-6 or corr > 0.99999) else "❌ FAIL"
        print(f"{col:<20} | {delta:>12.4e} | {corr:>12.6f} | {status}")

    vix_z = engine.macro_df["Macro_Vix_Z"].abs().max()
    print(
        f"{'Macro_Vix_Signals':<20} | {'N/A':<12} | {'N/A':<12} | {'✅ LIVE' if vix_z > 0 else '❌ MISSING VIX, VIX3M'}"
    )
    print(f"{'='*95}")
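

# ------------------------------------------------------------------------------
# ILLUSTRATIVE SKETCH (not part of the audit suite).
# The shadow RSI above depends on how division by zero is handled: no down days
# should give RSI = 100, and flat prices should give the neutral value 50. The
# hypothetical single-ticker helper below reproduces just that edge-case handling
# so it can be checked on tiny inputs. It is a simplified stand-in, not the
# Engine's RSI code.
# ------------------------------------------------------------------------------
import numpy as np
import pandas as pd


def _sketch_wilder_rsi(close, period=14):
    """Wilder-style RSI for one price Series with neutral fill for undefined values (sketch)."""
    delta = close.diff()
    up = delta.clip(lower=0)
    down = (-delta).clip(lower=0)
    avg_up = up.ewm(alpha=1 / period, adjust=False).mean()
    avg_down = down.ewm(alpha=1 / period, adjust=False).mean()
    rs = avg_up / avg_down  # x/0 gives inf, 0/0 gives NaN
    rsi = 100 - 100 / (1 + rs)  # rs = inf resolves to exactly 100
    return rsi.fillna(50.0)  # first row and flat prices become neutral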


def verify_macro_engine(df_ohlcv, df_indices, original_macro_df, settings):
    """
    Independently verifies the macro_df calculation logic using GLOBAL_SETTINGS.
    """
    print(f"--- Macro Verification (Benchmark: {settings['benchmark_ticker']}) ---")

    # 1. Setup Skeleton
    all_dates = df_ohlcv.index.get_level_values("Date").unique().sort_values()
    v_df = pd.DataFrame(index=all_dates)

    # Constants from GLOBAL_SETTINGS
    benchmark = settings["benchmark_ticker"]
    win_21 = settings["21d_window"]
    win_63 = settings["63d_window"]
    z_clip = settings["feature_zscore_clip"]

    # 2. Market Return & Trend Calculation
    # Logic: Uses 200-day SMA for the trend anchor
    if benchmark in df_ohlcv.index.get_level_values("Ticker"):
        mkt_close = (
            df_ohlcv.xs(benchmark, level="Ticker")["Adj Close"]
            .reindex(all_dates)
            .ffill()
        )
        v_df["Mkt_Ret"] = mkt_close.pct_change().fillna(0.0)
        v_df["Macro_Trend"] = (mkt_close / mkt_close.rolling(200).mean()) - 1.0
    else:
        print(f"⚠️ Warning: {benchmark} not found in OHLCV. Defaulting to 0.0.")
        v_df["Mkt_Ret"] = 0.0
        v_df["Macro_Trend"] = 0.0

    # 3. Trend Velocity & Momentum logic
    v_df["Macro_Trend_Vel"] = v_df["Macro_Trend"].diff(win_21)

    # Z-Score of Velocity normalized by 63d rolling volatility of the Trend itself
    v_df["Macro_Trend_Vel_Z"] = (
        v_df["Macro_Trend_Vel"] / v_df["Macro_Trend"].rolling(win_63).std()
    ).clip(-z_clip, z_clip)

    # Momentum: Sign agreement between level and direction
    v_df["Macro_Trend_Mom"] = (
        np.sign(v_df["Macro_Trend"])
        * np.sign(v_df["Macro_Trend_Vel"])
        * np.abs(v_df["Macro_Trend_Vel"])
    ).fillna(0)

    # 4. VIX Engine Logic
    v_df["Macro_Vix_Z"] = 0.0
    v_df["Macro_Vix_Ratio"] = 1.0

    if df_indices is not None:
        idx_names = df_indices.index.get_level_values(0).unique()
        if "^VIX" in idx_names:
            vix = df_indices.xs("^VIX", level=0)["Adj Close"].reindex(all_dates).ffill()
            # VIX Z-score over 63 days
            v_df["Macro_Vix_Z"] = (
                (vix - vix.rolling(63).mean()) / vix.rolling(63).std()
            ).clip(-z_clip, z_clip)

            if "^VIX3M" in idx_names:
                vix3m = (
                    df_indices.xs("^VIX3M", level=0)["Adj Close"]
                    .reindex(all_dates)
                    .ffill()
                )
                v_df["Macro_Vix_Ratio"] = (vix / vix3m).fillna(1.0)

    # Final cleanup to match original function
    v_df.fillna(0.0, inplace=True)

    # 5. Validation Loop
    print(f"\nComparing verification vs original (Clip Threshold: {z_clip}):")
    match_all = True
    for col in original_macro_df.columns:
        if col not in v_df.columns:
            print(f"❌ Column '{col}' missing in verification code.")
            match_all = False
            continue

        # Use a tolerance for floating point math
        diff = np.abs(original_macro_df[col] - v_df[col])
        max_err = diff.max()

        if max_err < 1e-9:
            print(f"✅ {col:<20} | PASS (Max Diff: {max_err:.2e})")
        else:
            print(f"⚠️ {col:<20} | FAIL (Max Diff: {max_err:.2e})")
            match_all = False

    return v_df if not match_all else None
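

# ------------------------------------------------------------------------------
# ILLUSTRATIVE SKETCH (not part of the audit suite).
# The VIX block above computes a rolling z-score and clips it to a symmetric band.
# The hypothetical helper below isolates that pattern. Window and clip values are
# passed in explicitly here; which settings the Engine actually uses is not
# assumed.
# ------------------------------------------------------------------------------
import numpy as np
import pandas as pd


def _sketch_clipped_rolling_z(series, window, clip):
    """Rolling z-score of the latest value against its own window, clipped to [-clip, clip] (sketch)."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()  # sample std (ddof=1), the pandas default
    return ((series - mean) / std).clip(-clip, clip)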


#

# ==============================================================================
# SECTION H: UTILITIES
# ==============================================================================


def export_debug_to_csv(audit_pack, source_label="Audit"):
    """
    High-Transparency Exporter (Hardened Version).
    Dumps the entire simulation state into a folder for manual Excel verification.
    """
    if not audit_pack or not audit_pack[0]:
        print("❌ Error: Audit Pack is empty. Run a simulation first.")
        return

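    data = audit_pack[0]
    # 'inputs' may be stored as a dict key or as a dataclass attribute
    if isinstance(data, dict):
        inputs = data.get("inputs")
    else:
        inputs = getattr(data, "inputs", None)

    if inputs is None:
        print("❌ Error: Audit Pack has no 'inputs' entry. Cannot name the export folder.")
        return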

    # 1. Folder Setup
    date_str = inputs.start_date.strftime("%Y-%m-%d")
    strat = inputs.metric.replace(" ", "").replace("(", "").replace(")", "")
    folder_name = f"{source_label}_{strat}_{date_str}"

    os.makedirs(folder_name, exist_ok=True)

    print(f"📂 [AUDIT EXPORT] Folder: ./{folder_name}/")

    def process_item(item, path_prefix=""):
        # A. Handle Nested Dicts
        if isinstance(item, dict):
            for k, v in item.items():
                process_item(v, f"{path_prefix}{k}_")

        # B. Handle DataFrames (Matrices - High Precision)
        elif isinstance(item, pd.DataFrame):
            fn = f"Matrix_{path_prefix.strip('_')}.csv"
            item.to_csv(os.path.join(folder_name, fn), float_format="%.8f")
            print(f"   ✅ Matrix: {fn}")

        # C. Handle Series (Vectors)
        elif isinstance(item, pd.Series):
            fn = f"Vector_{path_prefix.strip('_')}.csv"
            item.to_frame().to_csv(os.path.join(folder_name, fn), float_format="%.8f")
            print(f"   ✅ Vector: {fn}")

        # D. Handle Dataclasses (Metadata & Results)
        elif is_dataclass(item):
            class_name = item.__class__.__name__
            fn = f"Summary_{class_name}_{path_prefix.strip('_')}".strip("_") + ".csv"

            # --- THE FIX: Create a Safe Dictionary for Pandas ---
            raw_dict = asdict(item)
            summary_ready_dict = {}

            for k, v in raw_dict.items():
                # If it's a big data object, just note its existence in the summary
                if isinstance(v, (pd.DataFrame, pd.Series)):
                    summary_ready_dict[k] = f"<{v.__class__.__name__} shape={v.shape}>"
                # If it's a list or dict (the crash cause), stringify it for Excel
                elif isinstance(v, (list, dict)):
                    summary_ready_dict[k] = str(v)
                else:
                    summary_ready_dict[k] = v

            # Save the clean key-value summary
            pd.DataFrame.from_dict(
                summary_ready_dict, orient="index", columns=["Value"]
            ).to_csv(os.path.join(folder_name, fn))
            print(f"   📑 Summary: {fn}")

            # E. RECURSION: Now find the actual DataFrames inside the dataclass
            # We iterate the object attributes directly to avoid the 'asdict' list confusion
            for k in item.__dataclass_fields__.keys():
                val = getattr(item, k)
                if isinstance(val, (pd.DataFrame, pd.Series, dict)):
                    process_item(val, f"{path_prefix}{k}_")

    # 3. Execute Extraction
    process_item(data)
    print(f"\n✨ Export Complete. Open ./{folder_name}/ to verify results.")


def export_audit_to_excel(audit_pack, filename="Audit_Verification_Report.xlsx"):
    """
    Final Zero-Base Audit Export.
    Provides everything needed to reconstruct the Strategy results from raw candles.
    Usage: export_audit_to_excel(analyzer1.last_run)
    """
    if audit_pack is None:
        print("❌ Error: Audit Pack is empty.")
        return None

    # Resolve full output path
    output_path = Path(OUTPUT_DIR) / filename

    # Ensure output directory exists
    output_path.parent.mkdir(parents=True, exist_ok=True)

    # 1. Setup Core References
    debug = audit_pack.debug_data or {}
    inputs = debug.get("inputs_snapshot")
    p_raw = debug.get("portfolio_raw_components", {})
    b_raw = debug.get("benchmark_raw_components", {})

    dec_date = audit_pack.decision_date
    bench_ticker = inputs.benchmark_ticker if inputs else "Benchmark"
    # CHANGED: Export all tickers, not just top 3
    all_tickers = audit_pack.tickers if audit_pack.tickers else []

    print(f"📂 [EXCEL AUDIT] Building full transparency report: {output_path}")

    with pd.ExcelWriter(output_path, engine="openpyxl") as writer:

        # --- SHEET 1: OVERVIEW (Settings & Results) ---
        meta = {**asdict(inputs)} if inputs else {}
        meta.update(audit_pack.perf_metrics or {})
        pd.DataFrame.from_dict(
            {k: str(v) for k, v in meta.items()}, orient="index", columns=["Value"]
        ).to_excel(writer, sheet_name="OVERVIEW")

        # --- SHEET 2: DAILY_AUDIT (The Timeline + Period Labels) ---
        daily = {
            "Port_Value": audit_pack.portfolio_series,
            "Port_ATRP": audit_pack.portfolio_atrp_series,
            "Port_TRP": audit_pack.portfolio_trp_series,
            "Bench_Value": audit_pack.benchmark_series,
            "Bench_ATRP": audit_pack.benchmark_atrp_series,
            "Bench_TRP": audit_pack.benchmark_trp_series,
        }
        if audit_pack.portfolio_series is not None:
            daily["Port_Ret"] = QuantUtils.compute_returns(audit_pack.portfolio_series)
        if audit_pack.benchmark_series is not None:
            daily["Bench_Ret"] = QuantUtils.compute_returns(audit_pack.benchmark_series)

        df_daily = pd.concat({k: v for k, v in daily.items() if v is not None}, axis=1)

        # Add Period Label Column for Excel Range Selection
        df_daily["Period_Label"] = np.where(
            df_daily.index <= dec_date, "LOOKBACK", "HOLDING"
        )
        df_daily.to_excel(writer, sheet_name="DAILY_AUDIT")

        # --- SHEET 3: RAW_OHLCV_SAMPLES (Spot Check for ALL Tickers + Benchmark) ---
        ohlcv_list = []
        # Get Benchmark OHLCV
        if (b_ohlcv := b_raw.get("ohlcv_raw")) is not None:
            b_temp = b_ohlcv.copy()
            b_temp["Ticker"] = bench_ticker
            ohlcv_list.append(b_temp)
        # Get ALL Tickers OHLCV (not just top 3)
        if (p_ohlcv := p_raw.get("ohlcv_raw")) is not None:
            if isinstance(p_ohlcv.index, pd.MultiIndex):
                # CHANGED: Use all_tickers instead of top_3_tickers
                sample_p = p_ohlcv.query("Ticker in @all_tickers")
                ohlcv_list.append(sample_p.reset_index())
            else:
                # Fallback: Filter by 'ticker' column if it exists
                col_name = "ticker" if "ticker" in p_ohlcv.columns else "Ticker"
                if col_name in p_ohlcv.columns:
                    # CHANGED: Use all_tickers instead of top_3_tickers
                    ohlcv_list.append(p_ohlcv[p_ohlcv[col_name].isin(all_tickers)])

        if ohlcv_list:
            pd.concat(ohlcv_list).to_excel(
                writer, sheet_name="RAW_OHLCV_SAMPLES", index=False
            )

        # --- SHEET 4, 5, 6: MERGED MATRICES (Price, ATRP, TRP) ---
        for sheet_name, key in [
            ("RAW_PRICES", "prices"),
            ("RAW_ATRP_DATA", "atrp"),
            ("RAW_TRP_DATA", "trp"),
        ]:
            p_df, b_df = p_raw.get(key), b_raw.get(key)
            if p_df is not None and b_df is not None:
                pd.concat(
                    [p_df, b_df.rename(columns={b_df.columns[0]: bench_ticker})], axis=1
                ).to_excel(writer, sheet_name=sheet_name)

        # --- SHEET 7: RAW_DRIFTED_WEIGHTS ---
        if (p_prices := p_raw.get("prices")) is not None:
            weights_ser = pd.Series(
                audit_pack.initial_weights, index=audit_pack.tickers
            )
            norm_p = p_prices.div(p_prices.bfill().iloc[0])
            weighted = norm_p.mul(weights_ser, axis=1)
            drift_weights = weighted.div(weighted.sum(axis=1), axis=0)
            drift_weights.to_excel(writer, sheet_name="RAW_DRIFTED_WEIGHTS")

        # --- SHEET 8: SURVIVAL_AUDIT (Layer 1 Filter Verification) ---
        if liq_audit := debug.get("audit_liquidity", {}):
            if (snap := liq_audit.get("universe_snapshot")) is not None:
                snap.to_excel(writer, sheet_name="SURVIVAL_AUDIT")

        # --- SHEET 9: FULL_RANKING ---
        if (df_rank := debug.get("full_universe_ranking")) is not None:
            df_rank.to_excel(writer, sheet_name="FULL_RANKING")

    print(f"✨ Audit Report Complete: {output_path}")
    return output_path


def export_last_run_tickers_data_to_csv(
    analyzer, df_ohlcv, features_df, filename="all_tickers_stacked.csv"
):
    """
    Export the last run ticker data from a WalkForwardAnalyzer to a stacked CSV file.

    Parameters:
    -----------
    analyzer : WalkForwardAnalyzer
        The analyzer containing last_run data
    df_ohlcv : pd.DataFrame
        OHLCV data for create_combined_dict
    features_df : pd.DataFrame
        Features data for create_combined_dict
    filename : str
        Output filename (saved to OUTPUT_DIR)

    Returns:
    --------
    Path : Path to saved CSV file
    """

    # 1. Access the result object from the analyzer
    res = analyzer.last_run

    if res is None:
        raise ValueError(
            "❌ No results found in analyzer. Please click 'Run Simulation' first."
        )

    # 2. Extract attributes directly (No [0] needed)
    benchmark = res.debug_data["inputs_snapshot"].benchmark_ticker
    tickers = res.tickers + [benchmark]
    start_date = res.start_date
    # Export through the end of the holding period, not the decision/buy date
    end_date = res.holding_end_date

    # 3. Generate the combined dict
    combined = create_combined_dict(
        df_ohlcv=df_ohlcv.copy(),
        features_df=features_df,
        tickers=tickers,
        date_start=start_date,
        date_end=end_date,
        verbose=True,
    )

    # 4. Print ticker data (optional; remove if not needed)
    for ticker in tickers:
        with pd.option_context("display.float_format", "{:.8f}".format):
            print(f"{ticker}:\n{combined[ticker][start_date:end_date]}\n")

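    # 5. Save ticker data to CSV under OUTPUT_DIR (as documented above)
    file_path = Path(OUTPUT_DIR) / filename
    file_path.parent.mkdir(parents=True, exist_ok=True)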

    # Save first ticker with header
    first_ticker = tickers[0]
    df_first = combined[first_ticker][start_date:end_date].reset_index()
    df_first["Ticker"] = first_ticker

    df_first.to_csv(file_path, header=True, index=False, lineterminator="\n")
    print(f"✓ Saved {first_ticker} with header")

    # Append remaining tickers without header
    for ticker in tickers[1:]:
        df = combined[ticker][start_date:end_date].reset_index()
        df["Ticker"] = ticker

        df.to_csv(file_path, header=False, index=False, lineterminator="\n", mode="a")
        print(f"✓ Appended {ticker}")

    print(f"\n✓ Saved all tickers to: {file_path}")

    return file_path


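def print_nested(d, indent=0, width=4):
    """Pretty-print nested containers.

    Dict keys whose value is a dict are tagged [SEP] when every child is itself
    a dict, and [NEST] otherwise.
    Leaves are rendered as two lines: the key, then the indented value."""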
    spacing = " " * indent

    def _kind(node):
        if not isinstance(node, dict):
            return None
        return "sep" if all(isinstance(v, dict) for v in node.values()) else "nest"

    if isinstance(d, dict):
        for k, v in d.items():
            kind = _kind(v)
            tag = "" if kind is None else f"  [{'SEP' if kind == 'sep' else 'NEST'}]"
            print(f"{spacing}{k}{tag}")
            print_nested(v, indent + width, width)

    elif isinstance(d, (list, tuple)):
        for idx, item in enumerate(d):
            print(f"{spacing}[{idx}]")
            print_nested(item, indent + width, width)

    else:  # leaf: primitive value
        print(f"{spacing}{d}")


def get_ticker_OHLCV(
    df_ohlcv: pd.DataFrame,
    tickers: Union[str, List[str]],
    date_start: str,
    date_end: str,
    return_format: str = "dataframe",
    verbose: bool = True,
) -> Union[pd.DataFrame, dict, list]:
    """
    Get OHLCV data for specified tickers within a date range.

    Parameters
    ----------
    df_ohlcv : pd.DataFrame
        DataFrame with MultiIndex of (ticker, date) and OHLCV columns
    tickers : str or list of str
        Ticker symbol(s) to retrieve
    date_start : str
        Start date in 'YYYY-MM-DD' format
    date_end : str
        End date in 'YYYY-MM-DD' format
    return_format : str, optional
        Format to return data in. Options:
        - 'dataframe': Single DataFrame with MultiIndex (default)
        - 'dict': Dictionary with tickers as keys and DataFrames as values
        - 'separate': List of separate DataFrames for each ticker
    verbose : bool, optional
        Whether to print summary information (default: True)

    Returns
    -------
    Union[pd.DataFrame, dict, list]
        Filtered OHLCV data in specified format

    Raises
    ------
    ValueError
        If input parameters are invalid
    KeyError
        If tickers not found in DataFrame

    Examples
    --------
    >>> # Get data for single ticker
    >>> vlo_data = get_ticker_OHLCV(df_ohlcv, 'VLO', '2025-08-13', '2025-09-04')

    >>> # Get data for multiple tickers
    >>> multi_data = get_ticker_OHLCV(df_ohlcv, ['VLO', 'JPST'], '2025-08-13', '2025-09-04')

    >>> # Get data as dictionary
    >>> data_dict = get_ticker_OHLCV(df_ohlcv, ['VLO', 'JPST'], '2025-08-13',
    ...                              '2025-09-04', return_format='dict')
    """

    # Input validation
    if not isinstance(df_ohlcv, pd.DataFrame):
        raise TypeError("df_ohlcv must be a pandas DataFrame")

    if not isinstance(df_ohlcv.index, pd.MultiIndex):
        raise ValueError("DataFrame must have MultiIndex of (ticker, date)")

    if len(df_ohlcv.index.levels) != 2:
        raise ValueError("MultiIndex must have exactly 2 levels: (ticker, date)")

    # Convert single ticker to list for consistent processing
    if isinstance(tickers, str):
        tickers = [tickers]
    elif not isinstance(tickers, list):
        raise TypeError("tickers must be a string or list of strings")

    # Convert dates to Timestamps
    try:
        start_date = pd.Timestamp(date_start)
        end_date = pd.Timestamp(date_end)
    except ValueError as e:
        raise ValueError(f"Invalid date format. Use 'YYYY-MM-DD': {e}") from e

    if start_date > end_date:
        raise ValueError("date_start must be before or equal to date_end")

    # Check if tickers exist in the DataFrame
    available_tickers = df_ohlcv.index.get_level_values(0).unique()
    missing_tickers = [t for t in tickers if t not in available_tickers]

    if missing_tickers:
        raise KeyError(f"Ticker(s) not found in DataFrame: {missing_tickers}")

    # Filter the data using MultiIndex slicing
    try:
        filtered_data = df_ohlcv.loc[(tickers, slice(date_start, date_end)), :]
    except Exception as e:
        raise ValueError(f"Error filtering data: {e}") from e

    # Handle empty results
    if filtered_data.empty:
        if verbose:
            print(
                f"No data found for tickers {tickers} in date range {date_start} to {date_end}"
            )
        return filtered_data

    # Print summary if verbose
    if verbose:
        print(
            f"Data retrieved for {len(tickers)} ticker(s) from {date_start} to {date_end}"
        )
        print(f"Total rows: {len(filtered_data)}")
        print(
            f"Date range in data: {filtered_data.index.get_level_values(1).min()} to "
            f"{filtered_data.index.get_level_values(1).max()}"
        )

        # Print ticker-specific counts
        ticker_counts = filtered_data.index.get_level_values(0).value_counts()
        for ticker in tickers:
            count = ticker_counts.get(ticker, 0)
            if count > 0:
                print(f"  {ticker}: {count} rows")
            else:
                print(f"  {ticker}: No data in range")

    # Return in requested format
    if return_format == "dict":
        result = {}
        for ticker in tickers:
            try:
                result[ticker] = filtered_data.xs(ticker, level=0).loc[
                    date_start:date_end
                ]
            except KeyError:
                result[ticker] = pd.DataFrame()
        return result

    elif return_format == "separate":
        result = []
        for ticker in tickers:
            try:
                result.append(
                    filtered_data.xs(ticker, level=0).loc[date_start:date_end]
                )
            except KeyError:
                result.append(pd.DataFrame())
        return result

    elif return_format == "dataframe":
        return filtered_data

    else:
        raise ValueError(
            f"Invalid return_format: {return_format}. "
            f"Must be 'dataframe', 'dict', or 'separate'"
        )


def get_ticker_features(
    features_df: pd.DataFrame,
    tickers: Union[str, List[str]],
    date_start: str,
    date_end: str,
    return_format: str = "dataframe",
    verbose: bool = True,
) -> Union[pd.DataFrame, dict, list]:
    """
    Get features data for specified tickers within a date range.

    Parameters
    ----------
    features_df : pd.DataFrame
        DataFrame with MultiIndex of (ticker, date) and feature columns
    tickers : str or list of str
        Ticker symbol(s) to retrieve
    date_start : str
        Start date in 'YYYY-MM-DD' format
    date_end : str
        End date in 'YYYY-MM-DD' format
    return_format : str, optional
        Format to return data in. Options:
        - 'dataframe': Single DataFrame with MultiIndex (default)
        - 'dict': Dictionary with tickers as keys and DataFrames as values
        - 'separate': List of separate DataFrames for each ticker
    verbose : bool, optional
        Whether to print summary information (default: True)

    Returns
    -------
    Union[pd.DataFrame, dict, list]
        Filtered features data in specified format
    """
    # Convert single ticker to list for consistent processing
    if isinstance(tickers, str):
        tickers = [tickers]

    # Filter the data using MultiIndex slicing
    try:
        filtered_data = features_df.loc[(tickers, slice(date_start, date_end)), :]
    except Exception as e:
        if verbose:
            print(f"Error filtering data: {e}")
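        # Return an empty container that matches the requested format
        if return_format == "dataframe":
            return pd.DataFrame()
        return [] if return_format == "separate" else {}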

    # Handle empty results
    if filtered_data.empty:
        if verbose:
            print(
                f"No data found for tickers {tickers} in date range {date_start} to {date_end}"
            )
        return filtered_data

    # Print summary if verbose
    if verbose:
        print(
            f"Features data retrieved for {len(tickers)} ticker(s) from {date_start} to {date_end}"
        )
        print(f"Total rows: {len(filtered_data)}")
        print(
            f"Date range in data: {filtered_data.index.get_level_values(1).min()} to "
            f"{filtered_data.index.get_level_values(1).max()}"
        )
        print(f"Available features: {', '.join(filtered_data.columns.tolist())}")

        # Print ticker-specific counts
        ticker_counts = filtered_data.index.get_level_values(0).value_counts()
        for ticker in tickers:
            count = ticker_counts.get(ticker, 0)
            if count > 0:
                print(f"  {ticker}: {count} rows")
            else:
                print(f"  {ticker}: No data in range")

    # Return in requested format
    if return_format == "dict":
        result = {}
        for ticker in tickers:
            try:
                result[ticker] = filtered_data.xs(ticker, level=0).loc[
                    date_start:date_end
                ]
            except KeyError:
                result[ticker] = pd.DataFrame()
        return result

    elif return_format == "separate":
        result = []
        for ticker in tickers:
            try:
                result.append(
                    filtered_data.xs(ticker, level=0).loc[date_start:date_end]
                )
            except KeyError:
                result.append(pd.DataFrame())
        return result

    elif return_format == "dataframe":
        return filtered_data

    else:
        raise ValueError(
            f"Invalid return_format: {return_format}. "
            f"Must be 'dataframe', 'dict', or 'separate'"
        )


def create_combined_dict(
    df_ohlcv: pd.DataFrame,
    features_df: pd.DataFrame,
    tickers: Union[str, List[str]],
    date_start: str,
    date_end: str,
    verbose: bool = True,
) -> dict:
    """
    Create a combined dictionary with both OHLCV and features data for each ticker.

    Parameters:
    -----------
    df_ohlcv : pd.DataFrame
        DataFrame with OHLCV data (MultiIndex: ticker, date)
    features_df : pd.DataFrame
        DataFrame with features data (MultiIndex: ticker, date)
    tickers : str or list of str
        Ticker symbol(s) to retrieve
    date_start : str
        Start date in 'YYYY-MM-DD' format
    date_end : str
        End date in 'YYYY-MM-DD' format
    verbose : bool, optional
        Whether to print progress information (default: True)

    Returns:
    --------
    dict
        Dictionary with tickers as keys and combined DataFrames (OHLCV + features) as values
    """
    # Convert single ticker to list
    if isinstance(tickers, str):
        tickers = [tickers]

    if verbose:
        print(f"Creating combined dictionary for {len(tickers)} ticker(s)")
        print(f"Date range: {date_start} to {date_end}")
        print("=" * 60)

    # Get OHLCV data as dictionary
    ohlcv_dict = get_ticker_OHLCV(
        df_ohlcv, tickers, date_start, date_end, return_format="dict", verbose=verbose
    )

    # Get features data as dictionary
    features_dict = get_ticker_features(
        features_df,
        tickers,
        date_start,
        date_end,
        return_format="dict",
        verbose=verbose,
    )

    # Create combined_dict
    combined_dict = {}

    for ticker in tickers:
        if verbose:
            print(f"\nProcessing {ticker}...")

        # Check if ticker exists in both dictionaries
        if ticker in ohlcv_dict and ticker in features_dict:
            ohlcv_data = ohlcv_dict[ticker]
            features_data = features_dict[ticker]

            # Check if both dataframes have data
            if not ohlcv_data.empty and not features_data.empty:
                # Combine OHLCV and features column-wise. pd.concat(axis=1) aligns
                # on the date index (outer join), so a date present in only one
                # frame yields NaN in the other frame's columns.
                combined_df = pd.concat([ohlcv_data, features_data], axis=1)

                # Ensure proper index naming
                combined_df.index.name = "Date"

                # Store in combined_dict
                combined_dict[ticker] = combined_df

                if verbose:
                    print("  ✓ Successfully combined data")
                    print(f"  OHLCV shape: {ohlcv_data.shape}")
                    print(f"  Features shape: {features_data.shape}")
                    print(f"  Combined shape: {combined_df.shape}")
                    print(
                        f"  Date range: {combined_df.index.min()} to {combined_df.index.max()}"
                    )
            else:
                if verbose:
                    print("  ✗ Cannot combine: One or both dataframes are empty")
                    print(f"    OHLCV empty: {ohlcv_data.empty}")
                    print(f"    Features empty: {features_data.empty}")
                combined_dict[ticker] = pd.DataFrame()
        else:
            if verbose:
                print("  ✗ Ticker not found in both dictionaries")
                if ticker not in ohlcv_dict:
                    print("    Not in OHLCV data")
                if ticker not in features_dict:
                    print("    Not in features data")
            combined_dict[ticker] = pd.DataFrame()

    # Print summary
    if verbose:
        print("\n" + "=" * 60)
        print("SUMMARY")
        print("=" * 60)
        print(f"Total tickers processed: {len(tickers)}")

        tickers_with_data = [
            ticker for ticker, df in combined_dict.items() if not df.empty
        ]
        print(f"Tickers with combined data: {len(tickers_with_data)}")

        if tickers_with_data:
            print("\nTicker details:")
            for ticker in tickers_with_data:
                df = combined_dict[ticker]
                print(f"  {ticker}: {df.shape} - {df.index.min()} to {df.index.max()}")
                print(f"    Columns: {len(df.columns)}")

        empty_tickers = [ticker for ticker, df in combined_dict.items() if df.empty]
        if empty_tickers:
            print(f"\nTickers with no data: {', '.join(empty_tickers)}")

    return combined_dict
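
A minimal sketch of the alignment behavior that `create_combined_dict` relies on when it calls `pd.concat(..., axis=1)`. The two small frames and all of their values are made up for illustration; only the column names `Adj Close` and `Mom_21` appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical per-ticker frames: features start one day later than prices.
dates_ohlcv = pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06"])
dates_feat = pd.to_datetime(["2025-01-03", "2025-01-06"])

ohlcv = pd.DataFrame({"Adj Close": [10.0, 10.5, 10.2]}, index=dates_ohlcv)
features = pd.DataFrame({"Mom_21": [0.03, 0.01]}, index=dates_feat)

# Column-wise concat aligns on the index with an outer join by default,
# so the date missing from `features` is kept and filled with NaN.
combined = pd.concat([ohlcv, features], axis=1)
combined.index.name = "Date"

n_missing = int(combined["Mom_21"].isna().sum())
print(combined)
print(f"Rows without features: {n_missing}")
```

If only dates present in both frames are wanted, `pd.concat(..., axis=1, join="inner")` is one option; whether that suits the export here depends on how gaps should appear in the CSV.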



In [5]:
# Auto-run the checks
verify_math_integrity()

verify_feature_engineering_integrity()

verify_ranking_integrity()

verify_vol_alignment_integrity()


--- 🛡️ Starting Final Integrity Audit ---
✅ Series Boundary: OK
✅ DataFrame Boundary: OK
✅ AUDIT PASSED: Mathematical boundaries are strictly enforced.

--- 🛡️ Starting Feature Engineering Audit ---
⚡ Generating Decoupled Features (Benchmark: SPY)...
Audit Values:
[ nan 25.  17.5]
✅ FEATURE INTEGRITY PASSED: Wilder's ATR logic is strictly enforced.
--- 🛡️ Starting Ranking Kernel Audit ---
✅ RANKING INTEGRITY PASSED: Volatility normalization is strictly enforced.

--- 🛡️ Starting Volatility Alignment Audit ---
✅ Series Temporal Coupling: OK
✅ DataFrame Temporal Coupling: OK
✅ AUDIT PASSED: Reward and Risk are strictly synchronized.
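
The feature audit above refers to Wilder's ATR logic. As a rough sketch of what that smoothing usually means, the block below computes true range and applies Wilder's recursion through pandas' exponentially weighted mean with `alpha = 1 / period`. The function name `wilder_atr`, the sample prices, and the seeding choice (first true range rather than an initial simple average) are assumptions for illustration and may differ from what `generate_features` actually does.

```python
import pandas as pd


def wilder_atr(high: pd.Series, low: pd.Series, close: pd.Series, period: int = 14) -> pd.Series:
    """Hypothetical helper: true range smoothed with Wilder's recursion."""
    prev_close = close.shift(1)
    # True range is the largest of the three candidate ranges.
    # On the first row prev_close is NaN, so max() falls back to high - low.
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    # adjust=False gives ATR_t = ATR_{t-1} + (TR_t - ATR_{t-1}) / period
    return true_range.ewm(alpha=1.0 / period, adjust=False).mean()


# Made-up prices, short enough to check by hand.
high = pd.Series([10.0, 12.0, 11.0])
low = pd.Series([8.0, 9.0, 10.0])
close = pd.Series([9.0, 11.0, 10.5])

atr = wilder_atr(high, low, close, period=2)
print(atr.tolist())
```

With these inputs the true ranges are 2, 3 and 1, so the recursion with `period=2` gives 2, 2.5 and 1.75.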


In [6]:
data_path = r"c:\Users\ping\Files_win10\python\py311\stocks\data\df_indices.parquet"

df_indices = pd.read_parquet(data_path, engine="pyarrow")
print(f"df_indices:\n{df_indices}")

df_indices:
                   Adj Open  Adj High  Adj Low  Adj Close  Volume
Ticker Date                                                      
^AXJO  1992-11-22   1455.00   1455.00  1455.00    1455.00       0
       1992-11-23   1458.40   1458.40  1458.40    1458.40       0
       1992-11-24   1467.90   1467.90  1467.90    1467.90       0
       1992-11-25   1459.00   1459.00  1459.00    1459.00       0
       1992-11-26   1458.90   1458.90  1458.90    1458.90       0
...                     ...       ...      ...        ...     ...
^VIX3M 2026-02-20     22.39     22.39    20.97      21.09       0
       2026-02-23     21.25     22.52    21.13      22.14       0
       2026-02-24     22.40     22.66    21.15      21.34       0
       2026-02-25     20.82     20.92    20.34      20.36       0
       2026-02-26     20.46     21.75    20.41      20.81       0

[144597 rows x 5 columns]


In [7]:
_indices = df_indices.index.get_level_values(0).unique().tolist()
display(_indices)
df_indices.info()

['^AXJO',
 '^BSESN',
 '^DJI',
 '^FCHI',
 '^FTSE',
 '^GDAXI',
 '^GSPC',
 '^HSI',
 '^IXIC',
 '^N225',
 '^NYA',
 '^STOXX50E',
 '^VIX',
 '^VIX3M']

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 144597 entries, ('^AXJO', Timestamp('1992-11-22 00:00:00')) to ('^VIX3M', Timestamp('2026-02-26 00:00:00'))
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Adj Open   144597 non-null  float64
 1   Adj High   144597 non-null  float64
 2   Adj Low    144597 non-null  float64
 3   Adj Close  144597 non-null  float64
 4   Volume     144597 non-null  int64  
dtypes: float64(4), int64(1)
memory usage: 6.6+ MB
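
For reference, here is a small sketch of the two selection patterns used on frames with this `(Ticker, Date)` MultiIndex: tuple-based `.loc` slicing, as in `get_ticker_OHLCV`, and `.xs` to drop the ticker level. The frame `demo` and its values are synthetic stand-ins, not rows from `df_indices`.

```python
import pandas as pd

# Synthetic frame with the same index layout as df_indices.
idx = pd.MultiIndex.from_product(
    [["^GSPC", "^VIX"], pd.date_range("2026-02-23", periods=4, freq="D")],
    names=["Ticker", "Date"],
)
demo = pd.DataFrame(
    {"Adj Close": [1.0, 2.0, 3.0, 4.0, 20.0, 21.0, 22.0, 23.0]}, index=idx
).sort_index()  # label slicing on a MultiIndex needs a sorted index

# Pattern 1: list of tickers plus an inclusive date slice; keeps both levels.
window = demo.loc[(["^VIX"], slice("2026-02-24", "2026-02-25")), :]

# Pattern 2: cross-section on one ticker; the result is indexed by Date only.
vix_close = demo.xs("^VIX", level="Ticker")["Adj Close"]

print(window)
print(vix_close)
```

Both endpoints of a label slice are inclusive, which is why the window above contains two rows.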


In [8]:
data_path = (
    r"c:\Users\ping\Files_win10\python\py311\stocks\data\df_OHLCV_stocks_etfs.parquet"
)

df_ohlcv = pd.read_parquet(data_path, engine="pyarrow")

In [9]:
print(f"df_ohlcv.head():\n {df_ohlcv.head()}\n")
df_ohlcv.info()

df_ohlcv.head():
                    Adj Open  Adj High  Adj Low  Adj Close    Volume
Ticker Date                                                        
A      1999-11-18   27.1966   29.8864  23.9091    26.3000  74849959
       1999-11-19   25.6649   25.7023  23.7970    24.1333  18230872
       1999-11-22   24.6936   26.3000  23.9465    26.3000   7871811
       1999-11-23   25.4034   26.0759  23.9091    23.9091   7151080
       1999-11-24   23.9838   25.0672  23.9091    24.5442   5795948

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 9487718 entries, ('A', Timestamp('1999-11-18 00:00:00')) to ('ZWS', Timestamp('2026-02-26 00:00:00'))
Data columns (total 5 columns):
 #   Column     Dtype  
---  ------     -----  
 0   Adj Open   float64
 1   Adj High   float64
 2   Adj Low    float64
 3   Adj Close  float64
 4   Volume     int64  
dtypes: float64(4), int64(1)
memory usage: 398.8+ MB


In [10]:
# === FIXED TEST HARNESS ===
if __name__ == "__main__":
    print("üß™ Testing New Feature Engine...")

    # 1. Create Dummy Index Data
    dates = df_ohlcv.index.get_level_values("Date").unique()
    dummy_indices = []

    for ticker in ["^GSPC", "^VIX", "^VIX3M"]:
        temp = pd.DataFrame(
            {
                "Adj Close": (
                    np.random.normal(100, 5, len(dates))
                    if ticker == "^GSPC"
                    else np.random.normal(20, 2, len(dates))
                ),
                "Volume": 1000,
            },
            index=dates,
        )
        temp["Ticker"] = ticker
        dummy_indices.append(temp.reset_index().set_index(["Ticker", "Date"]))

    df_idx_test = pd.concat(dummy_indices)

    # 2. Run Generation
    try:
        # FIX 1: Unpack the tuple into two variables
        feat_df, mac_df = generate_features(
            df_ohlcv.iloc[:5000], df_indices=df_idx_test
        )

        print("✅ Features Generated Successfully!")

        # FIX 2: Check columns of the features_df (Ticker-specific)
        print("\n--- TICKER FEATURES (Micro) ---")
        print("Columns:", feat_df.columns.tolist())
        # Note: Macro_Vix_Z is no longer in feat_df (that's the point of the optimization!)
        print("Sample Data:\n", feat_df[["Mom_21", "IR_63", "ATRP"]].tail())

        # FIX 3: Check the new Macro DataFrame
        print("\n--- MACRO STATE (Shared) ---")
        print("Columns:", mac_df.columns.tolist())

        # Check specific date alignment
        last_date = feat_df.index.get_level_values("Date")[-1]
        print(f"\nüîç Macro Check for {last_date.date()}:")
        # We look up the date in the mac_df now
        print(mac_df.loc[last_date])

    except Exception as e:
        print(f"‚ùå Error: {e}")
        import traceback

        traceback.print_exc()

🧪 Testing New Feature Engine...
⚡ Generating Decoupled Features (Benchmark: SPY)...
✅ Features Generated Successfully!

--- TICKER FEATURES (Micro) ---
Columns: ['ATR', 'ATRP', 'TRP', 'RSI', 'Mom_21', 'Consistency', 'IR_63', 'Beta_63', 'DD_21', 'Ret_1d', 'RollingStalePct', 'RollMedDollarVol', 'RollingSameVolCount']
Sample Data:
                    Mom_21   IR_63    ATRP
Ticker Date                              
A      2019-09-27  0.0922  0.0318  0.0195
       2019-09-30  0.0862  0.0208  0.0189
       2019-10-01  0.0547  0.0008  0.0201
       2019-10-02  0.0440 -0.0324  0.0210
       2019-10-03  0.0447 -0.0132  0.0207

--- MACRO STATE (Shared) ---
Columns: ['Mkt_Ret', 'Macro_Trend', 'Macro_Trend_Vel', 'Macro_Trend_Vel_Z', 'Macro_Trend_Mom', 'Macro_Vix_Z', 'Macro_Vix_Ratio']

🔍 Macro Check for 2019-10-03:
Mkt_Ret              0.0000
Macro_Trend          0.0000
Macro_Trend_Vel      0.0000
Macro_Trend_Vel_Z    0.0000
Macro_Trend_Mom      0.0000
Macro_Vix_Z         -1.3653
Macro_V

In [11]:
# ==============================================================================
# DATA PRE-COMPUTATION (The "Fast-Track" Setup)
# ==============================================================================
print("Takes about 2.5 minutes to generate_features")

features_df, macro_df = generate_features(
    df_ohlcv=df_ohlcv,
    df_indices=df_indices,
    benchmark_ticker=GLOBAL_SETTINGS["benchmark_ticker"],
)

Takes about 2.5 minutes to generate_features
⚡ Generating Decoupled Features (Benchmark: SPY)...


In [12]:
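# Note: DataFrame.info() prints its report directly and returns None,
# so each info() f-string below renders "None" after the report has printed.
print(f"features_df.info():\n{features_df.info()}\n")
print(f"features_df.index.names:\n{features_df.index.names}\n")
print(f"macro_df.info():\n{macro_df.info()}\n")
print(f"macro_df.index.names:\n{macro_df.index.names}\n")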

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 9487718 entries, ('A', Timestamp('1999-11-18 00:00:00')) to ('ZWS', Timestamp('2026-02-26 00:00:00'))
Data columns (total 13 columns):
 #   Column               Dtype  
---  ------               -----  
 0   ATR                  float64
 1   ATRP                 float64
 2   TRP                  float64
 3   RSI                  float64
 4   Mom_21               float64
 5   Consistency          float64
 6   IR_63                float64
 7   Beta_63              float64
 8   DD_21                float64
 9   Ret_1d               float64
 10  RollingStalePct      float64
 11  RollMedDollarVol     float64
 12  RollingSameVolCount  float64
dtypes: float64(13)
memory usage: 977.9+ MB
features_df.info():
None

features_df.index.names:
['Ticker', 'Date']

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 16146 entries, 1962-01-02 to 2026-02-26
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------      

In [13]:
verify_macro_engine(df_ohlcv, df_indices, macro_df, GLOBAL_SETTINGS)

--- Macro Verification (Benchmark: SPY) ---

Comparing verification vs original (Clip Threshold: 4.0):
✅ Mkt_Ret              | PASS (Max Diff: 0.00e+00)
✅ Macro_Trend          | PASS (Max Diff: 0.00e+00)
✅ Macro_Trend_Vel      | PASS (Max Diff: 0.00e+00)
✅ Macro_Trend_Vel_Z    | PASS (Max Diff: 0.00e+00)
✅ Macro_Trend_Mom      | PASS (Max Diff: 0.00e+00)
✅ Macro_Vix_Z          | PASS (Max Diff: 0.00e+00)
✅ Macro_Vix_Ratio      | PASS (Max Diff: 0.00e+00)
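The report above compares each macro column against an independent recomputation and prints the largest absolute difference. The internals of `verify_macro_engine` are not shown in this notebook, so the following is only a sketch of that kind of column-by-column comparison; `compare_frames` and its `tol` parameter are hypothetical names.

```python
import numpy as np
import pandas as pd


def compare_frames(original, recomputed, tol=1e-9):
    """Return {column: (max_abs_diff, passed)} for columns present in both frames.

    A column fails if any value differs by more than `tol`, or if a NaN
    appears in one frame but not the other.
    """
    report = {}
    for col in original.columns.intersection(recomputed.columns):
        a, b = original[col].align(recomputed[col], join="inner")
        both = a.notna() & b.notna()
        max_diff = float((a[both] - b[both]).abs().max()) if both.any() else 0.0
        nan_mismatch = bool((a.isna() != b.isna()).any())
        report[col] = (max_diff, (max_diff <= tol) and not nan_mismatch)
    return report


# Illustrative data: one column matches exactly, one is off by 0.5.
dates = pd.to_datetime(["2026-02-24", "2026-02-25", "2026-02-26"])
orig = pd.DataFrame(
    {"Mkt_Ret": [0.01, -0.02, np.nan], "Macro_Vix_Z": [1.0, 2.0, 3.0]}, index=dates
)
redo = pd.DataFrame(
    {"Mkt_Ret": [0.01, -0.02, np.nan], "Macro_Vix_Z": [1.0, 2.5, 3.0]}, index=dates
)

report = compare_frames(orig, redo)
for col, (diff, ok) in report.items():
    print(f"{col:<20} | {'PASS' if ok else 'FAIL'} (Max Diff: {diff:.2e})")
```

Treating a NaN mismatch as a failure matters here: skipping NaNs silently would let a recomputation with a longer warm-up window pass unnoticed.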


In [14]:
print(
    "🚀 Generating Wide Matrices for Instant Backtesting... (takes about 1 minute to run)"
)

# 1. Price Matrix
df_close_wide = df_ohlcv["Adj Close"].unstack(level=0)

# 2. Volatility Matrices (Unstack and Align)
# Using features_df (the first item from the tuple)
print("   - Unstacking ATRP...")
df_atrp_wide = features_df["ATRP"].unstack(level=0).reindex_like(df_close_wide)

print("   - Unstacking TRP...")
df_trp_wide = features_df["TRP"].unstack(level=0).reindex_like(df_close_wide)

# 3. Handle Data Gaps (Sanitize the Wide Matrices)
if GLOBAL_SETTINGS["handle_zeros_as_nan"]:
    df_close_wide = df_close_wide.replace(0, np.nan)

# Forward fill up to the limit, then fill remaining with the "Disaster Detection" value
df_close_wide = df_close_wide.ffill(limit=GLOBAL_SETTINGS["max_data_gap_ffill"])
df_close_wide = df_close_wide.fillna(GLOBAL_SETTINGS["nan_price_replacement"])

print(
    f"✅ Pre-computation Complete. Tickers: {len(df_close_wide.columns)}, Days: {len(df_close_wide)}"
)
print("   Ready: df_close_wide, df_atrp_wide, df_trp_wide, and macro_df.")

🚀 Generating Wide Matrices for Instant Backtesting... (takes about 1 minute to run)
   - Unstacking ATRP...
   - Unstacking TRP...
✅ Pre-computation Complete. Tickers: 1581, Days: 16146
   Ready: df_close_wide, df_atrp_wide, df_trp_wide, and macro_df.
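The cell above turns long `(Ticker, Date)` data into Date-by-Ticker matrices, aligns the feature matrices to the price matrix, and then repairs gaps. A toy-sized sketch of the same sequence follows; the `settings` values are assumptions chosen for the demo and are not the real `GLOBAL_SETTINGS`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for GLOBAL_SETTINGS; the values are illustrative.
settings = {
    "handle_zeros_as_nan": True,
    "max_data_gap_ffill": 1,
    "nan_price_replacement": -1.0,
}

dates = pd.to_datetime(["2026-01-05", "2026-01-06", "2026-01-07", "2026-01-08"])
idx = pd.MultiIndex.from_product([["AAA", "BBB"], dates], names=["Ticker", "Date"])
long_close = pd.Series(
    [10.0, 0.0, np.nan, np.nan, 20.0, 21.0, 22.0, 23.0], index=idx, name="Adj Close"
)

# 1. Long -> wide: rows are dates, columns are tickers.
wide_close = long_close.unstack(level=0)

# 2. A feature that covers fewer tickers and dates is aligned to the price grid.
atrp_idx = pd.MultiIndex.from_product([["AAA"], dates[:2]], names=["Ticker", "Date"])
wide_atrp = pd.Series([0.02, 0.03], index=atrp_idx).unstack(level=0)
wide_atrp = wide_atrp.reindex_like(wide_close)

# 3. Gap handling: zeros become NaN, short gaps are forward-filled,
#    and anything still missing gets the sentinel replacement value.
if settings["handle_zeros_as_nan"]:
    wide_close = wide_close.replace(0, np.nan)
wide_close = wide_close.ffill(limit=settings["max_data_gap_ffill"])
wide_close = wide_close.fillna(settings["nan_price_replacement"])

print(wide_close)
print(wide_atrp)
```

With a fill limit of 1, `AAA` carries its last good price forward for a single day and then falls back to the sentinel, which is what makes long outages visible downstream instead of being papered over.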


In [15]:
# Instantiate the engine so that 'master_engine' uses the current AlphaEngine definition.
master_engine = AlphaEngine(
    df_ohlcv=df_ohlcv,
    features_df=features_df,
    macro_df=macro_df,
    df_close_wide=df_close_wide,
    df_atrp_wide=df_atrp_wide,
    df_trp_wide=df_trp_wide,
)

In [16]:
# 1. Enable Autoreload
%load_ext autoreload
%autoreload 2

import sys
from pathlib import Path

def add_project_root_to_path():
    """Find notebooks_RLVR and add to sys.path."""
    current = Path.cwd()

    # Search upward for notebooks_RLVR folder
    for path in [current] + list(current.parents):
        if path.name == "notebooks_RLVR":
            sys.path.insert(0, str(path))
            print(f"✓ Added to path: {path}")
            return path
        # Also check if notebooks_RLVR exists as child (for running from stocks/)
        candidate = path / "notebooks_RLVR"
        if candidate.exists():
            sys.path.insert(0, str(candidate))
            print(f"✓ Added to path: {candidate}")
            return candidate

    raise RuntimeError("Could not find notebooks_RLVR directory")


# Run once at notebook start
add_project_root_to_path()


# 2. Force reload cached modules (run this to refresh code changes)
import importlib

modules_to_reload = [
    "core.engine",
    "core.contracts",
    "core.settings",
    "strategy.registry",
    "core.quant",
    "core.analyzer",
    "core.paths",
]

for mod in modules_to_reload:
    if mod in sys.modules:
        del sys.modules[mod]


# 3. Standard imports
import pandas as pd
import os
import numpy as np

from IPython.display import display
from dataclasses import fields, asdict, is_dataclass
from typing import List, Union, Tuple 


# 4. Fresh imports (these will re-import from disk due to cache clearing above)
from core.engine import AlphaEngine
from core.contracts import MarketObservation, FilterPack
from core.settings import GLOBAL_SETTINGS
from strategy.registry import METRIC_REGISTRY
from core.quant import QuantUtils
from core.analyzer import create_walk_forward_analyzer
from core.paths import OUTPUT_DIR


# 5. Pandas display settings
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1000)
pd.set_option("display.max_colwidth", 50)
pd.set_option("display.precision", 4)


# 6. Instantiate engine (customize DataFrames as needed)
master_engine = AlphaEngine(
    df_ohlcv=df_ohlcv,
    features_df=features_df,
    macro_df=macro_df,
    df_close_wide=df_close_wide,
    df_atrp_wide=df_atrp_wide,
    df_trp_wide=df_trp_wide,
)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
✓ Added to path: c:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR
NOTEBOOKS_RLVR_ROOT: C:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR

OUTPUT_DIR: C:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR\output
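Step 2 of the cell above deletes entries from `sys.modules` so that the following `import` statements re-read the files from disk. A self-contained sketch of that mechanism, using a throwaway module whose name (`demo_settings_mod`) is hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid stale bytecode caches interfering with this quick edit-and-reload demo.
sys.dont_write_bytecode = True

# Write a throwaway module into a temp directory and make it importable.
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "demo_settings_mod.py"
module_file.write_text("VALUE = 1\n")
sys.path.insert(0, str(tmp_dir))

first = importlib.import_module("demo_settings_mod").VALUE

# Edit the file on disk. A plain re-import would return the cached module,
# so drop the cache entry first, exactly as the loop over modules_to_reload does.
module_file.write_text("VALUE = 22\n")
sys.modules.pop("demo_settings_mod", None)
importlib.invalidate_caches()

second = importlib.import_module("demo_settings_mod").VALUE
print(first, second)
```

Names bound earlier with `from module import name` still point at the old objects after the purge, which is why the notebook repeats its `from core... import ...` lines after clearing the cache.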



In [None]:
# universe_subset=None means "Scan the whole market"
analyzer1, stage1_pack = create_walk_forward_analyzer(
    master_engine, universe_subset=None
)

print("üöÄ Ready for Stage 1: Run Simulation for first filter.")
analyzer1.show()

üöÄ Ready for Stage 1: Run Simulation for first filter.
DEBUG: 937 stocks passed filters on 2025-12-10


VBox(children=(HTML(value='<b>1. Timeline Configuration:</b> (Past <--- Decision ---> Future)'), HBox(children‚Ä¶

In [None]:
# Get decision date from last run
decision_date_last_run = FilterPack(decision_date=analyzer1.last_run.decision_date)

# 1. LAUNCH STAGE 2 (Cascade)
# universe_subset=analyzer1.last_run.tickers means "Scan only the tickers selected in Stage 1"
analyzer2, stage1_pack = create_walk_forward_analyzer(
    master_engine,
    universe_subset=analyzer1.last_run.tickers,
    # universe_subset=None,
    filter_pack=decision_date_last_run,
)

print("üöÄ Ready for Stage 2: Run Simulation for 2nd filter.")
analyzer2.show()

üöÄ Ready for Stage 2: Run Simulation for 2nd filter.


VBox(children=(HTML(value='<b>1. Timeline Configuration:</b> (Past <--- Decision ---> Future)'), HBox(children‚Ä¶
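The two cells above form a cascade: Stage 1 scans the whole market, and Stage 2 ranks only the tickers that Stage 1 selected. The audit output later in the notebook reports that quality filters are bypassed in CASCADE/SUBSET mode. The sketch below mirrors that behaviour on a toy table; `select`, the column names, and the liquidity threshold are all hypothetical and are not the `create_walk_forward_analyzer` API.

```python
import pandas as pd

# Illustrative per-ticker scores on a single decision date.
scores = pd.DataFrame(
    {
        "dollar_volume": [5e6, 1e5, 8e6, 3e6],
        "gain": [0.02, 0.30, 0.10, 0.05],
        "sharpe": [1.5, 3.0, 0.4, 2.0],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)


def select(frame, metric, top_n, universe_subset=None, min_dollar_volume=1e6):
    """Rank by `metric` and return the top `top_n` tickers.

    With no subset, a liquidity filter is applied to the whole table.
    With a subset, the filter is skipped and only those tickers are ranked.
    """
    if universe_subset is None:
        candidates = frame[frame["dollar_volume"] >= min_dollar_volume]
    else:
        candidates = frame.loc[list(universe_subset)]
    return candidates[metric].nlargest(top_n).index.tolist()


stage1 = select(scores, "gain", top_n=3)  # whole market
stage2 = select(scores, "sharpe", top_n=2, universe_subset=stage1)  # cascade
print(stage1, stage2)
```

`BBB` has the best gain and Sharpe but never reaches Stage 2, because the Stage 1 liquidity filter removed it before ranking.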

In [19]:
###############################
###############################

In [28]:
my_analyzer = analyzer2

my_res = visualize_analyzer_structure(my_analyzer)

🔍 HIGH-TRANSPARENCY AUDIT MAP
[  0] 📦 audit_pack (EngineOutput)
[  1]   📈 portfolio_series (shape=(17,))
[  2]   📈 benchmark_series (shape=(17,))
[  3]   🧮 normalized_plot_data (shape=(17, 10))
[  4]   📂 tickers (len=10)
[  5]     📄 index_0 (str)
[  6]     📄 index_1 (str)
[  7]     📄 index_2 (str)
[  8]     📄 index_3 (str)
[  9]     📄 index_4 (str)
[ 10]     📄 index_5 (str)
[ 11]     📄 index_6 (str)
[ 12]     📄 index_7 (str)
[ 13]     📄 index_8 (str)
[ 14]     📄 index_9 (str)
[ 15]   📈 initial_weights (shape=(10,))
[ 16]   📂 perf_metrics (len=24)
[ 17]     🔢 full_p_gain (float)
[ 18]     🔢 full_p_sharpe (float)
[ 19]     🔢 full_p_sharpe_atrp (float)
[ 20]     🔢 full_p_sharpe_trp (float)
[ 21]     🔢 lookback_p_gain (float)
[ 22]     🔢 lookback_p_sharpe (float)
[ 23]     🔢 lookback_p_sharpe_atrp (float)
[ 24]     🔢 lookback_p_sharpe_trp (float)
[ 25]     🔢 holding_p_gain (float)
[ 26]     🔢 holding_p_shar

In [29]:
verify_analyzer_short(my_analyzer)


***********************************************************************************************
🕵️  STARTING SHORT-FORM AUDIT: Price Gain @ 2026-02-18
⚠️  ASSUMPTION: Verification logic is independent, but trusts Engine source DataFrames
   (engine.features_df, engine.df_close, and debug['portfolio_raw_components'])
***********************************************************************************************
🕵️  AUDIT: Price Gain @ 2026-02-18
LAYER 1: SURVIVAL  | Mode: CASCADE/SUBSET | ✅ BYPASS
LAYER 2: SELECTION | Strategy: Price Gain | Selection Match: ✅ PASS
LAYER 3: PERFORMANCE (Holding Period: 5 days)
Metric               | Engine       | Manual       | Status
-----------------------------------------------------------------------------------------------
Gain                 |     0.043951 |     0.043951 | ✅ PASS
Sharpe               |    23.768151 |    23.768151 | ✅ PASS
Sharpe (ATRP)        |     0.260284 |     0.260284 | ✅ PASS
Sharpe (TRP)         | 
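The changelog for v59 says the gain calculation moved from simple return to logarithmic return. The `QuantUtils.calculate_gain` implementation is not shown in this notebook, so the sketch below only illustrates the two definitions on made-up prices and why the log form is convenient to audit.

```python
import numpy as np
import pandas as pd

# Illustrative closes over a short holding period.
prices = pd.Series([100.0, 102.0, 101.0, 105.0])

# Simple return: percentage change from first to last price.
simple_gain = prices.iloc[-1] / prices.iloc[0] - 1.0

# Log return: natural log of the price ratio.
log_gain = float(np.log(prices.iloc[-1] / prices.iloc[0]))

# Daily log returns add up to the period log return, so a manual audit
# can rebuild the period gain from the daily series.
daily_log_returns = np.log(prices / prices.shift(1)).dropna()

print(f"simple: {simple_gain:.6f}  log: {log_gain:.6f}")
print(f"sum of daily log returns: {daily_log_returns.sum():.6f}")
```

For small moves the two numbers are close, and the log gain is always the smaller of the two for any non-zero move.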

In [30]:
verify_analyzer_long(my_analyzer)



🛡️  STARTING NUCLEAR AUDIT | 2026-02-18 | Price Gain
📝 1. PERFORMANCE RECONCILIATION


Period                    Full     Holding  Lookback
Entity     Metric
Benchmark  Gain           ✅ PASS  ✅ PASS  ✅ PASS
Benchmark  Sharpe         ✅ PASS  ✅ PASS  ✅ PASS
Benchmark  Sharpe (ATRP)  ✅ PASS  ✅ PASS  ✅ PASS
Benchmark  Sharpe (TRP)   ✅ PASS  ✅ PASS  ✅ PASS
Group      Gain           ✅ PASS  ✅ PASS  ✅ PASS
Group      Sharpe         ✅ PASS  ✅ PASS  ✅ PASS
Group      Sharpe (ATRP)  ✅ PASS  ✅ PASS  ✅ PASS
Group      Sharpe (TRP)   ✅ PASS  ✅ PASS  ✅ PASS



📝 2. SURVIVAL AUDIT
   Mode: CASCADE/SUBSET | Logic: Quality filters bypassed per design. | ✅ BYPASS

📝 3. UNIVERSAL SELECTION AUDIT | Strategy: Price Gain
   Scope: Evaluated 80 candidates (Full Universe).
   Result: 80 PASSED | 0 FAILED
   All scores match registry math. Price Gain results of the first 5 tickers


Rank  Ticker  Engine      Manual      Delta  Status
1     AEM     0.11155361  0.11155361  0.0    ✅ PASS
2     AG      0.02509092  0.02509092  0.0    ✅ PASS
3     ASX     0.18541116  0.18541116  0.0    ✅ PASS
4     ATI     0.14637324  0.14637324  0.0    ✅ PASS
5     AU      0.07282379  0.07282379  0.0    ✅ PASS




In [31]:
# Takes 4 seconds to run, checks the tickers selected in my_analyzer's last run
audit_feature_engineering_integrity(my_analyzer, mode="last_run")


🕵️  NUCLEAR FEATURE AUDIT | Mode: LAST_RUN | Tickers: 10
STEP 1: BOUNDARY INTEGRITY   | MultiIndex Isolation Check | ✅ PASS
STEP 2: SHADOW CALCULATIONS  | Re-computing metrics... DONE (0.57s)

Metric               | Max Delta    | Correlation  | Status
-------------------------------------------------------------------------------------
Ret_1d               |   0.0000e+00 |     1.000000 | ✅ PASS
ATR                  |   0.0000e+00 |     1.000000 | ✅ PASS
ATRP                 |   0.0000e+00 |     1.000000 | ✅ PASS
TRP                  |   0.0000e+00 |     1.000000 | ✅ PASS
RSI                  |   0.0000e+00 |     1.000000 | ✅ PASS
Mom_21               |   0.0000e+00 |     1.000000 | ✅ PASS
Consistency          |   0.0000e+00 |     1.000000 | ✅ PASS
DD_21                |   0.0000e+00 |     1.000000 | ✅ PASS
RollingStalePct      |   0.0000e+00 |     1.000000 | ✅ PASS
RollMedDollarVol     |   0.0000e+00 |     1.000000 | ✅ PASS
RollingSameVolCount  |   0.0000e+
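The audit above has two parts: a boundary check that no ticker's values leak into the next ticker in the stacked index, and a shadow recomputation compared by max delta and correlation. The sketch below shows both ideas for a one-day simple return on toy prices. It is an assumption-laden illustration: the notebook's `Ret_1d` definition and the internals of `audit_feature_engineering_integrity` are not shown here.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2026-01-05", periods=4, freq="B")
idx = pd.MultiIndex.from_product([["AAA", "BBB"], dates], names=["Ticker", "Date"])
close = pd.Series([10.0, 11.0, 12.1, 12.1, 50.0, 49.0, 49.0, 51.45], index=idx)

# "Engine" version: vectorised, with the shift confined to each ticker.
engine_ret = close / close.groupby(level="Ticker").shift(1) - 1.0

# Shadow version: an independent per-ticker loop over raw NumPy arrays.
parts = []
for _, prices in close.groupby(level="Ticker"):
    vals = prices.to_numpy()
    ret = np.full(len(vals), np.nan)
    ret[1:] = vals[1:] / vals[:-1] - 1.0
    parts.append(pd.Series(ret, index=prices.index))
shadow_ret = pd.concat(parts)

# Step 1, boundary integrity: the first row of every ticker must be NaN.
first_rows = engine_ret.groupby(level="Ticker").head(1)
isolation_ok = bool(first_rows.isna().all())

# Step 2, shadow comparison: NaN pairs are skipped by both statistics.
max_delta = float((engine_ret - shadow_ret).abs().max())
correlation = float(engine_ret.corr(shadow_ret))

# Counter-example: a shift without groupby leaks AAA's last price into BBB.
leaky_ret = close / close.shift(1) - 1.0
leaked_value = leaky_ret.loc[("BBB", dates[0])]

print(isolation_ok, max_delta, correlation, leaked_value)
```

The leaky version produces a return of over 300% on BBB's first day, which is the kind of artefact the boundary check exists to catch.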

In [32]:
f_name_excel = OUTPUT_DIR / "Audit_Verification_Report.xlsx"

export_audit_to_excel(audit_pack=my_analyzer.last_run, filename=f_name_excel)

📂 [EXCEL AUDIT] Building full transparency report: C:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR\output\Audit_Verification_Report.xlsx
✨ Audit Report Complete: C:\Users\ping\Files_win10\python\py311\stocks\notebooks_RLVR\output\Audit_Verification_Report.xlsx


WindowsPath('C:/Users/ping/Files_win10/python/py311/stocks/notebooks_RLVR/output/Audit_Verification_Report.xlsx')

### Export Ticker's OHLCV and its Features

In [33]:
f_name_csv = OUTPUT_DIR / "all_tickers_data_stacked.csv"

# A single call replaces the previous three cells
file_path = export_last_run_tickers_data_to_csv(
    analyzer=my_analyzer,
    df_ohlcv=df_ohlcv,
    features_df=features_df,
    filename=f_name_csv,
)

Creating combined dictionary for 11 ticker(s)
Date range: 2026-02-03 00:00:00 to 2026-02-26 00:00:00
Data retrieved for 11 ticker(s) from 2026-02-03 00:00:00 to 2026-02-26 00:00:00
Total rows: 187
Date range in data: 2026-02-03 00:00:00 to 2026-02-26 00:00:00
  IAG: 17 rows
  AU: 17 rows
  BUD: 17 rows
  CAT: 17 rows
  EWY: 17 rows
  NEM: 17 rows
  WBD: 17 rows
  WWD: 17 rows
  JNJ: 17 rows
  GDX: 17 rows
  SPY: 17 rows
Features data retrieved for 11 ticker(s) from 2026-02-03 00:00:00 to 2026-02-26 00:00:00
Total rows: 187
Date range in data: 2026-02-03 00:00:00 to 2026-02-26 00:00:00
Available features: ATR, ATRP, TRP, RSI, Mom_21, Consistency, IR_63, Beta_63, DD_21, Ret_1d, RollingStalePct, RollMedDollarVol, RollingSameVolCount
  IAG: 17 rows
  AU: 17 rows
  BUD: 17 rows
  CAT: 17 rows
  EWY: 17 rows
  NEM: 17 rows
  WBD: 17 rows
  WWD: 17 rows
  JNJ: 17 rows
  GDX: 17 rows
  SPY: 17 rows

Processing IAG...
  ✓ Successfully combined data
  OHLCV shape: (17, 5)
  Features shape: (17
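The export above pulls OHLCV rows and feature rows for the selected tickers, combines them, and writes one stacked CSV. A rough sketch of that flow on toy data follows; the frames, ticker names, and output file name are made up, and this is not the implementation of `export_last_run_tickers_data_to_csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd

dates = pd.to_datetime(["2026-02-25", "2026-02-26"])
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB", "CCC"], dates], names=["Ticker", "Date"]
)
ohlcv = pd.DataFrame(
    {"Adj Close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "Volume": [10, 20, 30, 40, 50, 60]},
    index=idx,
)
features = pd.DataFrame(
    {"ATRP": [0.01, 0.02, 0.03, 0.04, 0.05, 0.06], "RSI": [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]},
    index=idx,
)

# Keep only the selected tickers, then attach features on (Ticker, Date).
selected = ["CCC", "AAA"]
mask = ohlcv.index.get_level_values("Ticker").isin(selected)
combined = ohlcv[mask].join(features, how="left")

# Write the stacked table and read it back to confirm the layout survives.
out_path = Path(tempfile.mkdtemp()) / "stacked_demo.csv"
combined.to_csv(out_path)
round_trip = pd.read_csv(out_path, index_col=[0, 1])

print(combined)
```

A left join keeps every OHLCV row even when a feature row is missing, so gaps show up as empty cells in the CSV instead of silently dropping dates.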

### Audit features_df

In [34]:
# # Takes 4 minutes to run, checks all tickers from my_analyzer
# audit_feature_engineering_integrity(my_analyzer, df_indices=df_indices, mode="system")