# Here is the refactored solution. I have separated the concerns into three distinct layers:
1.  **The Data Contract:** Explicit `dataclasses` defining exactly what goes in and what comes out.
2.  **The Engine:** A purely mathematical class (`AlphaEngine`) containing the logic, with no widget or plotting dependencies.
3.  **The UI:** A cleaned-up dashboard function that simply sends inputs to the Engine and visualizes the output.

### Fix log, in reverse chronological order (most recent entry at the top)

```
To see the full dataframe of all tickers (both those that passed and those that failed) for a specific date, we need to capture a snapshot of the universe inside the `_get_eligible_universe` method.

I have updated the **`AlphaEngine`** class below.
```
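As a rough illustration of what such a snapshot looks like, the sketch below rebuilds the audit columns (`Calculated_Cutoff`, `Passed_Vol_Check`, `Passed_Final`) on a tiny made-up universe. The tickers `AAA`, `BBB`, `CCC` and their values are hypothetical; only the column names and threshold keys come from the engine code.

```python
import pandas as pd

# Hypothetical one-day feature table for three made-up tickers.
day_features = pd.DataFrame(
    {
        'RollMedDollarVol': [5_000_000, 80_000, 2_000_000],
        'RollingStalePct': [0.00, 0.01, 0.20],
        'RollingSameVolCount': [0, 2, 1],
    },
    index=pd.Index(['AAA', 'BBB', 'CCC'], name='Ticker'),
)

thresholds = {
    'min_median_dollar_volume': 100_000,
    'max_stale_pct': 0.05,
    'max_same_vol_count': 10,
}
vol_cutoff = thresholds['min_median_dollar_volume']

# Same three checks the engine combines into its final mask.
passed_vol = day_features['RollMedDollarVol'] >= vol_cutoff
mask = (
    passed_vol
    & (day_features['RollingStalePct'] <= thresholds['max_stale_pct'])
    & (day_features['RollingSameVolCount'] <= thresholds['max_same_vol_count'])
)

# Keep every ticker, passed or failed, and record why.
snapshot = day_features.copy()
snapshot['Calculated_Cutoff'] = vol_cutoff
snapshot['Passed_Vol_Check'] = passed_vol
snapshot['Passed_Final'] = mask
snapshot = snapshot.sort_values('RollMedDollarVol', ascending=False)

failed = snapshot[~snapshot['Passed_Final']]
print(snapshot)
```

Here `BBB` fails the liquidity check, while `CCC` passes it but fails the stale-price check, so both appear in `failed` with different reasons visible in the audit columns.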

```
To verify that the relative percentile logic is working, we can modify the `AlphaEngine` to report exactly **how the cutoff was calculated** for the specific start date.

We want to see evidence that:
1.  In earlier years (e.g., 2005), the volume cutoff is lower (e.g., $200k).
2.  In later years (e.g., 2024), the volume cutoff is higher (e.g., $5M).

Here is the updated `AlphaEngine` and `UI` code. I have added an **"Audit Log"** feature. When you run the tool, it now prints the exact Dollar Volume Threshold used for that specific day.
```

```
The best way to solve this is to switch from a **Fixed Dollar Threshold** (e.g., "$1 Million") to a **Relative Percentile Threshold** (e.g., "Top 50% of the market").

In 2004, a stock trading $200k might have been in the top 50% of liquid stocks. In 2024, that same $200k is illiquid garbage. Using a percentile automatically adjusts for inflation and market growth over time.

Here is how to modify your code to support this.
```
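A minimal sketch of the percentile idea, assuming the cutoff is the larger of a fixed floor and a cross-sectional quantile. The helper name `liquidity_cutoff` and the two toy universes are hypothetical and chosen only to reproduce the $200k and $5M figures used in this log.

```python
import pandas as pd

def liquidity_cutoff(roll_med_dollar_vol: pd.Series, percentile: float, hard_floor: float = 0.0) -> float:
    """Return the larger of the hard floor and the cross-sectional quantile."""
    dynamic_val = roll_med_dollar_vol.quantile(percentile)
    return max(hard_floor, dynamic_val)

# Hypothetical median dollar volumes for the same four tickers on two dates.
universe_2004 = pd.Series({'AAA': 50_000, 'BBB': 150_000, 'CCC': 250_000, 'DDD': 900_000})
universe_2024 = pd.Series({'AAA': 400_000, 'BBB': 3_000_000, 'CCC': 7_000_000, 'DDD': 60_000_000})

cutoff_2004 = liquidity_cutoff(universe_2004, percentile=0.50, hard_floor=100_000)
cutoff_2024 = liquidity_cutoff(universe_2024, percentile=0.50, hard_floor=100_000)

# The same relative rule keeps the same share of the market on both dates.
passed_2004 = universe_2004[universe_2004 >= cutoff_2004].index.tolist()
passed_2024 = universe_2024[universe_2024 >= cutoff_2024].index.tolist()

print(f"2004 cutoff: ${cutoff_2004:,.0f} -> {passed_2004}")
print(f"2024 cutoff: ${cutoff_2024:,.0f} -> {passed_2024}")
```

A fixed $1M threshold would have kept no ticker in the 2004 toy universe, whereas the 0.50 quantile keeps the top half on both dates.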

```
To fix this, we need to pass the **actual** calculated start date (the trading day the engine "snapped" to) back from the `AlphaEngine` to the UI. Then, the UI can compare the *Requested Date* vs. the *Actual Date* and display the warning message if they differ.

Here is the plan:
1.  **Update `EngineOutput`**: Add a `start_date` field to the dataclass.
2.  **Update `AlphaEngine.run`**: Populate this new field with `safe_start_date`.
3.  **Update `plot_walk_forward_analyzer`**: Add logic to compare the user's input date with the engine's returned date and print the "Info" message if they are different.

Here is the updated code (Sections C, D, and E have changed):
```
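The snapping behaviour can be sketched in isolation, assuming the calendar is a sorted `DatetimeIndex` and the engine uses `searchsorted` with its default left side. The helper `snap_to_trading_day` and the five-day calendar are hypothetical.

```python
import pandas as pd

# Hypothetical trading calendar: one week with a weekend gap before 2024-01-08.
trading_calendar = pd.DatetimeIndex(
    ['2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05', '2024-01-08']
)

def snap_to_trading_day(calendar: pd.DatetimeIndex, requested: pd.Timestamp):
    """Return the first trading day on or after `requested`, or None if past the calendar."""
    idx = calendar.searchsorted(requested)  # first position with calendar[idx] >= requested
    if idx >= len(calendar):
        return None
    return calendar[idx]

requested = pd.Timestamp('2024-01-06')  # a Saturday
actual = snap_to_trading_day(trading_calendar, requested)

# The UI-side comparison: warn only when the two dates differ.
if actual is not None and actual.date() != requested.date():
    print(f"Info: Start date {requested.date()} is not a trading day. "
          f"Snapping forward to {actual.date()}.")
```

Because `searchsorted` returns the calendar length for dates beyond the last entry, the caller still needs a bounds check before indexing.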

```
I have updated the `AlphaEngine.run` method, specifically inside the `if inputs.mode == 'Manual List':` block. It now iterates through every manual ticker and performs two checks:
1.  **Existence**: Is the ticker in the database?
2.  **Availability**: Does the ticker have a valid price on the specific `Start Date`?

If any ticker fails, it compiles a specific error message explaining why (e.g., "No price data on start date") and aborts the calculation immediately.  
```
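The two checks can be sketched as a standalone function, assuming a wide close-price frame with dates as the index and tickers as columns. The function name `validate_manual_tickers` and the toy prices are hypothetical.

```python
import numpy as np
import pandas as pd

def validate_manual_tickers(df_close: pd.DataFrame, tickers, start_date):
    """Split tickers into valid ones and human-readable error messages."""
    errors, valid = [], []
    for t in tickers:
        if t not in df_close.columns:
            errors.append(f"{t}: Ticker not found.")               # existence check
        elif pd.isna(df_close.at[start_date, t]):
            errors.append(f"{t}: No price data on start date.")    # availability check
        else:
            valid.append(t)
    return valid, errors

# Hypothetical prices: BBB has no quote on the first day, ZZZ is not in the data.
dates = pd.DatetimeIndex(['2024-01-02', '2024-01-03'])
df_close = pd.DataFrame({'AAA': [10.0, 10.5], 'BBB': [np.nan, 20.0]}, index=dates)

valid, errors = validate_manual_tickers(df_close, ['AAA', 'BBB', 'ZZZ'], dates[0])
if errors:
    print("\n".join(errors))  # the engine aborts here instead of trading a partial list
```

Aborting on any error, rather than silently dropping the bad tickers, keeps the resulting portfolio weights consistent with what the user typed.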

```
The `snapshot_df` contains **every single feature** calculated by your `generate_features` function for that specific day, plus the new audit columns we added.

Here is exactly what is inside that DataFrame:

### 1. The Core Features (from `generate_features`)
*   **`TR`**: True Range
*   **`ATR`**: Average True Range
*   **`ATRP`**: Average True Range Percent (Volatility)
*   **`RollingStalePct`**: How often the price didn't move or volume was 0.
*   **`RollMedDollarVol`**: Median Daily Dollar Volume (Liquidity).
*   **`RollingSameVolCount`**: Data quality check for repeated volume numbers.

### 2. The Audit Columns (Added during filtering)
*   **`Calculated_Cutoff`**: The specific dollar amount required to pass on that day.
*   **`Passed_Vol_Check`**: `True` if the ticker met the liquidity requirement.
*   **`Passed_Final`**: `True` if it passed **all** checks (Liquidity + Stale + Quality).

=========================================

Here are the formulas translated directly into the Python `pandas` code used in your `generate_features` function.

I have simplified the code slightly to assume a single ticker context (removing the `groupby` wrapper) so you can see the raw math clearly.

### 1. True Range (TR)
Calculates the maximum of the three price differences.

prev_close = df_ohlcv['Adj Close'].shift(1)

# The three components
diff1 = df_ohlcv['Adj High'] - df_ohlcv['Adj Low']
diff2 = (df_ohlcv['Adj High'] - prev_close).abs()
diff3 = (df_ohlcv['Adj Low'] - prev_close).abs()

# Taking the max of the three
tr = pd.concat([diff1, diff2, diff3], axis=1).max(axis=1)

### 2. Average True Range (ATR)
Uses an Exponential Weighted Mean (EWM) with a specific alpha smoothing factor.

# N = atr_period (e.g., 14)
# alpha = 1 / N
atr = tr.ewm(alpha=1/14, adjust=False).mean()

### 3. ATR Percent (ATRP)
Simple division to normalize volatility.

atrp = atr / df_ohlcv['Adj Close']

### 4. Rolling Stale Percentage
Checks if volume is 0 OR if High equals Low (price didn't move), then averages that 1 or 0 signal over the window.

# 1. Define the Stale Signal (1 for stale, 0 for active)
is_stale = np.where(
    (df_ohlcv['Volume'] == 0) | (df_ohlcv['Adj High'] == df_ohlcv['Adj Low']), 
    1,  
    0
)

# 2. Calculate average over window (W=252)
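# np.where returns a bare array, so reattach the original index before rolling
rolling_stale_pct = pd.Series(is_stale, index=df_ohlcv.index).rolling(window=252).mean()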

### 5. Rolling Median Dollar Volume
Calculates raw dollar volume, then finds the median over the window.

# 1. Calculate Daily Dollar Volume
dollar_volume = df_ohlcv['Adj Close'] * df_ohlcv['Volume']

# 2. Get Median over window (W=252)
roll_med_dollar_vol = dollar_volume.rolling(window=252).median()

### 6. Rolling Same Volume Count
Checks if today's volume is exactly the same as yesterday's (a sign of bad data), then sums those occurrences.

# 1. Check if Volume(t) - Volume(t-1) equals 0
# .diff() calculates current row minus previous row
has_same_volume = (df_ohlcv['Volume'].diff() == 0).astype(int)

# 2. Sum the errors over window (W=252)
rolling_same_vol_count = has_same_volume.rolling(window=252).sum()

```
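To make the True Range and ATR formulas above concrete, here is a small worked example on three made-up bars. It also checks, under the assumption that `ewm(alpha=..., adjust=False)` follows the recursion `y[t] = (1 - alpha) * y[t-1] + alpha * x[t]`, that a hand-rolled loop gives the same numbers. The prices are hypothetical.

```python
import numpy as np
import pandas as pd

# Three hypothetical daily bars for a single ticker.
df = pd.DataFrame({
    'Adj High':  [10.0, 11.0, 12.5],
    'Adj Low':   [9.0, 10.0, 11.5],
    'Adj Close': [9.5, 10.5, 12.0],
})

prev_close = df['Adj Close'].shift(1)
tr = pd.concat(
    [
        df['Adj High'] - df['Adj Low'],
        (df['Adj High'] - prev_close).abs(),
        (df['Adj Low'] - prev_close).abs(),
    ],
    axis=1,
).max(axis=1)  # default skipna=True, so the first row falls back to High - Low

# ATR via pandas, with N = 14 and alpha = 1 / N.
alpha = 1 / 14
atr = tr.ewm(alpha=alpha, adjust=False).mean()

# The same smoothing written out as an explicit recursion.
manual = [tr.iloc[0]]
for x in tr.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

atrp = atr / df['Adj Close']
print(pd.DataFrame({'TR': tr, 'ATR': atr, 'ATR_manual': manual, 'ATRP': atrp}))
```

Note that the multi-ticker `generate_features` passes `skipna=False` to `max`, so there the first True Range per ticker is `NaN` rather than `High - Low`.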

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import ipywidgets as widgets
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
from collections import Counter
import pprint
from datetime import datetime, date

# ==============================================================================
# SECTION A: CORE HELPER FUNCTIONS & FEATURE GENERATION
# (Unchanged from previous version)
# ==============================================================================
# generate_features, calculate_gain, calculate_sharpe, calculate_sharpe_atr and
# calculate_buy_and_hold_performance are reproduced below in full.

def generate_features(df_ohlcv: pd.DataFrame, atr_period: int = 14, quality_window: int = 252, quality_min_periods: int = 126) -> pd.DataFrame:
    """Compute TR, ATR, ATRP and rolling data-quality features per ticker from a (Ticker, Date)-indexed OHLCV frame."""
    if not df_ohlcv.index.is_monotonic_increasing: df_ohlcv = df_ohlcv.sort_index()
    grouped = df_ohlcv.groupby(level='Ticker')
    prev_close = grouped['Adj Close'].shift(1)
    tr = pd.concat([df_ohlcv['Adj High'] - df_ohlcv['Adj Low'], abs(df_ohlcv['Adj High'] - prev_close), abs(df_ohlcv['Adj Low'] - prev_close)], axis=1).max(axis=1, skipna=False)
    atr = tr.groupby(level='Ticker').transform(lambda x: x.ewm(alpha=1/atr_period, adjust=False).mean())
    atrp = (atr / df_ohlcv['Adj Close']).replace([np.inf, -np.inf], np.nan)
    indicator_df = pd.DataFrame({'TR': tr, 'ATR': atr, 'ATRP': atrp})
    quality_temp_df = pd.DataFrame({'IsStale': np.where((df_ohlcv['Volume'] == 0) | (df_ohlcv['Adj High'] == df_ohlcv['Adj Low']), 1, 0), 'DollarVolume': df_ohlcv['Adj Close'] * df_ohlcv['Volume'], 'HasSameVolume': (grouped['Volume'].diff() == 0).astype(int)}, index=df_ohlcv.index)
    rolling_result = quality_temp_df.groupby(level='Ticker').rolling(window=quality_window, min_periods=quality_min_periods).agg({'IsStale': 'mean', 'DollarVolume': 'median', 'HasSameVolume': 'sum'}).rename(columns={'IsStale': 'RollingStalePct', 'DollarVolume': 'RollMedDollarVol', 'HasSameVolume': 'RollingSameVolCount'}).reset_index(level=0, drop=True)
    return pd.concat([indicator_df, rolling_result], axis=1)

def calculate_gain(price_series): 
    if price_series.dropna().shape[0] < 2: return np.nan
    return (price_series.ffill().iloc[-1] / price_series.bfill().iloc[0]) - 1

def calculate_sharpe(return_series):
    if return_series.dropna().shape[0] < 2: return np.nan
    std = return_series.std()
    return (return_series.mean() / std * np.sqrt(252)) if std > 0 else 0.0

def calculate_sharpe_atr(return_series, atrp_series):
    if return_series.dropna().shape[0] < 2 or atrp_series.dropna().empty: return np.nan
    mean_atrp = atrp_series.mean()
    return (return_series.mean() / mean_atrp) if mean_atrp > 0 else 0.0

def calculate_buy_and_hold_performance(df_close, features_df, tickers, start_date, end_date):
    if not tickers: return pd.Series(dtype=float), pd.Series(dtype=float), pd.Series(dtype=float)
    ticker_counts = Counter(tickers)
    initial_weights = pd.Series({t: c / len(tickers) for t, c in ticker_counts.items()})
    prices_raw = df_close[initial_weights.index.tolist()].loc[start_date:end_date]
    if prices_raw.dropna(how='all').empty: return pd.Series(dtype=float), pd.Series(dtype=float), pd.Series(dtype=float)
    prices_norm = prices_raw.div(prices_raw.bfill().iloc[0])
    weighted_growth = prices_norm.mul(initial_weights, axis='columns')
    value_series = weighted_growth.sum(axis=1)
    return_series = value_series.pct_change()
    full_idx = pd.MultiIndex.from_product([initial_weights.index.tolist(), return_series.index], names=['Ticker', 'Date'])
    feat_subset = features_df.reindex(full_idx)['ATRP'].unstack(level='Ticker')
    # Portfolio ATRP = sum over tickers of (current weight * ticker ATRP)
    weights_over_time = weighted_growth.div(value_series, axis='index')
    aligned_weights, aligned_atrp = weights_over_time.align(feat_subset, join='inner', axis=1)
    atrp_series = (aligned_weights * aligned_atrp).sum(axis=1)
    return value_series, return_series, atrp_series

# ==============================================================================
# SECTION B: METRIC REGISTRY
# ==============================================================================

def metric_price(d): return calculate_gain(d['calc_close'])
def metric_sharpe(d): 
    r = d['daily_returns']
    return (r.mean() / r.std() * np.sqrt(252)).replace([np.inf, -np.inf], np.nan).fillna(0)
def metric_sharpe_atr(d):
    return (d['daily_returns'].mean() / d['atrp']).replace([np.inf, -np.inf], np.nan).fillna(0)

METRIC_REGISTRY = {
    'Price': metric_price,
    'Sharpe': metric_sharpe,
    'Sharpe (ATR)': metric_sharpe_atr,
}

# ==============================================================================
# SECTION C: DATA CONTRACTS (The API)
# Updated EngineOutput to include actual start_date
# ==============================================================================

@dataclass
class EngineInput:
    mode: str
    start_date: pd.Timestamp
    calc_period: int
    fwd_period: int
    metric: str
    benchmark_ticker: str
    rank_start: int = 1
    rank_end: int = 10
    quality_thresholds: Dict[str, float] = field(default_factory=lambda: {'min_median_dollar_volume': 1_000_000, 'max_stale_pct': 0.05, 'max_same_vol_count': 10})
    manual_tickers: List[str] = field(default_factory=list)
    debug: bool = False

@dataclass
class EngineOutput:
    portfolio_series: pd.Series
    benchmark_series: pd.Series
    normalized_plot_data: pd.DataFrame
    tickers: List[str]
    initial_weights: pd.Series
    perf_metrics: Dict[str, float]
    results_df: pd.DataFrame
    start_date: pd.Timestamp # <--- NEW FIELD: The actual trading start date used
    calc_end_date: pd.Timestamp
    viz_end_date: pd.Timestamp
    error_msg: Optional[str] = None
    debug_data: Optional[Dict[str, Any]] = None

# ==============================================================================
# SECTION D: THE ALPHA ENGINE (The "Brain")
# This version saves a sorted dataframe called `universe_snapshot` into the debug data. It adds columns showing exactly which tickers passed or failed the specific thresholds.
# ==============================================================================

class AlphaEngine:
    def __init__(self, df_ohlcv: pd.DataFrame, master_ticker: str = 'SPY'):
        print("--- ⚙️ Initializing AlphaEngine ---")
        self.features_df = generate_features(df_ohlcv)
        print("Optimizing data structures...")
        self.df_close = df_ohlcv['Adj Close'].unstack(level=0)
        
        if master_ticker not in self.df_close.columns:
            master_ticker = self.df_close.columns[0]
            print(f"Warning: Master ticker not found. Using {master_ticker}")
            
        self.trading_calendar = self.df_close[master_ticker].dropna().index.unique().sort_values()
        print("✅ AlphaEngine Ready.")

    def run(self, inputs: EngineInput) -> EngineOutput:
        # --- A. Validate Dates ---
        try:
            start_idx = self.trading_calendar.searchsorted(inputs.start_date)
            if start_idx < 0: start_idx = 0
        except Exception:
            return self._error_result("Invalid Start Date")

        desired_end_idx = start_idx + inputs.calc_period + inputs.fwd_period
        if desired_end_idx >= len(self.trading_calendar):
            return self._error_result("Date range exceeds history.")

        safe_start_date = self.trading_calendar[start_idx]
        safe_calc_end_date = self.trading_calendar[start_idx + inputs.calc_period]
        safe_viz_end_date = self.trading_calendar[start_idx + inputs.calc_period + inputs.fwd_period]

        # --- B. Select Tickers ---
        tickers_to_trade = []
        results_table = pd.DataFrame()
        debug_dict = {}
        audit_info = {} 

        if inputs.mode == 'Manual List':
            validation_errors = []
            valid_tickers = []
            for t in inputs.manual_tickers:
                if t not in self.df_close.columns:
                    validation_errors.append(f"❌ {t}: Ticker not found.")
                    continue
                if pd.isna(self.df_close.at[safe_start_date, t]):
                    validation_errors.append(f"⚠️ {t}: No price data on start date.")
                    continue
                valid_tickers.append(t)
            
            if validation_errors: return self._error_result("\n".join(validation_errors))
            if not valid_tickers: return self._error_result("No valid tickers.")
            tickers_to_trade = valid_tickers
            results_table = pd.DataFrame(index=valid_tickers)
            
        else: # Ranking Mode
            eligible_tickers = self._get_eligible_universe(safe_start_date, inputs.quality_thresholds, audit_info)
            debug_dict['audit_liquidity'] = audit_info 
            
            if not eligible_tickers: return self._error_result("No tickers passed quality filters.")
            
            calc_close = self.df_close.loc[safe_start_date:safe_calc_end_date, eligible_tickers]
            idx_product = pd.MultiIndex.from_product([eligible_tickers, calc_close.index], names=['Ticker', 'Date'])
            feat_slice = self.features_df.reindex(idx_product).dropna(how='all')
            atrp_mean = feat_slice.groupby(level='Ticker')['ATRP'].mean()
            
            ingredients = { 'calc_close': calc_close, 'daily_returns': calc_close.pct_change(), 'atrp': atrp_mean }
            if inputs.metric not in METRIC_REGISTRY: return self._error_result(f"Metric '{inputs.metric}' not found.")
            metric_vals = METRIC_REGISTRY[inputs.metric](ingredients)
            sorted_tickers = metric_vals.sort_values(ascending=False)
            
            start_r = max(0, inputs.rank_start - 1)
            end_r = inputs.rank_end
            tickers_to_trade = sorted_tickers.iloc[start_r:end_r].index.tolist()
            if not tickers_to_trade: return self._error_result("No tickers generated from ranking.")

            results_table = pd.DataFrame({
                'Rank': range(inputs.rank_start, inputs.rank_start + len(tickers_to_trade)),
                'Ticker': tickers_to_trade,
                'Metric Value': sorted_tickers.loc[tickers_to_trade].values
            }).set_index('Ticker')

        # --- C. Performance Calculations ---
        p_val, p_ret, p_atrp = calculate_buy_and_hold_performance(self.df_close, self.features_df, tickers_to_trade, safe_start_date, safe_viz_end_date)
        b_val, b_ret, b_atrp = calculate_buy_and_hold_performance(self.df_close, self.features_df, [inputs.benchmark_ticker], safe_start_date, safe_viz_end_date)

        # --- D. Final Metrics ---
        plot_data = self.df_close[list(set(tickers_to_trade))].loc[safe_start_date:safe_viz_end_date]
        if not plot_data.empty: plot_data = plot_data / plot_data.bfill().iloc[0]
        calc_end_ts = safe_calc_end_date
        metrics = {}
        get_gain = lambda s: (s.iloc[-1] / s.iloc[0]) - 1 if len(s) > 0 else 0

        metrics['full_p_gain'] = get_gain(p_val)
        metrics['calc_p_gain'] = get_gain(p_val.loc[:calc_end_ts])
        metrics['fwd_p_gain'] = get_gain(p_val.loc[calc_end_ts:])
        metrics['full_p_sharpe_atr'] = calculate_sharpe_atr(p_ret, p_atrp)
        metrics['calc_p_sharpe_atr'] = calculate_sharpe_atr(p_ret.loc[:calc_end_ts], p_atrp.loc[p_ret.loc[:calc_end_ts].index])
        metrics['fwd_p_sharpe_atr'] = calculate_sharpe_atr(p_ret.loc[calc_end_ts:].iloc[1:], p_atrp.loc[p_ret.loc[calc_end_ts:].iloc[1:].index])
        
        if not b_ret.empty:
            metrics['full_b_gain'] = get_gain(b_val)
            metrics['calc_b_gain'] = get_gain(b_val.loc[:calc_end_ts])
            metrics['fwd_b_gain'] = get_gain(b_val.loc[calc_end_ts:])
            metrics['full_b_sharpe_atr'] = calculate_sharpe_atr(b_ret, b_atrp)
            metrics['calc_b_sharpe_atr'] = calculate_sharpe_atr(b_ret.loc[:calc_end_ts], b_atrp.loc[b_ret.loc[:calc_end_ts].index])
            metrics['fwd_b_sharpe_atr'] = calculate_sharpe_atr(b_ret.loc[calc_end_ts:].iloc[1:], b_atrp.loc[b_ret.loc[calc_end_ts:].iloc[1:].index])

        if not plot_data.empty: results_table['Fwd Gain'] = (plot_data.iloc[-1] / plot_data.loc[calc_end_ts]) - 1
        ticker_counts = Counter(tickers_to_trade)
        weights = pd.Series({t: c/len(tickers_to_trade) for t, c in ticker_counts.items()})

        if inputs.debug:
            trace_df = plot_data.copy()
            trace_df.columns = [f'Norm_Price_{c}' for c in trace_df.columns]
            trace_df['Norm_Price_Portfolio'] = p_val
            if not b_val.empty: trace_df[f'Norm_Price_Benchmark_{inputs.benchmark_ticker}'] = b_val
            debug_dict['portfolio_trace'] = trace_df

        return EngineOutput(
            portfolio_series=p_val, benchmark_series=b_val, normalized_plot_data=plot_data,
            tickers=tickers_to_trade, initial_weights=weights, perf_metrics=metrics,
            results_df=results_table, start_date=safe_start_date,
            calc_end_date=safe_calc_end_date, viz_end_date=safe_viz_end_date, debug_data=debug_dict
        )

    # --- UPDATED: CAPTURE SNAPSHOT ---
    def _get_eligible_universe(self, date_ts, thresholds, audit_container=None):
        avail_dates = self.features_df.index.get_level_values('Date').unique().sort_values()
        valid_dates = avail_dates[avail_dates <= date_ts]
        if valid_dates.empty: return []
        day_features = self.features_df.xs(valid_dates[-1], level='Date')

        # 1. Determine Dynamic Cutoff
        vol_cutoff = thresholds.get('min_median_dollar_volume', 0)
        percentile_used = "N/A"
        dynamic_val = 0
        
        if 'min_liquidity_percentile' in thresholds:
            percentile_used = thresholds['min_liquidity_percentile']
            dynamic_val = day_features['RollMedDollarVol'].quantile(percentile_used)
            vol_cutoff = max(vol_cutoff, dynamic_val)

        # 2. Logic Mask
        mask = (
            (day_features['RollMedDollarVol'] >= vol_cutoff) &
            (day_features['RollingStalePct'] <= thresholds['max_stale_pct']) &
            (day_features['RollingSameVolCount'] <= thresholds['max_same_vol_count'])
        )

        # 3. Capture Detailed Audit Snapshot
        if audit_container is not None:
            audit_container['date'] = valid_dates[-1]
            audit_container['total_tickers_available'] = len(day_features)
            audit_container['percentile_setting'] = percentile_used
            audit_container['percentile_value_usd'] = dynamic_val
            audit_container['final_cutoff_usd'] = vol_cutoff
            audit_container['tickers_passed'] = mask.sum()
            
            # Save the DataFrame!
            snapshot = day_features.copy()
            snapshot['Calculated_Cutoff'] = vol_cutoff
            snapshot['Passed_Vol_Check'] = snapshot['RollMedDollarVol'] >= vol_cutoff
            snapshot['Passed_Final'] = mask
            # Sort by volume so user can see the cutoff point easily
            snapshot = snapshot.sort_values('RollMedDollarVol', ascending=False)
            audit_container['universe_snapshot'] = snapshot

        return day_features[mask].index.tolist()

    def _error_result(self, msg):
        return EngineOutput(pd.Series(dtype=float), pd.Series(dtype=float), pd.DataFrame(), [], pd.Series(dtype=float), {}, pd.DataFrame(), pd.Timestamp.min, pd.Timestamp.min, pd.Timestamp.min, msg)

# ==============================================================================
# SECTION E: THE UI (Visualization)
# Reads the audit data from `debug_data` and prints it.
# Detects when the engine snapped the requested start date to a later trading day.
# Maps widget values onto the EngineInput fields.
# ==============================================================================

def plot_walk_forward_analyzer(df_ohlcv, 
                               default_start_date='2020-01-01', 
                               default_calc_period=126, 
                               default_fwd_period=63,
                               default_metric='Sharpe (ATR)', 
                               default_rank_start=1, 
                               default_rank_end=10,
                               default_benchmark_ticker='SPY', 
                               master_calendar_ticker='SPY', 
                               quality_thresholds=None, 
                               debug=False):
    
    engine = AlphaEngine(df_ohlcv, master_ticker=master_calendar_ticker)
    results_container = [None]
    debug_container = [None]

    # --- UPDATED DEFAULT SETTINGS WITH PERCENTILE ---
    if quality_thresholds is None:
        quality_thresholds = {
            'min_median_dollar_volume': 100_000, # Hard floor
            'min_liquidity_percentile': 0.50,    # Top 50%
            'max_stale_pct': 0.05, 
            'max_same_vol_count': 10
        }

    # (Widget setup code remains the same...)
    mode_selector = widgets.RadioButtons(options=['Ranking', 'Manual List'], value='Ranking', description='Portfolio Mode:', layout={'width': 'max-content'})
    start_date_picker = widgets.DatePicker(description='Start Date:', value=pd.to_datetime(default_start_date))
    calc_period_input = widgets.IntText(value=default_calc_period, description='Calc Period:')
    fwd_period_input = widgets.IntText(value=default_fwd_period, description='Fwd Period:')
    metric_dropdown = widgets.Dropdown(options=list(METRIC_REGISTRY.keys()), value=default_metric, description='Metric:')
    rank_start_input = widgets.IntText(value=default_rank_start, description='Rank Start:')
    rank_end_input = widgets.IntText(value=default_rank_end, description='Rank End:')
    manual_tickers_input = widgets.Textarea(value='', placeholder='Enter tickers...', description='Manual Tickers:', layout={'width': '400px', 'height': '80px'})
    benchmark_input = widgets.Text(value=default_benchmark_ticker, description='Benchmark:', placeholder='Enter Ticker')
    update_button = widgets.Button(description="Update Chart", button_style='primary')
    ticker_list_output = widgets.Output()

    ranking_controls = widgets.HBox([metric_dropdown, rank_start_input, rank_end_input])
    manual_controls = widgets.HBox([manual_tickers_input])
    date_controls = widgets.HBox([start_date_picker, calc_period_input, fwd_period_input])
    ui = widgets.VBox([mode_selector, date_controls, ranking_controls, manual_controls, widgets.HBox([benchmark_input, update_button]), ticker_list_output], layout=widgets.Layout(margin='10px 0 20px 0'))
    
    def on_mode_change(c):
        ranking_controls.layout.display = 'flex' if c['new'] == 'Ranking' else 'none'
        manual_controls.layout.display = 'none' if c['new'] == 'Ranking' else 'flex'
    mode_selector.observe(on_mode_change, names='value')
    on_mode_change({'new': mode_selector.value})

    fig = go.FigureWidget()
    fig.update_layout(title='Walk-Forward Performance Analysis', height=600, width=1200, template="plotly_white", hovermode='x unified', autosize=True, margin=dict(l=20, r=20, t=40, b=20))
    for i in range(50): fig.add_trace(go.Scatter(visible=False, line=dict(width=2)))
    fig.add_trace(go.Scatter(name='Benchmark', visible=True, line=dict(color='black', width=3, dash='dash')))
    fig.add_trace(go.Scatter(name='Group Portfolio', visible=True, line=dict(color='green', width=3)))

    def update_plot(b):
        print('--- Plot Updated (v2) ---')
        ticker_list_output.clear_output()
        manual_list = [t.strip().upper() for t in manual_tickers_input.value.split(',') if t.strip()]
        start_date_raw = pd.to_datetime(start_date_picker.value)
        
        if start_date_raw < (engine.trading_calendar[0] - pd.Timedelta(days=7)):
            with ticker_list_output: print(f"⚠️ DATE WARNING: Start date {start_date_raw.date()} is too early."); return

        inputs = EngineInput(
            mode=mode_selector.value,
            start_date=start_date_raw,
            calc_period=calc_period_input.value,
            fwd_period=fwd_period_input.value,
            metric=metric_dropdown.value,
            benchmark_ticker=benchmark_input.value.strip().upper(),
            rank_start=rank_start_input.value,
            rank_end=rank_end_input.value,
            quality_thresholds=quality_thresholds,
            manual_tickers=manual_list,
            debug=debug
        )
        
        with ticker_list_output:
            res = engine.run(inputs)
            results_container[0] = res
            debug_container[0] = res.debug_data
            if res.error_msg: print(res.error_msg); return

            with fig.batch_update():
                cols = res.normalized_plot_data.columns.tolist()
                for i in range(50):
                    if i < len(cols): fig.data[i].update(x=res.normalized_plot_data.index.to_pydatetime(), y=res.normalized_plot_data[cols[i]], name=cols[i], visible=True)
                    else: fig.data[i].visible = False
                
                fig.data[50].update(x=res.benchmark_series.index.to_pydatetime(), y=res.benchmark_series.values, name=f"Benchmark ({inputs.benchmark_ticker})", visible=not res.benchmark_series.empty)
                fig.data[51].update(x=res.portfolio_series.index.to_pydatetime(), y=res.portfolio_series.values, visible=True)
                fig.layout.shapes = [dict(type="line", x0=res.calc_end_date.to_pydatetime(), y0=0, x1=res.calc_end_date.to_pydatetime(), y1=1, xref='x', yref='paper', line=dict(color="grey", width=2, dash="dash"))]

            req_date = inputs.start_date.date()
            act_date = res.start_date.date()
            if req_date != act_date: print(f"ℹ️ Info: Start date {req_date} is not a trading day. Snapping forward to {act_date}.")
            
            # --- LIQUIDITY AUDIT PRINT ---
            if inputs.mode == 'Ranking' and res.debug_data and 'audit_liquidity' in res.debug_data:
                audit = res.debug_data['audit_liquidity']
                if audit:
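                    # percentile_setting is "N/A" when no percentile filter is configured
                    pct = audit.get('percentile_setting')
                    # A quantile of 0.80 keeps the top 20%, so report 1 - percentile
                    pct_str = f"{(1 - pct) * 100:.0f}%" if isinstance(pct, (int, float)) else "N/A"
                    cut_val = audit.get('final_cutoff_usd', 0)
                    print("-" * 60)
                    print(f"🔍 LIQUIDITY CHECK ({act_date})")
                    print(f"   Universe Size: {audit.get('total_tickers_available')} tickers")
                    print(f"   Filtering: Top {pct_str} of Market")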
                    print(f"   Calculated Cutoff: ${cut_val:,.0f} / day")
                    print(f"   Tickers Remaining: {audit.get('tickers_passed')}")
                    print("-" * 60)
            
            print(f"Analysis Period: {act_date} to {res.viz_end_date.date()}.")
            
            if inputs.mode == 'Ranking': print("Ranked Tickers:"); pprint.pprint(res.tickers)
            else: print("Manual Portfolio Tickers:"); pprint.pprint(res.tickers)
            
            m = res.perf_metrics
            rows = [
                {'Metric': 'Group Portfolio Gain', 'Full': m.get('full_p_gain'), 'Calc': m.get('calc_p_gain'), 'Fwd': m.get('fwd_p_gain')},
                {'Metric': f'Benchmark ({inputs.benchmark_ticker}) Gain', 'Full': m.get('full_b_gain'), 'Calc': m.get('calc_b_gain'), 'Fwd': m.get('fwd_b_gain')},
                {'Metric': '== Gain Delta', 'Full': m.get('full_p_gain',0)-m.get('full_b_gain',0), 'Calc': m.get('calc_p_gain',0)-m.get('calc_b_gain',0), 'Fwd': m.get('fwd_p_gain',0)-m.get('fwd_b_gain',0)},
                {'Metric': 'Group Sharpe (ATR)', 'Full': m.get('full_p_sharpe_atr'), 'Calc': m.get('calc_p_sharpe_atr'), 'Fwd': m.get('fwd_p_sharpe_atr')},
                {'Metric': 'Benchmark Sharpe (ATR)', 'Full': m.get('full_b_sharpe_atr'), 'Calc': m.get('calc_b_sharpe_atr'), 'Fwd': m.get('fwd_b_sharpe_atr')},
                {'Metric': '== Sharpe Delta', 'Full': m.get('full_p_sharpe_atr',0)-m.get('full_b_sharpe_atr',0), 'Calc': m.get('calc_p_sharpe_atr',0)-m.get('calc_b_sharpe_atr',0), 'Fwd': m.get('fwd_p_sharpe_atr',0)-m.get('fwd_b_sharpe_atr',0)}
            ]
            display(pd.DataFrame(rows).set_index('Metric').style.format("{:+.2%}", na_rep="N/A"))

    update_button.on_click(update_plot)
    update_plot(None)
    display(ui, fig)
    return results_container, debug_container

# ==============================================================================
# SECTION F: UTILITIES
# ==============================================================================

def print_nested(d, indent=0, width=4):
    """Pretty-print any nested dict/list/tuple combination."""
    spacing = ' ' * indent
    if isinstance(d, dict):
        for k, v in d.items():
            print(f'{spacing}{k}:')
            print_nested(v, indent + width, width)
    elif isinstance(d, (list, tuple)):
        for item in d:
            print_nested(item, indent, width)
    else:
        print(f'{spacing}{d}')



In [2]:
import pandas as pd

data_path = r'c:\Users\ping\Files_win10\python\py311\stocks\data\df_OHLCV_stocks_etfs.parquet'
df_ohlcv = pd.read_parquet(data_path, engine='pyarrow')
df_ohlcv.info()  # info() prints its report itself and returns None, so it is not wrapped in print()
df_ohlcv

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 9661318 entries, ('A', Timestamp('1999-11-18 00:00:00')) to ('ZWS', Timestamp('2025-12-02 00:00:00'))
Data columns (total 5 columns):
 #   Column     Dtype  
---  ------     -----  
 0   Adj Open   float64
 1   Adj High   float64
 2   Adj Low    float64
 3   Adj Close  float64
 4   Volume     int64  
dtypes: float64(4), int64(1)
memory usage: 406.1+ MB


Ticker,Date,Adj Open,Adj High,Adj Low,Adj Close,Volume
A,1999-11-18,27.2452,29.9398,23.9518,26.3470,74716411
A,1999-11-19,25.7108,25.7482,23.8396,24.1764,18198352
A,1999-11-22,24.7378,26.3470,23.9893,26.3470,7857766
A,1999-11-23,25.4488,26.1225,23.9518,23.9518,7138321
A,1999-11-24,24.0267,25.1120,23.9518,24.5881,5785609
...,...,...,...,...,...,...
ZWS,2025-11-25,47.2900,48.4800,47.1500,48.0200,592800
ZWS,2025-11-26,47.5400,48.7000,47.3000,48.1300,1154100
ZWS,2025-11-28,48.4600,48.4800,47.7000,47.7000,481400
ZWS,2025-12-01,47.1700,48.1800,47.1500,47.7400,608100


In [3]:
results_container, debug_container = plot_walk_forward_analyzer(
    df_ohlcv=df_ohlcv,
    default_start_date='2025-08-13',
    default_calc_period=10,
    default_fwd_period=5,
    default_metric='Sharpe (ATR)',
    default_rank_start=15,
    default_rank_end=16,
    default_benchmark_ticker='VOO',
    master_calendar_ticker='VOO',
    quality_thresholds={
        'min_median_dollar_volume': 100_000,  # Low hard floor that screens out bad data and untradeable names
        # Quantile of median dollar volume used as the dynamic cutoff.
        # Example: 0.80 keeps only the top 20% (values above the 0.8 quantile).
        'min_liquidity_percentile': 0.50,     # Dynamic: keep only the top 50% of tickers by median dollar volume
        'max_stale_pct': 0.05,
        'max_same_vol_count': 10
    },
    debug=True  # Also populate debug_container with the audit data
)

--- ⚙️ Initializing AlphaEngine ---
Optimizing data structures...
✅ AlphaEngine Ready.


VBox(children=(RadioButtons(description='Portfolio Mode:', layout=Layout(width='max-content'), options=('Ranki…

[FigureWidget output: Plotly line chart; first trace 'JPST', daily x-values starting 2025-08-13. The rest of the repr is truncated.]

In [12]:
features_df = generate_features(df_ohlcv=df_ohlcv)
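
The snapshot shown later in this notebook contains a `RollMedDollarVol` column. `generate_features` is not reproduced in this part of the notebook, so the following is only a sketch of how such a column could be derived from `df_ohlcv`, assuming dollar volume is `Adj Close * Volume` and assuming a 20-day window. The function name `rolling_median_dollar_volume` and the window length are hypothetical.

```python
import pandas as pd


def rolling_median_dollar_volume(df_ohlcv: pd.DataFrame, window: int = 20) -> pd.Series:
    """Per-ticker rolling median of (Adj Close * Volume); window is an assumption."""
    dollar_volume = df_ohlcv['Adj Close'] * df_ohlcv['Volume']
    return (
        dollar_volume
        .groupby(level='Ticker')
        # transform keeps the original (Ticker, Date) index intact
        .transform(lambda s: s.rolling(window, min_periods=window).median())
        .rename('RollMedDollarVol')
    )


# Toy data with the same (Ticker, Date) index layout as df_ohlcv
index = pd.MultiIndex.from_product(
    [['AAA', 'BBB'], pd.to_datetime(['2025-01-02', '2025-01-03', '2025-01-06'])],
    names=['Ticker', 'Date'],
)
toy = pd.DataFrame(
    {'Adj Close': [1.0, 2.0, 3.0, 10.0, 10.0, 10.0],
     'Volume': [10, 10, 10, 5, 1, 3]},
    index=index,
)
result = rolling_median_dollar_volume(toy, window=2)
print(result)
```

Grouping by the `Ticker` level before rolling prevents one ticker's window from spilling into the next ticker's rows.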

In [None]:
# my_tickers = ['SPY', 'AAPL', 'IWM', 'QQQ', 'META', 'EEM', 'BABA']  # alternative watchlist
my_tickers = ['NTES', 'LII']  # tickers inspected in the liquidity cutoff audit


In [None]:
# Build the ticker lookup once instead of on every loop iteration
available_tickers = features_df.index.get_level_values('Ticker').unique()
for ticker in my_tickers:
    if ticker in available_tickers:
        ticker_features = features_df.loc[ticker]
        ticker_features.to_csv(f'./export_csv/features_{ticker}.csv')
        print(f"✅ {ticker} features exported to: ./export_csv/features_{ticker}.csv")
    else:
        print(f"⚠️ {ticker} not found in features_df")

✅ NTES exported to: ./export_csv/features_NTES.csv
✅ LII exported to: ./export_csv/features_LII.csv


In [None]:
# Build the ticker lookup once instead of on every loop iteration
available_tickers = df_ohlcv.index.get_level_values('Ticker').unique()
for ticker in my_tickers:
    if ticker in available_tickers:
        ticker_ohlcv = df_ohlcv.loc[ticker]
        ticker_ohlcv.to_csv(f'./export_csv/ohlcv_{ticker}.csv')
        print(f"✅ {ticker} OHLCV exported to: ./export_csv/ohlcv_{ticker}.csv")
    else:
        print(f"⚠️ {ticker} not found in df_ohlcv")

✅ NTES exported to: ./export_csv/ohlcv_NTES.csv
✅ LII exported to: ./export_csv/ohlcv_LII.csv


In [4]:
# Requires plot_walk_forward_analyzer to have been run with debug=True so that debug_container is populated
snapshot_df = debug_container[0]['audit_liquidity']['universe_snapshot']

if 'AAPL' in snapshot_df.index:
    display(snapshot_df.loc[['AAPL']])
else:
    print("AAPL was not present in the data for this date.")

Ticker,TR,ATR,ATRP,RollingStalePct,RollMedDollarVol,RollingSameVolCount,Calculated_Cutoff,Passed_Vol_Check,Passed_Final
AAPL,1.1061,0.520297,0.017905,0.0,5348299000.0,0.0,37206500.0,True,True


In [9]:
snapshot_df.to_csv('./export_csv/snapshot_df.csv')
print("✅ Snapshot exported to: ./export_csv/snapshot_df.csv")
print(f"   Shape: {snapshot_df.shape}")
print(f"   Columns: {list(snapshot_df.columns)}")

✅ Snapshot exported to: ./export_csv/snapshot_df.csv
   Shape: (1226, 9)
   Columns: ['TR', 'ATR', 'ATRP', 'RollingStalePct', 'RollMedDollarVol', 'RollingSameVolCount', 'Calculated_Cutoff', 'Passed_Vol_Check', 'Passed_Final']


In [29]:
import numpy as np

# 1. Access the data inside the container list
current_debug_data = debug_container[0]

# 2. Check if the audit data exists (it is created only in 'Ranking' mode)
if current_debug_data and 'audit_liquidity' in current_debug_data:
    audit = current_debug_data['audit_liquidity']
    snapshot_df = audit['universe_snapshot']

    print(f"📅 Date: {audit['date'].date()}")
    print(f"💰 Calculated Cutoff: ${audit['final_cutoff_usd']:,.0f}")
    print("-" * 30)

    # 3. View the tickers right around the cutoff point.
    # This assumes the snapshot is sorted by RollMedDollarVol in descending order,
    # so the last True in 'Passed_Vol_Check' marks the switch from True to False.
    try:
        # Get the integer location (iloc) of the last True value
        last_pass_iloc = np.where(snapshot_df['Passed_Vol_Check'])[0][-1]

        # Show the last 6 passing rows and the first 5 rows below the cutoff
        start = max(0, last_pass_iloc - 5)
        end = min(len(snapshot_df), last_pass_iloc + 6)

        display(snapshot_df.iloc[start:end].style.format({
            'RollMedDollarVol': '${:,.0f}',
            'Calculated_Cutoff': '${:,.0f}',
            'RollingStalePct': '{:.1%}'
        }))
    except IndexError:
        # np.where returned no positions, i.e. no ticker passed the volume check
        print("Could not determine cutoff boundary (no ticker passed the volume check).")
        display(snapshot_df.head())
else:
    print("⚠️ No audit data found. Make sure you are in 'Ranking' mode and have clicked 'Update Chart'.")

📅 Date: 2015-04-28
💰 Calculated Cutoff: $37,206,497
------------------------------


Ticker,TR,ATR,ATRP,RollingStalePct,RollMedDollarVol,RollingSameVolCount,Calculated_Cutoff,Passed_Vol_Check,Passed_Final
NCLH,1.59,1.19589,0.023837,0.0%,"$37,587,368",0.0,"$37,206,497",True,True
TER,0.5592,0.396619,0.022918,0.0%,"$37,569,939",0.0,"$37,206,497",True,True
AER,0.7876,0.750605,0.016253,0.0%,"$37,551,067",0.0,"$37,206,497",True,True
EVRG,0.4212,0.439924,0.016716,0.0%,"$37,542,028",0.0,"$37,206,497",True,True
VRSK,0.6321,0.866085,0.012449,0.0%,"$37,535,055",0.0,"$37,206,497",True,True
NTES,0.4759,0.658619,0.03188,0.0%,"$37,206,497",0.0,"$37,206,497",True,True
LII,1.6596,1.712063,0.018024,0.0%,"$37,107,899",0.0,"$37,206,497",False,False
ATI,0.7509,1.041575,0.031199,0.0%,"$36,741,036",0.0,"$37,206,497",False,False
BMO,0.3743,0.620426,0.014351,0.0%,"$36,597,179",0.0,"$37,206,497",False,False
MKC,0.3127,0.400849,0.01282,0.0%,"$36,228,620",0.0,"$37,206,497",False,False
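
The table above shows every ticker carrying the same `Calculated_Cutoff` and a `Passed_Vol_Check` flag that flips at that value. The `AlphaEngine` code that produces it is not shown in this part of the notebook, so what follows is only a sketch of how such a snapshot could be built for a single date. It assumes the cutoff is the larger of the hard floor and the requested quantile, and that a ticker passes when its value is at or above the cutoff. The function name `liquidity_snapshot` and its parameters are hypothetical.

```python
import pandas as pd


def liquidity_snapshot(med_dollar_vol: pd.Series,
                       min_floor: float,
                       min_percentile: float) -> pd.DataFrame:
    """Flag tickers whose median dollar volume clears a relative cutoff (sketch)."""
    snapshot = (
        med_dollar_vol.dropna()
        .sort_values(ascending=False)
        .to_frame('RollMedDollarVol')
    )
    # Relative cutoff: the requested quantile of the cross-section on this date
    percentile_cutoff = snapshot['RollMedDollarVol'].quantile(min_percentile)
    # Assumption: the hard floor wins when the market-wide quantile is lower
    cutoff = max(float(percentile_cutoff), min_floor)
    snapshot['Calculated_Cutoff'] = cutoff
    # Assumption: values exactly at the cutoff pass
    snapshot['Passed_Vol_Check'] = snapshot['RollMedDollarVol'] >= cutoff
    return snapshot


# Toy cross-section for one date (values are illustrative, not market data)
toy = pd.Series({'AAA': 500.0, 'BBB': 400.0, 'CCC': 300.0, 'DDD': 200.0, 'EEE': 100.0})
snap = liquidity_snapshot(toy, min_floor=100.0, min_percentile=0.5)
print(snap)
```

Because the cutoff is recomputed from each date's cross-section, the same percentile setting yields a lower dollar threshold in thin markets and a higher one in liquid markets.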


In [None]:
display(snapshot_df)

In [30]:
print_nested(results_container)
# print('='*20)
# print_nested(debug_container)

EngineOutput(portfolio_series=Date
2015-04-28    1.000000
2015-04-29    1.015713
2015-04-30    1.050600
2015-05-01    1.077869
2015-05-04    1.078302
2015-05-05    1.063868
2015-05-06    1.073178
2015-05-07    1.086883
2015-05-08    1.099645
2015-05-11    1.099924
2015-05-12    1.092934
2015-05-13    1.101673
2015-05-14    1.113253
2015-05-15    1.117366
2015-05-18    1.118461
2015-05-19    1.119714
dtype: float64, benchmark_series=Date
2015-04-28    1.000000
2015-04-29    0.995925
2015-04-30    0.986015
2015-05-01    0.996334
2015-05-04    0.999430
2015-05-05    0.987768
2015-05-06    0.983798
2015-05-07    0.987768
2015-05-08    1.000725
2015-05-11    0.995925
2015-05-12    0.993032
2015-05-13    0.993342
2015-05-14    1.003716
2015-05-15    1.004744
2015-05-18    1.007841
2015-05-19    1.007327
dtype: float64, normalized_plot_data=Ticker           VMC       MLM      MDLZ       LKQ
Date                                              
2015-04-28  1.000000  1.000000  1.000000  1.000000
2