## Project Hand-off: Walk-Forward Backtesting Bot

This package contains the final version of the code, designed to be resilient, portable, and easy to use in both local (VS Code) and cloud (Google Colab) environments.

### 1. Summary of Key Features

The system you have built now includes:

*   **Environment-Agnostic Operation:** Cell 1 tries to import `google.colab` to detect whether the code is running in Colab or locally, then sets the data and output paths accordingly.
*   **Resumable Backtests (Checkpointing):** Long-running parameter searches are now resilient. If the process is interrupted, it can be restarted and will automatically skip completed work, picking up where it left off.
*   **Granular, Trading-Day-Based Logic:** The backtester operates on precise integer counts of trading days, allowing for non-calendar-based periods (e.g., 10-day holds) and eliminating approximation errors.
*   **Multi-Period Testing:** The automation script is capable of testing a list of different holding/rebalancing periods in a single run.
*   **Modular & Verifiable Core Engine:** The core calculation logic (`run_walk_forward_step`) is a pure, self-contained function, making it easy to test and verify independently.
*   **Dynamic Data Quality Filtering:** Before each ranking period, the universe of stocks is filtered based on rolling liquidity and data quality metrics, ensuring the strategy is only applied to tradable assets.

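The checkpointing behaviour described above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the function names `load_completed` and `run_search`, the choice of key columns, and the `evaluate` callback are all assumptions made for the example.

```python
import csv
import os
from itertools import product


def load_completed(results_path, key_cols):
    """Return the set of parameter tuples already present in the results CSV."""
    if not os.path.exists(results_path):
        return set()
    with open(results_path, newline="") as fh:
        return {tuple(row[c] for c in key_cols) for row in csv.DictReader(fh)}


def run_search(results_path, param_grid, evaluate):
    """Evaluate each parameter combination once, appending results as it goes.

    `evaluate` is a hypothetical callback that takes a dict of parameters and
    returns a dict of result columns. Returns the number of combinations run.
    """
    key_cols = list(param_grid)
    done = load_completed(results_path, key_cols)
    ran = 0
    for values in product(*param_grid.values()):
        key = tuple(str(v) for v in values)  # the CSV stores strings, so compare as strings
        if key in done:
            continue  # finished in an earlier run: skip it
        params = dict(zip(key_cols, values))
        row = {**params, **evaluate(params)}
        write_header = not os.path.exists(results_path)
        with open(results_path, "a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(row))
            if write_header:
                writer.writeheader()
            writer.writerow(row)  # file is closed after every row, so finished rows survive an interruption
        ran += 1
    return ran
```

Appending one row at a time trades a little I/O overhead for the guarantee that a restart only repeats the combination that was in flight when the process stopped.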
### 2. Required Project Structure

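For the environment switch to work seamlessly, organize your project as follows, both on your local machine and in Google Drive. Two details follow from Cell 1: in Colab, results are written to a `results/` folder under the Drive project root rather than to `export_csv/`, and the local `data_path` is a hard-coded absolute path that you should update if your data file lives elsewhere.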

```
my_trading_project/
│
├── 📜 bot.ipynb                 # <-- This is the main notebook file
│
├── 📁 data/
│   └── 📊 df_OHLCV_stocks_etfs.parquet # <-- Your input data file goes here
│
└── 📁 export_csv/                 # <-- Folder for local results (created automatically)
```
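A quick way to confirm the layout before running the notebook is a small check like the one below. It is only a sketch: `check_layout` and `EXPECTED_FILES` are illustrative names, and the expected relative paths are taken from the tree above.

```python
from pathlib import Path

# Relative paths taken from the project tree above
EXPECTED_FILES = [
    Path("bot.ipynb"),
    Path("data") / "df_OHLCV_stocks_etfs.parquet",
]


def check_layout(project_root):
    """Return the expected files that are missing under project_root."""
    root = Path(project_root)
    return [str(p) for p in EXPECTED_FILES if not (root / p).is_file()]
```

Calling `check_layout(".")` from the project folder returns an empty list when both files are in place. The output folder is not checked because Cell 1 creates it automatically.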

### 3. Final, Complete Code

This is the entire code for your notebook, consolidated into logical cells.

#### **CELL 1: ENVIRONMENT SETUP & CONFIGURATION**
*This cell is the "brain" of the system. It detects the environment and configures all paths. It's the only cell you might need to edit if your file paths change.*

In [1]:
# ==============================================================================
# --- CELL 1: ENVIRONMENT SETUP & CONFIGURATION (IMPROVED) ---
# This cell automatically detects the environment (local VS Code or Google Colab)
# and configures paths and settings accordingly. It also creates directories.
# ==============================================================================
import sys
import os

# 1. AUTOMATIC ENVIRONMENT DETECTION
try:
    import google.colab
    IS_COLAB = True
    print("✅ Environment: Google Colab detected.")
except ImportError:
    IS_COLAB = False
    print("✅ Environment: Local (VS Code) detected.")

# 2. ENVIRONMENT-SPECIFIC CONFIGURATION
if IS_COLAB:
    # --- Colab Settings ---
    from google.colab import drive, output
    drive.mount('/content/drive')
    output.enable_custom_widget_manager()
    
    # IMPORTANT: This should be the path to your main project folder in Google Drive
    DRIVE_ROOT = '/content/drive/MyDrive/my_trading_project'
    
    env_config = {
        'data_path': os.path.join(DRIVE_ROOT, 'data', 'df_OHLCV_stocks_etfs.parquet'),
        'output_dir': os.path.join(DRIVE_ROOT, 'results') # Colab results go in a 'results' folder
    }
    
else:
    # --- Local Settings ---
    # IMPORTANT: Update this path to your local data file if it's different
    env_config = {
        'data_path': r'c:\Users\ping\Files_win10\python\py311\stocks\data\df_OHLCV_stocks_etfs.parquet',
        'output_dir': os.path.join('.', 'export_csv') # Local results go in 'export_csv'
    }

# 3. CREATE ALL NECESSARY DIRECTORIES
data_parent_dir = os.path.dirname(env_config['data_path'])
os.makedirs(data_parent_dir, exist_ok=True)
os.makedirs(env_config['output_dir'], exist_ok=True)

print(f"\nData will be loaded from: {env_config['data_path']}")
print(f"Output files will be saved to: {env_config['output_dir']}")

# 4. DEFINE THE FULL PATH FOR THE RESULTS FILE
env_config['results_path'] = os.path.join(env_config['output_dir'], 'dev_strategy_search_results.csv')



✅ Environment: Local (VS Code) detected.

Data will be loaded from: c:\Users\ping\Files_win10\python\py311\stocks\data\df_OHLCV_stocks_etfs.parquet
Output files will be saved to: .\export_csv

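The detection and path logic in Cell 1 can also be wrapped in small functions, which makes it testable without mounting Drive. The sketch below is an alternative formulation, not the notebook's code: `detect_colab` and `build_env_config` are hypothetical names, and it probes for the `google.colab` module with `importlib.util.find_spec` instead of a `try`/`except` import.

```python
import importlib.util
import os


def detect_colab():
    """Return True when the google.colab package is importable."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ImportError:
        # Raised when the parent 'google' package itself is absent
        return False


def build_env_config(is_colab, drive_root, local_data_path):
    """Build the same keys Cell 1 uses: data_path, output_dir, results_path."""
    if is_colab:
        data_path = os.path.join(drive_root, "data", "df_OHLCV_stocks_etfs.parquet")
        output_dir = os.path.join(drive_root, "results")
    else:
        data_path = local_data_path
        output_dir = os.path.join(".", "export_csv")
    return {
        "data_path": data_path,
        "output_dir": output_dir,
        "results_path": os.path.join(output_dir, "dev_strategy_search_results.csv"),
    }
```

Passing `is_colab` in as an argument means both branches can be exercised on any machine, for example `build_env_config(detect_colab(), DRIVE_ROOT, local_path)` in the notebook and explicit `True`/`False` values in a test.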

#### **CELL 2: GOLDEN COPY - CORE ENGINE & TOOLS**
*This cell contains all the stable, tested functions that form the core of your backtester.*

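The engine's date handling rests on integer positions in a master trading-day calendar rather than on calendar offsets. The following is a minimal sketch of that arithmetic. It assumes a synthetic business-day calendar built with `pd.bdate_range` (the notebook derives its calendar from a ticker's dates), and `trading_day_window` is a hypothetical helper name.

```python
import pandas as pd


def trading_day_window(calendar, start_date, calc_period, fwd_period):
    """Map a start date and two trading-day counts onto calendar dates.

    Mirrors the index arithmetic in run_walk_forward_step: both end positions
    are clamped to the last available trading day.
    """
    start_idx = calendar.get_loc(pd.Timestamp(start_date))  # KeyError if not a trading day
    last_idx = len(calendar) - 1
    calc_end_idx = min(start_idx + calc_period, last_idx)
    fwd_end_idx = min(calc_end_idx + fwd_period, last_idx)
    return calendar[start_idx], calendar[calc_end_idx], calendar[fwd_end_idx]


# Synthetic Monday-to-Friday calendar, used only for this illustration
calendar = pd.bdate_range("2024-01-01", "2024-03-29")
start, calc_end, fwd_end = trading_day_window(calendar, "2024-01-02", 10, 5)
print(start.date(), calc_end.date(), fwd_end.date())
```

Because the calculation window is ten index positions rather than ten calendar days, it spans two weekends in this example and ends on 2024-01-16.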
In [2]:
# ==============================================================================
# GOLDEN COPY - COMPLETE PROJECT CODE (All Fixes Included)
# Version: Adds calc_period_start, calc_period_end, forward_period_start, and
#          forward_period_end to plot_walk_forward_analyzer's return containers.
# Date: 2025-10-10
# ==============================================================================

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import ipywidgets as widgets
import time
import pprint
import re
import os

from datetime import datetime, date
from IPython.display import display, Markdown
from tqdm.auto import tqdm
from pathlib import Path
from itertools import product


pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', 200)
pd.set_option('display.width', 3000)

# --- A. HELPER FUNCTIONS ---

def calculate_gain(price_series: pd.Series):
    """Calculates the total gain over a series of prices."""
    # Ensure there are at least two data points to calculate a gain
    if price_series.dropna().shape[0] < 2: return np.nan
    # Use forward-fill for the end price and back-fill for the start price
    # to handle potential NaNs at the beginning or end of the series.
    return (price_series.ffill().iloc[-1] / price_series.bfill().iloc[0]) - 1

def calculate_sharpe(return_series: pd.Series):
    """Calculates the annualized Sharpe ratio from a series of daily returns."""
    # Ensure there are at least two returns to calculate a standard deviation
    if return_series.dropna().shape[0] < 2: return np.nan
    std_dev = return_series.std()
    # Avoid division by zero if returns are constant
    if std_dev > 0 and std_dev != np.inf:
        return (return_series.mean() / std_dev) * np.sqrt(252)
    return np.nan

def print_nested(d, indent=0, width=4):
    """Pretty-print any nested dict/list/tuple combination."""
    spacing = ' ' * indent
    if isinstance(d, dict):
        for k, v in d.items():
            print(f'{spacing}{k}:')
            print_nested(v, indent + width, width)
    elif isinstance(d, (list, tuple)):
        for item in d:
            print_nested(item, indent, width)
    else:
        print(f'{spacing}{d}')

# --- B. THE CORE CALCULATION ENGINE ---

def run_walk_forward_step(df_close_full, df_high_full, df_low_full,
                          master_trading_days,
                          start_date, calc_period, fwd_period,
                          metric, rank_start, rank_end, benchmark_ticker,
                          debug=False):
    """Runs a single step of the walk-forward analysis using precise trading days."""
    debug_data = {} if debug else None
    
    # 1. Determine exact date ranges using the master trading day calendar
    try:
        start_idx = master_trading_days.get_loc(start_date)
    except KeyError:
        return ({'error': f"Start date {start_date.date()} is not a valid trading day."}, None)
        
    calc_end_idx = min(start_idx + calc_period, len(master_trading_days) - 1)
    viz_end_idx = min(calc_end_idx + fwd_period, len(master_trading_days) - 1)

    safe_start_date = master_trading_days[start_idx]
    safe_calc_end_date = master_trading_days[calc_end_idx]
    safe_viz_end_date = master_trading_days[viz_end_idx]
    
    if safe_start_date >= safe_calc_end_date:
        return ({'error': "Invalid date range (calc period has zero or negative length)."}, None)

    # 2. Slice data for the calculation period and filter for valid tickers
    calc_close_raw = df_close_full.loc[safe_start_date:safe_calc_end_date]
    calc_close = calc_close_raw.dropna(axis=1, how='all') # Drop tickers with no data in the period
    if calc_close.shape[1] == 0 or len(calc_close) < 2:
        return ({'error': "Not enough data in calc period."}, None)

    # 3. Calculate ranking metrics for all valid tickers
    first_prices = calc_close.bfill().iloc[0]
    last_prices = calc_close.ffill().iloc[-1]
    daily_returns = calc_close.bfill().ffill().pct_change()
    mean_returns = daily_returns.mean()
    std_returns = daily_returns.std()
    
    valid_tickers = calc_close.columns
    calc_high = df_high_full[valid_tickers].loc[safe_start_date:safe_calc_end_date]
    calc_low = df_low_full[valid_tickers].loc[safe_start_date:safe_calc_end_date]
    
    # Correctly calculate True Range (TR) for a multi-ticker DataFrame
    # First, align the previous day's close to the current calculation window.
    prev_close = df_close_full[valid_tickers].shift(1).loc[safe_start_date:safe_calc_end_date]
    
    # Calculate the three components of True Range. Each result is a DataFrame.
    component1 = calc_high - calc_low
    component2 = abs(calc_high - prev_close)
    component3 = abs(calc_low - prev_close)

    # Find the element-wise maximum across the three component DataFrames.
    # np.maximum is efficient and preserves the DataFrame structure.
    tr = np.maximum(component1, np.maximum(component2, component3))
    
    atr = tr.ewm(alpha=1/14, adjust=False).mean()
    atrp = (atr / calc_close).mean() # Mean ATRP over the calculation period

    metric_values = {}
    metric_values['Price'] = (last_prices / first_prices).dropna()
    metric_values['Sharpe'] = (mean_returns / std_returns * np.sqrt(252)).fillna(0)
    metric_values['Sharpe (ATR)'] = (mean_returns / atrp).fillna(0)

    if debug:
        df_ranking = pd.DataFrame({
            'FirstPrice': first_prices, 'LastPrice': last_prices, 'MeanDailyReturn': mean_returns,
            'StdDevDailyReturn': std_returns, 'MeanATRP': atrp, 'Metric_Price': metric_values['Price'],
            'Metric_Sharpe': metric_values['Sharpe'], 'Metric_Sharpe (ATR)': metric_values['Sharpe (ATR)']
        })
        df_ranking.index.name = 'Ticker'
        debug_data['ranking_metrics'] = df_ranking.sort_values(f'Metric_{metric}', ascending=False)

    # 4. Rank tickers and select the target group
    sorted_tickers = metric_values[metric].sort_values(ascending=False)
    tickers_to_display = sorted_tickers.index[rank_start-1:rank_end].tolist()
    if not tickers_to_display:
        return ({'error': "No tickers found for the selected rank."}, None)

    # 5. Prepare data for plotting and portfolio performance calculation
    normalized_plot_data = df_close_full[tickers_to_display].loc[safe_start_date:safe_viz_end_date]
    normalized_plot_data = normalized_plot_data.div(normalized_plot_data.bfill().iloc[0])
    actual_calc_end_ts = calc_close.index.max()
    portfolio_series = normalized_plot_data.mean(axis=1)
    portfolio_return_series = portfolio_series.pct_change()
    benchmark_price_series = df_close_full.get(benchmark_ticker)
    benchmark_return_series = benchmark_price_series.loc[safe_start_date:safe_viz_end_date].bfill().ffill().pct_change() if benchmark_price_series is not None else pd.Series(dtype='float64')

    # 6. Correctly slice return series for Sharpe calculation to prevent lookahead
    try:
        # Use index location for a clean, non-overlapping split
        boundary_loc = portfolio_return_series.index.get_loc(actual_calc_end_ts)
        calc_portfolio_returns = portfolio_return_series.iloc[:boundary_loc + 1]
        fwd_portfolio_returns = portfolio_return_series.iloc[boundary_loc + 1:]
        
        if benchmark_price_series is not None:
            bm_boundary_loc = benchmark_return_series.index.get_loc(actual_calc_end_ts)
            calc_benchmark_returns = benchmark_return_series.iloc[:bm_boundary_loc + 1]
            fwd_benchmark_returns = benchmark_return_series.iloc[bm_boundary_loc + 1:]
        else:
            calc_benchmark_returns, fwd_benchmark_returns = pd.Series(dtype='float64'), pd.Series(dtype='float64')
            
    except (KeyError, IndexError): # Fallback for edge cases
        calc_portfolio_returns = portfolio_return_series.loc[:actual_calc_end_ts]
        fwd_portfolio_returns = portfolio_return_series.loc[actual_calc_end_ts:].iloc[1:]
        if benchmark_price_series is not None:
            calc_benchmark_returns = benchmark_return_series.loc[:actual_calc_end_ts]
            fwd_benchmark_returns = benchmark_return_series.loc[actual_calc_end_ts:].iloc[1:]
        else:
            calc_benchmark_returns, fwd_benchmark_returns = pd.Series(dtype='float64'), pd.Series(dtype='float64')

    # 7. Calculate performance metrics (Gain & Sharpe) for all periods
    perf_data = {}
    perf_data['calc_p_gain'] = calculate_gain(portfolio_series.loc[:actual_calc_end_ts])
    perf_data['fwd_p_gain'] = calculate_gain(portfolio_series.loc[actual_calc_end_ts:])
    perf_data['full_p_gain'] = calculate_gain(portfolio_series)
    perf_data['calc_p_sharpe'] = calculate_sharpe(calc_portfolio_returns)
    perf_data['fwd_p_sharpe'] = calculate_sharpe(fwd_portfolio_returns)
    perf_data['full_p_sharpe'] = calculate_sharpe(portfolio_return_series)
    
    perf_data['calc_b_gain'] = calculate_gain(benchmark_price_series.loc[safe_start_date:actual_calc_end_ts]) if benchmark_price_series is not None else np.nan
    perf_data['fwd_b_gain'] = calculate_gain(benchmark_price_series.loc[actual_calc_end_ts:safe_viz_end_date]) if benchmark_price_series is not None else np.nan
    perf_data['full_b_gain'] = calculate_gain(benchmark_price_series.loc[safe_start_date:safe_viz_end_date]) if benchmark_price_series is not None else np.nan
    perf_data['calc_b_sharpe'] = calculate_sharpe(calc_benchmark_returns)
    perf_data['fwd_b_sharpe'] = calculate_sharpe(fwd_benchmark_returns)
    perf_data['full_b_sharpe'] = calculate_sharpe(benchmark_return_series)

    # 8. Assemble results DataFrame for display
    calc_end_prices = calc_close.ffill().iloc[-1]
    fwd_close_slice = df_close_full.loc[actual_calc_end_ts:safe_viz_end_date]
    viz_end_prices = fwd_close_slice.ffill().iloc[-1] if not fwd_close_slice.empty and len(fwd_close_slice) >= 2 else calc_end_prices
    calc_gains = (calc_end_prices / calc_close.bfill().iloc[0]) - 1
    fwd_gains = (viz_end_prices / calc_end_prices) - 1
    results_df = pd.DataFrame({'Rank': range(rank_start, rank_start + len(tickers_to_display)), 'Metric': metric, 'MetricValue': sorted_tickers.loc[tickers_to_display].values, 'CalcPrice': calc_end_prices.loc[tickers_to_display], 'CalcGain': calc_gains.loc[tickers_to_display], 'FwdGain': fwd_gains.loc[tickers_to_display]}, index=pd.Index(tickers_to_display, name='Ticker'))
    if benchmark_price_series is not None and benchmark_ticker in calc_close.columns:
        benchmark_df_row = pd.DataFrame({'Rank': np.nan, 'Metric': metric, 'MetricValue': metric_values[metric].get(benchmark_ticker, np.nan), 'CalcPrice': calc_end_prices[benchmark_ticker], 'CalcGain': calc_gains[benchmark_ticker], 'FwdGain': fwd_gains[benchmark_ticker]}, index=pd.Index([f"{benchmark_ticker} (BM)"], name='Ticker'))
        results_df = pd.concat([results_df, benchmark_df_row])
    
    # 9. Assemble debug data if requested
    if debug:
        df_trace = normalized_plot_data.copy()
        df_trace.columns = [f'Norm_Price_{c}' for c in df_trace.columns]
        df_trace['Norm_Price_Portfolio'] = portfolio_series
        if benchmark_price_series is not None and not benchmark_price_series.loc[safe_start_date:safe_viz_end_date].dropna().empty:
            norm_bm = benchmark_price_series.loc[safe_start_date:safe_viz_end_date] / benchmark_price_series.loc[safe_start_date:].bfill().iloc[0]
            df_trace[f'Norm_Price_Benchmark_{benchmark_ticker}'] = norm_bm
        for col in df_trace.columns:
            if 'Norm_Price' in col:
                df_trace[col.replace('Norm_Price', 'Return')] = df_trace[col].pct_change()
        debug_data['portfolio_trace'] = df_trace

    # 10. Package final results
    final_results = {
        'tickers_to_display': tickers_to_display, 'normalized_plot_data': normalized_plot_data,
        'portfolio_series': portfolio_series, 'benchmark_price_series': benchmark_price_series,
        'performance_data': perf_data, 'results_df': results_df, 'actual_calc_end_ts': actual_calc_end_ts,
        'safe_start_date': safe_start_date, 'safe_viz_end_date': safe_viz_end_date,
        'error': None
    }
    return (final_results, debug_data)

# --- C. DYNAMIC DATA QUALITY FILTER FUNCTIONS ---

def calculate_rolling_quality_metrics(df_ohlcv, window=252, min_periods=126, debug=False):
    """Calculates rolling data quality metrics for the entire dataset."""
    print(f"--- Calculating Rolling Quality Metrics (Window: {window} days) ---")
    df = df_ohlcv.copy()
    if not df.index.is_monotonic_increasing:
        df.sort_index(inplace=True)
        
    # Define quality flags
    df['IsStale'] = np.where((df['Volume'] == 0) | (df['Adj High'] == df['Adj Low']), 1, 0)
    df['DollarVolume'] = df['Adj Close'] * df['Volume']
    df['HasSameVolumeAsPrevDay'] = (df.groupby(level='Ticker')['Volume'].diff() == 0).astype(int)
    
    # Calculate rolling metrics per ticker
    grouped = df.groupby(level='Ticker')
    stale_pct = grouped['IsStale'].rolling(window=window, min_periods=min_periods).mean()
    median_vol = grouped['DollarVolume'].rolling(window=window, min_periods=min_periods).median()
    same_vol_count = grouped['HasSameVolumeAsPrevDay'].rolling(window=window, min_periods=min_periods).sum()
    
    quality_df = pd.concat([stale_pct, median_vol, same_vol_count], axis=1)
    quality_df.columns = ['RollingStalePct', 'RollingMedianVolume', 'RollingSameVolCount']
    quality_df.index = quality_df.index.droplevel(0) # Remove the extra 'Ticker' level
    print("✅ Rolling metrics calculation complete.")
    return quality_df

def get_eligible_universe(quality_metrics_df, filter_date, thresholds):
    """Filters the universe of tickers based on quality metrics for a given date."""
    filter_date_ts = pd.to_datetime(filter_date)
    date_index = quality_metrics_df.index.get_level_values('Date').unique().sort_values()
    
    if filter_date_ts < date_index[0]:
        print(f"Warning: Filter date {filter_date_ts.date()} is before the earliest data point. Returning empty universe.")
        return []
        
    # Find the most recent date with quality data on or before the filter date
    valid_prior_dates = date_index[date_index <= filter_date_ts]
    if valid_prior_dates.empty:
        print(f"Warning: No available data found on or before {filter_date_ts.date()}. Returning empty universe.")
        return []
        
    actual_date_to_use = valid_prior_dates[-1]
    if actual_date_to_use.date() != filter_date_ts.date():
        print(f"ℹ️ Info: Filter date {filter_date_ts.date()} not found. Using previous available date {actual_date_to_use.date()}.")

    metrics_on_date = quality_metrics_df.xs(actual_date_to_use, level='Date')
    
    # Apply filters
    mask = ((metrics_on_date['RollingMedianVolume'] >= thresholds['min_median_dollar_volume']) &
            (metrics_on_date['RollingStalePct'] <= thresholds['max_stale_pct']) &
            (metrics_on_date['RollingSameVolCount'] <= thresholds['max_same_vol_count']))
            
    eligible_tickers = metrics_on_date[mask].index.tolist()
    all_tickers = metrics_on_date.index.tolist()
    print(f"Dynamic Filter ({filter_date_ts.date()}): Kept {len(eligible_tickers)} of {len(all_tickers)} tickers.")
    return eligible_tickers    

# --- D. INTERACTIVE ANALYSIS & BACKTESTING TOOLS ---

def plot_walk_forward_analyzer(df_ohlcv, 
                               default_start_date=None, default_calc_period=126, default_fwd_period=63,
                               default_metric='Sharpe (ATR)', default_rank_start=1, default_rank_end=10,
                               default_benchmark_ticker='VOO', master_calendar_ticker='VOO',
                               quality_thresholds={'min_median_dollar_volume': 1_000_000, 'max_stale_pct': 0.05, 'max_same_vol_count': 10},
                               debug=False):
    """Creates an interactive widget for single-period walk-forward analysis."""
    print("Initializing Walk-Forward Analyzer (using Trading Day Logic)...")
    if not isinstance(df_ohlcv.index, pd.MultiIndex): raise ValueError("Input DataFrame must have a (Ticker, Date) MultiIndex.")
    df_ohlcv = df_ohlcv.sort_index()

    if master_calendar_ticker not in df_ohlcv.index.get_level_values(0):
        raise ValueError(f"Master calendar ticker '{master_calendar_ticker}' not found in DataFrame.")
    master_trading_days = df_ohlcv.loc[master_calendar_ticker].index.unique().sort_values()
    print(f"Master trading day calendar created from '{master_calendar_ticker}' ({len(master_trading_days)} days).")

    print("Pre-calculating data quality metrics...")
    quality_metrics_df = calculate_rolling_quality_metrics(df_ohlcv, window=252)
    print("Pre-processing data (unstacking)...")
    df_close_full = df_ohlcv['Adj Close'].unstack(level=0)
    df_high_full = df_ohlcv['Adj High'].unstack(level=0)
    df_low_full = df_ohlcv['Adj Low'].unstack(level=0)
    
    # --- Widget Setup ---
    start_date_picker = widgets.DatePicker(description='Start Date:', value=pd.to_datetime(default_start_date), disabled=False)
    calc_period_input = widgets.IntText(value=default_calc_period, description='Calc Period (days):')
    fwd_period_input = widgets.IntText(value=default_fwd_period, description='Fwd Period (days):')
    metrics = ['Price', 'Sharpe', 'Sharpe (ATR)']
    metric_dropdown = widgets.Dropdown(options=metrics, value=default_metric, description='Metric:')
    rank_start_input = widgets.IntText(value=default_rank_start, description='Rank Start:')
    rank_end_input = widgets.IntText(value=default_rank_end, description='Rank End:')
    benchmark_ticker_input = widgets.Text(value=default_benchmark_ticker, description='Benchmark:', placeholder='Enter Ticker')
    update_button = widgets.Button(description="Update Chart", button_style='primary')
    ticker_list_output = widgets.Output()
    results_container, debug_data_container = [None], [None]

    # --- Plotting Setup ---
    fig = go.FigureWidget()
    max_traces = 50
    for i in range(max_traces): fig.add_trace(go.Scatter(x=[None], y=[None], mode='lines', name=f'placeholder_{i}', visible=False, showlegend=False))
    fig.add_trace(go.Scatter(x=[None], y=[None], mode='lines', name='Benchmark', visible=True, showlegend=True, line=dict(color='black', width=3, dash='dash')))
    fig.add_trace(go.Scatter(x=[None], y=[None], mode='lines', name='Group Portfolio', visible=True, showlegend=True, line=dict(color='green', width=3)))

    # --- Update Logic (Callback) ---
    def update_plot(button_click):
        ticker_list_output.clear_output()
        
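        # 1. Get and validate user inputs
        # The picker is empty when no default start date was supplied.
        if start_date_picker.value is None:
            with ticker_list_output: print("Error: Please select a start date."); return
        start_date_raw = pd.to_datetime(start_date_picker.value)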
        start_date_idx = master_trading_days.searchsorted(start_date_raw)
        if start_date_idx >= len(master_trading_days):
            with ticker_list_output: print(f"Error: Start date is after the last available trading day."); return
        actual_start_date = master_trading_days[start_date_idx]
        with ticker_list_output: 
            if start_date_raw.date() != actual_start_date.date():
                print(f"ℹ️ Info: Start date {start_date_raw.date()} is not a trading day. Snapping forward to {actual_start_date.date()}.")

        # Capture input values into variables
        calc_period = calc_period_input.value
        fwd_period = fwd_period_input.value
        metric = metric_dropdown.value
        rank_start = rank_start_input.value
        rank_end = rank_end_input.value
        benchmark_ticker = benchmark_ticker_input.value.strip().upper()
        
        if rank_start > rank_end:
            with ticker_list_output: print("Error: 'Rank Start' must be <= 'Rank End'."); return
        if rank_start < 1 or calc_period < 2 or fwd_period < 1:
            with ticker_list_output: print("Error: Ranks must be >= 1, Calc Period >= 2, Fwd Period >= 1."); return

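        # 1a. Validate data availability
        # The engine indexes the calendar at start + calc_period + fwd_period, so that
        # position must exist; otherwise the forward window is silently truncated.
        required_days = calc_period + fwd_period
        if start_date_idx + required_days >= len(master_trading_days):
            available_days = len(master_trading_days) - 1 - start_date_idx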
            last_available_date = master_trading_days[-1].date()
            with ticker_list_output:
                print(f"Error: Not enough data for the requested period.")
                print(f"  Start Date: {actual_start_date.date()}")
                print(f"  Required Days: {calc_period} (calc) + {fwd_period} (fwd) = {required_days}")
                print(f"  Available Days from Start: {available_days} (until {last_available_date})")
                print(f"  Please shorten the 'Calc Period' / 'Fwd Period' or choose an earlier 'Start Date'.")
            return

        # 2. Apply dynamic data quality filter
        eligible_tickers = get_eligible_universe(quality_metrics_df, actual_start_date, quality_thresholds)
        if not eligible_tickers:
            with ticker_list_output: print(f"Error: No eligible tickers found on {actual_start_date.date()} with the current quality filters."); return
        df_close_step = df_close_full[eligible_tickers]; df_high_step = df_high_full[eligible_tickers]; df_low_step = df_low_full[eligible_tickers]

        # 3. Run the core calculation
        results, debug_output = run_walk_forward_step(
            df_close_step, df_high_step, df_low_step, master_trading_days,
            actual_start_date, calc_period, fwd_period, 
            metric, rank_start, rank_end, benchmark_ticker, debug=debug
        )
        if results.get('error'):
            with ticker_list_output: print(f"Error: {results['error']}"); return
            
        # 3a. Augment the output containers with period dates and run parameters
        #     so they are available to code that inspects the returned containers.
        period_dates = {
            'calc_period_start': results['safe_start_date'],
            'calc_period_end': results['actual_calc_end_ts'],
            'forward_period_start': results['actual_calc_end_ts'],
            'forward_period_end': results['safe_viz_end_date']
        }
        run_parameters = {
            'calc_period': calc_period,
            'fwd_period': fwd_period,
            'rank_metric': metric,
            'rank_start': rank_start,
            'rank_end': rank_end,
            'benchmark_ticker': benchmark_ticker
        }

        # Update the main results dictionary
        results.update(period_dates)
        results.update(run_parameters)

        # Update the debug dictionary if it exists
        if isinstance(debug_output, dict):
            debug_output.update(period_dates)
            debug_output.update(run_parameters)

        # 4. Update the interactive plot
        with fig.batch_update():
            # Ranked-ticker traces first, then the benchmark, the portfolio, and the calc/forward divider line
            for i in range(max_traces):
                trace = fig.data[i]
                if i < len(results['tickers_to_display']):
                    ticker = results['tickers_to_display'][i]; plot_data_series = results['normalized_plot_data'][ticker]
                    trace.x, trace.y, trace.name, trace.visible, trace.showlegend = plot_data_series.index, plot_data_series.values, ticker, True, True
                else: trace.visible, trace.showlegend = False, False
            benchmark_trace = fig.data[max_traces]
            if results['benchmark_price_series'] is not None and not results['benchmark_price_series'].loc[results['safe_start_date']:results['safe_viz_end_date']].dropna().empty:
                normalized_benchmark = results['benchmark_price_series'].loc[results['safe_start_date']:results['safe_viz_end_date']] / results['benchmark_price_series'].loc[results['safe_start_date']:].bfill().iloc[0]
                benchmark_trace.x, benchmark_trace.y, benchmark_trace.name, benchmark_trace.visible = normalized_benchmark.index, normalized_benchmark, f"Benchmark ({benchmark_ticker})", True
            else: benchmark_trace.visible = False
            portfolio_trace = fig.data[max_traces + 1]
            portfolio_trace.x, portfolio_trace.y, portfolio_trace.name, portfolio_trace.visible = results['portfolio_series'].index, results['portfolio_series'], 'Group Portfolio', True
            fig.layout.shapes = []; fig.add_shape(type="line", x0=results['actual_calc_end_ts'], y0=0, x1=results['actual_calc_end_ts'], y1=1, xref='x', yref='paper', line=dict(color="grey", width=2, dash="dash"))
            
        results_container[0] = results; debug_data_container[0] = debug_output
        
        # 5. Display summary statistics in a formatted table
        with ticker_list_output:
            # Build a Full/Calc/Fwd table of gains and Sharpe ratios for the portfolio and benchmark
            print(f"Analysis Period: {results['safe_start_date'].date()} to {results['safe_viz_end_date'].date()}.")
            pprint.pprint(results['tickers_to_display'])
            p = results['performance_data']
            rows = []
            rows.append({'Metric': 'Group Portfolio Gain', 'Full': p['full_p_gain'], 'Calc': p['calc_p_gain'], 'Fwd': p['fwd_p_gain']})
            if not np.isnan(p['full_b_gain']):
                rows.append({'Metric': f'Benchmark ({benchmark_ticker}) Gain', 'Full': p['full_b_gain'], 'Calc': p['calc_b_gain'], 'Fwd': p['fwd_b_gain']})
                rows.append({'Metric': 'Gain Delta (vs Bm)', 'Full': p['full_p_gain'] - p['full_b_gain'], 'Calc': p['calc_p_gain'] - p['calc_b_gain'], 'Fwd': p['fwd_p_gain'] - p['fwd_b_gain']})
            rows.append({'Metric': 'Group Portfolio Sharpe', 'Full': p['full_p_sharpe'], 'Calc': p['calc_p_sharpe'], 'Fwd': p['fwd_p_sharpe']})
            if not np.isnan(p['full_b_sharpe']):
                rows.append({'Metric': f'Benchmark ({benchmark_ticker}) Sharpe', 'Full': p['full_b_sharpe'], 'Calc': p['calc_b_sharpe'], 'Fwd': p['fwd_b_sharpe']})
                rows.append({'Metric': 'Sharpe Delta (vs Bm)', 'Full': p['full_p_sharpe'] - p['full_b_sharpe'], 'Calc': p['calc_p_sharpe'] - p['calc_b_sharpe'], 'Fwd': p['fwd_p_sharpe'] - p['fwd_b_sharpe']})
            report_df = pd.DataFrame(rows).set_index('Metric')
            gain_rows = [row for row in report_df.index if 'Gain' in row or 'Delta' in row]
            sharpe_rows = [row for row in report_df.index if 'Sharpe' in row]
            styled_df = report_df.style.format('{:+.2%}', na_rep='N/A', subset=(gain_rows, report_df.columns)).format('{:+.2f}', na_rep='N/A', subset=(sharpe_rows, report_df.columns)).set_properties(**{'text-align': 'right', 'width': '100px'}).set_table_styles([{'selector': 'th.col_heading', 'props': [('text-align', 'right')]}, {'selector': 'th.row_heading', 'props': [('text-align', 'left')]}])
            print("\n--- Strategy Performance Summary ---")
            display(styled_df)
            
    # --- Final Layout & Display ---
    fig.update_layout(title_text='Walk-Forward Performance Analysis', xaxis_title='Date', yaxis_title='Normalized Price (Start = 1)', hovermode='x unified', legend_title_text='Tickers (Ranked)', height=600, margin=dict(t=50))
    fig.add_hline(y=1, line_width=1, line_dash="dash", line_color="grey")
    update_button.on_click(update_plot)
    controls_row1 = widgets.HBox([start_date_picker, calc_period_input, fwd_period_input])
    controls_row2 = widgets.HBox([metric_dropdown, rank_start_input, rank_end_input, benchmark_ticker_input, update_button])
    ui_container = widgets.VBox([controls_row1, controls_row2, ticker_list_output], layout=widgets.Layout(margin='10px 0 20px 0'))
    display(ui_container, fig)
    update_plot(None) # Initial run
    return (results_container, debug_data_container)

def run_full_backtest(df_ohlcv, strategy_params, quality_thresholds):
    """Runs a full backtest of a strategy over a specified date range."""
    print(f"--- Running Full Forensic Backtest for Strategy: {strategy_params['metric']} (Top {strategy_params['rank_start']}-{strategy_params['rank_end']}) ---")
    
    # 1. Unpack strategy parameters
    start_date, end_date = pd.to_datetime(strategy_params['start_date']), pd.to_datetime(strategy_params['end_date'])
    calc_period, fwd_period = strategy_params['calc_period'], strategy_params['fwd_period']
    metric, rank_start, rank_end = strategy_params['metric'], strategy_params['rank_start'], strategy_params['rank_end']
    benchmark_ticker = strategy_params['benchmark_ticker']
    master_calendar_ticker = strategy_params.get('master_calendar_ticker', 'VOO')
    
    # 2. Perform initial setup (same as analyzer)
    if master_calendar_ticker not in df_ohlcv.index.get_level_values(0):
        raise ValueError(f"Master calendar ticker '{master_calendar_ticker}' not found in DataFrame.")
    master_trading_days = df_ohlcv.loc[master_calendar_ticker].index.unique().sort_values()
    
    start_idx = master_trading_days.searchsorted(start_date)
    end_idx = master_trading_days.searchsorted(end_date, side='right')
    
    quality_metrics_df = calculate_rolling_quality_metrics(df_ohlcv, window=252)
    df_close_full = df_ohlcv['Adj Close'].unstack(level=0)
    df_high_full = df_ohlcv['Adj High'].unstack(level=0)
    df_low_full = df_ohlcv['Adj Low'].unstack(level=0)
    
    # 3. Loop through all periods in the backtest range
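    step_indices = range(start_idx, end_idx, fwd_period)
    all_fwd_gains, period_by_period_debug = [], {}

    # Guard against an empty range, which would make step_indices[0] raise IndexError
    if len(step_indices) == 0:
        print("Error: No trading days found between start_date and end_date.")
        return None

    print(f"Simulating {len(step_indices)} periods from {master_trading_days[step_indices[0]].date()} to {master_trading_days[step_indices[-1]].date()}...")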
    for current_idx in tqdm(step_indices, desc="Backtest Progress"):
        step_date = master_trading_days[current_idx]
        
        # Apply data quality filter for the current step
        eligible_tickers = get_eligible_universe(quality_metrics_df, step_date, quality_thresholds)
        if not eligible_tickers: continue
        
        df_close_step = df_close_full[eligible_tickers]
        df_high_step = df_high_full[eligible_tickers]
        df_low_step = df_low_full[eligible_tickers]
        
        # Run a single walk-forward analysis step
        results, debug_output = run_walk_forward_step(
            df_close_step, df_high_step, df_low_step, master_trading_days,
            step_date, calc_period, fwd_period,
            metric, rank_start, rank_end, benchmark_ticker, debug=True
        )
        
        # Collect results for this period
        if results['error'] is None:
            fwd_series = results['portfolio_series'].loc[results['actual_calc_end_ts']:]
            all_fwd_gains.append(fwd_series.pct_change().dropna())
            period_by_period_debug[step_date.date().isoformat()] = debug_output
            
    if not all_fwd_gains:
        print("Error: No valid periods were simulated."); return None

    # 4. Stitch together the results to form a continuous equity curve
    strategy_returns = pd.concat(all_fwd_gains)
    strategy_equity_curve = (1 + strategy_returns).cumprod()
    benchmark_returns = df_close_full[benchmark_ticker].pct_change().loc[strategy_equity_curve.index]
    benchmark_equity_curve = (1 + benchmark_returns).cumprod()
    cumulative_equity_df = pd.DataFrame({'Strategy_Equity': strategy_equity_curve, 'Benchmark_Equity': benchmark_equity_curve})
    
    # 5. Plot the final equity curve
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=cumulative_equity_df.index, y=cumulative_equity_df['Strategy_Equity'], name='Strategy', line=dict(color='green')))
    fig.add_trace(go.Scatter(x=cumulative_equity_df.index, y=cumulative_equity_df['Benchmark_Equity'], name=f'Benchmark ({benchmark_ticker})', line=dict(color='black', dash='dash')))
    fig.update_layout(title=f"Cumulative Performance: '{metric}' Strategy (Top {rank_start}-{rank_end})", xaxis_title="Date", yaxis_title="Cumulative Growth")
    fig.show()

    # 6. Return the detailed results for forensic analysis
    final_backtest_results = {'cumulative_performance': cumulative_equity_df, 'period_by_period_debug': period_by_period_debug}
    print("\n✅ Full backtest complete. Results object is ready for forensic analysis.")
    return final_backtest_results

# --- E. VERIFICATION TOOLS (User Requested) ---

def verify_group_tickers_walk_forward_calculation(df_ohlcv, tickers_to_verify, benchmark_ticker,
                                                  start_date, calc_period, fwd_period,
                                                  master_calendar_ticker='VOO', export_csv=False):
    """Verifies portfolio and benchmark performance and optionally exports the data."""
    display(Markdown(f"## Verification Report for Portfolio vs. Benchmark"))
    display(Markdown(f"**Portfolio Tickers:** `{tickers_to_verify}`\n**Benchmark Ticker:** `{benchmark_ticker}`"))
    
    # 1. Setup trading day calendar and determine exact period dates
    if master_calendar_ticker not in df_ohlcv.index.get_level_values(0):
        raise ValueError(f"Master calendar ticker '{master_calendar_ticker}' not found in DataFrame.")
    master_trading_days = df_ohlcv.loc[master_calendar_ticker].index.unique().sort_values()

    start_date_raw = pd.to_datetime(start_date)
    start_idx = master_trading_days.searchsorted(start_date_raw)
    if start_idx >= len(master_trading_days):
        print(f"Error: Start date {start_date_raw.date()} is after the last available trading day."); return
    actual_start_date = master_trading_days[start_idx]
    
    calc_end_idx = min(start_idx + calc_period, len(master_trading_days) - 1)
    fwd_end_idx = min(calc_end_idx + fwd_period, len(master_trading_days) - 1)
    
    actual_calc_end_date = master_trading_days[calc_end_idx]
    actual_fwd_end_date = master_trading_days[fwd_end_idx]
    
    display(Markdown(f"**Analysis Start:** `{actual_start_date.date()}` (Selected: `{start_date_raw.date()}`)\n"
                    f"**Calc End:** `{actual_calc_end_date.date()}` ({calc_period} trading days)\n"
                    f"**Fwd End:** `{actual_fwd_end_date.date()}` ({fwd_period} trading days)"))

    # 2. Recreate the portfolio and benchmark series from scratch
    df_close_full = df_ohlcv['Adj Close'].unstack(level=0)
    portfolio_prices_raw_slice = df_close_full[tickers_to_verify].loc[actual_start_date:actual_fwd_end_date]
    portfolio_value_series = portfolio_prices_raw_slice.div(portfolio_prices_raw_slice.bfill().iloc[0]).mean(axis=1)
    benchmark_price_series = df_close_full.get(benchmark_ticker)
    
    # 3. Optionally export the underlying daily data to a CSV for external checking
    if export_csv:
        export_df = pd.DataFrame({
            'Portfolio_Normalized_Price': portfolio_value_series,
            'Portfolio_Daily_Return': portfolio_value_series.pct_change()
        })
        if benchmark_price_series is not None:
            norm_bm = benchmark_price_series.loc[actual_start_date:actual_fwd_end_date]
            norm_bm = norm_bm / norm_bm.bfill().iloc[0]
            export_df['Benchmark_Normalized_Price'] = norm_bm
            export_df['Benchmark_Daily_Return'] = norm_bm.pct_change()

        output_dir = 'export_csv'
        os.makedirs(output_dir, exist_ok=True)
        tickers_str = '_'.join(tickers_to_verify)
        filename = f"verify_group_{actual_start_date.date()}_{tickers_str}.csv"
        filepath = os.path.join(output_dir, filename)
        export_df.to_csv(filepath)
        print(f"\n✅ Data exported to: {filepath}")

    # 4. Define a helper to print detailed calculation steps
    def print_verification_steps(title, price_series):
        display(Markdown(f"#### Verification for: `{title}`"))
        if price_series.dropna().shape[0] < 2: print("  - Not enough data points."); return {'gain': np.nan, 'sharpe': np.nan}
        start_price, end_price = price_series.bfill().iloc[0], price_series.ffill().iloc[-1]
        gain = (end_price / start_price) - 1
        print(f"Start Value ({price_series.first_valid_index().date()}): {start_price:,.4f}\nEnd Value   ({price_series.last_valid_index().date()}): {end_price:,.4f}\nGain = {gain:.2%}")
        returns = price_series.pct_change()
        sharpe = calculate_sharpe(returns)
        print(f"Mean Daily Return: {returns.mean():.6f}\nStd Dev: {returns.std():.6f}\nSharpe = {sharpe:.2f}")
        return {'gain': gain, 'sharpe': sharpe}

    # 5. Run verification for each period
    display(Markdown("### A. Calculation Period"))
    perf_calc_p = print_verification_steps("Group Portfolio", portfolio_value_series.loc[actual_start_date:actual_calc_end_date])
    if benchmark_price_series is not None:
        perf_calc_b = print_verification_steps(f"Benchmark", benchmark_price_series.loc[actual_start_date:actual_calc_end_date])
    
    display(Markdown("### B. Forward Period"))
    perf_fwd_p = print_verification_steps("Group Portfolio", portfolio_value_series.loc[actual_calc_end_date:actual_fwd_end_date])
    if benchmark_price_series is not None:
        perf_fwd_b = print_verification_steps(f"Benchmark", benchmark_price_series.loc[actual_calc_end_date:actual_fwd_end_date])

def verify_ticker_ranking_metrics(df_ohlcv, ticker, start_date, calc_period,
                                  master_calendar_ticker='VOO', export_csv=False):
    """Verifies ranking metrics for a single ticker and optionally exports the data."""
    display(Markdown(f"## Verification Report for Ticker Ranking: `{ticker}`"))
    
    # 1. Setup trading day calendar and determine exact period dates
    if master_calendar_ticker not in df_ohlcv.index.get_level_values(0):
        raise ValueError(f"Master calendar ticker '{master_calendar_ticker}' not found in DataFrame.")
    master_trading_days = df_ohlcv.loc[master_calendar_ticker].index.unique().sort_values()

    start_date_raw = pd.to_datetime(start_date)
    start_idx = master_trading_days.searchsorted(start_date_raw)
    if start_idx >= len(master_trading_days):
        print(f"Error: Start date {start_date_raw.date()} is after the last available trading day."); return
    actual_start_date = master_trading_days[start_idx]
    
    calc_end_idx = min(start_idx + calc_period, len(master_trading_days) - 1)
    actual_calc_end_date = master_trading_days[calc_end_idx]

    # 2. Extract and prepare the raw data for the specific ticker and period
    df_ticker = df_ohlcv.loc[ticker].sort_index()
    calc_df = df_ticker.loc[actual_start_date:actual_calc_end_date].copy()
    if calc_df.empty or len(calc_df) < 2: 
        print("No data or not enough data in calc period."); return

    display(Markdown("### A. Calculation Period (for Ranking Metrics)"))
    display(Markdown(f"**Period Start:** `{actual_start_date.date()}`\n"
                    f"**Period End:** `{actual_calc_end_date.date()}`\n"
                    f"**Total Trading Days:** `{len(calc_df)}` (Requested: `{calc_period}`)"))
    
    display(Markdown("#### Detailed Metric Calculation Data"))
    
    # 3. Calculate all intermediate metrics as new columns for full transparency
    vdf = calc_df[['Adj High', 'Adj Low', 'Adj Close']].copy()
    vdf['Daily_Return'] = vdf['Adj Close'].pct_change()
    
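    # True Range (TR): the largest of high-low, |high - previous close|, |low - previous close|.
    # On the first row the previous close is NaN, so max() falls back to high-low.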
    tr_df = pd.DataFrame({
        'h_l': vdf['Adj High'] - vdf['Adj Low'],
        'h_cp': abs(vdf['Adj High'] - vdf['Adj Close'].shift(1)),
        'l_cp': abs(vdf['Adj Low'] - vdf['Adj Close'].shift(1))
    })
    vdf['TR'] = tr_df.max(axis=1)
    
    vdf['ATR_14'] = vdf['TR'].ewm(alpha=1/14, adjust=False).mean()
    vdf['ATRP'] = vdf['ATR_14'] / vdf['Adj Close']
    
    print("--- Start of Calculation Period ---")
    display(vdf.head())
    print("\n--- End of Calculation Period ---")
    display(vdf.tail())

    # 4. Optionally export this detailed breakdown to CSV
    if export_csv:
        output_dir = 'export_csv'
        os.makedirs(output_dir, exist_ok=True)
        filename = f"verify_ticker_{actual_start_date.date()}_{ticker}.csv"
        filepath = os.path.join(output_dir, filename)
        vdf.to_csv(filepath)
        print(f"\n✅ Data exported to: {filepath}")
    
    # 5. Print final metric calculations with formulas
    display(Markdown("#### `MetricValue` Verification Summary:"))
    
    calc_start_price = vdf['Adj Close'].bfill().iloc[0]
    calc_end_price = vdf['Adj Close'].ffill().iloc[-1]
    price_metric = (calc_end_price / calc_start_price)
    print(f"1. Price Metric: (Last Price / First Price) = ({calc_end_price:.2f} / {calc_start_price:.2f}) = {price_metric:.4f}")
    
    daily_returns = vdf['Daily_Return'].dropna()
    sharpe_ratio = calculate_sharpe(daily_returns)
    print(f"2. Sharpe Metric: (Mean Daily Return / Std Dev) * sqrt(252) = {sharpe_ratio:.4f}")

    atrp_mean = vdf['ATRP'].mean()
    mean_daily_return = vdf['Daily_Return'].mean()
    sharpe_atr = (mean_daily_return / atrp_mean) if atrp_mean > 0 else 0
    print(f"3. Sharpe (ATR) Metric: (Mean Daily Return / Mean ATRP) = ({mean_daily_return:.6f} / {atrp_mean:.6f}) = {sharpe_atr:.4f}")

# --- F. AUTOMATION SCRIPT - STRATEGY SEARCH ---

def run_strategy_search(df_ohlcv, config):
    """
    Runs the main backtesting loop with checkpointing to be resumable.
    """
    start_time = time.time()  # requires 'import time' (Cell 1 or the top of this cell)
    
    # --- 1. SETUP & LOAD PROGRESS ---
    print("--- Phase 1: Pre-processing and Loading Progress ---")
    quality_metrics_df = calculate_rolling_quality_metrics(df_ohlcv, window=252)
    print("Unstacking data for performance...")
    df_close_full = df_ohlcv['Adj Close'].unstack(level=0)
    df_high_full = df_ohlcv['Adj High'].unstack(level=0)
    df_low_full = df_ohlcv['Adj Low'].unstack(level=0)
    
    master_calendar_ticker = config['master_calendar_ticker']
    master_trading_days = df_ohlcv.loc[master_calendar_ticker].index.unique().sort_values()
    print(f"Master trading day calendar created from '{master_calendar_ticker}' ({len(master_trading_days)} days).")

    results_path = config['results_output_path']
    completed_params = set()
    
    if os.path.exists(results_path):  # resume from an earlier run if a results file exists
        print(f"Found existing results file. Loading progress from: {results_path}")
        df_progress = pd.read_csv(results_path)
        for _, row in df_progress.iterrows():
            param_key = (
                row['calc_period'], row['fwd_period'], row['metric'],
                (row['rank_start'], row['rank_end'])
            )
            completed_params.add(param_key)
        print(f"Found {len(completed_params)} completed parameter sets to skip.")
    else:
        print("No existing results file found. Starting a new run.")

    print("✅ Pre-processing complete.\n")

    # --- 2. SETUP THE MAIN LOOP ---
    print("--- Phase 2: Setting up Simulation Loops ---")
    
    param_combinations = list(product(
        config['calc_periods'], config['fwd_periods'],
        config['metrics'], config['rank_slices']
    ))
    
    search_start_date = pd.to_datetime(config['search_start_date'])
    search_end_date = pd.to_datetime(config['search_end_date'])
    start_idx = master_trading_days.searchsorted(search_start_date, side='left')
    end_idx = master_trading_days.searchsorted(search_end_date, side='right')

    step_dates_map = {}
    print("Pre-calculating rebalancing schedules for each holding period...")
    for fwd_period in sorted(config['fwd_periods']):
        step_indices = range(start_idx, end_idx, fwd_period)
        step_dates_map[fwd_period] = master_trading_days[step_indices]
        print(f"  - Holding Period {fwd_period} days: {len(step_dates_map[fwd_period])} rebalances")
    
    print(f"Found {len(param_combinations)} total parameter sets to simulate.")
    print("✅ Setup complete. Starting main loop...\n")

    # --- 3. RUN THE MAIN LOOP ---
    print("--- Phase 3: Running Simulations ---")
    pbar = tqdm(param_combinations, desc="Parameter Sets")
    
    for params in pbar:
        calc_period, fwd_period, metric, rank_slice = params
        rank_start, rank_end = rank_slice
        
        param_key = (calc_period, fwd_period, metric, rank_slice)
        if param_key in completed_params:
            pbar.set_description(f"Skipping {param_key}")
            continue

        pbar.set_description(f"Running {param_key}")
        
        current_params_results = []
        
        # ==============================================================================
        # --- INNER LOOP: one walk-forward step per rebalance date ---
        # ==============================================================================
        current_step_dates = step_dates_map[fwd_period]
        for step_date in current_step_dates:
            eligible_tickers = get_eligible_universe(
                quality_metrics_df, filter_date=step_date, thresholds=config['quality_thresholds']
            )
            if not eligible_tickers: continue
            
            df_close_step = df_close_full[eligible_tickers]
            df_high_step = df_high_full[eligible_tickers]
            df_low_step = df_low_full[eligible_tickers]

            step_result, _ = run_walk_forward_step(
                df_close_full=df_close_step, df_high_full=df_high_step, df_low_full=df_low_step,
                master_trading_days=master_trading_days, start_date=step_date,
                calc_period=calc_period, fwd_period=fwd_period,
                metric=metric, rank_start=rank_start, rank_end=rank_end,
                benchmark_ticker=config['benchmark_ticker'], debug=False
            )
            
            if step_result['error'] is None:
                p = step_result['performance_data']
                log_entry = {
                    'step_date': step_date.date(), 'calc_period': calc_period,
                    'fwd_period': fwd_period, 'metric': metric,
                    'rank_start': rank_start, 'rank_end': rank_end,
                    'num_universe': len(eligible_tickers),
                    'num_portfolio': len(step_result['tickers_to_display']),
                    'fwd_p_gain': p['fwd_p_gain'], 'fwd_b_gain': p['fwd_b_gain'],
                    'fwd_gain_delta': p['fwd_p_gain'] - p['fwd_b_gain'] if not np.isnan(p['fwd_b_gain']) else np.nan,
                    'fwd_p_sharpe': p['fwd_p_sharpe'],
                }
                current_params_results.append(log_entry)
        # ==============================================================================
        
        # --- CHECKPOINTING: INCREMENTAL SAVE ---
        if current_params_results:
            df_to_append = pd.DataFrame(current_params_results)
            df_to_append.to_csv(
                results_path,
                mode='a',
                header=not os.path.exists(results_path),
                index=False
            )
            completed_params.add(param_key)

    print("✅ Main loop finished.\n")
    
    # --- 4. RETURN FINAL DATAFRAME ---
    print("--- Phase 4: Loading Final Results ---")
    if os.path.exists(results_path):
        final_df = pd.read_csv(results_path)
        elapsed = time.time() - start_time
        print(f"✅ Process complete. Total execution time: {elapsed:.2f} seconds.")
        return final_df
    else:
        print("Warning: No results were generated.")
        return None
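
The checkpointing logic in `run_strategy_search` can be tried in isolation. The following is a minimal sketch, not the project's code: the helper names `load_completed_keys` and `append_results`, the three-column key, and the placeholder gain value are all assumptions made for illustration. It runs the same grid twice against a temporary CSV and shows that the second pass skips everything the first pass saved.

```python
import os
import tempfile

import pandas as pd


def load_completed_keys(results_path):
    """Return the set of (calc_period, fwd_period, metric) keys already on disk."""
    if not os.path.exists(results_path):
        return set()
    df_progress = pd.read_csv(results_path)
    key_cols = ['calc_period', 'fwd_period', 'metric']
    return set(df_progress[key_cols].itertuples(index=False, name=None))


def append_results(results_path, rows):
    """Append rows to the CSV, writing the header only when the file is new."""
    pd.DataFrame(rows).to_csv(
        results_path, mode='a', header=not os.path.exists(results_path), index=False
    )


# Hypothetical demo: two runs over the same grid against a temporary file.
results_path = os.path.join(tempfile.mkdtemp(), 'demo_results.csv')
grid = [(252, 63, 'Price'), (252, 63, 'Sharpe')]

runs_executed = []
for run_number in (1, 2):
    completed = load_completed_keys(results_path)
    for key in grid:
        if key in completed:
            continue  # already saved by an earlier run
        calc_period, fwd_period, metric = key
        append_results(results_path, [{
            'calc_period': calc_period, 'fwd_period': fwd_period,
            'metric': metric, 'fwd_p_gain': 0.0,  # placeholder value
        }])
        runs_executed.append((run_number, key))

print(runs_executed)  # only run 1 does any work
```

Because results are appended once per parameter set, an interruption in the middle of a set leaves no partial rows behind, and that set is simply re-run on restart.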
    


#### **CELL 3: DATA LOADING**
*This cell loads your main dataset using the environment-aware path.*

In [3]:
# ==============================================================================
# --- CELL 3: DATA LOADING ---
# ==============================================================================
import pandas as pd

data_file_path = env_config['data_path']
print(f"Attempting to load data from: {data_file_path}")

try:
    df_OHLCV = pd.read_parquet(data_file_path, engine='pyarrow')
    df_dev = df_OHLCV.copy() # Use df_dev for development as a good practice
    
    print("\n✅ Data loaded successfully.")
    print("\n--- DataFrame Info ---")
    df_dev.info()
except FileNotFoundError:
    print(f"\n❌ ERROR: FILE NOT FOUND at {data_file_path}. Please check paths in Cell 1.")

Attempting to load data from: c:\Users\ping\Files_win10\python\py311\stocks\data\df_OHLCV_stocks_etfs.parquet

✅ Data loaded successfully.

--- DataFrame Info ---
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 4423522 entries, ('A', Timestamp('1999-11-18 00:00:00')) to ('ZWS', Timestamp('2025-10-09 00:00:00'))
Data columns (total 5 columns):
 #   Column     Dtype  
---  ------     -----  
 0   Adj Open   float64
 1   Adj High   float64
 2   Adj Low    float64
 3   Adj Close  float64
 4   Volume     int64  
dtypes: float64(4), int64(1)
memory usage: 186.3+ MB
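
The backtest functions repeatedly call `.unstack(level=0)` on this long-format frame to get one column per ticker. As a small illustration of what that reshaping does, here is a sketch on a made-up two-ticker frame; the tickers `AAA` and `BBB` and their prices are invented, and the level names `Ticker` and `Date` are taken from the printed outputs later in this notebook.

```python
import pandas as pd

# Tiny synthetic frame in the same (Ticker, Date) layout; prices are made up.
index = pd.MultiIndex.from_product(
    [['AAA', 'BBB'], pd.to_datetime(['2024-01-02', '2024-01-03'])],
    names=['Ticker', 'Date'],
)
df_long = pd.DataFrame({'Adj Close': [10.0, 10.5, 20.0, 19.0]}, index=index)

# Move the ticker level into columns: rows are dates, columns are tickers.
df_wide = df_long['Adj Close'].unstack(level=0)
print(df_wide)

# Restricting the universe is then a plain column selection.
print(df_wide[['BBB']].pct_change())
```

In the wide layout, filtering to the eligible tickers for a step is a single column lookup, which is why the unstacking is done once up front rather than inside the loop.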


#### **CELL 4: BOT STRATEGY CONFIGURATION**
*This cell defines the strategy parameters you want to test.*


In [4]:
# ==============================================================================
# --- CELL 4: BOT STRATEGY CONFIGURATION ---
# ==============================================================================
from itertools import product
import pandas as pd

# --- PRIMARY USER INPUTS FOR THE STRATEGY ---
# 5,  21, 42, 63, 126, 252
# 1W, 1M, 2M, 3M,  6M,  1Y
HOLDING_PERIODS_DAYS = [63]        # Test a ~3-month holding period
CALC_PERIODS_DAYS = [252]         # Use a ~12-month lookback

bot_config = {
    # --- Time Parameters ---
    'search_start_date': '2014-01-01',
    'search_end_date': '2018-12-31',
    
    # --- Strategy Parameters (The Search Grid) ---
    'calc_periods': CALC_PERIODS_DAYS,
    'fwd_periods': HOLDING_PERIODS_DAYS,


    # 'metrics': ['Sharpe', 'Sharpe (ATR)'],
    'metrics': ['Price', 'Sharpe', 'Sharpe (ATR)'],    
        
    'rank_slices': [(1, 5)],

    # --- Data Quality ---
    'quality_thresholds': { 'min_median_dollar_volume': 10_000_000, 
                            'max_stale_pct': 0.05, 
                            'max_same_vol_count': 1 },

    # --- General Parameters ---
    'benchmark_ticker': 'VOO',
    'master_calendar_ticker': 'VOO',
    'results_output_path': env_config['results_path']
}

print("\n--- Bot Configuration Initialized ---")
print(f"Calculation Periods to Test: {bot_config['calc_periods']} trading days")
print(f"Forward and Holding Periods to Test (Forward and Holding Periods are the same): {bot_config['fwd_periods']} trading days")
print(f"Results will be saved to: {bot_config['results_output_path']}")



--- Bot Configuration Initialized ---
Calculation Periods to Test: [252] trading days
Forward and Holding Periods to Test (Forward and Holding Periods are the same): [63] trading days
Results will be saved to: .\export_csv\dev_strategy_search_results.csv
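
For reference, the size of the search grid follows directly from the four lists in `bot_config`. The sketch below copies the values from the configuration above into standalone variables (so it runs without `env_config`) and expands them the same way `run_strategy_search` does, with `itertools.product`.

```python
from itertools import product

# Values copied from the configuration above.
calc_periods = [252]
fwd_periods = [63]
metrics = ['Price', 'Sharpe', 'Sharpe (ATR)']
rank_slices = [(1, 5)]

param_combinations = list(product(calc_periods, fwd_periods, metrics, rank_slices))
for combo in param_combinations:
    print(combo)

# Grid size is the product of the list lengths: 1 * 1 * 3 * 1.
print(len(param_combinations))  # 3
```

Adding a second holding period or rank slice multiplies the count, so the grid grows quickly as lists get longer.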


#### **CELL 5: EXECUTION**
*This is the final cell that runs the backtest and displays the results.*

In [5]:
# ==============================================================================
# --- CELL 5: EXECUTION ---
# ==============================================================================

# --- Execute the Bot ---
dev_results_df = run_strategy_search(df_dev, bot_config)

# --- Display a sample of the results ---
if dev_results_df is not None:
    print("\n--- Sample of Generated Results ---")
    display(dev_results_df.head())
    print("\n--- Analysis of Best Performing Strategies ---")
    display(dev_results_df.groupby(['calc_period', 'fwd_period', 'metric', 'rank_start', 'rank_end'])['fwd_gain_delta'].mean().sort_values(ascending=False).to_frame())

--- Phase 1: Pre-processing and Loading Progress ---
--- Calculating Rolling Quality Metrics (Window: 252 days) ---
✅ Rolling metrics calculation complete.
Unstacking data for performance...
Master trading day calendar created from 'VOO' (3795 days).
Found existing results file. Loading progress from: .\export_csv\dev_strategy_search_results.csv
Found 3 completed parameter sets to skip.
✅ Pre-processing complete.

--- Phase 2: Setting up Simulation Loops ---
Pre-calculating rebalancing schedules for each holding period...
  - Holding Period 63 days: 20 rebalances
Found 3 total parameter sets to simulate.
✅ Setup complete. Starting main loop...

--- Phase 3: Running Simulations ---


Parameter Sets:   0%|          | 0/3 [00:00<?, ?it/s]

✅ Main loop finished.

--- Phase 4: Loading Final Results ---
✅ Process complete. Total execution time: 30.07 seconds.

--- Sample of Generated Results ---


| | step_date | calc_period | fwd_period | metric | rank_start | rank_end | num_universe | num_portfolio | fwd_p_gain | fwd_b_gain | fwd_gain_delta | fwd_p_sharpe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-02 | 252 | 63 | Price | 1 | 5 | 456 | 5 | 0.113882 | 0.016202 | 0.09768 | 1.930501 |
| 1 | 2014-04-03 | 252 | 63 | Price | 1 | 5 | 467 | 5 | 0.023265 | -7.6e-05 | 0.02334 | 0.571423 |
| 2 | 2014-07-03 | 252 | 63 | Price | 1 | 5 | 480 | 5 | -0.143815 | -0.052112 | -0.091702 | -1.618823 |
| 3 | 2014-10-02 | 252 | 63 | Price | 1 | 5 | 489 | 5 | -0.073551 | 0.037252 | -0.110804 | -0.939686 |
| 4 | 2015-01-02 | 252 | 63 | Price | 1 | 5 | 498 | 5 | -0.022273 | 0.021402 | -0.043676 | -0.120272 |



--- Analysis of Best Performing Strategies ---


| calc_period | fwd_period | metric | rank_start | rank_end | fwd_gain_delta |
|---|---|---|---|---|---|
| 252 | 63 | Price | 1 | 5 | 0.0437 |
| 252 | 63 | Sharpe (ATR) | 1 | 5 | 0.012087 |
| 252 | 63 | Sharpe | 1 | 5 | 0.001098 |
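
The Phase 2 log above reports 20 rebalances for the 63-day holding period. That count comes from stepping through the master calendar with `range(start_idx, end_idx, fwd_period)`, so it equals the number of trading days in the search window divided by the holding period, rounded up. The sketch below reproduces the mechanics on a synthetic weekday-only calendar; because that calendar does not remove market holidays (an assumption made to keep the example self-contained), its counts will not match the real VOO calendar exactly.

```python
import math

import numpy as np
import pandas as pd

# Synthetic calendar: weekdays only, so market holidays are NOT removed (assumption).
calendar = pd.bdate_range('2013-01-01', '2019-12-31')

search_start = pd.Timestamp('2014-01-01')
search_end = pd.Timestamp('2018-12-31')
fwd_period = 63

# First calendar day on or after the start; one past the last day on or before the end.
start_idx = calendar.searchsorted(search_start, side='left')
end_idx = calendar.searchsorted(search_end, side='right')

step_dates = calendar[np.arange(start_idx, end_idx, fwd_period)]
days_in_range = end_idx - start_idx
expected_steps = math.ceil(days_in_range / fwd_period)

print(f"{days_in_range} days in range -> {len(step_dates)} rebalances")
print(step_dates[:3])
```

Note that the last step can start fewer than `fwd_period` days before the end of the search window, so its forward period may extend past `search_end_date` when later data is available.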


### 4. Next Steps & Future Improvements

This system is a powerful foundation. Here are potential areas for future development:
1.  **Advanced Performance Analytics:** Create a new notebook or function to analyze the output CSV, calculating metrics like Max Drawdown, Calmar Ratio, and generating equity curves for the best strategies.
2.  **Visualization:** Build heatmaps and other plots to visualize how different parameters (e.g., `calc_period` vs. `fwd_period`) affect performance.
3.  **Realism:** Incorporate transaction costs and slippage into the performance calculations for a more realistic backtest.
4.  **Configuration Management:** For even more complex tests, move the `bot_config` dictionary into a separate `config.py` file to keep the notebook cleaner.
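
As a starting point for item 1, the per-period `fwd_p_gain` values of a single parameter set can be chained into an equity curve and summarized. This is a minimal sketch under stated assumptions: the function name `summarize_period_gains` is hypothetical, the gains are made up, and drawdown is measured only at rebalance dates, so it will understate any drawdown that occurs inside a holding period.

```python
import pandas as pd


def summarize_period_gains(period_gains, fwd_period, trading_days_per_year=252):
    """Chain per-period gains into an equity curve and summarize it."""
    equity = (1 + period_gains).cumprod()
    # Drawdown relative to the running peak; the initial capital (1.0) counts as a peak.
    running_peak = equity.cummax().clip(lower=1.0)
    drawdown = equity / running_peak - 1
    max_drawdown = drawdown.min()

    years = len(period_gains) * fwd_period / trading_days_per_year
    annualized_return = equity.iloc[-1] ** (1 / years) - 1
    # Calmar ratio: annualized return divided by the size of the worst drawdown.
    calmar = annualized_return / abs(max_drawdown) if max_drawdown < 0 else float('nan')
    return {
        'final_equity': equity.iloc[-1],
        'max_drawdown': max_drawdown,
        'annualized_return': annualized_return,
        'calmar': calmar,
    }


# Made-up gains for four consecutive 63-day holding periods (one year in total).
gains = pd.Series([0.10, -0.20, 0.05, 0.10])
summary = summarize_period_gains(gains, fwd_period=63)
print(summary)
```

With the results CSV, the same function could be applied per group, for example after sorting by `step_date` and grouping by `calc_period`, `fwd_period`, `metric`, `rank_start`, and `rank_end`.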

It has been a genuine pleasure working with you on this. You've built an impressive and professional-grade tool. I wish you the very best with your continued research and development.

### 5. Plot an `export_csv` Row to Check


In [6]:
row_to_check = 59
row_values = dev_results_df.loc[row_to_check].to_dict()
print(f'export_csv values for row {row_to_check}:\n')
for k, v in row_values.items():
    print(f'{k:<15}: {v}')


export_csv values for row 59:

step_date      : 2018-10-03
calc_period    : 252
fwd_period     : 63
metric         : Sharpe (ATR)
rank_start     : 1
rank_end       : 5
num_universe   : 618
num_portfolio  : 5
fwd_p_gain     : 0.0018278216675415
fwd_b_gain     : 0.1051186817286091
fwd_gain_delta : -0.1032908600610675
fwd_p_sharpe   : 0.8013969914777572


In [7]:
results_container, debug_container = plot_walk_forward_analyzer(
    df_ohlcv=df_dev,
    # df_ohlcv=df_OHLCV,    
    default_start_date=row_values['step_date'],
    
    default_calc_period=row_values['calc_period'],
    default_fwd_period=row_values['fwd_period'],
    # default_calc_period=120,
    # default_fwd_period=30,

    default_metric=row_values['metric'],

    default_rank_start=row_values['rank_start'],
    default_rank_end=row_values['rank_end'],
    # default_rank_start=2,
    # default_rank_end=3,    

    default_benchmark_ticker='VOO',
    quality_thresholds=bot_config['quality_thresholds'],
    debug=True  # <-- Activate the new mode!
)

Initializing Walk-Forward Analyzer (using Trading Day Logic)...
Master trading day calendar created from 'VOO' (3795 days).
Pre-calculating data quality metrics...
--- Calculating Rolling Quality Metrics (Window: 252 days) ---
✅ Rolling metrics calculation complete.
Pre-processing data (unstacking)...


[Interactive output omitted: the widget control panel (Start Date preset to 2018-10-03) and the Plotly `FigureWidget`, whose text representation is truncated in this export.]

Dynamic Filter (2018-10-03): Kept 618 of 722 tickers.


In [8]:
print_nested(results_container)

tickers_to_display:
    BIL
    MINT
    SHV
    BNDX
    VCSH
normalized_plot_data:
    Ticker           BIL      MINT       SHV      BNDX      VCSH
Date                                                        
2018-10-03  1.000000  1.000000  1.000000  1.000000  1.000000
2018-10-04  1.000218  1.000000  1.000181  0.998344  0.998717
2018-10-05  1.000327  1.000197  1.000181  0.996687  0.998588
2018-10-08  1.000327  1.000099  1.000272  0.997424  0.998204
2018-10-09  1.000218  1.000099  1.000272  0.997240  0.997947
...              ...       ...       ...       ...       ...
2019-12-30  1.025652  1.035087  1.028971  1.101826  1.078113
2019-12-31  1.025652  1.035290  1.029159  1.101437  1.077981
2020-01-02  1.025989  1.035392  1.029345  1.104356  1.077981
2020-01-03  1.025989  1.035698  1.029345  1.107471  1.079445
2020-01-06  1.025877  1.035902  1.029252  1.106693  1.080243

[316 rows x 5 columns]
portfolio_series:
    Date
2018-10-03    1.000000
2018-10-04    0.999492
2018-10-05    0.99919

### Check: `Metric_Sharpe (ATR)` for the Calc Period Shows a Discrepancy

Inputs:
- Start date: 2018-10-03
- Calc period: 252
- Fwd period: 63
- Metric: Sharpe (ATR)
- Rank start: 1
- Rank end: 5

Observations:
- Analysis period: 2018-10-03 to 2020-01-06 (this is the full period, calc + fwd)
- Selected tickers: [BIL, MINT, SHV, BNDX, VCSH]
- `Metric_Sharpe (ATR)` for BIL, MINT, SHV, and BNDX matched my own calculation in the bottom cell
- `Metric_Sharpe (ATR)` for VCSH is slightly off (code: 0.194195, my calculation: 0.196112)
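
One thing worth checking when two Sharpe (ATR) calculations disagree slightly is where the ATR series starts. The ATR here is a recursive average (`ewm(alpha=1/14, adjust=False)`), and the first True Range has no previous close, so computing it on the calc-period slice alone gives different values than computing it on a longer history and slicing afterwards. Whether that explains the VCSH difference is untested; the sketch below only demonstrates the effect on made-up prices. The function name `sharpe_atr` and its `window_start` parameter are hypothetical, and the formula follows `verify_ticker_ranking_metrics` above.

```python
import pandas as pd


def sharpe_atr(df, window_start=0, atr_span=14):
    """Mean daily return / mean ATRP over rows from window_start onward."""
    prev_close = df['Adj Close'].shift(1)
    true_range = pd.concat([
        df['Adj High'] - df['Adj Low'],
        (df['Adj High'] - prev_close).abs(),
        (df['Adj Low'] - prev_close).abs(),
    ], axis=1).max(axis=1)  # on the first row only high - low is available
    atr = true_range.ewm(alpha=1 / atr_span, adjust=False).mean()
    atrp = (atr / df['Adj Close']).iloc[window_start:]
    mean_return = df['Adj Close'].iloc[window_start:].pct_change().mean()
    return mean_return / atrp.mean() if atrp.mean() > 0 else 0.0


# Made-up prices; row 0 stands in for the day before the calc period.
prices = pd.DataFrame({
    'Adj High':  [10.2, 10.6, 10.9, 10.8, 11.2],
    'Adj Low':   [9.8, 10.1, 10.4, 10.3, 10.7],
    'Adj Close': [10.0, 10.5, 10.6, 10.4, 11.0],
})

# Same window (rows 1-4), same returns; only the ATR seeding differs.
sliced_first = sharpe_atr(prices.iloc[1:])            # ATR starts inside the window
with_history = sharpe_atr(prices, window_start=1)     # ATR starts one day earlier

print(f"sliced first: {sliced_first:.6f}")
print(f"with history: {with_history:.6f}")
```

On real data with a 252-day window the gap from seeding alone is usually small, so other differences (missing rows, the window's end date, or rounding in the manual calculation) are also worth ruling out.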

In [9]:
print_nested(debug_container)

ranking_metrics:
            FirstPrice  LastPrice  MeanDailyReturn  StdDevDailyReturn  MeanATRP  Metric_Price  Metric_Sharpe  Metric_Sharpe (ATR)
Ticker                                                                                                                       
BIL        76.9840    78.6823         0.000087           0.000119  0.000168      1.022060      11.548412             0.514743
MINT       82.6576    85.0710         0.000114           0.000152  0.000254      1.029198      11.955059             0.449219
SHV        92.4018    94.7312         0.000099           0.000149  0.000391      1.025209      10.526918             0.252854
BNDX       43.4704    48.6039         0.000444           0.001524  0.001865      1.118092       4.626560             0.238181
VCSH       63.7600    68.4641         0.000283           0.000990  0.001457      1.073778       4.537292             0.194195
...            ...        ...              ...                ...       ...           ...        

In [10]:
results_dict = results_container[0]

print_nested(results_dict)


tickers_to_display:
    BIL
    MINT
    SHV
    BNDX
    VCSH
normalized_plot_data:
    Ticker           BIL      MINT       SHV      BNDX      VCSH
Date                                                        
2018-10-03  1.000000  1.000000  1.000000  1.000000  1.000000
2018-10-04  1.000218  1.000000  1.000181  0.998344  0.998717
2018-10-05  1.000327  1.000197  1.000181  0.996687  0.998588
2018-10-08  1.000327  1.000099  1.000272  0.997424  0.998204
2018-10-09  1.000218  1.000099  1.000272  0.997240  0.997947
...              ...       ...       ...       ...       ...
2019-12-30  1.025652  1.035087  1.028971  1.101826  1.078113
2019-12-31  1.025652  1.035290  1.029159  1.101437  1.077981
2020-01-02  1.025989  1.035392  1.029345  1.104356  1.077981
2020-01-03  1.025989  1.035698  1.029345  1.107471  1.079445
2020-01-06  1.025877  1.035902  1.029252  1.106693  1.080243

[316 rows x 5 columns]
portfolio_series:
    Date
2018-10-03    1.000000
2018-10-04    0.999492
2018-10-05    0.99919

### Get Plot Parameters

In [11]:
_tickers = results_dict['tickers_to_display']
_calc_start = results_dict['calc_period_start']
_calc_end = results_dict['calc_period_end']
_fwd_start = results_dict['forward_period_start']
_fwd_end = results_dict['forward_period_end']
_calc_period = results_dict['calc_period']
_fwd_period = results_dict['fwd_period']
_benchmark_ticker = results_dict['benchmark_ticker']

print(f'_tickers: {_tickers}')
print(f'_calc_start: {_calc_start}')
print(f'_calc_end: {_calc_end}')
print(f'_fwd_start: {_fwd_start}')
print(f'_fwd_end: {_fwd_end}')
print(f'_calc_period: {_calc_period}')
print(f'_fwd_period: {_fwd_period}')
print(f'_benchmark: {_benchmark_ticker}')

_tickers: ['BIL', 'MINT', 'SHV', 'BNDX', 'VCSH']
_calc_start: 2018-10-03 00:00:00
_calc_end: 2019-10-04 00:00:00
_fwd_start: 2019-10-04 00:00:00
_fwd_end: 2020-01-06 00:00:00
_calc_period: 252
_fwd_period: 63
_benchmark: VOO


### Run `verify_ticker_ranking_metrics` with Plot Parameters to Check the Calc. Period Calculation

In [12]:
for _ticker in _tickers:
    verify_ticker_ranking_metrics(df_OHLCV, 
                                  ticker=_ticker, 
                                  start_date=_calc_start, 
                                  calc_period=_calc_period,
                                  master_calendar_ticker=_benchmark_ticker, 
                                  export_csv=True)

## Verification Report for Ticker Ranking: `BIL`

### A. Calculation Period (for Ranking Metrics)

**Period Start:** `2018-10-03`
**Period End:** `2019-10-04`
**Total Trading Days:** `253` (Requested: `252`)

#### Detailed Metric Calculation Data

--- Start of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2018-10-03,76.984,76.9756,76.984,,0.0084,0.0084,0.000109
2018-10-04,77.0008,76.9924,77.0008,0.000218,0.0168,0.009,0.000117
2018-10-05,77.0092,77.0008,77.0092,0.000109,0.0084,0.008957,0.000116
2018-10-08,77.0092,77.0008,77.0092,0.0,0.0084,0.008917,0.000116
2018-10-09,77.0092,77.0008,77.0008,-0.000109,0.0084,0.00888,0.000115



--- End of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2019-09-30,78.6591,78.6505,78.6591,0.000109,0.0086,0.014223,0.000181
2019-10-01,78.6651,78.6565,78.6565,-3.3e-05,0.0086,0.013821,0.000176
2019-10-02,78.6651,78.6565,78.6565,0.0,0.0086,0.013448,0.000171
2019-10-03,78.6823,78.6737,78.6823,0.000328,0.0258,0.01433,0.000182
2019-10-04,78.6823,78.6737,78.6823,0.0,0.0086,0.013921,0.000177



✅ Data exported to: export_csv\verify_ticker_2018-10-03_BIL.csv


#### `MetricValue` Verification Summary:

1. Price Metric: (Last Price / First Price) = (78.68 / 76.98) = 1.0221
2. Sharpe Metric: (Mean Daily Return / Std Dev) * sqrt(252) = 11.5484
3. Sharpe (ATR) Metric: (Mean Daily Return / Mean ATRP) = (0.000087 / 0.000168) = 0.5147


## Verification Report for Ticker Ranking: `MINT`

### A. Calculation Period (for Ranking Metrics)

**Period Start:** `2018-10-03`
**Period End:** `2019-10-04`
**Total Trading Days:** `253` (Requested: `252`)

#### Detailed Metric Calculation Data

--- Start of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2018-10-03,82.6658,82.6413,82.6576,,0.0245,0.0245,0.000296
2018-10-04,82.6576,82.6495,82.6576,0.0,0.0081,0.023329,0.000282
2018-10-05,82.6739,82.6576,82.6739,0.000197,0.0163,0.022827,0.000276
2018-10-08,82.6902,82.6577,82.6658,-9.8e-05,0.0325,0.023517,0.000284
2018-10-09,82.6821,82.6658,82.6658,0.0,0.0163,0.023002,0.000278



--- End of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2019-09-30,85.0207,85.004,85.004,-9.9e-05,0.0167,0.017788,0.000209
2019-10-01,85.0459,85.0291,85.0459,0.000493,0.0419,0.019511,0.000229
2019-10-02,85.0543,85.0459,85.0543,9.9e-05,0.0084,0.018717,0.00022
2019-10-03,85.071,85.0459,85.071,0.000196,0.0251,0.019173,0.000225
2019-10-04,85.071,85.0627,85.071,0.0,0.0083,0.018396,0.000216



✅ Data exported to: export_csv\verify_ticker_2018-10-03_MINT.csv


#### `MetricValue` Verification Summary:

1. Price Metric: (Last Price / First Price) = (85.07 / 82.66) = 1.0292
2. Sharpe Metric: (Mean Daily Return / Std Dev) * sqrt(252) = 11.9551
3. Sharpe (ATR) Metric: (Mean Daily Return / Mean ATRP) = (0.000114 / 0.000254) = 0.4492


## Verification Report for Ticker Ranking: `SHV`

### A. Calculation Period (for Ranking Metrics)

**Period Start:** `2018-10-03`
**Period End:** `2019-10-04`
**Total Trading Days:** `253` (Requested: `252`)

#### Detailed Metric Calculation Data

--- Start of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2018-10-03,92.4018,92.3766,92.4018,,0.0252,0.0252,0.000273
2018-10-04,92.4185,92.3934,92.4185,0.000181,0.0251,0.025193,0.000273
2018-10-05,92.4269,92.4018,92.4185,0.0,0.0251,0.025186,0.000273
2018-10-08,92.4269,92.4018,92.4269,9.1e-05,0.0251,0.02518,0.000272
2018-10-09,92.4269,92.4018,92.4269,0.0,0.0251,0.025174,0.000272



--- End of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2019-09-30,94.6163,94.6078,94.6163,0.00018,0.017,0.058612,0.000619
2019-10-01,94.6369,94.6112,94.6283,0.000127,0.0257,0.056261,0.000595
2019-10-02,94.6712,94.6541,94.6626,0.000362,0.0429,0.055307,0.000584
2019-10-03,94.714,94.6883,94.6969,0.000362,0.0514,0.055028,0.000581
2019-10-04,94.7312,94.7055,94.7312,0.000362,0.0343,0.053547,0.000565



✅ Data exported to: export_csv\verify_ticker_2018-10-03_SHV.csv


#### `MetricValue` Verification Summary:

1. Price Metric: (Last Price / First Price) = (94.73 / 92.40) = 1.0252
2. Sharpe Metric: (Mean Daily Return / Std Dev) * sqrt(252) = 10.5269
3. Sharpe (ATR) Metric: (Mean Daily Return / Mean ATRP) = (0.000099 / 0.000391) = 0.2529


## Verification Report for Ticker Ranking: `BNDX`

### A. Calculation Period (for Ranking Metrics)

**Period Start:** `2018-10-03`
**Period End:** `2019-10-04`
**Total Trading Days:** `253` (Requested: `252`)

#### Detailed Metric Calculation Data

--- Start of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2018-10-03,43.5905,43.4704,43.4704,,0.1201,0.1201,0.002763
2018-10-04,43.4704,43.3824,43.3984,-0.001656,0.088,0.117807,0.002715
2018-10-05,43.3904,43.3023,43.3264,-0.001659,0.0961,0.116257,0.002683
2018-10-08,43.3824,43.3344,43.3584,0.000739,0.056,0.111953,0.002582
2018-10-09,43.3584,43.3264,43.3504,-0.000185,0.032,0.106242,0.002451



--- End of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2019-09-30,48.5114,48.4702,48.495,-0.000849,0.066,0.128878,0.002658
2019-10-01,48.4801,48.3398,48.4141,-0.001668,0.1552,0.130758,0.002701
2019-10-02,48.4471,48.3976,48.4141,0.0,0.0495,0.124954,0.002581
2019-10-03,48.6286,48.5213,48.5956,0.003749,0.2145,0.13135,0.002703
2019-10-04,48.6204,48.5709,48.6039,0.000171,0.0495,0.125503,0.002582



✅ Data exported to: export_csv\verify_ticker_2018-10-03_BNDX.csv


#### `MetricValue` Verification Summary:

1. Price Metric: (Last Price / First Price) = (48.60 / 43.47) = 1.1181
2. Sharpe Metric: (Mean Daily Return / Std Dev) * sqrt(252) = 4.6266
3. Sharpe (ATR) Metric: (Mean Daily Return / Mean ATRP) = (0.000444 / 0.001865) = 0.2382


## Verification Report for Ticker Ranking: `VCSH`

### A. Calculation Period (for Ranking Metrics)

**Period Start:** `2018-10-03`
**Period End:** `2019-10-04`
**Total Trading Days:** `253` (Requested: `252`)

#### Detailed Metric Calculation Data

--- Start of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2018-10-03,63.8172,63.7027,63.76,,0.1145,0.1145,0.001796
2018-10-04,63.7355,63.6373,63.6782,-0.001283,0.1227,0.115086,0.001807
2018-10-05,63.6782,63.6045,63.67,-0.000129,0.0737,0.11213,0.001761
2018-10-08,63.6782,63.6455,63.6455,-0.000385,0.0327,0.106456,0.001673
2018-10-09,63.67,63.6291,63.6291,-0.000258,0.0409,0.101773,0.001599



--- End of Calculation Period ---


Date,Adj High,Adj Low,Adj Close,Daily_Return,TR,ATR_14,ATRP
2019-09-30,68.1113,68.0188,68.0945,0.000247,0.0925,0.106139,0.001559
2019-10-01,68.2452,68.0515,68.1778,0.001223,0.1937,0.112393,0.001649
2019-10-02,68.3126,68.2115,68.2957,0.001729,0.1348,0.113994,0.001669
2019-10-03,68.4726,68.3294,68.4389,0.002097,0.1769,0.118487,0.001731
2019-10-04,68.4894,68.4052,68.4641,0.000368,0.0842,0.116038,0.001695



✅ Data exported to: export_csv\verify_ticker_2018-10-03_VCSH.csv


#### `MetricValue` Verification Summary:

1. Price Metric: (Last Price / First Price) = (68.46 / 63.76) = 1.0738
2. Sharpe Metric: (Mean Daily Return / Std Dev) * sqrt(252) = 4.5373
3. Sharpe (ATR) Metric: (Mean Daily Return / Mean ATRP) = (0.000283 / 0.001443) = 0.1961


### My Own Check on Calculation for One Ticker

In [15]:
# -- Check Calculation for a Ticker -- #
_check_ticker = _tickers[4]
print(f'Check calculation for this ticker: {_check_ticker}')

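# -- Get Ticker's OHLCV Data -- #
# .copy() avoids SettingWithCopyWarning when columns are added below.
# Note: slicing here means the first row has no previous close for the TR calculation.
_df = df_OHLCV.loc[_check_ticker][_calc_start:_fwd_end].copy()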

# -- Calculate Daily Return -- #
_df['Daily_Return'] = _df['Adj Close'].pct_change()

# -- Calculate True Range -- #
_df['TR'] = pd.concat([
    _df['Adj High'] - _df['Adj Low'],
    (_df['Adj High'] - _df['Adj Close'].shift(1)).abs(),
    (_df['Adj Low']  - _df['Adj Close'].shift(1)).abs()
], axis=1).max(axis=1)

# -- Calculate Average True Range (14 day period) -- #
window = 14
_df['ATR_14'] = pd.NA

# Seed the very first ATR value with the first non-NaN TR
first_idx = _df['TR'].first_valid_index()
_df.loc[first_idx, 'ATR_14'] = _df.loc[first_idx, 'TR']

# Iteratively apply the Wilder smoothing formula
for i in range(_df.index.get_loc(first_idx) + 1, len(_df)):
    prev_atr = _df.iloc[i-1]['ATR_14']
    curr_tr  = _df.iloc[i]['TR']
    _df.iloc[i, _df.columns.get_loc('ATR_14')] = (prev_atr * (window - 1) + curr_tr) / window

# -- Calculate ATRP -- #
_df['ATRP'] = _df['ATR_14'] / _df['Adj Close']

calc_pd_df = _df.loc[_calc_start:_calc_end]
fwd_pd_df = _df.loc[_fwd_start:_fwd_end]
print(f'Calc. Period:\n{calc_pd_df.head()}\n{calc_pd_df.tail()}')
print(f'\nFwd. Period:\n{fwd_pd_df.head()}\n{fwd_pd_df.tail()}')

Check calculation for this ticker: VCSH
Calc. Period:
            Adj Open  Adj High  Adj Low  Adj Close   Volume  Daily_Return      TR    ATR_14      ATRP
Date                                                                                                 
2018-10-03   63.8009   63.8172  63.7027    63.7600  2626168           NaN  0.1145    0.1145  0.001796
2018-10-04   63.6864   63.7355  63.6373    63.6782  3898111     -0.001283  0.1227  0.115086  0.001807
2018-10-05   63.6454   63.6782  63.6045    63.6700  3231821     -0.000129  0.0737   0.11213  0.001761
2018-10-08   63.6618   63.6782  63.6455    63.6455  1180498     -0.000385  0.0327  0.106456  0.001673
2018-10-09   63.6455   63.6700  63.6291    63.6291  1738715     -0.000258  0.0409  0.101773  0.001599
            Adj Open  Adj High  Adj Low  Adj Close   Volume  Daily_Return      TR    ATR_14      ATRP
Date                                                                                                 
2019-09-30   68.0273   68.11

In [16]:
print(f'Metric Calculation for Ticker: {_check_ticker}')

# -- Calculation for Period Gain -- #
full_period_gain = _df['Adj Close'][_fwd_end] / _df['Adj Close'][_calc_start]
calc_period_gain = _df['Adj Close'][_calc_end] / _df['Adj Close'][_calc_start]
fwd_period_gain = _df['Adj Close'][_fwd_end] / _df['Adj Close'][_fwd_start]

print(f'\nfull_period_gain: {full_period_gain:.4f}')
print(f'calc_period_gain: {calc_period_gain:.4f}')
print(f'fwd_period_gain: {fwd_period_gain:.4f}')

# -- Calculation for Period Sharpe -- #
full_period_return = _df['Daily_Return'][_calc_start:_fwd_end]
calc_period_return = _df['Daily_Return'][_calc_start:_calc_end]
fwd_period_return = _df['Daily_Return'][_fwd_start:_fwd_end]

full_sharpe = full_period_return.mean() / full_period_return.std() * (252 ** 0.5)
calc_sharpe = calc_period_return.mean() / calc_period_return.std() * (252 ** 0.5)
fwd_sharpe = fwd_period_return.mean() / fwd_period_return.std() * (252 ** 0.5)

print(f'\nfull_sharpe: {full_sharpe:.4f}')
print(f'calc_sharpe: {calc_sharpe:.4f}')
print(f'fwd_sharpe: {fwd_sharpe:.4f}')

# -- Calculation for Period Sharpe ATR -- #
full_sharpe_ATR = full_period_return.mean() / _df['ATRP'][_calc_start:_fwd_end].mean()
calc_sharpe_ATR = calc_period_return.mean() / _df['ATRP'][_calc_start:_calc_end].mean()
fwd_sharpe_ATR = fwd_period_return.mean() / _df['ATRP'][_fwd_start:_fwd_end].mean()

print(f'\nfull_sharpe_ATR: {full_sharpe_ATR:.4f}')
print(f'calc_sharpe_ATR: {calc_sharpe_ATR:.4f}')
print(f'fwd_sharpe_ATR: {fwd_sharpe_ATR:.4f}')

print(f'\ncalc_period_return.mean(): {calc_period_return.mean()}')
print(f"_df['ATRP'][_calc_start:_calc_end].mean(): {_df['ATRP'][_calc_start:_calc_end].mean()}")
print(f'calc_sharpe_ATR: {calc_sharpe_ATR}')

Metric Calculation for Ticker: VCSH

full_period_gain: 1.0802
calc_period_gain: 1.0738
fwd_period_gain: 1.0060

full_sharpe: 3.9711
calc_sharpe: 4.5373
fwd_sharpe: 1.7022

full_sharpe_ATR: 0.1710
calc_sharpe_ATR: 0.1961
fwd_sharpe_ATR: 0.0708

calc_period_return.mean(): 0.00028300214345682316
_df['ATRP'][_calc_start:_calc_end].mean(): 0.001443061544662687
calc_sharpe_ATR: 0.1961123172490709


In [43]:
df = df_OHLCV.loc['VCSH'].copy()
df


Date,Adj Open,Adj High,Adj Low,Adj Close,Volume
2009-11-23,50.5753,50.6359,50.5753,50.6359,43548
2009-11-24,50.7166,50.8310,50.6628,50.8310,60938
2009-11-25,50.8310,50.9185,50.8108,50.9185,29428
2009-11-27,50.9319,50.9319,50.8781,50.9252,9363
2009-11-30,50.8579,50.9252,50.8512,50.9252,24523
...,...,...,...,...,...
2025-10-03,79.8100,79.8400,79.7500,79.7600,3322900
2025-10-06,79.7400,79.7700,79.7000,79.7000,4212300
2025-10-07,79.7400,79.8000,79.7200,79.7800,4040700
2025-10-08,79.8300,79.8300,79.7400,79.7500,4000000


In [29]:
_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3994 entries, 2009-11-23 to 2025-10-09
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Adj Open      3994 non-null   float64
 1   Adj High      3994 non-null   float64
 2   Adj Low       3994 non-null   float64
 3   Adj Close     3994 non-null   float64
 4   Volume        3994 non-null   int64  
 5   Daily_Return  3993 non-null   float64
 6   TR            3994 non-null   float64
 7   ATR_14        3994 non-null   object 
 8   ATRP          3994 non-null   object 
dtypes: float64(6), int64(1), object(2)
memory usage: 441.1+ KB


In [35]:
# -- Check Calculation for a Ticker -- #
_check_ticker = _tickers[4]
print(f'Check calculation for this ticker: {_check_ticker}')

# -- Get Ticker's OHLCV Data -- #
# .copy() avoids SettingWithCopyWarning when columns are added below.
_df = df_OHLCV.loc[_check_ticker]['2018-09-30' : '2020-01-10'].copy()
# _df = df_OHLCV.loc[_check_ticker].copy()

# -- Calculate Daily Return -- #
_df['Daily_Return'] = _df['Adj Close'].pct_change()

# -- Calculate True Range -- #
_df['TR'] = pd.concat([
    _df['Adj High'] - _df['Adj Low'],
    (_df['Adj High'] - _df['Adj Close'].shift(1)).abs(),
    (_df['Adj Low']  - _df['Adj Close'].shift(1)).abs()
], axis=1).max(axis=1)

# -- Calculate Average True Range (14 day period) -- #
window = 14
_df['ATR_14'] = pd.NA

# Seed the very first ATR value with the first non-NaN TR
first_idx = _df['TR'].first_valid_index()
_df.loc[first_idx, 'ATR_14'] = _df.loc[first_idx, 'TR']

# Iteratively apply the Wilder smoothing formula
for i in range(_df.index.get_loc(first_idx) + 1, len(_df)):
    prev_atr = _df.iloc[i-1]['ATR_14']
    curr_tr  = _df.iloc[i]['TR']
    _df.iloc[i, _df.columns.get_loc('ATR_14')] = (prev_atr * (window - 1) + curr_tr) / window

# -- Calculate ATRP -- #
_df['ATRP'] = _df['ATR_14'] / _df['Adj Close']

# calc_pd_df = _df.loc[_calc_start:_calc_end]
# fwd_pd_df = _df.loc[_fwd_start:_fwd_end]
# print(f'Calc. Period:\n{calc_pd_df.head()}\n{calc_pd_df.tail()}')
# print(f'\nFwd. Period:\n{fwd_pd_df.head()}\n{fwd_pd_df.tail()}')

_df

Check calculation for this ticker: VCSH


Date,Adj Open,Adj High,Adj Low,Adj Close,Volume,Daily_Return,TR,ATR_14,ATRP
2018-10-01,63.8009,63.8172,63.7682,63.8172,1761088,,0.0490,0.049,0.000768
2018-10-02,63.8172,63.8418,63.7845,63.8336,4256075,0.000257,0.0573,0.049593,0.000777
2018-10-03,63.8009,63.8172,63.7027,63.7600,2626168,-0.001153,0.1309,0.055401,0.000869
2018-10-04,63.6864,63.7355,63.6373,63.6782,3898111,-0.001283,0.1227,0.060208,0.000945
2018-10-05,63.6454,63.6782,63.6045,63.6700,3231821,-0.000129,0.0737,0.061171,0.000961
...,...,...,...,...,...,...,...,...,...
2020-01-06,68.8593,68.8763,68.7575,68.8763,2396048,0.000740,0.1188,0.083505,0.001212
2020-01-07,68.8423,68.8423,68.7914,68.7999,1288683,-0.001109,0.0849,0.083605,0.001215
2020-01-08,68.8254,68.8423,68.5370,68.7575,1639178,-0.000616,0.3053,0.09944,0.001446
2020-01-09,68.7405,68.8169,68.7405,68.7999,1966802,0.000617,0.0764,0.097795,0.001421


In [38]:
df_OHLCV.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 4423522 entries, ('A', Timestamp('1999-11-18 00:00:00')) to ('ZWS', Timestamp('2025-10-09 00:00:00'))
Data columns (total 5 columns):
 #   Column     Dtype  
---  ------     -----  
 0   Adj Open   float64
 1   Adj High   float64
 2   Adj Low    float64
 3   Adj Close  float64
 4   Volume     int64  
dtypes: float64(4), int64(1)
memory usage: 186.3+ MB


### The Cells Below Follow the Code's Method to Calculate Sharpe (ATR)

In [40]:
df_ohlcv = df_OHLCV.copy()

In [94]:
ticker_to_check = 'VCSH' # <--- CHANGE THIS TO YOUR ACTUAL TICKER
start_date_raw = pd.to_datetime('2018-10-03')
calc_period_days = 252

# We need the master trading day calendar, just like the code uses.
# It's essential for getting the dates exactly right.
master_calendar_ticker = 'VOO' # Or another reliable ticker like 'SPY'
master_trading_days = df_ohlcv.loc[master_calendar_ticker].index.unique().sort_values()

# Isolate the full history for the ticker we're checking
df_ticker_full = df_ohlcv.loc[ticker_to_check]

In [95]:
df_ticker_full

Date,Adj Open,Adj High,Adj Low,Adj Close,Volume
2009-11-23,50.5753,50.6359,50.5753,50.6359,43548
2009-11-24,50.7166,50.8310,50.6628,50.8310,60938
2009-11-25,50.8310,50.9185,50.8108,50.9185,29428
2009-11-27,50.9319,50.9319,50.8781,50.9252,9363
2009-11-30,50.8579,50.9252,50.8512,50.9252,24523
...,...,...,...,...,...
2025-10-03,79.8100,79.8400,79.7500,79.7600,3322900
2025-10-06,79.7400,79.7700,79.7000,79.7000,4212300
2025-10-07,79.7400,79.8000,79.7200,79.7800,4040700
2025-10-08,79.8300,79.8300,79.7400,79.7500,4000000


In [96]:
# Find the index for our start date
start_idx = master_trading_days.searchsorted(start_date_raw)
actual_start_date = master_trading_days[start_idx]

# Find the index for the end of the calculation period
calc_end_idx = min(start_idx + calc_period_days, len(master_trading_days) - 1)
actual_calc_end_date = master_trading_days[calc_end_idx]

print(f"Raw Start Date: {start_date_raw.date()}")
print(f"Actual Start Date (Trading Day): {actual_start_date.date()}")
print(f"Actual Calc End Date (Trading Day): {actual_calc_end_date.date()}")

Raw Start Date: 2018-10-03
Actual Start Date (Trading Day): 2018-10-03
Actual Calc End Date (Trading Day): 2019-10-04
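
The `253` rows reported against a requested `252` follow from this indexing: `start_idx + calc_period_days` points 252 positions past the start, and label-based `.loc` slicing includes both endpoints. A minimal sketch on a synthetic business-day calendar (a hypothetical stand-in for the VOO index, with made-up prices):

```python
import pandas as pd

# Hypothetical calendar: 300 business days standing in for the VOO trading days.
calendar = pd.bdate_range("2018-10-03", periods=300)
prices = pd.Series(range(100, 400), index=calendar, dtype="float64")

calc_period_days = 252
start_idx = calendar.searchsorted(pd.Timestamp("2018-10-03"))
end_idx = min(start_idx + calc_period_days, len(calendar) - 1)

# .loc slicing is inclusive on both ends, so the window holds 252 + 1 rows.
window = prices.loc[calendar[start_idx]:calendar[end_idx]]
n_rows = len(window)

# The first pct_change value is NaN, leaving one return per requested day.
n_returns = int(window.pct_change().notna().sum())
print(n_rows, n_returns)
```

Under this reading, 253 prices yield 252 daily returns, which may be why the period is described as 252 days while the slice holds 253 rows.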


In [97]:
# Slice the ticker's data to the exact date range
calc_df = df_ticker_full.loc[actual_start_date:actual_calc_end_date].copy()

print(f"Number of rows in calc_df: {len(calc_df)}")
print("--- First 5 rows of our calculation data ---")
display(calc_df.head())

Number of rows in calc_df: 253
--- First 5 rows of our calculation data ---


Date,Adj Open,Adj High,Adj Low,Adj Close,Volume
2018-10-03,63.8009,63.8172,63.7027,63.76,2626168
2018-10-04,63.6864,63.7355,63.6373,63.6782,3898111
2018-10-05,63.6454,63.6782,63.6045,63.67,3231821
2018-10-08,63.6618,63.6782,63.6455,63.6455,1180498
2018-10-09,63.6455,63.67,63.6291,63.6291,1738715


In [98]:
calc_df['Daily_Return'] = calc_df['Adj Close'].pct_change()
mean_daily_return = calc_df['Daily_Return'].mean()

print(f"Mean Daily Return: {mean_daily_return:.8f}")

Mean Daily Return: 0.00028300


In [99]:
# 4a. Get the previous day's close for every day in our calc period.
# This is done by shifting the FULL history, then slicing.
prev_close_series = df_ticker_full['Adj Close'].shift(1).loc[calc_df.index]

# 4b. MY HYPOTHESIS: The first value of this series is NaN. Let's check.
print("--- Previous Day's Close (first 3 days) ---")
print(prev_close_series.head(3))
print("\n")

# 4c. Now calculate the three components of TR
component1 = calc_df['Adj High'] - calc_df['Adj Low']
component2 = abs(calc_df['Adj High'] - prev_close_series)
component3 = abs(calc_df['Adj Low'] - prev_close_series)

# 4d. Combine them to get the daily TR value
tr_df = pd.DataFrame({'c1': component1, 'c2': component2, 'c3': component3})
calc_df['TR'] = tr_df.max(axis=1)

# # Change TR to High - Low for the row 0
# calc_df.loc[calc_df.index[0], 'TR'] = (
#     calc_df.loc[calc_df.index[0], 'Adj High'] -
#     calc_df.loc[calc_df.index[0], 'Adj Low']
# )

print("--- Final TR values (first 3 days) ---")
print(calc_df[['Adj Close', 'TR']].head(3))

--- Previous Day's Close (first 3 days) ---
Date
2018-10-03    63.8336
2018-10-04    63.7600
2018-10-05    63.6782
Name: Adj Close, dtype: float64


--- Final TR values (first 3 days) ---
            Adj Close      TR
Date                         
2018-10-03    63.7600  0.1309
2018-10-04    63.6782  0.1227
2018-10-05    63.6700  0.0737
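
The first-day TR here is `0.1309`, while the earlier manual check produced `0.1145`. A minimal sketch, using three VCSH rows copied from the tables in this notebook, suggests the difference comes from whether `shift(1)` runs before or after the date slice. The helper name `true_range` is hypothetical and is not part of the project code.

```python
import pandas as pd

# Three VCSH rows copied from the tables in this notebook.
df = pd.DataFrame(
    {
        "Adj High": [63.8418, 63.8172, 63.7355],
        "Adj Low": [63.7845, 63.7027, 63.6373],
        "Adj Close": [63.8336, 63.7600, 63.6782],
    },
    index=pd.to_datetime(["2018-10-02", "2018-10-03", "2018-10-04"]),
)

def true_range(frame, prev_close):
    """Hypothetical helper: row-wise max of the three True Range components."""
    return pd.concat(
        [
            frame["Adj High"] - frame["Adj Low"],
            (frame["Adj High"] - prev_close).abs(),
            (frame["Adj Low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)  # max() skips NaN, so a missing previous close leaves High - Low

window = df.loc[pd.Timestamp("2018-10-03"):]

# Slice first, then shift: the first row has no previous close.
tr_slice_first = true_range(window, window["Adj Close"].shift(1))
# Shift the full history, then slice: the first row sees the 2018-10-02 close.
tr_shift_first = true_range(window, df["Adj Close"].shift(1).loc[window.index])

print(round(tr_slice_first.iloc[0], 4), round(tr_shift_first.iloc[0], 4))
```

Only the first row differs between the two approaches; from the second row onward both see the same previous close.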


In [100]:
# Calculate the ATR using the exact same parameters as the code
calc_df['ATR_14'] = calc_df['TR'].ewm(alpha=1/14, adjust=False).mean()

print("--- TR vs ATR_14 (first 5 days) ---")
display(calc_df[['TR', 'ATR_14']].head(5))

--- TR vs ATR_14 (first 5 days) ---


Date,TR,ATR_14
2018-10-03,0.1309,0.1309
2018-10-04,0.1227,0.130314
2018-10-05,0.0737,0.12627
2018-10-08,0.0327,0.119587
2018-10-09,0.0409,0.113966
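
The explicit Wilder loop from the manual check and `ewm(alpha=1/14, adjust=False)` describe the same recursion, so they should agree when given the same TR series. A small sketch using the five TR values from the table above:

```python
import pandas as pd

# First five TR values from the table above.
tr = pd.Series([0.1309, 0.1227, 0.0737, 0.0327, 0.0409])
window = 14

# Explicit Wilder smoothing, seeded with the first TR value.
atr_values = [tr.iloc[0]]
for value in tr.iloc[1:]:
    atr_values.append((atr_values[-1] * (window - 1) + value) / window)
atr_loop = pd.Series(atr_values)

# Vectorized form: with adjust=False, y[0] = x[0] and
# y[t] = (1 - alpha) * y[t-1] + alpha * x[t], where alpha = 1 / window.
atr_ewm = tr.ewm(alpha=1 / window, adjust=False).mean()

max_abs_diff = float((atr_loop - atr_ewm).abs().max())
print(atr_ewm.round(6).tolist(), max_abs_diff)
```

If the two forms agree, as they do on these values, the smoothing method itself is unlikely to explain the VCSH discrepancy; the seed TR value is the more probable cause.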


In [101]:
# ATRP is the ATR divided by the close price
calc_df['ATRP'] = calc_df['ATR_14'] / calc_df['Adj Close']

# The final denominator is the mean of the daily ATRP values over the period
atrp_mean = calc_df['ATRP'].mean()

print(f"Mean ATRP (Denominator): {atrp_mean:.8f}")

Mean ATRP (Denominator): 0.00145731


In [102]:
metric_sharpe_atr_manual = mean_daily_return / atrp_mean

print("--- FINAL VERIFICATION ---")
print(f"Numerator (MeanDailyReturn): {mean_daily_return:.8f}")
print(f"Denominator (ATRP_Mean):     {atrp_mean:.8f}")
print(f"Calculated Sharpe (ATR):     {metric_sharpe_atr_manual:.6f}")

--- FINAL VERIFICATION ---
Numerator (MeanDailyReturn): 0.00028300
Denominator (ATRP_Mean):     0.00145731
Calculated Sharpe (ATR):     0.194195
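
As a rough cross-check, the gap between the two Mean ATRP values (`0.00145731` here, `0.00144306` in the manual check) can be estimated from the seed difference alone. Both ATR series see identical TR values after the first day, so their difference shrinks by a factor of 13/14 each day. The sketch below sums that geometric series; `approx_close` is an assumed round figure for the VCSH close during the first weeks, not a value taken from the code.

```python
seed_gap = 0.1309 - 0.1145   # first-day TR: shift-then-slice minus slice-then-shift
decay = 13 / 14              # Wilder smoothing keeps 13/14 of the previous ATR
n_days = 253                 # rows in the calculation period
approx_close = 64.0          # ASSUMPTION: rough VCSH close while the gap decays

# Sum of seed_gap * decay**t for t = 0 .. n_days - 1 (geometric series).
total_atr_gap = seed_gap * (1 - decay ** n_days) / (1 - decay)
estimated_gap = total_atr_gap / approx_close / n_days

# Mean ATRP values printed in this notebook.
observed_gap = 0.00145731 - 0.00144306

print(f"estimated Mean ATRP gap: {estimated_gap:.8f}")
print(f"observed Mean ATRP gap:  {observed_gap:.8f}")
```

The estimate and the observed gap land close to each other, which is consistent with the first-day TR being the source of the `0.194195` versus `0.196112` difference, though the approximation of a constant close means this is a sanity check rather than a proof.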
