### Backtest Results Verification

This notebook performs a manual, step-by-step calculation of portfolio returns for a **single, specific date** and compares them against the results generated by the automated backtesting engine (`py9`).

Its purpose is to serve as a sanity check and a debugging tool to ensure the core logic of the backtester is correct.

**Workflow:**
1.  **Setup:** Set `DATE_STR`, the single selection date to be checked (imported from `config.py`).
2.  **Load Data:** Load the three required files: the portfolio selection for the target date, the historical price data, and the master backtest results file.
3.  **Manual Calculation:** Manually identify the buy/sell dates and calculate the portfolio returns for each weighting scheme.
4.  **Compare & Verify:** Extract the corresponding results from the master backtest file, display them side-by-side with the manual calculations, and assert that they are numerically equal.
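
The workflow above can be illustrated end to end on toy data. This is a minimal sketch, not the notebook's actual cells: the tickers (`AAA`, `BBB`), prices, weights, and the stand-in backtest values are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy adjusted-close prices; rows are trading days, columns are tickers (made-up values).
prices = pd.DataFrame(
    {"AAA": [10.0, 10.0, 11.0], "BBB": [20.0, 20.0, 19.0]},
    index=pd.to_datetime(["2025-06-11", "2025-06-12", "2025-06-13"]),
)

# Toy selection with one weight column per scheme, indexed by ticker.
selection = pd.DataFrame(
    {"Weight_EW": [0.5, 0.5], "Weight_IV": [0.25, 0.75]},
    index=["AAA", "BBB"],
)

# Buy at T+1 and sell at T+2, relative to the selection date T.
t = prices.index.get_loc(pd.Timestamp("2025-06-11"))
buy, sell = prices.iloc[t + 1], prices.iloc[t + 2]
stock_returns = sell / buy - 1  # AAA gains 10%, BBB loses 5%

# The weighted sum per scheme gives the portfolio return.
manual = selection.multiply(stock_returns, axis=0).sum()
manual.index = manual.index.str.split("_").str[-1]

# Stand-in for the values read from the master backtest file.
backtest = pd.Series({"EW": 0.025, "IV": -0.0125})
matches = np.isclose(manual, backtest)
print(matches.all())
```

Comparing with `np.isclose` rather than `==` avoids false mismatches caused by floating-point rounding.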

### Setup and Configuration

`DATE_STR` **is the only parameter you need to change.** It is imported from `config.py`, so set it there to the date of the selection run you want to verify.
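
For orientation, a `config.py` compatible with the imports in this notebook might look like the sketch below. The real file is not shown here, so everything except the two imported names (`DATE_STR`, `RISK_FREE_RATE_DAILY`) is an assumption. In particular, the 4% annual rate and the 252-day year are guesses, chosen only because they reproduce the daily rate of 0.000159 printed later in this notebook.

```python
# config.py (hypothetical sketch; the real file's contents are not shown in this notebook)

# Selection date to verify, as an ISO "YYYY-MM-DD" string.
DATE_STR = "2025-06-11"

# Assumed: a 4% annual risk-free rate spread evenly over 252 trading days.
ANNUAL_RISK_FREE_RATE = 0.04
TRADING_DAYS_PER_YEAR = 252
RISK_FREE_RATE_DAILY = ANNUAL_RISK_FREE_RATE / TRADING_DAYS_PER_YEAR

print(f"{DATE_STR} {RISK_FREE_RATE_DAILY:.6f}")
```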

In [22]:
# py10_backtest_verification.ipynb

import sys
from pathlib import Path
import pandas as pd
import numpy as np


# --- THE ONLY PARAMETER TO CHANGE ---
# DATE_STR is imported from config.py below (see "Local Imports"); edit it there.
# Example value: DATE_STR = "2025-06-11"

# --- Project Path Setup ---
NOTEBOOK_DIR = Path.cwd()

# The notebook lives at ROOT/notebooks/_working, so ROOT is two levels up:
# NOTEBOOK_DIR.parent        -> .../notebooks
# NOTEBOOK_DIR.parent.parent -> .../stocks (ROOT_DIR)
ROOT_DIR = NOTEBOOK_DIR.parent.parent

SRC_DIR = ROOT_DIR / 'src'
if str(SRC_DIR) not in sys.path:
    sys.path.append(str(SRC_DIR))

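# config.py lives in ROOT/notebooks, so add that directory to sys.path as well
NOTEBOOKS_DIR = ROOT_DIR / 'notebooks'
if str(NOTEBOOKS_DIR) not in sys.path:
    sys.path.append(str(NOTEBOOKS_DIR))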

# --- Verification (Optional, but good for debugging) ---
print(f"Current Notebook Dir: {NOTEBOOK_DIR}")
print(f"Calculated ROOT_DIR:   {ROOT_DIR}")
print(f"Calculated SRC_DIR:    {SRC_DIR}")
print(f"sys.path contains SRC_DIR: {str(SRC_DIR) in sys.path}")

# --- Local Imports ---
import utils
from config import DATE_STR, RISK_FREE_RATE_DAILY

# --- File Path Construction (using our standard principles) ---
# We derive all paths from the verification date
SELECTION_DIR = ROOT_DIR / 'output' / 'selection_results'
BACKTEST_DIR = ROOT_DIR / 'output' / 'backtest_results'
DATA_DIR = ROOT_DIR / 'data' # Assuming data dir is at root/data


# Construct the exact filenames we expect
SELECTION_FILE_PATH = SELECTION_DIR / f"{DATE_STR}_short_term_mean_reversion.parquet"
BACKTEST_RESULTS_PATH = BACKTEST_DIR / 'backtest_master_results.parquet'
HISTORICAL_PRICES_PATH = DATA_DIR / 'df_adj_close.parquet'

# --- Notebook Setup ---
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format', '{:.6f}'.format)
%load_ext autoreload
%autoreload 2

# --- Verification ---
print(f"Verifying backtest for selection date: {DATE_STR}")
print(f"Selection File: {SELECTION_FILE_PATH}")
print(f"Backtest Results: {BACKTEST_RESULTS_PATH}")
print(f"Price Data: {HISTORICAL_PRICES_PATH}")

Current Notebook Dir: c:\Users\ping\Files_win10\python\py311\stocks\notebooks\_working
Calculated ROOT_DIR:   c:\Users\ping\Files_win10\python\py311\stocks
Calculated SRC_DIR:    c:\Users\ping\Files_win10\python\py311\stocks\src
sys.path contains SRC_DIR: True
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Verifying backtest for selection date: 2025-06-11
Selection File: c:\Users\ping\Files_win10\python\py311\stocks\output\selection_results\2025-06-11_short_term_mean_reversion.parquet
Backtest Results: c:\Users\ping\Files_win10\python\py311\stocks\output\backtest_results\backtest_master_results.parquet
Price Data: c:\Users\ping\Files_win10\python\py311\stocks\data\df_adj_close.parquet


### Step 1: Load All Required Data
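
Before trusting the manual calculation, it can help to check that the loaded inputs fit together. The helper below is a hypothetical sketch (`check_inputs` is not part of this project) run on made-up data. It assumes the selection is indexed by ticker, the price columns are tickers, and each weight column is meant to sum to 1; the last point is an assumption and is not confirmed by this notebook.

```python
import numpy as np
import pandas as pd

def check_inputs(selection: pd.DataFrame, prices: pd.DataFrame, weight_cols: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the inputs look usable."""
    problems = []
    # Every selected ticker needs a price column.
    missing = selection.index.difference(prices.columns)
    if len(missing) > 0:
        problems.append(f"tickers missing from price data: {list(missing)}")
    # Assumed convention: each weighting scheme sums to 1.
    for col in weight_cols:
        total = selection[col].sum()
        if not np.isclose(total, 1.0):
            problems.append(f"{col} sums to {total:.4f}, expected 1.0")
    return problems

# Made-up inputs with two deliberate problems.
toy_selection = pd.DataFrame(
    {"Weight_EW": [0.5, 0.5], "Weight_IV": [0.6, 0.3]}, index=["AAA", "ZZZ"]
)
toy_prices = pd.DataFrame({"AAA": [10.0, 11.0], "BBB": [20.0, 19.0]})

problems = check_inputs(toy_selection, toy_prices, ["Weight_EW", "Weight_IV"])
for p in problems:
    print(p)
```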

In [23]:
print("--- Step 1: Loading all required data files ---")

try:
    # Load the specific portfolio selection for the verification date
    df_selection = pd.read_parquet(SELECTION_FILE_PATH)
    print(f"✅ Successfully loaded selection file for {DATE_STR}.")
    
    # Load the master backtest results file
    df_backtest = pd.read_parquet(BACKTEST_RESULTS_PATH)
    print("✅ Successfully loaded master backtest results.")

    # Load the historical price data
    df_prices = pd.read_parquet(HISTORICAL_PRICES_PATH)
    df_prices.index = pd.to_datetime(df_prices.index)
    print("✅ Successfully loaded historical price data.")
    
    data_loaded_successfully = True

except FileNotFoundError as e:
    print(f"❌ ERROR: Could not find a required file. {e}")
    data_loaded_successfully = False

--- Step 1: Loading all required data files ---
✅ Successfully loaded selection file for 2025-06-11.
✅ Successfully loaded master backtest results.
✅ Successfully loaded historical price data.


### Step 2: Manual Performance Calculation
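
The buy and sell dates in this step come from positions in the price index, not from calendar arithmetic. The sketch below shows the same `get_indexer(..., method="ffill")` lookup on a made-up trading calendar. The helper name `buy_sell_dates` and its bounds checks are illustrative additions, not part of the notebook.

```python
import pandas as pd

# Toy trading calendar: Thu, Fri, then Mon to Wed (no weekend rows), like a price index.
trading_days = pd.to_datetime(
    ["2025-06-12", "2025-06-13", "2025-06-16", "2025-06-17", "2025-06-18"]
)

def buy_sell_dates(index: pd.DatetimeIndex, date_str: str):
    """Return (anchor, buy, sell): the last trading day on or before date_str, then T+1 and T+2."""
    loc = index.get_indexer([pd.Timestamp(date_str)], method="ffill")[0]
    if loc == -1:
        raise ValueError(f"{date_str} is earlier than the first date in the index")
    if loc + 2 >= len(index):
        raise ValueError(f"Not enough trading days after {date_str}")
    return index[loc], index[loc + 1], index[loc + 2]

# A Saturday selection date falls back to Friday; buy on Monday, sell on Tuesday.
anchor, buy, sell = buy_sell_dates(trading_days, "2025-06-14")
print(anchor.date(), buy.date(), sell.date())
```

Adding `pd.Timedelta(days=1)` to the Saturday instead would produce Sunday and Monday, and the Sunday lookup in the price data would fail.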

In [24]:
if data_loaded_successfully:
    print(f"\n--- Step 2: Manually calculating performance for {DATE_STR} ---")
    
    # Isolate the tickers from our portfolio
    tickers = df_selection.index.to_list()
    
    # --- Date Logic ---
    # Locate the selection date in the price index; 'ffill' falls back to the
    # most recent trading day if DATE_STR itself is not a trading day.
    date_loc = df_prices.index.get_indexer([pd.to_datetime(DATE_STR)], method='ffill')[0]
    # -1 means DATE_STR precedes the data; two later trading days are also required.
    assert 0 <= date_loc < len(df_prices.index) - 2, f"No T+1/T+2 prices available for {DATE_STR}"
    
    # Get the next two trading days from the index
    buy_date = df_prices.index[date_loc + 1]
    sell_date = df_prices.index[date_loc + 2]
    
    print(f"Selection Date (actual used): {df_prices.index[date_loc].date()}")
    print(f"Buy Date (T+1): {buy_date.date()}")
    print(f"Sell Date (T+2): {sell_date.date()}")
    
    # Extract the prices for the buy and sell dates for our selected tickers
    buy_prices = df_prices.loc[buy_date, tickers]
    sell_prices = df_prices.loc[sell_date, tickers]
    
    # Calculate individual returns
    individual_returns = (sell_prices - buy_prices) / buy_prices
    
    # Calculate portfolio return for each weighting scheme
    weights_df = df_selection[['Weight_EW', 'Weight_IV', 'Weight_SW']]
    weighted_returns = weights_df.multiply(individual_returns, axis=0)
    
    manual_results = weighted_returns.sum()
    # This takes an index like 'Weight_EW', 
    # splits it by _ into ['Weight', 'EW'], 
    # and takes the last element, 'EW'.
    manual_results.index = manual_results.index.str.split('_').str[-1]
    manual_results.name = "manual_return"
    
    print("\nManual Calculation Results:")
    display(manual_results.to_frame())


--- Step 2: Manually calculating performance for 2025-06-11 ---
Selection Date (actual used): 2025-06-11
Buy Date (T+1): 2025-06-12
Sell Date (T+2): 2025-06-13

Manual Calculation Results:


| | manual_return |
|---|---|
| EW | -0.017819 |
| IV | -0.017221 |
| SW | -0.018375 |


### Step 3: Compare and Verify
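
The comparison in this step relies on `np.isclose`, whose default tolerances are `rtol=1e-05` and `atol=1e-08`. The sketch below shows how that behaves for a tiny difference, a real difference, and a scheme missing from the backtest file. The manual values are the ones printed in Step 2; the backtest values are hypothetical and chosen only to trigger each case.

```python
import numpy as np
import pandas as pd

manual = pd.Series({"EW": -0.017819, "IV": -0.017221, "SW": -0.018375})
# Hypothetical backtest values: EW differs in the 9th decimal, IV in the 5th, SW is missing.
backtest = pd.Series({"EW": -0.017819001, "IV": -0.017231})

comparison = pd.concat(
    [manual.rename("manual_return"), backtest.rename("backtest_return")], axis=1
)
# NaN never compares as close (equal_nan defaults to False), so a missing row is a mismatch.
comparison["match"] = np.isclose(
    comparison["manual_return"], comparison["backtest_return"]
)
print(comparison)
```

Because a missing scheme shows up as `NaN` and fails the check, the final assertion also catches a date or scheme that is absent from the master results file.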

In [25]:
if data_loaded_successfully:
    print("\n--- Step 3: Comparing manual results with automated backtest results ---")
    
    # Extract the results from the automated backtest file
    automated_results_df = df_backtest[
        (df_backtest['selection_date'] == DATE_STR)
    ].set_index('scheme')[['portfolio_return']]
    automated_results_df.columns = ['backtest_return']

    # Create a comparison DataFrame
    df_comparison = pd.concat([manual_results, automated_results_df], axis=1)
    df_comparison['match'] = np.isclose(df_comparison['manual_return'], df_comparison['backtest_return'])

    print("Comparison Table:")
    display(df_comparison)
    
    # Assert that all results match
    assert df_comparison['match'].all(), "❌ VERIFICATION FAILED: Manual and backtest results do not match!"
    
    print("\n✅ Verification Successful: All calculated portfolio returns match the automated backtest results.")


--- Step 3: Comparing manual results with automated backtest results ---
Comparison Table:


| | manual_return | backtest_return | match |
|---|---|---|---|
| EW | -0.017819 | -0.017819 | True |
| IV | -0.017221 | -0.017221 | True |
| SW | -0.018375 | -0.018375 | True |



✅ Verification Successful: All calculated portfolio returns match the automated backtest results.


# TODO: WRONG, check using _backtest_check2.ipynb

### TODO: Change to these performance metrics

In [27]:
import pandas as pd
import numpy as np


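# --- Main Calculation Logic ---

# 1. Determine the buy (T+1) and sell (T+2) dates from the trading calendar.
# Adding calendar days would fail with a KeyError when T+1 or T+2 falls on a weekend or holiday.
date_loc = df_prices.index.get_indexer([pd.to_datetime(DATE_STR)], method='ffill')[0]
buy_date = df_prices.index[date_loc + 1]
sell_date = df_prices.index[date_loc + 2]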

print(f"Buy Date: {buy_date.date()}")
print(f"Sell Date: {sell_date.date()}")
print(f"Daily Risk-Free Rate: {RISK_FREE_RATE_DAILY:.6f}")


# 2. Extract prices for buy and sell dates
buy_prices = df_prices.loc[buy_date]
sell_prices = df_prices.loc[sell_date]

# 3. Calculate individual stock returns
individual_returns = (sell_prices / buy_prices) - 1

# 3a. Calculate the cross-sectional standard deviation of the day's stock returns.
# Note: individual_returns covers every ticker in df_prices, not only the portfolio's tickers.
std_dev_daily_returns = individual_returns.std()

# 4. Calculate portfolio performance
weights_df = df_selection[['Weight_EW', 'Weight_IV', 'Weight_SW']]
weighted_returns = weights_df.multiply(individual_returns, axis=0)
portfolio_performance = weighted_returns.sum()

# 5. Calculate Daily Efficiency Ratio
# With only a single day of portfolio data there is no time series of portfolio returns,
# so the denominator is the cross-sectional standard deviation of individual stock returns.
# The ratio is the portfolio's excess return divided by that cross-sectional volatility.
if std_dev_daily_returns > 0:
    daily_efficiency_ratio = (portfolio_performance - RISK_FREE_RATE_DAILY) / std_dev_daily_returns
else:
    # Handle case with no volatility to avoid division by zero
    daily_efficiency_ratio = pd.Series(np.nan, index=portfolio_performance.index)


# 6. Display the final results

print("\n--- Individual Stock Returns ---")
print((individual_returns).map('{:.6f}'.format))


print("\n--- Portfolio Performance ---")
portfolio_performance.name = "Portfolio Return"
print((portfolio_performance).map('{:.4f}'.format))

print("\n--- Daily Risk Metrics ---")
print(f"Standard Deviation of Individual Stock Returns: {std_dev_daily_returns:.6f}")

print("\n--- Daily Efficiency Ratio (using cross-sectional std dev) ---")
daily_efficiency_ratio.name = "Daily Efficiency Ratio"
print(daily_efficiency_ratio.map('{:.4f}'.format))

Buy Date: 2025-06-12
Sell Date: 2025-06-13
Daily Risk-Free Rate: 0.000159

--- Individual Stock Returns ---
Ticker
A       -0.015390
AA      -0.017672
AAL     -0.048624
AAON    -0.015644
AAPL    -0.013805
          ...    
ZM      -0.010076
ZS       0.001725
ZTO     -0.018560
ZTS     -0.026596
ZWS     -0.023030
Length: 1556, dtype: object

--- Portfolio Performance ---
Weight_EW    -0.0178
Weight_IV    -0.0172
Weight_SW    -0.0184
Name: Portfolio Return, dtype: object

--- Daily Risk Metrics ---
Standard Deviation of Individual Stock Returns: 0.016548

--- Daily Efficiency Ratio (using cross-sectional std dev) ---
Weight_EW    -1.0864
Weight_IV    -1.0502
Weight_SW    -1.1200
Name: Daily Efficiency Ratio, dtype: object
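
As a quick arithmetic check, the ratios above can be recomputed from the other printed values as (portfolio return - daily risk-free rate) / cross-sectional standard deviation. The sketch below copies the numbers from the printed output. Those inputs are already rounded to six decimals, so agreement with the printed ratios is only expected to about three decimal places.

```python
# Values copied from the printed output above (rounded to 6 decimals).
portfolio_returns = {"EW": -0.017819, "IV": -0.017221, "SW": -0.018375}
risk_free_daily = 0.000159
cross_sectional_std = 0.016548
printed_ratios = {"EW": -1.0864, "IV": -1.0502, "SW": -1.1200}

recomputed = {}
for scheme, ret in portfolio_returns.items():
    recomputed[scheme] = (ret - risk_free_daily) / cross_sectional_std
    # Inputs are rounded, so compare with a small tolerance rather than exactly.
    close = abs(recomputed[scheme] - printed_ratios[scheme]) < 1e-3
    print(f"{scheme}: recomputed {recomputed[scheme]:.3f}, matches printed value: {close}")
```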
