### Backtest Orchestrator Workflow

This notebook automates a rolling-window backtest by iteratively executing a worker notebook (`_pm_worker_sharpe_strategy.ipynb`) for each time slice.

**Workflow:**

1.  **Setup:** Loads project configurations and strategy parameters.
2.  **Load Data:** Reads the master adjusted close prices file.
3.  **Prepare Data Chunks:** Converts prices to returns and uses a utility function to create rolling window data chunks for the backtest.
4.  **Execute Backtests:** Loops through each data chunk, splitting it into train/test sets, saving them to a temporary directory, and running the worker notebook via `papermill` with the corresponding file paths.
5.  **Aggregate Results:** After all workers complete, it collects their individual output files and combines them into a single, final portfolio returns file.
6.  **Cleanup:** Deletes the temporary data files created during the process.

### Step 1: Setup and Configuration

This cell contains the main imports and all configuration variables. It defines the project structure and the parameters for the rolling backtest.

In [None]:
import sys
from pathlib import Path
import pandas as pd
import papermill as pm
import shutil
from IPython.display import display, Markdown

# --- Project Path Configuration ---
# [REFACTOR] Standardized path setup for consistency and portability.
NOTEBOOK_DIR = Path.cwd()
ROOT_DIR = NOTEBOOK_DIR.parent
DATA_DIR = ROOT_DIR / 'data'
SRC_DIR = ROOT_DIR / 'src'
TEMP_DIR = NOTEBOOK_DIR / 'temp_backtest_data'
OUTPUT_DIR = NOTEBOOK_DIR / 'backtest_results'

# --- Add src to Python path ---
if str(SRC_DIR) not in sys.path:
    sys.path.append(str(SRC_DIR))

# --- Import Custom Modules ---
# [REFACTOR] Moved imports here to follow the standard structure.
import utils

# --- Papermill Configuration ---
WORKER_NOTEBOOK_NAME = "_pm_worker_sharpe_strategy.ipynb"
WORKER_NOTEBOOK_PATH = NOTEBOOK_DIR / WORKER_NOTEBOOK_NAME
AGGREGATED_RESULTS_FILENAME = "final_portfolio_returns.parquet"

# --- Strategy & Backtest Parameters ---
# [REFACTOR] All strategy parameters centralized for easy tuning.
SLIDING_WINDOW_WIDTH = 300
SLIDING_WINDOW_STEP = 30
TRAIN_TEST_SPLIT_POINT = 270 # Defines how many rows of the window are for training
BENCHMARK_TICKER = 'VGT'
FINVIZ_DATA_FILENAME = '2025-08-01_df_finviz_merged_stocks_etfs.parquet' # Example filename

# --- Verification ---
print("--- Path Configuration ---")
print(f"ROOT_DIR:                {ROOT_DIR}")
print(f"NOTEBOOK_DIR:            {NOTEBOOK_DIR}")
print(f"DATA_DIR:                {DATA_DIR}")
print(f"SRC_DIR:                 {SRC_DIR}")
print(f"TEMP_DIR:                {TEMP_DIR}")
print(f"OUTPUT_DIR:              {OUTPUT_DIR}")
print(f"WORKER_NOTEBOOK_PATH:    {WORKER_NOTEBOOK_PATH}")
assert all([ROOT_DIR.exists(), DATA_DIR.exists(), SRC_DIR.exists(), NOTEBOOK_DIR.exists()]), "A key directory was not found!"
assert (DATA_DIR / FINVIZ_DATA_FILENAME).exists(), f"Finviz data file not found: {FINVIZ_DATA_FILENAME}"
assert WORKER_NOTEBOOK_PATH.exists(), f"Worker notebook not found at: {WORKER_NOTEBOOK_PATH}"

print("\n--- Strategy Parameters ---")
print(f"Window Width:  {SLIDING_WINDOW_WIDTH}, Step Size: {SLIDING_WINDOW_STEP}")
print(f"Train/Test Split: {TRAIN_TEST_SPLIT_POINT} rows for training")
print(f"Benchmark:     {BENCHMARK_TICKER}")
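
The window parameters above imply a test slice of 300 - 270 = 30 rows, which equals the step size. The sketch below works through that arithmetic. It assumes (this is not confirmed by the notebook) that windows start at row 0 and advance by `SLIDING_WINDOW_STEP` rows; `holdout_positions` and the toy length of 390 rows are illustrative names and values only.

```python
def holdout_positions(n_rows, width, step, split):
    """Return (start, end) row positions of each test slice, end exclusive.

    Assumes windows begin at row 0 and advance by `step` rows.
    """
    bounds = []
    for start in range(0, n_rows - width + 1, step):
        bounds.append((start + split, start + width))
    return bounds


# width, step and split mirror the configuration cell; n_rows is a toy value.
bounds = holdout_positions(n_rows=390, width=300, step=30, split=270)
print(bounds)

# Under these assumptions each test slice starts where the previous one ends.
contiguous = all(prev[1] == nxt[0] for prev, nxt in zip(bounds, bounds[1:]))
print(contiguous)
```

If the step were smaller than the test length, consecutive test slices would overlap, and the same dates would appear in more than one worker result.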

### Step 2: Load and Prepare Data

Load the adjusted close prices and convert them to simple daily returns (fractional changes, not percentages), which form the basis of the analysis.

In [None]:
# Load historical price data
adj_close_path = DATA_DIR / 'df_adj_close.parquet'
df_adj_close = pd.read_parquet(adj_close_path)

# Calculate daily simple returns; dropna() drops every date on which any ticker has a missing value
returns = df_adj_close.pct_change().dropna()

# Add a risk-free 'CASH' asset with zero return
returns['CASH'] = 0.0

print(f"Loaded returns data. Shape: {returns.shape}")
print(f"Date range: {returns.index.min().strftime('%Y-%m-%d')} to {returns.index.max().strftime('%Y-%m-%d')}")
display(returns.head(3))
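
A small toy example may help show how `pct_change()` followed by `dropna()` behaves when tickers have different history lengths. The tickers `AAA` and `BBB` and their prices are made up for illustration. `fill_method=None` is passed here so that missing prices are not forward-filled, whereas the cell above relies on the library default.

```python
import numpy as np
import pandas as pd

# Toy prices: BBB starts trading two days later than AAA.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 121.0, 133.1], "BBB": [np.nan, np.nan, 10.0, 11.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

toy_returns = prices.pct_change(fill_method=None)

# Default dropna() removes a date if ANY ticker is missing on it.
strict = toy_returns.dropna()

# how="all" removes a date only if EVERY ticker is missing on it.
loose = toy_returns.dropna(how="all")

print(len(strict), len(loose))
```

With the default `dropna()`, the usable history is limited by the ticker with the shortest record, so it can be worth checking the printed shape and date range after loading.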

### Step 3: Generate Rolling Window Chunks

Using our utility function, slice the returns data into overlapping chunks that will be used for each iteration of the backtest.

In [None]:
# [REFACTOR] Logic moved to a clean, reusable function in src/utils.py
rolling_chunks = utils.create_rolling_window_chunks(
    returns_df=returns,
    window_width=SLIDING_WINDOW_WIDTH,
    step_size=SLIDING_WINDOW_STEP
)

print(f"Successfully generated {len(rolling_chunks)} rolling window chunks.")
if rolling_chunks:
    chunk = rolling_chunks[0]
    print(f"Shape of each chunk: {chunk.shape}")
    print(f"First chunk date range: {chunk.index.min().strftime('%Y-%m-%d')} to {chunk.index.max().strftime('%Y-%m-%d')}")
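
The source of `utils.create_rolling_window_chunks` is not shown in this notebook. As a rough sketch of what such a helper might do, the hypothetical function below slices a DataFrame into fixed-width windows that advance by a fixed step and discards any trailing partial window. The real utility may differ in both signature details and edge-case handling.

```python
import numpy as np
import pandas as pd


def create_rolling_window_chunks_sketch(returns_df, window_width, step_size):
    """Hypothetical stand-in for utils.create_rolling_window_chunks.

    Returns a list of DataFrames, each with exactly `window_width` rows.
    """
    chunks = []
    last_start = len(returns_df) - window_width
    # range() is empty when the data is shorter than one window.
    for start in range(0, last_start + 1, step_size):
        chunks.append(returns_df.iloc[start:start + window_width])
    return chunks


# Toy data: 360 rows of zero returns for two made-up columns.
toy = pd.DataFrame(
    {"AAA": np.zeros(360), "CASH": np.zeros(360)},
    index=pd.date_range("2020-01-01", periods=360, freq="D"),
)
toy_chunks = create_rolling_window_chunks_sketch(toy, window_width=300, step_size=30)
print(len(toy_chunks), toy_chunks[0].shape)
```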

### Step 4: Execute Backtest Loop via Papermill

Iterate through each data chunk, save the train/test splits to disk, and execute the worker notebook.

**Note:** This cell may take a long time to run, depending on the number of chunks.

In [None]:
# Create temporary and output directories if they don't exist
TEMP_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)

print(f"--- Starting Papermill Execution Loop for {len(rolling_chunks)} chunks ---")

# For demonstration, we run only the first chunk.
# To run all, change to: for i, chunk in enumerate(rolling_chunks):
for i, chunk in enumerate(rolling_chunks[:1]):
    # Define train and test sets for this chunk
    returns_train = chunk.iloc[:TRAIN_TEST_SPLIT_POINT]
    returns_test = chunk.iloc[TRAIN_TEST_SPLIT_POINT:]

    # Define temporary file paths for this iteration
    train_file = TEMP_DIR / f"returns_train_chunk_{i}.parquet"
    test_file = TEMP_DIR / f"returns_test_chunk_{i}.parquet"
    result_file = OUTPUT_DIR / f"result_chunk_{i}.parquet"
    output_notebook_path = OUTPUT_DIR / f"output_notebook_chunk_{i}.ipynb"

    # Save data splits to disk
    returns_train.to_parquet(train_file)
    returns_test.to_parquet(test_file)

    print(f"\nExecuting chunk {i+1}/{len(rolling_chunks)}...")
    print(f"  Train Period: {returns_train.index.min().date()} to {returns_train.index.max().date()}")
    print(f"  Test Period:  {returns_test.index.min().date()} to {returns_test.index.max().date()}")

    # [REFACTOR] All paths are now passed as parameters for a self-contained worker.
    pm.execute_notebook(
        input_path=WORKER_NOTEBOOK_PATH,
        output_path=output_notebook_path,
        parameters={
            "returns_train_path": str(train_file),
            "returns_test_path": str(test_file),
            "finviz_data_path": str(DATA_DIR / FINVIZ_DATA_FILENAME),
            "output_path": str(result_file),
            "benchmark_ticker": BENCHMARK_TICKER,
        },
        kernel_name="python3"
    )

print("\n--- Papermill execution complete. ---")
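
For the values passed through `parameters` to take effect, the worker notebook needs a cell tagged `parameters`. papermill inserts an `injected-parameters` cell after it that overrides the defaults. The worker is not shown here, so the following is only a sketch of what that cell might contain. The variable names match the keys passed in the loop above, while the default values are hypothetical placeholders.

```python
# Cell tagged "parameters" in the worker notebook (hypothetical defaults).
# papermill adds an "injected-parameters" cell after this one, overriding
# these values with the ones supplied by the orchestrator.
returns_train_path = "temp_backtest_data/returns_train_chunk_0.parquet"
returns_test_path = "temp_backtest_data/returns_test_chunk_0.parquet"
finviz_data_path = "../data/finviz_placeholder.parquet"  # placeholder file name
output_path = "backtest_results/result_chunk_0.parquet"
benchmark_ticker = "VGT"
```

Keeping defaults in that cell also lets the worker be opened and run by hand for debugging a single chunk.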

### Step 5: Aggregate and Save Final Results

Collect the individual result files from each worker run and combine them into a single, continuous time series of portfolio returns.

In [None]:
# [REFACTOR] New aggregation step replaces the fragile 'append' logic.
all_results = []
for i in range(len(rolling_chunks[:1])): # Match the loop range from Step 4
    result_file = OUTPUT_DIR / f"result_chunk_{i}.parquet"
    if result_file.exists():
        chunk_result = pd.read_parquet(result_file)
        all_results.append(chunk_result)
    else:
        print(f"Warning: Result file not found for chunk {i}: {result_file}")

if all_results:
    final_portfolio_returns = pd.concat(all_results).sort_index()
    # Drop duplicated dates, keeping the first occurrence (only relevant if test windows overlap)
    final_portfolio_returns = final_portfolio_returns[~final_portfolio_returns.index.duplicated(keep='first')]

    # Save the aggregated results
    final_output_path = OUTPUT_DIR / AGGREGATED_RESULTS_FILENAME
    final_portfolio_returns.to_parquet(final_output_path)

    print(f"Successfully aggregated {len(all_results)} result chunks.")
    print(f"Final portfolio returns saved to: {final_output_path}")
    display(final_portfolio_returns.head())
    display(final_portfolio_returns.tail())
else:
    print("No result files found to aggregate.")
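
To make the de-duplication step concrete, the toy example below concatenates two result frames that share one date. The column name `portfolio_return` and the values are invented for illustration, since the worker's actual output schema is not shown. The sketch uses a stable sort (`kind="mergesort"`) so that, among rows with the same date, the row from the earlier chunk stays first. The default sort algorithm used in the cell above does not guarantee that ordering for ties.

```python
import pandas as pd

# Two toy worker results that overlap on 2024-01-03.
first = pd.DataFrame(
    {"portfolio_return": [0.01, 0.02]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)
second = pd.DataFrame(
    {"portfolio_return": [0.99, 0.03]},
    index=pd.to_datetime(["2024-01-03", "2024-01-04"]),
)

# Stable sort keeps the concat order among rows with equal dates.
combined = pd.concat([first, second]).sort_index(kind="mergesort")

# Keep only the first row seen for each date.
deduped = combined[~combined.index.duplicated(keep="first")]
print(deduped)
```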

### Step 6: Cleanup Temporary Files

Remove the intermediate `temp_backtest_data` directory to keep the project folder clean. The cell first runs garbage collection and waits briefly, which gives the operating system a chance to release file handles left open by the worker runs before the directory is deleted.

In [None]:
# shutil is already imported in Step 1
import gc
import time

# [REFACTOR] Added a more robust cleanup step to handle potential file locks on Windows.
if TEMP_DIR.exists():
    # Explicitly run garbage collection to help release file handles from papermill
    print("Running garbage collection to release file handles...")
    gc.collect()

    # Add a short delay to give the OS time to release file locks
    print("Waiting 2 seconds before cleanup...")
    time.sleep(2)

    try:
        shutil.rmtree(TEMP_DIR)
        print(f"✅ Successfully removed temporary directory: {TEMP_DIR}")
    except OSError as e:
        print(f"❌ Error removing temporary directory {TEMP_DIR}: {e}")
        print("This may happen if a file is still open. Try restarting the kernel and running this cell again.")
else:
    print("Temporary directory not found, no cleanup needed.")
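
A fixed two-second wait may be too short or longer than needed. One possible alternative, sketched here under the assumption that lingering locks clear after a short time, is to retry the removal a few times. `remove_dir_with_retries` is a hypothetical helper, not part of this project, and the demo uses a throwaway directory rather than `TEMP_DIR`.

```python
import shutil
import tempfile
import time
from pathlib import Path


def remove_dir_with_retries(path, attempts=3, delay_seconds=1.0):
    """Try to delete `path`, retrying on OSError; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # directory is already gone
        except OSError:
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    return False


# Demo on a throwaway directory containing one empty file.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "returns_train_chunk_0.parquet").write_bytes(b"")
removed = remove_dir_with_retries(demo_dir)
print(removed, demo_dir.exists())
```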