### Performance Ratio Calculation Pipeline

This notebook calculates various performance metrics (Sharpe, Sortino, Omega) over multiple time horizons for a given set of tickers.

**Prerequisites:** A cleaned OHLCV data file (`df_OHLCV_clean_stocks_etfs.parquet`) must exist, and `config.py` must define the desired `DATE_STR`, `DEST_DIR`, and `ANNUAL_RISK_FREE_RATE`.

**Workflow:**

1.  **Load Data:** Loads the cleaned OHLCV data and keeps only the `Adj Close` column.
2.  **Prepare Data Windows:** Slices the data up to and including `DATE_STR` and creates a list of DataFrames, one per time window (e.g., 3 days, 5 days, 250 days).
3.  **Calculate Ratios:** Iterates through each data window and ticker to calculate the performance ratios.
4.  **Clean Results:** Replaces any `NaN` or infinite values that result from the calculations (e.g., due to zero volatility).
5.  **Save:** Saves the final DataFrame of ratios to a Parquet file.
6.  **Verify:** Reloads the saved file and displays a sample of key symbols.


### Setup and Configuration

This cell loads all necessary libraries and configuration parameters. It pulls dynamic settings from `config.py` and defines static settings for this specific notebook.
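
For orientation, a `config.py` compatible with this notebook might look like the sketch below. The three names are the ones this cell imports, and the `DATE_STR` and `ANNUAL_RISK_FREE_RATE` values match what the cell prints. The `DEST_DIR` value is a placeholder assumption, and the real file may define other settings as well.

```python
# config.py (hypothetical sketch, not the project's actual file)
from datetime import date
from pathlib import Path

DATE_STR = '2025-06-24'        # last date to include, formatted YYYY-MM-DD
ANNUAL_RISK_FREE_RATE = 0.04   # annualized risk-free rate as a decimal (4%)
DEST_DIR = Path('data')        # placeholder: directory that holds the Parquet files

# Optional sanity checks so a malformed setting fails early.
date.fromisoformat(DATE_STR)   # raises ValueError if the date is malformed
assert 0.0 <= ANNUAL_RISK_FREE_RATE < 1.0
```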


In [None]:
import sys
from pathlib import Path
import pandas as pd
import numpy as np
from tabulate import tabulate

# --- Project Path Setup ---
NOTEBOOK_DIR = Path.cwd()
ROOT_DIR = NOTEBOOK_DIR.parent if NOTEBOOK_DIR.name == 'notebooks' else NOTEBOOK_DIR
if str(ROOT_DIR) not in sys.path:
    sys.path.append(str(ROOT_DIR))
SRC_DIR = ROOT_DIR / 'src'
if str(SRC_DIR) not in sys.path:
    sys.path.append(str(SRC_DIR))

# --- Dynamic Configuration (from config.py) ---
from config import DATE_STR, DEST_DIR, ANNUAL_RISK_FREE_RATE
import utils

# --- Static Configuration for this Notebook ---
# Define the time windows (in days) for which to calculate ratios.
DAYS_RATIO = [3, 5, 10, 15, 30, 60, 120, 250]

# Define a list of key symbols to display in the final verification step.
# Symbols are matched case-sensitively against the results index.
SAMPLE_SYMBOLS = [
    'AAPL', 'MSFT', 'GOOG', 'NVDA', 'AMZN', 'TSLA',
    'META', 'GLD', 'MSTR', 'IBIT', 'SHOP', 'VGT',
    'ORCL', 'SNOW', 'COIN',
]

# --- File Path Construction ---
# The source file is the cleaned OHLCV data.
SOURCE_PATH = Path(DEST_DIR) / 'df_OHLCV_clean_stocks_etfs.parquet'
# The destination file will store the calculated performance ratios for the given DATE_STR.
DEST_PATH = Path(DEST_DIR) / f'{DATE_STR}_df_perf_ratios_stocks_etfs.parquet'

# --- Notebook Setup ---
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 2500)
%load_ext autoreload
%autoreload 2

# --- Verification ---
print(f"Processing for Date: {DATE_STR}")
print(f"Annual Risk-Free Rate: {ANNUAL_RISK_FREE_RATE}")
print(f"Source file: {SOURCE_PATH}")
print(f"Destination file: {DEST_PATH}")

Processing for Date: 2025-06-24
Annual Risk-Free Rate: 0.04
Source file: c:\Users\ping\Files_win10\python\py311\stocks\data\df_OHLCV_clean_stocks_etfs.parquet
Destination file: c:\Users\ping\Files_win10\python\py311\stocks\data\2025-06-24_df_perf_ratios_stocks_etfs.parquet


### Step 1: Load Source Data

Load the cleaned OHLCV data, which serves as the single source of truth for prices and the list of valid tickers.

In [2]:
print(f"--- Step 1: Loading data from {SOURCE_PATH.name} ---")

try:
    df_ohlcv = pd.read_parquet(SOURCE_PATH, engine='pyarrow')
    # Extract only the 'Adj Close' column needed for ratio calculations.
    df_adj_close = df_ohlcv[['Adj Close']].copy()
    
    # The list of tickers is derived directly from the cleaned data.
    tickers = df_adj_close.index.get_level_values('Ticker').unique().tolist()
    
    print(f"Successfully loaded data for {len(tickers)} tickers.")
    df_adj_close.info()
    
except FileNotFoundError:
    print(f"ERROR: Source file not found at {SOURCE_PATH}. Halting execution.")
    df_adj_close = None
except Exception as e:
    print(f"An error occurred during file loading: {e}")
    df_adj_close = None

--- Step 1: Loading data from df_OHLCV_clean_stocks_etfs.parquet ---
Successfully loaded data for 1559 tickers.
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 544091 entries, ('A', Timestamp('2025-06-24 00:00:00')) to ('ZWS', Timestamp('2024-02-01 00:00:00'))
Data columns (total 1 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Adj Close  544091 non-null  float64
dtypes: float64(1)
memory usage: 6.3+ MB


### Step 2: Prepare Data Windows for Analysis

This step filters the data up to the target `DATE_STR` and then splits it into multiple DataFrames, one for each period defined in `DAYS_RATIO`.
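
The implementation of `utils.get_latest_dfs` is not shown in this notebook. As a rough illustration of the idea, the sketch below assumes each window holds the most recent `n` rows per ticker; the helper name `latest_rows_per_ticker` and the toy data are hypothetical.

```python
import pandas as pd

def latest_rows_per_ticker(df, window_sizes):
    """Hypothetical stand-in for utils.get_latest_dfs.

    Returns one DataFrame per window size, each holding the most
    recent n rows for every ticker.
    """
    # Sort so that tail(n) selects the n latest dates within each ticker.
    df_sorted = df.sort_index(level=['Ticker', 'Date'])
    return [df_sorted.groupby(level='Ticker').tail(n) for n in window_sizes]

# Toy data: two tickers with six business days each.
dates = pd.bdate_range('2025-06-17', periods=6)
index = pd.MultiIndex.from_product([['AAA', 'BBB'], dates], names=['Ticker', 'Date'])
toy = pd.DataFrame({'Adj Close': [float(i) for i in range(12)]}, index=index)

# Keep rows on or before the cutoff, then build 3-row and 5-row windows.
cutoff = pd.Timestamp('2025-06-23')
filtered = toy[toy.index.get_level_values('Date') <= cutoff]
windows = latest_rows_per_ticker(filtered, [3, 5])

print([len(w.loc['AAA']) for w in windows])
```

Under this assumption a window is a count of rows, so a "3d" window covers three trading days rather than three calendar days.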


In [3]:
def prepare_data_windows(df, end_date, days_ratio_list):
    """
    Filters data up to an end_date and creates data windows. It also handles
    the inconsistency between this notebook's 'Ticker' index name and the
    utility function's expected 'Symbol' index name.
    """
    # Filter data to include only dates up to and including the end_date.
    end_date_ts = pd.to_datetime(end_date)
    filtered_df = df[df.index.get_level_values('Date') <= end_date_ts]
    
    # --- FIX: Adapt to the utility function's expectation ---
    # Temporarily rename the 'Ticker' level to 'Symbol' before passing to the util.
    df_for_utils = filtered_df.rename_axis(index={'Ticker': 'Symbol'})
    
    # Generate the list of DataFrames for each time window.
    windows_with_symbol_index = utils.get_latest_dfs(df_for_utils, days_ratio_list)
    
    # --- FIX: Adapt the result back to our standard ---
    # Rename the 'Symbol' level back to 'Ticker' in the results for consistency.
    windows_with_ticker_index = [
        win.rename_axis(index={'Symbol': 'Ticker'}) for win in windows_with_symbol_index
    ]
    
    return windows_with_ticker_index

if df_adj_close is not None:
    print(f"\n--- Step 2: Preparing data windows for date {DATE_STR} ---")
    list_of_data_windows = prepare_data_windows(df_adj_close, DATE_STR, DAYS_RATIO)
    print(f"Created {len(list_of_data_windows)} data windows for the following periods: {DAYS_RATIO}")
    # Example: Check the length of the first window for the first ticker
    sample_ticker = list_of_data_windows[0].index.get_level_values('Ticker')[0]
    print(f"Sample: Window 1 ('{DAYS_RATIO[0]}d') for ticker '{sample_ticker}' has {len(list_of_data_windows[0].loc[sample_ticker])} rows.")
else:
    print("Skipping step because source data failed to load.")
    list_of_data_windows = []


--- Step 2: Preparing data windows for date 2025-06-24 ---
Created 8 data windows for the following periods: [3, 5, 10, 15, 30, 60, 120, 250]
Sample: Window 1 ('3d') for ticker 'A' has 3 rows.


### Step 3: Calculate Performance Ratios

Iterate through each data window and ticker, calculating the full suite of performance metrics using the `analyze_stock` utility function.
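
To make the three metrics concrete, here is a minimal sketch of one common way to compute them from a price series. It is not the code in `utils.analyze_stock`, whose implementation is not shown in this notebook. The function name `sketch_ratios`, the 252-periods-per-year annualization, and the use of the daily risk-free rate as the Sortino and Omega threshold are all assumptions.

```python
import numpy as np
import pandas as pd

def sketch_ratios(prices, annual_risk_free_rate, periods_per_year=252):
    """Illustrative Sharpe, Sortino, and Omega ratios (assumed conventions)."""
    returns = prices.pct_change().dropna().to_numpy()
    excess = returns - annual_risk_free_rate / periods_per_year
    annualize = np.sqrt(periods_per_year)

    # Sharpe: mean excess return divided by its sample standard deviation.
    sharpe = excess.mean() / excess.std(ddof=1) * annualize

    # Sortino: same numerator, but only below-threshold returns count as risk.
    downside_dev = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    sortino = excess.mean() / downside_dev * annualize

    # Omega: total gains above the threshold divided by total losses below it.
    gains = np.maximum(excess, 0.0).sum()
    losses = np.abs(np.minimum(excess, 0.0)).sum()
    omega = gains / losses

    return {'Sharpe': sharpe, 'Sortino': sortino, 'Omega': omega}

# A 3-row window containing only gains: there is no downside to divide by.
rising = pd.Series([100.0, 101.0, 103.0])
with np.errstate(divide='ignore', invalid='ignore'):
    print(sketch_ratios(rising, 0.04))
```

In a window containing only gains, the downside deviation and the loss total are both zero, so this sketch returns infinite Sortino and Omega values. Other conventions exist for that edge case, and non-finite results of this kind are what Step 4 cleans up.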

In [4]:
def calculate_all_ratios(data_windows, risk_free_rate):
    """Calculates performance ratios for all tickers across all data windows."""
    all_results = {}
    for df_window in data_windows:
        tickers_in_window = df_window.index.get_level_values('Ticker').unique()
        for ticker in tickers_in_window:
            # Suppress division-by-zero warnings during calculation.
            with np.errstate(divide='ignore', invalid='ignore'):
                result_df = utils.analyze_stock(df_window, ticker, risk_free_rate=risk_free_rate)
            
            if result_df is not None:
                ticker_name = result_df.index[0]
                metrics = result_df.iloc[0].to_dict()
                all_results.setdefault(ticker_name, {}).update(metrics)
                
    if not all_results:
        return pd.DataFrame()
        
    return pd.DataFrame.from_dict(all_results, orient='index')

if list_of_data_windows:
    print("\n--- Step 3: Calculating performance ratios ---")
    df_ratios_raw = calculate_all_ratios(list_of_data_windows, ANNUAL_RISK_FREE_RATE)
    print(f"Successfully calculated ratios for {len(df_ratios_raw)} tickers.")
    display(df_ratios_raw.head())
else:
    print("Skipping calculation because no data windows were prepared.")
    df_ratios_raw = pd.DataFrame()


--- Step 3: Calculating performance ratios ---
Successfully calculated ratios for 1559 tickers.


Ticker,Sharpe 3d,Sortino 3d,Omega 3d,Sharpe 5d,Sortino 5d,Omega 5d,Sharpe 10d,Sortino 10d,Omega 10d,Sharpe 15d,Sortino 15d,Omega 15d,Sharpe 30d,Sortino 30d,Omega 30d,Sharpe 60d,Sortino 60d,Omega 60d,Sharpe 120d,Sortino 120d,Omega 120d,Sharpe 250d,Sortino 250d,Omega 250d
A,24.504862,inf,,6.657626,19.944519,3.512773,-3.363689,-4.218381,0.578812,3.058159,5.457725,1.614388,0.562953,0.78863,1.095659,0.213866,0.307703,1.03862,-0.731239,-1.015711,0.87798,-0.421686,-0.598949,0.928503
AA,13.665621,inf,,0.521782,1.035965,1.093248,0.116788,0.201737,1.019238,2.896426,5.313945,1.568742,0.801328,1.280224,1.132803,-0.142809,-0.204348,0.973576,-0.738581,-1.004655,0.877266,-0.432843,-0.606936,0.928945
AAL,46.341363,inf,,15.040418,inf,,-1.548899,-2.097882,0.780332,-0.107266,-0.152926,0.981982,-0.350022,-0.505706,0.941744,0.651281,1.021593,1.142921,-1.260375,-1.769612,0.785071,0.224741,0.348974,1.04417
AAON,27.889772,inf,,2.025803,3.092826,1.389659,-7.770553,-7.706771,0.296731,-6.302637,-6.11368,0.167109,-5.545682,-5.463625,0.255317,-0.224461,-0.301857,0.95833,-1.304895,-1.562836,0.770343,-0.054668,-0.071835,0.988936
AAPL,-5.03253,-6.949409,0.380897,7.700143,30.100663,4.792327,-1.653421,-2.322925,0.76724,-1.460434,-2.054022,0.798747,-2.309565,-2.959262,0.683668,-0.518368,-0.787214,0.897174,-1.060782,-1.510992,0.810622,-0.084015,-0.119268,0.983683


### Step 4: Clean Results by Handling Infinite and NaN Values

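The ratio calculations can produce `Inf` or `NaN` values (e.g., if volatility is zero). This step replaces them with boundary values taken from each column: `NaN` and `+Inf` become the column's largest finite value, and `-Inf` becomes its smallest finite value. A column with no finite values at all is left unchanged.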
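
As a small worked example of that replacement rule, the sketch below applies the same `fillna` and `replace` calls to a single made-up column (the values and the column name are illustrative only).

```python
import numpy as np
import pandas as pd

col = pd.Series([1.5, np.inf, -np.inf, np.nan, -0.5], name='Sortino 3d')

# Only 1.5 and -0.5 are finite, so they define the replacement bounds.
finite = col[np.isfinite(col)]

# NaN and +Inf take the largest finite value; -Inf takes the smallest.
cleaned = col.fillna(finite.max()).replace(
    [np.inf, -np.inf], [finite.max(), finite.min()]
)
print(cleaned.tolist())  # [1.5, 1.5, -0.5, 1.5, -0.5]
```

Mapping `NaN` to the column maximum treats an undefined ratio as if it were the best observed value. Whether that is appropriate depends on how the ratios are used downstream.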


In [5]:
def clean_infinite_and_nan_values(df):
    """Replaces NaN and +Inf with each column's max finite value, and -Inf with its min."""
    if df.empty:
        return df
        
    df_clean = df.copy()
    numeric_cols = df_clean.select_dtypes(include=np.number).columns

    for col in numeric_cols:
        finite_vals = df_clean[col][np.isfinite(df_clean[col])]
        if not finite_vals.empty:
            max_val = finite_vals.max()
            min_val = finite_vals.min()
            # Replace NaN, Inf with max; -Inf with min
            df_clean[col] = df_clean[col].fillna(max_val)
            df_clean[col] = df_clean[col].replace([np.inf, -np.inf], [max_val, min_val])
            
    return df_clean

if not df_ratios_raw.empty:
    print("\n--- Step 4: Cleaning NaN and Infinite values from results ---")
    has_nan_inf_before = df_ratios_raw.isnull().values.any() or np.isinf(df_ratios_raw.select_dtypes(include=np.number)).values.any()
    print(f"Does the raw data contain NaN/Inf values? {has_nan_inf_before}")

    df_final = clean_infinite_and_nan_values(df_ratios_raw)

    has_nan_inf_after = df_final.isnull().values.any() or np.isinf(df_final.select_dtypes(include=np.number)).values.any()
    print(f"Does the final data contain NaN/Inf values? {has_nan_inf_after}")
    print("Cleaning complete.")
else:
    print("Skipping cleaning step as no ratios were calculated.")
    df_final = pd.DataFrame()


--- Step 4: Cleaning NaN and Infinite values from results ---
Does the raw data contain NaN/Inf values? True
Does the final data contain NaN/Inf values? False
Cleaning complete.


### Step 5: Save Final Results

Save the cleaned DataFrame of performance ratios to a Parquet file.

In [6]:
if not df_final.empty:
    print("\n--- Step 5: Saving final results ---")
    try:
        DEST_PATH.parent.mkdir(parents=True, exist_ok=True)
        df_final.to_parquet(DEST_PATH, engine='pyarrow', compression='zstd')
        print(f"Successfully saved final ratios to: {DEST_PATH}")
    except Exception as e:
        print(f"An error occurred while saving the file: {e}")
else:
    print("Skipping save step because the final DataFrame is empty.")


--- Step 5: Saving final results ---
Successfully saved final ratios to: c:\Users\ping\Files_win10\python\py311\stocks\data\2025-06-24_df_perf_ratios_stocks_etfs.parquet


### Step 6: Verify and Display Sample Results

Load the saved file and display a formatted table for a sample of key symbols to provide a final, human-readable check.
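
Because the sample list is filtered against the saved index, any symbol that is absent or misspelled is dropped without a message. The sketch below, using made-up data, shows one way to report the symbols that were not found. The variable names are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy results indexed by ticker, standing in for the saved ratios.
results = pd.DataFrame({'Sharpe 3d': [1.2, -0.4]}, index=['AAPL', 'MSFT'])
requested = ['AAPL', 'MSFT', 'msft', 'XYZ']

# Membership tests against the index are case-sensitive.
found = [s for s in requested if s in results.index]
missing = [s for s in requested if s not in results.index]

print(f"Found: {found}")
print(f"Missing: {missing}")
```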

In [7]:
print("\n--- Step 6: Verifying saved file and displaying sample ---")
if DEST_PATH.exists():
    verified_df = pd.read_parquet(DEST_PATH)
    
    # Filter for symbols that exist in our results
    symbols_to_show = [s for s in SAMPLE_SYMBOLS if s in verified_df.index]
    
    if symbols_to_show:
        sample_df = verified_df.loc[symbols_to_show]
        print(f"Performance Ratios for {DATE_STR}")
        print(tabulate(sample_df, headers='keys', tablefmt='grid', floatfmt='.4f'))
    else:
        print("None of the sample symbols were found in the final results.")
else:
    print("Could not verify file as it was not found at the destination path.")


--- Step 6: Verifying saved file and displaying sample ---
Performance Ratios for 2025-06-24
+------+-------------+--------------+------------+-------------+--------------+------------+--------------+---------------+-------------+--------------+---------------+-------------+--------------+---------------+-------------+--------------+---------------+-------------+---------------+----------------+--------------+---------------+----------------+--------------+
|      |   Sharpe 3d |   Sortino 3d |   Omega 3d |   Sharpe 5d |   Sortino 5d |   Omega 5d |   Sharpe 10d |   Sortino 10d |   Omega 10d |   Sharpe 15d |   Sortino 15d |   Omega 15d |   Sharpe 30d |   Sortino 30d |   Omega 30d |   Sharpe 60d |   Sortino 60d |   Omega 60d |   Sharpe 120d |   Sortino 120d |   Omega 120d |   Sharpe 250d |   Sortino 250d |   Omega 250d |
| AAPL |     -5.0325 |      -6.9494 |     0.3809 |      7.7001 |      30.1007 |     4.7923 |      -1.6534 |       -2.3229 |      0.7672 |      -1.4604 |       -2.0540 |