# üé° Wheel Strategy Backtester V2 - REAL DATA

This notebook implements the Wheel Strategy using **real market data from Databento**.

## Strategy Summary

"On fundamentally strong stocks, sell **30-delta puts** (30‚Äì45 DTE) only when **price ‚â§ 20-day SMA** or **‚â§ lower Bollinger Band** (20,2); if assigned, sell **30-delta calls** (30‚Äì45 DTE) until called away; close positions at **50% profit** or **21 DTE remaining**."

## Architecture

- **Function-based**: All code uses functions (no classes)
- **Self-contained**: All code, API keys, and logic in this notebook
- **UV package management**: Uses UV instead of pip/conda
- **Disk caching**: Data cached as parquet files for efficiency
- **Inspection-friendly**: All major functions output DataFrames/dicts for user inspection

## Data Flow

```
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 1: Configuration & Setup                           ‚îÇ
‚îÇ - Imports, API keys, strategy parameters, cache config      ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 2: Data Fetching Functions                          ‚îÇ
‚îÇ - Databento client, equity data, options data, market data  ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 3: Data Caching Functions                          ‚îÇ
‚îÇ - Cache paths, read/write, statistics, loading              ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 4: Data Processing Functions                       ‚îÇ
‚îÇ - Delta calculation, technical indicators, beta             ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 5: Filtering Functions                             ‚îÇ
‚îÇ - Fundamental filters, technical filters, chain filters    ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 6: Strategy Logic Functions                        ‚îÇ
‚îÇ - Position management, contract selection, rolling logic  ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 7: Backtest Engine Functions                       ‚îÇ
‚îÇ - Portfolio state, trade execution, daily processing       ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 8: Reporting Functions                             ‚îÇ
‚îÇ - Metrics calculation, visualization, reporting            ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
                       ‚îÇ
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚ñº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ SECTION 9: Main Execution                                 ‚îÇ
‚îÇ - Cache inspection, data collection, pre-computation,      ‚îÇ
‚îÇ   backtest execution, results & visualization              ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
```

## Section Breakdown

1. **Section 0**: Overview & Data Flow (this cell)
2. **Section 1**: Configuration & Setup
3. **Section 2**: Data Fetching Functions
4. **Section 3**: Data Caching Functions
5. **Section 4**: Data Processing Functions
6. **Section 5**: Filtering Functions
7. **Section 6**: Strategy Logic Functions
8. **Section 7**: Backtest Engine Functions
9. **Section 8**: Reporting Functions
10. **Section 9**: Main Execution


---

# SECTION 1: Configuration & Setup

## 1.1 Package Installation (UV)

**Data Flow:**
- **Input**: None
- **Processing**: Install required packages using UV
- **Output**: Packages installed and ready
- **Dependencies**: None

**Note**: Run this cell first to install all dependencies using UV.



In [1]:
# ============================================================================
# INSTALL PACKAGES USING UV
# ============================================================================
# Run this cell first to install all required packages

# List of required packages
required_packages = [
    "pandas",
    "numpy", 
    "databento",
    "yfinance",
    "scipy",
    "matplotlib",
    "seaborn",
    "tqdm"
]

# Check which packages are missing
print("Checking required packages...")
print("=" * 60)

missing = []
for pkg in required_packages:
    try:
        __import__(pkg)
        print(f"‚úÖ {pkg} already installed")
    except ImportError:
        print(f"‚ö†Ô∏è  {pkg} not found")
        missing.append(pkg)

print("=" * 60)

# Install missing packages using UV
if missing:
    print(f"Installing {len(missing)} missing packages using UV...")
    packages_str = " ".join(missing)
    !uv pip install {packages_str}
    print(f"‚úÖ {packages_str} Installation complete!")
else:
    print("‚úÖ All packages ready!")


Checking required packages...
‚ö†Ô∏è  pandas not found
‚ö†Ô∏è  numpy not found
‚ö†Ô∏è  databento not found
‚ö†Ô∏è  yfinance not found
‚ö†Ô∏è  scipy not found
‚ö†Ô∏è  matplotlib not found
‚ö†Ô∏è  seaborn not found
‚ö†Ô∏è  tqdm not found
Installing 8 missing packages using UV...
[2mUsing Python 3.14.0 environment at: /Users/samuelminer/Projects/nissan_options/wheel_strategy/.venv[0m
[2K[2mResolved [1m47 packages[0m [2min 2.23s[0m[0m                                        [0m
[2K   [36m[1mBuilding[0m[39m peewee[2m==3.18.3[0m                                             
[2K[1A   [36m[1mBuilding[0m[39m peewee[2m==3.18.3[0m                                     [1A
   [36m[1mBuilding[0m[39m multitasking[2m==0.0.12[0m
[2K[2A   [36m[1mBuilding[0m[39m peewee[2m==3.18.3[0m                                     [2A
   [36m[1mBuilding[0m[39m multitasking[2m==0.0.12[0m
[37m‚†ô[0m [2mPreparing packages...[0m (0/28)
[2K[3A   [36m[1mBuilding[0m[39m 

## 1.2 API Configuration

**Data Flow:**
- **Input**: None
- **Processing**: Set up API keys and configuration
- **Output**: API client and configuration variables
- **Dependencies**: Section 1.1 (imports)

**‚ö†Ô∏è WARNING**: API key is hardcoded below. Do NOT commit this notebook to public repositories without removing the key first!


In [None]:
# ‚ö†Ô∏è WARNING: API KEY HARDCODED - DO NOT COMMIT TO PUBLIC REPOS
DATABENTO_API_KEY = "db-thQsshevN7PTFvnJhLHp8yhy6fexH"

# Dataset IDs
OPRA_DATASET = "OPRA.PILLAR"  # Options data
XNAS_DATASET = "XNAS.ITCH"    # Equity data

print("‚úÖ API Configuration loaded")


‚úÖ API Configuration loaded
   Equity time: 15:30:00
   Options time: 15:45:00


## 1.3 Strategy Parameters

**Data Flow:**
- **Input**: None
- **Processing**: Define all strategy parameters as simple variables/dicts
- **Output**: Configuration variables available for use
- **Dependencies**: None


In [None]:
# Data Fetching Times (hardcoded in strategy parameters)
EQUITY_TIME = "15:30:00"  # 3:30 PM for equity prices (SMA/BB calculation)
OPTIONS_TIME = "15:45:00"  # 3:45 PM for options execution

# Put Selling Parameters
PUT_DELTA_TARGET = 0.30
PUT_DELTA_BAND = 0.05  # ¬±0.05, so range is 0.25-0.35
PUT_DTE_MIN = 30
PUT_DTE_MAX = 45
PUT_DTE_TARGET = 35
PUT_MIN_PREMIUM_ROC = 0.02  # 2% minimum return on capital

# Call Selling Parameters (Covered Calls)
CALL_DELTA_DEFAULT = 0.30
CALL_DELTA_DEFENSIVE = 0.20  # When underwater
CALL_DELTA_AGGRESSIVE = 0.35  # When profitable + near upper BB
CALL_DELTA_AGGRESSIVE_MAX = 0.40
CALL_DTE_MIN = 30
CALL_DTE_MAX = 45

# Technical Indicators
SMA_WINDOW = 20
BB_WINDOW = 20
BB_STD_DEV = 2.0

# Calculation Windows
BETA_WINDOW_DAYS = 252 * 3  # 3 years of trading days for beta calculation
VOLATILITY_WINDOW_DAYS = 30  # 30 days for volatility estimation

# Fundamental Filters
MIN_MARKET_CAP = 10_000_000_000  # $10B
MAX_MARKET_CAP = None  # No upper limit
MIN_BETA = 0.3
MAX_BETA = 1.5
MIN_PRICE = 20.0
MAX_PRICE = 500.0

# Liquidity Filters
MIN_OPEN_INTEREST = 1000
MIN_VOLUME = 0  # No minimum volume requirement
MAX_SPREAD_PCT = 0.10  # 10% max bid-ask spread

# Position Management
PROFIT_TARGET_PCT = 0.50  # Close at 50% profit
MIN_DTE_TO_HOLD = 21  # Close if DTE < 21
MAX_POSITIONS = 10
INITIAL_CASH = 100_000.0

# Backtest Period
WARMUP_DAYS = 20  # Days needed for SMA/BB calculation # This should be rolling wing
START_DATE = date(2024, 1, 2)
END_DATE = date(2024, 12, 6)

# Technical Filter Type
# Options: "SMA_OR_BOLLINGER", "SMA_ONLY", "BOLLINGER_ONLY", "NONE"
TECHNICAL_FILTER_TYPE = "SMA_OR_BOLLINGER"

print("‚úÖ Strategy Parameters loaded")
print(f"   Equity time: {EQUITY_TIME}")
print(f"   Options time: {OPTIONS_TIME}")
print(f"   Put Delta: {PUT_DELTA_TARGET} ¬± {PUT_DELTA_BAND}")
print(f"   Put DTE: {PUT_DTE_MIN}-{PUT_DTE_MAX}")
print(f"   Beta window: {BETA_WINDOW_DAYS} days ({BETA_WINDOW_DAYS/252:.1f} years)")
print(f"   Volatility window: {VOLATILITY_WINDOW_DAYS} days")
print(f"   Initial Cash: ${INITIAL_CASH:,.0f}")


‚úÖ Strategy Parameters loaded
   Equity time: 15:30:00
   Options time: 15:45:00
   Put Delta: 0.3 ¬± 0.05
   Put DTE: 30-45
   Initial Cash: $100,000


## 1.4 Calculations & Formulas

**Data Flow:**
- **Input**: None
- **Processing**: Document all mathematical formulas used in the strategy
- **Output**: Formula definitions for reference
- **Dependencies**: None

This section documents the mathematical formulas used throughout the backtest.


In [None]:
# ============================================================================
# CALCULATION FORMULAS
# ============================================================================
# This section documents all mathematical formulas used in the strategy

"""
OPTION DELTA (Black-Scholes Model)
-----------------------------------
Delta measures the sensitivity of option price to underlying price movement.

For Calls:  Œî = N(d1)
For Puts:   Œî = N(d1) - 1

Where:
    d1 = [ln(S/K) + (r + 0.5*œÉ¬≤)*T] / (œÉ*‚àöT)
    
    S = Current stock price (spot)
    K = Strike price
    r = Risk-free rate (default: 0.05 = 5%)
    œÉ = Annualized volatility (default: 0.25 = 25%)
    T = Time to expiration in years (DTE / 365)
    N() = Cumulative standard normal distribution

Delta ranges:
    - Calls: 0 to 1 (0 = OTM, 1 = deep ITM)
    - Puts: -1 to 0 (0 = OTM, -1 = deep ITM)
    - For filtering, we use absolute value for puts
"""

"""
BETA (Market Correlation)
--------------------------
Beta measures stock's sensitivity to market movements.

    Œ≤ = Covariance(stock_returns, market_returns) / Variance(market_returns)

Where:
    stock_returns = Daily percentage returns of stock
    market_returns = Daily percentage returns of market index (e.g., SPY)
    
    Covariance = E[(stock - E[stock]) * (market - E[market])]
    Variance = E[(market - E[market])¬≤]
    
    Window: BETA_WINDOW_DAYS (default: 756 days = 3 years)

Interpretation:
    Œ≤ = 1.0: Moves with market
    Œ≤ > 1.0: More volatile than market
    Œ≤ < 1.0: Less volatile than market
    Œ≤ = 0.0: No correlation with market
"""

"""
VOLATILITY ESTIMATION
---------------------
Annualized volatility from historical prices.

    œÉ_annual = œÉ_daily * ‚àö252

Where:
    œÉ_daily = Standard deviation of daily returns
    252 = Number of trading days per year
    Window: VOLATILITY_WINDOW_DAYS (default: 30 days)

Calculation:
    1. Calculate daily returns: r_t = (P_t - P_{t-1}) / P_{t-1}
    2. Calculate standard deviation of returns over VOLATILITY_WINDOW_DAYS
    3. Annualize by multiplying by ‚àö252
"""

"""
SIMPLE MOVING AVERAGE (SMA)
----------------------------
Average price over a lookback window.

    SMA_n = (P_t + P_{t-1} + ... + P_{t-n+1}) / n

Where:
    n = Window size (default: 20 days)
    P_t = Price at time t
"""

"""
BOLLINGER BANDS
---------------
Volatility-based price bands around SMA.

    Middle Band = SMA_n
    Upper Band  = SMA_n + (k * œÉ_n)
    Lower Band  = SMA_n - (k * œÉ_n)

Where:
    SMA_n = Simple moving average over n periods
    œÉ_n = Standard deviation of prices over n periods
    k = Number of standard deviations (default: 2.0)

Interpretation:
    - Price ‚â§ Lower Band: Oversold (potential put entry)
    - Price ‚â• Upper Band: Overbought (potential call entry)
    - Price near Middle: Neutral
"""

"""
RETURN ON CAPITAL (ROC) - Premium
----------------------------------
Return on capital at risk for option premium.

    ROC = (Premium / Strike) * 100

Where:
    Premium = Option premium received
    Strike = Strike price (capital at risk for cash-secured puts)

Example:
    Premium = $2.00, Strike = $100
    ROC = (2.00 / 100) * 100 = 2.0%
"""

print("‚úÖ Calculation Formulas Documented")
print("   Formulas: Delta, Beta, Volatility, SMA, Bollinger Bands, ROC")


‚úÖ Calculation Formulas Documented
   Formulas: Delta, Beta, Volatility, SMA, Bollinger Bands, ROC


## 1.5 Cache Configuration

**Data Flow:**
- **Input**: None
- **Processing**: Set up cache directory paths and naming conventions
- **Output**: Cache configuration variables
- **Dependencies**: None


In [16]:
# Cache directory structure
NOTEBOOK_DIR = Path.cwd()
CACHE_BASE_DIR = NOTEBOOK_DIR / "data" / "cache"
CACHE_EQUITY_DIR = CACHE_BASE_DIR / "equity"
CACHE_OPTIONS_DIR = CACHE_BASE_DIR / "options"
CACHE_PROCESSED_DIR = CACHE_BASE_DIR / "processed"

# Create cache directories if they don't exist
CACHE_EQUITY_DIR.mkdir(parents=True, exist_ok=True)
CACHE_OPTIONS_DIR.mkdir(parents=True, exist_ok=True)
CACHE_PROCESSED_DIR.mkdir(parents=True, exist_ok=True)

print("‚úÖ Cache Configuration loaded")
print(f"   Cache base: {CACHE_BASE_DIR}")
print(f"   Equity cache: {CACHE_EQUITY_DIR}")
print(f"   Options cache: {CACHE_OPTIONS_DIR}")
print(f"   Processed cache: {CACHE_PROCESSED_DIR}")


‚úÖ Cache Configuration loaded
   Cache base: /Users/samuelminer/Projects/nissan_options/wheel_strategy/notebooks/data/cache
   Equity cache: /Users/samuelminer/Projects/nissan_options/wheel_strategy/notebooks/data/cache/equity
   Options cache: /Users/samuelminer/Projects/nissan_options/wheel_strategy/notebooks/data/cache/options
   Processed cache: /Users/samuelminer/Projects/nissan_options/wheel_strategy/notebooks/data/cache/processed


## 1.6 Data Structure Initialization

**Data Flow:**
- **Input**: Universe configuration, date range, strategy parameters
- **Processing**: Create empty data structure with properly shaped DataFrames
- **Output**: Initialized data dictionary ready for population
- **Dependencies**: Section 1.3 (Strategy Parameters), Section 1.5 (Cache Configuration)

This section initializes the comprehensive data structure that tracks all processing steps for full inspectability.


In [None]:
# ============================================================================
# DATA STRUCTURE INITIALIZATION
# ============================================================================

from datetime import datetime, timedelta, date
from typing import Dict, List, Optional, Any
from pathlib import Path
import pandas as pd
import numpy as np
import math

def get_trading_days(start_date: date, end_date: date) -> List[date]:
    """
    Generate list of trading days (excluding weekends).
    
    Args:
        start_date: Start date
        end_date: End date
        
    Returns:
        List of trading dates (weekdays only)
    """
    trading_days = []
    current_date = start_date
    while current_date <= end_date:
        if current_date.weekday() < 5:  # Monday = 0, Friday = 4
            trading_days.append(current_date)
        current_date += timedelta(days=1)
    return trading_days


def initialize_data_structure(
    selected_universe: str,
    universe_symbols: Dict[str, List[str]],
    start_date: date,
    end_date: date,
    warmup_days: int,
    equity_time: str,
    options_time: str,
    min_market_cap: float,
    max_market_cap: Optional[float],
    min_beta: float,
    max_beta: float,
    min_price: float,
    max_price: float,
    sma_window: int,
    bb_window: int,
    bb_std: float,
    beta_window_days: int,
    volatility_window_days: int,
) -> Dict[str, Any]:
    """
    Initialize comprehensive data structure for tracking all processing steps.
    
    Creates empty but properly shaped DataFrames and dictionaries ready for population.
    
    Args:
        selected_universe: Name of universe to use (e.g., 'SP500', 'test_1')
        universe_symbols: Dict mapping universe names to symbol lists
        start_date: Start date for backtest
        end_date: End date for backtest
        warmup_days: Days needed for warmup (SMA/BB calculation)
        equity_time: Time string for equity price fetching (e.g., '15:30:00')
        options_time: Time string for options fetching (e.g., '15:45:00')
        min_market_cap: Minimum market cap filter
        max_market_cap: Maximum market cap filter (None if no limit)
        min_beta: Minimum beta filter
        max_beta: Maximum beta filter
        min_price: Minimum price filter
        max_price: Maximum price filter
        sma_window: SMA calculation window
        bb_window: Bollinger Bands window
        bb_std: Bollinger Bands standard deviation
        beta_window_days: Beta calculation window (days)
        volatility_window_days: Volatility calculation window (days)
        
    Returns:
        Initialized data dictionary with empty but properly shaped DataFrames
    """
    # Generate trading days
    trading_days = get_trading_days(start_date, end_date)
    trading_days_index = pd.DatetimeIndex(trading_days)
    
    # Get symbols for selected universe
    if selected_universe not in universe_symbols:
        raise ValueError(f"Universe '{selected_universe}' not found in universe_symbols")
    symbols = universe_symbols[selected_universe]
    
    # Initialize data structure
    data: Dict[str, Any] = {
        # Step 1: Universe Definition
        'universe': universe_symbols.copy(),
        'selected_universe': selected_universe,
        'universe_metadata': {},  # Will store source and last_fetched for each universe
        
        # Step 2: Market Cap Data
        'market_cap': pd.DataFrame(
            index=trading_days_index,
            columns=symbols,
            data=np.nan
        ),
        'market_cap_filter': pd.DataFrame(
            index=trading_days_index,
            columns=symbols,
            data=False,
            dtype=bool
        ),
        
        # Step 3: Equity Prices (OHLC)
        'equity_price_open': pd.DataFrame(
            index=trading_days_index,
            columns=symbols,
            data=np.nan
        ),
        'equity_price_high': pd.DataFrame(
            index=trading_days_index,
            columns=symbols,
            data=np.nan
        ),
        'equity_price_low': pd.DataFrame(
            index=trading_days_index,
            columns=symbols,
            data=np.nan
        ),
        'equity_price_close': pd.DataFrame(
            index=trading_days_index,
            columns=symbols,
            data=np.nan
        ),
        'equity_volume': pd.DataFrame(
            index=trading_days_index,
            columns=symbols,
            data=np.nan
        ),
        
        # Step 4: Index Prices
        'index_prices': {
            'SPY': pd.DataFrame(
                index=trading_days_index,
                columns=['close'],
                data=np.nan
            ),
        },
        
        # Step 5: Technical Indicators
        'technical_filters': {
            'sma': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=np.nan
            ),
            'bb_lower': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=np.nan
            ),
            'bb_upper': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=np.nan
            ),
            'bb_middle': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=np.nan
            ),
            'beta': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=np.nan
            ),
            'volatility': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=np.nan
            ),
        },
        
        # Step 6: Filter Status
        'filter_status': {
            'market_cap': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=False,
                dtype=bool
            ),
            'beta': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=False,
                dtype=bool
            ),
            'price': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=False,
                dtype=bool
            ),
            'technical': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=False,
                dtype=bool
            ),
            'liquidity': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=False,
                dtype=bool
            ),
            'all_filters': pd.DataFrame(
                index=trading_days_index,
                columns=symbols,
                data=False,
                dtype=bool
            ),
        },
        
        # Step 7: Daily Filter Results
        'daily_filter': {},  # Will be populated: {'YYYY-MM-DD': ['SYMBOL1', 'SYMBOL2', ...]}
        
        # Step 8: Options Data
        'options': {
            'definitions': {},  # {'YYYY-MM-DD': {'SYMBOL': ['OPTION_SYMBOL1', ...]}}
            'chains': {},  # {'YYYY-MM-DD': {'SYMBOL': pd.DataFrame(...)}}
            'filtered_chains': {},  # {'YYYY-MM-DD': {'SYMBOL': pd.DataFrame(...)}}
            'filter_status': {},  # {'YYYY-MM-DD': {'SYMBOL': {'dte': df, 'delta': df, ...}}}
        },
        
        # Metadata
        'metadata': {
            'start_date': start_date,
            'end_date': end_date,
            'warmup_days': warmup_days,
            'trading_days': trading_days,
            'equity_time': equity_time,
            'options_time': options_time,
            'last_updated': datetime.now(),
            'version': '2.0',
            'universe_metadata': {},
            'market_cap_min': min_market_cap,
            'market_cap_max': max_market_cap,
            'min_beta': min_beta,
            'max_beta': max_beta,
            'min_price': min_price,
            'max_price': max_price,
            'sma_window': sma_window,
            'bb_window': bb_window,
            'bb_std': bb_std,
            'beta_window': beta_window_days,
            'vol_window': volatility_window_days,
            'risk_free_rate': None,  # Will be populated with daily 10yr Treasury data
            'trades': pd.DataFrame(columns=[
                'date', 'symbol', 'action', 'type', 'strike', 'expiration', 
                'premium', 'roc', 'dte', 'delta', 'contracts'
            ]),
            'positions': {},  # Will track wheel positions
            'performance': {},  # Will store post-backtest metrics
        },
    }
    
    print("‚úÖ Data Structure Initialized")
    print(f"   Selected universe: {selected_universe}")
    print(f"   Symbols: {len(symbols)}")
    print(f"   Trading days: {len(trading_days)}")
    print(f"   Date range: {start_date} to {end_date}")
    print(f"   DataFrames initialized: {len(symbols)} columns √ó {len(trading_days)} rows")
    
    return data

print("‚úÖ Data Structure Initialization Functions Defined")


---

# SECTION 2: Data Fetching Functions

## 2.1 Databento Client Setup

**Data Flow:**
- **Input**: API key
- **Processing**: Create Databento client
- **Output**: Client object
- **Dependencies**: Section 1.1 (imports), Section 1.2 (API config)


## 2.0 Universe Definition Functions

**Data Flow:**
- **Input**: None or universe name
- **Processing**: Fetch symbol lists from various sources (SP500, NASDAQ, DOW)
- **Output**: List of symbols or updated data structure
- **Dependencies**: None (uses web scraping/yfinance)


In [None]:
# ============================================================================
# UNIVERSE DEFINITION FUNCTIONS
# ============================================================================

def fetch_sp500_symbols() -> List[str]:
    """
    Fetch S&P 500 symbols from Wikipedia.
    
    Returns:
        List of S&P 500 ticker symbols
    """
    try:
        url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
        tables = pd.read_html(url)
        sp500_table = tables[0]
        symbols = sp500_table['Symbol'].tolist()
        # Clean symbols (remove dots, etc.)
        symbols = [s.replace('.', '-') for s in symbols]
        print(f"‚úÖ Fetched {len(symbols)} S&P 500 symbols")
        return symbols
    except Exception as e:
        print(f"Error fetching S&P 500 symbols: {e}")
        return []


def fetch_nasdaq_symbols() -> List[str]:
    """
    Fetch NASDAQ-100 symbols (top 100 by market cap).
    
    Note: Full NASDAQ list is large. This fetches NASDAQ-100.
    
    Returns:
        List of NASDAQ-100 ticker symbols
    """
    try:
        # Using yfinance to get NASDAQ-100
        import yfinance as yf
        nasdaq = yf.Ticker("^IXIC")
        # This is a simplified approach - in production, use NASDAQ API or web scraping
        url = 'https://en.wikipedia.org/wiki/NASDAQ-100'
        tables = pd.read_html(url)
        if len(tables) > 0:
            nasdaq_table = tables[0]
            # Try to find symbol column
            symbol_col = None
            for col in nasdaq_table.columns:
                if 'symbol' in col.lower() or 'ticker' in col.lower():
                    symbol_col = col
                    break
            if symbol_col:
                symbols = nasdaq_table[symbol_col].tolist()
                symbols = [str(s).replace('.', '-') for s in symbols if pd.notna(s)]
                print(f"‚úÖ Fetched {len(symbols)} NASDAQ-100 symbols")
                return symbols
        print("‚ö†Ô∏è  Could not fetch NASDAQ symbols, returning empty list")
        return []
    except Exception as e:
        print(f"Error fetching NASDAQ symbols: {e}")
        return []


def fetch_dow_symbols() -> List[str]:
    """
    Fetch Dow Jones Industrial Average (DJIA) symbols from Wikipedia.
    
    Returns:
        List of DJIA ticker symbols
    """
    try:
        url = 'https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average'
        tables = pd.read_html(url)
        if len(tables) > 0:
            dow_table = tables[0]
            # Find symbol column
            symbol_col = None
            for col in dow_table.columns:
                if 'symbol' in col.lower() or 'ticker' in col.lower():
                    symbol_col = col
                    break
            if symbol_col:
                symbols = dow_table[symbol_col].tolist()
                symbols = [str(s).replace('.', '-') for s in symbols if pd.notna(s)]
                print(f"‚úÖ Fetched {len(symbols)} DOW symbols")
                return symbols
        print("‚ö†Ô∏è  Could not fetch DOW symbols, returning empty list")
        return []
    except Exception as e:
        print(f"Error fetching DOW symbols: {e}")
        return []


def populate_universe(
    data: Dict[str, Any],
    universe_name: str,
    symbols: List[str],
    source: str = "manual",
    last_fetched: Optional[date] = None
) -> None:
    """
    Populate universe in data structure and update metadata.
    
    Args:
        data: Data structure dictionary
        universe_name: Name of universe (e.g., 'SP500', 'NASDAQ', 'test_1')
        symbols: List of symbols for this universe
        source: Source of the data (e.g., 'wikipedia.org/wiki/...')
        last_fetched: Date when symbols were fetched
    """
    if 'universe' not in data:
        data['universe'] = {}
    if 'universe_metadata' not in data:
        data['universe_metadata'] = {}
    
    data['universe'][universe_name] = symbols
    data['universe_metadata'][universe_name] = {
        'source': source,
        'last_fetched': last_fetched or date.today(),
        'symbol_count': len(symbols)
    }
    
    # Also update metadata in data['metadata']
    if 'universe_metadata' not in data['metadata']:
        data['metadata']['universe_metadata'] = {}
    data['metadata']['universe_metadata'][universe_name] = data['universe_metadata'][universe_name]
    
    print(f"‚úÖ Populated universe '{universe_name}' with {len(symbols)} symbols")
    print(f"   Source: {source}")
    print(f"   Last fetched: {data['universe_metadata'][universe_name]['last_fetched']}")

print("‚úÖ Universe Definition Functions Defined")


In [17]:
def create_databento_client(api_key: str) -> db.Historical:
    """
    Create and return a Databento historical client.
    
    Args:
        api_key: Databento API key
        
    Returns:
        Databento historical client
    """
    return db.Historical(api_key)

# Create client
db_client = create_databento_client(DATABENTO_API_KEY)

# Output for inspection
print("‚úÖ Databento Client Created")
print(f"   Client type: {type(db_client)}")
print(f"   API key set: {'Yes' if DATABENTO_API_KEY else 'No'}")


‚úÖ Databento Client Created
   Client type: <class 'databento.historical.client.Historical'>
   API key set: Yes


## 2.2 Equity Data Fetching

**Data Flow:**
- **Input**: Symbols, start_date, end_date
- **Processing**: Fetch equity OHLC data at 3:30 PM (EQUITY_TIME) for each trading day
- **Output**: Dict[str, DataFrame] with columns: date, open, high, low, close, price
- **Dependencies**: Section 2.1 (Databento client), Section 1.3 (EQUITY_TIME parameter)


In [None]:
def fetch_equity_data(
    symbols: List[str],
    start_date: date,
    end_date: date,
    client: db.Historical,
) -> Dict[str, pd.DataFrame]:
    """
    Fetch equity OHLC data at 3:30 PM for full backtest period.
    
    Fetches minute-level OHLC data at EQUITY_TIME (3:30 PM) for each trading day.
    Returns DataFrame with date, open, high, low, close, and price (close) columns.
    
    Args:
        symbols: List of stock symbols
        start_date: Start date (includes warmup period)
        end_date: End date (end of trading period)
        client: Databento client
        
    Returns:
        Dict[str, DataFrame] with columns: date, open, high, low, close, price
        DataFrame is sorted chronologically
    """
    result = {}
    
    # Generate list of trading dates
    current_date = start_date
    trading_dates = []
    while current_date <= end_date:
        # Skip weekends (simplified - in production, use trading calendar)
        if current_date.weekday() < 5:  # Monday = 0, Friday = 4
            trading_dates.append(current_date)
        current_date += timedelta(days=1)
    
    for symbol in tqdm(symbols, desc="Fetching equity data"):
        prices = []
        
        for trade_date in trading_dates:
            try:
                # Fetch OHLC data at EQUITY_TIME (3:30 PM)
                datetime_str = f"{trade_date.isoformat()}T{EQUITY_TIME}"
                data = client.timeseries.get_range(
                    dataset=XNAS_DATASET,
                    symbols=[symbol],
                    schema="ohlcv-1m",
                    start=datetime_str,
                    end=datetime_str,
                )
                
                if data is not None and len(data) > 0:
                    # Get the last row (in case there are multiple ticks at 3:30 PM)
                    row = data.iloc[-1]
                    prices.append({
                        'date': trade_date,
                        'open': float(row.get('open', 0)),
                        'high': float(row.get('high', 0)),
                        'low': float(row.get('low', 0)),
                        'close': float(row.get('close', 0)),
                        'price': float(row.get('close', 0)),  # Alias for convenience
                    })
            except Exception as e:
                # Skip dates with errors, continue to next date
                continue
        
        if prices:
            df = pd.DataFrame(prices)
            df = df.sort_values('date').reset_index(drop=True)
            result[symbol] = df
        else:
            result[symbol] = pd.DataFrame(columns=['date', 'open', 'high', 'low', 'close', 'price'])
    
    # Output for inspection
    total_days = sum(len(df) for df in result.values())
    print(f"‚úÖ Fetched equity data at {EQUITY_TIME} for {len(symbols)} symbols")
    print(f"   Total data points: {total_days}")
    print(f"   Date range: {start_date} to {end_date}")
    
    return result

print("‚úÖ Equity Data Fetching Functions Defined")


‚úÖ Equity Data Fetching Functions Defined


## 2.3 Options Data Fetching

**Data Flow:**
- **Input**: Symbol(s), date(s), time string, DTE range
- **Processing**: Fetch option definitions and prices from Databento
- **Output**: DataFrame with option chain data
- **Dependencies**: Section 2.1 (Databento client)


In [None]:
def fetch_option_definitions(symbol: str, date: date, client: db.Historical) -> List[str]:
    """
    Fetch option definitions (contract symbols) for a given symbol and date.
    
    Args:
        symbol: Stock symbol (e.g., "AAPL")
        date: Trading date
        client: Databento client
        
    Returns:
        List of option contract symbols (e.g., ["AAPL 240119C00150000", ...])
    """
    try:
        # Use symbology.resolve to get option definitions
        definitions = client.symbology.resolve(
            dataset=OPRA_DATASET,
            symbols=[f"{symbol}.OPT"],
            stype_in="smart",
            stype_out="raw_symbol",
            start_date=date.isoformat(),
            end_date=date.isoformat(),
        )
        if definitions is not None and len(definitions) > 0:
            return definitions['raw_symbol'].tolist()
        return []
    except Exception as e:
        print(f"Error fetching definitions for {symbol} on {date}: {e}")
        return []


def fetch_options_prices(option_symbols: List[str], date: date, 
                        time_str: str, client: db.Historical) -> pd.DataFrame:
    """
    Fetch option prices at a specific time.
    
    Args:
        option_symbols: List of option contract symbols
        date: Trading date
        time_str: Time string (e.g., "15:45:00")
        client: Databento client
        
    Returns:
        DataFrame with columns: symbol, bid, ask, volume, open_interest, etc.
    """
    if not option_symbols:
        return pd.DataFrame()
    
    try:
        datetime_str = f"{date.isoformat()}T{time_str}"
        data = client.timeseries.get_range(
            dataset=OPRA_DATASET,
            symbols=option_symbols,
            schema="ohlcv-1m",
            start=datetime_str,
            end=datetime_str,
        )
        if data is not None and len(data) > 0:
            return data
        return pd.DataFrame()
    except Exception as e:
        print(f"Error fetching option prices on {date}: {e}")
        return pd.DataFrame()


def fetch_options_chain(symbol: str, date: date, time_str: str, 
                       min_dte: int, max_dte: int, client: db.Historical) -> pd.DataFrame:
    """
    Fetch full options chain with prices, filtered by DTE range.
    
    Args:
        symbol: Stock symbol
        date: Trading date
        time_str: Time string
        min_dte: Minimum days to expiration
        max_dte: Maximum days to expiration
        client: Databento client
        
    Returns:
        DataFrame with option chain data (columns: type, strike, expiration, dte, bid, ask, etc.)
    """
    # Fetch definitions
    option_symbols = fetch_option_definitions(symbol, date, client)
    if not option_symbols:
        return pd.DataFrame()
    
    # Fetch prices
    prices_df = fetch_options_prices(option_symbols, date, time_str, client)
    if prices_df.empty:
        return pd.DataFrame()
    
    # Parse option symbols to extract type, strike, expiration
    # Format: "AAPL 240119C00150000" -> type=C, strike=150, expiration=2024-01-19
    chain_data = []
    for opt_symbol in option_symbols:
        try:
            # Parse raw symbol (simplified - actual format may vary)
            parts = opt_symbol.split()
            if len(parts) >= 2:
                underlying = parts[0]
                contract = parts[1]
                
                # Extract expiration (first 6 digits: YYMMDD)
                exp_str = contract[:6]
                year = 2000 + int(exp_str[:2])
                month = int(exp_str[2:4])
                day = int(exp_str[4:6])
                expiration = date(year, month, day)
                
                # Calculate DTE
                dte = (expiration - date).days
                
                # Filter by DTE
                if min_dte <= dte <= max_dte:
                    # Extract type (C or P)
                    opt_type = contract[6] if len(contract) > 6 else None
                    if opt_type == 'C':
                        opt_type = 'call'
                    elif opt_type == 'P':
                        opt_type = 'put'
                    else:
                        continue
                    
                    # Extract strike (remaining digits, divide by 1000)
                    strike_str = contract[7:] if len(contract) > 7 else None
                    if strike_str:
                        strike = float(strike_str) / 1000.0
                    else:
                        continue
                    
                    # Get price data if available
                    price_row = prices_df[prices_df['symbol'] == opt_symbol] if 'symbol' in prices_df.columns else None
                    bid = float(price_row['bid'].iloc[0]) if price_row is not None and not price_row.empty and 'bid' in price_row.columns else None
                    ask = float(price_row['ask'].iloc[0]) if price_row is not None and not price_row.empty and 'ask' in price_row.columns else None
                    volume = int(price_row['volume'].iloc[0]) if price_row is not None and not price_row.empty and 'volume' in price_row.columns else 0
                    oi = int(price_row['open_interest'].iloc[0]) if price_row is not None and not price_row.empty and 'open_interest' in price_row.columns else 0
                    
                    chain_data.append({
                        'symbol': opt_symbol,
                        'underlying': underlying,
                        'type': opt_type,
                        'strike': strike,
                        'expiration': expiration,
                        'dte': dte,
                        'bid': bid,
                        'ask': ask,
                        'volume': volume,
                        'open_interest': oi,
                    })
        except Exception as e:
            continue
    
    if chain_data:
        chain_df = pd.DataFrame(chain_data)
        # Output for inspection
        print(f"‚úÖ Fetched options chain for {symbol} on {date}")
        print(f"   Contracts: {len(chain_df)}")
        print(f"   DTE range: {chain_df['dte'].min()}-{chain_df['dte'].max()}")
        return chain_df
    return pd.DataFrame()

print("‚úÖ Options Data Fetching Functions Defined")


In [None]:
def fetch_market_cap(symbol: str) -> float:
    """
    Fetch market capitalization using yfinance.
    
    Args:
        symbol: Stock symbol
        
    Returns:
        Market cap in dollars, or None if not found
    """
    try:
        ticker = yf.Ticker(symbol)
        info = ticker.info
        market_cap = info.get('marketCap', None)
        return float(market_cap) if market_cap else None
    except Exception as e:
        print(f"Error fetching market cap for {symbol}: {e}")
        return None


def fetch_market_caps(symbols: List[str]) -> Dict[str, float]:
    """
    Fetch market caps for multiple symbols.
    
    Args:
        symbols: List of stock symbols
        
    Returns:
        Dict[str, float] with symbol -> market cap mapping
    """
    result = {}
    for symbol in tqdm(symbols, desc="Fetching market caps"):
        market_cap = fetch_market_cap(symbol)
        if market_cap:
            result[symbol] = market_cap
    return result


def fetch_index_prices(
    index_symbol: str,
    start_date: date,
    end_date: date,
    client: db.Historical,
) -> pd.DataFrame:
    """
    Fetch index prices at 3:30 PM for beta calculation.
    
    Uses the same approach as equity data fetching (OHLC at EQUITY_TIME).
    
    Args:
        index_symbol: Index symbol (e.g., "SPY" for S&P 500)
        start_date: Start date
        end_date: End date
        client: Databento client
        
    Returns:
        DataFrame with columns: date, price
    """
    # Generate list of trading dates
    current_date = start_date
    trading_dates = []
    while current_date <= end_date:
        if current_date.weekday() < 5:  # Skip weekends
            trading_dates.append(current_date)
        current_date += timedelta(days=1)
    
    prices = []
    for trade_date in tqdm(trading_dates, desc=f"Fetching {index_symbol}"):
        try:
            datetime_str = f"{trade_date.isoformat()}T{EQUITY_TIME}"
            data = client.timeseries.get_range(
                dataset=XNAS_DATASET,
                symbols=[index_symbol],
                schema="ohlcv-1m",
                start=datetime_str,
                end=datetime_str,
            )
            
            if data is not None and len(data) > 0:
                row = data.iloc[-1]
                prices.append({
                    'date': trade_date,
                    'price': float(row.get('close', 0)),
                })
        except Exception as e:
            continue
    
    if prices:
        df = pd.DataFrame(prices)
        df = df.sort_values('date').reset_index(drop=True)
        print(f"‚úÖ Fetched {len(df)} days of {index_symbol} prices at {EQUITY_TIME}")
        print(f"   Date range: {df['date'].min()} to {df['date'].max()}")
        return df
    else:
        return pd.DataFrame(columns=['date', 'price'])

print("‚úÖ Market Data Fetching Functions Defined")


---

# SECTION 3: Data Caching Functions

## 3.1 Cache Path Management

**Data Flow:**
- **Input**: Data type, symbol, date, time string
- **Processing**: Generate cache file paths
- **Output**: Path object
- **Dependencies**: Section 1.4 (cache config)


In [None]:
def get_cache_path(data_type: str, symbol: str, date: date, time_str: str = None) -> Path:
    """
    Generate cache file path for a given data type, symbol, and date.
    
    Args:
        data_type: "equity", "options", or "processed"
        symbol: Stock symbol
        date: Trading date
        time_str: Optional time string (for time-specific data)
        
    Returns:
        Path object for cache file
    """
    if data_type == "equity":
        base_dir = CACHE_EQUITY_DIR
        time_suffix = f"_{time_str.replace(':', '')}" if time_str else ""
        filename = f"{symbol}_{date.isoformat()}{time_suffix}.parquet"
    elif data_type == "options":
        base_dir = CACHE_OPTIONS_DIR
        time_suffix = f"_{time_str.replace(':', '')}" if time_str else ""
        filename = f"{symbol}_{date.isoformat()}{time_suffix}.parquet"
    elif data_type == "processed":
        base_dir = CACHE_PROCESSED_DIR
        filename = f"{symbol}_{date.isoformat()}.parquet"
    else:
        raise ValueError(f"Unknown data_type: {data_type}")
    
    return base_dir / filename


def ensure_cache_dir(cache_path: Path) -> None:
    """
    Ensure the directory for a cache path exists.
    
    Args:
        cache_path: Path to cache file
    """
    cache_path.parent.mkdir(parents=True, exist_ok=True)

print("‚úÖ Cache Path Management Functions Defined")


## 3.2 Cache Read/Write

**Data Flow:**
- **Input**: Data, cache path
- **Processing**: Save/load data to/from parquet files
- **Output**: DataFrame or None
- **Dependencies**: Section 3.1 (cache paths)


## 2.5 Data Structure Population Functions

**Data Flow:**
- **Input**: Data structure, fetched data, dates
- **Processing**: Populate data structure DataFrames with fetched data
- **Output**: Updated data structure (modified in-place)
- **Dependencies**: Section 1.6 (Data Structure Initialization)

These functions populate the comprehensive data structure as data is fetched.


In [None]:
# ============================================================================
# DATA STRUCTURE POPULATION FUNCTIONS
# ============================================================================

def populate_market_cap_data(
    data: Dict[str, Any],
    date: date,
    market_caps: Dict[str, float]
) -> None:
    """
    Populate market cap data for a given date.
    
    Args:
        data: Data structure dictionary
        date: Trading date
        market_caps: Dict mapping symbol to market cap value
    """
    date_idx = pd.Timestamp(date)
    
    for symbol, market_cap in market_caps.items():
        if symbol in data['market_cap'].columns:
            if date_idx in data['market_cap'].index:
                data['market_cap'].loc[date_idx, symbol] = market_cap


def apply_market_cap_filter(
    data: Dict[str, Any],
    min_cap: float,
    max_cap: Optional[float] = None
) -> None:
    """
    Apply market cap filter and populate market_cap_filter DataFrame.
    
    Args:
        data: Data structure dictionary
        min_cap: Minimum market cap
        max_cap: Maximum market cap (None if no limit)
    """
    market_cap_df = data['market_cap']
    filter_df = data['market_cap_filter']
    
    # Apply filter
    passed = market_cap_df >= min_cap
    if max_cap is not None:
        passed = passed & (market_cap_df <= max_cap)
    
    # Update filter DataFrame
    data['market_cap_filter'] = passed.astype(bool)
    
    # Apply filter to market_cap (set NaN where filter fails)
    data['market_cap'] = market_cap_df.where(passed)
    
    print(f"‚úÖ Applied market cap filter (min: ${min_cap:,.0f}" + 
          (f", max: ${max_cap:,.0f}" if max_cap else "") + ")")


def populate_equity_price_data(
    data: Dict[str, Any],
    symbol: str,
    equity_data: pd.DataFrame
) -> None:
    """
    Populate equity price DataFrames (OHLC + volume) for a symbol.
    
    Args:
        data: Data structure dictionary
        symbol: Stock symbol
        equity_data: DataFrame with columns: date, open, high, low, close, volume (or price)
    """
    if symbol not in data['equity_price_close'].columns:
        return
    
    for _, row in equity_data.iterrows():
        trade_date = pd.Timestamp(row['date'])
        if trade_date not in data['equity_price_close'].index:
            continue
        
        # Populate OHLC
        if 'open' in row:
            data['equity_price_open'].loc[trade_date, symbol] = float(row['open'])
        if 'high' in row:
            data['equity_price_high'].loc[trade_date, symbol] = float(row['high'])
        if 'low' in row:
            data['equity_price_low'].loc[trade_date, symbol] = float(row['low'])
        if 'close' in row:
            data['equity_price_close'].loc[trade_date, symbol] = float(row['close'])
        elif 'price' in row:
            # Use price as close if close not available
            data['equity_price_close'].loc[trade_date, symbol] = float(row['price'])
        
        # Populate volume
        if 'volume' in row:
            data['equity_volume'].loc[trade_date, symbol] = float(row['volume'])
    
    # Apply market cap filter: set to NaN where filter fails
    date_idx = pd.to_datetime(equity_data['date'])
    for date_val in date_idx:
        if date_val in data['market_cap_filter'].index:
            if symbol in data['market_cap_filter'].columns:
                if not data['market_cap_filter'].loc[date_val, symbol]:
                    # Filter failed, set to NaN
                    data['equity_price_open'].loc[date_val, symbol] = np.nan
                    data['equity_price_high'].loc[date_val, symbol] = np.nan
                    data['equity_price_low'].loc[date_val, symbol] = np.nan
                    data['equity_price_close'].loc[date_val, symbol] = np.nan
                    data['equity_volume'].loc[date_val, symbol] = np.nan


def populate_index_price_data(
    data: Dict[str, Any],
    index_symbol: str,
    index_data: pd.DataFrame
) -> None:
    """
    Populate index price DataFrame.
    
    Args:
        data: Data structure dictionary
        index_symbol: Index symbol (e.g., 'SPY')
        index_data: DataFrame with columns: date, price (or close)
    """
    if index_symbol not in data['index_prices']:
        # Initialize if doesn't exist
        trading_days_index = data['equity_price_close'].index
        data['index_prices'][index_symbol] = pd.DataFrame(
            index=trading_days_index,
            columns=['close'],
            data=np.nan
        )
    
    index_df = data['index_prices'][index_symbol]
    
    for _, row in index_data.iterrows():
        trade_date = pd.Timestamp(row['date'])
        if trade_date in index_df.index:
            price = float(row.get('close', row.get('price', np.nan)))
            if not np.isnan(price):
                index_df.loc[trade_date, 'close'] = price
    
    data['index_prices'][index_symbol] = index_df

print("‚úÖ Data Structure Population Functions Defined")


In [None]:
def load_cached_data(cache_path: Path) -> Optional[pd.DataFrame]:
    """
    Load cached data from parquet file.
    
    Args:
        cache_path: Path to cache file
        
    Returns:
        DataFrame if file exists, None otherwise
    """
    if cache_path.exists():
        try:
            return pd.read_parquet(cache_path)
        except Exception as e:
            print(f"Error loading cache from {cache_path}: {e}")
            return None
    return None


def save_cached_data(data: pd.DataFrame, cache_path: Path) -> None:
    """
    Save data to cache as parquet file.
    
    Args:
        data: DataFrame to save
        cache_path: Path to cache file
    """
    try:
        ensure_cache_dir(cache_path)
        data.to_parquet(cache_path, index=False)
    except Exception as e:
        print(f"Error saving cache to {cache_path}: {e}")


def is_cached(cache_path: Path) -> bool:
    """
    Check if data is cached.
    
    Args:
        cache_path: Path to cache file
        
    Returns:
        True if cache file exists, False otherwise
    """
    return cache_path.exists()

print("‚úÖ Cache Read/Write Functions Defined")


## 3.3 Cache Statistics & Inspection

**Data Flow:**
- **Input**: Optional data_type, symbol filters
- **Processing**: Scan cache directories and collect statistics
- **Output**: DataFrame with cache statistics
- **Dependencies**: Section 3.1 (cache paths)


In [None]:
def get_cache_stats() -> Dict[str, Any]:
    """
    Get comprehensive cache statistics.
    
    Returns:
        Dict with cache statistics including:
        - data_type_breakdown: Count by type
        - ticker_breakdown: Count by ticker
        - date_ranges: Date ranges per ticker
        - file_counts: Total file counts
        - sizes: Total sizes in MB
    """
    stats = {
        'data_type_breakdown': {},
        'ticker_breakdown': {},
        'date_ranges': {},
        'file_counts': {},
        'sizes': {},
    }
    
    # Scan all cache directories
    for data_type, cache_dir in [
        ('equity', CACHE_EQUITY_DIR),
        ('options', CACHE_OPTIONS_DIR),
        ('processed', CACHE_PROCESSED_DIR),
    ]:
        if cache_dir.exists():
            files = list(cache_dir.glob("*.parquet"))
            stats['data_type_breakdown'][data_type] = len(files)
            stats['file_counts'][data_type] = len(files)
            
            # Calculate total size
            total_size = sum(f.stat().st_size for f in files)
            stats['sizes'][data_type] = total_size / (1024 * 1024)  # MB
            
            # Parse files for ticker and date info
            for f in files:
                parts = f.stem.split('_')
                if len(parts) >= 2:
                    ticker = parts[0]
                    date_str = parts[1]
                    
                    if ticker not in stats['ticker_breakdown']:
                        stats['ticker_breakdown'][ticker] = {}
                        stats['date_ranges'][ticker] = {'min': None, 'max': None}
                    
                    if data_type not in stats['ticker_breakdown'][ticker]:
                        stats['ticker_breakdown'][ticker][data_type] = 0
                    stats['ticker_breakdown'][ticker][data_type] += 1
                    
                    # Update date range
                    try:
                        d = date.fromisoformat(date_str)
                        if stats['date_ranges'][ticker]['min'] is None or d < stats['date_ranges'][ticker]['min']:
                            stats['date_ranges'][ticker]['min'] = d
                        if stats['date_ranges'][ticker]['max'] is None or d > stats['date_ranges'][ticker]['max']:
                            stats['date_ranges'][ticker]['max'] = d
                    except:
                        pass
    
    return stats


def inspect_cache(data_type: Optional[str] = None, symbol: Optional[str] = None) -> pd.DataFrame:
    """
    Inspect cache contents and return as DataFrame.
    
    Args:
        data_type: Optional filter by data type
        symbol: Optional filter by symbol
        
    Returns:
        DataFrame with columns: data_type, symbol, date_range, file_count, total_size_mb
    """
    stats = get_cache_stats()
    rows = []
    
    for ticker, ticker_data in stats['ticker_breakdown'].items():
        if symbol and ticker != symbol:
            continue
        
        for dt, count in ticker_data.items():
            if data_type and dt != data_type:
                continue
            
            date_range = stats['date_ranges'].get(ticker, {})
            date_min = date_range.get('min', 'N/A')
            date_max = date_range.get('max', 'N/A')
            date_range_str = f"{date_min} to {date_max}" if date_min != 'N/A' else 'N/A'
            
            # Calculate size for this ticker/type
            cache_dir = {
                'equity': CACHE_EQUITY_DIR,
                'options': CACHE_OPTIONS_DIR,
                'processed': CACHE_PROCESSED_DIR,
            }.get(dt, None)
            
            size_mb = 0
            if cache_dir:
                files = list(cache_dir.glob(f"{ticker}_*.parquet"))
                size_mb = sum(f.stat().st_size for f in files) / (1024 * 1024)
            
            rows.append({
                'data_type': dt,
                'symbol': ticker,
                'date_range': date_range_str,
                'file_count': count,
                'total_size_mb': round(size_mb, 2),
            })
    
    result_df = pd.DataFrame(rows)
    # Output for inspection
    if not result_df.empty:
        print(f"‚úÖ Cache Inspection Complete")
        print(f"   Total entries: {len(result_df)}")
        print(f"   Total size: {result_df['total_size_mb'].sum():.2f} MB")
        print("\n   Sample:")
        print(result_df.head(10).to_string())
    return result_df

print("‚úÖ Cache Statistics & Inspection Functions Defined")


In [None]:
def load_from_cache(data_type: str, symbol: str, date_range: List[date]) -> Dict[str, pd.DataFrame]:
    """
    Load cached data for given parameters.
    
    Args:
        data_type: "equity", "options", or "processed"
        symbol: Stock symbol
        date_range: List of dates to load
        
    Returns:
        Dict[str, DataFrame] with date -> DataFrame mapping
    """
    result = {}
    for d in date_range:
        cache_path = get_cache_path(data_type, symbol, d)
        data = load_cached_data(cache_path)
        if data is not None:
            result[d.isoformat()] = data
    return result


def load_all_cached(data_type: str) -> Dict[str, Dict[str, pd.DataFrame]]:
    """
    Load all cached data of a given type.
    
    Args:
        data_type: "equity", "options", or "processed"
        
    Returns:
        Nested dict: symbol -> date -> DataFrame
    """
    cache_dir = {
        'equity': CACHE_EQUITY_DIR,
        'options': CACHE_OPTIONS_DIR,
        'processed': CACHE_PROCESSED_DIR,
    }.get(data_type)
    
    if not cache_dir or not cache_dir.exists():
        return {}
    
    result = {}
    for cache_file in cache_dir.glob("*.parquet"):
        parts = cache_file.stem.split('_')
        if len(parts) >= 2:
            symbol = parts[0]
            date_str = parts[1]
            
            if symbol not in result:
                result[symbol] = {}
            
            data = load_cached_data(cache_file)
            if data is not None:
                result[symbol][date_str] = data
    
    # Output for inspection
    total_files = sum(len(dates) for dates in result.values())
    print(f"‚úÖ Loaded {total_files} cached {data_type} files")
    print(f"   Symbols: {len(result)}")
    return result

print("‚úÖ Cache Loading Functions Defined")


## 3.5 Cache Management

**Data Flow:**
- **Input**: Optional filters (data_type, symbol, date_range)
- **Processing**: Clear cache files
- **Output**: None
- **Dependencies**: Section 3.1 (cache paths)


In [None]:
def clear_cache(data_type: Optional[str] = None, symbol: Optional[str] = None, 
                date_range: Optional[List[date]] = None) -> None:
    """
    Clear cache files based on filters.
    
    Args:
        data_type: Optional filter by data type
        symbol: Optional filter by symbol
        date_range: Optional filter by date range
    """
    cache_dirs = []
    if data_type:
        cache_dirs = [{
            'equity': CACHE_EQUITY_DIR,
            'options': CACHE_OPTIONS_DIR,
            'processed': CACHE_PROCESSED_DIR,
        }.get(data_type)]
    else:
        cache_dirs = [CACHE_EQUITY_DIR, CACHE_OPTIONS_DIR, CACHE_PROCESSED_DIR]
    
    cleared = 0
    for cache_dir in cache_dirs:
        if cache_dir.exists():
            for cache_file in cache_dir.glob("*.parquet"):
                parts = cache_file.stem.split('_')
                if len(parts) >= 2:
                    file_symbol = parts[0]
                    file_date_str = parts[1]
                    
                    # Apply filters
                    if symbol and file_symbol != symbol:
                        continue
                    
                    if date_range:
                        try:
                            file_date = date.fromisoformat(file_date_str)
                            if file_date not in date_range:
                                continue
                        except:
                            continue
                    
                    # Delete file
                    cache_file.unlink()
                    cleared += 1
    
    print(f"‚úÖ Cleared {cleared} cache files")


def get_cache_size() -> Dict[str, float]:
    """
    Get cache sizes in MB/GB.
    
    Returns:
        Dict with data_type -> size_mb mapping
    """
    stats = get_cache_stats()
    return stats.get('sizes', {})

print("‚úÖ Cache Management Functions Defined")


---

# SECTION 4: Data Processing Functions

## 4.1 Delta Calculation

**Data Flow:**
- **Input**: Spot price, strike, DTE, option type, volatility
- **Processing**: Calculate option delta using Black-Scholes
- **Output**: Delta value and volatility DataFrame
- **Dependencies**: Section 2.2 (equity data), scipy.stats


## 4.4 Populate Technical Indicators to Data Structure

**Data Flow:**
- **Input**: Data structure, date
- **Processing**: Calculate and populate technical indicators (SMA, BB, beta, volatility) for all symbols
- **Output**: Updated data structure with technical indicators
- **Dependencies**: Section 4.1-4.3 (Technical Indicator Functions), Section 2.5 (Data Structure Population)


In [None]:
# ============================================================================
# POPULATE TECHNICAL INDICATORS TO DATA STRUCTURE
# ============================================================================

def populate_technical_indicators(
    data: Dict[str, Any],
    as_of_date: date
) -> None:
    """
    Calculate and populate all technical indicators for all symbols as of a given date.
    
    This function calculates SMA, Bollinger Bands, beta, and volatility for each symbol
    and populates the data structure's technical_filters dictionaries.
    
    Args:
        data: Data structure dictionary
        as_of_date: Date to calculate indicators for
    """
    date_idx = pd.Timestamp(as_of_date)
    if date_idx not in data['equity_price_close'].index:
        return
    
    symbols = data['equity_price_close'].columns.tolist()
    sma_window = data['metadata']['sma_window']
    bb_window = data['metadata']['bb_window']
    bb_std = data['metadata']['bb_std']
    beta_window = data['metadata']['beta_window']
    vol_window = data['metadata']['vol_window']
    
    # Get close prices up to as_of_date
    prices_up_to_date = data['equity_price_close'].loc[:date_idx]
    
    for symbol in symbols:
        # Get price series for this symbol
        price_series = prices_up_to_date[symbol].dropna()
        
        if len(price_series) < max(sma_window, bb_window):
            # Insufficient data
            continue
        
        # Calculate SMA
        sma_values = calculate_sma(price_series, sma_window)
        if len(sma_values) > 0 and date_idx in sma_values.index:
            sma_val = sma_values.loc[date_idx]
            if pd.notna(sma_val):
                data['technical_filters']['sma'].loc[date_idx, symbol] = sma_val
        
        # Calculate Bollinger Bands
        bb = calculate_bollinger_bands(price_series, bb_window, bb_std)
        if len(bb) > 0 and date_idx in bb.index:
            if 'middle' in bb.columns and pd.notna(bb.loc[date_idx, 'middle']):
                data['technical_filters']['bb_middle'].loc[date_idx, symbol] = bb.loc[date_idx, 'middle']
            if 'lower' in bb.columns and pd.notna(bb.loc[date_idx, 'lower']):
                data['technical_filters']['bb_lower'].loc[date_idx, symbol] = bb.loc[date_idx, 'lower']
            if 'upper' in bb.columns and pd.notna(bb.loc[date_idx, 'upper']):
                data['technical_filters']['bb_upper'].loc[date_idx, symbol] = bb.loc[date_idx, 'upper']
        
        # Calculate volatility (rolling window)
        if len(price_series) >= vol_window:
            returns = price_series.pct_change().dropna()
            if len(returns) >= vol_window:
                window_returns = returns.tail(vol_window)
                daily_std = window_returns.std()
                annualized_vol = daily_std * math.sqrt(252)
                annualized_vol = max(0.10, min(1.0, annualized_vol))  # Bound
                data['technical_filters']['volatility'].loc[date_idx, symbol] = annualized_vol
        
        # Calculate beta (if index data available)
        if 'SPY' in data['index_prices']:
            index_prices = data['index_prices']['SPY']['close'].dropna()
            if len(index_prices) >= beta_window and len(price_series) >= beta_window:
                # Align dates
                common_dates = price_series.index.intersection(index_prices.index)
                if len(common_dates) >= beta_window:
                    stock_aligned = price_series.loc[common_dates].tail(beta_window)
                    index_aligned = index_prices.loc[common_dates].tail(beta_window)
                    
                    if len(stock_aligned) >= beta_window and len(index_aligned) >= beta_window:
                        beta_val = calculate_beta(
                            pd.DataFrame({'close': stock_aligned}),
                            pd.DataFrame({'close': index_aligned})
                        )
                        data['technical_filters']['beta'].loc[date_idx, symbol] = beta_val
    
    # Apply market cap filter: set to NaN where filter fails
    for indicator_name in ['sma', 'bb_lower', 'bb_upper', 'bb_middle', 'beta', 'volatility']:
        indicator_df = data['technical_filters'][indicator_name]
        if date_idx in data['market_cap_filter'].index:
            for symbol in symbols:
                if symbol in data['market_cap_filter'].columns:
                    if not data['market_cap_filter'].loc[date_idx, symbol]:
                        indicator_df.loc[date_idx, symbol] = np.nan
        data['technical_filters'][indicator_name] = indicator_df

print("‚úÖ Technical Indicators Population Functions Defined")


In [None]:
def calculate_delta(spot: float, strike: float, dte: int, is_call: bool, 
                   volatility: float = 0.25, risk_free_rate: float = 0.05) -> float:
    """
    Calculate option delta using Black-Scholes formula.
    
    Args:
        spot: Current stock price
        strike: Strike price
        dte: Days to expiration
        is_call: True for call, False for put
        volatility: Annualized volatility (default 0.25 = 25%)
        risk_free_rate: Risk-free rate (default 0.05 = 5%)
        
    Returns:
        Delta value (-1 to 1)
    """
    if dte <= 0 or spot <= 0 or strike <= 0:
        return 0.0
    
    T = dte / 365.0
    try:
        d1 = (math.log(spot / strike) + (risk_free_rate + 0.5 * volatility ** 2) * T) / (volatility * math.sqrt(T))
        if is_call:
            delta = norm.cdf(d1)
        else:
            delta = norm.cdf(d1) - 1
        return round(delta, 4)
    except (ValueError, ZeroDivisionError):
        return 0.0


def estimate_volatility(prices_df: pd.DataFrame) -> float:
    """
    Estimate annualized volatility from historical prices.
    
    Uses VOLATILITY_WINDOW_DAYS (30 days) from strategy parameters.
    
    Args:
        prices_df: DataFrame with 'price' or 'close' column
        
    Returns:
        Annualized volatility (e.g., 0.25 = 25%)
    """
    price_col = 'price' if 'price' in prices_df.columns else 'close'
    window = VOLATILITY_WINDOW_DAYS
    
    if len(prices_df) < window:
        return 0.25  # Default volatility
    
    returns = prices_df[price_col].pct_change().dropna().tail(window)
    if len(returns) == 0:
        return 0.25
    
    # Annualized volatility = std * sqrt(252 trading days)
    daily_std = returns.std()
    annualized_vol = daily_std * math.sqrt(252)
    return max(0.10, min(1.0, annualized_vol))  # Bound between 10% and 100%


def add_delta_to_chain(chain_df: pd.DataFrame, spot_price: float, 
                      equity_history: pd.DataFrame) -> pd.DataFrame:
    """
    Add delta column to options chain.
    
    Args:
        chain_df: DataFrame with option chain
        spot_price: Current spot price
        equity_history: Historical equity prices for volatility estimation
        
    Returns:
        DataFrame with 'delta' and 'estimated_volatility' columns added
    """
    result = chain_df.copy()
    vol = estimate_volatility(equity_history)
    
    deltas = []
    for _, row in result.iterrows():
        is_call = row['type'] == 'call'
        delta = calculate_delta(spot_price, row['strike'], row['dte'], is_call, vol)
        deltas.append(delta)
    
    result['delta'] = deltas
    result['estimated_volatility'] = vol
    
    # Output for inspection
    print(f"‚úÖ Added delta to {len(result)} contracts")
    print(f"   Volatility: {vol:.2%}")
    print(f"   Delta range: {result['delta'].min():.3f} to {result['delta'].max():.3f}")
    return result

print("‚úÖ Delta Calculation Functions Defined")


## 4.2 Technical Indicators

**Data Flow:**
- **Input**: Price DataFrame, window parameters
- **Processing**: Calculate SMA and Bollinger Bands
- **Output**: DataFrame with indicators added
- **Dependencies**: None


In [None]:
def calculate_sma(prices: pd.Series, window: int) -> pd.Series:
    """
    Calculate Simple Moving Average.
    
    Args:
        prices: Price series
        window: Lookback period
        
    Returns:
        Series with SMA values
    """
    return prices.rolling(window=window, min_periods=window).mean()


def calculate_bollinger_bands(prices: pd.Series, window: int, std_dev: float) -> pd.DataFrame:
    """
    Calculate Bollinger Bands.
    
    Args:
        prices: Price series
        window: Lookback period
        std_dev: Number of standard deviations
        
    Returns:
        DataFrame with columns: middle, upper, lower
    """
    middle = calculate_sma(prices, window)
    std = prices.rolling(window=window, min_periods=window).std()
    upper = middle + std_dev * std
    lower = middle - std_dev * std
    return pd.DataFrame({"middle": middle, "upper": upper, "lower": lower})


def add_indicators(prices_df: pd.DataFrame, sma_window: int, bb_window: int, bb_std: float) -> pd.DataFrame:
    """
    Add technical indicators to price DataFrame.
    
    Args:
        prices_df: DataFrame with 'price' or 'close' column
        sma_window: SMA lookback period
        bb_window: Bollinger Bands lookback period
        bb_std: Bollinger Bands standard deviations
        
    Returns:
        DataFrame with indicators added
    """
    result = prices_df.copy()
    price_col = 'price' if 'price' in result.columns else 'close'
    
    # SMA
    result[f'sma_{sma_window}'] = calculate_sma(result[price_col], sma_window)
    
    # Bollinger Bands
    bb = calculate_bollinger_bands(result[price_col], bb_window, bb_std)
    result['bb_middle'] = bb['middle']
    result['bb_upper'] = bb['upper']
    result['bb_lower'] = bb['lower']
    
    # Output for inspection
    print(f"‚úÖ Added indicators to {len(result)} price points")
    print(f"   Columns: {list(result.columns)}")
    return result

print("‚úÖ Technical Indicators Functions Defined")


## 4.3 Beta Calculation

**Data Flow:**
- **Input**: Stock prices, index prices
- **Processing**: Calculate beta (covariance/variance)
- **Output**: DataFrame with beta and statistics
- **Dependencies**: Section 2.2 (equity data), Section 2.4 (index prices)


## 5.4 Filter Status Functions

**Data Flow:**
- **Input**: Data structure, filter parameters
- **Processing**: Calculate boolean filter status DataFrames for all filters
- **Output**: Updated data structure with filter_status populated
- **Dependencies**: Section 5.1-5.3 (Filter Functions), Section 2.5 (Data Structure Population)

These functions populate the filter_status DataFrames in the data structure.


In [None]:
# ============================================================================
# FILTER STATUS FUNCTIONS
# ============================================================================

def apply_market_cap_filter_status(data: Dict[str, Any]) -> None:
    """
    Apply market cap filter and populate filter_status['market_cap'].
    
    Args:
        data: Data structure dictionary
    """
    min_cap = data['metadata']['market_cap_min']
    max_cap = data['metadata']['market_cap_max']
    
    market_cap_df = data['market_cap']
    filter_df = data['filter_status']['market_cap']
    
    passed = market_cap_df >= min_cap
    if max_cap is not None:
        passed = passed & (market_cap_df <= max_cap)
    
    data['filter_status']['market_cap'] = passed.astype(bool)


def apply_beta_filter_status(data: Dict[str, Any]) -> None:
    """
    Apply beta filter and populate filter_status['beta'].
    
    Args:
        data: Data structure dictionary
    """
    min_beta = data['metadata']['min_beta']
    max_beta = data['metadata']['max_beta']
    
    beta_df = data['technical_filters']['beta']
    filter_df = data['filter_status']['beta']
    
    passed = (beta_df >= min_beta) & (beta_df <= max_beta)
    # Also set False where beta is NaN
    passed = passed.fillna(False)
    
    data['filter_status']['beta'] = passed.astype(bool)


def apply_price_filter_status(data: Dict[str, Any]) -> None:
    """
    Apply price filter and populate filter_status['price'].
    
    Args:
        data: Data structure dictionary
    """
    min_price = data['metadata']['min_price']
    max_price = data['metadata']['max_price']
    
    price_df = data['equity_price_close']
    filter_df = data['filter_status']['price']
    
    passed = (price_df >= min_price) & (price_df <= max_price)
    # Set False where price is NaN
    passed = passed.fillna(False)
    
    data['filter_status']['price'] = passed.astype(bool)


def apply_technical_filter_status(data: Dict[str, Any], filter_type: str) -> None:
    """
    Apply technical filter and populate filter_status['technical'].
    
    Args:
        data: Data structure dictionary
        filter_type: Type of technical filter (SMA_ONLY, BOLLINGER_ONLY, SMA_OR_BOLLINGER, etc.)
    """
    price_df = data['equity_price_close']
    sma_df = data['technical_filters']['sma']
    bb_lower_df = data['technical_filters']['bb_lower']
    bb_upper_df = data['technical_filters']['bb_upper']
    filter_df = data['filter_status']['technical']
    
    if filter_type == "NONE":
        passed = pd.DataFrame(True, index=price_df.index, columns=price_df.columns, dtype=bool)
    elif filter_type == "SMA_ONLY":
        passed = price_df <= sma_df
    elif filter_type == "BOLLINGER_ONLY":
        passed = price_df <= bb_lower_df
    elif filter_type == "SMA_OR_BOLLINGER":
        passed = (price_df <= sma_df) | (price_df <= bb_lower_df)
    elif filter_type == "SMA_AND_BOLLINGER":
        passed = (price_df <= sma_df) & (price_df <= bb_lower_df)
    else:
        passed = pd.DataFrame(True, index=price_df.index, columns=price_df.columns, dtype=bool)
    
    # Set False where any component is NaN
    passed = passed.fillna(False)
    
    data['filter_status']['technical'] = passed.astype(bool)


def apply_liquidity_filter_status(data: Dict[str, Any], min_volume: float = 0) -> None:
    """
    Apply liquidity filter based on volume and populate filter_status['liquidity'].
    
    Args:
        data: Data structure dictionary
        min_volume: Minimum daily volume (default 0 = no filter)
    """
    volume_df = data['equity_volume']
    filter_df = data['filter_status']['liquidity']
    
    if min_volume > 0:
        passed = volume_df >= min_volume
    else:
        # No volume filter, all pass
        passed = pd.DataFrame(True, index=volume_df.index, columns=volume_df.columns, dtype=bool)
    
    # Set False where volume is NaN
    passed = passed.fillna(False)
    
    data['filter_status']['liquidity'] = passed.astype(bool)


def calculate_all_filters_status(data: Dict[str, Any]) -> None:
    """
    Calculate combined filter status (AND of all filters) and populate filter_status['all_filters'].
    
    Args:
        data: Data structure dictionary
    """
    # Combine all filters with AND
    all_passed = (
        data['filter_status']['market_cap'] &
        data['filter_status']['beta'] &
        data['filter_status']['price'] &
        data['filter_status']['technical'] &
        data['filter_status']['liquidity']
    )
    
    data['filter_status']['all_filters'] = all_passed.astype(bool)


def calculate_filter_status(data: Dict[str, Any], filter_type: str = None) -> None:
    """
    Calculate all filter statuses for a given date or all dates.
    
    Args:
        data: Data structure dictionary
        filter_type: Technical filter type (if None, uses metadata value)
    """
    if filter_type is None:
        filter_type = TECHNICAL_FILTER_TYPE
    
    # Calculate each filter status
    apply_market_cap_filter_status(data)
    apply_beta_filter_status(data)
    apply_price_filter_status(data)
    apply_technical_filter_status(data, filter_type)
    apply_liquidity_filter_status(data, MIN_VOLUME)
    
    # Calculate combined status
    calculate_all_filters_status(data)
    
    print("‚úÖ Filter status calculated for all symbols and dates")


def generate_daily_filter_results(data: Dict[str, Any]) -> None:
    """
    Generate daily_filter dictionary from filter_status['all_filters'].
    
    This creates a dictionary mapping date strings to lists of symbols that pass all filters.
    
    Args:
        data: Data structure dictionary
    """
    all_filters_df = data['filter_status']['all_filters']
    daily_filter = {}
    
    for date_idx in all_filters_df.index:
        date_str = date_idx.strftime('%Y-%m-%d')
        # Get symbols where filter is True
        passing_symbols = all_filters_df.columns[all_filters_df.loc[date_idx]].tolist()
        daily_filter[date_str] = passing_symbols
    
    data['daily_filter'] = daily_filter
    
    total_days = len(daily_filter)
    avg_passers = sum(len(syms) for syms in daily_filter.values()) / total_days if total_days > 0 else 0
    print(f"‚úÖ Generated daily filter results")
    print(f"   Total trading days: {total_days}")
    print(f"   Average symbols passing per day: {avg_passers:.1f}")

print("‚úÖ Filter Status Functions Defined")


In [None]:
def calculate_beta(stock_prices: pd.DataFrame, index_prices: pd.DataFrame) -> float:
    """
    Calculate beta of a stock relative to the market.
    
    Uses BETA_WINDOW_DAYS (756 days = 3 years) from strategy parameters.
    
    Args:
        stock_prices: DataFrame with 'price' or 'close' column
        index_prices: DataFrame with 'price' or 'close' column (e.g., SPY)
        
    Returns:
        Beta value
    """
    window = BETA_WINDOW_DAYS
    stock_col = 'price' if 'price' in stock_prices.columns else 'close'
    index_col = 'price' if 'price' in index_prices.columns else 'close'
    
    stock_close = stock_prices[stock_col].tail(window)
    index_close = index_prices[index_col].tail(window)
    
    # Align dates
    common_dates = stock_close.index.intersection(index_close.index)
    if len(common_dates) < 20:
        return 1.0  # Default beta
    
    stock_close = stock_close.loc[common_dates]
    index_close = index_close.loc[common_dates]
    
    # Calculate returns
    stock_returns = stock_close.pct_change().dropna()
    index_returns = index_close.pct_change().dropna()
    
    # Align after pct_change
    common_idx = stock_returns.index.intersection(index_returns.index)
    if len(common_idx) < 20:
        return 1.0
    
    stock_returns = stock_returns.loc[common_idx]
    index_returns = index_returns.loc[common_idx]
    
    # Calculate beta
    covariance = np.cov(stock_returns, index_returns)[0, 1]
    variance = np.var(index_returns)
    
    if variance == 0:
        return 1.0
    
    beta = covariance / variance
    return max(0.0, min(3.0, beta))  # Bound to reasonable range


def calculate_betas_for_universe(symbols: List[str], equity_data: Dict[str, pd.DataFrame], 
                                 index_data: pd.DataFrame) -> Dict[str, float]:
    """
    Calculate betas for all symbols in universe.
    
    Args:
        symbols: List of stock symbols
        equity_data: Dict[symbol] -> DataFrame with equity prices
        index_data: DataFrame with index prices
        
    Returns:
        Dict[symbol] -> beta
    """
    betas = {}
    for symbol in symbols:
        if symbol in equity_data:
            beta = calculate_beta(equity_data[symbol], index_data)
            betas[symbol] = beta
    
    # Output for inspection
    beta_df = pd.DataFrame([{'symbol': s, 'beta': betas.get(s, 1.0)} for s in symbols])
    print(f"‚úÖ Calculated betas for {len(betas)} symbols")
    print(f"   Beta range: {beta_df['beta'].min():.3f} to {beta_df['beta'].max():.3f}")
    print(f"   Mean beta: {beta_df['beta'].mean():.3f}")
    return betas

print("‚úÖ Beta Calculation Functions Defined")


## 5.5 Options Data Population Functions

**Data Flow:**
- **Input**: Data structure, date, symbol, options chain DataFrame
- **Processing**: Populate options data (definitions, chains, filtered chains, filter status)
- **Output**: Updated data structure with options data
- **Dependencies**: Section 2.3 (Options Data Fetching), Section 5.3 (Options Chain Filters)


In [None]:
# ============================================================================
# OPTIONS DATA POPULATION FUNCTIONS
# ============================================================================

def populate_options_data(
    data: Dict[str, Any],
    date: date,
    symbol: str,
    chain_df: pd.DataFrame,
    definitions: List[str] = None
) -> None:
    """
    Populate options data for a given date and symbol.
    
    Args:
        data: Data structure dictionary
        date: Trading date
        symbol: Stock symbol
        chain_df: DataFrame with options chain data
        definitions: Optional list of option contract symbols
    """
    date_str = date.isoformat()
    
    # Initialize date entries if needed
    if date_str not in data['options']['definitions']:
        data['options']['definitions'][date_str] = {}
    if date_str not in data['options']['chains']:
        data['options']['chains'][date_str] = {}
    if date_str not in data['options']['filtered_chains']:
        data['options']['filtered_chains'][date_str] = {}
    if date_str not in data['options']['filter_status']:
        data['options']['filter_status'][date_str] = {}
    
    # Store definitions
    if definitions:
        data['options']['definitions'][date_str][symbol] = definitions
    elif 'symbol' in chain_df.columns:
        data['options']['definitions'][date_str][symbol] = chain_df['symbol'].tolist()
    
    # Store full chain
    data['options']['chains'][date_str][symbol] = chain_df.copy()


def populate_options_filter_status(
    data: Dict[str, Any],
    date: date,
    symbol: str,
    chain_df: pd.DataFrame,
    filtered_df: pd.DataFrame,
    min_dte: int,
    max_dte: int,
    target_delta: float,
    delta_band: float,
    opt_type: str,
    min_oi: int,
    min_volume: int,
    max_spread_pct: float
) -> None:
    """
    Populate options filter status for diagnostics.
    
    Args:
        data: Data structure dictionary
        date: Trading date
        symbol: Stock symbol
        chain_df: Full options chain DataFrame
        filtered_df: Filtered options chain DataFrame
        min_dte: Minimum DTE
        max_dte: Maximum DTE
        target_delta: Target delta
        delta_band: Delta band
        opt_type: Option type ('put' or 'call')
        min_oi: Minimum open interest
        min_volume: Minimum volume
        max_spread_pct: Maximum spread percentage
    """
    date_str = date.isoformat()
    
    if date_str not in data['options']['filter_status']:
        data['options']['filter_status'][date_str] = {}
    if symbol not in data['options']['filter_status'][date_str]:
        data['options']['filter_status'][date_str][symbol] = {}
    
    filter_status = data['options']['filter_status'][date_str][symbol]
    
    # DTE filter status
    dte_passed = pd.Series(True, index=chain_df.index)
    if 'dte' in chain_df.columns:
        dte_passed = (chain_df['dte'] >= min_dte) & (chain_df['dte'] <= max_dte)
        filter_status['dte'] = pd.DataFrame({
            'option_symbol': chain_df.get('symbol', chain_df.index),
            'passed': dte_passed
        })
    
    # Delta filter status
    delta_passed = pd.Series(True, index=chain_df.index)
    if 'delta' in chain_df.columns:
        delta_min = target_delta - delta_band
        delta_max = target_delta + delta_band
        if opt_type == 'put':
            delta_passed = chain_df['delta'].abs().between(delta_min, delta_max)
        else:
            delta_passed = chain_df['delta'].between(delta_min, delta_max)
        filter_status['delta'] = pd.DataFrame({
            'option_symbol': chain_df.get('symbol', chain_df.index),
            'passed': delta_passed
        })
    
    # Liquidity filter status
    liquidity_passed = pd.Series(True, index=chain_df.index)
    if 'open_interest' in chain_df.columns:
        liquidity_passed = liquidity_passed & (chain_df['open_interest'] >= min_oi)
    if 'volume' in chain_df.columns:
        liquidity_passed = liquidity_passed & (chain_df['volume'] >= min_volume)
    if 'bid' in chain_df.columns and 'ask' in chain_df.columns:
        mid_price = (chain_df['bid'] + chain_df['ask']) / 2
        spread = chain_df['ask'] - chain_df['bid']
        spread_pct = spread / mid_price
        liquidity_passed = liquidity_passed & (spread_pct <= max_spread_pct)
    
    filter_status['liquidity'] = pd.DataFrame({
        'option_symbol': chain_df.get('symbol', chain_df.index),
        'passed': liquidity_passed
    })
    
    # All filters combined
    all_passed = dte_passed & delta_passed & liquidity_passed
    filter_status['all'] = pd.DataFrame({
        'option_symbol': chain_df.get('symbol', chain_df.index),
        'passed': all_passed
    })
    
    # Store filtered chain
    data['options']['filtered_chains'][date_str][symbol] = filtered_df.copy()

print("‚úÖ Options Data Population Functions Defined")


---

# SECTION 10: Data Inspection Utilities

**Data Flow:**
- **Input**: Data structure dictionary
- **Processing**: Generate summaries, statistics, export/import functionality
- **Output**: Inspection DataFrames, summary statistics, saved/loaded data structures
- **Dependencies**: Section 1.6 (Data Structure Initialization)

These functions help inspect and manage the comprehensive data structure.


In [None]:
# ============================================================================
# DATA INSPECTION UTILITIES
# ============================================================================

def inspect_data_structure(data: Dict[str, Any]) -> pd.DataFrame:
    """
    Generate summary of all DataFrames in the data structure.
    
    Args:
        data: Data structure dictionary
        
    Returns:
        DataFrame with columns: key, type, shape, memory_usage_mb, non_null_count
    """
    rows = []
    
    # Helper to get DataFrame info
    def get_df_info(key: str, df: pd.DataFrame) -> dict:
        return {
            'key': key,
            'type': 'DataFrame',
            'shape': f"{df.shape[0]}√ó{df.shape[1]}",
            'memory_usage_mb': df.memory_usage(deep=True).sum() / (1024 * 1024),
            'non_null_count': df.notna().sum().sum() if isinstance(df, pd.DataFrame) else df.notna().sum()
        }
    
    # Top-level DataFrames
    top_level_dfs = ['market_cap', 'market_cap_filter', 
                     'equity_price_open', 'equity_price_high', 'equity_price_low', 
                     'equity_price_close', 'equity_volume']
    for key in top_level_dfs:
        if key in data and isinstance(data[key], pd.DataFrame):
            rows.append(get_df_info(key, data[key]))
    
    # Technical filters
    if 'technical_filters' in data:
        for tf_key, tf_df in data['technical_filters'].items():
            if isinstance(tf_df, pd.DataFrame):
                rows.append(get_df_info(f"technical_filters.{tf_key}", tf_df))
    
    # Filter status
    if 'filter_status' in data:
        for fs_key, fs_df in data['filter_status'].items():
            if isinstance(fs_df, pd.DataFrame):
                rows.append(get_df_info(f"filter_status.{fs_key}", fs_df))
    
    # Index prices
    if 'index_prices' in data:
        for idx_key, idx_df in data['index_prices'].items():
            if isinstance(idx_df, pd.DataFrame):
                rows.append(get_df_info(f"index_prices.{idx_key}", idx_df))
    
    # Options data counts
    if 'options' in data:
        opt_def_count = sum(
            len(syms) 
            for date_dict in data['options'].get('definitions', {}).values()
            for syms in date_dict.values()
        )
        opt_chain_count = sum(
            len(date_dict) 
            for date_dict in data['options'].get('chains', {}).values()
        )
        rows.append({
            'key': 'options.definitions',
            'type': 'dict',
            'shape': f"{opt_def_count} option symbols",
            'memory_usage_mb': 0,
            'non_null_count': opt_def_count
        })
        rows.append({
            'key': 'options.chains',
            'type': 'dict',
            'shape': f"{opt_chain_count} symbol-date combinations",
            'memory_usage_mb': 0,
            'non_null_count': opt_chain_count
        })
    
    # Daily filter
    if 'daily_filter' in data:
        rows.append({
            'key': 'daily_filter',
            'type': 'dict',
            'shape': f"{len(data['daily_filter'])} dates",
            'memory_usage_mb': 0,
            'non_null_count': len(data['daily_filter'])
        })
    
    result_df = pd.DataFrame(rows)
    print("‚úÖ Data Structure Inspection Complete")
    print(f"\nSummary:")
    print(result_df.to_string(index=False))
    return result_df


def get_data_summary(data: Dict[str, Any]) -> Dict[str, Any]:
    """
    Get comprehensive summary statistics for the data structure.
    
    Args:
        data: Data structure dictionary
        
    Returns:
        Dictionary with summary statistics
    """
    summary = {
        'universe': {
            'selected': data.get('selected_universe', 'N/A'),
            'symbol_count': len(data.get('equity_price_close', pd.DataFrame()).columns) if 'equity_price_close' in data else 0,
            'available_universes': list(data.get('universe', {}).keys())
        },
        'date_range': {
            'start': data['metadata'].get('start_date'),
            'end': data['metadata'].get('end_date'),
            'trading_days': len(data['metadata'].get('trading_days', []))
        },
        'market_cap': {
            'non_null_percent': 0,
            'avg_market_cap': np.nan
        },
        'equity_prices': {
            'symbols_with_data': 0,
            'dates_with_data': 0
        },
        'filter_status': {
            'market_cap_pass_rate': 0,
            'beta_pass_rate': 0,
            'price_pass_rate': 0,
            'technical_pass_rate': 0,
            'all_filters_pass_rate': 0
        },
        'daily_filter': {
            'days_with_passers': 0,
            'avg_symbols_per_day': 0,
            'max_symbols_per_day': 0
        },
        'options': {
            'dates_with_options': 0,
            'symbols_with_options': 0,
            'total_option_contracts': 0
        }
    }
    
    # Market cap stats
    if 'market_cap' in data:
        market_cap_df = data['market_cap']
        total_cells = market_cap_df.size
        non_null_cells = market_cap_df.notna().sum().sum()
        summary['market_cap']['non_null_percent'] = (non_null_cells / total_cells * 100) if total_cells > 0 else 0
        summary['market_cap']['avg_market_cap'] = market_cap_df.mean().mean()
    
    # Equity prices stats
    if 'equity_price_close' in data:
        price_df = data['equity_price_close']
        summary['equity_prices']['symbols_with_data'] = price_df.notna().any().sum()
        summary['equity_prices']['dates_with_data'] = price_df.notna().any(axis=1).sum()
    
    # Filter status pass rates
    if 'filter_status' in data:
        for filter_name in ['market_cap', 'beta', 'price', 'technical', 'all_filters']:
            if filter_name in data['filter_status']:
                filter_df = data['filter_status'][filter_name]
                total = filter_df.size
                passed = filter_df.sum().sum()
                summary['filter_status'][f'{filter_name}_pass_rate'] = (passed / total * 100) if total > 0 else 0
    
    # Daily filter stats
    if 'daily_filter' in data:
        daily_filter = data['daily_filter']
        summary['daily_filter']['days_with_passers'] = len([syms for syms in daily_filter.values() if len(syms) > 0])
        symbol_counts = [len(syms) for syms in daily_filter.values()]
        summary['daily_filter']['avg_symbols_per_day'] = np.mean(symbol_counts) if symbol_counts else 0
        summary['daily_filter']['max_symbols_per_day'] = max(symbol_counts) if symbol_counts else 0
    
    # Options stats
    if 'options' in data and 'chains' in data['options']:
        chains = data['options']['chains']
        summary['options']['dates_with_options'] = len(chains)
        symbols_set = set()
        total_contracts = 0
        for date_dict in chains.values():
            for symbol, chain_df in date_dict.items():
                symbols_set.add(symbol)
                if isinstance(chain_df, pd.DataFrame):
                    total_contracts += len(chain_df)
        summary['options']['symbols_with_options'] = len(symbols_set)
        summary['options']['total_option_contracts'] = total_contracts
    
    print("‚úÖ Data Summary Generated")
    return summary


def export_data_structure(data: Dict[str, Any], path: Path) -> None:
    """
    Export data structure to disk (pickle format).
    
    Args:
        data: Data structure dictionary
        path: Path to save file (should end with .pkl)
    """
    import pickle
    
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, 'wb') as f:
            pickle.dump(data, f)
        file_size_mb = path.stat().st_size / (1024 * 1024)
        print(f"‚úÖ Exported data structure to {path}")
        print(f"   File size: {file_size_mb:.2f} MB")
    except Exception as e:
        print(f"Error exporting data structure: {e}")


def load_data_structure(path: Path) -> Dict[str, Any]:
    """
    Load data structure from disk.
    
    Args:
        path: Path to saved data structure file (.pkl)
        
    Returns:
        Data structure dictionary
    """
    import pickle
    
    try:
        with open(path, 'rb') as f:
            data = pickle.load(f)
        print(f"‚úÖ Loaded data structure from {path}")
        return data
    except Exception as e:
        print(f"Error loading data structure: {e}")
        return {}

print("‚úÖ Data Inspection Utilities Defined")


---

# SECTION 5: Filtering Functions

## 5.1 Fundamental Filters

**Data Flow:**
- **Input**: Symbols, market caps, betas, prices
- **Processing**: Filter by market cap, beta, price
- **Output**: DataFrame with filter results
- **Dependencies**: Section 2.4 (market data), Section 4.3 (beta)


In [None]:
def filter_by_market_cap(symbols: List[str], market_caps: Dict[str, float], 
                         min_cap: float, max_cap: Optional[float] = None) -> List[str]:
    """Filter symbols by market cap. Returns filtered list and DataFrame for inspection."""
    filtered = []
    results = []
    for s in symbols:
        cap = market_caps.get(s, 0)
        passes = cap >= min_cap and (max_cap is None or cap <= max_cap)
        results.append({'symbol': s, 'market_cap': cap, 'passed_filter': passes})
        if passes:
            filtered.append(s)
    
    result_df = pd.DataFrame(results)
    print(f"‚úÖ Market cap filter: {len(filtered)}/{len(symbols)} passed")
    return filtered, result_df


def filter_by_beta(symbols: List[str], betas: Dict[str, float], 
                   min_beta: float, max_beta: float) -> List[str]:
    """Filter symbols by beta. Returns filtered list and DataFrame for inspection."""
    filtered = []
    results = []
    for s in symbols:
        beta = betas.get(s, 1.0)
        passes = min_beta <= beta <= max_beta
        results.append({'symbol': s, 'beta': beta, 'passed_filter': passes})
        if passes:
            filtered.append(s)
    
    result_df = pd.DataFrame(results)
    print(f"‚úÖ Beta filter: {len(filtered)}/{len(symbols)} passed")
    return filtered, result_df


def filter_by_price(symbols: List[str], prices: Dict[str, float], 
                   min_price: float, max_price: float) -> List[str]:
    """Filter symbols by price. Returns filtered list and DataFrame for inspection."""
    filtered = []
    results = []
    for s in symbols:
        price = prices.get(s, 0)
        passes = min_price <= price <= max_price
        results.append({'symbol': s, 'price': price, 'passed_filter': passes})
        if passes:
            filtered.append(s)
    
    result_df = pd.DataFrame(results)
    print(f"‚úÖ Price filter: {len(filtered)}/{len(symbols)} passed")
    return filtered, result_df

print("‚úÖ Fundamental Filter Functions Defined")


## 5.2 Technical Filters

**Data Flow:**
- **Input**: Symbols, price data with indicators
- **Processing**: Check if price passes technical filter (SMA/BB)
- **Output**: DataFrame with filter results
- **Dependencies**: Section 4.2 (indicators)


In [None]:
def check_technical_filter(price: float, sma: float, bb_lower: float, bb_upper: float, 
                          filter_type: str) -> bool:
    """Check if price passes technical filter."""
    if filter_type == "NONE":
        return True
    elif filter_type == "SMA_ONLY":
        return price <= sma
    elif filter_type == "BOLLINGER_ONLY":
        return price <= bb_lower
    elif filter_type == "SMA_OR_BOLLINGER":
        return price <= sma or price <= bb_lower
    elif filter_type == "SMA_AND_BOLLINGER":
        return price <= sma and price <= bb_lower
    return True


def apply_technical_filter(symbols: List[str], price_data: Dict[str, pd.DataFrame], 
                          filter_type: str) -> List[str]:
    """Apply technical filter. Returns filtered list and DataFrame for inspection."""
    filtered = []
    results = []
    
    for s in symbols:
        if s not in price_data or price_data[s].empty:
            continue
        
        df = price_data[s]
        if 'price' not in df.columns or len(df) == 0:
            continue
        
        latest = df.iloc[-1]
        price = latest['price']
        sma_val = latest.get(f'sma_{SMA_WINDOW}', price)
        bb_lower = latest.get('bb_lower', price)
        bb_upper = latest.get('bb_upper', price)
        
        passes = check_technical_filter(price, sma_val, bb_lower, bb_upper, filter_type)
        reason = "Pass" if passes else "Fail"
        
        results.append({
            'symbol': s,
            'date': latest.get('date', 'N/A'),
            'price': price,
            'sma': sma_val,
            'bb_lower': bb_lower,
            'passes_filter': passes,
            'reason': reason,
        })
        
        if passes:
            filtered.append(s)
    
    result_df = pd.DataFrame(results)
    print(f"‚úÖ Technical filter: {len(filtered)}/{len(symbols)} passed")
    return filtered, result_df

print("‚úÖ Technical Filter Functions Defined")


## 5.3 Options Chain Filters

**Data Flow:**
- **Input**: Options chain DataFrame, filter parameters
- **Processing**: Filter by type, DTE, delta, liquidity
- **Output**: Filtered DataFrame with statistics
- **Dependencies**: Section 2.3 (options data), Section 4.1 (delta)


In [None]:
def filter_by_type(chain_df: pd.DataFrame, opt_type: str) -> pd.DataFrame:
    """Filter chain by option type. Returns filtered DataFrame."""
    if chain_df.empty or 'type' not in chain_df.columns:
        return pd.DataFrame()
    filtered = chain_df[chain_df['type'] == opt_type].copy()
    print(f"   Type filter ({opt_type}): {len(filtered)}/{len(chain_df)} contracts")
    return filtered


def filter_by_dte(chain_df: pd.DataFrame, min_dte: int, max_dte: int) -> pd.DataFrame:
    """Filter chain by DTE. Returns filtered DataFrame."""
    if chain_df.empty or 'dte' not in chain_df.columns:
        return pd.DataFrame()
    filtered = chain_df[(chain_df['dte'] >= min_dte) & (chain_df['dte'] <= max_dte)].copy()
    print(f"   DTE filter ({min_dte}-{max_dte}): {len(filtered)}/{len(chain_df)} contracts")
    if not filtered.empty:
        print(f"      DTE range: {filtered['dte'].min()}-{filtered['dte'].max()}")
    return filtered


def filter_by_delta(chain_df: pd.DataFrame, target_delta: float, delta_band: float, 
                   opt_type: str) -> pd.DataFrame:
    """Filter chain by delta. Returns filtered DataFrame."""
    if chain_df.empty or 'delta' not in chain_df.columns:
        return pd.DataFrame()
    
    delta_min = target_delta - delta_band
    delta_max = target_delta + delta_band
    
    if opt_type == 'put':
        # For puts, use absolute delta
        filtered = chain_df[chain_df['delta'].abs().between(delta_min, delta_max)].copy()
    else:
        filtered = chain_df[chain_df['delta'].between(delta_min, delta_max)].copy()
    
    print(f"   Delta filter ({delta_min:.2f}-{delta_max:.2f}): {len(filtered)}/{len(chain_df)} contracts")
    if not filtered.empty:
        print(f"      Delta range: {filtered['delta'].abs().min():.3f}-{filtered['delta'].abs().max():.3f}")
    return filtered


def filter_by_liquidity(chain_df: pd.DataFrame, min_oi: int, min_volume: int, 
                       max_spread_pct: float) -> pd.DataFrame:
    """Filter chain by liquidity. Returns filtered DataFrame."""
    if chain_df.empty:
        return pd.DataFrame()
    
    filtered = chain_df.copy()
    
    # Open interest filter
    oi_col = 'open_interest' if 'open_interest' in filtered.columns else 'oi'
    if oi_col in filtered.columns:
        filtered = filtered[filtered[oi_col] >= min_oi]
    
    # Volume filter
    if 'volume' in filtered.columns:
        filtered = filtered[filtered['volume'] >= min_volume]
    
    # Spread filter
    if 'bid' in filtered.columns and 'ask' in filtered.columns:
        mid_price = (filtered['bid'] + filtered['ask']) / 2
        spread = filtered['ask'] - filtered['bid']
        spread_pct = spread / mid_price
        filtered = filtered[spread_pct <= max_spread_pct]
    
    print(f"   Liquidity filter: {len(filtered)}/{len(chain_df)} contracts")
    return filtered


def filter_chain_for_wheel(chain_df: pd.DataFrame, opt_type: str, config: Dict, 
                           as_of_date: date) -> pd.DataFrame:
    """Apply all filters for wheel strategy. Returns filtered DataFrame with breakdown."""
    if chain_df.empty:
        return pd.DataFrame()
    
    print(f"\nüîç Filtering chain for {opt_type} on {as_of_date}")
    print(f"   Initial contracts: {len(chain_df)}")
    
    # Apply filters in sequence
    filtered = filter_by_type(chain_df, opt_type)
    filtered = filter_by_dte(filtered, config['min_dte'], config['max_dte'])
    filtered = filter_by_delta(filtered, config['delta_target'], config['delta_band'], opt_type)
    filtered = filter_by_liquidity(filtered, config['min_oi'], config['min_volume'], 
                                   config['max_spread_pct'])
    
    print(f"   ‚úÖ Final filtered: {len(filtered)} contracts")
    return filtered

print("‚úÖ Options Chain Filter Functions Defined")


---

# SECTION 6-9: Strategy Logic, Backtest Engine, Reporting, and Main Execution

**Note**: Due to notebook size, Sections 6-9 contain placeholder functions. 
These should be implemented following the same pattern as Sections 1-5:
- Small, single-purpose functions
- Clear inputs/outputs
- DataFrame/dict outputs for inspection
- Comprehensive docstrings

**Key Functions Needed:**
- Section 6: Position management, contract selection, rolling logic
- Section 7: Portfolio state, trade execution, daily processing, main backtest loop
- Section 8: Metrics calculation, visualization, reporting
- Section 9: Cache inspection, data collection, pre-computation, backtest execution, results

**Implementation Pattern:**
```python
def function_name(inputs) -> outputs:
    \"\"\"Docstring explaining function.\"\"\"
    # Implementation
    result_df = pd.DataFrame(...)  # For inspection
    print(f"‚úÖ Function complete: {stats}")
    return result, result_df
```

**Next Steps:**
1. Implement Sections 6-9 following the established patterns
2. Test each function independently
3. Integrate into main execution flow
4. Add comprehensive error handling
5. Add progress bars for long-running operations
