# Volatility Surface Construction – Notebook 1

### 1.1 Project Overview & Objectives

This project investigates the calibration of the implied volatility surface for the SPDR S&P 500 ETF Trust (SPY). The volatility surface is a fundamental object in quantitative finance, representing the market's expectation of future volatility across varying strike prices and times to maturity. A static, arbitrage-free implied volatility surface is essential for pricing vanilla and exotic derivatives consistently, as well as for risk management tasks such as hedging.

The primary objective is to calibrate two industry-standard parametric models—Stochastic Volatility Inspired (SVI) and Stochastic Alpha, Beta, Rho (SABR)—to a single-day snapshot of market option prices. We aim to minimize the calibration error between model-implied volatilities and market-observed volatilities, subsequently analyzing the structural properties and fit quality of each model.

**Asset Selection:**
SPY is selected as the underlying asset because its options are among the most liquid equity derivatives globally. The chain offers a dense grid of strikes and maturities, which is critical for stable parameter estimation and avoids the interpolation errors that arise in illiquid chains. The pronounced negative skew of SPY options provides a demanding test of the skew-capture capabilities of SVI and SABR.

### 1.2 Computational Pipeline

The project workflow is divided into distinct stages, handled by specific modules in the `src/` directory and documented across sequential notebooks:

1.  **Data Acquisition (Backend):** Raw option chains, spot prices, and yield curves are extracted via `src/data_loader.py`.
2.  **Preprocessing & Filtering (Notebook 1):** Application of liquidity filters, moneyness bounds, and spread controls to isolate valid calibration data.
3.  **Implied Volatility Extraction (Notebook 2):** Conversion of option premiums to implied volatilities using numerical inversion.
4.  **Parametric Surface Modeling (Notebooks 3 to 5):** Optimization of SVI and SABR parameters against the filtered volatility smile.
5.  **Research Conclusions (Notebook 6):** Visualizations and diagnostics comparing the fit quality of the two models.

### 1.3 Data Sourcing & Modeling Assumptions

Market data is sourced via the `yfinance` Python library, an unofficial wrapper around Yahoo Finance data. To ensure consistency in the calibration process, the following conventions and assumptions are applied prior to modeling.

**1. Option Pricing Convention**
The calibration target is the **Mid-Price**, calculated as $0.5 \times (Bid + Ask)$. Using the mid rather than the last traded price avoids bid-ask bounce and stale trades, and provides a representative fair value for the contract.

**2. Interest Rates & Dividends**
The risk-free rate ($r$) is approximated by a flat term structure set to the 1-year Treasury yield. Although the actual yield curve is not flat, a flat rate is a standard simplification for single-snapshot surface calibration, where the focus is on the shape of the smile rather than on term-structure effects. The dividend yield ($q$) is taken from SPY's trailing dividend yield.

**3. European Approximation**
SPY options are American-style (exercisable at any time before expiry). For the purpose of implied volatility surface construction, however, they are modeled as European options. This approximation is standard in equity volatility modeling: the early exercise premium is small for short- to medium-dated options outside the deep in-the-money region, and the calibration relies primarily on out-of-the-money quotes.
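
Modeling the options as European means prices are mapped to volatilities with the Black-Scholes-Merton formula under a continuous dividend yield $q$. The sketch below shows that pricer together with a bisection-based implied volatility inversion. It is illustrative only: `bsm_price` and `implied_vol` are hypothetical names, not functions from `src/`, and bisection is chosen here for transparency; the project's own inversion routine (Notebook 2) may use a different root finder.

```python
import math

def _norm_cdf(x: float) -> float:
    # Standard normal CDF expressed through the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bsm_price(S, K, T, r, q, sigma, is_call=True):
    """European option price under Black-Scholes-Merton with continuous dividend yield q."""
    F = S * math.exp((r - q) * T)   # forward price F = S0 * exp((r - q) T)
    df = math.exp(-r * T)           # discount factor
    sd = sigma * math.sqrt(T)       # total standard deviation sigma * sqrt(T)
    d1 = (math.log(F / K) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    if is_call:
        return df * (F * _norm_cdf(d1) - K * _norm_cdf(d2))
    return df * (K * _norm_cdf(-d2) - F * _norm_cdf(-d1))

def implied_vol(price, S, K, T, r, q, is_call=True, lo=1e-4, hi=5.0, tol=1e-8, max_iter=200):
    """Invert bsm_price for sigma by bisection (the price is increasing in sigma)."""
    p_lo = bsm_price(S, K, T, r, q, lo, is_call)
    p_hi = bsm_price(S, K, T, r, q, hi, is_call)
    if not (p_lo <= price <= p_hi):
        return float("nan")  # price outside the attainable range: flag as an inversion failure
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bsm_price(S, K, T, r, q, mid, is_call) < price:
            lo = mid  # model price too low: volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price an OTM call at 18% vol, then recover the volatility from the price
p = bsm_price(S=500.0, K=520.0, T=0.25, r=0.045, q=0.013, sigma=0.18)
print(implied_vol(p, S=500.0, K=520.0, T=0.25, r=0.045, q=0.013))  # approximately 0.18
```

Returning `nan` for unattainable prices mirrors the role of the IV bounds filter: quotes whose inversion fails are easy to detect and discard downstream.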

**Note:** To capture a dense set of live quotes, the data was fetched approximately 15 minutes before the US market close.

### 1.4 Feature Engineering & Filtering Logic

The raw dataset is filtered to remove microstructure noise, illiquid contracts, and unreliable quotes. This logic is implemented in the backend data processing pipeline.

**Derived Metrics**

The **Forward Price** ($F$) is calculated as:
$$ F = S_0 e^{(r-q)T} $$

**Log-Moneyness** ($k$) serves as the standardized spatial coordinate for the volatility smile:
$$ k = \ln\left(\frac{K}{F}\right) $$

**Total Variance** ($w$) is the calibration target for the SVI model:
$$ w = \sigma_{imp}^2 T $$
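
A minimal vectorized sketch of the three derived metrics, assuming flat continuously compounded $r$ and $q$ and maturities $T$ expressed in years. The function name `derived_metrics` and the sample inputs are hypothetical and not taken from the project's dataset.

```python
import numpy as np

def derived_metrics(S0, K, T, iv, r, q):
    """Return forward F, log-moneyness k and total implied variance w.

    S0: spot price; K, T, iv: strikes, maturities (years) and implied vols (decimals);
    r, q: flat continuously compounded risk-free rate and dividend yield.
    """
    K, T, iv = (np.asarray(a, dtype=float) for a in (K, T, iv))
    F = S0 * np.exp((r - q) * T)   # F = S0 * exp((r - q) T)
    k = np.log(K / F)              # k = ln(K / F)
    w = iv**2 * T                  # w = sigma_imp^2 * T
    return F, k, w

F, k, w = derived_metrics(S0=500.0, K=[450.0, 500.0, 550.0], T=[0.5, 0.5, 0.5],
                          iv=[0.22, 0.17, 0.14], r=0.045, q=0.013)
print(F.round(2), k.round(4), w.round(5))
```

Because $k$ is measured against the forward rather than the spot, an at-the-spot strike ($K = S_0$) sits at slightly negative log-moneyness whenever $r > q$.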

**Filtering Criteria**

To ensure the integrity of the calibration dataset, the following filters are applied:

* **Spread Quality:** Contracts with a relative spread $\frac{Ask - Bid}{Mid} > 20\%$ are discarded to avoid unreliable mid-prices.
* **Maturity Bounds:** $7 \text{ days} \leq T \leq 2 \text{ years}$. Very short maturities are excluded because microstructure noise dominates their small time value, making implied volatility unstable; very long maturities are excluded due to illiquidity.
* **Liquidity:** $Volume > 1$ and $OpenInterest > 10$ to remove "ghost" quotes.
* **Moneyness:** $|k| \leq 0.30$. Calibration is restricted to the liquid strike range, approximately $0.74F \leq K \leq 1.35F$ (since $e^{-0.30} \approx 0.741$ and $e^{0.30} \approx 1.350$), where the models are most stable.
* **Call/Put Consistency:** Out-of-the-money (OTM) options are prioritized for implied volatility extraction (OTM Puts for $K < F$, OTM Calls for $K > F$) to utilize the most liquid instruments.
* **IV Bounds:** $1\% \leq IV \leq 300\%$ to filter numerical inversion failures.
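
The filters above can be sketched as a single boolean mask in pandas. This is a hypothetical illustration, not the backend's code: the column names `bid`, `ask`, `volume` and `openInterest` are assumed to follow yfinance's option-chain tables, while `type`, `T` (in years, ACT/365 assumed), `k` and `iv` are assumed to have been added upstream. How a strike exactly at the forward ($k = 0$) is assigned is also an assumption.

```python
import pandas as pd

def filter_quotes(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the Section 1.4 filters to a quotes table (hypothetical column layout)."""
    mid = 0.5 * (df["bid"] + df["ask"])
    rel_spread = (df["ask"] - df["bid"]) / mid
    # OTM puts for K < F (k < 0), OTM calls for K >= F (k >= 0; tie handling assumed)
    otm = ((df["type"] == "put") & (df["k"] < 0)) | ((df["type"] == "call") & (df["k"] >= 0))
    mask = (
        (mid > 0) & (rel_spread <= 0.20)                  # spread quality
        & df["T"].between(7 / 365, 2.0)                    # 7 days <= T <= 2 years
        & (df["volume"] > 1) & (df["openInterest"] > 10)  # liquidity; NaN compares False
        & (df["k"].abs() <= 0.30)                          # moneyness band
        & otm                                              # call/put selection
        & df["iv"].between(0.01, 3.0)                      # IV sanity bounds
    )
    return df.loc[mask].assign(mid=mid[mask])

quotes = pd.DataFrame({
    "type": ["put", "put", "call", "call"],
    "bid": [2.00, 0.05, 3.10, 1.00],
    "ask": [2.10, 0.15, 3.20, 1.10],
    "volume": [50, 5, 120, 0],
    "openInterest": [500, 40, 900, 300],
    "T": [0.25, 0.25, 0.50, 0.01],
    "k": [-0.05, -0.25, 0.04, 0.02],
    "iv": [0.19, 0.35, 0.15, 0.16],
})
print(filter_quotes(quotes))
```

In this toy table the second row fails the spread filter and the last row fails both the maturity and volume filters, so only the first and third rows survive.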

### 1.5 Statement of Limitations

While the methodology adheres to standard industry practices, the following limitations apply:

1.  **American Premium:** Treating SPY options as European ignores the early exercise premium, potentially overestimating implied volatility for deep ITM puts.
2.  **Flat Term Structure:** The use of a constant $r$ ignores the slope of the yield curve, which may introduce minor biases in the forward price calculation for longer-dated maturities.
3.  **Snapshot Validity:** This analysis is a static calibration to a single point in time. It does not account for the temporal dynamics of the volatility surface.
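
To gauge the size of the American premium, a Cox-Ross-Rubinstein binomial tree (a standard technique, not part of this project's pipeline) can price the same put as European and as American. The gap between the two is the early exercise premium; inverting the American price with a European formula attributes that gap to volatility, which is the source of the upward IV bias. The function `crr_put` and all parameter values are hypothetical.

```python
import math

def crr_put(S, K, T, r, q, sigma, steps=500, american=True):
    """Put price on a Cox-Ross-Rubinstein tree with continuous dividend yield q."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp((r - q) * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Payoffs at maturity; node j = number of up moves
    values = [max(K - S * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]
    # Backward induction, overwriting values[j] in place (values[j + 1] is read first)
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:
                cont = max(cont, K - S * u**j * d**(n - j))  # early exercise check
            values[j] = cont
    return values[0]

S, T, r, q, sigma = 500.0, 1.0, 0.045, 0.013, 0.20
for K in (450.0, 500.0, 650.0):  # OTM, ATM and deep ITM puts
    eur = crr_put(S, K, T, r, q, sigma, american=False)
    amer = crr_put(S, K, T, r, q, sigma, american=True)
    print(f"K={K:.0f}  European={eur:.4f}  American={amer:.4f}  premium={amer - eur:.4f}")
```

Comparing the printed premiums across strikes shows why the bias is concentrated in deep ITM puts, which the OTM-only selection largely excludes from calibration.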

### 1.6 Conclusion & Handoff

The data processing pipeline results in a cleaned, structured CSV file containing the necessary inputs for model calibration: Strike, Expiration, Time to Maturity ($T$), Log-Moneyness ($k$), and Market Implied Volatility ($\sigma_{imp}$).

The cleaned dataset is stored in `src/options_data.csv`. **Notebook 2** will ingest this file to build the necessary numerical foundations before calibrating the SVI and SABR models.

In [None]:
import pandas as pd
from pathlib import Path
import warnings

warnings.filterwarnings("ignore")

# Define path to the processed dataset generated by the backend pipeline
# (assumes this notebook is run from a subdirectory of the project root)
csv_path = Path.cwd().parent / "src" / "options_data.csv"

try:
    # Load the processed data to verify structure for the next notebook
    final_df = pd.read_csv(csv_path)
    print("Processed Data Schema:")
    final_df.info()
except FileNotFoundError:
    print(f"Processed file not found at: {csv_path}. Please run src/data_loader.py first.")