# Getting Started with DoltHub Options Dataset

This notebook demonstrates how to load and explore the `post-no-preference/options` dataset from DoltHub.

## Prerequisites

1. Install Dolt: `brew install dolt` (macOS), or follow the installation instructions at https://docs.dolthub.com/ for other platforms
2. Clone the database: `dolt clone post-no-preference/options`
3. Install required packages: `pip install pandas numpy matplotlib jupyter`
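
Before moving on, it can help to confirm that the `dolt` executable is actually reachable from the Python environment running the notebook. The snippet below is a minimal sketch of such a check; the helper name `find_dolt` is hypothetical and not part of the backtester package.

```python
import shutil
import subprocess


def find_dolt():
    """Return the full path of the dolt executable, or None if it is not on PATH."""
    return shutil.which("dolt")


dolt_path = find_dolt()
if dolt_path is None:
    print("dolt not found on PATH; install it before cloning the database")
else:
    # `dolt version` prints the installed version string
    result = subprocess.run([dolt_path, "version"], capture_output=True, text=True)
    print(result.stdout.strip() or result.stderr.strip())
```

If the check reports that `dolt` is missing even though it is installed, the notebook kernel is probably using a different `PATH` than your shell.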

In [None]:
# Import necessary libraries
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# Add backtester to path
sys.path.insert(0, str(Path.cwd().parent))

from backtester import (
    DoltHubAdapter,
    MarketDataLoader,
    BlackScholesModel
)

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
%matplotlib inline

## Step 1: Initialize the DoltHub Adapter

Point the adapter to your cloned database directory.

In [None]:
# Update this path to where you cloned the database
DB_PATH = "/Users/janussuk/Desktop/dolt_data/options"

# Create adapter
adapter = DoltHubAdapter(DB_PATH)
print(f"✓ Connected to DoltHub database at {DB_PATH}")
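
A wrong `DB_PATH` is an easy mistake to make at this step. A cloned Dolt database keeps its metadata in a `.dolt` subdirectory, so a quick sanity check is possible before any queries run. The sketch below assumes nothing about `DoltHubAdapter` itself; `looks_like_dolt_repo` is a hypothetical helper.

```python
from pathlib import Path


def looks_like_dolt_repo(path):
    """Return True if `path` is a directory that contains a `.dolt` subdirectory."""
    p = Path(path)
    return p.is_dir() and (p / ".dolt").is_dir()


# Stand-in path for illustration; in the notebook you would pass DB_PATH instead
example_path = Path.cwd()
print(f"{example_path} looks like a Dolt clone: {looks_like_dolt_repo(example_path)}")
```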

## Step 2: Explore Available Data

Let's see what data is available for a popular ticker like AAPL.

In [None]:
# Load a small sample of option data
ticker = "AAPL"
sample_date = "2024-01-02"

options = adapter.load_option_data(
    ticker=ticker,
    start_date=sample_date,
    end_date=sample_date
)

print(f"\nLoaded {len(options)} option records for {ticker} on {sample_date}")
print("\nSample data:")
options.head(10)
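
Beyond `head()`, a per-expiration summary gives a faster feel for the shape of the chain. The sketch below runs on a small synthetic frame, since the exact schema returned by `load_option_data` is not shown here; it assumes only the `expiration`, `option_type`, and `strike` columns that the later cells of this notebook use.

```python
import pandas as pd

# Synthetic stand-in for the frame returned by adapter.load_option_data
chain = pd.DataFrame({
    "expiration": pd.to_datetime(
        ["2024-01-19", "2024-01-19", "2024-02-16", "2024-02-16"]
    ),
    "option_type": ["call", "put", "call", "put"],
    "strike": [185.0, 185.0, 190.0, 180.0],
})


def summarize_chain(df):
    """Count contracts and report the strike range per expiration and option type."""
    return (
        df.groupby(["expiration", "option_type"])["strike"]
        .agg(contracts="count", min_strike="min", max_strike="max")
        .reset_index()
    )


summary = summarize_chain(chain)
print(summary)
```

In the notebook, `summarize_chain(options)` would apply the same summary to the real data, provided those three columns are present.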

In [None]:
# Check available expirations (normalized to pandas Timestamps so .date() works)
expirations = sorted(pd.to_datetime(options['expiration'].unique()))
print(f"\nAvailable expirations ({len(expirations)} total):")
for exp in expirations[:10]:
    print(f"  - {exp.date()}")

In [None]:
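# Visualize the volatility smile
# Pick the expiration closest to 30 days after the sample date
target_expiry = pd.Timestamp(sample_date) + pd.Timedelta(days=30)
exp_date = pd.Timestamp(min(expirations, key=lambda e: abs(pd.Timestamp(e) - target_expiry)))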
exp_options = options[options['expiration'] == exp_date]

# Separate calls and puts
calls = exp_options[exp_options['option_type'] == 'call'].sort_values('strike')
puts = exp_options[exp_options['option_type'] == 'put'].sort_values('strike')

# Plot volatility smile
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

ax1.plot(calls['strike'], calls['implied_vol'], 'b.-', label='Calls', linewidth=2)
ax1.plot(puts['strike'], puts['implied_vol'], 'r.-', label='Puts', linewidth=2)
ax1.set_xlabel('Strike Price')
ax1.set_ylabel('Implied Volatility')
ax1.set_title(f'Volatility Smile for {ticker}\nExpiration: {exp_date.date()}')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot option prices
ax2.plot(calls['strike'], calls['mid_price'], 'b.-', label='Calls', linewidth=2)
ax2.plot(puts['strike'], puts['mid_price'], 'r.-', label='Puts', linewidth=2)
ax2.set_xlabel('Strike Price')
ax2.set_ylabel('Option Price')
ax2.set_title(f'Option Prices for {ticker}\nExpiration: {exp_date.date()}')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
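
Raw smiles are often noisy because deep in-the-money quotes and missing implied vols distort the curve. One common cleanup is to keep only out-of-the-money quotes with a valid implied vol and to plot against moneyness instead of strike. The sketch below illustrates this on synthetic data; the `spot` value is an assumption supplied by hand, and `otm_smile` is a hypothetical helper, not part of the backtester.

```python
import pandas as pd


def otm_smile(df, spot):
    """Keep OTM quotes with a positive, non-missing implied vol, sorted by strike."""
    valid = df[df["implied_vol"].notna() & (df["implied_vol"] > 0)]
    otm_calls = valid[(valid["option_type"] == "call") & (valid["strike"] >= spot)]
    otm_puts = valid[(valid["option_type"] == "put") & (valid["strike"] < spot)]
    smile = pd.concat([otm_puts, otm_calls]).sort_values("strike")
    # Moneyness = strike / spot, so 1.0 is at-the-money
    return smile.assign(moneyness=smile["strike"] / spot)


# Synthetic quotes for one expiration (values are made up for illustration)
quotes = pd.DataFrame({
    "option_type": ["call", "call", "call", "put", "put", "put"],
    "strike": [90.0, 100.0, 110.0, 90.0, 100.0, 110.0],
    "implied_vol": [0.30, 0.22, 0.0, 0.28, 0.21, float("nan")],
})

smile = otm_smile(quotes, spot=100.0)
print(smile[["option_type", "strike", "moneyness", "implied_vol"]])
```

Here the at-the-money strike is assigned to the call side; that boundary choice is a convention, not a requirement.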

## Step 3: Load Volatility History

The dataset also includes historical and implied volatility metrics.

In [None]:
# Load volatility history
vol_hist = adapter.load_volatility_data(
    ticker=ticker,
    start_date="2024-01-01",
    end_date="2024-03-31"
)

print(f"\nLoaded {len(vol_hist)} days of volatility data")
vol_hist.head()

In [None]:
# Plot HV vs IV
fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(vol_hist['date'], vol_hist['hv_current'], 'b-', label='Historical Vol', linewidth=2)
ax.plot(vol_hist['date'], vol_hist['iv_current'], 'r-', label='Implied Vol', linewidth=2)
ax.fill_between(vol_hist['date'], vol_hist['hv_year_low'], vol_hist['hv_year_high'],
                alpha=0.2, color='blue', label='HV Range')
ax.set_xlabel('Date')
ax.set_ylabel('Volatility')
ax.set_title(f'{ticker} - Historical vs Implied Volatility')
ax.legend()
ax.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
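
Two derived series are often more informative than the raw lines in the plot: the spread between implied and historical volatility, and where current historical volatility sits inside its one-year range. The sketch below computes both on synthetic data, using the same column names as the plotting cell; `add_vol_metrics` is a hypothetical helper and the numbers are invented.

```python
import pandas as pd

# Synthetic stand-in for the frame returned by adapter.load_volatility_data
vol = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "hv_current": [0.20, 0.22, 0.25],
    "iv_current": [0.24, 0.23, 0.24],
    "hv_year_low": [0.15, 0.15, 0.15],
    "hv_year_high": [0.35, 0.35, 0.35],
})


def add_vol_metrics(df):
    """Add the IV minus HV spread and the position of HV within its yearly range."""
    out = df.copy()
    out["iv_hv_spread"] = out["iv_current"] - out["hv_current"]
    width = out["hv_year_high"] - out["hv_year_low"]
    # A zero-width range becomes NaN instead of dividing by zero
    out["hv_range_pos"] = (out["hv_current"] - out["hv_year_low"]) / width.where(width > 0)
    return out


metrics = add_vol_metrics(vol)
print(metrics[["date", "iv_hv_spread", "hv_range_pos"]])
```

A positive spread means options are pricing in more volatility than the stock has recently realized; a range position near 0 or 1 means historical volatility is near its yearly low or high.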

## Step 4: Load Complete Market Data for Backtesting

Use the MarketDataLoader to create a complete MarketData object with volatility surfaces.

In [None]:
# Create loader
loader = MarketDataLoader(adapter)

# Load market data
market_data = loader.load(
    ticker="AAPL",
    start_date="2024-01-01",
    end_date="2024-03-31",
    build_vol_surface=True
)

print("\n✓ Market data loaded successfully!")
print(f"  Spot data: {len(market_data.time_index)} days")
print(f"  Vol surfaces: {len(market_data.vol_surfaces)} dates")
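
To give some intuition for what a volatility surface lookup involves, the sketch below interpolates implied vol on a small grid of moneyness and days to expiry. This is only an illustration of the general idea under simple assumptions (linear interpolation, flat extrapolation at the edges); it is not necessarily how `MarketDataLoader` builds or queries its surfaces, and the grid values are invented.

```python
import numpy as np

# Grid axes: moneyness (strike / spot) and days to expiry
moneyness = np.array([0.9, 1.0, 1.1])
days = np.array([30.0, 60.0])

# iv_grid[i, j] is the implied vol at days[i] and moneyness[j]
iv_grid = np.array([
    [0.30, 0.22, 0.25],
    [0.28, 0.23, 0.24],
])


def interp_vol(m, d):
    """Interpolate along moneyness within each expiry, then along days to expiry."""
    per_expiry = np.array([np.interp(m, moneyness, row) for row in iv_grid])
    return float(np.interp(d, days, per_expiry))


print(f"IV at 5% OTM, 30 days: {interp_vol(1.05, 30.0):.4f}")
print(f"IV at the money, 45 days: {interp_vol(1.0, 45.0):.4f}")
```

Note that `np.interp` holds the edge value constant outside the grid, so queries far from the quoted strikes or expiries return the nearest boundary vol.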

In [None]:
# Test the volatility surface
# Use a trading day: U.S. markets were closed on 2024-01-15 (Martin Luther King Jr. Day)
test_date = pd.Timestamp("2024-01-16")
spot = market_data.get_spot(test_date)
strike = spot * 1.05  # 5% OTM call
expiry = test_date + pd.Timedelta(days=30)

iv = market_data.get_implied_vol(test_date, strike, expiry, spot)

print("\nVolatility Surface Test:")
print(f"  Date: {test_date.date()}")
print(f"  Spot: ${spot:.2f}")
print(f"  Strike: ${strike:.2f} (5% OTM)")
print(f"  Days to expiry: {(expiry - test_date).days}")
print(f"  Implied Vol: {iv:.2%}")
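
An implied vol on its own is hard to interpret, so a natural follow-up is to turn it into a price. The sketch below is a standalone Black-Scholes formula for European options without dividends. It deliberately does not use the imported `BlackScholesModel`, whose interface is not shown in this notebook; the function names and the sample inputs are assumptions for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(spot, strike, t_years, rate, vol, option_type="call"):
    """Black-Scholes price of a European option on a non-dividend-paying asset."""
    if t_years <= 0 or vol <= 0:
        raise ValueError("t_years and vol must be positive")
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * t_years)
    if option_type == "call":
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    if option_type == "put":
        return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    raise ValueError("option_type must be 'call' or 'put'")


# Illustrative inputs: a 5% OTM call with 30 days to expiry
example = bs_price(spot=185.0, strike=185.0 * 1.05, t_years=30 / 365, rate=0.05, vol=0.25)
print(f"Example call price: ${example:.2f}")
```

In the notebook, the `spot`, `strike`, and `iv` values from the cell above could be passed in directly, along with a risk-free rate of your choosing.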

## Step 5: Explore Different Tickers

The dataset contains options data for many popular stocks.

In [None]:
# Try different tickers
tickers_to_try = ["SPY", "AAPL", "TSLA", "NVDA", "MSFT"]

print("Checking data availability for popular tickers:\n")

# Use separate names so the loop does not overwrite `ticker` and `options` from earlier cells
for symbol in tickers_to_try:
    try:
        chain = adapter.load_option_data(
            ticker=symbol,
            start_date="2024-01-02",
            end_date="2024-01-02"
        )
        print(f"✓ {symbol:6s} - {len(chain):5d} option contracts")
    except Exception as e:
        # Show the underlying error so path or query problems are not mistaken for missing data
        print(f"✗ {symbol:6s} - Not available ({e})")
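
If you want to keep the availability results instead of only printing them, collecting them into a DataFrame makes them easy to sort and filter. The sketch below takes the loading function as a parameter so it can be demonstrated with a fake loader; `check_availability` and `fake_loader` are hypothetical names, and the keyword arguments mirror the `load_option_data` calls used in this notebook.

```python
import pandas as pd


def check_availability(load_fn, tickers, date):
    """Call load_fn for each ticker and record the contract count or the error."""
    rows = []
    for symbol in tickers:
        try:
            chain = load_fn(ticker=symbol, start_date=date, end_date=date)
            rows.append({"ticker": symbol, "contracts": len(chain), "error": None})
        except Exception as exc:
            rows.append({"ticker": symbol, "contracts": 0, "error": str(exc)})
    return pd.DataFrame(rows)


def fake_loader(ticker, start_date, end_date):
    """Stand-in for adapter.load_option_data, used only for this demonstration."""
    if ticker == "MISSING":
        raise KeyError(f"no data for {ticker}")
    return pd.DataFrame({"strike": [100.0, 105.0]})


report = check_availability(fake_loader, ["AAPL", "MISSING"], "2024-01-02")
print(report)
```

In the notebook, `check_availability(adapter.load_option_data, tickers_to_try, "2024-01-02")` would produce the same kind of table for the real data.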

## Next Steps

Now that you can load data from DoltHub, check out the other notebooks:

- `02_iron_condor_backtest.ipynb` - Backtest an iron condor strategy
- `03_calendar_spread_backtest.ipynb` - Test calendar spreads
- `04_custom_strategy.ipynb` - Build your own custom strategies

## Summary

In this notebook, you learned how to:
- ✓ Connect to the DoltHub options database
- ✓ Load and explore options chain data
- ✓ Visualize volatility smiles and query volatility surfaces
- ✓ Access historical and implied volatility data
- ✓ Create MarketData objects for backtesting