# Bitcoin Options Analysis - Put-Call Parity Regression

This notebook analyzes Bitcoin options data from Deribit, using put-call parity regression to estimate forward prices and the basis (forward minus spot).

## Overview
- Process Deribit market data
- Construct option chains with spread tightening
- Perform put-call parity regression
- Apply futures constraints when available
- Visualize results interactively

In [None]:
# Import required libraries
import polars as pl
import numpy as np
import plotly.offline as offline
from datetime import datetime, timedelta
from IPython.display import display, HTML
import sys
import os

# Add parent directory to Python path to access common modules
# Get the project root directory (one level up from notebooks/)
project_root = os.path.dirname(os.path.abspath(__file__ if '__file__' in globals() else os.getcwd()))
if project_root not in sys.path:
    sys.path.append(project_root)

# Import project modules
from common.deribit_md_manager import DeribitMDManager
from common.plotly_manager import PlotlyManager
from common.weight_least_square_regressor import WLSRegressor
from common.nonlinear_minimization import NonlinearMinimization

# Configure Polars display
pl.Config.set_tbl_rows(20)

# Initialize Plotly for notebook
offline.init_notebook_mode()
display(HTML(
    '<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_SVG"></script>'
))

## 1. Data Loading and Setup

Load Deribit market data and initialize the analysis pipeline.
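
Instrument names in the data follow Deribit's `BTC-<expiry>-<strike>-<C|P>` convention, as in the fallback sample symbols below. The expiry day is not zero-padded (`1MAR24`, `29FEB24`), and Deribit options expire at 08:00 UTC. The hypothetical helper `parse_deribit_option` sketches how such a symbol could be split into fields. It is not the `DeribitMDManager` API, and the 2000-based year and English month abbreviations are assumptions.

```python
from datetime import datetime, timezone

# Month abbreviations as they appear in Deribit expiry codes (assumed English).
_MONTHS = {name: i for i, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_expiry_code(code: str) -> datetime:
    """Convert an expiry code such as '1MAR24' to its 08:00 UTC expiry time."""
    day, month, year = int(code[:-5]), _MONTHS[code[-5:-2]], 2000 + int(code[-2:])
    return datetime(year, month, day, 8, 0, tzinfo=timezone.utc)


def parse_deribit_option(symbol: str) -> dict:
    """Split an option symbol like 'BTC-29FEB24-60000-C' into fields (hypothetical helper)."""
    parts = symbol.split("-")
    # Futures ('BTC-29MAR24'), perpetuals and rows like 'INDEX' are rejected.
    if len(parts) != 4 or parts[3] not in ("C", "P"):
        raise ValueError(f"not an option symbol: {symbol!r}")
    underlying, expiry_code, strike, kind = parts
    return {
        "underlying": underlying,
        "expiry": expiry_code,
        "expiry_time": parse_expiry_code(expiry_code),
        "strike": float(strike),
        "is_call": kind == "C",
    }
```

Because non-option symbols raise `ValueError`, the same helper can double as a filter for option rows.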

In [None]:
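# Change to the project root computed above so relative data paths resolve;
# unlike os.chdir('..'), this is safe to run more than once
os.chdir(project_root)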
print(f"Current working directory: {os.getcwd()}")

# Configuration
date_str = "20240229"  # YYYYMMDD format
data_file = f'data/{date_str}.market_updates-1318071-20250916.log'  

# Load market data (using lazy evaluation)
try:
    df_market_updates = pl.scan_csv(data_file)
    print(f"Successfully loaded data from {data_file}")
except Exception as e:
    print(f"Error loading data: {e}")
    print("Please ensure the data file is in the correct format and location.")
    # Create sample data structure for demonstration
    df_market_updates = pl.LazyFrame({
        "symbol": ["BTC-29FEB24-60000-C", "BTC-29FEB24-60000-P", "INDEX"],
        "timestamp": ["08:00:00.000", "08:00:00.000", "08:00:00.000"],
        "bid_price": [0.05, 0.02, 0.0],
        "ask_price": [0.06, 0.03, 0.0]
    })
    print("Using sample data for demonstration")

In [None]:
# Initialize pipeline components
symbol_manager = DeribitMDManager(df_market_updates, date_str)
wls_regressor = WLSRegressor()
nonlinear_minimizer = NonlinearMinimization()
plotly_manager = PlotlyManager(date_str, symbol_manager.fut_expiries)

print(f"Available option expiries: {symbol_manager.opt_expiries}")
print(f"Available future expiries: {symbol_manager.fut_expiries}")

## 2. Data Processing and Conflation

Conflate the raw market updates onto a regular time grid (1-minute steps with a 10-minute lookback window, set below).
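
Conflation turns an irregular stream of quote updates into one snapshot per symbol per grid point. As a minimal sketch, assuming each snapshot keeps the most recent quote seen within the lookback window (the exact rule inside `get_conflated_md` is not shown here), `conflate_last_quote` is a hypothetical pure-Python version of the idea, with `every` and `lookback` playing the roles of `conflation_every` and `conflation_period`.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def conflate_last_quote(updates, start, end, every, lookback):
    """Snapshot the latest quote per symbol on a regular grid from start to end.

    updates: iterable of (timestamp, symbol, bid_price, ask_price) tuples.
    A quote is used at grid time t only if it arrived in (t - lookback, t].
    """
    # Group updates by symbol in time order so each lookup is a binary search.
    by_symbol = {}
    for ts, sym, bid, ask in sorted(updates, key=lambda u: u[0]):
        by_symbol.setdefault(sym, []).append((ts, bid, ask))
    times = {sym: [q[0] for q in quotes] for sym, quotes in by_symbol.items()}

    rows = []
    t = start
    while t <= end:
        for sym, quotes in by_symbol.items():
            idx = bisect_right(times[sym], t) - 1  # last update at or before t
            if idx >= 0 and quotes[idx][0] > t - lookback:  # skip stale quotes
                _, bid, ask = quotes[idx]
                rows.append({"timestamp": t, "symbol": sym,
                             "bid_price": bid, "ask_price": ask})
        t += every
    return rows
```

In Polars, the same last-quote-within-tolerance lookup could be expressed with `join_asof` (its `by` and `tolerance` arguments) against a frame of grid timestamps.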

In [None]:
# Data conflation parameters
conflation_every = "1m"   # 1-minute intervals
conflation_period = "10m" # 10-minute lookback window

# Conflate and enrich market data
try:
    df_conflated_md = symbol_manager.get_conflated_md(
        freq=conflation_every, 
        period=conflation_period
    )
    print(f"Conflated data shape: {df_conflated_md.shape}")
    display(df_conflated_md.head())
except Exception as e:
    print(f"Error in data conflation: {e}")
    print("This may be due to missing data or format issues.")

In [None]:
df_conflated_md.filter(pl.col("symbol") == "BTC-29FEB24-60000-C").head()

## 3. Option Chain Analysis

Build the option chain and the put-call synthetics for one expiry at one timestamp, applying the spread-tightening step to the option bid-ask quotes.
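
The exact tightening rules live in `create_option_synthetic` and are not shown here. As a hedged illustration of one standard no-arbitrage rule that could drive such tightening: for quotes taken at the same timestamp, call prices are non-increasing in strike. A call ask at a higher strike therefore never needs to exceed the best ask at a lower strike, and a call bid at a lower strike can be raised to the best bid at a higher strike. `tighten_call_quotes` is a hypothetical helper, not the notebook's implementation.

```python
import math


def tighten_call_quotes(strikes, bids, asks):
    """Tighten call quotes using monotonicity in strike: C(K1) >= C(K2) for K1 < K2.

    Returns new (bids, asks) lists aligned with the input order.
    """
    order = sorted(range(len(strikes)), key=lambda i: strikes[i])
    new_bids, new_asks = list(bids), list(asks)

    # Walk up the strikes: an ask can drop to the cheapest ask at any lower strike.
    best_ask = math.inf
    for i in order:
        best_ask = min(best_ask, asks[i])
        new_asks[i] = best_ask

    # Walk down the strikes: a bid can rise to the richest bid at any higher strike.
    best_bid = -math.inf
    for i in reversed(order):
        best_bid = max(best_bid, bids[i])
        new_bids[i] = best_bid

    return new_bids, new_asks
```

Puts follow the mirror rule, since put prices are non-decreasing in strike. If tightening ever leaves a bid above an ask, the input quotes themselves violate monotonicity.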

In [None]:
# Analysis parameters
if symbol_manager.opt_expiries:
    expiry = symbol_manager.opt_expiries[0]  # Use first available expiry
    timestamp = datetime(2024, 2, 29, 1, 21, 0)  # Analysis timestamp
    
    print(f"Analyzing expiry: {expiry} at time: {timestamp}")
    
    try:
        # Create option synthetic data
        df_option_chain, df_option_synthetic = symbol_manager.create_option_synthetic(
            df_conflated_md, expiry=expiry, timestamp=timestamp
        )
        
        print(f"Option chain size: {len(df_option_chain)}")
        print(f"Synthetic data size: {len(df_option_synthetic)}")
        
        if not df_option_synthetic.is_empty():
            display(df_option_synthetic.head(10))
        else:
            print("No synthetic data available for this expiry/timestamp")
            
    except Exception as e:
        print(f"Error creating option synthetics: {e}")
else:
    print("No option expiries available in the data")

In [None]:
df_option_chain

In [None]:
# Style df_option_chain to highlight tightened prices (red: old -> int, orange: int -> final)
is_call = True   # True for calls, False for puts
is_bid = False   # True for bids, False for asks

if is_call:
    old_col_name = 'old_bid_price' if is_bid else 'old_ask_price'
    int_col_name = 'int_bid_price' if is_bid else 'int_ask_price'
    final_col_name = 'bid_price' if is_bid else 'ask_price'
else:
    old_col_name = 'old_bid_price_P' if is_bid else 'old_ask_price_P'
    int_col_name = 'int_bid_price_P' if is_bid else 'int_ask_price_P'
    final_col_name = 'bid_price_P' if is_bid else 'ask_price_P'

styled_df = df_option_chain.select(['timestamp','strike',old_col_name,int_col_name,final_col_name,'S']).to_pandas()

# Apply conditional styling
def highlight_diff(row):
    if row[old_col_name] != row[int_col_name]:
        return ['background-color: red' if col == int_col_name else '' for col in row.index]
    elif row[int_col_name] != row[final_col_name]:
        return ['background-color: orange' if col == final_col_name else '' for col in row.index]
    else:
        return [''] * len(row)

print(f"Highlighting differences for {'bid' if is_bid else 'ask'} prices of {'calls' if is_call else 'puts'}")

styled_df.style.apply(highlight_diff, axis=1).format({
    'bid_price': '{:.4f}',
    'ask_price': '{:.4f}',
    'bid_price_P': '{:.4f}',
    'ask_price_P': '{:.4f}',
    'S': '{:.2f}',
    'tau': '{:.6f}',
    'old_bid_price': '{:.4f}',
    'old_ask_price': '{:.4f}',
    'old_bid_price_P': '{:.4f}',
    'old_ask_price_P': '{:.4f}',
    'int_bid_price': '{:.4f}',
    'int_ask_price': '{:.4f}',
    'int_bid_price_P': '{:.4f}',
    'int_ask_price_P': '{:.4f}',
})

## 4. Put-Call Parity Regression

Fit put-call parity with weighted least squares (WLS) to estimate the USD interest rate r, the BTC funding rate q, and the implied forward price F.
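
For USD-valued prices with continuous rates, put-call parity reads $C - P = S e^{-q\tau} - K e^{-r\tau}$, so $P - C$ is linear in the strike $K$ with slope $e^{-r\tau}$ and intercept $-S e^{-q\tau}$. The sketch below assumes this is the regression `WLSRegressor` fits; that is consistent with the initial guess `const = -62000`, `coef = 1.0` used in the time-series loop, but its internals are not shown. `fit_parity_wls` is a hypothetical closed-form weighted fit, and the choice of weights (for example, inverse squared synthetic spreads) is left to the caller.

```python
import math


def fit_parity_wls(strikes, put_minus_call, weights, spot, tau):
    """Closed-form weighted least squares fit of P - C = coef * K + const.

    Assumes USD prices and continuous compounding, so that
    coef = exp(-r * tau) and const = -spot * exp(-q * tau).
    """
    sw = sum(weights)
    k_bar = sum(w * k for w, k in zip(weights, strikes)) / sw
    y_bar = sum(w * y for w, y in zip(weights, put_minus_call)) / sw
    s_kk = sum(w * (k - k_bar) ** 2 for w, k in zip(weights, strikes))
    s_ky = sum(w * (k - k_bar) * (y - y_bar)
               for w, k, y in zip(weights, strikes, put_minus_call))
    coef = s_ky / s_kk            # slope: discount factor exp(-r * tau)
    const = y_bar - coef * k_bar  # intercept: -spot * exp(-q * tau)

    forward = -const / coef       # equals spot * exp((r - q) * tau)
    return {
        "const": const,
        "coef": coef,
        "r": -math.log(coef) / tau,
        "q": -math.log(-const / spot) / tau,
        "F": forward,
        "base_offset": forward - spot,
    }
```

A closed-form fit like this is cheap, which is why it makes a natural starting point for the iterative constrained step that follows.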

In [None]:
# WLS Regression
wls_regressor.set_printable(True)

try:
    if 'df_option_synthetic' in locals() and not df_option_synthetic.is_empty():
        wls_result = wls_regressor.fit(df_option_synthetic)
        
        print("\n=== WLS Regression Results ===")
        print(f"USD Interest Rate (r): {wls_result['r']:.4f}")
        print(f"BTC Funding Rate (q): {wls_result['q']:.4f}")
        print(f"Forward Price (F): {wls_result['F']:.2f}")
        print(f"Base Offset (F-S): {wls_result['base_offset']:.2f}")
        print(f"R-squared: {wls_result['r2']:.4f}")
        print(f"Sum of Squared Errors: {wls_result['sse']:.4f}")
        
    else:
        print("No synthetic data available for regression")
        wls_result = None
        
except Exception as e:
    print(f"Error in WLS regression: {e}")
    wls_result = None

## 5. Constrained Optimization

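Refine the fit with a nonlinear minimization that applies futures constraints when matching futures prices are available, starting from the WLS coefficients.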
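
One simple way to impose a futures constraint, sketched here as an assumption rather than as what `NonlinearMinimization` does, is to treat a same-expiry futures price $F_{fut}$ as the forward. Under the parameterization $P - C = \mathrm{coef}\cdot K + \mathrm{const}$ the forward is $-\mathrm{const}/\mathrm{coef}$, so pinning it to $F_{fut}$ leaves a one-parameter weighted fit of $P - C = \mathrm{coef}\cdot(K - F_{fut})$. `fit_parity_with_future` is hypothetical; a production version might use a soft penalty or bounds instead of a hard equality.

```python
def fit_parity_with_future(strikes, put_minus_call, weights, future_price):
    """Weighted fit of P - C = coef * (K - F_fut), i.e. the forward pinned to F_fut.

    Minimizing sum(w * (y - coef * x)**2) with x = K - F_fut gives
    coef = sum(w * x * y) / sum(w * x * x).
    """
    x = [k - future_price for k in strikes]
    num = sum(w * xi * yi for w, xi, yi in zip(weights, x, put_minus_call))
    den = sum(w * xi * xi for w, xi in zip(weights, x))
    coef = num / den
    # The intercept follows from the constraint F = -const / coef.
    return {"coef": coef, "const": -coef * future_price, "F": future_price}
```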

In [None]:
# Constrained Optimization
nonlinear_minimizer.set_printable(True)

try:
    if (wls_result is not None and 'df_option_synthetic' in locals() and 
        not df_option_synthetic.is_empty()):
        
        # Use WLS result as initial guess
        prev_const = wls_result['const']
        prev_coef = wls_result['coef']
        
        constrained_result = nonlinear_minimizer.fit(
            df_option_synthetic, prev_const=prev_const, prev_coef=prev_coef
        )
        
        print("\n=== Constrained Optimization Results ===")
        print(f"USD Interest Rate (r): {constrained_result['r']:.4f}")
        print(f"BTC Funding Rate (q): {constrained_result['q']:.4f}")
        print(f"Forward Price (F): {constrained_result['F']:.2f}")
        print(f"Base Offset (F-S): {constrained_result['base_offset']:.2f}")
        print(f"R-squared: {constrained_result['r2']:.4f}")
        print(f"Sum of Squared Errors: {constrained_result['sse']:.4f}")
        
        # Compare results
        print("\n=== Comparison ===")
        print(f"Forward Price Difference: {abs(constrained_result['F'] - wls_result['F']):.2f}")
        print(f"R-squared Difference: {constrained_result['r2'] - wls_result['r2']:.4f}")
        
    else:
        print("Cannot perform constrained optimization without WLS results")
        constrained_result = None
        
except Exception as e:
    print(f"Error in constrained optimization: {e}")
    constrained_result = None

## 6. Visualization

Create interactive plots of the regression results.

In [None]:
# Plot regression results
if (wls_result is not None and 'df_option_synthetic' in locals() and 
    not df_option_synthetic.is_empty()):
    
    # Filter by spread if needed
    spread_threshold = 300  # Adjust as needed
    df_filtered = df_option_synthetic
    
    if 'spread' in df_option_synthetic.columns:
        df_filtered = df_option_synthetic.filter(pl.col('spread') < spread_threshold)
        print(f"Filtered to {len(df_filtered)} points with spread < {spread_threshold}")
    
    if not df_filtered.is_empty():
        try:
            # Plot WLS regression
            plotly_manager.plot_regression_result(
                expiry, timestamp.strftime("%H:%M:%S"), df_filtered, wls_result
            )
            # Plot synthetic bid-ask comparison
            plotly_manager.plot_synthetic_bid_ask(
                expiry, timestamp.strftime("%H:%M:%S"), df_filtered, wls_result, 
                use_fitted_rate=False
            )
            # Plot the same synthetic bid-ask comparison, this time using the fitted rates
            plotly_manager.plot_synthetic_bid_ask(
                expiry, timestamp.strftime("%H:%M:%S"), df_filtered, wls_result, 
                use_fitted_rate=True
            )
            
        except Exception as e:
            print(f"Error creating plots: {e}")
    else:
        print("No data points available after filtering")
else:
    print("No results available for visualization")

## 7. Time Series Analysis

Analyze results across multiple time points (if data is available).
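
The time-series loop below steps the timestamp forward one minute at a time and, for a 0DTE expiry, stops one hour before the 08:00 UTC expiry. The hypothetical generator `fit_times` sketches that schedule in isolation. It builds the expiry code from `ts.day` because Deribit codes do not zero-pad the day (`%d` would give `01MAR24` rather than `1MAR24`); the English month abbreviation from `%b` assumes the default C/English locale.

```python
from datetime import datetime, timedelta


def expiry_code(ts: datetime) -> str:
    """Deribit-style expiry code, e.g. 2024-03-01 -> '1MAR24' (day not zero-padded).

    Assumes an English locale, since %b is locale-dependent.
    """
    return f"{ts.day}{ts.strftime('%b%y').upper()}"


def fit_times(start, end, step, expiry, cutoff_hour=7):
    """Yield start + step, start + 2*step, ... up to end (inclusive).

    On the expiry's own day, stop at cutoff_hour, i.e. one hour before
    the 08:00 UTC expiry when cutoff_hour=7.
    """
    ts = start + step
    while ts <= end:
        if expiry_code(ts) == expiry and ts.hour >= cutoff_hour:
            break
        yield ts
        ts += step
```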

In [None]:
# Time series analysis placeholder
print("Time series analysis would require processing multiple timestamps")
print("This involves running the fitting process in a loop over time intervals")
print("\nExample workflow:")
print("1. Define time range (start_ts to end_ts)")
print("2. Loop through timestamps at regular intervals")
print("3. Fit regression for each timestamp")
print("4. Collect results in a DataFrame")
print("5. Plot time series of rates and forward prices")

# Sample code structure (kept in a string so it does not run):
"""
results = []
for ts in time_range:
    try:
        _, df_synthetic = symbol_manager.create_option_synthetic(
            df_conflated_md, expiry=exp, timestamp=ts
        )
        if not df_synthetic.is_empty() and len(df_synthetic) >= 3:
            fitted_result = fitter.fit(df_synthetic, prev_const=prev_const, prev_coef=prev_coef)
            results.append({'expiry': exp, 'timestamp': ts, **fitted_result})
            prev_const, prev_coef = fitted_result['const'], fitted_result['coef']
    except ValueError:
        continue

# Convert to DataFrame and plot
df_results = pl.DataFrame(results).with_columns(
    (pl.col('r') - pl.col('q')).alias('r-q')
)
plotly_manager.plot_time_series_results(df_results, 'r-q')
"""

print("\nUncomment and modify the above code to run time series analysis")

In [16]:
# start_ts = datetime(2024,2,29,0,0,0)
first_ts_map = (
    df_conflated_md.group_by("expiry")
    .agg(pl.first("timestamp").alias("first_timestamp"))
    .rows_by_key(key="expiry", named=True)
)

results = []
end_ts = datetime(2024,2,29,16,0,0)
minimum_num_strikes = 3
hit_expiry_0DTE = False

use_nonlinear_optimization = True
if use_nonlinear_optimization:
    fitter = nonlinear_minimizer
else:
    fitter = wls_regressor

prev_const, prev_coef = None, None

fitter.set_printable(False)
for exp in symbol_manager.opt_expiries:
    start_ts = first_ts_map[exp][0]['first_timestamp'] + timedelta(minutes=int(conflation_every[:-1]))

    print(f"Fitting expiry {exp} at {start_ts} ...")
    i = 0
    # Initial guess (const, coef), reset for each expiry
    prev_const, prev_coef = -62000, 1.0

    while True:
        i += 1
        ts = start_ts + timedelta(minutes=i)
        if ts > end_ts:
            print(f"Reached end time {end_ts}, stopping fitting for expiry {exp}")
            break
        
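        # For a 0DTE expiry, stop one hour before the 08:00 UTC expiry.
        # Deribit expiry codes do not zero-pad the day (e.g. "1MAR24"), so avoid %d here.
        if exp == f"{ts.day}{ts.strftime('%b%y').upper()}" and ts.hour >= 7:
            if not hit_expiry_0DTE: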
                print(f"Skipping expiry {exp} at time {ts} due to 0DTE")
                hit_expiry_0DTE = True
            break

        df_option_chain, df_option_synthetic = symbol_manager.create_option_synthetic(
            df_conflated_md, expiry=exp, timestamp=ts)
        
        if df_option_synthetic.is_empty():
            print(f"Skipping expiry {exp} at time {ts} due to empty synthetic data")
            continue

        if len(df_option_synthetic) < minimum_num_strikes:
            print(f"Skipping expiry {exp} at time {ts} due to insufficient strikes: {len(df_option_synthetic)}")
            continue
        
        if prev_coef is None or prev_const is None:
            # No previous estimate to warm-start from, so fall back to WLS
            fitted_result = wls_regressor.fit(df_option_synthetic, prev_coef=prev_coef, prev_const=prev_const)
            results.append({'expiry': exp, 'timestamp': ts, **fitted_result})
            prev_const, prev_coef = fitted_result['const'], fitted_result['coef']
            continue
        try:
            fitted_result = fitter.fit(df_option_synthetic, prev_coef=prev_coef, prev_const=prev_const)
            results.append({'expiry': exp, 'timestamp': ts, **fitted_result})
            prev_const, prev_coef = fitted_result['const'], fitted_result['coef']
        except ValueError as e:
            print(f"Skipping expiry {exp} at time {ts} due to error: {e}.  Previous guess: {prev_const}, {prev_coef}")
            continue

Reached end time 2024-02-29 16:00:00, stopping fitting for expiry 1MAR24
Fitting expiry 2MAR24 at 2024-02-29 00:11:00 ...
Reached end time 2024-02-29 16:00:00, stopping fitting for expiry 2MAR24
Fitting expiry 3MAR24 at 2024-02-29 08:11:00 ...
Skipping expiry 3MAR24 at time 2024-02-29 08:12:00 due to insufficient strikes: 1
Skipping expiry 3MAR24 at time 2024-02-29 08:13:00 due to insufficient strikes: 1
Skipping expiry 3MAR24 at time 2024-02-29 08:14:00 due to insufficient strikes: 1
Skipping expiry 3MAR24 at time 2024-02-29 08:15:00 due to insufficient strikes: 1
Skipping expiry 3MAR24 at time 2024-02-29 08:16:00 due to insufficient strikes: 1
Skipping expiry 3MAR24 at time 2024-02-29 08:17:00 due to insufficient strikes: 2
Skipping expiry 3MAR24 at time 2024-02-29 08:18:00 due to insufficient strikes: 2
Skipping expiry 3MAR24 at time 2024-02-29 08:19:00 due to in

KeyboardInterrupt: 

In [None]:
results

## Summary

This notebook demonstrates the complete workflow for analyzing Bitcoin options using put-call parity regression:

1. **Data Loading**: Import Deribit market data in the expected format
2. **Processing**: Conflate data and create option chains
3. **Analysis**: Apply WLS regression and constrained optimization
4. **Visualization**: Generate interactive plots of results

### Key Results
- **USD Interest Rate (r)**: Extracted from regression coefficients
- **BTC Funding Rate (q)**: Derived from put-call parity
- **Forward Price (F)**: $F = S e^{(r-q)\tau}$, where $\tau$ is the time to expiry
- **Base Offset**: Forward minus spot price, $F - S$
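
Assuming the regression is parameterized as $P - C = \mathrm{coef}\cdot K + \mathrm{const}$ (the form consistent with the `const`/`coef` initial guess in the time-series loop, though the module internals are not shown), the key results follow from the fitted coefficients as:

$$
\begin{aligned}
P - C &= e^{-r\tau}K - S e^{-q\tau} = \mathrm{coef}\cdot K + \mathrm{const},\\
r &= -\frac{\ln(\mathrm{coef})}{\tau}, \qquad q = -\frac{1}{\tau}\ln\!\left(-\frac{\mathrm{const}}{S}\right),\\
F &= -\frac{\mathrm{const}}{\mathrm{coef}} = S e^{(r-q)\tau}, \qquad \text{base offset} = F - S.
\end{aligned}
$$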

### Next Steps
- Load real Deribit data for comprehensive analysis
- Run time series analysis across trading sessions
- Compare results across different expiries
- Validate against futures market prices