# Exploratory Data Analysis on Options Data

This notebook performs exploratory data analysis (EDA) on historical options data. The goal is to visualize the data and draw out insights that inform feature selection and model training.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name; sns.set is its older alias)
sns.set_theme(style='whitegrid')

In [2]:
# Load the historical options data
data_path = '../data/processed/options_data.csv'
options_data = pd.read_csv(data_path)

# Display the first few rows of the dataset
options_data.head()
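
Before computing statistics, it can help to check column types and missing values. The following is a minimal sketch, not part of the original notebook: `summarize_quality` is a hypothetical helper, and the `strike` column and the sample values are made up for illustration. Only `option_price` is known to exist in the real dataset.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and missing share."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean() * 100,
    })
    # Columns with the most missing values come first
    return summary.sort_values("n_missing", ascending=False)


# Tiny made-up frame standing in for options_data (values are illustrative only)
sample = pd.DataFrame({
    "option_price": [1.5, 2.0, None, 4.25],
    "strike": [100.0, 105.0, 110.0, 115.0],  # hypothetical column name
})
quality = summarize_quality(sample)
print(quality)
```

On the real data, the call would presumably be `summarize_quality(options_data)`.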

In [3]:
# Summary statistics of the dataset
options_data.describe()

In [4]:
# Visualize the distribution of option prices
plt.figure(figsize=(10, 6))
sns.histplot(options_data['option_price'], bins=30, kde=True)
plt.title('Distribution of Option Prices')
plt.xlabel('Option Price')
plt.ylabel('Frequency')
plt.show()

In [5]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
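# Restrict to numeric columns; DataFrame.corr() raises on non-numeric columns in pandas >= 2.0
correlation_matrix = options_data.select_dtypes(include='number').corr()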
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
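
A full heatmap can be hard to read when there are many columns. One possible complement, sketched here under assumptions, is to rank the numeric features by their correlation with `option_price`. The helper `rank_correlations` and the columns `feature_a`, `feature_b`, and `option_type` are hypothetical and do not come from the dataset.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str = "option_price") -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Order by absolute value so strong negative correlations are not buried
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Made-up sample; column names other than option_price are assumptions
sample = pd.DataFrame({
    "option_price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "feature_b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "option_type": ["call", "put", "call", "put", "call"],  # non-numeric, ignored
})
ranked = rank_correlations(sample)
print(ranked)
```

Sorting by absolute value is a design choice: a strongly negative correlation is as informative for feature selection as a strongly positive one.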

## Insights

1. The distribution of option prices shows...
2. The correlation heatmap indicates...

Further analysis could explore how individual features relate to one another and how each affects option pricing.
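
As one possible starting point for that follow-up, the sketch below summarizes the median option price within quantile bins of a single feature. It is an illustration under assumptions, not the notebook's method: `price_by_feature_bin` is a hypothetical helper, and `days_to_expiry` and its values are invented for the example.

```python
import pandas as pd


def price_by_feature_bin(
    df: pd.DataFrame,
    feature: str,
    target: str = "option_price",
    bins: int = 4,
) -> pd.Series:
    """Median of `target` within quantile bins of `feature`."""
    # duplicates="drop" avoids an error when quantile edges coincide
    binned = pd.qcut(df[feature], q=bins, duplicates="drop")
    return df.groupby(binned, observed=True)[target].median()


# Made-up sample; days_to_expiry is a hypothetical column name
sample = pd.DataFrame({
    "days_to_expiry": [5, 10, 20, 30, 45, 60, 90, 120],
    "option_price": [0.5, 0.8, 1.2, 1.6, 2.1, 2.5, 3.2, 4.0],
})
medians = price_by_feature_bin(sample, "days_to_expiry")
print(medians)
```

Quantile bins keep roughly the same number of rows in each group, which tends to give more stable medians than equal-width bins when a feature is skewed.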