# Volatility Surface Trading Strategy Using Conditional Variational Autoencoders
**Nicholas Kaim-Caudle**

November 2024

## Code
https://colab.research.google.com/drive/1TmoNLsHBdyC_lCo_rbSxO06tdxUphRlb?usp=sharing

## Abstract
This paper presents a novel approach to options trading on the S&P 500 that uses a Conditional Variational Autoencoder (CVAE) to reconstruct the volatility surface. The strategy generates trading signals from discrepancies between observed market volatility and model-reconstructed volatility, and delta hedges with futures contracts to maintain market neutrality. While the model reconstructs the volatility surface accurately, the trading strategy loses capital in both the training and testing periods, particularly during periods of elevated volatility. Our findings suggest that highly liquid markets may present fewer exploitable inefficiencies than initially hypothesized.

## 1. Introduction
Recent advances in deep learning architectures have enabled increasingly sophisticated modeling of financial markets, particularly in options trading, where complex, non-linear relationships dominate price dynamics. This study explores the application of Conditional Variational Autoencoders to decompose the equity volatility surface into latent factors. Their capacity for non-linear decomposition offers potential advantages over traditional Principal Component Analysis (PCA) approaches.

The fundamental premise of our strategy relies on reconstructing the volatility surface using hidden latent factors via a neural network architecture. Trading signals are generated when the reconstructed volatility surface diverges materially from the implied volatilities observed in the market. The implementation incorporates daily delta hedging using futures contracts to maintain market neutrality, with signals evaluated near market close during regular trading sessions.

Our methodology addresses the practical challenges of option trading by modeling the volatility surface in log-strike space, facilitating the selection of optimal listed options with known strikes and expiries when trading signals emerge. The delta-hedging component ensures the strategy maintains neutrality to underlying market movements, isolating the volatility component of returns.

## 2. Data and Methodology

### 2.1 Data Collection and Processing
We source futures and options data for ES (E-mini S&P 500) from DataBento, covering the period from June 2, 2017, to October 25, 2024. For futures, we collect minute-by-minute OHLCV bars throughout the trading day. Option quotes are sampled 12 minutes before market close (15:48 US/Eastern) using top-of-book updates over the preceding 10 minutes.

The conversion of raw market quotes to implied volatilities requires careful consideration of market microstructure effects. We use a Bayesian approach with Markov Chain Monte Carlo sampling to determine fair discount rates and futures levels: box prices inform the discount rate, and synthetic quotes inform the futures level. The implementation uses the NumPyro library with JAX-based automatic differentiation.

For the determination of fair discount rates, we construct box prices using the difference between synthetics at different strikes, incorporating four options in total. This approach eliminates sensitivity to both implied volatility and futures price levels. The methodology aggregates all valid box bids and asks for each expiration, employing a Bayesian framework with a constrained Normal distribution bounded by the maximum bid and minimum ask implied rates.
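The box construction can be sketched as follows. This is a minimal illustration, assuming put-call parity for futures options in the form C - P = D (F - K), where D is the discount factor. Under that assumption the difference of two synthetics at strikes K1 < K2 equals D (K2 - K1), so the futures level and implied volatility drop out. The function names, the sample prices, and the continuously compounded rate convention are hypothetical and not taken from the paper's code.

```python
import math


def box_discount_factor(call_lo, put_lo, call_hi, put_hi, k_lo, k_hi):
    """Discount factor implied by a box built from four option prices.

    Assumes put-call parity C - P = D * (F - K), so that
    (C_lo - P_lo) - (C_hi - P_hi) = D * (k_hi - k_lo).
    """
    if k_hi <= k_lo:
        raise ValueError("k_hi must exceed k_lo")
    box_price = (call_lo - put_lo) - (call_hi - put_hi)
    return box_price / (k_hi - k_lo)


def implied_rate(discount_factor, t_years):
    """Continuously compounded rate (an assumed convention)."""
    return -math.log(discount_factor) / t_years


# Hypothetical mid prices generated from D = 0.98 and F = 5000
d_true, fwd = 0.98, 5000.0
k_lo, k_hi = 4900.0, 5100.0
put_lo, put_hi = 60.0, 150.0
call_lo = put_lo + d_true * (fwd - k_lo)
call_hi = put_hi + d_true * (fwd - k_hi)

d = box_discount_factor(call_lo, put_lo, call_hi, put_hi, k_lo, k_hi)
r = implied_rate(d, t_years=0.5)
```

The recovered discount factor matches the one used to generate the prices regardless of the chosen futures level. This is the insensitivity described above.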

The fair futures level determination follows a similar Bayesian approach, utilizing the previously determined discount rates to analyze synthetic quote midpoints, bids, and asks. This sequential process enables robust implied volatility calculations for bid, ask, and mid prices across the surface.
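As a rough illustration of how synthetic quotes bound the futures level, the sketch below again assumes parity in the form C - P = D (F - K). It takes the tightest bid and ask across strikes and uses their midpoint as a crude point estimate. This is a deliberate simplification and not the Bayesian fit described above. The quotes and helper names are hypothetical.

```python
def synthetic_futures(call, put, strike, discount_factor):
    """Futures level implied by one call/put pair via C - P = D * (F - K)."""
    return strike + (call - put) / discount_factor


def synthetic_bid_ask(call_bid, call_ask, put_bid, put_ask, strike, discount_factor):
    """Bounds on the futures level from executable prices.

    Selling the synthetic means selling the call at its bid and buying the
    put at its ask; buying the synthetic means the reverse.
    """
    bid = synthetic_futures(call_bid, put_ask, strike, discount_factor)
    ask = synthetic_futures(call_ask, put_bid, strike, discount_factor)
    return bid, ask


# Hypothetical quotes: (strike, call_bid, call_ask, put_bid, put_ask)
quotes = [
    (4900.0, 157.5, 158.5, 59.5, 60.5),
    (5100.0, 51.5, 52.5, 149.5, 150.5),
]
discount_factor = 0.98  # assumed to be known from the discount-rate step

bounds = [
    synthetic_bid_ask(cb, ca, pb, pa, k, discount_factor)
    for k, cb, ca, pb, pa in quotes
]
best_bid = max(b for b, _ in bounds)
best_ask = min(a for _, a in bounds)
midpoint = 0.5 * (best_bid + best_ask)  # crude estimate, not the MCMC posterior
```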

Data quality controls require a minimum of three option expirations per trading day, with each expiration satisfying:
1. Minimum log-strike below -0.5
2. Maximum log-strike above 0.15
3. At least six distinct strikes with valid bid-ask quotes
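The filter above might be expressed as in the sketch below. It assumes that log-strike means ln(K/F) and that a day qualifies when at least three expirations each pass all three checks. Both are interpretations of the text, and the data layout and names are hypothetical.

```python
def expiry_is_valid(log_strikes):
    """log_strikes: values of ln(K/F) that carry valid bid-ask quotes for one expiry."""
    distinct = set(log_strikes)
    return (
        len(distinct) >= 6          # at least six distinct quoted strikes
        and min(distinct) < -0.5    # coverage deep on the downside
        and max(distinct) > 0.15    # coverage on the upside
    )


def day_is_valid(surface, min_expiries=3):
    """surface: dict mapping an expiry label to its list of quoted log-strikes."""
    valid = [label for label, ks in surface.items() if expiry_is_valid(ks)]
    return len(valid) >= min_expiries, valid


# Hypothetical trading day with four listed expiries
surface = {
    "2024-12": [-0.6, -0.3, -0.1, 0.0, 0.1, 0.2],
    "2025-03": [-0.7, -0.4, -0.2, 0.0, 0.1, 0.16, 0.3],
    "2025-06": [-0.55, -0.25, -0.1, 0.0, 0.05, 0.2],
    "2025-12": [-0.4, -0.2, 0.0, 0.1, 0.2, 0.3],  # fails the downside check
}
keep_day, valid_expiries = day_is_valid(surface)
```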

Market liquidity exhibits significant variation over the study period, with earlier data showing approximately four expirations within a one-year horizon, while later periods include expirations extending to three years.

### 2.2 CVAE Architecture
Our model employs an encoder-decoder framework incorporating conditional information for both processes. The encoder transforms input data and conditional variables through fully connected layers with SiLU activation functions, producing mean and log-variance vectors that parameterize the latent space distribution.

The latent representation is generated using the reparameterization trick:
z = μ + σ * ε

where σ = exp(0.5 * logvar) and ε follows a standard normal distribution. This ensures differentiability while maintaining stochastic properties.
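A dependency-free sketch of the sampling step follows. The paper's implementation uses JAX; plain Python is used here only to keep the example self-contained, and the parameter values are hypothetical.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with sigma = exp(0.5 * logvar) and eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, logvar)
    ]


rng = random.Random(0)
mu = [0.5, -1.0]
logvar = [math.log(0.04), math.log(0.25)]  # sigma = 0.2 and 0.5

samples = [reparameterize(mu, logvar, rng) for _ in range(20000)]
n = len(samples)
mean0 = sum(s[0] for s in samples) / n
std0 = math.sqrt(sum((s[0] - mean0) ** 2 for s in samples) / (n - 1))
```

Because the randomness enters only through ε, the sample is a deterministic function of μ and logvar for a fixed draw of ε. This is what lets gradients flow to the encoder outputs.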

The decoder network reconstructs the input by passing the latent vector, concatenated with the conditional variables, through layers that mirror the encoder. A softplus activation function in the output layer keeps the reconstructed volatilities strictly positive.

The conditional information vector y, representing time-to-expiration for each option series, enables the network to learn temporal relationships in volatility dynamics rather than requiring external temporal modeling. This approach allows for more flexible capture of term structure effects.

The training objective optimizes three components:
1. Reconstruction loss: Measured as mean squared error between input and reconstructed volatility surfaces
2. KL divergence loss: Ensures latent space distribution approximates a standard normal distribution
3. Correlation loss: Minimizes correlation between latent dimensions to promote independent factor representation
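The three terms could be combined as in the sketch below. The KL term is the standard closed form for a diagonal Gaussian against a standard normal. The correlation penalty (mean squared off-diagonal Pearson correlation across a batch) and the `corr_weight` parameter are assumptions, since the text does not give the exact form.

```python
import math


def mse(x, x_hat):
    """Mean squared error between an input surface and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one sample."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def correlation_penalty(latents):
    """Mean squared off-diagonal Pearson correlation across a batch.

    latents: list of latent vectors; assumes every dimension varies in the batch.
    """
    n, d = len(latents), len(latents[0])
    cols = [[row[j] for row in latents] for j in range(d)]
    centered = [[v - sum(c) / n for v in c] for c in cols]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    total, pairs = 0.0, 0
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            total += (dot / (norms[i] * norms[j])) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


def total_loss(x, x_hat, mu, logvar, latents, kl_weight, corr_weight):
    """Weighted sum of the three components; the weights are hyperparameters."""
    return (
        mse(x, x_hat)
        + kl_weight * kl_to_standard_normal(mu, logvar)
        + corr_weight * correlation_penalty(latents)
    )
```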

Notable architectural features include:
- SiLU activation functions, whose gradients are non-zero almost everywhere
- Progressive dimensional reduction through bottleneck layers
- Time series cross-validation with weighted averaging following Donate et al.
- JAX implementation for automatic differentiation and hardware acceleration

### 2.3 Model Training
The initial training phase uses 900 trading days with 250 days held out for model selection. We implement time series cross-validation with five folds and weighted average scoring. The model is retrained every 100 trading days during the testing phase to simulate live trading conditions.
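One way to lay out the folds and the retraining schedule is sketched below. The expanding-window split and the linearly increasing fold weights are assumptions made for illustration. They are not necessarily the exact scheme of Donate et al. or of the paper's code.

```python
def expanding_folds(n_days, n_folds):
    """Fold i trains on days [0, i * block) and validates on the next block."""
    block = n_days // (n_folds + 1)
    folds = []
    for i in range(1, n_folds + 1):
        stop = n_days if i == n_folds else (i + 1) * block
        folds.append((range(0, i * block), range(i * block, stop)))
    return folds


def weighted_score(scores, weights=None):
    """Weighted average of per-fold scores.

    By default later folds weigh more (weights 1, 2, ..., n), an assumed choice.
    """
    if weights is None:
        weights = range(1, len(scores) + 1)
    weights = list(weights)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def retrain_days(first_test_day, last_day, every=100):
    """Day indices at which the model is refit during the testing phase."""
    return list(range(first_test_day, last_day + 1, every))


folds = expanding_folds(n_days=900, n_folds=5)
schedule = retrain_days(first_test_day=900, last_day=1150)
```

Each validation block lies strictly after its training block, so no fold is scored on data that precedes its training window.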

The training process incorporates comprehensive hyperparameter optimization across:
- Latent dimensions (1-6)
- Output channels (1-6)
- Hidden dimensions [4, 8, 16, 24, 48]
- Learning rates (log-space -6 to 0)
- KL loss weights (log-space -8 to 0)
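The search space above could be sampled as in the sketch below. It assumes random search and base-10 exponents for the two log-space ranges. The text specifies neither the search method nor the base, so both are illustrative choices, as are the dictionary keys.

```python
import random


def sample_config(rng):
    """Draw one hyperparameter configuration from the stated ranges."""
    return {
        "latent_dim": rng.randint(1, 6),            # inclusive bounds
        "output_channels": rng.randint(1, 6),
        "hidden_dim": rng.choice([4, 8, 16, 24, 48]),
        "learning_rate": 10.0 ** rng.uniform(-6.0, 0.0),  # base 10 is an assumption
        "kl_weight": 10.0 ** rng.uniform(-8.0, 0.0),      # base 10 is an assumption
    }


rng = random.Random(42)
configs = [sample_config(rng) for _ in range(50)]
```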

Model evaluation metrics include:
- Reconstruction loss
- KL divergence
- Correlation loss
- Log probability density
- Latent space statistics (mean and standard deviation per dimension)

### 2.4 Signal Generation and Trading Implementation
Trading signals derive from normalized z-scores calculated using an expanding window to avoid look-ahead bias. The strategy specifically targets volatility surface points that diverge from their model-implied values, rather than responding to absolute volatility levels. This approach aims to identify relative mispricings within the surface structure.
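A minimal sketch of an expanding-window z-score follows. It assumes the input is the residual between market and model volatility at one surface node, and that each observation is scored only against strictly earlier data. The `min_obs` warm-up length is a hypothetical parameter.

```python
import math


def expanding_zscores(residuals, min_obs=20):
    """z-score of each residual against the mean and std of earlier observations."""
    out = []
    n, mean, m2 = 0, 0.0, 0.0  # Welford running statistics
    for x in residuals:
        if n >= min_obs and m2 > 0.0:
            std = math.sqrt(m2 / (n - 1))  # sample standard deviation so far
            out.append((x - mean) / std)
        else:
            out.append(None)  # not enough history yet
        # Update only after scoring, so today's value never enters its own statistics.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return out


z = expanding_zscores([1.0, 2.0, 3.0, 4.0, 10.0], min_obs=4)
```

Because the statistics are updated after each score is computed, changing a later observation cannot alter any earlier z-score.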

Signals trigger when the absolute z-score exceeds 3 but remains below 5, with positions held for approximately twice the empirically observed mean reversion half-life. The upper bound helps limit exposure to regime shifts or structural breaks in market behaviour.
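The entry rule might look like the sketch below. The trade direction is an assumption: the z-score is taken to be computed on market minus model volatility, so a positive value means the market looks rich and the option is sold. The treatment of the exact boundary values and the rounding of the holding period are also illustrative.

```python
def signal(z, lower=3.0, upper=5.0):
    """Return -1 (sell volatility), +1 (buy volatility), or 0 (no trade)."""
    if z is None or not (lower < abs(z) < upper):
        return 0  # inside the band, beyond the upper cap, or no history yet
    return -1 if z > 0 else 1


def holding_days(half_life_days):
    """Hold for roughly twice the mean reversion half-life, at least one day."""
    return max(1, round(2.0 * half_life_days))


decisions = [signal(z) for z in (None, 1.2, 3.5, -4.0, 5.5)]
hold = holding_days(2.1)
```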

The strategy begins with $100 in capital, trading one unit of ES options per signal. Delta hedging maintains market neutrality within 0.1% tolerance using futures contracts. Transaction costs include bid-ask spreads and a 0.01% execution fee for both options and futures trades. While the instruments trade on an exchange, we assume a cash account without margining to simplify the implementation.
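The hedging and fee logic can be sketched as follows. Several points are assumptions: one futures contract is treated as having a delta of 1, the 0.1% tolerance is read as an absolute band of 0.001 delta per option unit, and the fee is applied to traded notional. Bid-ask spread costs are not modelled here.

```python
def hedge_trade(option_delta, futures_position, tolerance=0.001):
    """Futures quantity to trade so that net delta returns to zero.

    Returns 0.0 when the net delta is already inside the tolerance band.
    """
    net_delta = option_delta + futures_position
    if abs(net_delta) <= tolerance:
        return 0.0
    return -net_delta


def execution_fee(quantity, price, fee_rate=0.0001):
    """0.01% fee on traded notional."""
    return abs(quantity) * price * fee_rate


# Hypothetical end-of-day state: option delta drifted from 0.45 to 0.52
trade = hedge_trade(option_delta=0.52, futures_position=-0.45)
fee = execution_fee(trade, price=5000.0)
```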

## 3. Results and Discussion

### 3.1 Model Performance
The CVAE demonstrates strong capabilities in volatility surface reconstruction. Starting from a baseline RMSE of 12% (comparing surfaces against global mean volatility), the model achieves:
- 8.2% RMSE without daily latent factors
- 2.5% RMSE with two significant factors
- 0.5% RMSE with four factors
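For reference, the baseline can be read as the RMSE obtained by predicting every surface point with the global mean volatility. The sketch below illustrates that reading on hypothetical numbers. It does not reproduce the reported figures.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def baseline_rmse(vols):
    """RMSE of predicting every point with the global mean volatility."""
    mean = sum(vols) / len(vols)
    return rmse(vols, [mean] * len(vols))


# Hypothetical flattened surface points (implied volatilities)
vols = [0.10, 0.20, 0.30, 0.40]
baseline = baseline_rmse(vols)
```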

Optimal performance coincides with four latent factors, though two factors account for most of the improvement. Unlike linear PCA decomposition, the CVAE factors are not naturally ordered by explanatory power. In our implementation, the second and fourth factors show particular significance, potentially capturing overall volatility level and skew/term structure variations respectively.

The non-linear nature of the CVAE enables capture of complex relationships, such as the inverse relationship between overall volatility levels and smile steepness, within individual factors rather than requiring separate components.

### 3.2 Mean Reversion Analysis
Augmented Dickey-Fuller unit root tests were performed on the z-score time series at each node of the strike-expiry matrix. All 33 nodes reject the unit-root null at the 10% significance level, 32 at the 5% level, and 30 at the 1% level, indicating mean-reverting behaviour. Half-lives range from 1 to 4 days (mean: 2.1 days) across strike and expiry nodes. This observation motivates our holding period selection of 3-5 days, approximately twice the average half-life.
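A common way to estimate a mean reversion half-life is to fit an AR(1) model and convert the autoregressive coefficient φ into ln(0.5)/ln(φ). The sketch below assumes that estimator. The text does not state how the half-lives were computed, so this is an illustration only.

```python
import math


def ar1_half_life(series):
    """Half-life from an OLS fit of x[t] = a + phi * x[t-1] + noise.

    Returns None when phi is outside (0, 1), where this formula does not apply.
    """
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(x_prev, x_next))
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    phi = cov / var
    if not 0.0 < phi < 1.0:
        return None
    return math.log(0.5) / math.log(phi)


# Hypothetical series that halves every step, so the half-life is one step
half_life = ar1_half_life([8.0, 4.0, 2.0, 1.0, 0.5, 0.25])
```

With a mean half-life of 2.1 days, doubling gives about 4.2 days, which sits inside the 3-5 day holding window.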

### 3.3 Trading Performance
The strategy's performance falls short of expectations in both training and testing periods:

Training Period (2017-2020):
- Gradual capital erosion through 2019
- Win rate: 42-45%
- Terminal value: \$88.5 by end-2019
- Transaction costs: \$2.5
- Maximum drawdown: 30%
- Complete capital loss during COVID-19 volatility

Testing Period (2022-2024):
- Final capital: \$0
- Maximum drawdown: 101%
- Transaction costs: \$8.0
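For clarity on the drawdown figures, the sketch below uses the conventional definition: the largest peak-to-trough decline as a fraction of the running peak. Whether the paper uses exactly this definition is an assumption, and the equity curves are hypothetical.

```python
def max_drawdown(equity):
    """Largest decline from a running peak, as a fraction of that peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


dd_moderate = max_drawdown([100.0, 110.0, 88.0, 95.0])
dd_negative = max_drawdown([100.0, 50.0, -1.0])
```

Under this definition, a drawdown above 100% can only occur if the account value falls below zero at some point.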

## 4. Future Research Directions

Several avenues for improvement warrant investigation:

### 4.1 Implementation Enhancements
- Integration of margin requirements for listed futures and options
- Incorporation of intraday margin calls
- Development of more sophisticated transaction cost models
- Use of settlement prices to close out options when intraday market data is unavailable

### 4.2 Model Architecture
- Implementation of convolutional layers
- Expanded network depth and width
- Integration of batch normalization
- Addition of dropout and Gaussian noise layers
- Exploration of alternative activation functions

### 4.3 Volatility Surface Modeling
- Implementation of normalized log-strikes: ln(K/F)/(vol(F) * √t)
- Integration of scheduled economic event impacts, such as NFP and CPI releases, Federal Reserve announcements, and elections
- Transition to business time framework
- Incorporation of bid-ask volatility spreads
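The normalized log-strike proposed in the first item of this list can be computed as sketched below. Here vol(F) is interpreted as the at-the-money implied volatility for the expiry, which is an assumption, and the inputs are hypothetical.

```python
import math


def normalized_log_strike(strike, forward, atm_vol, t_years):
    """ln(K/F) scaled by the at-the-money volatility and the square root of time."""
    return math.log(strike / forward) / (atm_vol * math.sqrt(t_years))


forward = 5000.0
at_the_money = normalized_log_strike(forward, forward, atm_vol=0.2, t_years=1.0)
one_sd_down = normalized_log_strike(
    forward * math.exp(-0.2), forward, atm_vol=0.2, t_years=1.0
)
```

Under this scaling a value of -1 corresponds to a strike roughly one standard deviation below the forward, which makes strikes comparable across expiries and volatility regimes.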

### 4.4 Market Coverage
- Extension to multiple equity indices
- Incorporation of individual equity options
- Exploration of less liquid markets where inefficiencies may be more prevalent
- Investigation of alternative trading strategies using the same CVAE framework

### 4.5 Risk Management
- Development of dynamic position sizing
- Implementation of volatility regime detection
- Integration of economic calendar events

## 5. Conclusion
While our CVAE implementation reconstructs the volatility surface accurately, reducing RMSE from a 12% baseline to 0.5% with four latent factors, the trading strategy's performance suggests limitations in capturing profitable market inefficiencies. The model's inability to generate consistent profits, particularly during stress periods, indicates that the highly liquid ES options market may be more efficiently priced than initially hypothesized.

The complete loss of capital during the COVID-19 volatility spike highlights the importance of robust risk management systems and suggests that simple z-score thresholds may be insufficient for regime detection. Future research may benefit from exploring less efficient markets or alternative signal generation approaches that incorporate broader market context.
