# Introduction

My trading strategy trades S&P 500 options via the listed ES options and the related futures contracts. The basic premise is to reconstruct the volatility surface from hidden latent factors and then trade arbitrage opportunities where the observed market volatility is far from the reconstructed one. Rather than using PCA for the decomposition, I use a deep neural network architecture called a Variational AutoEncoder, more specifically a Conditional Variational AutoEncoder (CVAE). Whereas PCA is a linear decomposition, a CVAE allows a non-linear decomposition of the equity volatility surface.

A trading signal is generated if the reconstructed volatility surface is materially different from the observed one. 

The factors are modelled in the log-strike space of the volatility surface, so once a trading signal is generated it is necessary to choose one or more actual listed options with known strike and expiry. As this is a volatility-based strategy, the position is delta hedged to zero using futures at the end of each trading day. The signal is tested once per day, near the close of the regular trading session.
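As an illustration of the end-of-day hedge, here is a minimal sketch that sizes the futures trade from Black (1976) deltas. It assumes European-style exercise, premiums paid upfront (so a call's delta with respect to the futures price is DF · N(d1)), one futures contract per option, and rounding to whole contracts. The function names and the position tuple layout are hypothetical, not taken from the strategy's code.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_delta(F: float, K: float, T: float, sigma: float, df: float, is_call: bool) -> float:
    """Delta of a Black (1976) option with respect to the futures price F."""
    d1 = (math.log(F / K) + 0.5 * sigma * sigma * T) / (sigma * math.sqrt(T))
    return df * norm_cdf(d1) if is_call else df * (norm_cdf(d1) - 1.0)


def futures_hedge(positions, F: float, df: float) -> int:
    """Futures contracts to trade so the book's net delta is as close to zero as possible.

    positions: iterable of (quantity, strike, years_to_expiry, vol, is_call),
    assuming each option delivers exactly one futures contract.
    """
    net_delta = sum(q * black76_delta(F, K, T, vol, df, is_call)
                    for q, K, T, vol, is_call in positions)
    return -round(net_delta)  # sell futures against positive option delta


# Long 10 at-the-money calls and 10 at-the-money puts (a straddle) on one expiry.
book = [(10, 5000.0, 1.0, 0.2, True), (10, 5000.0, 1.0, 0.2, False)]
print(futures_hedge(book, F=5000.0, df=1.0))  # prints -1
```

A straddle is close to delta-neutral at inception, so only a small futures position is needed; as the futures level moves, the daily re-hedge grows accordingly.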

# Methodology

## Data
The source of my data for ES futures and options is [DataBento](https://databento.com). For the futures I obtain one-minute OHLCV bars over the trading day; for the options I obtain the best bid and ask prices as of 15:48 US/Eastern, 12 minutes before the close of the S&P 500 regular session. DataBento does not explicitly provide a snapshot function, so it is necessary to obtain all the top-of-book updates; I do this for each listed option over the 10 minutes prior to 15:48 in each trading session.
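Because there is no snapshot function, the 15:48 quote for each option has to be rebuilt from the stream of top-of-book updates. The sketch below assumes the updates have already been downloaded and converted into simple records with US/Eastern timestamps. The `TopOfBook` record and `snapshot` function are hypothetical and do not reflect DataBento's schema or client API. An option with no update inside the 10-minute window gets no entry, which mirrors restricting the download to that window.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, NamedTuple, Optional


class TopOfBook(NamedTuple):
    ts: datetime          # event timestamp, assumed already converted to US/Eastern
    instrument_id: int    # one id per listed option
    bid: Optional[float]  # None if that side of the book is empty
    ask: Optional[float]


def snapshot(updates: Iterable[TopOfBook], cutoff: datetime,
             lookback: timedelta = timedelta(minutes=10)) -> Dict[int, TopOfBook]:
    """Last top-of-book state per instrument within [cutoff - lookback, cutoff]."""
    start = cutoff - lookback
    latest: Dict[int, TopOfBook] = {}
    for u in sorted(updates, key=lambda u: u.ts):
        if start <= u.ts <= cutoff:
            latest[u.instrument_id] = u  # later updates overwrite earlier ones
    return latest


cutoff = datetime(2024, 10, 25, 15, 48)
updates = [
    TopOfBook(datetime(2024, 10, 25, 15, 40), 1, 10.00, 10.50),
    TopOfBook(datetime(2024, 10, 25, 15, 47), 1, 10.25, 10.75),
    TopOfBook(datetime(2024, 10, 25, 15, 49), 1, 11.00, 11.50),  # after the cutoff
    TopOfBook(datetime(2024, 10, 25, 15, 30), 2, 5.00, 5.50),    # before the window
]
snap = snapshot(updates, cutoff)
```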

The CVAE model expects data in implied volatility format, so it is first necessary to work out the implied fair futures level and discount factor on each trading day for each listed expiration. Each top-of-book quote can then be converted into an implied vol using the Black 1976 formula and a root-finding algorithm. Working from raw quotes introduces some difficulties that need to be handled carefully, as in the estimation of the discount rate and the futures level below.

For the fair discount rate, box spreads were used (a synthetic long future at a lower strike minus a synthetic long future at a higher strike, so 4 options in total), since their value is insensitive to both implied vol and the futures level. For each expiration, the bids and asks of all possible boxes were collected, and the rate was taken as the average over the region where the mid implied rate lies above the highest bid rate and below the lowest ask rate. This estimate was made with a Bayesian approach using Markov chain Monte Carlo (MCMC).
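The bounding step can be sketched as follows. This is not the Bayesian MCMC estimator described above; it is a simpler, deterministic stand-in that intersects the bounds implied by every box's bid and ask and takes the midpoint. It assumes European exercise, so each box pays exactly K2 - K1 at expiry, and it works in discount-factor space, where a box's bid and ask prices translate directly into lower and upper bounds on DF. The quote layout and function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Tuple

# strike -> (call_bid, call_ask, put_bid, put_ask); hypothetical layout
Quotes = Dict[float, Tuple[float, float, float, float]]


def box_df_bounds(quotes: Quotes) -> Optional[Tuple[float, float]]:
    """Intersect the discount-factor bounds implied by every box K1 < K2.

    A box (long call K1, short put K1, short call K2, long put K2) pays
    K2 - K1 at expiry, so its fair price is DF * (K2 - K1).
    """
    lower, upper = 0.0, math.inf
    for k1, k2 in combinations(sorted(quotes), 2):
        c1b, c1a, p1b, p1a = quotes[k1]
        c2b, c2a, p2b, p2a = quotes[k2]
        width = k2 - k1
        box_bid = c1b - p1a - c2a + p2b  # proceeds from selling the box
        box_ask = c1a - p1b - c2b + p2a  # cost of buying the box
        lower = max(lower, box_bid / width)
        upper = min(upper, box_ask / width)
    if math.isinf(upper) or upper <= 0.0 or lower > upper:
        return None  # fewer than two strikes, or inconsistent quotes
    return lower, upper


def box_implied_rate(quotes: Quotes, T: float) -> Optional[float]:
    """Continuously compounded rate from the midpoint of the feasible DF interval."""
    bounds = box_df_bounds(quotes)
    if bounds is None:
        return None
    df = 0.5 * (bounds[0] + bounds[1])
    return -math.log(df) / T


# Synthetic example: true DF = 0.98, futures at 5000, each leg quoted +/- 0.25.
DF, F = 0.98, 5000.0


def make_quote(K: float) -> Tuple[float, float, float, float]:
    call = 60.0 + DF * max(F - K, 0.0)
    put = call - DF * (F - K)  # put-call parity
    return (call - 0.25, call + 0.25, put - 0.25, put + 0.25)


quotes = {K: make_quote(K) for K in (4900.0, 5000.0, 5100.0)}
rate = box_implied_rate(quotes, T=1.0)
```

Wider boxes give tighter bounds per unit of width, which is why the 200-point box sets both ends of the interval in this example.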

Similarly, once the discount rate was found, a Bayesian approach was applied to the mid, bid and ask quotes of all the synthetics to estimate the fair futures level. Once the discount rate and fair futures level were known, the implied vols for the bid, ask and mid could be backed out directly.
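A minimal sketch of these last two steps, again with a simple interval intersection standing in for the Bayesian estimate. Put-call parity, C - P = DF · (F - K), bounds F from each strike's synthetic bid and ask; Black (1976) is then inverted by bisection, which is one possible root finder (the text does not say which was used). The quote layout and function names are hypothetical, and European exercise is assumed.

```python
import math
from typing import Dict, Optional, Tuple

# strike -> (call_bid, call_ask, put_bid, put_ask); hypothetical layout
Quotes = Dict[float, Tuple[float, float, float, float]]


def implied_forward(quotes: Quotes, df: float) -> Optional[float]:
    """Midpoint of the futures levels consistent with every synthetic's bid and ask."""
    if not quotes:
        return None
    lower, upper = -math.inf, math.inf
    for K, (cb, ca, pb, pa) in quotes.items():
        lower = max(lower, K + (cb - pa) / df)  # synthetic bid: sell call, buy put
        upper = min(upper, K + (ca - pb) / df)  # synthetic ask: buy call, sell put
    return 0.5 * (lower + upper) if lower <= upper else None


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_price(F: float, K: float, T: float, sigma: float, df: float, is_call: bool) -> float:
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    if is_call:
        return df * (F * norm_cdf(d1) - K * norm_cdf(d2))
    return df * (K * norm_cdf(-d2) - F * norm_cdf(-d1))


def implied_vol(price: float, F: float, K: float, T: float, df: float, is_call: bool,
                lo: float = 1e-4, hi: float = 5.0, tol: float = 1e-10) -> Optional[float]:
    """Bisection: the Black price is increasing in sigma, so a bracketed root is unique."""
    if not (black76_price(F, K, T, lo, df, is_call) <= price
            <= black76_price(F, K, T, hi, df, is_call)):
        return None  # price outside the attainable range, e.g. below intrinsic
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if black76_price(F, K, T, mid, df, is_call) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The same `implied_vol` call is applied separately to the bid, ask and mid prices to obtain the three vols per strike.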

The time period runs from 02-June-2017 to 25-Oct-2024; however, there are gaps where there are not enough quotes to build a representative volatility surface. A trading day is only counted as valid if there are at least 3 option expirations where:
1. The minimum log_strike is less than -0.5
2. The maximum log_strike is greater than 0.15
3. There are at least 6 separate strikes

For a strike to be valid it must have both a bid and an ask. The ES option market changed over the period studied: near the start there were only ever 4 expirations, generally with less than 1 year to expiration, whereas near the end some expirations were 3 years from the pricing date.
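The validity rules above can be sketched as a pair of filters. This assumes log_strike is ln(K / F) against the fair futures level for that expiration, and that each expiration's quotes arrive as (log_strike, bid, ask) tuples with `None` for a missing side; the names and data layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (log_strike, bid, ask); bid/ask are None when that side of the book is missing
StrikeQuote = Tuple[float, Optional[float], Optional[float]]


def expiry_is_valid(quotes: List[StrikeQuote], min_k: float = -0.5,
                    max_k: float = 0.15, min_strikes: int = 6) -> bool:
    """Apply the three per-expiration rules to strikes that have both a bid and an ask."""
    ks = {k for k, bid, ask in quotes if bid is not None and ask is not None}
    # The length check runs first, so min()/max() are never called on an empty set.
    return len(ks) >= min_strikes and min(ks) < min_k and max(ks) > max_k


def day_is_valid(surface: Dict[str, List[StrikeQuote]], min_expiries: int = 3) -> bool:
    """A trading day is usable if enough expirations pass the per-expiration rules."""
    return sum(expiry_is_valid(q) for q in surface.values()) >= min_expiries


good = [(k, 0.20, 0.21) for k in (-0.6, -0.4, -0.2, 0.0, 0.1, 0.2)]
bad = good[:-1] + [(0.2, 0.20, None)]  # the only strike above 0.15 lacks an ask
```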

## Volatility Latent Model
Here we use a Conditional Variational Autoencoder (CVAE) architecture designed for time series analysis. The model consists of an encoder-decoder framework that incorporates conditional information to guide both the encoding and decoding processes.

The encoder network transforms input data X and conditional variables y through a series of fully connected layers with SiLU (Sigmoid Linear Unit) activation functions. The architecture employs a dimensional reduction strategy, mapping the input through hidden layers to produce two outputs: a mean vector (μ) and a log-variance vector (logvar) that parameterize the latent space distribution.

The latent representation is obtained through the reparameterization trick, z = μ + σ * ε, where σ = exp(0.5 * logvar) and ε is sampled from a standard normal distribution. Because all the randomness is isolated in ε, z is a differentiable function of μ and logvar, so gradients can flow through the sampling step during training while the encoding remains stochastic.
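A framework-free sketch of the reparameterization trick in plain Python, for illustration only; in the JAX implementation ε would typically be drawn with `jax.random.normal` from a PRNG key. The helper `grad_z_wrt_logvar` (a hypothetical name) spells out why the trick is differentiable: ∂z/∂μ = 1 and ∂z/∂logvar = 0.5 · σ · ε, both ordinary functions of the parameters once ε is fixed.

```python
import math
import random
from typing import List, Tuple


def reparameterize(mu: List[float], logvar: List[float],
                   rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample z = mu + sigma * eps with sigma = exp(0.5 * logvar); returns (z, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]
    return z, eps


def grad_z_wrt_logvar(logvar: List[float], eps: List[float]) -> List[float]:
    """Analytic dz/dlogvar = 0.5 * sigma * eps for each latent dimension."""
    return [0.5 * math.exp(0.5 * lv) * e for lv, e in zip(logvar, eps)]


rng = random.Random(0)
z, eps = reparameterize([0.1, -0.3], [-1.0, -2.0], rng)
```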

The decoder network takes the latent vector z concatenated with the conditional variables y and reconstructs the input through a mirrored stack of fully connected layers. The final output layer uses a softplus activation, which guarantees that every reconstructed implied volatility is strictly positive, a necessary (though not sufficient) condition for a valid volatility surface.
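A short sketch of the softplus output layer, assuming the decoder's last linear layer produces one unconstrained value per surface point; `decoder_output` is a hypothetical name. The stable form avoids overflowing `exp` for large inputs. Strict positivity holds mathematically; in double precision, pre-activations below roughly -745 underflow to 0.0, which is far outside any realistic range.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """log(1 + exp(x)), rearranged as max(x, 0) + log1p(exp(-|x|)) to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def decoder_output(pre_activations: List[float]) -> List[float]:
    """Map unconstrained decoder outputs to positive implied vols."""
    return [softplus(a) for a in pre_activations]


vols = decoder_output([-30.0, -1.0, 0.0, 2.5])
```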

The conditional information passed to the encoder and decoder in the variable y is the time in years from the pricing date to each expiration. Listed equity index options and futures have fixed expiration dates, e.g. 20-Dec-2024, so the time remaining to each expiry shrinks with every pricing date, and this affects the volatility surface. Supplying y to the CVAE lets the network learn the relationship between time to expiry and volatility rather than having it modelled externally.
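A minimal sketch of building y, assuming one year fraction per expiration measured in calendar days on an ACT/365 basis (the document does not state the day count, and business time is listed under further research). The function name is hypothetical.

```python
from datetime import date
from typing import List


def years_to_expiry(pricing_date: date, expiries: List[date],
                    days_per_year: float = 365.0) -> List[float]:
    """ACT/365 year fractions, one per expiration: the conditioning vector y."""
    return [(e - pricing_date).days / days_per_year for e in expiries]


expiries = [date(2024, 12, 20), date(2025, 12, 19)]
y_today = years_to_expiry(date(2024, 10, 25), expiries)
y_next = years_to_expiry(date(2024, 10, 28), expiries)  # next trading day
```

Over the weekend every entry of y drops by 3/365, even though the surface is observed only on trading days, which is one motivation for the business-time alternative.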

The training objective combines three components:
1. Reconstruction loss: Mean squared error between input and reconstructed output
2. KL divergence loss: Ensures the latent space distribution approximates a standard normal distribution
3. Correlation loss: Minimizes the correlation between latent dimensions, promoting independence in the latent representation
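The three terms can be sketched without a framework as below. The KL term is the standard closed form for a diagonal Gaussian against N(0, I). The correlation term is one plausible reading: the mean squared pairwise Pearson correlation of μ across the batch, computed on μ rather than sampled z by assumption. The weight names `kl_loss_alpha` and `correl_loss_alpha` follow the appendix; everything else is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]  # rows are samples in the batch


def mse(x: Matrix, x_hat: Matrix) -> float:
    """Mean squared error over every element of the batch."""
    sq = [(a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat)]
    return sum(sq) / len(sq)


def kl_divergence(mu: Matrix, logvar: Matrix) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)) summed over latent dims, averaged over the batch."""
    total = sum(-0.5 * (1.0 + lv - m * m - math.exp(lv))
                for mu_row, lv_row in zip(mu, logvar) for m, lv in zip(mu_row, lv_row))
    return total / len(mu)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b + 1e-12)  # epsilon guards constant columns


def correlation_loss(latent: Matrix) -> float:
    """Mean squared pairwise correlation between latent dimensions across the batch."""
    cols = list(zip(*latent))
    pairs = list(combinations(range(len(cols)), 2))
    if not pairs:
        return 0.0
    return sum(pearson(cols[i], cols[j]) ** 2 for i, j in pairs) / len(pairs)


def cvae_loss(x: Matrix, x_hat: Matrix, mu: Matrix, logvar: Matrix,
              kl_loss_alpha: float = 1.0, correl_loss_alpha: float = 1.0) -> float:
    return (mse(x, x_hat)
            + kl_loss_alpha * kl_divergence(mu, logvar)
            + correl_loss_alpha * correlation_loss(mu))
```

Squaring the correlations penalises positive and negative dependence equally, so the optimum is uncorrelated latent factors rather than anti-correlated ones.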

The model implements time series cross-validation using weighted averaging as described by Donate et al., with weights following a geometric progression that favours more recent performance. The implementation leverages JAX for automatic differentiation and hardware acceleration, with Equinox providing the neural network modules.
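A sketch of chronological cross-validation with geometrically weighted fold scores. The expanding-window split, the ratio of 2 and the default of 5 folds are illustrative assumptions, and the weighting is a generic geometric progression rather than a reproduction of Donate et al.'s exact formula. Function names are hypothetical.

```python
from typing import List, Tuple


def expanding_window_folds(n_samples: int, n_folds: int = 5) -> List[Tuple[range, range]]:
    """Chronological folds: fold i trains on blocks 0..i and validates on block i+1."""
    block = n_samples // (n_folds + 1)
    folds = []
    for i in range(n_folds):
        train_end = block * (i + 1)
        # The last validation block absorbs any remainder samples.
        val_end = n_samples if i == n_folds - 1 else block * (i + 2)
        folds.append((range(0, train_end), range(train_end, val_end)))
    return folds


def geometric_weights(n_folds: int, ratio: float = 2.0) -> List[float]:
    """Weights proportional to ratio**i, normalised to sum to one; later folds weigh more."""
    raw = [ratio ** i for i in range(n_folds)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_cv_score(fold_scores: List[float], ratio: float = 2.0) -> float:
    weights = geometric_weights(len(fold_scores), ratio)
    return sum(w * s for w, s in zip(weights, fold_scores))


folds = expanding_window_folds(63, 5)
```

Validation data always lies after its training data, so no information from the future leaks into a fold.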

Notable features include input normalization, configurable network dimensions, and comprehensive metric tracking, including latent space statistics. The architecture is particularly suited for conditional generation tasks where external variables influence the underlying data distribution.

Other items to note:
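1. The SiLU activation function is used so that negative pre-activations still pass a gradient; ReLU's gradient is exactly zero for all negative inputs, which can leave units permanently inactive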
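A quick numerical comparison of the two gradients, as a plain-Python sketch. SiLU's derivative, σ(x)(1 + x(1 - σ(x))), is non-zero everywhere except at SiLU's minimum near x ≈ -1.28, whereas ReLU's is zero on the whole negative half-line.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    return x * sigmoid(x)


def silu_grad(x: float) -> float:
    """d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0


for x in (-3.0, -0.5, 0.5):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.3f}  silu'={silu_grad(x):.3f}")
```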

# Discussion

# Results

# Bibliography

# Appendix

**OLD**

This implementation presents a Conditional Variational Autoencoder (CVAE) architecture designed to work with two-dimensional input data conditioned on auxiliary information. The model employs a hybrid convolutional-dense architecture with the following key components:

Encoder Architecture: The encoder network consists of a sequential structure that processes both the input data and conditional information:

 1. A 2D convolutional layer (Conv2d) with kernel size (2,3) that processes single-channel input data, mapping it to out_channels feature maps
 2. A flattening operation followed by concatenation with the conditional vector y
 3. A fully connected layer with SiLU (Sigmoid Linear Unit) activation that maps to a hidden dimension
 4. Two parallel linear layers that output the mean (μ) and log-variance (log σ²) of the latent space distribution

Decoder Architecture: The decoder network mirrors the encoder's structure in reverse:

 1. A fully connected layer that processes the concatenated latent vector and conditional information
 2. An intermediate dense layer with SiLU activation
 3. A reshape operation to prepare for transposed convolution
 4. A transposed convolution layer (ConvTranspose2d) with kernel size (2,3) that reconstructs the original input dimensions
 5. A final Softplus activation ensuring non-negative output values
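The "careful sizing" between the convolutional and dense layers comes down to shape arithmetic. The sketch below assumes stride 1, no padding, dilation 1 and no output padding, and treats height as expirations and width as log-strike points; the 8 × 20 grid and 4 channels are hypothetical. Under those assumptions the (2,3) transposed convolution exactly undoes the shape change of the (2,3) convolution, and the decoder's reshape target is (out_channels, H - 1, W - 2).

```python
from typing import Tuple

Kernel = Tuple[int, int]


def conv2d_out(h: int, w: int, kernel: Kernel = (2, 3),
               stride: int = 1, padding: int = 0) -> Tuple[int, int]:
    kh, kw = kernel
    return ((h + 2 * padding - kh) // stride + 1, (w + 2 * padding - kw) // stride + 1)


def conv_transpose2d_out(h: int, w: int, kernel: Kernel = (2, 3),
                         stride: int = 1, padding: int = 0) -> Tuple[int, int]:
    kh, kw = kernel
    return ((h - 1) * stride - 2 * padding + kh, (w - 1) * stride - 2 * padding + kw)


def flattened_size(h: int, w: int, out_channels: int, kernel: Kernel = (2, 3)) -> int:
    """Dense-layer input width after Conv2d + flatten, before concatenating y."""
    ch, cw = conv2d_out(h, w, kernel)
    return out_channels * ch * cw


# Hypothetical grid: 8 expirations x 20 log-strike points, 4 output channels.
print(conv2d_out(8, 20), conv_transpose2d_out(*conv2d_out(8, 20)), flattened_size(8, 20, 4))
# prints (7, 18) (8, 20) 504
```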

Latent Space: The model implements the standard VAE reparameterization trick for sampling from the latent space: z = μ + ε * σ, where ε ~ N(0,1)

Loss Function: The training objective is a composite loss function combining:

 1. A reconstruction term using mean squared error
 2. A KL divergence term weighted by kl_loss_alpha
 3. A correlation loss term weighted by correl_loss_alpha

Implementation Details:

 • The model is implemented using the Equinox framework, leveraging JAX for automatic differentiation and acceleration
 • The architecture supports variable input dimensions through parameterized height and width
 • The model maintains dimensional consistency through careful sizing of the convolutional and dense layers
 • Training employs a batched approach with configurable batch sizes and learning rates

Notable Features:

 1. The use of SiLU activation functions in the intermediate layers, which have been shown to perform well in deep learning applications
 2. The implementation of a hybrid architecture combining both convolutional and dense layers
 3. The flexibility to adjust the capacity of the model through configurable parameters:
    - Latent dimension size
    - Hidden dimension size
    - Number of convolutional channels
    - Input dimensions

This implementation presents a Conditional Variational Autoencoder (CVAE) designed for time series data analysis. The architecture combines convolutional and dense layers, processing both input data and conditional information through an encoder-decoder structure.

The encoder processes input through a 2D convolutional layer followed by dense layers, outputting parameters (mean and log-variance) of the latent space distribution. The decoder reconstructs the input from the latent representation using transposed convolutions, with both encoder and decoder incorporating the conditional information.

The model employs a composite loss function with two components:

 1. Reconstruction loss using mean squared error
 2. KL divergence term to regularize the latent space

Training utilizes time series cross-validation with 5 folds, implementing a weighted averaging scheme following Donate et al.'s formula. The implementation uses the JAX/Equinox framework for efficient computation and includes configurable hyperparameters for model capacity (latent dimension, hidden dimension, number of channels) and training dynamics (learning rate, loss weights).

The architecture is particularly suited for temporal data where maintaining sequential relationships is crucial while ensuring decorrelated latent representations.

### Further Research
#### Backtesting Implementation
- Use of margin for listed futures and options, end of day and intraday margin calls

#### CVAE Model Architecture
- Without Convolutional layers
- More fully connected layers, wide and deep
- Normalisation techniques such as BatchNormalisation
- Dropout and random GaussianNoise layers

#### Volatility Surface
- Use normalised log-strikes, take sqrt(t) into account
- Account for known large-move event days from the economic calendar, e.g. CPI, GDP, NFP, Fed, elections
- Use business time rather than calendar time
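One possible reading of "normalised log-strikes", sketched under that assumption: scale ln(K / F) by sqrt(T) so that strikes at different expiries are compared on a common moneyness axis. Dividing additionally by an at-the-money vol is a common variant. The function name is hypothetical.

```python
import math


def normalised_log_strike(K: float, F: float, T: float) -> float:
    """Time-scaled moneyness ln(K / F) / sqrt(T)."""
    return math.log(K / F) / math.sqrt(T)


# A log-strike of -0.2 at one year and -0.1 at three months map to the same point.
pair = (normalised_log_strike(5000 * math.exp(-0.2), 5000.0, 1.0),
        normalised_log_strike(5000 * math.exp(-0.1), 5000.0, 0.25))
```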

#### Data
- Expand to cover multiple equity indices
- Expand to cover stock universe
- Use bid and ask vols rather than mid, perhaps could be another data item to pass into the network as a condition. -1 for bid, +1 for ask