# **ALPHA VOLATILITY GENERATION - SYSTEMATIC TRADING STRATEGIES PROJECT**

In [1]:
import numpy as np
from scipy.stats import norm
import pandas as pd
import importlib

In [2]:
from Data.market_data import Market_data
from backtester import Backtester

In [3]:
from strategy.base_strat import BaseStrategy
from strategy.regression_network import Regression_realvol, Regression_IV, Regression_IVvsRV

In [4]:
df_train = pd.read_pickle("df_train.pkl")
df_validation = pd.read_pickle("df_validation.pkl")
df_test = pd.read_pickle("df_test.pkl")
df_price = pd.read_pickle("df_price.pkl")
df_option = pd.read_pickle("df_merged.pkl")

In [5]:
data = Market_data(df_train,df_validation,df_test,df_price,df_option)

----

----

# **Regression network : Strategy based on a linear regression between implied volatility (IV) and realized volatility (RV)**

The strategy exploits differences between implied volatility (IV) and realized volatility (RV) to find trading opportunities. For each option, the last 10 dates are considered, and a linear regression over those dates is fitted to IV and to RV, giving one slope per series that reflects its recent trend. Here is the momentum strategy we have chosen **based on what worked best during the backtesting**:

- **LONG signal:**  
  If (the IV slope is above the 80th percentile of past slopes) **and** (the RV slope is below the 20th percentile), IV is overestimated compared to RV, and the strategy goes **LONG**.

- **SHORT signal:**  
  Conversely, if (the IV slope is below the 20th percentile) **and** (the RV slope is above the 80th percentile), IV is underestimated compared to RV, and the strategy goes **SHORT**.

The strategy exploits recent imbalances between implied and realized volatility, buying when the market overestimates volatility and selling when it underestimates it.


**DRAWBACK**: The main drawback of this strategy is that IV is aggregated as a simple daily average (by quote_date of the straddle), so the strategy does not capture the term structure of volatility. The slope obtained from the regression over the last nb_period days only reflects the average IV trend, which is then compared with realized volatility (the standard deviation of returns over the last nb_period days), calculated in the same way.

To sum up, the strategy does not take into account the term structure of straddles during the backtest. Moreover, the choice of quantiles was arbitrary, based on exploratory data analysis, and the window size (nb_period = 10) was also chosen in the same way.
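The signal rules above can be sketched as follows. This is a minimal illustration, not the code in `strategy/regression_network.py`: the function names `rolling_slope` and `percentile_signal` are hypothetical, the slopes are assumed to come from regressing each series against time over the trailing window, and the percentiles are assumed to be taken over all previously observed slopes.

```python
import numpy as np

def rolling_slope(values, window=10):
    """OLS slope of `values` against time over each trailing window (NaN until filled)."""
    values = np.asarray(values, dtype=float)
    t = np.arange(window)
    slopes = np.full(values.shape, np.nan)
    for end in range(window, len(values) + 1):
        # polyfit returns [slope, intercept] for a degree-1 fit
        slopes[end - 1] = np.polyfit(t, values[end - window:end], 1)[0]
    return slopes

def percentile_signal(iv_slopes, rv_slopes, low_q=20, high_q=80):
    """Return +1 (LONG), -1 (SHORT) or 0 per date, using only past slopes as reference."""
    iv_slopes = np.asarray(iv_slopes, dtype=float)
    rv_slopes = np.asarray(rv_slopes, dtype=float)
    signal = np.zeros(len(iv_slopes), dtype=int)
    for i in range(1, len(iv_slopes)):
        past_iv = iv_slopes[:i][~np.isnan(iv_slopes[:i])]
        past_rv = rv_slopes[:i][~np.isnan(rv_slopes[:i])]
        if past_iv.size == 0 or past_rv.size == 0:
            continue
        if np.isnan(iv_slopes[i]) or np.isnan(rv_slopes[i]):
            continue
        iv_lo, iv_hi = np.percentile(past_iv, [low_q, high_q])
        rv_lo, rv_hi = np.percentile(past_rv, [low_q, high_q])
        if iv_slopes[i] > iv_hi and rv_slopes[i] < rv_lo:
            signal[i] = 1    # IV slope high, RV slope low -> LONG
        elif iv_slopes[i] < iv_lo and rv_slopes[i] > rv_hi:
            signal[i] = -1   # IV slope low, RV slope high -> SHORT
    return signal
```

Using only past slopes for the percentiles avoids look-ahead in the sketch; whether the project code does the same is not stated here.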

In [6]:
regrv = Regression_realvol(data, 10)
regiv = Regression_IV(data, 10)
regivrv = Regression_IVvsRV(regiv, regrv, data, 10)

In [7]:
back = Backtester(data,regivrv)

In [8]:
print(back.run_backtest_IVvsRV())  # signal accuracy for both cases: IV low / RV high and IV high / RV low
print(back.run_backtest_train())
print(back.run_backtest_validation())

The strategy achieved a success rate of 59.53% on 850 straddle trades in df_train.
The strategy achieved a success rate of 100.00% on 14 straddle trades in df_validation.
None
PNL:127.29262945552922, ROI:1.5535199632836612 %
None
PNL:26.74093148660205, ROI:23.681306665428668 %
None


------

-----

# **GARCH strategy**

This strategy uses a GARCH(1,1) model to predict the realized volatility of the underlying.
We trade only when the predicted volatility is either higher than all implied volatilities (IVs) or lower than all IVs.

- **LONG signal:**  
  If predicted vol < all IVs and the option IV is in the lowest quartile → **LONG**.

- **SHORT signal:**  
  If predicted vol > all IVs and the option IV is in the highest quartile → **SHORT**.

The goal is to exploit extreme situations where model forecasts strongly disagree with market IVs. The strategy is calibrated on a rolling window of spot prices using a GARCH model.
It was then adapted and tuned to perform on df_validation, after training the GARCH on sliding spot data.

**Remark and observation** : We notice that as the filter criteria applied to the straddles become stricter, the success rate gradually decreases (48% → 46% → 45% → 41% of correct signals, see ***exploratory_analysis.ipynb***), while the number of trades in the filtered universe remains reasonable. The results thus become inconsistent with the strategy’s intuition. It is therefore better to use the **inverse** signal of the strategy.
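As a rough illustration of the rules listed above, the sketch below runs the GARCH(1,1) variance recursion and applies the quartile filter. It is not the code in `strategy/garch_strat.py`: the function names are hypothetical, the parameters `omega`, `alpha` and `beta` are taken as given (in practice they would be estimated on the rolling window of spot returns), and the forecast and the IVs are assumed to be expressed in the same units.

```python
import numpy as np

def garch11_forecast(returns, omega, alpha, beta):
    """One-step-ahead variance from sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t]."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = returns.var()  # assumption: initialise with the sample variance
    for r in returns:
        sigma2 = omega + alpha * r ** 2 + beta * sigma2
    return sigma2

def garch_signal(pred_vol, ivs):
    """+1 (LONG) / -1 (SHORT) / 0 per option, following the rules listed above."""
    ivs = np.asarray(ivs, dtype=float)
    q1, q3 = np.percentile(ivs, [25, 75])
    signal = np.zeros(len(ivs), dtype=int)
    if pred_vol < ivs.min():
        signal[ivs <= q1] = 1    # forecast below every IV: LONG the lowest-quartile IVs
    elif pred_vol > ivs.max():
        signal[ivs >= q3] = -1   # forecast above every IV: SHORT the highest-quartile IVs
    return signal
```

When the forecast falls inside the range of quoted IVs, no option is traded, which matches the "extreme situations only" intent described above.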

In [10]:
from strategy.garch_strat import Garch_strat

In [11]:
stratgarch = Garch_strat(data)
back = Backtester(data,stratgarch)

In [12]:
back.run_backtest_validation()

PNL:521.7840162818576, ROI:8.598024540579168 %


The strategy performed well on df_validation.

---

---

# **Regime Switching strategy : statistical strategy (model-free) and Markov regime switching (HMM)**

The idea behind these strategies is to exploit sudden regime shifts to build a signal that identifies whether an option is underpriced or overpriced, based on its phase shift relative to the realized volatility.

An asset’s volatility (whether realized volatility or implied volatility) does not evolve uniformly. It alternates between different “states” or “regimes” (e.g., High Vol / Low Vol), each with its own statistical characteristics.

Retained strategy: exploit these sudden phase shifts to create a signal:


- If RV suddenly rises but IV lags → the options market underestimates the volatility regime: **LONG** volatility opportunity.

- If RV suddenly drops → the straddles are probably overpriced: **SHORT** volatility opportunity.

### **First Strategy: Regime identification using thresholds (Model-Free and Non-Parametric Approach)**

The first strategy we designed to model volatility regime shifts is a purely statistical, non-parametric approach. 
The idea is to identify volatility regimes using thresholds rather than relying on parametric models.  

For each option, we compute the daily **log-return** and compare it with the **10-day historical volatility**.  
The trading signal is then constructed as follows:  

- **High-volatility regime (LONG signal):**  
  If $|\text{log return}| > \text{threshold}_{high} \times \text{vol}_{10d}$,  
  the market signals a regime shift into a high-volatility state → **LONG straddle**.  

- **Low-volatility regime (SHORT signal):**  
  If $|\text{log return}| < \text{threshold}_{low} \times \text{vol}_{10d}$,  
  the market signals a regime shift into a low-volatility state → **SHORT straddle**.  

The thresholds ($\text{threshold}_{high}, \text{threshold}_{low}$) are calibrated in the 
`exploratory_analysis.ipynb` notebook using the training dataset **df_train**.
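The threshold rule above can be sketched as follows. This is an illustration only: `threshold_regime_signal` is a hypothetical name, the input is assumed to be a daily price series, and the 10-day volatility is assumed to be the sample standard deviation of the 10 log-returns preceding the current day (the notebook does not state whether the current day is included).

```python
import numpy as np

def threshold_regime_signal(prices, threshold_high, threshold_low, window=10):
    """+1 (LONG straddle), -1 (SHORT straddle) or 0 for each daily log-return."""
    prices = np.asarray(prices, dtype=float)
    log_ret = np.diff(np.log(prices))
    signal = np.zeros(len(log_ret), dtype=int)
    for t in range(window, len(log_ret)):
        # historical volatility of the `window` returns before day t
        vol = log_ret[t - window:t].std(ddof=1)
        if abs(log_ret[t]) > threshold_high * vol:
            signal[t] = 1    # large move relative to recent vol: high-vol regime
        elif abs(log_ret[t]) < threshold_low * vol:
            signal[t] = -1   # small move relative to recent vol: low-vol regime
    return signal
```

Days whose return falls between the two scaled thresholds produce no trade, so the gap between `threshold_low` and `threshold_high` controls how selective the signal is.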


In [7]:
from strategy.regime_switching_strat import Regime_switching_modelfree
from strategy.regime_switching_strat import Regime_switching_HMM

In [30]:
reg_switch_freemod = Regime_switching_modelfree(data)
back = Backtester(data,reg_switch_freemod)



In [31]:
back.run_backtest_IVvsRV()

The strategy achieved a success rate of 63.44% on 1209 straddle trades in df_train.
The strategy achieved a success rate of 61.64% on 232 straddle trades in df_validation.


In [32]:
back.run_backtest_train()

Result on df_train : PNL:-290.3683528327171, ROI:-2.6202840830417102 %


In [33]:
back.run_backtest_validation()

Result on df_validation : PNL:122.14688399753324, ROI:7.107306718658298 %


**Note**: 

This strategy appears to be particularly sensitive to SHORT signals, as observed in exploratory_analysis.ipynb. In volatility trading, risks are asymmetric: the loss on a long volatility position is limited to the premium paid, while a short volatility position can suffer potentially unlimited losses in a market shock, since volatility has no upper limit.
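The asymmetry can be made concrete with the payoff of a straddle held to expiry. The sketch below is a simplified illustration (the function name is hypothetical, and transaction costs and early exits are ignored): the long side can lose at most the premium, while the short side loses more the further the underlying moves.

```python
def straddle_pnl_at_expiry(spot_at_expiry, strike, premium, position=1):
    """PnL of one straddle held to expiry; position=+1 for long, -1 for short."""
    payoff = abs(spot_at_expiry - strike)  # call payoff + put payoff at the same strike
    return position * (payoff - premium)

# Long straddle: the worst case is the premium paid (underlying ends at the strike).
worst_long = min(straddle_pnl_at_expiry(s, 100, 5) for s in range(0, 201))

# Short straddle: the loss keeps growing with the size of the move.
short_after_shock = straddle_pnl_at_expiry(160, 100, 5, position=-1)
```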

### **Second Strategy: Regime identification using Gaussian HMM**  

In this second part on regime-switching strategies, we apply a Hidden Markov Model (HMM) to capture volatility regimes and their transitions. The model is built with 3 states: low, middle, and high, using the hmmlearn package.

Once fitted on df_train, the model outputs an n × 3 probability matrix, where each row (each date) gives the probabilities of being in each regime. This matrix is row-stochastic: each row sums to 1. To build a trading signal, we focus only on state 1 (low regime) and state 3 (high regime), with the conditions `proba_low > seuil_low` and `proba_high > seuil_high`.

From the threshold calibration (see `exploratory_analysis.ipynb`), we observed that the model performs **much better** on SHORT signals: about 80% accuracy on df_train, compared to only 56% for long signals. Therefore, we only keep short signals.  

The final strategy is:
- **Low-volatility regime (SHORT signal ONLY):**  
  If 5-day realized volatility (RV_5d) > IV and the probability of being in the low regime exceeds seuil_low: SHORT  
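The final rule can be sketched from the posterior matrix alone. This is an illustration, not the code in `strategy/regime_switching_strat.py`: `hmm_short_signal` is a hypothetical name, and the matrix is assumed to be the kind of n × 3 posterior that hmmlearn's `GaussianHMM.predict_proba` returns. Which column corresponds to the low regime is also an assumption (`low_state`), since fitted HMM states are not ordered by volatility level.

```python
import numpy as np

def hmm_short_signal(posteriors, rv_5d, iv, seuil_low, low_state=0):
    """Return -1 (SHORT) where RV_5d > IV and P(low regime) > seuil_low, else 0."""
    posteriors = np.asarray(posteriors, dtype=float)
    if not np.allclose(posteriors.sum(axis=1), 1.0):
        raise ValueError("posterior matrix must be row-stochastic")
    proba_low = posteriors[:, low_state]
    short = (np.asarray(rv_5d, dtype=float) > np.asarray(iv, dtype=float)) & (proba_low > seuil_low)
    return np.where(short, -1, 0)
```

Only SHORT signals are produced, in line with the decision above to drop the long side.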

In [8]:
reg_switch_hmm = Regime_switching_HMM(data)
back = Backtester(data,reg_switch_hmm)

In [9]:
back.run_backtest_validation()

Result on df_validation : PNL:188.2466034554905, ROI:12.138363948279023 %
