# Volatility Forecasts (Part 2 - XGBoost-STES)

This notebook demonstrates the Smooth Transition Exponential Smoothing (STES) model. STES is a variant of the Exponential Smoothing (ES) model in which the smoothing parameter is not fixed: it varies over time as a logistic function of transition variables, which lets the model capture non-linear dependencies in volatility time series. XGBoost-STES extends STES by using XGBoost to capture those non-linear dependencies more flexibly.
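
To make the recursion concrete, here is a minimal sketch of an STES forecast. It assumes the usual formulation, in which the variance forecast blends the latest squared return with the previous forecast, and the blending weight is a logistic function of the features. The function name `stes_forecast`, the sign convention inside the logistic, the initial variance, and the coefficient values are illustrative assumptions. This is not the `STESModel` implementation imported further down.

```python
import numpy as np

def stes_forecast(X, returns, beta, init_var=None):
    """One-step-ahead variance forecasts from a smooth transition ES recursion.

    X       : (T, k) features; row t holds information available before time t
    returns : (T,) lagged returns aligned with the rows of X
    beta    : (k,) transition coefficients (hypothetical values in this sketch)
    """
    X = np.asarray(X, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # Time-varying smoothing weight: logistic transform of a linear index.
    alphas = 1.0 / (1.0 + np.exp(-(X @ beta)))
    sq_returns = returns ** 2
    forecasts = np.empty(len(returns))
    # Assumption: start from the sample mean of squared returns.
    prev = sq_returns.mean() if init_var is None else init_var
    for t in range(len(returns)):
        # Blend the newest squared return with the previous forecast.
        prev = alphas[t] * sq_returns[t] + (1.0 - alphas[t]) * prev
        forecasts[t] = prev
    return forecasts, alphas

# Synthetic data, only to exercise the function.
rng = np.random.default_rng(0)
demo_returns = rng.normal(size=50)
demo_X = np.column_stack([
    np.ones(50), demo_returns, np.abs(demo_returns), demo_returns ** 2
])
demo_beta = np.zeros(4)  # all-zero coefficients give a constant weight of 0.5
demo_forecasts, demo_alphas = stes_forecast(demo_X, demo_returns, demo_beta)
```

With all coefficients at zero the weight is constant and the recursion reduces to ordinary exponential smoothing. Non-zero coefficients let the weight respond to the sign and size of recent returns.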

This notebook accompanies the blog post [Volatility Forecasts (Part 2 - XGBoost-STES)](https://steveya.github.io/2024/07/12/volatility-forecast-2.html). We have refactored the code used in [Volatility Forecasts (Part 1 - STES Models)](https://steveya.github.io/blob/notebooks/volatility_forecast_1.ipynb). The aim is to replace the logistic function used by the STES model with an XGBoost model and to evaluate their relative performance. It is a work in progress and will be updated as I wrap up my implementations.

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os
project_dir = os.path.abspath('..')
sys.path.insert(0, project_dir)

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from typing import Tuple

from volatility_forecast.data.datamanager import (
    LagReturnDataManager,
    LagAbsReturnDataManager, 
    LagSquareReturnDataManager,
    SquareReturnDataManager,
)
from volatility_forecast.model.stes_model import STESModel
from volatility_forecast.model.neural_network_model import RNNVolatilityModel
from volatility_forecast.model.neural_network_model import GRUVolatilityModel


In [2]:

# Data preparation function
def prepare_data(tickers: Tuple[str, ...], start_date: str, end_date: str):
    # Lagged returns are scaled by 1e2; squared returns by 1e4 to stay consistent.
    returns = LagReturnDataManager().get_data(tickers, start_date, end_date)[0] * 1e2
    realized_var = SquareReturnDataManager().get_data(tickers, start_date, end_date)[0] * 1e4
    # Features: constant, lagged return, lagged absolute return, lagged squared return.
    feature_sets = np.hstack([
        np.ones((len(returns), 1)),
        returns,  # reuse the lagged returns fetched above
        LagAbsReturnDataManager().get_data(tickers, start_date, end_date)[0] * 1e2,
        LagSquareReturnDataManager().get_data(tickers, start_date, end_date)[0] * 1e4,
    ])
    return feature_sets, realized_var, returns

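# Data normalization
# Note: both scalers are fit on the full sample, so the test period's mean and
# standard deviation influence the normalized training data.
def normalize_data(X, y):
    scaler_X = StandardScaler()
    scaler_y = StandardScaler()
    X_normalized = scaler_X.fit_transform(X)
    y_normalized = scaler_y.fit_transform(y.reshape(-1, 1)).flatten()
    return X_normalized, y_normalized, scaler_X, scaler_y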

# Model evaluation function
def evaluate_model(model, X, y, returns, train_size, test_size):
    # Fit on the first train_size rows, then score on the next test_size rows.
    test = slice(train_size, train_size + test_size)
    model.fit(X[:train_size], y[:train_size], returns[:train_size], 0, train_size)
    predictions = model.predict(X[test], returns[test])
    rmse = np.sqrt(mean_squared_error(y[test], predictions))
    return rmse


In [3]:
# Main execution
tickers = ("SPY",)
start_date = "2000-01-01"
end_date = "2023-12-31"

X, y, returns = prepare_data(tickers, start_date, end_date)
X_normalized, y_normalized, scaler_X, scaler_y = normalize_data(X, y)

train_size = 4000
test_size = 1000


In [9]:


# STES Model
stes_model = STESModel()
stes_rmse = evaluate_model(stes_model, X, y, returns, train_size, test_size)
print(f"STES RMSE: {stes_rmse}")


STES RMSE: 1.661750686869926


In [8]:


# STES Model (normalized features and target)
stes_model = STESModel()
stes_rmse_scaled = evaluate_model(stes_model, X_normalized, y_normalized, returns, train_size, test_size)
stes_rmse = scaler_y.scale_[0] * stes_rmse_scaled
print(f"STES RMSE: {stes_rmse}")



STES RMSE: 2.456403509282011
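
The cell above converts the RMSE back to original units by multiplying it by `scaler_y.scale_[0]`. The sketch below checks that rule on synthetic data: because RMSE is built from differences, the mean subtracted during standardization cancels and only the standard deviation matters. The arrays and variable names are made up for illustration, and `np.std` with its default `ddof=0` stands in for the `scale_` that `StandardScaler` stores.

```python
import numpy as np

rng = np.random.default_rng(0)
y_demo = rng.gamma(shape=2.0, scale=1.5, size=500)      # synthetic stand-in for the target
pred_demo = y_demo + rng.normal(scale=0.7, size=500)    # synthetic forecasts, original units

mu, sigma = y_demo.mean(), y_demo.std()                 # analogues of mean_ and scale_
y_norm = (y_demo - mu) / sigma
pred_norm = (pred_demo - mu) / sigma

rmse_original = np.sqrt(np.mean((y_demo - pred_demo) ** 2))
rmse_normalized = np.sqrt(np.mean((y_norm - pred_norm) ** 2))

# Correct conversion: scale only.
rmse_rescaled = sigma * rmse_normalized
# Incorrect conversion: scale and shift, which is what an inverse transform does.
rmse_shifted = sigma * rmse_normalized + mu

print(rmse_original, rmse_rescaled, rmse_shifted)
```

The first two printed values agree, while the third is larger by the target mean. Applying `inverse_transform` to an error metric would therefore overstate it.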


In [21]:

rnn_model = RNNVolatilityModel(input_size=X.shape[1], hidden_size=1)
rnn_rmse = evaluate_model(rnn_model, X, y, returns, train_size, test_size)
print(f"RNN RMSE: {rnn_rmse}")





RNN RMSE: 1.8421252791229161


In [11]:

rnn_model = RNNVolatilityModel(input_size=X.shape[1], hidden_size=3)
rnn_rmse_scaled = evaluate_model(rnn_model, X_normalized, y_normalized, returns, train_size, test_size)
rnn_rmse = scaler_y.scale_[0] * rnn_rmse_scaled
print(f"RNN RMSE: {rnn_rmse}")



RNN RMSE: 3.4246879243225017


In [None]:

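# GRU Model (normalized features and target)
gru_model = GRUVolatilityModel(input_size=X.shape[1], hidden_size=32)
gru_rmse_scaled = evaluate_model(gru_model, X_normalized, y_normalized, returns, train_size, test_size)
# RMSE rescales with the target's standard deviation only;
# inverse_transform would also add the target mean and overstate the error.
gru_rmse = scaler_y.scale_[0] * gru_rmse_scaled
print(f"GRU RMSE: {gru_rmse}")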

