# Q3 – Sensitivity to Volatility Regimes

This notebook investigates how well models capture different volatility regimes in financial time series. We analyze KL divergence between model forecasts and true distributions, conditioning on the known volatility of each data-generating process (DGP). This helps identify which models generalize better under varying levels of uncertainty.

### Objective
- Assess how forecast quality changes across DGPs with increasing volatility.
- Identify which models are robust in high- and low-volatility environments.
- Evaluate price and return forecasts separately.

### DGP Volatility Spectrum

The DGPs span a wide range of volatility regimes, from nearly deterministic to highly stochastic. Volatility levels are empirically estimated by computing the **standard deviation of returns on the last forecast day (Day 22)** across all simulated paths, then **annualized** using a `√252` scaling factor.

This results in the following volatility ordering:

- **Very Low Volatility**: `gbm_low_vol` (≈8% annualized)
- **Low Volatility**: `mixture_normal`
- **Medium Volatility**: `seasonal`
- **High Volatility**: `t_garch` (heavy tails and time-varying volatility)
- **Very High Volatility**: `gbm_high_vol` (≈80% annualized)
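
The annualization described above can be illustrated on synthetic data. This is a minimal sketch, not the notebook's own helper: `annualized_last_day_vol` and the toy return matrix are assumptions made for illustration, and the 0.5% daily standard deviation is chosen only because it annualizes to roughly 8%.

```python
import numpy as np

def annualized_last_day_vol(returns, day_index=-1, periods_per_year=252):
    """Cross-sectional std of returns on one day, scaled to a yearly figure."""
    daily_vol = np.std(returns[:, day_index])
    return daily_vol * np.sqrt(periods_per_year)

# Toy data (an assumption, not one of the notebook's DGPs):
# 10,000 paths x 21 days of i.i.d. normal daily returns with 0.5% daily std.
rng = np.random.default_rng(0)
toy_returns = rng.normal(loc=0.0, scale=0.005, size=(10_000, 21))

vol = annualized_last_day_vol(toy_returns)
print(f"annualized volatility: {vol:.3f}")  # should be close to 0.005 * sqrt(252)
```

Because the standard deviation is taken across paths on a single day, it measures cross-sectional dispersion at that horizon rather than volatility along any one path.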

### Key Outputs
- 📄 KL divergence tables:
  - Sorted by DGP volatility
  - Shown separately for prices and returns
- 📊 Bar plots:
  - KL divergence vs. DGP volatility for each model

### Notes
- Context length is held fixed during each comparison.
- KL is computed at three forecast horizons: Day 2, Day 12, and Day 22.
- Only models in `selected_model_names` are evaluated.

This setup lets us systematically evaluate how model performance varies with increasing market volatility, an essential consideration in risk-sensitive financial modeling.
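
The implementation of `compute_kl_divergence` lives in `utils.evaluation` and is not shown in this notebook. To make the metric concrete, here is one common way to estimate KL(P || Q) from two samples, using a shared histogram. The function `histogram_kl`, the bin count, and the smoothing constant are assumptions for illustration and may differ from what the utility actually does.

```python
import numpy as np

def histogram_kl(p_samples, q_samples, bins=50, eps=1e-10):
    """Estimate KL(P || Q) from two 1-D samples using a shared histogram."""
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    # Shared bin edges so both histograms describe the same support.
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_counts, _ = np.histogram(p_samples, bins=edges)
    q_counts, _ = np.histogram(q_samples, bins=edges)
    # Smooth with eps to avoid log(0) and division by zero, then normalize.
    p = (p_counts + eps) / (p_counts + eps).sum()
    q = (q_counts + eps) / (q_counts + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Toy samples (made-up): a "true" return distribution and two forecasts.
rng = np.random.default_rng(0)
true_returns = rng.normal(0.0, 0.01, size=5_000)
good_forecast = rng.normal(0.0, 0.01, size=5_000)  # same volatility
bad_forecast = rng.normal(0.0, 0.03, size=5_000)   # volatility overstated 3x

kl_good = histogram_kl(true_returns, good_forecast)
kl_bad = histogram_kl(true_returns, bad_forecast)
print(f"matched volatility: {kl_good:.4f}, overstated volatility: {kl_bad:.4f}")
```

A forecast that misjudges the volatility level is penalized even when its mean is correct, which is why KL is a natural metric for comparing models across volatility regimes.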


In [1]:
# Packages
import pickle
import numpy as np
import pandas as pd
from pathlib import Path

from utils.evaluation import (
    compute_kl_divergence,
    format_pivot_table,
    dataframe_to_latex
)

from utils.plotting import plot_kl_bar_by_vol

### Models List

Models can be added to or removed from the following list.

In [2]:
# Selected Models for Analysis
selected_model_names = [
    "chronos_model_tiny",
    "chronos_model_mini",
    "chronos_model_base",
    "lag_llama_model",
    "moirai_model_small",
    "moirai_model_base",
    "toto_model",
    "tirex_model",
    "timesfm_model_small",
    "timesfm_model_large"
]

In [3]:
# Paths and Setup
results_dir = Path("results_q3_volatility")
tables_dir = results_dir / "tables"
plots_bar_dir = results_dir / "plots_bar"
plots_bar_dir.mkdir(parents=True, exist_ok=True)
tables_dir.mkdir(parents=True, exist_ok=True)

forecast_dir = Path("forecasts")
run_dir = Path("runfiles")
datasets_dir = Path("datasets")

selected_days = [0, 10, 20]  # zero-based return indices, reported as Day 2, Day 12, Day 22
context_lengths = [22, 66, 252]
dgp_types_kl = ["gbm_low_vol", "mixture_normal", "seasonal", "t_garch", "gbm_high_vol"]

### Loading the Forecasts

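We load each pickled forecast and read its run configuration (model name, DGP, target type, context length) from the matching text file in `runfiles`. Forecasts without a run file, or that fail to unpickle, are skipped.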

In [4]:
# Load Forecasts
import ast

forecast_files = sorted(forecast_dir.glob("forecast_*.pkl"))
results = []

for forecast_file in forecast_files:
    run_name = forecast_file.stem
    run_file = run_dir / f"{run_name}.txt"
    if not run_file.exists():
        continue

    # Parse "key = value" lines from the run file
    run_config = {}
    with open(run_file, "r") as f:
        for line in f:
            if "=" in line:
                key, value = [x.strip() for x in line.strip().split("=", 1)]
                try:
                    # literal_eval accepts only Python literals, unlike eval
                    run_config[key] = ast.literal_eval(value)
                except (ValueError, SyntaxError):
                    # Not a literal: keep the raw string without surrounding quotes
                    run_config[key] = value.strip("\"'")

    try:
        with open(forecast_file, "rb") as f:
            forecast_result = pickle.load(f)
            low, median, high, samples, base_price = forecast_result
    except Exception:
        continue

    results.append({
        "run_name": run_name,
        "model_name": run_config["model_name"],
        "dgp_type": run_config["dataset_name"],
        "target_type": run_config["target_type"],
        "context_length": run_config["context_length"],
        "samples": samples,
        "low": low,
        "median": median,
        "high": high,
        "base_price": base_price
    })

# Filter Results by Selected Models
results = [r for r in results if r["model_name"] in selected_model_names]

price_results = [r for r in results if r["target_type"] == "prices"]
return_results = [r for r in results if r["target_type"] == "returns"]

### Defining Functions

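We define three helper functions: one estimates each DGP's annualized volatility, one computes the KL divergence and attaches that volatility to every row, and one saves the KL tables sorted by volatility.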

In [5]:
# Compute Annualized Volatility of True DGP (last day, across paths)
def compute_dgp_volatilities(dgp_types, datasets_dir, is_price=True, last_day_index=20):
    dgp_vols = {}
    for dgp in dgp_types:
        file_path = datasets_dir / (f"{dgp}_paths.npy" if is_price else f"{dgp}_returns_paths.npy")
        if not file_path.exists():
            continue

        data = np.load(file_path)
        returns = data[:, 1:] / data[:, :-1] - 1 if is_price else data
        last_day_vol = np.std(returns[:, last_day_index])
        annualized_vol = last_day_vol * np.sqrt(252)
        dgp_vols[dgp] = round(annualized_vol, 4)

    return dgp_vols

dgp_volatility_map = compute_dgp_volatilities(dgp_types_kl, datasets_dir, is_price=True)

In [6]:
# Compute KL Divergence and Attach DGP Volatility
def compute_kl_and_vol(results_subset, dgp_vol_map):
    df_rows = []

    for item in results_subset:
        if item["dgp_type"] not in dgp_types_kl:
            continue

        is_price = item["target_type"] == "prices"
        model_returns = item["samples"]
        if is_price:
            model_returns = model_returns[:, 1:] / model_returns[:, :-1] - 1

        dgp_path = datasets_dir / f"{item['dgp_type']}_returns_paths.npy"
        if not dgp_path.exists():
            continue

        dgp_returns = np.load(dgp_path)

        for day_index in selected_days:
            try:
                p = dgp_returns[:, day_index]
                q = model_returns[:, day_index]
                kl = compute_kl_divergence(p, q)
                df_rows.append({
                    "context_length": item["context_length"],
                    "dgp_type": item["dgp_type"],
                    "model_name": item["model_name"],
                    "day": f"Day {day_index + 2}",
                    "kl_divergence": kl,
                    "volatility": dgp_vol_map[item["dgp_type"]]
                })
            except Exception:
                # Skip horizons that cannot be evaluated (e.g. index out of range)
                continue

    return pd.DataFrame(df_rows).round(4)

df_kl_prices = compute_kl_and_vol(price_results, dgp_volatility_map)
df_kl_returns = compute_kl_and_vol(return_results, dgp_volatility_map)
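
The index arithmetic in the cell above is easy to misread, so here is a small sketch of the price-to-return conversion and the resulting day labels. It assumes that column 0 of a price path is Day 1, which is consistent with the `Day {day_index + 2}` labels but is not stated explicitly in the notebook. The helper `prices_to_simple_returns` and the toy prices are hypothetical.

```python
import numpy as np

def prices_to_simple_returns(prices):
    """Convert (n_paths, n_days) prices to (n_paths, n_days - 1) simple returns."""
    prices = np.asarray(prices, dtype=float)
    return prices[:, 1:] / prices[:, :-1] - 1.0

# Two toy price paths over four days (made-up numbers).
toy_prices = np.array([
    [100.0, 110.0, 99.0, 99.0],
    [50.0, 50.0, 55.0, 66.0],
])
toy_returns = prices_to_simple_returns(toy_prices)

# Return column i is the move from Day i + 1 to Day i + 2, hence the "+ 2".
labels = [f"Day {i + 2}" for i in range(toy_returns.shape[1])]
print(labels)
print(toy_returns)
```

Differencing removes one column, so a price forecast with 22 days yields 21 returns, and index 20 is the last one available.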

In [7]:
# Generate KL Tables Sorted by Volatility
def save_kl_table_sorted_by_vol(df_kl, label):
    for context in context_lengths:
        df_filtered = df_kl[df_kl["context_length"] == context]
        pivot = df_filtered.pivot_table(
            index=["context_length", "volatility", "dgp_type", "model_name"],
            columns="day",
            values="kl_divergence"
        ).sort_index(level="volatility")

        formatted = format_pivot_table(pivot, selected_days)
        filename = f"q3_kl_by_vol_{label}_context{context}.tex"
        dataframe_to_latex(formatted, tables_dir / filename, preserve_index_order=True)

save_kl_table_sorted_by_vol(df_kl_prices, "prices")
save_kl_table_sorted_by_vol(df_kl_returns, "returns")
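
To show what the pivot step produces before formatting, here is a sketch on toy data. The model names and KL values are placeholders, not results, and the final column reordering is an assumption about what `format_pivot_table` is presumably responsible for.

```python
import itertools
import pandas as pd

dgp_vols = {"gbm_high_vol": 0.80, "gbm_low_vol": 0.08}
models = ["model_a", "model_b"]  # hypothetical model names
days = ["Day 2", "Day 12", "Day 22"]

rows = []
for i, (dgp, model, day) in enumerate(itertools.product(dgp_vols, models, days)):
    rows.append({
        "context_length": 22,
        "volatility": dgp_vols[dgp],
        "dgp_type": dgp,
        "model_name": model,
        "day": day,
        "kl_divergence": 0.1 * (i + 1),  # placeholder values, not real results
    })
toy_kl = pd.DataFrame(rows)

# Same pivot as in save_kl_table_sorted_by_vol: one row per DGP/model pair.
pivot = toy_kl.pivot_table(
    index=["context_length", "volatility", "dgp_type", "model_name"],
    columns="day",
    values="kl_divergence",
).sort_index(level="volatility")

# Day labels are strings, so pandas orders them lexicographically.
print(list(pivot.columns))

# Restore the horizon order explicitly.
ordered = pivot[days]
print(ordered)
```

Rows come out in ascending volatility, so the low-volatility DGP appears first. The string day labels sort as `Day 12`, `Day 2`, `Day 22`, which is why an explicit reordering by horizon is needed somewhere before the table is written.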

### Plotting

We plot only the KL divergence bar charts, one per context length and target type.

In [8]:
# Generate KL Bar Plot with Volatility on X-axis
for context_val in context_lengths:
    plot_kl_bar_by_vol(df_kl_prices, plots_bar_dir, "prices", context_val)
    plot_kl_bar_by_vol(df_kl_returns, plots_bar_dir, "returns", context_val)

### Interpretation: How Does Volatility Influence Forecasting Performance?

We assess model robustness across DGPs with varying volatility, using KL divergence at context length 22. Lower KL indicates better alignment between predicted and ground-truth distributions. Both price and return forecasts are analyzed.

**Price Forecasts**

- High volatility (`gbm_high_vol`) is well handled by most models. Chronos, Toto, Tirex, and TimesFM large maintain low KL values even at Day 22. Lag-Llama also performs well.

- Low volatility (`gbm_low_vol`) creates instability. Chronos base and mini spike at longer horizons. TimesFM small and Moirai small degrade sharply. Tirex and Chronos tiny remain more stable.

- Mixture normal shows moderate divergence. Chronos and Lag-Llama handle it well. Toto performs inconsistently, with late-horizon spikes. Moirai and TimesFM show steady error growth.

- T-garch is challenging. Chronos and TimesFM small struggle. Tirex and Toto manage better after Day 12. Lag-Llama and Moirai base show strong adaptation.

- Seasonal is captured well by Chronos and Lag-Llama. Moirai and TimesFM small degrade on later horizons. Toto performs consistently.

**Return Forecasts**

- High-volatility returns are difficult for Chronos (KL > 13 across DGPs). Lag-Llama and Toto remain robust with near-zero KL. Moirai and TimesFM vary with size.

- Low-volatility returns again highlight poor performance from Chronos (KL ~15+). Lag-Llama and Toto are highly accurate. Tirex, Moirai small, and TimesFM large do well.

- Mixture normal return divergence is wide. Chronos is high. Toto and Lag-Llama remain strong. TimesFM and Moirai perform moderately well.

- T-garch is toughest overall. Lag-Llama, Moirai, and Toto show good late-horizon accuracy. Chronos and Tirex exhibit large, persistent error.

- Seasonal returns are best captured by Lag-Llama and Toto (KL ≈ 0). Chronos performs poorly again. Moirai and TimesFM are stable, but not top-tier.

**Conclusion**

Volatility impacts model performance in nuanced ways:

- High volatility does not always increase KL; some models generalize better in noisy regimes.
- Chronos struggles across the board in return forecasting.
- Lag-Llama and Toto are the most consistent performers across volatility levels and forecast types.
- Moirai and TimesFM show size-dependent variability.
- Tirex excels in price forecasting but is less stable in return space.
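
Statements about consistency can be checked against the KL tables with a per-model summary across DGPs. The sketch below assumes a long-format frame with the same `model_name`, `dgp_type`, and `kl_divergence` columns that `compute_kl_and_vol` returns. The numbers and model names are made up for illustration and are not results from this notebook.

```python
import pandas as pd

# Toy KL values (placeholders): model_a is uniformly mediocre,
# model_b is excellent on two DGPs but fails on the third.
toy_kl = pd.DataFrame({
    "model_name": ["model_a"] * 3 + ["model_b"] * 3,
    "dgp_type": ["gbm_low_vol", "seasonal", "gbm_high_vol"] * 2,
    "kl_divergence": [0.10, 0.12, 0.11, 0.02, 0.05, 2.00],
})

# Mean KL rewards average accuracy; worst-case KL and spread reward consistency.
summary = (
    toy_kl.groupby("model_name")["kl_divergence"]
    .agg(mean_kl="mean", worst_kl="max", spread="std")
    .sort_values("worst_kl")
)
print(summary)
```

Ranking by worst-case KL rather than by the mean favors models that never fail badly in any regime, which matches the notion of robustness used in this section.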
