# Q1 – Effect of Context Length on Forecast Distributions

This notebook analyzes how increasing the context window affects forecast quality across different models and DGPs.

### Objective
- Evaluate distributional changes as context length varies.
- Summarize return statistics and KL divergence per context.

### Key Outputs
- 📄 Tables:
  - Return mean/std across selected forecast days
  - Percentile tables (1%–99%)
  - KL divergence per context and DGP

- 📊 Plots:
  - KL divergence vs. context length

### Notes
- Covers all models listed in `selected_model_names`
- KL is computed between the model forecast samples and the DGP return distribution on each selected day
- Forecast days: `Day 2`, `Day 12`, `Day 22` (sample indices 0, 10, 20)
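
To make the KL comparison concrete, here is a minimal sketch of one common way to estimate KL divergence from two sets of samples, using a shared histogram grid. It is an illustration only: `histogram_kl`, the bin count, and the smoothing constant `eps` are assumptions introduced here, and the notebook's own `compute_kl_divergence` (from `utils.evaluation`) may use a different estimator.

```python
import numpy as np

def histogram_kl(p_samples, q_samples, bins=50, eps=1e-10):
    """Estimate KL(P || Q) from two 1-D samples on a shared histogram grid.

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)

    # Use the same bin edges for both samples so the histograms are comparable
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)

    p_counts, _ = np.histogram(p_samples, bins=edges)
    q_counts, _ = np.histogram(q_samples, bins=edges)

    # Convert counts to probabilities; eps avoids log(0) and division by zero
    p = p_counts / p_counts.sum() + eps
    q = q_counts / q_counts.sum() + eps
    p /= p.sum()
    q /= q.sum()

    return float(np.sum(p * np.log(p / q)))

# Synthetic stand-ins for one forecast day (not data from the notebook)
rng = np.random.default_rng(0)
dgp_day = rng.normal(0.0, 0.01, size=5000)    # plays the role of dgp_returns[:, day]
model_day = rng.normal(0.0, 0.02, size=5000)  # plays the role of model_returns[:, day]

kl_same = histogram_kl(dgp_day, dgp_day)      # identical samples give zero divergence
kl_diff = histogram_kl(dgp_day, model_day)    # mismatched volatility gives a positive value
```

Histogram estimates depend on the bin count and sample size, so values from a sketch like this are only comparable when both are held fixed across models and contexts.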


In [1]:
# Packages
import pickle
import numpy as np
import pandas as pd
from pathlib import Path

from utils.plotting import plot_kl_vs_context

from utils.evaluation import (
    compute_kl_divergence,
    format_pivot_table,
    dataframe_to_latex
)

# Needed to avoid issues with numpy for TimesFM 2.5:
# register `numpy.core.numeric` under the `numpy._core.numeric` module name
import sys
import numpy.core.numeric as numeric
sys.modules['numpy._core.numeric'] = numeric

### Models List

Models can be added to or removed from the following list.

In [2]:
# Selected Models for Analysis
selected_model_names = [
    "chronos_model_tiny",
    "chronos_model_mini",
    "chronos_model_base",
    "lag_llama_model",
    "moirai_model_small",
    "moirai_model_base",
    "moirai_model_small_2_0",   # NEW
    "moirai_model_small_1_1",   # NEW
    "moirai_model_base_1_1",    # NEW
    "toto_model",
    "tirex_model",
    "timesfm_model_small",
    "timesfm_model_large",
    "timesfm_model_2_5"        # NEW
]

In [3]:
# Paths and Setup
results_dir = Path("results_q1_context")
tables_dir = results_dir / "tables"
plots_context_dir = results_dir / "plots_context"

# Create the output folders if they do not exist yet
for folder in [tables_dir, plots_context_dir]:
    folder.mkdir(parents=True, exist_ok=True)

forecast_dir = Path("forecasts")
run_dir = Path("runfiles")
datasets_dir = Path("datasets")

selected_days = [0, 10, 20]
ordered_days = [f"Day {d+2}" for d in selected_days]
ordered_percentiles = ["p1%", "p5%", "p25%", "p50%", "p75%", "p95%", "p99%"]
percentile_values = [int(p.replace("p", "").replace("%", "")) for p in ordered_percentiles]

dgp_types_kl = ["gbm_low_vol", "gbm_high_vol", "garch", "t_garch", "mixture_normal", "seasonal"]

### Loading the Forecasts

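We load each pickled forecast together with its runfile, which records the model name, dataset, target type, and context length of the run. Forecasts without a matching runfile, or that fail to unpickle, are skipped.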
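
As an illustration of the runfile parsing step, the sketch below reads `key = value` lines into a dictionary. The helper name `parse_runfile` and the example file contents are hypothetical; the keys are the ones this notebook reads, but the actual runfiles may contain more fields or different value formats. It uses `ast.literal_eval`, which parses Python literals without executing arbitrary code.

```python
import ast
import tempfile
from pathlib import Path

def parse_runfile(path):
    """Parse `key = value` lines into a dict (hypothetical helper)."""
    config = {}
    with open(path, "r") as f:
        for line in f:
            if "=" not in line:
                continue
            key, value = [x.strip() for x in line.split("=", 1)]
            try:
                # Numbers, quoted strings, lists, etc. become Python objects
                config[key] = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                # Anything else is kept as a plain string, minus surrounding quotes
                config[key] = value.strip("\"'")
    return config

# Made-up runfile contents, for illustration only
example = 'model_name = "toto_model"\ncontext_length = 66\ndataset_name = garch\n'

with tempfile.TemporaryDirectory() as tmp:
    run_file = Path(tmp) / "forecast_example.txt"
    run_file.write_text(example)
    config = parse_runfile(run_file)
```

In this example the quoted and unquoted string values both end up as `str`, while `context_length` is parsed as an `int`, which matters later when the tables are sorted by context length.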

In [4]:
# Load Forecasts
import ast

forecast_files = sorted(forecast_dir.glob("forecast_*.pkl"))
results = []

for forecast_file in forecast_files:
    run_name = forecast_file.stem
    run_file = run_dir / f"{run_name}.txt"
    if not run_file.exists():
        continue

    # Parse `key = value` lines from the runfile
    run_config = {}
    with open(run_file, "r") as f:
        for line in f:
            if "=" in line:
                key, value = [x.strip() for x in line.strip().split("=", 1)]
                try:
                    # literal_eval parses literals without executing arbitrary code
                    run_config[key] = ast.literal_eval(value)
                except (ValueError, SyntaxError):
                    # Fall back to the raw string, minus surrounding quotes
                    run_config[key] = value.strip("\"'")

    try:
        with open(forecast_file, "rb") as f:
            forecast_result = pickle.load(f)
            low, median, high, samples, base_price = forecast_result
    except Exception:
        continue

    results.append({
        "run_name": run_name,
        "model_name": run_config["model_name"],
        "dgp_type": run_config["dataset_name"],
        "target_type": run_config["target_type"],
        "context_length": run_config["context_length"],
        "samples": samples,
        "low": low,
        "median": median,
        "high": high,
        "base_price": base_price
    })

# Filter Results by Selected Models
results = [r for r in results if r["model_name"] in selected_model_names]

In [5]:
# Split by Target Type
price_results = [r for r in results if r["target_type"] == "prices"]
return_results = [r for r in results if r["target_type"] == "returns"]

In [6]:
print("Unique model names found in runfiles:")
for name in sorted(set(r["model_name"] for r in results)):
    print(f"'{name}'")

Unique model names found in runfiles:
'chronos_model_base'
'chronos_model_mini'
'chronos_model_tiny'
'lag_llama_model'
'moirai_model_base'
'moirai_model_base_1_1'
'moirai_model_small'
'moirai_model_small_1_1'
'moirai_model_small_2_0'
'timesfm_model_2_5'
'timesfm_model_large'
'timesfm_model_small'
'tirex_model'
'toto_model'


### Defining Functions

We define two helper functions for this notebook: one builds and saves the return summary and percentile tables, and the other computes the KL divergence against the DGP paths and saves it as a table.
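
Both helpers first bring price forecasts into return space and then work column by column on the selected days. The sketch below shows that conversion and the per-day statistics on a tiny made-up array (two sample paths, three price steps); the numbers are placeholders, not forecasts from any model. Because a return needs two consecutive prices, column 0 of the converted array is the move from the first to the second step, which appears consistent with the notebook labelling index 0 as `Day 2`.

```python
import numpy as np

# Toy price paths: rows are sample paths, columns are forecast steps (made-up values)
prices = np.array([
    [100.0, 101.0, 102.01],
    [100.0,  99.0,  99.99],
])

# Simple returns between consecutive steps; the result has one column fewer than `prices`
returns = prices[:, 1:] / prices[:, :-1] - 1

# Statistics across sample paths for a single day index, expressed in percent
day = 0
daily = returns[:, day]
mean_pct = np.mean(daily) * 100
std_pct = np.std(daily) * 100  # population std (ddof=0), the np.std default

# Percentiles across sample paths, also in percent
pcts = np.percentile(daily, [1, 50, 99]) * 100
```

Note that `np.std` defaults to the population standard deviation; with many sample paths the difference from the sample version (`ddof=1`) is negligible.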

In [7]:
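# Forecast Summary and Percentiles
def process_and_save_tables(results_subset, label):
    """Save return mean/std and percentile tables for the selected days as LaTeX."""
    summary_rows = []
    percentile_rows = []

    for item in results_subset:
        is_price = item["target_type"] == "prices"
        returns = item["samples"]
        if is_price:
            # Convert price paths to simple returns between consecutive steps
            returns = returns[:, 1:] / returns[:, :-1] - 1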

        for day in selected_days:
            daily = returns[:, day]
            summary_rows.append({
                "context_length": item["context_length"],
                "dgp_type": item["dgp_type"],
                "model_name": item["model_name"],
                "day": f"Day {day+2}",
                "mean_return (%)": np.mean(daily) * 100,
                "std_return (%)": np.std(daily) * 100
            })

            for p, v in zip(ordered_percentiles, np.percentile(daily, percentile_values)):
                percentile_rows.append({
                    "context_length": item["context_length"],
                    "dgp_type": item["dgp_type"],
                    "model_name": item["model_name"],
                    "day": f"Day {day+2}",
                    "percentile": p,
                    "return (%)": v * 100
                })

    df_summary = pd.DataFrame(summary_rows).round(2)
    df_percent = pd.DataFrame(percentile_rows).round(2)

    pivot_summary = df_summary.pivot_table(
        index=["context_length", "dgp_type", "model_name"],
        columns="day",
        values=["mean_return (%)", "std_return (%)"]
    )
    dataframe_to_latex(format_pivot_table(pivot_summary, selected_days), tables_dir / f"forecast_table_{label}.tex")

    pivot_percent = df_percent.pivot_table(
        index=["context_length", "dgp_type", "model_name"],
        columns=["day", "percentile"],
        values="return (%)"
    )
    ordered_columns = pd.MultiIndex.from_product([ordered_days, ordered_percentiles])
    pivot_percent = pivot_percent.reindex(columns=ordered_columns)
    dataframe_to_latex(format_pivot_table(pivot_percent, selected_days), tables_dir / f"percentiles_table_{label}.tex")

In [8]:
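# KL Divergence Table
def compute_and_save_kl(results_subset, label):
    """Compute KL divergence against the DGP returns per selected day, save it as LaTeX, and return the long-format table."""
    rows = []
    for item in results_subset:
        # Skip DGPs that are not part of the KL comparison
        if item["dgp_type"] not in dgp_types_kl:
            continue

        is_price = item["target_type"] == "prices"
        model_returns = item["samples"]
        if is_price:
            # Convert price paths to simple returns between consecutive steps
            model_returns = model_returns[:, 1:] / model_returns[:, :-1] - 1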

        dgp_path = datasets_dir / f"{item['dgp_type']}_returns_paths.npy"

        if not dgp_path.exists():
            continue

        dgp_returns = np.load(dgp_path)

        for day_index in selected_days:
            try:
                p = dgp_returns[:, day_index]
                q = model_returns[:, day_index]
                kl = compute_kl_divergence(p, q)
                rows.append({
                    "context_length": item["context_length"],
                    "dgp_type": item["dgp_type"],
                    "model_name": item["model_name"],
                    "day": f"Day {day_index+2}",
                    "kl_divergence": kl
                })
            except Exception:
                # Skip days where the KL computation fails (e.g. day index out of range)
                continue

    df_kl = pd.DataFrame(rows).round(4)
    filename = f"kl_divergence_table_{label}.tex"
    pivot_kl = df_kl.pivot_table(
        index=["context_length", "dgp_type", "model_name"],
        columns="day",
        values="kl_divergence"
    )
    dataframe_to_latex(format_pivot_table(pivot_kl, selected_days), tables_dir / filename)
    return df_kl

In [9]:
# Execute Table Generation
process_and_save_tables(price_results, "prices")
process_and_save_tables(return_results, "returns")

df_kl_prices = compute_and_save_kl(price_results, "prices")
df_kl_returns = compute_and_save_kl(return_results, "returns")

### Plotting

Only the KL divergence vs. context length figure is plotted here.

In [10]:
# Plot Context vs KL Divergence
plot_kl_vs_context(df_kl_prices, plots_context_dir, "prices")
plot_kl_vs_context(df_kl_returns, plots_context_dir, "returns")

### Interpretation: Context Length Effect on Forecast Performance

We assess how increasing context length (22 → 66 → 252) affects forecast quality, using KL divergence as the primary metric. Lower KL indicates better alignment between model forecasts and the true data-generating process (DGP). The analysis emphasizes well-performing models to understand which architectures benefit most from longer histories.
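
One way to make the context comparison described above explicit is to average KL over forecast days and then take the difference between the longest and shortest context for each model and DGP. The sketch below assumes a long-format table with the same columns as the `df_kl` frames returned by `compute_and_save_kl`; the model names and KL values are placeholders for illustration, not results from this notebook, and `kl_change` is a name introduced here.

```python
import pandas as pd

# Placeholder values only; these are NOT results from the notebook
df_kl = pd.DataFrame({
    "context_length": [22, 66, 252, 22, 66, 252],
    "dgp_type": ["garch"] * 6,
    "model_name": ["model_a"] * 3 + ["model_b"] * 3,
    "day": ["Day 2"] * 6,
    "kl_divergence": [0.90, 0.60, 0.30, 0.20, 0.20, 0.25],
})

# Average KL over forecast days, then lay the context lengths out as columns
by_context = (
    df_kl.groupby(["model_name", "dgp_type", "context_length"])["kl_divergence"]
    .mean()
    .unstack("context_length")
)

# Negative change means KL fell (improved) from the shortest to the longest context
shortest, longest = by_context.columns.min(), by_context.columns.max()
by_context["kl_change"] = by_context[longest] - by_context[shortest]
```

Sorting `by_context` by `kl_change` would then rank model and DGP pairs by how much they gain from additional history.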

**Returns Forecasts**

Several models clearly benefit from extended context:

- Moirai (base and small) improves steadily across most processes. For example, on the high-volatility GBM, KL divergence drops sharply with longer context. Similar gains are observed for the GARCH and mixture-normal processes, suggesting Moirai is able to extract more accurate return distributions from extended sequences.

- Toto shows consistent or improved KL divergence across all context lengths. By context 252, it delivers among the lowest KL values across nearly all DGPs, particularly for GBM and GARCH. This points to strong generalization and robustness in modeling return behavior.

- Tirex maintains stable and low KL values across contexts. While the improvement from longer context is more modest than for Moirai, the model performs reliably well even at short history lengths.

- Lag-Llama performs exceptionally at short context, with KL values already near zero. Its performance remains saturated across longer contexts, meaning it achieves high-quality return forecasts even without extended history.

- Chronos models (base, mini, tiny) show mixed patterns. KL divergence decreases slightly with longer context, especially for more complex DGPs like GARCH and seasonal, but it remains substantially higher than for the top performers. This suggests Chronos may struggle to extract full benefit from longer return histories.

**Price Forecasts**

The effect of context length on price-level forecasts is less pronounced and sometimes inconsistent:

- Moirai models tend to improve or remain stable across contexts. On the GARCH process, for example, KL divergence drops significantly with more context in the base version, indicating enhanced alignment to the true price dynamics.

- Lag-Llama shows signs of instability with longer price sequences. While excellent in the return space, it exhibits rising KL divergence for several processes, such as high-volatility GBM, where KL can spike beyond 8. This could reflect overfitting or misalignment when processing long price series.

- Toto remains generally robust, with low KL divergence across all contexts. It does, however, show occasional spikes on isolated DGPs, pointing to some sensitivity in its price-level modeling.

- Chronos and TimesFM exhibit moderate improvement or plateauing. Chronos, for instance, shows decreasing KL for low-volatility GBM with more context, but overall KL levels remain higher than those of Moirai or Toto, particularly at longer horizons.

**Conclusion**

Longer context windows benefit most models in the return space, where capturing the underlying distribution is more sensitive to history. Moirai, Toto, and Tirex all demonstrate improved KL scores with context and consistently strong performance overall. Lag-Llama achieves high-quality return modeling even at short contexts, but its price forecasts may degrade with longer sequences. Chronos and TimesFM show limited gains, suggesting potential architectural constraints in extracting information from long temporal inputs.
