The Kalman filter estimates time-varying factor exposures from noisy returns using a linear Gaussian state-space model. The notation used throughout is summarized below.

---

**Model Dimensions and Notation**

- $\beta_t \in \mathbb{R}^{K \times 1}$ — latent exposures to $K$ risk factors at time $t$
- $y_t \in \mathbb{R}^{1 \times 1}$ — observed fund return at time $t$
- $H_t \in \mathbb{R}^{1 \times K}$ — row vector of factor/benchmark returns at time $t$
- $T \in \mathbb{R}^{K \times K}$ — transition matrix (often $T = I_K$ for a random walk)
- $Q \in \mathbb{R}^{K \times K}$ — covariance of state (exposure) noise
- $R \in \mathbb{R}^{1 \times 1}$ — variance of observation noise
- $P_{t|s} \in \mathbb{R}^{K \times K}$ — covariance of state at $t$ given observations up to $s$
- $\hat{\beta}_{t|s} \in \mathbb{R}^{K \times 1}$ — estimate of $\beta_t$ given data up to $s$


**State Equation**  
The exposure vector evolves as a linear Gaussian process:

$$
\beta_t = T \beta_{t-1} + \eta_t, \quad \eta_t \sim \mathcal{N}(0, Q)
$$

**Observation Equation**  
The return is modeled as a noisy linear combination of the exposures:

$$
y_t = H_t \beta_t + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, R)
$$
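To make the two equations concrete, the following sketch simulates data from the model with NumPy. It assumes a random walk ($T = I_K$), a diagonal $Q$, and i.i.d. normal factor returns for $H_t$; those distributional choices and the helper name `simulate_state_space` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def simulate_state_space(n_steps, T, Q, R, factor_vol=0.01, beta0=None, seed=0):
    """Simulate beta_t = T beta_{t-1} + eta_t and y_t = H_t beta_t + eps_t.

    Factor returns H_t are drawn i.i.d. normal purely for illustration (an assumption).
    """
    rng = np.random.default_rng(seed)
    K = T.shape[0]
    beta = np.zeros(K) if beta0 is None else np.asarray(beta0, dtype=float)
    H = rng.normal(0.0, factor_vol, size=(n_steps, K))  # row t is H_t (1 x K)
    betas = np.empty((n_steps, K))
    y = np.empty(n_steps)
    for t in range(n_steps):
        eta = rng.multivariate_normal(np.zeros(K), Q)   # state noise eta_t ~ N(0, Q)
        beta = T @ beta + eta                           # state equation
        eps = rng.normal(0.0, np.sqrt(R))               # observation noise eps_t ~ N(0, R)
        y[t] = H[t] @ beta + eps                        # observation equation
        betas[t] = beta
    return H, betas, y

K = 2
H_sim, beta_sim, y_sim = simulate_state_space(
    n_steps=500, T=np.eye(K), Q=1e-4 * np.eye(K), R=1e-5, beta0=[0.6, 0.05]
)
print(H_sim.shape, beta_sim.shape, y_sim.shape)
```

Setting $Q = 0$ and $R = 0$ collapses the model to a static, noise-free regression, which is a quick sanity check on any filter built on top of it.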


**Prediction Step**

$$
\hat{\beta}_{t|t-1} = T \hat{\beta}_{t-1|t-1}
$$

$$
P_{t|t-1} = T P_{t-1|t-1} T^\top + Q
$$

These equations propagate the state estimate and uncertainty one step forward using the state dynamics.
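A minimal NumPy version of the prediction step, under the hypothetical helper name `kalman_predict`, treats $\hat{\beta}$ as a 1-D array of length $K$:

```python
import numpy as np

def kalman_predict(beta_filt, P_filt, T, Q):
    """Prediction step: propagate the state mean and covariance one period forward."""
    beta_pred = T @ beta_filt          # beta_{t|t-1} = T beta_{t-1|t-1}
    P_pred = T @ P_filt @ T.T + Q      # P_{t|t-1} = T P_{t-1|t-1} T' + Q
    return beta_pred, P_pred

# Random-walk exposures (T = I): the mean is carried forward unchanged,
# while the covariance grows by Q each period.
K = 2
beta_pred, P_pred = kalman_predict(
    np.array([0.6, 0.05]), 0.01 * np.eye(K), np.eye(K), 1e-4 * np.eye(K)
)
print(beta_pred, np.diag(P_pred))
```

With $T = I_K$ the prediction step only inflates uncertainty; all movement in $\hat{\beta}$ comes from the update step.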


**Update Step**

Residual (innovation):

$$
\tilde{y}_t = y_t - H_t \hat{\beta}_{t|t-1}
$$

Innovation covariance:

$$
S_t = H_t P_{t|t-1} H_t^\top + R
$$

Kalman gain:

$$
K_t = P_{t|t-1} H_t^\top S_t^{-1}
$$

Posterior mean:

$$
\hat{\beta}_{t|t} = \hat{\beta}_{t|t-1} + K_t \tilde{y}_t
$$

Posterior covariance:

$$
P_{t|t} = (I_K - K_t H_t) P_{t|t-1}
$$
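The update step for a scalar return can be sketched the same way. The helper `kalman_update` is hypothetical; it reshapes $H_t$ into a $1 \times K$ row so that $S_t$ is a scalar and $S_t^{-1}$ is a plain division rather than a matrix inverse.

```python
import numpy as np

def kalman_update(beta_pred, P_pred, y_t, h_t, R):
    """Update step for a scalar observation y_t = H_t beta_t + eps_t."""
    h = np.asarray(h_t, dtype=float).reshape(1, -1)      # H_t as a 1 x K row
    innovation = y_t - (h @ beta_pred).item()            # y~_t = y_t - H_t beta_{t|t-1}
    S = (h @ P_pred @ h.T).item() + R                    # S_t = H_t P H_t' + R (scalar)
    K_gain = (P_pred @ h.T).ravel() / S                  # K_t = P H_t' S_t^{-1}, shape (K,)
    beta_filt = beta_pred + K_gain * innovation          # posterior mean
    P_filt = (np.eye(len(beta_pred)) - np.outer(K_gain, h.ravel())) @ P_pred  # (I - K H) P
    return beta_filt, P_filt, innovation, S, K_gain

# One-factor hand check: innovation = 3 - 1 = 2, S = 1 + 1 = 2, gain = 1/2,
# so the posterior mean is 1 + 0.5 * 2 = 2 and the posterior variance is 0.5.
beta_f, P_f, innov, S, gain = kalman_update(
    np.array([1.0]), np.array([[1.0]]), y_t=3.0, h_t=[1.0], R=1.0
)
print(beta_f, P_f)
```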


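The Kalman filter balances model-driven prediction against observation-driven correction. The innovation $\tilde{y}_t$ measures how far the observed return deviates from the prediction, and the gain $K_t$ sets how strongly that deviation corrects the state. Because each observation is a scalar, the posterior mean moves away from the prior mean along the single direction $K_t$, by an amount proportional to $\tilde{y}_t$. The covariance receives a rank-one reduction, $P_{t|t} = P_{t|t-1} - K_t S_t K_t^\top$, so uncertainty shrinks only along the direction $P_{t|t-1} H_t^\top$ that the new observation informs.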
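The rank-one structure can be checked numerically. This sketch draws an arbitrary positive-definite prior covariance and a single factor row (random values, not fund data) and compares the covariance reduction implied by $(I_K - K_t H_t) P_{t|t-1}$ with $K_t S_t K_t^\top$.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
A = rng.normal(size=(K, K))
P_pred = A @ A.T + 0.1 * np.eye(K)   # arbitrary symmetric positive-definite prior covariance
h = rng.normal(size=(1, K))          # a single row of factor returns, H_t
R = 0.5

S = (h @ P_pred @ h.T).item() + R    # scalar innovation variance S_t
K_gain = P_pred @ h.T / S            # Kalman gain K_t, shape (K, 1)

P_filt = (np.eye(K) - K_gain @ h) @ P_pred   # (I - K H) P
reduction = P_pred - P_filt                  # uncertainty removed by this observation

# The reduction equals K S K', an outer product of one vector with itself.
print(np.allclose(reduction, S * K_gain @ K_gain.T))   # True
# Exactly one eigenvalue of the reduction is positive; the others are zero up to rounding.
eigs = np.sort(np.linalg.eigvalsh(reduction))
print(eigs)
```

In a $K$-factor model this means a single weekly return can only sharpen one linear combination of the exposures; the remaining $K - 1$ directions keep their prior uncertainty until later observations with different factor mixes arrive.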


In [1]:
import os
import sys
import argparse
import logging
import time
# import pymc as pm

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

import utils.utils as utils
from viz.viz_tools import get_sci_template, attach_line_end_labels
from filters.kalman import KalmanSpec, KalmanEngine
from viz.kalman_viz import (
    ModelDiagnosticsPlotter,
    summarize_model_diagnostics,
    summarize_factor_dynamics,
    plot_beta_grid,
    plot_factor_contributions,
)

In [2]:
bond_return_data = pd.read_csv("data/bond_factor_returns.csv", index_col=0, parse_dates=True)
yield_data = pd.read_csv("data/yield_data.csv", index_col=0, parse_dates=True)
yield_data.rename(columns={yield_data.columns[0]: "yield_level"}, inplace=True)
yield_data["yield_delta"] = yield_data["yield_level"].diff()
merged_data = bond_return_data.merge(yield_data, left_index=True, right_index=True, how="left")
data_weekly = utils.aggregate_weekly_data(merged_data)

# Drop helper columns if present; errors="ignore" skips any that are missing
data_weekly = data_weekly.drop(columns=["duration_t", "yield_level"], errors="ignore")
data_weekly.head(10)

| Date | Preferred Income (FPE) | High Yield (HYG) | Convertibles (ICVT) | Investment Grade (LQD) | Preferred Stock (PFF) | yield_delta |
|---|---|---|---|---|---|---|
| 2015-06-05 | -0.001567 | -0.003009 | 0.001821 | -0.005064 | -0.003284 | 0.1 |
| 2015-06-12 | -0.00314 | -0.002013 | -0.004442 | -0.000776 | -0.004053 | -0.02 |
| 2015-06-19 | -0.003149 | 0.004257 | 0.003853 | 0.005783 | -0.001781 | -0.13 |
| 2015-06-26 | 0.004907 | -0.007251 | -0.009292 | -0.01356 | 0.001784 | 0.23 |
| 2015-07-03 | -0.002108 | 0.001758 | -0.00673 | 0.007397 | -0.000705 | -0.09 |
| 2015-07-10 | 0.00264 | 0.000564 | -0.005132 | -0.003551 | -0.000767 | 0.12 |
| 2015-07-17 | 0.006846 | -0.001238 | 0.004746 | 0.003564 | 0.011005 | -0.08 |
| 2015-07-24 | 0.00131 | -0.016237 | -0.014992 | 0.003551 | -0.004303 | -0.07 |
| 2015-07-31 | 0.001573 | 0.008252 | -0.006048 | 0.004142 | 0.004831 | -0.07 |
| 2015-08-07 | -0.00157 | -0.009866 | -0.009115 | -0.001292 | -0.000892 | -0.02 |
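The implementation of `utils.aggregate_weekly_data` is not shown. The dates in the output are Fridays, so one plausible reading, stated here purely as an assumption, is a `W-FRI` resample that compounds daily simple returns within each week and sums daily yield changes. `aggregate_weekly` is a hypothetical stand-in; the real helper may treat returns, missing days, or yield changes differently.

```python
import numpy as np
import pandas as pd

def aggregate_weekly(daily: pd.DataFrame, return_cols, delta_cols, rule="W-FRI"):
    """Hypothetical stand-in for utils.aggregate_weekly_data.

    Compounds daily simple returns within each Friday-ending week and
    sums daily yield changes over the same week.
    """
    weekly_returns = (1.0 + daily[return_cols]).resample(rule).prod() - 1.0
    weekly_deltas = daily[delta_cols].resample(rule).sum()
    return weekly_returns.join(weekly_deltas)

# Two business weeks of synthetic data: +1% per day and +0.02 yield change per day.
dates = pd.bdate_range("2015-06-01", periods=10)   # Mon 1 Jun to Fri 12 Jun 2015
daily = pd.DataFrame({"ret": 0.01, "yield_delta": 0.02}, index=dates)
weekly = aggregate_weekly(daily, ["ret"], ["yield_delta"])
print(weekly)
```

Compounding rather than summing keeps weekly returns consistent with the daily series; for returns this small the two differ only in the fifth decimal place.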


In [3]:
# --- Define Target and Factors (PFF and ICVT used as factors) ---
target_col = "Preferred Income (FPE)"  # Example target
factor_cols = ["Preferred Stock (PFF)", "Convertibles (ICVT)"]

y = data_weekly[target_col].values.reshape(-1, 1)
H = data_weekly[factor_cols].values
index = data_weekly.index

# --- Build Kalman Spec ---
spec = (
    KalmanSpec(K=len(factor_cols), name="Simple Model")
    .estimate_initial_state(H, y)
    .set_Q_from_factor_vols(H)
    .estimate_R_from_ols(H, y)
)

# --- Run Kalman Filter ---
engine = KalmanEngine(spec)
results = engine.run(
    data_weekly,
    target_col=target_col,
    factor_cols=factor_cols,
    burn=0,
)

# --- Inspect Results ---
viz = ModelDiagnosticsPlotter(results)
fig = viz.plot()  # uses default diagnostics: residuals, log_likelihood, gain_norm
display(fig)
df_diag = summarize_model_diagnostics(results)
display(df_diag.style.format(precision=4))  # control formatting here, not in the function
display(plot_factor_contributions(results))

display(summarize_factor_dynamics(results).style.format(precision=4))

fig = plot_beta_grid(results)
display(fig)

| Model Name | Target | Date Range | # Observations | Final RMSE | Mean RMSE | Total SSE | Cumulative Log-Likelihood | Mean Gain Norm | Mean Drift Norm |
|---|---|---|---|---|---|---|---|---|---|
| Simple Model | Preferred Income (FPE) | 2015-06-05 → 2025-04-18 | 516 | 0.0069 | 0.0061 | 0.0193 | 1933.0803 | 5.4003 | 0.016 |


| Model Name | Factor | Avg Beta | Beta Std | Final Beta | Beta Z-Score | Min Beta | Max Beta | Drift Volatility | Mean Gain | Avg Contribution | Final Contribution |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple Model | Preferred Stock (PFF) | 0.5812 | 0.1337 | 0.4844 | -0.7238 | 0.344 | 0.9947 | 0.0296 | -0.525 | 0.0005 | 0.0033 |
| Simple Model | Convertibles (ICVT) | 0.0425 | 0.0714 | -0.0023 | -0.6276 | -0.2953 | 0.3426 | 0.0235 | 0.1346 | 0.0001 | -0.0 |
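`KalmanSpec.estimate_initial_state` and `estimate_R_from_ols` are project methods whose internals are not shown. One common approach, sketched here as an assumption, fits a full-sample OLS regression of $y_t$ on $H_t$ (no intercept, matching the observation equation), uses the coefficients as $\hat{\beta}_0$, their sampling covariance as $P_0$, and the residual variance as $R$. `ols_initial_state` is a hypothetical name.

```python
import numpy as np

def ols_initial_state(H, y):
    """Hypothetical OLS-based initialisation of beta_0, P_0 and R (no intercept)."""
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float).ravel()          # accepts (n,) or (n, 1) targets
    n, K = H.shape
    beta_ols, *_ = np.linalg.lstsq(H, y, rcond=None)  # full-sample static exposures
    resid = y - H @ beta_ols
    R = resid @ resid / (n - K)                     # unbiased residual variance
    P0 = R * np.linalg.inv(H.T @ H)                 # OLS sampling covariance of beta
    return beta_ols, P0, R

# Synthetic check: two factors with true exposures 0.6 and 0.05.
rng = np.random.default_rng(0)
H_demo = rng.normal(0.0, 0.01, size=(300, 2))
y_demo = H_demo @ np.array([0.6, 0.05]) + rng.normal(0.0, 0.002, size=300)
beta0, P0, R = ols_initial_state(H_demo, y_demo)
print(beta0, R)
```

Note that a full-sample OLS fit uses future data to set the starting state, which is harmless for diagnostics but introduces look-ahead if the filtered betas are later used in a backtest.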


In [None]:
# --- Define Target and Factor ---
target_col = "Preferred Income (FPE)"
factor_cols = ["Preferred Stock (PFF)", "Convertibles (ICVT)"]

y = data_weekly[target_col].values.reshape(-1, 1)
H = data_weekly[factor_cols].values
index = data_weekly.index

# --- Build Kalman Spec (MLE fits of Q and R are commented out, so the spec's default Q and R apply) ---
spec = (
    KalmanSpec(K=len(factor_cols), name="Simple Model")
    .estimate_initial_state(H, y)
    # .set_Q_from_mle(data_weekly, target_col, factor_cols)
    # .set_R_from_mle(data_weekly, target_col, factor_cols)
)

# --- Run Kalman Filter ---
engine = KalmanEngine(spec)
results = engine.run(
    data_weekly,
    target_col=target_col,
    factor_cols=factor_cols,
    burn=0
)

# --- Visualizations and Diagnostics ---
viz = ModelDiagnosticsPlotter(results)
display(viz.plot())

display(summarize_model_diagnostics(results).style.format(precision=4))
display(plot_factor_contributions(results))
display(summarize_factor_dynamics(results).style.format(precision=4))
display(plot_beta_grid(results))


| Model Name | Target | Date Range | # Observations | Final RMSE | Mean RMSE | Total SSE | Cumulative Log-Likelihood | Mean Gain Norm | Mean Drift Norm |
|---|---|---|---|---|---|---|---|---|---|
| Simple Model | Preferred Income (FPE) | 2015-06-05 → 2025-04-18 | 516 | 0.004 | 0.0073 | 0.0277 | 1842.6655 | 2.8687 | 0.0072 |


| Model Name | Factor | Avg Beta | Beta Std | Final Beta | Beta Z-Score | Min Beta | Max Beta | Drift Volatility | Mean Gain | Avg Contribution | Final Contribution |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple Model | Preferred Stock (PFF) | 0.7347 | 0.1606 | 0.7919 | 0.3561 | 0.4495 | 1.0065 | 0.0249 | -0.549 | 0.0005 | 0.0054 |
| Simple Model | Convertibles (ICVT) | 0.0214 | 0.0428 | 0.0466 | 0.5878 | -0.2953 | 0.1747 | 0.018 | -0.2083 | 0.0001 | 0.0006 |


In [5]:
spec.estimate_initial_state(H, y)

<filters.kalman.KalmanSpec at 0x242f23bea90>

In [6]:
spec.set_Q_from_mle(data_weekly, target_col, factor_cols)

<filters.kalman.KalmanSpec at 0x242f23bea90>
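`set_Q_from_mle` is a project method whose algorithm is not shown. A standard way to fit $Q$ and $R$ by maximum likelihood uses the prediction-error decomposition: each innovation is Gaussian, $\tilde{y}_t \sim \mathcal{N}(0, S_t)$, so the log-likelihood is $\sum_t -\tfrac{1}{2}\left(\log 2\pi S_t + \tilde{y}_t^2 / S_t\right)$. This sketch assumes $T = I_K$ and $Q = qI_K$, and replaces a proper optimiser (for example `scipy.optimize.minimize` over $\log q$ and $\log r$) with a coarse grid search. `kalman_loglik` and `grid_mle` are hypothetical names, and the data are simulated.

```python
import numpy as np

def kalman_loglik(H, y, q, r, beta0, P0):
    """Random-walk Kalman filter with Q = q*I and R = r; returns the Gaussian log-likelihood."""
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    K = H.shape[1]
    beta = np.asarray(beta0, dtype=float).copy()
    P = np.asarray(P0, dtype=float).copy()
    Q = q * np.eye(K)
    loglik = 0.0
    for h, y_t in zip(H, y):
        P = P + Q                                   # predict with T = I (mean unchanged)
        innov = y_t - h @ beta                      # innovation y~_t
        S = h @ P @ h + r                           # innovation variance S_t (scalar)
        loglik += -0.5 * (np.log(2 * np.pi * S) + innov**2 / S)
        gain = P @ h / S                            # Kalman gain K_t
        beta = beta + gain * innov                  # posterior mean
        P = P - np.outer(gain, h @ P)               # (I - K H) P
    return loglik

def grid_mle(H, y, beta0, P0, q_grid, r_grid):
    """Crude grid search over (q, r); a stand-in for a numerical optimiser."""
    scores = {(q, r): kalman_loglik(H, y, q, r, beta0, P0) for q in q_grid for r in r_grid}
    return max(scores, key=scores.get)

# Simulated data with true q = 1e-4 (exposure drift) and r = 1e-6 (observation noise).
rng = np.random.default_rng(42)
n, K = 400, 2
H_sim = rng.normal(0.0, 0.02, size=(n, K))
beta_path = np.array([0.6, 0.05]) + np.cumsum(rng.normal(0.0, 0.01, size=(n, K)), axis=0)
y_sim = np.einsum("tk,tk->t", H_sim, beta_path) + rng.normal(0.0, 1e-3, size=n)
best_q, best_r = grid_mle(H_sim, y_sim, [0.6, 0.05], 0.1 * np.eye(K),
                          q_grid=[1e-6, 1e-4, 1e-2], r_grid=[1e-8, 1e-6, 1e-4])
print(best_q, best_r)
```

Searching over $\log q$ and $\log r$ rather than the raw variances keeps both parameters positive and spreads the grid evenly across orders of magnitude, which matters because $Q$ and $R$ here differ by several of them.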
