The Kalman filter estimates time-varying exposures from noisy returns using a linear Gaussian state-space model. Let:

---

**Model Dimensions and Notation**

- $\beta_t \in \mathbb{R}^{K \times 1}$ — latent exposures to $K$ risk factors at time $t$
- $y_t \in \mathbb{R}^{1 \times 1}$ — observed fund return at time $t$
- $H_t \in \mathbb{R}^{1 \times K}$ — row vector of factor/benchmark returns at time $t$
- $T \in \mathbb{R}^{K \times K}$ — transition matrix (often $T = I_K$ for a random walk)
- $Q \in \mathbb{R}^{K \times K}$ — covariance of state (exposure) noise
- $R \in \mathbb{R}^{1 \times 1}$ — variance of observation noise
- $P_{t|s} \in \mathbb{R}^{K \times K}$ — covariance of state at $t$ given observations up to $s$
- $\hat{\beta}_{t|s} \in \mathbb{R}^{K \times 1}$ — estimate of $\beta_t$ given data up to $s$


**State Equation**  
The exposure vector evolves as a linear Gaussian process:

$$
\beta_t = T \beta_{t-1} + \eta_t, \quad \eta_t \sim \mathcal{N}(0, Q)
$$

**Observation Equation**  
The return is modeled as a noisy linear combination of the exposures:

$$
y_t = H_t \beta_t + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, R)
$$
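
To make the two equations concrete, the sketch below simulates exposures and returns from the model. It is a minimal illustration, not part of the project code: the function name `simulate_exposures`, the synthetic factor returns, and the values chosen for $Q$, $R$, and the initial exposures are all assumptions.

```python
import numpy as np

def simulate_exposures(factors, T, Q, R, beta0, seed=0):
    """Simulate beta_t = T beta_{t-1} + eta_t and y_t = H_t beta_t + eps_t."""
    rng = np.random.default_rng(seed)
    n_obs, K = factors.shape
    betas = np.empty((n_obs, K))
    y = np.empty(n_obs)
    beta = np.asarray(beta0, dtype=float)
    for t in range(n_obs):
        # State equation: exposures evolve with Gaussian noise of covariance Q
        beta = T @ beta + rng.multivariate_normal(np.zeros(K), Q)
        # Observation equation: scalar return = factor row times exposures + noise
        y[t] = factors[t] @ beta + rng.normal(0.0, np.sqrt(R))
        betas[t] = beta
    return betas, y

# Hypothetical inputs: two synthetic factors, random-walk exposures (T = I)
rng = np.random.default_rng(42)
factors = rng.normal(0.0, 0.02, size=(200, 2))  # rows play the role of H_t
betas, y = simulate_exposures(
    factors, T=np.eye(2), Q=1e-4 * np.eye(2), R=1e-5, beta0=[0.5, 0.1]
)
print(betas.shape, y.shape)  # (200, 2) (200,)
```

Simulated data of this kind is useful for checking that a filter recovers known exposures before it is applied to real returns.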


**Prediction Step**

$$
\hat{\beta}_{t|t-1} = T \hat{\beta}_{t-1|t-1}
$$

$$
P_{t|t-1} = T P_{t-1|t-1} T^\top + Q
$$

These equations propagate the state estimate and uncertainty one step forward using the state dynamics.
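
The prediction step can be sketched in a few lines of NumPy. The function name `predict` and the numbers below are illustrative assumptions, not the `KalmanEngine` implementation.

```python
import numpy as np

def predict(beta_post, P_post, T, Q):
    """Propagate the posterior at t-1 to the prior at t."""
    beta_prior = T @ beta_post        # mean moves through the transition matrix
    P_prior = T @ P_post @ T.T + Q    # uncertainty is transformed, then inflated by Q
    return beta_prior, P_prior

# Hypothetical two-factor example with random-walk dynamics (T = I)
beta_post = np.array([0.5, 0.1])
P_post = np.diag([0.04, 0.01])
T = np.eye(2)
Q = 1e-4 * np.eye(2)

beta_prior, P_prior = predict(beta_post, P_post, T, Q)
print(beta_prior)        # the mean is unchanged under a random walk
print(np.diag(P_prior))  # each variance grows by the matching diagonal entry of Q
```

With $T = I_K$ the prediction leaves the mean untouched, so all of the movement in the estimated exposures comes from the update step.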


**Update Step**

Residual (innovation):

$$
\tilde{y}_t = y_t - H_t \hat{\beta}_{t|t-1}
$$

Innovation covariance:

$$
S_t = H_t P_{t|t-1} H_t^\top + R
$$

Kalman gain:

$$
K_t = P_{t|t-1} H_t^\top S_t^{-1}
$$

Posterior mean:

$$
\hat{\beta}_{t|t} = \hat{\beta}_{t|t-1} + K_t \tilde{y}_t
$$

Posterior covariance:

$$
P_{t|t} = (I_K - K_t H_t) P_{t|t-1}
$$


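The Kalman filter balances model-driven prediction with observation-driven correction. The innovation $\tilde{y}_t$ measures how far the observed return deviates from its one-step-ahead prediction, and the Kalman gain $K_t$ controls how strongly that deviation corrects the estimate. Because the observation is scalar, the posterior mean shifts the prior mean along $K_t$ in proportion to the innovation, and the covariance update subtracts a rank-one term, so uncertainty shrinks only along the direction informed by the new data.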
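
The full predict/update cycle for a scalar observation can be sketched as follows. This is a minimal illustration under stated assumptions: `kalman_step` is a hypothetical helper, the data are synthetic with constant true exposures, and the values of $Q$, $R$, and the diffuse prior $P_0 = 100 \cdot I$ are chosen only for the demo.

```python
import numpy as np

def kalman_step(beta_post, P_post, y_t, H_t, T, Q, R):
    """One predict/update cycle; H_t is a 1-D array of length K, y_t a scalar."""
    # Prediction step
    beta_prior = T @ beta_post
    P_prior = T @ P_post @ T.T + Q
    # Update step
    innovation = y_t - H_t @ beta_prior   # scalar residual
    S = H_t @ P_prior @ H_t + R           # scalar innovation variance
    K = P_prior @ H_t / S                 # gain vector, shape (K,)
    beta_new = beta_prior + K * innovation
    P_new = (np.eye(len(beta_prior)) - np.outer(K, H_t)) @ P_prior
    return beta_new, P_new, innovation, S

# Synthetic data with constant true exposures (assumed values)
rng = np.random.default_rng(0)
n_obs, true_beta = 300, np.array([0.6, 0.2])
H = rng.normal(0.0, 0.02, size=(n_obs, 2))
y = H @ true_beta + rng.normal(0.0, 0.002, size=n_obs)

beta, P = np.zeros(2), 100.0 * np.eye(2)  # diffuse prior
T, Q, R = np.eye(2), 1e-6 * np.eye(2), 0.002**2
for t in range(n_obs):
    beta, P, innovation, S = kalman_step(beta, P, y[t], H[t], T, Q, R)

print(beta)        # filtered exposures after the last observation
print(np.diag(P))  # posterior variances, far below the diffuse prior
```

The ratio of $Q$ to $R$ sets how quickly the estimate adapts: a larger $Q$ lets exposures move faster but makes the path noisier.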


In [None]:
# Standard library
import argparse
import logging
import os
import sys
import time

# Third-party
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import pymc as pm
from plotly.subplots import make_subplots

# Project modules
import utils.utils as utils
from utils.viz_utils import get_sci_template, attach_line_end_labels
from filters.kalman import KalmanSpec, KalmanEngine
from post_run.kalman_explore import (
    ModelDiagnosticsPlotter,
    summarize_model_diagnostics,
    summarize_factor_dynamics,
    plot_beta_grid,
    plot_factor_contributions,
    run_kalman_grid_search,  # was also imported from filters.kalman; this later import took precedence
)


In [None]:
# import yfinance as yf
# import pandas as pd

# tickers = [
#     "FPE", "PFF", "PGX", "PSK", "PFFA", "PFXF", "VRP", "HYG", "ICVT", "LQD", 'IEF', "TLT",
# ]
# raw = yf.download(tickers, start="2012-01-01", auto_adjust=False)
# adj_close = raw["Adj Close"]
# returns = adj_close.pct_change()
# returns.to_csv("data/adj_close_returns_etfs.csv")
# print(returns.head())

# etf_names = {
#     "FPE": "First Trust Preferred Securities and Income ETF",
#     "PFF": "iShares Preferred and Income Securities ETF",
#     "PGX": "Invesco Preferred ETF",
#     "PSK": "SPDR Wells Fargo Preferred Stock ETF",
#     "PFFA": "Virtus InfraCap U.S. Preferred Stock ETF",
#     "PFXF": "VanEck Preferred Securities ex Financials ETF",
#     "VRP": "Invesco Variable Rate Preferred ETF",
#     "HYG": "iShares iBoxx $ High Yield Corporate Bond ETF",
#     "ICVT": "iShares Convertible Bond ETF",
#     "LQD": "iShares iBoxx $ Investment Grade Corporate Bond ETF",
#     "IEF": "iShares 7-10 Year Treasury Bond ETF",
#     "TLT": "iShares 20+ Year Treasury Bond ETF"
# }



In [None]:
bond_return_data = pd.read_csv("data/bond_factor_returns.csv", index_col=0, parse_dates=True)
bond_return_data = bond_return_data.loc["2016-01-01":"2023-01-01"]
#bond_return_data = bond_return_data.drop(columns=['PFFA'])

yield_data = pd.read_csv("data/daily-treasury-rates.csv", index_col=0, parse_dates=True).sort_index()
yield_delta = yield_data.diff().drop(columns=['2 Mo', '4 Mo'])
yield_delta.columns = [f"{col}_delta" for col in yield_delta.columns]
yield_delta = yield_delta.dropna()
merged_data = bond_return_data.merge(yield_delta, left_index=True, right_index=True, how="left")
data_weekly = utils.aggregate_weekly_data(merged_data, additive_cols=yield_delta.columns)
# Optional rescaling to basis points (currently disabled): returns are decimals (x 1e4);
# yield deltas are assumed to be in percentage points (x 100).
yield_delta_cols = yield_delta.columns
cols_to_convert = [col for col in data_weekly.columns if col not in yield_delta_cols]
# data_weekly[cols_to_convert] = data_weekly[cols_to_convert] * 1e4
# data_weekly[yield_delta_cols] = data_weekly[yield_delta_cols] * 100

data_weekly = data_weekly.sort_values(by="Date")
data_weekly

Date,Preferred Income (FPE),High Yield (HYG),Convertibles (ICVT),Investment Grade (LQD),Preferred Stock (PFF),1 Mo_delta,3 Mo_delta,6 Mo_delta,1 Yr_delta,2 Yr_delta,3 Yr_delta,5 Yr_delta,7 Yr_delta,10 Yr_delta,20 Yr_delta,30 Yr_delta
2016-01-08,-0.001056,-0.013155,-0.038062,0.005526,-0.000258,0.06,0.04,-0.04,-0.01,-0.12,-0.11,-0.19,-0.18,-0.14,-0.12,-0.10
2016-01-15,-0.004226,-0.020245,-0.052727,-0.004361,-0.015963,-0.01,0.04,-0.08,-0.15,-0.09,-0.12,-0.11,-0.12,-0.10,-0.11,-0.10
2016-01-22,-0.004227,0.011808,-0.007917,-0.005257,-0.000262,0.07,0.07,0.04,-0.02,0.03,0.03,0.03,0.02,0.04,0.02,0.02
2016-01-29,0.005362,0.005836,-0.006046,0.005373,0.006804,-0.04,0.02,0.02,0.00,-0.12,-0.14,-0.16,-0.14,-0.13,-0.10,-0.08
2016-02-05,-0.010133,-0.018999,-0.002559,-0.003541,-0.010837,0.01,-0.03,0.02,0.08,-0.02,-0.06,-0.08,-0.09,-0.08,-0.09,-0.07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-12-02,0.005297,0.011598,0.014257,0.018562,0.016451,-0.25,-0.07,-0.02,-0.07,-0.14,-0.21,-0.18,-0.17,-0.17,-0.18,-0.18
2022-12-09,-0.004684,-0.006882,-0.017483,-0.005118,-0.030979,-0.10,-0.03,0.07,0.03,0.05,0.08,0.08,0.08,0.06,0.03,0.00
2022-12-16,0.005294,-0.000634,-0.003058,0.004876,-0.001598,0.13,0.00,-0.04,-0.11,-0.16,-0.16,-0.14,-0.11,-0.09,-0.09,-0.03
2022-12-23,-0.006263,-0.002815,-0.010098,-0.018339,-0.015690,-0.14,0.03,-0.01,0.05,0.14,0.18,0.25,0.25,0.27,0.26,0.29
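
The body of `utils.aggregate_weekly_data` is not shown in this notebook. One plausible reading of its `additive_cols` argument is that return columns are compounded within each week while yield changes are summed. The sketch below illustrates that convention; `weekly_aggregate`, the `W-FRI` rule, and the toy data are all assumptions and may differ from the actual helper.

```python
import numpy as np
import pandas as pd

def weekly_aggregate(daily, additive_cols, rule="W-FRI"):
    """Compound return columns and sum additive columns over weekly bins."""
    additive_cols = list(additive_cols)
    return_cols = [c for c in daily.columns if c not in additive_cols]
    # Simple returns compound within the week: prod(1 + r) - 1
    weekly_returns = (1.0 + daily[return_cols]).resample(rule).prod() - 1.0
    # Yield changes add up: the weekly change is the sum of daily changes
    weekly_deltas = daily[additive_cols].resample(rule).sum()
    return pd.concat([weekly_returns, weekly_deltas], axis=1)

# Toy data: two full business weeks with constant daily values
idx = pd.bdate_range("2016-01-04", periods=10)
daily = pd.DataFrame({"fund": 0.01, "10 Yr_delta": 0.02}, index=idx)

weekly = weekly_aggregate(daily, additive_cols=["10 Yr_delta"])
print(weekly)
```

Summing returns instead of compounding them would be a close approximation for small daily moves, but the two conventions diverge in volatile weeks.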


In [None]:
# --- Single-model run: build the spec, run the filter, inspect results ---

# --- Define Target and Factor (PFF now used as factor) ---
target_col = "Preferred Income (FPE)"  # Example target
factor_cols = ['Preferred Stock (PFF)', 'Convertibles (ICVT)']  # PFF as factor

y = data_weekly[target_col]
H = data_weekly[factor_cols]
index = data_weekly.index

# --- Build Kalman Spec ---
spec = (
    KalmanSpec(K=len(factor_cols), name='Test Model')
    #.set_initial_state_from_ols(H, y)
    #.set_Q_from_factor_vols(H)
    #.set_intercept()

    #.set_Q_from_rolling_beta_var(df=data_weekly, target_col=target_col, factor_cols=factor_cols, window=12)
    #.set_Q_from_rolling_residual_vol(df=data_weekly, target_col=target_col, factor_cols=factor_cols, window=12)
    #.set_R_from_ols(H=H, y=y)
    .set_R_from_rolling_ols_residuals(df=data_weekly, target_col=target_col, factor_cols=factor_cols, window=12)
    #.set_R_from_rolling_factor_vols(H, window=12, )
)

# --- Run Kalman Filter ---
engine = KalmanEngine(spec)
results = engine.run(
    data_weekly,
    target_col=target_col,
    factor_cols=factor_cols,
    burn=8,
)

viz = ModelDiagnosticsPlotter(results)
fig = viz.plot(include=None)  # uses default diagnostics: residuals, log_likelihood, gain_norm
display(fig)
df_factors = summarize_model_diagnostics(results)
display(df_factors.style.format(precision=4))  # control formatting here, not in the function
#display(plot_factor_contributions(results, ))

display(summarize_factor_dynamics(results).style.format(precision=4))

fig = plot_beta_grid(results)
display(fig)

,Model Name,Target,Date Range,# Observations,Final RMSE,Mean RMSE,Total SSE,Cumulative Log-Likelihood,Mean Gain Norm,Mean Drift Norm
0,Test Model,Preferred Income (FPE),2016-03-04 → 2022-12-30,357,0.0063,0.0063,0.0141,1256.1453,5.0406,0.0156


,Model Name,Target,Factor,Avg Beta,Beta Std,Final Beta,Beta Z-Score,Min Beta,Max Beta,Drift Volatility,Mean Gain,Avg Contribution,Final Contribution
0,Test Model,Preferred Income (FPE),Preferred Stock (PFF),0.5277,0.159,0.4767,-0.3203,0.2619,0.8688,0.0235,0.1621,0.0005,-0.0033
1,Test Model,Preferred Income (FPE),Convertibles (ICVT),0.0556,0.0748,0.0397,-0.2124,-0.0791,0.341,0.0166,0.4893,0.0003,-0.0001
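
The summary above reports an RMSE and a cumulative log-likelihood. How `summarize_model_diagnostics` computes them is not shown here. Under the model's Gaussian assumptions, the standard definitions use the innovations $\tilde{y}_t$ and their variances $S_t$, as in this sketch. The function names and sample numbers are illustrative assumptions.

```python
import numpy as np

def gaussian_innovation_loglik(innovations, innovation_vars):
    """Sum of log N(innovation_t; 0, S_t) over all time steps."""
    v = np.asarray(innovations, dtype=float)
    S = np.asarray(innovation_vars, dtype=float)
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * S) + v**2 / S)))

def rmse(innovations):
    """Root mean squared one-step-ahead prediction error."""
    v = np.asarray(innovations, dtype=float)
    return float(np.sqrt(np.mean(v**2)))

# Hypothetical innovations and innovation variances
innov = np.array([0.004, -0.006, 0.002])
S = np.full(3, 0.0063**2)
print(gaussian_innovation_loglik(innov, S), rmse(innov))
```

Because the log-likelihood penalizes both large residuals and overstated variances, it is a natural criterion for comparing different choices of $Q$ and $R$ on the same data.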


In [None]:
grid_results = run_kalman_grid_search(
    df=data_weekly,
    target_col=target_col,
    factor_cols=factor_cols,
    base_spec=KalmanSpec(K=len(factor_cols)),
    burn=24
)

In [None]:
def hex_to_rgba(hex_color: str, alpha: float = 0.15) -> str:
    """
    Converts Plotly hex color (e.g. "#636EFA") to rgba string with transparency.
    """
    hex_color = hex_color.lstrip('#')
    r, g, b = tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"

class FundRunner:
    def __init__(self, df, fund_col, factor_cols, burn=12):
        self.df = df
        self.fund_col = fund_col
        self.factor_cols = factor_cols
        self.burn = burn

    def run(self, spec: KalmanSpec, label: str = "default") -> dict:
        engine = KalmanEngine(spec)
        results = engine.run(
            df=self.df,
            target_col=self.fund_col,
            factor_cols=self.factor_cols,
            burn=self.burn
        )
        results["meta"]["Fund"] = self.fund_col
        results["meta"]["Spec Label"] = label
        return results


class ModelComparisonEngine:
    def __init__(self, df, fund_cols, factor_cols, burn=12):
        self.df = df
        self.fund_cols = fund_cols
        self.factor_cols = factor_cols
        self.burn = burn
        self.results = {}

    def run_all(self, q_setters=None, r_setters=None, init_setters=None):
        for fund in self.fund_cols:
            print(f"Running Kalman filter for {fund}...")
            spec = KalmanSpec(K=len(self.factor_cols)).set_intercept()
            fund_runner = FundRunner(self.df, fund, self.factor_cols, self.burn)
            grid = run_kalman_grid_search(
                df=self.df,
                target_col=fund,
                factor_cols=self.factor_cols,
                base_spec=spec,
                q_setters=q_setters,
                r_setters=r_setters,
                init_setters=init_setters,
                burn=self.burn
            )
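            # Assumes the grid summary is sorted best-first, so row 0 is the winner.
            best_row = grid.iloc[0]
            # NOTE: this rebuilds a default spec and copies only the winning name;
            # the winning Q/R/init setters are not re-applied before the re-run.
            best_spec = KalmanSpec(K=len(self.factor_cols)).set_intercept()
            best_spec.name = best_row["Model Name"]
            results = fund_runner.run(best_spec, label=best_row["Model Name"])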
            self.results[fund] = {
                "grid_summary": grid,
                "best_row": best_row,
                "results": results
            }

    def get_best_specs(self):
        return {fund: result["best_row"] for fund, result in self.results.items()}

    def get_all_summaries(self):
        summary_frames = []
        for fund, result in self.results.items():
            summary = result["grid_summary"].copy()
            summary["Fund"] = fund
            summary_frames.append(summary)
        return pd.concat(summary_frames, ignore_index=True)

    def compare_exposures(self):
        comparison_frames = []
        for fund, result in self.results.items():
            summary = summarize_factor_dynamics(result["results"])
            summary["Fund"] = fund
            comparison_frames.append(summary)
        return pd.concat(comparison_frames, ignore_index=True)

    def get_results(self, fund: str) -> dict | None:
        return self.results.get(fund, {}).get("results")

    def stack_betas(self) -> pd.DataFrame:
        """Stack all beta paths across funds for overlaid plotting."""
        betas = []
        for fund, result in self.results.items():
            df_beta = result["results"]["beta"].copy()
            df_beta.columns = [f"{col} ({fund})" for col in df_beta.columns]
            betas.append(df_beta)
        return pd.concat(betas, axis=1) if betas else pd.DataFrame()

    def plot_all_stacked_betas(
        self,
        show_std: bool = True,
        n_cols: int = 2,
        factors: list[str] | None = None
    ) -> go.Figure:
        """
        Grid of subplots for each factor, showing all fund beta paths.
        Colors are consistent across subplots per fund.
        Ribbons are legend-grouped to their lines and reflect posterior uncertainty (±1σ).
        """
        if not self.results:
            raise ValueError("No results to plot. Run `.run_all()` first.")

        selected_factors = factors if factors is not None else self.factor_cols
        n_factors = len(selected_factors)
        cols = min(n_cols, n_factors)
        rows = int(np.ceil(n_factors / cols))

        fig = make_subplots(
            rows=rows,
            cols=cols,
            subplot_titles=selected_factors,
            shared_xaxes=False,
            shared_yaxes=False,
            vertical_spacing=0.1,
            horizontal_spacing=0.05
        )

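        fund_names = list(self.results.keys())
        palette = px.colors.qualitative.Plotly
        # Cycle through the palette so more funds than colors cannot cause a KeyError.
        color_map = {fund: palette[i % len(palette)] for i, fund in enumerate(fund_names)}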

        line_end_map = {}

        for i, factor in enumerate(selected_factors):
            row = i // cols + 1
            col = i % cols + 1
            trace_names = []

            for fund, result in self.results.items():
                beta_df = result["results"]["beta"]
                beta_cov = result["results"]["beta_cov"]

                if factor not in beta_df.columns:
                    continue

                series = beta_df[factor]
                index = beta_df.index
                factor_idx = beta_df.columns.get_loc(factor)

                std_series = pd.Series(
                    [np.sqrt(P[factor_idx, factor_idx]) for P in beta_cov.values()],
                    index=series.index
                )
                upper = series + std_series
                lower = series - std_series

                # Line trace
                trace = go.Scatter(
                    x=index,
                    y=series,
                    mode="lines",
                    name=fund,
                    legendgroup=fund,
                    showlegend=(i == 0),
                    line=dict(width=2, color=color_map[fund]),
                    hovertemplate="%{x|%Y-%m-%d}<br>Beta: %{y:.3f}<extra>" + fund + "</extra>"
                )
                fig.add_trace(trace, row=row, col=col)
                trace_names.append(fund)

                # Uncertainty ribbon
                if show_std:
                    band = go.Scatter(
                        x=list(index) + list(index[::-1]),
                        y=list(upper) + list(lower[::-1]),
                        fill="toself",
                        fillcolor=hex_to_rgba(color_map[fund], alpha=0.15),
                        line=dict(color="rgba(255,255,255,0)"),
                        hoverinfo="skip",
                        showlegend=False,
                        legendgroup=fund
                    )
                    fig.add_trace(band, row=row, col=col)

            line_end_map[f"subplot_{row}_{col}"] = trace_names

        fig.update_layout(
            height=300 * rows,
            width=1100,
            title_text="Kalman Filter Beta Paths per Factor",
            template="sci_template",
            showlegend=True,
            margin=dict(t=60, b=40)
        )

        attach_line_end_labels(
            fig,
            trace_names=sum(line_end_map.values(), []),
            font_size=12,
            text_anchor="middle left"
        )

        return fig

In [None]:
engine = ModelComparisonEngine(
    df=data_weekly,
    fund_cols=['Preferred Income (FPE)', 'High Yield (HYG)'],
    factor_cols=['Preferred Stock (PFF)', 'Convertibles (ICVT)'],
    burn=12,
)
engine.run_all()  # optional: pass your own q_setters/r_setters

all_results = engine.get_all_summaries()
exposure_df = engine.compare_exposures()

Running Kalman filter for Preferred Income (FPE)...
Running Kalman filter for High Yield (HYG)...


In [None]:
fig = engine.plot_all_stacked_betas(show_std=False)
fig

In [None]:
spec.describe(target_col=target_col, factor_cols=factor_cols)

{'Model Name': 'Test Model',
 'K (State Dim)': 1,
 'Has Intercept': False,
 'Target Column': 'Preferred Income (FPE)',
 'Factor Columns': ['Preferred Stock (PFF)'],
 'Initial Beta': [0.0],
 'Initial Covariance (P_0 diag)': [100.0],
 'Q Type': 'dynamic',
 'R Type': 'dynamic',
 'Q Scalar': None,
 'R Scalar': None,
 'Q Mode': None,
 'R Mode': 'rolling_ols_window_12',
 'Observation Function': "<class 'function'>",
 'Transition Function': "<class 'function'>"}