# Deep State Space Models (DSSMs) -- Theoretical Foundations

## Synopsis

A **deep state space model** (DSSM) combines the principles of **state space models** (SSMs) with **deep neural networks** (DNNs) to obtain a framework for modelling sequential data.

First, the theoretical foundations are established. Then, a DSSM is implemented and evaluated for forecasting the stock price of TSLA.

**RESULTS:**

- Metrics
  - RMSE: ??.??
  - MAE:  ??.??

## DSSMs in a Nutshell

DSSMs integrate the latent state representations of SSMs into sequential predictions. The latent state of an SSM is meant to encode information about underlying hidden factors (in the context of stock market prices, these could include bull/bear phases, volatility shifts, etc.).

Like many models, DSSMs are claimed to detect hidden trends before they show up in price movements.

## (Continuous) SSMs

The concept of SSMs originates from system/control theory in engineering.

A (continuous-time) SSM is defined by four matrices:

- $A \in \mathbb R^{p \times p}$: (Latent) State matrix.
- $B \in \mathbb R^{p \times m}$: Control matrix.
- $C \in \mathbb R^{n \times p}$: Output matrix.
- $D \in \mathbb R^{n \times m}$: Feedthrough matrix (**skip connection**).

For a given input sample path $x(t) \in \mathbb R^{m}$, the SSM computes the corresponding output sample path $y(t) \in \mathbb R^{n}$ by solving

$$
\begin{align*}
\dot s(t) & = A s(t) + B x(t) \\
y (t) & = C s(t) + D x(t)
\end{align*}
$$

where $s(t) \in \mathbb R^{p}$ represents the internal (latent) **state sample path**.

![SSM in Continuous Time](./img/ssm-continuous.png)
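To make the continuous-time system concrete, the following is a minimal sketch that approximates the output path on a time grid with the forward Euler method. The function name `simulate_ssm_euler`, the step size `dt`, the zero initial state and the toy matrices are all assumptions chosen for illustration; they do not come from the model implemented later.

```python
import numpy as np

def simulate_ssm_euler(A, B, C, D, x, dt):
    """Approximate y(t) on a time grid with forward Euler, starting from s(0) = 0.

    x has shape (T, m): the input path sampled every dt time units.
    """
    s = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        ys.append(C @ s + D @ x_t)        # y(t) = C s(t) + D x(t)
        s = s + dt * (A @ s + B @ x_t)    # s(t + dt) approx. s(t) + dt * s'(t)
    return np.array(ys)

# Hypothetical toy system with p = m = n = 1: the state decays towards the input.
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[0.0]])

dt = 0.01
x = np.ones((1000, 1))                    # constant input x(t) = 1 on [0, 10)
y = simulate_ssm_euler(A, B, C, D, x, dt)
```

For this toy system the exact solution is $s(t) = 1 - e^{-t}$, so the simulated output should rise from 0 and level off close to 1.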

## DSSMs as Discretisation of a Continuous SSM

Computers cannot solve ordinary differential equations (ODEs) in continuous time directly, so the system must be discretised. This yields the recurrence

$$
\begin{align*}
s_t & = A s_{t-1} + B x_t \\
y_t & = C s_t + D x_t
\end{align*}
$$ 

**CAVEAT:** The matrices $A$, $B$, $C$, $D$ are generally not the same as those in the continuous-time system! They depend on the discretisation scheme and the step size.
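The text does not say which discretisation scheme is used. As one possible illustration, the sketch below applies the bilinear (Tustin) transform, in the convention used in the S4 line of work, to obtain discrete matrices from continuous ones. The function name `discretise_bilinear` and the parameter `step` are hypothetical. In this convention $C$ and $D$ carry over unchanged; other schemes (e.g. zero-order hold) give different results.

```python
import numpy as np

def discretise_bilinear(A, B, step):
    """Bilinear discretisation of s'(t) = A s(t) + B x(t) with step size `step`.

    Returns (A_bar, B_bar) for the recurrence s_t = A_bar s_{t-1} + B_bar x_t.
    """
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - 0.5 * step * A)
    A_bar = inv @ (I + 0.5 * step * A)
    B_bar = inv @ (step * B)
    return A_bar, B_bar

# Hypothetical scalar example: continuous A = -1, B = 1, step size 0.1.
A = np.array([[-1.0]])
B = np.array([[1.0]])
A_bar, B_bar = discretise_bilinear(A, B, step=0.1)
```

In this scalar example `A_bar` is close to, but not equal to, the exact one-step decay factor $e^{-0.1}$, which shows that the discrete matrices are approximations tied to the chosen scheme.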

DSSMs incorporate a skip connection: the input $x_t$ influences the output $y_t$ not only through the state $s_t$ but also directly via $D x_t$.

![DSSM Recurrent](./img/dssm-recurrent.png)
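The recurrent view can be sketched in a few lines of NumPy. This is only an illustration of the two discrete equations, not the trained model: the function name `dssm_recurrent`, the zero initial state, the sizes and the random matrices are assumptions.

```python
import numpy as np

def dssm_recurrent(A, B, C, D, xs):
    """Run the discrete recurrence over inputs xs of shape (T, m), starting from a zero state."""
    s = np.zeros(A.shape[0])
    ys = []
    for x_t in xs:
        s = A @ s + B @ x_t            # state update
        ys.append(C @ s + D @ x_t)     # output, including the skip connection D x_t
    return np.array(ys)

rng = np.random.default_rng(0)
p, m, n, T = 4, 2, 1, 6                # hypothetical sizes
A = 0.5 * rng.standard_normal((p, p))
B = rng.standard_normal((p, m))
C = rng.standard_normal((n, p))
D = rng.standard_normal((n, m))
xs = rng.standard_normal((T, m))

ys = dssm_recurrent(A, B, C, D, xs)    # shape (T, n)
```

Each step only needs the previous state, which is what makes this form convenient for step-by-step inference.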

## DSSMs vs. (Continuous) SSMs

### Input and Output Quantities

- A DSSM sequentially processes discrete-time vectors (similar to other recurrent neural networks (RNNs)).
- An SSM turns continuous sample paths into continuous sample paths. 

### Roles of Parameters/Variables

| Quantity | Machine Learning | Control Theory |
| --- | --- | --- |
| $m$ | determined by $x_t$ | determined by $x(t)$ | 
| $n$ | determined by $y_t$ | determined by $y(t)$ | 
| $x_t$, $x(t)$ | given (training data) | wanted |
| $y_t$, $y(t)$ | given (training data) | given |
| $p$ | hyperparameter | given |
| $A$, $B$, $C$, $D$ | wanted (training parameters) | given |

## DSSMs vs. LSTMs

Like LSTMs, DSSMs capture temporal dependencies through an internal state. However:

- DSSMs use the discretisation of a linear system of ODEs.
- LSTMs use many (non-linear) activation functions inside cells (see graphic below).

![LSTM cell](./img/lstm-cell.png)

# DSSMs -- Recurrent or Convolutional?

With a bit of algebra (unrolling the discrete recurrence from a zero initial state, $s_{-1} = 0$), the defining system of equations yields

$$
y_k = \left( C A^k B x_0 + C A^{k-1} B x_1 + \cdots + C A B x_{k-1} + C B x_k \right) + D x_k.
$$

The expression in parentheses can be written as a convolution $K * x$ with the so-called SSM kernel

$$
K = (C B, CAB, \ldots, CA^{i}B, \ldots).
$$

Thus, discretised SSMs (and hence DSSMs) can also be interpreted in terms of convolutional neural networks (CNNs).

![DSSM Convolutional](./img/dssm-convolutional.png)
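The equivalence of the two views can be checked numerically. The sketch below builds the kernel entries $C A^i B$, applies them as a causal convolution over the inputs, and compares the result with the step-by-step recurrence. Function names, sizes and the random matrices are assumptions for illustration, and the zero initial state is assumed as well.

```python
import numpy as np

def dssm_recurrent(A, B, C, D, xs):
    """Reference implementation: unroll the recurrence from a zero state."""
    s = np.zeros(A.shape[0])
    ys = []
    for x_t in xs:
        s = A @ s + B @ x_t
        ys.append(C @ s + D @ x_t)
    return np.array(ys)

def ssm_kernel(A, B, C, length):
    """Return K with K[i] = C A^i B, shape (length, n, m)."""
    K = []
    A_pow_B = B.copy()                 # holds A^i B
    for _ in range(length):
        K.append(C @ A_pow_B)
        A_pow_B = A @ A_pow_B
    return np.array(K)

def dssm_convolutional(A, B, C, D, xs):
    """Compute the outputs as a causal convolution with the SSM kernel plus the skip term."""
    T = len(xs)
    K = ssm_kernel(A, B, C, T)
    ys = []
    for k in range(T):
        conv = sum(K[i] @ xs[k - i] for i in range(k + 1))
        ys.append(conv + D @ xs[k])
    return np.array(ys)

rng = np.random.default_rng(1)
p, m, n, T = 3, 2, 2, 8                # hypothetical sizes
A = 0.5 * rng.standard_normal((p, p))
B = rng.standard_normal((p, m))
C = rng.standard_normal((n, p))
D = rng.standard_normal((n, m))
xs = rng.standard_normal((T, m))

y_rec = dssm_recurrent(A, B, C, D, xs)
y_conv = dssm_convolutional(A, B, C, D, xs)
same = np.allclose(y_rec, y_conv)      # both views should agree
```

The explicit double loop is kept for readability; for long sequences the convolution would typically be computed with an FFT instead.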

# Prediction

## State Prediction

In some cases, predicting the state can be more practical than directly predicting the output. This approach relies on state estimators such as **Kalman filters** or, more generally, **Bayesian filters**.
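As a rough illustration of state estimation, here is a minimal scalar Kalman filter. It assumes a noisy variant of the state equations without input, $s_t = a s_{t-1} + w_t$ and $y_t = c s_t + v_t$, where the noise variances `q` and `r` are assumptions that do not appear in the deterministic equations above. The function name and all default values are hypothetical.

```python
import numpy as np

def kalman_filter_1d(ys, a=1.0, c=1.0, q=0.01, r=1.0, s0=0.0, P0=1.0):
    """Scalar Kalman filter for s_t = a s_{t-1} + w_t, y_t = c s_t + v_t.

    q and r are the assumed variances of the process noise w_t and the
    observation noise v_t. Returns the filtered state means and variances.
    """
    s, P = s0, P0
    means, variances = [], []
    for y in ys:
        # Predict step: propagate mean and variance through the dynamics.
        s_pred = a * s
        P_pred = a * a * P + q
        # Update step: correct the prediction with the new observation.
        gain = P_pred * c / (c * c * P_pred + r)
        s = s_pred + gain * (y - c * s_pred)
        P = (1.0 - gain * c) * P_pred
        means.append(s)
        variances.append(P)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(0)
true_state = 5.0
observations = true_state + rng.standard_normal(200)   # noisy readings of a constant state
means, variances = kalman_filter_1d(observations)
```

The filtered means should settle near the true state while the variance shrinks from its initial value, which is the behaviour one would expect from a state estimator.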

## Forecasting

DSSMs can be used for forecasting by applying positive or negative lags to the original features or the target variable, with missing values imputed as needed.
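One way to set this up is sketched below: past values (positive lags) become features and a future value (a negative lag, here called `horizon`) becomes the target. The helper `make_lagged` and its parameters are hypothetical. Note that this sketch drops rows with incomplete lags instead of imputing them, which is a simplification of the approach described above.

```python
import numpy as np

def make_lagged(series, lags, horizon):
    """Build (X, y): row t holds series[t - lag] for each lag, the target is series[t + horizon]."""
    series = np.asarray(series, dtype=float)
    max_lag = max(lags)
    # Only keep time steps where every lag and the target are available.
    rows = range(max_lag, len(series) - horizon)
    X = np.array([[series[t - lag] for lag in lags] for t in rows])
    y = np.array([series[t + horizon] for t in rows])
    return X, y

prices = np.arange(10.0)               # placeholder series, not real price data
X, y = make_lagged(prices, lags=[0, 1, 2], horizon=1)
```

With three lags and a one-step horizon, each row of `X` contains the current and the two previous values, and `y` holds the value one step ahead.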

# References

- [Gu, Albert et al.: *Structured State Space: Combining Continuous-Time, Recurrent, and Convolutional Models* (2022)](https://hazyresearch.stanford.edu/blog/2022-01-14-s4-3)
- [Bourdois, Loïck: *Introduction to State Space Models (SSM)* (2024)](https://huggingface.co/blog/lbourdois/get-on-the-ssm-train)
- [Rangapuram, Syama Sundar et al.: *Deep State Space Models for Time Series Forecasting* (2018)](https://papers.nips.cc/paper_files/paper/2018/file/5cf68969fb67aa6082363a6d4e6468e2-Paper.pdf)
- [Turing, Janelle: *Advanced Time Series Analysis: State Space Models and Kalman Filtering* (2023)](https://janelleturing.medium.com/advanced-time-series-analysis-state-space-models-and-kalman-filtering-3b7eb7157bf2)
- [Murphy, Kevin and Linderman, Scott et al.: *State Space Models: A Modern Approach*. Chapter SSM](https://probml.github.io/ssm-book/chapters/ssm/ssm_index.html)

![dss](./img/dssm-strikethrough.png)

---