```{contents}
```
## State Space Models (SSM)

State Space Models are a class of sequence models that represent a system using **hidden internal states** that evolve over time and generate observable outputs.
They model how the **present arises from the past through an evolving internal memory**.

---

### **Core Intuition**

Think of the system as having a **memory state** that summarizes everything important about the past.

At each time step:

> **Previous state + new input → updated state → output**

Instead of remembering the full history (as in attention), the model carries forward a **compressed dynamic state**.
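A minimal illustration of this idea, using our own toy example rather than any particular model: an exponential moving average is a one-dimensional SSM whose entire state is a single number. The smoothing factor `alpha` is a hypothetical fixed value, not a learned parameter.

```python
def ema_ssm(xs, alpha=0.9):
    """Toy 1-D SSM: s_t = alpha * s_{t-1} + (1 - alpha) * x_t, with output y_t = s_t."""
    s = 0.0  # the whole "memory" of the past is this one number
    ys = []
    for x in xs:
        s = alpha * s + (1 - alpha) * x  # previous state + new input -> updated state
        ys.append(s)                     # output is read directly from the state
    return ys

print(ema_ssm([1.0, 1.0, 1.0, 0.0], alpha=0.5))  # [0.5, 0.75, 0.875, 0.4375]
```

After the input drops to zero, the output decays instead of vanishing: the state still holds a compressed trace of the earlier inputs.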

---

### **Mathematical Formulation**

For time step $t$:

$$
\text{State update: } s_t = A s_{t-1} + B x_t
$$

$$
\text{Output: } y_t = C s_t + D x_t
$$

Where:

* $s_t$ = hidden state
* $x_t$ = input
* $y_t$ = output
* $A$ (state transition), $B$ (input), $C$ (output), $D$ (direct input-to-output feedthrough) = learned parameter matrices

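When $A, B, C, D$ are fixed across time steps, this defines a **linear time-invariant dynamical system**. Neural SSMs keep this linear recurrence inside each layer and add nonlinear activations between stacked layers.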
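The two equations translate directly into a loop. This is a minimal pure-Python sketch with a hypothetical 2-dimensional state and scalar input and output; practical implementations use tensor libraries and much larger learned matrices.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product M @ v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(u: Vector, v: Vector) -> Vector:
    """Element-wise vector sum u + v."""
    return [a + b for a, b in zip(u, v)]


def ssm_scan(A: Matrix, B: Matrix, C: Matrix, D: Matrix,
             xs: List[Vector], s0: Vector) -> List[Vector]:
    """Apply s_t = A s_{t-1} + B x_t and y_t = C s_t + D x_t to each input in order."""
    s = s0
    ys = []
    for x in xs:
        s = vadd(matvec(A, s), matvec(B, x))         # state update
        ys.append(vadd(matvec(C, s), matvec(D, x)))  # output
    return ys


# Hypothetical example: 2-dimensional state, scalar input and output.
A = [[0.5, 0.0], [0.0, 0.25]]  # two independently decaying state components
B = [[1.0], [1.0]]
C = [[1.0, 1.0]]
D = [[0.0]]
impulse = [[1.0], [0.0], [0.0]]  # one spike, then silence
print(ssm_scan(A, B, C, D, impulse, s0=[0.0, 0.0]))  # [[2.0], [0.75], [0.3125]]
```

Because this $A$ is diagonal, its entries (0.5 and 0.25) are its eigenvalues and set how quickly each state component forgets the spike; values with magnitude closer to 1 give longer memory.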

---

### **Why SSMs Matter in GenAI**

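Transformers rely on **attention over the entire context**, which is powerful but expensive: its compute grows quadratically with sequence length.
SSMs replace attention with a **recurrent state update**, achieving:

* Long-context memory carried in the state
* Compute that grows linearly with sequence length
* Constant memory and compute per generated token at inference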
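To make the linear-versus-quadratic scaling concrete, here is a rough multiply-add count under simplifying assumptions of our own: attention cost is taken as building the n × n score matrix, the SSM is assumed to use a diagonal state of size `STATE` per channel, and projection and MLP layers (linear in n for both architectures) are ignored. `D_MODEL` and `STATE` are hypothetical sizes.

```python
def attention_score_flops(n: int, d: int) -> int:
    """Rough multiply-adds to form the n x n attention score matrix Q K^T (width d)."""
    return n * n * d


def diagonal_ssm_flops(n: int, d: int, state_size: int) -> int:
    """Rough multiply-adds for n recurrent steps over d channels, each with a diagonal state of size state_size."""
    return n * d * state_size


D_MODEL = 1024  # hypothetical model width
STATE = 16      # hypothetical per-channel state size

for n in (1_000, 10_000, 100_000):
    ratio = attention_score_flops(n, D_MODEL) / diagonal_ssm_flops(n, D_MODEL, STATE)
    print(f"n={n:>7}: attention/SSM cost ratio = {ratio:.1f}")  # 62.5, 625.0, 6250.0
```

Under this crude model the ratio is simply n / `STATE`, so the SSM's advantage grows in proportion to sequence length.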

---

### **Modern Neural SSMs**

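Recent models such as **S4** and **Mamba**, along with closely related recurrent architectures such as **RetNet**, bring SSM-style layers into deep learning. Mamba in particular reports language-modeling quality competitive with Transformers of similar size.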
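What distinguishes Mamba from S4 is *selectivity*: S4 uses the same parameters at every time step, while Mamba computes some of them, including the step size Δ, from the current input. The sketch below is a heavily simplified single-channel illustration of that idea, not Mamba's actual implementation: `a`, `w_delta`, `b`, and `c` are hypothetical fixed numbers standing in for learned projections, and the input term uses a simple Δ·b approximation.

```python
import math


def softplus(z: float) -> float:
    """Smooth, always-positive function, used here to turn a score into a step size."""
    return math.log1p(math.exp(z))


def selective_scan(xs, a=-1.0, w_delta=2.0, b=1.0, c=1.0):
    """Toy scalar selective SSM: the step size delta_t depends on the input x_t."""
    s = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size (hypothetical projection)
        a_bar = math.exp(delta * a)    # decay factor in (0, 1) because a < 0
        s = a_bar * s + delta * b * x  # large delta: overwrite state with x; small delta: keep state
        ys.append(c * s)
    return ys


print(selective_scan([5.0, -5.0, -5.0, 3.0]))
```

With these values the state jumps to about 50 on the first input, stays nearly constant through the two negative inputs (which produce a tiny Δ), and is then largely overwritten by the last input (ending near 18). A time-invariant SSM like the one in the formulation above cannot switch between holding and overwriting in this input-dependent way.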

---

### **Architecture Overview**

```
           Input x₁   Input x₂   Input x₃
               ↓          ↓          ↓
State s₀ → State s₁ → State s₂ → State s₃ → ...
               ↓          ↓          ↓
           Output y₁  Output y₂  Output y₃
```

Only the **current state** is carried forward.
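Because only the state is carried forward, a sequence can be processed in chunks: hand the final state of one chunk to the next, and the outputs match a single pass. A minimal scalar sketch, with hypothetical parameter values:

```python
from typing import List, Tuple


def run_chunk(xs: List[float], s: float, a: float = 0.5, b: float = 1.0,
              c: float = 2.0, d: float = 0.0) -> Tuple[List[float], float]:
    """Process one chunk starting from state s; return its outputs and the final state."""
    ys = []
    for x in xs:
        s = a * s + b * x
        ys.append(c * s + d * x)
    return ys, s


xs = [1.0, 2.0, 3.0, 4.0]

# One pass over the whole sequence.
full, _ = run_chunk(xs, s=0.0)

# Two passes: only the scalar state is handed from the first chunk to the second.
first, state = run_chunk(xs[:2], s=0.0)
second, _ = run_chunk(xs[2:], s=state)

assert first + second == full
print(full)  # [2.0, 5.0, 8.5, 12.25]
```

This is what makes SSMs streaming-friendly: the memory needed to continue does not depend on how much input has already been seen.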

---

### **Training**

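SSMs are trained end-to-end with backpropagation, like other neural networks. Because the recurrence is linear, training does not have to step through the sequence one token at a time: S4 evaluates it as one long convolution, while Mamba, whose parameters change with the input, uses a parallel scan.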
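For fixed $A, B, C, D$ and $s_0 = 0$, unrolling the recurrence gives $y_t = \sum_{k=0}^{t-1} C A^k B\, x_{t-k} + D x_t$, a causal convolution of the inputs with the kernel $K_k = C A^k B$. The sketch below checks this equivalence for a scalar SSM with hypothetical parameter values. The equivalence only holds when the parameters are the same at every step; the direct O(n²) sum is written for clarity, whereas convolution-based implementations typically use FFTs.

```python
def recurrent(xs, a, b, c, d):
    """Sequential form: s_t = a*s_{t-1} + b*x_t, y_t = c*s_t + d*x_t, starting from s = 0."""
    s, ys = 0.0, []
    for x in xs:
        s = a * s + b * x
        ys.append(c * s + d * x)
    return ys


def convolutional(xs, a, b, c, d):
    """Same outputs via a causal convolution with kernel K_k = c * a**k * b (0-indexed)."""
    n = len(xs)
    kernel = [c * (a ** k) * b for k in range(n)]  # depends only on the parameters, not on xs
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) + d * xs[t]
            for t in range(n)]


xs = [1.0, -2.0, 0.5, 3.0]
params = dict(a=0.5, b=1.0, c=2.0, d=0.25)  # hypothetical values
r = recurrent(xs, **params)
v = convolutional(xs, **params)
print(r)  # [2.25, -3.5, -0.375, 6.5]
assert all(abs(p - q) < 1e-12 for p, q in zip(r, v))
```

Since the kernel is computed once from the parameters, every output position can be evaluated independently, which is what allows parallel training.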

---

### **Advantages**

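| Property                | Benefit                                                          |
| ----------------------- | ---------------------------------------------------------------- |
| Linear time & memory    | Scales to long sequences                                         |
| Long-range dependencies | Information persists in the state without a quadratic attention cost |
| Streaming friendly      | Processes one input at a time with a fixed-size state            |
| Hardware efficient      | Low-latency inference: constant work per generated token         |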

---

### **Limitations**

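| Issue                  | Explanation                                                                                         |
| ---------------------- | --------------------------------------------------------------------------------------------------- |
| Weaker exact recall    | A fixed-size state must compress the past, so retrieving or copying specific earlier tokens is harder than with attention |
| Limited expressiveness | Can lag behind attention on some complex reasoning tasks                                            |
| Less mature ecosystem  | Fewer tools, optimized kernels, and pretrained models than for Transformers                         |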
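One source of these limitations is the fixed state size. In a linear SSM the state is a linear function of the inputs seen so far, so once a sequence contains more scalar inputs than the state has dimensions, some different histories necessarily map to the same state, and every later output is then identical for both. A toy scalar example with hypothetical values a = 0.5, b = 1:

```python
def final_state(xs, a=0.5, b=1.0):
    """State of the scalar SSM s_t = a * s_{t-1} + b * x_t after consuming xs, starting from 0."""
    s = 0.0
    for x in xs:
        s = a * s + b * x
    return s


history_1 = [1.0, 0.0]  # state: 1.0, then 0.5
history_2 = [0.0, 0.5]  # state: 0.0, then 0.5
print(final_state(history_1), final_state(history_2))  # 0.5 0.5
```

From this point on the model cannot tell the two histories apart, whereas attention could still look back at each individual token. Real SSMs have much larger states, but that size stays fixed while the context keeps growing.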

---

### **Applications**

#### Natural Language Processing

* Long-context language models
* Document modeling
* Speech recognition

#### Time-Series Analysis

* Financial forecasting
* Sensor data modeling
* Weather prediction

#### Audio & Signal Processing

* Speech synthesis
* Music modeling
* Radar and control systems

#### Control & Robotics

* System dynamics modeling
* Reinforcement learning environments

---

### **SSM vs Transformer**

| Feature                  | SSM                         | Transformer                    |
| ------------------------ | --------------------------- | ------------------------------ |
| Memory of the past       | Compressed fixed-size state | Full context (KV cache)        |
| Compute over n tokens    | O(n)                        | O(n²)                          |
| Long sequences           | Efficient                   | Expensive                      |
| Per-token inference cost | Constant                    | Grows with context length      |
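The inference-cost difference in the table can be made concrete by counting the numbers each model keeps between generation steps. This is a rough sketch under assumptions of our own: standard multi-head attention caching one key and one value vector per layer per past token (ignoring grouped-query attention and similar savings), an SSM with a diagonal state of size `STATE` per channel (ignoring small extras such as convolution buffers), and hypothetical model sizes.

```python
def kv_cache_floats(context_len: int, n_layers: int, d_model: int) -> int:
    """Numbers a Transformer caches: a key and a value vector per layer for every past token."""
    return 2 * n_layers * context_len * d_model


def ssm_state_floats(n_layers: int, d_model: int, state_size: int) -> int:
    """Numbers an SSM carries: one fixed-size state per layer, independent of context length."""
    return n_layers * d_model * state_size


N_LAYERS, D_MODEL, STATE = 24, 1024, 16  # hypothetical model sizes

for context_len in (1_000, 10_000, 100_000):
    kv = kv_cache_floats(context_len, N_LAYERS, D_MODEL)
    ssm = ssm_state_floats(N_LAYERS, D_MODEL, STATE)
    print(f"context={context_len:>7}: KV cache={kv:>13,}  SSM state={ssm:,}")
```

Under these assumptions the KV cache grows tenfold with each tenfold increase in context, from 49,152,000 numbers at 1,000 tokens, while the SSM state stays at 393,216 numbers.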

---

### **When to Use SSMs**

* Ultra-long sequences
* Real-time streaming
* Low-latency environments
* Resource-constrained devices

