## Overview
1. This is my implementation of the Hungry Hungry Hippos (H3) model
    - [Article by authors](https://hazyresearch.stanford.edu/blog/2023-01-20-h3)
2. I will not be implementing any of the hardware-specific optimizations (no FlashConv)
3. We will be using a discrete model

![Screenshot%202023-10-31%20at%2011.47.15%20AM.png](attachment:Screenshot%202023-10-31%20at%2011.47.15%20AM.png)

## What are state space models?
State space models (SSMs) are a class of mathematical models used to describe the time evolution of systems. They have a rich history in engineering, particularly in control theory, and have found applications in various fields, including economics, biology, and more recently, machine learning.
</br></br>
In the context of state space models:
</br>
1. **State:** Represents the system's current condition or configuration. It's typically a vector, where each component captures some aspect of the system.
2. **Observation:** Represents the measurements or the data we can observe. We often don't have direct access to the state; instead, we observe some function of the state with potential noise added.
3. **Transition function:** Describes how states evolve over time or across sequential data points.
4. **Observation function:** Relates the state to the observations, typically modeling how we get our observed data from the underlying states.
</br></br>
### Mathematical Representation:
State space models can be broadly classified into linear and nonlinear models. For linear state space models, the relationship can be defined as:
</br></br>
- **State equation:** `x_{t+1} = A x_t + w_t`
    - Where `x_t` is the state at time `t`, `x_{t+1}` is the state at the next time step, `A` is the state transition matrix, and `w_t` is the process noise
- **Observation equation:** `y_t = C x_t + v_t`
    - Where `y_t` is the observation at time `t`, `C` is the observation matrix, and `v_t` is the observation (measurement) noise
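
To make the two equations concrete, here is a minimal sketch that rolls a linear state space model forward in PyTorch. The function name `simulate_linear_ssm`, its parameters, and the position/velocity example matrices are all hypothetical and chosen for illustration only; they are not part of the H3 implementation. The noise is assumed to be zero-mean Gaussian with a single standard deviation per equation.

```python
import torch


def simulate_linear_ssm(A, C, x0, num_steps, process_std=0.0, obs_std=0.0, seed=0):
    """Roll out x_{t+1} = A x_t + w_t and y_t = C x_t + v_t.

    Assumption: w_t and v_t are i.i.d. zero-mean Gaussian noise with
    standard deviations process_std and obs_std respectively.
    """
    gen = torch.Generator().manual_seed(seed)
    x = x0.clone()
    states, observations = [], []
    for _ in range(num_steps):
        # Observation equation: measure the current state through C.
        v = obs_std * torch.randn(C.shape[0], generator=gen)
        y = C @ x + v
        states.append(x)
        observations.append(y)
        # State equation: advance the state one step through A.
        w = process_std * torch.randn(A.shape[0], generator=gen)
        x = A @ x + w
    return torch.stack(states), torch.stack(observations)


# Hypothetical example: the state is (position, velocity); only position is observed.
A = torch.tensor([[1.0, 1.0],
                  [0.0, 1.0]])
C = torch.tensor([[1.0, 0.0]])
x0 = torch.tensor([0.0, 1.0])

states, observations = simulate_linear_ssm(A, C, x0, num_steps=5)
print(observations.squeeze(-1))
```

With both noise levels left at zero, the observed positions are simply 0, 1, 2, 3, 4, since the velocity stays at 1. Raising `process_std` or `obs_std` shows how the hidden state and the measurements drift apart.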
    
### Applications in Machine Learning:
1. **Kalman Filters:** This is a recursive algorithm for estimating the evolving state of a linear dynamic system from a series of noisy measurements. It's particularly popular in robotics and sensor fusion.
2. **Hidden Markov Models (HMMs):** These are used in scenarios where the system under study is assumed to be a Markov process with unobserved states. HMMs are popular in natural language processing, particularly in tasks like part-of-speech tagging and speech recognition.
3. **Particle Filters:** These are used for non-linear/non-Gaussian estimation problems. They rely on a set of particles to represent the posterior distribution of the state.
</br></br>
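
As an illustration of the first item, here is a minimal sketch of a Kalman filter for the simplest possible case: a scalar state and a scalar observation. The function name `kalman_filter_1d` and all default parameter values are assumptions made for this example. It uses the textbook predict/update recursion, with `q` and `r` standing for the variances of the process noise and the observation noise.

```python
def kalman_filter_1d(observations, a=1.0, c=1.0, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_{t+1} = a*x_t + w_t, y_t = c*x_t + v_t.

    Assumptions: w_t ~ N(0, q), v_t ~ N(0, r), and (x0, p0) is the mean and
    variance of the state estimate before the first observation arrives.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: push the previous estimate through the state equation.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update: blend the prediction with the new observation.
        k = p_pred * c / (c * p_pred * c + r)  # Kalman gain
        x = x_pred + k * (y - c * x_pred)
        p = (1.0 - k * c) * p_pred
        estimates.append(x)
    return estimates


# Hypothetical noisy readings of a quantity whose true value is about 5.
readings = [5.1, 4.8, 5.3, 4.9, 5.0, 5.2, 4.7]
print(kalman_filter_1d(readings))
```

The gain `k` controls how much each observation is trusted: a small `r` (reliable measurements) pushes `k` toward 1 so the estimate follows the data, while a large `r` keeps the estimate close to the model's prediction.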

In the machine learning context, state space models are often used for time series forecasting, system identification, and other tasks where sequential data is involved. The goal is often to learn the underlying dynamics of the system, predict future states, or infer hidden states from observations.

import torch
from torch import nn
from torch import tensor