Statistical filtering refers to an algorithm for extracting a latent state variable from noisy data using a statistical model. Formally,
\begin{align}
&\text{Observation equation}:y_t=f(x_t,\epsilon_t^y)\\
&\text{State evolution}:x_{t+1}=g(x_t,\epsilon_{t+1}^x)
\end{align}
These equations imply the distributions $p(y_t|x_t)$ and $p(x_{t+1}|x_t)$, both of which typically depend on static parameters, $\Theta$; for the purpose of filtering, $\Theta$ is treated as known. The posterior distribution of $x_t$ given the observed data $y_{1:t}$, namely $p(x_t|y_{1:t})$, solves the filtering problem. More specifically, from the Bayesian perspective:
- The **prediction step** yields $p(x_{t+1}|y_{1:t})$, which can be viewed as the **prior** for $x_{t+1}$;
- the **update step** combines this prior with $p(y_{t+1}|x_{t+1})$, which can be viewed as the **likelihood**, to obtain the posterior $p(x_{t+1}|y_{1:t+1})$.
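
Written out, the two steps form the standard Bayesian filtering recursion. The display below is a sketch that assumes the usual state-space conditional independence: given $x_{t+1}$, the observation $y_{t+1}$ is independent of $y_{1:t}$, and given $x_t$, the state $x_{t+1}$ is independent of $y_{1:t}$. The transition density $p(x_{t+1}|x_t)$ is the one implied by the state evolution equation.
\begin{align}
p(x_{t+1}|y_{1:t})&=\int p(x_{t+1}|x_t)\,p(x_t|y_{1:t})\,dx_t\\
p(x_{t+1}|y_{1:t+1})&=\frac{p(y_{t+1}|x_{t+1})\,p(x_{t+1}|y_{1:t})}{\int p(y_{t+1}|x_{t+1})\,p(x_{t+1}|y_{1:t})\,dx_{t+1}}
\end{align}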

**Particle filters** use a discrete approximation to $p(x_t|y_{1:t})$ consisting of states or 'particles', $\{x_t^{(n)}\}^{N}_{n=1}$, and weights associated with those particles, $\{\pi_t^{(n)}\}^{N}_{n=1}$. In other words, a particle approximation is just a random histogram. Formally, the approximation to $p(x_t|y_{1:t})$, denoted as $p^N(x_t|y_{1:t})$, is given by
\begin{align}
p^N(x_t|y_{1:t})=\sum_{n=1}^N\pi_t^{(n)}\delta_{x_t^{(n)}}.
\end{align}
The particle approximation can be transformed into an equally weighted random sample from $p^N(x_t|y_{1:t})$ by sampling the particles with replacement, with probabilities given by the weights $\{\pi_t^{(n)}\}^{N}_{n=1}$. This procedure is called *resampling*.
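
A minimal sketch of resampling, assuming NumPy and a toy three-particle approximation (the particle values, the weights, and the helper name `resample` are illustrative, not from the source). It draws indices with replacement according to the weights, so the returned particles are equally weighted:

```python
import numpy as np


def resample(particles, weights, rng, size=None):
    """Multinomial resampling: draw particles with replacement, with
    selection probabilities equal to their (normalized) weights."""
    n = len(particles) if size is None else size
    idx = rng.choice(len(particles), size=n, replace=True, p=weights)
    return particles[idx]


rng = np.random.default_rng(0)
particles = np.array([-1.0, 0.0, 2.0])   # toy particle locations
weights = np.array([0.2, 0.5, 0.3])      # toy weights, summing to 1

# Mean under the weighted approximation: sum_n pi^(n) * x^(n)
weighted_mean = np.sum(weights * particles)

# Equally weighted sample; its plain average estimates the same mean.
# A large size is used here only to make the agreement visible.
equal_sample = resample(particles, weights, rng, size=100_000)
print(weighted_mean, equal_sample.mean())
```

The resampled set contains only values already present among the original particles; resampling changes how often each one appears, not where the particles sit.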

Particle filters have **two advantages**: (1) they are flexible and adaptable, suiting a wide variety of latent-state models, especially nonlinear and/or non-Gaussian ones; (2) they are easy to program and computationally fast to run.

The central problem in particle filtering is how to update the approximation recursively: given a sample from $p^N(x_t|y_{1:t})$, how do we generate a random sample from the particle approximation to $p(x_{t+1}|y_{1:t+1})$ after receiving a new data point $y_{t+1}$? Three variants are introduced below.

- **Exact Particle Filtering**
  - Steps
    - Draw $z^{(n)}\sim Mult_N(\{w_t^{(n)}\}_{n=1}^N)$ for $n=1,\dots,N$ and set $x_t^{(n)}=x_t^{(z^{(n)})}$. Here $w_t^{(n)}=\frac{p(y_{t+1}|x_t^{(n)})}{\sum_{m=1}^Np(y_{t+1}|x_t^{(m)})}$.
    - Draw $x_{t+1}^{(n)}\sim p(x_{t+1}|x_t^{(n)},y_{t+1})$. Note that the weights after this step are simply $1/N$.
  - Advantages
    - It is exact - no importance sampling errors
  - Disadvantages
    - It is restrictive: we must be able to both evaluate $p(y_{t+1}|x_t)$ and simulate from $p(x_{t+1}|x_t,y_{t+1})$.
- **Sequential Importance Resampling (SIR)**
  - Steps
    - Draw $x_{t+1}^{(n)}\sim p(x_{t+1}|x_t^{(n)})$ for $n=1,\dots,N$
    - Draw $z^{(n)}\sim Mult_N(\{w_{t+1}^{(n)}\}_{n=1}^N)$ for $n=1,\dots,N$ and set $x_{t+1}^{(n)}=x_{t+1}^{(z^{(n)})}$. Here $w_{t+1}^{(n)}=\frac{p(y_{t+1}|x_{t+1}^{(n)})}{\sum_{m=1}^Np(y_{t+1}|x_{t+1}^{(m)})}$.
  - Advantages
    - It is widely applicable - being able to evaluate $p(y_{t+1}|x_{t+1})$ and simulate from $p(x_{t+1}|x_t)$ should be easy for almost all latent-variable models.
  - Disadvantages
    - One problem is **sample impoverishment** or **weight degeneracy**, which occurs when the vast majority of the weight is placed on a single particle. When this occurs, the resampling step draws that one particle many times. Resampling therefore does not fix the sample impoverishment/weight degeneracy problem; it just hides it.
    - The new state proposals are blind to new information. That is, the new states $x_{t+1}$ are drawn without accounting for the next period's observation, $y_{t+1}$. As a result, the simulated states may not fall in important regions, i.e. regions where the likelihood $p(y_{t+1}|x_{t+1})$ is high. This can be mitigated by drawing from a proposal distribution that takes $y_{t+1}$ into account, which is what the APF below does.
- **Auxiliary Particle Filtering (APF)**
  - Steps
    - Compute $w_t^{(n)}=\frac{q(y_{t+1}|x_t^{(n)})}{\sum_{m=1}^Nq(y_{t+1}|x_t^{(m)})}$, where $q(y_{t+1}|x_t)$ is an importance density approximating $p(y_{t+1}|x_t)$.
    - Draw $z^{(n)}\sim Mult_N(\{w_t^{(n)}\}_{n=1}^N)$ for $n=1,\dots,N$ and set $x_t^{(n)}=x_t^{(z^{(n)})}$.
    - Draw $x_{t+1}^{(n)}\sim q(x_{t+1}|x_t^{(n)},y_{t+1})$.
    - Reweight: $\pi_{t+1}^{(n)}\propto\frac{\text{target}}{\text{proposal}}=\frac{p(y_{t+1}|x_{t+1}^{(n)})p(x_{t+1}^{(n)}|x_t^{(n)})}{q(y_{t+1}|x_t^{(n)})q(x_{t+1}^{(n)}|x_t^{(n)},y_{t+1})}$, where $x_t^{(n)}$ denotes the resampled particle from the previous step.
  - Advantages
    - Like exact filtering, the APF resamples first, which is important to ensure that high-likelihood states are propagated forward, yet it does not impose the same restrictive requirements as exact filtering.
    - The APF is quite flexible, allowing for two importance densities, $q(y_{t+1}|x_t)$ and $q(x_{t+1}|x_t,y_{t+1})$.
  - Disadvantages
    - The performance of the APF is driven by the accuracy of the importance densities. If these are poor approximations, the APF may not perform much better than the SIR algorithm, and in some extreme cases, could even perform worse.
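
The SIR and exact steps above can be sketched side by side on a model where both are feasible. The code below is an illustration under stated assumptions, not a reference implementation: it assumes NumPy and a hypothetical linear-Gaussian model ($x_{t+1}=a x_t+\sigma_x\epsilon$, $y_t=x_t+\sigma_y\eta$) whose constants `A`, `SX`, `SY` and function names are made up for this example. For that model $p(y_{t+1}|x_t)$ and $p(x_{t+1}|x_t,y_{t+1})$ are both Gaussian in closed form, which is what makes the exact filter available.

```python
import numpy as np

# Hypothetical model, chosen only for illustration (not from the text):
#   x_{t+1} = A * x_t + SX * eps,   y_t = x_t + SY * eta,   eps, eta ~ N(0, 1)
A, SX, SY = 0.9, 1.0, 0.5


def _resample_indices(log_w, rng):
    """Normalize log-weights stably and draw N parent indices with replacement."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return rng.choice(log_w.size, size=log_w.size, replace=True, p=w)


def sir_step(particles, y_next, rng):
    """SIR: propagate blindly from p(x_{t+1}|x_t), then resample by likelihood."""
    proposed = A * particles + SX * rng.standard_normal(particles.size)
    # log p(y_{t+1}|x_{t+1}) up to an additive constant
    log_w = -0.5 * ((y_next - proposed) / SY) ** 2
    return proposed[_resample_indices(log_w, rng)]


def exact_step(particles, y_next, rng):
    """Exact filter: resample by p(y_{t+1}|x_t), then draw from p(x_{t+1}|x_t, y_{t+1})."""
    # Predictive likelihood: y_{t+1} | x_t ~ N(A * x_t, SX^2 + SY^2)
    log_w = -0.5 * (y_next - A * particles) ** 2 / (SX**2 + SY**2)
    parents = particles[_resample_indices(log_w, rng)]
    # Conditional posterior of x_{t+1}: Gaussian with precision 1/SX^2 + 1/SY^2
    post_var = 1.0 / (1.0 / SX**2 + 1.0 / SY**2)
    post_mean = post_var * (A * parents / SX**2 + y_next / SY**2)
    return post_mean + np.sqrt(post_var) * rng.standard_normal(parents.size)


rng = np.random.default_rng(0)
# Equally weighted N(0, 1) draws standing in for a sample from p(x_t|y_{1:t})
start = rng.standard_normal(20_000)
y_next = 1.0
sir_particles = sir_step(start, y_next, rng)
exact_particles = exact_step(start, y_next, rng)
print(sir_particles.mean(), exact_particles.mean())
```

Because this toy model is linear and Gaussian, the Kalman filter gives the true posterior, so both particle means should land close to it. The SIR output typically contains many duplicated particles after resampling, whereas the exact step draws fresh states after resampling and so avoids duplicates in $x_{t+1}$.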

There are **two sources of approximation error** in particle filtering algorithms.
- Approximating $p(x_t|y_{1:t})$ by $p^N(x_t|y_{1:t})$ generates the first source of error. This is inherent in all particle filtering algorithms, but can be mitigated by choosing $N$ large.
- In the filters that rely on importance sampling (SIR and the APF, described above), importance sampling generates only an approximation to $p^N(x_t|y_{1:t})$, which adds another layer of error. The exact particle filter avoids this second source.

## References
- "Particle Filtering", in *Handbook of Financial Time Series*, 2009.