First, some definitions. I am following [this chapter](https://nbviewer.org/github/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/08-Designing-Kalman-Filters.ipynb) of Kalman and Bayesian Filters in Python to give me some structure.

## Choose State Variables
I first need to define my x. 

$\textbf{x}$ consists of the PM2.5 values, wind x and wind y components in a given square. So, this would be 80x50x3 or 12000 values. woof.

I think I'll say the vector is structured as follows:

$$\textbf{x} = \begin{pmatrix} P_{0 0} & x_{0 0} & y_{0 0} & P_{0 1} & x_{0 1} & y_{0 1} & \cdots & P_{ij} & x_{ij} & y_{ij} \end{pmatrix}^T$$

Hopefully this makes reading the covariance matrices easier, but maybe it won't. I don't know yet.

For the following math, I will be calculating 2D matrices of P, x, and y with 80 rows and 50 columns (as defined by the CMAQ data) and then smushing them together later, because that'll work a lot better for my brain. Also that way, I can change how $\textbf{x}$ is structured if this way is doo-doo.
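Here's a minimal sketch of that "smushing" step, assuming NumPy and the interleaved ordering shown above ($P_{00}, x_{00}, y_{00}, P_{01}, \ldots$). The function names (`pack_state`, `unpack_state`, `state_index`) are made up for illustration and aren't from any library.

```python
import numpy as np

N_ROWS, N_COLS, N_VARS = 80, 50, 3  # grid shape from the text; 3 = PM2.5, wind x, wind y

def pack_state(pm, wx, wy):
    """Interleave three (rows, cols) grids into one flat state vector
    ordered P_00, x_00, y_00, P_01, x_01, y_01, ..."""
    return np.stack([pm, wx, wy], axis=-1).reshape(-1)

def unpack_state(x, shape=(N_ROWS, N_COLS)):
    """Inverse of pack_state: recover the three 2D grids."""
    grids = x.reshape(shape[0], shape[1], N_VARS)
    return grids[..., 0], grids[..., 1], grids[..., 2]

def state_index(i, j, var, n_cols=N_COLS):
    """Position in the state vector of variable `var` (0=P, 1=x, 2=y) for square (i, j)."""
    return (i * n_cols + j) * N_VARS + var
```

If the ordering turns out to be a bad idea, only these three helpers should need to change, since everything else can keep working on the 2D grids.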

## Design State Transition Function

Ok, first difficult problem, how to define $\textbf F$, where:
$${\textbf x}_k = \textbf F  {\textbf x}_{k-1}$$

I don't need an exact equation, but something as close as possible so that the optimization algorithm can have a decent starting point.

### For the PM2.5 values:

I want to use the 5x5 grid surrounding a square as the estimators.

For a given square like (3,3), the equation will be $$\sum_{i=1}^5 \sum_{j=1}^5 \sqrt{ab} * P_{ij}$$ 
where $a = 1 - \frac{|\text{ideal\_angle} - \text{actual\_angle}|}{\pi}$, $\text{ideal\_angle} = \tan^{-1} (\frac{-(3-j)}{(3-i)})$, $\text{actual\_angle} = \tan^{-1} (\frac{w_y}{w_x})$, $b=\frac1{(3-i)^2 + (3-j)^2}$.

In summary, I'm doing a weighted average of PM values from the surrounding 5x5 squares, where the weights are a geometric mean between a measure from 0 to 1 of whether the wind vector is pointing at the target square (1) or not (0), and an inverse squared distance, where 1 is the closest distance (directly adjacent) and it goes down to 0 from there. The purpose of the geometric mean is so that low values of wind are heavily penalized in the overall weight.

All together (where the next value of $P_{ij}$ is determined from the previous time step's $P_{ij}$ values):

$${P_{33}}_{new} = \sum_{i=1}^5 \sum_{j=1}^5 \sqrt{\left(1 - \frac{|\tan^{-1} (\frac{-(3-j)}{(3-i)}) - \tan^{-1} (\frac{w_y}{w_x})|}{\pi}\right)\left(\frac1{(3-i)^2 + (3-j)^2}\right)} * P_{ij}$$ 

Woof. Oh, and obviously the weights are going to be normalized by the sum of all of the weights for that square. Also, this notation sucks, but I'm not gonna be rigorous.

Also, for the squares that actually have sensors within them, I want to have those mostly trust the sensor values, with maybe a bit of wind influence (like 10\% maybe?).

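So when the 5x5 window is centered on a square that has a sensor, I'll throw out the calculated weight for the center square and instead set it to 9 times the sum of the other weights. After normalizing, that gives the sensor square 90\% of the weight and the surrounding squares the remaining 10\%.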

MUST DOUBLE CHECK THE OUTPUT OF THE SUBTRACTION
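Here's a rough sketch of the weights for one 5x5 window, assuming NumPy. A few things in it are my assumptions rather than settled design: it uses `arctan2` instead of a plain $\tan^{-1}$ of a ratio (so a zero denominator doesn't blow up), it wraps the angle subtraction so the difference always lands in $[0, \pi]$, and it gives the center square a weight of 0 when there's no sensor, since $b$ is $\frac10$ there. The names `angle_diff`, `pm_weights`, and `sensor_share` are hypothetical. The axis convention copies the ideal-angle formula above literally and still needs checking against the real grid orientation.

```python
import numpy as np

def angle_diff(a, b):
    """Smallest absolute difference between two angles, always in [0, pi]."""
    return np.abs((a - b + np.pi) % (2 * np.pi) - np.pi)

def pm_weights(wx, wy, has_sensor=False, sensor_share=0.9):
    """Normalized 5x5 weights for the center square of a window.

    wx, wy: 5x5 arrays of wind components for each square in the window.
    """
    offsets = np.arange(5) - 2                       # plays the role of (i - 3), (j - 3)
    di, dj = np.meshgrid(offsets, offsets, indexing="ij")
    ideal = np.arctan2(dj, -di)                      # atan2 form of -(3-j) / (3-i)
    actual = np.arctan2(wy, wx)
    a = np.clip(1.0 - angle_diff(ideal, actual) / np.pi, 0.0, 1.0)
    dist2 = di**2 + dj**2
    b = np.zeros((5, 5))
    b[dist2 > 0] = 1.0 / dist2[dist2 > 0]            # assumption: center gets b = 0
    w = np.sqrt(a * b)
    others = w.sum()                                 # center is 0 here, so this is "the other weights"
    if others == 0:                                  # assumption: fall back to pure persistence
        w[2, 2] = 1.0
        return w
    if has_sensor:
        # sensor_share = 0.9 gives 9x the sum of the other weights
        w[2, 2] = sensor_share / (1.0 - sensor_share) * others
    return w / w.sum()
```

With the default `sensor_share`, a sensor square ends up with 90\% of the normalized weight, which matches the "mostly trust the sensor" idea.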

Then, I'll construct $\textbf F$ for the PM2.5 values with the above calculations (combined with the below wind calculations).

### For the wind values:

I think a similar sort of inverse distance weighting would work, so something like (for the (3,3) square):

$$\sum_{i=1}^5 \sum_{j=1}^5 \frac1{(3-i)^2 + (3-j)^2} * \bar{w}$$

Where $\bar{w} = \begin{pmatrix} w_x & w_y \end{pmatrix}^T$ (the x and y components of the wind)

Similar to before where the weights are normalized by the sum of all of the weights, and when there's actually a wind sensor in the square it'll increase the weight on it by a lot.

I have to figure out how to express this as a product of matrices, so that'll be fun. I'm gonna avoid that until after dinner.
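One way this could look as a matrix product, sketched with NumPy on a small grid: each wind entry of the new state is a weighted sum of old wind entries, so the weights just become one row of $\textbf F$. The assumptions here are mine: windows are clipped at the grid edges, the center square is skipped (its inverse distance is $\frac10$), and the extra weighting for squares with a wind sensor is left out. `wind_transition_rows` and `state_index` are hypothetical names. A dense 12000x12000 matrix would take over a gigabyte, so the full-size version would probably want a sparse matrix instead.

```python
import numpy as np

N_VARS = 3  # P, wind x, wind y per square

def state_index(i, j, var, n_cols):
    """Position of variable `var` (0=P, 1=x, 2=y) for square (i, j) in the state vector."""
    return (i * n_cols + j) * N_VARS + var

def wind_transition_rows(n_rows, n_cols, radius=2):
    """Dense F with only the wind rows filled in (PM rows are left as zeros).

    The grid needs more than one square, otherwise there are no neighbors to average.
    """
    n = n_rows * n_cols * N_VARS
    F = np.zeros((n, n))
    for i in range(n_rows):
        for j in range(n_cols):
            cells, weights = [], []
            for k in range(max(0, i - radius), min(n_rows, i + radius + 1)):
                for l in range(max(0, j - radius), min(n_cols, j + radius + 1)):
                    d2 = (i - k) ** 2 + (j - l) ** 2
                    if d2 == 0:
                        continue  # assumption: skip the center, where 1/d2 is undefined
                    cells.append((k, l))
                    weights.append(1.0 / d2)
            weights = np.array(weights) / np.sum(weights)  # normalize per square
            for var in (1, 2):  # same weights for the x and y wind components
                row = state_index(i, j, var, n_cols)
                for (k, l), w in zip(cells, weights):
                    F[row, state_index(k, l, var, n_cols)] = w
    return F
```

Then `F @ x` gives the smoothed wind values in the same interleaved layout as $\textbf{x}$, and the PM rows can be filled in by the same row-by-row approach.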

## Design the Process Noise Matrix
Now I have to make $\textbf Q$, which is the covariance matrix of uncertainty for the values in $\textbf{x}$, i.e. it describes the $\sigma$ of each $\mu$ for every value of $\textbf{x}$, as well as the covariances between all variables. This is kind of a pain. I'll have to estimate the covariance of both the PM values and the wind components. Thankfully the algorithm will correct my guesses over time, but I need a decent starting measure.

Ok, I obviously want the squares with actual sensors in them to have the lowest variances (not zero because then the algorithm trusts them too much), maybe like $0.5^2$ based on the range of the PM values

Then, the squares with estimates should be graded on how many of the surrounding 5x5 squares have actual PM sensor values, with the $\sigma$ estimate scaling higher with fewer actual sensors, up to the point where if all of the 5x5 squares are estimated, the variance is extremely high, like $8^2$? 

Ok, so let's say $e(i,j) = I(\text{does square (i,j) have a valid PM sensor in it?})$, so a 1 if it does have data, and a 0 if not. For the diagonals of the covariance matrix (which are just variance measurements):

$$ \sigma_{i,j}^2 = 
\begin{cases} 
0.5^2 & e(i,j) = 1 \\ 
\left(\frac{7.5}{24} \left(24 - \sum_{k=i-2}^{i+2} \sum_{\substack{l=j-2 \\ (k,l) \neq (i,j)}}^{j+2} e(k,l)\right) + 0.5\right)^2 & e(i,j) = 0 
\end{cases}$$

The $\frac{7.5}{24}$ and $0.5$ rescale the inner function from $[0,24]$ to $[0.5, 8]$, which seems reasonable to me for a $\sigma$ value for this dataset.
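A quick sketch of that $\sigma$ calculation on a sensor mask, assuming NumPy. Squares that fall off the edge of the grid are counted as "no sensor" here, which is my assumption and not something decided above. The name `pm_sigma` and its parameters are made up for illustration.

```python
import numpy as np

def pm_sigma(sensor_mask, sigma_min=0.5, sigma_max=8.0, radius=2):
    """Per-square sigma for PM2.5: sigma_min where a sensor exists, otherwise scaled
    by how many of the surrounding 24 squares have sensors."""
    e = np.asarray(sensor_mask, dtype=float)
    n_rows, n_cols = e.shape
    padded = np.pad(e, radius)              # zero padding: off-grid squares count as no sensor
    counts = np.zeros_like(e)
    for di in range(2 * radius + 1):
        for dj in range(2 * radius + 1):
            if di == radius and dj == radius:
                continue                    # skip the square itself
            counts += padded[di:di + n_rows, dj:dj + n_cols]
    n_neighbors = (2 * radius + 1) ** 2 - 1  # 24 for a 5x5 window
    sigma = (sigma_max - sigma_min) / n_neighbors * (n_neighbors - counts) + sigma_min
    sigma[e == 1] = sigma_min
    return sigma
```

The diagonal of $\textbf Q$ for the PM entries would then be `pm_sigma(mask) ** 2`.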

Then, for the off-diagonals, I need to assume some covariance. I don't really know what they'll be, but I can make some guesses and the algorithm will correct those guesses over time.

Since $|cov(x,y)| \leq \sqrt{var(x) * var(y)}$ by Cauchy-Schwarz, the covariance between two squares must follow that inequality. How about I say that the covariance is at its maximum when they are next to each other, and tapers off linearly until 0? What would be a valid distance where the pollution would be unrelated?

Ok, I'll base it on normal wind speed, which is less than 20 mph, or 32 km/hr. This means that pollution could potentially travel 768 kilometers in 24 hours... which is a lot. The squares are 36km, so that would span 21 squares. Eek. I'm gonna just say 10 squares and call it there.

So, $\text{cov}(P_{ij}, P_{kl}) = w_{ijkl} * \sqrt{var(P_{ij})* var(P_{kl})}$, where 
$$w_{ijkl} = \begin{cases} 
\frac{9 - (|i-k| + |j-l|)}{9} & |i-k| + |j-l| \leq 9 \\ 
0 & |i-k| + |j-l| \geq 10 
\end{cases}$$

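So, $w_{ijkl}$ is 1 for a square with itself (which just recovers the variance on the diagonal), $\frac89$ for directly adjacent squares, and decreases by $\frac19$ for each additional square away until it hits 0 at a distance of 9.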
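Here's a sketch of the PM block of the covariance matrix following the piecewise formula for $w_{ijkl}$, assuming NumPy and reading the distance as the Manhattan distance $|i-k| + |j-l|$ between the two squares. `pm_covariance` is a hypothetical name, and the input is a 2D grid of per-square $\sigma$ values.

```python
import numpy as np

def pm_covariance(sigma, cutoff=9):
    """Covariance between every pair of squares: w * sqrt(var_ij * var_kl),
    where w tapers linearly with Manhattan distance and is 0 past the cutoff."""
    rows, cols = np.indices(sigma.shape)
    r, c = rows.ravel(), cols.ravel()        # row-major order, one entry per square
    dist = np.abs(r[:, None] - r[None, :]) + np.abs(c[:, None] - c[None, :])
    w = np.clip((cutoff - dist) / cutoff, 0.0, None)
    s = sigma.ravel()
    return w * np.outer(s, s)                # sqrt(var_a * var_b) = sigma_a * sigma_b
```

I haven't verified that this taper always produces a positive semidefinite matrix, so checking that `np.linalg.eigvalsh(C).min()` isn't meaningfully negative seems like a cheap sanity test before handing it to the filter.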

I think I'll do similar for the wind as well, probably with different estimated variances.

## Design the Measurement Function
Now I need to design $\textbf H$, the measurement function that converts $\textbf x$ into the shape of $\textbf z$, the measurements. I think this design is critical, because the measurement vector $\textbf z$ will likely be changing shape as stations gain data and lose data. The structure is $\textbf z = \textbf H \textbf x$, so H will have (# of measurements) rows and (80x50x3) columns.

I think it's pretty simple, just select the correct x value for each sensor value.

Given square (0,0) and (0,2) have sensors, and (0,1) does not, the conversion from $\textbf x$ to measurement would just have to skip (0,1), since it wouldn't have a sensor value.
So:
$$\begin{align*}
zP_{00} &= 1 * P_{00} + 0 * x_{00} + 0 * y_{00} + 0 * P_{01} + \cdots \\
zx_{00} &= 0 * P_{00} + 1 * x_{00} + 0 * y_{00} + 0 * P_{01} + \cdots \\
zy_{00} &= 0 * P_{00} + 0 * x_{00} + 1 * y_{00} + 0 * P_{01} + \cdots \\
zP_{02} &= 0 * P_{00} + 0 * x_{00} + 0 * y_{00} + \cdots + 1 * P_{02}
\end{align*}$$

$$\textbf H = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &\cdots \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &\cdots \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 &\cdots \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 &\cdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 &\cdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 &\cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots &\cdots
\end{bmatrix}$$

Then, $\textbf H$ should be able to be changed as the number of measurements changes. I hope. I *think* this will work with the equations.
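A minimal sketch of rebuilding $\textbf H$ from whichever sensors currently have data, assuming NumPy and the interleaved state ordering ($P$, $x$, $y$ per square). The function name `build_H` and the two boolean masks are hypothetical, and I'm assuming a wind sensor always reports both components.

```python
import numpy as np

N_VARS = 3  # P, wind x, wind y per square

def build_H(pm_mask, wind_mask):
    """Selection matrix with one row per available measurement.

    pm_mask, wind_mask: 2D boolean arrays marking squares that currently have data.
    """
    n_rows, n_cols = pm_mask.shape
    n_state = n_rows * n_cols * N_VARS
    selected = []
    for i in range(n_rows):
        for j in range(n_cols):
            base = (i * n_cols + j) * N_VARS
            if pm_mask[i, j]:
                selected.append(base)                  # P_ij
            if wind_mask[i, j]:
                selected.extend([base + 1, base + 2])  # x_ij, y_ij
    cols = np.array(selected, dtype=int)
    H = np.zeros((len(cols), n_state))
    H[np.arange(len(cols)), cols] = 1.0
    return H
```

Calling this again each time step with fresh masks gives an $\textbf H$ whose row count tracks the length of $\textbf z$.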

## Design the Measurement Noise Matrix
I now make the covariance matrix $\textbf R$ that describes the variance and covariances of the measurements in $\textbf z$. I will assume that the sensors have independent noise. I have no idea what the actual variance of these sensors is, so maybe I'll just steal from above.

Let's say PM sensors have a variance of $0.5^2$ and wind sensors have variance of $2^2$. This is totally arbitrary and I'll probably tune this later. So, R will be a diag matrix with $0.5^2$, $2^2$, and $2^2$ repeating for each sensor value in $\textbf z$.

$$\textbf R = \begin{bmatrix}
\sigma^2_{P_{00}} & 0 & 0 & \cdots\\
0 & \sigma^2_{x_{00}} & 0 & \cdots\\
0 & 0 & \sigma^2_{y_{00}} & \cdots\\
\cdots & \cdots & \cdots & \cdots
\end{bmatrix}$$
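Since the noise is assumed independent, $\textbf R$ is just a diagonal matrix, so a sketch with NumPy is short. The string labels `"pm"`, `"wx"`, and `"wy"` and the name `build_R` are made up here, and the default variances are the arbitrary guesses from above.

```python
import numpy as np

def build_R(kinds, pm_var=0.5**2, wind_var=2.0**2):
    """Diagonal measurement noise matrix, one variance per entry of z.

    kinds: sequence of "pm", "wx", or "wy", in the same order as z.
    """
    var_by_kind = {"pm": pm_var, "wx": wind_var, "wy": wind_var}
    return np.diag([var_by_kind[k] for k in kinds])
```

Because it's built from the list of current measurements, it can be resized together with $\textbf H$ whenever stations gain or lose data.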

## Set Initial Conditions
I must set the initial $\textbf x$ and $\textbf P$.

I think I'll just use kriging to interpolate the initial sensor data geospatially. Then, I'll do similar covariance calculations as $\textbf Q$, stated above, based on whether surrounding squares have sensor values or not.

---
Ok, so that's all of the math defined. Now, I need to code it. Wahoo.