# Stepping stone exploration

**authors:** Joseph Marcus

Here I explore the stepping stone model with possible approximations building off the classic results of Bodmer and Cavalli-Sforza 1967. The notes here are a bit scattered and I will clean them up later.

Consider a single bi-allelic SNP in haploid individuals, each carrying either the $A$ or $a$ allele, dispersed throughout a habitat. The habitat is discretized as a graph $\mathcal{G}$ over geographic space with $d$ nodes and a migration matrix $\mathbf{M}$ that specifies the edge weights. Here $\mathbf{M}$ is a "backwards" migration matrix with $m_{ij} \geq 0$ and $\sum_{j=1}^d m_{ij} = 1$, so $m_{ij}$ is the probability that an individual in node $i$ has its parent from node $j$.

Let $p_{i,t}$ be the frequency of the $A$ allele at node $i$ and time $t$, where time is also discrete. Each generation the allele frequencies change in two steps: first a deterministic migration event in which individuals are exchanged only between neighboring nodes, then a drift event, a random fluctuation in the allele frequency of each node whose variance is inversely proportional to the node's population size. The migration step is

$$
p_{i,t} = \sum_{j=1}^d m_{ij} p_{j,t-1}
$$

Or in matrix notation 

$$
\mathbf{p}_t = \mathbf{M}\mathbf{p}_{t-1}
$$
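As a concrete illustration of the migration step, here is a minimal sketch that assumes a one-dimensional chain of nodes with a single neighbor-to-neighbor rate. The helper `chain_migration_matrix`, the rate `m`, and the starting frequencies are hypothetical choices made for this example; nothing in these notes fixes a particular graph.

```python
import numpy as np

def chain_migration_matrix(d, m):
    """Backward migration matrix for d nodes on a line (hypothetical helper).

    Each node draws its parent from each adjacent node with probability m;
    the remaining probability stays on the diagonal so every row sums to 1.
    """
    M = np.zeros((d, d))
    for i in range(d):
        if i > 0:
            M[i, i - 1] = m
        if i < d - 1:
            M[i, i + 1] = m
        M[i, i] = 1.0 - M[i].sum()
    return M

M = chain_migration_matrix(d=5, m=0.1)

# Illustrative allele frequencies at time t-1 (a cline across the habitat).
p_prev = np.array([0.9, 0.7, 0.5, 0.3, 0.1])

# Migration step: p_{i,t} = sum_j m_ij * p_{j,t-1}, i.e. p_t = M p_{t-1}.
p_next = M @ p_prev
```

Because each row of $\mathbf{M}$ is a probability vector, every entry of `p_next` is a weighted average of neighboring frequencies, so migration alone smooths the cline without moving any frequency outside $[0, 1]$.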

Once drift is added, $\mathbf{p}_t$ is random given $\mathbf{p}_{t-1}$. For now we don't assume any distributional form for $\mathbf{p}_{t}$ but do define its conditional moments:

$$
\begin{aligned}
E\big(\mathbf{p}_t | \mathbf{p}_{t-1}\big) &= \mathbf{M}\mathbf{p}_{t-1} \\
Var\big(\mathbf{p}_t | \mathbf{p}_{t-1}\big) &= diag\Big(\frac{1}{\mathbf{N}} \odot \mathbf{M}\mathbf{p}_{t-1} \odot \big(\mathbf{1} - \mathbf{M}\mathbf{p}_{t-1}\big) \Big)
\end{aligned}
$$
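To make these conditional moments concrete, the sketch below simulates one generation under an assumed binomial (Wright-Fisher style) sampling of $N_i$ gametes per node. The binomial form is an assumption for illustration only, since the notes deliberately leave the distribution unspecified; the function names and the 3-node example values are hypothetical.

```python
import numpy as np

def one_generation(p_prev, M, N, rng):
    """Deterministic migration followed by binomial drift (assumed model)."""
    p_mig = np.clip(M @ p_prev, 0.0, 1.0)   # E(p_t | p_{t-1}) = M p_{t-1}
    counts = rng.binomial(N, p_mig)         # sample N_i gametes in each node
    return counts / N

def conditional_variance(p_prev, M, N):
    """diag( (1/N) * Mp * (1 - Mp) ), with all products element-wise."""
    p_mig = M @ p_prev
    return np.diag(p_mig * (1.0 - p_mig) / N)

# Hypothetical 3-node example.
M = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
N = np.array([100, 100, 100])               # integer population sizes
p_prev = np.array([0.6, 0.5, 0.4])

rng = np.random.default_rng(0)
p_t = one_generation(p_prev, M, N, rng)
V_cond = conditional_variance(p_prev, M, N)
```

Averaging many draws of `one_generation` should recover $\mathbf{M}\mathbf{p}_{t-1}$, and their sample variances should approach the diagonal of `V_cond`.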

Here $\mathbf{N}$ is a $d$-vector of population sizes, one per node, $\frac{1}{\mathbf{N}}$ is its element-wise reciprocal, and $\odot$ denotes element-wise multiplication. These moments correspond exactly to the two-step process described above: a deterministic migration event sets the mean, and random sampling of gametes (genetic drift) contributes binomial variance around it. As a simplifying assumption we focus only on common SNPs, for which the binomial term $p(1-p)$ varies over a small range (for example, between $0.16$ and $0.25$ when $0.2 \le p \le 0.8$), and replace it with a constant $\sigma^2$, approximating the conditional variance as

$$
Var\big(\mathbf{p}_t | \mathbf{p}_{t-1}\big) \approx \sigma^2 diag\Big(\frac{1}{\mathbf{N}}\Big)
$$

We can then find the marginal moments of $\mathbf{p}_t$

$$
\begin{aligned}
E(\mathbf{p}_{t}) &= E\Big(E\big(\mathbf{p}_{t} | \mathbf{p}_{t-1}\big)\Big) \\
& = E\big(\mathbf{M}\mathbf{p}_{t-1}\big) \\
&= \mathbf{M}E(\mathbf{p}_{t-1}) \\
&= \dots \\
&= \mathbf{M}^t\mathbf{p}_0
\end{aligned}
$$
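The closed form for the mean can be checked numerically. This is a small sketch with a hypothetical symmetric 3-node migration matrix and starting frequencies, comparing $\mathbf{M}^t\mathbf{p}_0$ against $t$ repeated applications of $\mathbf{M}$.

```python
import numpy as np

# Hypothetical symmetric migration matrix and fixed starting frequencies.
M = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
p0 = np.array([0.8, 0.5, 0.2])
t = 25

# Closed form: E(p_t) = M^t p_0.
mean_closed = np.linalg.matrix_power(M, t) @ p0

# Same quantity by iterating E(p_s) = M E(p_{s-1}) for s = 1..t.
mean_iter = p0.copy()
for _ in range(t):
    mean_iter = M @ mean_iter
```

In this example $\mathbf{M}$ is symmetric, so its columns also sum to 1 and the habitat-wide average of the expected frequencies stays at its starting value while the individual nodes drift toward it.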

$$
\begin{aligned}
Var(\mathbf{p}_t) &= E\Big(Var\big(\mathbf{p}_t | \mathbf{p}_{t-1}\big)\Big) +  Var\Big(E\big(\mathbf{p}_{t} | \mathbf{p}_{t-1} \big)\Big) \\
&= E\Bigg(\sigma^2 diag\Big(\frac{1}{\mathbf{N}}\Big)\Bigg) + Var\big(\mathbf{M}\mathbf{p}_{t-1}\big) \\
&= \sigma^2 diag\Big(\frac{1}{\mathbf{N}}\Big) + \mathbf{M}Var(\mathbf{p}_{t-1})\mathbf{M}^T
\end{aligned}
$$

Letting $\mathbf{Q} = \sigma^2 diag\Big(\frac{1}{\mathbf{N}}\Big)$

$$
\begin{aligned}
Var(\mathbf{p}_t) &= \mathbf{Q} + \mathbf{M}Var(\mathbf{p}_{t-1})\mathbf{M}^T \\
&= \mathbf{Q} + \mathbf{M}\Big(\mathbf{Q} + \mathbf{M}Var(\mathbf{p}_{t-2})\mathbf{M}^T\Big)\mathbf{M}^T \\
&= \mathbf{Q} + \mathbf{M}\mathbf{Q}\mathbf{M}^T + \mathbf{M}^2 Var(\mathbf{p}_{t-2})(\mathbf{M}^2)^T \\
&= \mathbf{Q} + \mathbf{M}\mathbf{Q}\mathbf{M}^T + \mathbf{M}^2\Big(\mathbf{Q} + \mathbf{M}Var(\mathbf{p}_{t-3})\mathbf{M}^T\Big)(\mathbf{M}^2)^T \\
&= \dots \\
&= \mathbf{Q} + \mathbf{M}\mathbf{Q}\mathbf{M}^T + \mathbf{M}^2\mathbf{Q}(\mathbf{M}^2)^T + \dots + \mathbf{M}^{t-1}\mathbf{Q}(\mathbf{M}^{t-1})^T + \mathbf{M}^t Var(\mathbf{p}_0)(\mathbf{M}^t)^T \\
&= \sum_{k=0}^{t-1} \mathbf{M}^k\mathbf{Q}(\mathbf{M}^k)^T
\end{aligned}
$$

The last step uses $Var(\mathbf{p}_0) = \mathbf{0}$, since the starting frequencies $\mathbf{p}_0$ are treated as fixed.

Let's simplify things even more! Let all nodes have the same population size $N$ and let the migration matrix be symmetric, $\mathbf{M} = \mathbf{M}^T$:

$$
Var(\mathbf{p}_t) = \frac{\sigma^2}{N}\sum_{k=0}^{t-1} \mathbf{M}^k\mathbf{I}(\mathbf{M}^k)^T = \frac{\sigma^2}{N}\sum_{k=0}^{t-1} (\mathbf{M}^2)^k
$$
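The unrolled sum can be sanity-checked against the one-step recursion $Var(\mathbf{p}_t) = \mathbf{Q} + \mathbf{M}Var(\mathbf{p}_{t-1})\mathbf{M}^T$. The sketch below assumes $Var(\mathbf{p}_0) = \mathbf{0}$ (fixed starting frequencies), so that $t$ applications of the recursion produce the $t$ terms $k = 0, \dots, t-1$. Function names and example matrices are hypothetical.

```python
import numpy as np

def variance_by_recursion(M, Q, t):
    """Apply V_s = Q + M V_{s-1} M^T for s = 1..t, starting from V_0 = 0."""
    V = np.zeros_like(Q, dtype=float)
    for _ in range(t):
        V = Q + M @ V @ M.T
    return V

def variance_by_sum(M, Q, t):
    """Accumulate the t terms M^k Q (M^k)^T for k = 0..t-1."""
    V = np.zeros_like(Q, dtype=float)
    Mk = np.eye(M.shape[0])                 # M^0
    for _ in range(t):
        V = V + Mk @ Q @ Mk.T
        Mk = M @ Mk                         # advance to the next power of M
    return V

sigma2 = 0.2                                # assumed constant for p(1 - p)

# General case: asymmetric M and unequal population sizes (illustrative).
M = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
N = np.array([50.0, 100.0, 200.0])
Q = sigma2 * np.diag(1.0 / N)
V_rec = variance_by_recursion(M, Q, t=10)
V_sum = variance_by_sum(M, Q, t=10)

# Simplified case: equal sizes and symmetric M, where the sum is in M^2.
M_sym = np.array([[0.9, 0.1, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.1, 0.9]])
N_eq = 100.0
V_sym = variance_by_sum(M_sym, sigma2 / N_eq * np.eye(3), t=10)
M2 = M_sym @ M_sym
V_pow = sigma2 / N_eq * sum(np.linalg.matrix_power(M2, k) for k in range(10))
```

The recursion costs two matrix products per generation and never forms explicit powers of $\mathbf{M}$, which makes it the more convenient form for checking any closed-form expression.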

Let's now let the process evolve for infinite time and see what the covariance of the limiting distribution looks like. Letting $\pi$ be the limiting distribution of the allele frequencies, we get

$$
Var(\pi) = \frac{\sigma^2}{N}\sum_{k=0}^{\infty} (\mathbf{M}^2)^k
$$

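We can recognize the sum as a matrix analog of a geometric series, which converges only along directions where the eigenvalues of $\mathbf{M}^2$ are strictly below 1. One caveat: the rows of $\mathbf{M}$ sum to 1, so $\mathbf{M}\mathbf{1} = \mathbf{1}$ and $\mathbf{M}^2$ always has an eigenvalue equal to 1. The variance of the habitat-wide mean frequency (the $\mathbf{1}$ direction) therefore grows linearly in $t$, and $\mathbf{I} - \mathbf{M}^2$ is singular. For a connected graph with no eigenvalue equal to $-1$, the sum does converge on the subspace orthogonal to $\mathbf{1}$, that is, for differences in allele frequency between nodes, and there it converges to the pseudoinverse:

$$
Var(\pi) = \frac{\sigma^2}{N}(\mathbf{I} - \mathbf{M}^2)^{+}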
$$
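A numerical check helps show where this limit does and does not exist. The sketch below assumes a symmetric, connected 4-node chain (hypothetical values) and compares partial sums of the series, projected onto the subspace orthogonal to $\mathbf{1}$ with a centering matrix, against a pseudoinverse of $\mathbf{I} - \mathbf{M}^2$. Using the pseudoinverse and the centering matrix `C` is a choice made for this illustration, motivated by $\mathbf{M}\mathbf{1} = \mathbf{1}$, which makes $\mathbf{I} - \mathbf{M}^2$ singular.

```python
import numpy as np

# Hypothetical symmetric chain with 4 nodes and neighbor rate 0.1.
M = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.8, 0.1],
              [0.0, 0.0, 0.1, 0.9]])
d = M.shape[0]
sigma2, N = 0.2, 100.0
M2 = M @ M

# Partial sum S_t = sum_{k=0}^{t-1} (M^2)^k.
t = 2000
S = np.zeros((d, d))
P = np.eye(d)
for _ in range(t):
    S += P
    P = P @ M2

var_t = sigma2 / N * S                      # Var(p_t), keeps growing with t

# Centering matrix: removes the habitat-wide mean (the all-ones direction).
C = np.eye(d) - np.ones((d, d)) / d
var_contrasts = C @ var_t @ C               # covariance of centered frequencies

# Candidate limit on the subspace orthogonal to the all-ones vector.
limit = sigma2 / N * np.linalg.pinv(np.eye(d) - M2, rcond=1e-10, hermitian=True)
```

In this example `var_contrasts` settles to `limit`, while the trace of `var_t` continues to increase with `t` because of the eigenvalue-1 direction.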

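This case is pretty cool! Let the graph Laplacian be $\mathbf{L} = \mathbf{D} - \mathbf{M}$ where $d_{ii} = \sum_{j=1}^d m_{ij} = 1$, so $\mathbf{L} = \mathbf{I} - \mathbf{M}$ and the identity plays the role of the degree matrix. Since $\mathbf{I} - \mathbf{M}^2 = (\mathbf{I} - \mathbf{M})(\mathbf{I} + \mathbf{M}) = \mathbf{L}(2\mathbf{I} - \mathbf{L})$, the precision matrix, which is proportional to $\mathbf{I} - \mathbf{M}^2$, is a polynomial in the Laplacian, and perhaps we can connect the above covariance in some way to the Hanks 2017 paper. The precision matrix makes some sense when interpreted in terms of conditional dependence: its $(i,j)$ entry is 0 whenever nodes $i$ and $j$ are more than two steps apart in the graph, because $\mathbf{M}^2$ only links nodes within two steps of each other.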

Let's try to attack the more general case of the limiting covariance with varying population sizes!

$$
Var(\pi) = \sum_{k=0}^{\infty} \mathbf{M}^k\mathbf{Q}(\mathbf{M}^k)^T
$$

*... need to figure out if it converges. Here is an attempt ...*

First, assume $\mathbf{M}$ is still symmetric, and let $\mathbf{\Gamma} = \mathbf{M}^k\mathbf{Q}(\mathbf{M}^k)^T$ and $\mathbf{A} = \mathbf{M}^k$ so that $\mathbf{\Gamma} = \mathbf{A}\mathbf{Q}\mathbf{A}^T$. Writing $q_l$ for the $l$-th diagonal entry of $\mathbf{Q}$,

$$
\gamma_{ij} = \sum_{l=1}^d q_l a_{il} a_{jl}
$$

We can then find the matrix power by eigenvalue decomposition, $\mathbf{A} = \mathbf{M}^k = \mathbf{U}\mathbf{\Lambda}^k\mathbf{U}^T$. Note that the eigenvalues of $\mathbf{M}$ satisfy $|\lambda_m| \le 1$, and $\lambda = 1$ is always among them because $\mathbf{M}\mathbf{1} = \mathbf{1}$. The matrix analog of a geometric series needs $|\lambda_n \lambda_m| < 1$, so the pair with $\lambda_n = \lambda_m = 1$ has to be treated separately.

$$
\begin{aligned}
a_{il} &= \sum_{m=1}^d \lambda^k_m u_{im} u_{lm} \\
a_{jl} &= \sum_{n=1}^d \lambda^k_n u_{jn} u_{ln}
\end{aligned}
$$

Substituting these back into $\gamma_{ij}$ we get

$$
\gamma_{ij} = \sum_{l=1}^d q_l \sum_{m=1}^d \lambda^k_m u_{im} u_{lm}\sum_{n=1}^d \lambda^k_n u_{ln} u_{jn}
$$

Recall we'd like to show this series converges! The geometric step in the second-to-last line below is valid for every pair with $|\lambda_n \lambda_m| < 1$.

$$
\begin{aligned}
\sum_{k=0}^{\infty} \gamma_{ij} &= \sum_{k=0}^{\infty} \sum_{l=1}^d q_l \sum_{m=1}^d \lambda^k_m u_{im} u_{lm}\sum_{n=1}^d \lambda^k_n u_{ln} u_{jn} \\
&= \sum_{k=0}^{\infty} \sum_{l=1}^d \sum_{m=1}^d \sum_{n=1}^d q_l \lambda^k_m u_{im} u_{lm} \lambda^k_n u_{ln} u_{jn} \\
&= \sum_{l=1}^d \sum_{m=1}^d \sum_{n=1}^d q_l u_{im} u_{lm} u_{ln} u_{jn} \sum_{k=0}^{\infty} \lambda^k_n \lambda^k_m \\
&= \sum_{l=1}^d \sum_{m=1}^d \sum_{n=1}^d q_l u_{im} u_{lm} u_{ln} u_{jn} \sum_{k=0}^{\infty} (\lambda_n \lambda_m)^k \\
&= \sum_{l=1}^d \sum_{m=1}^d \sum_{n=1}^d q_l u_{im} u_{lm} u_{ln} u_{jn} \frac{1}{1 - \lambda_n \lambda_m} \\
&= \sum_{l=1}^d q_l \sum_{m=1}^d \sum_{n=1}^d u_{im} u_{lm} u_{ln} u_{jn} \frac{1}{1 - \lambda_n \lambda_m}
\end{aligned}
$$

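Every term with $|\lambda_n \lambda_m| < 1$ converges, but the term with $\lambda_n = \lambda_m = 1$ (the constant eigenvector, present because the rows of $\mathbf{M}$ sum to 1) diverges, so the full series does not converge. The limit exists only after removing that direction, for example by looking at differences in allele frequency between nodes. We can then compute this covariance matrix, but it requires diagonalizing $\mathbf{M}$, which is not ideal.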
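One possible way to organize the element-wise expression is as an element-wise (Hadamard) product in the eigenbasis: $\sum_k \mathbf{M}^k\mathbf{Q}\mathbf{M}^k = \mathbf{U}\big[(\mathbf{U}^T\mathbf{Q}\mathbf{U}) \odot \mathbf{K}\big]\mathbf{U}^T$, where $K_{mn}$ is the geometric sum in $\lambda_m\lambda_n$. The sketch below is a hedged illustration of that idea rather than a finished derivation. It assumes a symmetric $\mathbf{M}$ on a connected graph, handles the finite-$t$ sum exactly, and obtains a limit only after dropping the eigenvalue-1 direction. All function names and example values are hypothetical.

```python
import numpy as np

def variance_direct(M, q, t):
    """sum_{k=0}^{t-1} M^k Q M^k with Q = diag(q), by brute force."""
    V = np.zeros_like(M, dtype=float)
    Mk = np.eye(M.shape[0])
    for _ in range(t):
        V += Mk @ np.diag(q) @ Mk.T
        Mk = M @ Mk
    return V

def variance_eig(M, q, t):
    """Same finite sum via the eigendecomposition of a symmetric M."""
    lam, U = np.linalg.eigh(M)
    B = U.T @ (q[:, None] * U)              # U^T Q U
    prod = np.outer(lam, lam)               # lambda_m * lambda_n
    unit = np.isclose(prod, 1.0)            # pairs where the ratio formula fails
    safe = np.where(unit, 0.0, prod)
    # Finite geometric sum; pairs with product 1 contribute exactly t terms.
    K = np.where(unit, float(t), (1.0 - safe**t) / (1.0 - safe))
    return U @ (B * K) @ U.T

def limiting_contrast_covariance(M, q):
    """Infinite-time limit after removing the eigenvalue-1 direction."""
    lam, U = np.linalg.eigh(M)
    keep = ~np.isclose(lam, 1.0)
    lam, U = lam[keep], U[:, keep]
    B = U.T @ (q[:, None] * U)
    K = 1.0 / (1.0 - np.outer(lam, lam))
    return U @ (B * K) @ U.T

# Hypothetical symmetric chain with unequal population sizes.
M = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.8, 0.1],
              [0.0, 0.0, 0.1, 0.9]])
sigma2 = 0.2
q = sigma2 / np.array([50.0, 100.0, 200.0, 400.0])   # diagonal of Q

V_direct = variance_direct(M, q, t=30)
V_eig = variance_eig(M, q, t=30)

d = M.shape[0]
C = np.eye(d) - np.ones((d, d)) / d         # centering matrix
V_limit = limiting_contrast_covariance(M, q)
V_long = C @ variance_direct(M, q, t=3000) @ C
```

The finite-$t$ version agrees with the brute-force sum for any $t$, and in this example the centered long-run covariance `V_long` matches `V_limit`. The diagonalization cost noted above still applies.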

**TODO: reorganize this into matrix form**