# Poisson point process

During the experiment, the subject explores a trajectory $\mathbf x(t) = \{x_1(t), x_2(t)\}$ in 2D space. Concurrently, we obtain continuous-time recordings from a single grid cell, represented as the time-series $y(t)$:

\begin{equation}\begin{aligned}
y(t) = \begin{cases} \delta(t) & \text{if the cell spiked at time $t$} \\ 0 & \text{ otherwise} \end{cases}
\end{aligned}\end{equation}

We model these spiking observations $y(t)$ as a Poisson point process with a time-varying rate $\lambda(t)$, in units of spikes per sample:

\begin{equation}\begin{aligned}
\Pr\left(\textstyle\int_t^{t+\Delta t} y(t) \,dt {=} k\right) &= \operatorname{Poisson}\left(\textstyle\int_t^{t+\Delta t} \lambda(t)\, dt\right)
\end{aligned}\end{equation}

# Discretization in time

If the firing rate varies smoothly relative to the time-step $\Delta t$, we may approximate

\begin{equation}\begin{aligned}
\textstyle\int_t^{t+\Delta t} \lambda(t)\, dt &\approx \Delta t\, \lambda(t)
\end{aligned}\end{equation}

We work in discrete time-steps and choose units such that $\Delta t = 1$. We write $\lambda_t$ for the rate integrated over the time-step starting at $t$ (so that $\lambda_t \approx \lambda(t)$), and $y_t$ for the number of spikes in that time-step:

\begin{equation}\begin{aligned}
\lambda_t &= \textstyle\int_t^{t+\Delta t} \lambda(t) \, dt
\\
y_t &= \textstyle\int_t^{t+\Delta t} y(t) \, dt
\end{aligned}\end{equation}

The number of spiking events $y_t$ in time-step $t$ is Poisson distributed:

\begin{equation}\begin{aligned}
\Pr(y_t) &= \frac 1 {y_t!} \lambda_t^{y_t} e^{-\lambda_t}
\end{aligned}\end{equation}
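As a quick illustration of this probability mass function, the following Python sketch evaluates it directly and simulates binned spike counts. The constant rate of 0.2 spikes per sample and the names `poisson_pmf`, `lam_t`, and `y_t` are illustrative assumptions, not values from the experiment.

```python
import math
import numpy as np

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(y_t = k) when the expected count in the time-step is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Hypothetical example: a constant rate of 0.2 spikes per sample.
rng = np.random.default_rng(0)
lam_t = np.full(10_000, 0.2)
y_t = rng.poisson(lam_t)  # one simulated spike count per time-step
```

At low rates most time-steps contain no spike, which is why the counts are often treated as nearly binary.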

Over the course of the experiment, we collect many data-points. We assume that spiking events are conditionally independent, given $\lambda_t$. For a given firing-rate map $z(\mathbf x)$, the probability of observing all spiking events $\mathbf y = \{y_1,\dots,y_T\}$ during the recording is

\begin{equation}\begin{aligned}
\Pr(\mathbf y | z(\mathbf x) ) &=
\prod_{t=1..T} \Pr(y_t | z(\mathbf x) ) = 
\prod_{t=1..T} \frac 1 {y_t!} \lambda_t^{y_t} e^{-\lambda_t}
\end{aligned}\end{equation}

This is simpler to express and calculate as a log-probability:

\begin{equation}\begin{aligned}
\ln\Pr(\mathbf y | z(\mathbf x) ) 
&=
\textstyle\sum_{t=1..T}\left\{ -\ln(y_t!) + y_t \ln(\lambda_t) -\lambda_t\right\}
\\&=
\textstyle\sum_{t=1..T}\left\{y_t \ln(\lambda_t) -\lambda_t\right\} + \text{constant}
\end{aligned}\end{equation}
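The log-probability above translates directly into code. This is a minimal sketch; the function name `poisson_loglik` and the `include_constant` flag are hypothetical, introduced only to show which term is dropped as the constant.

```python
import math
import numpy as np

def poisson_loglik(y, lam, include_constant=False):
    """Sum over time-steps of y_t * ln(lambda_t) - lambda_t.

    With include_constant=True the -ln(y_t!) term is added back,
    giving the exact log-probability rather than one up to a constant.
    """
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    ll = np.sum(y * np.log(lam) - lam)
    if include_constant:
        ll -= sum(math.lgamma(k + 1.0) for k in y)  # ln(k!) = lgamma(k + 1)
    return float(ll)
```

Because the dropped term depends only on the data, it does not affect any optimization over the rate.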


We assume that the variations in firing rate can be explained by spatial location. The current firing-rate can be predicted as some function $f$ of the current 2D location $\mathbf x(t)\in\mathbb R^2$:

\begin{equation}\begin{aligned}
\lambda(t) = f(\mathbf x(t))
\end{aligned}\end{equation}

We will work with a log-Gaussian model, so $f(\cdot)$ is the exponential of a log-firing-rate field $z(\mathbf x(t))$ plus a model for background firing-rate variations $\mu_z(\mathbf x(t))$, which we will specify later:

\begin{equation}\begin{aligned}
\lambda(t) = \exp\left[
z(\mathbf x(t)) + \mu_z(\mathbf x(t))
\right]
\end{aligned}\end{equation}

Again, for small $\Delta t$ we may consider the discrete-time approximation

\begin{equation}\begin{aligned}
\lambda_t \approx \exp\left[z(\mathbf x_t) + \mu_z(\mathbf x_t) \right]
\end{aligned}\end{equation}

Or, in terms of log-rate: 

\begin{equation}\begin{aligned}
\ln \lambda_t \approx z(\mathbf x_t) + \mu_z(\mathbf x_t)
\end{aligned}\end{equation}

Substituting this into the Poisson log-probability:

\begin{equation}\begin{aligned}
\ln\Pr(\mathbf y | z(\mathbf x) ) 
&=
\sum_{t=1..T}\left\{ y_t \left[z(\mathbf x_t) + \mu_z(\mathbf x_t) \right] - \exp\left[z(\mathbf x_t) + \mu_z(\mathbf x_t) \right]
\right\} + \text{constant}
\end{aligned}\end{equation}
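Written in terms of the log-rate, the likelihood and its gradient are short to express. The sketch below assumes `z` and `mu` hold the values $z(\mathbf x_t)$ and $\mu_z(\mathbf x_t)$ along the trajectory; the function names and example numbers are hypothetical.

```python
import numpy as np

def loglik_from_lograte(y, z, mu):
    """Poisson log-likelihood (up to a constant) with ln(lambda_t) = z_t + mu_t."""
    eta = z + mu
    return float(np.sum(y * eta - np.exp(eta)))

def grad_wrt_z(y, z, mu):
    """Derivative with respect to each z_t: observed minus expected count."""
    return y - np.exp(z + mu)

y = np.array([0.0, 1.0, 3.0])
z = np.array([-0.5, 0.1, 0.8])
mu = np.full(3, np.log(0.5))  # hypothetical background of 0.5 spikes/sample
```

The gradient vanishes where the predicted rate matches the observed count, which is the usual Poisson regression fixed point.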


# Gaussian-process rate map

Now, we place a Gaussian-process prior on the log firing-rate function $z(\mathbf x)$. 

\begin{equation}\begin{aligned}
z(\mathbf x) &\sim\mathcal{GP}\left[
0, \Sigma_z(\mathbf x_1, \mathbf x_2)
\right]
\end{aligned}\end{equation}

where $\Sigma_z$ (written $\Sigma$ below for brevity) is a two-point covariance function that describes the covariance of $z$ between two locations $\mathbf x_1$ and $\mathbf x_2$.

Note that we have moved the mean $\mu_z(\mathbf x)$ out of the prior and into the rate model; this simplifies some things later.

Our prior will impose a convolutional (translation-invariant) structure, so $\Sigma(\mathbf x_1,\mathbf x_2)$ depends only on the separation $\Delta\mathbf x = \mathbf x_2 - \mathbf x_1$. 
We further impose that this prior kernel is radially symmetric, so 


\begin{equation}\begin{aligned}
\Sigma(\mathbf x_1,\mathbf x_2) = \text K( | \mathbf x_2 - \mathbf x_1 | )
\end{aligned}\end{equation}
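A minimal sketch of such a radially symmetric kernel evaluated on a small grid. The squared-exponential form of `K`, its `variance` and `length_scale` parameters, and the 5×5 grid are illustrative assumptions; the text does not specify the kernel.

```python
import numpy as np

def K(r, variance=1.0, length_scale=2.0):
    """Assumed radial kernel: depends only on the distance r between two points."""
    return variance * np.exp(-0.5 * (r / length_scale) ** 2)

def covariance_matrix(points, kernel=K):
    """Sigma[a, b] = K(|x_b - x_a|) for points of shape (n, 2)."""
    diff = points[:, None, :] - points[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    return kernel(r)

g = np.arange(5.0)
points = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
Sigma = covariance_matrix(points)  # shape (25, 25)
```

Every pair of points at the same distance receives the same covariance, and the diagonal equals `K(0)`.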

The zero-lag autocorrelation, $\Sigma(\mathbf x,\mathbf x) = \text K(0)$, reflects our prior marginal variance. If our Gaussian process is well-calibrated, this should be very close to the variance of the log-rate map about its mean:

\begin{equation}\begin{aligned}
\text K(0) \approx \operatorname{var}[ z(\mathbf x) ]
\end{aligned}\end{equation}


# Units 

It is commonly said that log-quantities are unit-less. This is false. Rather, they are constrained by what units various functions of the log-units must have. We know that $\lambda(\mathbf x) = \exp\left[ z(\mathbf x) + \mu_z(\mathbf x) \right]$ has units of spikes per sample. We can therefore conclude that: 

- $\exp\left[ z(\mathbf x) \right]$ has units of spikes / background-spike.
- $\exp\left[ \mu_z(\mathbf x) \right]$ has units of background-spikes / sample

Both $z(\mathbf x)$ and $\mu_z(\mathbf x)$ have log-units.

- $z(\mathbf x)$ has units of log(spikes/background-spike)
- $\mu_z(\mathbf x)$ has units of log(background-spikes/sample)

Define a new logarithmic unit "dits" as shorthand for log(spikes/background-spike). This is a natural-logarithmic unit that measures deviation of our log-rate from its prior mean. The zero-lag variance $K(0)$ therefore has units of dits², and its square root $\sqrt{K(0)}$ has units of dits. Which is to say that $\exp\left(\sqrt{K(0)} + \mu_z(\mathbf x)\right)$ has units of spikes/sample.

# Basis projections, general case

We would like to estimate $z(\mathbf x)$. Since $z(\mathbf x)$ is an arbitrary function, it may be arbitrarily complicated. For computation, we work with a finite-dimensional projection of $z(\mathbf x)$. We do this by projecting $z(\mathbf x)$ onto a finite set of $M$ basis functions $\mathbf B(\mathbf x) = \{ b_1(\mathbf x),\dots,b_M(\mathbf x)\}$:

\begin{equation}\begin{aligned}
w_i = \textstyle \int_{\mathbb R^2} b_i(\mathbf x) z(\mathbf x) \, d\mathbf x
\end{aligned}\end{equation}

If we restrict the function space $z(\mathbf x)$ to a nice reproducing kernel Hilbert space $\mathcal H$, we can treat functions $z(\mathbf x)$ much like finite-dimensional real-valued vectors $\mathbf z = \{z_{\mathbf x}\}$, and denote integrals like the above as if they were matrix multiplication: 

\begin{equation}\begin{aligned}
w_i &= \mathbf b_i^\top \mathbf z
\end{aligned}\end{equation}

We denote the projection of a function $\mathbf z$ onto a finite-dimensional vector $\mathbf w$ as: 

\begin{equation}\begin{aligned}
\mathbf w &= \mathbf B^\top \mathbf z
\end{aligned}\end{equation}

Because $\mathbf z$ is Gaussian and the projection is linear, $\mathbf w = \mathbf B^\top \mathbf z$ is an $M$-dimensional multivariate Gaussian with distribution

\begin{equation}\begin{aligned}
\mathbf w &\sim\mathcal{N}\left[
\boldsymbol \mu_w, \boldsymbol \Sigma_w
\right]
\\
\boldsymbol \mu_w &= \mathbf B^\top \boldsymbol \mu_z
\\
\boldsymbol \Sigma_w &= \mathbf B^\top \boldsymbol \Sigma_z \mathbf B
\end{aligned}\end{equation}
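The projected prior can be computed numerically once functions are represented on a fine grid. In this sketch the grid size `n`, the number of basis functions `M`, the random orthonormal basis, and the squared-exponential covariance are all illustrative assumptions; quadrature weights are taken to be absorbed into `B`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 50, 5  # hypothetical sizes: n grid points, M basis functions

# Orthonormal basis as the columns of B (n x M), here from a QR factorization.
B, _ = np.linalg.qr(rng.standard_normal((n, M)))

# Assumed prior over z on the fine grid: zero mean, squared-exponential covariance.
s = np.arange(n, dtype=float)
Sigma_z = np.exp(-0.5 * ((s[:, None] - s[None, :]) / 5.0) ** 2)
mean_z = np.zeros(n)

mu_w = B.T @ mean_z          # projected mean, shape (M,)
Sigma_w = B.T @ Sigma_z @ B  # projected covariance, shape (M, M)
```

The result is an ordinary $M$-dimensional Gaussian, so all later computations involve only $M \times M$ matrices.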

Estimating $z(\mathbf x)$ can now be re-cast as estimating the vector $\mathbf w$. Given $\mathbf w$, we can approximately convert back to $z(\mathbf x)$ using the pseudoinverse $\mathbf B^+$ of $\mathbf B$:

\begin{equation}\begin{aligned}
\tilde {\mathbf z} &= {\mathbf B^+}^{\top} \mathbf w
\end{aligned}\end{equation}

When the basis functions in $\mathbf B$ are orthonormal, 

\begin{equation}\begin{aligned}
{\mathbf B^+}^{\top} = \mathbf B
\end{aligned}\end{equation}

and

\begin{equation}\begin{aligned}
\tilde {\mathbf z} &= \mathbf B \mathbf w
\end{aligned}\end{equation}
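The project-then-reconstruct round trip can be checked numerically. This sketch again assumes functions sampled on a grid and a randomly generated orthonormal basis; `z`, `w`, and `z_tilde` follow the symbols above, everything else is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 50, 5
B, _ = np.linalg.qr(rng.standard_normal((n, M)))  # orthonormal columns

z = rng.standard_normal(n)  # a hypothetical function sampled on the grid
w = B.T @ z                 # project onto the basis
z_tilde = B @ w             # reconstruct from the coefficients

# For orthonormal columns, the transposed pseudoinverse is B itself.
pinv_T = np.linalg.pinv(B).T
```

The reconstruction is the orthogonal projection of `z` onto the span of the basis, so projecting it again returns the same `w`.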

Function evaluation at a point $\mathbf x$ in $\mathcal H$ is just the dot product with $\delta_{\mathbf x}$:

\begin{equation}\begin{aligned}
z(\mathbf x) = \delta_{\mathbf x}^\top \mathbf z
\end{aligned}\end{equation}

With orthonormal basis functions $b_i(\mathbf x)$ (so that $\mathbf B^+ = \mathbf B^\top$), the reconstruction evaluated at a point $\mathbf x$ is:

\begin{equation}\begin{aligned}
\tilde z(\mathbf x) &= \sum_{i=1..M} w_i b_i(\mathbf x) = \delta_{\mathbf x}^\top \mathbf B \mathbf w
\end{aligned}\end{equation}

To approximate $z$ directly at an observed location $\mathbf x_t$, first project the location onto the basis:

\begin{equation}\begin{aligned}
\tilde{\mathbf x}_t = \mathbf B^\top \delta_{\mathbf x_t}
\end{aligned}\end{equation}

\begin{equation}\begin{aligned}
\tilde z(\mathbf x_t) &= \tilde{\mathbf x}_t^\top \mathbf w
\end{aligned}\end{equation}

Substituting $\tilde {\mathbf z}$ into our equation for log rate, and writing $\mathbf w_0$ for the basis projection of the background term $\mu_z$:

\begin{equation}\begin{aligned}
\ln \lambda_t &\approx \tilde{\mathbf x}_t^\top (\mathbf w + \mathbf w_0)
\end{aligned}\end{equation}

Substituting this into the Poisson log-probability:

\begin{equation}\begin{aligned}
\ln\Pr(\mathbf y | z(\mathbf x) ) 
&=
\textstyle \sum_{t=1..T} \left\{
y_t \left[\tilde{\mathbf x}_t^\top (\mathbf w + \mathbf w_0) \right] - \exp\left[\tilde{\mathbf x}_t^\top (\mathbf w + \mathbf w_0)\right]
\right\} +\text{constant}
\end{aligned}\end{equation}


# Piecewise constant grid basis projection 

In this work, we will consider two types of basis projections: projection onto a square grid of local elements ("spatial bins"), and projection onto a finite subset of frequency components in Fourier space. We first consider projection onto a finite grid of square elements with spacing $\Delta l$. These basis elements tile 2D space, and are indexed by $i$ and $j$:

\begin{equation}\begin{aligned}
b_{ij}(x_1,x_2) = 
\frac 1 { {\Delta l} }
\begin{cases} 
1 & \text{if }\, i{\le}\frac{x_1}{\Delta l}{<}{i{+}1}\text{ and }j{\le}\frac{x_2}{\Delta l}{<}{j{+}1}
\\
0 & \text{otherwise.}
\end{cases}
\end{aligned}\end{equation}

It is easy to verify that each $\|b_{ij}\|^2$ is 1, and that different $b_{ij}$, $b_{kl}$ are orthogonal. This therefore corresponds to an orthonormal basis for a finite-dimensional subspace of $\mathcal H$. Projections onto components $w_{ij}$ in the subspace are given by integrating $z(\mathbf x)$ with respect to $b_{ij}$:

\begin{equation}\begin{aligned}
w_{ij} &= \textstyle \int_{\mathbb R^2} b_{ij}(\mathbf x) z(\mathbf x) \, d\mathbf x
\end{aligned}\end{equation}

The above projections simply average the value of $\boldsymbol\mu_z$ and $\boldsymbol\Sigma_z$ within the basis function, times $\Delta l$ for the mean (one integral against a basis function) and $\Delta l^2$ for the covariance (two such integrals). If the grid discretization is sufficiently fine, these averages can be replaced by point estimates, and: 

\begin{equation}\begin{aligned}
\mu_{w,ij} &= \Delta l \cdot \mu_z(\mathbf x_{ij})
\\
\Sigma_{w,ij,kl} &= \Delta l^2 \cdot \Sigma_z(\mathbf x_{ij},\mathbf x_{kl})
\end{aligned}\end{equation}

where $\mathbf x_{ij}$ is the centroid of each basis element.
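The orthonormality of this grid basis can be verified numerically with a midpoint quadrature rule. In the sketch below the helper `grid_features`, the domain $[0, 2)^2$, the bin width `dl`, and the quadrature spacing `h` are illustrative assumptions.

```python
import numpy as np

def grid_features(xy, dl, n_bins):
    """Evaluate every b_ij at locations xy of shape (Q, 2); returns (Q, n_bins**2)."""
    xy = np.atleast_2d(xy)
    ij = np.floor(xy / dl).astype(int)
    inside = np.all((ij >= 0) & (ij < n_bins), axis=1)
    F = np.zeros((xy.shape[0], n_bins * n_bins))
    flat = ij[inside, 0] * n_bins + ij[inside, 1]
    F[np.flatnonzero(inside), flat] = 1.0 / dl  # height 1/dl inside the bin
    return F

dl, n_bins, h = 0.5, 4, 0.05  # hypothetical bin width, bins per axis, quadrature step
mid = (np.arange(int(round(n_bins * dl / h))) + 0.5) * h  # midpoint-rule nodes
Q = np.stack(np.meshgrid(mid, mid, indexing="ij"), axis=-1).reshape(-1, 2)
F = grid_features(Q, dl, n_bins)
gram = F.T @ F * h**2  # approximates the integrals of b_ij * b_kl
```

The Gram matrix comes out as the identity, and a single location maps to a vector with one nonzero entry of height $1/\Delta l$.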

Consider again the Poisson log-probability:

\begin{equation}\begin{aligned}
\ln\Pr(\mathbf y | z(\mathbf x) ) 
&=
\textstyle \sum_{t=1..T} \left\{
y_t \left[\tilde{\mathbf x}_t^\top (\mathbf w + \mathbf w_0) \right] - \exp\left[\tilde{\mathbf x}_t^\top (\mathbf w + \mathbf w_0)\right]
\right\} +\text{constant}
\\
&= 
\mathbf y^\top \left[\tilde{\mathbf X}^\top (\mathbf w + \mathbf w_0) \right] - \mathbf 1^\top \exp\left[\tilde{\mathbf X}^\top (\mathbf w + \mathbf w_0)\right]
+\text{constant}
\end{aligned}\end{equation}
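The matrix form above maps directly onto array operations. This is a sketch under simplifying assumptions: the feature matrix `X_tilde` is one-hot with unit height (i.e. $\Delta l = 1$), and the sizes, weights, and simulated counts are hypothetical.

```python
import numpy as np

def loglik_matrix(w, w0, X_tilde, y):
    """y^T [X^T (w + w0)] - 1^T exp[X^T (w + w0)], up to a constant."""
    eta = X_tilde.T @ (w + w0)  # ln(lambda_t) for every time-step, shape (T,)
    return float(y @ eta - np.sum(np.exp(eta)))

rng = np.random.default_rng(3)
M, T = 4, 20
X_tilde = np.zeros((M, T))
X_tilde[rng.integers(0, M, size=T), np.arange(T)] = 1.0  # one active bin per time-step
w = rng.normal(size=M)
w0 = np.full(M, -1.0)
y = rng.poisson(np.exp(X_tilde.T @ (w + w0))).astype(float)
```

Each column of `X_tilde` plays the role of one $\tilde{\mathbf x}_t$, so the matrix product evaluates all time-steps at once.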


where the columns of $\tilde{\mathbf X}$ are the projected locations $\tilde{\mathbf x}_t$, so that

$$
\tilde{\mathbf X} \in \mathbb R^{M \times T}
$$

Is there a nicer way to denote this as a summation over space, rather than time? We can also project the observations $\mathbf y$ onto our basis set. 

$$
\tilde{\mathbf y}_t = y_t \, \tilde{\mathbf x}_t
$$

\begin{equation}\begin{aligned}
\mathbf y^\top \left[\tilde{\mathbf X}^\top (\mathbf w + \mathbf w_0) \right]
&=
\textstyle\sum_{t=1..T} y_t \left( \textstyle\int_{\mathbb R^2} \mathbf B(\mathbf x)^\top \delta_{\mathbf x_t} \, d\mathbf x\right)^\top (\mathbf w + \mathbf w_0)
\\
&=
\textstyle\sum_{t=1..T} y_t \left( \textstyle\int_{\mathbb R^2} \delta_{\mathbf x_t}^\top \mathbf B(\mathbf x) \, d\mathbf x\right) (\mathbf w + \mathbf w_0)
\\
&=
\textstyle\int_{\mathbb R^2} \textstyle\sum_{t=1..T} y_t \delta_{\mathbf x_t}^\top \mathbf B(\mathbf x) (\mathbf w + \mathbf w_0)\, d\mathbf x
\\
&=
\textstyle\int_{\mathbb R^2} \textstyle\sum_{t=1..T} y_t \delta_{\mathbf x_t}^\top\, d\mathbf x\; \mathbf B(\mathbf x) (\mathbf w + \mathbf w_0)
\end{aligned}\end{equation}
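For the piecewise-constant grid basis, this regrouping amounts to building a spatial histogram of spikes. The following sketch assumes a hypothetical random trajectory and spike train, and uses `np.bincount` to accumulate spikes per bin.

```python
import numpy as np

dl, n_bins = 0.5, 4  # hypothetical bin width and bins per axis
rng = np.random.default_rng(4)
T = 200
xy = rng.uniform(0.0, n_bins * dl, size=(T, 2))  # hypothetical trajectory samples
y = rng.poisson(0.3, size=T).astype(float)       # hypothetical spike counts

ij = np.floor(xy / dl).astype(int)
bin_index = ij[:, 0] * n_bins + ij[:, 1]
X_tilde = np.zeros((n_bins**2, T))
X_tilde[bin_index, np.arange(T)] = 1.0 / dl      # grid-basis features of each location

y_tilde = X_tilde @ y  # sum over t of y_t * x_tilde_t
spike_counts = np.bincount(bin_index, weights=y, minlength=n_bins**2)
```

Here `y_tilde` equals the per-bin spike count divided by $\Delta l$, so the sum over time collapses into a sum over spatial bins.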

# Inference

We are now ready to begin inference, with some care. We will use a variational approach, and approximate the posterior for $z(\mathbf x)$ as another Gaussian process. This is log-Gaussian Cox-process regression. 

\begin{equation}\begin{aligned}
\Pr(\mathbf z | \mathbf y) &\approx Q(\mathbf z) \sim \mathcal {GP}\left[ \hat\mu_q(\mathbf x), \hat\Sigma_q(\mathbf x_1,\mathbf x_2) \right]
\end{aligned}\end{equation}

We do this by minimizing the KL divergence 

\begin{equation}\begin{aligned}
D_{\text{KL}}[ Q \| \Pr(\mathbf z|\mathbf y) ] = \int Q(\mathbf z) \ln \frac
{Q(\mathbf z)}
{\Pr(\mathbf z|\mathbf y)} \mathcal D\mathbf z
= 
\left<
\ln \frac
{Q(\mathbf z)}
{\Pr(\mathbf z|\mathbf y)}
\right>_{Q}
\end{aligned}\end{equation}

This integral is somewhat confusing, since we need to integrate over all possible *functions* $z(\mathbf x)$. We will first expand and simplify it, before carefully relating it to an integral over a finite basis projection. 

Recall Bayes' theorem:
\begin{equation}\begin{aligned}
\Pr(\mathbf z|\mathbf y) &= 
\Pr(\mathbf y|\mathbf z)
\frac
{\Pr(\mathbf z)}
{\Pr(\mathbf y)}
\end{aligned}\end{equation}

Substituting this into the KL divergence:

\begin{equation}\begin{aligned}
\left<
\ln \frac
{Q(\mathbf z)}
{\Pr(\mathbf z|\mathbf y)}
\right>_{Q}
&=
\left<
\ln \frac
{Q(\mathbf z)}
{\Pr(\mathbf z)}
\right>_{Q}
-\left< \ln \Pr(\mathbf y|\mathbf z)\right>_{Q}
+\left< \ln \Pr(\mathbf y)\right>_{Q}
\\
&=
D_{\text{KL}}[ Q \| \Pr(\mathbf z)]
-\left< \ln \Pr(\mathbf y|\mathbf z)\right>_{Q}
+\left< \ln \Pr(\mathbf y)\right>_{Q}
\end{aligned}\end{equation}



The evidence $\Pr(\mathbf y)$ does not depend on $\mathbf z$, so $\left< \ln \Pr(\mathbf y)\right>_{Q} = \ln \Pr(\mathbf y)$ is constant with respect to $Q$:

\begin{equation}\begin{aligned}
D_{\text{KL}}[ Q \| \Pr(\mathbf z|\mathbf y) ] &= 
D_{\text{KL}}[ Q \| \Pr(\mathbf z)]
-\left< \ln \Pr(\mathbf y|\mathbf z)\right>_{Q}
+ \text{const.}
\end{aligned}\end{equation}

In other words, variational Bayes maximizes the expected log-likelihood of the data under $Q(\mathbf z)$, while minimizing the KL divergence from the prior to the approximating posterior. 

Note that the closed-form KL divergence between $k$-dimensional multivariate Gaussians (here with a zero-mean prior) diverges in the case of an infinite-dimensional Gaussian process:

\begin{equation}\begin{aligned}
D_{\text{KL}}( Q \| \Pr(\mathbf z) ) &= 
\frac 1 2 \left\{
\operatorname{tr}\left( \Sigma_z^{-1} \Sigma_q \right)
+
\mu_q^\top  \Sigma_z^{-1} \mu_q - k - \ln\left|\Sigma_z^{-1} \Sigma_q\right|
\right\}
\end{aligned}\end{equation}
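In a finite $k$-dimensional projection this divergence is straightforward to evaluate. The sketch below uses the log-determinant form of the last term and a zero-mean prior; the function name `gaussian_kl` is hypothetical.

```python
import numpy as np

def gaussian_kl(mu_q, Sigma_q, Sigma_z):
    """KL[ N(mu_q, Sigma_q) || N(0, Sigma_z) ] for k-dimensional Gaussians."""
    k = mu_q.shape[0]
    A = np.linalg.solve(Sigma_z, Sigma_q)         # Sigma_z^{-1} Sigma_q
    quad = mu_q @ np.linalg.solve(Sigma_z, mu_q)  # mu_q^T Sigma_z^{-1} mu_q
    _, logdet = np.linalg.slogdet(A)              # ln |Sigma_z^{-1} Sigma_q|
    return 0.5 * (np.trace(A) + quad - k - logdet)
```

Using `solve` and `slogdet` avoids forming an explicit inverse or determinant, which tends to be more stable when the prior covariance is poorly conditioned.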

The remaining term is the expected log-likelihood. Recall that

\begin{equation}\begin{aligned}
\ln\Pr(\mathbf y | \mathbf z ) 
&=
\textstyle\sum_{t=1..T}\left\{y_t \ln(\lambda_t) -\lambda_t\right\} + \text{constant}
\end{aligned}\end{equation}

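Grouping time-steps by spatial bin, with $K_i$ the number of spikes and $N_i$ the number of samples observed in bin $i$, and $z_i$, $\mu_i$ the values of $z$ and $\mu_z$ in that bin, this becomes

\begin{equation}\begin{aligned}
\ln\Pr(\mathbf y | \mathbf z ) 
&=
\textstyle\sum_{i=1..M}\left\{K_i (z_i + \mu_i) - N_i e^{z_i + \mu_i} \right\} + \text{constant}
\end{aligned}\end{equation}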
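The binned form of the log-likelihood can be checked against the sum over time-steps. This sketch reads $K_i$ as the spike count and $N_i$ as the number of samples in bin $i$, which is an interpretation of the symbols rather than a definition given in the text; the simulated data and names are hypothetical.

```python
import numpy as np

def binned_loglik(K, N, z, mu):
    """Sum over bins of K_i (z_i + mu_i) - N_i exp(z_i + mu_i), up to a constant."""
    eta = z + mu
    return float(np.sum(K * eta - N * np.exp(eta)))

rng = np.random.default_rng(5)
M, T = 6, 500
bin_index = rng.integers(0, M, size=T)  # bin visited at each time-step
z = rng.normal(scale=0.5, size=M)
mu = np.full(M, np.log(0.2))
y = rng.poisson(np.exp(z + mu)[bin_index]).astype(float)

K = np.bincount(bin_index, weights=y, minlength=M)  # spikes per bin
N = np.bincount(bin_index, minlength=M)             # samples (visits) per bin
```

Once `K` and `N` are computed, the cost of evaluating the likelihood depends on the number of bins $M$ rather than the recording length $T$.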

Taking the expectation under $Q$, and writing $\mu_{q,i} = \left< z_i \right>_{Q}$:

\begin{equation}\begin{aligned}
\left< \ln \Pr(\mathbf y|\mathbf z)\right>_{Q}
&=
\left< 
\textstyle\sum_{i=1..M}\left\{K_i (z_i + \mu_i) - N_i e^{z_i + \mu_i} \right\}
\right>_{Q} + \text{constant}
\\&=
\textstyle\sum_{i=1..M}\left\{K_i 
\left(\left< z_i \right>_{Q} + \mu_i\right) - N_i 
\left< e^{z_i + \mu_i}\right>_{Q}\right\} + \text{constant}
\\&=
\textstyle\sum_{i=1..M}\left\{K_i 
\left(\mu_{q,i} + \mu_i\right) - N_i 
\left< e^{z_i + \mu_i}\right>_{Q}\right\} + \text{constant}
\end{aligned}\end{equation}
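When $Q$ is Gaussian, the remaining expectation has a closed form: for $z_i \sim \mathcal N(\mu_{q,i}, \sigma^2_{q,i})$, the log-normal mean gives $\left< e^{z_i + \mu_i}\right>_{Q} = \exp(\mu_{q,i} + \mu_i + \tfrac12 \sigma^2_{q,i})$. The sketch below uses this identity; the function name `expected_loglik` and the argument `var_q` (the marginal variances of $Q$) are hypothetical.

```python
import numpy as np

def expected_loglik(K, N, mu, mu_q, var_q):
    """<ln Pr(y|z)>_Q up to a constant, for marginals z_i ~ N(mu_q[i], var_q[i])."""
    expected_rate = np.exp(mu_q + mu + 0.5 * var_q)  # <exp(z_i + mu_i)>_Q
    return float(np.sum(K * (mu_q + mu) - N * expected_rate))
```

Only the marginal variances of $Q$ enter this term; the off-diagonal structure of the posterior covariance matters only through the KL term.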


How do we do this? 

- If we project first, we break the proper connection with the continuous Gaussian process
- If we don't project, the integrals don't exist
- If our prior is in the projected space, it restricts the posterior to a finite subspace and everything works
- Appropriate normalization appears when we correctly normalize our basis functions

# All the heavy lifting is done with the basis projection

We never touch $D_{\text{KL}}$ until the basis projection has been done correctly. 

