In some areas of the brain, neural populations exhibit so-called low-dimensional manifold codes. For example, some neural populations encode a circular manifold that represents the direction an animal is facing in its environment (e.g. Rubin et al. 2019). The high-dimensional vector of population firing rates $\mathbf x\in\mathbb R^N$ is constrained to a lower-dimensional, locally Euclidean manifold $\mathcal M$. Population activity therefore encodes a latent state $\theta \in \mathcal M$. The local dimension $M$ of this manifold is typically much smaller than the number of neurons in the population, $M\ll N$.

### Population coding in the deterministic case

We first consider the case in which neural responses are a deterministic function of $\theta$. In this case, the vector of neural population activity is a fixed function $\mathbf x(\theta)$. Individual neurons $x_i$ have tuning curves $x_i(\theta)$. Each tuning curve is an element of the space $\mathcal H$ of functions from the low-dimensional manifold to (scalar) firing rates, $f:\mathcal M\to\mathbb R^+$.

We'll assume that $\mathcal H$ is a reproducing kernel Hilbert space of twice differentiable functions. The set of all tuning curves $\mathbf x(\theta)$ forms a frame for a $K{\le}N$-dimensional subspace of $\mathcal H$, which we will call $\mathcal F$. Neural population codes are typically redundant, so we expect the effective dimensionality $K$ spanned by $\mathbf x$ to be less than the number of neurons, $N$. 

The tuning curves $\mathbf x(\theta)$ construct a rich, nonlinear representation for functions on $\mathcal M$. It is therefore easy to decode various functions $f(\theta)$ using a linear projection of the population activity. For example, if $f(\theta) = \theta$ is in $\mathcal F$, then there exists a set of decoding weights $\mathbf w$ such that 

\begin{equation}\begin{aligned}
\theta = \mathbf w^\top \mathbf x(\theta)
\end{aligned}\end{equation}

If a code is redundant ($K{<}N$), there will be multiple $\mathbf w$ that can decode $\theta$ equally well. Thus, linear decoding of sensory, motor, and cognitive variables is often possible.
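
The non-uniqueness of the decoding weights can be checked numerically. The sketch below is illustrative only: the Fourier basis, the random mixing matrix, and the sizes `N` and `K` are assumptions, not part of the model above. Because $\theta$ itself is discontinuous on a circle, the example decodes $f(\theta)=\cos\theta$ instead.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 20, 5                         # neurons, effective dimension (K < N)
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)

# Hypothetical K-dimensional function space F on the circle: low Fourier modes.
basis = np.column_stack([
    np.ones_like(theta),
    np.cos(theta), np.sin(theta),
    np.cos(2 * theta), np.sin(2 * theta),
])                                   # shape (T, K)

# Each tuning curve is a random mixture of the K basis functions, so the
# N curves are redundant by construction. (Rates may go negative here; this
# illustrates the linear algebra only.)
mixing = rng.normal(size=(K, N))
X = basis @ mixing                   # x(theta) for every theta, shape (T, N)

target = np.cos(theta)               # f(theta), which lies in F

# Minimum-norm weights solving X w = f.
w_min, *_ = np.linalg.lstsq(X, target, rcond=1e-8)

# Adding any null-space direction of X leaves the decoded output unchanged.
_, _, Vt = np.linalg.svd(X)
null_direction = Vt[-1]              # orthogonal to the row space, since K < N
w_alt = w_min + 5.0 * null_direction

rank = np.linalg.matrix_rank(X, tol=1e-8)
err_min = np.max(np.abs(X @ w_min - target))
err_alt = np.max(np.abs(X @ w_alt - target))
print(f"rank = {rank}, max error (min-norm) = {err_min:.2e}, (alternative) = {err_alt:.2e}")
```

Both weight vectors reproduce the target on the sampled $\theta$, even though they differ by a large vector; the minimum-norm solution is only one convenient choice among them.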

One way to understand why this works is to note that $\theta$ is locally low-dimensional. Assume the local Euclidean dimension $M$ is less than the dimension of population activity $K$, which in turn is less than the population size $N$, i.e. $M < K < N$. Near any particular $\theta_0$, neural activity varies in an $M$-dimensional subspace, and nearby $\theta = \theta_0 + \delta$ can be decoded linearly. One can build nonlinear functions from this locally linear picture by tiling $\mathcal M$ with a family of locally Euclidean charts. Each chart represents a separate part of the manifold. Richer, nonlinear functions can then be constructed by stitching together several locally linear ones. Because the tuning curves $\mathbf x(\theta)$ are nonlinear, they provide a family of locally linear representations for different regions of $\theta$. Speaking roughly, the number of independent charts available is $K/M$.

One challenge in decoding low-dimensional manifold codes is handling generalization error. For example, in an experiment neural activity may depend on two sets of variables $(\theta_1,\theta_2)$. The tuning with respect to $\theta_1$ might be well measured, while the tuning with respect to $\theta_2$ is not. A linear decoder trained on this experiment might work on the training data but fail to generalize to novel $\theta_2$. This is because neural activity for values of $\theta_2$ not observed in the original experiment may lie in a completely different linear subspace from the one used to train the decoder.

If we know that our training data samples only a limited range of the possible $\theta$, is there some way to select a particular $\mathbf w$ that is especially robust to this sort of out-of-sample generalization error?
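
The failure mode can be reproduced in a toy simulation. This is a minimal sketch under stated assumptions: the Gaussian-bump tuning curves, the unit-square latent space, the sample sizes, and the helper names `population`, `sample_states`, and `rmse` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: N Gaussian bumps tiling the unit square (theta_1, theta_2).
N = 60
centers = rng.uniform(0.0, 1.0, size=(N, 2))
width = 0.2

def population(theta):
    """Deterministic responses x(theta) for theta of shape (T, 2); returns (T, N)."""
    sq_dist = ((theta[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width**2))

def sample_states(n, theta2_range):
    """Draw theta_1 uniformly on [0, 1] and theta_2 uniformly on theta2_range."""
    theta1 = rng.uniform(0.0, 1.0, size=n)
    theta2 = rng.uniform(*theta2_range, size=n)
    return np.column_stack([theta1, theta2])

def rmse(w, theta):
    """Root-mean-square error of the linear readout of theta_1."""
    return np.sqrt(np.mean((population(theta) @ w - theta[:, 0]) ** 2))

# Training only covers a narrow band of theta_2; testing uses a band never seen.
train = sample_states(500, (0.0, 0.2))
test = sample_states(500, (0.8, 1.0))

# Least-squares decoder for theta_1 (small singular values truncated via rcond).
w, *_ = np.linalg.lstsq(population(train), train[:, 0], rcond=1e-6)

train_rmse = rmse(w, train)
test_rmse = rmse(w, test)
print(f"train RMSE = {train_rmse:.4f}, test RMSE = {test_rmse:.4f}")
```

The decoder is fit only where $\theta_2$ is small, so the weights on neurons that respond mainly at large $\theta_2$ are essentially unconstrained by the training data.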

### Decoding in the stochastic case

Neural activity is typically stochastic. Population activity can be described as a distribution $\Pr(\mathbf x | \theta)$ conditioned on the latent state $\theta$. One way to decode the state $\theta$ given population activity $\mathbf x$ is to use Bayes' rule:

\begin{equation}\begin{aligned}
\Pr(\theta | \mathbf x )
=
\Pr( \mathbf x | \theta )
\frac
{\Pr( \theta )}
{\Pr( \mathbf x )}
\end{aligned}\end{equation}

In these notes, we will choose coordinates such that $\Pr( \theta )$ is uniform. Given a particular observed $\mathbf x$, $\Pr( \mathbf x )$ is also constant. In this scenario, the Bayesian posterior is simply proportional to the likelihood: 

\begin{equation}\begin{aligned}
\Pr(\theta | \mathbf x )
\propto
{\Pr( \mathbf x | \theta)}
\end{aligned}\end{equation}

Neural activity $\mathbf x\in\mathbb R^N$ is high-dimensional. The simplest way to capture the joint statistics of $\mathbf x$, conditioned on $\theta$, is to use a multivariate Gaussian distribution: 

\begin{equation}\begin{aligned}
\Pr(\mathbf x | \theta)
=
\mathcal N[\mu_\theta, \Sigma_\theta]
\end{aligned}\end{equation}

In this case, the log-probability $\ln\Pr(\theta | \mathbf x )$ is: 

\begin{equation}\begin{aligned}
\ln\Pr(\theta | \mathbf x )
&=
- \frac 1 2  \ln|\Sigma_\theta|  - \frac 1 2 (\mathbf x - \mu_\theta )^\top \Sigma_\theta^{-1} (\mathbf x - \mu_\theta)
+ \text{constant.}
\end{aligned}\end{equation}
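
As a rough illustration, the log-posterior above can be evaluated on a grid of candidate $\theta$ and maximized. The sketch assumes a circular latent variable, von Mises-shaped mean responses, and a diagonal, state-dependent covariance; the functions `mean_response`, `covariance`, and `log_posterior` and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 30
preferred = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

def mean_response(theta):
    """Hypothetical tuning curves mu_theta: von Mises bumps, shape (N,)."""
    return np.exp(2.0 * (np.cos(theta - preferred) - 1.0))

def covariance(theta):
    """Hypothetical Sigma_theta: diagonal, with variance growing with the mean."""
    return np.diag(0.01 * (0.1 + mean_response(theta)))

def log_posterior(x, theta):
    """Unnormalized ln Pr(theta | x) under a flat prior on theta."""
    mu, Sigma = mean_response(theta), covariance(theta)
    resid = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet - 0.5 * resid @ np.linalg.solve(Sigma, resid)

# Simulate one noisy population response at a known state.
theta_true = 1.3
x = rng.multivariate_normal(mean_response(theta_true), covariance(theta_true))

# Evaluate the log-posterior on a grid and take the maximizer.
grid = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
logp = np.array([log_posterior(x, t) for t in grid])
theta_map = grid[np.argmax(logp)]

# Circular distance between the estimate and the true angle.
error = np.abs(np.angle(np.exp(1j * (theta_map - theta_true))))
print(f"true = {theta_true:.3f}, estimate = {theta_map:.3f}")
```

A grid search is only practical when $M$ is small; it is used here because it makes both terms of the log-posterior, the log-determinant and the quadratic form, explicit.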

Consider a first-order expansion of this around a particular position, $\theta = \theta_0 + \delta$:

\begin{equation}\begin{aligned}
\theta &= \theta_0 + \delta
\\
\mu(\theta) &= \mu(\theta_0) + \left<\delta, \partial_\theta \mu(\theta_0)\right>  + \mathcal O(\delta ^2)
\\
\Sigma(\theta) &= \Sigma(\theta_0) + \left<\delta, \partial_\theta\Sigma(\theta_0)\right>  + \mathcal O(\delta ^2)
\end{aligned}\end{equation}


The log-posterior evaluated at $\theta_0 + \delta$ is:

\begin{equation}\begin{aligned}
\ln\Pr(\theta_0 + \delta | \mathbf x )
&=
- \frac 1 2  \ln|\Sigma_{\theta_0+\delta}|  - \frac 1 2 (\mathbf x - \mu_{\theta_0+\delta} )^\top \Sigma_{\theta_0+\delta}^{-1} (\mathbf x - \mu_{\theta_0+\delta})
+ \text{constant.}
\end{aligned}\end{equation}

Equivalently, write the first-order changes in the mean and covariance around $\theta_0$ as $\Delta\mu$ and $\Delta\Sigma$:

\begin{equation}\begin{aligned}
\delta &= \theta - \theta_0
\\
\Delta \mu(\delta) &= \left<\delta, \partial_\theta \mu(\theta_0)\right>  + \mathcal O(\delta ^2)
\\
\Delta \Sigma(\delta) &= \left<\delta, \partial_\theta\Sigma(\theta_0)\right>  + \mathcal O(\delta ^2)
\end{aligned}\end{equation}
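
One consequence of this expansion can be sketched numerically. If we additionally assume that $\Delta\Sigma$ is negligible, so that $\Sigma \approx \Sigma(\theta_0)$ near $\theta_0$, the Gaussian log-posterior is quadratic in $\delta$ and its maximizer is the weighted least-squares estimate $\hat\delta = (J^\top \Sigma_0^{-1} J)^{-1} J^\top \Sigma_0^{-1} [\mathbf x - \mu(\theta_0)]$, where $J = \partial_\theta \mu(\theta_0)$ is written as an $N\times M$ matrix. The tuning curves, the isotropic `Sigma0`, and the finite-difference Jacobian below are illustrative assumptions.

```python
import numpy as np

N = 30
preferred = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

def mu(theta):
    """Hypothetical mean tuning curves (von Mises bumps); returns shape (N,)."""
    return np.exp(2.0 * (np.cos(theta - preferred) - 1.0))

theta0, delta_true = 1.0, 0.01
Sigma0 = 0.01 * np.eye(N)            # assumed constant near theta0

# Central finite difference for J = d mu / d theta at theta0, shape (N, M) with M = 1.
h = 1e-5
J = ((mu(theta0 + h) - mu(theta0 - h)) / (2.0 * h))[:, None]

# Noise-free response at theta0 + delta, so any error comes from the O(delta^2) terms.
x = mu(theta0 + delta_true)

# delta_hat = (J^T Sigma0^-1 J)^-1 J^T Sigma0^-1 (x - mu(theta0))
Sinv_J = np.linalg.solve(Sigma0, J)  # Sigma0^-1 J
delta_hat = np.linalg.solve(J.T @ Sinv_J, Sinv_J.T @ (x - mu(theta0)))
print(f"true delta = {delta_true}, linearized estimate = {delta_hat[0]:.6f}")
```

The estimate is linear in $\mathbf x$, which connects the stochastic case back to the locally linear decoders of the deterministic case.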


The covariance is the expected outer product of deviations from the mean, where the outer angle brackets denote an average over noise. Expanding it to first order in $\delta$:

\begin{equation}\begin{aligned}
\Sigma(\theta) &= 
\left<
[\mathbf x(\theta) - \mu(\theta)]
[\mathbf x(\theta) - \mu(\theta)]^\top
\right>
\\
\Sigma(\theta_0+\delta) &= 
\left<
[\mathbf x(\theta_0+\delta) - \mu(\theta_0+\delta)]
[\mathbf x(\theta_0+\delta) - \mu(\theta_0+\delta)]^\top
\right>
\\
&= 
\Sigma(\theta_0)+A + A^\top+ \mathcal O(\delta^2),
\\\\
\text{where }
A &=
\left<
[\mathbf x(\theta_0) - \mu(\theta_0)]
[
\left<\delta,\partial_\theta \mathbf x(\theta_0)\right>
-
\left<\delta,\partial_\theta \mu(\theta_0)\right>
]^\top\right>
\\
&=
\left<
\mathbf x(\theta_0)
\left<\delta,\partial_\theta \mathbf x(\theta_0)\right>^\top
\right>
-
\left<
\mu(\theta_0)
\left<\delta,\partial_\theta \mathbf x(\theta_0)\right>^\top
\right>
-
\left<
\mathbf x(\theta_0)
\left<\delta,\partial_\theta \mu(\theta_0)\right>^\top
\right>
+
\left<
\mu(\theta_0)
\left<\delta,\partial_\theta \mu(\theta_0)\right>^\top
\right>
\end{aligned}\end{equation}
