# Generalized Linear Models

Now that we have a better sense for neural spike trains, let's build probabilistic models that predict neural responses to sensory stimuli or other covariates. These are called **encoding models**, and ideally, they will recapitulate the summary statistics of interest.

## Linear Nonlinear Poisson (LNP) models

First, consider a single neuron. Let $y_{t} \in \mathbb{N}_0$ denote the number of spikes it fires in the $t$-th time bin. (As before, assume time bins are length $\Delta$, typically 5-100 ms.) Let $\mathbf{x}_t$ denote the covariates at time $t$. For example, the covariates may be features of a sensory stimulus at time bin $t$. 

A common modeling assumption in neuroscience is that neural spike counts are **conditionally Poisson**,

$$
y_{t} \sim \mathrm{Po}(\lambda(\mathbf{x}_{1:t}) \cdot \Delta),
$$

where $\mathbf{x}_{1:t} = (\mathbf{x}_1, \ldots, \mathbf{x}_t)$ is the stimulus up to and including time $t$, and where $\lambda(\mathbf{x}_{1:t})$ is a conditional **firing rate** that depends on the stimuli.

As written above, the firing rate $\lambda$ looks like a rather complex function: it takes in an arbitrarily long stimulus history and outputs a non-negative scalar. We will make a few simplifying assumptions in order to construct our first model.

1. Assume that $\lambda$ only depends on a finite set of **features** of the stimulus history, $\boldsymbol{\phi}_t = (\phi_1(\mathbf{x}_{1:t}), \ldots, \phi_{D}(\mathbf{x}_{1:t}))^\top \in \mathbb{R}^D$. For example, the features may be the most recent $D$ frames of the stimulus, corresponding to $\phi_d(\mathbf{x}_{1:t}) = \mathbf{x}_{t-d+1}$ for $d = 1, \ldots, D$.

2. Assume that $\lambda$ only depends on **linear projections** of the features, $\mathbf{w}^\top \boldsymbol{\phi}_t \in \mathbb{R}$, for some weights $\mathbf{w} \in \mathbb{R}^D$. We will call $\mathbf{w}^\top \boldsymbol{\phi}_t$ the **activation** at time $t$.

3. Finally, assume that $\lambda$ maps the activation through a **rectifying nonlinearity**, $f: \mathbb{R} \to \mathbb{R}_+$, to obtain a non-negative firing rate.

Altogether, these assumptions imply a **linear nonlinear Poisson (LNP)** model,

$$
y_t \sim \mathrm{Po}(f(\mathbf{w}^\top \boldsymbol{\phi}_t) \cdot \Delta).
$$

Typical choices of rectifying nonlinearity are the exponential function, $f(a) = e^a$, and the softplus function, $f(a) = \log (1+e^a)$. 
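
To make the generative process concrete, here is a minimal NumPy sketch that samples spike counts from an LNP model. The helper names (`softplus`, `simulate_lnp`), the scalar white-noise stimulus, the weights, and the bin size are all illustrative assumptions rather than part of the model definition above.

```python
import numpy as np

def softplus(a):
    # Numerically stable log(1 + exp(a))
    return np.logaddexp(0.0, a)

def simulate_lnp(w, phi, dt, f=np.exp, rng=None):
    """Sample y_t ~ Po(f(w^T phi_t) * dt) for each row phi_t of phi."""
    rng = np.random.default_rng() if rng is None else rng
    activation = phi @ w              # linear stage, shape (T,)
    rate = f(activation)              # nonlinear stage: non-negative firing rate
    counts = rng.poisson(rate * dt)   # Poisson stage: spike counts per bin
    return counts, rate

rng = np.random.default_rng(0)
T, D, dt = 1000, 5, 0.025             # assumed: 1000 bins of 25 ms, 5 features
x = rng.normal(size=T + D - 1)        # hypothetical scalar white-noise stimulus

# Row t of phi holds the D most recent stimulus frames (current frame first)
phi = np.stack([x[D - 1 - d : D - 1 - d + T] for d in range(D)], axis=1)

w = np.array([1.0, 0.5, 0.0, -0.5, -0.25])   # hypothetical stimulus filter
y, rate = simulate_lnp(w, phi, dt, f=softplus, rng=rng)
```

Swapping `f=softplus` for `f=np.exp` changes only the nonlinear stage; the linear projection and the Poisson sampling step stay the same.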

## Incorporating spike history

The model above treats the spike counts $y_t$ and $y_{t'}$ as **conditionally independent** given the stimulus. However, we know this assumption is invalid due to neurons' refractory period: after a neuron spikes, it cannot spike for at least a few milliseconds. For small time bins, these dependencies matter. 

A simple way to address this model misspecification is to allow the firing rate to depend on both the stimulus and the **spike history**, $\lambda(\mathbf{x}_{1:t}, \mathbf{y}_{1:t-1})$. We can do so by including the spike history in the features, 

$$
\boldsymbol{\phi}_t = \left(\phi_1(\mathbf{x}_{1:t}, \mathbf{y}_{1:t-1}), \ldots, \phi_D(\mathbf{x}_{1:t}, \mathbf{y}_{1:t-1}) \right)^\top.
$$

This way, some of our features can capture the stimulus, and others can capture recent spike history. For example, one of our features might be $\phi_d(\mathbf{x}_{1:t}, \mathbf{y}_{1:t-1}) = y_{t-d}$. In the language of statistical time series models, these spike history terms make this an **autoregressive (AR) model**.
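
As a rough illustration, the spike history features $y_{t-1}, \ldots, y_{t-D}$ can be assembled into a matrix with one row per time bin. The function name `lagged_features` and the choice to zero-pad the bins before the start of the recording are assumptions made for this sketch.

```python
import numpy as np

def lagged_features(y, num_lags):
    """Return a (T, num_lags) array whose column d-1 holds y_{t-d}.

    Bins before the start of the recording are filled with zeros (an assumption).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    feats = np.zeros((T, num_lags))
    for d in range(1, num_lags + 1):
        n = max(T - d, 0)
        feats[d:, d - 1] = y[:n]   # shift the spike train back by d bins
    return feats

y = np.array([0, 1, 0, 0, 2, 0])
history = lagged_features(y, num_lags=2)
# Row t of `history` is (y_{t-1}, y_{t-2}); e.g. row 5 is (2, 0).
```

These columns can be concatenated with the stimulus features to form $\boldsymbol{\phi}_t$, so the same linear projection handles both kinds of input.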

:::{admonition} Exercise
:class: tip
Suppose our features were $\phi_d(\mathbf{x}_{1:t}, \mathbf{y}_{1:t-1}) = y_{t-d}$ for $d=1,\ldots,D$. If neurons have a refractory period that prevents firing in two adjacent time bins, what would you expect the best-fitting weights $\mathbf{w} \in \mathbb{R}^D$ to look like?
:::

## Multi-neuronal spike train models

So far, we've considered models for a single neuron. In practice, we will often record from many neurons simultaneously, and we would like our models to capture correlations between neurons. 

Let $\mathbf{y}_t = (y_{t,1}, \ldots, y_{t,N})^\top \in \mathbb{N}_0^N$ denote the vector of spike counts from $N$ neurons in time bin $t$. We can generalize the LNP model above as,

$$
y_{t,n} \sim \mathrm{Po}(f(\mathbf{w}_n^\top \boldsymbol{\phi}_t) \cdot \Delta),
$$

where the weights $\mathbf{w}_n \in \mathbb{R}^D$ are specific to neuron $n$, and where $\boldsymbol{\phi}_t \in \mathbb{R}^D$ now includes features of the stimulus as well as the spike history of _all neurons_.

For example, we might have,

$$
\boldsymbol{\phi}_t = (\mathbf{x}_t,\ldots,\mathbf{x}_{t-L}, y_{t-1,1}, \ldots, y_{t-L,1}, \ldots, y_{t-1,N}, \ldots, y_{t-L,N}, 1),
$$

where $L$ is the maximum lag of stimulus and spike history to be considered. The final 1 in $\boldsymbol{\phi}_t$ is a **bias** term that allows the model to learn a baseline firing rate.

The entries of $\mathbf{w}_n$ associated with the features $(y_{t-1,m}, \ldots, y_{t-L,m})^\top$ can be thought of as **coupling filters**, which model how spikes on neuron $m$ influence the future firing rate of neuron $n$. 
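
The feature layout above can be sketched in NumPy as follows. This assumes a scalar stimulus, zero-padding before the first bin, an exponential nonlinearity, and randomly drawn weights; the names `build_features`, `W`, and `coupling` are hypothetical. The sketch evaluates firing rates given observed spike history rather than simulating the population forward in time.

```python
import numpy as np

def build_features(x, Y, L):
    """Stack stimulus lags 0..L, spike lags 1..L for each neuron, and a bias.

    x: (T,) scalar stimulus; Y: (T, N) spike counts.
    Returns an array of shape (T, (L + 1) + N * L + 1).
    """
    T, N = Y.shape

    def lag(v, d):
        out = np.zeros(T)
        out[d:] = v[:T - d]           # zero-pad bins before the recording starts
        return out

    stim = [lag(x, d) for d in range(0, L + 1)]
    hist = [lag(Y[:, m], d) for m in range(N) for d in range(1, L + 1)]
    return np.stack(stim + hist + [np.ones(T)], axis=1)

rng = np.random.default_rng(0)
T, N, L = 200, 3, 4
x = rng.normal(size=T)                          # hypothetical stimulus
Y = rng.poisson(0.2, size=(T, N))               # hypothetical spike counts
Phi = build_features(x, Y, L)                   # (T, D)

W = 0.1 * rng.normal(size=(N, Phi.shape[1]))    # row n is w_n (made-up values)
rates = np.exp(Phi @ W.T)                       # (T, N) firing rates

# coupling[n, m, l-1] is the weight on y_{t-l,m} in neuron n's activation
coupling = W[:, L + 1 : L + 1 + N * L].reshape(N, N, L)
```

Here `coupling[n, m]` is the length-$L$ coupling filter from neuron $m$ to neuron $n$, and `coupling[n, n]` is neuron $n$'s own spike history filter.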

## Basis function encodings

The model above has $\mathcal{O}(N^2 L)$ weights for the coupling filters. For small bin sizes, $L$ may need to include dozens of past time bins to capture all the pairwise interactions. However, these coupling filters are often approximately smooth functions of the time lag. One way to cut down on parameters and capture this smoothness is to use a **basis function representation**. For example, one of the features can be,

$$
\phi_{m,b}(\mathbf{x}_{1:t}, \mathbf{y}_{1:t-1}) = \sum_{\ell=1}^L y_{t-\ell,m} e^{-\frac{1}{2 \sigma^2}(\ell - \mu_b)^2}.
$$

This is a **radial basis function** encoding of the spike history of neuron $m$. It is a weighted sum of past spiking, where the weights are a squared exponential (aka Gaussian) kernel centered on delay $\mu_b$. We can use $B < L$ basis functions to summarize the spike history over the last $L$ time bins.
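
Here is a minimal sketch of these basis features for one neuron, computed directly from the sum in the definition. The function names, the basis centers, and the width $\sigma$ are illustrative assumptions, and bins before the start of the recording are treated as zeros.

```python
import numpy as np

def rbf_basis(L, centers, sigma):
    """Return an (L, B) matrix with entries exp(-(l - mu_b)^2 / (2 sigma^2)), l = 1..L."""
    lags = np.arange(1, L + 1)[:, None]
    centers = np.asarray(centers, dtype=float)[None, :]
    return np.exp(-0.5 * (lags - centers) ** 2 / sigma ** 2)

def basis_features(y_m, basis):
    """Return a (T, B) array whose column b is sum_l y_{t-l,m} * basis[l-1, b]."""
    L, B = basis.shape
    T = len(y_m)
    lagged = np.zeros((T, L))
    for l in range(1, L + 1):
        lagged[l:, l - 1] = y_m[:T - l]   # column l-1 holds y_{t-l,m}
    return lagged @ basis

rng = np.random.default_rng(0)
L = 10
basis = rbf_basis(L, centers=[2.0, 5.0, 8.0], sigma=1.5)   # B = 3 < L
y_m = rng.poisson(0.3, size=100)                           # hypothetical spike counts
feats = basis_features(y_m, basis)                         # (100, 3)

# B weights on the basis features imply a length-L filter over lags
weights = np.array([-2.0, 0.5, 0.1])                       # made-up values
effective_filter = basis @ weights                         # (L,)
```

With this encoding, each coupling filter is described by $B$ weights instead of $L$, and the implied filter over lags is a smooth combination of the basis functions.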

:::{admonition} Exercise
:class: tip
Show that the feature above can be written as a convolution of the spike history with a squared exponential kernel.
:::

## Generalized linear models (GLMs)

The model described above, with stimulus features, spike history terms, and basis function encodings, is what neuroscientists often call "the" generalized linear model (GLM), after {cite:t}`pillow2008spatio`. Of course, in statistics we know that this model is just one instance of a broad family of GLMs, which are characterized by linear projections of covariates, nonlinear link functions, and exponential family conditional distributions {cite}`mccullagh2019generalized`. In fact, we have already encountered one GLM in this course: the logistic regression model from [Unit 1](./07_pose_tracking.ipynb).
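
To illustrate the shared structure, the sketch below writes the log-likelihoods of two GLMs side by side: the Poisson model with an exponential nonlinearity, and logistic regression. Both start from the same linear projection $\mathbf{w}^\top \boldsymbol{\phi}_t$ and differ only in the nonlinearity and the conditional distribution. The function names are assumptions made for this example.

```python
import numpy as np
from math import lgamma

def poisson_glm_loglik(w, phi, y, dt):
    """Log-likelihood of counts y under y_t ~ Po(exp(w^T phi_t) * dt)."""
    log_mean = phi @ w + np.log(dt)                      # log of the Poisson mean
    log_fact = np.array([lgamma(v + 1.0) for v in y])    # log(y_t!)
    return np.sum(y * log_mean - np.exp(log_mean) - log_fact)

def bernoulli_glm_loglik(w, phi, y):
    """Log-likelihood of binary y under y_t ~ Bern(sigmoid(w^T phi_t))."""
    a = phi @ w
    # y * log(sigmoid(a)) + (1 - y) * log(1 - sigmoid(a)) = y * a - log(1 + e^a)
    return np.sum(y * a - np.logaddexp(0.0, a))

phi = np.ones((3, 1))              # a bias feature only
w = np.zeros(1)
ll_poisson = poisson_glm_loglik(w, phi, np.array([0, 1, 2]), dt=1.0)
ll_bernoulli = bernoulli_glm_loglik(w, phi, np.array([0, 1, 1]))
```

In both cases the log-likelihood depends on the weights only through the activations `phi @ w`, which is the defining feature of this model family.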