## Lesson 8: Receptive Fields

In Lesson 6, we discussed linear time-invariant systems, which can serve as a simple model of how the rate of a neuron can depend on a time-varying stimulus.

This week, we'll look at some empirical findings about how neurons in the visual system respond to simple stimuli, and then consider how to represent these responses using an LTI model.

Follow along in the notebook during the lecture. There is no assignment associated with this lesson.


In [None]:
# load matplotlib inline mode
%matplotlib inline

# import some useful libraries
import numpy as np                # numerical arrays and linear algebra
import matplotlib as mpl
import matplotlib.pyplot as plt   # plotting

# set some style options
mpl.rcParams['image.origin'] = 'lower'
mpl.rcParams['image.aspect'] = 'auto'
mpl.rcParams['image.cmap'] = 'jet'

## Sensory tuning curves

As we saw with the frog visual system, neurons do not respond to every stimulus in the world. Instead, they are **tuned** to specific properties of the stimulus.

For example, so-called *simple cells* in the primary visual cortex (V1) respond strongly to moving bars, and are tuned to specific orientations. Each cell has a **tuning curve**, $r(s)$, that shows firing rate as a function of orientation:

<img src="images/l8_orientation_tuning.png" alt="V1 tuning" style="width: 400px;"/>

## Receptive fields

In the visual system, one of the most fundamental tuning properties of neurons is the region of space that can excite a neuron. This area is called the **receptive field** (or RF) of the neuron.

Because visual space is two-dimensional, a spatial tuning curve has two independent variables, $r(x,y)$, and looks like a surface:

<img src="images/l8_receptive_field_spatial_only.png" alt="lgn receptive field" style="width: 400px;"/>

The peak in the center of the RF is the region of space where a bright spot of light will cause this neuron in the visual thalamus to fire.

Note that the RF can also take on negative values, which indicate regions of space where stimuli suppress responses. The effects of suppressive regions can only be observed if the spontaneous rate of the neuron is high, or if suppressive stimuli are combined with excitatory ones.
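As a rough illustration of a center-surround RF with excitatory and suppressive regions, here is a minimal sketch that uses a difference of Gaussians. The function name `dog_rf` and all parameter values are assumptions chosen for illustration, not measurements from a real neuron.

```python
import numpy as np

def dog_rf(size=21, sigma_c=1.5, sigma_s=4.0, k_s=0.9):
    """Difference-of-Gaussians RF on a size x size pixel grid (illustrative values)."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    # narrow excitatory center and broad suppressive surround
    center = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return center - k_s * surround

rf = dog_rf()
mid = rf.shape[0] // 2

# a small bright spot in the center versus light covering the whole RF
spot = np.zeros_like(rf)
spot[mid - 1:mid + 2, mid - 1:mid + 2] = 1.0
full = np.ones_like(rf)

resp_spot = np.sum(rf * spot)   # weighted sum (dot product) of RF and stimulus
resp_full = np.sum(rf * full)
print("center weight:", rf[mid, mid])
print("most negative weight:", rf.min())
print("spot response:", resp_spot, "full-field response:", resp_full)
```

In this sketch the small spot gives a larger linear response than full-field illumination, because the full-field stimulus also covers the suppressive surround.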

## Temporal tuning

Visual neurons also exhibit temporal tuning. That is, they respond best to stimuli that modulate in time. This means that in fact, visual RFs are three-dimensional functions with two spatial and one temporal dimension, i.e. $r(x,y,t)$. The plot below shows one spatial and one temporal dimension:

<img src="images/l8_receptive_field_temporal.png" alt="lgn receptive field" style="width: 400px;"/>

You can see that at short lags, the central excitatory field is surrounded by suppressive areas. Later, the situation is reversed, and the central area is suppressive. Cutting a cross-section through time at the center of the RF gives a function that looks a bit like this:

<img src="images/l8_receptive_field_temporal_only.png" alt="lgn receptive field - time only" style="width: 300px;"/>

What does this mean? One interpretation is that the best stimulus is a small dot of light that turns on for about 100 ms. If the light stays on longer, it will start to suppress the response.

## "Simple" cells

In contrast to the center-surround RFs seen in the visual thalamus, many neurons in the primary visual cortex have more complex, elongated RFs.

<img src="images/l8_simple_rf.png" alt="V1 simple cell RF" style="width: 300px;"/>

This elongation confers **orientation tuning** on these neurons, as illustrated by this simple diagram:

<img src="images/l8_simple_orientation.png" alt="V1 orientation tuning" style="width: 500px;"/>

Like thalamic neurons, simple cells usually have a temporal profile that inverts over time, which makes the neurons prefer *moving* bars.

<img src="images/l8_simple_temporal.png" alt="V1 spectrotemporal" style="width: 450px;"/>

## Modeling visual receptive fields using LTI systems

How could we build a model that responds like a visual neuron?

As we discussed in lesson 6, one of the simplest models we have is the linear time-invariant system. It's almost always best to start simple and then add complexity. As it turns out, this simple model can produce a lot of the behaviors of real neurons in the primary visual cortex and visual thalamus.

### Review of LTI systems

The general idea behind an LTI system is that its output at any given time $t$ is defined by a weighted average of the present and past values of the input.

The weights used to calculate this average are called the kernel, filter, or impulse response function, $h(t)$. Given this function, you can predict the response to any arbitrary input $x(t)$ using convolution.

Convolution is simply a series of weighted averages or dot products. In its discrete form, convolution looks like this:

$$y(t) \approx \sum_i h(t - \tau_i) x(\tau_i)$$

And in its continuous form it looks like this:

$$y(t) = \int_0^\infty h(\tau) x(t - \tau) d\tau$$

Remember that because convolution is commutative, either $h(t)$ or $x(t)$ can be time-shifted.
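To make the discrete sum concrete, here is a small sketch that computes a causal convolution with explicit loops and checks it against `np.convolve`. The helper `convolve_causal` and the toy values of `h` and `x` are made up for this illustration.

```python
import numpy as np

def convolve_causal(h, x):
    """y[t] = sum_k h[k] * x[t - k], treating x as zero before t = 0."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        for k in range(len(h)):
            if t - k >= 0:
                y[t] += h[k] * x[t - k]
    return y

h = np.array([0.0, 1.0, -0.5])
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])

y_loop = convolve_causal(h, x)
# np.convolve returns len(h) + len(x) - 1 samples; keep the first len(x)
y_np = np.convolve(h, x)[:len(x)]
# swapping the arguments gives the same result (commutativity)
y_swapped = np.convolve(x, h)[:len(x)]

print(y_loop)   # [ 0.   1.   1.5 -1.  -1. ]
```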


### Multivariate stimuli, multivariate kernels

We can generalize the one-dimensional LTI kernel $h(t)$ to more complex stimuli that have non-temporal dimensions.

Visual stimuli are extended in two dimensions and change in time, so the stimulus has three dimensions:

<img src="images/l8_natural_images.png" alt="natural images" style="width: 500px;"/>

You can think of the stimulus as a three-dimensional function, $s(x, y, t)$, or as a time-varying vector, $\mathbf{s}(t)$, where $\mathbf{s}$ has components that correspond to each of the pixels in the image.

This means that the impulse response function for a visual neuron also has three dimensions: $h(x, y, t)$

Only the temporal dimension is convolved; the others are simply multiplied in a dot product. To see why this is, think back to our static neuron model:

![linear neuron](images/l6_linear_neuron.png)

The response at any instant in time is just a weighted sum or dot product:

$$y = \sum_i w_i x_i = \mathbf{w} \cdot \mathbf{x}$$

Note that we're using the convention where lower-case symbols in bold denote vectors.

Therefore, the equation to predict the response to a multivariate stimulus looks like this:

$$r(t) = \sum_i \sum_j \sum_k h(x_i, y_j, \tau_k) s(x_i, y_j, t - \tau_k)$$

Note that because the sums are commutative, this is equivalent to convolving each pixel of the RF with the matching pixel in the stimulus and then summing all the convolutions up together.

\begin{align}
r_{i,j}(t) & = \sum_k h(x_i, y_j, \tau_k) s(x_i, y_j, t - \tau_k) \\
r(t) & = \sum_i \sum_j r_{i,j}(t)
\end{align}

Keep this in mind when you implement this in Python. `np.convolve` expects one-dimensional arrays, and other convolution functions may convolve along multiple dimensions, which is not what you want.
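The equivalence between the triple sum and the per-pixel convolutions can be checked numerically. This is a minimal sketch on random arrays; the array sizes and the names `r_pix` and `r_direct` are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, ntau, nt = 3, 4, 5, 30
h = rng.standard_normal((nx, ny, ntau))   # kernel h(x, y, tau)
s = rng.standard_normal((nx, ny, nt))     # stimulus s(x, y, t)

# convolve each pixel in time, then sum across pixels
r_pix = np.zeros(nt)
for i in range(nx):
    for j in range(ny):
        r_pix += np.convolve(h[i, j], s[i, j])[:nt]

# direct triple sum: at each time, a dot product over space for every lag
r_direct = np.zeros(nt)
for t in range(nt):
    for k in range(min(ntau, t + 1)):
        r_direct[t] += np.sum(h[:, :, k] * s[:, :, t - k])

print(np.allclose(r_pix, r_direct))
```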

## Static nonlinearity

One problem with our LTI model is that neurons can't have negative firing rates, and most have a maximum.

However, the output of an LTI system is theoretically unbounded.

One solution is to pass the linear estimate through a nonlinear function that transforms it into a realistic firing rate. The nonlinearity is **static** because its output at each instant depends only on the value of the linear estimate at that same instant, not on its history.

So if $L(t) = \int_0^\infty h(\tau) x(t - \tau) d\tau$ and $F(\cdot)$ is a univariate nonlinear function, then

$$\hat{r}(t) = r_0 + F(L(t))$$

The nonlinearity can be determined empirically by comparing $L(t)$ to $r(t)$, or it can be parameterized using a number of convenient forms.
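A few commonly used parametric forms are sketched below. The function names and parameter values (`theta`, `r_max`, `L_half`, `slope`, `r0`) are illustrative assumptions, not the specific forms plotted in this lesson.

```python
import numpy as np

def rectify(L, theta=0.0):
    """Threshold-linear: zero below theta, linear above it."""
    return np.maximum(L - theta, 0.0)

def sigmoid(L, r_max=100.0, L_half=0.0, slope=1.0):
    """Saturating: approaches r_max for large L and zero for very negative L."""
    return r_max / (1.0 + np.exp(-slope * (L - L_half)))

def exponential(L):
    """Always positive, but unbounded above."""
    return np.exp(L)

L = np.linspace(-5, 5, 11)   # stand-in for the output of the linear stage
r0 = 2.0                     # hypothetical background rate
for F in (rectify, sigmoid, exponential):
    r_hat = r0 + F(L)
    print(F.__name__, r_hat.min(), r_hat.max())
```

All three keep the predicted rate non-negative; only the sigmoid also imposes a maximum.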

<img src="images/l8_nonlinearity.png" alt="static nonlinearities" style="width: 400px;"/>

## Linear-Nonlinear cascade models

The final thing we need to do to our model is to make it spike. As we discussed in Lesson 5, a simple way of modelling spiking is with an inhomogeneous Poisson process, where the number of spikes emitted in a small interval depends only on an underlying rate.

We get this rate from the output of the static nonlinearity.

This introduces the concept of cascade models, which can be conceptualized as a series of steps where one step takes the output of another step as its input. This can be used to make a complete model that goes all the way from stimulus to spiking response.

<img src="images/l8_lnp.png" alt="lnp model" style="width: 600px;"/>

Because the cascade model shown above generates spikes through an inhomogeneous Poisson process, it's called a Linear-Nonlinear-Poisson (LNP) model. LNP models are conceptually simple and serve as starting points for many analyses.
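The three stages can be written as one short function. This is a minimal sketch for a one-dimensional stimulus; the name `simulate_lnp`, the kernel, and the scaled exponential nonlinearity are assumptions made for the example, and the Poisson stage uses the Bernoulli approximation with 1 ms bins.

```python
import numpy as np

def simulate_lnp(stim, kernel, nonlin=np.exp, dt=0.001, rng=None):
    """Linear filter -> static nonlinearity -> Bernoulli approximation to Poisson spiking."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.convolve(stim, kernel)[:len(stim)]          # linear stage
    rate = nonlin(L)                                   # nonlinear stage (spikes/s)
    p_spike = np.clip(rate * dt, 0.0, 1.0)             # probability of a spike per bin
    spikes = rng.uniform(size=len(stim)) < p_spike     # spiking stage
    return rate, np.flatnonzero(spikes)

rng = np.random.default_rng(1)
kt = np.arange(50.0)
kernel = kt / 25.0 * np.exp(-kt / 5.0)                 # alpha-function kernel
stim = rng.standard_normal(5000)
rate, spike_times = simulate_lnp(stim, kernel,
                                 nonlin=lambda L: 20.0 * np.exp(L), rng=rng)
print("number of spikes:", len(spike_times))
```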

### Example

Let's look at an RF with one spatial and one temporal dimension.

In [None]:
filts = np.load('data/filters.npz')
k1 = filts['wb2'] * 25
plt.imshow(k1, aspect='equal')
plt.xlabel("tau (ms)")
plt.ylabel("x (px)")

We're going to simulate the response to some low-frequency Gaussian noise.

In [None]:
from scipy import ndimage
nx, nt = k1.shape
nsamples = nt * 100
np.random.seed(1)
stim_raw = np.random.randn(nx, nsamples)
stim = ndimage.gaussian_filter(stim_raw, sigma=(10, 50))
stim[:,:500] = 0
plt.imshow(stim)

Now let's do the convolution:

In [None]:
# convolve each pixel (row) separately
convrows = [np.convolve(k1[i], stim[i]) for i in range(nx)]
# sum across rows
conv = np.vstack(convrows).sum(0)

fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(9, 4))
axes[0].imshow(stim)
axes[1].plot(conv[:nsamples])

Compare the result of the convolution to the stimulus and the RF. Is it what you would expect?

We now convert the convolution to an estimated rate by passing it through a nonlinear function. We'll use `exp`.

In [None]:
r_est = np.exp(conv[:nsamples])
plt.plot(r_est)

And then generate spikes using our standard Bernoulli approximation:

In [None]:
lmb = r_est * 0.001
n_trials = 10
spikes = []
for i in range(n_trials):
    runif = np.random.uniform(size=nsamples)
    spk = (lmb > runif).nonzero()[0]
    plt.vlines(spk, i - 0.4, i + 0.4)
    spikes.append(spk)

## Estimating RFs

As Chapter 2 in Dayan and Abbott discusses in more detail, receptive fields can predict many of the tuning properties of visual neurons. This suggests two key conclusions:

1. Visual neurons (up to a point) are pretty linear.
2. Linear models have a lot of explanatory power.

As a consequence, we often want to try to estimate the linear RF of sensory neurons. Even if the neuron isn't totally linear, it's a good place to start. But how do we infer the kernel in a system where we can't stimulate with a delta function?

## Reverse correlation

One very simple approach is to average the stimuli that preceded each spike, like so:

<img src="images/l8_revcor_1d.png" alt="1d reverse correlation" style="width: 400px;"/>

This is called the **spike-triggered average** (STA) or the **reverse correlation**. The latter term is a bit of a historical artifact, because in fact it's simply the cross-correlation of the stimulus and the response.

The STA is easily generalized to multivariate stimuli:

<img src="images/l8_revcor_2d.png" alt="2D reverse correlation" style="width: 300px;"/>

## Spike-triggered ensemble

One way of conceptualizing this is by thinking of the set of stimuli presented to the neuron as occupying some high-dimensional space. (The number of *dimensions* here is the number of pixels, not the number of dimensions in the image.)

<img src="images/l8_stensemble.png" alt="spike-triggered ensemble" style="width: 450px;"/>

The STA is simply the centroid of the ensemble of stimuli that triggered a spike:

<img src="images/l8_sta_ensemble.png" alt="spike-triggered average" style="width: 450px;" />

### Example

Let's see how we would do spike-triggered averaging using the response we just simulated.

In [None]:
# here's a single spike-triggered stimulus
t = spikes[0][5]
sts = stim[:,t-nt:t]
plt.imshow(sts, aspect="equal")

In [None]:
# now we'll do this for every spike
n_spikes = 0
sta = np.zeros_like(k1)
for trial in spikes:
    for t in trial:
        # exclude spikes that are too close to the start
        if t - nt + 1 < 0:
            continue
        n_spikes += 1
        # take the nt samples up to and including t, flipped so that column k is lag k (like k1)
        sta += stim[:, t - nt + 1:t + 1][:, ::-1]
# average by dividing the sum by N
sta /= n_spikes
plt.imshow(sta, aspect="equal")
plt.title("STA (n=%d)" % n_spikes)

## Stimulus correlations

What happened?? The STA looks nothing like the kernel.

The spike-triggered average only recovers the kernel of an LTI system if there are no correlations in the stimulus. 

To see why this is, think about how the 1D average would be affected if the stimulus had a lot of low-frequency power, so that the values at any instant in time were correlated with the values around it.

Or think about how the average would be affected if the ensemble of black points in the figure a few slides back were not spherical.
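One way to see the problem is to compare the autocorrelation of white noise with that of a smoothed (low-frequency) stimulus. This is a sketch under assumed settings: the Gaussian smoothing window and the helper `autocorr` are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
white = rng.standard_normal(n)

# smooth the white noise with a Gaussian window (sigma = 5 samples)
win = np.exp(-0.5 * (np.arange(-15, 16) / 5.0) ** 2)
win /= win.sum()
smooth = np.convolve(white, win, mode="same")

def autocorr(x, max_lag):
    """Normalized autocorrelation of x at lags 0..max_lag."""
    x = x - x.mean()
    ac = [np.mean(x[:len(x) - lag] * x[lag:]) for lag in range(max_lag + 1)]
    return np.array(ac) / np.var(x)

ac_white = autocorr(white, 10)
ac_smooth = autocorr(smooth, 10)
print("white, lag 5: ", ac_white[5])
print("smooth, lag 5:", ac_smooth[5])
```

For the smoothed stimulus, whatever value preceded a spike at one lag tends to be accompanied by similar values at neighboring lags, so the average smears the kernel across lags.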

## Kernel estimation as linear regression

The solution to the foregoing problem is to correct for the correlations in the stimulus.

It can be helpful to recast the problem as one of linear regression. Remember the expansion of convolution:

$$r(t) = h_1 s(t) + h_2 s(t-\Delta) + h_3 s(t-2\Delta) + \cdots$$ 

Here we've represented the kernel with subscripts starting with 1 but kept the functional notation for the stimulus.

We can rewrite this sum as a dot product,

$$r(t) = \mathbf{s}(t) \cdot \mathbf{h}$$

where $\mathbf{s}(t)$ refers to the **time-lagged** stimulus. That is, $\mathbf{s}(t) = \{s(t), s(t - \Delta), \ldots, s(t - N\Delta)\}$.

### Example

Let's look at a concrete example in numpy using a really short stimulus and kernel.

In [None]:
# the kernel
h = np.asarray([0, 1, -0.5])
# the full stimulus
stim = np.random.randn(20)
plt.plot(stim)

Let's calculate the response at $t = 2$. We don't have to do anything with `h`. $\mathbf{s}(2)$ is `[stim[0], stim[1], stim[2]]` but **in reverse**. In slice notation, this is `stim[2::-1]` (i.e., start at index 2 and go to the beginning in steps of -1).

In [None]:
print("y_2 =", np.dot(h, stim[2::-1]))

### Vectorizing the model

Instead of writing the response and stimulus as a function of time, let's think of them as a series of observations made at discrete intervals.

$$r(t) = \mathbf{s}(t) \cdot \mathbf{h}$$

becomes

$$r_i = \mathbf{s}_i \cdot \mathbf{h}$$

where $r_i$ is the rate at time $t_i$ and $\mathbf{s}_i$ is the stimulus from $t_{i-N}$ to $t_i$.

We can further simplify our notation by stacking all the observations in a vector $\mathbf{r}$:

$$\mathbf{r} = \mathbf{S} \mathbf{h}$$

The matrix $\mathbf{S}$ has as many rows as there are time points in the observation vector $\mathbf{r}$ and as many columns as there are time points in the kernel $\mathbf{h}$. In each row, it contains the stimulus at the current time and at a set of previous lags. 

Let's see what this would look like for our toy problem:

In [None]:
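print("stimulus:\n", stim)
ntau = len(h)
nt = len(stim)
# one row for each time point that has a full history of ntau samples
S = np.zeros((nt - ntau + 1, ntau))
for i in range(ntau - 1, nt):
    # row holds stim[i], stim[i-1], ..., stim[i-ntau+1] (most recent first)
    S[i - ntau + 1] = stim[i - ntau + 1:i + 1][::-1]
print("stimulus matrix:\n", S)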


Take a close look at the stimulus matrix. Notice how the values shift over by one in each row?

This kind of matrix is called a [Toeplitz matrix](https://en.wikipedia.org/wiki/Toeplitz_matrix), and it's the linear algebra version of convolution.

There's a function in `scipy` that can generate this matrix and even do some padding.

In [None]:
from scipy.linalg import toeplitz
S = toeplitz(stim, np.asarray([stim[0], 0, 0]))
print(S)

With our Toeplitz matrix, convolution is as simple as matrix multiplication:

In [None]:
# using np.convolve (truncated to the length of the stimulus)
y1 = np.convolve(stim, h)[:nt]
# using the Toeplitz matrix
y2 = np.dot(S, h)
# should give the same result, up to floating-point rounding
assert np.allclose(y1, y2)

### Linear regression with time-varying stimuli

What does this have to do with linear regression? The advantage of this notation is that it's easy to see how our data consist of a series of observations made for different values of $\mathbf{s}$. Some of the variance in these observations is explained by variation in the stimulus; the rest is random noise. We represent this noise as an additional variable that's added at each time point. 

$$r_i = \mathbf{s}_i \cdot \mathbf{h} + \varepsilon_i$$

Or equivalently,

$$\mathbf{r} = \mathbf{S} \mathbf{h} + \mathbf{\varepsilon}$$

Given this model, our goal is to estimate $\mathbf{h}$. That is, to find values that maximize the amount of variance explained by the stimulus and minimize the amount of random error ($\mathbf{\varepsilon}$).

If we assume that the noise is normally distributed and independent, then what we need to do is minimize the sum of the squares of the error. This is called **ordinary least squares** (OLS).

Hopefully, you've seen that our model is a pretty vanilla linear model. The OLS solution is well-known:

$$\hat{\mathbf{h}} = (\mathbf{S}^T \mathbf{S})^{-1} \mathbf{S}^T \mathbf{r} = \left(\sum_i \mathbf{s}_i \mathbf{s}_i^T \right)^{-1} \left(\sum_i \mathbf{s}_i r_i \right)$$

- The first term is the autocovariance matrix for the independent variables. 
- The second term is the covariance between the dependent and independent variables. 
- Here, these correspond to the **autocorrelation** of the stimulus and the **cross-correlation** between the stimulus and the response.

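If the stimulus is white noise, the correlation between any two instants in time is zero, so $\mathbf{S}^T\mathbf{S} \approx n\sigma^2\mathbf{I}$, where $\mathbf{I}$ is the identity matrix, $\sigma^2$ is the variance of the stimulus, and $n$ is the number of observations. In that case the OLS estimate is just the cross-correlation scaled by a constant.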
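This can be checked numerically: for white noise, $\mathbf{S}^T\mathbf{S}$ divided by the number of observations $n$ should be close to $\sigma^2\mathbf{I}$. The sketch below builds the lagged stimulus matrix with a hypothetical helper, `lagged_matrix`, and uses made-up values for the variance and the number of lags.

```python
import numpy as np

def lagged_matrix(stim, ntau):
    """Row i holds stim[t], stim[t-1], ..., stim[t-ntau+1] for t = i + ntau - 1."""
    n = len(stim) - ntau + 1
    return np.column_stack([stim[ntau - 1 - j: ntau - 1 - j + n] for j in range(ntau)])

rng = np.random.default_rng(0)
sigma = 2.0
stim = sigma * rng.standard_normal(50000)   # white noise with variance sigma**2

S = lagged_matrix(stim, ntau=5)
n = S.shape[0]
C = S.T @ S / n                             # sample autocovariance matrix
print(np.round(C, 2))                       # roughly 4 on the diagonal, 0 elsewhere
```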

The $^{-1}$ operator stands for **matrix inversion**. This is what "undoes" the effects of the correlations in the stimulus. Matrix inversion is computationally expensive and numerically hairy. There are some tricks for dealing with the latter, but for now let's look at a quick example:

### Example

Let's use a nice alpha function as our kernel.

In [None]:
tau = 5
dt = 1.0
kt = np.arange(0, 75, dt)
k = kt / tau**2 * np.exp(-kt / tau)
nkt = len(k)
plt.plot(kt, k);

And let's consider two stimuli: one that's uncorrelated and one that is.

In [None]:
nt = 10000
t = np.arange(0, nt, 1.0)
stim_u = np.random.randn(nt)
stim_c = ndimage.gaussian_filter(stim_u, 5)
plt.plot(t, stim_u, t, stim_c)
plt.xlim(0, 200);

We'll generate our Toeplitz matrices for both stimuli and use them to simulate a response. Note that we're adding a bit of noise.

In [None]:
# first column: stimulus from index nkt onward; first row: stim[nkt], stim[nkt-1], ..., stim[1]
S_u = toeplitz(stim_u[nkt:], stim_u[nkt:0:-1])
S_c = toeplitz(stim_c[nkt:], stim_c[nkt:0:-1])

nrt = nt - nkt
eps = np.random.randn(nrt) * 0.05
r_u = np.dot(S_u, k) + eps
r_c = np.dot(S_c, k) + eps
plt.plot(r_u)
plt.plot(r_c)
plt.xlim(0, 200);

The STA is just the cross-correlation of the stimulus with the response, which in matrix form is $\mathbf{S}^T \mathbf{r}$:

In [None]:
STA_u = np.dot(S_u.T, r_u)
STA_c = np.dot(S_c.T, r_c)
plt.plot(kt, STA_u / nrt, kt, STA_c / nrt, kt, k, 'k')

Notice how the STA for the uncorrelated stimulus is pretty good, but the STA for the correlated one is badly distorted.

Now let's find the OLS estimate by dividing out the stimulus autocorrelations.

In [None]:
from scipy import linalg

# this works because the covariance matrix is relatively easy to invert
cov_u = np.dot(S_u.T, S_u)
# solving the linear system is equivalent to multiplying by the inverse, but more stable
ols_u = linalg.solve(cov_u, STA_u)

# we have to do a little regularization (adding a constant to the diagonal)
# to avoid numerical issues
cov_c = np.dot(S_c.T, S_c) + 10 * np.eye(nkt)
ols_c = linalg.solve(cov_c, STA_c)

plt.plot(kt, ols_u, kt, ols_c, kt, k, 'k')

Now the estimate from the correlated stimulus is much closer to the true kernel!
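To get a feel for why the correlated case needs regularization, we can compare the condition numbers of the two autocovariance matrices. The sketch below is self-contained and uses only NumPy; the helper `lagged_matrix`, the smoothing window, and the penalty `lam` are assumptions chosen to resemble the example above, not a tuned procedure.

```python
import numpy as np

def lagged_matrix(stim, ntau):
    """Row i holds stim[t], stim[t-1], ..., stim[t-ntau+1] for t = i + ntau - 1."""
    n = len(stim) - ntau + 1
    return np.column_stack([stim[ntau - 1 - j: ntau - 1 - j + n] for j in range(ntau)])

rng = np.random.default_rng(0)
kt = np.arange(75.0)
k = kt / 25.0 * np.exp(-kt / 5.0)            # alpha-function kernel, tau = 5

white = rng.standard_normal(10000)
win = np.exp(-0.5 * (np.arange(-20, 21) / 5.0) ** 2)
smooth = np.convolve(white, win / win.sum(), mode="same")

S_u = lagged_matrix(white, len(k))
S_c = lagged_matrix(smooth, len(k))
# a large condition number means inversion amplifies noise and rounding error
cond_u = np.linalg.cond(S_u.T @ S_u)
cond_c = np.linalg.cond(S_c.T @ S_c)
print("condition numbers:", cond_u, cond_c)

# simulated response to the correlated stimulus, with a bit of noise
r_c = S_c @ k + 0.05 * rng.standard_normal(S_c.shape[0])
lam = 10.0                                   # ridge penalty, picked by hand
ridge = np.linalg.solve(S_c.T @ S_c + lam * np.eye(len(k)), S_c.T @ r_c)
sta = S_c.T @ r_c / S_c.shape[0]

err_ridge = np.linalg.norm(ridge - k)
err_sta = np.linalg.norm(sta - k)
print("error (ridge):", err_ridge, "error (STA):", err_sta)
```

Adding `lam` to the diagonal is a form of ridge regression: it trades a small bias for a much better-conditioned system.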

Let's now consider a case where a simple LTI model doesn't work.

## Complex cells

V1 also contains a number of "complex" cells. Like simple cells, these neurons often have strong orientation tuning, but unlike simple cells, their receptive fields do not have clear "on" and "off" subregions.

Hubel and Wiesel theorized that the tuning properties of simple and complex neurons arise from convergence:

<img src="images/l8_hubel_wiesel.png" alt="Hubel and Wiesel model" style="width: 450px;"/>

## The energy model

One model for complex cells is an extension of the simple LNP model in which the nonlinearity combines inputs from multiple LTI filters:

<img src="images/l8_energy_model.png" alt="LNP energy model" style="width: 400px;"/>

That is,

\begin{align}
L_i(t) & = \sum_j k_i(\tau_j) s(t - \tau_j) \\
r(t) & = r_0 + G(L_1(t), L_2(t), L_3(t), \ldots)
\end{align}

In the energy model, $G$ is simply the sum of the squared outputs of the filters.
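Here is a minimal sketch of an energy model with two filters. The filters are a made-up pair (cosine and sine under the same Gaussian envelope), and the function name `energy_response` is an assumption for this example.

```python
import numpy as np

tau = np.arange(40)
envelope = np.exp(-0.5 * ((tau - 20) / 6.0) ** 2)
k1 = envelope * np.cos(2 * np.pi * tau / 16.0)   # "even" filter
k2 = envelope * np.sin(2 * np.pi * tau / 16.0)   # "odd" filter, 90 degrees out of phase

def energy_response(stim, filters, r0=0.0):
    """r(t) = r0 + sum_i L_i(t)**2, where L_i is the output of the i-th linear filter."""
    L = [np.convolve(stim, k)[:len(stim)] for k in filters]
    return r0 + sum(Li**2 for Li in L)

rng = np.random.default_rng(0)
stim = rng.standard_normal(1000)
r_pos = energy_response(stim, [k1, k2], r0=1.0)
r_neg = energy_response(-stim, [k1, k2], r0=1.0)

# squaring discards the sign: the response to -stim is identical
print(np.allclose(r_pos, r_neg))
```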

## Spike-triggered covariance

In a purely linear time-invariant model, the sum of more than one filter can be described by a single filter, and it's not possible to recover the component filters.

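In the energy model, the squaring also makes the response insensitive to the sign of the stimulus: a stimulus and its inverse drive the cell equally, so the spike-triggered stimuli tend to cancel out in the average. This means that STA or linear regression on complex cells usually just gives you noise.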

However, there is still structure in the spike-triggered ensemble:

<img src="images/l8_ste_variance_schwartz.png" alt="spike-triggered ensemble variance" style="width: 500px;"/>

## Finding the important dimensions

What we need to do is to find the dimensions along which the variance of the STE is greater or less than expected. Just as the STA is a vector in the stimulus space that corresponds to a linear filter, the vectors with unexpected variance also correspond to filters.

One method for recovering the dimensions is to use principal components analysis to find the eigenvectors of the STE covariance matrix. Dimensions that have larger or smaller variance than expected are the excitatory and suppressive filters of the cell.
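The following sketch simulates a toy energy-model neuron and recovers its filter subspace from the covariance of the spike-triggered ensemble. Everything here is an assumption made for illustration: stimuli are drawn as independent white-noise vectors (one row per stimulus), the two filters are an orthonormal cosine/sine pair, and the gain of 0.05 is arbitrary. Because the stimulus is white with unit variance, the expected variance along any dimension is 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 50000
idx = np.arange(d)
k1 = np.cos(2 * np.pi * 2 * idx / d)
k2 = np.sin(2 * np.pi * 2 * idx / d)
k1 /= np.linalg.norm(k1)
k2 /= np.linalg.norm(k2)

X = rng.standard_normal((n, d))                 # white-noise stimulus ensemble
energy = (X @ k1) ** 2 + (X @ k2) ** 2          # energy-model drive
p_spike = np.clip(0.05 * energy, 0.0, 1.0)
spiked = rng.uniform(size=n) < p_spike

ste = X[spiked]                                 # spike-triggered ensemble
sta = ste.mean(axis=0)                          # close to zero for this neuron
stc = np.cov(ste - sta, rowvar=False)           # spike-triggered covariance

# eigh returns eigenvalues in ascending order
evals, evecs = np.linalg.eigh(stc)
top = evecs[:, -2:]                             # two largest-variance dimensions
# fraction of each recovered vector that lies in the span of the true filters
in_span = (top.T @ k1) ** 2 + (top.T @ k2) ** 2
print("largest eigenvalues:", evals[-3:])
print("fraction in filter subspace:", in_span)
```

Note that the recovered eigenvectors only identify the subspace spanned by the filters; they need not match `k1` and `k2` individually.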

<img src="images/l8_stc_pca.png" alt="STC PCA" style="width: 300px;" />

## Estimating the nonlinearity

The nonlinearity can be found empirically using Bayes' rule:

$$p(\mathrm{spike}|L_i) = \frac{p(L_i|\mathrm{spike})\, p(\mathrm{spike})}{p(L_i)}$$

<img src="images/l8_simple_complex_nonlin.png" style="width: 400px;"/>

As predicted by the energy model, the nonlinearities for complex neurons often have a roughly parabolic shape.
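In practice, the distributions in Bayes' rule can be estimated with histograms. The sketch below does this for simulated data with a known sigmoidal nonlinearity; the function `p_true`, the bin edges, and the sample size are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200000
L = rng.standard_normal(N)                      # filter output in each time bin

def p_true(L):
    """Hypothetical ground-truth nonlinearity: spike probability per bin."""
    return 0.1 / (1.0 + np.exp(-2.0 * L))

spikes = rng.uniform(size=N) < p_true(L)

bins = np.linspace(-2, 2, 9)
centers = 0.5 * (bins[:-1] + bins[1:])
hist_all, _ = np.histogram(L, bins)
hist_spk, _ = np.histogram(L[spikes], bins)

p_L = hist_all / N                              # p(L)
p_L_given_spk = hist_spk / spikes.sum()         # p(L | spike)
p_spk = spikes.sum() / N                        # p(spike)
p_spk_given_L = p_L_given_spk * p_spk / p_L     # Bayes' rule

for c, est in zip(centers, p_spk_given_L):
    print("L = %5.2f  estimated %.3f  true %.3f" % (c, est, p_true(c)))
```

With these normalizations the estimate reduces to the ratio of the two histograms, `hist_spk / hist_all`.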