# Evaluating Encoding Models

The goal of today's notebook is for you to examine the goodness of fit of your linear and nonlinear models for the H1 neuron.

As before, the first cell loads the data. There is no need to alter anything here.

In [1]:
# Importing packages to load data
import scipy.io as sio

# Load the file and parse the variables
H1 = sio.loadmat('H1.mat')
rho = H1['rho']
stim = H1['stim']

# Making them one dimensional to do further calculations
rho = rho[:,0]
stim = stim[:,0]

In order to more directly compare the models to the neural response, we want to create a smoothed firing rate. Use convolution along with the given kernel to create a 21-point moving average across the spike train vector, rho.

In [None]:
# Use a 21-point moving window to smooth out the spike train
import numpy as np
import matplotlib.pyplot as plt

# kernel is a 21-point moving window; each weight is 0.05, so the weights sum to 1.05
kernel = np.full(21, 0.05)

# smooth rho with kernel
smoothRho = #FILL ME IN

plt.figure()
plt.subplot(2,1,1)
plt.plot(rho[0:1000])
plt.subplot(2,1,2)
plt.plot(smoothRho[0:1000])
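
If you are unsure how the convolution behaves, here is a minimal sketch on made-up data. The names demo_rho, demo_kernel, and demo_smooth are hypothetical, and the random spike train is only a stand-in for the H1 recording. It assumes you want the output to stay the same length as the input, which is what mode='same' in np.convolve does.

```python
import numpy as np

# Hypothetical stand-in for rho: a random 0/1 spike train (NOT the H1 data)
rng = np.random.default_rng(0)
demo_rho = (rng.random(1000) < 0.1).astype(float)

# 21-point kernel with the same weights as the cell above
demo_kernel = np.full(21, 0.05)

# mode='same' returns the central part of the convolution,
# so the output has the same length as the input
demo_smooth = np.convolve(demo_rho, demo_kernel, mode='same')

print(demo_rho.shape, demo_smooth.shape)
```

Away from the edges, each smoothed value is 0.05 times the number of spikes in the surrounding 21 samples.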

In order to create the linear model, you will need to convolve your spike-triggered average with the stimulus. In the interest of time, you may load in the pre-computed STA provided below.

In [4]:
# load STA and create linear model
STA = np.load('STA.npy')

# Create linear model by convolving stimulus with STA (use mode='same')
linearModel = #fill me in
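
As a rough illustration of this step, the sketch below convolves a made-up stimulus with a made-up filter. Both demo_stim and demo_sta are hypothetical placeholders (the exponential filter is not the real STA), and the only point is to show the shape of the call.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stimulus and filter (NOT the H1 stimulus or the provided STA)
demo_stim = rng.standard_normal(2000)
demo_sta = np.exp(-np.arange(151) / 30.0)

# Convolve the stimulus with the filter; mode='same' keeps the
# prediction the same length as the stimulus
demo_linear = np.convolve(demo_stim, demo_sta, mode='same')

print(demo_linear.shape)
```

Keeping the prediction the same length as the stimulus makes it easy to compare sample by sample with the smoothed firing rate.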

To measure the goodness of fit between the linear model and the smoothed firing rate, we can examine both the Pearson correlation coefficient and the coefficient of determination. Fill in the code below to display these.

In [None]:
print('The correlation is: ',np.corrcoef(XXX, XXX)[X,X])
print('Proportion of explained variance: ',# fill this in)
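
Here is a small sketch of both measures on synthetic data. The arrays demo_target and demo_pred are hypothetical. It relies on the fact that, for a straight-line fit, the coefficient of determination equals the squared Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical target and a noisy prediction of it (NOT the H1 data)
demo_target = rng.standard_normal(500)
demo_pred = 0.6 * demo_target + 0.8 * rng.standard_normal(500)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r = np.corrcoef(demo_pred, demo_target)[0, 1]

# For a straight-line fit, squaring r gives the proportion of explained variance
r_squared = r ** 2

print('The correlation is: ', r)
print('Proportion of explained variance: ', r_squared)
```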

To create the nonlinear model, we need to do two things: z-score the linear model to reduce the range to something sensible, and then pass it through a sigmoid function. Fill in the code below to accomplish these goals.

In [None]:
# z-score the linear model
linearModel = #fill me in

# pass the result through the sigmoid nonlinearity
def sigmoid(x):
    return 1.0 / (1.0+np.exp(-x))

nonlinearModel = sigmoid(#fill me in)
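
The two steps can be sketched on made-up numbers as follows. The names demo_linear, demo_z, demo_nonlinear, and demo_sigmoid are hypothetical, and the z-score here is computed by hand as (x - mean) / standard deviation, which is one common way to do it.

```python
import numpy as np

def demo_sigmoid(x):
    # Same logistic function as in the cell above
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)

# Hypothetical linear-model output with an arbitrary offset and scale
demo_linear = 40.0 + 12.0 * rng.standard_normal(1000)

# z-score: subtract the mean and divide by the standard deviation
demo_z = (demo_linear - demo_linear.mean()) / demo_linear.std()

# Squash the z-scored values into the interval (0, 1)
demo_nonlinear = demo_sigmoid(demo_z)

print(demo_z.mean(), demo_z.std(), demo_nonlinear.min(), demo_nonlinear.max())
```

After z-scoring, the values have mean 0 and standard deviation 1, so most of them fall in the range where the sigmoid is not saturated.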

As before, print out the Pearson correlation coefficient and the coefficient of determination for the nonlinear model.

In [None]:
print('The correlation is: ',np.corrcoef(XXX, XXX)[X,X])
print('Proportion of explained variance: ',# fill this in)

Which model is better? How can you tell?

In [None]:
# Answer:

Let's now check how well each model generalizes. We will fit a model to the first half of the dataset, and use it to predict the values of the smoothed spike train from the second half of the dataset. (We are ignoring habituation, learning, and similar effects.)

Break up the dataset (smoothRho and stim) into two halves. On the first half, create a linear model using the same method as above.

In [1]:
# break up dataset into first and second halves
# (the evaluation cell further down expects the second half of smoothRho to be named smoothRho2)

# create the linear model of the first half
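
One possible way to split two aligned vectors at the midpoint is sketched below on made-up arrays. All demo_ names are hypothetical. The sketch assumes that an odd-length vector should put its extra sample in the second half.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stimulus and smoothed rate of equal length (NOT the H1 data)
demo_stim = rng.standard_normal(1001)
demo_smooth = rng.random(1001)

# Integer midpoint; with an odd length the second half gets the extra sample
half = len(demo_stim) // 2

# Slice both vectors at the same index so they stay aligned
demo_stim1, demo_stim2 = demo_stim[:half], demo_stim[half:]
demo_smooth1, demo_smooth2 = demo_smooth[:half], demo_smooth[half:]

print(len(demo_stim1), len(demo_stim2))
```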


Let's now get the slope and intercept parameters for this model. For now, we'll do it by hand.

Remember that the equation for the regression coefficients is:
$$w = (X^TX)^{-1} X^TY$$

In [None]:
from numpy.linalg import inv

# put smooth rho and linear model into matrix form

# add a column of ones to your "X" variable (as the first column, so that w[0] is the intercept and w[1] is the slope)

# calculate XtX and XtY

# calculate w

print(w)
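
To see the formula at work, here is a sketch on synthetic data with a known intercept (0.3) and slope (2.0). The names demo_x, demo_y, demo_X, demo_Y, and demo_w are hypothetical. It assumes the column of ones comes first, so the first coefficient is the intercept.

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(5)

# Hypothetical predictor and response with known intercept 0.3 and slope 2.0
demo_x = rng.standard_normal(200)
demo_y = 0.3 + 2.0 * demo_x + 0.1 * rng.standard_normal(200)

# Design matrix: a column of ones (intercept) followed by the predictor
demo_X = np.column_stack((np.ones_like(demo_x), demo_x))
demo_Y = demo_y.reshape(-1, 1)

# Normal equations: w = (X^T X)^{-1} X^T Y
demo_XtX = demo_X.T @ demo_X
demo_XtY = demo_X.T @ demo_Y
demo_w = inv(demo_XtX) @ demo_XtY

print(demo_w.ravel())
```

The recovered coefficients should land close to the values used to generate the data, which is a handy sanity check before running the same steps on the real vectors.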

Run the cell below to apply the coefficients that you just calculated to the data from the second half of the experiment. 

In [None]:
xAxis = np.linspace(0, .7, len(smoothRho2))

yHat = w[0] + w[1]*xAxis

print('The correlation is: ',np.corrcoef(smoothRho2, yHat)[0,1])
print('Proportion of explained variance: ', np.corrcoef(smoothRho2, yHat)[0,1]**2)
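
The cell above evaluates the fitted line on evenly spaced values between 0 and 0.7. As a hedged alternative, the sketch below shows what a held-out check could look like if the prediction for the second half comes from the second half of a linear-model output instead. Everything here is synthetic and every demo_ name is hypothetical, so treat it as an illustration of the idea rather than the expected answer.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical linear-model output and a noisy rate that depends on it
demo_pred = rng.standard_normal(400)
demo_rate = 0.2 + 0.5 * demo_pred + 0.5 * rng.standard_normal(400)

half = len(demo_pred) // 2

# Fit intercept and slope on the first half only (ones column first)
demo_X1 = np.column_stack((np.ones(half), demo_pred[:half]))
demo_w = np.linalg.inv(demo_X1.T @ demo_X1) @ (demo_X1.T @ demo_rate[:half])

# Apply the first-half coefficients to the second-half predictor
demo_yhat = demo_w[0] + demo_w[1] * demo_pred[half:]

# Score the prediction against the held-out second half
demo_r = np.corrcoef(demo_rate[half:], demo_yhat)[0, 1]
print('The correlation is: ', demo_r)
print('Proportion of explained variance: ', demo_r ** 2)
```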

So, not that great. But it's very hard to build a model that has strong generalization. On the plus side, you can see that there is plenty of room left to build better models in computational neuroscience!

Please upload this file to Lyceum for grading. Wonderful job!