## <span style="color:red">Neural Decoding</span>


In lab, we will be working with files in the Zhang_neurons folder. This dataset contains recordings from 132 neurons in a monkey's inferior temporal cortex (IT), an area known to be heavily involved in high-level vision and object perception. The recordings were made while the monkey viewed 7 different objects, each presented at one of three screen locations. Each object was presented approximately 20 times at each of the three locations, for a total of about 420 trials per neuron. In each trial, the monkey viewed a fixation dot for 500 ms and then viewed one of the seven objects for another 500 ms. The data were reported in Zhang et al. (2011, *PNAS*).

Note: The paper also includes conditions in which multiple objects were presented simultaneously, but only the single-object condition is included in this dataset.

Zhang, Y., Meyers, E. M., Bichot, N. P., Serre, T., Poggio, T. A., & Desimone, R. (2011). *Object decoding with attention in inferior temporal cortex*. Proceedings of the National Academy of Sciences, 108(21), 8850-8855.

https://doi.org/10.1073/pnas.1100999108

### About the dataset:

The data are in raster format, meaning that each .mat file contains data from one of the 132 neurons. Each of these files contains three variables.

*raster_site_info*: A structure corresponding to the recording parameters of the experiment that <u>can be ignored</u> for the purpose of this problem set.

*raster_labels*: A structure that contains the object being viewed (stimulus_ID), the position of the object (stimulus_position), and the combined object+position (combined_ID_position).

*raster_data*: A matrix where each row corresponds to the data from one trial, and each column corresponds to data from one 1-ms time point. Rows are in trial order: the first trial is in the first row, and the last trial is in the last row.

### Working with the dataset:

Dealing with 132 separate data files can be a challenge. First, import these packages and define some helpful code snippets:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from mat2array import loadmat
import glob
import os

homeDirectory = os.getcwd()
os.chdir(homeDirectory+ '/Zhang_neurons')
neuronList = glob.glob('*.mat')
os.chdir(homeDirectory)

Next, I recommend reading the files in this way:

In [2]:
for i in range(len(neuronList)):  
    path = homeDirectory + '/Zhang_neurons/' + neuronList[i]
    
    neuron = loadmat(path)
    
    raster_data = neuron['raster_data']
    stimID = neuron['raster_labels']['stimulus_ID']
    stimPosition = neuron['raster_labels']['stimulus_position']

In this lab, we will only concern ourselves with the seven object identities. It's helpful to define them in a list.

In [3]:
classes = ['car','couch','face','flower','guitar','hand','kiwi']

To get the indices of the first class (car), you could do the following:

In [4]:
carInd, = np.where(stimID == classes[0])
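
The trailing comma in the line above is easy to miss. The following is a minimal sketch on a made-up label array (`stim_id` is a hypothetical stand-in, not the real `stimID`) showing why it is there: `np.where` returns a tuple with one index array per dimension, and the comma unpacks the single array of a 1-D input.

```python
import numpy as np

# Hypothetical label array standing in for stimID (not the real data)
stim_id = np.array(['car', 'face', 'car', 'kiwi', 'face', 'car'])
classes = ['car', 'couch', 'face', 'flower', 'guitar', 'hand', 'kiwi']

# np.where returns a tuple of index arrays, one per dimension.
# The trailing comma unpacks the only element for a 1-D input.
car_ind, = np.where(stim_id == classes[0])
print(car_ind)  # [0 2 5]
```

Without the comma, `car_ind` would be a one-element tuple; indexing with it usually still works, but its length and shape are not what you would expect from a list of trial indices.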

Not all IT neurons are equally responsive to visual stimuli. Calculate the mean spike count rate for each neuron in the interval from 601-1000 ms and plot a histogram of the population's firing rates. (The stimulus appears at 500 ms; we omit the first 100 ms after stimulus onset, 501-600 ms, because there is little visually driven activity in this area during that period.)
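
As a rough illustration of the arithmetic, here is a sketch on a synthetic raster. It assumes each entry is 0 or 1 (one spike at most per 1-ms column), which is an assumption about the format rather than something stated above, and `fake_raster` is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic raster: 420 trials x 1000 one-ms columns, 1 = spike (assumed format)
fake_raster = (rng.random((420, 1000)) < 0.02).astype(int)

# Python is 0-indexed, so columns 600..999 correspond to 601-1000 ms
window = fake_raster[:, 600:1000]
window_sec = window.shape[1] / 1000.0    # 0.4 s

spike_counts = window.sum(axis=1)        # spikes per trial in the window
mean_rate_hz = spike_counts.mean() / window_sec
print(mean_rate_hz)                      # close to 20 spikes/s for this synthetic neuron
```

Dividing by the window length in seconds is what turns a spike count into a rate; the same slice and division would apply to a real `raster_data` matrix.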

In [None]:
# Initialize data storage for the average firing rate
meanRate = np.zeros()
# Loop through each neuron in neuronList
for i in range():  
    # Define the file path for loading the .mat files
    
    # Use the loadmat function to load the file
    
    # Defining the data stored in the .mat file
    raster_data = neuron['raster_data']
    stimID = neuron['raster_labels']['stimulus_ID']
    stimPosition = neuron['raster_labels']['stimulus_position']
    
    # Calculate the mean spike count rate for each neuron and store in meanRate

# Plotting
plt.hist(meanRate);
plt.title('Histogram of population firing rates')
plt.xlabel('Firing rate')
plt.ylabel('Frequency')

In 2-3 sentences: 

What do you conclude about the visual responsiveness of this population? What might be a negative consequence of decoding using these raw firing rates?

In [32]:
# Answer:

We want to turn the raw rasters into spike-count rate matrices. We have provided you with a function that computes the firing rate matrix for a neuron: a 420-trial by 18-time-bin matrix in which each entry is the neuron's spike count rate (in spikes per second) within one time window. The windows begin every 50 ms (1 ms, 51 ms, 101 ms, etc.) and are 150 ms long, so consecutive windows overlap. Thus, window 1 covers 1-150 ms, window 2 covers 51-200 ms, etc.
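
To make the window layout concrete, this short sketch lists the windows implied by a 50-ms step and a 150-ms width. The variable names are illustrative; only the step, width, and `np.arange(0, 890, 50)` start values come from the provided function.

```python
import numpy as np

bin_starts = np.arange(0, 890, 50)   # 0-indexed start columns: 0, 50, ..., 850
bin_width = 150                      # window length in ms

# Convert to 1-indexed, inclusive millisecond ranges
windows_ms = [(int(s) + 1, int(s) + bin_width) for s in bin_starts]

print(len(windows_ms))                               # 18
print(windows_ms[0], windows_ms[1], windows_ms[-1])  # (1, 150) (51, 200) (851, 1000)
```

The last window ends exactly at 1000 ms, which is why there are 18 windows rather than 20.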

In [None]:
neuron = loadmat('Zhang_neurons/bp1001spk_01A_raster_data.mat')
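def rate(neuron):
    # bins is global because the plotting code later in the lab uses it
    global bins
    bins = np.arange(0,890,50)       # window start columns: 0, 50, ..., 850
    rateMat = np.zeros((420,18))     # trials x time bins
    raster_data = neuron['raster_data']
    for i in range(len(bins)):
        # Spikes in the 150-ms window starting at bins[i]
        rate1 = raster_data[:,bins[i]:bins[i]+150]
        # Spike count per trial in this window
        rate2 = np.sum(rate1,axis=1)
        # Convert the count to spikes per second (150 ms = 0.15 s)
        rate3 = rate2/.15
        rateMat[:,i] = rate3
    
    return rateMat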

# Example using the new function
rateMat = rate(neuron)
print(rateMat.shape)

To fix the problems you outlined above, z-score the firing rates for this neuron. Recall that a z-score is calculated as follows:

$$z = \frac{x-\mu}{\sigma}$$

where $x$ is the raw firing rate, $\mu$ is the cell's mean firing rate, and $\sigma$ is the standard deviation of the cell's firing rate.

Use the zScore function to find the z-score of your firing rate matrix.
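
A quick sanity check for any z-scoring step is that the result has mean 0 and standard deviation 1. The sketch below applies the formula to a synthetic rate matrix (`fake_rates` is invented for illustration), using one global mean and standard deviation for the whole matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 420 x 18 rate matrix: Poisson spike counts converted to spikes/s
fake_rates = rng.poisson(lam=3.0, size=(420, 18)) / 0.15

# z = (x - mu) / sigma, with mu and sigma taken over the whole matrix
z = (fake_rates - fake_rates.mean()) / fake_rates.std()

print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))
```

Because the same mean and standard deviation are used for every entry, the time course within a neuron keeps its shape; only the units change.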

In [None]:
def zScore(rateMat): 
    globalMean = np.mean(rateMat)
    globalSTD = np.std(rateMat)
    z = (rateMat - globalMean)/globalSTD
    return z

# Apply the zScore function to your firing rate matrix


It's now time to do some decoding!

In the first problem, we will examine how much each neuron knows about each of the seven object categories. Fill in the template below in order to do the decoding. Recall that you used a correlation classifier last week on one neuron and found a classification accuracy of ~19%.

NOTE: Because some neurons are missing one trial, we will skip over them for now.

NOTE: To have you work through 10-fold cross validation in a manageable way, I'm having you assign each trial a random fold index. Because the indices are random, the folds will not be exactly equal in size, so the training-testing splits may be slightly unbalanced.
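
The effect of random fold indices can be seen directly. This sketch is illustrative only: it uses NumPy's `default_rng` rather than the `np.random.randint` call in the template, and the permutation-based alternative is a suggestion, not part of the lab.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_folds = 420, 10

# Random fold label (0-9) per trial, analogous to the template's random indices
fold_of_trial = rng.integers(0, n_folds, size=n_trials)
fold_sizes = np.bincount(fold_of_trial, minlength=n_folds)
print(fold_sizes)             # sizes vary around 42

# Boolean masks for one train/test split
test_mask = fold_of_trial == 0
train_mask = ~test_mask

# Alternative: a shuffled assignment with exactly 42 trials per fold
balanced = rng.permutation(n_trials) % n_folds
print(np.bincount(balanced))  # every fold has 42 trials
```

Note that the balanced version equalizes fold sizes only; it does not guarantee that each fold contains the same number of trials per object class.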

In [None]:
# Import the support vector machine classifier
from sklearn import svm
classify = svm.SVC(kernel='linear')

#Initialize data structures
trialInds = np.zeros()
totalAccuracy = np.zeros(125) # the number of neurons with all trials

# Define classes
classes = ['car','couch','face','flower','guitar','hand','kiwi']

# Start a cell count at 0
count = 0

for i in range():
    # Define the file path for loading the .mat files
    
    # Use the loadmat function to load the file
    
    # Defining the data stored in the .mat file
    raster_data = neuron['raster_data']
    stimID = neuron['raster_labels']['stimulus_ID']
    stimPosition = neuron['raster_labels']['stimulus_position']
    
    # Conditional to check if neuron has all 420 trials
    if raster_data.shape[0] == :
        
        # Calculate rate and zScores of the neuron
        rateMat = rate()
        z = zScore()
        
        # Loop to define the trial indices
        for k in range():
            classInds, =  np.where()
            trialInds[classInds] = 
        
        # Creating random indices for 10-fold cross validation
        inds = np.random.randint(0, high=10, size=(420))
        
        # 10-fold cross validation
        for j in range(10):
            # define testing data
            testInds =
            testVec = trialInds[]
            testData = z[]

            trainInds = 
            trainVec = trialInds[]
            trainData = z[]
            
            # Train SVM
            classify.fit(trainData,trainVec)
            
            # Run SVM on testing data to get predictions of image class
            predClass = classify.predict()
            
            # Initialize storage space to calculate accuracy
            accuracy = np.zeros()
            
            # Loop through predClass
            for h in range():
                # Conditional to check accuracy of predClass with respect to testVec
        
        # Calculate and store accuracy
        totalAccuracy[count] = 
        count = count + 1
        
# Calculate average accuracy across all neurons

If the classifier were randomly guessing object categories, how well would it do? How well do these cells do relative to that standard? Is the difference statistically significant?

Hint: From **_scipy.stats.mstats_** import the ***ttest_onesamp()*** function to quantify your results. Is it practically significant? Why or why not?
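
As a sketch of how a one-sample t-test against chance is set up, the example below uses made-up accuracies (`fake_acc` is synthetic and says nothing about the real neurons) and `scipy.stats.ttest_1samp`, the non-masked counterpart of the function named in the hint. It assumes seven equally frequent classes, so chance is 1/7.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
chance = 1 / 7   # seven equally likely classes (assumption)

# Synthetic per-neuron accuracies, NOT results from the Zhang data
fake_acc = rng.normal(loc=0.18, scale=0.04, size=125)

# Test whether the mean accuracy differs from chance
t_stat, p_value = stats.ttest_1samp(fake_acc, popmean=chance)

# Raw effect size: how far above chance the mean accuracy is
effect = fake_acc.mean() - chance
print(t_stat, p_value, effect)
```

Reporting the raw difference from chance alongside the p-value helps separate statistical from practical significance: with 125 neurons, even a small difference can yield a very small p-value.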

In [None]:
from scipy.stats.mstats import ttest_onesamp
ttest_onesamp()
# Answer:

### Now, it's time to use the entire population to decode. 
First, calculate the z-scores for each of the neurons as you did in the first decoding problem, except this time you will need to store them in a 3-dimensional matrix (trials x time bins x neurons). The purpose of this is to decode using the entire population of neurons at each time point, rather than a single neuron.

You will need two loops in total: the first calculates the z-scored matrix for all of the neurons, and the second does the decoding. NOTE: Because objects were presented to each neuron in a random order, we will order each neuron's data by object class before decoding, so that the same row refers to the same object class for every neuron.
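
Ordering trials by class amounts to computing a sort order from the class labels and applying it to the rows of the data. The template does this with `sorted(enumerate(...))`; the sketch below shows an equivalent idea with `np.argsort` on toy arrays (`trial_class` and `data` are hypothetical).

```python
import numpy as np

trial_class = np.array([2, 0, 1, 0, 2, 1])   # hypothetical class index per trial
data = np.arange(12).reshape(6, 2)           # 6 trials x 2 time bins

# A stable sort keeps trials of the same class in their original order
order = np.argsort(trial_class, kind='stable')
sorted_class = trial_class[order]
sorted_data = data[order, :]

print(order)         # [1 3 2 5 0 4]
print(sorted_class)  # [0 0 1 1 2 2]
```

After sorting, row `r` of every neuron's matrix belongs to the same class, which is what lets the neurons be stacked side by side as features.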

Calculate and plot the classifier's accuracy for the population of neurons at each time point.
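
For orientation, here is a self-contained sketch of population decoding for a single time bin on synthetic data. Everything in it is invented for illustration (the class means, noise level, and variable names), and it uses scikit-learn's `KFold` with shuffling instead of the random fold indices in the template, so treat it as one possible setup rather than the expected solution.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n_classes, n_per_class, n_neurons = 7, 60, 125

# 420 labels sorted by class, and a synthetic trials x neurons feature matrix
labels = np.repeat(np.arange(n_classes), n_per_class)
class_means = rng.normal(0, 1, size=(n_classes, n_neurons))
features = class_means[labels] + rng.normal(0, 2, size=(labels.size, n_neurons))

# shuffle=True matters here because the trials are sorted by class
kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_acc = []
for train_idx, test_idx in kf.split(features):
    clf = svm.SVC(kernel='linear')
    clf.fit(features[train_idx], labels[train_idx])
    pred = clf.predict(features[test_idx])
    fold_acc.append(np.mean(pred == labels[test_idx]))

mean_acc = float(np.mean(fold_acc))
print(mean_acc)
```

Comparing predictions to labels with `pred == labels[test_idx]` and taking the mean gives the fraction correct without an explicit loop over trials.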

<img src="image2.jpg" alt="drawing" width="250"/>

In [182]:
# Part 1: create 3D matrix

# Import relevant machine learning tools
from sklearn.model_selection import KFold 
from sklearn import svm
classify = svm.SVC(kernel='linear')

#Initialize data structures
trialInds = np.zeros()
nMat = np.zeros()

# Define classes
classes = ['car','couch','face','flower','guitar','hand','kiwi']

# Start a cell count at 0
count1 = 0
for i in range():
    # Define the file path for loading the .mat files
    
    # Use the loadmat function to load the file
    
    # Defining the data stored in the .mat file
    raster_data = neuron['raster_data']
    stimID = neuron['raster_labels']['stimulus_ID']
    stimPosition = neuron['raster_labels']['stimulus_position']

    
    # Conditional to check if neuron is missing a trial
    if raster_data.shape[0] == :
        # Calculate rate and zScores of the neuron
        rateMat = rate()
        z = zScore()
        
        # Loop to define the trial indices
        for k in range():
            classInds, =  np.where()
            trialInds[classInds] = 
        
        # Sorting the data to be used later
        sortedTrials = sorted(enumerate(trialInds), key=lambda x:x[1])
        sortedTrials = np.asarray(sortedTrials)
        sortedData = sortedTrials[:,0].astype(int)
        img = sortedTrials[:,1].astype(int)
        
        nMat[:,:,count1] = z[sortedData,:]
        count1 = count1 + 1

In [None]:
# Part 2: decoding the population

# Initializing storage space
totalAccuracy = np.zeros()

# Loop through each of the 18 time bins
for t in range():
    
    # Define the 420x125 feature matrix (trials x neurons) for this time bin
    timeMat = nMat[]
    
    # Define random indices for 10-fold cross validation
    randInds = np.random.randint(0, high=10, size=(420))
    
    # 10-fold cross validation
    for j in range():
        # define testing data
        testInds = 
        testVec = img[]
        testData = timeMat[]

        trainInds = 
        trainVec = img[]
        trainData = timeMat[]
        
        # Train SVM
        classify.fit(trainData,trainVec)
        
        # Run SVM on testing data
        predClass = classify.predict()
        
        # Initialize data storage to calculate accuracy
        accuracy = np.zeros()
        
        # Loop through predClass
        for h in range():
            # Conditional to check accuracy of predClass with respect to testVec
    
    # Calculate accuracy for each time bin
    totalAccuracy[t] = 

# Plotting accuracy of population with respect to each time bin
plt.figure()
plt.plot(bins,totalAccuracy)
plt.xlabel("FILL ME IN")
plt.ylabel("FILL ME IN")

If all went according to plan, your decoding graph should look qualitatively similar to the blue curve from the original paper:

<img src="image1.png" alt="drawing" width="250"/>

Note: Zhang et al. used a different type of classifier, so your accuracies will be subtly different.

Compare what you found here to what you found when you looked at each cell individually. Is information about object identity primarily found in individual cells or across the population?

In [318]:
# Answer: