## <span style="color:black">Intro to Neural Decoding</span>

Neural decoding is the study of what information is available in the electrical activity of individual cells or networks of neurons by trying to identify what stimulus or event elicits a particular pattern of neural activity.

It can be used to predict what people were dreaming about, imagining, looking at, or listening to, among many other exciting areas of interest.

In this tutorial, we will go through a few different ways to decode your neural data. This lab will also introduce you to the data that we will be using for our last problem set.

### Part 1: Classifying by computing distance to centroid

Here, we will generate two slightly separated clusters of random data. These will serve as our training set. We will then generate test points drawn from both of our distributions. For each test point, we will compute its distance to the center of each cluster and predict the cluster whose center is nearest. Finally, we will measure the accuracy of these predictions.

Run the cell below. There is no need to change anything.

In [None]:
# Importing helpful packages
import numpy as np
import matplotlib.pyplot as plt

# Set random number seed to make results replicable
np.random.seed(10)

# Define parameters of two data clusters
m1 = np.array([0.05, 0.05])
m2 = np.array([0.95, 0.95])
sigma = np.eye(2)

# Generate 100 data points from each cluster as training data
data1 = np.random.multivariate_normal(m1,sigma,100)
data2 = np.random.multivariate_normal(m2,sigma,100)

# Plot the data: make a scatterplot with data1 as blue circles and data2 as red circles
plt.figure()
plt.scatter(data1[:,0], data1[:,1], c='blue')
plt.scatter(data2[:,0], data2[:,1], c='red')

In a loop of 100 steps, generate a test point from m1 50% of the time and from m2 the other 50% of the time. Using the pdist() function imported in the cell below, compute the distance from your point to each mean, and assign the point to the nearest mean. Record a 1 for each trial on which the correct mean is guessed, and a 0 otherwise. What is your mean accuracy?
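
As a small illustration of how pdist() behaves on a two-row matrix, here is a minimal sketch. The point and center values are hypothetical and chosen only so the distance is easy to verify by hand; they are not part of the exercise.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical values, chosen only to make the arithmetic easy to check
point = np.array([3.0, 4.0])
center = np.array([0.0, 0.0])

# pdist expects an (n_points, n_dims) matrix and returns all pairwise distances
pair = np.vstack([point, center])   # shape (2, 2): row 0 is the point, row 1 the center
dist = pdist(pair)                  # array holding a single Euclidean distance

print(dist.shape)  # (1,)
print(dist[0])     # 5.0
```

Because only two rows are passed in, the result holds exactly one distance, which is why the exercise builds one 2-by-2 matrix per cluster mean.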

In [None]:
# Importing pdist function
from scipy.spatial.distance import pdist

# Initialize data structures
accuracy = np.zeros() # fill in the parentheses according to the instructions
mat1 = np.zeros((2,2))
mat2 = np.zeros((2,2))

# Loop to create and classify points
for i in range():
    # Conditional to randomly pick a point from m1 or m2

    # Define test point 
    myPoint = np.random.multivariate_normal(myMean, sigma, 1)
    
    # calculate the distance to m1 and m2 (row 0: myPoint; row 1: m1 or m2)
    mat1[0,:] = 
    mat1[1,:] = 
    mat2[0,:] = 
    mat2[1,:] = 
    
    dist1 = pdist()
    dist2 = pdist()
    
    # Conditional to assign predicted class to be the class w/ smallest distance

    # Conditional to determine trial accuracy

# Calculate mean accuracy

### Part 2: Correlation Classifier

For multivariate classification, a simple but still powerful algorithm is the correlation classifier. Here, the feature pattern of the test item is correlated with the mean pattern of each class observed in training, and the class that is most correlated with the test item is taken to be the classifier's prediction.
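
To make the idea concrete, here is a minimal sketch of a single prediction. The class means and test pattern are made-up toy values (3 classes, 5 features), not the tutorial's data, and `class_means`, `test_pattern`, and `corrs` are names introduced only for this illustration.

```python
import numpy as np

# Hypothetical class-mean patterns (3 classes x 5 features), not the tutorial's data
class_means = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # class 0: rising
    [5.0, 4.0, 3.0, 2.0, 1.0],   # class 1: falling
    [1.0, 3.0, 5.0, 3.0, 1.0],   # class 2: peaked
])

# Hypothetical test pattern that rises, like class 0
test_pattern = np.array([2.0, 2.5, 3.5, 4.5, 6.0])

# Pearson correlation between the test pattern and each class mean
corrs = np.array([np.corrcoef(test_pattern, m)[0, 1] for m in class_means])

# The predicted class is the one with the highest correlation
predicted = int(np.argmax(corrs))
print(predicted)  # 0
```

Note that the correlation depends only on the shape of the pattern across features, not on its overall magnitude, so a test item can match a class even if its firing rate is uniformly higher or lower.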

For this exercise, we will use a single neuron from the Zhang et al. (2011) PNAS study. Next week, you will be using the whole population of over 100 neurons that were recorded!

The data are stored in the classificationData.mat file. In this structure, neuronData contains the spike count rate of a single neuron in the object-sensitive inferior temporal cortex, in 150 ms bins over the course of each of 420 trials. In this experiment, a monkey viewed one of 7 objects on each trial, and the index of the object shown on each trial is stored in neuronInds. Finally, neuronLabels gives the object names in order. In other words, the second position of neuronLabels gives the object that corresponds to all of the 2's in neuronInds.
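
The relationship between 1-based indices and an ordered label list can be sketched as follows. The object names and index values here are hypothetical stand-ins, and the arrays loaded from the .mat file may have a different shape or dtype than these toy arrays.

```python
import numpy as np

# Hypothetical stand-ins for neuronLabels and neuronInds
labels = np.array(['car', 'couch', 'face'])
inds = np.array([1, 3, 2, 2, 1])   # 1-based object indices, one per trial

# Subtract 1 to convert the 1-based indices into zero-based positions
trial_names = labels[inds - 1]
print(trial_names)  # ['car' 'face' 'couch' 'couch' 'car']

# Find every trial on which the second object was shown
second_object_trials = np.where(inds == 2)[0]
print(second_object_trials)  # [2 3]
```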

Run the code below to load the data and to show you the shape of neuronData and the neuronLabels.

In [None]:
# Importing package to load data
from mat2array import loadmat

# Loading and defining data
classificationData = loadmat('classificationData.mat')
neuronData = classificationData['neuronData']
neuronInds = classificationData['neuronInds']
neuronLabels = classificationData['neuronLabels']

print(neuronData.shape)
print(neuronLabels)

In this exercise, you will loop through each of the 420 trials, leaving that ith trial out in turn and training with the remaining 419 trials. 

Within this loop, you will: 
* calculate the category average firing rate pattern for each of the 7 objects
* compute the correlation between the held-out test pattern and each of the category averages
* select the category with the highest correlation as the predicted category
* check if the predicted category matches the category of the held-out item
* update the accuracy vector with a 1 if the category was correctly chosen, and 0 otherwise
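
The steps above can be sketched on synthetic data as follows. This is one possible arrangement under stated assumptions, not the reference solution: the toy data (3 classes, 10 trials each, 6 features) and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 classes, 10 trials each, 6 features per trial
n_classes, n_per_class, n_features = 3, 10, 6
prototypes = rng.normal(size=(n_classes, n_features))
labels = np.repeat(np.arange(1, n_classes + 1), n_per_class)   # 1-based class labels
data = prototypes[labels - 1] + 0.5 * rng.normal(size=(labels.size, n_features))

correct = np.zeros(labels.size)
for i in range(labels.size):
    test_pattern, true_label = data[i], labels[i]

    # Training set: every trial except the ith
    train_data = np.delete(data, i, axis=0)
    train_labels = np.delete(labels, i, axis=0)

    # Correlate the held-out pattern with each class's mean training pattern
    corrs = np.zeros(n_classes)
    for c in range(1, n_classes + 1):
        class_mean = train_data[train_labels == c].mean(axis=0)
        corrs[c - 1] = np.corrcoef(test_pattern, class_mean)[0, 1]

    # argmax is zero-based, so add 1 to get back to the 1-based label
    predicted = np.argmax(corrs) + 1
    correct[i] = float(predicted == true_label)

print(correct.mean())
```

Recomputing the class means inside the loop matters: the held-out trial must never contribute to the average it is compared against.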

In [None]:
# Initialize data storage for accuracy
accuracy = np.zeros() 

# Loop through each trial
for i in range():
    
    # define the ith point as the test point and store its corresponding class
    testPoint = 
    testPoint = testPoint + (.0000000001 * np.random.rand(18)) # to make correlations nicer for firing rates=0
    trueCategory = 
    
    # creating training data out of the remaining 419 points (no need to change anything)
    copyData = np.delete(neuronData, i, 0)
    copyInds = np.delete(neuronInds, i, 0)  
    
    # create storage for the correlations of the test item with each object
    testCorr = np.zeros()
    
    # Initialize counter to index by image class (goes from 1-7)
    ID = 1
    
    # Loop through each object; calculate the mean firing pattern, and compute correlation to testPoint
    for j in range(len(neuronLabels)):
        # find all of the training trials for the jth object
        jGroup, = np.where()
        
        # use these indices to create a N-trials by 18-point matrix
        jData = copyData[]
        
        # compute the mean over trials to create an average 18-point firing pattern
        jData = np.mean()
        
        # compute the correlation between jData and testPoint and store it in testCorr
        
        # increment ID
        ID += 1
        
    # Choose the category with the highest correlation as the predicted class
    # Hint: remember that predClass gives indices that are zero-indexed
    predClass, = np.where()
    
    # Conditional to check if this is correct and updates accuracy vector accordingly
    
# print the overall results
print(np.mean(accuracy))
    

If you have correctly implemented this procedure, you will get an accuracy of around 18-19%. How does this compare to the accuracy you would expect from random guessing? How might you test whether this accuracy is statistically significantly higher than random guessing?
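
One possible way to approach the second question is to simulate random guessing and see how often it does as well as the classifier. The sketch below assumes the 7 objects are equally likely, so chance is 1/7; `n_correct_observed` is a hypothetical placeholder that you would replace with your own count of correct trials.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 420
chance = 1 / 7                 # assumes 7 equally likely objects
n_correct_observed = 78        # hypothetical placeholder: replace with int(accuracy.sum())

# Simulate many guessers, each correct with probability 1/7 on every trial
n_simulations = 10_000
null_counts = rng.binomial(n_trials, chance, size=n_simulations)

# One-sided p-value: how often does guessing do at least as well as observed?
p_value = (np.sum(null_counts >= n_correct_observed) + 1) / (n_simulations + 1)
print(p_value)
```

Adding 1 to the numerator and denominator keeps the estimated p-value from being exactly zero when no simulated guesser reaches the observed count. A permutation test that shuffles the trial labels and reruns the classifier is another option.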

In [None]:
# Answer: