## <span style="color:red">Intro to Neural Decoding</span>

Neural decoding is the study of what information is available in the electrical activity of individual cells or networks of neurons by trying to identify what stimulus or event elicits a particular pattern of neural activity.

It can be used to predict what people were dreaming about, imagining, looking at, or listening to, among many other exciting areas of interest.

In this tutorial, we will go through a few different ways to decode your neural data. This lab will also introduce you to the data that we will be using for our last problem set.

### Part 1: Classifying by computing distance to centroid

Here, we will generate two slightly separated clusters of random data. This will serve as our training set. We will then generate test points drawn from both of our distributions. We will compute the distance from each test point to the center of both clusters and predict the cluster whose center is closest. We will then test the accuracy of our predictions.

In [1]:
# Importing helpful packages
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Set random number seed to make results replicable
np.random.seed(10)

# Define parameters of two data clusters
m1 = np.array([0.05, 0.05])
m2 = np.array([0.95, 0.95])
sigma = np.eye(2)

# Generate 100 data points from each cluster as training data
data1 = np.random.multivariate_normal(m1,sigma,100)
data2 = np.random.multivariate_normal(m2,sigma,100)

# Plot the data: make a scatterplot with data1 as blue circles and data2 as red circles
plt.figure()
plt.scatter(, c='blue')
plt.scatter(, c='red')


In a loop of 100 steps, generate a test point from m1 50% of the time and from m2 the other 50% of the time. Using the imported **_pdist()_** function, compute the distance from your point to both means, and assign the point to the nearest mean. Record an accuracy of 1 for each trial in which the correct mean is guessed, and a 0 otherwise. What is your mean accuracy?

In [5]:
# Importing pdist function
from scipy.spatial.distance import pdist

# Initiate data structure for accuracy
accuracy = np.zeros()
mat1 = np.zeros()
mat2 = np.zeros()

# Loop to classify which group the test point is in
for i in range():
    # Conditional to randomly pick a point from m1 or m2

    # Define test point 
    myPoint = np.random.multivariate_normal(myMean,sigma,1)
    # calculate the distance to m1 and m2
    mat1[0,:] = 
    mat1[1,:] = 
    mat2[0,:] = 
    mat2[1,:] = 
    
    dist1 = pdist()
    dist2 = pdist()
    # Conditional to assign predicted class to be the class w/ smallest distance

    # Conditional to determine error

# Calculate accuracy

0.78
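One possible completion of the loop above is sketched here. It is not the official solution: it assumes Euclidean distance and uses `np.linalg.norm` in place of `pdist()` (for a single pair of points the two give the same value), and the helper name `nearest_centroid` is hypothetical.

```python
import numpy as np

np.random.seed(10)  # fixed seed for replicability

m1 = np.array([0.05, 0.05])
m2 = np.array([0.95, 0.95])
sigma = np.eye(2)
centroids = np.vstack([m1, m2])


def nearest_centroid(point, centroids):
    """Return the 0-based index of the centroid closest to `point` (hypothetical helper)."""
    dists = np.linalg.norm(centroids - point, axis=1)
    return int(np.argmin(dists))


n_trials = 100
accuracy = np.zeros(n_trials)

for i in range(n_trials):
    # Pick the true class with equal probability: 0 -> m1, 1 -> m2
    true_class = np.random.randint(2)
    my_point = np.random.multivariate_normal(centroids[true_class], sigma)
    # Score 1 if the nearest mean is the mean the point was drawn from
    accuracy[i] = float(nearest_centroid(my_point, centroids) == true_class)

mean_accuracy = accuracy.mean()
print(mean_accuracy)
```

The exact value printed depends on the order of the random draws, so it need not match the 0.78 shown above.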


### Part 2: k-nearest neighbors

If you correctly implemented the previous strategy, you should have obtained an accuracy of around 70-80%. This is above the level of random guessing (called "chance-level performance", 50% here), but it's not especially great either.

Perhaps we can do better if we base our prediction on the training points that are closest to our test point. The number of nearby training points we examine is known as k. Each gets one vote, and our predicted class is the majority class of these votes. For example, if k=5 and 3/5 are in category 1 while 2/5 are in category 2, we will predict that our test point is in category 1. Values of k are typically odd numbers to prevent ties.
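The voting step can be illustrated with a small sketch. The neighbor labels below are made up to match the k=5 example in the text, and `np.unique` is just one way of counting the votes.

```python
import numpy as np

# Hypothetical classes of the k = 5 nearest training points
neighbor_classes = np.array([1, 2, 1, 1, 2])

# Count the votes for each class and pick the class with the most votes
classes, votes = np.unique(neighbor_classes, return_counts=True)
predicted_class = classes[np.argmax(votes)]

print(dict(zip(classes.tolist(), votes.tolist())))  # {1: 3, 2: 2}
print(predicted_class)  # 1
```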

Let's implement a 5-nearest neighbor classifier on our data.

Run the following cell to set up the necessary data structures for the classification.

In [2]:
# number of neighbors
k = 5

# putting both sets together will make things easier later
allData = np.zeros((200,2))
allData[:100,:] = data1
allData[100:,:] = data2

# category labels
Class = np.zeros((200,1))
# broadcasting assigns the label to every row (no numpy.matlib import needed)
Class[:100] = 1
Class[100:] = 2

# Splitting up the data into training and testing sets
trainInds = np.zeros((160))
trainInds[:80] = np.arange(0,80,1)
trainInds[80:] = np.arange(100,180,1)
trainInds = trainInds.astype(int)

testInds = np.zeros((40))
testInds[:20] = np.arange(80,100,1)
testInds[20:] = np.arange(180,200,1)
testInds = testInds.astype(int)

trainData = allData[trainInds,:]
testData = allData[testInds,:]

trainClass = Class[trainInds,:]
testClass = Class[testInds,:]


In an outer loop, step through each of your saved test points. Within an inner loop, compute the distance from your test point to each of the points in the training data, and save each of these distances. Outside of this inner loop, sort the distances in ascending order, and find the classes of the k smallest distances (i.e., the nearest neighbors). Choose the most frequent class to be the predicted class for the point.

In [3]:
k = 5
accuracy = np.zeros(40)

for i in range():
    # sample the ith test point
    
    # Initialize storage space for points and distances
    tempMat = np.zeros()
    dist = np.zeros()
    # compute distances to each point in training data in an inner loop
    for j in range():
        # Store points in tempMat
        
        # Store calculated distances in dist
        dist[j] = pdist()
        
    # sort the distances and keep the indices of the k nearest training points
    sortedInds = np.argsort(dist)
    sortedDist = dist[sortedInds]
    winners = sortedInds[:k]
    
    # Find out how many points are in each class
    
    
    # Conditional to check if predicted class matches actual class

# Calculate accuracy

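A minimal sketch of how the k-nearest-neighbor procedure described above could be completed. It is one possible solution, not the official one: it rebuilds the data and the 80/20 split from the setup cell so that it runs on its own, stores the class labels as a 1-D array rather than a column, uses `np.linalg.norm` for the Euclidean distance instead of `pdist()`, and wraps the inner loop in a hypothetical helper called `knn_predict`.

```python
import numpy as np

np.random.seed(10)


def knn_predict(train_data, train_class, test_point, k=5):
    """Majority vote among the k nearest training points (hypothetical helper)."""
    dist = np.zeros(len(train_data))
    for j in range(len(train_data)):
        # Euclidean distance from the test point to the jth training point
        dist[j] = np.linalg.norm(train_data[j] - test_point)
    # Indices of the k smallest distances, i.e. the nearest neighbors
    winners = np.argsort(dist)[:k]
    classes, votes = np.unique(train_class[winners], return_counts=True)
    return classes[np.argmax(votes)]


# Rebuild the data and the train/test split used in the setup cell
data1 = np.random.multivariate_normal([0.05, 0.05], np.eye(2), 100)
data2 = np.random.multivariate_normal([0.95, 0.95], np.eye(2), 100)
allData = np.vstack([data1, data2])
Class = np.repeat([1, 2], 100)

trainInds = np.r_[0:80, 100:180]
testInds = np.r_[80:100, 180:200]
trainData, testData = allData[trainInds], allData[testInds]
trainClass, testClass = Class[trainInds], Class[testInds]

accuracy = np.zeros(len(testData))
for i in range(len(testData)):
    pred = knn_predict(trainData, trainClass, testData[i], k=5)
    accuracy[i] = float(pred == testClass[i])

print(accuracy.mean())
```

With only 40 test points the accuracy estimate is noisy, so a single run may come out above or below the centroid classifier from Part 1.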

### Part 3: Correlation classifier

For multivariate classification, a simple but still powerful algorithm is the correlation classifier. Here, the feature vector of each test item is correlated with the mean feature vector of each class observed in training, and the class whose mean is most correlated with the test item is taken to be the classifier's prediction.

For this exercise, we will use a single neuron from the Zhang et al. (2011) *PNAS* study. Next week, you will be using the whole population of over 100 neurons that were recorded!

The data are stored in the classificationData.mat file. In this structure, neuronData contains the spike count rate of a single neuron in object-sensitive inferior temporal cortex, in 150 ms bins over the course of each of 420 trials. In this experiment, a monkey was viewing one of 7 objects, and the index of the object shown on each trial is in neuronInds. Finally, neuronLabels gives the object names in order. In other words, the second position of neuronLabels gives the object that corresponds to all of the 2's in neuronInds.

Fill in the code below to create your correlation classifier:

In [4]:
# Importing package to load data
from mat2array import loadmat

# Loading and defining data
classificationData = loadmat('classificationData.mat')
neuronData = classificationData['neuronData']
neuronInds = classificationData['neuronInds']
neuronLabels = classificationData['neuronLabels']

In [6]:
# Initializing data storage for calculating accuracy
accuracy = np.zeros()

for i in range():
    # define the ith point as the test point and corresponding class
    
    # creating training data out of the remaining 419 points
    copyData = np.delete(neuronData, i, 0)
    copyInds = np.delete(neuronInds, i, 0)  
    # create a "template" that shows the average activation pattern for each object
    testCorr = np.zeros()
    # Initialize counter to index by image class.
    ID = 1
    for j in range(len(neuronLabels)):
        # find all of the training trials for the jth object
        jGroup, = np.where()
        jData = copyData[]
        jData = np.mean()
        # Find correlation coefficient between training data and test point
        testCorr[j] = np.corrcoef()[0,1]
        
        ID = ID + 1
    # Choose the category with the highest correlation as the predicted class
    predClass, = np.where()
    # Conditional to check if this is correct and updates accuracy accordingly

# Calculate accuracy

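The leave-one-out procedure above can be sketched on synthetic data. This is an illustration under stated assumptions, not the official solution: the array sizes (60 trials per object, 18 time bins), the noise level, and the random "templates" are all made up so that the example runs without classificationData.mat. Only the variable names and the 7-object, 420-trial layout follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for classificationData.mat (all sizes here are assumptions)
n_classes, n_per_class, n_bins = 7, 60, 18
templates = rng.normal(size=(n_classes, n_bins))  # one "true" pattern per object
neuronInds = np.repeat(np.arange(1, n_classes + 1), n_per_class)  # labels 1..7
noise = 0.5 * rng.normal(size=(len(neuronInds), n_bins))
neuronData = templates[neuronInds - 1] + noise

n_trials = len(neuronInds)
accuracy = np.zeros(n_trials)

for i in range(n_trials):
    # Hold out trial i as the test point
    testPoint, testClass = neuronData[i], neuronInds[i]
    copyData = np.delete(neuronData, i, 0)
    copyInds = np.delete(neuronInds, i, 0)

    testCorr = np.zeros(n_classes)
    for j in range(n_classes):
        # Template: mean pattern over the training trials of object j + 1
        jData = copyData[copyInds == j + 1].mean(axis=0)
        testCorr[j] = np.corrcoef(jData, testPoint)[0, 1]

    # Predicted class is the template with the highest correlation
    predClass = np.argmax(testCorr) + 1
    accuracy[i] = float(predClass == testClass)

print(accuracy.mean())
```

Because the synthetic classes are far better separated than a real single neuron's responses, the accuracy printed here will be much higher than what the recorded data give.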

If you have correctly implemented this procedure, you will get an accuracy of around 18%. How does this compare with the accuracy you would expect from random guessing? How might you test whether it is statistically significantly higher than random guessing?

In [214]:
# Answer:
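One way to approach the significance question is a one-sided binomial test, sketched below. It assumes that trials are independent and that each of the 7 objects was shown equally often, so chance is 1/7. The helper name `binomial_tail` is hypothetical, and the count of correct trials is simply 18% of 420 rounded to the nearest integer. A permutation test that shuffles the labels and reruns the classifier is a reasonable alternative.

```python
from math import comb


def binomial_tail(n_correct, n_trials, p_chance):
    """P(X >= n_correct) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        comb(n_trials, x) * p_chance**x * (1 - p_chance) ** (n_trials - x)
        for x in range(n_correct, n_trials + 1)
    )


n_trials = 420
p_chance = 1 / 7                    # 7 objects, assuming equally many trials each
n_correct = round(0.18 * n_trials)  # about 18% correct, as quoted in the text

# Probability of doing at least this well by guessing alone
p_value = binomial_tail(n_correct, n_trials, p_chance)
print(n_correct, p_value)
```

A small p-value means that an accuracy this high would be unlikely under pure guessing.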