In this notebook we will perform classification analysis on data from the NEPR207 2018 datasets.

Note from Dan:


In each folder there are now several files:

roi_instances.mat
roi_labels.mat

where roi = lV1, rV1

instances is a matrix with dimensions time bin x trial x voxel
cond holds the condition label for each trial

I set it up so we could do more time bins if anybody in the class decides they want that. For now it's just the average response 3-7.5 s after stimulus onset, which should work fine for most groups.

Responses are percentages relative to 1, so you probably want to subtract 1 or z-score before doing classification.

Groups 368/369/370 all have two conditions (attend left/right or stimulus rotated left/right); group 371 has three conditions (task difficulty = 1/2/4).
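
The array layout and the two normalization options mentioned in the note can be sketched on synthetic data. The array sizes and noise level below are made up for illustration and are not taken from the course files. Note that when cross-validating, the mean and standard deviation used for z-scoring should be estimated from the training trials only; this sketch z-scores across all trials just to show the arithmetic.

```python
import numpy

rng = numpy.random.default_rng(0)

# Synthetic stand-in (assumed sizes): 1 time bin x 20 trials x 6 voxels,
# with values scattered around a baseline of 1.
data = 1 + 0.02 * rng.standard_normal((1, 20, 6))
trials = data[0, :, :]  # trial x voxel matrix for the first time bin

# Option 1: subtract 1 so each response is a deviation from baseline.
centered = trials - 1

# Option 2: z-score each voxel (column) across trials.
zscored = (trials - trials.mean(axis=0)) / trials.std(axis=0)

print('centered mean:', centered.mean())
print('z-scored voxel means:', zscored.mean(axis=0).round(6))
print('z-scored voxel SDs:', zscored.std(axis=0).round(6))
```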


In [112]:
import os
import scipy.io
import numpy,pandas
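# import the submodules explicitly; a bare `import sklearn` does not load them
import sklearn.model_selection,sklearn.svm,sklearn.preprocessing
import sklearn.metrics,sklearn.linear_model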

subcodes=['s036820180521','s036920180521','s037020180521','s037120180521']


In [113]:
def get_data(subcode):
    lV1_instances=scipy.io.loadmat(os.path.join('..',subcode,'lV1_instances.mat'))['data'][0,:,:]
    labels=scipy.io.loadmat(os.path.join('..',subcode,'lV1_labels.mat'))['cond'][:,0]
    rV1_instances=scipy.io.loadmat(os.path.join('..',subcode,'rV1_instances.mat'))['data'][0,:,:]

    instances=numpy.hstack((lV1_instances,rV1_instances))
    assert instances.shape[0]==labels.shape[0]
    return(labels,instances)
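
To check the indexing used in get_data without the course files, the following sketch writes synthetic .mat files to a temporary directory and reads them back the same way. The variable names data and cond and the file names match what get_data reads; the array sizes and label values are invented for the example.

```python
import os
import tempfile

import numpy
import scipy.io

rng = numpy.random.default_rng(0)
n_bins, n_trials, n_lv1, n_rv1 = 1, 12, 5, 7  # made-up sizes

with tempfile.TemporaryDirectory() as tmpdir:
    # Write synthetic files using the variable names that get_data expects.
    for roi, n_vox in (('lV1', n_lv1), ('rV1', n_rv1)):
        data = 1 + 0.01 * rng.standard_normal((n_bins, n_trials, n_vox))
        scipy.io.savemat(os.path.join(tmpdir, roi + '_instances.mat'),
                         {'data': data})
    cond = rng.integers(1, 3, size=(n_trials, 1))  # column of labels, 1 or 2
    scipy.io.savemat(os.path.join(tmpdir, 'lV1_labels.mat'), {'cond': cond})

    # Same indexing as get_data: first time bin, first label column.
    lV1 = scipy.io.loadmat(
        os.path.join(tmpdir, 'lV1_instances.mat'))['data'][0, :, :]
    rV1 = scipy.io.loadmat(
        os.path.join(tmpdir, 'rV1_instances.mat'))['data'][0, :, :]
    labels = scipy.io.loadmat(
        os.path.join(tmpdir, 'lV1_labels.mat'))['cond'][:, 0]

# Stack the two ROIs side by side: trials stay as rows, voxels are appended.
instances = numpy.hstack((lV1, rV1))
print(instances.shape, labels.shape)
```

The hstack call keeps one row per trial and concatenates the voxel columns, which is why the trial counts of the two ROIs must agree.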

Build a classifier to predict condition from V1 data

In [149]:
def run_classifier(labels,instances,shuffle=False,
                   cv=None,clf=None):

    if cv is None:
        cv=sklearn.model_selection.KFold(n_splits=4,shuffle=True,random_state=0)

    if clf is None:
        clf=sklearn.svm.LinearSVC()
    predicted_labels=numpy.zeros(labels.shape)
    scaler=sklearn.preprocessing.StandardScaler()
    labels_copy=labels.copy()  # to prevent shuffle from affecting original variable
    if shuffle:
        # permute once, before cross-validation, so all folds share one null labeling
        numpy.random.shuffle(labels_copy)

    for train,test in cv.split(instances):
        train_X,test_X=instances[train,:],instances[test,:]
        train_Y,test_Y=labels_copy[train],labels_copy[test]
        train_X=scaler.fit_transform(train_X)
        test_X=scaler.transform(test_X)
        clf.fit(train_X,train_Y)
        predicted_labels[test]=clf.predict(test_X)
    return(predicted_labels)

def print_metrics(labels,predicted_labels):
    print('Confusion matrix:')
    print(sklearn.metrics.confusion_matrix(labels,predicted_labels))
    print('Accuracy:',sklearn.metrics.accuracy_score(labels,predicted_labels))
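
For comparison, scikit-learn has built-in tools that cover the same two ideas as run_classifier: scaling fitted on the training folds only, and a label-shuffling null. The sketch below uses a Pipeline with cross_val_predict and permutation_test_score on synthetic data; the data, effect size, and number of permutations are assumptions for illustration. One difference to keep in mind is that permutation_test_score scores each permuted run against the permuted labels, whereas the loop in this notebook compares shuffled-model predictions with the true labels.

```python
import numpy
from sklearn.metrics import accuracy_score
from sklearn.model_selection import (KFold, cross_val_predict,
                                     permutation_test_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = numpy.random.default_rng(0)

# Synthetic two-condition data (assumed sizes): 80 trials x 20 voxels,
# with a mean shift added to every voxel for condition 2.
n_trials, n_voxels = 80, 20
y = numpy.repeat([1, 2], n_trials // 2)
X = 1 + rng.standard_normal((n_trials, n_voxels)) + 0.8 * (y == 2)[:, None]

# The pipeline refits the scaler on the training trials of each fold.
pipe = make_pipeline(StandardScaler(), LinearSVC(random_state=0))
cv = KFold(n_splits=4, shuffle=True, random_state=0)

predicted = cross_val_predict(pipe, X, y, cv=cv)
acc = accuracy_score(y, predicted)

# Null distribution from 100 label permutations, with an empirical p-value.
score, perm_scores, pvalue = permutation_test_score(
    pipe, X, y, cv=cv, n_permutations=100, random_state=0,
    scoring='accuracy')

print('Accuracy: %0.3f' % acc)
print('Mean permuted accuracy: %0.3f' % perm_scores.mean())
print('Permutation p-value: %0.4f' % pvalue)
```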


In [150]:
clf=None  # None selects the default LinearSVC
#clf=sklearn.linear_model.LogisticRegressionCV()

for subcode in subcodes:
    print('running for subcode ',subcode)
    labels,instances=get_data(subcode)
    predicted_labels=run_classifier(labels,instances,shuffle=False,clf=clf)
    print_metrics(labels,predicted_labels)
    # run 10 times to get mean of shuffled accuracy
    nruns=10
    shuffle_acc=numpy.zeros(nruns)
    for r in range(nruns):
        predicted_labels=run_classifier(labels,instances,shuffle=True,clf=clf)
        shuffle_acc[r]=sklearn.metrics.accuracy_score(labels,predicted_labels)
    print('Mean shuffled accuracy (%d runs): %0.3f'%(nruns,numpy.mean(shuffle_acc)))
    print('')

running for subcode  s036820180521
Confusion matrix:
[[67  6]
 [ 3 72]]
Accuracy: 0.9391891891891891
Mean shuffled accuracy (10 runs): 0.514

running for subcode  s036920180521
Confusion matrix:
[[37 14]
 [14 43]]
Accuracy: 0.7407407407407407
Mean shuffled accuracy (10 runs): 0.507

running for subcode  s037020180521
Confusion matrix:
[[29 16]
 [12 31]]
Accuracy: 0.6818181818181818
Mean shuffled accuracy (10 runs): 0.518

running for subcode  s037120180521
Confusion matrix:
[[5 5 9]
 [5 3 8]
 [7 7 4]]
Accuracy: 0.22641509433962265
Mean shuffled accuracy (10 runs): 0.330
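
As a rough cross-check on the output above, accuracy can be recomputed from each confusion matrix (correct trials on the diagonal divided by all trials) and compared against chance with a one-sided binomial test. This is only a sketch under two assumptions that the notebook does not verify: chance is taken as 1 / number of conditions, which presumes roughly balanced classes, and trials are treated as independent, which cross-validated predictions satisfy only approximately.

```python
import numpy
from scipy.stats import binom

# Confusion matrices copied from the printed output above.
confusions = {
    's036820180521': [[67, 6], [3, 72]],
    's036920180521': [[37, 14], [14, 43]],
    's037020180521': [[29, 16], [12, 31]],
    's037120180521': [[5, 5, 9], [5, 3, 8], [7, 7, 4]],
}

results = {}
for subcode, cm in confusions.items():
    cm = numpy.array(cm)
    n_correct = int(numpy.trace(cm))  # diagonal entries are correct trials
    n_trials = int(cm.sum())
    chance = 1.0 / cm.shape[0]        # assumed chance level (balanced classes)
    # One-sided p-value: P(X >= n_correct) for X ~ Binomial(n_trials, chance)
    p = binom.sf(n_correct - 1, n_trials, chance)
    results[subcode] = (n_correct / n_trials, chance, p)
    print('%s: accuracy %d/%d = %0.3f, chance %0.3f, p = %0.3g'
          % (subcode, n_correct, n_trials, n_correct / n_trials, chance, p))
```

The recomputed accuracies should match the printed Accuracy lines, and the three-condition subject falls below its assumed chance level of one third.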

