In [35]:
%pylab inline

from xgboost.sklearn import XGBClassifier
from typing import Tuple
import xgboost as xgb
import pandas as pd
import json
import sklearn

dataset = pd.read_csv('data/multisession-eeg.csv')
fromstring = lambda array_str: np.fromstring(array_str, dtype=float, sep=',')
dataset.raw_fft = dataset.raw_fft.apply(fromstring)
# dataset.raw_fft.iloc[0]

Populating the interactive namespace from numpy and matplotlib


`%matplotlib` prevents importing * from pylab and numpy
  "\n`%matplotlib` prevents importing * from pylab and numpy"


# Passthoughts

What if you could simply *think your password*? That's the premise behind *passthoughts*. We'll discuss passthoughts in more depth in lecture 3, but for now, we'll lay this out as a classification problem:

> Given a reading, and a person, is that person who they claim to be?

We'll structure this problem as follows: For each subject, we'll train a classifier. That subject's readings will be positive example, and everyone else's readings will be negative examples.

We can make this a little fancier by having people use specific thoughts (e.g. "focus on your breathing," "sing a song in your head," etc). We'll make sure our methods can handle this case, but for the time being, we'll just use the `"unabeled"` readings - people doing nothing in particular.

We'll use subject `A` as our "target" individual. We will train on this subject for this assignment, and train against the other subjects in the corpus (subjects `B` and `C`).

In [36]:
dataset.head()

Unnamed: 0,time,subject,session,label,raw_fft
0,2017-07-22T20:37:13.267775811Z,A,0,unlabeled,"[10.3113040924, 14.77069664, 12.213514328, 9.7..."
1,2017-07-22T20:37:14.253040444Z,A,0,unlabeled,"[11.2151269913, 14.9568557739, 10.8369417191, ..."
2,2017-07-22T20:37:15.372317746Z,A,0,unlabeled,"[6.34600162506, 12.5924711227, 10.8416910172, ..."
3,2017-07-22T20:37:16.483798739Z,A,0,unlabeled,"[10.0782966614, 16.9934558868, 18.5345039368, ..."
4,2017-07-22T20:37:17.471855277Z,A,0,unlabeled,"[4.42960739136, 9.05199050903, 4.41912555695, ..."


In [37]:
def to_matrix (series): #helper func matrix
    return np.array([ x for x in series ])

def readings_right_subject_right_task (subj, task, session=0): #positive ex, the person we want to authenticate
    return to_matrix(dataset[
        (dataset['subject'] == subj) &
        (dataset['label'] == task) &
        (dataset['session'] == session)
    ].raw_fft)

def readings_wrong_subj_any_task (subj): #negative ex, wrong subj, any task
    return to_matrix(dataset[
        (dataset['subject'] != subj)
    ].raw_fft)


In [38]:
positive = readings_right_subject_right_task('A', 'unlabeled', 0)
negative = readings_wrong_subj_any_task('A')
positive.shape, negative.shape

((40, 516), (1228, 516))

In [39]:
dataset['label'].unique()

array(['unlabeled', 'breathe', 'song', 'song_o', 'sport', 'breathe_o',
       'speech', 'face', 'calibration', 'word_x', 'phrase_x', 'face_x',
       'breatheopen', 'song_x', 'sport_x', 'breatheclosed', 'word_c',
       'phrase_c', 'face_c', 'song_c', 'sport_c'], dtype=object)

## TODO

Notice how we structured our positive and negative examples:

- *Positive examples*: The right person thinking the right task.

- *Negative examples*: The wrong person thinking any task (whether it is right or wrong).

In the context of passthoughts, consider other possibilites for selecting positive and negative features. Here, (1) pick one configuration of positive and negative examples, aside from the ones listed, and (2) discuss their possible consequences (pros/cons). Explain how you might evaluate this selection (with data, with user experiments, etc - your choice).

*Your answer here...* <br/> 
c= correct; i = incorrect <br/>
(1) Positive: Person(c), Task(c); <br/>Negative: Person(c), Task(i). <br/>
(2) pro: Can trains for higher security<br/>
con: Correct person is not getting in the system may lead to frustration

Now, we'll turn these data into our feature/label matrices `X` and `y`.

In [40]:
X = np.concatenate([positive, negative])

In [41]:
y = np.array([ 0 for x in positive] + [ 1 for x in negative])
assert X.shape[0] == y.shape[0]

Note that we are assigning `0` to "positive" examples, and `1` to "negative" examples. That means `0` will mean "ACCEPT" and `1` will mean "REJECT."

## TODO

Now, train and test a classifier! Estimate your classifier's accuracy.

In [None]:
# Your code here....
def fresh_clf () -> XGBClassifier:
    return XGBClassifier(
        # Don't worry about those parameters for now,
        # though feel free to look them up if you're interested.
        objective= 'binary:logistic',
        seed=27)

def xgb_cross_validate (
    X: np.array,
    y: np.array,
    nfold: int=7
) -> Tuple[XGBClassifier, pd.DataFrame]:
    # eval_metrics:
    # http://xgboost.readthedocs.io/en/latest//parameter.html
    metrics = ['error@0.1', 'auc']
#     metrics = [ 'auc' ]
    # we use the @ syntax to override the default of 0.5 as the threshold for 0 / 1 classification
    # the intent here to to minimize FAR at the expense of FRR
    alg = fresh_clf()
    xgtrain = xgb.DMatrix(X,y)
    param = alg.get_xgb_params()
    cvresults = xgb.cv(param,
                      xgtrain,
                      num_boost_round=alg.get_params()['n_estimators'],
                      nfold=nfold,
                      metrics=metrics,
                      early_stopping_rounds=100
                      )
    alg.set_params(n_estimators=cvresults.shape[0])
    alg.fit(X,y,eval_metric=metrics)
    return alg, cvresults

In [None]:
X_train, X_validate, y_train, y_validate = sklearn.model_selection.train_test_split(
    X, y, 
    test_size=0.33, 
    random_state=42)

clf, cvres = xgb_cross_validate(X_train, y_train)

In [None]:
clf.score(X_validate, y_validate)

For authentication, what we want even more than "accuracy" here are two metrics:

- False Acceptance Rate (FAR): The percentage of readings *not* from subject A incorrectly classified "ACCEPT." 
- False Rejection Rate (FRR): The percentage of readings *from* subject A incorrectly classified 'REJECT."

For authentication /security/, we want FAR to be as low as possible (so nobody can break in).
For authentication /usability/, we want FRR to be low (so user's don't get frustrated constantly re-trying their passthought).

In [None]:
def far_frr (classifier, features, labels):
    # predict all the labels
    y_pred = classifier.predict(features)
    false_accepts = 0
    false_rejects = 0
    for predicted, actual in zip(y_pred, labels):
        # if we should have rejected,
        # but in fact accepted,
        if (actual == 1) and (predicted == 0):
            # increment false accepts
            false_accepts += 1
        # if we should have accepted,
        # but in fact rejected,
        if (actual == 0) and (predicted == 1):
            # increment false rejections
            false_rejects += 1
    # calculate proportions for each
    far = false_accepts / len(list(filter(lambda x: x==0, y_pred)))
    frr = false_rejects / len(list(filter(lambda x: x==1, y_pred)))
    return far, frr

In [None]:
far, frr = far_frr(clf, X_validate, y_validate)
f'FAR: {far*100}% - FRR: {frr*100}%'

Now, these results might be good. 

But our classifier's accuracy could be misleading.   

Can you see why? FFR might get worse bc env might be diff.


# Nonstationarity

We are training, and testing, using data recorded over a single session. As we know, EEG changes over time, a property known as *nonstationarity*. Will our great results still hold a few weeks later?

Let's take subject `A`'s data from sessions 1 and 2, which were recorded a few weeks after session 0.

In [None]:
X_subja_sess1 = readings_right_subject_right_task('A', 'unlabeled', 1)
X_subja_sess2 = readings_right_subject_right_task('A', 'unlabeled', 2)
X_subja_later = np.concatenate([X_subja_sess1, X_subja_sess2])
y_subja_later = [ 0 for x in X_subja_later ]

Now, let's try the classifier we trained on the original data, testing it on the later data.


In [None]:
far, frr = far_frr(clf, X_subja_later, y_subja_later)
f'FAR: {far*100}% - FRR: {frr*100}%'

As we will discuss more in lecture 3, this is a problem for us. After all, we can calibrate our target subject, but we then expect them to leave the lab and go use the device later on. If their state changes so much that they can no longer be authenticated, we can't very well claim our system is accurate!

## TODO

The crux of the lab focuses on nonstationarity. At minimum, your mission is to quantify and qualify *what* is changing in EEG signals over time. You may use any tools in answering this question.

You also have your choice of corpus:

- Study subject `A`'s recordings over the three sessions provided here.
- Study one subject's recordings over the course of a year.

You can use both of these corpora, if you would like.

Some questions to spur investigation:

- What features of readings cause a classifier that works on earlier recordings fail on later ones?
- What features remain the same? Are there any?
- What might be the source of these changing features? Changing placement in the EEG device? Changing properties of the brain?


Please note below all work you do, and any notes you make along the way. Ideally, your work should read like a story - words (and questions!) interspersed with code. Good luck, and have fun!

## Exploring Nonstationarity of Subject A

Here, in the positive example, we are looking at the correct person and the correct task. In the negative example, we are looking at the correct person doing the wrong task.

In [None]:
def to_matrix (series): #helper func matrix
    return np.array([ x for x in series ])

def readings_right_subject_right_task (dataset, subj, task, session=0): #positive ex, the person we want to authenticate
    return to_matrix(dataset[
        (dataset['subject'] == subj) &
        (dataset['label'] == task) &
        (dataset['session'] == session)
    ].raw_fft)

def task_wrong_subj_right (dataset, subj, task, session = 0): #negative ex, correct subj, wrong task
    return to_matrix(dataset[
        (dataset['subject'] == subj) &
        (dataset['label'] != task)&
        (dataset['session'] == session)
    ].raw_fft)


Let' look at song data!'

In [None]:
positive = readings_right_subject_right_task(dataset, 'A', 'song', 0)
negative = task_wrong_subj_right(dataset, 'A', 'song_x', 0)
                          
positive.shape, negative.shape

Now, we'll turn these data into our feature/label matrices X and y.

In [None]:
X = np.concatenate([positive, negative])
y = np.array([ 0 for x in positive] + [ 1 for x in negative])
assert X.shape[0] == y.shape[0]

Now, let's train our classifer!

In [None]:
def fresh_clf () -> XGBClassifier:
    return XGBClassifier(
        # Don't worry about those parameters for now,
        # though feel free to look them up if you're interested.
        objective= 'binary:logistic',
        seed=27)

def xgb_cross_validate (
    X: np.array,
    y: np.array,
    nfold: int=7
) -> Tuple[XGBClassifier, pd.DataFrame]:
    # eval_metrics:
    # http://xgboost.readthedocs.io/en/latest//parameter.html
    metrics = ['error@0.1', 'auc']
#     metrics = [ 'auc' ]
    # we use the @ syntax to override the default of 0.5 as the threshold for 0 / 1 classification
    # the intent here to to minimize FAR at the expense of FRR
    alg = fresh_clf()
    xgtrain = xgb.DMatrix(X,y)
    param = alg.get_xgb_params()
    cvresults = xgb.cv(param,
                      xgtrain,
                      num_boost_round=alg.get_params()['n_estimators'],
                      nfold=nfold,
                      metrics=metrics,
                      early_stopping_rounds=100
                      )
    alg.set_params(n_estimators=cvresults.shape[0])
    alg.fit(X,y,eval_metric=metrics)
    return alg, cvresults

In [None]:
X_train, X_validate, y_train, y_validate = sklearn.model_selection.train_test_split(
    X, y, 
    test_size=0.33, 
    random_state=42)

clf, cvres = xgb_cross_validate(X_train, y_train)

clf.score(X_validate, y_validate)

In [None]:
far, frr = far_frr(clf, X_validate, y_validate)
f'FAR: {far*100}% - FRR: {frr*100}%'

It seems strange that the FAR rate is 100% because that would mean the classifier is falsely accepting the subject, when it should be in fact denying the subject. The correct subject is doing the <i>wrong</i> task, so it seems strange that they are let in the system every time. Additionally, the FRR is 12.8%, and I'm quite surprised that it's so low. I assumed that since the correct subject is thinking the wrong task, the FRR would've been higher as the wrong thought shouldn't be letting the subject into the system.