# AMUSE batch experiment
This notebook performs the following steps:
* The auditory AMUSE dataset is loaded for a single subject.
* The calibration data is used to train a supervised LDA classifier with automatic (Ledoit-Wolf) regularisation. This classifier serves as a baseline.
* `_n_iterations` EM updates are performed on the unsupervised, EM-based model. This model learns from the online data without using any labels.
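The implementation of `LDADecoder` is not shown in this notebook. As a hedged illustration of what "LDA with automatic (Ledoit-Wolf) regularisation" means, scikit-learn's `LinearDiscriminantAnalysis` with `solver='lsqr'` and `shrinkage='auto'` shrinks the covariance estimate using the Ledoit-Wolf lemma. Whether `LDADecoder` uses scikit-learn or its own implementation is an assumption not confirmed here, and the data below is synthetic (one target per six stimuli, mimicking six commands).

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, d = 400, 50                    # toy data: n stimulus epochs, d flattened features
y = np.zeros(n, dtype=int)
y[: n // 6] = 1                   # roughly one target per six stimuli
rng.shuffle(y)
X = rng.normal(size=(n, d))
X[y == 1] += 0.5                  # targets differ from non-targets in their mean

# solver='lsqr' with shrinkage='auto' picks the covariance shrinkage via Ledoit-Wolf
lda = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')
lda.fit(X[:200], y[:200])         # "calibration" half
auc = roc_auc_score(y[200:], lda.decision_function(X[200:]))  # "online" half
print('toy shrinkage-LDA AUC: %.2f' % auc)
```

Shrinkage matters most when the number of features is large relative to the number of labelled epochs, which is typical for flattened EEG epochs.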
### Notes
* Make sure that the data has been downloaded and pre-processed before running this notebook.
* Possible subjects are: VPfce, VPkw, VPfaz, VPfcj, VPfcg, VPfar, VPfaw, VPfax, VPfcc, VPfcm, VPfas, VPfch, VPfcd, VPfca, VPfcb, VPfau, VPfci, VPfav, VPfat, VPfcl, VPfck

In [1]:
import numpy as np
import config
from tools.fileio import load
from decoder.erp_decoder import UnsupervisedEM, LDADecoder
from sklearn.metrics import roc_auc_score

subject = 'VPkw'
_n_iterations = 10
_n_commands = 6 # AMUSE uses 6 commands; keep this at 6!

## Load the data

In [2]:
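# calibration (labelled) and online (test) sessions for this subject
data_calib, data_test = load('%s/amuse_%s.pkl'%(config._processed,subject))
# feature dimension: product of all EEG axes after the first two
_n_dim = np.prod(data_test.eeg.shape[2:])
# per-stimulus features and binary target/non-target labels
x_train, y_train = data_calib.get_data_as_xy()
x, y = data_test.get_data_as_xy()
# attended command of every online trial
truth = data_test.target_stimulus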

## Train and evaluate supervised baseline

In [None]:
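# train the supervised baseline on the labelled calibration data
lda_decoder = LDADecoder(_n_commands,x_train,y_train)
# give the decoder the online data it has to classify
lda_decoder.add_data(data_test)
# single-stimulus AUC: classifier output per stimulus vs. the binary labels
lda_auc = roc_auc_score(y, lda_decoder.apply_single_stimulus(x))
# symbol accuracy (%): predicted command vs. attended command per trial
lda_acc = 100.0*np.mean(lda_decoder.predict_all_trials()==truth)
print('LDA Accuracy: %.2f'%lda_acc)
print('LDA single stimulus AUC: %.2f'%lda_auc)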



LDA Accuracy: 96.97
LDA single stimulus AUC: 0.81
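A single-stimulus AUC of 0.81 next to a symbol accuracy of 96.97% is plausibly explained by pooling evidence over many stimulus presentations per trial. The toy sketch below illustrates this; the aggregation rule (sum each command's scores over repetitions and take the largest), the hypothetical `symbol_from_scores` helper, the 15 repetitions and the Gaussian score distribution are all assumptions, not taken from `predict_all_trials` or the AMUSE protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def symbol_from_scores(scores):
    """Hypothetical rule. scores: (trials, repetitions, commands) classifier outputs.
    Sum each command's scores over repetitions and pick the largest."""
    return scores.sum(axis=1).argmax(axis=1)

rng = np.random.default_rng(0)
T, R, S = 200, 15, 6                       # trials, repetitions (assumed), commands
attended = rng.integers(S, size=T)
scores = rng.normal(size=(T, R, S))
scores[np.arange(T), :, attended] += 1.2   # target stimuli score higher on average
labels = np.zeros((T, R, S), dtype=int)
labels[np.arange(T), :, attended] = 1      # binary target/non-target labels

auc = roc_auc_score(labels.ravel(), scores.ravel())
acc = 100.0 * np.mean(symbol_from_scores(scores) == attended)
print('single-stimulus AUC: %.2f, symbol acc: %.2f' % (auc, acc))
```

Individually weak single-stimulus scores still yield a reliable decision once enough repetitions are summed.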


## Train an unsupervised model
This model receives all of the online data at once and performs its updates on it; this is *not* an online experiment.
Note that the model is initialised randomly and therefore does not always converge to a good solution. There are, however, tricks available to mitigate this.
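The `UnsupervisedEM` model itself is not reproduced here. As a hedged illustration of the idea, the sketch below runs EM on a deliberately simplified, swapped-in toy model: the attended command of each trial is latent, and target and non-target epochs are modelled as isotropic Gaussians with a shared variance. The function names `toy_em`, `log_gauss` and `logsumexp_rows`, the data layout `(trials, repetitions, commands, features)` and the synthetic data are all hypothetical. Like the notebook's model it starts from a random initialisation, so it can end up in a poor local optimum even though each EM iteration never decreases the marginal log-likelihood.

```python
import numpy as np

def log_gauss(X, mu, var):
    """Log-density of the isotropic Gaussian N(mu, var * I), over the last axis."""
    d = X.shape[-1]
    sq = np.sum((X - mu) ** 2, axis=-1)
    return -0.5 * d * np.log(2 * np.pi * var) - sq / (2 * var)

def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))

def toy_em(X, n_iter=20, seed=0):
    """Toy EM for X of shape (trials, repetitions, commands, features).

    Epoch (t, r, s) is a target iff s equals the latent attended command of trial t.
    Returns the most probable command per trial and the log-likelihood per iteration.
    """
    rng = np.random.default_rng(seed)
    T, R, S, D = X.shape
    flat = X.reshape(-1, D)
    # random initialisation of the target mean (hence possible bad local optima)
    mu_t = flat.mean(axis=0) + rng.normal(scale=flat.std(), size=D)
    mu_n = flat.mean(axis=0)
    var = flat.var()
    history = []
    for _ in range(n_iter):
        # E-step: log p(trial t | attended command c) for every t and c
        ll_t = log_gauss(X, mu_t, var)  # (T, R, S)
        ll_n = log_gauss(X, mu_n, var)  # (T, R, S)
        log_lik = ll_n.sum(axis=(1, 2))[:, None] + (ll_t - ll_n).sum(axis=1)  # (T, S)
        log_norm = logsumexp_rows(log_lik)
        history.append(np.sum(log_norm - np.log(S)))  # uniform prior over commands
        q = np.exp(log_lik - log_norm[:, None])  # posterior over attended command
        # M-step: epoch (t, r, s) is a target with probability q[t, s]
        w = np.broadcast_to(q[:, None, :], (T, R, S)).reshape(-1)
        mu_t = w @ flat / w.sum()
        mu_n = (1 - w) @ flat / (1 - w).sum()
        sq_t = np.sum((flat - mu_t) ** 2, axis=1)
        sq_n = np.sum((flat - mu_n) ** 2, axis=1)
        var = np.sum(w * sq_t + (1 - w) * sq_n) / flat.size
    return q.argmax(axis=1), history

# synthetic usage: the attended command's epochs carry an added "ERP"
rng = np.random.default_rng(1)
T, R, S, D = 40, 5, 6, 8
attended = rng.integers(S, size=T)
X = rng.normal(size=(T, R, S, D))
X[np.arange(T), :, attended, :] += 1.0
pred, history = toy_em(X, n_iter=30)
print('toy EM symbol acc: %.2f' % (100.0 * np.mean(pred == attended)))
```

Running it with different `seed` values is a quick way to see the sensitivity to the random initialisation described above.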

In [None]:
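# the unsupervised decoder gets only the feature dimension and the number of commands, no labels
em_decoder = UnsupervisedEM(_n_dim,_n_commands)
em_decoder.add_data(data_test)
print('data is put into the decoder...')

for i in range(_n_iterations):
    # one EM update on the unlabelled online data
    em_decoder.update_decoder()
    # evaluate the current model in the same way as the LDA baseline
    pred = em_decoder.predict_all_trials()
    em_acc = 100.0*np.mean(pred==truth)
    em_auc = roc_auc_score(y, em_decoder.apply_single_stimulus(x))
    print("it: %d\nEM:  symbol acc: %.2f, auc: %.2f,\nLDA: symbol acc: %.2f, auc: %.2f"%(i, em_acc, em_auc, lda_acc, lda_auc))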

data is put into the decoder...
it: 0
EM:  symbol acc: 13.64, auc: 0.51,
LDA: symbol acc: 96.97, auc: 0.81
it: 1
EM:  symbol acc: 15.15, auc: 0.51,
LDA: symbol acc: 96.97, auc: 0.81
it: 2
EM:  symbol acc: 18.18, auc: 0.52,
LDA: symbol acc: 96.97, auc: 0.81
it: 3
EM:  symbol acc: 24.24, auc: 0.54,
LDA: symbol acc: 96.97, auc: 0.81
it: 4
EM:  symbol acc: 34.85, auc: 0.58,
LDA: symbol acc: 96.97, auc: 0.81
it: 5
EM:  symbol acc: 48.48, auc: 0.63,
LDA: symbol acc: 96.97, auc: 0.81
it: 6
EM:  symbol acc: 71.21, auc: 0.70,
LDA: symbol acc: 96.97, auc: 0.81
it: 7
EM:  symbol acc: 95.45, auc: 0.77,
LDA: symbol acc: 96.97, auc: 0.81
it: 8
EM:  symbol acc: 96.97, auc: 0.79,
LDA: symbol acc: 96.97, auc: 0.81
it: 9
EM:  symbol acc: 96.97, auc: 0.79,
LDA: symbol acc: 96.97, auc: 0.81
