# Week 4: Wavelet Transformation for Feature Selection 
Shateesh Bhugwansing

To try to improve our classification scores, I'm using the Discrete Wavelet Transform (DWT) to decompose the EEG data. The resulting sub-bands are then ranked by their energy levels and fed into classifiers.

The following papers were used as references for this procedure:  
1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5702353/  
2) https://link.springer.com/article/10.1007%2Fs13246-015-0333-x#Sec3  
3) https://ieeexplore.ieee.org/document/6997315/

### PLEASE NOTE: 
* The Python packages 'pywt' (PyWavelets, http://pywavelets.readthedocs.io/en/latest/regression/dwt-idwt.html) and 'mne' must be installed in your environment to run this notebook.
* The data sources are local to my machine/external hard drive. We haven't set up a cloud storage system yet.

In [28]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline 

from sklearn.model_selection import StratifiedKFold, ShuffleSplit, cross_val_score, train_test_split
from sklearn import metrics
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# the  classifiers used by our group
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


import warnings
warnings.simplefilter("ignore", DeprecationWarning)
plt.rcParams['figure.figsize'] = (10.0, 10.0)
plt.rcParams.update({'font.size': 12})


#import mne to read EEG data 
import mne 
from mne.decoding import Vectorizer, CSP

#import pywavelet for DWT 
import pywt 

In [20]:
# read in the first 4 runs of subject 1, plus the file containing all runs of subject 1
path1 = "/Users/shateeshbhugwansing/Desktop/seniorprojecteeg/clean_data_sample/Ex10_Suj1_Run1-epo.fif"
path2 = "/Users/shateeshbhugwansing/Desktop/seniorprojecteeg/clean_data_sample/Ex10_Suj1_Run2-epo.fif"
path3 = "/Users/shateeshbhugwansing/Desktop/seniorprojecteeg/clean_data_sample/Ex10_Suj1_Run3-epo.fif"
path4 = "/Users/shateeshbhugwansing/Desktop/seniorprojecteeg/clean_data_sample/Ex10_Suj1_Run4-epo.fif"
path_all = "/Users/shateeshbhugwansing/Desktop/seniorprojecteeg/clean_data_sample/subject1_all_runs-epo.fif"

run1 = mne.read_epochs(path1, preload=True)
run2 = mne.read_epochs(path2, preload=True)
run3 = mne.read_epochs(path3, preload=True)
run4 = mne.read_epochs(path4, preload=True)
runs_all = mne.read_epochs(path_all, preload=True)

# keep only event IDs 4 and 5 (new vs. scrambled)
run1 = run1[(run1.events[:,-1] == 4) | (run1.events[:,-1] == 5)]
run2 = run2[(run2.events[:,-1] == 4) | (run2.events[:,-1] == 5)]
run3 = run3[(run3.events[:,-1] == 4) | (run3.events[:,-1] == 5)]
run4 = run4[(run4.events[:,-1] == 4) | (run4.events[:,-1] == 5)]
runs_all = runs_all[(runs_all.events[:,-1] == 4) | (runs_all.events[:,-1] == 5)]

#store the actual data 
data1 = run1.get_data()
data2 = run2.get_data()
data3 = run3.get_data()
data4 = run4.get_data()
data_all = runs_all.get_data()

#store labels (used as targets in the classifier algorithms)
labels1 = run1.events[:,-1]
labels2 = run2.events[:,-1]
labels3 = run3.events[:,-1]
labels4 = run4.events[:,-1]
labels_all = runs_all.events[:,-1]

Reading /Users/shateeshbhugwansing/Desktop/seniorprojecteeg/clean_data_sample/Ex10_Suj1_Run1-epo.fif ...
Isotrak not found
    Read a total of 1 projection items:
        Average EEG reference (1 x 61) active
    Found the data of interest:
        t =       0.00 ...    1400.00 ms
        0 CTF compensation matrices available
280 matching events found
Created an SSP operator (subspace dimension = 1)
280 matching events found
Created an SSP operator (subspace dimension = 1)
1 projection items activated
Reading /Users/shateeshbhugwansing/Desktop/seniorprojecteeg/clean_data_sample/Ex10_Suj1_Run2-epo.fif ...
Isotrak not found
    Read a total of 1 projection items:
        Average EEG reference (1 x 61) active
    Found the data of interest:
        t =       0.00 ...    1400.00 ms
        0 CTF compensation matrices available
285 matching events found
Created an SSP operator (subspace dimension = 1)
285 matching events found
Created an SSP operator (subspace dimension = 1)
1 projection it

In [29]:
# perform DWT on all runs of subject 1 (data_all)
# the papers used the Daubechies wavelet db4, which [3] describes as the best choice for EEG data, so I'll start with that.

cA1, cD1 = pywt.dwt(data_all, 'db4')
print 'data: ', data_all.shape
print 'Approximation Coefficients 1: ', cA1.shape
print 'Detail Coefficients 1: ', cD1.shape
print "\n"

# The papers then apply the DWT again to the approximation coefficients (cA) of each level, shown below.
# After four levels there are cA1-4 and cD1-4.

cA2, cD2 = pywt.dwt(cA1, 'db4')
print 'Approximation Coefficients 2: ', cA2.shape
print 'Detail Coefficients 2: ', cD2.shape
print "\n"

cA3, cD3 = pywt.dwt(cA2, 'db4')
print 'Approximation Coefficients 3: ', cA3.shape
print 'Detail Coefficients 3: ', cD3.shape
print "\n"

cA4, cD4 = pywt.dwt(cA3, 'db4')
print 'Approximation Coefficients 4: ', cA4.shape
print 'Detail Coefficients 4: ', cD4.shape
print "\n"

data:  (557, 61, 1401)
Approximation Coefficients 1:  (557, 61, 704)
Detail Coefficients 1:  (557, 61, 704)


Approximation Coefficients 2:  (557, 61, 355)
Detail Coefficients 2:  (557, 61, 355)


Approximation Coefficients 3:  (557, 61, 181)
Detail Coefficients 3:  (557, 61, 181)


Approximation Coefficients 4:  (557, 61, 94)
Detail Coefficients 4:  (557, 61, 94)
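
The coefficient counts above follow from how the DWT pads and then downsamples the signal. Assuming PyWavelets' default `symmetric` signal extension and the 8-tap `db4` filter, one level turns an input of length n into floor((n + 8 - 1) / 2) coefficients. A small sketch (the helper name `dwt_output_length` is made up for illustration) reproduces the four sizes:

```python
def dwt_output_length(n_samples, filter_length=8):
    """Coefficient count for one DWT level with padded (non-periodic) extension."""
    return (n_samples + filter_length - 1) // 2

n = 1401  # samples per epoch in data_all
lengths = []
for level in range(1, 5):
    n = dwt_output_length(n)
    lengths.append(n)

print(lengths)  # [704, 355, 181, 94]
```

The same four-level decomposition can also be obtained in one call with `pywt.wavedec(data_all, 'db4', level=4)`, which returns the list `[cA4, cD4, cD3, cD2, cD1]`.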




# Help! I'm stuck

After reading the papers (linked at the top of the notebook), I'm still not sure what to do with the coefficients. Do I classify the coefficients directly? Do I apply them to the original signal somehow? I need help from someone who understands how DWT and relative wavelet energy are used to select features.
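
One possible reading, offered as a sketch rather than the papers' exact method: the coefficients are not classified directly. Instead, each sub-band is summarised by its energy (the sum of its squared coefficients), and the relative wavelet energy is that energy divided by the total over all sub-bands. This gives one value per sub-band, channel and epoch. The function name `relative_wavelet_energy` and the random stand-in arrays below are assumptions for illustration; in this notebook the input list would be `[cA4, cD4, cD3, cD2, cD1]`.

```python
import numpy as np

def relative_wavelet_energy(coeffs):
    """Return features of shape (n_epochs, n_channels * n_bands).

    coeffs: list of arrays, one per sub-band, each shaped
            (n_epochs, n_channels, n_coefficients_in_band).
    """
    # energy of each sub-band: sum of squared coefficients along time
    energies = np.stack([np.sum(c ** 2, axis=-1) for c in coeffs], axis=-1)
    # normalise so the sub-band energies of each channel sum to 1
    total = energies.sum(axis=-1, keepdims=True)
    rwe = energies / total
    # flatten channels x bands into one feature vector per epoch
    return rwe.reshape(rwe.shape[0], -1)

# random stand-ins with the band lengths seen above (5 epochs, 3 channels)
rng = np.random.RandomState(0)
fake_coeffs = [rng.randn(5, 3, n) for n in (94, 94, 181, 355, 704)]

features = relative_wavelet_energy(fake_coeffs)
print(features.shape)  # (5, 15)
```

An array like `features` could then be passed to the classifiers in place of the raw coefficients, and ranking sub-bands would amount to comparing these energy values.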

In [31]:
# make pipeline: flatten each epoch, standardize, then classify
clf = make_pipeline(Vectorizer(),
                    StandardScaler(),
                    GaussianNB())

# hold out 30% of the epochs for testing
X_train, X_test, y_train, y_test = train_test_split(cD1, labels_all, test_size=0.3, random_state=42)

clf.fit(X_train, y_train)

# alternative, not run here: 10-fold stratified cross-validation
# cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
# preds = np.empty(len(labels_all))
# for train, test in cv.split(cD1, labels_all):
#     clf.fit(cD1[train], labels_all[train])
#     preds[test] = clf.predict(cD1[test])
# print("Attempt 1, cD1: accuracy: ", metrics.accuracy_score(labels_all, preds))

# accuracy on the held-out 30%
clf.score(X_test, y_test)

0.5178571428571429
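
A single 70/30 split gives a noisy estimate, and 0.52 is close to chance for two classes. Below is a sketch of the stratified 10-fold alternative using `cross_val_score`. `X` and `y` are random placeholders, not the EEG data, so the score printed here will itself sit near chance.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# placeholder data: 100 epochs x 4 channels x 50 coefficients, labels 4 and 5
rng = np.random.RandomState(42)
X = rng.randn(100, 4, 50)
y = np.repeat([4, 5], 50)

# flatten each epoch into one row (the job mne's Vectorizer does in the pipeline)
X_flat = X.reshape(len(X), -1)

clf = make_pipeline(StandardScaler(), GaussianNB())
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# the scaler is re-fit inside every training fold, so no test data leaks in
scores = cross_val_score(clf, X_flat, y, cv=cv, scoring='accuracy')
print(scores.mean(), scores.std())
```

Reporting the mean and spread over folds makes it easier to judge whether a change in features actually moves the score.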

In [47]:
# maybe I'm supposed to concatenate the coefficients together?
# np.concatenate takes a single sequence of arrays; the original two-argument
# call raised "TypeError: only integer scalar arrays can be converted to a scalar index"
all_cD = np.concatenate([cD1[0][0], cD2[0][0]])
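
`np.concatenate` takes its arrays as a single sequence (a list or tuple). When two arrays are passed as separate positional arguments, the second is read as `axis`, which raises `TypeError: only integer scalar arrays can be converted to a scalar index`. The sketch below uses random stand-ins for `cD1` and `cD2` (small shapes chosen for illustration) and joins the detail coefficients, first for one channel and then for every epoch and channel along the last axis:

```python
import numpy as np

# stand-ins for cD1 and cD2: (n_epochs, n_channels, n_coefficients)
rng = np.random.RandomState(0)
cD1_demo = rng.randn(6, 2, 704)
cD2_demo = rng.randn(6, 2, 355)

# one epoch, one channel: pass the arrays as a list
one_channel = np.concatenate([cD1_demo[0][0], cD2_demo[0][0]])
print(one_channel.shape)  # (1059,)

# all epochs and channels at once: join along the coefficient axis
all_cD_demo = np.concatenate([cD1_demo, cD2_demo], axis=-1)
print(all_cD_demo.shape)  # (6, 2, 1059)
```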