# Assignment 10: Multivariate and Machine Learning Analysis for Intracranial EEG Data
Please submit this assignment to Canvas as a Jupyter notebook (.ipynb). The assignment will have you compare the ability of different machine learning techniques to classify neural memory states, as well as examine the effects of normalizing the EEG data.

In [13]:
# imports
import numpy as np
import pandas as pd
import cmlreaders as cml
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import KFold
from scipy.stats import ttest_ind

## Assignment Overview

In this assignment you will investigate how different penalization schemes and z-scoring features can produce different behaviors in the classifier. Recall that the objective function for penalized logistic regression is: 

$l(\beta) = -\sum_{i=1}^N \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] + \frac{\alpha}{2} r \|\beta\|_2^2 + \alpha (1 - r) \|\beta\|_1$

where $p_i$ is the predicted probability that $y_i = 1$, $\alpha = 1/C$ is the penalty parameter, $r$ is the contribution of the L2 penalty, and $1 - r$ is the contribution of the L1 penalty. When $r = 1$, we have a strictly L2 penalized logistic regression. When $r = 0$, we have a strictly L1 penalized regression (a.k.a. Lasso). When $0 < r < 1$, we have a mixture of both L1 and L2, which is called elastic net. In this assignment, you will compare the performance of three penalization schemes: strictly L2, strictly L1, and elastic net with $r = 0.5$.
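As a minimal sketch, the three schemes might be configured in scikit-learn as follows. Note that sklearn's `l1_ratio` is the L1 share of the penalty, so it corresponds to $1 - r$ in the notation above (with $r = 0.5$ the two coincide). The helper name `make_classifier`, the `saga` solver, and the `max_iter` value are illustrative assumptions, not requirements of the assignment.

```python
from sklearn.linear_model import LogisticRegression


def make_classifier(scheme, C=1.0, max_iter=5000):
    """Hypothetical helper: build a classifier for 'l2', 'l1', or 'elasticnet'."""
    if scheme not in ("l2", "l1", "elasticnet"):
        raise ValueError(f"unknown penalization scheme: {scheme!r}")
    # sklearn's l1_ratio equals (1 - r) in the assignment's notation;
    # it is only passed for the elastic net (r = 0.5 -> l1_ratio = 0.5).
    extra = {"l1_ratio": 0.5} if scheme == "elasticnet" else {}
    # 'saga' is assumed here because it supports all three penalties.
    return LogisticRegression(
        penalty=scheme, C=C, solver="saga", max_iter=max_iter, **extra
    )


# One classifier per scheme at the same regularization strength alpha = 1/C.
classifiers = {s: make_classifier(s, C=0.1) for s in ("l2", "l1", "elasticnet")}
```

Using one solver for all three schemes keeps the comparison from being confounded by optimizer differences, at the cost of slower convergence on unscaled features.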

Again, use data from the following 20 FR1/catFR1 subjects in the intracranial EEG (iEEG) dataset.

In [14]:
# 20 unique subjects (a duplicated 'R1380D' entry has been removed)
subs = ['R1380D', 'R1111M', 'R1332M', 'R1377M', 'R1065J',
        'R1385E', 'R1189M', 'R1108J', 'R1390M', 'R1236J',
        'R1391T', 'R1401J', 'R1361C', 'R1060M', 'R1350D',
        'R1378T', 'R1375C', 'R1383J', 'R1354E', 'R1292E']

For each of these subjects, use the following processing steps:
* Load EEG with `CMLReader.load_eeg` from a bipolar montage loaded using `CMLReader.load('pairs')`.
* Apply a Butterworth notch filter around 60 Hz (`freqs = [58, 62]`) when extracting the voltage.
* Calculate power at the above frequencies with a Morlet wavelet with wavenumber (keyword `width`) of 6 for each encoding event (from time 0 until 1.6 seconds after the encoding event onset) using a 1-second buffer.
* For each frequency, channel, and encoding event, average the power over the entire 1600 ms encoding period (but not over the buffer period!)
* Log-transform the average encoding power values as in the final step of the previous problem.
* In some cases you may notice artifacts in the data that manifest in power values of zero. These would produce problems in the transformation and classification, so please exclude any events with this issue from all analyses.
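The last three steps above (averaging over the encoding window, log-transforming, and excluding zero-power events) can be sketched with plain NumPy. This is only an illustration under stated assumptions: the power array layout `(n_freqs, n_events, n_channels, n_samples)`, the base-10 logarithm, and the function name `encoding_features` are all hypothetical choices, and the loading, filtering, and wavelet steps are not shown.

```python
import numpy as np


def encoding_features(power, sr, buffer_s=1.0):
    """Average power over the encoding window, log-transform, drop bad events.

    power : array, shape (n_freqs, n_events, n_channels, n_samples),
            with `buffer_s` seconds of buffer on BOTH ends (assumed layout).
    sr    : sampling rate in Hz.
    Returns (features, keep): features has shape
    (n_kept_events, n_freqs * n_channels); keep is a boolean event mask.
    """
    n_buf = int(round(buffer_s * sr))
    # Remove the buffer before averaging so it does not leak into the mean.
    core = power[..., n_buf:power.shape[-1] - n_buf]
    mean_pow = core.mean(axis=-1)                      # (freqs, events, channels)
    # Exclude any event with a non-positive mean power at any freq/channel.
    keep = ~(mean_pow <= 0).any(axis=(0, 2))
    log_pow = np.log10(mean_pow[:, keep, :])           # log base is an assumption
    # One row per event, one column per (frequency, channel) feature.
    features = log_pow.transpose(1, 0, 2).reshape(int(keep.sum()), -1)
    return features, keep


# Synthetic demonstration: 3 freqs, 5 events, 4 channels, 3.6 s at 100 Hz.
rng = np.random.default_rng(0)
demo_power = rng.uniform(1.0, 2.0, size=(3, 5, 4, 360))
demo_power[:, 2, 1, :] = 0.0                           # simulate an artifact event
demo_features, demo_keep = encoding_features(demo_power, sr=100)
```

The same `keep` mask should also be applied to the event labels so that features and labels stay aligned.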

## Question 1
1) Repeat the nested cross-validation procedure from the previous assignment, now using z-scored features.  Do so separately for L2, L1, and elastic net. 

* You should again use sklearn’s linear_model.LogisticRegression class for your classifier, appropriately selecting the classifier hyperparameters to obtain the L1 and elastic net regularization schemes.

2) Compare the performances (AUCs) across these three schemes using a barplot or whatever you see fit, including some visualization of variability in the outcomes for these methods. Does one scheme do better than the others?

* Keep in mind that the sklearn LogisticRegression parameter `C` is an inverse regularization strength, meaning the regularization strength is equal to $\alpha$ = 1/C. So higher `C` means lower regularization.
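One possible shape for a nested cross-validation with z-scored features is sketched below on synthetic data. Several things here are assumptions rather than the procedure from the previous assignment: the stratified K-fold splitters, the grid of `C` values, the `saga` solver, and the helper names `build_model` and `nested_cv_auc`. Z-scoring is done with a `StandardScaler` fit on the training portion of each fold only, so no test-fold statistics leak into training; if your course convention is to z-score within session, adapt accordingly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model(penalty, C, zscore):
    """Hypothetical helper: optional z-scoring followed by the classifier."""
    extra = {"l1_ratio": 0.5} if penalty == "elasticnet" else {}
    clf = LogisticRegression(penalty=penalty, C=C, solver="saga",
                             max_iter=2000, **extra)
    steps = [StandardScaler()] if zscore else []
    return make_pipeline(*steps, clf)


def nested_cv_auc(X, y, penalty="l2", Cs=(0.01, 0.1, 1.0), zscore=True,
                  n_outer=5, n_inner=3, seed=0):
    """Return one held-out AUC per outer fold, choosing C in an inner loop."""
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed)
    inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
    outer_aucs = []
    for train, test in outer.split(X, y):
        X_tr, y_tr = X[train], y[train]
        mean_inner = []
        for C in Cs:  # inner loop sees only the outer-training data
            aucs = []
            for fit_idx, val_idx in inner.split(X_tr, y_tr):
                model = build_model(penalty, C, zscore)
                model.fit(X_tr[fit_idx], y_tr[fit_idx])
                prob = model.predict_proba(X_tr[val_idx])[:, 1]
                aucs.append(roc_auc_score(y_tr[val_idx], prob))
            mean_inner.append(np.mean(aucs))
        best_C = Cs[int(np.argmax(mean_inner))]
        # Refit on all outer-training data with the selected C, then score once.
        model = build_model(penalty, best_C, zscore).fit(X_tr, y_tr)
        outer_aucs.append(roc_auc_score(y[test], model.predict_proba(X[test])[:, 1]))
    return np.array(outer_aucs)


# Synthetic stand-in for one subject's (events x features) matrix and labels.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 60)
X_demo = rng.normal(size=(120, 8)) + y_demo[:, None] * 1.0
demo_aucs = nested_cv_auc(X_demo, y_demo, penalty="elasticnet")
```

Averaging `demo_aucs` gives a per-subject score; collecting these across subjects and schemes yields the values to plot with error bars.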

In [15]:
# Question 1.1
### YOUR CODE HERE

In [16]:
# Question 1.2
### YOUR CODE HERE

## Question 2
1) Generate a plot for each of the first three subjects containing three histograms (one for each penalization scheme) of the model coefficients.
* Use the *model.coef_* attribute (*model* is your classifier object) to investigate the learned coefficients of the classifier for each subject. You can pool classifier weights across all outer cross-validation folds. 
* Use the *alpha* parameter of the plt.hist function to ensure the histograms do not cover each other up. 
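A minimal plotting sketch for one subject follows, using synthetic coefficients in place of real classifier weights. The function name `plot_coef_histograms` and the dictionary layout are hypothetical; with real models, the pooled weights could be built as `np.concatenate([m.coef_.ravel() for m in fold_models])`, where `fold_models` is an assumed list of fitted outer-fold classifiers.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_coef_histograms(coefs_by_scheme, subject, bins=30, ax=None):
    """Overlay one semi-transparent histogram per penalization scheme.

    coefs_by_scheme : dict mapping scheme name -> 1-D array of pooled weights.
    """
    if ax is None:
        _, ax = plt.subplots()
    # Shared bin edges so the three histograms are directly comparable.
    all_vals = np.concatenate(list(coefs_by_scheme.values()))
    edges = np.linspace(all_vals.min(), all_vals.max(), bins + 1)
    for scheme, coefs in coefs_by_scheme.items():
        ax.hist(coefs, bins=edges, alpha=0.5, label=scheme)
    ax.set(xlabel="coefficient value", ylabel="count", title=subject)
    ax.legend()
    return ax


# Synthetic weights that only mimic the qualitative shapes one might see.
rng = np.random.default_rng(0)
demo_coefs = {
    "L2": rng.normal(size=200),
    "L1": np.where(rng.random(200) < 0.7, 0.0, rng.normal(size=200)),
    "elastic net": rng.normal(scale=0.5, size=200),
}
demo_ax = plot_coef_histograms(demo_coefs, "R1380D")
```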

In [17]:
# Question 2.1
### YOUR CODE HERE

## Question 3
It has been shown that L1 penalization introduces sparsity to classifier weights, i.e. some of the β’s in the model will be zero with L1. 

1) For each subject, report the proportion of non-zero β’s for that subject’s classifier weights pooled across outer cross-validation folds separately for these three schemes. Plot a histogram of these subject proportions for each penalization scheme. 

2) What can you say about the proportions of non-zero β’s across the three penalization schemes?
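The proportion itself is a short computation once the weights are pooled. In this sketch, `nonzero_proportion` and its `tol` argument are hypothetical; `tol=0.0` counts exact zeros only, and a small positive tolerance could be used if a solver returns tiny values instead of exact zeros.

```python
import numpy as np


def nonzero_proportion(fold_coefs, tol=0.0):
    """Fraction of non-zero weights after pooling across outer CV folds.

    fold_coefs : iterable of coefficient arrays, one per outer fold
                 (e.g. each fold's `model.coef_`).
    """
    pooled = np.concatenate([np.ravel(c) for c in fold_coefs])
    return float(np.mean(np.abs(pooled) > tol))


# Two toy folds with three weights each: 3 of the 6 pooled weights are non-zero.
toy_folds = [np.array([[0.0, 1.5, 0.0]]), np.array([[0.2, 0.0, -0.3]])]
toy_proportion = nonzero_proportion(toy_folds)
```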

In [18]:
# Question 3.1
### YOUR CODE HERE

Question 3.2

**YOUR ANSWER HERE**

## Question 4

1) Test whether z-scoring improves classifier performance by repeating the analysis from Question 1, but without z-scored features, for L2, L1, and elastic net.

2) Which is better, raw features or z-scored features?  Give an intuitive explanation as to why one is better than the other.

In [19]:
# Question 4.1
### YOUR CODE HERE

Question 4.2

**YOUR ANSWER HERE**

## Question 5

So far, you've used a mean nested CV score to compare penalization schemes and to compare z-scored features to raw features. 

1) What can we conclude about the generalization of these comparisons? Are the improvements you found between these methods biased or unbiased in the sense of overfitting? In other words, if you took whichever of the tested methods achieved the best score and evaluated it on a fresh held-out data set, would you expect it to achieve the same performance (assume we had a large enough sample of subjects to ignore subject-level variability)? What is one scheme you could use to obtain an unbiased estimate of the population-level (as opposed to the individual subject-level) hold-out performance of your chosen optimal method in new data?

2) What about a scheme for an unbiased estimate of the performance of these methods at the individual subject level? In other words, if we wanted to ask "which penalization method or z-scoring approach is best for each subject separately?" and then evaluate the performance of the best method for that subject at the individual level, how could we do it in an unbiased manner (without "cheating")? Think about the methods you've used so far.

Question 5.1

**YOUR ANSWER HERE**

Question 5.2

**YOUR ANSWER HERE**