# Assignment 6: Signal Processing and Spectral Analysis
Please submit this assignment to Canvas as a Jupyter notebook (.ipynb). In this assignment you will perform spectral analysis of scalp EEG time-series data.

In [2]:
# imports
import mne
import ptsa
import cmlreaders as cml
import numpy as np
import matplotlib.pyplot as plt
from ptsa.data.filters import morlet
from ptsa.data.filters import ButterworthFilter

## Assignment Overview

In this project you will analyze a multi-session free recall experiment to determine the spectral biomarkers of successful memory encoding. To determine differences in brain activity (EEG signals) that predict whether a studied item will be subsequently remembered (recalled) you will compare two classes of events: word encodings that led to later recall and those that did not.  This assignment will ask you to analyze data from the ltpFR2 dataset (Kahana et al., 2018, JEP:LMC).  You will not analyze the entire dataset; rather, you will analyze a significant subset of subjects, specifically those listed as 'scalp_subs' below. 

All analyses in this project should include the following data processing steps unless the question explicitly instructs you to do otherwise:
1. For scalp EEG data, use LCF “cleaned” data.
2. After loading the data, apply a Butterworth notch filter around 60 Hz (freqs = [58-62]) to remove line noise.
3. Include a 1000 ms buffer on either side of the time period of interest when computing power; compute power using raw voltage at the original sampling rate.
4. All logarithms referenced in this assignment are base-10 logarithms (e.g., np.log10, not np.log, which is the natural log).
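
To make steps 2 and 3 concrete, here is a minimal sketch of a 58-62 Hz band-stop filter on a synthetic signal. It is not the PTSA `ButterworthFilter` call that the assignment expects: it uses SciPy's `butter` and `sosfiltfilt` as a stand-in, and the sampling rate, filter order, and test signal are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 500.0                               # assumed sampling rate in Hz
t = np.arange(int(6 * fs)) / fs          # 6 s of samples

# Synthetic "EEG": a 10 Hz rhythm plus 60 Hz line noise.
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)

# 4th-order Butterworth band-stop from 58 to 62 Hz, run forward and backward
# so that the filter adds no phase shift.
sos = butter(4, [58.0, 62.0], btype="bandstop", fs=fs, output="sos")
filtered = sosfiltfilt(sos, raw)

# Discard 1 s at each edge, where filter transients are largest. This plays
# the same role as the buffer in step 3.
core = slice(int(1 * fs), int(5 * fs))

def amplitude_at(x, freq):
    """Amplitude of the sinusoidal component of x at freq (Hz) within the core window."""
    reference = np.exp(-2j * np.pi * freq * t[core])
    return 2 * np.abs(np.mean(x[core] * reference))

amp_60_before = amplitude_at(raw, 60)
amp_60_after = amplitude_at(filtered, 60)
amp_10_after = amplitude_at(filtered, 10)
```

In this toy example the 60 Hz component is strongly attenuated while the 10 Hz component passes nearly unchanged. The samples near the edges are the least trustworthy, which is one reason to compute power on buffered data and trim the buffer afterwards.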

In [4]:
# use these subjects for problems other than 1 and 2
scalp_subs = ['LTP093', 'LTP117', 'LTP123', 'LTP133', 'LTP228', 
              'LTP246', 'LTP249', 'LTP251', 'LTP258', 'LTP259', 
              'LTP265', 'LTP269', 'LTP279', 'LTP280', 'LTP285', 
              'LTP287', 'LTP293', 'LTP296', 'LTP297', 'LTP302', 
              'LTP304', 'LTP307', 'LTP310', 'LTP311', 'LTP317', 
              'LTP318', 'LTP322', 'LTP327', 'LTP329', 'LTP330']

# use these for Problems 1 to 2
scalp_3subs = scalp_subs[0:3]

freqs = np.unique(np.round(np.logspace(np.log10(1), np.log10(300), 17)))

## Question 1
1. Load data from the three subjects in 'scalp_3subs'. For each subject, first extract the behavioral data and report basic descriptive statistics including number of sessions completed, number of lists per session, and average percentage of correctly recalled items. Report these data separately for each subject.
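
As a rough illustration of these descriptive statistics, the sketch below summarizes a toy events table with pandas. The table is fabricated for the example, and the column names (`session`, `trial`, `type`, `recalled`) are assumptions about how the loaded events are laid out; check them against the events you actually load.

```python
import pandas as pd

# Toy events table standing in for one subject's events (not real data).
events = pd.DataFrame({
    "session":  [0, 0, 0, 0, 1, 1, 1, 1],
    "trial":    [1, 1, 2, 2, 1, 1, 2, 2],   # assumed: list number within a session
    "type":     ["WORD"] * 8,               # assumed label for encoding events
    "recalled": [1, 0, 1, 1, 0, 0, 1, 0],
})

# Keep only word-presentation (encoding) events.
words = events[events["type"] == "WORD"]

n_sessions = words["session"].nunique()
lists_per_session = words.groupby("session")["trial"].nunique()
pct_recalled = 100 * words["recalled"].mean()
```

For the real data, the same three quantities would be computed once per subject and reported separately.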

In [3]:
# Question 1.1
### YOUR CODE HERE

## Question 2
1) For each of the three initial subjects you will compute a power spectrum (frequency on the x-axis, power on the y-axis) for recalled and non-recalled items. Thus, the goal is to report three graphs (one for each subject) with line plots indicating recalled events (red) and non-recalled events (blue). To do this, 
* Load all encoding events (irrespective of recall status). 
* Extract the LCF-corrected EEG signal for electrode E53 (left parietal lobe) including a buffer of 1000 ms on either side of the event
    * Here we will define the encoding event as the 1600 ms interval that the word was displayed on the screen, from onset to offset. The buffer should extend before and after this period of time.
    * For scalp data, contacts are not available from reader.load('contacts'), so you will need to call reader.load_eeg(...), then to_ptsa() on the result, and read the .channel.values attribute to obtain the channels.
* Filter out line noise with the PTSA Butterworth filter (see examples from assignment 5). 
* Check that the EEG data match the number of trials in the behavioral data. 
* Compute the power spectrum with a wavelet transform with wavenumber 6 at the following 16 approximately logarithmically-spaced frequencies (in Hz): 1, 2, 3, 4, 6, 8, 12, 17, 25, 35, 50, 72, 103, 147, 210, 300. These frequencies are defined above in the 'freqs' variable.
* For each subject, plot the log of power vs. frequency on a semi-log-x plot averaged across the encoding time interval and across events, separately for recalled and non-recalled events. Take the log of the subject-level averaged powers.
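
The wavelet and averaging steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the PTSA implementation: `morlet_power` is a hypothetical helper, the unit-energy normalization is one of several conventions, the sampling rate and signal are synthetic, and only a small subset of frequencies is used so that every wavelet fits inside the buffered segment.

```python
import numpy as np

def morlet_power(x, fs, freqs, width=6):
    """Power over time at each frequency via complex Morlet wavelet convolution.

    x: 1-D voltage trace, fs: sampling rate (Hz), width: wavenumber.
    Returns an array of shape (len(freqs), len(x)).
    """
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = width / (2 * np.pi * f)            # s.d. of the Gaussian envelope (s)
        half = int(np.ceil(3.5 * sigma_t * fs))      # wavelet spans +/- 3.5 s.d.
        tw = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * tw) * np.exp(-tw**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))   # unit energy (assumed convention)
        power[i] = np.abs(np.convolve(x, wavelet, mode="same")) ** 2
    return power

fs = 500.0                                   # assumed sampling rate in Hz
buffer_s, event_s = 1.0, 1.6                 # 1000 ms buffer, 1600 ms encoding interval
n = int(round((event_s + 2 * buffer_s) * fs))
t = np.arange(n) / fs

# Synthetic trace: an 8 Hz oscillation plus a little white noise.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 8 * t) + 0.1 * rng.standard_normal(n)

freqs_demo = np.array([4.0, 8.0, 17.0, 35.0])    # illustrative subset of frequencies
power = morlet_power(x, fs, freqs_demo)

# Remove the buffer, average over the encoding interval, then take log10.
n_buf = int(buffer_s * fs)
power_event = power[:, n_buf:-n_buf]
mean_power = power_event.mean(axis=1)
log_power = np.log10(mean_power)
```

Because the synthetic signal oscillates at 8 Hz, `mean_power` peaks at the 8 Hz entry. With real data, the time-averaged power would additionally be averaged across recalled and non-recalled events before taking the logarithm, as described in the last step.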

2) What can we learn from the overall shape of the spectra and the qualitative (not statistical) differences between encoding conditions?

In [5]:
# Question 2.1
### YOUR CODE HERE

Question 2.2

**YOUR ANSWER HERE**

## Question 3
In this problem you will repeat the above analysis for all subjects in 'scalp_subs' and make some statistical inferences about spectral biomarkers of successful memory encoding.  
* Be sure that your code is robust to exceptions that can be thrown by e.g. unavailable data, and if you have an exception in a particular session due to a problem with the data in that session, print or log the event so you can report it, and discard that session from the analysis. This data was all acquired with human participants across long stretches of time, and small things can go wrong.
* Once you have verified your code, compute powers for all subjects as described above and create power spectra for recalled and non-recalled events.
* Using the “cmldask” package you should be able to use a separate “core” on the cluster to process data from each session. Only run up to 5 jobs at once with up to 10 GB per job (or your jobs may be killed by mean-spirited admins). Make sure you save out the power values as you go (e.g. with the pickle library).
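
One possible shape for the exception handling and incremental saving described above is sketched below. `compute_session_power` is a hypothetical placeholder for your own load, filter, and wavelet pipeline (here it fails on purpose for one session), and the file naming is an assumption. The sketch runs sessions serially and does not show any cmldask calls.

```python
import logging
import pickle
import tempfile
from pathlib import Path

import numpy as np

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("power_pipeline")

def compute_session_power(subject, session):
    """Hypothetical stand-in for the real per-session analysis."""
    if session == 1:
        raise FileNotFoundError(f"no EEG file for {subject} session {session}")
    return np.ones((16, 4))   # fake (frequency x event) power array

def run_sessions(subject, sessions, out_dir):
    """Process each session, pickling results and recording any failures."""
    failed = []
    for session in sessions:
        try:
            power = compute_session_power(subject, session)
        except Exception as exc:
            # Log the problem, remember it for the report, and move on.
            log.warning("Skipping %s session %s: %s", subject, session, exc)
            failed.append((subject, session, repr(exc)))
            continue
        out_path = Path(out_dir) / f"{subject}_sess{session}_power.pkl"
        with open(out_path, "wb") as fh:
            pickle.dump(power, fh)
    return failed

out_dir = tempfile.mkdtemp()
failed = run_sessions("LTP093", [0, 1, 2], out_dir)
saved = sorted(p.name for p in Path(out_dir).iterdir())
```

Returning the list of failures, rather than only logging them, makes it straightforward to report which sessions were discarded. A per-session function written this way can also be submitted as an independent job, since each call touches only its own output file.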

1) Compute the average power spectrum for each subject (separately for recalled and non-recalled items) and then take the log of each spectrum. Averaging these log-power spectra across subjects, graph the average log-power spectra with 95% confidence bands (transparent light red and light blue shading, or your favorite clearly labeled color scheme).
2) To get a sense for the effect of the log-transform and the semi-log plot, also produce plots without the log-transformation of the subject-level powers and ... 
3) ... also produce plots without either the log transformation or the semi-log plot.
4) Plot the mean of the differences between the log spectra (recalled - not recalled) computed separately for each subject (and then averaged across subjects) on a semi-log plot and place a confidence band on the mean difference score. 
5) What inferences can you now make from these results?
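
The confidence bands in parts 1 and 4 can be computed as in the sketch below, which assumes a t-based interval across subjects. The arrays are random stand-ins for subject-level log-power spectra, and `mean_ci` is a hypothetical helper, not part of any library used in this course.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_freq = 30, 16

# Fake subject-level log-power spectra with shape (subjects, frequencies).
log_rec = rng.normal(loc=1.0, scale=0.3, size=(n_subj, n_freq))
log_nrec = log_rec - rng.normal(loc=0.05, scale=0.1, size=(n_subj, n_freq))

def mean_ci(x, level=0.95):
    """Mean across subjects (axis 0) and half-width of the t-based confidence interval."""
    n = x.shape[0]
    sem = x.std(axis=0, ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)
    return x.mean(axis=0), t_crit * sem

mean_rec, half_rec = mean_ci(log_rec)
mean_nrec, half_nrec = mean_ci(log_nrec)

# Part 4: difference within each subject first, then mean and band across subjects.
diff = log_rec - log_nrec
mean_diff, half_diff = mean_ci(diff)
```

Each band can then be drawn with `plt.fill_between(freqs, mean - half, mean + half, alpha=0.3)` together with `plt.xscale('log')` for the semi-log axis. Computing the band on the within-subject differences, rather than on each condition separately, removes between-subject variability in overall power from the comparison.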

In [6]:
# Question 3.1
### YOUR CODE HERE

In [7]:
# Question 3.2
### YOUR CODE HERE

In [8]:
# Question 3.3
### YOUR CODE HERE

In [9]:
# Question 3.4
### YOUR CODE HERE

Question 3.5

**YOUR ANSWER HERE**

## Question 4
In this problem, your goal is to assess the effects of two important processing steps on the analyses in the previous problem. Two common normalization procedures used in the analysis of brain signals are the log-transform, which attenuates extremely large values, and the z-transform, which allows you to normalize power values according to some baseline distribution. 

Here you will reproduce the analysis of the previous problem, computing between-subject differences of power spectra between recalled and not recalled items, with each of these steps applied separately. The log-transform can be applied at different points in the analysis stream: immediately after computing power values, after averaging power values for each encoding interval, or after computing the z-transform of the power values. Similarly, the z-transform of power (or log-power) values can be computed relative to many possible choices of reference “distribution”. If our goal is to normalize data to the distribution of power values across a given session, how do you define the “distribution” over which to estimate the mean and standard deviation of values? Consider three different choices and assess the effects of these choices on your final analysis:

1) Take the logarithm of power before averaging across time
2) Z-score powers across the events within a session after averaging across time
3) Complete a third transformation of your choice. 
4) Describe your motivation in considering the transform (in part 3).  Assess the effects of the different choices (parts 1, 2, 3).
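
To see why the order of operations matters, the sketch below compares taking the log before and after averaging over time, and z-scores time-averaged power across the events of one session. The power values are log-normal random draws used purely as a stand-in for real data at one frequency.

```python
import numpy as np

rng = np.random.default_rng(2)
n_events, n_time = 200, 800

# Fake power for one session at one frequency, shape (events, time samples).
# Power is positive and right-skewed, so a log-normal draw is a convenient stand-in.
power = rng.lognormal(mean=0.0, sigma=1.0, size=(n_events, n_time))

# Part 1: log before averaging across time.
log_then_mean = np.log10(power).mean(axis=1)

# For comparison: average across time first, then take the log.
mean_then_log = np.log10(power.mean(axis=1))

# Part 2: z-score time-averaged power across the events of this session.
avg_power = power.mean(axis=1)
z_power = (avg_power - avg_power.mean()) / avg_power.std(ddof=1)
```

The two log variants are not interchangeable: the log of a mean is never smaller than the mean of the logs, so `mean_then_log` sits at or above `log_then_mean` for every event, and the gap grows with the skew of the power values. The z-scored values have zero mean and unit standard deviation within the session by construction.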

In [10]:
# Question 4.1
### YOUR CODE HERE

In [11]:
# Question 4.2
### YOUR CODE HERE

In [12]:
# Question 4.3
### YOUR CODE HERE

Question 4.4

**YOUR ANSWER HERE**

## Question 5
All of the preceding analyses were conducted on a single scalp electrode. 
1) Repeat the analysis at all electrodes, generating a topographic map that shows the subsequent memory effect for the frequency 147 Hz. 

For this problem you should:
* both log-transform your power values immediately after extracting them, and z-transform your data based on the distribution of values across each session (1–23) for each subject. 
* For the topographic maps, use a color bar to indicate the difference between power for recalled and non-recalled items. 
* Use the between-subject t-test method with Benjamini-Hochberg FDR correction to determine which electrodes in each map meet the p < 0.05 significance threshold. 
* Mark statistically significant electrodes on each topomap with some graphical element (star, dot, different color shading). 

2) Discuss your findings.
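
The statistical step above (a between-subject t-test at each electrode followed by Benjamini-Hochberg correction) can be sketched as follows. The data are random stand-ins with an effect planted at the first five electrodes, and `benjamini_hochberg` is a hand-written helper shown for illustration rather than a function from MNE or PTSA.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank that meets its threshold
        reject[order[: k + 1]] = True      # reject that hypothesis and all smaller p-values
    return reject

rng = np.random.default_rng(3)
n_subj, n_elec = 30, 20

# Fake per-subject (recalled - not recalled) differences, shape (subjects, electrodes).
diff = rng.normal(0.0, 1.0, size=(n_subj, n_elec))
diff[:, :5] += 1.5                          # planted effect at the first 5 electrodes

# One-sample t-test against zero at each electrode, across subjects.
t_vals, p_vals = stats.ttest_1samp(diff, popmean=0.0, axis=0)
significant = benjamini_hochberg(p_vals, alpha=0.05)

# Small deterministic check of the procedure itself.
check = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.05)
```

The resulting boolean array has one entry per electrode, which is the form that the `mask` argument of `mne.viz.plot_topomap` accepts for highlighting sensors. Whether you use that argument or overlay your own markers is a design choice left to you.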

In [13]:
# Question 5.1
### YOUR CODE HERE

Question 5.2

**YOUR ANSWER HERE**