# Spectral Biomarkers of Successful Memory

This final notebook is the culminating exercise of the workshop and consists of
analyses previously published by the Computational Memory Lab. At present,
the data required for this exercise are available only on Rhino. To request access
before the data are packaged for release, please email
kahana-sysadmin@sas.upenn.edu.

In this project you will analyze a multi-session free recall
experiment to determine the spectral biomarkers of successful memory
encoding.  To identify differences in brain activity (EEG signals)
that predict whether a studied item will be subsequently remembered
(recalled), you will compare two classes of events: word-encoding events
that led to later recall and those that did not.

The first part of this assignment will ask you to analyze data from
the ltpFR2 data set (Kahana et al., 2018, JEP:LMC).  The second part
of the assignment will ask you to analyze data reported in Solomon et
al. (2018).  Both studies involve delayed free recall of 'unrelated'
word lists, but the first study was carried out in healthy young
adults, using scalp EEG, and the second study was carried out in
patients with temporal lobe epilepsy undergoing a neurosurgical
evaluation that required implantation of depth electrodes through
small openings in the skull.  You will not analyze the entire datasets
for this assignment; rather, you will analyze a significant subset of
subjects.

All analyses in this project should
include the following data processing steps unless the question
explicitly instructs you to do otherwise:

1. For scalp EEG data (Part I), use Localized Component Filtering (LCF)
  "cleaned" data (DelPozo-Baños & Weidemann, 2017). To load the
  LCF-cleaned data, pass the clean=True argument to the
  CMLReader's load method.
2. After loading the data, apply a Butterworth notch filter around
  60 Hz (freqs = [58, 62]) to remove line noise.
3. Include a 1000 ms buffer around the time period of interest when
  computing power; compute power using raw voltage at the original
  sampling rate.
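
As a rough illustration of steps 2 and 3, the sketch below runs a notch filter and a Morlet wavelet transform on a synthetic signal rather than on the workshop data. It uses SciPy and NumPy in place of the CMLReader/PTSA tools, and the sampling rate, the test signal, and the function names (notch_filter, morlet_power) are assumptions made for the example. The frequencies and wavenumber are the ones requested in Part I.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# The 16 frequencies (Hz) requested in Part I
FREQS = np.array([1, 2, 3, 4, 6, 8, 12, 17, 25, 35,
                  50, 72, 103, 147, 210, 300], dtype=float)


def notch_filter(signal, sfreq, band=(58.0, 62.0), order=4):
    """Zero-phase Butterworth band-stop filter that removes 58-62 Hz."""
    sos = butter(order, band, btype="bandstop", fs=sfreq, output="sos")
    return sosfiltfilt(sos, signal)


def morlet_power(signal, sfreq, freqs, width=6):
    """Power over time at each frequency via complex Morlet convolution."""
    n = signal.size
    power = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        sigma_t = width / (2 * np.pi * f)            # temporal s.d. of the wavelet
        half = int(np.ceil(3.5 * sigma_t * sfreq))   # truncate at +/- 3.5 s.d.
        t = np.arange(-half, half + 1) / sfreq
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * sigma_t ** 2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy normalization
        full = np.convolve(signal, wavelet, mode="full")
        power[i] = np.abs(full[half:half + n]) ** 2  # align output with input samples
    return power


# Synthetic single event (assumed 1000 Hz sampling rate):
# 1 s buffer + 1.6 s encoding period + 1 s buffer
sfreq, buffer_s, epoch_s = 1000.0, 1.0, 1.6
t = np.arange(int((2 * buffer_s + epoch_s) * sfreq)) / sfreq
raw = np.sin(2 * np.pi * 12 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)  # 12 Hz + line noise

clean = notch_filter(raw, sfreq)
power = morlet_power(clean, sfreq, FREQS)

n_buf = int(buffer_s * sfreq)
mean_power = power[:, n_buf:-n_buf].mean(axis=1)  # drop buffers, then average over time
peak_freq = FREQS[np.argmax(mean_power)]
print("frequency with the most power:", peak_freq, "Hz")
```

The buffer is discarded only after power has been computed, so most of the edge artifacts from the filter and the wavelet convolution fall outside the encoding period that is averaged.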

# Dataset

## List of 16 Scalp EEG subjects
* LTP123
* LTP133
* LTP249
* LTP251
* LTP258
* LTP259
* LTP269
* LTP285
* LTP293
* LTP302
* LTP304
* LTP307
* LTP311
* LTP318
* LTP322
* LTP330


## List of 20 intracranial FR1/catFR1 subjects sorted by recall rate
|Subject ID | Sessions | Recall Rate |
|-----------|----------|-------------|
|R1380D  | 5           | 0.59 |
|R1111M  | 6           | 0.53 |
|R1332M  | 4           | 0.52 |
|R1377M  | 6           | 0.51 |
|R1065J  | 10          | 0.51 |
|R1385E  | 5           | 0.46 |
|R1189M  | 4           | 0.46 |
|R1108J  | 4           | 0.46 |
|R1390M  | 5           | 0.42 |
|R1236J  | 7           | 0.42 |
|R1391T  | 5           | 0.42 |
|R1401J  | 4           | 0.41 |
|R1361C  | 5           | 0.41 |
|R1060M  | 5           | 0.39 |
|R1350D  | 4           | 0.39 |
|R1378T  | 6           | 0.39 |
|R1375C  | 5           | 0.38 |
|R1383J  | 5           | 0.38 |
|R1354E  | 6           | 0.37 |
|R1292E  | 4           | 0.37 |

# Part I

In this part you will work with data from a multi-session scalp EEG study
of list recall. The list of scalp EEG subjects above provides the subject codes
that you will use for this assignment.  The first three subjects should be
used for Problems 1.1 and 1.2.  Subsequent problems should use all subject
codes.


1. Load data from three subjects (the first three codes in the list above).
    For each subject, first extract the behavioral data and report
    basic descriptive statistics including number of sessions
    completed, number of lists per session, average number of correctly
    recalled items, prior-list intrusions, extra-list intrusions, and
    repetitions.  Report these data separately for each subject.

2. For each of the three initial subjects you will compute a power
    spectrum (frequency on the x-axis, power on the y-axis) for recalled
    and non-recalled items.  Thus, the goal is to report three graphs (one
    for each subject) with line plots indicating recalled events (red) and
    non-recalled events (blue).  To do this, first extract all encoding
    events (irrespective of recall status), then extract the LCF-corrected EEG
    signal for electrode E53 (left parietal), including a buffer of 1000 ms
    on either side of the event, and filter out line noise.  For scalp data,
    contacts are not available from reader.load('contacts'), so you will
    need to call reader.load\_eeg(...), then to\_ptsa() on the
    result, and read the channel names from .channel.values.  Here we will define
    the encoding event as the 1600 ms that the word was displayed on the
    screen, from onset to offset.  The buffer should extend before and
    after this period of time.  Check that the EEG data match the number
    of trials in the behavioral data.  To compute the power spectrum, you
    should use the wavelet transform with wavenumber 6 and compute power
    at the following 16 approximately logarithmically-spaced frequencies
    (in Hz): 1, 2, 3, 4, 6, 8, 12, 17, 25, 35, 50, 72, 103, 147, 210, 300.
    The plots should show average power across time and across events for
    each subject, separately for recalled and non-recalled events.  What
    can we learn from the overall shape of the spectra and the qualitative
    (not statistical) differences between encoding conditions?

   
3. In this problem you will repeat the above analysis for all
    subjects and make some statistical inferences
    about spectral biomarkers of successful memory encoding.  These data
    were all acquired from human participants across long stretches of
    time, and small things can go wrong.  Be sure that your code is robust
    to exceptions that can be thrown, and if you have an exception in a
    particular session due to a problem with the data in that session,
    print or log the event so you can report it, and discard that session
    from the analysis.  Before trying this, please profile your code and
    make sure that it is running efficiently.  You should know how much
    memory is required to analyze data from each subject and how long it
    takes to compute the powers and average across the time samples.
    Using the "cluster helper" you should be able to use a separate
    "core" on the cluster to process data from each subject.  Make sure
    you save out the power values as you go.  Once you have verified your
    code, compute powers for all subjects as described above and create
    power spectra for recalled and not-recalled events.  Averaging these
    power spectra across subjects, graph the average power spectra with
    95% confidence bands (transparent light red and light blue shading,
    or your favorite clearly-labelled color scheme).  Below the graph
    showing the two power spectra, plot the difference between the
    spectra, computed separately for each subject, and place a confidence
    band on the difference score.  What inferences can you now make from
    these results?
   

4. In this problem, your goal is to assess the effects of several
    important processing steps on the analyses in the previous
    problem.  Two common normalization procedures used in the
    analysis of brain signals are the log-transform, which attenuates
    extremely large values, and the z-transform, which allows you to
    normalize power values according to some baseline distribution.
    Here you will reproduce the analyses of the previous problem with
    and without each of these steps.  The log-transform can be applied
    at different points in the analysis stream: immediately after
    computing power values, after averaging power values for each
    encoding interval, or after computing the z-transform of the
    power values.  Similarly, the z-transform can normalize
    power (or log-power) values relative to many possible choices
    of the "distribution" of power (or log-power) values.  If our
    goal is to normalize data to the distribution of power values
    across a given session, how do you define the "distribution"
    over which to estimate the standard deviation of values?  Consider
    different choices and assess the effects of these choices on your
    final analysis.

5. All of the preceding analyses were conducted on a single scalp
    electrode.  Here you should repeat the analysis at all electrodes,
    generating a topographic map that shows the subsequent memory effect
    for the frequency 147 Hz.  For this problem you should both
    log-transform your power values immediately after extracting them, and
    z-transform your data based on the distribution of values across each
    session (1-23) for each subject.  For the topographic maps, use a
    color bar to indicate the difference between power for recalled and
    not-recalled items.  Use the across-subject t-test method with
    Benjamini-Hochberg FDR correction to determine which electrodes in
    each map meet the $p<0.05$ significance threshold.  Mark statistically
    significant electrodes on each topomap with some graphical element
    (star, dot, different color shading). **Use the ptsa_plot.ptsa_plot.topo.topoplot function. The ptsa_plot/ptsa_plot/electrode_layouts folder has coordinates for scalp electrode cap contacts.** Discuss your findings.
    

6. A crucial decision in analyzing EEG data is how one references
    the electrical activity: i.e., the EEG measured at a single
    channel is a difference in voltage between that channel and a
    reference point.  In this problem, you should compare the results
    of the previous step under two different referencing schemes: the
    default "average" reference that comes when you extract the data and
    a bipolar reference scheme in which an electrode is referenced to a
    nearest neighbor.  For scalp data, since bipolar pairs are not defined
    in the dataset with reader.load('pairs'), this requires using the
    contacts as explained in Problem 1.2 along with the scalp layout
    maps to identify suitable neighbors for calculating a bipolar
    difference.  Comment on which method is more robust.  Do these
    differences depend on the frequency being examined?
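
For the across-subject tests in Problems 1.3 and 1.5, one possible approach is sketched below on simulated numbers: compute each subject's recalled-minus-not-recalled difference, form a t-based 95% confidence interval and a one-sample t-test across subjects, and apply the Benjamini-Hochberg step-up procedure to the resulting p-values. The function names and the simulated array shapes are assumptions for illustration and are not part of the lab's toolbox.

```python
import numpy as np
from scipy import stats


def bh_fdr(pvals, alpha=0.05):
    """Benjamini-Hochberg: boolean mask of tests rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank that meets its threshold
        reject[order[:k + 1]] = True       # reject that test and all smaller p-values
    return reject


def subject_level_sme(rec, nrec, alpha=0.05):
    """rec, nrec: (n_subjects, n_features) mean power per subject and condition."""
    diff = rec - nrec                      # within-subject difference
    n = diff.shape[0]
    mean = diff.mean(axis=0)
    sem = diff.std(axis=0, ddof=1) / np.sqrt(n)
    half_width = stats.t.ppf(1 - alpha / 2, df=n - 1) * sem   # CI is mean +/- half_width
    t_vals, p_vals = stats.ttest_1samp(diff, 0.0, axis=0)
    return mean, half_width, t_vals, p_vals, bh_fdr(p_vals, alpha)


# Simulated data: 16 subjects, 4 electrodes, a true effect only at electrode 0
rng = np.random.default_rng(0)
nrec = rng.normal(0.0, 1.0, size=(16, 4))
effect = np.array([1.0, 0.0, 0.0, 0.0])
rec = nrec + effect + rng.normal(0.0, 0.5, size=nrec.shape)

mean, half_width, t_vals, p_vals, significant = subject_level_sme(rec, nrec)
for i in range(mean.size):
    print(f"electrode {i}: diff = {mean[i]:+.2f} +/- {half_width[i]:.2f}, "
          f"p = {p_vals[i]:.3g}, significant = {significant[i]}")
```

The same functions apply whether the features are frequencies (for the confidence bands on the difference spectrum) or electrodes (for marking significant sites on a topographic map).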

# Part II

In this part, you will learn how to analyze intracranial EEG data.
This will be your first time working with these data, so in Problem 1
you will practice loading data and examining basic attributes of the
data before conducting the subsequent memory analyses for a large
dataset.

When working with intracranial EEG data you need to bear in mind that
electrode placement differs from patient to patient, so averaging across
subjects requires mapping electrodes onto a standard space.  A key
step in the processing of these data involves using CT and MRI scans to
localize the electrodes and then matching up these localizations to a
standard brain map based on the average of many patients' brains.
Fortunately for you, this step has already been taken care of by the
researchers who assembled these data.  All you need to know is how to
obtain the location tags for each electrode in each patient.
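
Once you have the location tags, a small helper can sort electrodes into the regions analyzed in this part. The sketch below is only illustrative: the keyword rules extend the 'bankssts'/'temporal' example given in Problem 1 of the assignment below, the treatment of 'parahippocampal' as temporal lobe is an assumption, the rules are not a complete atlas mapping, and the subject codes and labels are made up.

```python
from collections import Counter

REGIONS = ("temporal", "frontal", "hippocampus")


def classify_region(label):
    """Map a location tag to 'temporal', 'frontal', 'hippocampus', or None."""
    if not isinstance(label, str):
        return None                      # missing localization
    name = label.lower()
    if "parahippocamp" in name:          # assumption: count as temporal lobe
        return "temporal"
    if "hippocamp" in name:
        return "hippocampus"
    if "temporal" in name or "bankssts" in name:
        return "temporal"
    if "frontal" in name:
        return "frontal"
    return None                          # any other region is ignored


def count_regions(labels):
    """Number of electrodes in each region of interest."""
    counts = Counter(classify_region(label) for label in labels)
    return {region: counts.get(region, 0) for region in REGIONS}


# Hypothetical tags for two made-up subjects
example_tags = {
    "SUBJ_A": ["superiortemporal", "bankssts", "rostralmiddlefrontal",
               "hippocampus", None],
    "SUBJ_B": ["middletemporal", "precuneus", "parahippocampal"],
}
table = {subject: count_regions(tags) for subject, tags in example_tags.items()}
for subject, row in table.items():
    print(subject, row)
```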

For this part, you will analyze data from the patients listed in the table of
intracranial subjects above.  Each of these patients was chosen because they
have at least three sessions of FR1/catFR1 data and their recall
performance was at least 30%.

For each of these subjects, use the following processing steps:

* Use the get\_data\_index function with CMLReader to load EEG data
  using a scheme of bipolar pairs of electrodes.
  
* Apply a Butterworth notch filter around 60 Hz (freqs = [58, 62])
  when extracting the voltage.

* Calculate power at the above frequencies with a Morlet wavelet
  with wavenumber (keyword "width") of 6 for each encoding event
  (from time 0 until 1.6 seconds after the encoding event onset) using
  a 1 second buffer.

* For each frequency, channel, and encoding event, average the
  power over the entire 1600 ms encoding period (but not over the
  buffer period!)

* Log-transform the average encoding power values as in the final
  step of the previous problem.

* Compute the within-session z-transform of these power values as in
  the previous problems.

* Some sessions for some subjects may be missing critical data used in
  a particular analysis, which can lead to exceptions being thrown.
  Include code to detect this condition, and discard sessions that cannot
  be recovered.

* In some cases you may notice artifacts in the data that manifest
  in power values of zero. These would produce problems in the
  transformation and classification, so please exclude any events with
  this issue from all analyses.
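
The averaging, exclusion, and normalization steps above can be sketched as follows, using simulated power values in place of real wavelet output. The function name, the array layout (events x channels x frequencies x samples), the use of log base 10, and the decision to drop an event when any of its averaged power values is zero are assumptions; adapt them to your own pipeline.

```python
import numpy as np


def encoding_features(power, sessions, n_buffer):
    """Average, clean, log-transform and z-score wavelet power.

    power    : (n_events, ..., n_samples) power including buffers on both sides
    sessions : (n_events,) session label for each event
    n_buffer : number of buffer samples on each side (must be > 0)
    Returns the z-scored log-power of the kept events and the boolean keep mask.
    """
    sessions = np.asarray(sessions)
    avg = power[..., n_buffer:-n_buffer].mean(axis=-1)   # encoding period only
    feature_axes = tuple(range(1, avg.ndim))
    keep = ~np.any(avg == 0, axis=feature_axes)          # drop zero-power events
    logp = np.log10(avg[keep])
    kept_sessions = sessions[keep]

    z = np.empty_like(logp)
    for s in np.unique(kept_sessions):                   # normalize within session
        idx = kept_sessions == s
        mu = logp[idx].mean(axis=0)
        sd = logp[idx].std(axis=0, ddof=1)
        z[idx] = (logp[idx] - mu) / sd
    return z, keep


# Simulated input: 40 events, 2 channels, 3 frequencies, 2 sessions
rng = np.random.default_rng(1)
n_events, n_buffer, n_epoch = 40, 10, 16
power = rng.gamma(2.0, 1.0, size=(n_events, 2, 3, 2 * n_buffer + n_epoch))
power[5] = 0.0                                           # simulated artifact event
sessions = np.repeat([0, 1], n_events // 2)

z, keep = encoding_features(power, sessions, n_buffer)
print("events kept:", int(keep.sum()), "of", n_events)
print("feature array shape:", z.shape)
```

Returning the keep mask makes it easy to apply the same exclusion to the behavioral events, so that recalled and not-recalled labels stay aligned with the power values.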

## Assignment:
  1. Find all subjects (from among those in the table above)
    who have electrodes localized to the following three regions:
    temporal lobe, frontal lobe, and hippocampus.  For example, electrodes whose
    ind.region values contain 'bankssts' or 'temporal' are in the temporal
    lobe.  Create a table with the number of electrodes in each of these
    regions for each subject.  Create a table that reports, for each of
    these subjects, the average number of recalled and not-recalled
    (encoding) events and also the number of intrusion errors (combining
    both PLIs and ELIs).   Also, report the percent recall for each
    subject.

  2. Conduct a parallel analysis to the one you carried out in Problem
    1.5, but this time instead of topographic maps you will just analyze
    the data for each of the three brain regions noted above.  Conduct
    these analyses on only those subjects who have at least one electrode
    in each of the temporal lobe and frontal lobe regions, and plot these
    two regions in the resulting figure.  Because subjects will have
    different numbers of electrodes in each of these regions, you need to
    decide how to combine the data across electrodes.  Think of reasonable
    ways to do this and try each of them.  Report all of the approaches
    that you used and document them.  What are the strengths and
    weaknesses of each approach?  There may not be a "right" answer
    here; use your best judgment and argue for the method you feel is
    best.

  3. For the preceding analysis, you used bipolar referencing.  As with
    scalp EEG you can use either bipolar or an average reference.  With
    intracranial EEG, however, the average reference is not always well
    defined because each subject will have their own idiosyncratic
    arrangement of electrodes.  For this problem, create an average
    reference by first computing an average across electrodes within each
    lobe of the brain being analyzed, then average the averages across the
    lobes but only for lobes that have at least 5 electrodes.  This will
    ensure that each lobe is equally weighted within the overall average.
    Recompute the analysis in the previous problem using this method.
    Compare your results and comment on the strengths and weaknesses of
    each.
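
One reading of the lobe-weighted average reference in Problem 3 is sketched below on made-up voltages. It assumes that you start from recordings that have not been bipolar-referenced, that the resulting reference is subtracted from every electrode (including those in lobes with fewer than five contacts), and that a lobe label is already available for each electrode; the function name is hypothetical.

```python
import numpy as np


def lobe_average_reference(voltage, lobes, min_electrodes=5):
    """Re-reference to the mean of per-lobe means.

    voltage : (n_electrodes, n_samples) array
    lobes   : lobe label for each electrode
    Only lobes with at least min_electrodes contacts enter the reference.
    """
    lobes = np.asarray(lobes)
    lobe_means = []
    for lobe in np.unique(lobes):
        members = lobes == lobe
        if members.sum() >= min_electrodes:
            lobe_means.append(voltage[members].mean(axis=0))
    if not lobe_means:
        raise ValueError("no lobe has enough electrodes to build a reference")
    reference = np.mean(lobe_means, axis=0)   # each qualifying lobe weighted equally
    return voltage - reference, reference


# Made-up example: 6 temporal, 5 frontal and 2 parietal electrodes, 4 samples each
lobes = ["temporal"] * 6 + ["frontal"] * 5 + ["parietal"] * 2
voltage = np.vstack([np.full((6, 4), 1.0),
                     np.full((5, 4), 3.0),
                     np.full((2, 4), 100.0)])

referenced, reference = lobe_average_reference(voltage, lobes)
print("reference:", reference)
```

In this example the temporal and frontal lobes contribute equally to the reference even though they have different numbers of contacts, and the two parietal contacts do not enter it at all.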