## Explore the RAM database

The CML's database of intracranial and scalp EEG comes in a pandas dataframe format. All the pertinent data about each experimental session is recorded in a row of a dataframe. The distributed example data consists of 20 intracranial EEG participants in FR1 (a non-stimulation free-recall experiment) and catFR1 (a non-stimulation categorized free-recall experiment).

Let's load the example database to get a better sense of this data format. We're going to use the **cmlreaders** package (imported below as `cml`), a small helper library that loads the example data.

In [1]:
# First, our import statements.
import pandas as pd
import numpy as np
import cmlreaders as cml

# We load the dataframe for all sessions.
df = cml.get_data_index()

In [2]:
# This dataframe contains index information about the experimental sessions in the example dataset.
df[:10]

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
0,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,0,,,0,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
1,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,1,,,1,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
2,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,10,,,10,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
3,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,11,,,11,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
4,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,12,,,12,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
5,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,13,,,13,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
6,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,14,,,14,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
7,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,15,,,15,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
8,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,16,,,16,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
9,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,17,,,17,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...


In [3]:
# Let's see what experiments we have access to
df['experiment'].unique()

array(['ltpFR2', 'VFFR', 'ltpRepFR', 'NiclsCourierClosedLoop',
       'NiclsCourierReadOnly', 'ltpDBOY1', 'prelim', 'FR1', 'FR2', 'PAL1',
       'YC1', 'PAL2', 'catFR1', 'YC2', 'catFR2', 'PS1', 'PS3', 'PS2',
       'TH1', 'FR3', 'PS2.1', 'PAL3', 'TH3', 'OPS', 'RepFR1', 'catFR3',
       'FR5', 'PS4_catFR', 'THR', 'PS4_FR', 'PAL5', 'THR1', 'catFR5',
       'PS4_catFR5', 'FR6', 'PS5_catFR', 'catFR6', 'TICL_FR',
       'LocationSearch', 'TICL_catFR', 'DBOY1', 'RepFR2', 'pyFR'],
      dtype=object)

**Exercise: How many sessions in the example data were run on Jefferson subjects? (Hint: The last letter of the subject code is for the hospital location)**
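As a starting point for site-level questions like this one, it can help to pull the final character of each subject code into its own column. The sketch below runs on a small made-up index: the subject codes, experiments, and session numbers are hypothetical stand-ins, not values from the real database, and the code makes no assumption about which letter belongs to which hospital.

```python
import pandas as pd

# Hypothetical stand-in for the session index; the real one comes from cml.get_data_index().
toy_index = pd.DataFrame({
    "subject": ["R0001A", "R0001A", "R0002B", "R0003A", "R0003A", "R0003A"],
    "experiment": ["FR1", "FR1", "FR1", "catFR1", "catFR1", "FR1"],
    "session": [0, 1, 0, 0, 1, 0],
})

# The last character of the subject code is the site letter.
toy_index["site"] = toy_index["subject"].str[-1]

# Each row is one session, so counting rows per site counts sessions per site.
sessions_per_site = toy_index.groupby("site").size()
print(sessions_per_site)
```

The same two steps (derive a column, then group on it) should carry over to the real `df` once you know which site letter you are looking for.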

## Overview of the CML experiment types

### Verbal free-recall tasks (no-stim)
* FR1
* catFR1

### Paired-associates tasks
* PAL1
* PAL2 (open-loop stim)
* PAL3 (closed-loop stim)
* PAL5 (closed-loop stim)

### Spatial navigation tasks
* YC1
* TH1
* THR
* THR1
* YC2 (open-loop stim)
* TH3 (closed-loop stim)

### Verbal free-recall w/ stim
(Basically, any FR task with a number above 1 somewhere)
* FR2 (open-loop)
* catFR2
* FR3 (closed-loop)
* catFR3
* FR5 (closed-loop)
* catFR5
* PS4_FR (closed-loop)
* PS4_catFR (closed-loop)
* PS5_catFR (closed-loop)
* FR6 (multi-target stim)
* catFR6 (multi-target stim)
* TICL_FR (encoding/math/retrieval stim)

### No-task stimulation ("parameter search")
* PS1
* PS2/PS2.1
* PS3
* LocationSearch


In [4]:
# And now let's find all the subjects who did the FR1 task
fr1_df = df.query('experiment == "FR1"')
fr1_df['subject'].unique()

array(['R1380D', 'R1111M', 'R1332M', 'R1377M', 'R1065J', 'R1385E',
       'R1189M', 'R1390M', 'R1391T', 'R1401J', 'R1361C', 'R1060M',
       'R1350D', 'R1378T', 'R1375C', 'R1383J', 'R1354E', 'R1292E'],
      dtype=object)
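Once the index is filtered to one experiment, a natural follow-up is to ask how many sessions each subject contributed. Here is a minimal sketch of that step on a made-up index (the subject codes and session numbers are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the session index; the real one comes from cml.get_data_index().
toy_index = pd.DataFrame({
    "subject": ["R0001A", "R0001A", "R0002B", "R0002B", "R0002B", "R0003A"],
    "experiment": ["FR1", "catFR1", "FR1", "FR1", "FR1", "catFR1"],
    "session": [0, 0, 0, 1, 2, 0],
})

# Keep only the FR1 rows, as in the cell above.
fr1_toy = toy_index.query('experiment == "FR1"')

# Count the distinct FR1 sessions for each subject.
n_sessions = fr1_toy.groupby("subject")["session"].nunique()
print(n_sessions)
```

Subjects with no FR1 rows simply do not appear in the result, which is worth remembering when looping over subjects later.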

### Load data from an example subject
Here, let's go through an example of loading experimental events and EEG from one subject.

In [4]:
df = cml.get_data_index()

#Specify which subject and experiment we want
sub = 'R1111M'
exp = 'FR1'

#Find out the sessions for this subject
sessions = list(df[(df['subject']==sub) & (df['experiment']==exp)]['session'])


In [5]:
print(sub+' sessions: '+str(sessions))

R1111M sessions: [0, 1, 2, 3]


#### Load experimental events

This subject completed four sessions of FR1. Let's load data from the first session. First, we'll select the rows of the index we want, then use the metadata in one of those rows to initialize a `CMLReader` object.

In [8]:
df_select = df[(df['subject']==sub) & (df['experiment']==exp)]
df_sess = df_select.iloc[0]
print(df_sess)

Recognition                                                          NaN
all_events             protocols/r1/subjects/R1111M/experiments/FR1/s...
contacts               protocols/r1/subjects/R1111M/localizations/0/m...
experiment                                                           FR1
import_type                                                        build
localization                                                           0
math_events            protocols/r1/subjects/R1111M/experiments/FR1/s...
montage                                                                0
original_experiment                                                  NaN
original_session                                                       0
pairs                  protocols/r1/subjects/R1111M/localizations/0/m...
ps4_events                                                           NaN
session                                                                0
subject                                            

In [9]:
# Initialize data reader using session metadata
reader = cml.CMLReader(subject=df_sess['subject'], experiment=df_sess['experiment'], session=df_sess['session'],
                       localization=df_sess['localization'], montage=df_sess['montage'])

# For first session...
evs = reader.load("events")
print(evs[100:103])

     eegoffset  answer                    eegfile exp_version experiment  \
100     230205    -999  R1111M_FR1_0_22Jan16_1638        1.05        FR1   
101     231472    -999  R1111M_FR1_0_22Jan16_1638        1.05        FR1   
102     232280    -999  R1111M_FR1_0_22Jan16_1638        1.05        FR1   

     intrusion  is_stim  iscorrect item_name  item_num  ...  protocol  \
100       -999        0       -999      DOLL        78  ...        r1   
101       -999        0       -999       BED        18  ...        r1   
102       -999        0       -999         X      -999  ...        r1   

     recalled  rectime  serialpos session  stim_list  stim_params  subject  \
100         0     -999         11       0          0           []   R1111M   
101         0     -999         12       0          0           []   R1111M   
102         0     -999       -999       0          0           []   R1111M   

          test            type  
100  [0, 0, 0]            WORD  
101  [0, 0, 0]         

The events dataframe contains information about everything that happened during an experimental session. It indicates the time at which every word appeared on the screen, and when those words were later recalled. It also contains information about events that you might not care about, such as when the countdown timer starts and ends.
<center>
<img src="https://github.com/esolomon/PythonBootcamp2019/blob/master/figures/task_design-01.jpg?raw=true" width=650>
</center>
Let's take a look at all the columns in this dataframe.

In [10]:
evs.columns

Index(['eegoffset', 'answer', 'eegfile', 'exp_version', 'experiment',
       'intrusion', 'is_stim', 'iscorrect', 'item_name', 'item_num', 'list',
       'montage', 'msoffset', 'mstime', 'protocol', 'recalled', 'rectime',
       'serialpos', 'session', 'stim_list', 'stim_params', 'subject', 'test',
       'type'],
      dtype='object')

* 'eegoffset' indicates where (in samples) in the EEG file this event occurred. CMLReaders needs this info, but usually you won't need to deal with it directly.
* 'eegfile' is the path to the corresponding file where raw EEG is saved.
* 'experiment' is the behavioral task we're looking at. 
* 'intrusion' is an indicator of intrusion events during the recall period. -1 indicates an extra-list intrusion, otherwise, it's the list number from which the word came.
* 'is_stim' flags whether stimulation occurred during this event. We won't be dealing with stimulation data in this bootcamp. 
* <b>'item_name'</b> is the word that was presented or recalled.
* 'item_num' is the index for this word in the word pool. 
* 'list' is the list number. 
* 'montage' is the subject montage, which you loaded earlier.
* 'mstime' is a time indicator, in ms. Good for comparing between events, but the absolute value is meaningless. 
* <b>'recalled'</b> is an indicator of whether an encoding word was later recalled successfully.
* 'rectime' is the time, in ms, when a word was recalled relative to the start of the recall period for that list.
* <b>'serialpos'</b> is the serial position of a presented/recalled word
* 'stim_list' is an indicator of whether stimulation was active during this list. 
* 'stim_params' is a dictionary of stimulation parameters.
* 'subject' is the subject you're analyzing!
* <b>'type'</b> is the type of event, e.g. 'WORD' or 'REC_WORD'

Please see https://pennmem.github.io/cmlreaders/html/events.html for even more information!
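To make the 'intrusion' column more concrete, the sketch below labels each recall event by kind. It runs on a small made-up table rather than real events, and it assumes a particular coding: -1 for an extra-list intrusion (as described above), 0 for a correct recall, and a positive value for a word that came from an earlier list. Check these assumptions against the events documentation before relying on them.

```python
import numpy as np
import pandas as pd

# Hypothetical recall events with only the columns needed here.
toy_recalls = pd.DataFrame({
    "type": ["REC_WORD"] * 5,
    "item_name": ["BEAR", "WING", "CHAIR", "LAMP", "DOOR"],
    "intrusion": [0, 0, -1, 2, 0],
})

# Assumed coding: 0 = correct recall, -1 = extra-list intrusion, > 0 = prior-list intrusion.
conditions = [
    toy_recalls["intrusion"] == 0,
    toy_recalls["intrusion"] == -1,
    toy_recalls["intrusion"] > 0,
]
labels = ["correct", "extra-list intrusion", "prior-list intrusion"]

# np.select picks the label of the first matching condition for each row.
toy_recalls["recall_kind"] = np.select(conditions, labels, default="other")
print(toy_recalls["recall_kind"].value_counts())
```

Any value that does not match the assumed coding (for example the -999 placeholder seen on non-recall events) falls into the "other" label instead of being silently miscounted.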

Say we're just interested in analyzing word encoding events. To filter by event type, use handy pandas functionality:

In [11]:
# An optional command that lets us view the full dataframe within Jupyter notebooks.
pd.set_option('display.max_columns', 100)

In [12]:
word_evs = evs[evs['type']=='WORD']
word_evs[:10]

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,list,montage,msoffset,mstime,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
27,100520,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,BEAR,17,1,0,1,1453499295325,r1,1,5210,1,0,0,[],R1111M,"[0, 0, 0]",WORD
28,101829,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,WING,294,1,0,1,1453499297942,r1,1,5748,2,0,0,[],R1111M,"[0, 0, 0]",WORD
29,103113,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,DOOR,79,1,0,1,1453499300510,r1,1,7882,3,0,0,[],R1111M,"[0, 0, 0]",WORD
30,104329,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,PLANT,188,1,0,1,1453499302943,r1,1,6815,4,0,0,[],R1111M,"[0, 0, 0]",WORD
31,105638,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,ROOT,204,1,0,1,1453499305561,r1,0,-999,5,0,0,[],R1111M,"[0, 0, 0]",WORD
32,106897,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,LEAF,146,1,0,1,1453499308078,r1,0,-999,6,0,0,[],R1111M,"[0, 0, 0]",WORD
33,108105,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,SNOW,236,1,0,1,1453499310495,r1,0,-999,7,0,0,[],R1111M,"[0, 0, 0]",WORD
34,109372,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,BLOOM,22,1,0,1,1453499313029,r1,0,-999,8,0,0,[],R1111M,"[0, 0, 0]",WORD
35,110580,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,STRAW,257,1,0,1,1453499315446,r1,0,-999,9,0,0,[],R1111M,"[0, 0, 0]",WORD
36,111813,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,BUSH,38,1,0,1,1453499317912,r1,0,-999,10,0,0,[],R1111M,"[0, 0, 0]",WORD


Applying this kind of filter is useful if you're only interested in analyzing one kind of event. For instance, we could also just find recall events:

In [13]:
rec_evs = evs[evs['type']=='REC_WORD']
rec_evs[:10]

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,list,montage,msoffset,mstime,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
47,130521,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,BEAR,17,1,0,20,1453499355330,r1,1,5210,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
48,130790,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,WING,294,1,0,20,1453499355868,r1,1,5748,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
49,131324,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,PLANT,188,1,0,20,1453499356935,r1,1,6815,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
50,131857,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,DOOR,79,1,0,20,1453499358002,r1,1,7882,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
51,135459,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,TOY,277,1,0,20,1453499365206,r1,1,15086,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
78,180505,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,DEER,72,2,0,20,1453499455303,r1,1,2863,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
79,181022,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,MULE,163,2,0,20,1453499456338,r1,1,3898,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
80,181875,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,SLUSH,231,2,0,20,1453499458043,r1,1,5603,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
81,182389,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,PIPE,185,2,0,20,1453499459071,r1,1,6631,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
82,184110,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,SPRING,244,2,0,20,1453499462514,r1,1,10074,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
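Filters like these can also be combined. The sketch below, on a small made-up events table (the rows are hypothetical, not taken from R1111M), selects encoding events from one list that were later recalled by joining three conditions with `&`.

```python
import pandas as pd

# Hypothetical events table with only the columns needed here.
toy_evs = pd.DataFrame({
    "type": ["WORD", "WORD", "WORD", "REC_WORD", "WORD", "REC_WORD"],
    "list": [1, 1, 1, 1, 2, 2],
    "item_name": ["BEAR", "WING", "DOOR", "BEAR", "DEER", "DEER"],
    "recalled": [1, 0, 0, 1, 1, 1],
})

# Each condition needs its own parentheses because & binds more tightly than ==.
mask = (
    (toy_evs["type"] == "WORD")
    & (toy_evs["list"] == 1)
    & (toy_evs["recalled"] == 1)
)
list1_recalled_words = toy_evs[mask]
print(list1_recalled_words)
```

Building the mask as a named variable first makes it easy to inspect (`mask.sum()` gives the number of matching rows) before indexing the dataframe with it.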


**Exercise: What is R1111M's overall recall percent correct?**

**Exercise: Plot the distribution of inter-response times for R1111M's recalled words**

### Assignment 2

In this assignment you will analyze behavioral data from a big-data study of human memory, using examples from the FR1 dataset. FR1 is a free recall experiment in which each session consists of 24 lists, each presenting a series of 12 common words for encoding. Each list is followed by a distractor and then a recall period during which participants attempt to recall as many of the words as they can, in any order (free recall). Chapter 1 of Electrophysiology of Human Memory summarizes the principles of recall that you will evaluate in this assignment.

1. Using the events structure for subject R1111M, select the WORD-type events for the first FR1 session, and use the 'recalled' status of each word presentation to create a serial position curve. A serial position curve plots the serial position of each word presentation within its list on the horizontal axis and the probability of recall on the vertical axis, obtained from the fraction of words successfully recalled. Hint: Looking at evs.columns can help you figure out where to find the information you need.

2. The Lag Conditional Response Probability (Lag-CRP) is a calculation comparing the serial proximity of recalled words to the possible serial proximities.  See http://memory.psych.upenn.edu/CRP_Tutorial for an example of the concept.  Using the first FR1 session of R1292E, calculate and plot the Lag-CRP for the REC_WORD events.  Repeats and intrusions (words not from the list) can appear in the recall events, and must be dealt with.

3. If time permits, extend the serial position curve work of part 1, by putting your code to generate the serial position curve data into a function, and use this in a loop to calculate an average serial position curve across all FR1 sessions for all subjects in the example data set.  Then plot this average FR1 serial position curve.

4. If time permits, repeat step 3, but switch the experiment type to catFR1, a categorized free recall experiment in which adjacent pairs of presented words are categorically associated (e.g., 'dog' and 'cat'). Explain any differences you see between the average catFR1 serial position curve and the FR1 one, and try to interpret them in terms of the categorical associations.
