# Assignment 1: Behavioral Analysis of Memory Search
Please submit this assignment to Canvas as a Jupyter notebook (.ipynb).  The assignment will introduce you to the data and some of the techniques we will be using for analyses.  At the end, you will be asked to complete a fundamental analysis in memory science: the serial position curve.

## Explore the RAM database
The Computational Memory Lab's database of intracranial and scalp EEG uses pandas dataframes to format data.  We will start by loading the dataframe for all experimental sessions.

In [46]:
# imports
import pandas as pd; pd.set_option('display.max_columns', None)
import cmlreaders as cml
import numpy as np

# load dataframe of all sessions
df = cml.get_data_index()
df[:10]                     # show the first 10 entries

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
0,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,0,,,0,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
1,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,1,,,1,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
2,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,10,,,10,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
3,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,11,,,11,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
4,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,12,,,12,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
5,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,13,,,13,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
6,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,14,,,14,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
7,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,15,,,15,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
8,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,16,,,16,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
9,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,17,,,17,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...


All the pertinent data about each experimental session is recorded in a row of the dataframe.  For our purposes, we will be most interested in the "subject", "experiment", and "session" columns.  We will also use the 'localization' and 'montage' columns for intracranial data -- more on that to come.

In [47]:
# let's select a row of the dataframe and extract the important info
row = df.iloc[500]          # an arbitrary row
display(df[500:501]); print(f'Subject = {row.subject}, experiment = {row.experiment}, session = {row.session}')

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
500,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,20,,,20,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...


Subject = LTP093, experiment = ltpFR2, session = 20


In [48]:
# let's see what experiments we have access to
df['experiment'].unique()

array(['ltpFR', 'ltpFR2', 'VFFR', 'ltpRepFR', 'NiclsCourierClosedLoop',
       'NiclsCourierReadOnly', 'ltpDelayRepFRReadOnly', 'ltpDBOY1',
       'prelim', 'EFRCourierOpenLoop', 'EFRCourierReadOnly', 'FR1', 'FR2',
       'PAL1', 'YC1', 'PAL2', 'catFR1', 'YC2', 'catFR2', 'PS1', 'PS3',
       'PS2', 'TH1', 'FR3', 'PS2.1', 'PAL3', 'TH3', 'OPS', 'RepFR1',
       'catFR3', 'FR5', 'PS4_catFR', 'THR', 'PS4_FR', 'PAL5', 'THR1',
       'catFR5', 'PS4_catFR5', 'FR6', 'PS5_catFR', 'catFR6', 'TICL_FR',
       'LocationSearch', 'TICL_catFR', 'DBOY1', 'CatFR6', 'RepFR2',
       'ICatFR1', 'pyFR'], dtype=object)

## Question 1: Navigating the database

1) How many sessions of 'VFFR' were run?  How many subjects did at least 1 session of 'VFFR'?
2) How many sessions were run at Jefferson hospital? (Hint: the subject codes for this hospital end in the letter "J".)
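
As a warm-up, here is a minimal sketch of the kind of pandas operations these questions call for, run on a small made-up index rather than the real one. The toy rows, the subject codes, and the experiment names 'EXP_A' and 'EXP_B' are invented for illustration and do not come from the database.

```python
import pandas as pd

# Toy stand-in for the session index; every value here is invented.
toy_index = pd.DataFrame({
    'subject':    ['S001A', 'S001A', 'S002J', 'S003J', 'S003J'],
    'experiment': ['EXP_A', 'EXP_A', 'EXP_A', 'EXP_B', 'EXP_A'],
    'session':    [0, 1, 0, 0, 0],
})

# Each row is one session, so counting rows counts sessions.
exp_a = toy_index[toy_index['experiment'] == 'EXP_A']
n_sessions = len(exp_a)

# nunique() counts each subject once, however many sessions they ran.
n_subjects = exp_a['subject'].nunique()

# String methods on a column are available through the .str accessor.
ends_in_j = toy_index[toy_index['subject'].str.endswith('J')]
n_j_sessions = len(ends_in_j)

print(n_sessions, n_subjects, n_j_sessions)
```

The same pattern (filter the rows, then count rows or unique subjects) should carry over to `df`, but check the result against a few rows by eye.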

In [49]:
# Question 1.1
### YOUR CODE HERE

In [50]:
# Question 1.2
### YOUR CODE HERE

## Loading Behavioral Data
In order to load the data for a session, we need some functionality from the lab-developed "cmlreaders" package, which we imported above.  Essentially, we need to select a session, instantiate a "Reader" object, and then load the data we want.  For now, let's just worry about the behavioral data, or "events"; later we will also be interested in "contacts", "pairs", and the EEG data.

Our lab collects two types of data: scalp EEG and intracranial EEG.  For the purposes of this assignment, the only difference we are concerned with is the specifics of how we load each type of data.  So let's load a session of ltpFR2 (scalp) and a session of FR1 (intracranial).

In [51]:
# let's find subjects who did the ltpFR2 experiment
ltpFR2_df = df.query("experiment == 'ltpFR2'")
ltpFR2_df['subject'].unique()[:20]

array(['LTP093', 'LTP106', 'LTP115', 'LTP117', 'LTP122', 'LTP123',
       'LTP133', 'LTP138', 'LTP187', 'LTP207', 'LTP210', 'LTP228',
       'LTP229', 'LTP236', 'LTP242', 'LTP246', 'LTP249', 'LTP250',
       'LTP251', 'LTP258'], dtype=object)

In [52]:
# we'll pick LTP093 and select out this subject's ltpFR2 sessions
sub_scalp = 'LTP093'
exp_scalp = 'ltpFR2'
df_select = df[(df['subject'] == sub_scalp) & (df['experiment'] == exp_scalp)]
display(df_select); print(f'{sub_scalp} sessions: {np.array(df_select.session)}')

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
487,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,0,,,0,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
488,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,1,,,1,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
489,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,10,,,10,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
490,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,11,,,11,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
491,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,12,,,12,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
492,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,13,,,13,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
493,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,14,,,14,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
494,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,15,,,15,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
495,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,16,,,16,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...
496,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,17,,,17,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...


LTP093 sessions: [ 0  1 10 11 12 13 14 15 16 17 18 19  2 20 21 22 23  3  4  5  6  7  8  9]


In [53]:
# let's load the data from the first session
df_sess = df_select.iloc[0]         # select 1 row

# instantiate a Reader object using session metadata
# subjects beginning with 'LTP' are scalp subjects, so we don't need to specify the localization and montage
reader = cml.CMLReader(subject=df_sess['subject'], experiment=df_sess['experiment'], session=df_sess['session'])
# load the behavioral events
evs = reader.load('events')
evs

Unnamed: 0,eegoffset,answer,begin_distractor,begin_math_correct,eegfile,eogArtifact,experiment,final_distractor,final_math_correct,intruded,intrusion,iscorrect,item_name,item_num,list,montage,msoffset,mstime,phase,protocol,recalled,rectime,serialpos,session,subject,test,trial,type
0,208336,-999,-999,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,-999,-999,0,-999,-999,,-999,-999,0,0,1409670982007,,ltp,0,-999,-999,0,LTP093,"[0, 0, 0]",-999,SESS_START
1,249459,-999,-999,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,-999,-999,-999,-999,-999,,-999,-1,0,0,1409671064252,,ltp,-999,-999,-999,0,LTP093,"[-999, -999, -999]",-999,START
2,249475,24,-999,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,-999,-999,-999,-999,1,,-999,-1,0,1,1409671064284,,ltp,-999,5611,-999,0,LTP093,"[7, 8, 9]",-999,PROB
3,252312,20,-999,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,-999,-999,-999,-999,1,,-999,-1,0,1,1409671069958,,ltp,-999,2849,-999,0,LTP093,"[3, 8, 9]",-999,PROB
4,253768,11,-999,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,-999,-999,-999,-999,1,,-999,-1,0,1,1409671072871,,ltp,-999,2169,-999,0,LTP093,"[2, 6, 3]",-999,PROB
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1336,2788365,-999,24000,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,24000,-999,0,0,-999,STAPLE,1371,-999,0,20,1409676141988,,ltp,0,10186,15,0,LTP093,"[0, 0, 0]",24,REC_WORD
1337,2790926,-999,24000,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,24000,-999,0,0,-999,PATROL,1027,-999,0,20,1409676147110,,ltp,0,15308,18,0,LTP093,"[0, 0, 0]",24,REC_WORD
1338,2791475,-999,24000,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,24000,-999,0,0,-999,SERVANT,1271,-999,0,20,1409676148208,,ltp,0,16406,19,0,LTP093,"[0, 0, 0]",24,REC_WORD
1339,2798827,-999,24000,-999,/protocols/ltp/subjects/LTP093/experiments/ltp...,-1,ltpFR2,24000,-999,0,0,-999,CASHEW,253,-999,0,20,1409676162911,,ltp,0,31109,1,0,LTP093,"[0, 0, 0]",24,REC_WORD


And we have our behavioral events!  Now let's repeat the process for an intracranial session, noting the small difference in how we define our reader.

In [54]:
# let's find subjects who did the FR1 experiment
FR1_df = df.query("experiment == 'FR1'")
FR1_df['subject'].unique()[50:70]

array(['R1084T', 'R1086M', 'R1089P', 'R1092J', 'R1093J', 'R1094T',
       'R1096E', 'R1098D', 'R1100D', 'R1101T', 'R1102P', 'R1104D',
       'R1105E', 'R1106M', 'R1108J', 'R1111M', 'R1112M', 'R1113T',
       'R1114C', 'R1115T'], dtype=object)

In [55]:
# we'll pick R1111M, and select out this subject's FR1 sessions
sub = 'R1111M'
exp = 'FR1'
df_select = df[(df['subject'] == sub) & (df['experiment'] == exp)]
display(df_select); print(f'{sub} sessions: {np.array(df_select.session)}')

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
623,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,0,protocols/r1/subjects/R1111M/localizations/0/m...,,0,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...
624,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,1,protocols/r1/subjects/R1111M/localizations/0/m...,,1,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...
625,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,2,protocols/r1/subjects/R1111M/localizations/0/m...,,2,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...
626,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,3,protocols/r1/subjects/R1111M/localizations/0/m...,,3,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...


R1111M sessions: [0 1 2 3]


In [56]:
# let's load the data from the first session
df_sess = df_select.iloc[0]         # select 1 row

# instantiate a Reader object using session metadata
# subjects beginning with 'R' are intracranial subjects, so we must specify the localization and montage
reader = cml.CMLReader(subject=df_sess['subject'], experiment=df_sess['experiment'], session=df_sess['session'], 
                       localization=df_sess['localization'], montage=df_sess['montage'])
# load the behavioral events
evs = reader.load('events')
evs

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,list,montage,msoffset,mstime,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
0,23207,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,-999,0,0,1453499140690,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",SESS_START
1,32311,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,-999,0,0,1453499158898,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",COUNTDOWN_START
2,37524,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,-999,0,0,1453499169326,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",COUNTDOWN_END
3,39379,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,WATCH,-1,-1,0,1,1453499173035,r1,0,-999,0,0,0,[],R1111M,"[0, 0, 0]",PRACTICE_WORD
4,40579,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,RHINO,-1,-1,0,1,1453499175435,r1,0,-999,1,0,0,[],R1111M,"[0, 0, 0]",PRACTICE_WORD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
760,1435126,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,PEACH,176,24,0,20,1453501964686,r1,1,14747,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
761,1442760,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,24,0,1,1453501979955,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",REC_END
762,1583983,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,25,0,0,1453502262417,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",TRIAL
763,1583983,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,25,0,0,1453502262417,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",COUNTDOWN_START
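
The only difference between the two readers above is whether `localization` and `montage` are passed. One way to avoid repeating that decision is a small helper that builds the keyword arguments from a row of the index. This is only a sketch: `reader_kwargs` is a hypothetical name, not part of cmlreaders, and it assumes that every subject whose code does not start with 'LTP' should be treated as intracranial.

```python
def reader_kwargs(row):
    """Build keyword arguments for cml.CMLReader from one row of the session index.

    Hypothetical helper for illustration; not part of cmlreaders.
    """
    kwargs = {
        'subject': row['subject'],
        'experiment': row['experiment'],
        'session': row['session'],
    }
    # Scalp subjects (codes starting with 'LTP') need nothing more.
    # Assumption: any other subject is intracranial and needs both extras.
    if not str(row['subject']).startswith('LTP'):
        kwargs['localization'] = row['localization']
        kwargs['montage'] = row['montage']
    return kwargs


# Plain dicts stand in for dataframe rows in this demonstration.
scalp_row = {'subject': 'LTP093', 'experiment': 'ltpFR2', 'session': 0,
             'localization': 0, 'montage': 0}
ieeg_row = {'subject': 'R1111M', 'experiment': 'FR1', 'session': 0,
            'localization': 0, 'montage': 0}

scalp_kwargs = reader_kwargs(scalp_row)
ieeg_kwargs = reader_kwargs(ieeg_row)
print(sorted(scalp_kwargs))
print(sorted(ieeg_kwargs))
```

With a real row such as `df_sess`, the reader could then be created as `cml.CMLReader(**reader_kwargs(df_sess))`.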


Before digging into the data, a little information about the experiment.

In free recall experiments, subjects memorize a series of word lists during an experimental session. Each list consists of an encoding period, during which the words (or "items") in the list are presented to the subject one by one, followed by a retrieval period during which the subject recalls as many words as they can in any order. The experiment is called "free" recall because the subjects are "free" to recall the items in any order; this is in contrast to a serial recall experiment, in which subjects must recall the items in the order they were presented.

Some free recall experiments contain a "distractor" period between the end of the encoding period and the beginning of the retrieval period. The purpose of the distractor is to "clear out" subjects' minds before starting to recall items; without the distractor, subjects are far more likely to recall items from the end of the list in what is known as the recency effect. In the experiments described here, the distractor period consists of a series of arithmetic problems of the form 'A + B + C', to which subjects respond by typing the answer. The FR1 experiment also begins each list by presenting the subject with a countdown period (10, 9, 8, ...).

The events dataframe contains information about everything that happened during an experimental session. It indicates the time at which every word appeared on the screen, and when those words were later recalled. It also contains information about events that you might not care about, such as when the countdown timer starts and ends.
<center>
<img src="https://github.com/esolomon/PythonBootcamp2019/blob/master/figures/task_design-01.jpg?raw=true" width=650>
</center>
Let's take a look at all the columns in this dataframe...

In [57]:
evs.columns

Index(['eegoffset', 'answer', 'eegfile', 'exp_version', 'experiment',
       'intrusion', 'is_stim', 'iscorrect', 'item_name', 'item_num', 'list',
       'montage', 'msoffset', 'mstime', 'protocol', 'recalled', 'rectime',
       'serialpos', 'session', 'stim_list', 'stim_params', 'subject', 'test',
       'type'],
      dtype='object')

... and here is what some of the important ones mean. 
* 'eegoffset' indicates where (in samples) in the EEG file this event occurred. CMLReaders needs this info, but usually you won't need to deal with it directly.
* 'answer' is the participant's response to a math distractor problem.
* 'experiment' is the behavioral task we're looking at. 
* 'intrusion' is an indicator of intrusion events during the recall period. -1 indicates an extra-list intrusion; otherwise, the value is the number of lists prior from which the word came (so 0 means the word came from the current list).
* 'item_name' is the word that was presented or recalled.
* **'list' is the list number (in the ltpFR2 scalp EEG dataset, the list number is instead contained in the 'trial' column).** 
* 'mstime' is a time indicator, in ms. Good for comparing between events, but the absolute value is meaningless. 
* 'recalled' is an indicator of whether an encoding word was later recalled successfully.
* 'rectime' is the time, in ms, when a word was recalled relative to the start of the recall period for that list.
* 'serialpos' is the serial position of a presented/recalled word.
* 'session' is the session number the data came from.
* 'subject' is the subject you're analyzing!
* 'type' is the type of event, e.g. 'WORD' or 'REC_WORD'.

Please see https://pennmem.github.io/cmlreaders/html/events.html for more information!
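
The coding of the 'intrusion' column described above can be turned into readable labels with a small mapping function. The sketch below uses invented recall events. `classify_recall` is a hypothetical helper, and it assumes that a value of 0 marks a correct recall from the current list.

```python
import pandas as pd

# Invented recall events; only the 'intrusion' coding follows the description above.
toy_recalls = pd.DataFrame({
    'item_name': ['BEAR', 'WING', 'CHAIR', 'LAMP'],
    'intrusion': [0, 0, -1, 2],
})


def classify_recall(intrusion):
    """Translate an 'intrusion' code into a label (hypothetical helper)."""
    if intrusion == -1:
        return 'extra-list intrusion'
    if intrusion > 0:
        # A positive value counts how many lists back the word was presented.
        return 'prior-list intrusion'
    # Assumption: 0 means the word came from the current list.
    return 'correct recall'


toy_recalls['recall_kind'] = toy_recalls['intrusion'].map(classify_recall)
print(toy_recalls)
```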

Two types of events that we are often interested in analyzing are encoding ("WORD") events and recall ("REC_WORD") events.

In [58]:
# let's filter for encoding events
word_evs = evs[evs['type'] == 'WORD']
word_evs.head()

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,list,montage,msoffset,mstime,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
27,100520,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,BEAR,17,1,0,1,1453499295325,r1,1,5210,1,0,0,[],R1111M,"[0, 0, 0]",WORD
28,101829,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,WING,294,1,0,1,1453499297942,r1,1,5748,2,0,0,[],R1111M,"[0, 0, 0]",WORD
29,103113,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,DOOR,79,1,0,1,1453499300510,r1,1,7882,3,0,0,[],R1111M,"[0, 0, 0]",WORD
30,104329,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,PLANT,188,1,0,1,1453499302943,r1,1,6815,4,0,0,[],R1111M,"[0, 0, 0]",WORD
31,105638,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,ROOT,204,1,0,1,1453499305561,r1,0,-999,5,0,0,[],R1111M,"[0, 0, 0]",WORD


In [59]:
# as well as recall events
rec_evs = evs[evs['type'] == 'REC_WORD']
rec_evs.head()

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,list,montage,msoffset,mstime,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
47,130521,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,BEAR,17,1,0,20,1453499355330,r1,1,5210,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
48,130790,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,WING,294,1,0,20,1453499355868,r1,1,5748,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
49,131324,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,PLANT,188,1,0,20,1453499356935,r1,1,6815,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
50,131857,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,DOOR,79,1,0,20,1453499358002,r1,1,7882,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
51,135459,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,TOY,277,1,0,20,1453499365206,r1,1,15086,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD


## Question 2: Simple Behavioral Analyses

1) What is R1111M's overall recall percentage?  (In other words, what percent of the words did R1111M successfully recall?)
2) Plot the distribution of inter-response times (IRTs) for R1111M's recalled words in a histogram.
* Note that an IRT is the time between successive recalls, as opposed to the time since the start of the recall period.
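
To make the distinction concrete, here is a minimal sketch that turns recall times into IRTs on invented data. The two toy lists and their values are assumptions for illustration; only the column names 'list' and 'rectime' follow the events dataframe.

```python
import pandas as pd

# Invented recall events for two lists; 'rectime' is ms since the start of
# that list's recall period.
toy_rec = pd.DataFrame({
    'list':    [1, 1, 1, 2, 2],
    'rectime': [5210, 5748, 7882, 1200, 4200],
})

# Differencing within each list keeps the last recall of one list from
# being paired with the first recall of the next.
toy_rec = toy_rec.sort_values(['list', 'rectime'])
toy_rec['irt'] = toy_rec.groupby('list')['rectime'].diff()

# The first recall in each list has no predecessor, so its IRT is NaN.
irts = toy_rec['irt'].dropna()
print(irts.tolist())
```

Note that each list contributes one fewer IRT than it has recalls.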

In [60]:
# Question 2.1
### YOUR CODE HERE

In [61]:
# Question 2.2
### YOUR CODE HERE

## Question 3: Serial Position Curve
A serial position curve (SPC) plots the recall probability (vertical axis) as a function of serial position (horizontal axis), as shown in figure 6.1 of FHM.  We define recall probability as the fraction of words successfully recalled.  As with any analysis, there are numerous ways to calculate the SPC, but a good hint is to look at the "recalled" status of each presented word.
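
As an illustration of this definition, the sketch below computes recall probability by serial position on a tiny invented set of encoding events (two lists of three words). It is one possible approach, not the required one, and on real data you would first keep only the 'WORD' events.

```python
import pandas as pd

# Invented encoding events: 2 lists of 3 words each (real lists are longer).
toy_words = pd.DataFrame({
    'list':      [1, 1, 1, 2, 2, 2],
    'serialpos': [1, 2, 3, 1, 2, 3],
    'recalled':  [1, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 column is the fraction of ones, i.e. the recall probability.
spc = toy_words.groupby('serialpos')['recalled'].mean()
print(spc)
```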

1) Write a function to calculate the recall probabilities at each serial position for 1 session.  Then calculate the SPC for subject LTP093, session 0 and plot the results.
2) Write a function to calculate the SPC for all sessions for 1 subject.  Then plot the SPC averaged over all of LTP093's sessions.
3) Write a function that loops over selected subjects from ltpFR2 and computes a between-subject averaged serial position curve.  This is done by averaging the subject-level SPCs across all subjects.  Comment on the final SPC: what effects does it show?
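
One way to write the two levels of averaging in parts 2 and 3 is shown below. The notation is introduced here only for illustration: $P_{s,k}(i)$ is the recall probability at serial position $i$ in session $k$ of subject $s$, $K_s$ is the number of usable sessions for subject $s$, and $S$ is the number of subjects.

$$
P_s(i) = \frac{1}{K_s} \sum_{k=1}^{K_s} P_{s,k}(i),
\qquad
\bar{P}(i) = \frac{1}{S} \sum_{s=1}^{S} P_s(i)
$$

Under this scheme each subject contributes equally to $\bar{P}(i)$, regardless of how many sessions they completed.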

If you are smart about it, you should be able to use code from earlier parts of the problem to make later parts much easier!

A few notes:
* Generally, each experimental subject will have different characteristics. A subject-level average provides an estimate of the population mean, i.e. the mean of the population from which these subjects were sampled (which, in the case of ltpFR2, was primarily Penn undergraduates).
* Loading these data may take a substantial amount of time. You may want to write your code and then test it over a smaller number of subjects (e.g., 2 subjects). You may also want to load all the data once and then save it out locally as a .csv file (see the pandas.DataFrame.to_csv() function in the pandas documentation). You will then be able to load all of these events from this .csv file much more quickly (see the pandas.read_csv() function).
* Some subjects did not complete all sessions. To reduce the variability in your estimates, include only the subjects that completed all 24 sessions. Additionally, drop the 24th session (session number 23, since the sessions are zero-indexed), which used a different variant of the experiment.
    
* Be sure that your code is robust to exceptions, which can be thrown by, e.g., unavailable data. If a particular session raises an exception because of a problem with its data, print or log the event so you can report it, and discard that session from the analysis (you can use Python try-except blocks to catch and handle exceptions). These data were all acquired with human participants across long stretches of time, and small things can go wrong.
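
The last note can be sketched as a loop that records failures instead of stopping. In this minimal example, `load_sessions` and `fake_loader` are hypothetical names: `fake_loader` stands in for whatever function you write around the reader, and it is made to fail on one session so the error path is exercised.

```python
import pandas as pd


def load_sessions(session_keys, load_one):
    """Load many sessions, skipping and recording any that raise.

    `load_one` is a caller-supplied function that takes one session key
    and returns an events dataframe.
    """
    loaded, failed = [], []
    for key in session_keys:
        try:
            loaded.append(load_one(key))
        except Exception as err:  # broad on purpose: any failure drops the session
            failed.append((key, repr(err)))
            print(f'Skipping {key}: {err!r}')
    events = pd.concat(loaded, ignore_index=True) if loaded else pd.DataFrame()
    return events, failed


def fake_loader(key):
    """Stand-in loader for this demonstration; session 1 is made to fail."""
    subject, session = key
    if session == 1:
        raise FileNotFoundError(f'no events for {subject} session {session}')
    return pd.DataFrame({'subject': [subject], 'session': [session]})


keys = [('SUBJ_A', 0), ('SUBJ_A', 1), ('SUBJ_A', 2)]
events, failed = load_sessions(keys, fake_loader)
print(events)
print(failed)
```

The combined dataframe returned here is also a natural thing to save with `to_csv()` and reload with `read_csv()`, as suggested in the note on loading time.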

In [62]:
# analyze only these subjects
ltpFR2_subs = ['LTP093', 'LTP106', 'LTP115', 'LTP117', 'LTP122', 'LTP123', 'LTP133', 'LTP138', 'LTP187', 'LTP207', 
               'LTP210', 'LTP228', 'LTP229', 'LTP236', 'LTP242', 'LTP246', 'LTP249', 'LTP250', 'LTP251', 'LTP258', 
               'LTP259', 'LTP260', 'LTP265', 'LTP269', 'LTP273', 'LTP274', 'LTP278', 'LTP279', 'LTP280', 'LTP281']

In [63]:
# Question 3.1
### YOUR CODE HERE

In [64]:
# Question 3.2
### YOUR CODE HERE

In [65]:
# Question 3.3
### YOUR CODE HERE

## Question 4: Immediate Free Recall v. Delayed Free Recall

In this question, we will use the ltpFR experiment to compare the SPCs for immediate free recall and delayed free recall (a description of the experiment can be found under phase 2 at https://memory.psych.upenn.edu/PEERS).  For each subject, use sessions 7-13 and compare lists with between-words 'distractor' = 0 and end-of-list 'final_distractor' = 0, 8000, and 16000 (ms).

1) Calculate between-subject averaged serial position curves for each of the 3 distractor conditions.  Plot the 3 SPCs on the same graph and label each curve in a legend.
2) Comment on the differences in the SPCs that result from the manipulation of the distractor.  What changes between immediate free recall and delayed free recall?

In [67]:
# let's take a look at the events for an ltpFR session
ltpFR_df = df.query("experiment == 'ltpFR'")
df_sess = ltpFR_df[(ltpFR_df['subject'] == 'LTP063') & (ltpFR_df['session'] == 8)]
reader = cml.CMLReader(subject = df_sess.iloc[0].subject, experiment = df_sess.iloc[0].experiment, session = df_sess.iloc[0].session)
evs = reader.load('events')
evs

Unnamed: 0,eegoffset,answer,case,color_b,color_g,color_r,distractor,eegfile,eogArtifact,experiment,final_distractor,final_math_correct,finalrecalled,font,intruded,intrusion,iscorrect,item_name,item_num,list,listtype,math_correct,montage,msoffset,mstime,phase,protocol,recalled,recog_conf,recog_resp,recog_rt,recognized,rectime,rej_time,rejected,resp,rt,serialpos,session,studytrial,subject,task,test,trial,type
0,121697,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,,-999,-999,-999,-999,0,0,1301341737850,,ltp,0,-999,0,-999,0,-999,-999,0,-999,652,-999,8,-999,LTP063,-999,"[0, 0, 0]",-999,SESS_START
1,134751,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,-999,,-999,-1,-999,-999,0,0,1301341763956,,ltp,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[-999, -999, -999]",-999,START
2,134763,15,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,1,,-999,-1,-999,-999,0,1,1301341763981,,ltp,-999,-999,-999,-999,-999,3271,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[6, 1, 8]",-999,PROB
3,136415,7,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,1,,-999,-1,-999,-999,0,1,1301341767285,,ltp,-999,-999,-999,-999,-999,1943,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[2, 4, 1]",-999,PROB
4,137402,13,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,1,,-999,-1,-999,-999,0,1,1301341769259,,ltp,-999,-999,-999,-999,-999,1793,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[3, 3, 7]",-999,PROB
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1592,2256554,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,TAPE,1433,-999,2,-999,0,20,1301346007507,,ltp,0,5,-999,-999,0,56836,-999,0,0,1084,7,8,2,LTP063,0,"[0, 0, 0]",12,RECOG_CONF
1593,2257258,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,WEB,1598,-999,-999,-999,0,0,1301346008915,,ltp,0,4,0,652,0,-999,-999,0,-999,-999,-999,8,-999,LTP063,-999,"[0, 0, 0]",12,RECOG_LURE
1594,2257584,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,WEB,1598,-999,-999,-999,0,20,1301346009567,,ltp,0,-999,0,-999,0,58896,-999,0,-999,-999,-999,8,-999,LTP063,-999,"[0, 0, 0]",12,RECOG_RESP
1595,2257950,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,WEB,1598,-999,-999,-999,0,20,1301346010300,,ltp,0,4,-999,-999,0,59629,-999,0,-999,-999,-999,8,-999,LTP063,-999,"[0, 0, 0]",12,RECOG_CONF


For each session, we want to find lists where the 'distractor' column equals 0 and the 'final_distractor' column equals 0 (immediate free recall), 8000, or 16000 (delayed free recall).
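
The selection described above might look like the following on a small invented set of encoding events. The values are made up; only the column names 'distractor', 'final_distractor', 'serialpos', and 'recalled' follow the ltpFR events dataframe, and on real data you would first keep only the 'WORD' events.

```python
import pandas as pd

# Invented encoding events; two of them have a between-word distractor
# and should be excluded.
toy_words = pd.DataFrame({
    'distractor':       [0, 0, 0, 0, 8000, 8000, 0, 0],
    'final_distractor': [0, 0, 8000, 8000, 8000, 8000, 16000, 16000],
    'serialpos':        [1, 2, 1, 2, 1, 2, 1, 2],
    'recalled':         [0, 1, 1, 0, 1, 1, 1, 0],
})

# Keep events with no between-word distractor and one of the three
# end-of-list distractor durations (in ms).
keep = (toy_words['distractor'] == 0) & toy_words['final_distractor'].isin([0, 8000, 16000])
selected = toy_words[keep]

# One row per distractor condition, one column per serial position.
spc_by_condition = (selected.groupby(['final_distractor', 'serialpos'])['recalled']
                    .mean()
                    .unstack('serialpos'))
print(spc_by_condition)
```

Each row of `spc_by_condition` is then one curve to plot and label in the legend.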

In [69]:
# Question 4.1
### YOUR CODE HERE

Question 4.2

**YOUR ANSWER HERE**