# Assignment 1A: Behavioral Analysis of Memory Search
Please submit this assignment to Canvas as a Jupyter notebook (.ipynb).  The assignment will introduce you to the data and to some of the techniques we will use for analyses.  Please double-click on this cell and edit it to include your name.
## Name:

## Explore the RAM database
The Computational Memory Lab's database of intracranial and scalp EEG uses pandas dataframes to organize its data.  We will start by loading the dataframe that indexes all experimental sessions.

In [1]:
# imports
import pandas as pd
import cmlreaders as cml
import numpy as np

# load dataframe of all sessions
df = cml.get_data_index()
df[:10]                     # show the first 10 entries

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
0,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,0,,,0,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
1,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,1,,,1,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
2,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,10,,,10,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
3,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,11,,,11,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
4,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,12,,,12,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
5,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,13,,,13,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
6,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,14,,,14,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
7,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,15,,,15,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
8,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,16,,,16,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...
9,,protocols/ltp/subjects/LTP063/experiments/ltpF...,,ltpFR,build,0,protocols/ltp/subjects/LTP063/experiments/ltpF...,0,,17,,,17,LTP063,LTP063,,protocols/ltp/subjects/LTP063/experiments/ltpF...


Each row of the dataframe holds the pertinent information about one experimental session, including paths to its event files.  For our purposes, we will be most interested in the 'subject', 'experiment', and 'session' columns.  We will also use the 'localization' and 'montage' columns for intracranial data; more on that to come.
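Because each row is one session, counting rows within a group counts sessions, and counting distinct values in the 'subject' column counts participants. The sketch below shows this on a small hypothetical stand-in for the index (the `toy_index` name and its rows are made up; only the 'subject', 'experiment', and 'session' column names come from the real dataframe).

```python
import pandas as pd

# Hypothetical stand-in for the data index: one row per experimental session.
# Subject codes and experiment names are illustrative only.
toy_index = pd.DataFrame({
    'subject':    ['LTP063', 'LTP063', 'R1111M', 'R1111M', 'R1111M', 'R1092J'],
    'experiment': ['ltpFR',  'ltpFR',  'FR1',    'FR1',    'catFR1', 'FR1'],
    'session':    [0,        1,        0,        1,        0,        0],
})

# One row per session, so the size of each (subject, experiment) group
# is the number of sessions that subject ran in that experiment.
sessions_per_exp = toy_index.groupby(['subject', 'experiment']).size()
print(sessions_per_exp)

# Number of distinct subjects who ran each experiment.
subjects_per_exp = toy_index.groupby('experiment')['subject'].nunique()
print(subjects_per_exp)
```

The same `groupby` calls should work on the real `df`, since it uses the same column names.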

In [2]:
# let's select a row of the dataframe and extract important info
row = df.iloc[500]          # an arbitrary row
display(df[500:501])        # display() is available in Jupyter/IPython
print(f'Subject = {row.subject}, experiment = {row.experiment}, session = {row.session}')

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
500,,protocols/ltp/subjects/LTP093/experiments/ltpF...,,ltpFR2,build,0,protocols/ltp/subjects/LTP093/experiments/ltpF...,0,,20,,,20,LTP093,LTP093,,protocols/ltp/subjects/LTP093/experiments/ltpF...


Subject = LTP093, experiment = ltpFR2, session = 20


In [3]:
# let's see what experiments we have access to
df['experiment'].unique()

array(['ltpFR', 'ltpFR2', 'VFFR', 'ltpRepFR', 'NiclsCourierClosedLoop',
       'NiclsCourierReadOnly', 'ltpDelayRepFRReadOnly', 'ltpDBOY1',
       'prelim', 'EFRCourierOpenLoop', 'EFRCourierReadOnly', 'FR1', 'FR2',
       'PAL1', 'YC1', 'PAL2', 'catFR1', 'YC2', 'catFR2', 'PS1', 'PS3',
       'PS2', 'TH1', 'FR3', 'PS2.1', 'PAL3', 'TH3', 'OPS', 'RepFR1',
       'catFR3', 'FR5', 'PS4_catFR', 'THR', 'PS4_FR', 'PAL5', 'THR1',
       'catFR5', 'PS4_catFR5', 'FR6', 'PS5_catFR', 'catFR6', 'TICL_FR',
       'LocationSearch', 'TICL_catFR', 'DBOY1', 'CatFR6', 'RepFR2',
       'ICatFR1', 'pyFR'], dtype=object)

## Question 1: Navigating the database

1) How many sessions of 'VFFR' were run?  How many subjects completed at least 1 session of 'VFFR'?
2) How many sessions were run with subjects at Jefferson Hospital?  (Hint: their subject codes end in the letter "J".)

In [6]:
# Question 1.1
### YOUR CODE HERE

In [7]:
# Question 1.2
### YOUR CODE HERE

## Loading Behavioral Data
In order to load the data for a session, we need to use some functionality from the lab-developed "cmlreaders" package, which we imported above.  Essentially, we need to select a session, instantiate a "Reader" object, and then load the data we want.  For now, let's just worry about the behavioral data, or "events"; later we will be interested in "contacts", "pairs", and the EEG data.

In [8]:
# let's find subjects who did the FR1 experiment (showing a slice of 20 of them)
FR1_df = df.query("experiment == 'FR1'")
FR1_df['subject'].unique()[50:70]

array(['R1084T', 'R1086M', 'R1089P', 'R1092J', 'R1093J', 'R1094T',
       'R1096E', 'R1098D', 'R1100D', 'R1101T', 'R1102P', 'R1104D',
       'R1105E', 'R1106M', 'R1108J', 'R1111M', 'R1112M', 'R1113T',
       'R1114C', 'R1115T'], dtype=object)

In [9]:
# we'll pick R1111M and select this subject's FR1 sessions
sub = 'R1111M'
exp = 'FR1'
df_select = df[(df['subject'] == sub) & (df['experiment'] == exp)]
display(df_select)
print(f'{sub} sessions: {np.array(df_select.session)}')

Unnamed: 0,Recognition,all_events,contacts,experiment,import_type,localization,math_events,montage,original_experiment,original_session,pairs,ps4_events,session,subject,subject_alias,system_version,task_events
623,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,0,protocols/r1/subjects/R1111M/localizations/0/m...,,0,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...
624,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,1,protocols/r1/subjects/R1111M/localizations/0/m...,,1,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...
625,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,2,protocols/r1/subjects/R1111M/localizations/0/m...,,2,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...
626,,protocols/r1/subjects/R1111M/experiments/FR1/s...,protocols/r1/subjects/R1111M/localizations/0/m...,FR1,build,0,protocols/r1/subjects/R1111M/experiments/FR1/s...,0,,3,protocols/r1/subjects/R1111M/localizations/0/m...,,3,R1111M,R1111M,,protocols/r1/subjects/R1111M/experiments/FR1/s...


R1111M sessions: [0 1 2 3]


In [10]:
# let's load the data from the first session
df_sess = df_select.iloc[0]         # select 1 row

# instantiate a Reader object using session metadata
# subjects beginning with 'R' are intracranial subjects, so we must specify the localization and montage
reader = cml.CMLReader(subject=df_sess['subject'], experiment=df_sess['experiment'], session=df_sess['session'], 
                       localization=df_sess['localization'], montage=df_sess['montage'])
# load the behavioral events
evs = reader.load('events')
evs

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,...,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
0,23207,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,...,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",SESS_START
1,32311,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,...,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",COUNTDOWN_START
2,37524,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,...,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",COUNTDOWN_END
3,39379,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,WATCH,-1,...,r1,0,-999,0,0,0,[],R1111M,"[0, 0, 0]",PRACTICE_WORD
4,40579,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,RHINO,-1,...,r1,0,-999,1,0,0,[],R1111M,"[0, 0, 0]",PRACTICE_WORD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
760,1435126,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,PEACH,176,...,r1,1,14747,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
761,1442760,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,...,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",REC_END
762,1583983,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,...,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",TRIAL
763,1583983,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,X,-999,...,r1,0,-999,-999,0,0,[],R1111M,"[0, 0, 0]",COUNTDOWN_START


Before digging into the data, here is a little background on the experiment.

In free recall experiments, subjects memorize a series of word lists during an experimental session. Each list consists of an encoding period, during which the words (or "items") in the list are presented to the subject one by one, followed by a retrieval period, during which the subject recalls as many words as they can in any order. The experiment is called "free" recall because the subjects are "free" to recall the items in any order; this is in contrast to a serial recall experiment, in which subjects must recall the items in the order they were presented.
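The difference between free and serial recall comes down to how a response is scored. Below is a minimal sketch on a made-up list and recall sequence (not real data); the serial rule used here, which credits an item only if it is recalled in exactly its presented position, is one simple, strict choice among several possible serial scoring rules.

```python
# A toy study list and one hypothetical recall sequence (not real data).
presented = ['BEAR', 'WING', 'DOOR', 'PLANT', 'ROOT']
recalled_seq = ['BEAR', 'WING', 'PLANT', 'DOOR', 'LAMP']

# Free recall scoring: a studied item counts if it was recalled at all, in any order.
free_score = [word in recalled_seq for word in presented]

# Strict serial scoring: a studied item counts only if it was recalled
# in the same position in which it was presented.
serial_score = [
    i < len(recalled_seq) and recalled_seq[i] == word
    for i, word in enumerate(presented)
]

# Responses that were never on this list (intrusions).
intrusions = [w for w in recalled_seq if w not in presented]

print(free_score)     # [True, True, True, True, False]
print(serial_score)   # [True, True, False, False, False]
print(intrusions)     # ['LAMP']
```

Swapping PLANT and DOOR costs nothing under free recall scoring but loses both items under strict serial scoring.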

Some free recall experiments contain a "distractor" period between the end of the encoding period and the beginning of the retrieval period. The purpose of the distractor is to "clear out" subjects' minds before starting to recall items; without the distractor, subjects are far more likely to recall items from the end of the list in what is known as the recency effect. In the experiments described here, the distractor period consists of a series of arithmetic problems of the form 'A + B + C', to which subjects respond by typing the answer. The FR1 experiment also begins each list by presenting the subject with a countdown period (10, 9, 8, ...).
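Each distractor problem has the form 'A + B + C', so scoring a response just means comparing the typed answer to the sum. The sketch below uses a hypothetical `math_toy` table; storing the operands as a list in a 'test' column and the response in 'answer' is an assumption that loosely mirrors column names in the events dataframe, not a documented description of how math events are encoded.

```python
import pandas as pd

# Hypothetical distractor trials. Assumption: operands [A, B, C] live in a
# list-valued 'test' column and the typed response in 'answer'.
math_toy = pd.DataFrame({
    'test':   [[3, 4, 2], [7, 1, 5], [6, 6, 0]],
    'answer': [9, 13, 11],
})

# The correct answer to A + B + C is the sum of the operands.
math_toy['correct_sum'] = math_toy['test'].apply(sum)

# 1 if the typed answer matches the sum, else 0.
math_toy['iscorrect'] = (math_toy['answer'] == math_toy['correct_sum']).astype(int)

print(math_toy)
print(f"Proportion correct: {math_toy['iscorrect'].mean():.2f}")
```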

The events dataframe contains information about everything that happened during an experimental session. It indicates the time at which every word appeared on the screen, and when those words were later recalled. It also contains information about events that you might not care about, such as when the countdown timer starts and ends.
<center>
<img src="https://github.com/esolomon/PythonBootcamp2019/blob/master/figures/task_design-01.jpg?raw=true" width=650>
</center>
Let's take a look at all the columns in this dataframe...

In [11]:
evs.columns

Index(['eegoffset', 'answer', 'eegfile', 'exp_version', 'experiment',
       'intrusion', 'is_stim', 'iscorrect', 'item_name', 'item_num', 'list',
       'montage', 'msoffset', 'mstime', 'protocol', 'recalled', 'rectime',
       'serialpos', 'session', 'stim_list', 'stim_params', 'subject', 'test',
       'type'],
      dtype='object')

... and here is what some of the important ones mean. 
* 'eegoffset' indicates where (in samples) in the EEG file this event occurred. CMLReaders needs this info, but usually you won't need to deal with it directly.
* 'answer' is the participant's response to a math distractor problem.
* 'experiment' is the behavioral task we're looking at.
* 'intrusion' codes intrusions during the recall period: -1 indicates an extra-list intrusion; otherwise, it gives the number of lists back from which the recalled word came (so 0 means the word came from the current list).
* 'item_name' is the word that was presented or recalled.
* 'list' is the list number (in the ltpFR2 scalp EEG dataset, the list number is instead stored in the 'trial' column).
* 'mstime' is a time indicator, in ms. It is good for comparing times between events, but its absolute value is meaningless.
* 'recalled' is an indicator of whether an encoding word was later recalled successfully.
* 'rectime' is the time, in ms, when a word was recalled relative to the start of the recall period for that list.
* 'serialpos' is the serial position of a presented/recalled word
* 'session' is the session number the data came from
* 'subject' is the subject you're analyzing!
* 'type' is the type of event, e.g. 'WORD' or 'REC_WORD'

Please see https://pennmem.github.io/cmlreaders/html/events.html for more information!
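Two details from the outputs above are worth handling explicitly. First, -999 shows up wherever a field does not apply (for example, 'rectime' for a word that was never recalled); treating it as a placeholder is an inference from those outputs, so check the cmlreaders events docs before relying on it. Second, the 'intrusion' codes can be turned into readable labels. The sketch below uses small made-up tables (`word_toy`, `rec_toy`, and `classify_recall` are hypothetical names).

```python
import numpy as np
import pandas as pd

# Toy encoding events modeled on the 'WORD' rows shown earlier, where the
# unrecalled word carries rectime -999 (assumed to be a placeholder).
word_toy = pd.DataFrame({
    'item_name': ['BEAR', 'WING', 'DOOR', 'PLANT', 'ROOT'],
    'recalled':  [1, 1, 1, 1, 0],
    'rectime':   [5210, 5748, 7882, 6815, -999],
})

# Averaging the raw column lets the placeholder drag the mean down...
raw_mean = word_toy['rectime'].mean()
# ...while replacing it with NaN lets pandas skip it.
clean_mean = word_toy['rectime'].replace(-999, np.nan).mean()

# Toy recall events coded as described in the list above:
# 0 = current list, -1 = extra-list intrusion, n > 0 = from n lists back.
rec_toy = pd.DataFrame({
    'item_name': ['BEAR', 'WING', 'CLOUD', 'SPOON'],
    'intrusion': [0, 0, 2, -1],
})

def classify_recall(code):
    """Map an 'intrusion' code to a readable label (hypothetical helper)."""
    if code == 0:
        return 'correct'
    if code == -1:
        return 'extra-list intrusion'
    if code > 0:
        return 'prior-list intrusion'
    return 'not a recall event'   # e.g. the -999 placeholder

rec_toy['recall_type'] = rec_toy['intrusion'].apply(classify_recall)
print(f'raw mean rectime: {raw_mean:.1f}, cleaned: {clean_mean:.1f}')
print(rec_toy)
```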

Two types of events that we are often interested in analyzing are encoding ("WORD") events and recall ("REC_WORD") events.
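The two event types are linked by the word itself. In the outputs in this notebook, 'REC_WORD' rows carry -999 in 'serialpos', so one way to recover where each recalled word sat in its list is to match it back to the 'WORD' event with the same list and item name. Here is a minimal sketch on a made-up events table (`evs_toy` and its values are hypothetical; only the column names follow the real dataframe).

```python
import pandas as pd

# Toy events table containing both event types (values are made up).
evs_toy = pd.DataFrame({
    'type':      ['WORD', 'WORD', 'WORD', 'REC_WORD', 'REC_WORD'],
    'list':      [1, 1, 1, 1, 1],
    'item_name': ['BEAR', 'WING', 'ROOT', 'WING', 'BEAR'],
    'serialpos': [1, 2, 3, -999, -999],
    'rectime':   [-999, -999, -999, 4100, 6200],
})

# Split by event type, using the same filtering pattern as the real data.
enc = evs_toy[evs_toy['type'] == 'WORD']
rec = evs_toy[evs_toy['type'] == 'REC_WORD']

# Attach each recall to the studied word from the same list; a left merge
# keeps the recalls in their original (output) order.
matched = rec[['list', 'item_name', 'rectime']].merge(
    enc[['list', 'item_name', 'serialpos']],
    on=['list', 'item_name'],
    how='left',
)
print(matched)
```

Matching on both 'list' and 'item_name' matters: the same word may be presented on more than one list in a session, and an intrusion will simply get a missing 'serialpos'.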

In [12]:
# let's filter for encoding events
word_evs = evs[evs['type'] == 'WORD']
word_evs.head()

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,...,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
27,100520,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,BEAR,17,...,r1,1,5210,1,0,0,[],R1111M,"[0, 0, 0]",WORD
28,101829,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,WING,294,...,r1,1,5748,2,0,0,[],R1111M,"[0, 0, 0]",WORD
29,103113,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,DOOR,79,...,r1,1,7882,3,0,0,[],R1111M,"[0, 0, 0]",WORD
30,104329,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,PLANT,188,...,r1,1,6815,4,0,0,[],R1111M,"[0, 0, 0]",WORD
31,105638,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,-999,0,-999,ROOT,204,...,r1,0,-999,5,0,0,[],R1111M,"[0, 0, 0]",WORD


In [13]:
# as well as recall events
rec_evs = evs[evs['type'] == 'REC_WORD']
rec_evs.head()

Unnamed: 0,eegoffset,answer,eegfile,exp_version,experiment,intrusion,is_stim,iscorrect,item_name,item_num,...,protocol,recalled,rectime,serialpos,session,stim_list,stim_params,subject,test,type
47,130521,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,BEAR,17,...,r1,1,5210,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
48,130790,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,WING,294,...,r1,1,5748,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
49,131324,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,PLANT,188,...,r1,1,6815,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
50,131857,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,DOOR,79,...,r1,1,7882,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD
51,135459,-999,R1111M_FR1_0_22Jan16_1638,1.05,FR1,0,0,-999,TOY,277,...,r1,1,15086,-999,0,0,[],R1111M,"[0, 0, 0]",REC_WORD


## Question 2: Simple Behavioral Analyses

1) What is R1111M's overall recall percentage?  (In other words, what percentage of the presented words did R1111M successfully recall?)
2) Plot the distribution of inter-response times for R1111M's recalled words in a histogram.

In [54]:
# Question 2.1
### YOUR CODE HERE

In [55]:
# Question 2.2
### YOUR CODE HERE