# Assignment 1: Behavioral Analysis of Memory Search
Please submit this assignment to Canvas as a Jupyter notebook (.ipynb). In this assignment you are asked to complete a fundamental analysis in memory science: the Serial Position Curve.

## Question 1: Navigating the database

1) How many sessions of 'VFFR' were run?  How many subjects did at least 1 session of 'VFFR'?
2) How many sessions were run at Jefferson Hospital? (Hint: the last letter of the subject code is "J".)
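
As a rough sketch of the kind of query involved, the example below counts sessions and distinct subjects on a small made-up table. It assumes the data index has `subject`, `experiment`, and `session` columns and that each row corresponds to one session; the rows themselves are invented for illustration, and the real index may need additional filtering.

```python
import pandas as pd

# Toy stand-in for the data index; rows are invented for illustration.
toy_index = pd.DataFrame({
    "subject":    ["R0001J", "R0001J", "R0002M", "R0003J", "R0003J"],
    "experiment": ["VFFR",   "VFFR",   "VFFR",   "FR1",    "VFFR"],
    "session":    [0,        1,        0,        0,        0],
})

# Assumption: one row per session, so counting rows counts sessions.
vffr = toy_index[toy_index["experiment"] == "VFFR"]
n_vffr_sessions = len(vffr)
n_vffr_subjects = vffr["subject"].nunique()

# Select sessions whose subject code ends in a given letter.
ends_in_j = toy_index[toy_index["subject"].str.endswith("J")]
n_j_sessions = len(ends_in_j)

print(n_vffr_sessions, n_vffr_subjects, n_j_sessions)
```

The same pattern (boolean mask, then `len` or `nunique`) should carry over to the real index once it is loaded.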

In [49]:
# Question 1.1
### YOUR CODE HERE

In [50]:
# Question 1.2
### YOUR CODE HERE

## Question 2: Simple Behavioral Analyses

1) What is R1111M's overall recall percentage? (In other words, what percent of the words did R1111M successfully recall?)
2) Plot the distribution of inter-response times for R1111M's recalled words in a histogram.
* Note that an IRT is the time between successive recalls, as opposed to the time since the start of the recall period.
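
To illustrate the distinction in the note above, here is a minimal sketch that turns recall times into inter-response times. The column names `list` and `rectime` (milliseconds since the start of the recall period) are assumptions about the events table, and the values are invented.

```python
import pandas as pd

# Toy recall events; 'list' and 'rectime' are assumed column names.
toy_recalls = pd.DataFrame({
    "list":    [1, 1, 1, 2, 2],
    "rectime": [1200, 3000, 7500, 900, 4100],
})

# Difference between successive recall times, computed separately for each list.
# The first recall in each list has no predecessor, so its IRT is NaN and is dropped.
irts = (
    toy_recalls.sort_values(["list", "rectime"])
    .groupby("list")["rectime"]
    .diff()
    .dropna()
)

print(irts.tolist())
```

Grouping by list matters here: without it, the first recall of one list would be differenced against the last recall of the previous list. The resulting Series can then be passed to a histogram function.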

In [60]:
# Question 2.1
### YOUR CODE HERE

In [61]:
# Question 2.2
### YOUR CODE HERE

## Question 3: Serial Position Curve
A serial position curve (SPC) plots the recall probability (vertical axis) as a function of serial position (horizontal axis), as shown in figure 6.1 of FHM. We define recall probability as the fraction of words successfully recalled. As with any analysis, there are numerous ways to calculate the SPC, but a good hint is to look at the "recalled" status of each presented word.
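
One possible way to act on that hint, sketched on invented data: since `recalled` is 0 or 1 for each presented word, its mean within each serial position is the recall probability. The toy table below stands in for the word-presentation events of a single session; the helper name `session_spc` is hypothetical.

```python
import pandas as pd

# Toy study events for one session: 2 lists of 4 words each.
# 'serialpos' and 'recalled' mirror columns in the events table; values are invented.
toy_words = pd.DataFrame({
    "list":      [1, 1, 1, 1, 2, 2, 2, 2],
    "serialpos": [1, 2, 3, 4, 1, 2, 3, 4],
    "recalled":  [1, 0, 0, 1, 1, 1, 0, 1],
})

def session_spc(word_events):
    """Return recall probability at each serial position (mean of 0/1 'recalled')."""
    return word_events.groupby("serialpos")["recalled"].mean()

spc = session_spc(toy_words)
print(spc)
```

Real events tables also contain rows that are not word presentations, so the word events would need to be selected before a calculation like this is applied.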

1) Write a function to calculate the recall probabilities at each serial position for 1 session.  Then calculate the SPC for subject LTP093, session 0 and plot the results.
2) Write a function to calculate the SPC for all sessions for 1 subject.  Then plot the SPC averaged over all of LTP093's sessions.
3) Write a function that loops over selected subjects from ltpFR2 and computes a between-subject averaged serial position curve. This is done by averaging the subject-level SPCs across all subjects. Comment on the final SPC: what effects does it show?

If you are smart about it, you should be able to use code from earlier parts of the problem to make later parts much easier!
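
The layering suggested above might look like the following sketch, in which each level of averaging reuses the one below it. The per-session SPC values and the function names are hypothetical; in practice the session-level arrays would come from your own session-level function.

```python
import numpy as np

# Hypothetical per-session SPCs (rows = sessions, columns = serial positions).
session_spcs = {
    "subj_A": np.array([[0.8, 0.4, 0.6],
                        [0.6, 0.2, 0.8]]),
    "subj_B": np.array([[0.4, 0.2, 0.6]]),
}

def subject_spc(spcs_for_sessions):
    """Average one subject's session-level SPCs across sessions."""
    return np.mean(spcs_for_sessions, axis=0)

def group_spc(spcs_by_subject):
    """Average subject-level SPCs so that each subject counts equally."""
    subject_level = [subject_spc(s) for s in spcs_by_subject.values()]
    return np.mean(subject_level, axis=0)

group = group_spc(session_spcs)
print(group)
```

Averaging within each subject first means that a subject with more sessions does not dominate the group curve, which is what distinguishes a between-subject average from pooling all sessions together.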

A few notes:
* Generally, each experimental subject will have different characteristics. A subject-level average provides an estimate of the population mean, i.e., the mean of the population from which these subjects were sampled (which, in the case of ltpFR2, was primarily Penn undergraduates).
* Loading these data may take a substantial amount of time. You may want to write your code and then test it over a smaller number of subjects (e.g., 2 subjects). You may also want to load all the data once and then save it out locally as a .csv file (see the pandas.DataFrame.to_csv() function in the pandas documentation). You will then be able to load all of these events from this .csv file much more quickly (see the pandas.read_csv() function).
* Some subjects did not complete all sessions. To reduce the variability in your estimates, include only the subjects that completed all 24 sessions. Additionally, drop the 24th session (session number 23, since sessions are zero-indexed), which used a different variant of the experiment.
* Make sure your code is robust to exceptions raised by, e.g., unavailable data. If a session raises an exception because of a problem with its data, print or log the event so you can report it, and discard that session from the analysis (you can use Python try-except blocks to catch and handle exceptions). These data were all acquired from human participants over long stretches of time, and small things can go wrong.
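
A minimal sketch of the try-except pattern described in the notes above. The loader is passed in as `load_fn`, a hypothetical function that returns a DataFrame of events or raises; here a fake loader stands in for the real data access so the example runs on its own.

```python
import pandas as pd

def load_sessions(subject, sessions, load_fn):
    """Load events for each session, skipping (and recording) sessions that fail.

    `load_fn(subject, session)` is a hypothetical loader supplied by the caller;
    it should return a DataFrame of events or raise an exception.
    """
    loaded, failed = [], []
    for session in sessions:
        try:
            loaded.append(load_fn(subject, session))
        except Exception as err:  # broad on purpose: any failure discards the session
            print(f"Skipping {subject} session {session}: {err!r}")
            failed.append((subject, session, repr(err)))
    events = pd.concat(loaded, ignore_index=True) if loaded else pd.DataFrame()
    return events, failed

# Stand-in loader for demonstration: session 1 is "broken".
def fake_loader(subject, session):
    if session == 1:
        raise FileNotFoundError("events file missing")
    return pd.DataFrame({"subject": [subject], "session": [session], "recalled": [1]})

events, failed = load_sessions("LTP093", range(3), fake_loader)
print(events["session"].tolist(), failed)
```

Returning the list of failures alongside the data makes it easy to report which sessions were discarded. If the combined table is then cached to disk, passing `index=False` to `to_csv` avoids writing the row index as an extra unnamed column.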

In [62]:
# analyze only these subjects
ltpFR2_subs = ['LTP093', 'LTP106', 'LTP115', 'LTP117', 'LTP122', 'LTP123', 'LTP133', 'LTP138', 'LTP187', 'LTP207', 
               'LTP210', 'LTP228', 'LTP229', 'LTP236', 'LTP242', 'LTP246', 'LTP249', 'LTP250', 'LTP251', 'LTP258', 
               'LTP259', 'LTP260', 'LTP265', 'LTP269', 'LTP273', 'LTP274', 'LTP278', 'LTP279', 'LTP280', 'LTP281']

In [63]:
# Question 3.1
### YOUR CODE HERE

In [64]:
# Question 3.2
### YOUR CODE HERE

In [65]:
# Question 3.3
### YOUR CODE HERE

## Question 4: Immediate Free Recall v. Delayed Free Recall

In this question, we will use the ltpFR experiment to compare the SPCs for immediate free recall and delayed free recall (a description of the experiment can be found under phase 2: https://memory.psych.upenn.edu/PEERS). For each subject, use sessions 7-13 and compare lists that have a between-word 'distractor' of 0 and an end-of-list 'final_distractor' of 0, 8000, or 16000 (ms).

1) Calculate between-subject averaged serial position curves for each of the 3 distractor conditions.  Plot the 3 SPCs on the same graph and label each curve in a legend.
2) Comment on the differences in the SPCs that result from the manipulation of the distractor.  What changes between immediate free recall and delayed free recall?

In [1]:
import pandas as pd; pd.set_option('display.max_columns', None)
import cmlreaders as cml

# let's take a look at the events for an ltpFR session
df = cml.get_data_index()
ltpFR_df = df.query("experiment == 'ltpFR'")
df_sess = ltpFR_df[(ltpFR_df['subject'] == 'LTP063') & (ltpFR_df['session'] == 8)]
reader = cml.CMLReader(subject = df_sess.iloc[0].subject, experiment = df_sess.iloc[0].experiment, session = df_sess.iloc[0].session)
evs = reader.load('events')
evs

Unnamed: 0,eegoffset,answer,case,color_b,color_g,color_r,distractor,eegfile,eogArtifact,experiment,final_distractor,final_math_correct,finalrecalled,font,intruded,intrusion,iscorrect,item_name,item_num,list,listtype,math_correct,montage,msoffset,mstime,phase,protocol,recalled,recog_conf,recog_resp,recog_rt,recognized,rectime,rej_time,rejected,resp,rt,serialpos,session,studytrial,subject,task,test,trial,type
0,121697,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,,-999,-999,-999,-999,0,0,1301341737850,,ltp,0,-999,0,-999,0,-999,-999,0,-999,652,-999,8,-999,LTP063,-999,"[0, 0, 0]",-999,SESS_START
1,134751,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,-999,,-999,-1,-999,-999,0,0,1301341763956,,ltp,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[-999, -999, -999]",-999,START
2,134763,15,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,1,,-999,-1,-999,-999,0,1,1301341763981,,ltp,-999,-999,-999,-999,-999,3271,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[6, 1, 8]",-999,PROB
3,136415,7,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,1,,-999,-1,-999,-999,0,1,1301341767285,,ltp,-999,-999,-999,-999,-999,1943,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[2, 4, 1]",-999,PROB
4,137402,13,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,-999,,-999,-999,1,,-999,-1,-999,-999,0,1,1301341769259,,ltp,-999,-999,-999,-999,-999,1793,-999,-999,-999,-999,-999,8,-999,LTP063,-999,"[3, 3, 7]",-999,PROB
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1592,2256554,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,TAPE,1433,-999,2,-999,0,20,1301346007507,,ltp,0,5,-999,-999,0,56836,-999,0,0,1084,7,8,2,LTP063,0,"[0, 0, 0]",12,RECOG_CONF
1593,2257258,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,WEB,1598,-999,-999,-999,0,0,1301346008915,,ltp,0,4,0,652,0,-999,-999,0,-999,-999,-999,8,-999,LTP063,-999,"[0, 0, 0]",12,RECOG_LURE
1594,2257584,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,WEB,1598,-999,-999,-999,0,20,1301346009567,,ltp,0,-999,0,-999,0,58896,-999,0,-999,-999,-999,8,-999,LTP063,-999,"[0, 0, 0]",12,RECOG_RESP
1595,2257950,-999,,-999,-999.0,-999.0,-999,/protocols/ltp/subjects/LTP063/experiments/ltp...,-1,ltpFR,-999,-999,0,,0,-999,-999,WEB,1598,-999,-999,-999,0,20,1301346010300,,ltp,0,4,-999,-999,0,59629,-999,0,-999,-999,-999,8,-999,LTP063,-999,"[0, 0, 0]",12,RECOG_CONF


For each session, we want to find lists where the 'distractor' column equals 0 and the 'final_distractor' column equals 0 (immediate free recall), 8000, or 16000 (delayed free recall).
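
The selection just described could be sketched as follows, using a toy table in place of real word events. The column names follow the events table shown above; the values are invented, and real events would first need to be restricted to word presentations.

```python
import pandas as pd

# Toy word events; column names follow the events table, values are invented.
toy = pd.DataFrame({
    "distractor":       [0, 0, 0, 0, 0, 0, 8000, 8000],
    "final_distractor": [0, 0, 8000, 8000, 16000, 16000, 8000, 8000],
    "serialpos":        [1, 2, 1, 2, 1, 2, 1, 2],
    "recalled":         [0, 1, 1, 0, 1, 1, 1, 1],
})

# Keep lists with no between-word distractor and one of the three end-of-list delays.
keep = (toy["distractor"] == 0) & toy["final_distractor"].isin([0, 8000, 16000])
selected = toy[keep]

# One SPC per condition: rows = final_distractor, columns = serial position.
spc_by_condition = (
    selected.groupby(["final_distractor", "serialpos"])["recalled"].mean().unstack()
)
print(spc_by_condition)
```

Each row of `spc_by_condition` is then one curve, which keeps the three conditions aligned on the same serial-position axis when plotting.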

In [69]:
# Question 4.1
### YOUR CODE HERE

Question 4.2

**YOUR ANSWER HERE**