# Modeling Repetitions
> Do CMR and ICMR track the consequences of item repetition for free recall differently?

There are some signs that InstanceCMR and CMR track the consequences of item repetitions for free recall differently. Here we use the dataset associated with a 2014 paper focused on spacing and repetition effects in free recall:

> Lohnas, L. J., & Kahana, M. J. (2014). A retrieved context account of spacing and repetition effects in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(3), 755.

We'll investigate these possible differences and develop an account of them. Before getting into the dataset, I'll test my intuitions about how repetitions drive learning in the two models with some toy simulations that plot how recall probabilities change as items are repeatedly encoded. Once that's cleared up, I'll do the fitting and assess outcomes.

## Premise

There seem to be substantive differences in the way CMR and ICMR apply a sensitivity parameter to nonlinearly increase the gap between highly supported and less supported recall outcomes. InstanceCMR, like MINERVA 2, has the option to apply its sensitivity scaling operation (i.e. an exponent applied to each value) to the activations of individual traces, that is, before they are integrated into a unitary vector tracking retrieval support. CMR, on the other hand, has no access to any trace activations and thus applies its sensitivity scaling to the integrated result.

The latter operation is hypothesized to differentiate strongly supported items from weakly supported ones more sharply than the former. Suppose a constant sensitivity parameter $\tau$ and that two distinct experiences each contributed a support of $c$ for a given feature unit in the current recall. Under trace-based sensitivity scaling, the retrieval support for that feature unit would be $c^{\tau} + c^{\tau}$. But under echo-based sensitivity scaling, support would be ${(c + c)}^{\tau}$, which is larger by a factor of $2^{\tau - 1}$ whenever $\tau > 1$.
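A quick numeric check makes the contrast concrete. The sketch below is illustrative only: the helper names `trace_scaled_support` and `echo_scaled_support` are hypothetical and not part of `instance_cmr`; they simply evaluate the two expressions above for a list of per-trace supports.

```python
# Hypothetical helpers (not from instance_cmr) contrasting the two scalings.

def trace_scaled_support(supports, tau):
    """Apply the exponent to each trace's support, then sum (trace-based)."""
    return sum(c ** tau for c in supports)


def echo_scaled_support(supports, tau):
    """Sum the supports first, then apply the exponent (echo-based)."""
    return sum(supports) ** tau


c = 0.5  # assumed support contributed by each of two experiences
for tau in (0.5, 1.0, 2.0):
    trace = trace_scaled_support([c, c], tau)
    echo = echo_scaled_support([c, c], tau)
    print(f"tau={tau}: trace-based={trace:.4f}, echo-based={echo:.4f}")
```

With $\tau = 2$ the echo-based quantity is twice the trace-based one; at $\tau = 1$ the two coincide, and for $\tau < 1$ the ordering reverses.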

When the feature representations associated with each presentation encoded into the models are all orthogonal, this distinction doesn't emerge. Since each trace corresponds to activation of a unique feature unit, no equivalent of the $c^{\tau} + c^{\tau}$ operation from the above example ever arises. But experiments where items are presented repeatedly offer an opportunity to explore this distinction. It could be that parameter fitting sidesteps the main consequences of this mechanistic difference, much as InstanceCMR and CMR learn similar retrieval dynamics despite implementing distinct associative networks with distinct learning algorithms. Alternatively, more subtle manipulations, such as the use of partially overlapping rather than identical-but-otherwise-orthogonal feature representations, could probe this mechanistic distinction more thoughtfully. We'll see!
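The orthogonality point can be checked with a toy example. This is a simplified sketch, not the models' actual retrieval code: each row is an assumed per-trace contribution to three feature units, and the function names `trace_based` and `echo_based` are hypothetical.

```python
# Toy comparison of the two scaling orders over several feature units.

def trace_based(contributions, tau):
    """Exponentiate each trace's contribution, then sum per feature unit."""
    n_units = len(contributions[0])
    return [sum(row[u] ** tau for row in contributions) for u in range(n_units)]


def echo_based(contributions, tau):
    """Sum contributions per feature unit, then exponentiate the total."""
    n_units = len(contributions[0])
    return [sum(row[u] for row in contributions) ** tau for u in range(n_units)]


tau = 2.0

# three traces, each supporting a different unit (orthogonal case)
orthogonal = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]

# the third trace repeats the first item, so two traces support unit 0
repeated = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.5, 0.0, 0.0]]

print(trace_based(orthogonal, tau), echo_based(orthogonal, tau))
print(trace_based(repeated, tau), echo_based(repeated, tau))
```

In the orthogonal case both orders give the same vector; once two traces support the same unit, only that unit's value diverges between the two schemes.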

## Initial Simulations

A straightforward way to begin analyzing how the models might differentially handle item repetitions is to measure how the probability of recalling an item given a static contextual cue changes as the item is repeatedly encoded. By the reasoning above, after $n$ encodings that each contribute support $c$, we expect the item's support to grow linearly in `InstanceCMR` ($n c^{\tau}$) but as a power of $n$ in `CMR` (${(nc)}^{\tau}$), which is superlinear whenever $\tau > 1$. If this reasoning survives scrutiny, that's a big deal.
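Setting aside normalization into probabilities, the raw support implied by the two schemes after $n$ encodings can be tabulated directly. This sketch assumes every repetition contributes the same support $c$, which the real models need not satisfy (context drift and primacy scaling can change each trace's contribution); `support_after_repetitions` is a hypothetical helper.

```python
def support_after_repetitions(n, c, tau):
    """Return (trace-based, echo-based) support after n equal encodings."""
    trace_based = n * c ** tau    # each trace scaled, then summed
    echo_based = (n * c) ** tau   # traces summed, then scaled
    return trace_based, echo_based


c, tau = 0.5, 2.0  # assumed values for illustration
for n in range(1, 6):
    trace_based, echo_based = support_after_repetitions(n, c, tau)
    print(f"n={n}: trace-based={trace_based:.2f}, echo-based={echo_based:.2f}")
```

Going from one to four encodings multiplies the trace-based support by 4 but the echo-based support by $4^{\tau} = 16$ under these assumed values.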

Parameters, contextual cue, and repeated item shouldn't matter here. We'll try to set them to values that aren't distracting and vary them to ensure our simulation results aren't contingent on any particular configuration.

In [None]:
#%env NUMBA_DISABLE_JIT 1

from instance_cmr.models import *
from instance_cmr.model_analysis import *
import numpy as np
import matplotlib.pyplot as plt

# selected parameters
parameters = {
    'item_count': 20,
    'presentation_count': 70,
    'encoding_drift_rate': .6,
    'start_drift_rate': .7,
    'recall_drift_rate': .6,
    'shared_support': 0.01,
    'item_support': 1.0,
    'learning_rate': .3,
    'primacy_scale': 1,
    'primacy_decay': 1,
    'stop_probability_scale': 0.01,
    'stop_probability_growth': 0.3,
    'choice_sensitivity': 2
}

experiment_count = 1000

### InstanceCMR

In [None]:
results = np.zeros((experiment_count, 1+parameters['presentation_count']-parameters['item_count']))

for experiment in range(experiment_count):
    # arbitrary item and contextual cue
    repeated_item = np.random.randint(parameters['item_count'])
    cue = np.concatenate((np.zeros(parameters['item_count']+1), np.random.rand(parameters['item_count']+1)))

    # initialize model
    model = InstanceCMR(**parameters)
    model.experience(np.eye(parameters['item_count'], parameters['item_count'] + 1, 1))
    results[experiment, 0] = model.outcome_probabilities(cue)[repeated_item+1]
    
    # track outcome probability of selected item as it is repeatedly encoded
    for i in range(parameters['presentation_count']-parameters['item_count']):
        
        model.experience(np.eye(parameters['item_count'], parameters['item_count'] + 1, 1)[repeated_item:repeated_item+1])
        
        # cue = np.concatenate((np.zeros(parameters['item_count']+1), model.context)) # for when i want context to be the cue
        results[experiment, i+1] = model.outcome_probabilities(cue)[repeated_item+1]

# plot an example trial    
#print('repeated item:\n', repeated_item, np.eye(parameters['item_count'], parameters['item_count'] + 1, 1)[repeated_item:repeated_item+1])
#print('cue:\n', cue)
#print('outcome probability for repeated item:\n', results[-1])
#plot_states(model.memory, "memory")

plt.plot(np.mean(results, axis=0))
plt.title('InstanceCMR Average Increase in Recall Probability of Repeated Item')
plt.show()

### CMR

In [None]:
results = np.zeros((experiment_count, 1+parameters['presentation_count']-parameters['item_count']))

for experiment in range(experiment_count):
    # arbitrary item and contextual cue
    repeated_item = np.random.randint(parameters['item_count'])
    cue = np.random.rand(parameters['item_count'] + 1)

    # initialize model
    model = CMR(**parameters)
    model.experience(np.eye(parameters['item_count'], parameters['item_count']))
    results[experiment, 0] = model.outcome_probabilities(cue)[repeated_item+1]
    
    # track outcome probability of selected item as it is repeatedly encoded
    for i in range(parameters['presentation_count']-parameters['item_count']):
        
        model.experience(np.eye(parameters['item_count'], parameters['item_count'])[repeated_item:repeated_item+1])
        
        #cue = model.context # for when i want context to be the cue
        results[experiment, i+1] = model.outcome_probabilities(cue)[repeated_item+1]
        
# plot an example trial    
#print('repeated item:\n', repeated_item, np.eye(parameters['item_count'], parameters['item_count'])[repeated_item:repeated_item+1])
#print('cue:\n', cue)
#print('outcome probability for repeated item:\n', results[-1])
#plot_states(model.mcf, "memory")

import matplotlib.pyplot as plt

plt.plot(np.mean(results, axis=0))
plt.title('CMR Average Increase in Recall Probability of Repeated Item')
plt.show()

The models seem to differ in the manner predicted. But whether this difference matters when it comes to fitting parameters to items only repeated a few times is ambiguous.

## The Dataset

Across 4 sessions, 35 subjects performed delayed free recall of 48 lists. Subjects were University of Pennsylvania undergraduates, graduates and staff, age 18-32. List items were drawn from a pool of 1638 words taken from the University of South Florida free association norms (Nelson, McEvoy, & Schreiber, 2004; Steyvers, Shiffrin, & Nelson, 2004, available at http://memory.psych.upenn.edu/files/wordpools/PEERS_wordpool.zip). Within each session, words were drawn without replacement. Words could repeat across sessions so long as they did not repeat in two successive sessions. Words were also selected to ensure that no strong semantic associates co-occurred in a given list (i.e., the semantic relatedness between any two words on a given list, as determined using WAS (Steyvers et al., 2004), did not exceed a threshold value of 0.55).

Subjects encountered four different types of lists:

1. Control lists that contained all once-presented items;
2. Pure massed lists containing all twice-presented items;
3. Pure spaced lists consisting of items presented twice at lags 1-8, where lag is defined as the number of intervening items between a repeated item's presentations;
4. Mixed lists consisting of once-presented, massed, and spaced items.

Within each session, subjects encountered three lists of each of these four types.

In each list there were 40 presentation positions, such that in the control lists each position was occupied by a unique list item, and in the pure massed and pure spaced lists, 20 unique words were presented twice to occupy the 40 positions. In the mixed lists 28 once-presented and six twice-presented words occupied the 40 positions. In the pure spaced lists, spacings of repeated items were chosen so that each of the lags 1-8 occurred with equal probability. In the mixed lists, massed repetitions (lag=0) and spaced repetitions (lags 1-8) were chosen such that each of the 9 lags of 0-8 were used exactly twice within each session. The order of presentation for the different list types was randomized within each session. For the first session, the first four lists were chosen so that each list type was presented exactly once. An experimenter sat in with the subject for these first four lists, though no subject had difficulty understanding the task.
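The counts in this design description can be cross-checked with a little arithmetic. The variable names below are introduced only for this check; the numbers all come from the description above.

```python
positions_per_list = 40

# list compositions as (once-presented words, twice-presented words)
control = (40, 0)
pure_repeated = (0, 20)   # applies to both pure massed and pure spaced lists
mixed = (28, 6)

# each composition should fill exactly 40 presentation positions
filled = {name: once + 2 * twice
          for name, (once, twice) in
          {"control": control, "pure": pure_repeated, "mixed": mixed}.items()}
print(filled)

# 4 sessions x 4 list types x 3 lists of each type per session
total_lists = 4 * 4 * 3
print(total_lists)

# mixed lists: 3 per session x 6 repeated words, spread over lags 0-8
repeated_words_per_session = 3 * mixed[1]
uses_per_lag = repeated_words_per_session / len(range(0, 9))
print(uses_per_lag)
```

The totals agree with the text: every list type fills 40 positions, the design yields 48 lists per subject, and the 18 repeated words in a session's mixed lists allow each of the 9 lags to be used exactly twice.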

The data for this experiment is stored in `data/repFR.mat`. We define a dedicated `prepare_repetition_data` function to build structures from the dataset that work with our existing data analysis and fitting functions.

As in `prepare_murd_data`, we need list lengths, a data frame for visualizations with psifr, and a trials array encoding recall events as sequences of presentation positions. But we'll also need an additional array tracking presentation order.
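To make the target structures concrete, here is a toy sketch of the two conversions involved: word-pool item numbers become within-list indices in order of first appearance, and recalled study positions are mapped back to those indices. The item numbers and recall sequence are made up, and `to_first_appearance_indices` is a hypothetical helper rather than part of the notebook's codebase.

```python
def to_first_appearance_indices(item_numbers):
    """Relabel items 0, 1, 2, ... in the order they first appear in the list."""
    first_seen = {}
    indices = []
    for number in item_numbers:
        if number not in first_seen:
            first_seen[number] = len(first_seen)
        indices.append(first_seen[number])
    return indices


# made-up word-pool ids; 812 and 45 are each presented twice
pres_itemnos = [812, 45, 812, 300, 45]
presentation = to_first_appearance_indices(pres_itemnos)
print(presentation)

# recalls are stored as 1-based study positions, padded with zeros
recalled_positions = [3, 4, 0, 0, 0]
recalled_items = [presentation[p - 1] for p in recalled_positions if p != 0]
print(recalled_items)
```

A dictionary lookup is used here instead of `list.index` so the relabeling stays linear in list length; for 40-position lists either approach is fast enough.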

In [None]:
# cell to temporarily disable jit while debugging
#%env NUMBA_DISABLE_JIT 1

import scipy.io as sio
import numpy as np
import pandas as pd
from psifr import fr

def prepare_repetition_data(path):
    
    # load all the data
    matfile = sio.loadmat(path, squeeze_me=True)['data'].item()
    subjects = matfile[0]
    pres_itemnos = matfile[4]
    recalls = matfile[6]
    list_types = matfile[7]
    list_length = matfile[12]
    
    # convert pres_itemnos into rows of unique indices for easier model encoding
    presentations = []
    for i in range(len(pres_itemnos)):
        seen = []
        presentations.append([])
        for p in pres_itemnos[i]:
            if p not in seen:
                seen.append(p)
            presentations[-1].append(seen.index(p))
    presentations = np.array(presentations)

    # discard intrusions and repeated recalls, then pad each trial with zeros
    trials = []
    for i in range(len(recalls)):
        trials.append([])
        
        trial = list(recalls[i])
        for t in trial:
            if (t > 0) and (t not in trials[-1]):
                trials[-1].append(t)
        
        while len(trials[-1]) < list_length:
            trials[-1].append(0)
            
    trials = np.array(trials)
    
    # encode dataset into psifr format
    data = []
    for trial_index, trial in enumerate(trials):
        presentation = presentations[trial_index]
        
        # every time the subject changes, reset list_index
        if not data or data[-1][0] != subjects[trial_index]:
            list_index = 0
        list_index += 1
        
        # add study events
        for presentation_index, presentation_event in enumerate(presentation):
            data += [[subjects[trial_index], 
                      list_index, 'study', presentation_index+1, presentation_event]]
            
        # add recall events
        for recall_index, recall_event in enumerate(trial):
            if recall_event != 0:
                data += [[subjects[trial_index], list_index, 
                          'recall', recall_index+1, presentation[recall_event-1]]]
                
    data = pd.DataFrame(data, columns=[
        'subject', 'list', 'trial_type', 'position', 'item'])
    merged = fr.merge_free_recall(data)
    
    return trials, merged, list_length, presentations, list_types, data

In [None]:
trials, events, list_length, presentations, list_types, rep_data = prepare_repetition_data('../data/repFR.mat')

events.head()

Unnamed: 0,subject,list,item,input,output,study,recall,repeat,intrusion
0,1,1,0,1,1.0,True,True,0,False
1,1,1,1,2,2.0,True,True,0,False
2,1,1,2,3,3.0,True,True,0,False
3,1,1,3,4,4.0,True,True,0,False
4,1,1,4,5,5.0,True,True,0,False


In [None]:
(presentations[list_types==4] == presentations[list_types==4][0]).all()

False

In [None]:
np.unique(presentations[list_types==3], axis=0)

array([[ 0,  1,  0, ..., 17, 18, 19],
       [ 0,  1,  0, ..., 17, 18, 19],
       [ 0,  1,  0, ..., 17, 18, 19],
       ...,
       [ 0,  1,  2, ..., 16, 19, 17],
       [ 0,  1,  2, ..., 15, 17, 16],
       [ 0,  1,  2, ..., 16, 19, 18]])

In [None]:
presentations[list_types==3][0]

array([ 0,  1,  2,  3,  4,  5,  4,  6,  1,  0,  7,  2,  3,  7,  5,  6,  8,
        9, 10, 11, 10, 12, 13, 14,  9,  8, 15, 12, 11, 13, 15, 16, 14, 17,
       18, 19, 16, 19, 18, 17])
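As a sanity check on the lag definition (the number of intervening items between a repeated item's two presentations), the lags in the list printed above can be recomputed by hand. The array is copied from that output; `repetition_lags` is a hypothetical helper that assumes each item appears at most twice.

```python
# presentation order copied from presentations[list_types==3][0] above
presentation = [0, 1, 2, 3, 4, 5, 4, 6, 1, 0, 7, 2, 3, 7, 5, 6, 8,
                9, 10, 11, 10, 12, 13, 14, 9, 8, 15, 12, 11, 13, 15, 16, 14, 17,
                18, 19, 16, 19, 18, 17]


def repetition_lags(presentation):
    """Map each repeated item to the count of items between its presentations."""
    first_position = {}
    lags = {}
    for position, item in enumerate(presentation):
        if item in first_position:
            lags[item] = position - first_position[item] - 1
        else:
            first_position[item] = position
    return lags


lags = repetition_lags(presentation)
print(lags)
print(min(lags.values()), max(lags.values()))
```

All 20 items in this list repeat with lags between 1 and 8, as the design specifies for pure spaced lists.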

In [None]:
presentations[list_types==3][1]

array([ 0,  1,  2,  0,  1,  3,  4,  5,  6,  7,  3,  2,  4,  8,  6,  5,  7,
        9,  8, 10, 11, 12, 11, 10, 13,  9, 12, 14, 13, 15, 16, 14, 15, 16,
       17, 18, 19, 17, 18, 19])

## Configuring the Parameter Search
We'll construct unique likelihood functions for repetition datasets since they require additionally specifying the sequence of item presentations in each trial.
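As a rough sketch of the bookkeeping such a likelihood function performs: the probability of each observed recall event (including the final stop event) is written into an array initialized to ones, and the cost is the negative sum of logs. Unused cells stay at one and so contribute nothing. The probabilities below are made up for illustration, and `negative_log_likelihood` is a hypothetical helper.

```python
import math


def negative_log_likelihood(event_probabilities):
    """Negative sum of log-probabilities over all cells of a 2D list."""
    return -sum(math.log(p) for row in event_probabilities for p in row)


# two trials, up to four recall events each; ones mark unused cells
likelihood = [[1.0] * 4 for _ in range(2)]

# trial 0: two recalls, then a stop event (made-up model probabilities)
likelihood[0][0] = 0.5
likelihood[0][1] = 0.25
likelihood[0][2] = 0.1

# trial 1: one recall, then a stop event
likelihood[1][0] = 0.2
likelihood[1][1] = 0.05

print(negative_log_likelihood(likelihood))
```

Because $\log 1 = 0$, padding the array with ones lets trials of different lengths share one rectangular array without biasing the total.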

### InstanceCMR

In [None]:
import numpy as np
from numba import njit, prange
from numba.typed import List
from instance_cmr.models import *
from instance_cmr.model_analysis import *
import numpy as np
import matplotlib.pyplot as plt

@njit(fastmath=True, nogil=True, parallel=True)
def icmr_rep_likelihood(
        trials, presentations, list_length, encoding_drift_rate, start_drift_rate, 
        recall_drift_rate, shared_support, item_support, learning_rate, 
        primacy_scale, primacy_decay, stop_probability_scale, 
        stop_probability_growth, choice_sensitivity):
    """
    Generalized cost function for fitting the InstanceCMR model optimized 
    using the numba library.
    
    Output scales inversely with the likelihood that the model and specified 
    parameters would generate the specified trials. For model fitting, is 
    usually wrapped in another function that fixes and frees parameters for 
    optimization.

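    **Arguments**:
    - trials: int64-array where rows identify a unique trial of responses 
        and columns correspond to a unique recall index.  
    - presentations: int64-array where rows give the item index presented at 
        each study position of the corresponding trial.  
    - list_length: number of study positions in each list.  
    - A configuration for each parameter of `InstanceCMR` as delineated in 
        `Formal Specification`.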

    **Returns** the negative sum of log-likelihoods across specified trials 
    conditional on the specified parameters and the mechanisms of InstanceCMR.
    """

    likelihood = np.ones((len(trials), list_length))

    for trial_index in prange(len(trials)):

        item_count = np.max(presentations[trial_index])+1
        items = np.eye(item_count, item_count + 1, 1)
        
        model = InstanceCMR(
            item_count, list_length, encoding_drift_rate, start_drift_rate, 
            recall_drift_rate, shared_support, item_support, learning_rate, 
            primacy_scale, primacy_decay, stop_probability_scale, 
            stop_probability_growth, choice_sensitivity)
        
        model.experience(items[presentations[trial_index]])
        trial = trials[trial_index]

        model.force_recall()
        for recall_index in range(len(trial) + 1):

            # identify index of item recalled; if zero then recall is over
            if recall_index == len(trial) and len(trial) < item_count:
                recall = 0
            elif trial[recall_index] == 0:
                recall = 0
            else:
                recall = presentations[trial_index][trial[recall_index]-1] + 1

            # store probability of and simulate recalling item with this index
            activation_cue = np.hstack((np.zeros(model.item_count + 1), model.context))
            likelihood[trial_index, recall_index] = \
                model.outcome_probabilities(activation_cue)[recall]

            if recall == 0:
                break
            model.force_recall(recall)

        # reset model to its pre-retrieval (but post-encoding) state
        model.force_recall(0)

    return -np.sum(np.log(likelihood))

def icmr_rep_objective_function(data_to_fit, presentations, list_length, fixed_parameters, free_parameters):
    """
    Generates and returns an objective function for input to support search 
    through parameter space for ICMR model fit using an optimization function.

    Arguments:  
    - data_to_fit: array where rows identify a unique trial of responses and 
        columns correspond to a unique recall index  
    - presentations: array where rows give the item index presented at each 
        study position of the corresponding trial  
    - list_length: number of study positions in each list  
    - fixed_parameters: dictionary mapping parameter names to values they'll 
        be fixed to during search, overridden by free_parameters if overlap  
    - free_parameters: list of strings naming parameters for fit during search  

    Returns a function that accepts a vector x specifying arbitrary values for 
    free parameters and returns evaluation of icmr_rep_likelihood using all 
    parameters and the provided data.
    """
    return lambda x: icmr_rep_likelihood(data_to_fit, presentations, list_length, **{**fixed_parameters, **{
        free_parameters[i]:x[i] for i in range(len(x))}})
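
The closure pattern used by this objective function can be seen in isolation with a toy cost. This is a minimal sketch: `make_objective` and `toy_cost` are hypothetical stand-ins, and only the dictionary-merging logic mirrors the function above.

```python
def make_objective(cost, fixed_parameters, free_parameters):
    """Return f(x) that evaluates cost with x mapped onto the free parameters."""
    def objective(x):
        candidate = dict(zip(free_parameters, x))
        # free values are merged last, so they override fixed ones on overlap
        return cost(**{**fixed_parameters, **candidate})
    return objective


def toy_cost(learning_rate, choice_sensitivity):
    """Made-up cost with its minimum at learning_rate=0.3, sensitivity=2."""
    return (learning_rate - 0.3) ** 2 + (choice_sensitivity - 2) ** 2


fixed = {'learning_rate': 0.3, 'choice_sensitivity': 1.0}
objective = make_objective(toy_cost, fixed, ['choice_sensitivity'])

print(objective([1.0]))  # evaluates the cost at the fixed value
print(objective([2.0]))  # the free value overrides the fixed entry
```

The optimizer only ever sees a plain vector `x`; the order of names in `free_parameters` is what gives each position its meaning, so `bounds` must follow the same order.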

In [None]:

lb = np.finfo(float).eps
hand_fit_parameters = {
    'encoding_drift_rate': .8,
    'start_drift_rate': .7,
    'recall_drift_rate': .8,
    'shared_support': 0.01,
    'item_support': 1.0,
    'learning_rate': .3,
    'primacy_scale': 1,
    'primacy_decay': 1,
    'stop_probability_scale': 0.01,
    'stop_probability_growth': 0.3,
    'choice_sensitivity': 2
}
icmr_rep_likelihood(trials[:80], presentations[:80], list_length, **hand_fit_parameters)

4443.69335527225

4443.693355272251

In [None]:
%%timeit
icmr_rep_likelihood(trials[:160], presentations[:160], list_length, **hand_fit_parameters)

In [None]:
from scipy.optimize import differential_evolution
import numpy as np

free_parameters = [
    'encoding_drift_rate',
    'start_drift_rate',
    'recall_drift_rate',
    'shared_support',
    'item_support',
    'learning_rate',
    'primacy_scale',
    'primacy_decay',
    'stop_probability_scale',
    'stop_probability_growth',
    'choice_sensitivity']

lb = np.finfo(float).eps
ub = 1-np.finfo(float).eps

bounds = [
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, 100),
    (lb, 100),
    (lb, ub),
    (lb, 10),
    (lb, 10)
]

# cost function to be minimized
# ours scales inversely with the probability that the data could have been 
# generated using the specified parameters and our model
cost_function = icmr_rep_objective_function(trials, presentations, list_length, {}, free_parameters)

result = differential_evolution(cost_function, bounds, disp=True)
print(result)

Sensitivity scaling after integration:
```
     fun: 3619.5817057890545
     jac: array([ 0.03046807, -0.09708856,  0.08521965, 13.9653821 ,  0.5622951 ,
       -0.04638423,  1.07802408,  0.        ,  2.44758667,  0.78562152,
       -0.30640876])
 message: 'Optimization terminated successfully.'
    nfev: 9549
     nit: 48
 success: True
       x: array([8.87278300e-01, 9.13467071e-01, 9.85035079e-01, 1.20111257e-03,
       7.69001004e-01, 2.06474640e-01, 6.73100678e+00, 2.46920873e+01,
       9.43637973e-03, 1.72443952e-01, 8.40156503e-01])
```

And before:

```
     fun: 4332.436283221303
     jac: array([-4.87525540e+00,  2.36468623e-02, -2.38242136e+01, -6.44049578e+00,
       -1.98451744e-01,  1.97158885e+05,  1.86810196e-01,  2.94930942e+00,
       -1.06220796e+01, -2.01659532e+01,  5.40596372e+01])
 message: 'Optimization terminated successfully.'
    nfev: 5649
     nit: 20
 success: True
       x: array([1.38795654e-04, 2.07991977e-01, 7.48665206e-01, 5.56069504e-02,
       3.95373716e-01, 2.22044605e-16, 6.00784478e+01, 3.28006823e-01,
       1.00836725e-02, 1.68024644e-01, 7.52043690e-01])
```
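
The `x` vectors are easier to compare once paired with parameter names. The sketch below assumes the entries of `x` follow the order of `free_parameters`, which is how the objective function maps them; the values are copied from the two results above, and the variable names are introduced only for this comparison.

```python
free_parameters = [
    'encoding_drift_rate', 'start_drift_rate', 'recall_drift_rate',
    'shared_support', 'item_support', 'learning_rate', 'primacy_scale',
    'primacy_decay', 'stop_probability_scale', 'stop_probability_growth',
    'choice_sensitivity']

# fitted vectors copied from the optimizer output above
x_after_integration = [
    8.87278300e-01, 9.13467071e-01, 9.85035079e-01, 1.20111257e-03,
    7.69001004e-01, 2.06474640e-01, 6.73100678e+00, 2.46920873e+01,
    9.43637973e-03, 1.72443952e-01, 8.40156503e-01]
x_before_integration = [
    1.38795654e-04, 2.07991977e-01, 7.48665206e-01, 5.56069504e-02,
    3.95373716e-01, 2.22044605e-16, 6.00784478e+01, 3.28006823e-01,
    1.00836725e-02, 1.68024644e-01, 7.52043690e-01]

fitted_after = dict(zip(free_parameters, x_after_integration))
fitted_before = dict(zip(free_parameters, x_before_integration))

print(f"{'parameter':<24} {'after':>12} {'before':>12}")
for name in free_parameters:
    print(f"{name:<24} {fitted_after[name]:>12.4g} {fitted_before[name]:>12.4g}")
```

Read this way, the "before" fit places `learning_rate` at the lower bound of the search range (machine epsilon), which may be worth keeping in mind when interpreting its higher cost.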

In [None]:
assert(False)

### CMR

In [None]:
import numpy as np
from numba import njit
from numba.typed import List
from instance_cmr.models import *
from instance_cmr.model_analysis import *
import numpy as np
import matplotlib.pyplot as plt

#@njit(fastmath=True, nogil=True)
def cmr_rep_likelihood(
        trials, presentations, list_length, encoding_drift_rate, start_drift_rate, 
        recall_drift_rate, shared_support, item_support, learning_rate, 
        primacy_scale, primacy_decay, stop_probability_scale, 
        stop_probability_growth, choice_sensitivity):
    """
    Generalized cost function for fitting the CMR model. The numba `njit` 
    decorator is currently commented out, so this runs as plain Python.
    
    Output scales inversely with the likelihood that the model and specified 
    parameters would generate the specified trials. For model fitting, is 
    usually wrapped in another function that fixes and frees parameters for 
    optimization.

    **Arguments**:
    - trials: int64-array where rows identify a unique trial of responses 
        and columns correspond to a unique recall index.  
    - presentations: int64-array where rows give the item index presented at 
        each study position of the corresponding trial.  
    - list_length: number of study positions in each list.  
    - A configuration for each parameter of `CMR` as delineated in 
        `Formal Specification`.

    **Returns** the negative sum of log-likelihoods across specified trials 
    conditional on the specified parameters and the mechanisms of CMR.
    """

    likelihood = np.ones((len(trials), list_length))

    for trial_index in range(len(trials)):

        item_count = np.max(presentations[trial_index])+1
        items = np.eye(item_count, item_count)
        
        model = CMR(
            item_count, list_length, encoding_drift_rate, start_drift_rate, 
            recall_drift_rate, shared_support, item_support, learning_rate, 
            primacy_scale, primacy_decay, stop_probability_scale, 
            stop_probability_growth, choice_sensitivity)
        
        model.experience(items[presentations[trial_index]])
        trial = trials[trial_index]

        model.force_recall()
        for recall_index in range(len(trial) + 1):

            # identify index of item recalled; if zero then recall is over
            if recall_index == len(trial) and len(trial) < item_count:
                recall = 0
            elif trial[recall_index] == 0:
                recall = 0
            else:
                recall = presentations[trial_index][trial[recall_index]-1] + 1

            # store probability of and simulate recalling item with this index
            likelihood[trial_index, recall_index] = \
                model.outcome_probabilities(model.context)[recall]

            if recall == 0:
                break
            model.force_recall(recall)

        # reset model to its pre-retrieval (but post-encoding) state
        model.force_recall(0)

    return -np.sum(np.log(likelihood))

def cmr_rep_objective_function(data_to_fit, presentations, list_length, fixed_parameters, free_parameters):
    """
    Generates and returns an objective function for input to support search 
    through parameter space for CMR model fit using an optimization function.

    Arguments:  
    - data_to_fit: array where rows identify a unique trial of responses and 
        columns correspond to a unique recall index  
    - presentations: array where rows give the item index presented at each 
        study position of the corresponding trial  
    - list_length: number of study positions in each list  
    - fixed_parameters: dictionary mapping parameter names to values they'll 
        be fixed to during search, overridden by free_parameters if overlap  
    - free_parameters: list of strings naming parameters for fit during search  

    Returns a function that accepts a vector x specifying arbitrary values for 
    free parameters and returns evaluation of cmr_rep_likelihood using all 
    parameters and the provided data.
    """
    return lambda x: cmr_rep_likelihood(data_to_fit, presentations, list_length, **{**fixed_parameters, **{
        free_parameters[i]:x[i] for i in range(len(x))}})

In [None]:

lb = np.finfo(float).eps
hand_fit_parameters = {
    'encoding_drift_rate': .8,
    'start_drift_rate': .7,
    'recall_drift_rate': .8,
    'shared_support': 0.01,
    'item_support': 1.0,
    'learning_rate': .3,
    'primacy_scale': 1,
    'primacy_decay': 1,
    'stop_probability_scale': 0.01,
    'stop_probability_growth': 0.3,
    'choice_sensitivity': 2
}
cmr_rep_likelihood(trials[:80], presentations[:80], list_length, **hand_fit_parameters)

4443.693355272251

In [None]:
%%timeit
cmr_rep_likelihood(trials[:80], presentations[:80], list_length, **hand_fit_parameters)

In [None]:
from scipy.optimize import differential_evolution
import numpy as np

free_parameters = [
    'encoding_drift_rate',
    'start_drift_rate',
    'recall_drift_rate',
    'shared_support',
    'item_support',
    'learning_rate',
    'primacy_scale',
    'primacy_decay',
    'stop_probability_scale',
    'stop_probability_growth',
    'choice_sensitivity']

lb = np.finfo(float).eps
ub = 1-np.finfo(float).eps

bounds = [
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, 100),
    (lb, 100),
    (lb, ub),
    (lb, 10),
    (lb, 10)
]

# cost function to be minimized
# ours scales inversely with the probability that the data could have been 
# generated using the specified parameters and our model
cost_function = cmr_rep_objective_function(trials[:100], presentations[:100], list_length, {}, free_parameters)

result = differential_evolution(cost_function, bounds, disp=True)
print(result)

## Fit Visualization

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

def visualize_fit(model_class, parameters, data, rep_data, trials, presentations, list_length, items, data_query=None, experiment_count=100, savefig=False):
    """
    Apply organizational analyses to visually compare the behavior of the model with these parameters against
    specified dataset.
    """
    
    # generate simulation data from model
    sim = []
    for experiment in range(experiment_count):
        for trial_index in range(len(trials)):
            
            # build a fresh model for each simulated trial; keyword names match
            # the parameter dictionary used in the initial simulations
            item_count = np.max(presentations[trial_index])+1
            model = model_class(**{
                **parameters, 'item_count': item_count, 'presentation_count': list_length})
            model.experience(items[presentations[trial_index]])

            # record each simulated recall as a psifr recall event
            sim += [[experiment*10000+trial_index, 0, 'recall', i + 1, o] for i, o in enumerate(model.free_recall())]

    sim = pd.DataFrame(sim, columns=['subject', 'list', 'trial_type', 'position', 'item'])
    sim = pd.concat([sim, rep_data[rep_data['trial_type']=='study'].query(data_query).reset_index()])
    sim_data = fr.merge_free_recall(sim)
    
    # generate simulation-based spc, pnr, lag_crp
    sim_spc = fr.spc(sim_data).reset_index()
    sim_pfr = fr.pnr(sim_data).query('output <= 1') .reset_index()
    sim_lag_crp = fr.lag_crp(sim_data).reset_index()
    
    # generate data-based spc, pnr, lag_crp
    data_spc = fr.spc(data).query(data_query).reset_index()
    data_pfr = fr.pnr(data).query('output <= 1').query(data_query).reset_index()
    data_lag_crp = fr.lag_crp(data).query(data_query).reset_index()
    
    # combine representations
    data_spc['Source'] = 'Data'
    sim_spc['Source'] = model_class.__name__
    combined_spc = pd.concat([data_spc, sim_spc], axis=0)
    
    data_pfr['Source'] = 'Data'
    sim_pfr['Source'] = model_class.__name__
    combined_pfr = pd.concat([data_pfr, sim_pfr], axis=0)
    
    data_lag_crp['Source'] = 'Data'
    sim_lag_crp['Source'] = model_class.__name__
    combined_lag_crp = pd.concat([data_lag_crp, sim_lag_crp], axis=0)
    
    # generate plots of result
    # spc
    g = sns.FacetGrid(dropna=False, data=combined_spc)
    g.map_dataframe(sns.lineplot, x='input', y='recall', hue='Source')
    g.set_xlabels('Serial position')
    g.set_ylabels('Recall probability')
    plt.title('P(Recall) by Serial Position Curve')
    g.add_legend()
    g.set(ylim=(0, 1))
    if savefig:
        plt.savefig('figures/{}_fit_spc.jpeg'.format(model_class.__name__), bbox_inches='tight')
    else:
        plt.show()
    
    # pfr
    h = sns.FacetGrid(dropna=False, data=combined_pfr)
    h.map_dataframe(sns.lineplot, x='input', y='prob', hue='Source')
    h.set_xlabels('Serial position')
    h.set_ylabels('Probability of First Recall')
    plt.title('P(First Recall) by Serial Position')
    h.add_legend()
    h.set(ylim=(0, 1))
    if savefig:
        plt.savefig('figures/{}_fit_pfr.jpeg'.format(model_class.__name__), bbox_inches='tight')
    else:
        plt.show()
    
    # lag crp
    max_lag = 5
    filt_neg = f'{-max_lag} <= lag < 0'
    filt_pos = f'0 < lag <= {max_lag}'
    i = sns.FacetGrid(dropna=False, data=combined_lag_crp)
    i.map_dataframe(
        lambda data, **kws: sns.lineplot(data=data.query(filt_neg),
                                         x='lag', y='prob', hue='Source', **kws))
    i.map_dataframe(
        lambda data, **kws: sns.lineplot(data=data.query(filt_pos),
                                         x='lag', y='prob', hue='Source', **kws))
    i.set_xlabels('Lag')
    i.set_ylabels('Recall Probability')
    plt.title('Recall Probability by Item Lag')
    i.add_legend()
    i.set(ylim=(0, 1))
    if savefig:
        plt.savefig('figures/{}_fit_crp.jpeg'.format(model_class.__name__), bbox_inches='tight')
    else:
        plt.show()

In [None]:
visualize_fit()