# <center>An Independent Effect of Deficient Processing for List Memory?

### Background
The core idea of the deficient encoding hypothesis as an account of the spacing effect is that:
1. Item familiarity ("close acquaintance") impairs attention to, and memory for, subsequent encodings of that item.
2. Repeated experience of an item increases familiarity with that item.
3. Familiarity decreases with intervening experience, allowing better memory for spaced repetitions.
    
The MINERVA-DE model (Collins et al., 2020) implements these mechanisms to account for various effects in recognition memory. In the model, a short-term memory store called primary memory is maintained alongside a long-term secondary memory. When a word is attended, its familiarity is computed relative to the items in primary memory. The more familiar a studied item is relative to the items already in primary memory, the less well it is encoded in secondary memory. This process is called discrepancy encoding, because encoding is strongest for items that are discrepant with information already stored in primary memory.
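
To make the discrepancy-encoding idea concrete, here is a toy sketch. It does not reproduce the MINERVA-DE equations from Collins et al. (2020): the cosine-similarity familiarity measure, the `familiarity` and `encoding_strength` helpers, and the linear `1 - familiarity` rule are all assumptions chosen for illustration.

```python
import numpy as np

def familiarity(probe, primary_memory):
    """Mean cosine similarity between a probe and the traces in primary memory.

    Assumed measure for illustration only; returns 0.0 for an empty store.
    """
    if len(primary_memory) == 0:
        return 0.0
    traces = np.asarray(primary_memory, dtype=float)
    sims = traces @ probe / (np.linalg.norm(traces, axis=1) * np.linalg.norm(probe))
    return float(np.clip(sims.mean(), 0.0, 1.0))

def encoding_strength(probe, primary_memory):
    """Hypothetical rule: encode more strongly the more discrepant the probe is."""
    return 1.0 - familiarity(probe, primary_memory)

rng = np.random.default_rng(0)
item_a = rng.choice([-1.0, 1.0], size=50)
fillers = [rng.choice([-1.0, 1.0], size=50) for _ in range(3)]

massed = encoding_strength(item_a, [item_a])  # A is still in primary memory
spaced = encoding_strength(item_a, fillers)   # A has been displaced by other items
```

Under these assumptions, an immediately repeated item finds a copy of itself in primary memory and is encoded weakly, while a spaced repetition meets mostly unrelated traces and is encoded more strongly.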

I was interested in motivating a similar mechanism within the framework of retrieved context theory.

- Review spacing effect
- Review retrieved context theory and work relating it to spacing effect
- Review deficient encoding account of effect
- State that an account of repetition effects integrating these frameworks has so far gone unexplored.
- Draw a diagram outlining what such an account might look like

### Hypothesis
If during a free recall task
1. Short-term familiarity mediates memory for item repetitions as outlined above,
2. Short-term familiarity itself is mediated by a representation of temporal context reflecting "a recency-weighted average of information related to recently presented stimuli", and
3. Studying an item prompts retrieval and reinstatement of previous contextual associations

Then for an item $A$ originally presented at position $i$ during study, repetition of an item $B$ originally presented at position $i+1$ should enhance short-term familiarity with $A$ by prompting retrieval of shared contextual associations. This, in turn, should drive an independent spacing effect between the position of the second presentation of $B$ and a later second presentation of $A$.
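
The prediction can be illustrated with a toy context-drift simulation. This is a minimal sketch, not CMR itself: items are one-hot vectors, context is an exponentially recency-weighted average, the `final_context` helper and the values of `rho` and `gamma` are assumptions, and the unit-length normalization used in retrieved context models is skipped for simplicity.

```python
import numpy as np

def final_context(sequence, n_items, rho=0.7, gamma=0.5):
    """Drift a context vector through a study list and return its final state.

    rho: share of the previous context that survives each step (assumed value).
    gamma: weight on retrieved prior context when an item repeats (assumed value).
    """
    context = np.zeros(n_items)
    stored = {}  # item -> context state that was active when it was last studied
    for item in sequence:
        item_input = np.zeros(n_items)
        item_input[item] = 1.0
        if item in stored:
            # repetition: blend the item's own features with its retrieved context
            item_input = (1 - gamma) * item_input + gamma * stored[item]
        stored[item] = context.copy()
        context = rho * context + (1 - rho) * item_input
    return context

A, B = 0, 1
with_b_repeat = final_context([A, B, 2, 3, 4, B], n_items=6)
without_repeat = final_context([A, B, 2, 3, 4, 5], n_items=6)

# support for A in context just before a second presentation of A
support_after_repeat = with_b_repeat[A]
support_control = without_repeat[A]
```

In this sketch, repeating $B$ reinstates some of the context that was active when $B$ was first studied, which still carried a trace of $A$. Support for $A$ is therefore higher after the repetition of $B$ than after a novel item at the same position.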

### ...But a Problem
This conceptualization effectively ties short-term familiarity to contextual variability, making it impossible to tease apart the two accounts. Unfortunately, I tried testing this hypothesis as posed anyway!

In [None]:
import numpy as np
import pandas as pd
from repfr.datasets import prepare_repdata

trials, events, list_length, presentations, list_types, rep_data, subjects = prepare_repdata(
    '../data/repFR.mat')

events.head()

### Approach

I searched for evidence of this pattern using data from Lohnas (2014), focusing on trials with pure spaced lists of length 40 in which items were presented twice at lags 1-8. Lag is defined as the number of intervening items between a repeated item's two presentations. In the pure spaced lists, spacings of repeated items were chosen so that each of the lags 1-8 occurred with equal probability.

I identified all trials in the dataset with $A, B, ..., B, A$ subsequences in which no item other than $B$ was presented more than once between the initial $AB$ and the final $BA$. I also required at least one intervening item between presentations of $B$.
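
As a minimal sketch of these selection criteria on a made-up study list, the hypothetical helper `find_abba` below is a simplified stand-in for the notebook's own loops. The `min_lag=3` default is an assumption that encodes "two presentations of $B$ plus at least one item between them".

```python
import numpy as np

def find_abba(sequence, min_lag=3):
    """Return (A, B, lag) for each A, B, ..., B, A pattern in a study list."""
    sequence = np.asarray(sequence)
    found = []
    for item in np.unique(sequence):
        positions = np.where(sequence == item)[0]
        if len(positions) != 2:
            continue
        between = sequence[positions[0] + 1:positions[1]]
        lag = len(between)  # number of items intervening between the two A's
        if lag < min_lag:
            continue
        # B must be the only item presented twice between the two A's
        if lag - len(np.unique(between)) != 1:
            continue
        # B must immediately follow the first A and immediately precede the second
        if between[0] == between[-1]:
            found.append((int(item), int(between[0]), lag))
    return found

toy_list = [0, 1, 2, 3, 1, 0, 4, 5, 6, 4, 7, 5]
matches = find_abba(toy_list)  # only item 0 (A) with item 1 (B) qualifies, at lag 4
```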

For comparison, and to control for possible serial position or spacing effects, I matched each identified trial to trials in the dataset where the serial position, subject ID, and lag between presentations of $A$ were identical, but a lag was present between the second presentations of $B$ and of $A$: $A, B, ..., B, ..., A$. Where multiple matching trials could be found for a given $A, B, ..., B, A$, I constrained downstream comparisons to reflect a weighted average of all of them.
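
One way to picture the weighting is shown below with made-up values. This is an alternative sketch rather than the resampling approach the notebook uses, and the `target_id` column is a hypothetical name. Each matched control is weighted by the inverse of the number of controls found for its target, so every target contributes equally to the comparison.

```python
import pandas as pd

# hypothetical matched controls: two for target 0, one for target 1
controls = pd.DataFrame({
    "target_id": [0, 0, 1],
    "recalled": [1, 0, 1],
})

# weight each control by 1 / (number of controls matched to its target)
n_matches = controls.groupby("target_id")["recalled"].transform("count")
controls["weight"] = 1 / n_matches

weighted_recall = (controls["recalled"] * controls["weight"]).sum() / controls["weight"].sum()
unweighted_recall = controls["recalled"].mean()
```

Here the weighted recall rate is 0.75, while the unweighted mean of about 0.67 lets target 0 count twice.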

### Results

### Matching Features
We define a new function that finds matching subsequences this way.

In [None]:
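def match_features(target_group, list_type, startPos, subject, lagA):
    """Find repeated items matching a list type, subject, start position, and lag.

    target_group 0 keeps A, B, ..., B, A subsequences; target_group 1 keeps
    subsequences whose first and last intervening items differ.
    Relies on the globals presentations, list_types, subjects, and trials.
    """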
    result = []
    for trial_index, sequence in enumerate(presentations):
    
        # test for matching list_type
        if list_types[trial_index] != list_type:
            continue
            
        # test for matching subject
        if subjects[trial_index] != subject:
            continue

        for item in np.unique(sequence):
            list_positions = np.where(sequence == item)[0]

            # no use considering items presented just once
            if (len(list_positions) == 1):
                continue

            # test for matching startPos
            if list_positions[0] != startPos:
                continue

            # test for matching lag
            lag = list_positions[1] - list_positions[0] - 1
            subsequence = sequence[list_positions[0]+1:list_positions[1]]
            if lag != lagA:
                continue
                
            # track and avoid sequences where # unique items != lag - 1
            multiply_presented_items = lag - len(
                np.unique(subsequence))
            if multiply_presented_items != 1:
                continue
            
            # if item at i+1 and j-1 are the same, that's target group
            if subsequence[0] == subsequence[-1]:
                group = 0
            else:
                group = 1
                
            if group != target_group:
                continue
                
            if group == 0:
                lagB, lagB1, lagB2 = -1, -1, -1
            else:
                # find positions of multiply repeated item within subsequence
                for candidate in subsequence:
                    sub_positions = np.where(subsequence == candidate)[0]
                    if len(sub_positions) != 1:
                        lagB = sub_positions[1] - sub_positions[0] - 1
                        lagB1 = sub_positions[0]
                        lagB2 = len(subsequence) - sub_positions[1]
                        break
            
            result.append(
                [trial_index, list_types[trial_index], subjects[trial_index], "Lag Between Second B and A", item, 
                 item+1 in trials[trial_index], list_positions[0], 
                 lag, lagB, lagB1, lagB2])
    
    return result

### Building a DataFrame
For every subsequence in group 0 that we can find matching subsequences for in other groups (should we care if it's in all other groups? Figure that out later), we'll sample from the retrieved subsequences 100 times to build a dataframe with our controls.

In [None]:
import random

result = []

for trial_index, sequence in enumerate(presentations):
    
    if list_types[trial_index] != 3:
        continue
    
    for item in np.unique(sequence):
        list_positions = np.where(sequence == item)[0]

        # no use considering items presented just once
        if (len(list_positions) == 1):
            continue
        
        # also don't consider sequences with lag below a minimum
        lag = list_positions[1] - list_positions[0] - 1
        subsequence = sequence[list_positions[0]+1:list_positions[1]]

        if lag < 3:
            continue
            
        # track and avoid sequences where # unique items < lag - 1
        multiply_presented_items = lag - len(
            np.unique(subsequence))
        if multiply_presented_items > 1:
            continue
            
        # if item at i+1 and j-1 aren't the same, that's not the target group
        if subsequence[0] != subsequence[-1]:
            continue
        
        # we'll never include this group in analyses of B lags
        lagB, lagB1, lagB2 = lag-2, 0, 0
        
        # look for matched trials in alternative group to include in dataframe
        matched_trials1 = match_features(
            1, 3, list_positions[0], subjects[trial_index], lag)
        
        # if we can't find matched trials, go next
        if len(matched_trials1) == 0:
            continue
            
        # otherwise weight downstream analyses appropriately
        for i in range(100):
            result.append(
                [trial_index, list_types[trial_index], subjects[trial_index], 
                 "No Lag Between Second B and A", item, item+1 in trials[trial_index], list_positions[0], 
                 lag, lagB, lagB1, lagB2])
            
        # build a balanced sample from matched groups for comparison;
        # the empty case was skipped above, and random.choice also
        # covers the single-match case
        for i in range(100):
            result.append(random.choice(matched_trials1))
        
result = pd.DataFrame(
    result, columns=['trial', 'list_type', 'subject', 'group', 'item', 'recalled', 'startPos', 
                     'lagA', 'lagB', 'lagB1', 'lagB2'])
result.head()

In [None]:
result[result.group=="Lag Between Second B and A"].head()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

g = sns.FacetGrid(result, col="lagB2")
g.map_dataframe(sns.histplot, x="lagA", discrete=True, stat="probability")
g.set_axis_labels("Lag", "Proportion in Dataset")
plt.show()

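# correlate the two lags within the comparison group only; the target group
# has placeholder lagB values that are never meant to enter lag-B analyses
alt_group = result[result.group == "Lag Between Second B and A"]
alt_group.lagB2.corr(alt_group.lagA)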

### Comparison of Recall Probabilities

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

ax = sns.barplot(data=result, x="group", y="recalled", estimator=np.mean)

### Lag Distributions by Group

In [None]:
g = sns.FacetGrid(result, col="group")
g.map_dataframe(sns.histplot, x="lagA", discrete=True, stat="probability")
g.set_axis_labels("Lag Between Presentations of A", "Proportion in Dataset")
plt.show()

### Serial Position by Group

In [None]:
g = sns.FacetGrid(result, col="group")
g.map_dataframe(sns.histplot, x="startPos", discrete=True, stat='probability')
g.set_axis_labels("startPos", "Proportion in Dataset")
plt.show()

### Subject ID by Group

In [None]:
g = sns.FacetGrid(result, col="group")
g.map_dataframe(sns.histplot, x="subject", discrete=True, stat='probability')
g.set_axis_labels("Subject", "Proportion in Dataset")
plt.show()

### Correlation Analysis

In [None]:
alt_group = result[result.group == "Lag Between Second B and A"]
alt_group.lagB2.corr(alt_group.recalled)

The greater the lag between a second presentation of B and a second presentation of A, the greater the probability of recalling A. But there's a positive correlation between this second-presentation lag and the lag between presentations of A. So maybe that's just the regular old spacing effect. We aren't effectively controlling for subsequence length yet.
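
One possible way to control for subsequence length is to compute the correlation separately within each value of `lagA`. The sketch below uses simulated toy data, not the study's data, in which recall depends only on `lagA`. The generating rule and all numeric values are assumptions made purely to show the mechanics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({"lagA": rng.integers(3, 9, size=500)})

# lagB2 can never exceed lagA, so the two are correlated by construction
toy["lagB2"] = [rng.integers(1, lag) for lag in toy["lagA"]]

# in this toy data, recall probability depends on lagA only
toy["recalled"] = rng.random(500) < 0.3 + 0.05 * toy["lagA"]

# pooled correlation, confounded with lagA
raw_r = toy["lagB2"].corr(toy["recalled"].astype(float))

# correlation within each lagA stratum, where lagA is held constant
within_r = {
    lag: grp["lagB2"].corr(grp["recalled"].astype(float))
    for lag, grp in toy.groupby("lagA")
}
```

If the within-stratum correlations in the real data hovered around zero while the pooled correlation stayed positive, that would suggest the pooled relationship reflects the ordinary spacing effect rather than an independent effect of `lagB2`.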

## Results

### Discussion
Within CMR, a formal model implementing retrieved context theory, a learning rate scalar modulates how strongly items are encoded into memory. The deficient encoding hypothesis calls for a mechanism to be added to CMR that modulates the value of this scalar based on the familiarity of the currently encoded item. A hypothetical CMR-DE would thus track, at each encoding step, a measure of the current item's familiarity based on the current state of context. The learning rate for memories of the current item and its contextual associations would then vary inversely with its familiarity.
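
A minimal sketch of such a rule follows. Nothing here comes from an existing CMR implementation: the `modulated_learning_rate` helper, the dot-product familiarity measure, and the linear scaling are all assumptions about what a CMR-DE might look like.

```python
import numpy as np

def modulated_learning_rate(context, item_features, base_rate=1.0):
    """Hypothetical CMR-DE rule: scale the learning rate down with familiarity."""
    # familiarity is assumed to be the item's support in the current context
    familiarity = float(np.clip(context @ item_features, 0.0, 1.0))
    return base_rate * (1.0 - familiarity)

n_items = 4
item = np.eye(n_items)[0]
other = np.eye(n_items)[1]

fresh_context = other                      # context holds no trace of the item
recent_context = 0.6 * item + 0.4 * other  # the item was studied recently

rate_spaced = modulated_learning_rate(fresh_context, item)   # full learning rate
rate_massed = modulated_learning_rate(recent_context, item)  # reduced learning rate
```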

I attempted to test the deficient encoding hypothesis about repetition effects by focusing on the idea that familiarity might hinge on the contextual dynamics that organize memory according to CMR. Here, the main implication I focus on is the corollary hypothesis that familiarity-based memory impairment for a given item can be increased without necessarily experiencing that item. Instead, contextual states associated with the item from a previous presentation can be reinstated through repetition of items that were originally presented near that previous presentation and that share many of the item's contextual associations.

I found the opposite of the relationship I was looking for, even after trying to control for relevant variables. Unfortunately, even if I had found the expected relationship, these analyses would do nothing to identify a unique role for the deficient processing account of item spacing that contextual variability or study-phase retrieval accounts can't fill. Worse, I think that as long as one conceptualizes deficient processing as something that depends on how retrievable an item is from context, or even just the serial position of the last time an item was encoded, then **in principle any effect on recall connected to deficient processing is just as explicable in terms of the contextual variability account of repetition effects**. So this may be a dead end.

### References
Collins, R. N., Milliken, B., & Jamieson, R. K. (2020). MINERVA-DE: An instance model of the deficient processing theory. Journal of Memory and Language, 115, 104151.

Siegel, L. L., & Kahana, M. J. (2014). A retrieved context account of spacing and repetition effects in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(3), 755.