## What's At Stake When You Choose a Model Architecture?

<!--


Main ideas to highlight:
- Question: What's at stake when you're choosing between instance- and prototype-based architectures to model how humans do memory search
- To help find out, I developed an instance-based variant of an established prototype-based account of memory search called the context maintenance and retrieval model.
-->

What's at stake when you're choosing between instance- and prototype-based architectures to model how humans do memory search? To help find out, I developed an instance-based variant of an established prototype-based account of memory search called the context maintenance and retrieval model. I compared the variant and the original's capacity to account for human performance across various datasets using prediction-based model fitting and simulation of benchmark behavioral phenomena. Both variants performed similarly in my comparisons, demonstrating the architectural independence of the models' theoretical commitments and laying the groundwork for deeper integration or crossover between instance- and prototype-based modeling practices.

## Memory *associates* information based on a history of experiences

<!--
Slide visuals: 
- black box memory model

Main ideas to highlight:
- Memory *associates* information based on feature co-occurrence over a history of experiences
- Can be modeled with instance- or prototype-based frameworks
-->

To get into what I'm talking about, we can start with a simplified idea of memory as a system that associates cues with responses based on co-occurrence of features over some history of experiences. Seeing a flower can remind you of details from other times you've seen that flower, such as in a vase or while walking outside. Being in this meeting might remind you of similar meetings you've had, and so on.

In the cognitive modeling literature, research often distinguishes between two basic frameworks summarizing how human memory systems pull this off -- between **instance** and **prototype** theories.

## Instance-Based Models

<!--
Slide visuals: 
- Each piece of an instance-based model (traces, probe, activations, echo)
- The nonlinear similarity equation!

Main ideas to highlight:
- Representation. Discrete traces stored for each experience.
- Probe. An activation generated for each stored trace based on similarity to the reminder.
- Echo. A blend of coactivated traces, weighted based on relevance to the probe.
-->

Instance-based accounts of memory conceptualize learning as growing a collection of distinct memory traces, each a record of a unique event or experience. A reminder retrieves associations by activating each stored instance in parallel based on similarity. Traces highly similar to your reminder are especially activated, while very *dis*similar traces see their activations suppressed, thereby prioritizing relevant information. The weighted sum of representations retrieved from all these traces shouting at the same time is an "echo" your memory system replies with. 
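
To make that concrete, here's a minimal sketch of echo retrieval in Python. It assumes cosine similarity between probe and trace and a cubic activation function in the spirit of Hintzman's MINERVA 2; the function name `echo` and the toy traces are made up for illustration.

```python
import numpy as np

def echo(traces, probe, power=3):
    """Return the echo for a probe along with each trace's activation."""
    # Similarity of the probe to every stored trace, computed in parallel.
    # Cosine similarity is an assumption of this sketch.
    norms = np.linalg.norm(traces, axis=1) * np.linalg.norm(probe)
    similarity = traces @ probe / norms
    # An odd power preserves sign while shrinking weak similarities toward zero,
    # so only traces that closely match the probe contribute much.
    activation = similarity ** power
    # The echo is the activation-weighted sum of all traces.
    return activation @ traces, activation

# Three hypothetical traces, one row per stored experience
traces = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
probe = np.array([1.0, 1.0, 0.0, 0.0])
echo_vector, activation = echo(traces, probe)
```

Here the first trace matches the probe exactly and dominates the echo, the second shares one feature and contributes only weakly, and the third contributes nothing.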

The way I'm describing instance models here closely coheres with the architecture established in the 80s by Hintzman to account for performance on tasks like item recognition and frequency judgments without explicit storage of prototypes. But instance-based models are diverse and have been applied to account for an ambitious range of phenomena. Models such as the Generalized Context and Exemplar-Based Random Walk models pervade accounts of category learning and are about as paradigmatic. A line of integrative work like Gordon Logan's instance theory of attention and memory and the more recent CRU model has even pushed the deceptively simple architecture to account for patterns and processes across many research domains at once, subsuming normally distinguished cognitive processes under a single umbrella.

## Prototype-based models

<!--
Slide visuals: 
- Each piece of an instance-based model (traces, probe, activations, echo)

Main ideas to highlight:
- Representations progressively update to reflect prototypical features common across past experience
- Example: weights in a simple neural network
-->

Rather than assuming we store each experience separately, prototype-based models assume experience updates memory representations to reflect prototypical features that are common across past experience. A simple and frequent example of what I'm talking about is the set of weights in a linear associator network. In a linear associator, experiences activate units in an input layer to represent an array of features. These in turn pass activation to units in an output layer to form the memory system's responses - which we'll keep calling echoes for consistency. Weighted connections control the extent to which activation in an input unit drives activation in a given output unit, and get updated through some learning process with each new experience. For example, through the Hebbian learning rule, units that fire together wire together: the connection between input and output units coactivated in a learning episode is strengthened, thereby preserving a record of their correspondence during experience.
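
As a rough sketch of the idea, here's a tiny linear associator in Python with a Hebbian outer-product update. The class name, the learning rate, and the toy patterns are all hypothetical; the point is only that every experience is folded into one shared weight matrix instead of being stored separately.

```python
import numpy as np

class LinearAssociator:
    def __init__(self, n_inputs, n_outputs, learning_rate=1.0):
        # One weight per input-output connection, starting with no associations
        self.weights = np.zeros((n_outputs, n_inputs))
        self.learning_rate = learning_rate

    def learn(self, input_pattern, output_pattern):
        # Hebbian update: strengthen connections between coactive units
        self.weights += self.learning_rate * np.outer(output_pattern, input_pattern)

    def probe(self, input_pattern):
        # Activation passed from the input layer through the weights
        return self.weights @ input_pattern

memory = LinearAssociator(n_inputs=3, n_outputs=2)
memory.learn(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0]))
memory.learn(np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0]))
memory.learn(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0]))
response = memory.probe(np.array([1.0, 0.0, 0.0]))
```

Because the first input was paired with two different outputs, the response blends both; the network keeps no record of which pairing happened on which occasion.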

Since the rise of connectionist modeling, beginning with work on perceptrons and parallel distributed processing by researchers like Rosenblatt and McClelland, prototype-based models have become pervasive, though their distribution across domains seems to me to differ from that of instance-based models. Prototype-based accounts of categorization such as the Additive-Prototype model have been proposed, but leading accounts are mostly exemplar-based, and for good reasons. Models of free recall in the tradition I'm familiar with, though, are largely (though certainly not exclusively) prototype-based, primarily representing encoded associations within accumulatory representations. Like instance-based models, they've also been applied to account for a range of behaviors, like recognition, emotional modulation, and financial decision-making, and have undergone extensive iterative refinement.

## Model Architectures Trade Between Compression and Flexibility

<!--
Slide visuals: 
- Jamieson et al example + Homonym
- 

Main ideas to highlight:
- Flexible retrieval. Discrete traces keeps it easy to contact individuals without interference.
- Data Compression. Aggregated representations minimizes the costs of storage and retrieval.

-->

However, some fundamental differences between the frameworks have brought them into conflict. For example, instance-based models have faced criticism from theorists like the memory search scientist Mike Kahana for their lack of data compression. The multitrace account implies that the number of traces can increase without bound and that they are all contacted simultaneously, and both ideas are difficult to accept given the biological constraints that face cognition.

On the other hand, and of more central focus for this project, this compressive aspect of prototype-based models has been criticized for collapsing the many contexts in which an item occurs into a single best-fitting representation. Researchers like Jamieson and Jones in 2018 have argued that this can constrain the models from accounting for the full flexibility humans exhibit in domains like semantic memory and categorization.

An example by those researchers compared the architectures' capacity to retrieve homonyms -- words that carry multiple meanings, like the way "break" can describe stopping a car, reporting a story in the news, or smashing a plate, depending on the context you use it in. They took a few popular prototype-based models of semantic meaning including LSA - or latent semantic analysis - and compared them against an instance-based model of semantic memory. They encoded homonyms with other words in an artificial language, weighting the distribution of co-occurrence frequencies so that a word like "break" more frequently corresponds with the "stop"-based meaning of the word instead of the "smash" or "news-report"-based meanings. Then they compound-cued retrieval of different senses of the homonym and compared the retrieved representations to synonyms of each relevant word sense. These comparisons and others found that prototype-based models like LSA could not flexibly retrieve the distinct senses of homonyms as well as the instance model, an observation the authors explained in terms of compression and related to similar debates in the category learning literature.

## Current Focus: Memory Search
<!--
Slide visuals: 
- Memory search schematic
- Free recall task

Main ideas to highlight:
- Flexible retrieval. Discrete traces keeps it easy to contact individuals without interference.
- Data Compression. Aggregated representations minimizes the costs of storage and retrieval.

-->

The significance of these differences has been examined in a few domains, but not in our present focus -- memory SEARCH, a concept that extends our initial conceptualization of memory to include an iterative process where you might remember a piece of information using a probe, and then use the retrieved information to *update* your probe so you can access more information in memory.

Much of what we know about memory search is based on performance on the free recall task, where participants are presented a sequence of items (usually words) and then prompted to recall as many list items as possible in whatever order they want. 

In the recall phase of this task, participants tend to exhibit a pattern called the temporal contiguity effect: they tend to transition between items that were temporally contiguous on the study list. Analyses like the lag-CRP plot in the bottom right of this slide are applied to showcase this pattern. This is a paradigmatic example, not tied to any particular dataset. For each successive recall a subject makes, there's a serial "lag" between that recall and the previous recall. The current item may have been studied at position 5 while the previous item may have been studied at position 4, making their "lag" +1. Researchers tabulate the frequency of transitions at each lag and plot the conditional probability of making a transition of a given lag across datasets. The high values within the red rectangle of this plot, where lags are close to zero, illustrate the temporal contiguity effect.
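
For anyone who wants to see the tabulation spelled out, here's a minimal lag-CRP sketch in Python. It assumes a signed lag (current study position minus previous study position) and recall sequences with no repeats or intrusions; the function name `lag_crp` and the two toy trials are hypothetical.

```python
import numpy as np

def lag_crp(recall_trials, list_length):
    """Conditional response probability of each lag.

    recall_trials: lists of 1-indexed study positions in recall order.
    """
    lags = np.arange(-(list_length - 1), list_length)
    actual = np.zeros(len(lags))
    possible = np.zeros(len(lags))
    offset = list_length - 1  # maps a lag to its index in the arrays above
    for trial in recall_trials:
        recalled = set()
        for previous, current in zip(trial[:-1], trial[1:]):
            recalled.add(previous)
            # Every item not yet recalled defines a lag that was available
            for candidate in range(1, list_length + 1):
                if candidate not in recalled:
                    possible[candidate - previous + offset] += 1
            actual[current - previous + offset] += 1
    # Lags that were never available (such as lag 0) are left undefined
    with np.errstate(divide="ignore", invalid="ignore"):
        crp = np.where(possible > 0, actual / possible, np.nan)
    return lags, crp

lags, crp = lag_crp([[1, 2, 3], [4, 3]], list_length=4)
```

Dividing by the number of times each lag was actually available is what makes the measure conditional: a transition only counts against a lag if the participant could have made it.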

In turn, this pattern reinforces our view that participants iteratively evolve how they probe their memory based on previous probing to search for new items to retrieve.

## --- 

To account for phenomena like this temporal contiguity effect and how people might perform memory search, the formal literature has largely converged on retrieved context theories of memory search that center an evolving representation of temporal context as the dynamically updating probe driving the process.

By this account, during encoding, items are associated with states of temporal context in memory and vice versa. A key result of this is that items become associated with other items based on temporal contiguity. During recall, the current state of temporal context serves as the main memory probe and items are retrieved based on their contextual associations. The contextual probe is then updated based on the contextual associations of the retrieved item, thereby constructing a cue biased toward items temporally contiguous to the last recalled item, accounting for the temporal contiguity effect - and many other interesting memory phenomena.
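
Here's a small sketch of how a drifting context representation could produce that bias. It assumes the unit-length context update commonly used in retrieved context models, where a drift rate controls how much new input displaces the old state; the function name, the drift rate of 0.5, and the orthogonal inputs are illustrative assumptions.

```python
import numpy as np

def update_context(context, context_input, drift_rate):
    """Blend new input into the context state while keeping unit length."""
    context_input = context_input / np.linalg.norm(context_input)
    overlap = context @ context_input
    # rho scales the old state so that the updated context has length 1
    rho = np.sqrt(1 + drift_rate**2 * (overlap**2 - 1)) - drift_rate * overlap
    return rho * context + drift_rate * context_input

n_items = 3
context = np.zeros(n_items + 1)
context[0] = 1.0  # start-of-list context
states = []
for position in range(1, n_items + 1):
    # Each studied item feeds its own unit into context (simplifying assumption)
    context_input = np.zeros(n_items + 1)
    context_input[position] = 1.0
    context = update_context(context, context_input, drift_rate=0.5)
    states.append(context)

similarity_adjacent = states[0] @ states[1]  # study positions 1 and 2
similarity_remote = states[0] @ states[2]    # study positions 1 and 3
```

Context states from neighboring study positions overlap more than states from distant positions, so a cue built from one item's context favors its neighbors.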

For this project, I took an established specification of this theory - the context maintenance and retrieval model - and compared prototype- and instance-based implementations of its mechanisms across various datasets.

## ---

The original implementation of CMR is prototype-based, encoding and retrieving item-to-context and context-to-item associations using a pair of simple linear associator memories called MFC and MCF, with F referring to item features and C to contextual features. Encoding item features retrieves contextual associations that update the contextual state. Coactivated contextual and item features then update the weights of each network, establishing associations.

After exploring various possibilities, I found that an instance-based variant of this model based on the MINERVA 2 architecture I reviewed earlier can be readily implemented by storing coactive item and contextual features within a single memory trace, otherwise preserving model mechanisms. During retrieval, instead of using a separate memory to contact item-to-context or context-to-item associations, the content of the probe decides which associations are retrieved. A probe with item features constructs an echo aggregating traces with similar item features, thus pulling out a contextual representation, and vice versa otherwise.
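
A stripped-down sketch of that idea in Python follows. It is not the full model: it assumes one-hot item features, a unit-length context update with a drift rate of 0.5, traces stored after the item's input is integrated into context, and cosine similarity with cubic activation. All names and values are hypothetical.

```python
import numpy as np

n_items = 3
item_features = np.eye(n_items)   # one-hot item representations
context = np.zeros(n_items + 1)
context[0] = 1.0                  # start-of-list context unit
drift_rate = 0.5                  # hypothetical value

traces = []
for item in item_features:
    # Each item's context input is its own context unit (assumption)
    context_input = np.concatenate(([0.0], item))
    overlap = context @ context_input
    rho = np.sqrt(1 + drift_rate**2 * (overlap**2 - 1)) - drift_rate * overlap
    context = rho * context + drift_rate * context_input
    # One trace per study event: item features alongside current context
    traces.append(np.concatenate((item, context)))
traces = np.array(traces)

def echo(probe, power=3):
    norms = np.linalg.norm(traces, axis=1) * np.linalg.norm(probe)
    similarity = traces @ probe / norms
    return (similarity ** power) @ traces

# Context-to-item: probe with end-of-list context, item features left blank
context_probe = np.concatenate((np.zeros(n_items), context))
item_support = echo(context_probe)[:n_items]

# Item-to-context: probe with the second item's features, context left blank
item_probe = np.concatenate((item_features[1], np.zeros(n_items + 1)))
retrieved_context = echo(item_probe)[n_items:]
```

The same store answers both questions: which half of the probe is filled in determines whether the echo's item half or context half carries the useful information. In this toy run the end-of-list context supports the most recently studied item most strongly.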

I do simplify the model a bit with this explanation, but the simplicity of this approach offers a surprisingly neat recipe for integrating retrieved context theory (RCT), and with it an account of temporal order effects, into any pre-existing instance-based model: you sort of just have to track context across stimulus presentation and store its state along with a representation of a stimulus. Using and updating your contextual probe as specified by CMR is also key for model success, but yeah - it's a portable model.

## ---

Next, though, I'll outline how I know this approach actually works.

We apply an evaluation technique for free recall data surprisingly only recently introduced by Morton and Polyn in 2016. 

To simulate free recall, CMR assigns a probability to each item that still hasn't been recalled, given every simulated study and recall event that has happened so far in the trial.

To evaluate a model, we can simulate and record the probability of each event in a dataset, then pool the probabilities into a log-likelihood score.
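
Here's a toy version of that scoring step in Python. The fixed `support` vector is a hypothetical stand-in for the model: in CMR the support for each item would be recomputed after every recall, and stopping would get a probability too. The sketch only shows how per-event probabilities are renormalized over the remaining items and summed as logs.

```python
import numpy as np

def trial_log_likelihood(recalls, support):
    """Log-likelihood of one trial's recall sequence.

    recalls: 0-indexed item indices in the order they were recalled.
    support: the model's support for each item (held fixed in this toy version).
    """
    support = np.asarray(support, dtype=float)
    available = np.ones(len(support), dtype=bool)
    total = 0.0
    for item in recalls:
        # Probability of this recall among items not yet recalled
        probability = support[item] / np.sum(support[available])
        total += np.log(probability)
        available[item] = False
    return total

score = trial_log_likelihood(recalls=[1, 0], support=[1.0, 2.0, 1.0])
```

Summing these scores over every trial in a dataset gives the log-likelihood that gets maximized during fitting.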

Then using a genetic optimization algorithm called differential evolution, we can search for the model parameters that maximize the likelihood of our observed data. 
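
As an illustration of the search step, here's a sketch using SciPy's `differential_evolution`. Whether this is the implementation used for the fits reported here is an assumption. The one-parameter "model", its `recency` parameter, the bounds, and the three recall trials are invented so the example stays small.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical data: recall orders (0-indexed study positions) from three trials
trials = [[2, 1, 0], [2, 0, 1], [2, 1, 0]]

def negative_log_likelihood(parameters):
    (recency,) = parameters
    # Toy stand-in for a memory model: support grows with study position
    support = np.exp(recency * np.arange(3))
    total = 0.0
    for recalls in trials:
        available = np.ones(3, dtype=bool)
        for item in recalls:
            total += np.log(support[item] / np.sum(support[available]))
            available[item] = False
    # The optimizer minimizes, so flip the sign to maximize likelihood
    return -total

result = differential_evolution(negative_log_likelihood, bounds=[(-5.0, 5.0)])
best_recency = result.x[0]
```

The optimizer only needs a function from a parameter vector to a score plus bounds on each parameter, so swapping in either model variant is a matter of changing what `negative_log_likelihood` simulates.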

We can fit the model per participant to compare distributions of optimized log-likelihood scores between model variants. Alternatively, we can fit the models across an entire dataset and then produce a simulated dataset using model mechanisms, and compare important benchmark summary statistics between fitted models and the data. I do both!

## ---

First, I should clarify what I mean by benchmark summary statistics. These are the memory search phenomena that have received the most focus in the literature, rooted in the idea that free recall is composed of initiation, transitions, and then - though it's not getting much focus here - termination.

We've already talked about the lag-CRP and how it measures the temporal contiguity effect. 

##