# <center> What Comprehension Has to Do With Memory:<br>A First Look at the Landscape Model
    
<center> <img src="https://pbs.twimg.com/media/E4bioegWUAATLwP.jpg" width=70%> </center>

Models of semantic organization in free recall that we've explored (e.g., Morton & Polyn, 2016) have more or less supposed that the semantic associations between items that shape recall are static across the course of list presentation. But when it comes to recall for narratives (written accounts of connected events or ideas), this assumption is straightforwardly violated. Theories of reading comprehension all but equate the process with the *evolution* of pre-existing semantic associations between story components into an integrated representation of the overall narrative. This basic idea - that alterations of item associations during encoding might not depend entirely on temporal features - defines a frontier for memory research that's gone largely unexplored by retrieved context theorists.

To begin that exploration, we here examine what one comprehension model - the revised landscape model (LS-R) - can tell us about free recall that a retrieved context model like CMR cannot.

## Differences and Homologies Between Accounts of Word List and Narrative Memory

Retrieved context models like CMR and comprehension models like LS-R are both largely connectionist. The effect of encoding a new piece of information into memory also proceeds along two similar abstract steps:

1. Iterative updates to a slowly-drifting representation of recent experience based on a function of current experience and pre-existing memory associations
2. Corresponding updates to a long-term associative memory store based on item coactivations within the shorter-term recency-weighted representation 

The dynamics updating these dual representations differ, of course, and how they differ depends on the models under consideration. However, some key features of comprehension models that don't come up in retrieved context theory include:

### Convergence
How strongly a piece of information gets activated depends on support received from related information. In memory, this drives already highly connected idea units in a story toward even greater connectivity by the end of reading, and vice versa for more isolated units. Dynamics like these don't emerge in a model like CMR.

### Mapping
Concrete linguistic structures in a story can relate idea units in memory based on properties aside from temporal contiguity or even semantic similarity. Different models characterize this mapping process in different ways and with different degrees of sophistication, but all emphasize its importance.

### Coherence-Based Retrieval
Comprehension is considered roughly goal-directed and meaning-seeking, aiming at efficient integration of processed ideas into a coherent (logical/consistent) representation. The main consequence of this principle is that readers automatically resolve "cohesion gaps" in texts. When processes mapping between incoming and previously activated concepts fail, readers nonetheless tend to infer relations between text constituents automatically, drawing on prior knowledge to realize a fully connected narrative representation.

### Levels of comprehension
Some models distinguish between surface, text-base, and situation model levels of text understanding. Differences in processes for resolving coherent representations of a text at each level, mediated by reader goals and abilities, are thought to differentially influence final associations between idea units during recall. Other models avoid enforcing these distinctions, but tend to either be very flexible (effectively leaving the implications of these differences to be weighed on a case by case basis through parameter configuration) or very simple. The landscape model falls in this latter camp: it does not enforce these distinctions.

## Where the Landscape Model Stands Among Other Models
We can think of comprehension models as *mainly* differing about two issues. 

### How Do Narratives Connect Ideas?
The first issue concerns the narrative features that tend to relate idea units together in memory. For example, the construction-integration model emphasizes argument overlap between units, while the causal networking model emphasizes cause-effect relations within a story. Other models, like the structure-building and event-indexing models, associate idea units into discrete groups based on transitions within a story, such as shifts in scene, time, or perspective.

The landscape model of reading comprehension is sort of agnostic about these matters. Researchers are provided broad latitude to specify how units are grouped into "cycle" entities that enforce unit coactivation during model simulation.

### Predictive and Bridging Inference
Different models specify different accounts of how readers 1) anticipate information yet to be revealed explicitly in the text (predictive inference) and 2) infer mediating information that specifies the conceptual relations between textual ideas (bridging inference). In general, models center an interaction between pre-existing semantic knowledge and ongoing text-specific knowledge in accounts of these processes, but the details of these theories can result in very distinct predictions about the kinds of associations drawn between ideas.

While these inference-based associations are pertinent for free recall research as potential sources of response organization, comprehension researchers relate them to a broad variety of other phenomena, including reading times and the content and speed of answers to rich questions about story content and meaning. The breadth of phenomena these models try to account for presents rich opportunities for model convergence across research paradigms. Our analyses of response organization broaden this repertoire further!

The landscape model posits a spreading activation process to account for predictive processes during reading. When you process an idea unit in a story, its node within an idea network gets activated. Some of this activation is then passed to other idea units according to the strengths of its connections to them, modulated by a decay rate. Connection strengths themselves are initialized based on pre-experimental semantic similarity, but can change over the course of model simulation.

### The Math

To encode a text into the landscape model (LS-R), it is initially segmented into text units (e.g., words or propositions) and reading cycles (e.g., clauses or sentences) depending on researcher preference. These cycles are processed sequentially to simulate reading.  

The landscape model is more or less a spreading activation model of reading comprehension. Each relevant idea unit has an activation level (initialized at 0) and a configuration of connections to all other idea units. These connections are initialized based on cosine similarities of word vectors. For my work, I've been using a vector space based on the language model BERT called SentenceBERT.

As each reading cycle is simulated, unit activations are updated according to the following four mechanisms:
1. **Attention**. Units of the current cycle are activated to the highest allowed value.
2. **Recency**. Units from prior cycles carry residual activation following a decay rule.
3. **Spreading activation**. Text units receive activation via connections with text units activated in the current cycle according to a hyperbolic tangent function that effectively imposes a ceiling on the result:

<center> <img src="../img/Hyperbolic_Tangent.svg.png" width=50%> </center>

The math of this goes like:

$$
{A}_{i_c}={\displaystyle \sum_{j=1}^m\delta \cdot {A}_{j_{c-1}}\cdot \sigma \left({S}_{i{j}_{c-1}}\right)}
$$

$$
\sigma(x)=\tanh\left[3\left(x-1\right)\right]+1
$$

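where $A$, $S$, $i$, and $c$ denote activation, connection strengths, the current unit, and the current cycle, respectively, and $j$ indexes the $m$ units that can pass activation to unit $i$. The parameter $\delta$ enforces a decay rate on unit activation, and $\sigma$ is a shifted hyperbolic tangent: it maps each connection strength to a positive, bounded value, so the influence of a connection grows with its strength but saturates at a ceiling.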

4. **Memory capacity**. Also, a parameter sets a limit on the total amount of activation allowable within any reading cycle. Activations are reduced proportionately to attain the limit whenever it would otherwise be exceeded.
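The four mechanisms can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the LS-R reference implementation: the function name `update_activations`, the default parameter values, the order in which the mechanisms are applied, and the clipping at `max_act` are all choices made here for the example. Self-connections are assumed to have strength 1, so a unit's own prior activation carries forward scaled by $\delta$.

```python
import numpy as np

def sigma(x):
    """Shifted hyperbolic tangent from the text: tanh(3(x - 1)) + 1."""
    return np.tanh(3.0 * (np.asarray(x, dtype=float) - 1.0)) + 1.0

def update_activations(prev_act, strengths, cycle_units,
                       delta=0.1, max_act=1.0, capacity=5.0):
    """Return unit activations after one reading cycle (illustrative sketch)."""
    prev_act = np.asarray(prev_act, dtype=float)
    # Recency and spreading activation:
    # A_i(c) = sum_j delta * A_j(c-1) * sigma(S_ij(c-1))
    act = delta * sigma(strengths) @ prev_act
    # Attention: units in the current cycle take the highest allowed value.
    act[list(cycle_units)] = max_act
    # Assumption: no single unit may exceed the maximum activation.
    act = np.minimum(act, max_act)
    # Memory capacity: rescale proportionately if the total exceeds the limit.
    total = act.sum()
    if total > capacity:
        act = act * (capacity / total)
    return act

# Toy example (hypothetical strengths): three idea units, one unit per cycle.
strengths = np.array([[1.0, 0.5, 0.2],
                      [0.5, 1.0, 0.8],
                      [0.2, 0.8, 1.0]])
act = np.zeros(3)
act = update_activations(act, strengths, cycle_units=[0])  # -> [1, 0, 0]
act = update_activations(act, strengths, cycle_units=[1])
print(act)
```

In the second cycle, unit 0 retains a fraction $\delta$ of its activation through its self-connection, unit 1 is set to the maximum by attention, and unit 2 receives only a small amount of spread activation from unit 0 because their connection is weak.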

Model connection weights are then updated based on unit coactivity according to a fairly traditional Hebbian process, with the parameter $\lambda$ setting the learning rate:

$$
{S}_{i{j}_c}={S}_{i{j}_{c-1}}+\lambda \cdot {A}_{i_c}\cdot {A}_{j_c}
$$
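As a rough sketch, the Hebbian rule amounts to adding a scaled outer product of the activation vector to the connection matrix. The function name `hebbian_update` and the value of `lam` below are hypothetical and chosen only for illustration.

```python
import numpy as np

def hebbian_update(strengths, act, lam=0.5):
    """Apply S_ij(c) = S_ij(c-1) + lambda * A_i(c) * A_j(c) to every pair."""
    act = np.asarray(act, dtype=float)
    return np.asarray(strengths, dtype=float) + lam * np.outer(act, act)

# Two units with a weak initial connection; unit 0 is fully active,
# unit 1 is half active.
strengths = np.array([[1.0, 0.2],
                      [0.2, 1.0]])
updated = hebbian_update(strengths, act=[1.0, 0.5], lam=0.5)
print(updated)  # the 0-1 connection grows by 0.5 * 1.0 * 0.5 = 0.25
```

Because the update is proportional to the product of activations, pairs of units that are strongly active in the same cycle gain the most connection strength, while a pair involving an inactive unit does not change.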

I draft an initial diagram of the process here:

<center> <img src="../img/landscape_model.svg"> </center>

## How Does the Landscape Model Relate to CMR?

There are striking similarities between the two models. The landscape model's activation vector changes a lot like CMR's context vector. When an item (idea unit) is processed, pre-existing associations are retrieved and integrated into the vector. A decay mechanism ensures that past experiences have dwindling representation in the activation vector based on recency. Extra mechanisms ensure that the activations corresponding to any one item never exceed pre-defined thresholds. And then coactivations within this vector are used to update a long-term associative memory store, modulated by some learning rate parameter! The math of how this happens is distinct, but the correspondences are uncanny at a high level.

But then again, there are substantial differences between the two models that each might need to reconcile in order to account for a broader collection of phenomena. I think the core differences that give the landscape model an advantage over CMR can be narrowed down to two features:

### Unit co-processing within a cycle
In the landscape model, ideas can be processed *together* - like when they occur within the same sentence or other structure. This difference might seem like an afterthought at first, but since learning is driven by unit coactivation, it has important consequences for later recall. Recency mechanisms in CMR ensure that items processed near one another are more strongly associated in memory, but the extent of this association depends on the value of the drift/decay rate parameter enforcing recency-weighting in the context vector. 

The cycle construct disentangles co-processing from drift rate and (with activation normalization) focuses association strengthening on units with connections to _all_ units in the current cycle.

### Dynamic semantic, not just temporal, associations
In CMR, only pre-existing serial positional associations are retrieved when computing contextual input during encoding. The state of context at any given simulation step is mainly a function of these retrieved temporal associations and decayed memories of similar retrievals from past simulation steps.

In the landscape model, retrieved associative information is at least initially semantic - literally the semantic textual similarities computed between each unit before simulation started.  The state of context at any given simulation step is mainly a function of these retrieved *semantic* associations and decayed memories of similar retrievals from past simulation steps. 

The result is that at any given learning step, the extent to which any connection weights between units are updated is strongly modulated by pre-experimental semantic similarity to an extent that doesn't happen in CMR. If CMR could initialize with semantic similarities in the way the Landscape Model does by default, it would have this feature.

### Leveraged to account for convergent and inferential dynamics
Coactivation in general (whether within or outside a cycle) is central to how the landscape model is able to account for complex features of comprehension. Within the model, two distinct ideas activated together in turn drive especially strong activation for ideas that are strongly associated with both of them, even if those ideas haven't actually been processed yet. Dynamics like these are how the model is applied to account for the way inferential processes shape outcome variables like reading or response times.
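A toy calculation makes this concrete. It is a sketch that reuses the spreading-activation term from the math section with made-up connection strengths: units 0 and 1 have just been read together, unit 2 is connected to both of them, and unit 3 is connected only to unit 0. The strengths and the value of `delta` are hypothetical.

```python
import numpy as np

def sigma(x):
    """Shifted hyperbolic tangent: tanh(3(x - 1)) + 1."""
    return np.tanh(3.0 * (np.asarray(x, dtype=float) - 1.0)) + 1.0

# Hypothetical symmetric connection strengths between four idea units.
strengths = np.array([
    [1.0, 0.3, 0.8, 0.8],
    [0.3, 1.0, 0.8, 0.0],
    [0.8, 0.8, 1.0, 0.0],
    [0.8, 0.0, 0.0, 1.0],
])
# Units 0 and 1 were co-processed; units 2 and 3 have not been read yet.
activation = np.array([1.0, 1.0, 0.0, 0.0])
delta = 0.1

# Activation each unit receives from the currently active units.
spread = delta * sigma(strengths) @ activation
print(spread[2], spread[3])
```

Unit 2 receives support from both active units, so it ends up with roughly twice the activation of unit 3, even though each individual connection involved has the same strength of 0.8.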

If CMR could figure semantic information into its account of how internal contextual representations dynamically fluctuate over the course of list study, then it probably could similarly account for these unique aspects of story comprehension while at the same time preserving a capacity to account for factors organizing free recall of word lists with more subtle semantic associations.

<center> <img src="https://pbs.twimg.com/media/E4bioegWUAATLwP.jpg" width=50%> </center>

## Current Analyses
Previous work focused on predicting inference activation (Yeari and van den Broek, 2015) found that integrating LSA computations within a dynamic comprehension model resulted in better predictions than using LSA alone. We'll similarly explore whether connections found through simulation of the landscape model better account for recall rates and response organization in free recall of narratives than semantic similarities alone.

If relevant, we'll also try to rule out a plausible potential explanation for our results: that, like CMR, the landscape model also primarily encodes information about serial order into idea units' connection weights. If it turns out that LS-R indeed works this way, then a more proper focus of this project might be to characterize CMR's capacity to account for benchmark phenomena in the reading comprehension modeling literature. Otherwise, further work will focus on integrating the Landscape model's affordances into retrieved context theory. 

### Data
<center> <img src="../img/cutler_method.png" width=50%> </center>

Recall for narratives, if split into idea units -- "meaningful chunks of information that convey a piece of the narrative" -- that are numbered according to chronological order, can be examined using analytic techniques developed for free and serial list recall tasks. This framework enables direct comparison between ideas, assumptions, and models applied to understand how people remember sequences such as word lists and those used to understand memory for narrative texts.

To support analysis of narrative recall this way, we considered a dataset collected, preprocessed, and presented by Cutler, Palan, Polyn, and Brown-Schmidt (2019). In corresponding experiments, 22 research participants read 6 distinct short stories. Upon reading a story, participants performed immediate free recall of the narrative twice. Three weeks later, participants performed free recall of each narrative again. Each recall period was limited to five minutes.

Following data collection, a pair of research assistants in the Brown-Schmidt laboratory were each instructed to independently split stories and participant responses into idea units as defined above, and to identify correspondences between idea units in participant responses and corresponding studied stories reflecting recall. Following this initial preprocessing, research assistants then compared and discussed their results and recorded consensus decisions regarding the segmentation and correspondence of idea units across the dataset. Further analysis focused on the sequences of story idea units recalled by participants on each trial as tracked by these researchers. Where relevant (e.g. for grouping idea units into cycles), the spaCy library for advanced natural language processing is applied to automatically segment passages into unique sentences.
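Once each trial is a sequence of numbered idea units, list-recall measures apply directly. As one example, here is a minimal sketch of a lag-conditional response probability (lag-CRP) computation. The function name `lag_crp` and the input format (a list of trials, each a list of 1-based story positions in output order) are assumptions for illustration and may not match how the dataset is actually stored.

```python
from collections import Counter

def lag_crp(recalls, n_units):
    """Conditional response probability by lag across recall sequences.

    recalls: list of trials, each a list of 1-based idea unit positions
    in output order. Repeats and out-of-range values are skipped.
    """
    actual, possible = Counter(), Counter()
    for trial in recalls:
        recalled = set()
        prev = None
        for unit in trial:
            if unit in recalled or not 1 <= unit <= n_units:
                continue
            if prev is not None:
                # Count the transition that was actually made...
                actual[unit - prev] += 1
                # ...and every transition that was still available.
                for candidate in range(1, n_units + 1):
                    if candidate not in recalled:
                        possible[candidate - prev] += 1
            recalled.add(unit)
            prev = unit
    return {lag: actual[lag] / possible[lag] for lag in sorted(possible)}

# Two toy trials over a three-unit story.
crp = lag_crp([[1, 2, 3], [2, 3, 1]], n_units=3)
print(crp)  # {-2: 1.0, -1: 0.0, 1: 1.0, 2: 0.0}
```

Each probability is the number of times a lag was actually taken divided by the number of times it was available, so lags that could never occur on any transition are simply absent from the result.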

### Semantic Textual Similarity
To compute pre-experimental semantic textual similarity between extracted idea units, we lean on the SentenceTransformers library, a Python framework for state-of-the-art sentence, text and image embeddings. The most advanced models supported by the library apply siamese and triplet network structures to fine-tune the BERT language model to achieve state-of-the-art performance on sentence-pair regression tasks including semantic textual similarity (Reimers & Gurevych, 2019). To compute the semantic similarity between two idea units, a vector representation corresponding to each word within each idea unit is retrieved from the pretrained paraphrase-mpnet-base-v2 model, selected for its state-of-the-art performance across various benchmarks. To obtain a vector representation for each entire idea unit, a mean vector is computed over every word in the unit. Finally, the cosine similarity of these mean-pooled idea vectors is used to represent the semantic textual similarity between the idea units under consideration.
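The pooling and similarity steps can be illustrated without loading the language model. In the sketch below, the per-word vectors are tiny made-up arrays standing in for model output, and the helper names `mean_pool` and `cosine_similarity_matrix` are hypothetical. In practice the word vectors would come from the pretrained model rather than being written by hand.

```python
import numpy as np

def mean_pool(token_vectors):
    """Average per-word vectors into a single idea-unit vector."""
    return np.asarray(token_vectors, dtype=float).mean(axis=0)

def cosine_similarity_matrix(unit_vectors):
    """Pairwise cosine similarity between idea-unit vectors (rows)."""
    vectors = np.asarray(unit_vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    normalized = vectors / norms
    return normalized @ normalized.T

# Made-up 2-dimensional word vectors for three idea units.
unit_a = mean_pool([[1.0, 0.0], [1.0, 0.0]])  # -> [1, 0]
unit_b = mean_pool([[0.0, 1.0], [0.0, 3.0]])  # -> [0, 2]
unit_c = mean_pool([[1.0, 1.0]])              # -> [1, 1]

similarity = cosine_similarity_matrix([unit_a, unit_b, unit_c])
print(similarity.round(3))
```

The resulting matrix is symmetric with ones on the diagonal, which is the form the landscape model needs when connection strengths are initialized from pre-experimental similarity.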

### Semantic and Temporal Organization

## Baseline Overview
Before digging into the landscape model, we open with some baseline temporal and semantic organizational analyses to help build an initial grasp of dataset characteristics.

### Recall Rate by Time of Test
<center> <img src="../results/Catplot_Probability_Recall_by_Time_Test.svg"> </center>

### Recall Rate by Serial Position

<center> <img src="../results/Lineplot_SPC_by_Story_Time_Test_1.svg"></center>

<center> <img src="../results/Lineplot_SPC_by_Story_Time_Test_2.svg"> </center>

<center> <img src="../results/Lineplot_SPC_by_Story_Time_Test_3.svg"> </center>

### Recall Rate by Semantic Centrality
<center> <img src="../results/Lmplot_Probability_Recall_by_Mean_MiniLM_L12_v2_Cosine_Similarity.svg"> </center>


### Lag-CRP
<center> <img src="../results/Lineplot_CRP_by_Time_Test.svg"> </center>

### Semantic CRP
<center> <img src="../results/Lineplot_SemanticCRP_by_Time_Test.svg"> </center>

### Semantic Lag-Rank Organizational Score by Time_Test and Subject
<center> <img src="../results/Lmplot_Time_Test_by_Distance_Rank_Glove840B_by_Subject.svg"> </center>

## Simulation Results

### Recall Clustering by Model Connection Weights by Simulation Step
Time_Test == 2

<center> <img src="../results/Lineplot_LMR_Distance_Rank_by_Simulation_Step_by_Story.svg"> </center>

### Recall Rate by Mean Simulated Connection Weight
<center> <img src="../results/Lmplot_Probability_Recall_by_Mean_Connection_Weight.svg"> </center>

### Recall Rate by Mean Simulated Connection Weight by Story
<center> <img src="../results/Lmplot_Probability_Recall_by_Mean_Connection_Weight_by_Story.svg"> </center>

### Model DistanceCRP by Time_Test
<center> <img src="../results/Lineplot_Model_DistanceCRP_by_Time_Test.svg"> </center>

### Correlations


`for time_test in range(1, 4):...`

Time Test == 1

|                   |      input |     recall |   cosine_similarity |
|:------------------|-----------:|-----------:|--------------------:|
| input             |   1        |  -0.404539 |           -0.291262 |
| recall            |  -0.404539 |   1        |            0.141831 |
| cosine_similarity |  -0.291262 |   0.141831 |            1        |

Time Test == 2

|                   |      input |      recall |   cosine_similarity |
|:------------------|-----------:|------------:|--------------------:|
| input             |   1        |  -0.565403  |          -0.291262  |
| recall            |  -0.565403 |   1         |           0.0794009 |
| cosine_similarity |  -0.291262 |   0.0794009 |           1         |

Time Test == 3

|                   |      input |      recall |   cosine_similarity |
|:------------------|-----------:|------------:|--------------------:|
| input             |   1        |  -0.194043  |          -0.291262  |
| recall            |  -0.194043 |   1         |           0.0938463 |
| cosine_similarity |  -0.291262 |   0.0938463 |           1         |
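For readers who want to build tables of this shape, here is a hedged sketch of a per-time-test correlation loop. It is not the elided code from this analysis: the rows below are made-up toy values, and the column order `(time_test, input, recall, cosine_similarity)` is assumed from the table headers. Within a single time test the `time_test` column is constant, so its correlation with any other variable is undefined (NaN); the sketch drops that column before correlating.

```python
import numpy as np

# Toy rows, NOT the study data: (time_test, input, recall, cosine_similarity)
rows = np.array([
    [1, 1, 0.90, 0.30],
    [1, 2, 0.70, 0.25],
    [1, 3, 0.60, 0.35],
    [2, 1, 0.80, 0.30],
    [2, 2, 0.50, 0.25],
    [2, 3, 0.40, 0.35],
    [3, 1, 0.30, 0.30],
    [3, 2, 0.20, 0.25],
    [3, 3, 0.25, 0.35],
])

correlations = {}
for time_test in range(1, 4):
    # Keep rows for this time test and drop the constant time_test column.
    subset = rows[rows[:, 0] == time_test][:, 1:]
    # Pearson correlations between input, recall, and cosine_similarity.
    correlations[time_test] = np.corrcoef(subset, rowvar=False)

print(correlations[1].round(3))
```

Each entry of `correlations` is a symmetric 3 x 3 matrix with ones on the diagonal, ordered as input, recall, cosine_similarity.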