
Beyond Memorization: The Challenge of Random Memory Access in Language Models

This repo contains the code for reproducing experiments in our paper, Beyond Memorization: The Challenge of Random Memory Access in Language Models.

In our study, we reveal that language models (GPT-2) can access their parametric memory sequentially, but encounter challenges when accessing memorized content randomly.

Figure: Illustration of our tasks for evaluating memory access.

The central idea is that the model can memorize arbitrary content, yet cannot access that memory in a random manner. We verify that this limited random-access ability has implications for real open-domain question answering: the model may fail to answer a question simply because it cannot access an answer stored in the middle of a memorized passage.

Requirements

Please create an environment and install the dependencies from the requirements.txt file:

pip install -r requirements.txt

Data

All the data for the experiments are hosted on the Hugging Face Hub and can be used directly, without downloading them manually.
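As a quick sanity check, the data can be loaded with the Hugging Face datasets library. This is a minimal sketch; the dataset ID below is an assumption based on the repository name, so check the experiment scripts for the exact Hub path.

from datasets import load_dataset

# Assumed dataset ID (based on the repository name); the experiment
# scripts reference the exact Hugging Face Hub path.
dataset = load_dataset("sail-sg/lm-random-memory-access")

print(dataset)              # list the available splits
print(dataset["train"][0])  # inspect a single example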

Experiments

The experiments are divided into four parts (an illustrative sketch of the task formats follows the list):

  1. Full recitation: Asking the model to recite the full passage given a passage ID.
  2. Selective recitation: Asking the model to recite a sentence from the passage given a passage ID.
  3. Grounded QA: Given a passage ID and a question, asking the model to answer the question.
  4. Open-domain QA: Given a question alone, asking the model to answer it. The model may have been trained on the passages.
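For illustration only, here is a minimal sketch of what the input/output pairs for these four tasks might look like. The ID format and wording are assumptions; the actual prompt templates are defined in the experiment scripts.

# Hypothetical examples of the four task formats (not the exact
# templates used in the experiments).
full_recitation = {
    "input": "Passage 42:",
    "output": "<the full memorized passage>",
}
selective_recitation = {
    "input": "Passage 42, sentence 3:",
    "output": "<the third sentence of the memorized passage>",
}
grounded_qa = {
    "input": "Passage 42. Question: Who wrote the report?",
    "output": "<answer taken from the memorized passage>",
}
open_domain_qa = {
    "input": "Question: Who wrote the report?",  # no passage ID given
    "output": "<answer; the model must locate the memory itself>",
}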

Running the experiments

The scripts for each experiment can be found in their respective folders. For instance, to run the full recitation experiment on gpt2-large, run the following from the project root:

bash full_recitation/run.sh gpt2-large
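The remaining experiments should follow the same pattern, e.g. bash selective_recitation/run.sh gpt2-large, assuming each task's folder is named after its experiment; see the repository layout for the exact folder names.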
