Repeat After Me: Transformers are Better than State Space Models at Copying

About

This repository gathers the experiments for the paper Repeat After Me: Transformers are Better than State Space Models at Copying. The experiments are divided into two parts:

  • Synthetic experiments: these cover three tasks: standard copy, and the prefix-key and suffix-key variants of the n-gram lookup task. The models we consider are Transformers (with RoPE, NoPE, ALiBi and Hard-ALiBi positional encodings), Mamba and LSTMs.

  • Experiments with pretrained models: these cover three tasks: copying C4 text, phone-book lookup and question answering on SQuAD_v2.

Installation

  • pip install causal-conv1d>=1.1.0: an efficient implementation of a simple causal Conv1d layer used inside the Mamba block.
  • pip install mamba-ssm: the core Mamba package.
  • pip install names: the names package, used to randomly sample names in the phone-book experiment.
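The same installs can be combined into one command; this is a convenience sketch (exact version pins beyond the causal-conv1d constraint are left to pip), with the constraint quoted so the shell does not treat >= as a redirection:

pip install "causal-conv1d>=1.1.0" mamba-ssm names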

Other requirements:

  • Linux
  • NVIDIA GPU
  • PyTorch 1.12+
  • CUDA 11.6+
  • transformers 4.35+
  • datasets 2.14+

Synthetic experiments

These experiments are intended to study (a) how well the models learn the copy task in distribution, (b) how well they generalize to lengths beyond those seen in training, and (c) their performance on lookup tasks where the query is a prefix or suffix n-gram.

This folder covers three tasks (copy, prefix_ngram, suffix_ngram) and three model families: Transformers with different positional encodings (model = T_rope, T_nope, T_alibi, T_hard_alibi), Mamba (mamba) and LSTMs (lstm). For instance, to train a Transformer with RoPE positional encoding on the copy task with strings of length up to 20 and then evaluate it on strings of length 20, run:

python3 synthetic_tasks/main.py --model "T_rope" --train_task "copy" --eval_task  "copy" --min_train_len 5 --max_train_len 20 --min_eval_len 20 --max_eval_len 20
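To probe length generalization or one of the n-gram lookup variants, the same flags can be reused. As a sketch (the specific lengths below are illustrative, not taken from the paper), the following trains Mamba on the suffix-key task on strings of length 5 to 20 and evaluates it on longer strings of length 30 to 40:

python3 synthetic_tasks/main.py --model "mamba" --train_task "suffix_ngram" --eval_task "suffix_ngram" --min_train_len 5 --max_train_len 20 --min_eval_len 30 --max_eval_len 40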
                               

Experiments on pre-trained models

These experiments cover three different tasks: copying natural text strings from the C4 dataset (eval_task = c4_copy), lookup on a phone book (eval_task = phone_book) and question answering on SQuAD_v2 (eval_task = squad). In particular, we consider the following models:

  • Mamba models: state-spaces/mamba-370m, state-spaces/mamba-1.4b, state-spaces/mamba-2.8b

  • Transformers: EleutherAI/pythia-410m, EleutherAI/pythia-1.4b, EleutherAI/pythia-2.8b

For instance, to evaluate Mamba-370m on the phone-book task with 20 (name, phone number) entries, we run:

python3 pretrained_exps/main.py --model "state-spaces/mamba-370m" \
                --eval_task "phone_book" \
                --min_eval_len 20 \
                --max_eval_len 20
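The same interface should extend to the Pythia baselines listed above; as a sketch (the entry count of 50 is illustrative), to evaluate Pythia-410m on a phone book with 50 entries:

python3 pretrained_exps/main.py --model "EleutherAI/pythia-410m" \
                --eval_task "phone_book" \
                --min_eval_len 50 \
                --max_eval_len 50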

How to cite

@article{jelassi2024repeat,
  title={Repeat after me: Transformers are better than state space models at copying},
  author={Jelassi, Samy and Brandfonbrener, David and Kakade, Sham M and Malach, Eran},
  journal={arXiv preprint arXiv:2402.01032},
  year={2024}
}
