Repeat After Me: Transformers are Better than State Space Models at Copying

About

This repository gathers the experiments for the paper Repeat After Me: Transformers are Better than State Space Models at Copying. The experiments are divided into two parts:

  • Synthetic experiments: these cover three tasks: standard copy, and the prefix-key and suffix-key variants of the n-gram lookup task. The models we consider are Transformers (with RoPE, NoPE, ALiBi and Hard-ALiBi positional encodings), Mamba and LSTMs.

  • Experiments with pretrained models: these cover three tasks: copying C4 text, phone-book lookup and question answering on SQuAD_v2.

Installation

  • pip install causal-conv1d>=1.1.0: an efficient implementation of a simple causal Conv1d layer used inside the Mamba block.
  • pip install mamba-ssm: the core Mamba package.
  • pip install names: the names package, used to randomly sample names in the phone-book experiment.
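The same installs can be combined into one command; this is a convenience sketch (exact version pins beyond the causal-conv1d constraint are left to pip), with the constraint quoted so the shell does not treat >= as a redirection:

pip install "causal-conv1d>=1.1.0" mamba-ssm names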

Other requirements:

  • Linux
  • NVIDIA GPU
  • PyTorch 1.12+
  • CUDA 11.6+
  • transformers 4.35+
  • datasets 2.14+

Synthetic experiments

These experiments are intended to study (a) how well the models learn the copy task in distribution, (b) how well they generalize to lengths beyond those seen in training, and (c) their performance on lookup tasks where the query is a prefix or suffix n-gram.

This folder covers three tasks (copy, prefix_ngram, suffix_ngram) and three model families: Transformers with different positional encodings (model = T_rope, T_nope, T_alibi, T_hard_alibi), Mamba (mamba) and LSTMs (lstm). For instance, to train a Transformer with RoPE positional encoding on the copy task with strings of length up to 20 and then evaluate it on strings of length 20, run:

python3 synthetic_tasks/main.py --model "T_rope" --train_task "copy" --eval_task  "copy" --min_train_len 5 --max_train_len 20 --min_eval_len 20 --max_eval_len 20
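To probe length generalization or one of the n-gram lookup variants, the same flags can be reused. As a sketch (the specific lengths below are illustrative, not taken from the paper), the following trains Mamba on the suffix-key task on strings of length 5 to 20 and evaluates it on longer strings of length 30 to 40:

python3 synthetic_tasks/main.py --model "mamba" --train_task "suffix_ngram" --eval_task "suffix_ngram" --min_train_len 5 --max_train_len 20 --min_eval_len 30 --max_eval_len 40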
                               

Experiments on pre-trained models

These experiments cover three different tasks: copying natural text strings from the C4 dataset (eval_task = c4_copy), lookup on a phone book (eval_task = phone_book) and question answering on SQuAD_v2 (eval_task = squad). In particular, we consider the following models:

  • Mamba models: state-spaces/mamba-370m, state-spaces/mamba-1.4b, state-spaces/mamba-2.8b

  • Transformers: EleutherAI/pythia-410m, EleutherAI/pythia-1.4b, EleutherAI/pythia-2.8b

For instance, to evaluate Mamba-370m on the phone-book task with 20 (name, phone number) entries, we run:

python3 pretrained_exps/main.py --model "state-spaces/mamba-370m" \
                --eval_task "phone_book" \
                --min_eval_len 20 \
                --max_eval_len 20
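The same interface should extend to the Pythia baselines listed above; as a sketch (the entry count of 50 is illustrative), to evaluate Pythia-410m on a phone book with 50 entries:

python3 pretrained_exps/main.py --model "EleutherAI/pythia-410m" \
                --eval_task "phone_book" \
                --min_eval_len 50 \
                --max_eval_len 50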

How to cite

@article{jelassi2024repeat,
  title={Repeat after me: Transformers are better than state space models at copying},
  author={Jelassi, Samy and Brandfonbrener, David and Kakade, Sham M and Malach, Eran},
  journal={arXiv preprint arXiv:2402.01032},
  year={2024}
}
