This repository contains the code for our Findings of EMNLP 2022 paper: How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers by Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith and Roy Schwartz.
@inproceedings{Hassid:2022,
author = {Michael Hassid and Hao Peng and Daniel Rotem and Jungo Kasai and Ivan Montero and Noah A. Smith and Roy Schwartz},
title = {How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers},
booktitle = {Findings of EMNLP},
year = {2022}
}
Our code is based on the Hugging Face framework (specifically, on the transformers library). To use our code, please first follow the instructions here.
Once everything is set up, copy our transformers/ directory into the original transformers directory (this will add some scripts and required capabilities).
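A minimal setup sketch (the clone locations below are hypothetical, and the instructions linked above may pin a specific transformers version; adjust accordingly):
# Install Hugging Face transformers from source (editable install).
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
cd ..
# Overlay our transformers/ directory on top of the original one; this adds the
# papa_scripts/ directory and the required library changes.
# <path_to_this_repo> is wherever you cloned this repository.
cp -r <path_to_this_repo>/transformers/* transformers/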
As an example, we provide the full command lines for running PAPA on BERT-base with the CoLA task:
First navigate to the papa_scripts directory:
cd transformers/papa_scripts
To extract the constant matrices, run:
MODEL=bert-base-uncased
TASK=COLA
python3 run_papa_glue_avgs_creator.py \
  --model_name_or_path ${MODEL} \
  --task_name ${TASK} \
  --max_length 64 \
  --per_device_train_batch_size 8 \
  --output_dir <dir_to_save_constant_matrices> \
  --cache_dir <your_cache_dir> \
  --use_papa_preprocess true \
  --pad_to_max_length
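Note that the same directories recur across the steps: the constant matrices written to --output_dir here are read back via --static_heads_dir in both later steps, and the sorted heads produced by the next step are read via --sorting_heads_dir in the last one. If it helps, you can fix them once as shell variables (the paths below are hypothetical; substitute them for the corresponding <...> placeholders):
CONST_DIR=./papa_out/constant_matrices   # <dir_to_save_constant_matrices>
SORT_DIR=./papa_out/sorted_heads         # <dir_to_save_sorted_heads>
RESULTS_DIR=./papa_out/results           # <dir_to_save_results>
CACHE_DIR=./hf_cache                     # <your_cache_dir>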
To extract the heads sorting, run:
MODEL=bert-base-uncased
TASK=COLA
python3 run_papa_glue.py \
  --model_name_or_path ${MODEL} \
  --task_name ${TASK} \
  --do_eval \
  --max_seq_length 64 \
  --per_device_train_batch_size 16 \
  --per_device_eval_batch_size 16 \
  --output_dir <dir_to_save_sorted_heads> \
  --cache_dir <your_cache_dir> \
  --do_train \
  --num_train_epochs 15.0 \
  --learning_rate 2e-5 \
  --lr_scheduler_type constant \
  --disable_tqdm true \
  --evaluation_strategy epoch \
  --save_strategy no \
  --use_papa_preprocess \
  --use_freeze_extract_pooler true \
  --static_heads_dir <dir_to_save_constant_matrices> \
  --save_total_limit 0 \
  --sort_calculating True
Now, to run the full PAPA method and obtain its results, run the loop below. The static_heads_num values (0, 72, 126, 135, 144) correspond to replacing 0%, 50%, 87.5%, 93.75%, and 100% of the 144 attention heads of BERT-base (12 layers with 12 heads each) with the constant matrices:
MODEL=bert-base-uncased
TASK=COLA
for static_heads_num in 0 72 126 135 144
do
  python3 run_papa_glue.py \
    --model_name_or_path ${MODEL} \
    --task_name ${TASK} \
    --do_eval \
    --max_seq_length 64 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --output_dir <dir_to_save_results> \
    --cache_dir <your_cache_dir> \
    --do_train \
    --num_train_epochs 15.0 \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --disable_tqdm true \
    --evaluation_strategy epoch \
    --save_strategy no \
    --use_papa_preprocess \
    --grad_for_classifier_only true \
    --use_freeze_extract_pooler true \
    --static_heads_dir <dir_to_save_constant_matrices> \
    --static_heads_num ${static_heads_num} \
    --save_total_limit 0 \
    --sorting_heads_dir <dir_to_save_sorted_heads>
done
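After the sweep, each run's evaluation metrics end up in --output_dir (the Hugging Face Trainer-based GLUE scripts normally write them to eval_results.json there). Since the loop above reuses a single directory, each run overwrites the previous one's metrics file, so either copy it between runs or point --output_dir at a per-run subdirectory. Assuming that file name, the last run's metrics can be inspected with:
cat <dir_to_save_results>/eval_results.json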
To perform the same analysis on token-classification tasks, use the scripts run_papa_ner.py instead of run_papa_glue.py and run_papa_ner_avgs_creator.py instead of run_papa_glue_avgs_creator.py.
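For example, the constant-matrices step would look roughly as follows. This is only a sketch: it assumes run_papa_ner_avgs_creator.py follows the argument conventions of the standard Hugging Face token-classification examples (i.e. a --dataset_name argument, here the hypothetical choice conll2003) and keeps the same PAPA flags as its GLUE counterpart; check the script's --help for the exact arguments.
MODEL=bert-base-uncased
DATASET=conll2003   # hypothetical dataset choice
python3 run_papa_ner_avgs_creator.py \
  --model_name_or_path ${MODEL} \
  --dataset_name ${DATASET} \
  --max_length 64 \
  --per_device_train_batch_size 8 \
  --output_dir <dir_to_save_constant_matrices> \
  --cache_dir <your_cache_dir> \
  --use_papa_preprocess true \
  --pad_to_max_length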