PAPA

This repository contains the code for our Findings of EMNLP 2022 paper: How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers by Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith and Roy Schwartz.

Citation:

@inproceedings{Hassid:2022,
        author = {Michael Hassid and Hao Peng and Daniel Rotem and Jungo Kasai and Ivan Montero and Noah A. Smith and Roy Schwartz},
        title = {How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers},
        booktitle = {Findings of EMNLP},
        year = {2022}
}

How to use

Our code is based on the Hugging Face framework (specifically, on the transformers library). To use our code, please first follow the installation instructions here.
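For reference, a typical source installation of the transformers library looks like the following sketch (an illustration only; the exact steps and the version to check out are given in the instructions linked above):

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .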

Once everything is set up, copy our transformers/ directory into the original transformers directory (this adds some scripts and required capabilities).
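For example, assuming the original library checkout lives at <path_to_original_transformers> (a placeholder), one way to merge the contents from the root of this repository is:

cp -r transformers/. <path_to_original_transformers>/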

As an example, we provide the full command lines for running PAPA on BERT BASE with the CoLA task.

First, navigate to the papa_scripts directory:

cd transformers/papa_scripts

To extract the constant matrices, run:

MODEL=bert-base-uncased
TASK=COLA

python3 run_papa_glue_avgs_creator.py --model_name_or_path ${MODEL} --task_name ${TASK} --max_length 64 --per_device_train_batch_size 8 --output_dir <dir_to_save_constant_matrices> --cache_dir <your_cache_dir> --use_papa_preprocess true --pad_to_max_length

To extract the sorted order of the heads, run:

MODEL=bert-base-uncased
TASK=COLA

python3 run_papa_glue.py --model_name_or_path ${MODEL} --task_name ${TASK} --do_eval --max_seq_length 64 --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --output_dir <dir_to_save_sorted_heads> --cache_dir <your_cache_dir> --do_train --num_train_epochs 15.0 --learning_rate 2e-5 --lr_scheduler_type constant --disable_tqdm true --evaluation_strategy epoch --save_strategy no --use_papa_preprocess --use_freeze_extract_pooler true --static_heads_dir <dir_saved_constant_matrices> --save_total_limit 0 --sort_calculating True

Finally, to apply the full PAPA method and obtain its results, run:

MODEL=bert-base-uncased
TASK=COLA

for static_heads_num in 0 72 126 135 144
do
    python3 run_papa_glue.py --model_name_or_path ${MODEL} --task_name ${TASK} --do_eval --max_seq_length 64 --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --output_dir <dir_to_save_results> --cache_dir <your_cache_dir> --do_train --num_train_epochs 15.0 --learning_rate 2e-5 --lr_scheduler_type constant --disable_tqdm true --evaluation_strategy epoch --save_strategy no --use_papa_preprocess --grad_for_classifier_only true --use_freeze_extract_pooler true --static_heads_dir <dir_saved_constant_matrices> --static_heads_num ${static_heads_num} --save_total_limit 0 --sorting_heads_dir <dir_saved_sorted_heads>
done
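Note that BERT BASE has 12 layers with 12 attention heads each, i.e. 144 heads in total, so the static_heads_num values above correspond to replacing 0%, 50%, 87.5%, 93.75%, and 100% of the heads with constant matrices.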

To perform the same analysis on token-classification tasks, use run_papa_ner.py instead of run_papa_glue.py and run_papa_ner_avgs_creator.py instead of run_papa_glue_avgs_creator.py.
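For illustration only, the constant-matrix extraction step for a token-classification task might look like the sketch below. The flags are copied from the GLUE example above and are assumed to carry over; the dataset argument is likewise an assumption (Hugging Face token-classification scripts typically take --dataset_name rather than --task_name), so check run_papa_ner_avgs_creator.py --help for the exact interface:

MODEL=bert-base-uncased
DATASET=conll2003  # assumed dataset, for illustration only

python3 run_papa_ner_avgs_creator.py --model_name_or_path ${MODEL} --dataset_name ${DATASET} --max_length 64 --per_device_train_batch_size 8 --output_dir <dir_to_save_constant_matrices> --cache_dir <your_cache_dir> --use_papa_preprocess true --pad_to_max_length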
