Welcome to the official GitHub repository for our LREC-COLING 2024 paper, "Unveiling Vulnerabilities of Self-Attention".
To set up the required environment, execute the following commands:

```shell
conda create -n sattend python=3.7
conda activate sattend
pip install -r requirements.txt
cd TextAttack && pip install -e .
```
- Ensure you have the dataset by running the script located at `data/download_data.sh`.
- Finetune models in the `Victim` folder.
- Run `run_hackattend.py`.
In addition to the S-Attend model, we explore several defense strategies against adversarial attacks.

We use TextAttack for evaluation; our implementation is modified from the original TextAttack library. The test set is generated by running the following command:
```shell
python -m attacks.attack_textfooler --recipe <recipe> --task_name <task_name>
```

`<recipe>` can be any of the following:

- textfooler
- bertattack
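The recipes above are word-substitution attacks. As a rough illustration of the idea behind a TextFooler-style recipe (not the repository's implementation), the sketch below greedily swaps words for synonyms until a toy classifier's label flips; the synonym table and classifier are illustrative stand-ins.

```python
# Toy sketch of a word-substitution attack in the spirit of TextFooler.
# TOY_SYNONYMS and toy_classifier are made-up stand-ins, not repo code.

TOY_SYNONYMS = {"great": ["fine", "decent"], "terrible": ["poor", "bad"]}

def toy_classifier(text: str) -> str:
    # Pretend model: flags the exact word "great" as positive.
    return "positive" if "great" in text.split() else "negative"

def substitution_attack(text, target_label):
    """Greedily swap one word for a synonym until the label flips."""
    words = text.split()
    for i, w in enumerate(words):
        for syn in TOY_SYNONYMS.get(w, []):
            candidate = words[:i] + [syn] + words[i + 1:]
            if toy_classifier(" ".join(candidate)) != target_label:
                return " ".join(candidate)
    return None  # no successful perturbation found

adv = substitution_attack("the movie was great", "positive")
print(adv)  # -> "the movie was fine"
```

Real recipes additionally constrain the substitutions (semantic similarity, part of speech) and query a trained victim model rather than a keyword rule.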
- Adversarial Training: trains models on adversarially generated examples. The implementation is located in the `victim/SAttend/` folder and is adapted from CreAT.
- Adversarial Data Augmentation (ADA): fine-tunes the model on adversarially generated samples.
  - To generate adversarial samples, use the following command:

    ```shell
    python -m attacks.attack_textfooler --recipe <recipe> --task_name <task_name> --generate_adv_samples
    ```

  - To fine-tune the model with the adversarial samples, use the following command:

    ```shell
    python run_sattend.py --do_eval --do_train --best_epoch best --task_name <task_name> --do_lower_case --num_train_epochs <num_epochs> --gradient_accumulation_steps <num_steps> --train_batch_size <batch_size> --fp16 --adversarial --adv_split train_<recipe> --warmup_proportion <warmup_proportion> --learning_rate <learning_rate>
    ```
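Conceptually, ADA amounts to extending the clean training split with the generated adversarial samples before fine-tuning. The sketch below shows only that data-mixing step; the field names and toy examples are assumptions, not the repository's actual data format.

```python
import random

# Minimal sketch of adversarial data augmentation: merge the clean
# training split with adversarially generated samples and shuffle.
# The dicts below are toy placeholders, not the repo's data schema.

clean_train = [{"text": "a fine film", "label": 1},
               {"text": "dull and slow", "label": 0}]
adv_train = [{"text": "a decent film", "label": 1}]  # e.g. a train_<recipe> split

def build_ada_split(clean, adversarial, seed=0):
    """Concatenate clean and adversarial examples, then shuffle deterministically."""
    merged = clean + adversarial
    random.Random(seed).shuffle(merged)
    return merged

ada_split = build_ada_split(clean_train, adv_train)
print(len(ada_split))  # -> 3
```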
- S-Attend smoothing: run `run_sattend.py` to train the S-Attend model with the flags `--adv_split test` and `--mask_rate <mask_rate>`.
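To give intuition for the `--mask_rate` flag, the sketch below randomly masks a fraction of attention scores before the softmax, so no single attention entry can be relied upon. This is a hedged, stdlib-only illustration; the masking granularity and implementation details of S-Attend in the repository may differ.

```python
import math
import random

# Illustrative attention-score masking: set roughly mask_rate of the
# scores in each row to -inf before softmax. Not the repo's actual code.

def masked_softmax(scores, mask_rate, seed=0):
    """Mask ~mask_rate of each row's scores, then softmax the row."""
    rng = random.Random(seed)
    out = []
    for row in scores:
        masked = [(-math.inf if rng.random() < mask_rate else s) for s in row]
        # Guard: keep at least one unmasked entry per row.
        if all(m == -math.inf for m in masked):
            keep = rng.randrange(len(row))
            masked[keep] = row[keep]
        exps = [math.exp(m) if m != -math.inf else 0.0 for m in masked]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

probs = masked_softmax([[1.0, 2.0, 3.0]], mask_rate=0.3)
```

With `mask_rate=0.0` this reduces to a plain row-wise softmax; each row still sums to 1 regardless of the rate, since masked entries simply receive zero probability.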