Welcome to the official GitHub repository for our LREC-COLING 2024 paper, "Unveiling Vulnerabilities of Self-Attention".
To set up the required environment, execute the following commands:

```shell
conda create -n sattend python=3.7
conda activate sattend
pip install -r requirements.txt
cd TextAttack && pip install -e .
```
- Ensure you have the dataset by running the script located at `data/download_data.sh`.
- Finetune models in the `Victim` folder.
- Run `run_hackattend.py`.
In addition to the S-Attend model, we explore several defense strategies against adversarial attacks.

We use TextAttack for evaluation; our implementation is modified from the original TextAttack library. The test set is generated by running the following command:
```shell
python -m attacks.attack_textfooler --recipe <recipe> --task_name <task_name>
```

`<recipe>` can be any of the following:

- textfooler
- bertattack
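The recipes above are word-substitution attacks. As a rough illustration of the idea behind a TextFooler-style recipe (not the repository's implementation), the sketch below greedily swaps words for synonyms until a toy classifier's label flips; the synonym table and classifier are illustrative stand-ins.

```python
# Toy sketch of a word-substitution attack in the spirit of TextFooler.
# TOY_SYNONYMS and toy_classifier are made-up stand-ins, not repo code.

TOY_SYNONYMS = {"great": ["fine", "decent"], "terrible": ["poor", "bad"]}

def toy_classifier(text: str) -> str:
    # Pretend model: flags the exact word "great" as positive.
    return "positive" if "great" in text.split() else "negative"

def substitution_attack(text, target_label):
    """Greedily swap one word for a synonym until the label flips."""
    words = text.split()
    for i, w in enumerate(words):
        for syn in TOY_SYNONYMS.get(w, []):
            candidate = words[:i] + [syn] + words[i + 1:]
            if toy_classifier(" ".join(candidate)) != target_label:
                return " ".join(candidate)
    return None  # no successful perturbation found

adv = substitution_attack("the movie was great", "positive")
print(adv)  # -> "the movie was fine"
```

Real recipes additionally constrain the substitutions (semantic similarity, part of speech) and query a trained victim model rather than a keyword rule.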
- Adversarial Training: trains models on adversarially generated examples. The implementation is located in the `victim/SAttend/` folder and is adapted from CreAT.
- Adversarial Data Augmentation (ADA): fine-tunes the model on adversarially generated samples.
  - To generate adversarial samples, use the following command:

    ```shell
    python -m attacks.attack_textfooler --recipe <recipe> --task_name <task_name> --generate_adv_samples
    ```

  - To fine-tune the model with the adversarial samples, use the following command:

    ```shell
    python run_sattend.py --do_eval --do_train --best_epoch best --task_name <task_name> --do_lower_case --num_train_epochs <num_epochs> --gradient_accumulation_steps <num_steps> --train_batch_size <batch_size> --fp16 --adversarial --adv_split train_<recipe> --warmup_proportion <warmup_proportion> --learning_rate <learning_rate>
    ```
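Conceptually, ADA amounts to extending the clean training split with the generated adversarial samples before fine-tuning. The sketch below shows only that data-mixing step; the field names and toy examples are assumptions, not the repository's actual data format.

```python
import random

# Minimal sketch of adversarial data augmentation: merge the clean
# training split with adversarially generated samples and shuffle.
# The dicts below are toy placeholders, not the repo's data schema.

clean_train = [{"text": "a fine film", "label": 1},
               {"text": "dull and slow", "label": 0}]
adv_train = [{"text": "a decent film", "label": 1}]  # e.g. a train_<recipe> split

def build_ada_split(clean, adversarial, seed=0):
    """Concatenate clean and adversarial examples, then shuffle deterministically."""
    merged = clean + adversarial
    random.Random(seed).shuffle(merged)
    return merged

ada_split = build_ada_split(clean_train, adv_train)
print(len(ada_split))  # -> 3
```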
- S-Attend smoothing: run `run_sattend.py` to train the S-Attend model with the flags `--adv_split test` and `--mask_rate <mask_rate>`.
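To give intuition for the `--mask_rate` flag, the sketch below randomly masks a fraction of attention scores before the softmax, so no single attention entry can be relied upon. This is a hedged, stdlib-only illustration; the masking granularity and implementation details of S-Attend in the repository may differ.

```python
import math
import random

# Illustrative attention-score masking: set roughly mask_rate of the
# scores in each row to -inf before softmax. Not the repo's actual code.

def masked_softmax(scores, mask_rate, seed=0):
    """Mask ~mask_rate of each row's scores, then softmax the row."""
    rng = random.Random(seed)
    out = []
    for row in scores:
        masked = [(-math.inf if rng.random() < mask_rate else s) for s in row]
        # Guard: keep at least one unmasked entry per row.
        if all(m == -math.inf for m in masked):
            keep = rng.randrange(len(row))
            masked[keep] = row[keep]
        exps = [math.exp(m) if m != -math.inf else 0.0 for m in masked]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

probs = masked_softmax([[1.0, 2.0, 3.0]], mask_rate=0.3)
```

With `mask_rate=0.0` this reduces to a plain row-wise softmax; each row still sums to 1 regardless of the rate, since masked entries simply receive zero probability.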