
ACL'23 Grouped Head Attention

Code for the paper Finding the Pillars of Strength for Multi-Head Attention by Jinjie Ni, Rui Mao, Zonglin Yang, Han Lei, and Erik Cambria.

Requirements

torch==1.9.0+cu111
python==3.8.5
wandb==0.12.9
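
A minimal setup sketch, assuming a fresh conda environment (the environment name gha is illustrative, and the PyTorch wheel index below is the standard one for +cu111 builds):

```bash
# Illustrative environment setup; adapt to your own tooling.
conda create -n gha python=3.8.5 -y
conda activate gha
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install wandb==0.12.9
```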

Install

cd fairseq
pip install -e .
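
An optional sanity check after the editable install (not part of the original instructions; it only verifies that the package and its CLI entry points are available):

```bash
python -c "import fairseq; print(fairseq.__version__)"
fairseq-train --help   # the fairseq CLI should now be on PATH
```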

General Usage

Our code uses the W&B Sweep pipeline, and run launching is integrated with the Slurm Workload Manager.

To reproduce the results for GHT, please change your current working directory to HeadCollaboration; to reproduce the results for GHT-PS, please change your current working directory to HeadCollaboration_cluster_prune_pruneepoch.

On Slurm Clusters

  1. Download and process the data: bash data_process/data_preparation.sh.
  2. Run wandb sweep #TheSweepYamlPath.
  3. Copy and paste the generated sweep ID into the corresponding section of SweepID.sh.
  4. Configure the hyperparameters in sweep.yaml according to the specifications in Appendix A.
  5. Configure your partition and node in Sbatch.sh; configure the data, GPU ID, and W&B project name in Run_Main.sh; then run sbatch #path_to_Sbatch.sh (see the sketch after this list).
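
A hedged end-to-end sketch of steps 1-5 on a Slurm cluster; exact file locations may differ in your checkout (the notes below state these files live under SweepRuns_exp_ajusted), and the sweep ID printed by wandb sweep is a placeholder here:

```bash
cd HeadCollaboration                              # or HeadCollaboration_cluster_prune_pruneepoch for GHT-PS
bash data_process/data_preparation.sh             # step 1: download & process the data
wandb sweep SweepRuns_exp_ajusted/sweep.yaml      # step 2: prints a sweep ID like <username>/<project>/<sweep_id>
# step 3: paste the printed sweep ID into SweepRuns_exp_ajusted/SweepID.sh
# step 4: edit SweepRuns_exp_ajusted/sweep.yaml following Appendix A of the paper
# step 5: set partition/node in Sbatch.sh and data/GPU/W&B project in Run_Main.sh, then submit:
sbatch SweepRuns_exp_ajusted/Sbatch.sh
```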

Notes:

  1. The above-mentioned files are under the SweepRuns_exp_ajusted folder of HeadCollaboration and HeadCollaboration_cluster_prune_pruneepoch, respectively.
  2. If you are not using a Slurm cluster, run wandb agent #username_projectname_sweepid instead (#username_projectname_sweepid is the variable set in SweepID.sh); see the sketch after this list.
  3. If you want to run on multiple GPUs (our experiments mostly use a single A100-80GB GPU), configure the fairseq-train command in Run_Main.sh according to https://fairseq.readthedocs.io/en/latest/.
  4. More details on Sweeps and W&B usage can be found at https://docs.wandb.ai/guide.
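
For reference, a small sketch covering notes 2 and 3; the sweep ID below is a placeholder, and restricting CUDA_VISIBLE_DEVICES is just one common way to control which GPUs the agent's runs can use (the multi-GPU fairseq-train flags themselves are configured in Run_Main.sh):

```bash
# Note 2: without Slurm, launch a W&B agent directly with the ID stored in SweepID.sh.
wandb agent your-username/your-project/abc123xy   # placeholder of the form <username>/<project>/<sweep_id>

# Note 3: limit the GPUs visible to the agent (and hence to fairseq-train) if you
# do not want it to use every device on the machine.
CUDA_VISIBLE_DEVICES=0,1 wandb agent your-username/your-project/abc123xy
```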

If you use this code in your work, please cite the paper Finding the Pillars of Strength for Multi-Head Attention as follows:

@article{DBLP:journals/corr/abs-2305-14380,
  author       = {Jinjie Ni and
                  Rui Mao and
                  Zonglin Yang and
                  Han Lei and
                  Erik Cambria},
  title        = {Finding the Pillars of Strength for Multi-Head Attention},
  journal      = {CoRR},
  volume       = {abs/2305.14380},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.14380},
  doi          = {10.48550/arXiv.2305.14380},
  eprinttype    = {arXiv},
  eprint       = {2305.14380},
  timestamp    = {Mon, 26 Jun 2023 20:50:08 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2305-14380.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
