SQ-Transformer: Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings
This repository hosts the code for the paper Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings by Yichen Jiang, Xiang Zhou, and Mohit Bansal.
- This project is built on Python 3.6.8, PyTorch 1.10.1, and fairseq 0.10.2. All dependencies can be installed via `pip install -r requirements.txt` (a minimal environment-setup sketch is shown below).
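For a clean setup, one option is to create a fresh environment pinned to the versions above before installing the requirements. This is only a sketch; the environment name `sq-transformer` is a placeholder, not part of the repo.

```bash
# Hypothetical fresh environment matching the versions listed above;
# the environment name "sq-transformer" is just a placeholder.
conda create -n sq-transformer python=3.6.8
conda activate sq-transformer

# Install the pinned dependencies from the repository root.
pip install -r requirements.txt

# Sanity-check that the expected packages are importable.
python -c "import torch, fairseq; print(torch.__version__, fairseq.__version__)"
```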
- The vector quantization code in `./ar_seq2seq/vector_quantization.py` is adapted from an older version of the vector-quantize-pytorch repo.
- The main components of the model are implemented in `./ar_seq2seq`.
- We also use some basic Transformer layers and modules from Latent-Glat in `./nat`.
- In this work, we use the SCAN, COGS, CoGnition, WMT17 En-De, and WMT14 En-Fr datasets to train and evaluate our models.
- Since SCAN, COGS, and CoGnition do not have validation sets that require compositional generalization, we randomly select 20% of the examples from their test sets to serve as the corresponding validation sets (a sketch of such a split is shown after this list).
- The processed data binaries for SCAN, COGS, and CoGnition can be downloaded from this Google Drive.
- Please follow the official fairseq documentation to process the WMT data (a sketch of a typical binarization command is also shown below).
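For reference, a minimal sketch of the kind of 80/20 test/validation split described above, keeping source and target lines aligned. The file names are placeholders and this is not one of the repo's actual scripts; the provided data binaries already contain such splits.

```bash
# Hypothetical reproducible 80/20 split of a parallel test set into new
# test/validation portions; all file names are placeholders.
paste test.src test.tgt | shuf --random-source=<(yes 42) > test.shuf
n_valid=$(( $(wc -l < test.shuf) / 5 ))   # 20% of the examples
head -n "$n_valid" test.shuf | cut -f1 > valid.src
head -n "$n_valid" test.shuf | cut -f2 > valid.tgt
tail -n +"$(( n_valid + 1 ))" test.shuf | cut -f1 > test_remaining.src
tail -n +"$(( n_valid + 1 ))" test.shuf | cut -f2 > test_remaining.tgt
```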
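For the WMT data, the binarization step would typically look something like the fairseq-preprocess call below. The file prefixes and destination directory are placeholders, and the upstream tokenization/BPE steps should follow the official fairseq WMT recipes.

```bash
# Hypothetical binarization of already-tokenized, BPE-applied WMT17 En-De
# splits; the file prefixes and destination directory are placeholders.
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref wmt17_en_de/train \
    --validpref wmt17_en_de/valid \
    --testpref wmt17_en_de/test \
    --destdir data/wmt17_en_de \
    --joined-dictionary \
    --workers 8
```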
The repository and data directories are organized as follows:
```
SQ-Transformer/
├── data
│   ├── scan_jump_x2_v2_1000prim/
│   │   ├── dict.src.txt
│   │   ├── dict.tgt.txt
│   │   ├── preprocess.log
│   │   ├── test.src-tgt.src.bin
│   │   ├── test.src-tgt.src.idx
│   │   ├── test.src-tgt.tgt.bin
│   │   ├── test.src-tgt.tgt.idx
│   │   ├── train.src-tgt.src.bin
│   │   ├── train.src-tgt.src.idx
│   │   ├── train.src-tgt.tgt.bin
│   │   ├── train.src-tgt.tgt.idx
│   │   ├── valid.src-tgt.src.bin
│   │   ├── valid.src-tgt.src.idx
│   │   ├── valid.src-tgt.tgt.bin
│   │   └── valid.src-tgt.tgt.idx
│   ├── scan_around_right/
│   │   ├── dict.src.txt
│   │   ├── dict.tgt.txt
│   │   ├── preprocess.log
│   │   ├── test.src-tgt.src.bin
│   │   ├── test.src-tgt.src.idx
│   │   ├── test.src-tgt.tgt.bin
│   │   ├── test.src-tgt.tgt.idx
│   │   ├── train.src-tgt.src.bin
│   │   ├── train.src-tgt.src.idx
│   │   ├── train.src-tgt.tgt.bin
│   │   ├── train.src-tgt.tgt.idx
│   │   ├── valid.src-tgt.src.bin
│   │   ├── valid.src-tgt.src.idx
│   │   ├── valid.src-tgt.tgt.bin
│   │   └── valid.src-tgt.tgt.idx
│   ├── cogs/
│   │   ├── dict.src.txt
│   │   ├── dict.tgt.txt
│   │   ├── preprocess.log
│   │   ├── test.src-tgt.src.bin
│   │   ├── test.src-tgt.src.idx
│   │   ├── test.src-tgt.tgt.bin
│   │   ├── test.src-tgt.tgt.idx
│   │   ├── train.src-tgt.src.bin
│   │   ├── train.src-tgt.src.idx
│   │   ├── train.src-tgt.tgt.bin
│   │   ├── train.src-tgt.tgt.idx
│   │   ├── valid.src-tgt.src.bin
│   │   ├── valid.src-tgt.src.idx
│   │   ├── valid.src-tgt.tgt.bin
│   │   └── valid.src-tgt.tgt.idx
│   ├── cognition_cg/
│   │   ├── dict.en.txt
│   │   ├── dict.zh.txt
│   │   ├── preprocess.log
│   │   ├── test.en-zh.en
│   │   ├── test.en-zh.zh
│   │   ├── train.en-zh.en
│   │   ├── train.en-zh.zh
│   │   ├── valid.en-zh.en
│   │   └── valid.en-zh.zh
│   ├── wmt14_en_fr/
│   └── wmt17_en_de
├── raw_data
│   ├── cognition
│   │   ├── cg-test
│   │   │   ├── cg-test.compound
│   │   │   ├── cg-test.en
│   │   │   ├── cg-test.zh
│   │   │   ├── NP
│   │   │   ├── PP
│   │   │   └── VP
│   │   ├── processed
│   │   │   ├── test.en
│   │   │   ├── test.zh
│   │   │   ├── train.en
│   │   │   ├── train.zh
│   │   │   ├── valid.en
│   │   │   └── valid.zh
```
- To train on SCAN AddJump 2x (augmented), run `./train_scripts/train_vq_seq2seq_scan_jump.sh`.
- To train on SCAN AroundRight, run `./train_scripts/train_vq_seq2seq_scan_aroundright.sh`.
- To train on COGS, run `./train_scripts/train_vq_seq2seq_cogs.sh`.
- To train on CoGnition, run `./train_scripts/train_vq_seq2seq_cognition.sh` (a sketch of a typical underlying training invocation follows this list).
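Since the project is built on fairseq, the training scripts presumably wrap a fairseq training command resembling the call below. This is only a hedged sketch: the architecture name `sq_transformer`, the `--user-dir ./ar_seq2seq` setting, and all hyperparameters are placeholders rather than the repo's actual configuration; consult the scripts for the real flags.

```bash
# Hypothetical fairseq-train invocation; the architecture name, user-dir,
# and all hyperparameters are placeholders -- see ./train_scripts/ for the
# actual commands.
fairseq-train data/cogs \
    --user-dir ./ar_seq2seq \
    --task translation --source-lang src --target-lang tgt \
    --arch sq_transformer \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-update 50000 \
    --save-dir checkpoints/cogs
```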
- To evaluate on SCAN AddJump 2x (augmented), run `./eval_scripts/eval_vq_seq2seq_scan_jump.sh`.
- To evaluate on SCAN AroundRight, run `./eval_scripts/eval_vq_seq2seq_scan_aroundright.sh`.
- To evaluate on COGS, run `./eval_scripts/eval_vq_seq2seq_cogs.sh`.
- To evaluate on CoGnition, run `./eval_scripts/eval_vq_seq2seq_cognition.sh` (a sketch of a typical decoding step follows this list).
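The evaluation scripts decode from a trained checkpoint; a sketch of what that step might look like with fairseq-generate is shown below. The data directory, checkpoint path, and decoding options are placeholders; the actual commands and metric computation live in `./eval_scripts/`.

```bash
# Hypothetical decoding call; the data directory, checkpoint path, and
# decoding options are placeholders -- see ./eval_scripts/ for the actual
# evaluation commands.
fairseq-generate data/cogs \
    --user-dir ./ar_seq2seq \
    --gen-subset test \
    --path checkpoints/cogs/checkpoint_best.pt \
    --batch-size 128 --beam 5
```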
If you find this work useful, please cite:
```
@article{jiang2024SQTransformer,
  title={Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings},
  author={Jiang, Yichen and Zhou, Xiang and Bansal, Mohit},
  journal={arXiv preprint arXiv:2402.06492},
  year={2024}
}
```