
Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study

Open-source code for our ICLR 2023 paper: Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study (https://openreview.net/forum?id=UazgYBMS9-W)


Requirements

python==3.8
pytorch>=1.7.0
transformers>=4.3.0


Download Datasets

We follow d'Autume et al. (2019) and download the datasets from their Google Drive.


Prepare

  1. Clone the repository from GitHub.
  2. Create the data directory:
cd plms_are_lifelong_learners
mkdir data
  3. Move the .tar.gz files to "data".
  4. Uncompress the .tar.gz files and sample data from the original datasets:
bash uncompressing.sh
python sampling_data.py --seed 42

Train Models Sequentially

  1. Tokenize the input texts:
python tokenizing.py --tokenizer bert-base-uncased --data_dir ./data/ --max_token_num 128

"bert-base-uncased" can be replaced by any other tokenizer in Hugging Face Transformer Models, e.g. "roberta-base", "prajjwal1/bert-tiny", etc. The files with tokenized texts will be saved in the directory ./data/

  2. Train the models:
CUDA_VISIBLE_DEVICES=0 python train_cla.py --plm_name bert-base-uncased --tok_name bert-base-uncased --pad_token 0 --plm_type bert --hidden_size 768 --device cuda --seed 1023 --padding_len 128 --batch_size 32 --learning_rate 1.5e-5 --trainer sequential --epoch 2 --order 0 --rep_itv 10000 --rep_num 100
  • "plm_name" indicates which pre-trained model (and its well-pre-trained weights) should be loaded. "tok_name" means which tokenizer are employed in the first step. These two parameters can be different, e.g., python train_cla.py --plm_name bert-large-uncased --tok_name bert-base-uncased ...
  • "plm_type" is required. When using GPT-2 or XLNet, please set "plm_type" as "gpt2" or "xlnet". It is because the classifiaction models based on GPT-2 or XLNet employ representations of the last tokens as features, while other models (like BERT or RoBERTa) employ the first token ([CLS]).
  • "pad_token" should be an integer, which is the ID of padding tokens in the tokenizer, e.g. 0 for BERT, or 1 for RoBERTa.
  • "trainer" can be sequential, replay, or multiclass. The sequential means training sequentially without Episodic Memory Play, and multiclass means training on all tasks together (multi-task learning).
  • "order" can be 0, 1, 2, 3, which is correspond to Appendix A in the paper.

Probing Study

  1. Re-train the decoder on each saved checkpoint:
CUDA_VISIBLE_DEVICES=0 python probing_train.py --plm_name bert-base-uncased --tok_name bert-base-uncased --pad_token 0 --plm_type bert --hidden_size 768 --device cuda --seed 1023 --padding_len 128 --batch_size 32 --learning_rate 3e-5 --epoch 10 --train_time "1971-02-03-14-56-07"
  2. Evaluate the performance of each re-trained model:
CUDA_VISIBLE_DEVICES=0 python test_model.py --plm_name bert-base-uncased --tok_name bert-base-uncased --pad_token 0 --plm_type bert --hidden_size 768 --device cuda --padding_len 128 --train_time "1971-02-03-14-56-07"
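Conceptually, the probing step freezes the sequentially trained encoder and fits a fresh decoder (classification head) on top of it. The sketch below illustrates this idea only; it is not the repository's implementation, and the checkpoint path and num_classes value are placeholders.

import torch
from transformers import AutoModel

# Load an encoder checkpoint saved during sequential training (path is a placeholder)
encoder = AutoModel.from_pretrained("bert-base-uncased")
state = torch.load("checkpoints/1971-02-03-14-56-07/encoder.pt")
encoder.load_state_dict(state, strict=False)

# Freeze the encoder: only the new decoder is updated during probing
for p in encoder.parameters():
    p.requires_grad = False

num_classes = 33  # placeholder; use the size of the label space from training
decoder = torch.nn.Linear(768, num_classes)
optimizer = torch.optim.Adam(decoder.parameters(), lr=3e-5)

def probing_step(input_ids, attention_mask, labels):
    with torch.no_grad():
        hidden = encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
    logits = decoder(hidden[:, 0])  # [CLS] feature for BERT-style models
    loss = torch.nn.functional.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()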
