
Code for "Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling"

Setup

Only tested on Python 3.6.

python -m pip install virtualenv
virtualenv bert_env
source bert_env/bin/activate
pip install -r requirements.txt

Usage

The code is built on the source code of On Losses for Modern Language Models, with several enhancements and modifications. In addition to the previously proposed pre-training tasks ("mlm", "rg" (QT in the paper), "tf", "tf_idf", "so", etc.), we provide a new training mechanism for transformers that enjoys the benefits of ensembling without sacrificing efficiency. To train our Multi-CLS BERT, simply specify --model-type mf (MCQT in the paper) together with the desired number of facets K via --num-facets K.

Currently, the mf model type can be combined with any of the following options (a combined example follows the list):

  • Hard negatives: --use_hard_neg 1
  • Architecture-based diversification: --diversify_mode
  • The BERT layer(s) at which to insert the additional linear layers: --diversify_hidden_layer
  • The cross-facet loss: --facet2facet
  • λ in our MCQT loss: --agg_weight
  • Always using the MLM loss (--always-mlm True); the pre-training loss then becomes "mf" + "mlm"
  • Initializing with pre-trained BERT weights: --pretrained
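
Putting a few of these together, a minimal pre-training invocation might look like the sketch below. The flag values are illustrative, taken from the options above and from the paper's full command further down, and are not a verified stand-alone configuration:

python -m pretrain_bert --model-type mf --pretrained-bert --num-facets 5 \
  --use_hard_neg 1 --facet2facet --diversify_mode lin_new --diversify_hidden_layer 4,8 \
  --agg_weight 0.1 --lr 2e-5 --batch-size 30 --epochs 2 --seed 1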

When pre-training with multiple tasks, the loss can be calculated using any of the following methods (see the sketch after this list):

  • Summing all losses (default; incompatible for a small subset of task combinations, see the paper for more detail)
  • Continuous Multi-Task Learning, based on ERNIE 2.0 (--continual-learning True)
  • Alternating between losses (--alternating True)
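
For example, switching from the default loss summation to the ERNIE-2.0-style schedule should only require adding the corresponding flag to a multi-task run. A hedged sketch, with illustrative hyperparameters rather than a verified configuration:

python -m pretrain_bert --model-type mf,tf_idf,so --pretrained-bert --num-facets 5 \
  --continual-learning True --lr 2e-5 --batch-size 30 --epochs 2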

All usable parameters shared by the different pre-training tasks can be found in arguments.py.

Note that our code still supports the comparison tasks listed in our paper; you can simply change the model type to reproduce those results (e.g., use --model-type rg+so+tf_idf to run the MTL method).

Before training, you should:

  • Set paths to read/save/load from in paths.py
  • To create datasets, see data_utils/make_dataset.py
  • For tf_idf prediction, you need to first calculate the idf score for your dataset. See idf.py for a script to do this.
  • If you want to change the transformer size, check out bert_config.json.
  • If you want to train bert-large, you may use bert_large_config.json with --tokenizer-model-type bert-large-uncased.

The following command is the best setting that we used in our paper for Multi-CLS BERT:

python -m pretrain_bert --model-type mf,tf_idf,so --pretrained-bert --save-iters 200000 \
  --lr 2e-5 --agg-function max --warmup 0.001 --facet2facet --epochs 2 --num-facets 5 \
  --diversify_hidden_layer 4,8 --loss_mode log --use_hard_neg 1 --batch-size 30 --seed 1 \
  --diversify_mode lin_new --add_testing_agg --agg_weight 0.1 \
  --save_suffix _add_testing_agg_max01_n5_no_pooling_no_bias_h48_lin_no_bias_hard_neg_tf_idf_so_bsz_30_e2_norm_facet_warmup0001_s1

Fine-tuning

Before running a fine-tuning task, change output_path in evaluate/generate_random_number.py, as well as random_file_path in evaluate/config/test_bert.conf, to your local paths. Then run the Python file to generate the random numbers, which ensures that the random seeds used for sampling training data stay the same across fine-tuning runs.
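
For example, assuming the script is meant to be run directly from the repository root (untested invocation):

python evaluate/generate_random_number.py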

To run a fine-tuning task, first convert the saved state dict of the required model using convert_state_dict.py. Then run:

python3 -m evaluate.main --exp_name [experiment name] --overrides [parameters_to_override]

where the experiment name is the same as the model type above. If using a saved checkpoint instead of the best model, add the --checkpoint argument. You can change the data you want to use (GLUE or SuperGLUE) in paths.py. The --overrides parameter accepts command-like strings that override the default values in the fine-tuning config (evaluate/config/test_bert.conf); you can specify the learning rate, model_suffix, or the few-shot setting there.

In Multi-CLS BERT, we provide different ways to aggregate all of the CLS embeddings. To specify the aggregation function, change the value of pool_type in evaluate/config/test_bert.conf (or override it per run via --overrides, as sketched after the list below):

  • Re-parameterization: pool_type=proj_avg_train
  • Sum aggregation: pool_type=first_init
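
Since --overrides accepts any key from the fine-tuning config, pool_type can presumably also be set per run without editing the .conf file. A sketch (the experiment name is a placeholder):

python -m evaluate.main --exp_name [experiment name] --overrides "pool_type = proj_avg_train"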

The following command is an example of running a fine-tuning task on a GLUE dataset in a few-shot setting (few_shot = 32 in the command below). Use a run name with a suffix to reload the model weights you saved from pre-training.

common_para="warmup_ratio = 0.1, max_grad_norm = 1.0, pool_type=proj_avg_train, "
common_name="warmup01_clip1_proj_avg_train_correct"

python -m evaluate.main \
--exp_name $exp_name \
--overrides "run_name = ${model_name}_1,
$common_para pretrain_tasks = glue,
target_tasks = glue,
lr=1e-5, batch_size=4, few_shot = 32, max_epochs = 20,
pooler_dropout = 0, random_seed = 1,
run_name_suffix = adam_${common_name}_e20_bsz4:s1:lr"

Citation

@inproceedings{chang2023multi-cls,
  title={Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling},
  author={Haw-Shiuan Chang* and Ruei-Yao Sun* and Kathryn Ricci* and Andrew McCallum},
  booktitle={Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2023},
}
