Improving Pre-trained Language Model Sensitivity via Mask Specific losses: A case study on Biomedical NER
- Python 3.8+
- transformers 4.31.0
- torch 2.0.1
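To confirm an environment matches these pins, a minimal check (a sketch; only the version numbers above come from this README):

```python
# Sanity-check the pinned dependency versions listed above.
import sys

import torch
import transformers

assert sys.version_info >= (3, 8), "Python 3.8+ is required"
print("transformers:", transformers.__version__)  # expected: 4.31.0
print("torch:", torch.__version__)                # expected: 2.0.1
```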
Pre-process the BLURB benchmark dataset by pointing `utils.py` at the raw data and a destination directory:
```bash
python utils.py \
  [path to data] \
  [storage or destination directory]
```
Alternatively, inherit already pre-processed versions of the BLURB datasets.
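BLURB NER splits are commonly distributed as CoNLL-style BIO files (one token and tag per line, blank lines between sentences). A minimal reader, assuming that format; the file path and helper name below are hypothetical, not part of this repo:

```python
# Read a BIO-tagged NER file: one token<TAB>tag pair per line,
# with a blank line marking each sentence boundary.
def read_bio(path):
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:  # sentence boundary
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
            else:
                parts = line.split("\t")
                tokens.append(parts[0])
                tags.append(parts[-1])
    if tokens:  # flush the last sentence if the file lacks a trailing blank line
        sentences.append((tokens, tags))
    return sentences

# e.g. train = read_bio("data/NCBI-disease/train.tsv")  # hypothetical path
```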
PMI masking
Construct a vocabulary from a dataset using the PMI-masking approach:
```bash
./run_pmi.sh
```
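For intuition, PMI masking scores word n-grams by pointwise mutual information and treats high-scoring collocations as single units to mask jointly. The sketch below computes bigram PMI over a tokenized corpus; it only illustrates the statistic behind `run_pmi.sh`, and the function name and the `top_k` and `min_count` defaults are assumptions:

```python
import math
from collections import Counter

def pmi_bigrams(sentences, top_k=1000, min_count=5):
    """Rank bigrams by PMI over a corpus of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        total += len(tokens)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        # PMI = log p(x,y) / (p(x) * p(y)), counts normalised by corpus size
        scores[(w1, w2)] = math.log(
            (count / total) / ((unigrams[w1] / total) * (unigrams[w2] / total))
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```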
Specify the paths to the data and set the masking budgets for both base-level masking (BLM) and entity-level masking (ELM), then launch training:
```bash
./run_train.sh [DATASET]
```
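As a rough illustration of how the two budgets interact, the sketch below masks a fraction of entity tokens (the ELM budget) and a fraction of the remaining tokens (the BLM budget). The function, argument names, and default budget values are assumptions, not the repo's API:

```python
import random

def apply_masking(tokens, entity_positions, elm_budget=0.5, blm_budget=0.15,
                  mask_token="[MASK]"):
    """Mask entity tokens under the ELM budget, other tokens under the BLM budget."""
    masked = list(tokens)
    entity_set = set(entity_positions)
    other_positions = [i for i in range(len(tokens)) if i not in entity_set]
    n_elm = round(len(entity_positions) * elm_budget)  # entity-level budget
    n_blm = round(len(other_positions) * blm_budget)   # base-level budget
    for i in random.sample(entity_positions, n_elm) + random.sample(other_positions, n_blm):
        masked[i] = mask_token
    return masked

# e.g. apply_masking(["aspirin", "reduces", "fever", "quickly"], entity_positions=[0])
```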
```bibtex
@article{abaho2024improving,
  title={Improving Pre-trained Language Model Sensitivity via Mask Specific losses: A case study on Biomedical NER},
  author={Abaho, Micheal and Bollegala, Danushka and Leeming, Gary and Joyce, Dan and Buchan, Iain E},
  journal={arXiv preprint arXiv:2403.18025},
  year={2024}
}
```