liuks/ep_plm

Introduction

This repository contains the code and resources for the COLING 2022 paper "A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products".

@inproceedings{liuks2022,
  title={A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products},
  author={Liu, Kesong and Jiang, Jianhui and Lyu, Feifei},
  booktitle={COLING},
  year={2022}
}

Our code builds on ELECTRA. Following ELECTRA's replaced token detection (RTD) pre-training, we apply a biomedical entity masking (EM) strategy to learn better contextual word representations. We further propose a novel pre-training task, product attribute prediction (PAP), which injects product knowledge into the pre-trained language model efficiently by drawing directly on medicinal product databases.
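
For intuition, below is a minimal sketch of how the three objectives could be combined into one multi-task loss. It is written in PyTorch for brevity (the repository itself builds on the TensorFlow ELECTRA codebase), and every class, head, and weighting choice in it is an illustrative assumption rather than this repository's API:

import torch
import torch.nn as nn

class MultiTaskPretrainLoss(nn.Module):
    """Conceptual sketch of the RTD + EM + PAP multi-task objective."""

    def __init__(self, hidden_size, vocab_size, num_attributes):
        super().__init__()
        # RTD: per-token binary original-vs-replaced decision (ELECTRA discriminator head).
        self.rtd_head = nn.Linear(hidden_size, 1)
        # EM: recover masked biomedical entity tokens (MLM-style head).
        self.em_head = nn.Linear(hidden_size, vocab_size)
        # PAP: predict a product's attributes (e.g., from a medicinal product
        # database), modeled here as multi-label classification over the
        # sequence-level representation.
        self.pap_head = nn.Linear(hidden_size, num_attributes)

    def forward(self, hidden, rtd_labels, em_labels, pap_labels):
        # hidden: (batch, seq_len, hidden_size) contextual representations.
        rtd_loss = nn.functional.binary_cross_entropy_with_logits(
            self.rtd_head(hidden).squeeze(-1), rtd_labels.float())
        # Only masked entity positions contribute (others set to -100).
        em_loss = nn.functional.cross_entropy(
            self.em_head(hidden).transpose(1, 2), em_labels, ignore_index=-100)
        # Use the first-token ([CLS]) representation for sequence-level prediction.
        pap_loss = nn.functional.binary_cross_entropy_with_logits(
            self.pap_head(hidden[:, 0]), pap_labels.float())
        return rtd_loss + em_loss + pap_loss

Note that, for simplicity, this sketch attaches all three heads to one encoder's output, whereas ELECTRA trains a separate small generator network for the MLM-style objective; the equal task weighting is likewise an assumption, as the weights are hyperparameters.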

Usage

  • Pre-training: Use "build_pretraining_dataset.py" to build training samples and "run_pretraining.py" to perform multi-task pre-training (see the sketch after this list).
  • Training data: The "data" directory contains SAMPLE data.
  • Fine-tuning and evaluation: See the scripts in the "script" directory.
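
As an illustration, the two pre-training steps might be driven as follows. The snippet is wrapped in Python to keep the examples in one language; the flag names mirror the upstream google-research/electra scripts and are assumptions here, so consult each script's --help for this repository's actual interface:

import subprocess

# Step 1: turn a raw text corpus into pre-training examples.
# "data" is the sample corpus shipped with this repository;
# "vocab.txt" and "pretrain_tfrecords" are assumed paths.
subprocess.run([
    "python3", "build_pretraining_dataset.py",
    "--corpus-dir", "data",
    "--vocab-file", "vocab.txt",
    "--output-dir", "pretrain_tfrecords",
    "--max-seq-length", "128",
], check=True)

# Step 2: run multi-task pre-training on the generated examples.
# "ep_plm_demo" is a hypothetical model name.
subprocess.run([
    "python3", "run_pretraining.py",
    "--data-dir", ".",
    "--model-name", "ep_plm_demo",
], check=True)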
