This repository contains the code and resources for the COLING 2022 paper "A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products".
If you find our work useful, please cite:

```bibtex
@inproceedings{liuks2022,
  title = {A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products},
  author = {Kesong Liu and Jianhui Jiang and Feifei Lyu},
  booktitle = {COLING},
  year = {2022}
}
```
Our code is built on ELECTRA. On top of ELECTRA's replaced token detection (RTD) pre-training, we adopt a biomedical entity masking (EM) strategy to learn better contextual word representations. Furthermore, we propose a novel pre-training task, product attribute prediction (PAP), which efficiently injects product knowledge into the pre-trained language model by directly leveraging medicinal product databases.
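For illustration, below is a minimal sketch of one way whole-entity masking can work: entity spans (e.g., from a biomedical NER tagger or dictionary matching) are masked as whole units before falling back to ordinary random-token masking. The function name, the 15% masking budget, the seed, and the fallback policy are all assumptions for illustration, not the repository's actual implementation.

```python
# A minimal sketch of whole-entity masking; names and policies are
# illustrative assumptions, not this repository's actual code.
import random

MASK = "[MASK]"

def entity_mask(tokens, entity_spans, mask_budget=0.15, seed=13):
    """Mask whole biomedical entity spans first, then pad with random tokens.

    tokens:       list of word-piece strings
    entity_spans: list of (start, end) index pairs, end exclusive
    """
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(tokens) * mask_budget))
    masked = list(tokens)
    positions = []

    # Prefer entity spans, so the model must reconstruct the entire entity
    # from context instead of copying its neighboring sub-words.
    for start, end in rng.sample(entity_spans, len(entity_spans)):
        if len(positions) >= n_to_mask:
            break
        for i in range(start, end):
            masked[i] = MASK
            positions.append(i)

    # Fill any remaining budget with ordinary random-token masking.
    taken = set(positions)
    remaining = [i for i in range(len(tokens)) if i not in taken]
    for i in rng.sample(remaining, max(0, n_to_mask - len(positions))):
        masked[i] = MASK
        positions.append(i)
    return masked, sorted(positions)

if __name__ == "__main__":
    toks = "the patient was given amoxi ##cillin for otitis media".split()
    spans = [(4, 6), (7, 9)]  # "amoxi ##cillin", "otitis media"
    print(entity_mask(toks, spans))
```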
- Pre-training: Please refer to the files "build_pretraining_dataset.py" and "run_pretraining.py" to build training samples and perform multi-task pre-training (see the example invocation after this list).
- Training data: Please refer to the "data" directory for sample data.
- Fine-tuning and evaluation: Please refer to the files in the "script" directory.
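A minimal example of the two pre-training steps, assuming the scripts keep upstream ELECTRA's command-line interface; the paths and model name are placeholders, and any EM/PAP-specific options added by this repository are not shown:

```bash
# Build pre-training examples from a raw-text corpus (flags as in upstream ELECTRA).
python build_pretraining_dataset.py \
  --corpus-dir data/corpus \
  --vocab-file data/vocab.txt \
  --output-dir data/pretrain_tfrecords \
  --max-seq-length 128 \
  --num-processes 4

# Run multi-task pre-training on the generated examples.
python run_pretraining.py \
  --data-dir data \
  --model-name medical_electra
```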