
Coarse-to-Fine Pre-training for Named Entity Recognition
====

Models and results can be found in our EMNLP 2020 paper, Coarse-to-Fine Pre-training for Named Entity Recognition, which achieves state-of-the-art results on three NER benchmarks.

Details will be updated soon.

Requirements:

Python: 3.8.5
PyTorch: 1.6.0

Data preparation:

Data for ESI

You can download the data for ESI used in our paper from HERE

If you wanna to generate your own data, please run "data_preprocess/write_tag.py"

Data for NEE

You can download the data for NEE used in our paper from HERE

Gazetteers used in our paper can be downloaded from HERE

If you wanna to generate your own data, please run "data_preprocess/write_tag_from_dict.py"

Query Generation

Write down queries for entity labels in ./data_preprocess/dump_query2file.py and run python3 ./data_preprocess/dump_query2file.py to dump queries to the folder ./data_preprocess/queries.
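
The label set and query wording are defined inside dump_query2file.py. The mapping below is only an assumed example of the label-to-query format; the labels and queries are placeholders, not the ones used in the paper:

```python
# Assumed example of the label -> natural-language-query mapping written out by
# dump_query2file.py; the labels and query wording here are placeholders.
queries = {
    "LOC": "找出文本中的地名，例如城市、国家、地区。",   # locations
    "PER": "找出文本中的人名。",                         # person names
    "ORG": "找出文本中的组织机构名称。",                  # organizations
}

# Each query is later paired with a sentence to form an MRC-style example.
```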

Transform tagger-style annotations to MRC-style triples

Run ./data_preprocess/example/generate_data.py to generate MRC-style data.
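
The exact schema is defined by generate_data.py. The record below is only an assumed illustration of what an MRC-style triple (query, context, answer spans) looks like; the field names and offset convention are guesses:

```python
# Assumed illustration of a single MRC-style example: a natural-language query,
# the original sentence as context, and the answer spans for that entity type.
# Field names and the inclusive end-offset convention are assumptions.
example = {
    "query": "找出文本中的地名，例如城市、国家、地区。",   # the query for this label
    "context": "苹果公司在北京开设新店。",
    "start_position": [5],          # character index where each answer span starts
    "end_position": [6],            # character index where each answer span ends (inclusive)
    "entity_label": "LOC",
}
```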

How to run the code?

Stage1: ESI

python -u run/train_bert_mrc.py --data_dir data/train_for_ESI/zhwiki/ --checkpoint 20000 --learning_rate 6e-6 --num_train_epochs 5 --output_dir data/saved_model/zhwiki/ --data_sign zhwiki

Stage2: NEE

Put the model generated in stage1 into "data/saved_model/zhwiki/"

Then run:

python -u run/train_bert_mrc.py --data_dir data/train_for_NEE/ecommerce/ --checkpoint 4000 --learning_rate 3e-5 --num_train_epochs 6 --output_dir data/saved_model/ecommerce/stage2 --data_sign ecommerce --pretrain data/saved_model/zhwiki --bert_model data/bert_model/bert-base-chinese-pytorch/ --warmup_proportion 0.4 --regenerate_rate 0.1 --STrain 1 --perepoch 0

Stage3: FET

Put the model generated in stage2 into "data/saved_model/ecommerce/stage2" and put the ".npy" data generated by stage2 into "data/train_for_FET/ecommerce/".

Then run:

python -u run/train_cluster_bert_mrc.py --data_dir data/train_for_FET/ecommerce/ --checkpoint 2000 --learning_rate 2e-5 --num_train_epochs 5 --output_dir data/saved_model/ecommerce/stage3 --data_sign ecommerce --pretrain data/saved_model/ecommerce/stage2 --bert_model data/bert_model/bert-base-chinese-pytorch/ --num_clusters 23 --gama 0.001 --clus_niter 60 --dropout_rate 0.1
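
Stage3 induces fine-grained pseudo-types by clustering entity representations (--num_clusters 23, --clus_niter 60). As a rough illustration of that idea only, and not of what train_cluster_bert_mrc.py actually does, the sketch below clusters pre-computed span embeddings with k-means; the .npy file name and array shape are assumptions:

```python
# Rough illustration of the clustering idea behind the FET stage: group
# entity-span representations into pseudo fine-grained types with k-means.
# The .npy file name and embedding shape are assumptions, not the repo's format.
import numpy as np
from sklearn.cluster import KMeans

span_embeddings = np.load("data/train_for_FET/ecommerce/span_embeddings.npy")  # assumed (num_spans, hidden_dim)

kmeans = KMeans(n_clusters=23, max_iter=60, n_init=10, random_state=0)
pseudo_types = kmeans.fit_predict(span_embeddings)   # one pseudo fine-grained type id per span

print(pseudo_types[:10])
```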

Stage4: fine-tune

Put the model generated in stage3 into "data/saved_model/ecommerce/stage3"

python -u run/train_bert_mrc.py --data_dir data/supervised_data/ecommerce/ --checkpoint 100 --learning_rate 3e-5 --num_train_epochs 25 --output_dir data/saved_model/ecommerce/supervised --data_sign ecommerce --pretrain data/saved_model/ecommerce/stage3 --bert_model data/bert_model/bert-base-chinese-pytorch/

Cite:

@inproceedings{mengge-etal-2020-coarse,
    title = "{C}oarse-to-{F}ine {P}re-training for {N}amed {E}ntity {R}ecognition",
    author = "Mengge, Xue and Yu, Bowen and Zhang, Zhenyu and Liu, Tingwen and Zhang, Yue and Wang, Bin",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    year = "2020",
    pages = "6345--6354",
}
