Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction
Code for the paper: "Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction" (in ACL 2020). If you use any part of this work, please include the following citation:
@inproceedings{Kaneko:ACL:2020,
title={Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction},
author={Masahiro Kaneko and Masato Mita and Shun Kiyono and Jun Suzuki and Kentaro Inui},
booktitle={Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
year={2020}
}
Requirements:
- python >= 3.5
- torch == 1.1.0
- bert-nmt
- subword-nmt
- gec-pseudodata
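If you prefer not to rely on the `setup.sh` script described below, the following is a minimal sketch of installing these dependencies by hand. The repository URLs for bert-nmt and gec-pseudodata are assumptions about where those tools are hosted, not paths taken from this README; adjust them to the sources you actually use.

```bash
# Sketch: manual dependency setup (setup.sh below automates the tool download).
# Repository URLs are assumptions, not taken from this README.
pip install "torch==1.1.0"    # PyTorch version pinned above
pip install subword-nmt       # BPE segmentation tool
git clone https://github.com/bert-nmt/bert-nmt.git          # BERT-fused encoder-decoder code
git clone https://github.com/butsugiri/gec-pseudodata.git   # pseudo-data scripts for GEC
```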
- First, download the necessary tools with the following commands:
```bash
cd scripts
./setup.sh
```
- This code uses the W&I+LOCNESS dataset.
- Note that since the gold annotations of the W&I+LOCNESS test set are not publicly available, the validation data is used as the test data.
- Place your data in the `data` directory if necessary.
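The expected preprocessing of that data is not spelled out here. As a hedged illustration only, if your parallel data still needs subword segmentation, subword-nmt can be applied roughly as follows; the file names, merge count, and output paths are placeholders rather than locations this repository requires.

```bash
# Illustrative BPE preprocessing with subword-nmt; file names and merge count are placeholders.
cat train.src train.trg | subword-nmt learn-bpe -s 30000 > bpe.codes   # learn joint merge operations
subword-nmt apply-bpe -c bpe.codes < train.src > train.bpe.src         # segment source sentences
subword-nmt apply-bpe -c bpe.codes < train.trg > train.bpe.trg         # segment target sentences
```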
- You can train the BERT-GEC model with the following command:
```bash
./train.sh
```
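GPU selection is not documented here; assuming the underlying training code is standard PyTorch, one way to pin a device and keep a log is shown below. This is a sketch, not a documented interface of `train.sh`.

```bash
# Sketch: run training on GPU 0 and keep a log (assumes train.sh respects CUDA_VISIBLE_DEVICES).
CUDA_VISIBLE_DEVICES=0 ./train.sh 2>&1 | tee train.log
```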
- You can generate corrections with the trained BERT-GEC model using the command below.
- This model achieves an F0.5 score of 62.77 on the CoNLL-2014 test set.
- The results reported in the paper use an ensemble of models initialized with four pre-trained models trained with different seeds.
```bash
./generate.sh /path/your/data gpu
```
- The OUTPUT file contains the system outputs of the ensembled models.
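The CoNLL-2014 score above is presumably computed with the official M2 scorer. As a hedged example of how you might evaluate the OUTPUT file yourself, assuming you have downloaded the m2scorer and the CoNLL-2014 test data separately (both paths below are placeholders for your local copies):

```bash
# Sketch: score the system output on CoNLL-2014 with the official M2 scorer.
# Both paths are placeholders for locally downloaded copies of the scorer and test set.
./m2scorer/m2scorer OUTPUT conll14st-test-data/noalt/official-2014.combined.m2
```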
See the LICENSE file for license details.