Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

Code for the ACL 2020 paper: "Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction". If you use any part of this work, please include the following citation:

@inproceedings{Kaneko:ACL:2020,
    title={Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction},
    author={Masahiro Kaneko and Masato Mita and Shun Kiyono and Jun Suzuki and Kentaro Inui},
    booktitle={Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
    year={2020}
}

Requirements

How to use

  • First, download the necessary tools using the following commands:
cd scripts
./setup.sh
  • This code uses the W&I+LOCNESS dataset.
  • Note that because the gold annotations for the W&I+LOCNESS test set are not publicly available, the validation set is used as the test set.
  • If necessary, place your data in the data directory; a purely hypothetical layout is sketched below.
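  • As a rough orientation only: GEC training data for encoder-decoder models is usually stored as parallel plain-text files, with ungrammatical source sentences and corrected target sentences aligned line by line. The file names below are hypothetical; check the paths referenced in train.sh for the actual ones.
ls data/                 # hypothetical contents: train.src  train.trg  valid.src  valid.trg
head -1 data/train.src   # one ungrammatical source sentence per line
head -1 data/train.trg   # the corrected sentence on the matching line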
  • You can train the BERT-GEC model with the following command:
./train.sh
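  • train.sh is invoked without arguments above. If you need to pin training to a specific GPU, the standard CUDA_VISIBLE_DEVICES environment variable is a reasonable option, assuming the underlying toolkit is PyTorch-based (an assumption, not something documented here):
CUDA_VISIBLE_DEVICES=0 ./train.sh   # restrict training to GPU 0; use CUDA_VISIBLE_DEVICES=0,1 for two GPUs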
  • You can generate corrections with the trained BERT-GEC model using the following command:
./generate.sh /path/your/data gpu
  • This model achieves an F0.5 score of 62.77 on the CoNLL-2014 test set.
  • The results in the paper are produced by ensembling four models initialized with pre-trained models and trained with different seeds.
  • The OUTPUT file contains the system outputs of the ensembled models.
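  • To compute a CoNLL-style F0.5 score from the OUTPUT file yourself, the official M2 scorer (https://github.com/nusnlp/m2scorer) is the standard tool. The command below is only a sketch: the scorer path and the gold-annotation path are illustrative, and this is not necessarily the exact evaluation pipeline used for the numbers above.
./m2scorer OUTPUT /path/to/official-2014.combined.m2   # prints precision, recall, and F0.5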

License

See the LICENSE file
