A Neural Grammatical Error Correction System Built on Better Pre-training and Sequential Transfer Learning

Code accompanying Team Kakao&Brain's submission to the ACL 2019 BEA Workshop Shared Task.
(helo_word is our informal team name.)


ACL Anthology:


YJ Choe^, Jiyeon Ham^, Kyubyong Park^, Yeoil Yoon^

^Equal contribution.


Installation

Requires Python 3.

# apt-get packages (required for hunspell & pattern)
apt-get update
apt-get install libhunspell-dev libmysqlclient-dev -y

# pip packages
pip install --upgrade pip
pip install --upgrade -r requirements.txt
python -m spacy download en

# custom fairseq (fork of 0.6.1 with gec modifications)
pip install --editable fairseq

# errant
git clone

# pattern3 (see for any installation issues)
pip install pattern3
python -c "import site; print(site.getsitepackages())"
cp PATH_TO_SITE_PACKAGES/pattern3/text/

Download & Preprocess Data


Restricted Track

  • Prepare data for the restricted track
    python --track 1
  • Pre-train
    • Training the model automatically creates a checkpoint directory.
    • Fill its path into {ckpt_dir}.
    • Also fill the number of GPUs used for training into {ngpu}.
    python --track 1 --train-mode pretrain --model base --ngpu {ngpu}
    python --track 1 --subset valid --ckpt-dir {ckpt_dir}
  • Train
    • Evaluating the model automatically creates an output directory.
    • Fill the previous model's output directory into {prev_model_output_dir}.
    python --track 1 --train-mode train --model base --ngpu {ngpu} \
        --lr 1e-4 --max-epoch 40 --reset --prev-model-output-dir {prev_model_output_dir}
    python --track 1 --subset valid --ckpt-dir {ckpt_dir}
  • Fine-tune
    • Fill the path of the best validation report into {prev_model_output_fpath}.
    • The command below will then output a list of error types to be removed.
    • Fill them into {remove_error_type_lst}.
    python --track 1 --train-mode finetune --model base --ngpu {ngpu} \
        --lr 5e-5 --max-epoch 80 --reset --prev-model-output-dir {prev_model_output_dir}
    python --track 1 --subset valid --ckpt-dir {ckpt_dir}
    python --report {prev_model_output_fpath} \
        --max_error_types 10 --n_simulations 1000000
    python --track 1 --subset test --ckpt-fpath {ckpt_fpath} \
        --remove-unk-edits --remove-error-type-lst {remove_error_type_lst} \
        --apply-rerank --preserve-spell --max-edits 7 
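The `--remove-error-type-lst` flag suppresses corrections for error types the model handles unreliably. The post-processing step can be sketched as follows; the edit structure and function name here are illustrative assumptions, not the repo's actual API, and the error-type codes follow ERRANT's convention:

```python
# Hypothetical sketch of the --remove-error-type-lst post-processing step:
# hypothesized edits whose ERRANT error type is on a blocklist are
# discarded, keeping only corrections for types the system trusts.

def filter_edits(edits, remove_error_types):
    """Drop edits whose error type is in the removal list."""
    return [e for e in edits if e["type"] not in remove_error_types]

# Each illustrative edit: token span, ERRANT error type, correction.
edits = [
    {"span": (0, 1), "type": "R:ORTH",  "correction": "The"},
    {"span": (3, 4), "type": "U:DET",   "correction": ""},
    {"span": (5, 6), "type": "R:SPELL", "correction": "receive"},
]
kept = filter_edits(edits, remove_error_types={"U:DET"})
```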

Low Resource Track

  • Prepare data for the low resource track
    python --track 3
  • Pre-train
    python --track 3 --train-mode pretrain --model base --ngpu {ngpu}
    python --track 3 --subset valid --ckpt-dir {ckpt_dir}
  • Train
    python --track 3 --train-mode finetune --model base --ngpu {ngpu} \
        --max-epoch 40 --prev-model-output-dir {prev_model_output_dir} 
    python --track 3 --subset valid --ckpt-dir {ckpt_dir}
    python --track 3 --subset test --ckpt-fpath {ckpt_fpath} \
        --remove-unk-edits --remove-error-type-lst {remove_error_type_lst} \
        --apply-rerank --preserve-spell --max-edits 7 
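Conceptually, `--apply-rerank` rescores the model's n-best hypotheses with an additional feature and picks the hypothesis with the best combined score. A minimal sketch, assuming a hypothetical language-model score as the extra feature (the weights and score sources are illustrative, not the repo's actual reranker):

```python
# Toy reranker: combine the model score with an auxiliary score
# (here a hypothetical LM log-probability) and keep the argmax.

def rerank(hypotheses, lm_weight=0.5):
    """hypotheses: list of (text, model_score, lm_score); higher is better."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])

nbest = [
    ("He go to school.",   -1.2, -9.0),  # combined: -1.2 + 0.5 * -9.0 = -5.7
    ("He goes to school.", -1.5, -4.0),  # combined: -1.5 + 0.5 * -4.0 = -3.5
]
best = rerank(nbest)
```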

A Note on fairseq

We ran our Transformer models using fairseq-0.6.1. We had to make several modifications to the package, including our own implementation of the copy-augmented Transformer model. You can find all of our modifications in the fairseq/ directory.
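The core idea of the copy-augmented Transformer is that the final output distribution is a gated mixture of the decoder's generation distribution and a copy distribution over the source tokens. A minimal numerical sketch of that mixture, with toy values rather than anything from the actual model:

```python
# Gated mixture used by copy mechanisms:
#   p_final = (1 - gate) * p_gen + gate * p_copy
# p_gen comes from the decoder softmax; p_copy comes from attention
# over source tokens, mapped onto the vocabulary. All values are toy.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mix(p_gen, p_copy, gate):
    """Elementwise gated mixture of two distributions."""
    return [(1 - gate) * g + gate * c for g, c in zip(p_gen, p_copy)]

p_gen = softmax([2.0, 1.0, 0.1])   # decoder vocabulary distribution
p_copy = [0.0, 0.9, 0.1]           # copy distribution from source attention
p_final = mix(p_gen, p_copy, gate=0.3)
```

Because both inputs are valid probability distributions and the gate is a convex weight, the mixture is itself a valid distribution.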


If you use our code for research, please cite our work as:

    @inproceedings{choe-etal-2019-neural,
        title = "A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning",
        author = "Choe, Yo Joong  and
          Ham, Jiyeon  and
          Park, Kyubyong  and
          Yoon, Yeoil",
        booktitle = "Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications",
        month = aug,
        year = "2019",
        address = "Florence, Italy",
        publisher = "Association for Computational Linguistics",
        url = "",
        pages = "213--227",
        abstract = "Grammatical error correction can be viewed as a low-resource sequence-to-sequence task, because publicly available parallel corpora are limited. To tackle this challenge, we first generate erroneous versions of large unannotated corpora using a realistic noising function. The resulting parallel corpora are subsequently used to pre-train Transformer models. Then, by sequentially applying transfer learning, we adapt these models to the domain and style of the test set. Combined with a context-aware neural spellchecker, our system achieves competitive results in both restricted and low resource tracks in the ACL 2019 BEA Shared Task. We release all of our code and materials for reproducibility.",
    }

