A BERT-Based Machine Reading Comprehension Baseline


This repository maintains a machine reading comprehension baseline based on BERT. The implementation follows the baseline system descriptions in the two papers cited below.

If you find this code useful, please consider citing the following papers.

  1. Kai Sun, Dian Yu, Dong Yu, and Claire Cardie. Probing Prior Knowledge Needed in Challenging Chinese Machine Reading Comprehension.
  2. Xiaoman Pan, Kai Sun, Dian Yu, Heng Ji, and Dong Yu. Improving Question Answering with External Knowledge.

Here, we show the usage of this baseline using a demo designed for DREAM, a dialogue-based three-choice machine reading comprehension task.

  1. Download and unzip the pre-trained BERT language model, then set the environment variable for BERT: export BERT_BASE_DIR=/PATH/TO/BERT/DIR
  2. Copy the data folder (data) from the DREAM repo into bert/.
  3. In bert, execute python --tf_checkpoint_path=$BERT_BASE_DIR/bert_model.ckpt --bert_config_file=$BERT_BASE_DIR/bert_config.json --pytorch_dump_path=$BERT_BASE_DIR/pytorch_model.bin
  4. Execute python --task_name dream --do_train --do_eval --data_dir . --vocab_file $BERT_BASE_DIR/vocab.txt --bert_config_file $BERT_BASE_DIR/bert_config.json --init_checkpoint $BERT_BASE_DIR/pytorch_model.bin --max_seq_length 512 --train_batch_size 24 --learning_rate 2e-5 --num_train_epochs 8.0 --output_dir dream_finetuned --gradient_accumulation_steps 3
  5. The resulting fine-tuned model, predictions, and evaluation results are stored in bert/dream_finetuned.
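
The command in step 4 combines --train_batch_size 24 with --gradient_accumulation_steps 3. The sketch below illustrates what that combination usually means. It assumes the training script follows the common convention of splitting each batch of 24 into 3 micro-batches of 8 and averaging their gradients; the README does not state this, and the toy model, data, and function names are hypothetical.

```python
# Sketch of gradient accumulation with the demo's settings (batch 24, 3 steps).
# Assumption: each batch of 24 is processed as 3 micro-batches of 8.
# The 1-D model and the data are toy stand-ins, not part of this repository.

def mse_gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, accumulation_steps):
    """Sum micro-batch gradients, each scaled by 1 / accumulation_steps."""
    micro_size = len(batch) // accumulation_steps
    total = 0.0
    for step in range(accumulation_steps):
        micro = batch[step * micro_size:(step + 1) * micro_size]
        total += mse_gradient(w, micro) / accumulation_steps
    return total


train_batch_size = 24
gradient_accumulation_steps = 3
micro_batch_size = train_batch_size // gradient_accumulation_steps

# Toy data drawn from y = 3x, so w = 0.5 yields a non-zero gradient.
batch = [(float(i), 3.0 * i) for i in range(1, train_batch_size + 1)]
full = mse_gradient(0.5, batch)
accumulated = accumulated_gradient(0.5, batch, gradient_accumulation_steps)

print("micro-batch size:", micro_batch_size)
print("gradients match:", abs(full - accumulated) <= 1e-9 * abs(full))
```

Under this assumption, the optimizer still sees a gradient equivalent to a batch of 24, while only 8 examples need to fit in memory at once, which matters at --max_seq_length 512.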

Results on DREAM:

We run the experiments five times with different random seeds and report the best development set performance and the corresponding test set performance.
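
The selection rule described above can be sketched as follows. This is a minimal illustration, not code from the repository: the Run type, the select_best_run helper, and the numbers are hypothetical placeholders. The key point is that the run is chosen by development accuracy alone, and the test accuracy is only read off afterwards.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    """One training run with a given random seed (hypothetical record type)."""
    seed: int
    dev_accuracy: float
    test_accuracy: float


def select_best_run(runs: List[Run]) -> Run:
    """Return the run with the highest dev accuracy; test scores are not consulted."""
    if not runs:
        raise ValueError("at least one run is required")
    return max(runs, key=lambda run: run.dev_accuracy)


# Placeholder numbers for illustration only; these are not results from this repository.
runs = [
    Run(seed=1, dev_accuracy=50.0, test_accuracy=51.0),
    Run(seed=2, dev_accuracy=52.5, test_accuracy=50.5),
    Run(seed=3, dev_accuracy=51.0, test_accuracy=53.0),
]

best = select_best_run(runs)
print("report dev:", best.dev_accuracy, "test:", best.test_accuracy)
```

Note that in the placeholder data the reported test score is not the highest test score among the runs; that is expected, since picking by test accuracy would leak test information into model selection.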

Method/Language Model   Batch Size   Learning Rate   Epochs   Dev    Test
BERT-Base, Uncased      24           2e-5            8        63.4   63.2
BERT-Large, Uncased     24           2e-5            16       66.0   66.8
Human Performance       -            -               -        93.9   95.5
Ceiling Performance     -            -               -        98.7   98.6

Environment: The code has been tested with Python 3.6 and PyTorch 1.0.
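
Before running the demo, it may help to compare the local setup against the tested one. The following is a rough sketch; the environment_report helper is hypothetical and not part of this repository, and versions newer than those listed above are untested according to this README, so a passing check does not guarantee compatibility.

```python
import sys


def environment_report(min_python=(3, 6)):
    """Summarize the local Python and PyTorch versions for comparison with the tested setup."""
    report = {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # True if the interpreter is at least the tested Python version.
        "python_ok": sys.version_info[:2] >= min_python,
    }
    try:
        import torch  # optional for this check; the demo itself needs it
        report["torch_version"] = torch.__version__
    except ImportError:
        report["torch_version"] = None  # PyTorch is not installed
    return report


report = environment_report()
print(report)
```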
