
Machine Reading Comprehension

Machine reading comprehension (MRC), a task that asks a machine to read a given context and then answer questions based on its understanding, is considered one of the key problems in artificial intelligence and has attracted significant interest from both academia and industry. Over the past few years, great progress has been made in this field, thanks to various end-to-end trained neural models and the release of high-quality datasets with large numbers of examples.

Figure 1: MRC example from SQuAD 2.0 dev set

Setting

  • Python 3.6.7
  • TensorFlow 1.13.1
  • NumPy 1.13.3
  • SentencePiece 0.1.82
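
A minimal environment setup sketch follows. The pip package names and the pretrained checkpoint URL are assumptions based on the official PyPI and XLNet releases, and the model/ layout is inferred from the --spiece_model_file and --init_checkpoint flags used in the Usage section below.

# assumed pip invocation matching the pinned versions above;
# use tensorflow==1.13.1 instead for CPU-only runs
pip install tensorflow-gpu==1.13.1 numpy==1.13.3 sentencepiece==0.1.82

# fetch the official pretrained XLNet-Large (cased) checkpoint
wget https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip
unzip cased_L-24_H-1024_A-16.zip -d model/
# if the archive extracts to a differently named folder, rename it to
# model/cased_L-24_H-1024_A-16 so the paths in the Usage section resolve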

Datasets

  • SQuAD is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question may be unanswerable.
  • CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure the ability of machines to understand a text passage and answer a series of interconnected questions that appear in a conversation. CoQA is pronounced 'coca'.
  • QuAC is a dataset for modeling, understanding, and participating in information-seeking dialog. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.
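
The commands in the Usage section below read the raw JSON files from data/<task>/<version>. Here is a download sketch using the datasets' official distribution URLs; the exact local directory layout is an assumption inferred from the --data_dir flags:

mkdir -p data/squad/v2.0 data/coqa/v1.0 data/quac/v0.2
# SQuAD v2.0 train/dev splits
wget -P data/squad/v2.0 https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
wget -P data/squad/v2.0 https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
# CoQA v1.0 train/dev splits
wget -P data/coqa/v1.0 https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json
wget -P data/coqa/v1.0 https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json
# QuAC v0.2 train/dev splits
wget -P data/quac/v0.2 https://s3.amazonaws.com/my89public/quac/train_v0.2.json
wget -P data/quac/v0.2 https://s3.amazonaws.com/my89public/quac/val_v0.2.json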

Usage

  • Run SQuAD experiment
CUDA_VISIBLE_DEVICES=0,1,2,3 python run_squad.py \
    --spiece_model_file=model/cased_L-24_H-1024_A-16/spiece.model \
    --model_config_path=model/cased_L-24_H-1024_A-16/xlnet_config.json \
    --init_checkpoint=model/cased_L-24_H-1024_A-16/xlnet_model.ckpt \
    --task_name=v2.0 \
    --random_seed=100 \
    --predict_tag=xxxxx \
    --data_dir=data/squad/v2.0 \
    --output_dir=output/squad/v2.0/data \
    --model_dir=output/squad/v2.0/checkpoint \
    --export_dir=output/squad/v2.0/export \
    --max_seq_length=512 \
    --train_batch_size=12 \
    --predict_batch_size=12 \
    --num_hosts=1 \
    --num_core_per_host=4 \
    --learning_rate=3e-5 \
    --train_steps=8000 \
    --warmup_steps=1000 \
    --save_steps=1000 \
    --do_train=true \
    --do_predict=true \
    --do_export=true \
    --overwrite_data=false
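
Dev set predictions can then be scored with the official SQuAD v2.0 evaluation script. A sketch, assuming the predictions land in a predictions.json under --output_dir (the actual file name written by run_squad.py may differ):

# evaluate-v2.0.py is the official scorer from https://rajpurkar.github.io/SQuAD-explorer/
python evaluate-v2.0.py data/squad/v2.0/dev-v2.0.json \
    output/squad/v2.0/data/predictions.json
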
  • Run CoQA experiment
CUDA_VISIBLE_DEVICES=0,1,2,3 python run_coqa.py \
    --spiece_model_file=model/cased_L-24_H-1024_A-16/spiece.model \
    --model_config_path=model/cased_L-24_H-1024_A-16/xlnet_config.json \
    --init_checkpoint=model/cased_L-24_H-1024_A-16/xlnet_model.ckpt \
    --task_name=v1.0 \
    --random_seed=100 \
    --predict_tag=xxxxx \
    --data_dir=data/coqa/v1.0 \
    --output_dir=output/coqa/v1.0/data \
    --model_dir=output/coqa/v1.0/checkpoint \
    --export_dir=output/coqa/v1.0/export \
    --max_seq_length=512 \
    --train_batch_size=12 \
    --predict_batch_size=12 \
    --num_hosts=1 \
    --num_core_per_host=4 \
    --learning_rate=3e-5 \
    --train_steps=8000 \
    --warmup_steps=1000 \
    --save_steps=1000 \
    --do_train=true \
    --do_predict=true \
    --do_export=true \
    --overwrite_data=false
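
CoQA predictions can be scored with the official CoQA evaluation script. A sketch, assuming a predictions.json under --output_dir (the actual file name written by run_coqa.py may differ):

# evaluate-v1.0.py is the official scorer from https://stanfordnlp.github.io/coqa/
python evaluate-v1.0.py \
    --data-file data/coqa/v1.0/coqa-dev-v1.0.json \
    --pred-file output/coqa/v1.0/data/predictions.json
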
  • Run QuAC experiment
CUDA_VISIBLE_DEVICES=0,1,2,3 python run_quac.py \
    --spiece_model_file=model/cased_L-24_H-1024_A-16/spiece.model \
    --model_config_path=model/cased_L-24_H-1024_A-16/xlnet_config.json \
    --init_checkpoint=model/cased_L-24_H-1024_A-16/xlnet_model.ckpt \
    --task_name=v0.2 \
    --random_seed=100 \
    --predict_tag=xxxxx \
    --data_dir=data/quac/v0.2 \
    --output_dir=output/quac/v0.2/data \
    --model_dir=output/quac/v0.2/checkpoint \
    --export_dir=output/quac/v0.2/export \
    --max_seq_length=512 \
    --train_batch_size=12 \
    --predict_batch_size=12 \
    --num_hosts=1 \
    --num_core_per_host=4 \
    --learning_rate=3e-5 \
    --train_steps=8000 \
    --warmup_steps=1000 \
    --save_steps=1000 \
    --do_train=true \
    --do_predict=true \
    --do_export=true \
    --overwrite_data=false
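
With --do_export=true, each run also writes a TensorFlow SavedModel under --export_dir. The exported signatures, inputs, and outputs can be inspected with saved_model_cli, which ships with TensorFlow; the timestamped subdirectory name below is a placeholder:

# replace <timestamp> with the export subdirectory that was actually created
saved_model_cli show --dir output/quac/v0.2/export/<timestamp> --all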

Experiment

SQuAD v1.1

Figure 2: Illustrations of fine-tuning XLNet on SQuAD v1.1 task

| Model       | Train Data | # Train Steps | Batch Size | Max Length | Learning Rate | EM    | F1    |
|-------------|------------|---------------|------------|------------|---------------|-------|-------|
| XLNet-base  | SQuAD 2.0  | 8,000         | 48         | 512        | 3e-5          | 85.90 | 92.17 |
| XLNet-large | SQuAD 2.0  | 8,000         | 48         | 512        | 3e-5          | 88.61 | 94.28 |

Table 1: Dev set performance of XLNet models fine-tuned on the SQuAD v1.1 task

SQuAD v2.0

Figure 3: Illustrations of fine-tuning XLNet on SQuAD v2.0 task

| Model       | Train Data | # Train Steps | Batch Size | Max Length | Learning Rate | EM    | F1    |
|-------------|------------|---------------|------------|------------|---------------|-------|-------|
| XLNet-base  | SQuAD 2.0  | 8,000         | 48         | 512        | 3e-5          | 80.23 | 82.90 |
| XLNet-large | SQuAD 2.0  | 8,000         | 48         | 512        | 3e-5          | 85.72 | 88.36 |

Table 2: Dev set performance of XLNet models fine-tuned on the SQuAD v2.0 task

CoQA v1.0

Figure 4: Illustrations of fine-tuning XLNet on CoQA v1.0 task

| Model       | Train Data | # Train Steps | Batch Size | Max Length | Max Query Len | Learning Rate | EM   | F1   |
|-------------|------------|---------------|------------|------------|---------------|---------------|------|------|
| XLNet-base  | CoQA 1.0   | 6,000         | 48         | 512        | 128           | 3e-5          | 76.4 | 84.4 |
| XLNet-large | CoQA 1.0   | 6,000         | 48         | 512        | 128           | 3e-5          | 81.8 | 89.4 |

Table 3: Dev set performance of XLNet models fine-tuned on the CoQA v1.0 task

QuAC v0.2

Figure 5: Illustrations of fine-tuning XLNet on QuAC v0.2 task

| Model       | Train Data | # Train Steps | Batch Size | Max Length | Max Query Len | Learning Rate | Overall F1 | HEQQ | HEQD |
|-------------|------------|---------------|------------|------------|---------------|---------------|------------|------|------|
| XLNet-base  | QuAC 0.2   | 8,000         | 48         | 512        | 128           | 2e-5          | 66.4       | 62.6 | 6.8  |
| XLNet-large | QuAC 0.2   | 8,000         | 48         | 512        | 128           | 2e-5          | 71.5       | 68.0 | 11.1 |

Table 4: Dev set performance of XLNet models fine-tuned on the QuAC v0.2 task
