No description, website, or topics provided.
Switch branches/tags
Nothing to show
Clone or download
Googler Jannis Bulian
Googler and Jannis Bulian Project import generated by Copybara.
PiperOrigin-RevId: 217384891
Latest commit bd5d47b Oct 16, 2018

ActiveQA: Active Question Answering

This repo contains code for our paper Ask the Right Questions: Active Question Reformulation with Reinforcement Learning.

Small forewarning, this is still much more of a research codebase than a library. No support is provided.

If you use this code for your research, please cite the paper.


ActiveQA is an agent that transforms questions online in order to find the best answers. The agent consists of a Tensorflow model that reformulates questions and an Answer Selection model. It interacts with an environment that contains a question-answering system. The agent queries the environment with variants of a question and calculates a score for the answer against the original question. The model is trained end-to-end using reinforcement learning.

This version addresses the SearchQA question-answering task, and the environment consists of the Bi-directional Attention Flow (BiDAF) model of Seo et al. (2017).



We require tensorflow and many other supporting libraries. Tensorflow should be installed separately following the docs. To install the other dependencies use

pip install -r requirements.txt

Note: We only ran this code with Python 2, so Python 3 is not officially supported.


Download the source dataset from SearchQA, GloVe, and NLTK corpus and save them in $HOME/data.

export DATA_DIR=$HOME/data
mkdir $DATA_DIR


Download the SearchQA dataset (~600 MB) for training, testing, and validation here:

<Download the dataset to $DATA_DIR/>
unzip $DATA_DIR/ -d $DATA_DIR

Download GloVe (~850 MB):

export GLOVE_DIR=$DATA_DIR/glove
mkdir $GLOVE_DIR

wget -c -O $GLOVE_DIR/

Download NLTK (for tokenizer). Make sure that nltk is installed!

python -m nltk.downloader -d $HOME/nltk_data punkt

Download the reformulator model pretrained on UN+Paralex datasets (~140 MB):

export PRETRAINED_DIR=$DATA_DIR/pretrained



The SearchQA dataset requires a 2-step preprocessing:

  1. Convert into SQuAD data format as the model was written to only work with that format.

    export SQUAD_DIR=$DATA_DIR/squad
    mkdir $SQUAD_DIR
    python -m searchqa.prepro \
    --searchqa_dir=$DATA_DIR/SearchQA \
  2. Preprocess the SearchQA dataset in SQuAD format (along with GloVe vectors) and save them in $PWD/data/squad (~60 minutes):

    python -m third_party.bi_att_flow.squad.prepro \
    --glove_dir=$GLOVE_DIR \

Note that Python2 and Python3 handle Unicode differently and hence the preprocessing output differs. For converting the SearchQA format to SQuAD format either version can be used; use Python3 for other datasets.


We need to compile the gRPC interface for the Environment Server.

chmod +x; ./

Run Environment Server

The training requires running the environment gRPC server, which receives queries from the ActiveQA agent and sends back one response per query.

python -m px.environments.bidaf_server \
--port=10000 \
--squad_data_dir=data/squad \
--bidaf_shared_file=data/bidaf/shared.json \

Reformulator Training

We first train reformulator from a model pretrained on UN and Paralex datasets. It should take a week on a single P100 GPU to reach ~42 F1 score on SearchQA's dev set.

export OUT_DIR=/tmp/active-qa
mkdir $OUT_DIR

export REFORMULATOR_DIR=$OUT_DIR/reformulator

echo "model_checkpoint_path: \"$PRETRAINED_DIR/translate.ckpt-1460356\"" > checkpoint
cp -f checkpoint $REFORMULATOR_DIR
cp -f checkpoint $REFORMULATOR_DIR/initial_checkpoint.txt

python -m px.nmt.reformulator_and_selector_training \
--environment_server_address=localhost:10000 \
--hparams_path=px/nmt/example_configs/reformulator.json \
--enable_reformulator_training=true \
--enable_selector_training=false \
--input_questions=$SQUAD_DIR/train-questions.txt \
--input_annotations=$SQUAD_DIR/train-annotation.txt \
--dev_questions=$SQUAD_DIR/dev-questions.txt \
--dev_annotations=$SQUAD_DIR/dev-annotation.txt \
--glove_path=$GLOVE_DIR/glove.6B.100d.txt \

Note: if you don't want to wait a week of training, you can download this checkpoint of the reformulator trained on SearchQA, with dev set F1 score of 42.5. Note that this is not the exact model analyzed in the paper, but one with equivalent performance.

Selector Training

After training the reformulator, we can now train the selector. It should take 2-3 days on a single P100 GPU to reach ~47.5 F1 score on SearchQA's dev set.

python -m px.nmt.reformulator_and_selector_training \
--environment_server_address=localhost:10000 \
--hparams_path=px/nmt/example_configs/reformulator.json \
--enable_reformulator_training=false \
--enable_selector_training=true \
--input_questions=$SQUAD_DIR/train-questions.txt \
--input_annotations=$SQUAD_DIR/train-annotation.txt \
--dev_questions=$SQUAD_DIR/dev-questions.txt \
--dev_annotations=$SQUAD_DIR/dev-annotation.txt \
--glove_path=$GLOVE_DIR/glove.6B.100d.txt \
--batch_size_train=16 \
--batch_size_eval=64 \
--save_path=$OUT_DIR/selector \


This repository relies on the work of the following repositories:

and uses data from the following sources:


  author    = {Christian Buck and
               Jannis Bulian and
               Massimiliano Ciaramita and
               Andrea Gesmundo and
               Neil Houlsby and
               Wojciech Gajewski and
               Wei Wang},
  title     = {Ask the Right Questions: Active Question Reformulation with Reinforcement
  booktitle = {Sixth International Conference on Learning Representations (ICLR)},
  year      = {2018},
  month     = {May},
  address   = {Vancouver, Canada},
  url       = {},