Study of Attention Mechanisms and Adversarial Training for QA task

This repository contains the code for the work Study of Attention Mechanisms and Adversarial Training for Question Answering.

Getting started

  • If your machine doesn't have a GPU, change tensorflow-gpu==1.4.1 to tensorflow==1.4.1 in requirements.txt.
  • Run the startup script ./get_started.sh to create a conda environment named squad. This will also download the GloVe word embeddings, download and pre-process the SQuAD 1.1, SQuAD 2.0 and Adversarial SQuAD (AddSent version) datasets, and store them all in the data directory.
  • Activate the created environment with source activate squad. The main script is main.py; you can list all the available parameters with python main.py --help.
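
Putting these steps together, a typical first-time setup from the repository root looks like:

    ./get_started.sh
    source activate squad
    python main.py --help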

Training the model

  • python main.py --experiment_name=baseline --mode=train should start training the baseline model.

Evaluating the model

  • python main.py --train_dir=baseline --mode=eval will evaluate the model trained in the experiment named baseline and report its F1 and EM scores.

Inspecting output

  • python main.py --train_dir=baseline --mode=show_examples will output 10 randomly selected samples of (context, question, predicted answer, true answer) for the baseline model.

Attention model config

There are a couple of switches to configure the attention mechanism of the model.

  • --attention_model takes two values: uni-dir (default) and bi-dir.
  • --attention_weight takes two values: weighted and unweighted (default).
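
For example, to train a model with bi-directional, weighted attention (the experiment name below is arbitrary):

    python main.py --experiment_name=bidir_weighted --mode=train --attention_model=bi-dir --attention_weight=weighted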

Adversarial training config

  • The --eval_squad_2 flag sets the model to train on the SQuAD 2.0 dataset.
  • --na_bias takes two values: b (default) for simple-bias and w for aggregated-bias, as described in the paper.
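
For example, to train on SQuAD 2.0 with the aggregated bias (the experiment name is arbitrary; check python main.py --help for the exact boolean syntax the --eval_squad_2 flag expects):

    python main.py --experiment_name=squad2_agg_bias --mode=train --eval_squad_2 --na_bias=w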

Acknowledgement

Big shout-out to the authors of cs224n-win18-squad! This code is based on it.



APPENDIX

The reading comprehension task can expose a model's understanding of language and meaning. Below are some of the datasets and the ways in which they fall short of evaluating a model's true understanding of language and meaning.

Datasets

  • SQuAD 1.1

    • SOTA models trained on this dataset fail miserably in the face of adversarial examples.
    • A simple heuristic-based model performs near SOTA, putting the increasingly complex models in perspective.
    • Contains only answerable questions, which forces a model trained on it to produce an answer even when no correct answer exists in the passage.
    • One could perturb the dataset by associating questions with other paragraphs to artificially generate unanswerable questions and make the model more robust.

    Possible experiment:

    • Shuffle the sentences of each paragraph and study the behaviour of the models. Shuffling the sentences within a paragraph might change the answers or, in most cases, make the answer ambiguous (see the first sketch after this list).
  • SQuAD 2.0 (with unanswerable questions)

    • Makes the dataset harder by including questions that have high token overlap with the context paragraph and a plausible answer (of the same POS type) in the passage, even though no correct answer exists there. This pushes models towards a deeper understanding of language than mere pattern matching.

    Possible experiment:

    • Evaluate current SOTA models on SQuAD 2.0. Find the percentage of negative examples that the model correctly deems unanswerable (see the second sketch after this list). Add answers to those correctly predicted negative examples and check whether the model answers them now. (It still needs to be concretely formulated how to add answers to previously unanswerable questions.)
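
The first sketch below implements the sentence-shuffling perturbation from the SQuAD 1.1 experiment. It assumes the standard SQuAD JSON layout (data -> paragraphs -> context) and uses nltk's sentence tokenizer as one possible splitter; the file paths are illustrative.

    # Minimal sketch: shuffle the sentences of every SQuAD context paragraph.
    import json
    import random

    from nltk.tokenize import sent_tokenize  # assumed splitter; any sentence splitter works


    def shuffle_contexts(squad_path, out_path, seed=42):
        random.seed(seed)
        with open(squad_path) as f:
            squad = json.load(f)
        for article in squad["data"]:
            for paragraph in article["paragraphs"]:
                sentences = sent_tokenize(paragraph["context"])
                random.shuffle(sentences)
                # answer_start offsets are no longer valid after shuffling, so the
                # perturbed file is only meant for evaluation-style probes.
                paragraph["context"] = " ".join(sentences)
        with open(out_path, "w") as f:
            json.dump(squad, f)


    if __name__ == "__main__":
        # Illustrative paths; adjust to wherever get_started.sh stores the data.
        shuffle_contexts("data/dev-v1.1.json", "data/dev-v1.1-shuffled.json")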

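The second sketch measures, for a SQuAD 2.0 dev set, what fraction of the negative (unanswerable) questions a model correctly leaves unanswered. It assumes predictions are stored as a {question_id: answer_text} JSON dict in which an empty string means "no answer", as in the official SQuAD 2.0 evaluation format; the file names are illustrative.

    # Minimal sketch: recall on unanswerable SQuAD 2.0 questions.
    import json


    def unanswerable_recall(dataset_path, predictions_path):
        with open(dataset_path) as f:
            dataset = json.load(f)["data"]
        with open(predictions_path) as f:
            predictions = json.load(f)

        total, correct = 0, 0
        for article in dataset:
            for paragraph in article["paragraphs"]:
                for qa in paragraph["qas"]:
                    if not qa.get("is_impossible", False):
                        continue  # only count negative (unanswerable) examples
                    total += 1
                    if predictions.get(qa["id"], "") == "":
                        correct += 1
        return correct / total if total else 0.0


    if __name__ == "__main__":
        # Illustrative file names; adjust to your data directory and model output.
        frac = unanswerable_recall("data/dev-v2.0.json", "predictions.json")
        print("Correctly abstained on {:.1%} of unanswerable questions".format(frac))
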
Hypothesis tested in this code base:

  • A variant of the BiDAF model was trained on both SQuAD 1.1 and SQuAD 2.0. Both models were compared on the adversarial test set released by Jia and Liang. If the model trained on SQuAD 2.0 performs better, this supports the hypothesis that SQuAD 2.0 improves the language-understanding capabilities of the models.
