Adapted from the code for the Default Final Project (SQuAD) for CS224n, Winter 2018
Note: this code is adapted in part from the Neural Language Correction code by the Stanford Machine Learning Group.
This repo is a deep learning system for Machine Comprehension, as defined by the Stanford Question Answering Dataset (SQuAD) challenge. The task is as follows: given a paragraph and a question about that paragraph as inputs, answer the question by predicting the start and end words of the answer span within the context.
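To make the task concrete, here is a toy example in the spirit of SQuAD (the strings and indices below are invented for illustration; the real data comes from the official SQuAD JSON files):

```python
# A hypothetical SQuAD-style example: the model receives a context and a
# question, and must predict the (start, end) word indices of the answer span.
context = "The Stanford Question Answering Dataset was released in 2016 by researchers at Stanford University."
question = "When was SQuAD released?"

words = context.split()
answer_start, answer_end = 8, 8            # indices into `words` (inclusive)
print(words[answer_start:answer_end + 1])  # ['2016']
```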
Our baseline model has three components: an RNN encoder layer, which encodes both the context and the question into hidden states; an attention layer, which combines the context and question representations; and an output layer, which applies a fully connected layer followed by two separate softmax layers (one to predict the start location and one to predict the end location of the answer span). All of these modules can be found in modules.py, and the code that connects them is in qa_model.py.
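As a rough illustration of that data flow, here is a minimal NumPy sketch with toy shapes and random values standing in for learned RNN states and weights; the actual layers in modules.py are trained TensorFlow modules:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy dimensions (hypothetical): N context words, M question words, hidden size h.
N, M, h = 30, 8, 16
rng = np.random.default_rng(0)

# 1) Encoder layer: stand-ins for RNN hidden states of context and question.
context_hiddens  = rng.standard_normal((N, h))
question_hiddens = rng.standard_normal((M, h))

# 2) Attention layer: each context position attends over the question.
scores  = context_hiddens @ question_hiddens.T             # (N, M)
attn    = softmax(scores, axis=1) @ question_hiddens       # (N, h)
blended = np.concatenate([context_hiddens, attn], axis=1)  # (N, 2h)

# 3) Output layer: a fully connected layer, then two softmaxes over positions.
W_fc = rng.standard_normal((2 * h, h))
hidden = np.maximum(blended @ W_fc, 0)                     # ReLU, (N, h)
w_start, w_end = rng.standard_normal((2, h))
p_start = softmax(hidden @ w_start)                        # distribution over start positions
p_end   = softmax(hidden @ w_end)                          # distribution over end positions
print(int(p_start.argmax()), int(p_end.argmax()))
```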
Our implementation follows the original BiDAF paper, "Bidirectional Attention Flow for Machine Comprehension". We implemented the initial embedding layer, the contextual embedding layer, the bi-attention layer, the modeling layer, and finally the output layer. More details can be found in our paper.
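For reference, here is a simplified NumPy sketch of BiDAF's two attention directions. The paper's trainable similarity function is replaced with a plain dot product, so this shows the structure rather than the exact computation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

N, M, h = 30, 8, 16              # toy sizes (hypothetical)
rng = np.random.default_rng(0)
H = rng.standard_normal((N, h))  # contextual context embeddings
U = rng.standard_normal((M, h))  # contextual question embeddings

# Similarity matrix (N, M); the paper uses a trainable function alpha(h, u).
S = H @ U.T

# Context-to-question (C2Q): which question words matter for each context word.
a = softmax(S, axis=1)           # (N, M)
U_tilde = a @ U                  # (N, h)

# Question-to-context (Q2C): which context words matter to the question.
b = softmax(S.max(axis=1))       # (N,)
h_tilde = np.tile(b @ H, (N, 1)) # (N, h), same vector tiled at every position

# Merge into the query-aware context representation G.
G = np.concatenate([H, U_tilde, H * U_tilde, H * h_tilde], axis=1)
print(G.shape)                   # (30, 64), i.e. (N, 4h)
```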
We adapt a co-attention layer from the recently published paper "Dynamic Coattention Networks for Question Answering". So that we can compare its performance to that of the bi-attention layer, we modify the layer to mimic the bi-attention layer's structure while still capturing the essence of co-attention rather than bi-attention.
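Here is a simplified sketch of the two-level coattention computation from the DCN paper, with dot-product affinities and random stand-in states; our actual layer is restructured to mimic the bi-attention interface:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

N, M, h = 30, 8, 16              # toy sizes (hypothetical)
rng = np.random.default_rng(0)
D = rng.standard_normal((N, h))  # document (context) hidden states
Q = rng.standard_normal((M, h))  # question hidden states

L = D @ Q.T                      # affinity matrix (N, M)
A_Q = softmax(L, axis=0)         # attention over the document, per question word
A_D = softmax(L, axis=1)         # attention over the question, per document word

C_Q = A_Q.T @ D                  # (M, h): document summaries for each question word
# Second-level coattention: carry the summaries back to the document side.
C_D = A_D @ np.concatenate([Q, C_Q], axis=1)    # (N, 2h)

coattended = np.concatenate([D, C_D], axis=1)   # (N, 3h) query-aware context
print(coattended.shape)                         # (30, 48)
```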
We adapt a self-attention layer from the recently published paper "R-NET: Machine Reading Comprehension with Self-Matching Networks". Similarly, we modify it to mimic our bi-attention layer.
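A minimal sketch of the self-matching idea: the passage attends over itself, so distant but related evidence can be aggregated into one representation. R-NET uses gated additive attention with a recurrent update; a plain dot product is used here for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

N, h = 30, 16                    # toy sizes (hypothetical)
rng = np.random.default_rng(0)
V = rng.standard_normal((N, h))  # question-aware context representations

# Each position attends over the whole passage, itself included.
scores = V @ V.T                                  # (N, N) self-similarity
attn = softmax(scores, axis=1) @ V                # (N, h) self-matched summaries
self_matched = np.concatenate([V, attn], axis=1)  # (N, 2h)
print(self_matched.shape)                         # (30, 32)
```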
| Model | EM (%) | F1 (%) |
|---|---|---|
| single | 66.2 | 76.1 |
| ensemble | 68.4 | 77.8 |

| Model | EM (%) | F1 (%) |
|---|---|---|
| ensemble | 67.7 | 77.1 |
Refer to our paper for more details, and see the SQuAD Leaderboard to compare with other models.
Getting Started:
./get_started.sh
Activate the virtual environment: source activate squad
Training a Model:
cd code # Change to code directory
python main.py --experiment_name=<EXPERIMENT_NAME> --mode=train --model_type=<MODEL_TYPE> # model_type defaults to 'bidaf'; choose from 'baseline', 'coattn', 'selfattn', or 'bidaf' as described above
Inspecting Output:
python main.py --experiment_name=<EXPERIMENT_NAME> --mode=show_examples
Running Official Eval on a tiny dev dataset from CodaLab:
cd cs224n-squad # Go to the root of the repository
cl download -o data/tiny-dev.json 0x4870af # Download the sanity check dataset
python code/main.py --experiment_name=<EXPERIMENT_NAME> --model_type=<MODEL_TYPE> --mode=official_eval \
--json_in_path=data/tiny-dev.json \
--ckpt_load_dir=experiments/<EXPERIMENT_NAME>/best_checkpoint
Then run the following to evaluate the predictions:
python code/evaluate.py data/tiny-dev.json predictions.json
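evaluate.py reports exact match (EM) and token-level F1. Below is a simplified single-reference sketch of the two metrics; the official script also takes the max over multiple gold answers:

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase, strip punctuation and articles, collapse whitespace (as in the official eval)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    return normalize(prediction) == normalize(ground_truth)

def f1_score(prediction, ground_truth):
    pred_tokens = normalize(prediction).split()
    gt_tokens = normalize(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("in 2016", "2016"), f1_score("early 2016", "2016"))  # False 0.666...
```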
To run ensembling over multiple experiments, modify the run_ensemble.sh bash file and run:
./run_ensemble.sh tiny-dev.json predictions.json
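The exact combination rule lives in run_ensemble.sh. As a purely hypothetical illustration of one common approach (not necessarily what run_ensemble.sh does), the sketch below majority-votes the per-question answer strings across several predictions files:

```python
import json
import sys
from collections import Counter

# Hypothetical sketch: combine several predictions.json files (one per
# experiment) by majority vote over each question's predicted answer string.
def ensemble(prediction_paths, out_path="predictions.json"):
    all_preds = [json.load(open(p)) for p in prediction_paths]
    combined = {}
    for qid in all_preds[0]:
        votes = Counter(preds[qid] for preds in all_preds)
        combined[qid] = votes.most_common(1)[0][0]
    with open(out_path, "w") as f:
        json.dump(combined, f)

if __name__ == "__main__":
    ensemble(sys.argv[1:])  # e.g. python ensemble_sketch.py exp1.json exp2.json exp3.json
```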
Our paper can be found here, and our poster can be found here.