This folder is for r2c models. They broadly follow the AllenNLP configuration format. If you want r2c, you'll want to look at `multiatt/`.
## Replicating validation results
Here's how you can replicate my val results: run the command(s) below. First, you might want to make your GPUs available. When I ran these experiments, I used:

```
source activate r2c && export LD_LIBRARY_PATH=/usr/local/cuda-9.0/ && export PYTHONPATH=/home/rowan/code/r2c && export CUDA_VISIBLE_DEVICES=0,1,2
```
- For question answering, run:

```
python train.py -params multiatt/default.json -folder saves/flagship_answer
```

- For answer justification, run:

```
python train.py -params multiatt/default.json -folder saves/flagship_rationale -rationale
```
You can combine the validation predictions using:

```
python eval_q2ar.py -answer_preds saves/flagship_answer/valpreds.npy -rationale_preds saves/flagship_rationale/valpreds.npy
```
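Conceptually, the combination step works on the two saved prediction arrays. Here's a minimal sketch of the idea (not the actual `eval_q2ar.py` code), assuming each `valpreds.npy` holds an `[N, 4]` array of choice probabilities and that the gold labels are supplied separately:

```python
import numpy as np

def combine_q2ar(answer_probs, rationale_probs, answer_labels, rationale_labels):
    """Combine answer and rationale predictions into Q->A, QA->R, and Q->AR accuracy."""
    answer_hits = answer_probs.argmax(1) == answer_labels        # Q->A correctness
    rationale_hits = rationale_probs.argmax(1) == rationale_labels  # QA->R correctness
    return {
        'q->a': answer_hits.mean(),
        'qa->r': rationale_hits.mean(),
        'q->ar': (answer_hits & rationale_hits).mean(),  # both must be right
    }

# Toy example: 3 questions, 4 choices each (made-up probabilities and labels).
answer_probs = np.array([[.7, .1, .1, .1], [.1, .6, .2, .1], [.2, .2, .5, .1]])
rationale_probs = np.array([[.4, .3, .2, .1], [.1, .1, .1, .7], [.6, .2, .1, .1]])
print(combine_q2ar(answer_probs, rationale_probs,
                   np.array([0, 1, 3]), np.array([0, 3, 0])))
```

The joint metric is strictly harder than either individual one, since a question only counts when both argmaxes land on the gold choice.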
## Submitting to the leaderboard
VCR features a leaderboard where you can submit your answers on the test set. Submitting to the leaderboard is easy! You'll need to submit something like the example submission CSV file; you can use the `eval_for_leaderboard.py` script, which formats everything in the right way.

Essentially, your submission has to have the same columns as the example file.
To evaluate, I'll first take the argmax over your answer choices, then the argmax over your rationale choices (conditioned on the right answers). These give two sets of predictions, which can be used to compute Q->A and QA->R accuracy. For Q->AR accuracy, we take a bitwise AND between the Q->A hits and the QA->R hits. In other words, to get a question right, you have to get both the answer AND the rationale right.
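The conditioning step is the only subtle part. As a sketch (the array shapes and function name here are assumptions for illustration, not the actual leaderboard code), suppose rationale scores are stored per answer choice, i.e. `rationale_scores[i, a, r]` is the score for rationale `r` given answer `a` on question `i`:

```python
import numpy as np

def leaderboard_eval(answer_scores, rationale_scores, answer_labels, rationale_labels):
    """Hypothetical sketch of the metric described above.

    answer_scores:    [N, 4]     one score per answer choice
    rationale_scores: [N, 4, 4]  one score per (answer, rationale) pair
    """
    answer_preds = answer_scores.argmax(1)  # Q->A predictions
    # QA->R: select the rationale scores conditioned on the *right* answer,
    # then argmax over the 4 rationale choices.
    conditioned = rationale_scores[np.arange(len(answer_labels)), answer_labels]
    rationale_preds = conditioned.argmax(1)
    a_hits = answer_preds == answer_labels
    r_hits = rationale_preds == rationale_labels
    # Q->A, QA->R, and Q->AR (bitwise AND of the two hit vectors)
    return a_hits.mean(), r_hits.mean(), (a_hits & r_hits).mean()
```

Note that QA->R is scored against the gold answer's rationale scores even on questions where your answer prediction was wrong; only Q->AR requires getting both right.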