This folder is for r2c models. They broadly follow the allennlp configuration format. If you want r2c, you'll want to look at multiatt.

Replicating validation results

Here's how you can replicate my val results. Run the command(s) below. First, you might want to make your GPUs available. When I ran these experiments I used:

source activate r2c && export LD_LIBRARY_PATH=/usr/local/cuda-9.0/ && export PYTHONPATH=/home/rowan/code/r2c && export CUDA_VISIBLE_DEVICES=0,1,2

  • For question answering, run:
python -params multiatt/default.json -folder saves/flagship_answer 
  • For answer justification, run:
python -params multiatt/default.json -folder saves/flagship_rationale -rationale

You can combine the validation predictions using python -answer_preds saves/flagship_answer/valpreds.npy -rationale_preds saves/flagship_rationale/valpreds.npy

Submitting to the leaderboard

VCR features a leaderboard where you can submit your answers on the test set. Submitting to the leaderboard is easy! You'll need to submit something like the example submission CSV file. You can use the script, which formats everything in the right way.

Essentially, your submission has to have the following columns:


To evaluate, I'll first take the argmax over your answer choices, then take the argmax over your rationale choices (conditioned on the right answers). These give two sets of predictions, which are used to compute Q->A and QA->R accuracy. For Q->AR accuracy, I take a bitwise AND between the Q->A hits and the QA->R hits. In other words, to get a question right, you have to get both the answer and the rationale right.
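
The scoring described above can be sketched in a few lines of NumPy. This is only an illustration, not the actual evaluation script: the function name `vcr_accuracies`, the argument names, and the array layout (one row per question, one column per choice, integer labels) are all assumptions made for the example.

```python
import numpy as np


def vcr_accuracies(answer_probs, rationale_probs, answer_labels, rationale_labels):
    """Compute Q->A, QA->R and Q->AR accuracy.

    Assumed layout (hypothetical, for illustration only):
      answer_probs, rationale_probs: float arrays of shape (num_questions, num_choices)
      answer_labels, rationale_labels: int arrays of shape (num_questions,)
    """
    # Argmax over the answer choices, then over the rationale choices.
    answer_hits = answer_probs.argmax(axis=1) == answer_labels
    rationale_hits = rationale_probs.argmax(axis=1) == rationale_labels

    # A question counts for Q->AR only if both the answer and the rationale are right.
    joint_hits = answer_hits & rationale_hits

    return {
        "Q->A": float(answer_hits.mean()),
        "QA->R": float(rationale_hits.mean()),
        "Q->AR": float(joint_hits.mean()),
    }


# Toy data: three questions, four choices each.
answer_probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.40, 0.10],
])
answer_labels = np.array([0, 1, 3])

rationale_probs = np.array([
    [0.10, 0.80, 0.05, 0.05],
    [0.50, 0.20, 0.20, 0.10],
    [0.10, 0.10, 0.10, 0.70],
])
rationale_labels = np.array([1, 2, 3])

scores = vcr_accuracies(answer_probs, rationale_probs, answer_labels, rationale_labels)
print(scores)
```

In this toy data the answer is right for the first two questions and the rationale is right for the first and third, so only the first question counts toward Q->AR.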