Before diving into the details of this doc, we strongly recommend familiarizing yourself with some important concepts about system analysis.
In this file we describe how to analyze open-domain QA models. We will give an example using the `natural_questions_comp_gen` dataset, but other datasets can be analyzed in a similar way.
- (1) `datalab`: if your dataset is supported by datalab, you fortunately don't need to prepare it.
- (2) `json`: basically, a list of dictionaries with two keys, `question` and `answers`:
```json
[
  {"question": "who got the first nobel prize in physics", "answers": ["Wilhelm Conrad Röntgen"]},
  {"question": "when is the next deadpool movie being released", "answers": ["May 18 , 2018"]},
  ...
]
```
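As a minimal sketch, the dataset file in this format can be written with Python's standard `json` module. The file name `qa_dataset.json` and the two example records are assumptions for illustration, not names the tool requires:

```python
import json

# Hypothetical examples in the format described above:
# each record has a "question" string and a list of acceptable "answers".
examples = [
    {"question": "who got the first nobel prize in physics",
     "answers": ["Wilhelm Conrad Röntgen"]},
    {"question": "when is the next deadpool movie being released",
     "answers": ["May 18 , 2018"]},
]

# Write the dataset file; ensure_ascii=False keeps non-ASCII answers readable.
with open("qa_dataset.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)

# Sanity check: reload and verify every record has the two required keys.
with open("qa_dataset.json", encoding="utf-8") as f:
    loaded = json.load(f)
assert all({"question", "answers"} <= set(record) for record in loaded)
```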
In this task, your system outputs should be as follows:
```text
william henry bragg
may 18, 2018
...
```

where each line represents one predicted answer. An example system output file is here: `test.dpr.nq.txt`
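A system output file like this can be produced with a few lines of Python. The file name `test.predictions.txt` and the prediction strings are illustrative assumptions; the only requirement from the format above is one predicted answer per line, in the same order as the dataset questions:

```python
# Hypothetical predicted answers, ordered to match the dataset's questions.
predictions = ["william henry bragg", "may 18, 2018"]

# Write one predicted answer per line, as the output format requires.
with open("test.predictions.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(predictions) + "\n")

# Sanity check: reading the file back yields the same answers, line by line.
with open("test.predictions.txt", encoding="utf-8") as f:
    lines = f.read().splitlines()
assert lines == predictions
```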
Let's say we have several such output files from different systems. To perform a basic analysis, we can run the following command:
```shell
explainaboard --task qa-open-domain --dataset natural_questions_comp_gen --system-outputs ./data/system_outputs/qa_open_domain/test.dpr.nq.txt > report.json
```
where:

- `--task`: denotes the task name; you can find all supported task names here.
- `--system-outputs`: denotes the path of the system outputs. Multiple paths should be separated by spaces, for example, `system1 system2`.
- `--dataset`: denotes the dataset name.
- `report.json`: the generated analysis file in JSON format. You can find the file here. Tip: use a JSON viewer, like this one, for better interpretation.
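Once the report is generated, a quick way to get oriented is to list its nested keys before opening it in a viewer. This is a small sketch; the `summarize` helper is our own, and the sample dictionary below is a stand-in for `report.json`, not the actual report schema:

```python
import json

def summarize(obj, prefix="", depth=2):
    """Recursively list the keys of a nested JSON object, indented by level."""
    lines = []
    if isinstance(obj, dict) and depth > 0:
        for key, value in obj.items():
            lines.append(prefix + key)
            lines.extend(summarize(value, prefix + "  ", depth - 1))
    return lines

# Stand-in for the contents of report.json (the real schema may differ);
# in practice you would load it with: report = json.load(open("report.json")).
report = json.loads('{"metrics": {"Accuracy": 0.41}, "fine_grained": {}}')
print("\n".join(summarize(report)))
```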