Before diving into the details of this doc, we strongly recommend familiarizing yourself with some important concepts about system analyses.
In this file we describe how to analyze QA models trained on datasets with a hybrid of tabular and textual context.
In this task, only the `datalab` format is supported so far:
- (1) `datalab`: if your dataset is already supported by datalab, you fortunately don't need to prepare the dataset yourself.
In this task, your system outputs should be:
```json
[
  {"q_id": "b64f475c0e1cc1e653d0b239f09da0d7", "answer": ["48.4"], "scale": "million"},
  {"q_id": "a7457ad860d3137ebc05538509ad8ac8", "answer": 31.95, "scale": "million"},
  {"q_id": "b484c510cb9ee8dd1f9524e0fad578dd", "answer": 35.15, "scale": "million"},
  ...
]
```
where each line represents one predicted answer. Specifically, `q_id` represents the question id, `answer` denotes the predicted answer, and `scale` is the predicted scale. Check this page for the detailed definition of scale.
An example system output file is here: `predictions_list.json`
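Before running the analysis, it can help to sanity-check that your output file matches the expected format. Below is a minimal sketch of writing and validating such a file; the `check_prediction` helper is a hypothetical name introduced here for illustration, and the example entries are copied from the sample output above.

```python
import json

# Example predictions in the expected format (values taken from the
# sample above; in practice these come from your model).
predictions = [
    {"q_id": "b64f475c0e1cc1e653d0b239f09da0d7", "answer": ["48.4"], "scale": "million"},
    {"q_id": "a7457ad860d3137ebc05538509ad8ac8", "answer": 31.95, "scale": "million"},
]

def check_prediction(pred):
    """Return True if one prediction carries the three required fields."""
    return {"q_id", "answer", "scale"} <= set(pred)

# Fail fast if any entry is missing a required field.
assert all(check_prediction(p) for p in predictions)

# Write the list as a single JSON array, matching the format above.
with open("predictions_list.json", "w") as f:
    json.dump(predictions, f)
```

Note that `answer` may be either a single value or a list, as in the example above; the check only verifies that the required keys are present.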
To perform a basic analysis, run the following command:
```shell
explainaboard --task qa-tat --output-file-type json --dataset tat_qa --system-outputs predictions_list.json > report.json
```
where
- `--task`: denotes the task name; you can find all supported task names here
- `--system-outputs`: denotes the path of the system outputs. Multiple ones should be separated by spaces, for example, system1 system2
- `--dataset`: denotes the dataset name
- `report.json`: the generated analysis file in JSON format. You can find the file here. Tips: use a JSON viewer like this one for better interpretation.
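Once the command finishes, the report can be inspected programmatically. The sketch below assumes only that the CLI output was redirected to `report.json` as in the command above; the exact structure of the report is not assumed, so it just lists the top-level sections.

```python
import json
from pathlib import Path

def load_report(path="report.json"):
    """Load the generated analysis report if it exists, else return None.

    The default path is an assumption matching the `> report.json`
    redirection in the command above.
    """
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text())

report = load_report()
if report is None:
    print("report.json not found; run the explainaboard command first")
else:
    # Top-level keys give a quick overview of the report's sections.
    print(sorted(report))
```

A JSON viewer, as suggested above, remains the easiest way to browse the full report; this snippet is just a starting point for scripted comparisons across systems.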