The QA-SRL Gold Standard
A repository for high-quality QA-SRL data collected from crowd workers. This repository is the reference point for the dataset and evaluation protocols described in the paper Controlled Crowdsourcing for High-Quality QA-SRL Annotation.
- The paper can be found here
- The data files are located here
- The evaluation code can be found here
- Writing Questions and Answers guidelines can be found here
- Consolidation guidelines can be found here
Question-answer driven Semantic Role Labeling (QA-SRL) was proposed as an attractive open and natural flavour of SRL, potentially attainable from laymen. Recently, a large-scale crowdsourced QA-SRL corpus and a trained parser were released. Trying to replicate the QA-SRL annotation for new texts, we found that the resulting annotations were lacking in quality, particularly in coverage, making them insufficient for further research and evaluation. In this paper, we present an improved crowdsourcing protocol for complex semantic annotation, involving worker selection and training, and a data consolidation phase. Applying this protocol to QA-SRL yielded high-quality annotation with drastically higher coverage, producing a new gold evaluation dataset. We believe that our annotation protocol and gold standard will facilitate future replicable research of natural semantic annotations.
The data files are organized by source corpus, (1) Wikinews and (2) Wikipedia, and by development and test partitions. The sentences were sampled from the large-scale dataset created by FitzGerald et al. (2018), with 1,000 sentences from each source, split equally between development and test.
- The sentences can be found under data/sentences
- The expert set described in the paper can be found under data/expert
- The QA-SRL gold annotation (by a pipeline of 3 trained workers) can be found under data/gold
- Evaluation scripts can be found under the qasrl/ folder. See the next sections on how to apply the evaluation procedure.
The data is presented in a tabular, comma-separated format; conversion to the data format used in Large-Scale QA-SRL is underway. The CSV format includes the following headers:
- qasrl_id - Sentence identifier. Same id is used in the sentence files.
- verb_idx - Zero-based index of the predicate token
- verb - The verb token as it appears in the sentence at token index verb_idx
- question - The question representing the role.
- answer - Multiple answer spans, separated by `!`. Each answer is a contiguous span of tokens that constitutes an argument of the role.
- answer_range - Multiple token ranges, separated by `!`. Each range corresponds to the answer span at the same position in the answer column, and is formatted as INCLUSIVE_START:EXCLUSIVE_END token indices into the sentence.
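To make the span encoding concrete, here is a minimal sketch of parsing the two multi-span columns. It assumes the `!` separator and the INCLUSIVE_START:EXCLUSIVE_END format described above; the helper name and the example row values are invented for illustration.

```python
def parse_spans(answer: str, answer_range: str, sep: str = "!"):
    """Split the multi-span columns and pair each answer with its token range."""
    answers = answer.split(sep)
    ranges = []
    for r in answer_range.split(sep):
        start, end = r.split(":")  # INCLUSIVE_START:EXCLUSIVE_END
        ranges.append((int(start), int(end)))
    assert len(answers) == len(ranges), "answer and answer_range must align"
    return list(zip(answers, ranges))

# Invented example row: two answer spans for one question.
sentence = "The committee approved the new budget on Tuesday".split()
pairs = parse_spans("the new budget!on Tuesday", "3:6!6:8")
for span_text, (start, end) in pairs:
    # Each range indexes the sentence tokens as [start, end)
    assert " ".join(sentence[start:end]) == span_text
```

Note that because the end index is exclusive, the number of tokens in a span is simply `end - start`.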
Fields 7 through 14 are taken from the QA-SRL question template, as parsed by the QA-SRL state machine. Given a valid QA-SRL question, you can re-run the state machine using a sample script from the annotation repository to re-create these fields, along with some additional data.
- wh - The WH question word
- aux - The auxiliary slot
- subj - The subject placeholder (someone or something)
- obj - The direct object placeholder (someone or something)
- prep - The preposition used for the indirect object
- obj2 - The indirect object placeholder (someone or something)
- is_negated - Boolean, indicates whether the question contains a negation
- is_passive - Boolean, indicates whether the question uses passive voice
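The slot columns can be concatenated back into a question string. The sketch below shows this for a hypothetical row; slot names mirror the CSV headers, the verb surface form comes from a separate column, and the real state machine additionally handles verb morphology, which this sketch does not attempt.

```python
def build_question(wh, aux, subj, verb, obj, prep, obj2):
    """Join the non-empty template slots in order and terminate with '?'."""
    slots = [wh, aux, subj, verb, obj, prep, obj2]
    return " ".join(s for s in slots if s) + "?"

# Invented slot values for illustration.
question = build_question("what", "did", "someone", "approve", "", "", "")
```

Empty slots (here obj, prep, and obj2) are simply skipped, so the example yields a short question of the form "what did someone approve?".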
Evaluating QA-SRL system output.
To evaluate QA-SRL system output against reference QA-SRL data, follow these steps:
- Compile both datasets into the described CSV format.
- Run the script evaluate_dataset.py with the following command-line arguments:
- Path to the system output CSV file
- Path to the reference (ground truth) CSV file
- Path to the sentences file (optional), used to produce a complete matched/unmatched table
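Before running the evaluation, it can be useful to verify that both CSV files expose the documented columns. This is a hypothetical pre-check, not part of the repository's scripts; it assumes only the header names listed above.

```python
import csv

# Columns documented in the CSV format section above.
REQUIRED = {"qasrl_id", "verb_idx", "verb", "question", "answer", "answer_range"}

def check_headers(path: str) -> set:
    """Return the set of required columns missing from the CSV at `path`."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return REQUIRED - set(header)

# Usage (paths are placeholders):
# missing = check_headers("system_output.csv")
# if missing:
#     raise ValueError(f"CSV is missing required columns: {missing}")
```

Running this on both the system output and the reference file before invoking evaluate_dataset.py gives a clearer error than a failure deep inside the evaluation script.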