happen2me/kqa-pro-datasets

annotations_creators: machine-generated, expert-generated
language: en
language_creators: found
license: mit
multilinguality: monolingual
pretty_name: KQA-Pro
size_categories: 10K<n<100K
source_datasets: original
tags: knowledge graph, freebase
task_categories: question-answering
task_ids: open-domain-qa

Dataset Card for KQA Pro

Dataset Description

Dataset Summary

KQA Pro is a large-scale dataset for complex question answering over a knowledge base. The questions are diverse and challenging, requiring multiple reasoning capabilities, including compositional reasoning, multi-hop reasoning, quantitative comparison, and set operations. Strong supervision is provided for each question in the form of a SPARQL query and a program.

Supported Tasks and Leaderboards

It supports knowledge-graph-based question answering. Specifically, it provides a SPARQL query and a program for each question.

Languages

English

Dataset Structure

train.json/val.json

[
    {
        'question': str,
        'sparql': str, # executable in our Virtuoso engine
        'program': 
        [
            {
                'function': str,  # function name
                'dependencies': [int],  # functional inputs, representing indices of the preceding functions
                'inputs': [str],  # textual inputs
            }
        ],
        'choices': [str],  # 10 answer choices
        'answer': str,  # golden answer
    }
]
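To make the schema concrete, here is a minimal sketch that parses a single record in the format above and walks its program. The record itself is invented for illustration (the question, SPARQL, function names, and two-item choice list are made up; real records have 10 choices):

```python
import json

# A hypothetical record mirroring the train.json/val.json schema above.
# Question, SPARQL, and program contents are invented for illustration;
# real records carry 10 answer choices.
record_json = """
[
    {
        "question": "Which movie is longer, A or B?",
        "sparql": "SELECT ?e WHERE { ... }",
        "program": [
            {"function": "Find", "dependencies": [], "inputs": ["A"]},
            {"function": "Find", "dependencies": [], "inputs": ["B"]},
            {"function": "SelectBetween", "dependencies": [0, 1], "inputs": ["duration", "greater"]}
        ],
        "choices": ["A", "B"],
        "answer": "A"
    }
]
"""

data = json.loads(record_json)
example = data[0]

# Each program step references earlier steps by index via 'dependencies',
# so the program forms a DAG whose last step produces the answer.
last_step = example["program"][-1]
print(last_step["function"])                       # SelectBetween
print([s["function"] for s in example["program"]])
```

Note how the final step consumes the results of steps 0 and 1 through its `dependencies` field.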

test.json

[
    {
        'question': str,
        'choices': [str],  # 10 answer choices
    }
]

Data Configs

This dataset has two configs, train_val and test, because the splits expose different fields. Specify the config when loading, e.g. load_dataset('drt/kqa_pro', 'train_val').

Data Splits

train, val, test

Additional Information

Knowledge Graph File

You can find the knowledge graph file kb.json in the original GitHub repository. It has the following format:

{
    'concepts':
    {
        '<id>':
        {
            'name': str,
            'instanceOf': ['<id>', '<id>'], # ids of parent concept
        }
    },
    'entities': # excluding concepts
    {
        '<id>': 
        {
            'name': str,
            'instanceOf': ['<id>', '<id>'], # ids of parent concept
            'attributes':
            [
                {
                    'key': str, # attribute key
                    'value':  # attribute value
                    {
                        'type': 'string'/'quantity'/'date'/'year',
                        'value': float/int/str, # float or int for quantity, int for year, 'yyyy/mm/dd' for date
                        'unit': str,  # for quantity
                    },
                    'qualifiers':
                    {
                        '<qk>':  # qualifier key, one key may have multiple corresponding qualifier values
                        [
                            {
                                'type': 'string'/'quantity'/'date'/'year',
                                'value': float/int/str,
                                'unit': str,
                            }, # the format of qualifier value is similar to attribute value
                        ]
                    }
                },
            ],
            'relations':
            [
                {
                    'predicate': str,
                    'object': '<id>', # NOTE: it may be a concept id
                    'direction': 'forward'/'backward',
                    'qualifiers':
                    {
                        '<qk>':  # qualifier key, one key may have multiple corresponding qualifier values
                        [
                            {
                                'type': 'string'/'quantity'/'date'/'year',
                                'value': float/int/str,
                                'unit': str,
                            }, # the format of qualifier value is similar to attribute value
                        ]
                    }
                },
            ]
        }
    }
}
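As a sketch of how to navigate this structure, here is a toy in-memory knowledge base mirroring the kb.json format. All ids, names, and values are invented for illustration, and `attribute_value` is a hypothetical helper, not part of the KQA Pro codebase:

```python
# Toy knowledge base in the kb.json format above; all ids and values
# are invented for illustration.
kb = {
    "concepts": {
        "Q5": {"name": "human", "instanceOf": []},
    },
    "entities": {
        "Q42": {
            "name": "Douglas Adams",
            "instanceOf": ["Q5"],
            "attributes": [
                {
                    "key": "height",
                    "value": {"type": "quantity", "value": 1.96, "unit": "metre"},
                    "qualifiers": {},
                }
            ],
            "relations": [
                {
                    "predicate": "place of birth",
                    "object": "Q84",
                    "direction": "forward",
                    "qualifiers": {},
                }
            ],
        }
    },
}

def attribute_value(kb, entity_id, key):
    """Return the first attribute value dict with the given key, or None."""
    for attr in kb["entities"][entity_id]["attributes"]:
        if attr["key"] == key:
            return attr["value"]
    return None

val = attribute_value(kb, "Q42", "height")
print(val["value"], val["unit"])  # 1.96 metre
```

The same lookup pattern extends to `relations` and to qualifier dictionaries, which map each qualifier key to a list of typed values.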

How to run SPARQLs and programs

We implement multiple baselines in our codebase, including a supervised SPARQL parser and a program parser.

For the SPARQL parser, we implement a query engine based on Virtuoso. You can install the engine following our instructions and then feed it your predicted SPARQL queries to get answers.

For the program parser, we implement a rule-based program executor, which receives a predicted program and returns the answer. A detailed introduction to our functions can be found in our paper.
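The program format (function, dependencies, inputs) lends itself to simple sequential execution, since each step's dependencies point to earlier steps. Below is a toy sketch of that execution model; the `Const` and `Add` functions are invented stand-ins, not the functions from the official executor:

```python
# Toy program executor: each step's 'dependencies' are indices of earlier
# steps whose results become extra arguments. The registered functions are
# invented stand-ins, not the official KQA Pro function set.
REGISTRY = {
    "Const": lambda inputs: int(inputs[0]),
    "Add": lambda inputs, a, b: a + b,
}

def execute(program):
    results = []
    for step in program:
        deps = [results[i] for i in step["dependencies"]]
        results.append(REGISTRY[step["function"]](step["inputs"], *deps))
    return results[-1]  # the last step's result is the answer

program = [
    {"function": "Const", "dependencies": [], "inputs": ["2"]},
    {"function": "Const", "dependencies": [], "inputs": ["3"]},
    {"function": "Add", "dependencies": [0, 1], "inputs": []},
]
print(execute(program))  # 5
```

Because dependencies always point backwards, a single left-to-right pass suffices; no explicit topological sort is needed.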

How to submit results of test set

You need to predict answers for all questions in the test set and write them to a text file in order, one per line. Here is an example:

Tron: Legacy
Palm Beach County
1937-03-01
The Queen
...
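Producing this file is a one-liner per answer. A minimal sketch, where `predictions` stands in for your model's outputs in test-set order and the file name `predict.txt` is an arbitrary choice:

```python
# Write one predicted answer per line, in test-set order.
# 'predictions' is a hypothetical model-output list; the file name is arbitrary.
predictions = ["Tron: Legacy", "Palm Beach County", "1937-03-01", "The Queen"]

with open("predict.txt", "w", encoding="utf-8") as f:
    for answer in predictions:
        f.write(answer + "\n")
```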

Then send the prediction file to us by email (caosl19@mails.tsinghua.edu.cn), and we will reply with your performance as soon as possible. To appear on the leaderboard, you also need to provide the following information:

  • model name
  • affiliation
  • open-ended or multiple-choice
  • whether your model uses SPARQL supervision
  • whether your model uses program supervision
  • single model or ensemble model
  • (optional) paper link
  • (optional) code link

Licensing Information

MIT License

Citation Information

If you find our dataset helpful in your work, please cite us:

@inproceedings{KQAPro,
  title={{KQA P}ro: A Large Diagnostic Dataset for Complex Question Answering over Knowledge Base},
  author={Cao, Shulin and Shi, Jiaxin and Pan, Liangming and Nie, Lunyiu and Xiang, Yutong and Hou, Lei and Li, Juanzi and He, Bin and Zhang, Hanwang},
  booktitle={ACL'22},
  year={2022}
}

Contributions

Thanks to @happen2me for adding this dataset.
