
Code and Data for "PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale" (EMNLP 2023)


manestay/paxqa

PAXQA Datasets and Code

Code and Data for the paper "PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale" at EMNLP 2023 (Findings).

PAXQA Data

We release the PAXQA datasets on the HuggingFace Hub. The fields are consistent with the MLQA (and therefore SQuAD) fields.
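Because the fields follow the SQuAD convention, a quick sanity check on a loaded example is possible. The sketch below assumes the flattened layout commonly used for SQuAD-style data (`id`, `context`, `question`, and an `answers` dict with parallel `text` and `answer_start` lists); the exact layout of the released files is not confirmed here, and `check_squad_example` and `toy_example` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List


def check_squad_example(example: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one flattened SQuAD-style example."""
    problems = []
    # Assumed field names; adjust if the released files differ.
    for key in ("id", "context", "question", "answers"):
        if key not in example:
            problems.append(f"missing field: {key}")
    if problems:
        return problems

    texts = example["answers"].get("text", [])
    starts = example["answers"].get("answer_start", [])
    if len(texts) != len(starts):
        problems.append("answers.text and answers.answer_start differ in length")

    # Each answer should appear in the context at its character offset.
    for text, start in zip(texts, starts):
        if example["context"][start:start + len(text)] != text:
            problems.append(f"answer {text!r} not found at offset {start}")
    return problems


# Toy record written for this sketch, not taken from the dataset.
toy_example = {
    "id": "toy-0",
    "context": "PAXQA was published at EMNLP 2023.",
    "question": "Where was PAXQA published?",
    "answers": {"text": ["EMNLP 2023"], "answer_start": [23]},
}
print(check_squad_example(toy_example))  # an empty list means no problems found
```

Running the check over a handful of downloaded examples is a cheap way to confirm that the answer offsets line up with the context before training on the data.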

The PAXQA test and validation sets are available at this link, and consist of 1,788 QA examples in total.

The PAXQA train sets are available at this link, and consist of 660K QA examples in total. PAXQA_HWA comprises the 2 *gale* datasets, while PAXQA_AWA comprises the other 5 datasets.

Dataset sizes

Table 1 of the paper gives the number of QA examples for each split and each language.

You can verify these numbers against the files you downloaded above; contact the authors if you find any inconsistencies.
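One way to count the examples in a downloaded file is sketched below. It assumes the file uses the nested SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas`); if the released files are flattened or stored in another format, the traversal will need adjusting. The function names are hypothetical, and the toy dictionary stands in for a real file.

```python
import json
from typing import Any, Dict


def count_qa_examples(squad: Dict[str, Any]) -> int:
    """Count QA pairs in a nested SQuAD-style dictionary."""
    return sum(
        len(paragraph["qas"])
        for article in squad["data"]
        for paragraph in article["paragraphs"]
    )


def count_qa_examples_in_file(path: str) -> int:
    """Load a SQuAD-style JSON file from `path` and count its QA pairs."""
    with open(path, encoding="utf-8") as f:
        return count_qa_examples(json.load(f))


# Toy input written for this sketch, not taken from the dataset.
toy_squad = {
    "data": [
        {"paragraphs": [{"context": "c1", "qas": [{"id": "a"}, {"id": "b"}]}]},
        {"paragraphs": [{"context": "c2", "qas": [{"id": "c"}]}]},
    ]
}
print(count_qa_examples(toy_squad))
```

Summing the counts per split and per language gives figures that can be compared directly with Table 1.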

Code

This section is forthcoming.

Citation

@article{li2023paxqa,
      title={\textsc{PaxQA}: Generating Cross-lingual Question Answering Examples at Training Scale}, 
      author={Bryan Li and Chris Callison-Burch},
      year={2023},
      journal={Findings of the Association for Computational Linguistics: EMNLP}
}
