WikiQAar

WIKIQAar is a bilingual English-Arabic question answering corpus built on top of WIKIQA. To build WIKIQAar, we independently produced two (sub-)corpora. On the question side, we produced a parallel corpus by translating the questions into Arabic: we applied two machine translation systems and crowdsourced the selection of the better translation for inclusion in the corpus. On the reference side, we produced a comparable corpus by retrieving the Arabic edition of the corresponding Wikipedia articles. To identify the exact answers, we applied a supervised model to determine which text fragment in the Arabic article corresponds to the answer in English (thereby composing yet another parallel collection).

We make this dataset publicly available so that the community can explore tasks such as cross-language question answering and answer triggering across languages, i.e., deciding whether an answer exists in the same language as the question and, if not, searching in the other language.
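The cross-language answer-triggering idea described above can be illustrated with a minimal sketch. It is not the authors' method: the function names, the `score` callable, and the `threshold` value are all hypothetical, and a real system would need a scorer that can compare a question with candidates in the other language (for example, by using the translated question).

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical scorer type: maps (question, candidate sentence) to a relevance score.
Scorer = Callable[[str, str], float]


def best_candidate(
    question: str, candidates: Sequence[str], score: Scorer
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring candidate and its score, or None if there are none."""
    if not candidates:
        return None
    scored = [(candidate, score(question, candidate)) for candidate in candidates]
    return max(scored, key=lambda pair: pair[1])


def answer_with_fallback(
    question: str,
    same_lang_candidates: Sequence[str],
    other_lang_candidates: Sequence[str],
    score: Scorer,
    threshold: float = 0.5,  # assumed cut-off; would be tuned on the dev split
) -> Optional[Tuple[str, str]]:
    """Try the question's own language first, then fall back to the other language.

    Returns ("same" | "other", answer sentence), or None when no candidate in
    either language reaches the threshold (no answer is triggered).
    """
    pools = (("same", same_lang_candidates), ("other", other_lang_candidates))
    for language, candidates in pools:
        best = best_candidate(question, candidates, score)
        if best is not None and best[1] >= threshold:
            return language, best[0]
    return None
```

The two-stage loop keeps the triggering decision (is the best score above the threshold?) separate from the choice of language, so the same scorer and threshold logic apply to both candidate pools.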

List of files

WIKIQAar is organised in the same way as WIKIQA English.

The dataset comes in four files:

WikiQAar.tsv: contains all the data.
WikiQAar-train.tsv, WikiQAar-dev.tsv, WikiQAar-test.tsv: the train/dev/test splits of WikiQAar.tsv, following the same partition as WIKIQA English.
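As a starting point, the split files listed above could be loaded as follows. This is a minimal sketch under two assumptions that are not stated in this README: that each file is UTF-8 encoded and that its first row is a header. The helper names `read_tsv` and `load_splits` are illustrative only, and no particular column names are assumed.

```python
import csv
from pathlib import Path


def read_tsv(path, encoding="utf-8"):
    """Read a tab-separated file into a list of dicts keyed by the header row.

    Assumes the first row holds the column names (not confirmed by the README).
    """
    with open(path, newline="", encoding=encoding) as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def load_splits(directory="."):
    """Load the train/dev/test files from `directory` into a dict of row lists."""
    base = Path(directory)
    return {
        split: read_tsv(base / f"WikiQAar-{split}.tsv")
        for split in ("train", "dev", "test")
    }
```

For example, `splits = load_splits("path/to/WikiQAar")` followed by `splits["train"][0].keys()` shows the column names actually present in the files.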

Together with the WIKIQAar corpus, two other test sets are available:

WikiQAQuestionsCorpusAr-En.tsv: a collection of 3047 English questions translated into Arabic via machine translation and crowdsourcing.
WikiQAArticlesCorpusAr-En.tsv: a collection of Wikipedia articles in Arabic and English.

References

WIKIQA: A Challenge Dataset for Open-Domain Question Answering

Contact: abbes.ines@yahoo.com and albarron@hbku.edu.qa
