The SelQA dataset provides crowdsourced annotations for two selection-based question answering tasks: answer sentence selection and answer triggering. The dataset comprises about 8K factoid questions drawn from the ten most prevalent topics among Wikipedia articles. Our study shows that question answering systems trained on SelQA give nearly as robust results as ones trained on a much larger dataset such as SQuAD.
- SelQA: A New Benchmark for Selection-based Question Answering. Tomasz Jurczyk, Michael Zhai, and Jinho D. Choi. In Proceedings of the 28th International Conference on Tools with Artificial Intelligence, ICTAI'16, 820-827, San Jose, CA, 2016 (slides).
- Analysis of Wikipedia-based Corpora for Question Answering. Tomasz Jurczyk, Amit Deshmane, and Jinho D. Choi. arXiv:1801.02073, 2017.
We gratefully acknowledge support from Infosys Ltd. The contents of this material are those of the authors and do not necessarily reflect the views of Infosys Ltd.