This repository contains retrieved sentences for each (question, answer option) pair in three multiple-choice science question answering datasets (ARC-Easy, ARC-Challenge, and OpenBookQA), retrieved from the integrated reference corpus (IRC) plus the integrated external corpus (IEC) described in the paper Improving Question Answering with External Knowledge.

This is a re-implementation. As of the release date of this repository, the Allen Institute for Artificial Intelligence (AI2) does not allow third parties to redistribute the ARC Corpus. Therefore, we cannot directly release a resource containing the sentences retrieved from the ARC Corpus. Instead, for all such sentences, we provide pointers into the ARC Corpus as well as a script for fetching the retrieved sentences based on those pointers and your local copy of the corpus.

If you find this resource useful, please cite the following paper.

@inproceedings{pan2019improving,
  title={Improving Question Answering with External Knowledge},
  author={Pan, Xiaoman and Sun, Kai and Yu, Dian and Chen, Jianshu and
          Ji, Heng and Cardie, Claire and Yu, Dong},
  booktitle={Proceedings of the Workshop on Machine Reading for Question Answering},
  address={Hong Kong, China},
  year={2019}
}

Below are the detailed instructions.

  1. Clone this repository.
  2. Download the ARC dataset release from AI2, unzip it, and copy ARC_Corpus.txt (in the unzipped folder ARC-V1-Feb2018-2) to the data folder. The CRC-32 checksum of ARC_Corpus.txt should be 8CFE08C6.
  3. Run the provided Python 3 script to generate arc_challenge.json, arc_easy.json, and openbookqa.json, which serve as input for the models IRC + IEC and IRC + IEC + MD in Table 5 of the paper. The format of these files is as follows.
  {
    "FileName-QuestionID": [
      "retrieved sentences for the 1st option",
      "retrieved sentences for the 2nd option",
      ...
    ],
    ...
  }
File names and question IDs follow those of the original datasets, and retrieved sentences are separated by "\n".

