TallyQA Stats

TallyQA is the only dataset that distinguishes between simple and complex counting questions. Moreover, to ensure that the complex questions have counterfactuals in the images, we use Amazon Mechanical Turk (AMT) for data collection. In summary, TallyQA contains:

  • 287K questions
  • 165K images
  • 19K complex questions collected from human annotators using AMT

Download QA pairs

Please download the dataset using this link. The zipped file contains JSON files for the train and test splits. Each entry in the dataset contains the following fields:

 {'answer': 4,
  'data_source': 'imported_genome',
  'image': 'VG_100K_2/2410408.jpg',
  'image_id': 92410408,
  'issimple': False,
  'question': 'How many headlights does the black bus have?',
  'question_id': 30095774}
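A minimal Python sketch for loading a split into typed records, assuming each split file holds a single JSON array of objects shaped like the entry above. The `TallyQAEntry` class, the helper names, and the array layout are illustrative assumptions, not part of the release. Python's `json` module decodes JSON `true`/`false` into the `True`/`False` values shown in the example.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List, Union


@dataclass(frozen=True)
class TallyQAEntry:
    """One question-answer pair, mirroring the fields in the example entry."""
    question_id: int
    question: str
    answer: int
    image: str        # path relative to an image root, e.g. 'VG_100K_2/2410408.jpg'
    image_id: int
    issimple: bool
    data_source: str


def parse_entries(raw: List[dict]) -> List[TallyQAEntry]:
    """Convert decoded JSON objects into TallyQAEntry records.

    Extra keys are ignored; a missing key raises KeyError.
    """
    return [
        TallyQAEntry(
            question_id=int(e["question_id"]),
            question=e["question"],
            answer=int(e["answer"]),
            image=e["image"],
            image_id=int(e["image_id"]),
            issimple=bool(e["issimple"]),
            data_source=e["data_source"],
        )
        for e in raw
    ]


def load_split(path: Union[str, Path]) -> List[TallyQAEntry]:
    # Assumption: the split file is one JSON array of entry objects.
    with Path(path).open(encoding="utf-8") as f:
        return parse_entries(json.load(f))
```

For example, `load_split("train.json")` would load the training split, where `train.json` is a hypothetical name for whatever the extracted train file is called.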

Attributes such as answer, image_id, image, and question should be straightforward. The boolean issimple field denotes whether the question is simple (True) or complex (False). The data_source field shows the source of the question-answer pair: questions collected from AMT annotators have data_source = amt, while the imported QA pairs are derived and filtered from the TDIUC and VQA datasets.
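Since `issimple` and `data_source` are the fields most analyses filter on, here is a small sketch of partitioning a split that has already been decoded into a list of dictionaries. The function names are hypothetical. Source values are tallied as they appear rather than hard-coded, because this README only names `amt` and `imported_genome` explicitly.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_by_complexity(entries: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Partition entries into (simple, complex) using the boolean `issimple` flag."""
    simple = [e for e in entries if e["issimple"]]
    complex_ = [e for e in entries if not e["issimple"]]
    return simple, complex_


def source_counts(entries: List[dict]) -> Dict[str, int]:
    """Count how many QA pairs come from each `data_source` value."""
    return dict(Counter(e["data_source"] for e in entries))


def amt_questions(entries: List[dict]) -> List[dict]:
    """Keep only questions written by AMT annotators (data_source == 'amt')."""
    return [e for e in entries if e["data_source"] == "amt"]
```

Counting with `Counter` keeps the sketch robust to any additional `data_source` values present in the files.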

Download Images

The images used in the dataset are derived from COCO and Visual Genome. All the images can be downloaded from the publicly available COCO and Visual Genome datasets.
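The `image` field stores a relative path such as `VG_100K_2/2410408.jpg`. A minimal sketch for mapping it to a local file follows, assuming the COCO and Visual Genome images have been extracted under one directory so that these relative paths resolve directly beneath it. That directory layout and the helper names are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Union


def resolve_image_path(image_field: str, image_root: Union[str, Path]) -> Path:
    """Join an entry's relative `image` path onto a local image root.

    Assumption: `image_root` contains the extracted image folders
    (e.g. 'VG_100K_2/') exactly as named in the `image` field.
    """
    rel = Path(image_field)
    # Reject absolute paths or parent-directory hops so lookups stay under the root.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unexpected image path: {image_field!r}")
    return Path(image_root) / rel


def missing_images(entries: Iterable[dict], image_root: Union[str, Path]) -> List[str]:
    """Return the `image` values whose files are not present under `image_root`."""
    return [
        e["image"]
        for e in entries
        if not resolve_image_path(e["image"], image_root).is_file()
    ]
```

Running `missing_images` over a split before training is a quick way to confirm that every referenced image was downloaded.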

Download HowmanyQA

The HowmanyQA dataset can be downloaded from this repo, or you can follow the setup instructions from HowmanyQA. HowmanyQA contains IDs that reference questions in the "VQA 2.0" and "Visual Genome" datasets, which must be downloaded separately.
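Because HowmanyQA stores only IDs, its questions have to be joined against the separately downloaded VQA 2.0 and Visual Genome files. A minimal sketch of that join, assuming each source has already been loaded into a dictionary keyed by question ID (the dictionaries and `resolve_ids` are hypothetical):

```python
from typing import Dict, Iterable, List, Tuple


def resolve_ids(
    ids: Iterable[int], *lookups: Dict[int, dict]
) -> Tuple[List[dict], List[int]]:
    """Look up each question ID in the given tables, in order.

    Assumption: each lookup maps a question ID to its question record and
    IDs are unique across tables; otherwise the first matching table wins.
    Returns (records found, IDs not found in any table).
    """
    found: List[dict] = []
    missing: List[int] = []
    for qid in ids:
        for table in lookups:
            if qid in table:
                found.append(table[qid])
                break
        else:
            # No table contained this ID.
            missing.append(qid)
    return found, missing
```

If the two datasets' ID ranges could collide, look each ID up only in the table for its own source dataset instead of chaining the lookups.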

Please cite the work using the following BibTeX:

  title={TallyQA: Answering Complex Counting Questions},
  author={Acharya, Manoj and Kafle, Kushal and Kanan, Christopher},


