# Evaluation dataset generation
In this notebook we select 20 random questions for each question type (the `complexityType` field) in the [Mintaka dataset](https://github.com/amazon-science/mintaka).

Let's start by loading the dataset.

In [2]:
# imports
import sys
import os
  
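# os.path.abspath('') resolves to the current working directory
current = os.path.dirname(os.path.abspath(''))
parent_directory = os.path.dirname(current)

# make the utils package importable from this notebook
sys.path.append(parent_directory)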

from utils.Json_utils import read_json, save_json

In [5]:
mintaka = read_json('../reference_datasets/Mintaka/mintaka_train.json') + read_json('../reference_datasets/Mintaka/mintaka_test.json')
len(mintaka)

18000

Let's check the structure of a question by looking at the keys of the first entry.

In [6]:
mintaka[0].keys()

dict_keys(['id', 'question', 'translations', 'questionEntity', 'answer', 'category', 'complexityType'])

We will group the questions by complexity type, building a dictionary that maps each type to its list of questions.

In [9]:
mintaka_sorted = {}
for question in mintaka:
    # create the list for this complexity type on first sight, then append
    mintaka_sorted.setdefault(question.get('complexityType'), []).append(question)

In [10]:
mintaka_sorted.keys()

dict_keys(['ordinal', 'intersection', 'generic', 'superlative', 'yesno', 'comparative', 'multihop', 'difference', 'count'])
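
Before sampling, it can help to confirm that every complexity type holds at least as many questions as we plan to draw, since `random.sample` raises a `ValueError` when asked for more items than the list contains. The following is a minimal sketch: the helpers `count_by_type` and `types_below_minimum` are hypothetical names introduced here, and `toy_grouped` is a toy stand-in for `mintaka_sorted`, not real Mintaka data.

```python
def count_by_type(grouped):
    """Return {complexity type: number of questions} for a grouped dataset."""
    return {
        complexity_type: len(questions)
        for complexity_type, questions in grouped.items()
    }


def types_below_minimum(grouped, minimum):
    """List the complexity types that hold fewer than `minimum` questions."""
    return [t for t, count in count_by_type(grouped).items() if count < minimum]


# Toy stand-in for mintaka_sorted (hypothetical data, not real Mintaka questions).
toy_grouped = {
    "count": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    "yesno": [{"id": "d"}],
}

print(count_by_type(toy_grouped))
print(types_below_minimum(toy_grouped, minimum=2))
```

In the notebook, `types_below_minimum(mintaka_sorted, 20)` would need to return an empty list for the sampling step to succeed.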

Now we will generate our evaluation dataset by grabbing `n` random questions from each complexity type.

In [13]:
import random

evaluation_dataset = {}
n = 20

for complexity_type, question_list in mintaka_sorted.items():
    evaluation_dataset[complexity_type] = random.sample(question_list, n)
    
evaluation_dataset.keys()

dict_keys(['ordinal', 'intersection', 'generic', 'superlative', 'yesno', 'comparative', 'multihop', 'difference', 'count'])
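
The sampling above is not seeded, so rerunning the cell selects a different set of questions each time. If a repeatable evaluation set is wanted, one option is to draw from a dedicated, seeded generator. This is a sketch under assumptions rather than what the notebook does: `sample_per_type` is a hypothetical helper, the seed value is arbitrary, and `toy_grouped` stands in for `mintaka_sorted`.

```python
import random


def sample_per_type(grouped, n, seed):
    """Draw n questions per complexity type using a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    return {
        complexity_type: rng.sample(questions, n)
        for complexity_type, questions in grouped.items()
    }


# Toy stand-in for mintaka_sorted (hypothetical data).
toy_grouped = {
    "count": [{"id": i} for i in range(10)],
    "yesno": [{"id": i} for i in range(10, 20)],
}

first = sample_per_type(toy_grouped, n=3, seed=42)
second = sample_per_type(toy_grouped, n=3, seed=42)
print(first == second)  # same seed and same input order give the same selection
```

The selection is only repeatable as long as the input lists keep the same order; a different Python version may also yield a different selection for the same seed.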

Finally, let's save the dataset as JSON.

In [15]:
save_json('../evaluation_datasets/evaluation_dataset.json', evaluation_dataset)
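
After saving, a quick round-trip check can confirm that the file holds the expected number of distinct questions per type. The sketch below makes several assumptions: it reads the file with the standard `json` module rather than the project's `read_json`, it assumes `save_json` writes plain JSON, and it treats the `id` field as unique per question. `check_saved_dataset` is a hypothetical helper, demonstrated on toy data in a temporary directory.

```python
import json
import os
import tempfile


def check_saved_dataset(path, n):
    """Reload a saved dataset and report types whose question count or ids look wrong."""
    with open(path, encoding="utf-8") as f:
        dataset = json.load(f)
    problems = {}
    for complexity_type, questions in dataset.items():
        ids = [q["id"] for q in questions]
        # flag the type if it has the wrong number of questions or repeated ids
        if len(ids) != n or len(set(ids)) != n:
            problems[complexity_type] = len(ids)
    return problems


# Toy round trip in a temporary directory (hypothetical data).
toy_dataset = {
    "count": [{"id": "a"}, {"id": "b"}],
    "yesno": [{"id": "c"}],
}
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_evaluation_dataset.json")
    with open(toy_path, "w", encoding="utf-8") as f:
        json.dump(toy_dataset, f)
    problems = check_saved_dataset(toy_path, n=2)

print(problems)
```

For the file written above, `check_saved_dataset('../evaluation_datasets/evaluation_dataset.json', 20)` returning an empty dictionary would indicate that each complexity type contains 20 distinct questions.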