# Synthetically-Created, Human-Evaluated Reasoning Dataset

- CommonsenseQA, StrategyQA
- Variations on questions / tasks / queries
- Human eval
- Think step-by-step reasoning prompts
- Human eval
- Answers to query + reasoning
- Human eval

Aim for a starting sample set of 1,000 questions.
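One way to draw the 1,000-question starting set reproducibly is to sample unique row indices with a seeded generator. The sketch below is only an illustration: `sample_indices` and the seed value are hypothetical and not part of the notebook, and 9741 is the row count of the CommonsenseQA train split.

```python
import random

def sample_indices(n_rows, k=1000, seed=0):
    """Return k unique, zero-based row indices drawn from range(n_rows)."""
    if k > n_rows:
        raise ValueError("cannot sample more rows than the dataset holds")
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(range(n_rows), k)

# 9741 is the size of the CommonsenseQA train split.
indices = sample_indices(9741, k=1000, seed=0)
print(len(indices), len(set(indices)))
```

Fixing the seed means the same 1,000 questions can be re-drawn later, which helps when human evaluators need to revisit a specific sample.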


In [20]:
from datasets import load_dataset
from huggingface_hub import InferenceClient

from pprint import pprint
import random
import os

In [2]:
"""
CommonsenseQA from TAU (Tel Aviv University)

https://arxiv.org/abs/1811.00937

"""
dataset_id = 'tau/commonsense_qa'

dataset = load_dataset(dataset_id)
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
        num_rows: 9741
    })
    validation: Dataset({
        features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
        num_rows: 1221
    })
    test: Dataset({
        features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
        num_rows: 1140
    })
})


In [3]:
train_dataset = dataset['train']
print(train_dataset)

Dataset({
    features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
    num_rows: 9741
})


In [44]:
len(train_dataset)

9741

In [41]:
q_no = 13
pprint(train_dataset[q_no])

{'answerKey': 'A',
 'choices': {'label': ['A', 'B', 'C', 'D', 'E'],
             'text': ['loss of heat',
                      'revenge',
                      'expansion',
                      'relaxation',
                      'calm down']},
 'id': 'b63b9809c203321d6659ddf8551894bf',
 'question': "James was cooling off two quickly.  He would die if he didn't "
             'find some way to stop what?',
 'question_concept': 'cooling off'}
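A record in this shape can be turned into a step-by-step reasoning prompt of the kind listed at the top. The sketch below is one possible formatting, not the notebook's own: `build_prompt` is a hypothetical helper, and showing each choice with its label is an assumption made so that a response can be matched back to `answerKey`. The record is copied from the output above.

```python
def build_prompt(record):
    """Format a CommonsenseQA record as a labeled, step-by-step prompt."""
    # Collapse repeated whitespace such as the double space after a period.
    question = " ".join(record["question"].split())
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    options = "\n".join(f"({label}) {text}" for label, text in zip(labels, texts))
    return f"Q: {question}\nChoices:\n{options}\nA: Let's think step by step."

# Record copied from the notebook output (row 13 of the train split).
record = {
    "answerKey": "A",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["loss of heat", "revenge", "expansion", "relaxation", "calm down"],
    },
    "id": "b63b9809c203321d6659ddf8551894bf",
    "question": "James was cooling off two quickly.  He would die if he didn't "
                "find some way to stop what?",
    "question_concept": "cooling off",
}

print(build_prompt(record))
```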


In [46]:
import json
import time

model_id = 'mistralai/Mixtral-8x7B-Instruct-v0.1'

# Sample 1,000 unique row indices; valid indices run from 0 to len(train_dataset) - 1.
queries = random.sample(range(len(train_dataset)), 1000)

responses = []

# Create the client once and reuse it for every request.
client = InferenceClient(
    model=model_id,
    token=os.getenv('HUGGINGFACE_TOKEN'),
)

for q in queries:
    query = train_dataset[q]['question'].replace('  ', ' ')
    choices = train_dataset[q]['choices']['text']

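    # The backslash continuations keep the leading spaces of the next two
    # lines, so that indentation becomes part of the prompt (visible in the
    # printed output).
    prompt = f"Q: {query}\n\
        Choices: {', '.join(choices)}\n\
        A: Let's think step by step."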
    pprint(prompt)

    seed = random.randint(0, 10000)

    while True:
        try:
            output = client.text_generation(
                prompt,
                max_new_tokens=50,
                do_sample=True,
                seed=seed
            )
            break
        except Exception:
            # Catch ordinary errors only, so Ctrl+C can still stop the loop.
            print("An interruption occurred. Retrying in 3 minutes.")
            time.sleep(180)

    response = {
        'id': train_dataset[q]['id'],
        'response': output
    }
    responses.append(response)
    time.sleep(random.uniform(1, 10))

# Save responses to a JSON file (create the output directory if needed)
os.makedirs('dev', exist_ok=True)
with open('dev/responses.json', 'w') as f:
    json.dump(responses, f)

('Q: Some say they are at odds, but many find solace in both science and '
 'what?\n'
 '        Choices: history studies, geography, religion, math, ghosts\n'
 "        A: Let's think step by step.")
('Q: The bus stop implemented a size restriction for luggage, where was the '
 'bus stop going?\n'
 '        Choices: boarding bus, city, fast, urban area, airport\n'
 "        A: Let's think step by step.")
('Q: What is the world full of?\n'
 '        Choices: countries, thought, water, universe, galaxy\n'
 "        A: Let's think step by step.")
('Q: Where might a large dog live?\n'
 '        Choices: guard house, shake hands, drink water, come home, small '
 'house\n'
 "        A: Let's think step by step.")
('Q: If you buying a potato, carrots, strawberries and bananas, how would you '
 'carry them home?\n'
 '        Choices: restaurants, shopping bags, two wheels, vegetable soup, '
 'exhaust pipe\n'
 "        A: Let's think step by step.")
('Q: What will someone like be if they receiv

KeyboardInterrupt: 
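Because the run above was interrupted before it reached the final `json.dump`, the responses collected so far exist only in memory. A minimal sketch of one way to guard against that is to rewrite the file after every response and to cap the number of retries. `generate_with_retry` and `run_with_checkpoints` are hypothetical helpers, and the stub generator stands in for the real inference call.

```python
import json
import os
import tempfile
import time

def generate_with_retry(generate, prompt, max_retries=3, wait_seconds=0.0):
    """Call generate(prompt), retrying on ordinary errors up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return generate(prompt)
        except Exception:
            # KeyboardInterrupt is not an Exception, so Ctrl+C still stops the run.
            if attempt == max_retries:
                raise
            time.sleep(wait_seconds)

def run_with_checkpoints(items, generate, path):
    """Generate one response per (id, prompt) pair, rewriting the JSON file each time."""
    responses = []
    for item_id, prompt in items:
        output = generate_with_retry(generate, prompt)
        responses.append({"id": item_id, "response": output})
        with open(path, "w") as f:
            json.dump(responses, f)
    return responses

# Demo with a stub generator that fails once and then succeeds.
calls = {"n": 0}

def flaky_generate(prompt):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated outage")
    return f"echo: {prompt}"

path = os.path.join(tempfile.mkdtemp(), "responses.json")
saved = run_with_checkpoints([("a", "first"), ("b", "second")], flaky_generate, path)
with open(path) as f:
    on_disk = json.load(f)
print(on_disk)
```

In the notebook, `generate` could wrap the existing call, for example `lambda p: client.text_generation(p, max_new_tokens=50, do_sample=True, seed=seed)`, with a longer `wait_seconds` to match the three-minute pause used above.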

In [47]:
responses

[{'id': 'ff25fafed0a27ef1e13bb7fc531f6084',
  'response': " The answer cannot be history studies, geography, or math because the proponents of religion and science have their own views regarding these areas. Ghosts can be ruled out because there's no logical connection. This leaves us with religion. Hence,"},
 {'id': 'a9378eb283e485b6cc3963da18d7fac4',
  'response': ' The question says the bus stop implemented a size restriction for luggage. Therefore, many people will be carrying big luggage such as in an airport. That is why the answer is airport.'},
 {'id': 'b27e8dd39a3a8e4bbf8b1baa552d8e98',
  'response': " The world is a small part of observable universe, but not a whole universe. It is not a country, it is a planet that is a part of the galaxy, so that can't be an option neither. Most part of the world"},
 {'id': 'aaac3a907ce186aa60b9679373b7624d',
  'response': " We know dogs can live in homes and houses, but the question asks for a large dog, so we're looking for a larger livin