# Synthetically-Created, Human-Evaluated Reasoning Dataset

- CommonsenseQA, StrategyQA
- Variations on questions / tasks / queries
- Human eval
- Think step-by-step reasoning prompts
- Human eval
- Answers to query + reasoning
- Human eval

Aim for a starting sample set of 1,000 examples.
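
One way to draw that starting set reproducibly is to pick row indices with a seeded random generator. The sketch below is only an illustration: the helper name `pick_sample_indices` and the seed value are assumptions, and 9,741 is the size of the CommonsenseQA train split as reported by the loader further down.

```python
import random


def pick_sample_indices(num_rows: int, k: int = 1000, seed: int = 0) -> list[int]:
    """Return k distinct row indices drawn uniformly from range(num_rows)."""
    if k > num_rows:
        raise ValueError(f"cannot draw {k} rows from a split of {num_rows}")
    rng = random.Random(seed)  # local generator, so the global seed is untouched
    return sorted(rng.sample(range(num_rows), k))


# 9741 is the train split size; the seed is an arbitrary, hypothetical choice.
indices = pick_sample_indices(num_rows=9741, k=1000, seed=0)
print(len(indices), indices[:5])
```

With the `datasets` library, the resulting indices could then be passed to `train_dataset.select(indices)` to materialise the subset.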


In [20]:
from datasets import load_dataset
from huggingface_hub import InferenceClient

from pprint import pprint
import random
import os

In [2]:
"""
CommonsenseQA from TAU (Tel Aviv University)

https://arxiv.org/abs/1811.00937

"""
dataset_id = 'tau/commonsense_qa'

dataset = load_dataset(dataset_id)
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
        num_rows: 9741
    })
    validation: Dataset({
        features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
        num_rows: 1221
    })
    test: Dataset({
        features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
        num_rows: 1140
    })
})


In [3]:
train_dataset = dataset['train']
print(train_dataset)

Dataset({
    features: ['id', 'question', 'question_concept', 'choices', 'answerKey'],
    num_rows: 9741
})


In [17]:
pprint(train_dataset[2])

{'answerKey': 'A',
 'choices': {'label': ['A', 'B', 'C', 'D', 'E'],
             'text': ['jewelry store',
                      'neck',
                      'jewlery box',
                      'jewelry box',
                      'boutique']},
 'id': '4c1cb0e95b99f72d55c068ba0255c54d',
 'question': 'To locate a choker not located in a jewelry box or boutique '
             'where would you go?',
 'question_concept': 'choker'}
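
Since `answerKey` holds only a label, a human evaluator will probably want the matching answer text as well. A minimal sketch of that lookup, assuming records shaped like the one printed above (`answer_text` is a hypothetical helper, not part of the dataset or the `datasets` library):

```python
def answer_text(record: dict) -> str:
    """Return the choice text whose label matches the record's answerKey."""
    labels = record['choices']['label']
    texts = record['choices']['text']
    return texts[labels.index(record['answerKey'])]


# Copy of the record shown above (choice texts are verbatim from the dataset).
record = {
    'answerKey': 'A',
    'choices': {
        'label': ['A', 'B', 'C', 'D', 'E'],
        'text': ['jewelry store', 'neck', 'jewlery box', 'jewelry box', 'boutique'],
    },
    'question': 'To locate a choker not located in a jewelry box or boutique '
                'where would you go?',
}

print(answer_text(record))  # jewelry store
```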


In [16]:
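# Collapse any double spaces in the question text.
query = train_dataset[2]['question'].replace('  ', ' ')
pprint(query)

# Zero-shot chain-of-thought prompt: the trailing phrase elicits step-by-step reasoning.
prompt = f"Q: {query}\nA: Let's think step by step."
pprint(prompt)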

('To locate a choker not located in a jewelry box or boutique where would you '
 'go?')
('Q: To locate a choker not located in a jewelry box or boutique where would '
 'you go?\n'
 "A: Let's think step by step.")
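
The prompt above contains only the question, so the model never sees the five answer options. If the reasoning should end on one of them, the options could be listed in the prompt. This is a sketch under that assumption: `build_cot_prompt` and the "Answer choices:" layout are illustrative choices, not something the notebook or the dataset prescribes.

```python
def build_cot_prompt(question: str, choices: dict,
                     trigger: str = "Let's think step by step.") -> str:
    """Build a zero-shot step-by-step prompt that lists the labelled options."""
    question = ' '.join(question.split())  # normalise runs of whitespace
    options = '\n'.join(
        f"({label}) {text}"
        for label, text in zip(choices['label'], choices['text'])
    )
    return f"Q: {question}\nAnswer choices:\n{options}\nA: {trigger}"


# Question and choices copied from the sample record printed earlier.
example_prompt = build_cot_prompt(
    'To locate a choker not located in a jewelry box or boutique '
    'where would you go?',
    {
        'label': ['A', 'B', 'C', 'D', 'E'],
        'text': ['jewelry store', 'neck', 'jewlery box', 'jewelry box', 'boutique'],
    },
)
print(example_prompt)
```

Whether listing the options improves the generated reasoning is something the human eval step would need to confirm.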


In [22]:
model_id = 'mistralai/Mixtral-8x7B-Instruct-v0.1'

# The API token is read from the HUGGINGFACE_TOKEN environment variable.
client = InferenceClient(
    model=model_id,
    token=os.getenv('HUGGINGFACE_TOKEN'),
)

prompt = f"Q: {query}\nA: Let's think step by step."
pprint(prompt)

# Sample three responses, each with a fresh random seed, to get varied reasoning.
for i in range(3):
    seed = random.randint(0, 10000)
    output = client.text_generation(
        prompt,
        max_new_tokens=500,
        do_sample=True,
        seed=seed,
    )
    print(f"Response {i+1}: {output}")

('Q: To locate a choker not located in a jewelry box or boutique where would '
 'you go?\n'
 "A: Let's think step by step.")
Response 1:  A choker is a type of necklace, so it is likely to be found in places where other types of necklaces are sold. Some options to consider include:

* Department stores: Many department stores have sections dedicated to jewelry, where you may be able to find a choker.
* Specialty jewelry stores: Stores that specialize in selling jewelry may carry a variety of chokers in different styles and price ranges.
* Online marketplaces: Websites like Amazon, eBay, and Etsy offer a wide selection of chokers for sale. You can browse through the options and read customer reviews to help you make a decision.
* Costume jewelry stores: If you're looking for a more affordable choker, costume jewelry stores may be a good option. These stores sell jewelry that is meant to be worn for special occasions or as a fashion accessory, rather than for fine jewelry.

It's always a