---

#### $Load$ $Libraries$

---

In [19]:
from datasets import load_dataset
import os
import json

---

#### $Load$ $StrategyQA$

---

StrategyQA is a question-answering benchmark in which the required reasoning steps are implicit in the question and must be inferred using a strategy. The copy loaded here contains `train` and `test` sets.

* **Train/Test Set**: Both splits share the same structure. The questions are designed to require reasoning over multiple facts or concepts. Each example includes a question, a boolean answer (True/False), supporting facts, and a related term with its description.


In [20]:
strategy_dataset = load_dataset("ChilleD/StrategyQA")


In [21]:
strategy_dataset.keys()

dict_keys(['train', 'test'])

Each example contains:

1. **`qid`**: Unique ID for the question.

2. **`term`**: The topic or key phrase involved.

3. **`description`**: Short explanation about the term.

4. **`question`**: The actual question requiring reasoning.

5. **`answer`**: The boolean answer (True/False).

6. **`facts`**: Supporting facts used to arrive at the answer.
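As a quick sanity check on the schema listed above, the sketch below verifies that a record carries all six fields. `EXPECTED_FIELDS` and `missing_fields` are illustrative names introduced here, not part of the `datasets` API, and the sample record is a hand-typed, abbreviated stand-in rather than a row loaded from the Hub.

```python
# Field names taken from the list above.
EXPECTED_FIELDS = ("qid", "term", "description", "question", "answer", "facts")


def missing_fields(record: dict) -> list:
    """Return the expected field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Hand-typed stand-in record (assumption: real rows have the same keys).
sample = {
    "qid": "4fd64bb6ce5b78ab20b6",
    "term": "Mixed martial arts",
    "description": "full contact combat sport",
    "question": "Is Mixed martial arts totally original from Roman Colosseum games?",
    "answer": False,
    "facts": "(abbreviated for this sketch)",
}

print(missing_fields(sample))
```

An empty list means every expected field is present; any names returned point to fields that a later cell would fail on.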

In [22]:
strategy_dataset['train']

Dataset({
    features: ['qid', 'term', 'description', 'question', 'answer', 'facts'],
    num_rows: 1603
})

In [23]:
example_strategy = strategy_dataset["train"][0]

print(f"ID: {example_strategy['qid']}")
print(f"Term: {example_strategy['term']}\t | Description: {example_strategy['description']}")
print("="*50)
print(f"\nQuestion: {example_strategy['question']}")
print(f"\nAnswer: {example_strategy['answer']}")

print("\n\nSupporting Facts")
print("-"*74)
print("To answer the question, the model needs to find the following information:")
print(f"\n{example_strategy['facts']}")

ID: 4fd64bb6ce5b78ab20b6
Term: Mixed martial arts	 | Description: full contact combat sport
==================================================

Question: Is Mixed martial arts totally original from Roman Colosseum games?

Answer: False


Supporting Facts
--------------------------------------------------------------------------
To answer the question, the model needs to find the following information:

Mixed Martial arts in the UFC takes place in an enclosed structure called The Octagon. The Roman Colosseum games were fought in enclosed arenas where combatants would fight until the last man was standing. Mixed martial arts contests are stopped when one of the combatants is incapacitated. The Roman Colosseum was performed in front of crowds that numbered in the tens of thousands. Over 56,000 people attended UFC 193.


---

#### $Dataset$ $Creation$

---

*First, we will create a `strategyqa_few_shot.json` file by manually decomposing questions from the training set, using their supporting facts as a guide. Next, we will build a separate evaluation dataset from the test set, following the same manual decomposition process.*

##### $Few$ $Shot$ $Examples$

In [24]:
few_shot_examples = []

In [25]:
for example in strategy_dataset["train"]:
    # Stop once five examples have been collected
    if len(few_shot_examples) >= 5:
        break

    # Rename the `qid` field to `id`
    example['id'] = example.pop('qid')
    few_shot_examples.append(example)
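The selection loop above can also be written as a reusable helper. This is a minimal sketch, not the notebook's own method: `take_first` is a hypothetical name, and the rows below are stand-ins so the example runs without downloading anything. It should work on any iterable of dicts, which is assumed to include a `datasets` split.

```python
from itertools import islice


def take_first(records, n, old_key="qid", new_key="id"):
    """Return copies of the first n records with old_key renamed to new_key."""
    selected = []
    for record in islice(records, n):
        renamed = dict(record)  # copy, so the source record is left untouched
        renamed[new_key] = renamed.pop(old_key)
        selected.append(renamed)
    return selected


# Stand-in rows; in the notebook this would be strategy_dataset["train"].
rows = [{"qid": f"q{i}", "question": f"Question {i}?"} for i in range(8)]
subset = take_first(rows, 5)
print(len(subset))
```

Copying each record before renaming keeps the input unchanged, which makes the helper safe to call twice on the same in-memory list.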

In [26]:
few_shot_examples[0]["decomposition"] = [
                "What is Mixed Martial Arts?",
                "What were the Roman Colosseum games?",
                "Are there similarities between MMA and Roman Colosseum games?",
                "Are there differences between MMA and Roman Colosseum games?",
                "Based on similarities and differences, is MMA totally original from Roman Colosseum games?"]

few_shot_examples[1]["decomposition"] = [
                "What defines vegan cuisine?",
                "What are the main components of traditional Hawaiian cuisine?",
                "Which traditional Hawaiian dishes or ingredients are naturally vegan or can be made vegan?",
                "Are there common Hawaiian dishes that contain animal products making them unsuitable for vegans?",
                "Based on this, is the overall cuisine of Hawaii suitable or adaptable for a vegan diet?"]

few_shot_examples[2]["decomposition"] = [
                "Can giant squid be captured in their natural habitat?",
                "What equipment or gear is usually needed to capture a giant squid?",
                "Is capturing giant squid without gear impossible?"]

few_shot_examples[3]["decomposition"] = [
                "When did the Boxer Rebellion happen?",
                "When was the Royal Air Force (RAF) established?",
                "Was the RAF active during the time of the Boxer Rebellion?",
                "Based on the timelines, did the RAF participate in the Boxer Rebellion?"]

few_shot_examples[4]["decomposition"] = [
                "What is Solanum melongena commonly called in general English?",
                "What are the common local names for Solanum melongena in Mumbai or India?",
                "Is the term “eggplant” commonly used or understood by people in Mumbai?",
                "Based on local language and culture, would “eggplant” be a usual reference in Mumbai?"]


In [27]:
filename = "strategyqa_few_shot.json"
folder = "../StrategyQA_dataset/"

# Create the output folder if it does not exist yet
os.makedirs(folder, exist_ok=True)

# Construct the full path for the file
full_path = os.path.join(folder, filename)

# Save the file
with open(full_path, 'w', encoding='utf-8') as f:
    json.dump(few_shot_examples, f, ensure_ascii=False, indent=4)
print("Results have been saved!")


Results have been saved!
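To confirm that a saved file reads back unchanged, a round-trip check along these lines may help. It is a sketch under stated assumptions: the record is a placeholder rather than real StrategyQA data, and a temporary directory is used so the check does not touch `../StrategyQA_dataset/`.

```python
import json
import os
import tempfile

# Placeholder record with the same shape as the saved few-shot examples.
records = [
    {
        "id": "example-1",
        "question": "Placeholder question?",
        "decomposition": ["Placeholder step one?", "Placeholder step two?"],
    }
]

with tempfile.TemporaryDirectory() as folder:
    full_path = os.path.join(folder, "strategyqa_few_shot.json")

    # Write with the same options as the cell above.
    with open(full_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=4)

    # Read the file back and compare with the in-memory data.
    with open(full_path, "r", encoding="utf-8") as f:
        reloaded = json.load(f)

round_trip_ok = reloaded == records
print(round_trip_ok)
```

Opening the file with `encoding="utf-8"` on both sides matters here because `ensure_ascii=False` writes non-ASCII characters, such as curly quotes, as-is.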


##### $Evaluation$ $Examples$

In [28]:
evaluation = []

In [29]:
for example in strategy_dataset["test"]:
    # Stop once five examples have been collected
    if len(evaluation) >= 5:
        break

    # Rename the `qid` field to `id`
    example['id'] = example.pop('qid')
    evaluation.append(example)

In [30]:
evaluation[0]["decomposition"] = [
                "What was the name of the ship that recovered Apollo 13 astronauts?",
                "Is that ship’s name connected to a World War II battle?",
                "If yes, what was the World War II battle it was named after?",
                "Based on this, was the ship named after a World War II battle?"]

evaluation[1]["decomposition"] = [
                "What is the tibia?",
                "What role does the tibia play in playing hockey or sports in general?",
                "Is having a functioning tibia essential for a player to participate effectively in hockey?",
                "Can someone without a tibia still be part of a team that wins the Stanley Cup?",
                "Based on this, is the tibia necessary to win the Stanley Cup?"]

evaluation[2]["decomposition"] = [
                "What does the Azerbaijani flag’s background look like?",
                "Who are the Powerpuff Girls, and what kind of artistic styles or backgrounds do they create?",
                "Could the Powerpuff Girls (as characters or in their style) create or inspire a background similar to the Azerbaijani flag?",
                "Are there any symbolic or design conflicts between the Powerpuff Girls style and the Azerbaijani flag’s elements?",
                "Based on this, is it possible for the Powerpuff Girls to make the background to the Azerbaijani flag?"]

evaluation[3]["decomposition"] = [
                "Are there youth groups named or labeled after 'Eagles' that focus on skills training?",
                "Are there youth groups named or labeled after 'Young Bears' that focus on skills training?",
                "What kinds of skills or activities do these groups typically teach?",
                "Based on this, are both 'Eagles' and 'Young Bears' commonly used as names for skills-training youth groups?"]

evaluation[4]["decomposition"] = [
                "How physically fit are Olympic athletes generally?",
                "What is the typical effort level for an Olympic athlete running a mile?",
                "Does running a mile usually cause fatigue in athletes at this level?",
                "Are there factors like pace, recovery, or event specialty that affect whether they’d be tired?",
                "Based on this, would an Olympic athlete likely be tired out after running a mile?"]


In [31]:
filename = "strategyqa_dataset.json"
folder = "../StrategyQA_dataset/"

# Create the output folder if it does not exist yet
os.makedirs(folder, exist_ok=True)

# Construct the full path for the file
full_path = os.path.join(folder, filename)

# Save the file
with open(full_path, 'w', encoding='utf-8') as f:
    json.dump(evaluation, f, ensure_ascii=False, indent=4)
print("Results have been saved!")


Results have been saved!
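Hand-written decomposition lists are easy to get subtly wrong. For example, a missing comma between two adjacent string literals makes Python silently join them into one sub-question. The sketch below is a heuristic check, not part of StrategyQA or the `datasets` library; `decomposition_problems` is a hypothetical helper, and it assumes each sub-question is a single sentence ending in one question mark.

```python
def decomposition_problems(steps):
    """Return human-readable problems found in a list of sub-questions."""
    problems = []
    for index, step in enumerate(steps):
        text = step.strip()
        if not text:
            problems.append(f"step {index}: empty")
        elif not text.endswith("?"):
            problems.append(f"step {index}: does not end with '?'")
        elif text.count("?") > 1:
            # Several question marks often mean two strings were fused
            # by a missing comma between adjacent literals.
            problems.append(f"step {index}: contains more than one '?'")
    return problems


good = ["What is the tibia?", "Is the tibia necessary to win the Stanley Cup?"]
# The first two literals are deliberately joined (no comma) to show the failure mode.
fused = ["Is A true?" "Is B true?", "When did X happen"]

print(decomposition_problems(good))
print(decomposition_problems(fused))
```

Running this over every `decomposition` list before saving would flag entries for manual review; it cannot judge whether the sub-questions themselves are sensible.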
