---

#### $Load$ $Libraries$

---

In [73]:
from datasets import load_dataset
import os
import json

---

#### $Load$ $HotpotQA$ - $Distractor$ $version$

---

The distractor version contains a `train` set and a `validation` set. 
* **Train Set:** Provides the question, the answer, the supporting facts, and a context of about 10 paragraphs: the 2 gold paragraphs required to answer the question plus distractor paragraphs, in the same format as the validation set.
* **Validation Set:** Contains a question and ~10 paragraphs. Two of them are the gold paragraphs, and the remaining 8 are carefully chosen distractors. The model's task is to read these 10 paragraphs and answer the question.

This version does not contain a `test` set.
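
The gold/distractor split described above can be checked per example. The following is a minimal sketch, assuming the column-oriented layout that the loaded records use (parallel lists under `context` and `supporting_facts`). `split_gold_and_distractors` is a hypothetical helper name, and the toy record is made-up data standing in for a real example.

```python
def split_gold_and_distractors(example):
    """Return (gold_titles, distractor_titles) for one HotpotQA-style example."""
    # Titles referenced by at least one supporting fact are the gold paragraphs.
    gold = set(example["supporting_facts"]["title"])
    context_titles = example["context"]["title"]
    gold_titles = [t for t in context_titles if t in gold]
    distractor_titles = [t for t in context_titles if t not in gold]
    return gold_titles, distractor_titles


# Toy record (made-up data, not taken from the dataset).
toy_record = {
    "supporting_facts": {"title": ["A", "B"], "sent_id": [0, 1]},
    "context": {
        "title": ["A", "X", "B", "Y"],
        "sentences": [["a0"], ["x0"], ["b0", "b1"], ["y0"]],
    },
}

gold_titles, distractor_titles = split_gold_and_distractors(toy_record)
print(len(gold_titles), "gold,", len(distractor_titles), "distractors")
```

Once the dataset is loaded, the same helper can be applied to a real record, for example `split_gold_and_distractors(dataset_distractor["validation"][0])`.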

In [3]:
# Download the dataset (cached locally after the first call)
dataset_distractor = load_dataset("hotpot_qa", "distractor")


In [4]:
dataset_distractor.keys()

dict_keys(['train', 'validation'])

Each example contains:  

1. **`id`**: A unique identifier for the question.  
2. **`question`**: The question that needs to be answered.  
3. **`answer`**: The correct answer to the question.  
4. **`type`**: The type of question—either:  
   - **`comparison`** (compares two things)  
   - **`bridge`** (connects facts to find the answer).  
5. **`level`**: How hard the question is (`easy`, `medium`, or `hard`).  
6. **`supporting_facts`**: The key facts that prove the answer is correct. Each fact is given as:  
   - **`title`**: The title of the paragraph where the fact is found.  
   - **`sent_id`**: The sentence number (starting from 0) inside that paragraph.  
7. **`context`**: All the paragraphs the model must read to find the answer. Each paragraph has:  
   - **`title`**: The paragraph’s title.  
   - **`sentences`**: A list of sentences in that paragraph.  
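
Note that the loaded records appear to store `supporting_facts` and `context` column-wise: a single dict holding parallel lists rather than a list of per-fact dicts (this is how the cells in this notebook index them). A small sketch, with a hypothetical helper `iter_supporting_facts` and made-up data, that pairs the lists back up:

```python
def iter_supporting_facts(example):
    """Yield (title, sent_id) pairs from the parallel lists in `supporting_facts`."""
    facts = example["supporting_facts"]
    yield from zip(facts["title"], facts["sent_id"])


# Made-up record that mimics the column-oriented layout.
schema_demo = {
    "id": "demo-0",
    "question": "Which came first, A or B?",
    "answer": "A",
    "type": "comparison",
    "level": "easy",
    "supporting_facts": {"title": ["A", "B"], "sent_id": [0, 1]},
    "context": {
        "title": ["A", "B"],
        "sentences": [["A started in 1900."], ["B is a thing.", "B started in 1950."]],
    },
}

pairs = list(iter_supporting_facts(schema_demo))
print(pairs)
```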

In [5]:
dataset_distractor["train"]

Dataset({
    features: ['id', 'question', 'answer', 'type', 'level', 'supporting_facts', 'context'],
    num_rows: 90447
})

In [6]:
example_distractor = dataset_distractor["train"][0]

print(f"ID: {example_distractor['id']}")
print(f"Level: {example_distractor['level']}\t | Type: {example_distractor['type']}")
print("="*50)
print(f"\nQuestion: {example_distractor['question']}")
print(f"\nAnswer: {example_distractor['answer']}")

print("\n\nSupporting Facts")
print("-"*20)
print("To answer the question, the model needs to find the following information:")

# Extract the titles and sentence IDs for the supporting facts
supporting_titles = example_distractor['supporting_facts']['title']
supporting_sent_ids = example_distractor['supporting_facts']['sent_id']

# Get the full context list
context_titles = example_distractor['context']['title']
context_sentences = example_distractor['context']['sentences']

# Loop through the supporting facts and find the full sentences
for fact_title, fact_sent_id in zip(supporting_titles, supporting_sent_ids):
	# Find the index of the article in the main context
	try:
		context_index = context_titles.index(fact_title)
		# Get the actual sentence
		sentence = context_sentences[context_index][fact_sent_id]
		print(f"\n  - From Article '{fact_title}':")
		print(f"    \"{sentence}\"")
	except ValueError:
		print(f"Error: Could not find the context for title '{fact_title}'")


ID: 5a7a06935542990198eaf050
Level: medium	 | Type: comparison

Question: Which magazine was started first Arthur's Magazine or First for Women?

Answer: Arthur's Magazine


Supporting Facts
--------------------
To answer the question, the model needs to find the following information:

  - From Article 'Arthur's Magazine':
    "Arthur's Magazine (1844–1846) was an American literary periodical published in Philadelphia in the 19th century."

  - From Article 'First for Women':
    "First for Women is a woman's magazine published by Bauer Media Group in the USA."


---

#### $Load$ $HotpotQA$ - $Fullwiki$ $version$

---

The fullwiki version contains a `train` set, a `validation` set and a `test` set. 
* **Train Set:** The train set is the same as in the distractor version. Each example provides the question, the answer, the supporting facts, and a context of about 10 paragraphs that contains the 2 gold paragraphs plus distractors.
* **Validation Set:** The 10 paragraphs in the validation set are not the ground truth context. They are the output of a standard, baseline Information Retrieval (IR) system. The dataset creators ran their own simple search engine over all of Wikipedia to find the 10 most likely paragraphs for each question. 
* **Test Set:** We are given only the questions. There is no context provided. The system must implement its own retrieval module to search all the articles of the Wikipedia corpus, find the relevant information, and then answer the question.


In [7]:
dataset_fullwiki = load_dataset("hotpot_qa", "fullwiki")

In [8]:
dataset_fullwiki.keys()

dict_keys(['train', 'validation', 'test'])

Each example contains the same features as in the distractor version: 

1. **`id`**: A unique identifier for the question.  
2. **`question`**: The question that needs to be answered.  
3. **`answer`**: The correct answer to the question.  
4. **`type`**: The type of question—either:  
   - **`comparison`** (compares two things)  
   - **`bridge`** (connects facts to find the answer).  
5. **`level`**: How hard the question is (`easy`, `medium`, or `hard`).  
6. **`supporting_facts`**: The key facts that prove the answer is correct. Each fact is given as:  
   - **`title`**: The title of the paragraph where the fact is found.  
   - **`sent_id`**: The sentence number (starting from 0) inside that paragraph.  
7. **`context`**: All the paragraphs the model must read to find the answer. Each paragraph has:  
   - **`title`**: The paragraph’s title.  
   - **`sentences`**: A list of sentences in that paragraph.  

$Attention$ : $The$ **$supporting$ _ $facts$** $and$ **$answers$** $are$ $not$ $present$ $in$ $the$ $test$ $set.$
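
Code that resolves supporting sentences therefore needs to skip unlabeled test records. A minimal sketch, assuming that a missing label shows up as an absent key, `None`, an empty string, or empty lists (the exact representation in the test split is not verified here). `has_labels` is a hypothetical helper and both records are made up.

```python
def has_labels(example):
    """Return True only if the record carries an answer and at least one supporting fact."""
    answer = example.get("answer")
    facts = example.get("supporting_facts") or {}
    titles = facts.get("title") or []
    return bool(answer) and len(titles) > 0


# Made-up records: one labeled, one shaped like an unlabeled test item.
labeled = {"answer": "yes", "supporting_facts": {"title": ["A"], "sent_id": [0]}}
unlabeled = {"answer": None, "supporting_facts": {"title": [], "sent_id": []}}

print(has_labels(labeled), has_labels(unlabeled))
```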

In [9]:
dataset_fullwiki["train"]


Dataset({
    features: ['id', 'question', 'answer', 'type', 'level', 'supporting_facts', 'context'],
    num_rows: 90447
})

In [10]:
example_fullwiki = dataset_fullwiki["train"][0]

print(f"ID: {example_fullwiki['id']}")
print(f"Level: {example_fullwiki['level']}\t | Type: {example_fullwiki['type']}")
print("="*50)
print(f"\nQuestion: {example_fullwiki['question']}")
print(f"\nAnswer: {example_fullwiki['answer']}")

print("\n\nSupporting Facts")
print("-"*20)
print("To answer the question, the model needs to find the following information:")

# Extract the titles and sentence IDs for the supporting facts
supporting_titles = example_fullwiki['supporting_facts']['title']
supporting_sent_ids = example_fullwiki['supporting_facts']['sent_id']

# Get the full context list
context_titles = example_fullwiki['context']['title']
context_sentences = example_fullwiki['context']['sentences']

# Loop through the supporting facts and find the full sentences
for fact_title, fact_sent_id in zip(supporting_titles, supporting_sent_ids):
	# Find the index of the article in the main context
	try:
		context_index = context_titles.index(fact_title)
		# Get the actual sentence
		sentence = context_sentences[context_index][fact_sent_id]
		print(f"\n  - From Article '{fact_title}':")
		print(f"    \"{sentence}\"")
	except ValueError:
		print(f"Error: Could not find the context for title '{fact_title}'")


ID: 5a7a06935542990198eaf050
Level: medium	 | Type: comparison

Question: Which magazine was started first Arthur's Magazine or First for Women?

Answer: Arthur's Magazine


Supporting Facts
--------------------
To answer the question, the model needs to find the following information:

  - From Article 'Arthur's Magazine':
    "Arthur's Magazine (1844–1846) was an American literary periodical published in Philadelphia in the 19th century."

  - From Article 'First for Women':
    "First for Women is a woman's magazine published by Bauer Media Group in the USA."


---

#### $Distractor$ $vs.$ $Fullwiki$

---

* In the **distractor** setting, the 10 paragraphs are a "hand-picked" reasoning challenge where the 2 gold paragraphs are guaranteed to be present.

* In the **fullwiki** validation setting, the 10 paragraphs are the pre-computed results of a simple search engine. They are provided for convenience during development and to create a fair test of reading comprehension on a *retrieved* set of documents, where the necessary information is not guaranteed to be present.
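
One way to quantify this difference is the share of examples whose context contains every gold paragraph. The sketch below assumes the column-oriented record layout used in this notebook. `gold_coverage` is a hypothetical helper and the two records are made up.

```python
def gold_coverage(examples):
    """Fraction of examples whose context titles include all gold titles."""
    examples = list(examples)
    if not examples:
        return 0.0
    covered = 0
    for ex in examples:
        gold = set(ex["supporting_facts"]["title"])
        # An example is covered when every gold title appears in its context.
        if gold.issubset(ex["context"]["title"]):
            covered += 1
    return covered / len(examples)


# Made-up records: the first contains both gold paragraphs, the second misses "B".
toy_examples = [
    {"supporting_facts": {"title": ["A", "B"]}, "context": {"title": ["A", "B", "X"]}},
    {"supporting_facts": {"title": ["A", "B"]}, "context": {"title": ["A", "X", "Y"]}},
]

coverage = gold_coverage(toy_examples)
print(f"gold coverage: {coverage:.2f}")
```

Running the helper over `dataset_distractor["validation"]` and `dataset_fullwiki["validation"]` would give a direct comparison of the two settings, at the cost of iterating over each split once.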



For our project, which focuses on decomposition (a reasoning task), sticking with the **distractor** data seems to be a valid approach, since we are primarily testing the model's ability to reason over a given context, not its ability to retrieve documents from all of Wikipedia.

---

#### $Dataset$ $Creation$

---

*First, we create a `hotpot_few_shot.json` file by manually decomposing questions from the training set, using their supporting sentences as a guide. Next, we build a separate evaluation dataset, `hotpot_dataset.json`, from the validation set, following the same manual decomposition process.*

In [None]:
# This function processes a single example from the HotpotQA 'distractor' set.
# It extracts the ID, question, and answer, and resolves each supporting fact
# (title, sent_id) to the sentence it points to in the context.
# It returns these elements in a structured dictionary.

def decomposition_prompt(example_distractor):
	example_id = example_distractor['id']  # renamed to avoid shadowing the built-in id()
	question = example_distractor['question']
	answer = example_distractor['answer']
	supporting_sentences = []
	for title, sent_id in zip(example_distractor["supporting_facts"]["title"], example_distractor["supporting_facts"]["sent_id"]):
		for context_title, sentences in zip(example_distractor["context"]["title"], example_distractor["context"]["sentences"]):
			# Skip a sent_id that falls outside the paragraph instead of raising IndexError
			if title == context_title and sent_id < len(sentences):
				supporting_sentences.append(sentences[sent_id].strip())

	# Close the reasoning chain with a sentence that states the answer
	supporting_sentences.append(f"Aggregating the above we conclude that the answer is: {answer}")

	dict_prompt = {"id": example_id, "question": question, "supporting_sentences": supporting_sentences, "answer": answer}

	return dict_prompt

##### $Few$ $Shot$ $Examples$

In [68]:
few_shot_examples = []

In [69]:
for example in dataset_distractor["train"]:
    if len(few_shot_examples) < 5:
        few_shot_examples.append(decomposition_prompt(example))
    else: 
        break

In [70]:
few_shot_examples[0]["decomposition"] = [
    			"When was Arthur's Magazine started?",
                "When was First for Women magazine started?",
                "Which of the two starting dates is earlier?"]

few_shot_examples[1]["decomposition"] = [
                "Which hotel company is the Oberoi family associated with?",
                "Where is the head office of The Oberoi Group located?"]

few_shot_examples[2]["decomposition"] = [
    			"Who created the character Milhouse in 'The Simpsons'?",
                "After whom did Matt Groening name the character Milhouse?"]

few_shot_examples[3]["decomposition"] = [
				"Who was James Henry Miller?",
                "Who was James Henry Miller's wife?",
                "What was the nationality of James Henry Miller's wife?"]

few_shot_examples[4]["decomposition"] = [
                "What chemical is Cadmium Chloride slightly soluble in?",
                "What is another name for this chemical?"]


In [None]:
filename = "hotpot_few_shot.json"
folder = "../HotpotQA_dataset/"

# Make sure the target folder exists, then construct the full path for the file
os.makedirs(folder, exist_ok=True)
full_path = os.path.join(folder, filename)

# Save the file
with open(full_path, 'w', encoding='utf-8') as f:
	json.dump(few_shot_examples, f, ensure_ascii=False, indent=4)
print("Results have been saved!")


Results have been saved!
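
How the saved examples are consumed is not shown in this notebook. As one possibility, the sketch below renders entries of this shape into a few-shot prompt. The function name `build_few_shot_prompt` and the prompt wording are assumptions for illustration only, and the new question is just a placeholder.

```python
def build_few_shot_prompt(examples, new_question):
    """Render decomposed examples plus one open question into a single prompt string."""
    blocks = []
    for ex in examples:
        # Number the sub-questions so the expected output format is explicit.
        steps = "\n".join(
            f"{i}. {q}" for i, q in enumerate(ex["decomposition"], start=1)
        )
        blocks.append(f"Question: {ex['question']}\nSub-questions:\n{steps}")
    # The final block is left open for the model to complete.
    blocks.append(f"Question: {new_question}\nSub-questions:")
    return "\n\n".join(blocks)


demo_examples = [
    {
        "question": "Which magazine was started first Arthur's Magazine or First for Women?",
        "decomposition": [
            "When was Arthur's Magazine started?",
            "When was First for Women magazine started?",
            "Which of the two starting dates is earlier?",
        ],
    }
]

prompt = build_few_shot_prompt(demo_examples, "Which of two example films was released first?")
print(prompt)
```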


##### $Evaluation$ $Examples$

In [None]:
evaluation = []

In [None]:
for example in dataset_distractor["validation"]:
    if len(evaluation) < 5:
        evaluation.append(decomposition_prompt(example))
    else: 
        break

In [None]:
evaluation[0]["decomposition"] = [
                "What was Scott Derrickson's nationality?",
                "What was Ed Wood's nationality?",
                "Did they have the same nationality?"]

evaluation[1]["decomposition"] = [
                "Who portrayed Corliss Archer in the film Kiss and Tell?",
                "What government position was held by her?"]

evaluation[2]["decomposition"] = [
    			"What book series has companion books, which narrate the story of an enslaved alien species?",
                "Is this book series classified as science fantasy and written for young adults?",
                "Is this series narrated in the first person?"]

evaluation[3]["decomposition"] = [
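                "What is Laleli Mosque?",
                "Where is Laleli Mosque located?",  # comma was missing: adjacent strings were silently concatenated
                "What is Esma Sultan Mansion?",
                "Where is Esma Sultan Mansion located?",  # comma was missing here as well
                "Are they located in the same neighborhood?"]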

evaluation[4]["decomposition"] = [
                "Who directed the romantic comedy 'Big Stone Gap'?",
                "In what New York city is the director based?"]


In [None]:
filename = "hotpot_dataset.json"
folder = "../HotpotQA_dataset/"

# Make sure the target folder exists, then construct the full path for the file
os.makedirs(folder, exist_ok=True)
full_path = os.path.join(folder, filename)

# Save the file
with open(full_path, 'w', encoding='utf-8') as f:
	json.dump(evaluation, f, ensure_ascii=False, indent=4)
print("Results have been saved!")


Results have been saved!
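
Because the decompositions are typed by hand, a small check before saving can catch slips such as a missing comma between two string literals, which Python silently joins into one string. The following is a sketch with `find_decomposition_issues` as a hypothetical helper. The rule that each sub-question contains exactly one question mark and ends with it is a heuristic assumption, not a property of the dataset.

```python
def find_decomposition_issues(entries):
    """Return (id, message) tuples for entries whose decomposition looks malformed."""
    issues = []
    for entry in entries:
        steps = entry.get("decomposition")
        if not steps:
            issues.append((entry.get("id"), "missing or empty decomposition"))
            continue
        for step in steps:
            # Two questions merged into one string contain more than one "?".
            if step.count("?") != 1 or not step.strip().endswith("?"):
                issues.append((entry.get("id"), f"suspicious sub-question: {step!r}"))
    return issues


# Made-up entries: the second one reproduces the missing-comma slip on purpose.
demo_entries = [
    {"id": "ok", "decomposition": ["Who wrote X?", "When was X published?"]},
    {"id": "merged", "decomposition": ["Where is A located?" "What is B?"]},
    {"id": "empty"},
]

issues = find_decomposition_issues(demo_entries)
for entry_id, message in issues:
    print(entry_id, "->", message)
```

Calling `find_decomposition_issues(evaluation)` or `find_decomposition_issues(few_shot_examples)` before `json.dump` would flag such entries while they are still easy to correct.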
