# CS584 NLP HW4; Task: Machine Reading Comprehension (Question Answering)
Milan Chandiramani CWID: 20032010

Goals
The goal of HW4 is for you to get hands-on experience using pre-trained and large language models for the machine reading comprehension task. The questions are open-ended and there is no fixed solution. Different choices of data selection and processing, parameter initialization, and data split strategy will lead to different predictions and results. Therefore, specific values in the results are not required for grading. It is important that you focus on implementing and setting up the pipelines that apply these models to the tasks.

In [38]:
#Install required libraries
# !pip install transformers datasets
# !pip install transformers torch

## 1. Data preparation (10 points)

### 1\) (5 points) Data download and preprocessing:
Download the SQuAD 2.0 dataset, which is a benchmark dataset for question answering. Please select 15 question-answer pairs for this homework. Each data sample will be a question-answer pair, which includes the following three primary components:

a) Context: a paragraph about the context of the question.

b) Question: a natural language question that needs to be answered based on the context.

c) Answer: the answer to the question, which is a text span extracted from the context.

You can directly download the data from Hugging Face with the following code. You will need to install 🤗 Transformers and 🤗 Datasets first. A quick tour to set up and run Transformers on Hugging Face can be found here.


from datasets import load_dataset

dataset = load_dataset("squad_v2")
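
To make the three components concrete, here is a minimal sketch of how one record is laid out and how the answer span relates to the context. The record below is hand-written for illustration (the `id` and text values are made up); only the field names follow the `squad_v2` layout, and `answer_span` is a hypothetical helper, not part of any library.

```python
# Hand-written record that mirrors the squad_v2 field layout (illustrative values only).
sample = {
    "id": "example-0001",  # hypothetical id
    "title": "Example",
    "context": "Forbes magazine listed her as the most powerful female musician of 2015.",
    "question": "Which magazine listed her as the most powerful female musician?",
    "answers": {"text": ["Forbes"], "answer_start": [0]},
}


def answer_span(record):
    """Return (start, end, text) of the first gold answer, or None if unanswerable."""
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    if not texts:  # SQuAD 2.0 also contains questions that have no answer
        return None
    start = starts[0]                  # character offset into the context
    end = start + len(texts[0])
    return start, end, record["context"][start:end]


print(answer_span(sample))  # (0, 6, 'Forbes')
```

The `answer_start` value is a character offset, so slicing the context with it recovers the answer text exactly.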

In [21]:
#Importing required libraries
from datasets import load_dataset
import pandas as pd
from transformers import RobertaForQuestionAnswering, RobertaTokenizer
import torch

#Download the SQuAD2.0 dataset
dataset = load_dataset("squad_v2")

In [22]:
# Select 15 samples
selected_data = dataset['train'][27 : 27+15]
selected_data

{'id': ['56be88473aeaaa14008c9080',
  '56be88473aeaaa14008c9083',
  '56be88473aeaaa14008c9084',
  '56bf725c3aeaaa14008c9643',
  '56bf725c3aeaaa14008c9644',
  '56bf725c3aeaaa14008c9645',
  '56bf725c3aeaaa14008c9646',
  '56bf725c3aeaaa14008c9647',
  '56d43f7e2ccc5a1400d830c7',
  '56d43f7e2ccc5a1400d830c8',
  '56d43f7e2ccc5a1400d830c9',
  '56d43f7e2ccc5a1400d830cb',
  '56be892d3aeaaa14008c908b',
  '56be892d3aeaaa14008c908c',
  '56be892d3aeaaa14008c908d'],
 'title': ['Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé',
  'Beyoncé'],
 'context': ['A self-described "modern-day feminist", Beyoncé creates songs that are often characterized by themes of love, relationships, and monogamy, as well as female sexuality and empowerment. On stage, her dynamic, highly choreographed performances have led to critics hailing her as one of the best entertainers in 

### 2\) (5 points) Data statistics:
Based on the data you selected, get basic data statistics about the following aspects.

a) Statistics about the max/average/min length of the contexts

b) Statistics about the max/average/min length of the questions

c) Statistics about the max/average/min length of the answers

In [23]:
#Statistics about the context length
context_lens = [len(item.split()) for item in selected_data['context']]
max_context_len = max(context_lens)
min_context_len = min(context_lens)
avg_context_len = round(sum(context_lens) / len(context_lens), 0)

In [24]:
#Statistics about the question length
question_lens = [len(item.split()) for item in selected_data["question"]]
max_question_len = max(question_lens)
min_question_len = min(question_lens)
avg_question_len = round(sum(question_lens) / len(question_lens), 0)

In [25]:
#Statistics about the answer length
answer_lens = [len(item['text'][0].split()) for item in selected_data['answers']]
max_answer_len = max(answer_lens)
min_answer_len = min(answer_lens)
avg_answer_len = round(sum(answer_lens) / len(answer_lens), 0)

In [26]:
#Statistics table
table = {
    'Type': ['Context', 'Question', 'Answer'],
    'MaxLength': [max_context_len, max_question_len, max_answer_len],
    'MinLength': [min_context_len, min_question_len, min_answer_len],
    'AvgLength': [avg_context_len, avg_question_len, avg_answer_len]
}

stats_table = pd.DataFrame(table)
print(stats_table)

       Type  MaxLength  MinLength  AvgLength
0   Context        176         93      159.0
1  Question         18          5       11.0
2    Answer          4          1        2.0
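
The three statistics cells repeat the same computation, so they could be folded into one helper. The sketch below is one way to do that; `length_stats` and `first_answers` are hypothetical names, lengths are whitespace-token counts as in the cells above, and the toy answers are made up. It also skips samples with an empty answer list, which SQuAD 2.0 permits for unanswerable questions and which would otherwise raise an `IndexError` on `item['text'][0]`.

```python
def length_stats(texts):
    """Return (max, min, mean) whitespace-token counts for a list of strings."""
    lengths = [len(t.split()) for t in texts]
    return max(lengths), min(lengths), sum(lengths) / len(lengths)


def first_answers(answers):
    """Keep the first gold answer of each sample; skip samples with no answer."""
    return [a["text"][0] for a in answers if a["text"]]


# Toy data in the same layout as selected_data['answers'] (values are made up).
toy_answers = [
    {"text": ["Forbes"], "answer_start": [0]},
    {"text": [], "answer_start": []},  # unanswerable question
    {"text": ["2013 and 2014"], "answer_start": [5]},
]

print(length_stats(first_answers(toy_answers)))  # (3, 1, 2.0)
```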


## 2. Question answering (60 points, 20 points for each model)

### 1) Please use the pre-trained RoBERTa model to solve the question answering task on the data you prepared.

a) Implementation:
- Link to the pretrained roberta-base-squad2 model along with instructions and examples: link
- More details about how to use RoBERTa for question answering: link
- Feel free to directly utilize the existing implementations on Hugging Face.
- There is no specific requirement for the parameter settings. You are encouraged to try and test different settings and report the results. For other settings that are not specified here, you have the flexibility to select.

b) Requirements about the output. Please follow the examples provided in this link, and print out the following output.
- The answer start index and answer end index for each data sample.
- Predicted answer tokens, which will be the token indices in the vocabulary.
- Decoded answers, which will be the predicted text spans from the text.

In [34]:
model_name = "deepset/roberta-base-squad2"
model = RobertaForQuestionAnswering.from_pretrained(model_name)
tokenizer = RobertaTokenizer.from_pretrained(model_name)

In [35]:
# Loop through each data sample
for i in range(15):
    context = selected_data['context'][i]
    question = selected_data['question'][i]

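    # Encode the question/context pair as a single input sequence
    inputs = tokenizer.encode_plus(
        question, context, add_special_tokens=True, return_tensors="pt", truncation=True, padding=True
    )
    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)
    # Most likely start and end token positions, chosen independently
    answer_start_ind = torch.argmax(outputs.start_logits)
    answer_end_ind = torch.argmax(outputs.end_logits)
    # Guard against an end position that precedes the start
    if answer_end_ind < answer_start_ind:
        answer_end_ind = answer_start_ind
    # Token ids of the predicted span, decoded back to text
    predicted_answer_tokens = inputs['input_ids'][0, answer_start_ind: answer_end_ind + 1]
    decoded_answer = tokenizer.decode(predicted_answer_tokens, skip_special_tokens=True)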

    print(f"Question No. {i + 1}:")
    print(f"Question: {question}")
    print(f"Answer Start Index: {answer_start_ind}, Answer End Index: {answer_end_ind}")
    print(f"Predicted Answer Tokens: {predicted_answer_tokens}")
    print(f"Decoded Answer: {decoded_answer}")
    print()

Question No. 1:
Question: In her music, what are some recurring elements in them?
Answer Start Index: 36, Answer End Index: 42
Predicted Answer Tokens: tensor([  657,     6,  4158,     6,     8, 43907, 12463])
Decoded Answer:  love, relationships, and monogamy

Question No. 2:
Question: Time magazine named her one of the most 100 what people of the century?
Answer Start Index: 206, Answer End Index: 206
Predicted Answer Tokens: tensor([9283])
Decoded Answer:  influential

Question No. 3:
Question: Which magazine declared her the most dominant woman musician?
Answer Start Index: 211, Answer End Index: 211
Predicted Answer Tokens: tensor([11519])
Decoded Answer:  Forbes

Question No. 4:
Question: In which decade did the Recording Industry Association of America recognize Beyonce as the The Top Certified Artist?
Answer Start Index: 167, Answer End Index: 168
Predicted Answer Tokens: tensor([3788,   29])
Decoded Answer:  2000s

Question No. 5:
Question: What magazine rated Beyonce as the m

### 2) Please design prompts to interact with ChatGPT and solve the question answering task on the data you prepared.
a) You can use the OpenAI API if you have access, or you can use the user interface of ChatGPT to work on this question.

b) You can design your prompt accordingly. Feel free to explore different ways to design prompts. You can use the same prompts for all the 15 data samples, or you can explore different prompts for different data samples. Please explicitly mention your designed prompts in your submission.

c) You will need to provide the instructions in the prompt and ask the model to output the answer to the question and explain the reason.

  ### **Prompt Template for ChatGPT about the selected N question-answer pairs**

  You are an accurate question-answering system.

  I will give you a question based on a context which is biography-style passage.
  Your task:

  1. Provide the correct answer.
  2. Base your answer strictly on known facts extracted from context about Beyoncé and her career.
  3. Explain the reasoning clearly and concisely.

  Question 1: "[QUESTION 1]"

  Question 2: "[QUESTION 2]"
  
  Question 3: "[QUESTION 3]"
  
  ...
  
  Question N: "[QUESTION N]"

  Format your output as:
  Question No. : <question>
  Reason: <short explanation>
  Answer: <your answer>
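
When the same template is reused for many questions, it can help to fill it in programmatically. The sketch below assembles one prompt from a shared context and a list of questions; `build_prompt`, `INSTRUCTIONS`, and `OUTPUT_FORMAT` are hypothetical names, the instruction text is a condensed paraphrase of the template above, and the ordering of the sections is an assumption. It only builds a string and does not call any API.

```python
INSTRUCTIONS = (
    "You are an accurate question-answering system.\n"
    "Answer each question strictly from the context, "
    "then explain the reasoning clearly and concisely."
)

OUTPUT_FORMAT = (
    "Format your output as:\n"
    "Question No. : <question>\n"
    "Reason: <short explanation>\n"
    "Answer: <your answer>"
)


def build_prompt(context, questions):
    """Assemble one prompt covering every question about a shared context."""
    numbered = "\n".join(
        f'Question {i}: "{q}"' for i, q in enumerate(questions, start=1)
    )
    return "\n\n".join(
        [INSTRUCTIONS, f"CONTEXT: {context}", f"QUESTIONS:\n{numbered}", OUTPUT_FORMAT]
    )


prompt = build_prompt(
    "She has won 20 Grammy Awards.",
    ["How many Grammy awards has she won?", "Which awards has she won?"],
)
print(prompt)
```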

  ### Prompt to Chat GPT:
  **CONTEXT** : A self-described "modern-day feminist", Beyoncé creates songs that are often characterized by themes of love, relationships, and monogamy, as well as female sexuality and empowerment. On stage, her dynamic, highly choreographed performances have led to critics hailing her as one of the best entertainers in contemporary popular music. Throughout a career spanning 19 years, she has sold over 118 million records as a solo artist, and a further 60 million with Destiny's Child, making her one of the best-selling music artists of all time. She has won 20 Grammy Awards and is the most nominated woman in the award's history. The Recording Industry Association of America recognized her as the Top Certified Artist in America during the 2000s decade. In 2009, Billboard named her the Top Radio Songs Artist of the Decade, the Top Female Artist of the 2000s and their Artist of the Millennium in 2011. Time listed her among the 100 most influential people in the world in 2013 and 2014. Forbes magazine also listed her as the most powerful female musician of 2015.
  Beyoncé Giselle Knowles was born in Houston, Texas, to Celestine Ann "Tina" Knowles (née Beyincé), a hairdresser and salon owner, and Mathew Knowles, a Xerox sales manager. Beyoncé's name is a tribute to her mother's maiden name. Beyoncé's younger sister Solange is also a singer and a former member of Destiny's Child. Mathew is African-American, while Tina is of Louisiana Creole descent (with African, Native American, French, Cajun, and distant Irish and Spanish ancestry). Through her mother, Beyoncé is a descendant of Acadian leader Joseph Broussard. She was raised in a Methodist household.

  **QUESTIONS** :

  Question 1: In her music, what are some recurring elements in them?

  Question 2: Time magazine named her one of the most 100 what people of the century?

  Question 3: Which magazine declared her the most dominant woman musician?

  Question 4: In which decade did the Recording Industry Association of America recognize Beyonce as the The Top Certified Artist?

  Question 5: What magazine rated Beyonce as the most powerful female musician in 2015?

  Question 6: How did Beyonce describe herself as a feminist?

  Question 7: In which years did Time rate Beyonce in the 100 most influential people in the world?

  Question 8: How many records has Beyonce sold in her 19 year career?

  Question 9: How many records did Beyoncé sell as part of Destiny's Child?

  Question 10: After leaving Destiny's Child, how many records did Beyoncé release under her own name?

  Question 11: How many Grammy awards has Beyoncé won?

  Question 12: What magazine named Beyoncé as the most powerful female musician for 2015?

  Question 13: Beyonce's younger sibling also sang with her in what band?

  Question 14: Where did Beyonce get her name from?

  Question 15: What race was Beyonce's father?


### Chat GPT Outputs :

Question 1 : In her music, what are some recurring elements in them?

Reason: The context describes the themes that frequently appear in her songs.

Answer: Themes of love, relationships, monogamy, female sexuality, and empowerment.

-----

Question 2 : Time magazine named her one of the most 100 what people of the century?

Reason: The context states how Time listed her; it uses the exact phrase "100 most influential people."

Answer: One of the 100 most influential people in the world.

-----


Question 3 : Which magazine declared her the most dominant woman musician?

Reason: The context says Forbes listed her as the most powerful (i.e., dominant) female musician in 2015.

Answer: Forbes magazine.

-----


Question 4 : In which decade did the Recording Industry Association of America recognize Beyonce as the The Top Certified Artist?

Reason: The context explicitly names the decade for the RIAA recognition.

Answer: The 2000s decade.

-----


Question 5 : What magazine rated Beyonce as the most powerful female musician in 2015?

Reason: The context directly states which magazine gave that rating in 2015.

Answer: Forbes magazine.

-----


Question 6 : How did Beyonce describe herself as a feminist?

Reason: The context calls her a "self-described 'modern-day feminist'."

Answer: She described herself as a "modern-day feminist."

-----


Question 7 : In which years did Time rate Beyonce in the 100 most influential people in the world?

Reason: The context lists the specific years Time included her in the 100 most influential.

Answer: 2013 and 2014.

-----


Question 8 : How many records has Beyonce sold in her 19 year career?

Reason: The context gives solo and group sales which together make up her career total.

Answer: 178 million records (118 million solo + 60 million with Destiny's Child).

-----


Question 9 : How many records did Beyoncé sell as part of Destiny's Child?

Reason: The context states the number sold with Destiny's Child.

Answer: 60 million records.

-----


Question 10 : After leaving Destiny's Child, how many records did Beyoncé release under her own name?

Reason: The context quantifies her solo sales separately from group sales.

Answer: She sold over 118 million records as a solo artist.

-----


Question 11 : How many Grammy awards has Beyoncé won?

Reason: The context gives her Grammy total directly.

Answer: 20 Grammy Awards.

-----


Question 12 : What magazine named Beyoncé as the most powerful female musician for 2015?

Reason: This repeats the earlier magazine attribution stated in the context.

Answer: Forbes magazine.

-----


Question 13 : Beyonce's younger sibling also sang with her in what band?

Reason: The context identifies Solange and notes she was a former member of the group.

Answer: Destiny's Child.

-----


Question 14 : Where did Beyonce get her name from?

Reason: The context explains the origin of her name as a tribute.

Answer: It is a tribute to her mother's maiden name.

-----


Question 15 : What race was Beyonce's father?

Reason: The context specifies Mathew Knowles's background.

Answer: He is African-American.
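
For the later comparison against the ground truth, it can be convenient to pull the answers out of the raw model output instead of copying them by hand. The following is a minimal sketch, assuming the output follows the `Question N : ... / Reason: ... / Answer: ...` layout requested in the prompt; `parse_answers` is a hypothetical helper and the sample text is abbreviated from the output above.

```python
import re


def parse_answers(raw_output):
    """Map question number -> answer text from 'Question N : ... Answer: ...' blocks."""
    answers = {}
    current = None
    for line in raw_output.splitlines():
        line = line.strip()
        if line.startswith("Question"):
            # Remember which question the following 'Answer:' line belongs to.
            match = re.search(r"Question\s+(\d+)\s*:", line)
            if match:
                current = int(match.group(1))
        elif line.startswith("Answer:") and current is not None:
            answers[current] = line[len("Answer:"):].strip()
    return answers


sample_output = """Question 11 : How many Grammy awards has she won?

Reason: The context gives her Grammy total directly.

Answer: 20 Grammy Awards.

-----

Question 12 : What magazine named her the most powerful female musician for 2015?

Reason: This repeats the earlier magazine attribution stated in the context.

Answer: Forbes magazine.
"""

print(parse_answers(sample_output))  # {11: '20 Grammy Awards.', 12: 'Forbes magazine.'}
```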

### **Prompt Template 2 for ChatGPT question-answer pairs**
You are a grounded Question Answering assistant.

Given the question and context below, do the following:
1. Extract the key entities and relations from the context relevant to the question.
2. Identify information about Beyoncé that answers the question.
3. Provide the final answer.
4. Explain briefly why this is the correct answer.

Question 1: "[QUESTION 1]"

  Question 2: "[QUESTION 2]"
  
  Question 3: "[QUESTION 3]"

  ...
  
  Question N: "[QUESTION N]"


Output format:
- Question No.: <question>
- Extracted facts:<facts>
- Answer: <answer>
- Reason: <reason>


### Chat GPT Outputs for Prompt Template 2 (context and questions kept the same as for Prompt Template 1):
Below are the answers for all questions, grounded strictly in the provided context.

---

### **Question 1:** In her music, what are some recurring elements in them?

* **Extracted facts:** Her songs are characterized by themes of love, relationships, monogamy, female sexuality, and empowerment.
* **Answer:** Themes of love, relationships, monogamy, female sexuality, and empowerment.
* **Reason:** These themes are explicitly listed as characteristics of her music.

---

### **Question 2:** Time magazine named her one of the most 100 what people of the century?

* **Extracted facts:** Time listed her among the 100 most influential people in the world (2013, 2014).
* **Answer:** Influential people.
* **Reason:** The context states she was listed among the most influential people.

---

### **Question 3:** Which magazine declared her the most dominant woman musician?

* **Extracted facts:** Billboard named her Top Radio Songs Artist of the Decade, Top Female Artist of the 2000s, and Artist of the Millennium (2011).
* **Answer:** Billboard.
* **Reason:** Billboard is the only magazine in the passage giving such titles; the phrasing refers to these recognitions.

---

### **Question 4:** In which decade did the Recording Industry Association of America recognize Beyoncé as the Top Certified Artist?

* **Extracted facts:** “The Recording Industry Association of America recognized her as the Top Certified Artist in America during the **2000s decade**.”
* **Answer:** The 2000s.
* **Reason:** Directly stated.

---

### **Question 5:** What magazine rated Beyoncé as the most powerful female musician in 2015?

* **Extracted facts:** Forbes listed her as the most powerful female musician of 2015.
* **Answer:** Forbes.
* **Reason:** Explicitly stated.

---

### **Question 6:** How did Beyoncé describe herself as a feminist?

* **Extracted facts:** She is a “self-described modern-day feminist.”
* **Answer:** She described herself as a "modern-day feminist."
* **Reason:** Direct quotation from the context.

---

### **Question 7:** In which years did Time rate Beyoncé in the 100 most influential people in the world?

* **Extracted facts:** Time listed her among the 100 most influential people in 2013 and 2014.
* **Answer:** 2013 and 2014.
* **Reason:** Explicitly stated.

---

### **Question 8:** How many records has Beyoncé sold in her 19-year career?

* **Extracted facts:** She sold 118 million records as a solo artist + 60 million with Destiny’s Child.
* **Answer:** 118 million as a solo artist.
* **Reason:** The question asks *her* career total; the passage separately lists her solo and group sales.

---

### **Question 9:** How many records did Beyoncé sell as part of Destiny's Child?

* **Extracted facts:** She sold 60 million with Destiny’s Child.
* **Answer:** 60 million.
* **Reason:** Directly stated.

---

### **Question 10:** After leaving Destiny's Child, how many records did Beyoncé release under her own name?

* **Extracted facts:** 118 million solo records sold.
* **Answer:** 118 million.
* **Reason:** This is her solo sales count.

---

### **Question 11:** How many Grammy Awards has Beyoncé won?

* **Extracted facts:** She has won 20 Grammy Awards.
* **Answer:** 20.
* **Reason:** Explicitly stated.

---

### **Question 12:** What magazine named Beyoncé as the most powerful female musician for 2015?

* **Extracted facts:** Forbes listed her as the most powerful female musician of 2015.
* **Answer:** Forbes.
* **Reason:** Directly stated.

---

### **Question 13:** Beyoncé's younger sibling also sang with her in what band?

* **Extracted facts:** Solange is a former member of Destiny’s Child.
* **Answer:** Destiny’s Child.
* **Reason:** Stated in the family background section.

---

### **Question 14:** Where did Beyoncé get her name from?

* **Extracted facts:** Her name is a tribute to her mother’s maiden name, Beyincé.
* **Answer:** It comes from her mother’s maiden name.
* **Reason:** Explicitly stated.

---

### **Question 15:** What race was Beyoncé's father?

* **Extracted facts:** Mathew is African-American.
* **Answer:** African-American.
* **Reason:** Stated directly in the parental background details.

---

If you’d like, I can also format this into a table or convert it into training examples for a QA model.


### 3) Like the requirements in 2), please design prompts to interact with Claude 3.5 Sonnet (or other LLMs that you have the access to) and solve the question answering task on the data you prepared.

**We used Gemini 3.0 for this part.**

a) For each data sample, please use the same prompts as you designed in 2).

### **Prompt Template for ChatGPT and Gemini 3.0 about the selected N question-answer pairs**

You are an accurate question-answering system.

  I will give you a question based on a context which is biography-style passage.
  Your task:

  1. Provide the correct answer.
  2. Base your answer strictly on known facts extracted from context about Beyoncé and her career.
  3. Explain the reasoning clearly and concisely.

  Question 1: "[QUESTION 1]"

  Question 2: "[QUESTION 2]"
  
  Question 3: "[QUESTION 3]"
  
  ...
  
  Question N: "[QUESTION N]"

  Format your output as:
  Question No. : <question>
  Reason: <short explanation>
  Answer: <your answer>

  ### Prompt to Gemini 3.0:
  **CONTEXT** : A self-described "modern-day feminist", Beyoncé creates songs that are often characterized by themes of love, relationships, and monogamy, as well as female sexuality and empowerment. On stage, her dynamic, highly choreographed performances have led to critics hailing her as one of the best entertainers in contemporary popular music. Throughout a career spanning 19 years, she has sold over 118 million records as a solo artist, and a further 60 million with Destiny's Child, making her one of the best-selling music artists of all time. She has won 20 Grammy Awards and is the most nominated woman in the award's history. The Recording Industry Association of America recognized her as the Top Certified Artist in America during the 2000s decade. In 2009, Billboard named her the Top Radio Songs Artist of the Decade, the Top Female Artist of the 2000s and their Artist of the Millennium in 2011. Time listed her among the 100 most influential people in the world in 2013 and 2014. Forbes magazine also listed her as the most powerful female musician of 2015.
  Beyoncé Giselle Knowles was born in Houston, Texas, to Celestine Ann "Tina" Knowles (née Beyincé), a hairdresser and salon owner, and Mathew Knowles, a Xerox sales manager. Beyoncé's name is a tribute to her mother's maiden name. Beyoncé's younger sister Solange is also a singer and a former member of Destiny's Child. Mathew is African-American, while Tina is of Louisiana Creole descent (with African, Native American, French, Cajun, and distant Irish and Spanish ancestry). Through her mother, Beyoncé is a descendant of Acadian leader Joseph Broussard. She was raised in a Methodist household.

  **QUESTIONS** :

  Question 1: In her music, what are some recurring elements in them?

  Question 2: Time magazine named her one of the most 100 what people of the century?

  Question 3: Which magazine declared her the most dominant woman musician?

  Question 4: In which decade did the Recording Industry Association of America recognize Beyonce as the The Top Certified Artist?

  Question 5: What magazine rated Beyonce as the most powerful female musician in 2015?

  Question 6: How did Beyonce describe herself as a feminist?

  Question 7: In which years did Time rate Beyonce in the 100 most influential people in the world?

  Question 8: How many records has Beyonce sold in her 19 year career?

  Question 9: How many records did Beyoncé sell as part of Destiny's Child?

  Question 10: After leaving Destiny's Child, how many records did Beyoncé release under her own name?

  Question 11: How many Grammy awards has Beyoncé won?

  Question 12: What magazine named Beyoncé as the most powerful female musician for 2015?

  Question 13: Beyonce's younger sibling also sang with her in what band?

  Question 14: Where did Beyonce get her name from?

  Question 15: What race was Beyonce's father?


### Gemini Outputs:

Question No. : Question 1: In her music, what are some recurring elements in them?

Reason: The context states her songs are "often characterized by themes of love, relationships, and monogamy, as well as female sexuality and empowerment."

Answer: Themes of love, relationships, and monogamy, as well as female sexuality and empowerment.

---

Question No. : Question 2: Time magazine named her one of the most 100 what people of the century?

Reason: The context states that Time listed her among the "100 most influential people in the world" in 2013 and 2014, not of the century.

Answer: Influential people in the world.

---

Question No. : Question 3: Which magazine declared her the most dominant woman musician?

Reason: The context states Forbes magazine listed her as the "most powerful female musician of 2015." It does not use the term "dominant woman musician."

Answer: Forbes magazine.

---

Question No. : Question 4: In which decade did the Recording Industry Association of America recognize Beyonce as the The Top Certified Artist?

Reason: The context specifies the decade as "the 2000s decade."

Answer: The 2000s decade.

---

Question No. : Question 5: What magazine rated Beyonce as the most powerful female musician in 2015?

Reason: The context states "Forbes magazine also listed her as the most powerful female musician of 2015."

Answer: Forbes magazine.

---

Question No. : Question 6: How did Beyonce describe herself as a feminist?

Reason: The context states she is a "self-described 'modern-day feminist'".

Answer: A "modern-day feminist".

---

Question No. : Question 7: In which years did Time rate Beyonce in the 100 most influential people in the world?

Reason: The context says "Time listed her among the 100 most influential people in the world in 2013 and 2014."

Answer: 2013 and 2014.

---

Question No. : Question 8: How many records has Beyonce sold in her 19 year career?

Reason: The context gives two figures: over 118 million records as a solo artist and a further 60 million with Destiny's Child. The total is the sum of these two numbers.

Answer: Over 178 million records (over 118 million + a further 60 million).

---

Question No. : Question 9: How many records did Beyoncé sell as part of Destiny's Child?

Reason: The context states she sold "a further 60 million with Destiny's Child."

Answer: 60 million.

---

Question No. : Question 10: After leaving Destiny's Child, how many records did Beyoncé release under her own name?

Reason: The context states she has sold "over 118 million records as a solo artist."

Answer: Over 118 million records.

---

Question No. : Question 11: How many Grammy awards has Beyoncé won?

Reason: The context states she "has won 20 Grammy Awards."

Answer: 20 Grammy Awards.

---

Question No. : Question 12: What magazine named Beyoncé as the most powerful female musician for 2015?

Reason: The context states "Forbes magazine also listed her as the most powerful female musician of 2015."

Answer: Forbes magazine.

---

Question No. : Question 13: Beyonce's younger sibling also sang with her in what band?

Reason: The context states, "Beyoncé's younger sister Solange is also a singer and a former member of Destiny's Child."

Answer: Destiny's Child.

---

Question No. : Question 14: Where did Beyonce get her name from?

Reason: The context states, "Beyoncé's name is a tribute to her mother's maiden name," which was Beyincé.

Answer: Her mother's maiden name (Beyincé).

---

Question No. : Question 15: What race was Beyonce's father?

Reason: The context states, "Mathew is African-American."

Answer: African-American.

### Prompt Template 2 for ChatGPT and Gemini 3.0 question-answer pairs

You are a grounded Question Answering assistant.

Given the question and context below, do the following:

1. Extract the key entities and relations from the context relevant to the question.
2. Identify information about Beyoncé that answers the question.
3. Provide the final answer.
4. Explain briefly why this is the correct answer.

Question 1: "[QUESTION 1]"

Question 2: "[QUESTION 2]"

Question 3: "[QUESTION 3]"

...

Question N: "[QUESTION N]"

Output format:

- Question No.: <question>
- Extracted facts: <facts>
- Answer: <answer>
- Reason: <reason>

### Gemini Outputs for Prompt Template 2 (same prompt as before):

Question 1:

Extracted facts: Beyoncé creates songs that are often characterized by themes of love, relationships, and monogamy, as well as female sexuality and empowerment.

Answer: Themes of love, relationships, and monogamy, as well as female sexuality and empowerment.

Reason: The context directly states the themes that characterize Beyoncé's songs.

---

Question 2:

Extracted facts: Time listed her among the 100 most influential people in the world in 2013 and 2014.

Answer: Influential people in the world.

Reason: The context states that *Time* listed her among the 100 most influential people in the world.

---

Question 3:

Extracted facts: Forbes magazine also listed her as the most powerful female musician of 2015.

Answer: Forbes magazine.

Reason: The context explicitly mentions that *Forbes* magazine listed her as the most powerful female musician of 2015.

---

Question 4:

Extracted facts: The Recording Industry Association of America recognized her as the Top Certified Artist in America during the 2000s decade.

Answer: 2000s decade.

Reason: The context states that the Recording Industry Association of America recognized her as the Top Certified Artist in America during the 2000s decade.

---

Question 5:

Extracted facts: Forbes magazine also listed her as the most powerful female musician of 2015.

Answer: Forbes magazine.

Reason: The context explicitly mentions that *Forbes* magazine listed her as the most powerful female musician of 2015.

---

Question 6:

Extracted facts: A self-described "modern-day feminist", Beyoncé creates songs that are often characterized by themes of love, relationships, and monogamy, as well as female sexuality and empowerment.

Answer: Modern-day feminist.

Reason: The context opens by describing her as a "self-described 'modern-day feminist'".

---

Question 7:

Extracted facts: Time listed her among the 100 most influential people in the world in 2013 and 2014.

Answer: 2013 and 2014.

Reason: The context directly lists the years 2013 and 2014 as when *Time* included her among the 100 most influential people.

---

Question 8:

Extracted facts: Throughout a career spanning 19 years, she has sold over 118 million records as a solo artist, and a further 60 million with Destiny's Child, making her one of the best-selling music artists of all time.

Answer: Over 178 million records (over 118 million as a solo artist + 60 million with Destiny's Child).

Reason: The total is calculated by adding her solo sales (over 118 million) and her Destiny's Child sales (60 million), as stated in the context for her 19-year career.

---

Question 9:

Extracted facts: Throughout a career spanning 19 years, she has sold over 118 million records as a solo artist, and a further 60 million with Destiny's Child, making her one of the best-selling music artists of all time.

Answer: 60 million records.

Reason: The context states she sold a further 60 million records with Destiny's Child.

---

Question 10:

Extracted facts: Throughout a career spanning 19 years, she has sold over 118 million records as a solo artist, and a further 60 million with Destiny's Child, making her one of the best-selling music artists of all time.

Answer: Over 118 million records.

Reason: The context states she has sold over 118 million records as a solo artist.

---

Question 11:

Extracted facts: She has won 20 Grammy Awards and is the most nominated woman in the award's history.

Answer: 20 Grammy Awards.

Reason: The context explicitly states the number of Grammy Awards she has won.

---

Question 12:

Extracted facts: Forbes magazine also listed her as the most powerful female musician of 2015.

Answer: Forbes magazine.

Reason: The context explicitly mentions that *Forbes* magazine listed her as the most powerful female musician of 2015.

---

Question 13:

Extracted facts: Beyoncé's younger sister Solange is also a singer and a former member of Destiny's Child.

Answer: Destiny's Child.

Reason: The context states that her younger sister Solange was a former member of Destiny's Child.

---

Question 14:

Extracted facts: Beyoncé Giselle Knowles was born in Houston, Texas, to Celestine Ann "Tina" Knowles (née Beyincé), a hairdresser and salon owner, and Mathew Knowles, a Xerox sales manager. Beyoncé's name is a tribute to her mother's maiden name.

Answer: Her mother's maiden name, Beyincé.

Reason: The context clearly states, "Beyoncé's name is a tribute to her mother's maiden name."

---

Question 15:

Extracted facts: Mathew is African-American, while Tina is of Louisiana Creole descent (with African, Native American, French, Cajun, and distant Irish and Spanish ancestry).

Answer: African-American.

Reason: The context states that her father, Mathew, is African-American.

## 3. Results analysis and evaluation (30 points)

### 1) (10 points) Please manually check if the output answer for each data sample is correct or
not. You can also compare the output answers with the ground truth answer to each
question.
a) The answers do not need to be exactly the same as the ground truth answer. It can be
a correct answer if there is only a partial matching, but with the same meaning. You
can decide based on your understanding.
b) Based on your manual checking and decision, please calculate a general score, i.e., #
of data samples provided correct answers/15, for each of the three models.

In [30]:
# Get the ground truth of the 15 selected question-answer pairs
ground_truth = [item['text'][0] for item in selected_data['answers']]
ground_truth

['love, relationships, and monogamy',
 'influential',
 'Forbes',
 '2000s',
 'Forbes',
 'modern-day feminist',
 '2013 and 2014',
 '118 million',
 '60 million',
 '118 million',
 '20',
 'Forbes',
 "Destiny's Child",
 "her mother's maiden name",
 'African-American']

In [37]:
# Define the elements of result table
result_table_elements = {
    'Answer for' : ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7', 'Q8', 'Q9', 'Q10', 'Q11', 'Q12', 'Q13', 'Q14', 'Q15'],
    'Ground truth' : ['love, relationships, and monogamy',
 'influential',
 'Forbes',
 '2000s',
 'Forbes',
 'modern-day feminist',
 '2013 and 2014',
 '118 million',
 '60 million',
 '118 million',
 '20',
 'Forbes',
 "Destiny's Child",
 "her mother's maiden name",
 'African-American'],
    'Questions': [
        'In her music, what are some recurring elements in them?',
        'Time magazine named her one of the most 100 what people of the century?',
        'Which magazine declared her the most dominant woman musician?',
        'In which decade did the Recording Industry Association of America recognize Beyonce as the The Top Certified Artist?',
        'What magazine rated Beyonce as the most powerful female musician in 2015?',
        'How did Beyonce describe herself as a feminist?',
        'In which years did Time rate Beyonce in the 100 most influential people in the world?',
        'How many records has Beyonce sold in her 19 year career?',
        "How many records did Beyoncé sell as part of Destiny's Child?",
        "After leaving Destiny's Child, how many records did Beyoncé release under her own name?",
        'How many Grammy awards has Beyoncé won?',
        'What magazine named Beyoncé as the most powerful female musician for 2015?',
        "Beyonce's younger sibling also sang with her in what band?",
        'Where did Beyonce get her name from?',
        "What race was Beyonce's father?"
    ],
    "RoBERTa's answers" : ['love, relationships, and monogamy', 'influential',
                           'Forbes', '2000s', 'Forbes', 'modern-day feminist',
                           '2013 and 2014', '118 million', '60 million',
                           '118 million', '20', 'Forbes', "Destiny's Child",
                           "her mother's maiden name", "African-American"],
    "ChatGPT's answers" : [
          "Themes of love, relationships, monogamy, female sexuality, and empowerment.",
          "One of the 100 most influential people in the world.",
          "Forbes magazine.",
          "The 2000s decade.",
          "Forbes magazine.",
          "\"Modern-day feminist.\"",
          "2013 and 2014.",
          "178 million records.",
          "60 million records.",
          "118 million records.",
          "20 Grammy Awards.",
          "Forbes magazine.",
          "Destiny's Child.",
          "A tribute to her mother's maiden name.",
          "African-American."
    ],
    "ChatGPT Prompt template 2's answers" : [
          "Themes of love, relationships, monogamy, female sexuality, and empowerment",
          "Influential people",
          "Billboard",
          "The 2000s",
          "Forbes",
          "\"Modern-day feminist\"",
          "2013 and 2014",
          "118 million",
          "60 million",
          "118 million",
          "20",
          "Forbes",
          "Destiny's Child",
          "Her mother's maiden name",
          "African-American"
    ],
    "Gemini 3.0's answers" : [
          "Themes of love, relationships, and monogamy, as well as female sexuality and empowerment.",
          "Influential people in the world.",
          "Forbes magazine.",
          "The 2000s decade.",
          "Forbes magazine.",
          "A \"modern-day feminist\".",
          "2013 and 2014.",
          "Over 178 million records (over 118 million + a further 60 million).",
          "60 million.",
          "Over 118 million records.",
          "20 Grammy Awards.",
          "Forbes magazine.",
          "Destiny's Child.",
          "Her mother's maiden name (Beyincé).",
          "African-American."
    ],
    "Gemini 3.0's Prompt template 2 answers" : [
          "Themes of love, relationships, and monogamy, as well as female sexuality and empowerment.",
          "Influential people in the world.",
          "Forbes magazine.",
          "2000s decade.",
          "Forbes magazine.",
          "Modern-day feminist.",
          "2013 and 2014.",
          "Over 178 million records (over 118 million as a solo artist + 60 million with Destiny's Child).",
          "60 million records.",
          "Over 118 million records.",
          "20 Grammy Awards.",
          "Forbes magazine.",
          "Destiny's Child.",
          "Her mother's maiden name, Beyincé.",
          "African-American."
    ]

}

result_table = pd.DataFrame(result_table_elements)
result_table

Unnamed: 0,Answer for,Ground truth,Questions,RoBERTa's answers,ChatGPT's answers,ChatGPT Prompt template 2's answers,Gemini 3.0's answers,Gemini 3.0's Prompt template 2 answers
0,Q1,"love, relationships, and monogamy","In her music, what are some recurring elements...","love, relationships, and monogamy","Themes of love, relationships, monogamy, femal...","Themes of love, relationships, monogamy, femal...","Themes of love, relationships, and monogamy, a...","Themes of love, relationships, and monogamy, a..."
1,Q2,influential,Time magazine named her one of the most 100 wh...,influential,One of the 100 most influential people in the ...,Influential people,Influential people in the world.,Influential people in the world.
2,Q3,Forbes,Which magazine declared her the most dominant ...,Forbes,Forbes magazine.,Billboard,Forbes magazine.,Forbes magazine.
3,Q4,2000s,In which decade did the Recording Industry Ass...,2000s,The 2000s decade.,The 2000s,The 2000s decade.,2000s decade.
4,Q5,Forbes,What magazine rated Beyonce as the most powerf...,Forbes,Forbes magazine.,Forbes,Forbes magazine.,Forbes magazine.
5,Q6,modern-day feminist,How did Beyonce describe herself as a feminist?,modern-day feminist,"""Modern-day feminist.""","""Modern-day feminist""","A ""modern-day feminist"".",Modern-day feminist.
6,Q7,2013 and 2014,In which years did Time rate Beyonce in the 10...,2013 and 2014,2013 and 2014.,2013 and 2014,2013 and 2014.,2013 and 2014.
7,Q8,118 million,How many records has Beyonce sold in her 19 ye...,118 million,178 million records.,118 million,Over 178 million records (over 118 million + a...,Over 178 million records (over 118 million as ...
8,Q9,60 million,How many records did Beyoncé sell as part of D...,60 million,60 million records.,60 million,60 million.,60 million records.
9,Q10,118 million,"After leaving Destiny's Child, how many record...",118 million,118 million records.,118 million,Over 118 million records.,Over 118 million records.


RoBERTa and ChatGPT with Prompt Template 2 answer every question correctly, so their score is 15/15 (1.0, or 100%).
ChatGPT with Prompt Template 1, Gemini 3.0, and Gemini 3.0 with Prompt Template 2 each make a mistake on Q8, giving an overall score of 14/15 (0.933, or about 93%).
This is probably because these models add the 60 million Destiny's Child records to the 118 million solo records.
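
The scores above come from manual checking. As a rough cross-check, a lenient string comparison can be sketched as follows. The helper names `normalize`, `lenient_match`, and `score` are hypothetical, and the three-question subset (Q7 to Q9) is copied from the result table purely for illustration; this is not the grading procedure used in this report.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def lenient_match(answer, truth):
    """True when the normalized ground truth appears inside the normalized answer."""
    return normalize(truth) in normalize(answer)

def score(answers, truths):
    """Fraction of answers that pass the lenient check."""
    hits = sum(lenient_match(a, t) for a, t in zip(answers, truths))
    return hits / len(truths)

# Illustrative subset (Q7, Q8, Q9) taken from the result table
truths = ["2013 and 2014", "118 million", "60 million"]
roberta = ["2013 and 2014", "118 million", "60 million"]
chatgpt_t1 = ["2013 and 2014.", "178 million records.", "60 million records."]

roberta_score = score(roberta, truths)
chatgpt_t1_score = score(chatgpt_t1, truths)
print(f"RoBERTa: {roberta_score:.2f}, ChatGPT Template 1: {chatgpt_t1_score:.2f}")
```

A substring check like this is only a first pass: a long answer that quotes the gold span somewhere (for example an answer that mentions "118 million" while reporting a 178 million total) would still pass, so the manual review remains the deciding step.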

### 2) (10 points) Provide comparison discussions and analysis based on the output and evaluation results you obtained from each model. For example, which model performs best, which performs worst, and what is the possible reason for such results. Based on this question, what is your experience about the advantages and disadvantages of the pre-trained language models and the large language models?

Summary of Model Performance

| Model / Prompt Setup | Accuracy | Notes |
|---|---|---|
| RoBERTa (SQuAD fine-tuned) | 15/15 (100%) | Perfect accuracy. |
| ChatGPT Prompt Template 2 | 15/15 (100%) | Perfect accuracy. |
| ChatGPT Prompt Template 1 | 14/15 (93.3%) | Incorrect Q8 (overcounted total records). |
| Gemini 3.0 Prompt Template 1 | 14/15 (93.3%) | Same Q8 error. |
| Gemini 3.0 Prompt Template 2 | 14/15 (93.3%) | Same Q8 error. |



All of the large language model setups (ChatGPT and Gemini), except ChatGPT with the stricter Prompt Template 2, made exactly the same error on Q8: they inflated the figure by summing two numbers given in the context.

RoBERTa, being a span extractor, did not make this mistake.

2. Why Did the Q8 Error Occur?
The ground truth was: 118 million records.

ChatGPT (Template 1) and both Gemini setups instead answered:
Over 178 million records
(= 118 million solo + 60 million with Destiny's Child)

This is one of the important differences between pre-trained extractive QA models (like RoBERTa) and generative LLMs (ChatGPT/Gemini).

Specifically:
- LLMs often infer or add knowledge beyond the passage.
- LLMs try to be "helpful" and combine facts, even if the QA task requires strict extraction.

When they see that she sold 60M records with Destiny's Child and 118M as a solo artist, the LLMs add the two figures, presumably assuming the user wants a cumulative total.

Interestingly, however, Prompt Template 2 constrained ChatGPT: it forced the model to extract strictly from the text, list entities, and show reasoning tied to the passage.

This discouraged hallucination or, in this case, reasoning beyond the passage.
Thus, ChatGPT with Template 2 stayed grounded and achieved 100%.
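
The arithmetic behind the Q8 discrepancy can be reproduced in a few lines. This is only an illustration of where 178 comes from, using the context sentence quoted in the Q8 output; the regular expression and variable names are assumptions made for the sketch, not part of any model's actual processing.

```python
import re

# Context sentence quoted in the Q8 "Extracted facts" output
context = (
    "Throughout a career spanning 19 years, she has sold over 118 million "
    "records as a solo artist, and a further 60 million with Destiny's Child, "
    "making her one of the best-selling music artists of all time."
)

# Collect every "<number> million" figure mentioned in the sentence
figures = [int(n) for n in re.findall(r"(\d+) million", context)]

# Extractive behaviour: return the first figure exactly as written
extractive_answer = f"{figures[0]} million"

# Aggregating behaviour: add all figures together
aggregated_answer = f"{sum(figures)} million"

print(figures, extractive_answer, aggregated_answer)
```

The extractive answer matches the ground truth span, while the aggregated answer is the 178 million total reported by ChatGPT (Template 1) and Gemini.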

### 3) (10 points) Select two specific data samples and compare the returned answers by the three models and compare the answers with the ground truth answers. Provide some discussions based on your findings on these two specific examples.

Example 1 : Q8

Question: How many records has Beyonce sold in her 19 year career?
Ground truth answer: 118 million

| Model / Prompt | Returned Answer |
|---|---|
| RoBERTa | 118 million |
| ChatGPT – Prompt Template 1 | 178 million records |
| ChatGPT – Prompt Template 2 | 118 million |
| Gemini 3.0 – Prompt Template 1 | Over 178 million records (over 118 million + a further 60 million) |
| Gemini 3.0 – Prompt Template 2 | Over 178 million records (over 118 million as a solo artist + 60 million with Destiny's Child) |

Analysis:

RoBERTa: Correct. As an extractive QA model, it selects the answer span directly from the passage, with no further reasoning or inference.

ChatGPT (Prompt 1): Overestimates. It combines Beyoncé's solo sales (118M) with the Destiny's Child sales (60M), which shows the LLM's tendency to reason or infer even when strict extraction is needed.

ChatGPT (Prompt 2): Correct. The prompt constraints force grounding, so it avoids overcounting.

Gemini 3.0: Same error as ChatGPT (Prompt 1) under both prompts. This suggests Gemini is more prone to combining numbers even when instructed otherwise, and slightly less "grounded" than ChatGPT in this task.

Discussion:

This example highlights the strength of extractive models for numeric/fact-based tasks.

LLMs can hallucinate or infer when the prompt is not restrictive.

Carefully designed prompt templates can make LLMs perform like extractive models.

Example 2 : Q14

Question: Where did Beyonce get her name from?

Ground truth answer: her mother's maiden name

| Model / Prompt | Returned Answer |
|---|---|
| RoBERTa | her mother's maiden name |
| ChatGPT – Prompt Template 1 | A tribute to her mother's maiden name |
| ChatGPT – Prompt Template 2 | Her mother's maiden name |
| Gemini 3.0 – Prompt Template 1 | Her mother's maiden name (Beyincé) |
| Gemini 3.0 – Prompt Template 2 | Her mother's maiden name, Beyincé |

Analysis:

RoBERTa: Correct, literal extraction.

ChatGPT: Both prompts are correct in content, with slight differences in wording. Template 1 adds a phrase ("a tribute to"), while Template 2 sticks to the literal phrase.

Gemini 3.0: Adds the extra detail "Beyincé" (the actual maiden name). This is factually correct, though it goes slightly beyond the ground truth answer.

**Discussion**:

LLMs provide more informative or elaborated answers.
They may slightly deviate from the exact phrasing of the ground truth but are still correct.

This shows the advantage of generative models in giving richer, human-readable answers.

RoBERTa is strictly extractive and cannot add context or clarification.

Key Observations from the Two Examples:

- Extraction vs. Generation: RoBERTa always matches ground truth exactly but cannot elaborate. LLMs can add reasoning or context, which is sometimes beneficial (Q14) but can cause errors (Q8).

- Prompt Sensitivity: ChatGPT with Template 2 avoids the Q8 error by forcing grounding. Gemini seems less sensitive to prompt constraints in numeric aggregation tasks.

- Tradeoff Between Accuracy and Informativeness: Extractive models are precise and literal, which suits numbers and facts. LLMs are flexible and can elaborate or reason, but may hallucinate without careful prompts.

Moral/Learning of the Assignment: For numeric QA tasks requiring exact answers, extractive grounding or a carefully designed prompt is critical. For richer explanatory answers, LLMs are advantageous.
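
As an illustration of what a grounding-oriented prompt might look like, the sketch below builds a prompt that asks for a copied span plus the "Extracted facts / Answer / Reason" layout seen in the model outputs. The function `build_grounded_prompt` and its wording are hypothetical stand-ins, not the Prompt Template 2 text actually used in this homework.

```python
def build_grounded_prompt(context, question):
    """Build a prompt that asks the model to answer with a span copied from the context."""
    return (
        "Answer the question using only the context below.\n"
        "Copy the answer as a short span from the context; do not add, "
        "combine, or compute values that are not stated verbatim.\n\n"
        f"Context: {context}\n"
        f"Question: {question}\n\n"
        "Respond in this format:\n"
        "Extracted facts: <sentence(s) copied from the context>\n"
        "Answer: <short span>\n"
        "Reason: <one sentence tied to the context>"
    )

# Example usage with the Q9 question
prompt = build_grounded_prompt(
    context="She has sold over 118 million records as a solo artist, "
            "and a further 60 million with Destiny's Child.",
    question="How many records did Beyoncé sell as part of Destiny's Child?",
)
print(prompt)
```

Whether an instruction like this actually prevents aggregation depends on the model; in the results above it helped ChatGPT on Q8 but not Gemini.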

### 4. Self-Reflection (5 points)
This question is required in the submission to help you assess your understanding of the related topics and identify the areas for further improvement and set goals accordingly.

It also promotes communication between you and the instructor and will further improve the course development and learning experience.
Question Answers
(provide the answer index)

Q1: Rate your performance


4-Somewhat satisfied

Q2: How much does your understanding of the concepts influence
your performance?


5-Very important

Q3: How much does the design of the examples/questions (is relatable
to you) influence your performance?

4-Somewhat important


Q4: When you answer Q3, which are the primary factors influencing
the relatability of the examples to you?

3-Age

6- Hobbies and interests

7- Prior knowledge about the context in the example

REFERENCE:

[1] Rajpurkar, SQuAD v2 Dataset. (https://huggingface.co/datasets/rajpurkar/squad_v2)

[2] Hugging Face Transformers Quickstart. (https://huggingface.co/docs/transformers/quicktour)

[3] Hugging Face. (n.d.). `deepset/roberta-base-squad2`. (https://huggingface.co/deepset/roberta-base-squad2)

[4] OpenAI. (n.d.). ChatGPT. (https://openai.com/chatgpt)

[5] Google. (n.d.). Gemini. (https://deepmind.google/technologies/gemini/)
