Name: Fabio Grassiotto  
RA: 890441

# Class 10_11 - RAGAS Implementation

-Implement RAGAS with LLaMA-3 70B to evaluate the quality of the 50 IIRC annotations used in the previous exercise.  
-RAGAS works with the context, question, and answer keys, all of which are available in the IIRC test set.  
Optional:  
-Evaluate the answers from the class 9_10 exercise  
-Use multiple agents  
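
As a rough illustration of the mapping described above, the sketch below turns one IIRC test item into a flat record of the kind RAGAS consumes. The column names `question`, `contexts`, `answer` and `ground_truth` are an assumption based on the RAGAS documentation and may differ between versions. `iirc_item_to_ragas_row` is a hypothetical helper, not part of RAGAS or IIRC, and the sample item is abbreviated from the dataset entry shown later in this notebook.

```python
def iirc_item_to_ragas_row(item, llm_answer=""):
    """Map one IIRC test item to a flat dict (hypothetical helper)."""
    answer = item["answer"]
    if answer["type"] in ("binary", "value"):
        ground_truth = answer["answer_value"]
    elif answer["type"] == "span":
        ground_truth = answer["answer_spans"][0]["text"]
    else:
        ground_truth = "none"
    return {
        # Collapse repeated whitespace in the question.
        "question": " ".join(item["question"].split()),
        # One string per annotated context snippet.
        "contexts": [c["text"].strip() for c in item["context"]],
        # Answer produced by the model under evaluation (assumed input).
        "answer": llm_answer,
        "ground_truth": ground_truth,
    }


# Abbreviated IIRC item, for illustration only.
sample_item = {
    "question": "Which stadium where Brunt played can hold more people, "
                "The Hawthorns or White Hart Lane?",
    "answer": {"type": "span", "answer_spans": [{"text": "White Hart Lane"}]},
    "context": [
        {"text": "at The Hawthorns"},
        {"text": "at White Hart Lane"},
    ],
}

row = iirc_item_to_ragas_row(sample_item, llm_answer="White Hart Lane")
print(row)
```

A list of such rows could then be collected into a tabular dataset before being handed to an evaluation library.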


## Setup Environment
### Install packages

In [1]:
%%capture
%pip install -q torch
%pip install groq
%pip install llama-index-llms-groq
%pip install langchain
%pip install langchain-groq
%pip install -U sentence-transformers
%pip install faiss-cpu
%pip install rank_bm25
%pip install bs4
%pip install spacy
!python -m spacy download en_core_web_sm
%pip install ragas

### Imports

In [2]:
import os
import sys
import json
import torch
from collections import Counter
import string
import re
from bs4 import BeautifulSoup
from tqdm import tqdm
import spacy
import pandas as pd
import warnings
warnings.simplefilter('ignore')

# Langchain
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from langchain.tools import tool
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate

# Sentence Transformers
from sentence_transformers import SentenceTransformer, CrossEncoder
import faiss

# Ragas
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas import evaluate

### Colab setup

In [3]:
# Colab environment
IN_COLAB = 'google.colab' in sys.modules

if (IN_COLAB):
    # Google Drive
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)

    project_folder="/content/drive/MyDrive/Classes/IA024/Aula_9_10"
    os.chdir(project_folder)
    !ls -la

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

cuda


## Globals

In [4]:
NUM_QUESTIONS = 50
LLM_MODEL_NAME="llama3-70b-8192"
SEARCH_MODEL_NAME = "sentence-transformers/msmarco-MiniLM-L-6-v3"
CROSSENCODER_MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"

## IIRC Dataset

In [5]:
if not os.path.exists("dataset/context_articles.json"):
    !wget http://jamesf-incomplete-qa.s3.amazonaws.com/context_articles.tar.gz
    !tar -xzf context_articles.tar.gz --directory dataset
    !rm context_articles.tar.gz

In [6]:
test_dataset  = json.load(open('dataset/test_questions.json', 'r'))
articles = json.load(open('dataset/context_articles.json', 'r'))

In [7]:
test_dataset[5]

{'answer': {'type': 'span',
  'answer_spans': [{'text': 'White Hart Lane',
    'passage': 'Chris Brunt',
    'type': 'answer',
    'start': 622,
    'end': 637}]},
 'question': 'Which stadium where Brunt played can hold more people, The Hawthorns or White Hart Lane?',
 'context': [{'text': 'at The Hawthorns',
   'passage': 'main',
   'indices': [139, 155]},
  {'text': 'at White Hart Lane', 'passage': 'main', 'indices': [619, 637]},
  {'text': 'The Hawthorns is an all-seater football stadium in West Bromwich, West Midlands, England, with a capacity of 26,688',
   'passage': 'The Hawthorns',
   'indices': [0, 115]},
  {'text': ' Its capacity varied over the years; when changed to all-seater it had a capacity of 36,284 before demolition.',
   'passage': 'White Hart Lane',
   'indices': [132, 242]}],
 'question_links': ['The Hawthorns', 'White Hart Lane'],
 'title': 'Chris Brunt'}

### Load and parse

In [8]:
# Grab the first 50 questions with an answer
def grab_documents():

  questions_found = []
  num_questions_found = 0
  documents = []
  all_titles = []

  for item in test_dataset:
    question = item['question']
    answer = item['answer']
    answer_type = answer['type']

    if answer_type == 'binary' or answer_type == 'value':
      final_answer = answer['answer_value']
    elif answer_type == 'span':
      final_answer = answer['answer_spans'][0]['text']
    elif answer_type == 'none':
      final_answer = 'none'
    else:
      final_answer = 'An error perhaps, bad type'
      print(answer_type)

    if (final_answer == 'none'):
      # Skip this one.
      continue
    else:
      # That's a good question.
      
      # First add some extra info in the context-part
      for context in item['context']:
          # Cleanup html tags
          soup = BeautifulSoup(context['text'], 'html.parser')
          clean_text = soup.get_text()

          documents.append({
                  "title": item['title'].lower(),
                  "content": clean_text
            }
          )
          
      all_titles.append(item['title'].lower())

      # And then grab the text from the articles json 
      for link in item["question_links"]:
        if link.lower() in articles and link.lower() not in all_titles:
          # Cleanup html tags
          soup = BeautifulSoup(articles[link.lower()], 'html.parser')
          clean_text = soup.get_text()

          documents.append({
            "title": link,
            "content": clean_text
          })
        all_titles.append(link.lower())

      # Fix leading, trailing and excessive spaces between words.
      question = question.strip()
      question = " ".join(question.split())

      questions_found.append({"Id": num_questions_found, "Question": question.strip(), "Answer": final_answer.strip()})
      num_questions_found += 1
      if (num_questions_found == NUM_QUESTIONS):
        # found our questions
        break

  return questions_found, documents, all_titles

questions_to_ask, documents, all_titles = grab_documents()

### Sentence segmentation (first 3 sentences per document)

In [9]:
contents_list = []
nlp = spacy.load('en_core_web_sm')

for t in tqdm(documents):
    paragraph = t.get('content')
    paragraph = paragraph[:512] 

    sentences = list(sent.text for sent in nlp(paragraph).sents)
    sentences = sentences[:3]
    excerpt = ""
    for s in sentences:
        excerpt += s
    contents_list.append(excerpt)

100%|██████████| 192/192 [00:01<00:00, 124.90it/s]
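
The truncate-then-keep-three-sentences step above can be sketched without spaCy. This is a minimal illustration that swaps in a naive regex splitter, so it is not equivalent to the `en_core_web_sm` segmenter. `first_sentences` is a hypothetical name, and joining with a single space is a deliberate difference from the `excerpt += s` concatenation in the cell above.

```python
import re


def first_sentences(text, max_chars=512, max_sentences=3):
    """Truncate text, split on sentence-ending punctuation, keep the first few."""
    truncated = text[:max_chars]
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", truncated)
    sentences = [s.strip() for s in parts if s.strip()]
    return " ".join(sentences[:max_sentences])


sample = (
    "Zeus is the sky and thunder god. He rules Mount Olympus. "
    "His name is cognate with Jupiter. He is the child of Cronus."
)
excerpt = first_sentences(sample)
print(excerpt)
```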


### Inspecting data

In [10]:
len(questions_to_ask)

50

In [11]:
len(contents_list)

192

In [12]:
contents_list[:10]

['he Palici the sons of Zeus',
 'in Greek mythology',
 'Zeus (British English , North American English ; , Zeús ) is the sky and thunder god in ancient Greek religion',
 "Greek mythology is the body of myths originally told by the ancient Greeks and a genre of Ancient Greek folklore.These stories concern the origin and the nature of the world, the lives and activities of deities, heroes, and mythological creatures, and the origins and significance of the ancient Greeks' own cult and ritual practices.Modern scholars study the myths in an attempt to shed light on the religious and political institutions of ancient Greece and its civilization, and to gain understanding of the",
 'Zeus (British English , North American English ; , Zeús ) is the sky and thunder god in ancient Greek religion, who rules as king of the gods of Mount Olympus.His name is cognate with the first element of his Roman equivalent Jupiter.His mythologies and powers are similar, though not identical, to those of Indo-E

In [13]:
all_titles[:10]

['palici',
 'greek mythology',
 'zeus',
 'giovanni messe',
 'world war i',
 'giovanni messe',
 'world war i',
 'casino royale (1967 film)',
 'angela scoular',
 "on her majesty's secret service (film)"]

In [14]:
questions_to_ask[0]

{'Id': 0,
 'Question': 'What is Zeus know for in Greek mythology?',
 'Answer': 'sky and thunder god'}

## FAISS Indexing

In [15]:
model = SentenceTransformer(SEARCH_MODEL_NAME)
model.to(device)

embeddings = model.encode(contents_list, show_progress_bar=True)

Batches: 100%|██████████| 6/6 [00:00<00:00,  9.17it/s]


In [18]:
# Create a FAISS index from the embeddings
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
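
For intuition, `IndexFlatL2` performs an exhaustive (brute-force) search and reports squared L2 distances. The NumPy sketch below reproduces that computation on toy vectors. It is an illustration of the idea, not the FAISS implementation, and `flat_l2_search` is a hypothetical name.

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Exact k-nearest-neighbour search by squared L2 distance (brute force)."""
    # Pairwise squared distances, shape (n_queries, n_database).
    diffs = queries[:, None, :] - database[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Indices of the k smallest distances per query, nearest first.
    idx = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx


database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype="float32")
queries = np.array([[0.9, 0.1]], dtype="float32")
D, I = flat_l2_search(database, queries, k=2)
print(D, I)
```

As with `index.search`, the first return value holds distances and the second holds row indices into the indexed collection.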

## Langchain Groq API

In [19]:
def load_groq_key():
    try:
        # Open and read the entire content of the file
        with open("groq-key.txt", 'r') as file:
            contents = file.read()
        
        # Strip surrounding whitespace so a trailing newline is not sent as part of the key
        return contents.strip()
    
    except FileNotFoundError:
        print("The file does not exist.")
        return None
    except Exception as e:
        # Handle other potential exceptions (e.g., permission errors)
        print(f"An error occurred while reading the file: {str(e)}")
        return None
    
chat = ChatGroq(temperature=0, groq_api_key=load_groq_key(), model_name=LLM_MODEL_NAME)

In [20]:
system = "You are a helpful assistant."
human = "{text}"
prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)])

chain = prompt | chat
chain.invoke({"text": "Explain the importance of low latency LLMs."})

AIMessage(content='Low-latency Large Language Models (LLMs) are crucial in various applications where real-time or near-real-time processing is essential. Here are some reasons why low-latency LLMs are important:\n\n1. **Interactive Systems**: In applications like chatbots, virtual assistants, and conversational AI, low-latency LLMs enable rapid response times, creating a more natural and engaging user experience. This is particularly important in customer-facing applications where delayed responses can lead to frustration and abandonment.\n2. **Real-time Decision Making**: In areas like finance, healthcare, and cybersecurity, timely decision-making is critical. Low-latency LLMs can quickly analyze vast amounts of data, enabling swift identification of patterns, anomalies, and insights, which inform timely decisions.\n3. **Live Content Generation**: Low-latency LLMs are essential for generating content in real-time, such as live subtitles, closed captions, or automated summarization of

## ReAct Agent Implementation

### Prompt from Langchain (https://smith.langchain.com/hub/hwchase17/react-chat)

This prompt was modified:

- to remove the chat history, so every question is a new chat with the LLM.
- to request the model to use short answers, and to answer with numerals instead of words for numbers, to increase our F1 score.

In [21]:
LANGCHAIN_CHAT_SYSTEM_HEADER = """\

Assistant is a large language model trained by Meta AI.
Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.
Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.
Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.

TOOLS:
------
Assistant has access to the following tools:
{tools}
To use a tool, please use the following format:
```
Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
```
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:
```
Thought: Do I need to use a tool? No
Final Answer: [your response here]
```
Your final answer must be short, no more than 10 words, and use numerals instead of words for numbers.
Begin!
New input: {input}
{agent_scratchpad}
"""
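
To make the format requested by the prompt concrete, the sketch below parses a model reply into either a tool call or a final answer. It is a simplified stand-in written for illustration, not LangChain's own output parser, and `parse_react_output` is a hypothetical helper.

```python
import re


def parse_react_output(text):
    """Return ('final', answer) or ('action', tool_name, tool_input)."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", text, re.DOTALL)
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    raise ValueError("Could not parse model output")


step = parse_react_output(
    "Thought: Do I need to use a tool? Yes\n"
    "Action: get_top5_rerank\n"
    "Action Input: How many teams play in the NHL?"
)
done = parse_react_output("Thought: Do I need to use a tool? No\nFinal Answer: 31")
print(step)
print(done)
```

Replies that match neither pattern raise an error here. In the agent further down, `handle_parsing_errors=True` is what asks the executor to recover from such cases.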

### Agent tool definition

The tool retrieves the top 5 FAISS hits for a question and reranks them with a cross-encoder.

In [27]:
# Prepare context list for Ragas evaluation
contexts = []
for item in tqdm(questions_to_ask):
    question_id = item.get("Id")
    question = item.get('Question')
                        
    xq = model.encode([question])
    D, I = index.search(xq, 5)  # search
    results = []
    for i in range(0, 5):
        results.append(contents_list[I[0][i]]) 

    crossencoder = CrossEncoder(CROSSENCODER_MODEL_NAME)
    # Concatenate the query and all passages and predict the scores for the pairs [query, passage]
    crossencoder_inputs = [[question, passage] for passage in results]
    scores = crossencoder.predict(crossencoder_inputs)

    # Sort the scores in decreasing order
    reranked_results = [{"input": inp, "score": score} for inp, score in zip(crossencoder_inputs, scores)]
    reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)

    cross_enc_res = []
    for res in reranked_results:
        cross_enc_res.append(res['input'][1])

    contexts.append(cross_enc_res)
    
@tool
def get_top5_rerank(question: str) -> list[str]:
    """Returns a list of five strings with texts to help solve a question, using a cross-encoder for re-ranking."""
    xq = model.encode([question])
    D, I = index.search(xq, 5)  # search
    results = []
    for i in range(0, 5):
        results.append(contents_list[I[0][i]]) 

    crossencoder = CrossEncoder(CROSSENCODER_MODEL_NAME)
    # Concatenate the query and all passages and predict the scores for the pairs [query, passage]
    crossencoder_inputs = [[question, passage] for passage in results]
    scores = crossencoder.predict(crossencoder_inputs)

    # Sort the scores in decreasing order
    reranked_results = [{"input": inp, "score": score} for inp, score in zip(crossencoder_inputs, scores)]
    reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)

    cross_enc_res = []
    for res in reranked_results:
        cross_enc_res.append(res['input'][1])
    return cross_enc_res

100%|██████████| 50/50 [00:38<00:00,  1.30it/s]
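
The retrieve-then-rerank pattern used above can be sketched independently of any model. In the example below, `overlap_score` is a toy stand-in for the cross-encoder (it only counts shared words), and both function names are hypothetical. The point is the shape of the step: score each (question, passage) pair, then sort by score in decreasing order.

```python
import re


def rerank(question, passages, score_fn):
    """Sort passages by score_fn(question, passage), highest first."""
    scored = [(score_fn(question, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]


def overlap_score(question, passage):
    """Toy stand-in for a cross-encoder: number of shared lowercase words."""
    q_words = set(re.findall(r"\w+", question.lower()))
    p_words = set(re.findall(r"\w+", passage.lower()))
    return len(q_words & p_words)


candidates = [
    "The Stanley Cup is awarded annually.",
    "The NHL currently comprises 31 teams.",
    "Cricket is the national sport of the Turks and Caicos Islands.",
]
ranked = rerank("How many teams play in the NHL?", candidates, overlap_score)
print(ranked[0])
```

With a real cross-encoder, the model would typically be loaded once and reused across questions, since constructing it is far more expensive than scoring five pairs.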


## Execution Run

### Evaluation functions

In [29]:
def normalize_answer(s):
  """
  Taken from the official evaluation script for v1.1 of the SQuAD dataset.
  Lower text and remove punctuation, articles and extra whitespace.
  """

  def remove_articles(text):
      return re.sub(r"\b(a|an|the)\b", " ", text)

  def white_space_fix(text):
      return " ".join(text.split())

  def remove_punc(text):
      exclude = set(string.punctuation)
      return "".join(ch for ch in text if ch not in exclude)

  def lower(text):
      return text.lower()

  return white_space_fix(remove_articles(remove_punc(lower(s))))

def token_f1_score(prediction, ground_truth):
  """
  Taken from the official evaluation script for v1.1 of the SQuAD dataset.
  """
  prediction_tokens = normalize_answer(prediction).split()
  ground_truth_tokens = normalize_answer(ground_truth).split()
  common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
  num_same = sum(common.values())
  if num_same == 0:
      return 0
  precision = 1.0 * num_same / len(prediction_tokens)
  recall = 1.0 * num_same / len(ground_truth_tokens)
  f1 = (2 * precision * recall) / (precision + recall)
  return f1
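
A short worked example may help when reading the scores. The sketch below is a compact restatement of the same normalization and token-level F1, so that it runs on its own. The first pair is hypothetical. The second uses question 0's reference answer and the agent reply shown in the log further down.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1(prediction, ground_truth):
    """Token-level F1 between a prediction and a reference answer."""
    pred = normalize(prediction).split()
    gold = normalize(ground_truth).split()
    num_same = sum((Counter(pred) & Counter(gold)).values())
    if num_same == 0:
        return 0.0
    precision, recall = num_same / len(pred), num_same / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical pair: a verbose prediction that contains the reference answer.
# Precision 1/4, recall 1/1, so F1 = 0.4.
verbose = f1("The capacity is 36,284 seats", "36,284")

# Pair from this notebook: "gods" and "god" are different tokens, so F1 = 0.
miss = f1("King of gods in Greek mythology", "sky and thunder god")
print(verbose, miss)
```

This is why the prompt asks for short answers: every extra token in the prediction lowers precision even when the reference answer is present.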

### Analysing the output for 50 questions

In [30]:
def evaluate_agent(executor):
    # Named evaluate_agent so it does not shadow `evaluate` imported from ragas above.
    # Create dataframe with question, answer, LLM answer and scores
    df_results = pd.DataFrame(columns=['question', 'answer', 'LLM answer', 'F1', 'Exact Match'])

    for item in tqdm(questions_to_ask):
        question = item.get('Question')
        answer = normalize_answer(item.get('Answer'))

        # RAG context using search
        agent_answer = executor.invoke({"input": "Question: " + question})
        llm_answer = normalize_answer(agent_answer['output'])
        f1_score = token_f1_score(llm_answer, answer)
        e_match_score = 1 if llm_answer == answer else 0
        
        row = pd.Series([question, answer, llm_answer, f1_score, e_match_score], index=df_results.columns)
        df_results = pd.concat([df_results, pd.DataFrame([row])], ignore_index=True)

    return df_results

In [31]:
def run_agent(tool):
    prompt = PromptTemplate.from_template(LANGCHAIN_CHAT_SYSTEM_HEADER)
    llm = ChatGroq(temperature=0,
                model_name=LLM_MODEL_NAME,
                api_key=load_groq_key())
    tools = [tool]
    agent = create_react_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)
    df = evaluate_agent(agent_executor)
    return df

def print_results(df, results_file):
    display(df)
    df.to_csv(results_file)
    print()
    print("Metrics:")
    print(f"F1 score: Avg: {df['F1'].mean():.2f}. Std: {df['F1'].std():.2f}")
    print(f"Exact Match score: Avg: {df['Exact Match'].mean():.2f}. Std: {df['Exact Match'].std():.2f}")

### Searching with Tool returning top 5 hits with Rerank

In [32]:
df = run_agent(get_top5_rerank)

  0%|          | 0/50 [00:00<?, ?it/s]



> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: What is Zeus know for in Greek mythology?
['Zeus (British English , North American English ; , Zeús ) is the sky and thunder god in ancient Greek religion, who rules as king of the gods of Mount Olympus.His name is cognate with the first element of his Roman equivalent Jupiter.His mythologies and powers are similar, though not identical, to those of Indo-European deities such as Jupiter, Perkūnas, Perun, Indra and Thor.\n\n', 'Zeus (British English , North American English ; , Zeús ) is the sky and thunder god in ancient Greek religion', "Greek mythology is the body of myths originally told by the ancient Greeks and a genre of Ancient Greek folklore.These stories concern the origin and the nature of the world, the lives and activities of deities, heroes, and mythological creatures, and the origins and significance of the ancient Greeks

  2%|▏         | 1/50 [00:02<02:08,  2.62s/it]

Do I need to use a tool? No
Final Answer: King of gods in Greek mythology

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How long had the First World War been over when Messe was named aide-de-camp?
['he became aide-de-camp to King Victor Emmanuel III, holding this post from 1923 to 1927', 'World War I (often abbreviated as WWI or WW1), also known as the First World War, the Great War, the Seminal Catastrophe, and initially in North America as the European War, was a global war originating in Europe that lasted from 28 July 1914 to 11 November 1918.Contemporaneously described as "the war to end all wars", it led to the mobilisation of more than 70\xa0million military personnel, including 60 million Europeans, making it one of the largest wars in history.It is also one of the deadliest conflicts', 'The 1936 Summer Olympics (German: Olympis

  4%|▍         | 2/50 [00:04<01:56,  2.42s/it]

Thought: Do I need to use a tool? No
Final Answer: 5 years

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How old was Messe when the First World War started?
['World War I (often abbreviated as WWI or WW1), also known as the First World War, the Great War, the Seminal Catastrophe, and initially in North America as the European War, was a global war originating in Europe that lasted from 28 July 1914 to 11 November 1918.Contemporaneously described as "the war to end all wars", it led to the mobilisation of more than 70\xa0million military personnel, including 60 million Europeans, making it one of the largest wars in history.It is also one of the deadliest conflicts', 'Messe was born in Mesagne, in the Province of Brindisi in the Apulia region of Italy on 10 December 1883.', 'a global war originating in Europe that lasted from 28 July 1914

  6%|▌         | 3/50 [00:07<01:47,  2.29s/it]

Thought: Do I need to use a tool? No
Final Answer: 31

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How long had Angela Scoular been acting professionally when she appeared in the movie "On Her Majesty's Secret Service"?
["Angela Scoular appeared as Ruby Bartlett in On Her Majesty's Secret Service", 'Angela Margaret Scoular (8 November 1945 – 11 April 2011) was an English actress.\n\nEarly life.Her father was an engineer and she was born in London.', "On Her Majesty's Secret Service is a 1969 British spy film and the sixth in the James Bond series produced by Eon Productions.It is based on the 1963 novel of the same name by Ian Fleming.Following Sean Connery's decision to retire from the role after You Only Live Twice, Eon Productions selected an unknown actor and model, George Lazenby, to play the part of James Bond.", 'On 7 May 1946, I

  8%|▊         | 4/50 [00:09<01:43,  2.24s/it]

Do I need to use a tool? No
Final Answer: 14 years

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: What is the capacity of the stadium where Brunt returned to action after a torn ACL?
['Brunt returned to first-team action after eight months out with an anterior cruciate knee', "He was fully fit for the club's season opening match against , where he contributed 14 disposals.The loss of All-Australian defender and teammate Alex Rance to an ACL injury in that match however, would force Vlastuin into a more defensive role", 'his 200th AFL match in the season opening match against Carlton at the MCG in late March.Late in the third quarter of that match, Rance landed awkwardly during a marking contest and severely injured his right knee in what coach Damien Hardwick labelled a "suspected ACL (tear)"', 'he was left out of both the final team and 

 10%|█         | 5/50 [00:11<01:38,  2.20s/it]

Do I need to use a tool? No
Final Answer: MCG has 100,024 capacity.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which stadium where Brunt played can hold more people, The Hawthorns or White Hart Lane?
["White Hart Lane was a football stadium in Tottenham, North London and the home of Tottenham Hotspur Football Club from 1899 to 2017.Its capacity varied over the years; when changed to all-seater it had a capacity of 36,284 before demolition.The stadium was fully demolished after the end of the 2016–17 season and was replaced by Tottenham Hotspur Stadium as the club's home ground.\n\n", 'The Hawthorns is an all-seater football stadium in West Bromwich, West Midlands, England, with a capacity of 26,688', 'The Hawthorns is an all-seater football stadium in West Bromwich, West Midlands, England, with a capacity of 26,688', 'The Hawthorns is

 12%|█▏        | 6/50 [00:13<01:36,  2.18s/it]

Do I need to use a tool? No
Final Answer: White Hart Lane

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: In which country was Wilhelm Müller born?
['Wilhelm Max Müller (15 May 1862 – July 1919) was an American orientalist.\n\nBiography.He was born at Gleißenberg, Germany, the son of Friedrich Max Müller and the grandson of German romantic poet Wilhelm Müller.', 'Wilhelm Müller was born on 7 October 1794 at Dessau', 'Friedrich Max Müller (; 6 December 1823\xa0– 28 October 1900) was a German-born philologist and Orientalist, who lived and studied in Britain for most of his life.He was one of the founders of the western academic field of Indian studies and the discipline of Study of religions (science of religion, Religionswissenschaft).Müller wrote both scholarly and popular works on the subject of Indology.', 'Max Müller was born into a cu

 14%|█▍        | 7/50 [00:39<07:10, 10.01s/it]

Thought: Do I need to use a tool? No
Final Answer: Germany

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which battle Wilhelm Müller fought in while in the Prussian army had the highest casualty?
['In the Battle of Bautzen (20–21 May 1813) a combined Russian–Prussian army, that was massively outnumbered, was pushed back by Napoleon I of France but escaped destruction, some sources claiming that Michel Ney failed to block their retreat.The Prussians under Count Gebhard von Blücher and Russians under Prince Peter Wittgenstein, retreating after their defeat at Lützen were attacked by French forces under Napoleon.\n\nPrelude.', 'Friedrich Max Müller (; 6 December 1823\xa0– 28 October 1900)', 'Wilhelm Max Müller (15 May 1862 – July 1919) was an American orientalist.\n\nBiography.He was born at Gleißenberg, Germany, the son of Friedrich Max Mü

 16%|█▌        | 8/50 [00:59<09:12, 13.16s/it]

Do I need to use a tool? No
Final Answer: Battle of Kulm

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How much time had passed between Wilhelm's return to studies at Berlin and the birth of his son Friedrich?
['In 1814 he returned to his studies at Berlin.', 'Friedrich Max Müller (; 6 December 1823\xa0– 28 October 1900)', 'Wilhelm Max Müller (15 May 1862 – July 1919) was an American orientalist.\n\nBiography.He was born at Gleißenberg, Germany, the son of Friedrich Max Müller and the grandson of German romantic poet Wilhelm Müller.', 'Friedrich Max Müller (; 6 December 1823\xa0– 28 October 1900) was a German-born philologist and Orientalist, who lived and studied in Britain for most of his life.He was one of the founders of the western academic field of Indian studies and the discipline of Study of religions (science of religion, Religi

 18%|█▊        | 9/50 [01:28<12:17, 17.98s/it]

Thought: Do I need to use a tool? No
Final Answer: 9 years

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which of Wilhelm's direct male descendants lived the longest?
['his grandson Wilhelm Max Müller', 'Wilhelm Max Müller (15 May 1862 – July 1919) was an American orientalist.\n\nBiography.He was born at Gleißenberg, Germany, the son of Friedrich Max Müller and the grandson of German romantic poet Wilhelm Müller.', 'Friedrich Max Müller (; 6 December 1823\xa0– 28 October 1900)', 'Friedrich Max Müller (; 6 December 1823\xa0– 28 October 1900) was a German-born philologist and Orientalist, who lived and studied in Britain for most of his life.He was one of the founders of the western academic field of Indian studies and the discipline of Study of religions (science of religion, Religionswissenschaft).Müller wrote both scholarly and popular 

 20%|██        | 10/50 [01:54<13:42, 20.57s/it]

Do I need to use a tool? No
Final Answer: Wilhelm Max Müller (1919)

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: In which Italian region did Pesce studied medicine?
['and studied medicine at the University of Genoa in Italy.', 'The University of Genoa, known also with the acronym UniGe (), is one of the largest universities in Italy.It is located in the city of Genoa and regional Metropolitan City of Genoa, on the Italian Riviera in the Liguria region of northwestern Italy.The original university was founded in 1481.\n\n', "In the Battle of Lützen (German: Schlacht von Großgörschen, 2 May 1813), Napoleon I of France halted the advances of the Sixth Coalition after the French invasion of Russia and the massive French losses in the campaign.The Russian commander, Prince Peter Wittgenstein, attempting to forestall Napoleon's capture of Lei

 22%|██▏       | 11/50 [02:17<13:51, 21.31s/it]

Do I need to use a tool? No
Final Answer: Liguria

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which country hosted the 1948 Leprosy Congress?
['The 1948 World Leprosy Congress in Havana endorsed the Latin-American strategy', 'The 1936 Summer Olympics (German: Olympische Sommerspiele 1936), officially known as the Games of the XI Olympiad, was an international multi-sport event held in 1936 in Berlin, Nazi Germany.Berlin won the bid to host the Games over Barcelona, Spain, on 26 April 1931, at the 29th IOC Session in Barcelona.It marked the second and final time the International Olympic Committee gathered to vote in a city that was bidding to host those Games.\n\n', ' It was founded as the Imperial Cricket Conference in 1909', 'he became aide-de-camp to King Victor Emmanuel III, holding this post from 1923 to 1927', ' In 1900, with the

 24%|██▍       | 12/50 [02:45<14:50, 23.44s/it]

Do I need to use a tool? No
Final Answer: Cuba

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: When was the sports organization that the Turks and Caicos Islands became affiliate members in 2002 founded?
['Cricket is the national sport of the Turks and Caicos Islands.They became an affiliate member of the International Cricket Council (ICC) in 2002', 'The Atlanta Silverbacks FC was an American professional soccer club based in Atlanta, Georgia.Founded in 1998 as Atlanta Ruckus, the club played in many leagues over the years before folding after the 2015 season.The team played its home games at Atlanta Silverbacks Park, a large soccer complex featuring a 5,000-seat soccer-specific stadium in Atlanta, 15 miles northeast of downtown.', 'In July 1994, the American Professional Soccer League announced it had accepted the Atlanta Magic, owned by

 26%|██▌       | 13/50 [03:08<14:22, 23.30s/it]

Do I need to use a tool? No
Final Answer: International Cricket Council founded in 1909

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: What was the founding date of the championship the Turks and Caicos Islands national cricket team finished as runners up in 2006?
['Cricket is the national sport of the Turks and Caicos Islands.They became an affiliate member of the International Cricket Council (ICC) in 2002', ' It was founded as the Imperial Cricket Conference in 1909', 'The 2004 National League Division Series (NLDS), the opening round of the 2004 National League playoffs, began on Tuesday, October 5, and ended on Monday, October 11, with the champions of the three NL divisions—along with a "wild card" team—participating in two best-of-five series.They were:\n\n- (1) St. Louis Cardinals (Central Division champion, 105–57) vs. (3) Los An

 28%|██▊       | 14/50 [03:34<14:26, 24.06s/it]

Do I need to use a tool? No
Final Answer: 1909

> Finished chain.


[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How many years did the player who top-scored for Turks and Caicos Islands during the 2008 Standford 20/20 play for the team?[0m[36;1m[1;3m['Matthews played a single Twenty20 match for the Turks and Caicos Islands against Montserrat in the 2008 Stanford 20/20 at the Stanford Cricket Ground.', 'Donovan Anthony Matthews (born 29 October 1971) is a Guyanese-born cricketer who played for the Turks and Caicos Islands.Matthews was a right-handed batsman who fielded as a wicket-keeper.\n\nMatthews played a single Twenty20 match for the Turks and Caicos Islands against Montserrat in the 2008 Stanford 20/20 at the Stanford Cricket Ground.', 'They were invited to take part in the 2008 Standford 20/20, playing one match in a preliminary round against Montserrat.', 'He ha

 30%|███       | 15/50 [04:02<14:40, 25.14s/it]

[32;1m[1;3mDo I need to use a tool? No
Final Answer: 1

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How many teams play in the NHL? ['The National Hockey League (NHL; ) is a professional ice hockey league in North America, currently comprising 31 teams: 24 in the United States and 7 in Canada.', 'The National Hockey League (NHL; ) is a professional ice hockey league in North America, currently comprising 31 teams: 24 in the United States and 7 in Canada.The NHL is considered to be the premier professional ice hockey league in the world, and one of the major professional sports leagues in the United States and Canada.The Stanley Cup, the oldest professional sports trophy in North America, is awarded annually to the league playoff champion at the end of each season.\n\n', 'the New York Rangers of the National Hockey League (NHL)', 'The first World C

 32%|███▏      | 16/50 [04:29<14:34, 25.73s/it]

Do I need to use a tool? No
Final Answer: 31

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which hockey team won the World Cup in 1996? ['The first World Cup of Hockey (WCH), or 1996 World Cup of Hockey, replaced the Canada Cup as one of the premier championships for professional ice hockey.\n\nInaugural World Cup of Hockey.The first edition of the Cup featured eight teams divided into two groups.', 'and 1996 World Cup of Hockey (Finalists)', 'The National Hockey League (NHL; ) is a professional ice hockey league in North America, currently comprising 31 teams: 24 in the United States and 7 in Canada.The NHL is considered to be the premier professional ice hockey league in the world, and one of the major professional sports leagues in the United States and Canada.The Stanley Cup, the oldest professional sports trophy in North America, is

 34%|███▍      | 17/50 [04:55<14:13, 25.86s/it]

Do I need to use a tool? No
Final Answer: No hockey team won World Cup in 1996.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Who is the mayor of the city where the album was mixed by Andy Wallace? ['It was produced by Mike Elizondo and mixed in New York City by noted engineer Andy Wallace.', 'It was produced by Mike Elizondo and mixed in New York City by noted engineer Andy Wallace.', 'The present mayor is Bill de Blasio', "Monty Python's Life of Brian, also known as Life of Brian, is a 1979 British comedy film starring and written by the comedy group Monty Python (Graham Chapman, John Cleese, Terry Gilliam, Eric Idle, Terry Jones and Michael Palin).It was also directed by Jones.The film tells the story of Brian Cohen (played by Chapman), a young Jewish man who is born on the same day as—and next door to—Jesus Christ, and is subsequentl

 36%|███▌      | 18/50 [05:19<13:25, 25.17s/it]

Do I need to use a tool? No
Final Answer: Bill de Blasio

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How long had Alex Rance been a professional player when he suffered an ACL injury during a season opening match? ['his 200th AFL match in the season opening match against Carlton at the MCG in late March.Late in the third quarter of that match, Rance landed awkwardly during a marking contest and severely injured his right knee in what coach Damien Hardwick labelled a "suspected ACL (tear)"', "He was fully fit for the club's season opening match against , where he contributed 14 disposals.The loss of All-Australian defender and teammate Alex Rance to an ACL injury in that match however, would force Vlastuin into a more defensive role", "Alex James Rance (born 9 October 1989) is a professional Australian rules footballer playing for the R

 38%|███▊      | 19/50 [05:43<12:53, 24.95s/it]

Do I need to use a tool? No
Final Answer: 11 years

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which media that noticed the omission of accolade for Vlastuin was in operation first? ["Vlastuin started the 2019 season with appearances in each of the club's two pre-season matches", "He was fully fit for the club's season opening match against , where he contributed 14 disposals.The loss of All-Australian defender and teammate Alex Rance to an ACL injury in that match however, would force Vlastuin into a more defensive role", 'he was left out of both the final team and even the squad of 40 players for the official All-Australian award and was identified by Fox Footy and the Herald Sun as one of the most notable omissions.', 'Edward Stephen Waitkus (September 4, 1919\xa0– September 16, 1972) was a Lithuanian American first baseman in Major

 40%|████      | 20/50 [06:10<12:44, 25.49s/it]

Do I need to use a tool? No
Final Answer: Herald Sun

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: In what country is the city located where the Arabic dialect is spoken by Jews? ['Baghdad Jewish Arabic (, ) or autonym haki mal yihud (Jewish Speech) or el-haki malna (our speech) is the Arabic dialect spoken by the Jews of Baghdad', 'Baghdad (; ) is the capital of Iraq and the second-largest city in the Arab world.Located along the Tigris River, the city was founded in the 8th century and became the capital of the Abbasid Caliphate.Within a short time of its inception, Baghdad evolved into a significant cultural, commercial, and intellectual center of the Muslim world.', 'Eduard van Beinum (; 3 September 1901 – 13 April 1959, Amsterdam) was a Dutch conductor.\n\nBiography.Van Beinum was born in Arnhem, Netherlands, where he received his f

 42%|████▏     | 21/50 [06:36<12:29, 25.84s/it]

Do I need to use a tool? No
Final Answer: Iraq

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How many years were there between Pompeii's destruction and the discovery of the Maya site of Cerén? ['His largest known body of research comes from the archaeology of the Maya site of Cerén, or Joya de Cerén, in the Zapotitan Valley of El Salvador which has been ongoing since its discovery in 1978.', 'Joya de Cerén (Jewel of Cerén in the Spanish language) is an archaeological site in La Libertad Department, El Salvador, featuring a pre-Columbian Maya farming village.\nThe ancient Maya site of Joya de Cerén is located in the Zapotitán Valley 36 kilometers northwest of San Salvador, El Salvador.It is often referred to as the "Pompeii of the Americas", in comparison to the famed Ancient Roman ruins.', 'Pompeii, along with Herculaneum and many villa

 44%|████▍     | 22/50 [07:07<12:39, 27.13s/it]

Do I need to use a tool? No
Final Answer: 1733

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which Central American country Payson D. Sheets worked at is the largest in area? ['His largest known body of research comes from the archaeology of the Maya site of Cerén, or Joya de Cerén, in the Zapotitan Valley of El Salvador', 'His largest known body of research comes from the archaeology of the Maya site of Cerén, or Joya de Cerén, in the Zapotitan Valley of El Salvador which has been ongoing since its discovery in 1978.', 'Eduard van Beinum (; 3 September 1901 – 13 April 1959, Amsterdam) was a Dutch conductor.\n\nBiography.Van Beinum was born in Arnhem, Netherlands, where he received his first violin and piano lessons at an early age.', 'New Comiskey Park (redubbed U.S. Cellular in 2003 and Guaranteed Rate Field in 2016)', 'The following c

 46%|████▌     | 23/50 [07:31<11:49, 26.28s/it]

Do I need to use a tool? No
Final Answer: El Salvador

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which university did Don Branby played college football for? ['Branby played college football at the end position for the Colorado Buffaloes football team.', 'Branby played college football at the end position for the Colorado Buffaloes football team', 'he was selected by the Associated Press as a first-team player on its 1952 College Football All-America Teams.', 'The Colorado Buffaloes football program represents the University of Colorado Boulder in college football at the NCAA Division I FBS level.The team is currently a member of the Pac-12 Conference, having previously been a charter member of the Big 12 Conference.Before joining the Big 12, they were members of the Big Eight Conference.', 'Suh Yun-bok, a sports coach at Anyang Techn

 48%|████▊     | 24/50 [08:02<11:59, 27.67s/it]

Do I need to use a tool? No
Final Answer: University of Colorado

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Who were the other end position players on the All-America Teams the same year Don Branby was selected? ['Branby played college football at the end position for the Colorado Buffaloes football team.', 'Branby played college football at the end position for the Colorado Buffaloes football team', 'The following chart identifies the NCAA-recognized consensus All-Americans and displays which first-team designations they received. \n\nEnds.-Frank McPhee, Princeton\n- Bernie Flowers, Purdue\n- Eddie Bell, Pennsylvania\n- Tom Stolhandske, Texas\n- Tom Scott, Virginia\n- Joe Collier, Northwestern\n- Don Branby, Colorado\n- Buck Martin, Georgia Tech\n- Steve Mellinger, Kentucky\n- Ed Luke, Michigan State\n- Harry Babcock, Georgia\n', 'he

 50%|█████     | 25/50 [08:20<10:19, 24.79s/it]

Do I need to use a tool? No
Final Answer: Frank McPhee, Bernie Flowers

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How long had the Associated Press been in operation when they selected Don Branby for the All-America Teams? ['he was selected by the Associated Press as a first-team player on its 1952 College Football All-America Teams.', 'The Associated Press (AP) is a U.S.-based not-for-profit news agency headquartered in New York City.Founded in 1846', 'The Associated Press (AP) is a U.S.-based not-for-profit news agency headquartered in New York City.Founded in 1846, it operates as a cooperative, unincorporated association.Its members are U.S. newspapers and broadcasters.\n\n', 'The following chart identifies the NCAA-recognized consensus All-Americans and displays which first-team designations they received. \n\nEnds.-Frank McPhee, 

 52%|█████▏    | 26/50 [08:44<09:50, 24.61s/it]

Do I need to use a tool? No
Final Answer: 106 years

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How long had the Perthshire Regiment been a going concern for when Harison joined it as a captain? ['He first arrived in the Cape of Good Hope in 1849 as a captain in the Perthshire Regiment', 'The 73rd Regiment of Foot was an infantry regiment of the British Army, raised in 1780.Under the Childers Reforms it amalgamated with the 42nd Regiment of Foot to form the Black Watch (Royal Highlanders) in 1881.\n\nFormation.', 'he was left out of both the final team and even the squad of 40 players for the official All-Australian award and was identified by Fox Footy and the Herald Sun as one of the most notable omissions.', 'They were invited to take part in the 2008 Standford 20/20, playing one match in a preliminary round against Montserrat.', "H

 54%|█████▍    | 27/50 [09:10<09:39, 25.18s/it]

Do I need to use a tool? No
Final Answer: 1780

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: When was the company for which Tookey called a protest for founded? ['Tookey called for readers to boycott the products of distributor Sony,', 'Tookey called for readers to boycott the products of distributor Sony, and questioned the suitability to the role of the then director of the British Board of Film Classification.', 'Tookey denied charges that he was in favour of blanket censorship, writing in a letter to The Guardian that he has "frequently written and spoken against censorship and jumped to the defence of films (such as Pulp Fiction, Reservoir Dogs and The Life Of Brian)', 'The 1948 World Leprosy Congress in Havana endorsed the Latin-American strategy', 'he was selected by the Associated Press as a first-team player on its 1952 College 

 56%|█████▌    | 28/50 [09:35<09:08, 24.92s/it]

Do I need to use a tool? No
Final Answer: Sony founded in 1946

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Of the films Tookey claimed to have defended against censorship, which one had the highest box office? ['Tookey denied charges that he was in favour of blanket censorship, writing in a letter to The Guardian that he has "frequently written and spoken against censorship and jumped to the defence of films (such as Pulp Fiction, Reservoir Dogs and The Life Of Brian)', 'Tookey called for readers to boycott the products of distributor Sony, and questioned the suitability to the role of the then director of the British Board of Film Classification.', 'Tookey called for readers to boycott the products of distributor Sony,', 'Life of Brian opened on 17 August 1979 in five North American theatres and grossed US$140,034 ($28,007 per screen)

 58%|█████▊    | 29/50 [09:58<08:32, 24.40s/it]

Do I need to use a tool? No
Final Answer: Life of Brian ($19,398,164)

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Who is the current director of the board which Tookey questioned the suitability? ['Tookey called for readers to boycott the products of distributor Sony, and questioned the suitability to the role of the then director of the British Board of Film Classification.', 'Tookey denied charges that he was in favour of blanket censorship, writing in a letter to The Guardian that he has "frequently written and spoken against censorship and jumped to the defence of films (such as Pulp Fiction, Reservoir Dogs and The Life Of Brian)', 'Tookey called for readers to boycott the products of distributor Sony,', 'David Rodney Lewis (born July 3, 1953) is a Canadian former National Hockey League (NHL) defenceman and coach.He was most recent

 60%|██████    | 30/50 [10:23<08:12, 24.61s/it]

Do I need to use a tool? No
Final Answer: None mentioned in text.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which team won the first game in the 2004 National League Division Series? ['The 2004 National League Division Series (NLDS), the opening round of the 2004 National League playoffs, began on Tuesday, October 5, and ended on Monday, October 11, with the champions of the three NL divisions—along with a "wild card" team—participating in two best-of-five series.They were:\n\n- (1) St. Louis Cardinals (Central Division champion, 105–57) vs. (3) Los Angeles Dodgers (Western Division champion, 93–69):Cardinals win series, 3–1.\n- (2) Atlanta Braves (Eastern Division champion, 96–66) vs. (4)', 'the USA recorded a memorable pair of 5–2 victories in Montreal to win the series.', 'The first World Cup of Hockey (WCH), or 1996 World Cup of 

 62%|██████▏   | 31/50 [10:48<07:49, 24.71s/it]

Do I need to use a tool? No
Final Answer: St. Louis Cardinals

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How old was the Grand Olympic Auditorium at the time of New Regime playing a landmark concert there? ['That landmark concert was held at the Grand Olympic Auditorium on April 13, 1984.', 'The Grand Olympic Auditorium is a former sports venue in southern Downtown Los Angeles, California.The venue was built in 1924 at 1801 South Grand Avenue, now just south of the Santa Monica Freeway.The grand opening of the Olympic Auditorium was on August 5, 1925, and was a major media event, attended by such celebrities as Jack Dempsey and Rudolph Valentino.', 'The 1936 Summer Olympics (German: Olympische Sommerspiele 1936), officially known as the Games of the XI Olympiad, was an international multi-sport event held in 1936 in Berlin, Nazi Germa

 64%|██████▍   | 32/50 [11:15<07:38, 25.45s/it]

Do I need to use a tool? No
Final Answer: 60 years old

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Who owned the Rochester Rhinos when Glenn left the team? ['Glenn had intended to continue playing for the Rhinos, but had contract disagreements with them and left the team in May 2001.In June, he signed with the Atlanta Silverbacks where he finished his career.', 'The Rochester Rhinos are an American professional soccer team based in Rochester, New York, United States.Founded in 1996, as the Rochester Raging Rhinos, the team formerly played in the United Soccer League in the second tier of the United States soccer league system, and is currently on hiatus.They changed their name to Rochester Rhinos to start the 2008 season.', 'The Rochester Rhinos are an American professional soccer team based in Rochester, New York, United States', 'He 

 66%|██████▌   | 33/50 [11:44<07:29, 26.45s/it]

Do I need to use a tool? No
Final Answer: Steve Holroyd owned the team.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Who owned the last team Glenn ever played for when he first signed with them? ['Glenn had intended to continue playing for the Rhinos, but had contract disagreements with them and left the team in May 2001.In June, he signed with the Atlanta Silverbacks where he finished his career.', 'Edward Stephen Waitkus (September 4, 1919\xa0– September 16, 1972) was a Lithuanian American first baseman in Major League Baseball who had an 11-year career (1941, 1946–1955).He played for the Chicago Cubs and Philadelphia Phillies in the National League and for the Baltimore Orioles of the American League.He was elected to the National League All-Star team twice (1948 and 1949).\n\n', 'He had an unspectacular professional baseball career 

 68%|██████▊   | 34/50 [12:09<06:55, 25.96s/it]

Do I need to use a tool? No
Final Answer: Atlanta Silverbacks

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: What state did Glenn finish his career in? ['Glenn had intended to continue playing for the Rhinos, but had contract disagreements with them and left the team in May 2001.In June, he signed with the Atlanta Silverbacks where he finished his career.', 'In June, he signed with the Atlanta Silverbacks where he finished his career.', 'He had an unspectacular professional baseball career for the Birmingham Barons, a Chicago White Sox farm team', 'Edward Stephen Waitkus (September 4, 1919\xa0– September 16, 1972) was a Lithuanian American first baseman in Major League Baseball who had an 11-year career (1941, 1946–1955).He played for the Chicago Cubs and Philadelphia Phillies in the National League and for the Baltimore Orioles of the Am

 70%|███████   | 35/50 [12:33<06:23, 25.54s/it]

Do I need to use a tool? No
Final Answer: Georgia

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How many championship appearances has the team where Ashton played junior hockey had? ['Ashton played junior hockey for his hometown Saskatoon Blades of the Western Canadian Hockey League (now WHL), playing his first full season in 1972-73 with future NHL and WHA players Bob Bourne, Dave Lewis and George Pesut,', 'Ashton played junior hockey for his hometown Saskatoon Blades', 'Ashton played junior hockey for his hometown Saskatoon Blades', 'he was left out of both the final team and even the squad of 40 players for the official All-Australian award and was identified by Fox Footy and the Herald Sun as one of the most notable omissions.', 'his 200th AFL match in the season opening match against Carlton at the MCG in late March.Late in the thir

 72%|███████▏  | 36/50 [13:02<06:09, 26.41s/it]

Do I need to use a tool? No
Final Answer: Saskatoon Blades had 2 championships.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: In what year was the team founded who Ashton played junior hockey for? ['Ashton played junior hockey for his hometown Saskatoon Blades of the Western Canadian Hockey League (now WHL), playing his first full season in 1972-73 with future NHL and WHA players Bob Bourne, Dave Lewis and George Pesut,', 'Ashton played junior hockey for his hometown Saskatoon Blades', 'Ashton played junior hockey for his hometown Saskatoon Blades', 'he was selected by the Associated Press as a first-team player on its 1952 College Football All-America Teams.', 'At the end of the season, he was selected by the Associated Press as a first-team player on its 1952 College Football All-America Teams.']

 74%|███████▍  | 37/50 [13:22<05:19, 24.58s/it]

Thought: Do I need to use a tool? No
Final Answer: 1966

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Who is the oldest teammate of Ashton's when he played junior hockey? ['Ashton played junior hockey for his hometown Saskatoon Blades of the Western Canadian Hockey League (now WHL), playing his first full season in 1972-73 with future NHL and WHA players Bob Bourne, Dave Lewis and George Pesut,', 'Ashton played junior hockey for his hometown Saskatoon Blades', 'Ashton played junior hockey for his hometown Saskatoon Blades', 'Edward Stephen Waitkus (September 4, 1919\xa0– September 16, 1972) was a Lithuanian American first baseman in Major League Baseball who had an 11-year career (1941, 1946–1955).He played for the Chicago Cubs and Philadelphia Phillies in the National League and for the Baltimore Orioles of the American League.He was el

 76%|███████▌  | 38/50 [13:44<04:44, 23.71s/it]

Do I need to use a tool? No
Final Answer: Bob Bourne

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Had the Chicago White Sox been around longer than the Birminham Barons when he was assigned to the White Sox's minor league system? ["Jordan then further surprised the sports world by signing a minor league baseball contract with the Chicago White Sox.He reported to spring training and was assigned to the team's minor league system on March 31, 1994", "The Chicago White Sox are an American professional baseball team based in Chicago, Illinois.They compete in Major League Baseball (MLB) as a member club of the American League (AL) Central division.The White Sox are owned by Jerry Reinsdorf, and play their home games at Guaranteed Rate Field, located on the city's South Side.", 'Jordan then further surprised the sports world by signing a mino

 78%|███████▊  | 39/50 [14:10<04:30, 24.56s/it]

Do I need to use a tool? No
Final Answer: White Sox older than Barons

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Did the Bulls owner also own the Chicago White Sox? ["Jerry M. Reinsdorf (born February 25, 1936) is a CPA, lawyer and an owner of the NBA's Chicago Bulls and the MLB's Chicago White Sox.He started his professional life as a tax attorney with the Internal Revenue Service.He has been the head of the White Sox and Bulls for over 25 years.\n\n", "Jerry M. Reinsdorf (born February 25, 1936) is a CPA, lawyer and an owner of the NBA's Chicago Bulls and the MLB's Chicago White Sox.", "The Chicago White Sox are an American professional baseball team based in Chicago, Illinois.They compete in Major League Baseball (MLB) as a member club of the American League (AL) Central division.The White Sox are owned by Jerry Reinsdorf, and play

 80%|████████  | 40/50 [14:36<04:10, 25.01s/it]

Do I need to use a tool? No
Final Answer: Yes, Jerry Reinsdorf owns both.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: What stadium do the Chicago White Sox play in? ["The Chicago White Sox are an American professional baseball team based in Chicago, Illinois.They compete in Major League Baseball (MLB) as a member club of the American League (AL) Central division.The White Sox are owned by Jerry Reinsdorf, and play their home games at Guaranteed Rate Field, located on the city's South Side.", "White Hart Lane was a football stadium in Tottenham, North London and the home of Tottenham Hotspur Football Club from 1899 to 2017.Its capacity varied over the years; when changed to all-seater it had a capacity of 36,284 before demolition.The stadium was fully demolished after the end of the 2016–17 season and was replaced by Tottenham Hotspur S

 82%|████████▏ | 41/50 [15:02<03:46, 25.22s/it]

Do I need to use a tool? No
Final Answer: Guaranteed Rate Field

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How old was the person who shot Waitkus the year he was shot? ['Ruth Ann Steinhagen (December 23, 1929 – December 29, 2012) was an American woman who shot and nearly killed Eddie Waitkus, star first baseman of the Philadelphia Phillies, on June 14, 1949 in one of the first instances of what later became known as stalker crimes.The incident is one of the inspirations for the 1952 baseball book The Natural, made into a film in 1984.\n\nEarly life.', 'Although Waitkus would miss much of the 1949 season after being shot by an obsessed fan', 'Edward Stephen Waitkus (September 4, 1919\xa0– September 16, 1972) was a Lithuanian American first baseman in Major League Baseball who had an 11-year career (1941, 1946–1955).He played for the C

 84%|████████▍ | 42/50 [15:32<03:33, 26.65s/it]

Do I need to use a tool? No
Final Answer: 19

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How long had Eddie Waitkus been playing professionally when he was traded to the Phils? ['Edward Stephen Waitkus (September 4, 1919\xa0– September 16, 1972) was a Lithuanian American first baseman in Major League Baseball who had an 11-year career (1941, 1946–1955).He played for the Chicago Cubs and Philadelphia Phillies in the National League and for the Baltimore Orioles of the American League.He was elected to the National League All-Star team twice (1948 and 1949).\n\n', 'After the 1948 campaign, Dubiel was traded to the Cubs in a four-player swap that brought first baseman Eddie Waitkus to the Phils.', 'Although Waitkus would miss much of the 1949 season after being shot by an obsessed fan', "He was fully fit for the club's season opening matc

 86%|████████▌ | 43/50 [16:07<03:23, 29.05s/it]

Do I need to use a tool? No
Final Answer: 11 years

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: In what race was Sohn Kee-chung the gold medalist? ['Sohn Kee-chung (; ; August 29, 1912 – November 15, 2002) was a Korean athlete and long-distance runner.He became the first Korean to win a medal at the Olympic Games, winning gold in the marathon at the 1936 Berlin Olympics as a member of the Japanese delegation.\n\nSohn competed under the Japanese name Son Kitei, as Korea was under the colonial rule of the Japanese Empire during his career.\n', 'Sohn, competing for the Empire of Japan, won the gold medal at the 1936 Summer Olympics in the marathon.He ran the 42.195 km course in 2:29:19.2', 'The 1936 Summer Olympics saw two Koreans win Olympic medals: Sohn Kee-chung took the gold while Nam Sung-yong was the bronze medallist.', 'Nam Sung-yon

 88%|████████▊ | 44/50 [16:24<02:32, 25.44s/it]

Do I need to use a tool? No
Final Answer: Marathon

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: How much behind did the bronze medalist from the gold medalist? ['The 1936 Summer Olympics saw two Koreans win Olympic medals: Sohn Kee-chung took the gold while Nam Sung-yong was the bronze medallist.', 'Nam Sung-yong (November 23, 1912 – February 20, 2001) was a Korean Olympian who won a bronze medal in the marathon at the 1936 Summer Olympics, completing the run in 2 hours, 31 minutes, and 42 seconds.\n\nHe was born in Junten (Suncheon), Japanese Korea and received higher education in Japan.Like the gold medalist Sohn Kee-chung, Nam Sung-yong used the Japanese pronunciation of his name, Nan Shōryū as Korea was then a part of the Japanese Empire.\n\n', 'Nam Sung-yong (November 23, 1912 – February 20, 2001) was a Korean Olympian who won a br

 90%|█████████ | 45/50 [16:50<02:08, 25.80s/it]

Do I need to use a tool? No
Final Answer: 3 min 42 sec[0m

[1m> Finished chain.[0m


[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: In what year was Japan defeated in World War II?[0m[36;1m[1;3m[' On 15 August 1945, Japan surrendered, with the surrender documents finally signed at Tokyo Bay on the deck of the American battleship USS Missouri on 2 September 1945, ending the war.', 'When Japan was defeated in World War II', "World War\xa0II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.The vast majority of the world's countries—including all the great powers—eventually formed two opposing military alliances: the Allies and the Axis.A state of total war emerged, directly involving more than 100 million people from more than 30 countries.", 'Sohn, competing for the Empire of Japan, won the gold medal at the 1936 

 92%|█████████▏| 46/50 [17:15<01:41, 25.48s/it]

Do I need to use a tool? No
Final Answer: 1945

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: What year did Suh Yun-bok win the Boston Marathon?
['Suh Yun-bok (9 January 1923 – 27 June 2017) was a South Korean marathoner, who is best known as the winner of the 1947 Boston Marathon. \n\nHe won the race with a world best time of 2:25:39 under the coach Sohn Kee-chung, the Korean winner of the marathon at the 1936 Berlin Olympics.His participation in the Boston Marathon was financed by donations from servicemen in the United States Forces Korea.', 'Suh Yun-bok, a sports coach at Anyang Technical College, won the first race and later took victory at the Boston Marathon.', "The Boston Marathon is an annual marathon race hosted by several cities in greater Boston in eastern Massachusetts, United States.It is always held on Patriots' Day, the th

 94%|█████████▍| 47/50 [17:42<01:18, 26.05s/it]

Do I need to use a tool? No
Final Answer: 1947

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Was Eduard van Beinum still alive when Gilmore retired?
['Eduard van Beinum (; 3 September 1901 – 13 April 1959, Amsterdam)', 'Eduard van Beinum (; 3 September 1901 – 13 April 1959, Amsterdam) was a Dutch conductor.\n\nBiography.Van Beinum was born in Arnhem, Netherlands, where he received his first violin and piano lessons at an early age.', 'Edward Stephen Waitkus (September 4, 1919\xa0– September 16, 1972) was a Lithuanian American first baseman in Major League Baseball who had an 11-year career (1941, 1946–1955).He played for the Chicago Cubs and Philadelphia Phillies in the National League and for the Baltimore Orioles of the American League.He was elected to the National League All-Star team twice (1948 and 1949).\n\n', 'Ruth Ann Steinhagen

 96%|█████████▌| 48/50 [18:07<00:50, 25.50s/it]

Do I need to use a tool? No
Final Answer: No, he was not alive.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which of the battles Sherman fought during a winter campaign had higher Union casualty rate?
['Sherman returned to serve under General Ulysses S. Grant in the winter of 1862 during the battles of forts Henry and Donelson.', "In the Battle of Lützen (German: Schlacht von Großgörschen, 2 May 1813), Napoleon I of France halted the advances of the Sixth Coalition after the French invasion of Russia and the massive French losses in the campaign.The Russian commander, Prince Peter Wittgenstein, attempting to forestall Napoleon's capture of Leipzig, attacked the isolated French right wing near Lützen, Germany.After a day of heavy fighting, the combined Prussian and Russian force retreated; due to French losses and a shortage of Fren", '

 98%|█████████▊| 49/50 [18:38<00:27, 27.09s/it]

Do I need to use a tool? No
Final Answer: Fort Donelson had higher Union casualty rate.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: get_top5_rerank
Action Input: Which country did some of the tribes escaped to after their deportation from northern Mexico?
['Those who escaped deportation were dispersed across northern Mexico and some even settled across the international border in southern Arizona', 'In the Battle of Bautzen (20–21 May 1813) a combined Russian–Prussian army, that was massively outnumbered, was pushed back by Napoleon I of France but escaped destruction, some sources claiming that Michel Ney failed to block their retreat.The Prussians under Count Gebhard von Blücher and Russians under Prince Peter Wittgenstein, retreating after their defeat at Lützen were attacked by French forces under Napoleon.\n\nPrelude.', "In the Battle of Lützen (German: Schlac

100%|██████████| 50/50 [19:06<00:00, 22.93s/it]

Do I need to use a tool? No
Final Answer: United States

> Finished chain.





In [33]:
print_results(df, "results/react_top5_rerank.csv")

Unnamed: 0,question,answer,LLM answer,F1,Exact Match
0,What is Zeus know for in Greek mythology?,sky and thunder god,king of gods in greek mythology,0.0,0
1,How long had the First World War been over whe...,5,5 years,0.666667,0
2,How old was Messe when the First World War sta...,30,31,0.0,0
3,How long had Angela Scoular been acting profes...,2,14 years,0.0,0
4,What is the capacity of the stadium where Brun...,26688,mcg has 100024 capacity,0.0,0
5,Which stadium where Brunt played can hold more...,white hart lane,white hart lane,1.0,1
6,In which country was Wilhelm Müller born?,germany,germany,1.0,1
7,Which battle Wilhelm Müller fought in while in...,kulm,battle of kulm,0.5,0
8,How much time had passed between Wilhelm's ret...,9,9 years,0.666667,0
9,Which of Wilhelm's direct male descendants liv...,friedrich max müller,wilhelm max müller 1919,0.571429,0



Metrics:
F1 score: Avg: 0.44. Std: 0.43
Exact Match score: Avg: 0.30. Std: 0.46
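
The F1 and Exact Match columns above are consistent with SQuAD-style token-overlap scoring. The notebook's own scoring helpers are defined in an earlier cell, so the sketch below is only an assumption about how they behave; the function names `normalize_answer`, `f1_score`, and `exact_match` are illustrative. It does reproduce the per-row values shown in the table, for example 0.666667 for "5 years" against "5".

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = str(text).lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction, ground_truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Rows taken from the table above: (LLM answer, reference answer)
for llm_answer, reference in [("5 years", "5"),
                              ("battle of kulm", "kulm"),
                              ("white hart lane", "white hart lane")]:
    print(llm_answer, "|", reference, "|",
          round(f1_score(llm_answer, reference), 6),
          exact_match(llm_answer, reference))
```

Under this scoring, a correct but more verbose answer such as "9 years" is penalized on both metrics, which is part of the motivation for also evaluating with RAGAS.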


In [34]:
# Add contexts for RAGAS
df['contexts'] = contexts

In [35]:
df

Unnamed: 0,question,answer,LLM answer,F1,Exact Match,contexts
0,What is Zeus know for in Greek mythology?,sky and thunder god,king of gods in greek mythology,0.0,0,"[Zeus (British English , North American Englis..."
1,How long had the First World War been over whe...,5,5 years,0.666667,0,[he became aide-de-camp to King Victor Emmanue...
2,How old was Messe when the First World War sta...,30,31,0.0,0,[World War I (often abbreviated as WWI or WW1)...
3,How long had Angela Scoular been acting profes...,2,14 years,0.0,0,[Angela Scoular appeared as Ruby Bartlett in O...
4,What is the capacity of the stadium where Brun...,26688,mcg has 100024 capacity,0.0,0,[Brunt returned to first-team action after eig...
5,Which stadium where Brunt played can hold more...,white hart lane,white hart lane,1.0,1,[White Hart Lane was a football stadium in Tot...
6,In which country was Wilhelm Müller born?,germany,germany,1.0,1,[Wilhelm Max Müller (15 May 1862 – July 1919) ...
7,Which battle Wilhelm Müller fought in while in...,kulm,battle of kulm,0.5,0,[In the Battle of Bautzen (20–21 May 1813) a c...
8,How much time had passed between Wilhelm's ret...,9,9 years,0.666667,0,[In 1814 he returned to his studies at Berlin....
9,Which of Wilhelm's direct male descendants liv...,friedrich max müller,wilhelm max müller 1919,0.571429,0,"[his grandson Wilhelm Max Müller, Wilhelm Max ..."


## Evaluation using RAGAS

In [37]:
# Data necessary for RAGAS evaluation
questions = df['question']
responses = df['LLM answer']
contexts = df['contexts']
ground_truths = df['answer']

In [39]:
# Code adapted from https://arize.com/blog/ragas-how-to-evaluate-rag-pipeline-phoenix
from datasets import Dataset

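def generate_ragas_dataset(questions, responses, contexts, ground_truths):
    """Pack the evaluation columns into a Hugging Face Dataset using the column names RAGAS expects."""
    dataset_dict = {
        "question": list(questions),          # questions from the IIRC test set
        "answer": list(responses),            # answers generated by Llama 3
        "contexts": list(contexts),           # retrieved passages (a list of strings per question)
        "ground_truth": list(ground_truths),  # reference answers from the IIRC test set
    }
    return Dataset.from_dict(dataset_dict)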

ragas_dataset = generate_ragas_dataset(questions, responses, contexts, ground_truths)
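
Before calling RAGAS it can help to check the column shapes. To the best of my knowledge, RAGAS 0.1.x expects `question`, `answer`, and `ground_truth` to be strings and `contexts` to be a list of strings per row. The helper below is a hypothetical pre-flight check under that assumption (`check_ragas_rows` is not part of RAGAS or of this notebook). It would flag, for instance, a reference answer stored as the integer `5` instead of the string `"5"`.

```python
def check_ragas_rows(questions, answers, contexts, ground_truths):
    """Return a list of problems; an empty list means the columns look consistent."""
    columns = {
        "question": list(questions),
        "answer": list(answers),
        "contexts": list(contexts),
        "ground_truth": list(ground_truths),
    }
    problems = []

    # All columns must describe the same number of examples.
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"column lengths differ: {lengths}")

    # Scalar text columns must hold strings.
    for name in ("question", "answer", "ground_truth"):
        for i, value in enumerate(columns[name]):
            if not isinstance(value, str):
                problems.append(f"{name}[{i}] is {type(value).__name__}, expected str")

    # Each contexts entry must be a list of strings, not one long string.
    for i, ctx in enumerate(columns["contexts"]):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] must be a list of strings")

    return problems


print(check_ragas_rows(["q1"], ["a1"], [["passage 1", "passage 2"]], ["gt1"]))
```

In this notebook the call would be `check_ragas_rows(questions, responses, contexts, ground_truths)`; casting with `df['answer'].astype(str)` is one way to resolve a non-string reference column.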

In [41]:
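from ragas import evaluate
# No `llm` or `embeddings` argument is passed, so RAGAS falls back to its
# default OpenAI models, which require OPENAI_API_KEY.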
evaluation_result = evaluate(
    dataset=ragas_dataset,
    metrics=[answer_relevancy, faithfulness, context_recall, context_precision],
)
df_ragas = pd.DataFrame(evaluation_result.scores)

ValidationError: 1 validation error for ChatOpenAI
__root__
  Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)
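
One way to run the evaluation without an OpenAI key is to hand `evaluate` a Groq-hosted Llama 3 judge and a local embedding model. The sketch below makes several assumptions: that the installed RAGAS version accepts LangChain objects through the `llm` and `embeddings` parameters (0.1.x does, as far as I know), that `langchain_community` is available for `HuggingFaceEmbeddings`, and that the Groq key is exported as `GROQ_API_KEY`. The helper names `missing_api_keys` and `evaluate_with_groq` are hypothetical, and the function has not been run against this dataset.

```python
import os


def missing_api_keys(required=("GROQ_API_KEY",), env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def evaluate_with_groq(ragas_dataset,
                       llm_model_name="llama3-70b-8192",
                       embedding_model_name="sentence-transformers/msmarco-MiniLM-L-6-v3"):
    """Run the four RAGAS metrics with Llama 3 on Groq as the judge (sketch)."""
    # Imported lazily so the key check above works without these packages.
    from langchain_groq import ChatGroq
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from ragas import evaluate
    from ragas.metrics import (answer_relevancy, faithfulness,
                               context_recall, context_precision)

    missing = missing_api_keys()
    if missing:
        raise RuntimeError(f"Set these environment variables first: {missing}")

    judge_llm = ChatGroq(model=llm_model_name, temperature=0)
    embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name)

    result = evaluate(
        dataset=ragas_dataset,
        metrics=[answer_relevancy, faithfulness, context_recall, context_precision],
        llm=judge_llm,          # replaces the default ChatOpenAI judge
        embeddings=embeddings,  # replaces the default OpenAI embeddings
    )
    return result.to_pandas()
```

With the key set, `df_ragas = evaluate_with_groq(ragas_dataset)` would stand in for the failing cell. The default model names mirror `LLM_MODEL_NAME` and `SEARCH_MODEL_NAME` from the Globals section; the embedding model is only used by `answer_relevancy`, and a general-purpose sentence embedding model may suit it better than a retrieval-tuned one.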