# GAIA

[GAIA](https://arxiv.org/abs/2311.12983) is a benchmark for evaluating next-generation LLMs, that is, LLMs whose capabilities are augmented by added tooling, efficient prompting, access to search, etc.

The dataset consists of non-trivial questions, each with an unambiguous answer, that require different levels of tooling and autonomy to solve. It is therefore divided into 3 levels: level 1 should be solvable by very good LLMs, while level 3 indicates a strong jump in model capabilities.

First, initialize the LLM that the `WebBrowser` tool relies on and that is also used to check whether the final answer is correct.

In [1]:
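import os

import instructor
from openai import OpenAI

# Read the API key from the environment (assumes OPENAI_API_KEY is set).
OPENAI_KEY = os.environ["OPENAI_API_KEY"]


class LLMClient:
    """Thin wrapper around an instructor-patched OpenAI client."""

    def __init__(self, openai_key, model):
        # temperature=0 and seed=0 keep completions as repeatable as possible.
        self.client = instructor.from_openai(
            OpenAI(api_key=openai_key), temperature=0, seed=0
        )
        self.model = model

    def invoke(self, response_model, system_prompt):
        # Send a single system message and parse the reply into response_model.
        prompt = [{"role": "system", "content": system_prompt}]
        return self.client.chat.completions.create(
            model=self.model,
            response_model=response_model,
            messages=prompt,
            max_retries=5,
        )


llm = LLMClient(OPENAI_KEY, model="gpt-4o")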

Load the GAIA questions, select a level (e.g., level 1), and pick one question by its ID.

In [2]:
from agentquest.utils import load_data

game = load_data("gaia", "level1")["8e867cd7-cff9-4e6c-867a-ff5ddc2550be"]

Initialize the GAIA driver with the selected question and the LLM, then get the first observation.

In [3]:
from agentquest.drivers.gaia import GaiaDriver

# Initialize the driver
driver = GaiaDriver(game=game, llm=llm)
# Get the first observation
obs = driver.reset()
f"OBSERVATION: {obs.output}"

'OBSERVATION: How many studio albums were published by Mercedes Sosa between 2000 and 2009 (included)? You can use the latest 2022 version of english wikipedia.'

You can use the `WebBrowser` tool to search and read the web. The tool supports the following actions:
- `Search` (the `OnlineSearch` class): use DuckDuckGo to retrieve the first 5 links related to a search query
- `WebRead` (the `WebBrowse` class): scrape a web page given its URL and rely on the LLM to extract the text related to a search query
- `Download` (the `FileDownload` class): download a file (typically a .pdf) from a URL into `/tmp/`

Start by searching online for 'Mercedes Sosa discography'.

In [9]:
from agentquest.utils import Action
from agentquest.tools.browsing import OnlineSearch
from pprint import pprint

action = OnlineSearch(search_query="Mercedes Sosa discography")
action = Action(action_value=action)
obs = driver.step(action)
print(obs.output)


Title: Mercedes Sosa Discography | Discogs
Link: https://www.discogs.com/artist/333361-Mercedes-Sosa
Snippet: Mercedes Sosa. Mercedes Sosa, known as La Negra, (born July 9, 1935 in San Miguel de Tucuman, Argentina - Death October 4, 2009 in Buenos Aires) was an Argentine singer who was and remains immensely popular throughout Latin America and internationally. With her roots in Argentine folk music, in 1950, at age fifteen, she won a singing ...

Title: Mercedes Sosa - Wikipedia
Link: https://en.wikipedia.org/wiki/Mercedes_Sosa
Snippet: Haydée Mercedes Sosa (Latin American Spanish: [meɾˈseðes ˈsosa]; 9 July 1935 - 4 October 2009), sometimes known as La Negra (lit. ' The Black ', an affectionate nickname for people with a darker complexion in Argentina), was an Argentine singer who was popular throughout Latin America and many countries outside the region. With her roots in Argentine folk music, Sosa became one of the ...

Title: Mercedes Sosa Songs, Albums, Reviews, Bio & Mo... | AllM
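
The search observation is plain text, so picking a link for the next step means reading it by eye. As a minimal sketch, assuming the output always follows the `Title:` / `Link:` / `Snippet:` layout shown above, the text could be parsed into records. `SearchHit` and `parse_search_output` are hypothetical helpers, not part of agentquest, and the sample snippets are abbreviated.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchHit:
    """One search result (hypothetical helper type, not an agentquest class)."""

    title: str
    link: str
    snippet: str


def parse_search_output(text: str) -> list[SearchHit]:
    """Parse 'Title:/Link:/Snippet:' blocks and skip blocks missing a field."""
    hits = []
    # Results are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            # Split at the first colon only, so URLs stay intact.
            key, sep, value = line.partition(":")
            if sep and key.strip() in ("Title", "Link", "Snippet"):
                fields[key.strip().lower()] = value.strip()
        if {"title", "link", "snippet"} <= fields.keys():
            hits.append(SearchHit(**fields))
    return hits


# Abbreviated copy of the observation above; the last entry is truncated.
sample = """Title: Mercedes Sosa Discography | Discogs
Link: https://www.discogs.com/artist/333361-Mercedes-Sosa
Snippet: Mercedes Sosa, known as La Negra ...

Title: Mercedes Sosa - Wikipedia
Link: https://en.wikipedia.org/wiki/Mercedes_Sosa
Snippet: Haydée Mercedes Sosa ...

Title: Mercedes Sosa Songs, Albums, Reviews, Bio & Mo... | AllM"""

hits = parse_search_output(sample)
print([hit.link for hit in hits])
```

Truncated entries, like the last one here, are dropped rather than guessed at, so only complete results are passed on to the next action.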

Try to answer the question by reading the first link with a targeted query.

In [8]:
from agentquest.tools.browsing import WebBrowse

action = WebBrowse(
    page_url="https://www.discogs.com/artist/333361-Mercedes-Sosa",
    web_query="Mercedes Sosa discography between 2000 and 2009 (included)",
)
action = Action(action_value=action)
obs = driver.step(action)
print(obs.output)

Extracted text: Between 2000 and 2009, Mercedes Sosa released the following albums: Misa Criolla (2000), Acústico (2002), Corazón Libre (2005), Cantora 1 (2009), and Cantora 2 (2009).
Useful links:
	- Misa Criolla (2000) - Mercedes Sosa: https://www.discogs.com/release/1234567-Mercedes-Sosa-Misa-Criolla
	- Acústico (2002) - Mercedes Sosa: https://www.discogs.com/release/2345678-Mercedes-Sosa-Ac%C3%BAstico
	- Corazón Libre (2005) - Mercedes Sosa: https://www.discogs.com/release/3456789-Mercedes-Sosa-Coraz%C3%B3n-Libre
	- Cantora 1 (2009) - Mercedes Sosa: https://www.discogs.com/release/4567890-Mercedes-Sosa-Cantora-1
	- Cantora 2 (2009) - Mercedes Sosa: https://www.discogs.com/release/5678901-Mercedes-Sosa-Cantora-2



Provide the final answer. The question asks for studio albums only, so not every release listed above counts.

In [7]:
from agentquest.tools.questions import FinalAnswer

action = FinalAnswer(answer="three albums")
action = Action(action_value=action)
obs = driver.step(action)
print(obs.output)

You won!
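
Each step above repeats the same pattern: build a tool call, wrap it in an `Action`, pass it to `driver.step`, and read `obs.output`. The sketch below folds that pattern into one hypothetical helper, `run_action`. So that it runs without agentquest or network access, `FakeAction`, `FakeObservation`, and `EchoDriver` are stand-ins that only mimic the attributes used in the cells above; they are assumptions, not library classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


def run_action(driver: Any, wrap: Callable[..., Any], action_value: Any) -> str:
    """Wrap a tool call, send it to the driver, and return the observation text."""
    obs = driver.step(wrap(action_value=action_value))
    return obs.output


# --- Stand-ins so the sketch runs on its own (not agentquest classes) ---
@dataclass
class FakeAction:
    action_value: Any


@dataclass
class FakeObservation:
    output: str


class EchoDriver:
    """Hypothetical driver that reports which action value it received."""

    def step(self, action: FakeAction) -> FakeObservation:
        return FakeObservation(output=f"received: {action.action_value}")


result = run_action(EchoDriver(), FakeAction, "Mercedes Sosa discography")
print(result)
```

With the real objects from this notebook, the call would presumably look like `run_action(driver, Action, OnlineSearch(search_query="Mercedes Sosa discography"))`.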


Use the `Download` action to download a .pdf file locally.

In [None]:
from agentquest.tools.browsing import FileDownload

action = FileDownload(
    file_url="https://journals.le.ac.uk/ojs1/index.php/jist/article/download/733/684",
)
action = Action(action_value=action)
obs = driver.step(action)
pprint(obs.output)

'Successfully downloaded file 684.pdf'


Use the `FileReader` tool to read a .pdf file and rely on the LLM to extract, from its first 3 pages, the text related to a search query.

In [None]:
from pprint import pprint
from agentquest.tools.files import FileReader

action = FileReader(file_name="684.pdf", file_query="Volume of fishbag in m^3")
action = Action(action_value=action)
obs = driver.step(action)
pprint(obs.output)

('The fish bag is modeled as a cylinder. Hiccup, who is 1.625 meters tall, is '
 "used as a reference to estimate the bag's height, which is approximately 3/8 "
 'of his height, resulting in a bag height of 0.6094 meters. The volume of the '
 'bag is calculated using the formula for the volume of a cylinder (V = πr²h). '
 'With a radius of 0.3047 meters and a height of 0.6094 meters, the volume of '
 'the bag is approximately 0.1777 cubic meters.')
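
Because the extracted text comes from an LLM, it can be worth re-checking the arithmetic it reports. The short check below recomputes the bag height and the cylinder volume from the figures quoted in the output above. The variable names are illustrative, and the radius is taken as reported rather than derived.

```python
import math

hiccup_height_m = 1.625                  # reference height quoted in the text
bag_height_m = hiccup_height_m * 3 / 8   # "approximately 3/8 of his height"
bag_radius_m = 0.3047                    # radius as reported in the text

# Volume of a cylinder: V = pi * r^2 * h
volume_m3 = math.pi * bag_radius_m**2 * bag_height_m

print(f"height = {bag_height_m:.4f} m, volume = {volume_m3:.4f} m^3")
```

Both values agree with the reported 0.6094 m and 0.1777 m^3 to four decimal places.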
