AI agent for the GAIA benchmark, built on the Hugging Face Inference API (Llama-3.1-70B) with automatic fallback to Groq (Llama-3.3-70B) and multi-tool capabilities: web search, file processing, and mathematical calculation.
Live Demo: huggingface.co/spaces/hasancoded/gaia-agent
- Answer GAIA benchmark questions using Hugging Face Inference API (Llama-3.1-70B)
- Automatic fallback to Groq API (Llama-3.3-70B) for reliability
- Web search integration via Tavily API
- File reading and processing (Excel, CSV, text files)
- Mathematical calculations
- Gradio web interface for testing and submission generation
- Detailed reasoning traces for transparency
- JSONL submission file generation
- Python 3.8+
- Hugging Face Inference API - Primary LLM inference (Llama-3.1-70B)
- Groq - Fallback LLM inference (Llama-3.3-70B)
- Gradio - Web interface
- Tavily - Web search
- Pandas - Data processing
- Requests - HTTP client
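A minimal `requirements.txt` matching this stack might look like the following (the package list is illustrative, inferred from the stack above; the repository's own file is authoritative):

```text
gradio
groq
huggingface_hub
pandas
python-dotenv
requests
tavily-python
```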
The system follows a modular architecture with clear separation between presentation, orchestration, and service layers.
```mermaid
flowchart LR
    %% Client Layer
    UI["Gradio Interface<br/><i>app.py</i>"]

    %% Orchestration Layer
    Agent["GAIA Agent<br/><i>agent.py</i>"]
    Client["API Client<br/><i>gaia_client.py</i>"]

    %% Tool Layer
    Search["Web Search<br/><i>Tavily</i>"]
    FileReader["File Reader<br/><i>Excel/CSV/Text</i>"]
    Calculator["Calculator<br/><i>Math Eval</i>"]

    %% External Services
    HF["HF Inference API<br/><i>Llama-3.1-70B</i>"]
    Groq["Groq API<br/><i>Llama-3.3-70B</i>"]
    GAIA["GAIA Benchmark<br/><i>Questions & Eval</i>"]
    Tavily["Tavily API<br/><i>Search Engine</i>"]

    %% Primary Flow
    UI -->|User Query| Agent
    Agent -->|LLM Request| HF
    HF -->|Response| Agent
    Agent -.->|Fallback on Error| Groq
    Groq -.->|Response| Agent
    Agent -->|Answer| UI

    %% Tool Orchestration
    Agent -.->|Invoke| Search
    Agent -.->|Invoke| FileReader
    Agent -.->|Invoke| Calculator

    %% Tool-Service Connections
    Search -->|Query| Tavily
    Tavily -->|Results| Search
    FileReader -->|Download| GAIA
    GAIA -->|File Data| FileReader

    %% API Client Flow
    UI -->|Fetch/Submit| Client
    Client <-->|HTTP| GAIA

    %% Styling
    classDef clientStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000,font-weight:bold
    classDef orchestrationStyle fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000,font-weight:bold
    classDef toolStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef externalStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000

    class UI clientStyle
    class Agent,Client orchestrationStyle
    class Search,FileReader,Calculator toolStyle
    class HF,Groq,GAIA,Tavily externalStyle
```
- Gradio interface for user interaction, testing, and submission generation
- Handles UI rendering, form inputs, and file downloads
- GAIA Agent (`agent.py`): Core reasoning engine that coordinates tool usage and generates answers
- GAIA API Client (`gaia_client.py`): Manages communication with the GAIA benchmark API
- Web Search (`tools.py`): Tavily-powered search for real-time information retrieval
- File Reader (`tools.py`): Downloads and processes files (Excel, CSV, text) from the GAIA API
- Calculator (`tools.py`): Safe mathematical expression evaluation
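"Safe" evaluation usually means parsing the expression instead of calling `eval()`. The following is a minimal sketch of that idea using Python's `ast` module with a whitelist of arithmetic operators; it is illustrative and not the repository's actual `CalculatorTool` implementation:

```python
# Sketch of safe math evaluation: parse with ast, allow only whitelisted
# arithmetic operators, and reject anything else (names, calls, attributes).
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression):
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Disallowed expression node: %s" % type(node).__name__)
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("2 + 3 * 4"))  # 14
```

Unlike `eval()`, any attempt to reference names, call functions, or access attributes raises `ValueError` instead of executing.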
- Hugging Face Inference API: Primary LLM inference using Llama-3.1-70B
- Groq API: Fallback LLM inference using Llama-3.3-70B
- GAIA Benchmark API: Question retrieval and answer submission
- Tavily Search API: Web search capabilities
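The primary/fallback arrangement between the two LLM providers reduces to a try/except around the primary call. The sketch below uses stand-in functions (`query_hf` and `query_groq` are illustrative stubs, not the repository's real API calls):

```python
# Minimal sketch of the HF-primary / Groq-fallback pattern.
# The two query functions are stand-ins for the real provider calls.

def query_hf(prompt):
    """Stand-in for a Hugging Face Inference API call (may raise on error)."""
    raise RuntimeError("HF endpoint unavailable")  # simulate an outage

def query_groq(prompt):
    """Stand-in for a Groq chat-completion call."""
    return "groq-answer: " + prompt

def generate(prompt):
    try:
        return query_hf(prompt)    # primary: Llama-3.1-70B on HF
    except Exception:
        return query_groq(prompt)  # fallback: Llama-3.3-70B on Groq

print(generate("What is GAIA?"))  # falls back to the Groq stand-in
```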
- User submits query via Gradio interface
- Agent analyzes question and determines required tools
- Tools fetch external data (web search, files)
- Agent sends context to HF Inference API
- LLM generates reasoning and answer
- Response returned to user with full reasoning trace
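The six steps above can be sketched as a simple orchestration loop. All tool and LLM calls here are stubs, and the heuristics and function names are illustrative only:

```python
# Illustrative orchestration loop for the data flow above, with stub tools.

def needs_search(question):
    return "current" in question.lower()  # naive heuristic for the sketch

def web_search(query):
    return "stub search results"          # stands in for the Tavily tool

def call_llm(prompt):
    return "ANSWER based on: " + prompt[:40]  # stands in for HF/Groq

def answer_question(question):
    trace = ["Question: " + question]         # step 1: receive query
    context = ""
    if needs_search(question):                # step 2: choose tools
        context = web_search(question)        # step 3: fetch external data
        trace.append("Search context: " + context)
    prompt = (context + "\n" + question).strip()
    answer = call_llm(prompt)                 # steps 4-5: LLM reasoning
    trace.append(answer)
    return answer, trace                      # step 6: answer + trace

answer, trace = answer_question("What is the current UN membership count?")
```

Returning the trace alongside the answer is what makes the reasoning visible in the Gradio interface.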
- Python 3.8 or higher
- pip package manager
- Hugging Face API token
- Groq API key (optional, for fallback)
- Tavily API key
- GAIA API access
```bash
git clone https://github.com/hasancoded/gaia-agent.git
cd gaia-agent

python -m venv venv

# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

pip install -r requirements.txt
cp .env.example .env
```

Edit `.env` and add your API credentials:
```env
HF_API_TOKEN=your_huggingface_token
GROQ_API_KEY=your_groq_key        # Optional: for automatic fallback
TAVILY_API_KEY=your_tavily_key
GAIA_API_URL=https://agents-course-unit4-scoring.hf.space
```

Get API Keys:
- HF_API_TOKEN: Hugging Face Settings
- GROQ_API_KEY: Groq Console (optional)
- TAVILY_API_KEY: Tavily Dashboard
```bash
python app.py
```

Access the interface at http://localhost:7860
- Navigate to the Test Agent tab
- Click "Test on Random Question"
- Review answer and reasoning trace
- Navigate to the Generate Submission tab
- Click "Generate Submission File"
- Download the generated `.jsonl` file
- Submit to the GAIA Leaderboard
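For reference, a JSONL submission file can be produced with the standard library alone. The `task_id`/`model_answer` field names below follow the common GAIA submission convention, but they are an assumption here and should be checked against the leaderboard's requirements:

```python
import json

# Each line of the .jsonl file is one JSON object for one answered task.
# Field names follow the usual GAIA convention; verify against the leaderboard.
answers = [
    {"task_id": "task-001", "model_answer": "42"},
    {"task_id": "task-002", "model_answer": "Paris"},
]

with open("submission.jsonl", "w", encoding="utf-8") as f:
    for record in answers:
        f.write(json.dumps(record) + "\n")
```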
```text
gaia-agent/
├── agent.py           # Core GAIA agent implementation
├── app.py             # Gradio web interface
├── gaia_client.py     # GAIA API client
├── tools.py           # Agent tools (search, file reader, calculator)
├── requirements.txt   # Python dependencies
├── .env.example       # Environment template
├── .gitignore         # Git ignore rules
├── LICENSE            # MIT License
└── README.md          # This file
```
Main agent class for answering GAIA benchmark questions.
```python
from agent import GAIAAgent

agent = GAIAAgent(tools={
    "search": search_tool,
    "file_reader": file_reader_tool,
    "calculator": calculator_tool
})

answer, reasoning = agent.answer_question(question_text, task_id)
```

WebSearchTool: Tavily-powered web search
```python
from tools import WebSearchTool

search_tool = WebSearchTool(api_key=tavily_key)
results = search_tool.search(query)
```

FileReaderTool: Download and process files
```python
from tools import FileReaderTool

file_tool = FileReaderTool(api_url=gaia_url)
content = file_tool.read_file(task_id)
```

CalculatorTool: Safe mathematical calculations
```python
from tools import CalculatorTool

calc_tool = CalculatorTool()
result = calc_tool.calculate(expression)
```

| Variable | Required | Description | Get It |
|---|---|---|---|
| `HF_API_TOKEN` | Yes | Hugging Face API token | Get Token |
| `GROQ_API_KEY` | No | Groq API key (fallback) | Get Key |
| `TAVILY_API_KEY` | Yes | Tavily search API key | Get Key |
| `GAIA_API_URL` | Yes | GAIA benchmark API URL | Provided by organizers |
Edit `agent.py` to change the models:

```python
# Current configuration:
self.model_name = "meta-llama/Llama-3.1-70B-Instruct"  # Primary (HF)
self.groq_model = "llama-3.3-70b-versatile"            # Fallback (Groq)

# Other available HF models:
# - moonshotai/Kimi-K2-Instruct-0905 (excellent reasoning)
# - Qwen/Qwen2.5-72B-Instruct (complex tasks)
# - meta-llama/Llama-3.1-8B-Instruct (smaller, faster)
```

Ensure the `.env` file exists and contains valid tokens, and restart the application after editing `.env`.
Verify virtual environment is activated and dependencies are installed:
```bash
pip install -r requirements.txt
```

Check your internet connection and verify that the API URLs are accessible.
Contributions are welcome. Please follow these guidelines:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- GAIA Benchmark team
- Hugging Face for Inference API
- Tavily for web search
- Gradio for web interface
For issues related to:
- GAIA Benchmark: Contact GAIA organizers
- Hugging Face API: Check HF documentation
- Tavily API: Visit Tavily docs