GAIA Agent

HF Space Python 3.8+ License: MIT

An AI agent for the GAIA benchmark that uses the Hugging Face Inference API (Llama-3.1-70B) with automatic fallback to the Groq API (Llama-3.3-70B), plus multi-tool capabilities: web search, file processing, and mathematical calculations.

Live Demo: huggingface.co/spaces/hasancoded/gaia-agent

Features

  • Answer GAIA benchmark questions using Hugging Face Inference API (Llama-3.1-70B)
  • Automatic fallback to Groq API (Llama-3.3-70B) for reliability
  • Web search integration via Tavily API
  • File reading and processing (Excel, CSV, text files)
  • Mathematical calculations
  • Gradio web interface for testing and submission generation
  • Detailed reasoning traces for transparency
  • JSONL submission file generation
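
In outline, the automatic fallback works like this (a minimal sketch; `call_hf` and `call_groq` are illustrative stand-ins for the real client calls in agent.py):

```python
def answer_with_fallback(prompt, call_hf, call_groq):
    """Try the primary HF endpoint first; fall back to Groq on any error.

    Returns (answer, backend) so the caller can log which API served
    the request. `call_hf` and `call_groq` are illustrative callables,
    not the repository's actual API.
    """
    try:
        return call_hf(prompt), "hf"
    except Exception:
        # Any HF failure (rate limit, timeout, 5xx) triggers the fallback.
        return call_groq(prompt), "groq"
```

This is why `GROQ_API_KEY` is optional: without it, errors from the primary endpoint simply propagate instead of being retried.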

Tech Stack

  • Python 3.8+
  • Gradio (web interface)
  • Hugging Face Inference API (primary LLM)
  • Groq API (fallback LLM)
  • Tavily API (web search)

Architecture

The system follows a modular architecture with clear separation between presentation, orchestration, and service layers.

```mermaid
flowchart LR
    %% Client Layer
    UI["Gradio Interface<br/><i>app.py</i>"]

    %% Orchestration Layer
    Agent["GAIA Agent<br/><i>agent.py</i>"]
    Client["API Client<br/><i>gaia_client.py</i>"]

    %% Tool Layer
    Search["Web Search<br/><i>Tavily</i>"]
    FileReader["File Reader<br/><i>Excel/CSV/Text</i>"]
    Calculator["Calculator<br/><i>Math Eval</i>"]

    %% External Services
    HF["HF Inference API<br/><i>Llama-3.1-70B</i>"]
    Groq["Groq API<br/><i>Llama-3.3-70B</i>"]
    GAIA["GAIA Benchmark<br/><i>Questions & Eval</i>"]
    Tavily["Tavily API<br/><i>Search Engine</i>"]

    %% Primary Flow
    UI -->|User Query| Agent
    Agent -->|LLM Request| HF
    HF -->|Response| Agent
    Agent -.->|Fallback on Error| Groq
    Groq -.->|Response| Agent
    Agent -->|Answer| UI

    %% Tool Orchestration
    Agent -.->|Invoke| Search
    Agent -.->|Invoke| FileReader
    Agent -.->|Invoke| Calculator

    %% Tool-Service Connections
    Search -->|Query| Tavily
    Tavily -->|Results| Search

    FileReader -->|Download| GAIA
    GAIA -->|File Data| FileReader

    %% API Client Flow
    UI -->|Fetch/Submit| Client
    Client <-->|HTTP| GAIA

    %% Styling
    classDef clientStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000,font-weight:bold
    classDef orchestrationStyle fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000,font-weight:bold
    classDef toolStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef externalStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000

    class UI clientStyle
    class Agent,Client orchestrationStyle
    class Search,FileReader,Calculator toolStyle
    class HF,Groq,GAIA,Tavily externalStyle
```

Component Overview

Client Layer

  • Gradio interface for user interaction, testing, and submission generation
  • Handles UI rendering, form inputs, and file downloads

Orchestration Layer

  • GAIA Agent (agent.py): Core reasoning engine that coordinates tool usage and generates answers
  • GAIA API Client (gaia_client.py): Manages communication with the GAIA benchmark API

Tool Layer

  • Web Search (tools.py): Tavily-powered search for real-time information retrieval
  • File Reader (tools.py): Downloads and processes files (Excel, CSV, text) from GAIA API
  • Calculator (tools.py): Safe mathematical expression evaluation

External Services

  • Hugging Face Inference API: primary LLM backend (Llama-3.1-70B)
  • Groq API: fallback LLM backend (Llama-3.3-70B)
  • GAIA Benchmark API: question fetching, file downloads, and submission scoring
  • Tavily API: search engine behind the web search tool

Data Flow

  1. User submits query via Gradio interface
  2. Agent analyzes question and determines required tools
  3. Tools fetch external data (web search, files)
  4. Agent sends context to HF Inference API
  5. LLM generates reasoning and answer
  6. Response returned to user with full reasoning trace
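
The steps above can be sketched as a gather-then-ask loop. The keyword-based tool selection here is deliberately naive and purely illustrative; it is not the repository's actual logic:

```python
def run_agent(question, tools, llm):
    """Minimal sketch of the data flow: pick tools, gather context, ask the LLM.

    `tools` maps tool names to callables; `llm` takes a prompt string and
    returns the model's answer. Both are illustrative stand-ins.
    """
    context = []
    # Steps 2-3: decide which tools to invoke and collect their output.
    if "file" in question.lower() and "file_reader" in tools:
        context.append(tools["file_reader"](question))
    if any(w in question.lower() for w in ("who", "what", "when")) and "search" in tools:
        context.append(tools["search"](question))
    # Steps 4-6: send the question plus gathered context to the LLM.
    prompt = "\n".join(context + [f"Question: {question}"])
    return llm(prompt)
```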

Prerequisites

  • Python 3.8 or later
  • Git
  • A Hugging Face API token and a Tavily API key (a Groq API key is optional, for fallback)

Installation

1. Clone Repository

```bash
git clone https://github.com/hasancoded/gaia-agent.git
cd gaia-agent
```

2. Create Virtual Environment

```bash
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate
```

3. Install Dependencies

```bash
pip install -r requirements.txt
```

4. Configure Environment

```bash
cp .env.example .env
```

Edit .env and add your API credentials:

```bash
HF_API_TOKEN=your_huggingface_token
GROQ_API_KEY=your_groq_key  # Optional: for automatic fallback
TAVILY_API_KEY=your_tavily_key
GAIA_API_URL=https://agents-course-unit4-scoring.hf.space
```

Get API Keys:

  • Hugging Face: huggingface.co/settings/tokens
  • Groq: console.groq.com
  • Tavily: app.tavily.com

Usage

Start Application

```bash
python app.py
```

Access the interface at http://localhost:7860

Test Agent

  1. Navigate to the Test Agent tab
  2. Click "Test on Random Question"
  3. Review answer and reasoning trace

Generate Submission

  1. Navigate to the Generate Submission tab
  2. Click "Generate Submission File"
  3. Download the generated .jsonl file
  4. Submit to GAIA Leaderboard
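
Generating the `.jsonl` file boils down to writing one JSON object per line. A sketch, assuming the `task_id`/`model_answer` field names commonly used for GAIA leaderboard submissions (verify against the current submission schema):

```python
import json

def write_submission(answers, path):
    """Write a JSONL submission file: one JSON object per line.

    `answers` maps task IDs to final answer strings. The field names
    `task_id` and `model_answer` are an assumption about the expected
    schema, not taken from this repository's code.
    """
    with open(path, "w", encoding="utf-8") as f:
        for task_id, answer in answers.items():
            f.write(json.dumps({"task_id": task_id, "model_answer": answer}) + "\n")
```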

Project Structure

gaia-agent/
├── agent.py            # Core GAIA agent implementation
├── app.py              # Gradio web interface
├── gaia_client.py      # GAIA API client
├── tools.py            # Agent tools (search, file reader, calculator)
├── requirements.txt    # Python dependencies
├── .env.example        # Environment template
├── .gitignore          # Git ignore rules
├── LICENSE             # MIT License
└── README.md           # This file

API Reference

GAIAAgent

Main agent class for answering GAIA benchmark questions.

```python
from agent import GAIAAgent

agent = GAIAAgent(tools={
    "search": search_tool,
    "file_reader": file_reader_tool,
    "calculator": calculator_tool
})

answer, reasoning = agent.answer_question(question_text, task_id)
```

Tools

WebSearchTool: Tavily-powered web search

```python
from tools import WebSearchTool
search_tool = WebSearchTool(api_key=tavily_key)
results = search_tool.search(query)
```

FileReaderTool: Download and process files

```python
from tools import FileReaderTool
file_tool = FileReaderTool(api_url=gaia_url)
content = file_tool.read_file(task_id)
```
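
A plausible sketch of the extension-based dispatch inside such a file reader (`parse_file` is an illustrative name; the actual tools.py may differ):

```python
import csv
import io

def parse_file(name, data):
    """Turn downloaded bytes into text the agent can reason over.

    Tabular formats are parsed; everything else is treated as plain text.
    Illustrative only -- not the repository's actual implementation.
    """
    if name.endswith(".csv"):
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
        return "\n".join(", ".join(row) for row in rows)
    if name.endswith((".xlsx", ".xls")):
        # Excel requires a third-party reader (e.g. pandas + openpyxl).
        import pandas as pd
        return pd.read_excel(io.BytesIO(data)).to_string()
    return data.decode("utf-8", errors="replace")
```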

CalculatorTool: Safe mathematical calculations

```python
from tools import CalculatorTool
calc_tool = CalculatorTool()
result = calc_tool.calculate(expression)
```
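
A common way to make expression evaluation "safe" is to walk the parsed AST and allow only whitelisted arithmetic operators, never calling `eval()`. A sketch of the idea (not necessarily how tools.py implements it):

```python
import ast
import operator

# Whitelisted operators; any other node type is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression):
    """Evaluate an arithmetic expression by walking its AST.

    Function calls, attribute access, names, etc. all raise ValueError,
    so expressions like "__import__('os')" are rejected outright.
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))
```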

Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `HF_API_TOKEN` | Yes | Hugging Face API token |
| `GROQ_API_KEY` | No | Groq API key (fallback) |
| `TAVILY_API_KEY` | Yes | Tavily search API key |
| `GAIA_API_URL` | Yes | GAIA benchmark API URL (provided by organizers) |
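
A small sketch of startup validation for these variables (illustrative; the app itself may handle missing keys differently):

```python
import os

REQUIRED = ("HF_API_TOKEN", "TAVILY_API_KEY", "GAIA_API_URL")
OPTIONAL = ("GROQ_API_KEY",)  # only needed for the Groq fallback

def check_config(env=os.environ):
    """Return the names of required variables that are missing or empty.

    An empty return value means the configuration is complete.
    """
    return [name for name in REQUIRED if not env.get(name)]
```

Calling `check_config()` right after loading `.env` gives an immediate, readable error instead of a failure deep inside an API call.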

Model Selection

Edit agent.py to change the models:

```python
# Current configuration:
self.model_name = "meta-llama/Llama-3.1-70B-Instruct"  # Primary (HF)
self.groq_model = "llama-3.3-70b-versatile"            # Fallback (Groq)

# Other available HF models:
# - moonshotai/Kimi-K2-Instruct-0905 (excellent reasoning)
# - Qwen/Qwen2.5-72B-Instruct (complex tasks)
# - meta-llama/Llama-3.1-8B-Instruct (smaller, faster)
```

Troubleshooting

API Token Not Found

Ensure .env file exists and contains valid tokens. Restart the application after editing .env.

Module Import Errors

Verify virtual environment is activated and dependencies are installed:

```bash
pip install -r requirements.txt
```

Connection Errors

Check internet connection and verify API URLs are accessible.

Contributing

Contributions are welcome. Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For bugs, questions, or feature requests, open an issue on the GitHub repository.

About

An AI agent achieving 100% accuracy on the GAIA benchmark. Features multi-model LLM orchestration with automatic Groq fallback, tool integration (web search, file processing, mathematical computation), and a modular API architecture. Built with Python, the Hugging Face Inference API, and Gradio.
