AI agent for the GAIA benchmark, built on the Hugging Face Inference API (Llama-3.1-70B) with automatic fallback to Groq (Llama-3.3-70B) and multi-tool capabilities: web search, file processing, and mathematical calculation.
Live Demo: huggingface.co/spaces/hasancoded/gaia-agent
- Answer GAIA benchmark questions using Hugging Face Inference API (Llama-3.1-70B)
- Automatic fallback to Groq API (Llama-3.3-70B) for reliability
- Web search integration via Tavily API
- File reading and processing (Excel, CSV, text files)
- Mathematical calculations
- Gradio web interface for testing and submission generation
- Detailed reasoning traces for transparency
- JSONL submission file generation
- Python 3.8+
- Hugging Face Inference API - Primary LLM inference (Llama-3.1-70B)
- Groq - Fallback LLM inference (Llama-3.3-70B)
- Gradio - Web interface
- Tavily - Web search
- Pandas - Data processing
- Requests - HTTP client
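A minimal `requirements.txt` matching this stack might look like the following (the package list is illustrative, inferred from the stack above; the repository's own file is authoritative):

```text
gradio
groq
huggingface_hub
pandas
python-dotenv
requests
tavily-python
```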
The system follows a modular architecture with clear separation between presentation, orchestration, and service layers.
```mermaid
flowchart LR
    %% Client Layer
    UI["Gradio Interface<br/><i>app.py</i>"]

    %% Orchestration Layer
    Agent["GAIA Agent<br/><i>agent.py</i>"]
    Client["API Client<br/><i>gaia_client.py</i>"]

    %% Tool Layer
    Search["Web Search<br/><i>Tavily</i>"]
    FileReader["File Reader<br/><i>Excel/CSV/Text</i>"]
    Calculator["Calculator<br/><i>Math Eval</i>"]

    %% External Services
    HF["HF Inference API<br/><i>Llama-3.1-70B</i>"]
    Groq["Groq API<br/><i>Llama-3.3-70B</i>"]
    GAIA["GAIA Benchmark<br/><i>Questions & Eval</i>"]
    Tavily["Tavily API<br/><i>Search Engine</i>"]

    %% Primary Flow
    UI -->|User Query| Agent
    Agent -->|LLM Request| HF
    HF -->|Response| Agent
    Agent -.->|Fallback on Error| Groq
    Groq -.->|Response| Agent
    Agent -->|Answer| UI

    %% Tool Orchestration
    Agent -.->|Invoke| Search
    Agent -.->|Invoke| FileReader
    Agent -.->|Invoke| Calculator

    %% Tool-Service Connections
    Search -->|Query| Tavily
    Tavily -->|Results| Search
    FileReader -->|Download| GAIA
    GAIA -->|File Data| FileReader

    %% API Client Flow
    UI -->|Fetch/Submit| Client
    Client <-->|HTTP| GAIA

    %% Styling
    classDef clientStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000,font-weight:bold
    classDef orchestrationStyle fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000,font-weight:bold
    classDef toolStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef externalStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000

    class UI clientStyle
    class Agent,Client orchestrationStyle
    class Search,FileReader,Calculator toolStyle
    class HF,Groq,GAIA,Tavily externalStyle
```
- Gradio interface for user interaction, testing, and submission generation
- Handles UI rendering, form inputs, and file downloads
- GAIA Agent (`agent.py`): Core reasoning engine that coordinates tool usage and generates answers
- GAIA API Client (`gaia_client.py`): Manages communication with the GAIA benchmark API
- Web Search (`tools.py`): Tavily-powered search for real-time information retrieval
- File Reader (`tools.py`): Downloads and processes files (Excel, CSV, text) from the GAIA API
- Calculator (`tools.py`): Safe mathematical expression evaluation
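"Safe" evaluation usually means parsing the expression instead of calling `eval()`. The following is a minimal sketch of that idea using Python's `ast` module with a whitelist of arithmetic operators; it is illustrative and not the repository's actual `CalculatorTool` implementation:

```python
# Sketch of safe math evaluation: parse with ast, allow only whitelisted
# arithmetic operators, and reject anything else (names, calls, attributes).
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression):
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Disallowed expression node: %s" % type(node).__name__)
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("2 + 3 * 4"))  # 14
```

Unlike `eval()`, any attempt to reference names, call functions, or access attributes raises `ValueError` instead of executing.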
- Hugging Face Inference API: Primary LLM inference using Llama-3.1-70B
- Groq API: Fallback LLM inference using Llama-3.3-70B
- GAIA Benchmark API: Question retrieval and answer submission
- Tavily Search API: Web search capabilities
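The primary/fallback arrangement between the two LLM providers reduces to a try/except around the primary call. The sketch below uses stand-in functions (`query_hf` and `query_groq` are illustrative stubs, not the repository's real API calls):

```python
# Minimal sketch of the HF-primary / Groq-fallback pattern.
# The two query functions are stand-ins for the real provider calls.

def query_hf(prompt):
    """Stand-in for a Hugging Face Inference API call (may raise on error)."""
    raise RuntimeError("HF endpoint unavailable")  # simulate an outage

def query_groq(prompt):
    """Stand-in for a Groq chat-completion call."""
    return "groq-answer: " + prompt

def generate(prompt):
    try:
        return query_hf(prompt)    # primary: Llama-3.1-70B on HF
    except Exception:
        return query_groq(prompt)  # fallback: Llama-3.3-70B on Groq

print(generate("What is GAIA?"))  # falls back to the Groq stand-in
```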
- User submits query via Gradio interface
- Agent analyzes question and determines required tools
- Tools fetch external data (web search, files)
- Agent sends context to HF Inference API
- LLM generates reasoning and answer
- Response returned to user with full reasoning trace
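The six steps above can be sketched as a simple orchestration loop. All tool and LLM calls here are stubs, and the heuristics and function names are illustrative only:

```python
# Illustrative orchestration loop for the data flow above, with stub tools.

def needs_search(question):
    return "current" in question.lower()  # naive heuristic for the sketch

def web_search(query):
    return "stub search results"          # stands in for the Tavily tool

def call_llm(prompt):
    return "ANSWER based on: " + prompt[:40]  # stands in for HF/Groq

def answer_question(question):
    trace = ["Question: " + question]         # step 1: receive query
    context = ""
    if needs_search(question):                # step 2: choose tools
        context = web_search(question)        # step 3: fetch external data
        trace.append("Search context: " + context)
    prompt = (context + "\n" + question).strip()
    answer = call_llm(prompt)                 # steps 4-5: LLM reasoning
    trace.append(answer)
    return answer, trace                      # step 6: answer + trace

answer, trace = answer_question("What is the current UN membership count?")
```

Returning the trace alongside the answer is what makes the reasoning visible in the Gradio interface.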
- Python 3.8 or higher
- pip package manager
- Hugging Face API token
- Groq API key (optional, for fallback)
- Tavily API key
- GAIA API access
```bash
git clone https://github.com/hasancoded/gaia-agent.git
cd gaia-agent

python -m venv venv

# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

pip install -r requirements.txt
cp .env.example .env
```

Edit `.env` and add your API credentials:
```env
HF_API_TOKEN=your_huggingface_token
GROQ_API_KEY=your_groq_key        # Optional: for automatic fallback
TAVILY_API_KEY=your_tavily_key
GAIA_API_URL=https://agents-course-unit4-scoring.hf.space
```

Get API Keys:
- HF_API_TOKEN: Hugging Face Settings
- GROQ_API_KEY: Groq Console (optional)
- TAVILY_API_KEY: Tavily Dashboard
```bash
python app.py
```

Access the interface at http://localhost:7860
- Navigate to the Test Agent tab
- Click "Test on Random Question"
- Review answer and reasoning trace
- Navigate to the Generate Submission tab
- Click "Generate Submission File"
- Download the generated `.jsonl` file
- Submit to the GAIA Leaderboard
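For reference, a JSONL submission file can be produced with the standard library alone. The `task_id`/`model_answer` field names below follow the common GAIA submission convention, but they are an assumption here and should be checked against the leaderboard's requirements:

```python
import json

# Each line of the .jsonl file is one JSON object for one answered task.
# Field names follow the usual GAIA convention; verify against the leaderboard.
answers = [
    {"task_id": "task-001", "model_answer": "42"},
    {"task_id": "task-002", "model_answer": "Paris"},
]

with open("submission.jsonl", "w", encoding="utf-8") as f:
    for record in answers:
        f.write(json.dumps(record) + "\n")
```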
```text
gaia-agent/
├── agent.py           # Core GAIA agent implementation
├── app.py             # Gradio web interface
├── gaia_client.py     # GAIA API client
├── tools.py           # Agent tools (search, file reader, calculator)
├── requirements.txt   # Python dependencies
├── .env.example       # Environment template
├── .gitignore         # Git ignore rules
├── LICENSE            # MIT License
└── README.md          # This file
```
Main agent class for answering GAIA benchmark questions.
```python
from agent import GAIAAgent

agent = GAIAAgent(tools={
    "search": search_tool,
    "file_reader": file_reader_tool,
    "calculator": calculator_tool
})

answer, reasoning = agent.answer_question(question_text, task_id)
```

WebSearchTool: Tavily-powered web search
```python
from tools import WebSearchTool

search_tool = WebSearchTool(api_key=tavily_key)
results = search_tool.search(query)
```

FileReaderTool: Download and process files
```python
from tools import FileReaderTool

file_tool = FileReaderTool(api_url=gaia_url)
content = file_tool.read_file(task_id)
```

CalculatorTool: Safe mathematical calculations
```python
from tools import CalculatorTool

calc_tool = CalculatorTool()
result = calc_tool.calculate(expression)
```

| Variable | Required | Description | Get It |
|---|---|---|---|
| `HF_API_TOKEN` | Yes | Hugging Face API token | Get Token |
| `GROQ_API_KEY` | No | Groq API key (fallback) | Get Key |
| `TAVILY_API_KEY` | Yes | Tavily search API key | Get Key |
| `GAIA_API_URL` | Yes | GAIA benchmark API URL | Provided by organizers |
Edit `agent.py` to change the models:

```python
# Current configuration:
self.model_name = "meta-llama/Llama-3.1-70B-Instruct"  # Primary (HF)
self.groq_model = "llama-3.3-70b-versatile"            # Fallback (Groq)

# Other available HF models:
# - moonshotai/Kimi-K2-Instruct-0905 (excellent reasoning)
# - Qwen/Qwen2.5-72B-Instruct (complex tasks)
# - meta-llama/Llama-3.1-8B-Instruct (smaller, faster)
```

Ensure the `.env` file exists and contains valid tokens, and restart the application after editing `.env`.
Verify virtual environment is activated and dependencies are installed:
```bash
pip install -r requirements.txt
```

Check your internet connection and verify that the API URLs are accessible.
Contributions are welcome. Please follow these guidelines:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- GAIA Benchmark team
- Hugging Face for Inference API
- Tavily for web search
- Gradio for web interface
For issues related to:
- GAIA Benchmark: Contact GAIA organizers
- Hugging Face API: Check HF documentation
- Tavily API: Visit Tavily docs