khang3004/AgentSQL-Asym

🤖 AgentSQL: Asymmetric Multi-Agent Text-to-SQL

License: MIT | Python 3.11+ | Framework: LangGraph | Benchmark: BIRD-SQL

AgentSQL is a production-grade, asymmetric multi-agent framework designed to resolve the Text-to-SQL dilemma: balancing high Execution Accuracy (EX) with cost-efficiency.

By decoupling the high-volume Generation task from the more complex Correction/Reasoning task, AgentSQL achieves state-of-the-art results on the BIRD benchmark at a significantly lower inference cost than monolithic frontier-model approaches.


🏗️ Architecture: Asymmetric MasterPipeline

AgentSQL uses an Asymmetric Multi-Agent Architecture (MasterPipeline). The workflow strictly isolates offline pre-processing from online inference, so each step can use a specialized model and an optimized token budget.

[Figure: AgentSQL architecture workflow]

Tip

High-Quality Diagram: A professional TikZ version of this workflow is available in agentsql_workflow.tex, suitable for academic publications and high-resolution reports.

Pipeline Phases

  1. Phase 1: CHESS Pruning (tools/chess_linker.py): Offline semantic filtering using lightweight embedding models (e.g., bge-small) to isolate only the most relevant tables and eliminate schema noise.
  2. Phase 2: MCI-SQL Enrichment (tools/mci_sql_pipeline.py): Extracts precise metadata (cardinalities, min/max values, exact row samples) from the pruned schema to build a high-fidelity context.
  3. Phase 4a/b: Generator & Reflector (tools/master_pipeline.py): The core generation loop. An optimized open-source model (e.g., gpt-oss-120b or llama-4-scout-17b) generates the SQL, which is immediately evaluated by a Reflector for logical self-consistency via back-translation.
  4. Phase 4c: Resilient Critic (nodes/corrector.py): Activated only if the Execution Sandbox detects a syntax error or the Reflector detects a logical mismatch. Powered by a high-reasoning model (e.g., gemini-2.5-flash), it performs targeted patching using the MAGIC checklist.
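
The online part of the phases above can be sketched as a short control flow: a cheap generator call, a reflector check, and a critic call that runs only on failure. This is a minimal illustration, not the code in tools/master_pipeline.py or nodes/corrector.py. The names `run_pipeline`, `try_execute`, and `PipelineResult` are hypothetical, the three model calls are passed in as plain callables, and the sandbox is assumed to be a SQLite connection.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    sql: str
    critic_used: bool
    error: Optional[str]  # execution error of the final SQL, if any


def try_execute(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Run the SQL in the sandbox; return an error message or None on success."""
    try:
        conn.execute(sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


def run_pipeline(
    question: str,
    context: str,
    conn: sqlite3.Connection,
    generate: Callable[[str, str], str],            # hypothetical generator call
    reflect: Callable[[str, str], bool],            # hypothetical reflector call
    critique: Callable[[str, str, str, str], str],  # hypothetical critic call
) -> PipelineResult:
    # Generator: the low-cost model drafts the SQL from the enriched context.
    sql = generate(question, context)

    # Sandbox check, then the reflector's consistency check (skipped on error).
    error = try_execute(conn, sql)
    if error is None and reflect(question, sql):
        return PipelineResult(sql=sql, critic_used=False, error=None)

    # Critic: invoked only on an execution error or a logical mismatch.
    feedback = error or "reflector reported a logical mismatch"
    patched = critique(question, context, sql, feedback)
    return PipelineResult(
        sql=patched, critic_used=True, error=try_execute(conn, patched)
    )
```

Because the critic sits behind the failure branch, a query that passes both checks costs two model calls, and the expensive model is paid for only when something goes wrong.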

✨ Key Features

  • 🛡️ Ephemeral Sandboxing: Native support for SQLite, MySQL, and PostgreSQL with automatic state reset and set-based result comparison.
  • 🔄 Round-Robin Key Rotation: The KeyRotator abstraction supports multiple API keys per provider to prevent rate-limiting during large-scale evaluations.
  • 🔌 Resilient LLM Factory: Automatic fallback to local Ollama instances if all cloud API keys are exhausted or unavailable.
  • 📊 Unified Research Suite: A centralized evaluation engine that calculates EX, VES, and Soft F1 metrics in a single pass.
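
Round-robin key rotation with a local fallback, as described in the list above, can be sketched in a few lines. This is an illustration under assumptions, not the project's KeyRotator or LLM factory. `RoundRobinKeys` and `call_with_fallback` are hypothetical names, `cloud_call` and `local_call` stand in for real provider clients, and `RuntimeError` stands in for whatever rate-limit or authentication error a provider raises.

```python
import itertools
import threading
from typing import Callable, Sequence


class RoundRobinKeys:
    """Hypothetical stand-in for a key rotator: hands out keys in a fixed cycle."""

    def __init__(self, keys: Sequence[str]):
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()  # keep rotation safe across worker threads

    def next_key(self) -> str:
        with self._lock:
            return next(self._cycle)


def call_with_fallback(
    rotator: RoundRobinKeys,
    attempts: int,
    cloud_call: Callable[[str, str], str],  # (api_key, prompt) -> completion
    local_call: Callable[[str], str],       # prompt -> completion, e.g. a local model
    prompt: str,
) -> str:
    """Try up to `attempts` keys in rotation, then fall back to the local model."""
    for _ in range(attempts):
        key = rotator.next_key()
        try:
            return cloud_call(key, prompt)
        except RuntimeError:  # placeholder for a provider-specific error type
            continue
    return local_call(prompt)
```

Setting `attempts` to the number of keys gives each key one try per request before the local model takes over.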

📈 Evaluation Metrics

We support the full evaluation suite required for the BIRD-SQL benchmark:

| Metric  | Definition             | Importance |
|---------|------------------------|------------|
| EX      | Execution Accuracy     | Measures whether the predicted SQL returns exactly the same data as the ground truth. |
| VES     | Valid Efficiency Score | Measures the runtime efficiency of the predicted SQL relative to the ground truth. |
| Soft F1 | Semantic F1 Score      | Measures partial correctness by comparing row-level data matches (Precision/Recall). |
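
To make the EX and Soft F1 definitions concrete, here is a simplified sketch that scores two result sets as sets of row tuples. It is neither the official BIRD scorer nor the code in research/. `execution_match` and `row_level_f1` are hypothetical names, and the F1 here matches whole rows only, which is coarser than a cell-level comparison. VES is left out because it depends on measured runtimes.

```python
from typing import Iterable, Tuple

Row = Tuple  # one result row, e.g. (1, "alice")


def execution_match(pred: Iterable[Row], gold: Iterable[Row]) -> bool:
    """EX-style check: result sets are equal, ignoring row order and duplicates."""
    return set(pred) == set(gold)


def row_level_f1(pred: Iterable[Row], gold: Iterable[Row]) -> float:
    """Simplified Soft F1: harmonic mean of row-level precision and recall."""
    pred_set, gold_set = set(pred), set(gold)
    if not pred_set and not gold_set:
        return 1.0  # both queries returned nothing: treat as a perfect match
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold_rows = [(1, "alice"), (2, "bob")]
pred_rows = [(2, "bob"), (1, "alice")]      # same rows, different order
partial_rows = [(1, "alice"), (3, "carol")]  # one correct row, one wrong row

print(execution_match(pred_rows, gold_rows))     # True
print(execution_match(partial_rows, gold_rows))  # False
print(row_level_f1(partial_rows, gold_rows))     # 0.5
```

The last line shows why a partial-credit metric is useful alongside EX: the half-correct prediction scores 0 under EX but 0.5 under the row-level F1.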

Note

Recent evaluations of the MasterPipeline on the BIRD Mini-Dev dataset demonstrate highly competitive Execution Accuracy (EX) while significantly reducing API costs compared to monolithic GPT-4/Claude-3 setups.


🚀 Quick Start

1. Environment Setup

Populate your .env file with multiple keys for high-concurrency evaluation:

cp .env.example .env
# Fill GEMINI_API_KEY_1, GEMINI_API_KEY_2, GROQ_API_KEY_1, etc.
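
For illustration, numbered keys like these could be gathered into a per-provider list as follows. This is a sketch, not the project's loader. `collect_keys` is a hypothetical helper, and it assumes the variables are already exported into the process environment and numbered contiguously from 1.

```python
import os
from typing import List, Mapping


def collect_keys(prefix: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Collect PREFIX_1, PREFIX_2, ... until the first missing or empty entry."""
    keys: List[str] = []
    index = 1
    while value := env.get(f"{prefix}_{index}"):
        keys.append(value)
        index += 1
    return keys


# Example with an explicit mapping instead of the real environment:
sample_env = {"GEMINI_API_KEY_1": "key-a", "GEMINI_API_KEY_2": "key-b"}
print(collect_keys("GEMINI_API_KEY", sample_env))  # ['key-a', 'key-b']
```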

2. Launch with Docker

The framework is fully containerized for reproducibility:

make build
make up
make shell

3. Run Evaluation

Execute the AgentSQL MasterPipeline on the Mini-Dev dataset:

make eval-master NUM_SAMPLES=20

📁 Project Structure

.
├── research/               # Unified evaluation suite & SOTA comparison
├── llm/src/text2sql_agent/ # Core Framework (LangGraph Nodes, Tools, State)
├── evaluation/             # Legacy baseline evaluation scripts
├── data_minidev/           # BIRD-SQL dataset and SQLite databases
├── Makefile                # High-level orchestration commands
└── docker-compose.yml      # Isolated execution environment

👥 Authors

Implemented with ❤️ by the HCMUS Underdogs team. Dedicated to scaling agentic AI workflows with rigor and resilience.

About

Production-grade Asymmetric Multi-Agent Text-to-SQL on BIRD-SQL. Offline CHESS/FAISS pruning + MCI-SQL enrichment feed ≤3 Groq API calls: gpt-oss-120b generator · llama-4-scout reflector · gpt-oss-20b critic. LangGraph · SQLAlchemy sandbox · LangSmith · Docker · MIT.
