Squrve is a lightweight and modular framework designed to help beginners in Text-to-SQL research quickly get started. It enables one-click reproduction of multiple baseline methods and provides a unified evaluation pipeline, making it easy to compare different approaches under the same metrics.
We sincerely welcome students and researchers to try Squrve, contribute new components, or report any issues. We will promptly evaluate and integrate valuable suggestions, and we look forward to collaborating with the community to advance Text-to-SQL research.
Squrve delivers comprehensive benchmarking. It evaluates end-to-end performance using the unified Execution Accuracy (EX) framework while simultaneously assessing granular components: you can accurately test schema linking (Recall, Precision, and F1 via AmbiDB) and validate SQL optimizers based on execution feedback. Benchmarking your methods is instant, with no interface code changes required.
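To illustrate the schema-linking metrics mentioned above, the following is a minimal sketch of how Recall, Precision, and F1 are typically computed over predicted versus gold schema elements. The function name and inputs are our own illustration, not part of Squrve's API.

```python
# Illustrative computation of schema-linking metrics (Recall, Precision, F1).
# The helper below is a sketch, not Squrve's actual evaluation code.

def schema_linking_metrics(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Compute precision/recall/F1 over predicted vs. gold schema elements."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: the linker predicted three columns, two of which are correct
metrics = schema_linking_metrics(
    {"singer.name", "singer.age", "concert.year"},
    {"singer.name", "singer.age"},
)
print(metrics)  # precision ≈ 0.667, recall = 1.0, f1 = 0.8
```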
Jumpstart your Text-to-SQL journey without the friction. Squrve automates the tedious work of data formatting, documentation, and Chain-of-Thought construction. This allows you to immediately experiment with diverse methods, helping you master both fundamental paradigms and the latest frontier advancements with ease.
Innovate without limits. Whether building end-to-end solutions or modular workflows, Squrve lets you reuse state-of-the-art "Actor" components or combine them for scalable results. Focus purely on optimizing your algorithm’s logic; Squrve automatically handles the generalization of your method to any evaluation dataset or application scenario.
Deploy advanced methods to wide-ranging query scenarios effortlessly. Squrve requires zero code modifications—simply adjust configuration parameters to leverage top-tier open-source models. Built for real-world impact, Squrve empowers deep data analysis to drive intelligent business decisions.
Ensure your Python environment meets the following requirements:
- Python 3.11+
- Required dependencies (see requirements.txt)
```bash
# Clone the repository
git clone https://github.com/Satissss/Squrve.git
cd Squrve

# Install dependencies
pip install -r requirements.txt

# Unzip benchmarks.zip to the root directory
unzip benchmarks.zip -d .
```

If you encounter issues downloading benchmarks.zip, an alternative is available at the backup Hugging Face repository: https://huggingface.co/datasets/satissss/Squrve-Benchmarks
Edit the configuration file to add your API keys:
```json
{
  "api_key": {
    "qwen": "your_qwen_api_key",
    "deepseek": "your_deepseek_api_key",
    "zhipu": "your_zhipu_api_key"
  }
}
```

If using LinkAlign-related components, configure them according to https://github.com/Satissss/LinkAlign/blob/master/README.md.
```bash
# Run the Spider Dev dataset example
cd startup_run
python run.py
```

The same example can also be run programmatically:

```python
from core.base import Router
from core.engine import Engine

# Initialize with a configuration file
router = Router(config_path="startup_run/startup_config.json")
engine = Engine(router)

# Execute the task
engine.execute()

# Evaluate the results
engine.evaluate()
```

To tackle the diverse, co-occurring difficulties of real-world scenarios, Squrve abstracts and formalizes seven atomic Actor components, each representing a distinct Text-to-SQL capability validated in prior research. Through Squrve's multi-actor collaboration mechanism, different actors interact and cooperate, effectively fusing their complementary strengths.
| Actor Types | Methods | Key Challenges |
|---|---|---|
| Reduce | LinkAlign | Large-scale and multi-database |
| Parse | LinkAlign; RSL-SQL ... | Ambiguous queries and redundant schemas |
| Generate | DIN-SQL; CHESS ... | Efficient and high-quality SQL generation |
| Decompose | DIN-SQL; MAC-SQL | Chain-of-Thought for complex queries |
| Scale | CHESS; CHASE-SQL; OpenSearch ... | Diverse and high-quality decoding strategies |
| Optimize | CHASE-SQL; CHESS; OpenSearch ... | Effective and broader database feedback |
| Select | CHASE-SQL; CHESS; MCS-SQL ... | Accurate gold SQL identification |
- **Reduce**: Eliminates redundant schemas from large-scale databases that may exceed LLM context windows.
- **Parse**: Performs schema linking by extracting the tables and columns from candidate schemas that are potentially required for SQL generation.
- **Generate**: Generates complete SQL statements, encapsulating existing end-to-end academic methods.
- **Decompose**: Decomposes complex queries into multiple logically progressive sub-questions and generates a SQL statement for each.
- **Scale**: Generates diverse, high-quality SQL candidates to increase the probability of covering the gold SQL.
- **Optimize**: Leverages environmental feedback (e.g., database errors or execution results) to refine the quality of the generated SQL queries.
- **Select**: Selects the optimal SQL statement from multiple candidates, typically in collaboration with the Scale actor.
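The collaboration mechanism can be pictured as chaining actors behind a common interface, each reading and updating a shared state. The sketch below is purely illustrative: the `Actor` protocol, the `act()` method, and the toy actor classes are our own assumptions and do not reflect Squrve's real API.

```python
# A minimal sketch of multi-actor collaboration via a shared state dict.
# The Actor protocol, act() method, and classes below are hypothetical
# and do NOT reflect Squrve's actual implementation.
from typing import Protocol


class Actor(Protocol):
    def act(self, state: dict) -> dict: ...


class ReduceActor:
    def act(self, state: dict) -> dict:
        # Toy heuristic: keep only schema elements whose table name
        # appears in the question (a real Reduce actor is far richer)
        question = state["question"].lower()
        state["schema"] = [c for c in state["schema"] if c.split(".")[0] in question]
        return state


class GenerateActor:
    def act(self, state: dict) -> dict:
        # Stand-in for an LLM call that produces SQL from the reduced schema
        state["sql"] = f"SELECT {', '.join(state['schema'])} FROM ..."
        return state


def run_pipeline(actors: list[Actor], state: dict) -> dict:
    for actor in actors:
        state = actor.act(state)
    return state


state = run_pipeline(
    [ReduceActor(), GenerateActor()],
    {"question": "List every singer name", "schema": ["singer.name", "concert.year"]},
)
print(state["sql"])  # SELECT singer.name FROM ...
```

Keeping a single state contract between actors is what lets complementary components (e.g., a Reduce actor from one method and a Generate actor from another) be recombined freely.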
Squrve includes built-in support for several standard Text-to-SQL benchmarks for easy model evaluation and comparison:
| Benchmark | Supported Splits | Description | Code Link |
|---|---|---|---|
| Spider | dev \ test | Cross-domain Text-to-SQL benchmark with dev and test splits. | https://github.com/taoyds/spider |
| BIRD | dev | Text-to-SQL benchmark with external knowledge. | https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/bird |
| Spider2 | snow \ lite | Extended version of Spider with more complex scenarios. | https://github.com/xlang-ai/Spider2 |
| AmbiDB | N/A | A variant of Spider, enhancing ambiguity through multi-database settings. | https://huggingface.co/datasets/satissss/AmbiDB |
Squrve currently supports LLM services only through API calls:
- Qwen
- DeepSeek
- OpenAI
- Zhipu
- Gemini
- ...
The following are the database types currently supported by Squrve. We will continue to add more enterprise database systems as we extend our dataset support.
- SQLite
- BigQuery
- Snowflake
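SQLite support makes execution feedback easy to try locally. The following is a hedged sketch, using only Python's standard sqlite3 module, of executing a candidate SQL query and capturing either its result rows or the error message, the kind of signal an Optimize actor consumes. The helper name is our own and is not part of Squrve's API.

```python
# Executing candidate SQL against SQLite to obtain feedback for refinement.
# This helper is illustrative only; it is not part of Squrve's API.
import sqlite3


def execute_with_feedback(db_path: str, sql: str):
    """Return (rows, None) on success or (None, error_message) on failure."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)
    finally:
        conn.close()


# A valid query returns rows and no error
rows, err = execute_with_feedback(":memory:", "SELECT 1 + 1")
print(rows)  # [(2,)]

# A broken query returns the database's error message as feedback
rows, err = execute_with_feedback(":memory:", "SELECT * FROM missing_table")
print(err)  # no such table: missing_table
```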
Squrve successfully reproduces existing Text-to-SQL baselines under the same LLM backbone, with performance closely aligned to the originally reported results. Building upon the reproduced components, we further demonstrate the extensibility and effectiveness of the Multi-Actor collaboration mechanism through evaluation on two variants.
As shown in the table below, both ensemble methods substantially outperform all individual baselines on both benchmarks.

Complete API reference with detailed explanations of all configuration parameters and methods.
Usage guide and configuration examples for the Spider Dev dataset.
We welcome active participation in improving and building Squrve. Contribute optimized methods, new components, or task-specific datasets to Squrve, making them accessible to the broader Text-to-SQL research community.
Define the Actor class and provide concrete implementations.
Refer to the API documentation to ensure valid formatting.
If you find Squrve useful for your research or work, please consider citing it:
```bibtex
@article{wang2025squrve,
  title   = {Squrve: A Unified and Modular Framework for Complex Real-World Text-to-SQL Tasks},
  author  = {Wang, Yihan and Liu, Peiyu and Chen, Runyu and Pu, Jiaxing and Xu, Wei},
  journal = {arXiv preprint arXiv:2510.24102},
  year    = {2025},
  url     = {https://doi.org/10.48550/arXiv.2510.24102}
}
```