Argus is a novel framework for automatically discovering and instantiating test oracles to find logic bugs in Database Management Systems (DBMSs) using Large Language Models (LLMs).
Since 2020, automated testing for DBMSs has flourished, uncovering hundreds of bugs in widely-used systems. However, while applying test oracles can be automated, their design remains fundamentally manual. Argus addresses this bottleneck by leveraging LLMs to automate the discovery and instantiation of test oracles.
Argus introduces the Constrained Abstract Query (CAQ) — a SQL skeleton containing placeholders and their associated instantiation conditions. The framework:
- Discovers Test Oracles: Uses LLMs to generate pairs of CAQs that are asserted to be semantically equivalent
- Ensures Soundness: Formally proves equivalence using SQL equivalence solvers (SQLSolver, QED, Polygon)
- Scales Efficiently: Instantiates verified skeletons with concrete, reusable SQL snippets
We evaluated Argus on five extensively tested DBMSs:
| DBMS | Found | Confirmed | Fixed | Logic Bugs |
|---|---|---|---|---|
| Dolt | 19 | 19 | 18 | 18 |
| DuckDB | 8 | 7 | 6 | 4 |
| MySQL | 8 | 6 | 0 | 8 |
| PostgreSQL | 1 | 1 | 1 | 1 |
| TiDB | 5 | 5 | 1 | 5 |
| Total | 41 | 36 | 26 | 36 |
- 40 previously unknown bugs discovered, with 35 logic bugs
- 36 bugs confirmed and 26 already fixed by developers
- Up to 1.19× improvement in code coverage vs. state-of-the-art
- 6.43× improvement in metamorphic coverage (logic bug detection metric)
- 3.33× more unique logic bugs than union of prior oracles (TLP, NoREC, EET, DQP)
- Python >= 3.9
- Java = 17
- Nix
- SQLSolver - Linear Integer Arithmetic-based prover
CREATE TABLE t(c INT);
INSERT INTO t VALUES (1);
-- Expected: {NULL}
SELECT sub.c FROM (
SELECT json_array_length(json_array(3, 2, t.c)) AS c FROM t
) AS sub RIGHT JOIN t ON FALSE;
-- Actual: {2} ✗CREATE TABLE t(c0 INT);
INSERT INTO t VALUES (1);
-- Expected: {} (empty)
SELECT * FROM t LEFT JOIN (
SELECT MOD(5, 2) AS c0 FROM t
) AS t2 ON FALSE
WHERE t2.c0 IS NOT NULL;
-- Actual: {1} ✗# Install Java 17 or higher
# Ubuntu/Debian: sudo apt install openjdk-17-jdk
# macOS: brew install openjdk@17
# Install Python dependencies
pip install -r requirements.txtCreate an api-keys.yml file in the root directory with your LLM API configuration:
your_model_name:
key: "your-api-key-here"
completion_url: "https://api.openai.com/v1"
api_version: "2023-05-15" # For Azure OpenAI onlydocker build -f docker/Dockerfile -t sqlequal .Generate SQL expression snippets for your target database engine:
# Supported engines: dolt, duckdb, mysql, postgresql, tidb
python -m expressions.<database_engine>_expressions_generator
python -m expressions.<database_engine>_expressions_crossoverExample:
python -m expressions.duckdb_expressions_generator
python -m expressions.duckdb_expressions_crossoverExecute the complete Argus pipeline for automated bug discovery:
python -m evaluate.sqlancer --database_engine <database_engine> --model <model_name>Example:
python -m evaluate.sqlancer --database_engine duckdb --model o4-miniThis project is research software. Please refer to the paper for implementation details.
All 40 bugs discovered by Argus have been reported to the respective DBMS developers, with 36 confirmed and 26 already fixed. Detailed information about each bug, including reproduction steps, confirmation status, and fix status, can be found in:
- Argus_Buglist.csv - Comprehensive bug database with metadata and status tracking
The bug database includes categorization by bug type (logic bugs, crash bugs, etc.), affected DBMS versions, and developer response tracking.