Argus: Automated Discovery of Test Oracles for Database Management Systems Using LLMs

Argus is a novel framework for automatically discovering and instantiating test oracles to find logic bugs in Database Management Systems (DBMSs) using Large Language Models (LLMs).

Overview

Since 2020, automated testing for DBMSs has flourished, uncovering hundreds of bugs in widely-used systems. However, while applying test oracles can be automated, their design remains fundamentally manual. Argus addresses this bottleneck by leveraging LLMs to automate the discovery and instantiation of test oracles.

Key Innovation: Constrained Abstract Query (CAQ)

Argus introduces the Constrained Abstract Query (CAQ) — a SQL skeleton containing placeholders and their associated instantiation conditions. The framework:

Discovers Test Oracles: Uses LLMs to generate pairs of CAQs that are asserted to be semantically equivalent
Ensures Soundness: Formally proves equivalence using SQL equivalence solvers (SQLSolver, QED, Polygon)
Scales Efficiently: Instantiates verified skeletons with concrete, reusable SQL snippets

Results

We evaluated Argus on five extensively tested DBMSs:

DBMS	Found	Confirmed	Fixed	Logic Bugs
Dolt	19	19	18	18
DuckDB	8	7	6	4
MySQL	8	6	0	8
PostgreSQL	1	1	1	1
TiDB	5	5	1	5
Total	41	36	26	36

Key Achievements

40 previously unknown bugs discovered, with 35 logic bugs
36 bugs confirmed and 26 already fixed by developers
Up to 1.19× improvement in code coverage vs. state-of-the-art
6.43× improvement in metamorphic coverage (logic bug detection metric)
3.33× more unique logic bugs than union of prior oracles (TLP, NoREC, EET, DQP)

Setup

Prerequisites

Python >= 3.9
Java = 17
Nix

Database Backends

SQL Equivalence Verifiers

SQLSolver - Linear Integer Arithmetic-based prover

Example Bugs Found

PostgreSQL: Incorrect JSON Function Handling in RIGHT JOIN

CREATE TABLE t(c INT);
INSERT INTO t VALUES (1);

-- Expected: {NULL}
SELECT sub.c FROM (
  SELECT json_array_length(json_array(3, 2, t.c)) AS c FROM t
) AS sub RIGHT JOIN t ON FALSE;
-- Actual: {2} ✗

MySQL: Wrong Result with MOD Function in LEFT JOIN

CREATE TABLE t(c0 INT);
INSERT INTO t VALUES (1);

-- Expected: {} (empty)
SELECT * FROM t LEFT JOIN (
  SELECT MOD(5, 2) AS c0 FROM t
) AS t2 ON FALSE
WHERE t2.c0 IS NOT NULL;
-- Actual: {1} ✗

Reproduction Guide

Environment Setup

Step 1: Install Dependencies

# Install Java 17 or higher
# Ubuntu/Debian: sudo apt install openjdk-17-jdk
# macOS: brew install openjdk@17

# Install Python dependencies
pip install -r requirements.txt

Step 2: Configure API Keys

Create an api-keys.yml file in the root directory with your LLM API configuration:

your_model_name:
  key: "your-api-key-here"
  completion_url: "https://api.openai.com/v1"
  api_version: "2023-05-15"  # For Azure OpenAI only

Step 3: Build docker image for verification

docker build -f docker/Dockerfile -t sqlequal .

Step 4: Generate Expression Snippets (Optional)

Generate SQL expression snippets for your target database engine:

# Supported engines: dolt, duckdb, mysql, postgresql, tidb
python -m expressions.<database_engine>_expressions_generator
python -m expressions.<database_engine>_expressions_crossover

Example:

python -m expressions.duckdb_expressions_generator
python -m expressions.duckdb_expressions_crossover

Running the Bug Discovery Pipeline

Execute the complete Argus pipeline for automated bug discovery:

python -m evaluate.sqlancer --database_engine <database_engine> --model <model_name>

Example:

python -m evaluate.sqlancer --database_engine duckdb --model o4-mini

License

This project is research software. Please refer to the paper for implementation details.

Bug Reports

All 40 bugs discovered by Argus have been reported to the respective DBMS developers, with 36 confirmed and 26 already fixed. Detailed information about each bug, including reproduction steps, confirmation status, and fix status, can be found in:

Argus_Buglist.csv - Comprehensive bug database with metadata and status tracking

The bug database includes categorization by bug type (logic bugs, crash bugs, etc.), affected DBMS versions, and developer response tracking.

Name		Name	Last commit message	Last commit date
Latest commit History 341 Commits
MetamorphicCoverageArtifact		MetamorphicCoverageArtifact
data		data
databases		databases
docker		docker
docs		docs
evaluate		evaluate
experiment		experiment
expressions		expressions
models		models
provers		provers
results		results
scripts		scripts
testing		testing
vtable		vtable
.dockerignore		.dockerignore
.env		.env
.gitignore		.gitignore
Argus_Buglist.csv		Argus_Buglist.csv
LICENSE		LICENSE
README.md		README.md
config.yml		config.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Argus: Automated Discovery of Test Oracles for Database Management Systems Using LLMs

Overview

Key Innovation: Constrained Abstract Query (CAQ)

Results

Key Achievements

Setup

Prerequisites

Database Backends

SQL Equivalence Verifiers

Example Bugs Found

PostgreSQL: Incorrect JSON Function Handling in RIGHT JOIN

MySQL: Wrong Result with MOD Function in LEFT JOIN

Reproduction Guide

Environment Setup

Step 1: Install Dependencies

Step 2: Configure API Keys

Step 3: Build docker image for verification

Step 4: Generate Expression Snippets (Optional)

Running the Bug Discovery Pipeline

License

Bug Reports

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Argus: Automated Discovery of Test Oracles for Database Management Systems Using LLMs

Overview

Key Innovation: Constrained Abstract Query (CAQ)

Results

Key Achievements

Setup

Prerequisites

Database Backends

SQL Equivalence Verifiers

Example Bugs Found

PostgreSQL: Incorrect JSON Function Handling in RIGHT JOIN

MySQL: Wrong Result with MOD Function in LEFT JOIN

Reproduction Guide

Environment Setup

Step 1: Install Dependencies

Step 2: Configure API Keys

Step 3: Build docker image for verification

Step 4: Generate Expression Snippets (Optional)

Running the Bug Discovery Pipeline

License

Bug Reports

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages