Skip to content

joyemang33/Argus

Repository files navigation

Argus: Automated Discovery of Test Oracles for Database Management Systems Using LLMs

Argus is a novel framework for automatically discovering and instantiating test oracles to find logic bugs in Database Management Systems (DBMSs) using Large Language Models (LLMs).

Overview

Since 2020, automated testing for DBMSs has flourished, uncovering hundreds of bugs in widely-used systems. However, while applying test oracles can be automated, their design remains fundamentally manual. Argus addresses this bottleneck by leveraging LLMs to automate the discovery and instantiation of test oracles.

Key Innovation: Constrained Abstract Query (CAQ)

Argus introduces the Constrained Abstract Query (CAQ) — a SQL skeleton containing placeholders and their associated instantiation conditions. The framework:

  1. Discovers Test Oracles: Uses LLMs to generate pairs of CAQs that are asserted to be semantically equivalent
  2. Ensures Soundness: Formally proves equivalence using SQL equivalence solvers (SQLSolver, QED, Polygon)
  3. Scales Efficiently: Instantiates verified skeletons with concrete, reusable SQL snippets

Results

We evaluated Argus on five extensively tested DBMSs:

DBMS Found Confirmed Fixed Logic Bugs
Dolt 19 19 18 18
DuckDB 8 7 6 4
MySQL 8 6 0 8
PostgreSQL 1 1 1 1
TiDB 5 5 1 5
Total 41 36 26 36

Key Achievements

  • 40 previously unknown bugs discovered, with 35 logic bugs
  • 36 bugs confirmed and 26 already fixed by developers
  • Up to 1.19× improvement in code coverage vs. state-of-the-art
  • 6.43× improvement in metamorphic coverage (logic bug detection metric)
  • 3.33× more unique logic bugs than union of prior oracles (TLP, NoREC, EET, DQP)

Setup

Prerequisites

  • Python >= 3.9
  • Java = 17
  • Nix

Database Backends

SQL Equivalence Verifiers

  • SQLSolver - Linear Integer Arithmetic-based prover

Example Bugs Found

PostgreSQL: Incorrect JSON Function Handling in RIGHT JOIN

CREATE TABLE t(c INT);
INSERT INTO t VALUES (1);

-- Expected: {NULL}
SELECT sub.c FROM (
  SELECT json_array_length(json_array(3, 2, t.c)) AS c FROM t
) AS sub RIGHT JOIN t ON FALSE;
-- Actual: {2} ✗

MySQL: Wrong Result with MOD Function in LEFT JOIN

CREATE TABLE t(c0 INT);
INSERT INTO t VALUES (1);

-- Expected: {} (empty)
SELECT * FROM t LEFT JOIN (
  SELECT MOD(5, 2) AS c0 FROM t
) AS t2 ON FALSE
WHERE t2.c0 IS NOT NULL;
-- Actual: {1} ✗

Reproduction Guide

Environment Setup

Step 1: Install Dependencies

# Install Java 17 or higher
# Ubuntu/Debian: sudo apt install openjdk-17-jdk
# macOS: brew install openjdk@17

# Install Python dependencies
pip install -r requirements.txt

Step 2: Configure API Keys

Create an api-keys.yml file in the root directory with your LLM API configuration:

your_model_name:
  key: "your-api-key-here"
  completion_url: "https://api.openai.com/v1"
  api_version: "2023-05-15"  # For Azure OpenAI only

Step 3: Build docker image for verification

docker build -f docker/Dockerfile -t sqlequal .

Step 4: Generate Expression Snippets (Optional)

Generate SQL expression snippets for your target database engine:

# Supported engines: dolt, duckdb, mysql, postgresql, tidb
python -m expressions.<database_engine>_expressions_generator
python -m expressions.<database_engine>_expressions_crossover

Example:

python -m expressions.duckdb_expressions_generator
python -m expressions.duckdb_expressions_crossover

Running the Bug Discovery Pipeline

Execute the complete Argus pipeline for automated bug discovery:

python -m evaluate.sqlancer --database_engine <database_engine> --model <model_name>

Example:

python -m evaluate.sqlancer --database_engine duckdb --model o4-mini

License

This project is research software. Please refer to the paper for implementation details.

Bug Reports

All 40 bugs discovered by Argus have been reported to the respective DBMS developers, with 36 confirmed and 26 already fixed. Detailed information about each bug, including reproduction steps, confirmation status, and fix status, can be found in:

The bug database includes categorization by bug type (logic bugs, crash bugs, etc.), affected DBMS versions, and developer response tracking.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors