A CLI tool to compare SQL files using tree edit distance. It parses SQL statements into token trees and computes structural similarity using the APTED algorithm.
sql-similarity analyzes SQL queries by:
- Parsing SQL files using sqlparse
- Computing tree edit distance between token trees using APTED
- Returning a normalized similarity score (0.0 to 1.0) and detailed edit operations
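The following is a small, standard-library-only sketch of the pipeline above. It does not use APTED. Instead, it uses the classic rightmost-root recursion for ordered tree edit distance. Under unit costs this computes the same quantity, but far more slowly, so it only suits toy trees. Several details are assumptions for illustration:
- the `(label, children)` tuple shape;
- the names `forest_edit` and `compare`;
- the operation strings;
- the normalization `1 - d / (n1 + n2)`.

The README does not say how sql-similarity builds its token trees or normalizes its score.

```python
from functools import lru_cache

# Hypothetical tree shape: (label, children), where children is a tuple of
# trees. sql-similarity's real token trees come from sqlparse; this stand-in
# only lets the example run with the standard library alone.


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_edit(f, g):
    """Unit-cost ordered tree edit distance plus one optimal edit script.

    Classic rightmost-root recursion on forests. This is NOT APTED (which
    computes the same distance far more efficiently); it is a short
    reference formulation that is only practical for small trees.
    """
    if not f and not g:
        return 0, ()
    candidates = []
    if f and g:
        (v, v_kids), (w, w_kids) = f[-1], g[-1]
        # Map v to w: recurse on their child forests and on the remainders.
        kid_cost, kid_ops = forest_edit(v_kids, w_kids)
        rest_cost, rest_ops = forest_edit(f[:-1], g[:-1])
        op = f"match {v}" if v == w else f"rename {v} -> {w}"
        cost = kid_cost + rest_cost + (0 if v == w else 1)
        candidates.append((cost, rest_ops + kid_ops + (op,)))
    if f:
        v, v_kids = f[-1]
        # Delete v: its children take its place among f's roots.
        cost, ops = forest_edit(f[:-1] + v_kids, g)
        candidates.append((cost + 1, ops + (f"delete {v}",)))
    if g:
        w, w_kids = g[-1]
        # Insert w: the mirror image of deletion.
        cost, ops = forest_edit(f, g[:-1] + w_kids)
        candidates.append((cost + 1, ops + (f"insert {w}",)))
    # Ties keep the first candidate, so matches/renames win over delete+insert.
    return min(candidates, key=lambda c: c[0])


def compare(t1, t2):
    """Return (distance, similarity, ops) for two trees.

    Assumed normalization: 1 - d / (n1 + n2). "match" entries cost nothing
    and are listed only to show which nodes were kept.
    """
    d, ops = forest_edit((t1,), (t2,))
    return d, 1 - d / (size((t1,)) + size((t2,))), ops


if __name__ == "__main__":
    select_a = ("select", (("col_a", ()), ("tbl", ())))
    select_b = ("select", (("col_b", ()), ("tbl", ())))
    print(compare(select_a, select_b))
    # (1, 0.8333333333333334, ('rename col_a -> col_b', 'match tbl', 'match select'))
```

Deleting every node of one tree and inserting every node of the other always costs `n1 + n2`. Dividing by that sum therefore keeps this assumed score within 0.0 to 1.0. Dividing by the larger tree's size alone would not: a 5-node chain and a 5-node star are already 6 edits apart.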
The tool supports two modes:
- Pair mode: Compare two SQL files directly
- Batch mode: Compare all SQL files in a directory against each other
Because sqlparse is a non-validating SQL parser, the tool should work with most SQL dialects.
No installation is required; run the tool directly with uvx:
Compare two SQL files:
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity file1.sql file2.sql
Output includes:
- Edit distance (number of tree operations)
- Similarity score (0.0-1.0)
- List of edit operations (insert, delete, rename, match)
Compare all .sql files in a directory:
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/sql/directory
This compares every pair of SQL files and outputs the results sorted by similarity.
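The following sketch shows one way batch mode's all-pairs comparison could work; for n files it performs n(n-1)/2 comparisons. Several details are hypothetical:
- `rank_pairs` and its parameters;
- the order in which the distance filter and the top-N cut are applied;
- the inclusive `<=` distance check.

`max_distance` and `top` only mirror the `--max-distance` and `--top` options documented below.

```python
from itertools import combinations


def rank_pairs(items, compare, max_distance=None, top=None):
    """Compare every unordered pair and sort by similarity, highest first.

    items: mapping of file name -> parsed tree (hypothetical input shape).
    compare: callable returning (distance, similarity) for two trees.
    Filtering by max_distance before cutting to the top N is an assumption.
    """
    results = []
    for name_a, name_b in combinations(sorted(items), 2):
        distance, similarity = compare(items[name_a], items[name_b])
        if max_distance is None or distance <= max_distance:
            results.append((name_a, name_b, distance, similarity))
    # Stable sort: pairs with equal similarity keep their name order.
    results.sort(key=lambda r: r[3], reverse=True)
    return results if top is None else results[:top]


if __name__ == "__main__":
    # Toy stand-in for a real tree comparison: distance = length difference.
    def length_gap(x, y):
        d = abs(len(x) - len(y))
        return d, 1 / (1 + d)

    sql = {"a.sql": "SELECT 1", "b.sql": "SELECT 1", "c.sql": "SELECT 22"}
    for row in rank_pairs(sql, length_gap, top=2):
        print(row)
```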
JSON output:
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity file1.sql file2.sql --json
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --json
CSV output (batch mode only):
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --csv
Limit results by maximum distance:
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --max-distance 10
Show only the top N most similar pairs:
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --top 5
Print the version:
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity --version
Development setup:
git clone https://github.com/myshmeh/sql-similarity-py.git
cd sql-similarity-py
uv sync --dev
Run tests:
uv run pytest
Requirements:
- Python 3.11+
License: MIT