myshmeh/sql-similarity-py

sql-similarity

Experimental

A CLI tool to compare SQL files using tree edit distance. It parses SQL statements into token trees and computes structural similarity using the APTED algorithm.

Overview

sql-similarity analyzes SQL queries by:

  1. Parsing SQL files using sqlparse
  2. Computing tree edit distance between token trees using APTED
  3. Returning a normalized similarity score (0.0 to 1.0) and detailed edit operations
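
The third step can be illustrated with a small sketch. The README does not state how the score is normalized, so the formula below (one minus the distance divided by the combined node count of both trees) is an assumption. `Node`, `count_nodes`, and `similarity` are hypothetical names for illustration, not the tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one node of a parsed SQL token tree."""
    label: str
    children: list["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    """Return the number of nodes in the subtree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


def similarity(distance: int, size_a: int, size_b: int) -> float:
    """Map an edit distance to a score in [0.0, 1.0].

    Assumed normalization: with unit costs, deleting every node of one tree
    and inserting every node of the other costs size_a + size_b, so that sum
    is an upper bound on the distance.
    """
    total = size_a + size_b
    if total == 0:
        return 1.0  # two empty trees are treated as identical
    return max(0.0, 1.0 - distance / total)


# `SELECT a FROM t` and `SELECT b FROM t` as toy trees
left = Node("SELECT", [Node("a"), Node("FROM", [Node("t")])])
right = Node("SELECT", [Node("b"), Node("FROM", [Node("t")])])

# One rename (a -> b) turns `left` into `right`, so the distance is 1.
score = similarity(1, count_nodes(left), count_nodes(right))
print(f"{score:.3f}")  # prints 0.875
```

Under this assumed formula, identical trees score 1.0 and the score falls toward 0.0 as more nodes have to be inserted, deleted, or renamed.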

The tool supports two modes:

  • Pair mode: Compare two SQL files directly
  • Batch mode: Compare all SQL files in a directory against each other

Supported SQL Dialects

sqlparse is a non-validating SQL parser, so the tool should work with most SQL dialects.

Usage

No installation required - run directly with uvx:

Pair Mode

Compare two SQL files:

uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity file1.sql file2.sql

Output includes:

  • Edit distance (number of tree operations)
  • Similarity score (0.0-1.0)
  • List of edit operations (insert, delete, rename, match)
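
To make the four operation types concrete, here is a minimal sketch of how an edit mapping could be labeled. It assumes the mapping is a list of node-label pairs in which `None` marks a missing counterpart. `classify` is a hypothetical helper, and the tool's real output format may differ.

```python
from typing import Optional

Pair = tuple[Optional[str], Optional[str]]


def classify(pair: Pair) -> str:
    """Name the edit operation implied by one (source, target) pair."""
    source, target = pair
    if source is None and target is None:
        raise ValueError("a mapping pair needs at least one node")
    if source is None:
        return "insert"  # node exists only in the second tree
    if target is None:
        return "delete"  # node exists only in the first tree
    return "match" if source == target else "rename"


# Toy mapping between `SELECT a FROM t` and `SELECT b FROM t WHERE x`
mapping: list[Pair] = [
    ("SELECT", "SELECT"),
    ("a", "b"),
    ("FROM", "FROM"),
    ("t", "t"),
    (None, "WHERE"),
    (None, "x"),
]

operations = [classify(pair) for pair in mapping]
# With unit costs, every operation except "match" adds 1 to the distance.
distance = sum(op != "match" for op in operations)
print(operations, distance)
```

In this toy case the mapping yields three matches, one rename, and two inserts, for an edit distance of 3.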

Batch Mode

Compare all .sql files in a directory:

uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/sql/directory

This compares all pairs of SQL files and outputs results sorted by similarity.
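
As a rough illustration of what "all pairs" means, the sketch below enumerates every unordered pair of `.sql` files in a directory. It assumes a non-recursive scan, which the README does not confirm. `sql_file_pairs` and `pair_count` are hypothetical helpers, not functions exposed by the tool.

```python
import tempfile
from itertools import combinations
from pathlib import Path


def sql_file_pairs(directory: str) -> list[tuple[Path, Path]]:
    """Return every unordered pair of .sql files directly inside `directory`."""
    files = sorted(Path(directory).glob("*.sql"))  # sorted for stable output
    return list(combinations(files, 2))


def pair_count(n_files: int) -> int:
    """Number of comparisons needed for n files: n * (n - 1) / 2."""
    return n_files * (n_files - 1) // 2


# Demo on a throwaway directory; the non-SQL file is ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.sql", "b.sql", "c.sql", "notes.txt"):
        Path(tmp, name).write_text("SELECT 1;")
    names = [(a.name, b.name) for a, b in sql_file_pairs(tmp)]

print(names)           # three pairs: a-b, a-c, b-c
print(pair_count(10))  # prints 45
```

The number of comparisons grows quadratically with the number of files, which is worth keeping in mind for large directories.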

Output Formats

JSON output:

uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity file1.sql file2.sql --json
uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --json

CSV output (batch mode only):

uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --csv

Filtering Options (Batch Mode)

Limit results by maximum distance:

uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --max-distance 10

Show only the top N most similar pairs:

uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity /path/to/directory --top 5
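
The two filters can be pictured as a cutoff followed by a sort and a slice. The sketch below is an illustration only: the `PairResult` record and `filter_results` function are hypothetical, and it assumes the distance cutoff is inclusive and that "most similar" means highest similarity first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Hypothetical record for one compared pair; not the tool's schema."""
    file_a: str
    file_b: str
    distance: int
    similarity: float


def filter_results(results, max_distance=None, top=None):
    """Apply a distance cutoff, sort by similarity, then keep the first `top`."""
    kept = [r for r in results
            if max_distance is None or r.distance <= max_distance]
    kept.sort(key=lambda r: r.similarity, reverse=True)  # most similar first
    return kept if top is None else kept[:top]


# Made-up results for three files
results = [
    PairResult("a.sql", "b.sql", distance=2, similarity=0.95),
    PairResult("a.sql", "c.sql", distance=14, similarity=0.60),
    PairResult("b.sql", "c.sql", distance=9, similarity=0.75),
]

close = filter_results(results, max_distance=10)
best = filter_results(results, top=1)
print([(r.file_a, r.file_b) for r in close])
print([(r.file_a, r.file_b) for r in best])
```

With these made-up numbers, the cutoff of 10 drops the `a.sql`/`c.sql` pair, and a top-1 filter keeps only `a.sql`/`b.sql`.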

Version

uvx --from https://github.com/myshmeh/sql-similarity-py.git sql-similarity --version

Development

git clone https://github.com/myshmeh/sql-similarity-py.git
cd sql-similarity-py
uv sync --dev

Run tests:

uv run pytest

Requirements

  • Python 3.11+

License

MIT
