What Is SQLCheck?
sqlcheck automatically detects common SQL anti-patterns. Such anti-patterns often slow down queries. Addressing them will, therefore, help accelerate queries.
sqlcheck targets all major SQL dialects.
For development updates on
sqlcheck and general news on next generation database systems, follow me at @joy_arulraj.
What it can do
Right now SQLCheck can detect the following anti-patterns:
Logical Database Design Anti-Patterns
- Multi-Valued Attribute
- Recursive Dependency
- Primary Key Does Not Exist
- Generic Primary Key
- Foreign Key Does Not Exist
- Entity-Attribute-Value Pattern
- Metadata Tribbles
Physical Database Design Anti-Patterns
- Imprecise Data Type
- Values In Definition
- Files Are Not SQL Data Types
- Too Many Indexes
- Index Attribute Order
- SELECT *
- NULL Usage
- NOT NULL Usage
- String Concatenation
- GROUP BY Usage
- ORDER BY RAND Usage
- Pattern Matching Usage
- Spaghetti Query Alert
- Reduce Number of JOINs
- Eliminate Unnecessary DISTINCT Conditions
- Implicit Column Usage
- HAVING Clause Usage
- Nested sub queries
- OR Usage
- UNION Usage
- DISTINCT & JOIN Usage
Application Development Anti-Patterns
- Download the debian package from the latest release.
- Install it by running the following command.
dpkg -i sqlcheck-x86_64.deb
- Download the rpm package from the latest release.
- Install it by running the following command.
yum --nogpgcheck localinstall sqlcheck-x86_64.rpm
- Download the dmg package from the latest release.
Click the dmg to mount the package. This will mount it in the
Terminalapp. This page contains more details on finding the app.
Copy over the SQLCheck binary file to desired local directory.
cp /Volumes/sqlcheck-x86_64/bin/sqlcheck /usr/local/bin/
- Download the exe file from the latest release.
- Open a terminal (like
cmd.exe) and run the executable.
- Download the zip package from the latest release.
- Unzip it and find the SQLCheck binary (
SQLCheck has the following software dependencies:
- g++ 4.9+
- cmake (Cmake installation guide)
First, clone the repository (with --recursive option).
git clone --recursive https://github.com/jarulraj/sqlcheck.git
Next, run the following commands to build and install SQLCheck:
./bootstrap cd build cmake -DCMAKE_BUILD_TYPE=RELEASE .. make make install
To run SQLCheck unit tests during development, run:
$ sqlcheck -h Command line options : sqlcheck <options> -f --file_name : file name -r --risk_level : set of anti-patterns to check : 1 (all anti-patterns, default) : 2 (only medium and high risk anti-patterns) : 3 (only high risk anti-patterns) -c --color_mode : color mode -v --verbose_mode : verbose mode
$ sqlcheck -f examples/top_mutexes.sql -v ------------------------------------------------- > RISK LEVEL :: ALL ANTI-PATTERNS > SQL FILE NAME :: examples/top_mutexes.sql ------------------------------------------------- ==================== Results =================== ------------------------------------------------- SQL Statement: with top_mutexes as ( select--+ leading(t1 s1 v1 v2 t2 s2) use_hash(s1) use_nl(v1) use_hash(s2) materialize t1.hsecs ,s1.* ,s2.sleeps as end_sleeps ,s2.wait_time as end_wait_time ,s2.sleeps-s1.sleeps as delta_sleeps ,t2.hsecs - t1.hsecs as delta_hsecs --,s2.* from v$timer t1 ,v$mutex_sleep s1 ,(select/*+ no_merge */ sum(level) a from dual connect by level<=1e6) v1 ,v$timer t2 ,v$mutex_sleep s2 where s1.mutex_type=s2.mutex_type and s1.location=s2.location ) select * from top_mutexes order by delta_sleeps desc;
[examples/top_mutexes.sql]: (HIGH RISK) (QUERY ANTI-PATTERN) SELECT * ● Inefficiency in moving data to the consumer: When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access. ● Indexing issues: Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to perform more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next developer who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason. [Matching Expression: select *] [examples/top_mutexes.sql]: (LOW RISK) (QUERY ANTI-PATTERN) Spaghetti Query Alert ● Split up a complex spaghetti query into several simpler queries: SQL is a very expressive language—you can accomplish a lot in a single query or statement. But that doesn't mean it's mandatory or even a good idea to approach every task with the assumption it has to be done in one line of code. One common unintended consequence of producing all your results in one query is a Cartesian product. This happens when two of the tables in the query have no condition restricting their relationship. Without such a restriction, the join of two tables pairs each row in the first table to every row in the other table. Each such pairing becomes a row of the result set, and you end up with many more rows than you expect. It's important to consider that these queries are simply hard to write, hard to modify, and hard to debug. You should expect to get regular requests for incremental enhancements to your database applications. Managers want more complex reports and more fields in a user interface. If you design intricate, monolithic SQL queries, it's more costly and time-consuming to make enhancements to them. Your time is worth something, both to you and to your project. Split up a complex spaghetti query into several simpler queries. When you split up a complex SQL query, the result may be many similar queries, perhaps varying slightly depending on data values. Writing these queries is a chore, so it's a good application of SQL code generation. Although SQL makes it seem possible to solve a complex problem in a single line of code, don't be tempted to build a house of cards. ==================== Summary =================== All Anti-Patterns :: 2 > High Risk :: 1 > Medium Risk :: 0 > Low Risk :: 1
(1) SQL Anti-patterns: Avoiding the Pitfalls of Database Programming, Bill Karwin
(2) Common SQL Anti-patterns, StackOverflow
Contributions to SQLCheck are always welcome. You can contribute in different ways:
- Open an issue with suggestions for improvements and errors you're facing;
- Fork this repository and submit a pull request;
- Improve the documentation.
Licensed under the Apache License.