Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a DiagnosticsListener SPI #5960

Closed
lukaseder opened this Issue Mar 10, 2017 · 0 comments

Comments

Projects
None yet
1 participant
@lukaseder
Copy link
Member

lukaseder commented Mar 10, 2017

This issue is for collecting ideas for a future SQL lint module, which collects some runtime statistics and emits hints about potential improvements in the generated SQL.

This module would be optional and could be turned on if required, but given the probably relatively high amount of data that needs to be collected, it shouldn't be turned on by default.

The module can work with jOOQ and/or JDBC directly, meaning that it can be used even in an application that otherwise doesn't use jOOQ.

API misuse

These issues deal with problematic use of the jOOQ and/or JDBC API. In the case of JDBC, we'll proxy the API and collect statistics behind it.

  • Find out which projected columns from SELECT statements aren't really needed. (JDBC)
    Users tend to select too many columns, which they don't really need in client code. Especially when using the convenient selectFrom() method (which corresponds to SQL's SELECT *), chances are, that a significant amount of column data wasn't needed. jOOQ could collect statistics about which columns of a record / result are really consumed from a given query. In the JDBC version, this will proxy ResultSet and compare ResultSetMetaData columns with actual getXYZ() calls.

  • Find out which projected columns from SELECT statements aren't really needed. (jOOQ): Moved to #7527

  • More rows are fetched than needed. (JDBC)
    If a Result contains hundreds of rows but only the first 5 are consumed, then this is definitely an issue. Users should use the limit() clause. In the JDBC version, this will proxy ResultSet and check if there would have been a next() row when the ResultSet is close()-d.

  • More rows are fetched than needed. (jOOQ): Moved to #7527

  • Consecutive execution of identical SQL on same Connection but on new PreparedStatement. (JDBC)
    In some databases, it is worth keeping a PreparedStatement open when executing identical SQL consecutively. Typical use-cases for this are:

    • N+1 loops (see below)
    • Batch insertion loops (see below)

    Sometimes, it is hard to fix the the above issues thoroughly, so simply reusing the PreparedStatement could be a quick fix.

  • Missed single-SQL multi-bind-value batch opportunity (JDBC)~: Moved to #7527

  • Excess ResultSet.wasNull() calls
    The JDBC wasNull() method is a leftover from times when primitive type usage heavily outperformed wrapper type usage. However, a SQL INT type could have been NULL, which cannot be represented by the JDBC ResultSet.getInt() call. So, wasNull() needs to be called. But only on primitive types, not on e.g. getString(), getDate(), etc. Depending on the JDBC driver implementation, this call can be somewhat expensive.

  • Missing ResultSet.wasNull() calls
    The inverse can be done as well, if a primitive type was fetched that is not known to be non-nullable, then wasNull() should usually be called.

  • Transaction optimisation~: Moved to #7527

SQL language misuse

These are relatively easy to do with jOOQ, where the expression tree is available for introspection. When using JDBC, we'll simply parse the SQL string into a jOOQ expression tree, and run the same logic again.

  • Concatenation in predicates. (jOOQ / JDBC)~: Moved to #7527

  • Duplicate SQL detection: Bind variables and whitespace (JDBC)
    Some SQL statements can be considered "duplicates", such as for instance:

    • SELECT name FROM person WHERE id = 1
    • SELECT name FROM person WHERE id = 2

    In databases that ship with an execution plan cache, such "duplicate" queries quickly put a lot of pressure on the cache and on its shared resources, in addition to causing excessive hard parsing. jOOQ could parse such SQL and replace all inline values by bind values (?) and then compare resulting SQL strings, collecting statistics on how many different versions of the same SQL string there exist

  • Duplicate SQL detection: Variable size IN lists (JDBC)
    Variable sized, dynamic IN lists can cause pressure on the execution plan cache despite bind variables being used, as the following SQL strings are not identical, but "duplicate":

    • SELECT name FROM person WHERE id IN (?, ?)
    • SELECT name FROM person WHERE id IN (?, ?, ?)
  • Duplicate SQL detection, advanced (JDBC)~: Moved to #7527

  • N+1 detection~: Moved to #7527

  • Bulk write suggestions~: Moved to #7527

  • NULL in predicates~: Moved to #7527

  • Boolean transformations~: Moved to #7527

Security

  • SQLi detection~: Moved to #7527

Execution metrics

  • Bind variable profiling~: Moved to #7527
  • SQL transformation suggestions (cost based)~: Moved to #7527
  • SQL transformation suggestions (constraint based)~: Moved to #7527

Metrics

Moved to #7527

Deployment

Initially, the data would be collected in the same process as the system running jOOQ / JDBC, but in a future version of this lint module, we'll be able to collect all of this data also through the JDK's instrumentation API and send it off to some external tool (hopefully with a nice UI), which allows for:

  • Easier analysis
  • Less production overhead (with the initial version, this will work only as an integration test utility)
  • Opt-in execution without changing application configuration (just like any profiler)

Communication could be done through JMX or other means. We'll look into this later.

@lukaseder lukaseder added this to the Version 3.10.0 milestone Mar 10, 2017

@lukaseder lukaseder changed the title Add a SQL lint module Add the jOOQ Diagnostics Pack Jan 23, 2018

@lukaseder lukaseder changed the title Add the jOOQ Diagnostics Pack Add DiagnosticsListener Jan 23, 2018

@lukaseder lukaseder changed the title Add DiagnosticsListener Add a DiagnosticsListener SPI Jan 23, 2018

lukaseder added a commit that referenced this issue Jan 23, 2018

lukaseder added a commit that referenced this issue Jan 23, 2018

lukaseder added a commit that referenced this issue Jan 23, 2018

[#5960] [#7094] Duplicate SQL detection
- [#5960] Duplicate SQL detection
- [#7094] Add ParamType.FORCE_INDEXED

This was referenced Jan 24, 2018

lukaseder added a commit that referenced this issue Jan 24, 2018

[#5960] [#7095] Add Settings.inListPadBase and use it in diagnostic
- [#5960] Add diagnostic to detect duplicate IN lists of different degree
- [#7095] Add Settings.inListPadBase

lukaseder added a commit that referenced this issue Jan 25, 2018

lukaseder added a commit that referenced this issue Jan 25, 2018

@lukaseder lukaseder referenced this issue May 31, 2018

Open

Improve DiagnosticsListener SPI #7527

0 of 25 tasks complete

@lukaseder lukaseder added the R: Fixed label May 31, 2018

@lukaseder lukaseder closed this May 31, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.