ANTLR4-based parser of the textual representation of query trees built and used by the MS SQL query optimizer during its execution

isojk/mssql-query-tree-parser

This is an unofficial, incomplete ANTLR4-based parser of the textual representation of query trees, which the MS SQL query optimizer builds and uses during its execution.
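Judging from sample dumps such as the one under Limitations, each line of a query tree holds one operator, and the operator's children are indented below it. The Python sketch below recovers only that nesting from indentation. It is not the project's ANTLR4 grammar: `Node` and `parse_indented` are hypothetical names, and the sketch assumes space indentation and a single top-level operator per tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    text: str  # the operator line with its indentation stripped
    children: list[Node] = field(default_factory=list)


def parse_indented(lines: list[str]) -> Node:
    """Make every line a child of the nearest preceding, less-indented line.
    Assumes space indentation and exactly one top-level operator."""
    root: Node | None = None
    stack: list[tuple[int, Node]] = []  # (indent, node) along the current path
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        node = Node(raw.strip())
        # Drop previous siblings (and their subtrees) until the parent is on top.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one top-level operator")
        stack.append((indent, node))
    if root is None:
        raise ValueError("no operators found")
    return root


# The sample tree from the Limitations section, without its ScaOp_Udf line.
SAMPLE = """\
LogOp_Project COL: Expr1000  [ Card=0 ]
    LogOp_ConstTableGet (1) [empty] [ Card=0 ]
    AncOp_PrjList
        AncOp_PrjEl COL: Expr1000
"""

tree = parse_indented(SAMPLE.splitlines())
print([child.text for child in tree.children])
# ['LogOp_ConstTableGet (1) [empty] [ Card=0 ]', 'AncOp_PrjList']
```

A real grammar also has to tokenize each operator line (operator name, column references, `[ Card=... ]` annotations); the sketch leaves each line as an opaque string.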

For any T-SQL statement, a batch of query trees for specific optimization steps can be acquired in textual form, as part of the diagnostic information written to the standard output, by enabling the following trace flags (source):

  • 8605 - Shows the (converted) input tree, laying out the logical operations implied by the query
  • 8606 - Shows query trees for intermediate steps in processing the (converted) input tree (input tree, simplified tree, join-collapsed tree, and trees before and after "project normalization")
  • 8607 - Shows the output tree composed of physical operators (not yet supported by the parser)
  • 8612 - Shows extra information for certain logical operators, such as cardinality estimates

Note that trace flag 3604 must also be enabled to redirect this diagnostic information from the error log to the standard output.
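As a minimal sketch, assuming the flags are enabled for a single statement with the documented `QUERYTRACEON` query hint (rather than session-wide with `DBCC TRACEON`), the Python helper below appends the hints to a statement. `with_trace_hints` is a hypothetical name, and adding `RECOMPILE` is an assumption meant to force a fresh compilation, so that the trees are printed even if a cached plan exists.

```python
# Trace flags listed above (undocumented; descriptions paraphrased from this README).
TREE_FLAGS = {
    8605: "converted input tree",
    8606: "intermediate trees",
    8607: "output tree of physical operators",
    8612: "extra information such as cardinality estimates",
}
REDIRECT_TO_CLIENT = 3604  # redirects the diagnostic output away from the error log


def with_trace_hints(statement: str, flags: list[int]) -> str:
    """Append an OPTION clause enabling `flags` for this one statement.

    Assumes `statement` has no OPTION clause and no trailing semicolon.
    RECOMPILE is added (an assumption) so the statement is compiled afresh.
    """
    unknown = [f for f in flags if f not in TREE_FLAGS]
    if unknown:
        raise ValueError(f"unexpected trace flags: {unknown}")
    hints = ["RECOMPILE", f"QUERYTRACEON {REDIRECT_TO_CLIENT}"]
    hints += [f"QUERYTRACEON {f}" for f in flags]
    return f"{statement.rstrip()}\nOPTION ({', '.join(hints)});"


print(with_trace_hints("SELECT name FROM sys.objects", [8605, 8606, 8612]))
```

Running the generated statement should print the requested trees alongside the query's messages. A session-wide alternative is `DBCC TRACEON (3604, 8605)` before running the query.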

Project Status

This is a toy project I am working on in my spare time, a byproduct of an effort to learn how the query optimizer works. It is not intended to be used in practical scenarios; that is a non-goal at this time.

As of now, the grammar is far from complete and only loosely defined. Because there is no official documentation or specification, everything is based on assumptions inferred directly from the available diagnostic tools and public knowledge.

Limitations

T-SQL permits database object identifiers (called delimited identifiers) to contain basically any character, not only alphanumeric characters ([a-zA-Z0-9]) and underscores ([_]), provided that the identifier is enclosed in square brackets and any closing bracket inside it is escaped by doubling it (see documentation). For example:

CREATE FUNCTION dbo.[   &'""<>!@#$%^&*() ... dbo.[  abcd  ]]    IsDet IsNonDet IsNonDet   IsDet IsDet     ]
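The escaping rule itself is simple: inside brackets, the only character that needs escaping is `]`, which is doubled. A minimal Python sketch of this rule (the same one `QUOTENAME` applies with its default bracket delimiter) follows; `quote_identifier` and `unquote_identifier` are hypothetical helper names, and `unquote_identifier` does not validate that every inner `]` is properly doubled.

```python
def quote_identifier(name: str) -> str:
    # Delimit with brackets, doubling every ']' inside the identifier.
    return "[" + name.replace("]", "]]") + "]"


def unquote_identifier(quoted: str) -> str:
    # Inverse of quote_identifier: strip the outer brackets, then undouble ']]'.
    # Does not check that the inner text is correctly escaped.
    if not (quoted.startswith("[") and quoted.endswith("]")):
        raise ValueError(f"not a bracket-delimited identifier: {quoted!r}")
    return quoted[1:-1].replace("]]", "]")


# A hypothetical name containing spaces and a closing bracket.
quoted = quote_identifier("  abcd  ]")
print(quoted)  # [  abcd  ]]]
assert unquote_identifier(quoted) == "  abcd  ]"
```

Because the escaped form round-trips exactly, a name written this way can always be recovered unambiguously.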

Query trees containing such objects are currently not supported by the parser because, unfortunately, their identifiers are printed unescaped in the diagnostic output:

LogOp_Project COL: Expr1000  [ Card=0 ]
    LogOp_ConstTableGet (1) [empty] [ Card=0 ]
    AncOp_PrjList 
        AncOp_PrjEl COL: Expr1000 
            ScaOp_Udf dbo.   &'""<>!@#$%^&*() ... dbo.[  abcd  ]    IsDet IsNonDet IsNonDet   IsDet IsDet      IsNonDet 

Supporting this would make the grammar definition significantly more complex, and I do not plan to do so at this time.
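One source of that complexity can be sketched in Python. Assuming, from the single sample above, that an object reference is printed as its schema and name joined by a dot with no brackets, the text `dbo.x.y` could come from the function `[x.y]` in schema `[dbo]` or from the function `[y]` in schema `[dbo.x]`, and nothing in the output tells them apart. `render_unescaped` and `candidate_splits` are hypothetical helpers, not part of the project.

```python
def render_unescaped(schema: str, name: str) -> str:
    # Hypothetical model of the trace output: the parts of an object reference
    # are joined with '.' and printed without brackets or escaping.
    return f"{schema}.{name}"


def candidate_splits(text: str) -> list[tuple[str, str]]:
    # Any '.' in the printed text could be the schema/name separator.
    return [(text[:i], text[i + 1:]) for i, ch in enumerate(text) if ch == "."]


printed = render_unescaped("dbo", "x.y")
assert printed == render_unescaped("dbo.x", "y")  # two different objects, same text
print(candidate_splits(printed))  # [('dbo', 'x.y'), ('dbo.x', 'y')]
```

Names containing spaces or operator-like words such as `IsDet` blur token boundaries in the same way, which is why the sample line above cannot be split reliably by a context-free grammar without extra assumptions.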

License

This project is licensed under the MIT license. See LICENSE for details.
