Tokenize SQL before parsing and preserve tokens for recompilation #3323
SQL is now tokenized before parsing. Tokens are preserved for a second parse if the SQL contains an error. Tokens are also stored in prepared commands for reuse if recompilation is needed. Forward scans can now simply iterate over the already known tokens.
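As a rough illustration of the idea (not the actual code from this PR), the sketch below tokenizes a statement once, keeps the token list in a prepared-command object, and performs a forward scan by iterating over tokens instead of re-reading characters. All names here (`TokenSketch`, `Token`, `PreparedSketch`, `containsKeywordAfter`) are hypothetical, and the token grammar is deliberately simplified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and token kinds are illustrative assumptions only.
final class TokenSketch {
    enum Type { IDENTIFIER, NUMBER, SYMBOL, END }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    /** Splits the SQL text into tokens once, before any parsing starts. */
    static List<Token> tokenize(String sql) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetter(c) || c == '_') {
                while (i < n && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) i++;
                tokens.add(new Token(Type.IDENTIFIER, sql.substring(start, i)));
            } else if (Character.isDigit(c)) {
                while (i < n && Character.isDigit(sql.charAt(i))) i++;
                tokens.add(new Token(Type.NUMBER, sql.substring(start, i)));
            } else {
                i++; // any other character becomes a one-character symbol token
                tokens.add(new Token(Type.SYMBOL, sql.substring(start, i)));
            }
        }
        tokens.add(new Token(Type.END, ""));
        return tokens;
    }

    /** Forward scan: looks ahead over already known tokens, no character re-reading. */
    static boolean containsKeywordAfter(List<Token> tokens, int from, String keyword) {
        for (int i = from; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (t.type == Type.IDENTIFIER && t.text.equalsIgnoreCase(keyword)) return true;
        }
        return false;
    }

    /** A prepared command keeps its tokens so a later recompilation can skip tokenizing. */
    static final class PreparedSketch {
        final List<Token> tokens;
        PreparedSketch(String sql) { this.tokens = tokenize(sql); }
    }
}
```

A parser built on top of this would take the `List<Token>` as input, so a second parse after an error, or a recompilation of a prepared command, could start from the stored list.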
Handling of supplementary Unicode characters in identifiers and whitespace is improved.
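For context, supplementary characters occupy two `char` values (a surrogate pair) in a Java `String`, so a scanner that classifies one `char` at a time misjudges them. The sketch below shows one way to scan an identifier by code point using standard `String.codePointAt` and `Character` methods. It is a simplified assumption of the approach, not the tokenizer from this PR, and `CodePointScan` and `identifierEnd` are hypothetical names.

```java
// Hypothetical sketch; the identifier rules here are simplified assumptions.
final class CodePointScan {
    /**
     * Returns the index (in chars) just past the identifier that starts at
     * 'start', or 'start' itself if no identifier begins there.
     */
    static int identifierEnd(String sql, int start) {
        int i = start, n = sql.length();
        if (i >= n) return start;
        int cp = sql.codePointAt(i);
        if (!(Character.isLetter(cp) || cp == '_')) return start;
        i += Character.charCount(cp); // advances by 2 for a surrogate pair
        while (i < n) {
            cp = sql.codePointAt(i);
            if (!(Character.isLetterOrDigit(cp) || cp == '_')) break;
            i += Character.charCount(cp);
        }
        return i;
    }
}
```

With the input `"a\uD801\uDC00 b"` (the letter `a` followed by U+10400, a supplementary letter), a `char`-based check would stop after `a`, because a lone surrogate is not a letter, while the code-point scan consumes all three `char` values.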
In long tests with looped `TestScript`, performance on Java 8 is improved by about 1.2–1.5% on a warm JVM. Because this test doesn't use recompilation actively, most of its commands don't produce errors, and it contains only a few commands where complex forward scans are required, I hope we shouldn't have performance degradation caused by these complications with tokens.

I think both `Parser` and `Tokenizer` can be improved in the future, but changes in them should be carefully tested, because the actual performance impact may be very surprising.