-
-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for escape character strings to Postgres #1409
Added support for escape character strings to Postgres #1409
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1409 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 127 127
Lines 8553 8581 +28
=========================================
+ Hits 8553 8581 +28
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions and minor nits. But in general, assuming that horrendous regex is accurate the change LGTM.
RegexLexer( | ||
"unicode_single_quote", | ||
r"(?s)U&(('')+?(?!')|('.*?(?<!')(?:'')*'(?!')))(\s*UESCAPE\s*'[^0-9A-Fa-f'+\-\s)]')?", | ||
CodeSegment, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not even going to pretend to understand this regex! But great that you added all the explanation above!
Any concerns on performance impact of such a complex regex? Or getting into some infinite loop?
On plus side at least only will impact Postgres and have test cases for this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran all these regexes through a tool called dlint
, which checks for the possibility of catastrophic backtracking, where a regex runs basically forever.
Reference:
- https://github.com/duo-labs/dlint/blob/master/docs/linters/DUO138.md
- https://github.com/duo-labs/dlint
The example script below includes an example regex known to have this issue, along with the 4 regexes from the PR.
import re
subject = 'x' * 64
re.search(r'(x+x+)+y', subject) # Boom
# "unicode_single_quote",
re.search(r"(?s)U&(('')+?(?!')|('.*?(?<!')(?:'')*'(?!')))(\s*UESCAPE\s*'[^0-9A-Fa-f'+\-\s)]')?", subject)
# "escaped_single_quote",
re.search(r"(?s)E(('')+?(?!')|'.*?((?<!\\)(?:\\\\)*(?<!')(?:'')*|(?<!\\)(?:\\\\)*\\(?<!')(?:'')*')'(?!'))", subject)
# "unicode_double_quote",
re.search(r'(?s)U&".+?"(\s*UESCAPE\s*\'[^0-9A-Fa-f\'+\-\s)]\')?', subject)
# "single_quote"
re.search(r"(?s)('')+?(?!')|('.*?(?<!')(?:'')*'(?!'))", subject)
Here's the command I ran plus the output. Note that only the test regex is flagged:
(sqlfluff-3.7.9) ➜ sqlfluff git:(bhart-issue_845_l016_compute_line_length_before_template_expansion) ✗ python -m flake8 --select=DUO t.py
t.py:4:1: DUO138 catastrophic "re" usage - denial-of-service possible
Brief summary of the change made
Added support for the different escape characters and string types in Postgres.
Fixes #1069
Are there any other side effects of this change that we should be aware of?
Creating the correct regex may have cost me my sanity.
Pull Request checklist
Test cases added
Added a comment block to explain the regex