Add error code for unsupported PCRE cases (RE_EUNSUPPPCRE), reject one. #447
There are extra error codes in #440 for regexes that aren't UNSATISFIABLE per se, but that depend on particular corner cases in PCRE that probably aren't worth supporting in an automata-based implementation.
Add a test case for one, tests/pcre/in48.re:
^a|$[^x]b*
This is a tricky one to handle properly: according to PCRE, it should match either "a<anything...>" or "\n", and nothing else. The newline match happens because $ is a non-input-consuming check that evaluation is either at the end of input or at a newline immediately before the end. Once $ has succeeded, the only input left is that trailing newline, so [^x] must consume it and b* can only match the empty string. In other words, `$[^x]b*` matches exactly one newline; it's equivalent to "$\n". This probably isn't worth supporting, but we can detect cases where a potential newline match appears after a $ and reject them as unsupported PCRE behavior.
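The `$` reasoning above can be checked with a small hand-written matcher. This is a minimal sketch, not libfsm or PCRE code: the function names are hypothetical, it models only PCRE's default (non-multiline) `$`, and `[^x]` and `b*` are hard-coded. It compares what `$[^x]b*` and `$\n` consume when started at a given offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* PCRE's default (non-multiline) `$`: true at the end of input, or at a
 * newline that is the last character of input. It consumes nothing. */
static bool dollar_at(const char *s, size_t len, size_t pos)
{
    assert(pos <= len);
    return pos == len || (pos + 1 == len && s[pos] == '\n');
}

/* Match `$[^x]b*` starting at pos; return the number of characters
 * consumed, or -1 if there is no match at pos. */
static long match_dollar_notx_bstar(const char *s, size_t len, size_t pos)
{
    if (!dollar_at(s, len, pos)) {
        return -1;
    }
    /* [^x] must consume one character. Once `$` holds, the only
     * character left (if any) is the trailing '\n'. */
    if (pos == len || s[pos] == 'x') {
        return -1;
    }
    size_t end = pos + 1;
    /* b* is tried greedily, but end == len here, so it matches nothing. */
    while (end < len && s[end] == 'b') {
        end++;
    }
    return (long)(end - pos);
}

/* Match `$\n` starting at pos, for comparison. */
static long match_dollar_nl(const char *s, size_t len, size_t pos)
{
    if (!dollar_at(s, len, pos) || pos == len || s[pos] != '\n') {
        return -1;
    }
    return 1;
}
```

For example, at offset 2 of "ab\n" both functions consume exactly one character (the newline), while at offset 0 of "\nb" both fail, because that newline is not the last character and `$` does not hold there.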
…is' into sv/integrate-combinable-DFA-capture-resolution--fuzzing-branch
Integrate #447's changes for continuing local fuzzing. It hasn't been reviewed yet.
src/libre/ast.h
@@ -98,6 +98,7 @@ enum ast_flags {
AST_FLAG_ANCHORED_START = 1 << 6,
AST_FLAG_ANCHORED_END = 1 << 7,
AST_FLAG_END_NL = 1 << 8,
AST_FLAG_MATCHES_1NEWLINE= 1 << 9,
lol
There will be even more niche start/end anchor flags coming soon. Expressing PCRE's anchoring rules statically gets complicated.
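One plausible way a pass could use anchor and newline flags like these to reject the in48.re case is sketched below, under stated assumptions: the `sk_` types, flags, and result codes are hypothetical stand-ins, not libre's AST or its RE_EUNSUPPPCRE implementation, and each alternative of an alternation is assumed to be checked as its own concatenation (so `^a|$[^x]b*` is checked as `^a` and `$[^x]b*`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags, loosely mirroring AST_FLAG_ANCHORED_END and
 * AST_FLAG_MATCHES_1NEWLINE; these are not libre's definitions. */
enum sk_flags {
    SK_FLAG_END_ANCHOR   = 1 << 0, /* node is a `$` */
    SK_FLAG_MAY_MATCH_NL = 1 << 1  /* node may consume a '\n' */
};

/* Hypothetical result codes (the PR adds RE_EUNSUPPPCRE to libre). */
enum sk_result { SK_OK, SK_EUNSUPPPCRE };

struct sk_node {
    unsigned flags;
};

/* Walk one concatenation left to right. Once a `$` has been seen, any
 * later node that may consume a newline hits the PCRE corner case from
 * in48.re, so reject it instead of trying to model it. */
static enum sk_result sk_check_concat(const struct sk_node *nodes, size_t count)
{
    bool after_dollar = false;

    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].flags & SK_FLAG_END_ANCHOR) {
            after_dollar = true;
        } else if (after_dollar && (nodes[i].flags & SK_FLAG_MAY_MATCH_NL)) {
            return SK_EUNSUPPPCRE;
        }
    }
    return SK_OK;
}
```

Here `$[^x]b*` becomes three nodes: `$`, a class that may consume '\n', and `b*`, which cannot; the walk rejects it at the second node. A pattern like `$b` passes this check even though it can never match; that is an unsatisfiable case rather than an unsupported one, and this sketch does not handle it.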
I merged in a recent main and spent a bit more time trying to actually hit the possibly unreachable branch above.
Ranges using named endpoints are currently rejected with "Unsupported operator".