Parsing and printing expressions with flags produces needlessly verbose output. Example: (?i:ab*c|d?e) -> (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E). While the output is semantically correct, the flag is repeated for every literal, so the resulting expression grows to roughly three times its original size.
I am a core developer of https://github.com/coreruleset and we use Go to generate regular expressions that can become very large (hundreds of KB). Growth by a factor of 3 is not acceptable for us. Our main concern is the case-folding flag (i), but other flags are affected as well, e.g., ab*c.|d?e. -> ab*c(?-s:.)|d?e(?-s:.) (better: (?:ab*c|d?e)(?-s:.), or (?-s)(?:ab*c|d?e).).
I propose to improve printing of regular expressions, so that flags are only repeated if necessary.
We generate a single regular expression out of multiple smaller expressions. Here's an example:
\s+__\$
command1
command2
Each line is treated as an alternation. The result would be \s+__\$|command1|command2. We do more complicated things too, such as substitutions and concatenation. In the end, we always end up with a list of alternations as above. We then feed these to a library (https://github.com/itchyny/rassemble-go) for final assembly and optimisation.
This process is also recursive in some cases, so the output from rassemble-go may become one of the alternations.
rassemble-go uses regexp/syntax to parse and process the expression.
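A minimal way to reproduce the parse-and-print round trip with the standard library alone might look like the sketch below. roundTrip and agree are hypothetical helper names, the use of syntax.Perl as the parse flags is an assumption about how the patterns are parsed, and the exact printed form may differ between Go versions, so no output is shown.

```go
package main

import (
	"fmt"
	"regexp"
	"regexp/syntax"
)

// roundTrip parses a pattern and prints it back via Regexp.String.
// syntax.Perl is assumed here; it is the flag set regexp.Compile uses.
func roundTrip(pattern string) (string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", err
	}
	return re.String(), nil
}

// agree spot-checks that two patterns give the same match result on
// every sample. This is a sanity check, not a proof of equivalence.
func agree(a, b string, samples []string) bool {
	ra, rb := regexp.MustCompile(a), regexp.MustCompile(b)
	for _, s := range samples {
		if ra.MatchString(s) != rb.MatchString(s) {
			return false
		}
	}
	return true
}

func main() {
	samples := []string{"abc", "ABBC", "e", "DE", "ac.", "x", ""}
	for _, p := range []string{`(?i:ab*c|d?e)`, `ab*c.|d?e.`} {
		out, err := roundTrip(p)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		// Report the size change alongside the printed form.
		fmt.Printf("%s -> %s (%d -> %d bytes, same matches: %v)\n",
			p, out, len(p), len(out), agree(p, out, samples))
	}
}
```

Comparing the byte counts on a large generated expression gives a quick measure of how much the printed form has grown.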