-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prefer-predefined-assertion does not recognize (\.(?!$)|$))
#659
Comments
Thank you for posting this issue! However, I don't think const s = 'partbefore.'
console.log(/partbefore(\.(?!$)|$)/.test(s), /(partbefore\.?\b)/.test(s)) // false true I think there probably needs to be a |
Yeah, looks like this is equivalent: -> Thank you for taking care already regarding the issue! |
Those 2 equivalent, but probably not in the way you think they are. Here's how we can simplify /^partbefore(\.(?!$)|$)$/;
= /^partbefore(\.(?!$)$|$$)/; // move the last $ inside the group
= /^partbefore(\.(?!$)$|$)/; // $$ == $
= /^partbefore($)/; // (?!$)$ will always reject, so we can remove the whole branch
= /^partbefore$/; // remove the group
/^(partbefore\.?\b)$/;
= /^partbefore\.?\b$/; // remove the group
= /^partbefore(\.|)\b$/; // replace \.? with an equivalent group
= /^partbefore(\.\b|\b)$/; // move \b inside the group
= /^partbefore(\.(?=\w)|\b)$/; // Since \. is a non-word character, `\b` is will enforce a word character after it
= /^partbefore(\.(?=\w)|(?!\w))$/; // Since "e" is a word character, `\b` is will enforce a non-word character after it
= /^partbefore(\.(?=\w)$|(?!\w)$)/; // Move $ inside the group
= /^partbefore(\.(?=\w)$|$)/; // (?!\w)$ == $
= /^partbefore($)/; // (?=\w)$ will always reject
= /^partbefore$/; // remove the group So both The only place where Don't get me wrong. Adding this to the rule would be relatively easy, but I don't know whether this makes the regex more understandable/maintainable. Since
Also, if you want the best performance, then consider inlining the quantifier to get rid of the assertions completely: |
@RunDevelopment I get the same result now too: Pretty curious, thank you for the hint - and I highly appreciate the in-depth explanation for the regex deduction!!
So just checking if I could follow your explanation - the only reason why |
That all being said, thank you @ota-meshi @RunDevelopment for fixing the two things quickly. From my side, we can close this thread too. Maintain-/readability and higher performance are good reasons for me to avoid optimizing the regex so to a smaller one. One last thing: I wonder if it'd make sense to create a rule that aims for highest runtime perf of the regex, so that e.g.
in that case it makes the regex more readable (I guess that is subjective?) and faster too. |
This will be difficult. I can say this with some certainty because I've only recently added this transformation (and similar ones) to refa (my regex, NFA, and DFA library). However, refa uses its own simplified AST format, which makes implementing these transformations a lot easier. If you're interested, here's the code. One of the main difficulties for this will be regexpp's AST (regexpp = the regex parser we use). Its AST is essentially readonly for the purposes of this project. So we would most likely have to invent our AST format that allows for easy mutations. Mutations are necessary, because it is practically impossible to do these optimizations in a single step. E.g. here are the steps required to optimize the IP regex: // I will use "byte" instead of the long "(?:25[0-5]|(?:2[0-4]|1\d|[1-9]|)\d)"
/^(byte(\.(?!$)|$)){4}$/;
// rewrite a quantfier "(AB){n}" as "A(BA){n-1}B"
/^byte((\.(?!$)|$)byte){3}(\.(?!$)|$)$/;
// remove the "|$" that will always reject
/^byte(\.(?!$)byte){3}(\.(?!$)|$)$/;
// remove the "\.(?!$)|" that will always reject
/^byte(\.(?!$)byte){3}$$/;
// remove the double "$"
/^byte(\.(?!$)byte){3}$/;
// remove the "(?!$)" that will always accept
/^byte(\.byte){3}$/; It's possible to do all of this programmatically, but it will require a lot of work since we would have to write an optimizer from scratch.
In this case. Unfortunately, there are a few scenarios where the regex blows up when we try to remove assertions. This mostly happens for regexes of the form
Yes. See above for more details. Generally, v8's regex engine seems to be pretty slow when it some to assertions (likely because assertions prevent certain optimizations), so rewriting regexes to avoid them often helps performance. Same for backreferences.
👍 |
Information:
I used the playground to test (so I assume all latest?)
Description
Coming from this SO post: stackoverflow.com/a/36760050
The regex
^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)(\.(?!$)|$)){4}$
got optimized (by hand) to:
^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$
so,
partbefore(\.(?!$)|$))
can be expressed as(partbefore\.?\b)
, avoiding the negative lookahead, making the regex both smaller and faster: esbench.com/bench/65326a147ff73700a4debb7d.I assume
prefer-predefined-assertion
is meant to capture this, but it does not:https://ota-meshi.github.io/eslint-plugin-regexp/playground/#eJwVydEKgjAUxvFXOR28cNBail60CB+hiy6dgbRjDHSOaVI5370FH/x/8K34GDWhRCHg6mYzmC9BazXczOB6032UVXZpPXiCC4C4p5WMy8v6yMsm/BlVNCFTSoc646eGVZEsPkod0mqXsJAwthZbIpYz7tG/eppQrujpSW8nnKeOPI/R1BlLmrfTRH42o0WZb9sP02szMA==
(by the way, if you turn on all rules, it errors for that regex: )
The text was updated successfully, but these errors were encountered: