Cleanup (?:) from beginning/end of groups #164
It looks like this replaces a few patterns:
In addition to I agree with these improvements in principle but have two concerns in practice: impact on perf, and correctness. For perf, what is the effect seen in As for correctness, it's possible to break this in cases like
An alternative strategy for keeping generated regexes clean might be to give token handler functions access to the preceding generated regex token so they can more smartly return
This doesn't change its behavior, but makes it more readable and easier to modify.
This will allow us to use it for matching other patterns than just quantifiers.
This test currently fails. Here are the actual and expected patterns, with whitespace inserted into the expected one to illustrate the difference:
'((?:)[0-9]{4}(?:))(?:)-?(?:)((?:)[0-9]{2}(?:))(?:)-?(?:)((?:)[0-9]{2}(?:))(?:)'
'(    [0-9]{4}    )(?:)-?(?:)(    [0-9]{2}    )(?:)-?(?:)(    [0-9]{2}    )(?:)'
This passes the tests in the previous commit, using the new isPatternNext function to determine if the match is at the end of a group. Checking if the match is at the beginning of a group is a little more naive, since it only looks at the previous character, rather than ignoring comments and whitespace, but I haven't found a good way to improve on that.
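To illustrate the kind of end-of-group check described above, here is a minimal sketch. It is not the XRegExp source: the name `isNextAfterIgnorable` and its parameters are hypothetical, and it assumes that under flag `x` whitespace, `#` line comments, and `(?#...)` inline comments are the only things to skip.

```javascript
// Hypothetical sketch, not the actual isPatternNext from XRegExp.
// Returns true if `needlePattern` (a regex source string) matches at `pos`
// in `pattern` once any run of ignorable text has been skipped.
function isNextAfterIgnorable(pattern, pos, needlePattern) {
    // Assumption: ignorable text is whitespace, `# ...` line comments,
    // and `(?# ...)` inline comments.
    const ignorable = '(?:\\s|#[^\\n]*|\\(\\?#[^)]*\\))*';
    // The regex is assembled by string concatenation on every call,
    // which is easy to read but does cost something per call.
    const re = new RegExp('^' + ignorable + '(?:' + needlePattern + ')');
    return re.test(pattern.slice(pos));
}

// Is a closing paren the next meaningful token after position 1?
const atGroupEnd = isNextAfterIgnorable('a  # note\n )', 1, '\\)');
const notAtGroupEnd = isNextAfterIgnorable('a b)', 1, '\\)');
```

A matching check for the beginning of a group would have to scan backwards over the same ignorable text, which is harder to express with a forward-anchored regex.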
I realized the token handlers are equivalent, so I made them a named function instead.
d01e253 to 68abf55
Thanks for the thorough review! I see that my original solution wasn't correct, with the Accordingly, I've force-pushed a different set of commits into this branch that uses the alternative strategy, and additionally tests the
EDIT: Oh, as far as perf goes: I ran the test page several times with both this (updated) version of the code, as well as version 3.1.1 using the
Use `new` with RegExp constructor, as is done everywhere else.
Thanks! I still need to look over the new set of diffs closely, but I love the direction of no longer inserting
Aside: I should check if these lines in build.js are still needed after the changes here.
var regex = XRegExp('( [0-9]{4} ) -? # year \n' +
                    '( [0-9]{2} ) -? # month \n' +
                    '( [0-9]{2} ) # day ', 'x');
expect(regex.source).toEqual('([0-9]{4})(?:)-?(?:)([0-9]{2})(?:)-?(?:)([0-9]{2})(?:)');
This is enforcing the inclusion of multiple `(?:)` empty groups that aren't needed for this regex to operate correctly. The test should be re-framed to not enforce anything that is unnecessary.
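As a quick check that the empty groups are not needed for matching, the sketch below uses only native `RegExp` (no XRegExp) to compare the source string asserted in the test with a hand-written version that has every `(?:)` removed. The variable and helper names are illustrative, and the `^`/`$` anchors are added here only to make the comparison strict.

```javascript
// Source string from the test above, wrapped in anchors for this comparison.
const withEmptyGroups = new RegExp(
    '^([0-9]{4})(?:)-?(?:)([0-9]{2})(?:)-?(?:)([0-9]{2})(?:)$'
);
// Same pattern with every empty non-capturing group removed by hand.
const withoutEmptyGroups = new RegExp('^([0-9]{4})-?([0-9]{2})-?([0-9]{2})$');

// Return the full match and captures as a plain array, or null on no match.
function captures(re, str) {
    const m = re.exec(str);
    return m === null ? null : Array.from(m);
}

const samples = ['2015-02-22', '20150222', '2015-2-22'];
const results = samples.map((input) => ({
    input,
    a: captures(withEmptyGroups, input),
    b: captures(withoutEmptyGroups, input),
}));

for (const r of results) {
    console.log(r.input, JSON.stringify(r.a) === JSON.stringify(r.b));
}
```

This only shows that the two sources behave the same for this particular pattern; it says nothing about patterns where a separator is genuinely required between adjacent tokens.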
Oops, yeah it is a bit brittle. We could change it to reject certain substrings, but then we might end up duplicating some of the logic in the non-test code. What if we changed it to something a little simpler, but still future-proof, like this?
expect(regex.source.length <= 54).toBe(true); // 54 is the length of the current result
Testing string length wouldn't verify that it's working. I've gone ahead and updated this in 622aaf3 to use a reduced test case.
As a result of these changes, the "Constructor with x flag, whitespace, and comments" perf test is now meaningfully slower than in v3.1.1. It would be easy to create examples that are even more affected, since each regex token that triggers the new code will be slower. I'll try to look at speeding this back up later, probably after v3.2.0. A couple ideas: avoid the string concatenation in
This makes the "Constructor with x flag, whitespace, and comments" test fast again.

From slevithan#164 (comment):

> A couple ideas: avoid the string concatenation in `isPatternNext`
> (possibly going back to regex literals and making the function specific
> to quantifiers again even though the current code is more
> readable/maintainable, since this isn't needed to handle simple cases
> with whitespace followed by `)`)

Since babel-plugin-transform-xregexp automatically compiles the `new RegExp()` calls into literals, we get (most of) the performance back without sacrificing the readability of having separate subpatterns.
Following up on slevithan#164, this change prevents a `(?:)` from being inserted in the following places:

* At the beginning of a non-capturing group (the end is already handled)
* Before or after a `|`
* At the beginning or the end of the pattern

This solution isn't as complete as the one suggested in slevithan#179, but it's a decent stopgap.
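The rules listed above can be sketched as a small decision function. This is a hypothetical illustration, not the code from the commit: `separatorFor`, `prevToken`, and `rest` are assumed names, and the sketch assumes the caller already knows the previously generated token and the unprocessed remainder of the pattern (with any further whitespace and comments skipped).

```javascript
// Hypothetical helper, not the XRegExp implementation.
// prevToken: the last generated regex token, or null at the start of the pattern.
// rest: what remains of the pattern after the whitespace/comment being replaced.
// Returns '' where an empty group would be redundant, otherwise '(?:)'.
function separatorFor(prevToken, rest) {
    const prevIsBoundary =
        prevToken === null ||   // beginning of the pattern
        prevToken === '|' ||    // right after an alternation bar
        prevToken === '(' ||    // beginning of a capturing group
        prevToken === '(?:';    // beginning of a non-capturing group
    const nextIsBoundary =
        rest === '' ||          // end of the pattern
        rest[0] === '|' ||      // right before an alternation bar
        rest[0] === ')';        // end of a group
    return prevIsBoundary || nextIsBoundary ? '' : '(?:)';
}

console.log(JSON.stringify(separatorFor('(?:', 'a')));  // ""
console.log(JSON.stringify(separatorFor('a', 'b')));    // "(?:)"
```

Working from the previous token rather than the previous character avoids misreading an escaped `\(` or `\|` as a boundary, provided the tokenizer reports those as single tokens.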
This simplifies the compiled expressions by ensuring that groups don't start or end with `(?:)`. For instance, this code:
now compiles to this pattern:
instead of
Here are the two patterns side by side, with whitespace inserted into the new one to illustrate the differences: