regexp: implement look-behind assertion #24264
To emulate look-behind assertions, it runs the main regexp automaton and some look-behind automata in parallel, and an automaton refers to the look-behind automaton's state when it encounters a look-behind assertion. The surprising point is that the input string is read only once even if the regexp contains look-behind. This is a unique feature. It does not support captures in look-behind because the meaning of captures in look-behind is unknown. (and implementing them is so hard, hehe ;) I believe the additional cost of matching a regexp without look-behind is small.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.
I signed it!
CLAs look good, thanks!
This PR (HEAD: 50c69d5) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.
Message from Gobot Gobot: Patch Set 1: Congratulations on opening your first change. Thank you for your contribution! Next steps: Most changes in the Go project go through a few rounds of revision. This can be ... During May-July and Nov-Jan the Go project is in a code freeze, during which ... Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
Message from Ian Lance Taylor: Patch Set 1: Thanks, but see Russ's comment here: https://groups.google.com/d/msg/golang-nuts/7qgSDWPIh_E/OHTAm4wRZL0J Does your algorithm work in guaranteed O(N) time? If not we won't accept it for the regexp package. Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
Message from TSUYUSATO Kitsune: Patch Set 1:
Yes, of course. When I wrote "it reads the input string only once", I meant the O(N) time you are asking about. Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
Because it cannot get correct fork size for now, but Inst size is too large for this.
This PR (HEAD: bbffdde) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.
Message from Ian Lance Taylor: Patch Set 2: (5 comments) This is missing a change to regexp/syntax/doc.go. Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
This PR (HEAD: 67f5530) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.
This PR (HEAD: 4e5cdbf) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.
Message from Ian Lance Taylor: Patch Set 4: Looking at this again, my comments on exec.go line 49 and regexp.go line 216 have not yet been addressed. Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
Message from TSUYUSATO Kitsune: Patch Set 4:
I am still working out how to explain this look-behind implementation, and I have no time to work on it right now. Sorry. Please wait a little. Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
Message from Ian Lance Taylor: Patch Set 4:
No worries. I mostly wanted to be sure that you weren't waiting on us. Thanks for the reply. Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
Message from Gerrit User 5056: Patch Set 5: Code-Review-2 Sorry, but no. This is a complex step to take, it's only partially implemented here, and we don't understand how to do it efficiently in general. Deciding to extend the syntax of the regexp package requires a lot more than a single CL. Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
Following Russ' comment, it doesn't look like this is the right way. I will go ahead and close the PR. Please do feel free to discuss this in a new issue on how to better implement this. Thank you.
Look-behind assertion is a regular expression extension.
https://www.regular-expressions.info/lookaround.html
This implements positive look-behind assertion (?<=expr) and negative
look-behind assertion (?<!expr).
It supports variable-length look-behind.
(Variable-length look-behind is also supported by V8 and .NET.)
To emulate look-behind assertions, it runs the main regexp
automaton and some look-behind automata in parallel, and
an automaton refers to the look-behind automaton's state when
it encounters a look-behind assertion.
A noteworthy point is that it reads the input string only once
even if the regexp contains look-behind. This is a unique feature.
It does not support captures in look-behind
because the meaning of captures in look-behind is unknown.
(and implementing them is so hard, hehe ;)
I believe the additional cost of matching a regexp without
look-behind is small.
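
The parallel-automata idea described above can be illustrated with a small sketch. This is not the code from the CL: it assumes both the look-behind expression and the main pattern are plain literal strings, and the names `step` and `findAfter` are hypothetical. It only shows how a look-behind automaton and a main automaton can advance together over a single left-to-right read of the input, with the main automaton allowed to start only at positions where the look-behind automaton is in its accepting state.

```go
package main

import "fmt"

// step advances a literal-matching NFA by one input byte.
// State i means "the first i bytes of pat have been matched";
// state i+1 becomes active when state i was active and pat[i] == c.
func step(active []bool, pat string, c byte) []bool {
	next := make([]bool, len(pat)+1)
	for i := 0; i < len(pat); i++ {
		if active[i] && pat[i] == c {
			next[i+1] = true
		}
	}
	return next
}

// findAfter (hypothetical helper) returns the start offsets of every
// occurrence of pat that is immediately preceded by behind, i.e. a
// literal-only stand-in for (?<=behind)pat. Each input byte is read once.
func findAfter(input, behind, pat string) []int {
	lb := make([]bool, len(behind)+1) // look-behind automaton states
	m := make([]bool, len(pat)+1)     // main automaton states
	var starts []int
	for pos := 0; ; pos++ {
		// The look-behind automaton is unanchored: it may start anywhere.
		lb[0] = true
		// The main automaton may start here only if the look-behind
		// automaton is accepting at this position.
		if lb[len(behind)] {
			m[0] = true
		}
		if m[len(pat)] {
			starts = append(starts, pos-len(pat))
		}
		if pos == len(input) {
			break
		}
		c := input[pos]
		lb = step(lb, behind, c)
		m = step(m, pat, c)
	}
	return starts
}

func main() {
	// "bar" at offset 7 is not preceded by "foo", so it is skipped.
	fmt.Println(findAfter("foobar bar foobar", "foo", "bar")) // prints [3 14]
}
```

For fixed patterns the work per input byte is bounded by the pattern sizes, so the total time grows linearly with the input length. Handling general look-behind expressions, nested assertions, and negative look-behind would need a full NFA simulation rather than this literal-only simplification.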