regexp: implement look-behind assertion #24264

Closed
wants to merge 7 commits into from

makenowjust

Look-behind assertions are a regular expression extension:
https://www.regular-expressions.info/lookaround.html

This implements the positive look-behind assertion (?<=expr) and the negative
look-behind assertion (?<!expr).
It supports variable-length look-behind.
(Variable-length look-behind is also supported by V8 and .NET.)
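
For context, a quick check that the standard regexp package rejects both forms today. This is only a sketch and not part of the change; the helper name compiles is made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether the standard regexp package accepts the pattern.
// (Hypothetical helper, used only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, p := range []string{`(?<=foo)bar`, `(?<!foo)bar`, `foobar`} {
		fmt.Printf("%q compiles: %v\n", p, compiles(p))
	}
}
```

Both look-behind patterns should fail to compile with the unpatched package, while the plain literal compiles.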

To emulate look-behind assertions, it runs the main regexp
automaton and the look-behind automata in parallel, and
the main automaton consults a look-behind automaton's state when
it encounters a look-behind assertion.
Notably, it reads the input string only once even if
the regexp contains look-behind. This is a unique feature.
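
The idea can be sketched by hand for one fixed pattern. The code below is not the code in this change: it is a minimal illustration that emulates (?<=ab+)c, assuming a hard-coded three-state NFA for ".*ab+" and a main "automaton" that is just the literal c. All names (stStart, stepBehind, lookBehindPositions) are hypothetical.

```go
package main

import "fmt"

// Hypothetical state bits for a hand-built NFA recognising `.*ab+`,
// i.e. "some suffix of the text read so far matches ab+".
const (
	stStart  uint8 = 1 << iota // implicit leading .*
	stSeenA                    // just read the 'a'
	stAccept                   // read 'a' followed by one or more 'b'
)

// stepBehind advances the look-behind NFA by one input byte.
func stepBehind(set uint8, c byte) uint8 {
	next := stStart // the start state stays live, so a match may begin anywhere
	if set&stStart != 0 && c == 'a' {
		next |= stSeenA
	}
	if set&(stSeenA|stAccept) != 0 && c == 'b' {
		next |= stAccept
	}
	return next
}

// lookBehindPositions returns every index i where input[i] == 'c' and the
// text before i ends in ab+, emulating `(?<=ab+)c` in one left-to-right pass.
func lookBehindPositions(input string) []int {
	var hits []int
	behind := stStart
	for i := 0; i < len(input); i++ {
		// behind describes input[:i]; the main automaton (here a single
		// literal 'c') only reads the look-behind automaton's accept bit.
		if behind&stAccept != 0 && input[i] == 'c' {
			hits = append(hits, i)
		}
		behind = stepBehind(behind, input[i])
	}
	return hits
}

func main() {
	fmt.Println(lookBehindPositions("abbc ac abc c")) // prints [3 10]
}
```

Each input byte is consumed once and does a constant amount of work for this fixed pattern, so the scan is linear in the input length even though the look-behind has variable length.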

It does not support captures in look-behind
because the meaning of captures in look-behind is unclear
(and implementing them is hard, hehe ;).

I believe the additional cost of matching a regexp without look-behind
is small.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers
  • Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
  • The email used to register you as an authorized contributor must also be attached to your GitHub account.

@googlebot added the "cla: no" label on Mar 6, 2018
@makenowjust
Author

I signed it!

@googlebot

CLAs look good, thanks!

@googlebot added the "cla: yes" label and removed the "cla: no" label on Mar 6, 2018
@gopherbot

This PR (HEAD: 50c69d5) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot

Message from Gobot Gobot:

Patch Set 1:

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
Within the next week or so, a maintainer will review your change and provide
feedback. See https://golang.org/doc/contribute.html#review for more info and
tips to get your patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11, it means that this CL will be reviewed as part of the next development
cycle. See https://golang.org/s/release for more details.


Please don’t reply on this GitHub thread. Visit golang.org/cl/98760.
After addressing review feedback, remember to publish your drafts!

@gopherbot

Message from Ian Lance Taylor:

Patch Set 1:

Thanks, but see Russ's comment here: https://groups.google.com/d/msg/golang-nuts/7qgSDWPIh_E/OHTAm4wRZL0J

Does your algorithm work in guaranteed O(N) time? If not we won't accept it for the regexp package.


@gopherbot

Message from TSUYUSATO Kitsune:

Patch Set 1:

> Patch Set 1:
>
> Thanks, but see Russ's comment here: https://groups.google.com/d/msg/golang-nuts/7qgSDWPIh_E/OHTAm4wRZL0J
>
> Does your algorithm work in guaranteed O(N) time? If not we won't accept it for the regexp package.

Yes, of course. By "it reads input string only once" I meant O(N) time, in your words.



Because it cannot get correct fork size for now, but Inst size is too
large for this.
@gopherbot

This PR (HEAD: bbffdde) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.

@gopherbot

Message from Ian Lance Taylor:

Patch Set 2:

(5 comments)

This is missing a change to regexp/syntax/doc.go.


@gopherbot

This PR (HEAD: 67f5530) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.

@gopherbot

This PR (HEAD: 4e5cdbf) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/#/c/go/+/98760 to see it.

@gopherbot

Message from Ian Lance Taylor:

Patch Set 4:

Looking at this again, my comments on exec.go line 49 and regexp.go line 216 have not yet been addressed.


@gopherbot

Message from TSUYUSATO Kitsune:

Patch Set 4:

> Patch Set 4:
>
> Looking at this again, my comments on exec.go line 49 and regexp.go line 216 have not yet been addressed.

I am considering how to explain this look-behind implementation, and I have no time to work on it right now.

Sorry. Please wait a little.


@gopherbot

Message from Ian Lance Taylor:

Patch Set 4:

> > Looking at this again, my comments on exec.go line 49 and regexp.go line 216 have not yet been addressed.
>
> I am considering how to explain this look-behind implementation, and I have no time to work on it right now.
>
> Sorry. Please wait a little.

No worries. I mostly wanted to be sure that you weren't waiting on us. Thanks for the reply.



@gopherbot force-pushed the master branch 8 times, most recently from 9092511 to 95c3348, on July 19, 2018 18:17
@gopherbot force-pushed the master branch 2 times, most recently from 0090c13 to 8fbbf63, on July 28, 2018 01:16
@gopherbot

Message from Gerrit User 5056:

Patch Set 5: Code-Review-2

Sorry, but no. This is a complex step to take, it's only partially implemented here, and we don't understand how to do it efficiently in general. Deciding to extend the syntax of the regexp package requires a lot more than a single CL.


@agnivade
Contributor

Following Russ's comment, it doesn't look like this is the right way. I will go ahead and close the PR. Please feel free to open a new issue to discuss how to better implement this. Thank you.

@agnivade closed this on Nov 15, 2018
Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
4 participants