Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp: support lookaheads and lookbehinds #18868

Closed
JulienChampseix opened this issue Jan 31, 2017 · 4 comments
Closed

regexp: support lookaheads and lookbehinds #18868

JulienChampseix opened this issue Jan 31, 2017 · 4 comments

Comments

@JulienChampseix
Copy link

@JulienChampseix JulienChampseix commented Jan 31, 2017

What version of Go are you using (go version)?

Go version 1.7.4

What operating system and processor architecture are you using (go env)?

Linux (Ubuntu and Debian)

What did you do?

We are using telegraf software and the logparser of this tools looks to be impacted by the non support of regex lookaheads or lookbehinds on this version of GO.

You can find more details here :

influxdata/telegraf#2178
influxdata/telegraf#2302
influxdata/telegraf#2343

Do you know if there is already a workaround this request ?

@mvdan
Copy link
Member

@mvdan mvdan commented Jan 31, 2017

There have been discussions on this before, see https://groups.google.com/forum/#!topic/golang-nuts/7qgSDWPIh_E.

The consensus from @rsc and @ianlancetaylor there is that it hasn't been done in an efficient way yet. I don't know if there have been any other developments.

@mvdan mvdan changed the title Support - Negative lookaheads regexp: support lookaheads and lookbehinds Jan 31, 2017
@bradfitz bradfitz added this to the Unplanned milestone Jan 31, 2017
@smasher164
Copy link
Member

@smasher164 smasher164 commented Nov 15, 2017

One possible idea which I'm not sure has been suggested yet is to have multiple read-heads over the input. It is well known that a finite automata with two read-heads has more language-recognition power than a normal FA (for instance -- matching the language 0^n 1^n). In this case, if we wanted to stream the input, we assume the read-heads can only move left-to-right. Also, by increasing the number of read-heads linearly with the number of lookaround assertions, we can handle nested lookarounds.

In order to process a lookbehind assertion, we start stepping an additional read-head from the beginning of the input, and if the assertion is accepted by the time the two heads meet, we continue stepping the original read-head.

In order to process a lookahead assertion, we move the starting position of an additional read-head to match the position of the original. Then if the assertion is accepted, we switch back to the original head.

@jonstewart
Copy link

@jonstewart jonstewart commented Nov 15, 2017

I believe that lookaround assertions can be supported "efficiently" (i.e., with a one-time scan of the input) with an NFA. My colleague, Joel Uckelman, has done a substantial amount of work support lookaround assertions in lightgrep on an unfinished branch. I'm hoping we can pick it up again in the new year. The beauty of NFAs is that if you need to track more things, you just add more states. They aren't fast, though.

Regexp engines vary quite a bit in their support for types of lookaround assertions. Providing less than fully regular support may still be reasonable and useful.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 2, 2018

Experience with RE2 has shown that it's OK - not ideal but OK - to live without these. I think it makes more sense for Go to continue to match RE2 than to try to strike out on its own here. And on top of that I still don't know how to implement them efficiently, despite thinking about this for over a decade.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.