regexp: no way to replace submatches with a function #5690

Open
gopherbot opened this issue Jun 12, 2013 · 17 comments

@gopherbot gopherbot commented Jun 12, 2013

by denys.seguret:

ReplaceAllStringFunc is useful when you need to process the match to compute the
replacement, but sometimes you need to match a bigger string than the one you want to
replace. A similar function able to replace submatch(es) seems necessary.

Let's say you have strings like

    input := `bla b:foo="hop" blabla b:bar="hu?"`

and you want to replace the part between quotes in b:foo="hop" and
b:bar="hu?" using a function.

It's easy to build a regular expression to get the match and submatch, for example

    r := regexp.MustCompile(`\bb:\w+="([^"]+)"`)

but when you use ReplaceAllStringFunc, the callback is only provided the whole match,
not the submatch, and must return the whole string. Practically this means you need to
execute the regexp (or another one) in the callback, for example like this:

        input := `bla bla b:foo="hop" blablabla b:bar="hu?"`
        r := regexp.MustCompile(`(\bb:\w+=")([^"]+)`)
        fmt.Println(r.ReplaceAllStringFunc(input, func(m string) string {
                parts := r.FindStringSubmatch(m)
                return parts[1] + complexFunc(parts[2])
        }))

I think a function ReplaceAllStringSubmatchFunc would be useful and would avoid the
second pass. The callback would receive the submatch and return the replacement of the
submatch. The last example would be rewritten as

        input := `bla bla b:foo="hop" blablabla b:bar="hu?"`
        r := regexp.MustCompile(`\bb:\w+="([^"]+)"`)
        fmt.Println(r.ReplaceAllStringSubmatchFunc(input, complexFunc))
        
A similar function (ReplaceAllStringSubmatchSliceFunc?) could be designed to give the
callback an array of strings that the callback would change. In fact it could be decided
that only this last function is really necessary.

Links:

 - "How-to" question on Stack Overflow: http://stackoverflow.com/q/17065465/263525
 - Playground link: http://play.golang.org/p/I6Pg8OUeTj
Contributor

@robpike robpike commented Jun 12, 2013

Comment 1:

Labels changed: added priority-later, packagechange, removed priority-triage.

Owner changed to @rsc.

Status changed to Accepted.

Contributor

@rsc rsc commented Jul 30, 2013

Comment 3:

Labels changed: added go1.3maybe.

Contributor

@robpike robpike commented Aug 20, 2013

Comment 4:

Labels changed: removed go1.3maybe.

Contributor

@rsc rsc commented Nov 27, 2013

Comment 5:

Labels changed: added go1.3maybe.

Contributor

@rsc rsc commented Dec 4, 2013

Comment 6:

Labels changed: added release-none, removed go1.3maybe.

Contributor

@rsc rsc commented Dec 4, 2013

Comment 7:

Labels changed: added repo-main.

Author

@gopherbot gopherbot commented Jul 1, 2014

Comment 8:

CL https://golang.org/cl/106360043 mentions this issue.
Author

@gopherbot gopherbot commented Sep 11, 2014

Comment 9 by denys.seguret:

Small comment: the whole thing could be cleaner than what I initially proposed by
accepting a callback with submatches passed as variadic instead of an explicit array.
@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@victorhooi victorhooi commented Jul 22, 2015

I just hit this issue as well. Does "Unplanned" mean this is unlikely to get worked on?

I'm also including some information on my use-case, in case that helps.

I'm trying to transform log lines containing key-value pairs, to redact any string values. So for example:

name: "Joe", last_name: "Bloggs", age: 5, $comment: "do not redact me", nickname: "Jogs" }

might become:

name: "SOME_HASH", last_name: "SOME_HASH", age: 5, $comment: "do not redact me", nickname: "SOME_HASH" }

I only want to target quoted strings that are followed by either , (comma) or } (closing curly-braces), and I also want to ignore any $comment fields.

I know that Go's regexp doesn't have lookahead/lookbehind, which means I can't check for the above using those. That restricts me somewhat. However, I figured I'd just capture everything using a regex like this:

quoted_string_regex, _ := regexp.Compile(`(\$comment: )?"([^"]*)"[,| }]`)

and then check the actual subgroups to see if $comment was there, and also grab out the comma or curly-brace, and put that back on at the end.

However, I'm using ReplaceAllStringFunc, which only gives you the entire match, so it seems like I either need to do a second regex inside my callback function, or I need to do a bunch of contains/splits/ends-with etc.

(Obviously, if I've missed something obvious that is available in Go, please feel free to correct the above).
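
For reference, the "second regex inside the callback" route might look roughly like the sketch below. Several things here are assumptions rather than anything from the comment above: the pattern is adapted so that the trailing delimiter is captured as group 3, the sample line is given a leading `{` and a `$comment` field, and `redact` and `shortHash` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// quoted is an adaptation of the pattern in the comment above: group 1 is the
// optional "$comment: " prefix, group 2 the string value, group 3 the
// delimiter (optional whitespace plus "," or "}") that must be put back.
var quoted = regexp.MustCompile(`(\$comment: )?"([^"]*)"(\s*[,}])`)

// redact replaces every quoted value with hash(value), except values that
// belong to a $comment field. It re-runs the regexp on each match because
// ReplaceAllStringFunc only passes the whole match to the callback.
func redact(line string, hash func(string) string) string {
	return quoted.ReplaceAllStringFunc(line, func(m string) string {
		parts := quoted.FindStringSubmatch(m)
		if parts == nil || parts[1] != "" {
			return m // $comment field (or unexpected input): leave untouched
		}
		return `"` + hash(parts[2]) + `"` + parts[3]
	})
}

// shortHash is a hypothetical stand-in for whatever hashing is wanted.
func shortHash(v string) string {
	sum := sha256.Sum256([]byte(v))
	return hex.EncodeToString(sum[:4])
}

func main() {
	line := `{ name: "Joe", last_name: "Bloggs", age: 5, $comment: "do not redact me", nickname: "Jogs" }`
	fmt.Println(redact(line, shortHash))
}
```

Re-running the pattern on the match is safe here only because the pattern does not depend on context outside the match; a pattern that starts with `\b`, for instance, could behave differently on the isolated substring.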

Contributor

@josharian josharian commented Jul 22, 2015

Does "Unplanned" mean this is unlikely to get worked on?

Unplanned just means that this won't potentially block a release. I know that @michaelmatloob has been looking at regexp stuff recently; perhaps he is interested.


@crenz crenz commented Oct 26, 2016

Just wanted to add that I hit the very same issue today. I was trying to implement a simple tag replacement, e.g.

Name: {name}
First name: {firstname}

becomes

Name: Doe
First name: Jon

I'm coming from a Perl background; my first intuition was to use a regexp like /{([^}]+)}/. Note the submatch in parentheses: in Perl, it would be possible to use replace (and call a function on the submatch) or use split (and get the submatches returned). In Go, split never returns the part that matches, and ReplaceAllStringFunc passes the callback the complete match instead of just the submatch.
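
For this particular tag case the submatch can be recovered without a second regexp pass, because every match is known to start with `{` and end with `}`. A minimal sketch, assuming a hypothetical `fill` helper and a plain map of values:

```go
package main

import (
	"fmt"
	"regexp"
)

// tag matches {name} placeholders; group 1 is the tag name.
var tag = regexp.MustCompile(`\{([^}]+)\}`)

// fill is a hypothetical helper that replaces each {name} tag in tmpl with
// values[name]. Tags without an entry in values are left as they are.
func fill(tmpl string, values map[string]string) string {
	return tag.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := m[1 : len(m)-1] // strip the surrounding braces
		if v, ok := values[name]; ok {
			return v
		}
		return m
	})
}

func main() {
	tmpl := "Name: {name}\nFirst name: {firstname}"
	fmt.Println(fill(tmpl, map[string]string{"name": "Doe", "firstname": "Jon"}))
}
```

This shortcut only works when the submatch can be derived from the match by simple slicing; patterns with variable-length context around the group still need one of the index-based approaches.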

Contributor

@matloob matloob commented Oct 26, 2016

I'm not planning on working on this. If you're interested in contributing this, feel free to do so, but note that the freeze will start in a few days.

Contributor

@AlekSi AlekSi commented Sep 27, 2017

Is this issue solved by Regexp.Expand and Regexp.ExpandString?


@opennota opennota commented Sep 27, 2017

@AlekSi
I guess not, at least not in a straightforward way. The number of variables in the expand template is limited, whereas the number of matches in a string isn't.


@srackham srackham commented Jan 8, 2018

I came across this post by Elliot Chance; it solved a JavaScript to Go porting problem I was having (for consistency it would be nice if it were incorporated as a new method in the Go regexp package):

http://elliot.land/post/go-replace-string-with-regular-expression-callback

Gist here: https://gist.github.com/elliotchance/d419395aa776d632d897

@golang golang deleted a comment from c9s Aug 9, 2018

@alisonatwork alisonatwork commented Aug 4, 2019

Thanks for the link @srackham - I hit exactly the same problem with trying to port something from JavaScript to Go. It would definitely be nice to see this functionality inside the standard regexp package.

I also found another project which appears to implement similar functionality in perhaps a cleaner way because it replaces the default regexp: https://github.com/agext/regexp

This gives some idea of how the solution could look: https://github.com/agext/regexp/blob/master/agext.go#L105


@slimsag slimsag commented Jan 23, 2020

Here is a snippet for anyone else looking for a way to replace submatches with a function using bytes (not strings) and without having to deal with intermediate (non-captured) data: https://gist.github.com/slimsag/14c66b88633bd52b7fa710349e4c6749
