regexp: backreference to capturing group breaks if followed by underscore #39594

Open
yory8 opened this issue Jun 15, 2020 · 3 comments

Comments


@yory8 yory8 commented Jun 15, 2020

What version of Go are you using (go version)?

$ go version
1.14.4

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/a/.cache/go-build"
GOENV="/home/a/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/a/.local/share/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build462357482=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Minimal case: Play

What did you expect to see?

I expected the replacement template "$2_$1" to work without having to write it as "${2}_$1", as in Python and other languages.
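The playground link is not reproduced here, so the following is only a minimal sketch of the kind of case being described, not the original snippet. The pattern and input are borrowed from the test case proposed later in this thread; the helper name `swapGroups` is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// swapGroups is a hypothetical helper: it applies the given replacement
// template to the input "Hello_World!" using two capturing groups.
func swapGroups(template string) string {
	re := regexp.MustCompile(`(Hello)_(World)`)
	return re.ReplaceAllString("Hello_World!", template)
}

func main() {
	// "$2_" is read as a reference to a group named "2_", which does not
	// exist, so it expands to the empty string.
	fmt.Printf("%q\n", swapGroups("$2_$1")) // "Hello!"

	// Braces end the name explicitly, so the underscore stays literal.
	fmt.Printf("%q\n", swapGroups("${2}_$1")) // "World_Hello!"
}
```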

@yory8 yory8 changed the title Backreference to capturing group breaks if followed by underscore Regexp: backreference to capturing group breaks if followed by underscore Jun 15, 2020
@andybons andybons changed the title Regexp: backreference to capturing group breaks if followed by underscore regexp: backreference to capturing group breaks if followed by underscore Jun 15, 2020

@andybons andybons commented Jun 15, 2020

@andybons andybons added this to the Unplanned milestone Jun 15, 2020

@antong antong commented Jun 15, 2020

This may be counter-intuitive, but if I interpret the documentation correctly, I think this is the way it is supposed to work:

In the template, a variable is denoted by a substring of the form $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores.
...
In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.

So, the template in the example "$2_$1" is the same as "${2_}${1}", not "${2}_${1}".
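The two examples quoted from the documentation can be checked directly. This is a small sketch, assuming a pattern with a single capturing group; the helper name `expandOne` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// expandOne is a hypothetical helper: it replaces the single match of (a)
// in the input "a" using the given template.
func expandOne(template string) string {
	re := regexp.MustCompile(`(a)`)
	return re.ReplaceAllString("a", template)
}

func main() {
	// Compare the unbraced forms with their braced counterparts.
	for _, tmpl := range []string{"$1x", "${1}x", "$10", "${1}0"} {
		fmt.Printf("%-6s -> %q\n", tmpl, expandOne(tmpl))
	}
}
```

Because there is no group named "1x" and no group number 10, "$1x" and "$10" both expand to the empty string, while "${1}x" and "${1}0" give "ax" and "a0".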


@mattn mattn commented Jun 16, 2020

JavaScript

console.log('foo,bar'.replace(/(\w+),(\w+)/, '$2_$1'));

Result is bar_foo

Perl

my $a = 'foo,bar';
$a =~ s/(\w+),(\w+)/\2_\1/;
warn $a;

Result is bar_foo

Ruby

puts 'foo,bar'.sub(/(\w+),(\w+)/, '\2_\1')

Result is bar_foo

All three treat the underscore as a literal character, so I propose changing Go's behavior to match.

diff --git a/src/regexp/all_test.go b/src/regexp/all_test.go
index be7a2e7111..7d944d4844 100644
--- a/src/regexp/all_test.go
+++ b/src/regexp/all_test.go
@@ -227,6 +227,7 @@ var replaceTests = []ReplaceTest{
 	{"(a)(((b))){0}c", ".$1.", "xacxacx", "x.a.x.a.x"},
 	{"((a(b){0}){3}){5}(h)", "y caramb$2", "say aaaaaaaaaaaaaaaah", "say ay caramba"},
 	{"((a(b){0}){3}){5}h", "y caramb$2", "say aaaaaaaaaaaaaaaah", "say ay caramba"},
+	{"(Hello)_(World)", "$2_$1", "Hello_World!", "World_Hello!"},
 }
 
 var replaceLiteralTests = []ReplaceTest{
diff --git a/src/regexp/regexp.go b/src/regexp/regexp.go
index b547a2ab97..7bab7a5d81 100644
--- a/src/regexp/regexp.go
+++ b/src/regexp/regexp.go
@@ -981,12 +981,24 @@ func extract(str string) (name string, num int, rest string, ok bool) {
 		str = str[1:]
 	}
 	i := 0
-	for i < len(str) {
-		rune, size := utf8.DecodeRuneInString(str[i:])
-		if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) && rune != '_' {
-			break
+	b := str[0]
+	if !brace && '0' <= b && b <= '9' {
+		i++
+		for i < len(str) {
+			rune, size := utf8.DecodeRuneInString(str[i:])
+			if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) {
+				break
+			}
+			i += size
+		}
+	} else {
+		for i < len(str) {
+			rune, size := utf8.DecodeRuneInString(str[i:])
+			if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) && rune != '_' {
+				break
+			}
+			i += size
 		}
-		i += size
 	}
 	if i == 0 {
 		// empty name is not okay
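The name-scanning rule in the patch above can be tried outside the standard library. The following is a standalone sketch of that rule only, not the actual `extract` function: `scanName` is a hypothetical helper that takes the text after "$" or "${", and the length guard on the first-byte check is an addition of this sketch.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// scanName is a hypothetical helper, not part of package regexp. It returns
// the variable name at the start of s under the proposed rule: outside
// braces, a name that starts with an ASCII digit stops at the first
// underscore; every other name keeps the longest-match rule.
func scanName(s string, brace bool) string {
	// Guard the first-byte check so an empty remainder is never indexed.
	digitFirst := !brace && len(s) > 0 && '0' <= s[0] && s[0] <= '9'
	i := 0
	for i < len(s) {
		r, size := utf8.DecodeRuneInString(s[i:])
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			// An underscore continues the name unless the name is an
			// unbraced one that began with a digit.
			if r != '_' || digitFirst {
				break
			}
		}
		i += size
	}
	return s[:i]
}

func main() {
	fmt.Printf("%q\n", scanName("2_$1", false))    // "2"
	fmt.Printf("%q\n", scanName("2_}", true))      // "2_"
	fmt.Printf("%q\n", scanName("name_$1", false)) // "name_"
	fmt.Printf("%q\n", scanName("1x", false))      // "1x"
}
```

Under this rule "$2_$1" would resolve to group 2, a literal underscore, and group 1, while "$1x" would still be read as a reference to a group named "1x".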