Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: len() of a local string variable performs better than that of a const or a global #28417

Closed
ainar-g opened this issue Oct 26, 2018 · 5 comments

Comments

Projects
None yet
5 participants
@ainar-g
Copy link
Contributor

commented Oct 26, 2018

go version go1.11 linux/amd64
go version devel +43530e6b25 Fri Oct 26 12:37:30 2018 +0000 linux/amd64

Playground: https://play.golang.org/p/34va1HU2FJ7.

Results:

$ go test -bench . benchlen_test.go 
goos: linux
goarch: amd64
BenchmarkLen/const-4         	2000000000	         0.55 ns/op
BenchmarkLen/const_local-4   	2000000000	         0.54 ns/op
BenchmarkLen/var-4           	2000000000	         0.58 ns/op
BenchmarkLen/var_local-4     	2000000000	         0.41 ns/op
PASS
ok  	command-line-arguments	4.374s

len(local string variable) is consistently 15-20% faster on my machine. While global vs local []byte have the same performance.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2018

You're measuring the overhead of the loop more than you are measuring the actual cost of len. I'm not sure why the last of your four examples is faster. Could be alignment, cache effects, etc.
The four assembly inner loops are (in order):

	0x0009 00009 (/Users/khr/gowork/issue28417.go:19)	MOVQ	$10, "".sink(SB)
	0x0014 00020 (/Users/khr/gowork/issue28417.go:18)	INCQ	CX
	0x0017 00023 (/Users/khr/gowork/issue28417.go:18)	MOVQ	264(AX), DX
	0x001e 00030 (/Users/khr/gowork/issue28417.go:18)	CMPQ	CX, DX
	0x0021 00033 (/Users/khr/gowork/issue28417.go:18)	JLT	9

	0x0009 00009 (/Users/khr/gowork/issue28417.go:24)	MOVQ	$10, "".sink(SB)
	0x0014 00020 (/Users/khr/gowork/issue28417.go:23)	INCQ	CX
	0x0017 00023 (/Users/khr/gowork/issue28417.go:23)	MOVQ	264(AX), DX
	0x001e 00030 (/Users/khr/gowork/issue28417.go:23)	CMPQ	CX, DX
	0x0021 00033 (/Users/khr/gowork/issue28417.go:23)	JLT	9

	0x0009 00009 (/Users/khr/gowork/issue28417.go:29)	MOVQ	"".vs+8(SB), DX
	0x0010 00016 (/Users/khr/gowork/issue28417.go:29)	MOVQ	DX, "".sink(SB)
	0x0017 00023 (/Users/khr/gowork/issue28417.go:28)	INCQ	CX
	0x001a 00026 (/Users/khr/gowork/issue28417.go:28)	MOVQ	264(AX), DX
	0x0021 00033 (/Users/khr/gowork/issue28417.go:28)	CMPQ	CX, DX
	0x0024 00036 (/Users/khr/gowork/issue28417.go:28)	JLT	9

	0x000d 00013 (/Users/khr/gowork/issue28417.go:34)	MOVQ	AX, "".sink(SB)
	0x0014 00020 (/Users/khr/gowork/issue28417.go:33)	INCQ	DX
	0x0017 00023 (/Users/khr/gowork/issue28417.go:33)	MOVQ	264(CX), BX
	0x001e 00030 (/Users/khr/gowork/issue28417.go:33)	CMPQ	DX, BX
	0x0021 00033 (/Users/khr/gowork/issue28417.go:33)	JLT	13

@bradfitz bradfitz added this to the Unplanned milestone Oct 26, 2018

@agnivade

This comment has been minimized.

Copy link
Member

commented Oct 26, 2018

15-20% faster is one way of looking at it. But you are measuring difference at a scale of 0.1ns. At that scale, even CPU heat can make an impact.

From the generated assembly, there is no difference.

I am curious to know if this arose in a real world usecase. Otherwise we will just be trying to understand benchmarking here since the code is the same in all cases.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2018

I'm going to close this issue, I don't think there is anything actionable here.

@randall77 randall77 closed this Oct 26, 2018

@gopherbot

This comment has been minimized.

Copy link

commented Oct 26, 2018

Change https://golang.org/cl/145097 mentions this issue: cmd/compile: fix rule for combining loads with compares

@ainar-g

This comment has been minimized.

Copy link
Contributor Author

commented Oct 26, 2018

After rewriting my benchmark in accordance to Dave Cheney's recommendations, the effect is gone:

BenchmarkLen/const-4         	2000000000	         0.42 ns/op
BenchmarkLen/const_local-4   	2000000000	         0.40 ns/op
BenchmarkLen/var-4           	2000000000	         0.40 ns/op
BenchmarkLen/var_local-4     	2000000000	         0.40 ns/op

@randall77 Sorry for bothering. Happy that something good did come out of this.

gopherbot pushed a commit that referenced this issue Oct 27, 2018

cmd/compile: fix rule for combining loads with compares
Unlike normal load+op opcodes, the load+compare opcode does
not clobber its non-load argument. Allow the load+compare merge
to happen even if the non-load argument is used elsewhere.

Noticed when investigating issue #28417.

Change-Id: Ibc48d1f2e06ae76034c59f453815d263e8ec7288
Reviewed-on: https://go-review.googlesource.com/c/145097
Reviewed-by: Ainar Garipov <gugl.zadolbal@gmail.com>
Reviewed-by: Ben Shi <powerman1st@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.