cmd/compile: strange performance difference between two implementations #49785

Open
go101 opened this issue Nov 24, 2021 · 4 comments

@go101 go101 commented Nov 24, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

package pointers

import "testing"

const N = 10000

type T struct {
	x int
}

//go:noinline
func f(t *T) {
	t.x = 0
	for i := 0; i < N; i++ {
		t.x += i
	}
}

//go:noinline
func g(t *T) {
	var x = 0
	for i := 0; i < N; i++ {
		x += i
	}
	t.x = x
}

func Benchmark_f(b *testing.B) {
	var t = &T{}
	for i := 0; i < b.N; i++ {
		f(t)
	}
}

func Benchmark_g(b *testing.B) {
	var t = &T{}
	for i := 0; i < b.N; i++ {
		g(t)
	}
}

What did you expect to see?

Similar performances.

What did you see instead?

goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_f-4   	   48352	     24403 ns/op
Benchmark_g-4   	  292581	      3956 ns/op

I checked the generated assembly. The instructions for the two functions do differ, but they are similarly complex, so it is somewhat strange that the performance difference is so large.

Contributor

@randall77 randall77 commented Nov 24, 2021

The inner loop in f still writes t.x to memory on every iteration, which is probably why it is slower than g (whose inner loop runs entirely in registers).

To fix this I think we'd have to promote t.x from memory to register somehow. That seems pretty challenging.
(If the loop were unrolled it might get much of that effect automatically.)
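
For illustration, here is a minimal sketch of what a hand-unrolled version of f could look like. The name fUnrolled and the unroll factor of 4 are assumptions made for this example, not something the compiler generates, and the sketch only checks that the result matches f. Whether the compiler then keeps the intermediate sums in a register has to be confirmed by reading the generated assembly. Note that g from the original report is already the hand-written form of promoting t.x to a register.

```go
package main

import "fmt"

const N = 10000

type T struct {
	x int
}

// f is the slow variant from the report: t.x is updated through
// the pointer on every iteration.
//
//go:noinline
func f(t *T) {
	t.x = 0
	for i := 0; i < N; i++ {
		t.x += i
	}
}

// fUnrolled is a hypothetical hand-unrolled variant (factor 4).
// It assumes N is a multiple of 4, which holds for N = 10000.
//
//go:noinline
func fUnrolled(t *T) {
	t.x = 0
	for i := 0; i < N; i += 4 {
		t.x += i
		t.x += i + 1
		t.x += i + 2
		t.x += i + 3
	}
}

func main() {
	a, b := &T{}, &T{}
	f(a)
	fUnrolled(b)
	// Both variants must compute the same sum.
	fmt.Println(a.x == b.x)
}
```

Benchmarking fUnrolled next to f and g in the same test file would show how much of the gap, if any, unrolling recovers on a given Go version.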

Author

@go101 go101 commented Nov 25, 2021

The problem also exists for reads:

package pointers

import "testing"

const N = 1000
var a [N]int
var r int

//go:noinline
func g1(a *[N]int) int {
	var r int
	_ = *a
	for i := range a {
		r += a[i]
	}
	return r
}

//go:noinline
func g0(a *[N]int) int {
	var r int
	for i := range a {
		r += a[i]
	}
	return r
}

func Benchmark_g1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		r = g1(&a)
	}
}

func Benchmark_g0(b *testing.B) {
	for i := 0; i < b.N; i++ {
		r = g0(&a)
	}
}

Results:

Benchmark_g1-4   	 2178316	       556.8 ns/op
Benchmark_g0-4   	 1949654	       611.8 ns/op

Author

@go101 go101 commented Nov 25, 2021

It looks like the read case is different from the write case. In the read case, the compiler generates one extra instruction, TESTB AL, (AX), for the g0 function.

Contributor

@randall77 randall77 commented Nov 25, 2021

@go101 In your example it's just that the nil pointer check is outside the loop in g1 but inside the loop in g0. We'd need to lift the nil check out of the loop to make them the same speed, which I believe is #41666.
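
As a rough sketch of why the two read variants are interchangeable apart from where the nil check runs, the program below calls both with a valid array and with a nil pointer. The helper name panics is an assumption introduced for this example. It only demonstrates observable behavior: both functions return the same sum and both panic on nil. The placement of the check inside or outside the loop is only visible in the generated assembly (for example via `go build -gcflags=-S`).

```go
package main

import "fmt"

const N = 1000

// g1 dereferences a once before the loop, so the nil check
// can happen a single time up front.
//
//go:noinline
func g1(a *[N]int) int {
	var r int
	_ = *a
	for i := range a {
		r += a[i]
	}
	return r
}

// g0 has no explicit dereference before the loop.
//
//go:noinline
func g0(a *[N]int) int {
	var r int
	for i := range a {
		r += a[i]
	}
	return r
}

// panics is a hypothetical helper that reports whether fn panicked.
func panics(fn func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	fn()
	return false
}

func main() {
	var arr [N]int
	for i := range arr {
		arr[i] = i
	}
	// Same result for a valid pointer.
	fmt.Println(g0(&arr) == g1(&arr))
	// Both variants panic when given a nil pointer.
	fmt.Println(panics(func() { g0(nil) }), panics(func() { g1(nil) }))
}
```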

@seankhliao seankhliao changed the title cmd/compile: strange performacne difference between two implementations cmd/compile: strange performance difference between two implementations Nov 25, 2021