Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: dip in small struct copy performance in go1.17beta #47074

Closed
go101 opened this issue Jul 7, 2021 · 5 comments
Closed

cmd/compile: dip in small struct copy performance in go1.17beta #47074

go101 opened this issue Jul 7, 2021 · 5 comments

Comments

@go101
Copy link

@go101 go101 commented Jul 7, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17beta1 linux/amd64

Does this issue reproduce with the latest release?

Not for go1.16.5 and tip.

What did you do?

package foo

import (
    "testing"
)

const N = 8192

var struct3_0 struct{a, b, c int}

func Benchmark_CopyStruct_3_fields(b *testing.B) {
    var struct3_1 struct{a, b, c int}
    for i := 0; i < b.N; i++ {
        for range [N]struct{}{} {
            struct3_0 = struct3_1
        }
    }
}

func Benchmark_xxx(b *testing.B) {
    for i := 0; i < b.N; i++ {
    }
}

func Benchmark_yyy(b *testing.B) {
    for i := 0; i < b.N; i++ {
    }
}

What did you expect to see?

cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_CopyStruct_3_fields-4   	  183609	      6357 ns/op
Benchmark_xxx-4                   	1000000000	         0.4014 ns/op
Benchmark_yyy-4                   	1000000000	         0.4130 ns/op

What did you see instead?

cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_CopyStruct_3_fields-4   	   93883	     12407 ns/op
Benchmark_xxx-4                   	1000000000	         0.4014 ns/op
Benchmark_yyy-4                   	1000000000	         0.4130 ns/op

It at least one in the Benchmark_xxx or Benchmark_yyy is removed, then the result is normal.
Otherwise, the result is double.

The problem doesn't exist in go1.16.5 and tip. I don't know whether or not it is fixed in the go1.17 release branch.
I didn't find the go1.17 release branch in the repo, so I haven't tested it.

@go101
Copy link
Author

@go101 go101 commented Jul 7, 2021

Maybe it is not a code bug. If

var struct3_1 struct{a, b, c int}

is changed to

var struct3_1 = struct{a, b, c int}{3, 2, 1}

, then the result is constantly about

Benchmark_CopyStruct_3_fields-4   	  118312	      9430 ns/op
Benchmark_xxx-4                   	1000000000	         0.3946 ns/op
Benchmark_yyy-4                   	1000000000	         0.4007 ns/op

So it looks like a cpu cache related problem.

I don't know whether or not some improvements should be applied to the testing package (I hope it could handle the problem). If not, feel free to close this issue.

Loading

@martisch
Copy link
Contributor

@martisch martisch commented Jul 7, 2021

Its not uncommon to see benchmarks improve or regress in performance when unrelated code is added or removed. ANy code change e.g. compiler or linker or user code can effect this.

This can be due to caching, branch alignment or if the loop fits into the internal loop buffer of the cpu. There are all kinds of constraints e.g. how many branches are within a specific cacheline window. Maybe there is something that be done to harden benchmarks better (even higher alignment?).

To figure out what happens in this case a hardware profile could be made and seeing if any cpu internal stats like branch predictions or cache misses change majorly.

Loading

@go101
Copy link
Author

@go101 go101 commented Jul 7, 2021

Another might-similar benchmark: https://github.com/go101/go101/wiki/Remove-benchmark-surprises-(2) (not valid)

Loading

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Jul 7, 2021

As @martisch points out, benchmarks with tight loops like this can end up being very sensitive to instruction caching and alignment. I'm not sure it's worth dwelling on this too much. Like @martisch says though, if you're interested, you can probably figure out exactly what's wrong with hardware performance counters. The easiest way to do this on Linux is with Linux's perf tool.

Loading

@mknyszek mknyszek changed the title testing: weird result for go1.17beta1 cmd/compile: dip in small struct copy performance in go1.17beta Jul 7, 2021
@mknyszek mknyszek added this to the Go1.17 milestone Jul 7, 2021
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Jul 7, 2021

Given that everything's fine on tip and the nature of these kinds of benchmarks, I'm going to close this issue unless there's a more clear indication that something went wrong with the code the compiler produced.

Loading

@mknyszek mknyszek closed this Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants