
cmd/compile: potential compiler regression in 1.16 with heavy arithmetic #43971

Closed

zx2c4 opened this issue Jan 28, 2021 · 2 comments

Comments

zx2c4 (Contributor) commented Jan 28, 2021

I just noticed a performance regression in a small benchmark with 1.16. Here's the code in question:

package test

import (
        "crypto/rand"
        "testing"
)

func BenchmarkBitslice(b *testing.B) {
        var key [32]byte
        var hex [64]byte
        rand.Read(key[:])

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                for j := 0; j < len(key); j++ {
                        hex[j*2] = byte(87 + (int(key[j]) >> 4) + ((((int(key[j]) >> 4) - 10) >> 8) & ^38))
                        hex[j*2+1] = byte(87 + (int(key[j]) & 0xf) + ((((int(key[j]) & 0xf) - 10) >> 8) & ^38))
                }
        }
}

func BenchmarkLookuptable(b *testing.B) {
        var key [32]byte
        var hex [64]byte
        rand.Read(key[:])

        b.ResetTimer()
        const table = "0123456789abcdef"
        for i := 0; i < b.N; i++ {
                for j := 0; j < len(key); j++ {
                        hex[j*2] = table[key[j]>>4]
                        hex[j*2+1] = table[key[j]&0xf]
                }
        }
}
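As an aside (not part of the original report), the bitsliced expression maps a nibble n in 0..15 to its hex digit without a branch: (n - 10) >> 8 is -1 when n < 10 and 0 otherwise, so the & ^38 mask subtracts 39 exactly for n < 10, turning 87 + n into '0' + n, while n >= 10 yields 'a' + (n - 10). A minimal, self-contained check of that identity against the lookup table might look like:

package main

import "fmt"

func main() {
        const table = "0123456789abcdef"
        for n := 0; n < 16; n++ {
                // Same branchless expression as in BenchmarkBitslice, applied to one nibble.
                branchless := byte(87 + n + (((n - 10) >> 8) & ^38))
                if branchless != table[n] {
                        fmt.Printf("mismatch at %d: %c != %c\n", n, branchless, table[n])
                        return
                }
        }
        fmt.Println("all 16 nibble values match")
}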

I'm benching on an Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz using go test -v -benchtime 5s -bench Benchmark*.

Go 1.15:

BenchmarkBitslice-16            408505777               14.4 ns/op
BenchmarkLookuptable-16         389753908               15.6 ns/op

Go 1.16:

BenchmarkBitslice-16            385976983               15.36 ns/op
BenchmarkLookuptable-16         418508672               14.34 ns/op

The bitsliced performance got worse and the lookup table performance got better.

I'm not overly concerned about this, but it is somewhat curious. CC @josharian @bradfitz

seankhliao changed the title from "potential compiler regression in 1.16 with heavy arithmetic" to "cmd/compile: potential compiler regression in 1.16 with heavy arithmetic" Jan 28, 2021
ALTree (Member) commented Jan 28, 2021

It seems like this is just a benchmarking artefact. I ran the benchmarks with -benchtime 2s -count 10 on my machine and compared the results with benchstat, and I got the opposite result (Lookuptable got <1ns slower).

name           old time/op  new time/op  delta
Bitslice-4     27.2ns ± 3%  27.0ns ± 4%    ~     (p=0.565 n=10+10)
Lookuptable-4  26.4ns ± 4%  27.2ns ± 5%  +2.70%  (p=0.050 n=10+10)
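A typical way to produce such a comparison, assuming benchstat is installed from golang.org/x/perf/cmd/benchstat and the per-version go1.15/go1.16 wrappers from golang.org/dl are on PATH (the output file names here are arbitrary):

$ go1.15 test -bench . -benchtime 2s -count 10 p_test.go > old.txt
$ go1.16 test -bench . -benchtime 2s -count 10 p_test.go > new.txt
$ benchstat old.txt new.txt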

But it may be caused by different code alignment. More importantly, you can compare the machine code generated by 1.15 and 1.16 using

$ go test -gcflags -S p_test.go

And I see no meaningful difference between the two, which seems to support the "it's probably just benchmarking noise" explanation. You're measuring a <1ns difference, after all.
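To diff only the generated benchmark functions rather than the full -S dump, one option (again assuming the go1.15/go1.16 wrappers; binary and output names are arbitrary) is to build the test binaries and disassemble just the symbol of interest:

$ go1.15 test -c -o bench115.test p_test.go
$ go1.16 test -c -o bench116.test p_test.go
$ go tool objdump -s BenchmarkBitslice bench115.test > bitslice115.txt
$ go tool objdump -s BenchmarkBitslice bench116.test > bitslice116.txt
$ diff bitslice115.txt bitslice116.txt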


randall77 (Contributor) commented Jan 28, 2021

The assembly for BitSlice looks identical to me. The only difference I see is the offset used to load b.N.
I'm going to close this issue, as I don't think there is anything to do here.


randall77 closed this Jan 28, 2021