New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: autogenerated wrapper is not inlined #25338

Closed
TocarIP opened this Issue May 10, 2018 · 6 comments

Comments

Projects
None yet
7 participants
@TocarIP
Contributor

TocarIP commented May 10, 2018

What version of Go are you using (go version)?

Master:

go version devel +5bd88b0 Thu May 10 18:02:07 2018 +0000 linux/amd64

Running encoding/binary benchmarks I see significant slowdown (compared to 1.10):

WriteSlice1000Int32s-4                           9.51µs ± 2%    12.83µs ± 2%    +34.92%  (p=0.008 n=5+5)
ReadSlice1000Int32s-4                            9.34µs ± 2%    12.60µs ± 2%    +34.90%  (p=0.008 n=5+5)

This is caused by compiler not inlining autogenerated binary.(*bigEndian).PutUint32, which does nothing but shuffles arguments and calls binary.bigEndian.PutUint32.

Bisect points to ca2f85f, which is IMHO strange, because it doesn't enable new export format by default.
@mdempsky any ideas?

@TocarIP

This comment has been minimized.

Contributor

TocarIP commented May 10, 2018

In genwrapper (cmd/compile/internal/gc/subr.go) I see:

        // TODO(mdempsky): Investigate why this doesn't work with
        // indexed export. For now, we disable even in non-indexed
        // mode to ensure fair benchmark comparisons and to track down
        // unintended compilation differences.
        if false {
                inlcalls(fn)
        }

Should we at least change false to flagiexport, since benchmarks are done?

@josharian

This comment has been minimized.

Contributor

josharian commented May 10, 2018

@TocarIP I see you are systematically checking benchmarks and running down significant regressions. Just wanted to say a hearty thank you for doing this!

@Quasilyte

This comment has been minimized.

Contributor

Quasilyte commented May 11, 2018

Additional regressions caused by that CL:

sort
StableInt1K-4         148µs ± 0%      237µs ± 0%    +59.40%  (p=0.008 n=5+5)
StableInt64K-4       13.2ms ± 1%     21.0ms ± 1%    +59.11%  (p=0.008 n=5+5)
SortInt1K-4           134µs ± 0%      201µs ± 1%    +50.75%  (p=0.008 n=5+5)
SortInt64K-4         13.0ms ± 1%     19.3ms ± 1%    +48.86%  (p=0.008 n=5+5)
Stable1e6-4           9.76s ± 0%     13.39s ± 0%    +37.26%  (p=0.008 n=5+5)
Sort1e6-4             2.46s ± 0%      3.37s ± 0%    +37.08%  (p=0.008 n=5+5)
Sort1e4-4            16.2ms ± 0%     22.2ms ± 0%    +36.50%  (p=0.016 n=5+4)
Stable1e2-4           155µs ± 0%      211µs ± 0%    +35.67%  (p=0.008 n=5+5)

bufio
WriterFlush-4       26.8ns ± 0%     31.6ns ± 0%    +18.09%  (p=0.016 n=5+4)

There are more regressions, but these are the most significant ones (highest delta).

@agnivade

This comment has been minimized.

Member

agnivade commented May 13, 2018

I see you are systematically checking benchmarks and running down significant regressions.

Probably we should automate this through a nightly job so that offfending CLs are caught quicker.

@gopherbot

This comment has been minimized.

gopherbot commented Sep 17, 2018

Change https://golang.org/cl/135697 mentions this issue: cmd/compile/internal/gc: inline autogenerated (*T).M wrappers

gopherbot pushed a commit that referenced this issue Oct 16, 2018

cmd/compile/internal/gc: inline autogenerated (*T).M wrappers
Currently all inlining of autogenerated wrappers is disabled,
because it causes build failures, when indexed export format is enabled.
Turns out we can reenable it for common case of (*T).M wrappers.
This fixes most performance degradation of 1.11 vs 1.10.

encoding/binary:
name                    old time/op   new time/op   delta
ReadSlice1000Int32s-6    14.8µs ± 2%   11.5µs ± 2%  -22.01%  (p=0.000 n=10+10)
WriteSlice1000Int32s-6   14.8µs ± 2%   11.7µs ± 2%  -20.95%  (p=0.000 n=10+10)

bufio:
name           old time/op    new time/op    delta
WriterFlush-6    32.4ns ± 1%    28.8ns ± 0%  -11.17%  (p=0.000 n=9+10)

sort:
SearchWrappers-6       231ns ± 1%   231ns ± 0%     ~     (p=0.129 n=9+10)
SortString1K-6         365µs ± 1%   298µs ± 1%  -18.43%  (p=0.000 n=9+10)
SortString1K_Slice-6   274µs ± 2%   276µs ± 1%     ~     (p=0.105 n=10+10)
StableString1K-6       490µs ± 1%   373µs ± 1%  -23.73%  (p=0.000 n=10+10)
SortInt1K-6            210µs ± 1%   142µs ± 1%  -32.69%  (p=0.000 n=10+10)
StableInt1K-6          243µs ± 0%   151µs ± 1%  -37.75%  (p=0.000 n=10+10)
StableInt1K_Slice-6    130µs ± 1%   130µs ± 0%     ~     (p=0.237 n=10+8)
SortInt64K-6          19.9ms ± 1%  13.5ms ± 1%  -32.32%  (p=0.000 n=10+10)
SortInt64K_Slice-6    11.5ms ± 1%  11.5ms ± 1%     ~     (p=0.912 n=10+10)
StableInt64K-6        21.5ms ± 0%  13.5ms ± 1%  -37.30%  (p=0.000 n=9+10)
Sort1e2-6              108µs ± 2%    83µs ± 3%  -23.26%  (p=0.000 n=10+10)
Stable1e2-6            218µs ± 0%   161µs ± 1%  -25.99%  (p=0.000 n=8+9)
Sort1e4-6             22.6ms ± 1%  16.8ms ± 0%  -25.45%  (p=0.000 n=10+7)
Stable1e4-6           67.6ms ± 1%  49.7ms ± 0%  -26.48%  (p=0.000 n=10+10)
Sort1e6-6              3.44s ± 0%   2.55s ± 1%  -26.05%  (p=0.000 n=8+9)
Stable1e6-6            13.7s ± 0%    9.9s ± 1%  -27.68%  (p=0.000 n=8+10)

Fixes #27621
Updates #25338

Change-Id: I6fe633202f63fa829a6ab849c44d7e45f8835dff
Reviewed-on: https://go-review.googlesource.com/c/135697
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@randall77

This comment has been minimized.

Contributor

randall77 commented Dec 12, 2018

I believe this is fixed now. The CL above, and possibly the slightly more aggressive mid-stack inlining makes the wrapper inline successfully.
Tip is still slower than 1.10, but only by a few percent.

@randall77 randall77 closed this Dec 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment