Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: mid-stack inline dispatch functions that call a function on each path #30548

Open
josharian opened this issue Mar 3, 2019 · 1 comment

Comments

@josharian
Copy link
Contributor

commented Mar 3, 2019

package p

func dispatch(x int) int {
	if x < 10 {
		return small(x)
	}
	return big(x)
}

//go:noinline
func small(x int) int { return 0 }

//go:noinline
func big(x int) int { return 1 }

dispatch should be inlined: It is very simple, and on either path through the function we call exactly one other simple function.

It is not:

x.go:3:6: cannot inline dispatch: function too complex: cost 126 exceeds budget 80

This is because we attach a significant cost to each function call.

Note that we are happy to inline this dispatch function:

func dispatch(x int) int {
	fn := big
	if x < 10 {
		fn = small
	}
	return fn(x)
}

Fixing this requires either some special casing to recognize common patterns (like the original dispatch) or a bit of control flow analysis.

cc @randall77 @dr2chase

@gopherbot

This comment has been minimized.

Copy link

commented Mar 3, 2019

Change https://golang.org/cl/164968 mentions this issue: math/big: add fast path for pure Go addVW for large z

gopherbot pushed a commit that referenced this issue Mar 9, 2019
In the normal case, only a few words have to be updated when adding a word to a vector.
When that happens, we can simply copy the rest of the words, which is much faster.
However, the overhead of that makes it prohibitive for small vectors,
so we check the size at the beginning.

The implementation is a bit weird to allow addVW to continued to be inlined; see #30548.

The AddVW benchmarks are surprising, but fully repeatable.
The SubVW benchmarks are more or less as expected.
I expect that removing the indirect function call will
help both and make them a bit more normal.

name            old time/op    new time/op     delta
AddVW/1-8         4.27ns ± 2%     3.81ns ± 3%   -10.83%  (p=0.000 n=89+90)
AddVW/2-8         4.91ns ± 2%     4.34ns ± 1%   -11.60%  (p=0.000 n=83+90)
AddVW/3-8         5.77ns ± 4%     5.76ns ± 2%      ~     (p=0.365 n=91+87)
AddVW/4-8         6.03ns ± 1%     6.03ns ± 1%      ~     (p=0.392 n=80+76)
AddVW/5-8         6.48ns ± 2%     6.63ns ± 1%    +2.27%  (p=0.000 n=76+74)
AddVW/10-8        9.56ns ± 2%     9.56ns ± 1%    -0.02%  (p=0.002 n=69+76)
AddVW/100-8       90.6ns ± 0%     18.1ns ± 4%   -79.99%  (p=0.000 n=72+94)
AddVW/1000-8       865ns ± 0%       85ns ± 6%   -90.14%  (p=0.000 n=66+96)
AddVW/10000-8     8.57µs ± 2%     1.82µs ± 3%   -78.73%  (p=0.000 n=99+94)
AddVW/100000-8    84.4µs ± 2%     31.8µs ± 4%   -62.29%  (p=0.000 n=93+98)

name            old time/op    new time/op     delta
SubVW/1-8         3.90ns ± 2%     4.13ns ± 4%    +6.02%  (p=0.000 n=92+95)
SubVW/2-8         4.15ns ± 1%     5.20ns ± 1%   +25.22%  (p=0.000 n=83+85)
SubVW/3-8         5.50ns ± 2%     6.22ns ± 6%   +13.21%  (p=0.000 n=91+97)
SubVW/4-8         5.99ns ± 1%     6.63ns ± 1%   +10.63%  (p=0.000 n=79+61)
SubVW/5-8         6.75ns ± 4%     6.88ns ± 2%    +1.82%  (p=0.000 n=98+73)
SubVW/10-8        9.57ns ± 1%     9.56ns ± 1%    -0.13%  (p=0.000 n=77+64)
SubVW/100-8       90.3ns ± 1%     18.1ns ± 2%   -80.00%  (p=0.000 n=75+94)
SubVW/1000-8       860ns ± 4%       85ns ± 7%   -90.14%  (p=0.000 n=97+99)
SubVW/10000-8     8.51µs ± 3%     1.77µs ± 6%   -79.21%  (p=0.000 n=100+97)
SubVW/100000-8    84.4µs ± 3%     31.5µs ± 3%   -62.66%  (p=0.000 n=92+92)

Change-Id: I721d7031d40f245b4a284f5bdd93e7bb85e7e937
Reviewed-on: https://go-review.googlesource.com/c/go/+/164968
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
josharian added a commit to josharian/go that referenced this issue May 7, 2019
In the normal case, only a few words have to be updated when adding a word to a vector.
When that happens, we can simply copy the rest of the words, which is much faster.
However, the overhead of that makes it prohibitive for small vectors,
so we check the size at the beginning.

The implementation is a bit weird to allow addVW to continued to be inlined; see golang#30548.

The AddVW benchmarks are surprising, but fully repeatable.
The SubVW benchmarks are more or less as expected.
I expect that removing the indirect function call will
help both and make them a bit more normal.

name            old time/op    new time/op     delta
AddVW/1-8         4.27ns ± 2%     3.81ns ± 3%   -10.83%  (p=0.000 n=89+90)
AddVW/2-8         4.91ns ± 2%     4.34ns ± 1%   -11.60%  (p=0.000 n=83+90)
AddVW/3-8         5.77ns ± 4%     5.76ns ± 2%      ~     (p=0.365 n=91+87)
AddVW/4-8         6.03ns ± 1%     6.03ns ± 1%      ~     (p=0.392 n=80+76)
AddVW/5-8         6.48ns ± 2%     6.63ns ± 1%    +2.27%  (p=0.000 n=76+74)
AddVW/10-8        9.56ns ± 2%     9.56ns ± 1%    -0.02%  (p=0.002 n=69+76)
AddVW/100-8       90.6ns ± 0%     18.1ns ± 4%   -79.99%  (p=0.000 n=72+94)
AddVW/1000-8       865ns ± 0%       85ns ± 6%   -90.14%  (p=0.000 n=66+96)
AddVW/10000-8     8.57µs ± 2%     1.82µs ± 3%   -78.73%  (p=0.000 n=99+94)
AddVW/100000-8    84.4µs ± 2%     31.8µs ± 4%   -62.29%  (p=0.000 n=93+98)

name            old time/op    new time/op     delta
SubVW/1-8         3.90ns ± 2%     4.13ns ± 4%    +6.02%  (p=0.000 n=92+95)
SubVW/2-8         4.15ns ± 1%     5.20ns ± 1%   +25.22%  (p=0.000 n=83+85)
SubVW/3-8         5.50ns ± 2%     6.22ns ± 6%   +13.21%  (p=0.000 n=91+97)
SubVW/4-8         5.99ns ± 1%     6.63ns ± 1%   +10.63%  (p=0.000 n=79+61)
SubVW/5-8         6.75ns ± 4%     6.88ns ± 2%    +1.82%  (p=0.000 n=98+73)
SubVW/10-8        9.57ns ± 1%     9.56ns ± 1%    -0.13%  (p=0.000 n=77+64)
SubVW/100-8       90.3ns ± 1%     18.1ns ± 2%   -80.00%  (p=0.000 n=75+94)
SubVW/1000-8       860ns ± 4%       85ns ± 7%   -90.14%  (p=0.000 n=97+99)
SubVW/10000-8     8.51µs ± 3%     1.77µs ± 6%   -79.21%  (p=0.000 n=100+97)
SubVW/100000-8    84.4µs ± 3%     31.5µs ± 3%   -62.66%  (p=0.000 n=92+92)

Change-Id: I721d7031d40f245b4a284f5bdd93e7bb85e7e937
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.