Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: support inlining non-escaping closures #15561

Open
josharian opened this issue May 5, 2016 · 9 comments
Open

cmd/compile: support inlining non-escaping closures #15561

josharian opened this issue May 5, 2016 · 9 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Milestone

Comments

@josharian
Copy link
Contributor

Split out from #9337 and #15537.

@rogpeppe
Copy link
Contributor

I'd very much like to see this happen. An idiom that's useful for dealing with lots of returns from a function with long-winded return values is this:

func f() (int, int, error) {
	fail := func(err error) (int, int, error) {
		return 0, 0, err
	}
	if somethingWrong {
		return fail(someError)
	}
	return foo, bar, nil
}

It would be great if this generated exactly the same code as the equivalent:

func f() (int, int, error) {
	if somethingWrong {
		return 0, 0, someError
	}
	return foo, bar, nil
}

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/65071 mentions this issue: cmd/compile: inline closures whenever possible

@huguesb
Copy link
Contributor

huguesb commented Sep 20, 2017

Taking a stab at this. I have a CL that can handle the basic test case above, with serious limitations/bugs.

I'm not sure what's going on with captured variables (getting invariant violations in the SSA phase) and would appreciate some guidance from someone more familiar with the compiler.

Also, there doesn't seem to be any easy way to tell if a variable is assigned more than once. Am I missing something?

gopherbot pushed a commit that referenced this issue Oct 11, 2017
Calls to a closure held in a local, non-escaping,
variable can be inlined, provided the closure body
can be inlined and the variable is never written to.

The current implementation has the following limitations:

 - closures with captured variables are not inlined because
   doing so naively triggers invariant violation in the SSA
   phase
 - re-assignment check is currently approximated by checking
   the Addrtaken property of the variable which should be safe
   but may miss optimization opportunities if the address is
   not used for a write before the invocation

Updates #15561

Change-Id: I508cad5d28f027bd7e933b1f793c14dcfef8b5a1
Reviewed-on: https://go-review.googlesource.com/65071
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/72490 mentions this issue: cmd/compile: inline closures with captures

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/75770 mentions this issue: cmd/compile: fix reassignment check

gopherbot pushed a commit that referenced this issue Nov 3, 2017
CL 65071 enabled inlining for local closures with no captures.

To determine safety of inlining a call sites, we check whether the
variable holding the closure has any assignments after its original
definition.

Unfortunately, that check did not catch OAS2MAPR and OAS2DOTTYPE,
leading to incorrect inlining when a variable holding a closure was
subsequently reassigned through a type conversion or a 2-valued map
access.

There was another more subtle issue wherein reassignment check would
always return a false positive for closure calls inside other
closures. This was caused by the Name.Curfn field of local variables
pointing to the OCLOSURE node instead of the corresponding ODCLFUNC,
which resulted in reassigned walking an empty Nbody and thus never
seeing any reassignments.

This CL fixes these oversights and adds many more tests for closure
inlining which ensure not only that inlining triggers but also the
correctness of the resulting code.

Updates #15561

Change-Id: I74bdae849c4ecfa328546d6d62b512e8d54d04ce
Reviewed-on: https://go-review.googlesource.com/75770
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Nov 5, 2017
When inlining a closure with captured variables, walk up the
param chain to find the one that is defined inside the scope
into which the function is being inlined, and map occurrences
of the captures to temporary inlvars, similarly to what is
done for function parameters.

No noticeable impact on compilation speed and binary size.

Minor improvements to go1 benchmarks on darwin/amd64

name                     old time/op    new time/op    delta
BinaryTree17-4              2.59s ± 3%     2.58s ± 1%    ~     (p=0.470 n=19+19)
Fannkuch11-4                3.15s ± 2%     3.15s ± 1%    ~     (p=0.647 n=20+19)
FmtFprintfEmpty-4          43.7ns ± 3%    43.4ns ± 4%    ~     (p=0.178 n=18+20)
FmtFprintfString-4         74.0ns ± 2%    77.1ns ± 7%  +4.13%  (p=0.000 n=20+20)
FmtFprintfInt-4            77.2ns ± 3%    79.2ns ± 6%  +2.53%  (p=0.000 n=20+20)
FmtFprintfIntInt-4          112ns ± 4%     112ns ± 2%    ~     (p=0.672 n=20+19)
FmtFprintfPrefixedInt-4     136ns ± 1%     135ns ± 2%    ~     (p=0.827 n=16+20)
FmtFprintfFloat-4           232ns ± 2%     233ns ± 1%    ~     (p=0.194 n=20+20)
FmtManyArgs-4               490ns ± 2%     484ns ± 2%  -1.28%  (p=0.001 n=20+20)
GobDecode-4                6.68ms ± 2%    6.72ms ± 2%    ~     (p=0.113 n=20+19)
GobEncode-4                5.62ms ± 2%    5.71ms ± 2%  +1.64%  (p=0.000 n=20+19)
Gzip-4                      235ms ± 3%     236ms ± 2%    ~     (p=0.607 n=20+19)
Gunzip-4                   37.1ms ± 2%    36.8ms ± 3%    ~     (p=0.060 n=20+20)
HTTPClientServer-4         61.9µs ± 2%    62.7µs ± 4%  +1.24%  (p=0.007 n=18+19)
JSONEncode-4               12.5ms ± 2%    12.4ms ± 3%    ~     (p=0.192 n=20+20)
JSONDecode-4               51.6ms ± 3%    51.0ms ± 3%  -1.19%  (p=0.008 n=20+19)
Mandelbrot200-4            4.12ms ± 6%    4.06ms ± 5%    ~     (p=0.063 n=20+20)
GoParse-4                  3.12ms ± 5%    3.10ms ± 2%    ~     (p=0.402 n=19+19)
RegexpMatchEasy0_32-4      80.7ns ± 2%    75.1ns ± 9%  -6.94%  (p=0.000 n=17+20)
RegexpMatchEasy0_1K-4       197ns ± 2%     186ns ± 2%  -5.43%  (p=0.000 n=20+20)
RegexpMatchEasy1_32-4      77.5ns ± 4%    71.9ns ± 7%  -7.25%  (p=0.000 n=20+18)
RegexpMatchEasy1_1K-4       341ns ± 3%     341ns ± 3%    ~     (p=0.732 n=20+20)
RegexpMatchMedium_32-4      113ns ± 2%     112ns ± 3%    ~     (p=0.102 n=20+20)
RegexpMatchMedium_1K-4     36.6µs ± 2%    35.8µs ± 2%  -2.26%  (p=0.000 n=18+20)
RegexpMatchHard_32-4       1.75µs ± 3%    1.74µs ± 2%    ~     (p=0.473 n=20+19)
RegexpMatchHard_1K-4       52.6µs ± 2%    52.0µs ± 3%  -1.15%  (p=0.005 n=20+20)
Revcomp-4                   381ms ± 4%     377ms ± 2%    ~     (p=0.067 n=20+18)
Template-4                 57.3ms ± 2%    57.7ms ± 2%    ~     (p=0.108 n=20+20)
TimeParse-4                 291ns ± 3%     292ns ± 2%    ~     (p=0.585 n=20+20)
TimeFormat-4                314ns ± 3%     315ns ± 1%    ~     (p=0.681 n=20+20)
[Geo mean]                 47.4µs         47.1µs       -0.73%

name                     old speed      new speed      delta
GobDecode-4               115MB/s ± 2%   114MB/s ± 2%    ~     (p=0.115 n=20+19)
GobEncode-4               137MB/s ± 2%   134MB/s ± 2%  -1.63%  (p=0.000 n=20+19)
Gzip-4                   82.5MB/s ± 3%  82.4MB/s ± 2%    ~     (p=0.612 n=20+19)
Gunzip-4                  523MB/s ± 2%   528MB/s ± 3%    ~     (p=0.060 n=20+20)
JSONEncode-4              155MB/s ± 2%   156MB/s ± 3%    ~     (p=0.192 n=20+20)
JSONDecode-4             37.6MB/s ± 3%  38.1MB/s ± 3%  +1.21%  (p=0.007 n=20+19)
GoParse-4                18.6MB/s ± 4%  18.7MB/s ± 2%    ~     (p=0.405 n=19+19)
RegexpMatchEasy0_32-4     396MB/s ± 2%   426MB/s ± 8%  +7.56%  (p=0.000 n=17+20)
RegexpMatchEasy0_1K-4    5.18GB/s ± 2%  5.48GB/s ± 2%  +5.79%  (p=0.000 n=20+20)
RegexpMatchEasy1_32-4     413MB/s ± 4%   444MB/s ± 6%  +7.46%  (p=0.000 n=20+19)
RegexpMatchEasy1_1K-4    3.00GB/s ± 3%  3.00GB/s ± 3%    ~     (p=0.678 n=20+20)
RegexpMatchMedium_32-4   8.82MB/s ± 2%  8.90MB/s ± 3%  +0.99%  (p=0.044 n=20+20)
RegexpMatchMedium_1K-4   28.0MB/s ± 2%  28.6MB/s ± 2%  +2.32%  (p=0.000 n=18+20)
RegexpMatchHard_32-4     18.3MB/s ± 3%  18.4MB/s ± 2%    ~     (p=0.482 n=20+19)
RegexpMatchHard_1K-4     19.5MB/s ± 2%  19.7MB/s ± 3%  +1.18%  (p=0.004 n=20+20)
Revcomp-4                 668MB/s ± 4%   674MB/s ± 2%    ~     (p=0.066 n=20+18)
Template-4               33.8MB/s ± 2%  33.6MB/s ± 2%    ~     (p=0.104 n=20+20)
[Geo mean]                124MB/s        126MB/s       +1.54%

Updates #15561
Updates #18270

Change-Id: I980086efe28b36aa27f81577065e2a729ff03d4e
Reviewed-on: https://go-review.googlesource.com/72490
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/83538 mentions this issue: cmd/compile: remove broken inlining accounting code

gopherbot pushed a commit that referenced this issue Dec 12, 2017
We can't currently inline functions that contain closures anyway, so
just delete this budgeting code for now. Re-enable once we can (if
ever) inline functions with nested closures.

Updates #15561.
Fixes #23093.

Change-Id: Idc5f8e042ccfcc8921022e58d3843719d4ab821e
Reviewed-on: https://go-review.googlesource.com/83538
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
@thepudds
Copy link
Contributor

thepudds commented Jul 20, 2019

Hi @rogpeppe, is your specific request in #15561 (comment) addressed at this point, or still pending?

For your example, it looks like fail gets inlined, but f does not (cannot inline f: unhandled op CLOSURE), but I'm not sure which piece you personally were focused on in your example.

Separately, I'm not sure sure if this overall issue is about addressing inlining fail in this example, or f, or both. Said another way, I am not sure if this issue should still be open, vs. marked as resolved.

func f() (int, int, error) {
	fail := func(err error) (int, int, error) {
		return 0, 0, err
	}
	if somethingWrong {
		return fail(someError)
	}
	return foo, bar, nil
}

(And sorry, I just looked at a bunch of examples, so non-trivial chance I might just be confused here).

mroth added a commit to mroth/go that referenced this issue Oct 6, 2020
The SearchInts, SearchFloat64s, and SearchStrings functions are
convenience wrappers around the generic Search function, which takes a
function parameter to determine truthfulness.

However, since this passed function is utilized within a for loop, it
cannot currently be inlined by the Go compiler, resulting in some degree
of performance overhead (see golang#15561).

This trivial commit manually inlines the Search function itself to these convenience wrappers, avoiding the function call overhead in the hot
loop.

It replaces 1 line of copy-pasted code with 10 lines of copy-pasted
code, however it has a roughly 2x beneficial effect on performance as
seen below:

$ benchstat before.txt after.txt
name               old time/op  new time/op  delta
SearchWrappers-16  84.7ns ± 1%  43.8ns ± 2%  -48.34%  (p=0.000 n=19+19)

In the future, generics may enable similar perf gains while
avoiding the minimal copypasta, but for now this small change can
improve standard library performance and give the existing search
"convenience wrappers" performance benefits as well as convenience.

Updates golang#15561
mroth added a commit to mroth/go that referenced this issue Oct 6, 2020
The SearchInts, SearchFloat64s, and SearchStrings functions are
convenience wrappers around the generic Search function, which takes a
function parameter to determine truthfulness.

However, since this passed function is utilized within a for loop, it
cannot currently be inlined by the Go compiler, resulting in some degree
of performance overhead (see golang#15561).

This trivial commit manually inlines the Search function itself to these
convenience wrappers, avoiding the function call overhead in the hot
loop.

It replaces 1 line of copy-pasted code with 10 lines of copy-pasted
code, however it has a roughly 2x beneficial effect on performance as
seen below:

$ benchstat before.txt after.txt
name               old time/op  new time/op  delta
SearchWrappers-16  84.7ns ± 1%  43.8ns ± 2%  -48.34%  (p=0.000 n=19+19)

In the future, generics may enable similar perf gains while
avoiding the minimal copypasta, but for now this small change can
improve standard library performance and give the existing search
"convenience wrappers" performance benefits as well as convenience.

Updates golang#15561
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/260018 mentions this issue: sort: inline search convenience wrappers

@DmitriyMV
Copy link
Contributor

DmitriyMV commented Jan 25, 2022

This appears to be fixed on the latest Go https://go.godbolt.org/z/MdGb366c5

@josharian @rogpeppe am I correct?

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Projects
None yet
Development

No branches or pull requests

7 participants