Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: boost inlining into FORs #48209

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nimelehin
Copy link
Contributor

@nimelehin nimelehin commented Sep 6, 2021

As already Than McIntosh mentioned it's a common practise to boost
inlining to FORs, since the callsite could be "hotter". This patch
implements this functionality.

The implementation uses a stack of FORs to recognise calls which are
in a loop. The stack is maintained alongside inlnode function works
and contains information about ancenstor FORs relative to a current
node in inlnode.

There is "big" FOR which cost is >= inlineBigForCost(105). In such FORs
no boost is applied.

Updates #17566

The following results on GO1, while binary size not increased significantly
10454800 -> 10475120, which is less than 0.3%.

goos: linux
goarch: amd64
pkg: test/bench/go1
cpu: Intel(R) Xeon(R) Gold 6230N CPU @ 2.30GHz
name old time/op new time/op delta
BinaryTree17-8 2.15s ± 1% 2.17s ± 1% ~ (p=0.065 n=6+6)
Fannkuch11-8 2.70s ± 0% 2.69s ± 0% -0.25% (p=0.010 n=6+4)
FmtFprintfEmpty-8 31.9ns ± 0% 31.4ns ± 0% -1.61% (p=0.008 n=5+5)
FmtFprintfString-8 57.0ns ± 0% 57.1ns ± 0% +0.26% (p=0.013 n=6+5)
FmtFprintfInt-8 65.2ns ± 0% 63.9ns ± 0% -1.95% (p=0.008 n=5+5)
FmtFprintfIntInt-8 103ns ± 0% 102ns ± 0% -1.01% (p=0.000 n=5+4)
FmtFprintfPrefixedInt-8 119ns ± 0% 118ns ± 0% -0.50% (p=0.008 n=5+5)
FmtFprintfFloat-8 169ns ± 0% 174ns ± 0% +2.75% (p=0.008 n=5+5)
FmtManyArgs-8 445ns ± 0% 447ns ± 0% +0.46% (p=0.002 n=6+6)
GobDecode-8 4.37ms ± 1% 4.40ms ± 0% +0.62% (p=0.009 n=6+6)
GobEncode-8 3.07ms ± 0% 3.04ms ± 0% -0.78% (p=0.004 n=5+6)
Gzip-8 195ms ± 0% 195ms ± 0% ~ (p=0.429 n=5+6)
Gunzip-8 28.2ms ± 0% 28.2ms ± 0% ~ (p=0.662 n=5+6)
HTTPClientServer-8 45.0µs ± 1% 45.4µs ± 1% ~ (p=0.093 n=6+6)
JSONEncode-8 8.01ms ± 0% 8.03ms ± 0% +0.31% (p=0.008 n=5+5)
JSONDecode-8 35.3ms ± 1% 35.1ms ± 0% -0.72% (p=0.008 n=5+5)
Mandelbrot200-8 4.50ms ± 0% 4.49ms ± 1% ~ (p=0.937 n=6+6)
GoParse-8 3.03ms ± 1% 3.00ms ± 1% ~ (p=0.180 n=6+6)
RegexpMatchEasy0_32-8 55.4ns ± 0% 53.2ns ± 3% -3.92% (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8 178ns ± 0% 175ns ± 1% -1.57% (p=0.004 n=5+6)
RegexpMatchEasy1_32-8 50.1ns ± 0% 48.3ns ± 5% ~ (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8 271ns ± 1% 262ns ± 1% -3.26% (p=0.004 n=6+5)
RegexpMatchMedium_32-8 949ns ± 0% 886ns ± 7% ~ (p=0.329 n=5+6)
RegexpMatchMedium_1K-8 27.1µs ± 7% 28.1µs ± 6% ~ (p=0.394 n=6+6)
RegexpMatchHard_32-8 1.28µs ± 2% 1.29µs ± 0% ~ (p=0.056 n=6+6)
RegexpMatchHard_1K-8 38.5µs ± 0% 38.4µs ± 0% -0.25% (p=0.009 n=6+5)
Revcomp-8 397ms ± 0% 396ms ± 0% ~ (p=0.429 n=6+5)
Template-8 48.1ms ± 1% 48.1ms ± 0% ~ (p=0.222 n=5+5)
TimeParse-8 213ns ± 0% 213ns ± 0% ~ (p=0.210 n=4+6)
TimeFormat-8 295ns ± 1% 259ns ± 0% -12.22% (p=0.002 n=6+6)
[Geo mean] 40.5µs 40.1µs -1.00%

name old speed new speed delta
GobDecode-8 176MB/s ± 1% 174MB/s ± 0% -0.61% (p=0.009 n=6+6)
GobEncode-8 250MB/s ± 0% 252MB/s ± 0% +0.79% (p=0.004 n=5+6)
Gzip-8 100MB/s ± 0% 100MB/s ± 0% ~ (p=0.351 n=5+6)
Gunzip-8 687MB/s ± 0% 687MB/s ± 0% ~ (p=0.662 n=5+6)
JSONEncode-8 242MB/s ± 0% 242MB/s ± 0% -0.31% (p=0.008 n=5+5)
JSONDecode-8 54.9MB/s ± 1% 55.3MB/s ± 0% +0.71% (p=0.008 n=5+5)
GoParse-8 19.1MB/s ± 1% 19.3MB/s ± 1% ~ (p=0.143 n=6+6)
RegexpMatchEasy0_32-8 578MB/s ± 0% 601MB/s ± 3% +4.10% (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8 5.74GB/s ± 1% 5.85GB/s ± 1% +1.90% (p=0.002 n=6+6)
RegexpMatchEasy1_32-8 639MB/s ± 0% 663MB/s ± 4% ~ (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8 3.78GB/s ± 1% 3.91GB/s ± 1% +3.38% (p=0.004 n=6+5)
RegexpMatchMedium_32-8 33.7MB/s ± 0% 36.2MB/s ± 7% ~ (p=0.268 n=5+6)
RegexpMatchMedium_1K-8 37.9MB/s ± 6% 36.5MB/s ± 6% ~ (p=0.411 n=6+6)
RegexpMatchHard_32-8 24.9MB/s ± 2% 24.8MB/s ± 0% ~ (p=0.063 n=6+6)
RegexpMatchHard_1K-8 26.6MB/s ± 0% 26.7MB/s ± 0% +0.25% (p=0.009 n=6+5)
Revcomp-8 640MB/s ± 0% 641MB/s ± 0% ~ (p=0.429 n=6+5)
Template-8 40.4MB/s ± 1% 40.3MB/s ± 0% ~ (p=0.222 n=5+5)
[Geo mean] 175MB/s 177MB/s +1.05%

@google-cla google-cla bot added the cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change. label Sep 6, 2021
@gopherbot
Copy link
Contributor

This PR (HEAD: 2fec790) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@nimelehin nimelehin force-pushed the boost_inline_into_fors branch 3 times, most recently from 415ea1b to 7de33c3 Compare September 9, 2021 15:51
@gopherbot
Copy link
Contributor

This PR (HEAD: 7de33c3) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Keith Randall:

Patch Set 3:

(17 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@nimelehin nimelehin force-pushed the boost_inline_into_fors branch from 7de33c3 to d7f3135 Compare September 10, 2021 15:39
@gopherbot
Copy link
Contributor

This PR (HEAD: d7f3135) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@nimelehin nimelehin force-pushed the boost_inline_into_fors branch from d7f3135 to c662bfa Compare September 10, 2021 15:56
@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 4:

(18 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

This PR (HEAD: c662bfa) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 5:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Keith Randall:

Patch Set 5:

(6 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 5:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Keith Randall:

Patch Set 5:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@nimelehin nimelehin force-pushed the boost_inline_into_fors branch from c662bfa to 7eb224f Compare September 23, 2021 14:40
@gopherbot
Copy link
Contributor

This PR (HEAD: 7eb224f) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@nimelehin nimelehin force-pushed the boost_inline_into_fors branch from 7eb224f to f398028 Compare September 23, 2021 15:11
@gopherbot
Copy link
Contributor

This PR (HEAD: f398028) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@nimelehin nimelehin force-pushed the boost_inline_into_fors branch from f398028 to ba9fc48 Compare September 23, 2021 15:52
@gopherbot
Copy link
Contributor

This PR (HEAD: ba9fc48) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 9:

(8 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@nimelehin nimelehin force-pushed the boost_inline_into_fors branch from ba9fc48 to d2928d0 Compare October 6, 2021 15:17
@gopherbot
Copy link
Contributor

This PR (HEAD: d2928d0) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 10:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 10:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

As already Than McIntosh mentioned it's a common practise to boost
inlining to FORs, since the callsite could be "hotter". This patch
implements this functionality.

The implementation uses a stack of FORs to recognise calls which are
in a loop. The stack is maintained alongside inlnode function works
and contains information about ancenstor FORs relative to a current
node in inlnode.

There is "big" FOR which cost is >= inlineBigForCost(105). In such FORs
no boost is applied.

Updates golang#17566

The following results on GO1, while binary size not increased significantly
10454800 -> 10475120, which is less than 0.3%.

goos: linux
goarch: amd64
pkg: test/bench/go1
cpu: Intel(R) Xeon(R) Gold 6230N CPU @ 2.30GHz
name                     old time/op    new time/op    delta
BinaryTree17-8              2.15s ± 1%     2.17s ± 1%     ~     (p=0.065 n=6+6)
Fannkuch11-8                2.70s ± 0%     2.69s ± 0%   -0.25%  (p=0.010 n=6+4)
FmtFprintfEmpty-8          31.9ns ± 0%    31.4ns ± 0%   -1.61%  (p=0.008 n=5+5)
FmtFprintfString-8         57.0ns ± 0%    57.1ns ± 0%   +0.26%  (p=0.013 n=6+5)
FmtFprintfInt-8            65.2ns ± 0%    63.9ns ± 0%   -1.95%  (p=0.008 n=5+5)
FmtFprintfIntInt-8          103ns ± 0%     102ns ± 0%   -1.01%  (p=0.000 n=5+4)
FmtFprintfPrefixedInt-8     119ns ± 0%     118ns ± 0%   -0.50%  (p=0.008 n=5+5)
FmtFprintfFloat-8           169ns ± 0%     174ns ± 0%   +2.75%  (p=0.008 n=5+5)
FmtManyArgs-8               445ns ± 0%     447ns ± 0%   +0.46%  (p=0.002 n=6+6)
GobDecode-8                4.37ms ± 1%    4.40ms ± 0%   +0.62%  (p=0.009 n=6+6)
GobEncode-8                3.07ms ± 0%    3.04ms ± 0%   -0.78%  (p=0.004 n=5+6)
Gzip-8                      195ms ± 0%     195ms ± 0%     ~     (p=0.429 n=5+6)
Gunzip-8                   28.2ms ± 0%    28.2ms ± 0%     ~     (p=0.662 n=5+6)
HTTPClientServer-8         45.0µs ± 1%    45.4µs ± 1%     ~     (p=0.093 n=6+6)
JSONEncode-8               8.01ms ± 0%    8.03ms ± 0%   +0.31%  (p=0.008 n=5+5)
JSONDecode-8               35.3ms ± 1%    35.1ms ± 0%   -0.72%  (p=0.008 n=5+5)
Mandelbrot200-8            4.50ms ± 0%    4.49ms ± 1%     ~     (p=0.937 n=6+6)
GoParse-8                  3.03ms ± 1%    3.00ms ± 1%     ~     (p=0.180 n=6+6)
RegexpMatchEasy0_32-8      55.4ns ± 0%    53.2ns ± 3%   -3.92%  (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8       178ns ± 0%     175ns ± 1%   -1.57%  (p=0.004 n=5+6)
RegexpMatchEasy1_32-8      50.1ns ± 0%    48.3ns ± 5%     ~     (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8       271ns ± 1%     262ns ± 1%   -3.26%  (p=0.004 n=6+5)
RegexpMatchMedium_32-8      949ns ± 0%     886ns ± 7%     ~     (p=0.329 n=5+6)
RegexpMatchMedium_1K-8     27.1µs ± 7%    28.1µs ± 6%     ~     (p=0.394 n=6+6)
RegexpMatchHard_32-8       1.28µs ± 2%    1.29µs ± 0%     ~     (p=0.056 n=6+6)
RegexpMatchHard_1K-8       38.5µs ± 0%    38.4µs ± 0%   -0.25%  (p=0.009 n=6+5)
Revcomp-8                   397ms ± 0%     396ms ± 0%     ~     (p=0.429 n=6+5)
Template-8                 48.1ms ± 1%    48.1ms ± 0%     ~     (p=0.222 n=5+5)
TimeParse-8                 213ns ± 0%     213ns ± 0%     ~     (p=0.210 n=4+6)
TimeFormat-8                295ns ± 1%     259ns ± 0%  -12.22%  (p=0.002 n=6+6)
[Geo mean]                 40.5µs         40.1µs        -1.00%

name                     old speed      new speed      delta
GobDecode-8               176MB/s ± 1%   174MB/s ± 0%   -0.61%  (p=0.009 n=6+6)
GobEncode-8               250MB/s ± 0%   252MB/s ± 0%   +0.79%  (p=0.004 n=5+6)
Gzip-8                    100MB/s ± 0%   100MB/s ± 0%     ~     (p=0.351 n=5+6)
Gunzip-8                  687MB/s ± 0%   687MB/s ± 0%     ~     (p=0.662 n=5+6)
JSONEncode-8              242MB/s ± 0%   242MB/s ± 0%   -0.31%  (p=0.008 n=5+5)
JSONDecode-8             54.9MB/s ± 1%  55.3MB/s ± 0%   +0.71%  (p=0.008 n=5+5)
GoParse-8                19.1MB/s ± 1%  19.3MB/s ± 1%     ~     (p=0.143 n=6+6)
RegexpMatchEasy0_32-8     578MB/s ± 0%   601MB/s ± 3%   +4.10%  (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8    5.74GB/s ± 1%  5.85GB/s ± 1%   +1.90%  (p=0.002 n=6+6)
RegexpMatchEasy1_32-8     639MB/s ± 0%   663MB/s ± 4%     ~     (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8    3.78GB/s ± 1%  3.91GB/s ± 1%   +3.38%  (p=0.004 n=6+5)
RegexpMatchMedium_32-8   33.7MB/s ± 0%  36.2MB/s ± 7%     ~     (p=0.268 n=5+6)
RegexpMatchMedium_1K-8   37.9MB/s ± 6%  36.5MB/s ± 6%     ~     (p=0.411 n=6+6)
RegexpMatchHard_32-8     24.9MB/s ± 2%  24.8MB/s ± 0%     ~     (p=0.063 n=6+6)
RegexpMatchHard_1K-8     26.6MB/s ± 0%  26.7MB/s ± 0%   +0.25%  (p=0.009 n=6+5)
Revcomp-8                 640MB/s ± 0%   641MB/s ± 0%     ~     (p=0.429 n=6+5)
Template-8               40.4MB/s ± 1%  40.3MB/s ± 0%     ~     (p=0.222 n=5+5)
[Geo mean]                175MB/s        177MB/s        +1.05%
@nimelehin nimelehin force-pushed the boost_inline_into_fors branch from d2928d0 to d68cada Compare October 19, 2021 15:20
@gopherbot
Copy link
Contributor

This PR (HEAD: d68cada) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 11:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Keith Randall:

Patch Set 11:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 11:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Keith Randall:

Patch Set 11:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Nikita Melekhin:

Patch Set 11:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
After addressing review feedback, remember to publish your drafts!

Comment on lines +121 to +127
finalBudget := ctx.initialInlineBudget
if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
// Boosts only regular functions
finalBudget += inlineIntoForExtraBudget
}

return finalBudget
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the calculated value can be directly returned without being stored in an intermediary variable, it results in a more concise and straightforward function:

Suggested change
finalBudget := ctx.initialInlineBudget
if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
// Boosts only regular functions
finalBudget += inlineIntoForExtraBudget
}
return finalBudget
if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
// Only boost regular functions
return ctx.initialInlineBudget + inlineIntoForExtraBudget
}
return ctx.initialInlineBudget

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants