cmd/compile: boost inlining into FORs #48209
This PR (HEAD: 2fec790) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Force-pushed from 415ea1b to 7de33c3
This PR (HEAD: 7de33c3) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Message from Keith Randall: Patch Set 3: (17 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Force-pushed from 7de33c3 to d7f3135
This PR (HEAD: d7f3135) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Force-pushed from d7f3135 to c662bfa
Message from Nikita Melekhin: Patch Set 4: (18 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
This PR (HEAD: c662bfa) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Message from Nikita Melekhin: Patch Set 5: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Keith Randall: Patch Set 5: (6 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Nikita Melekhin: Patch Set 5: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Keith Randall: Patch Set 5: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Force-pushed from c662bfa to 7eb224f
This PR (HEAD: 7eb224f) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Force-pushed from 7eb224f to f398028
This PR (HEAD: f398028) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Force-pushed from f398028 to ba9fc48
This PR (HEAD: ba9fc48) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Message from Nikita Melekhin: Patch Set 9: (8 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Force-pushed from ba9fc48 to d2928d0
This PR (HEAD: d2928d0) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Message from Nikita Melekhin: Patch Set 10: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Nikita Melekhin: Patch Set 10: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Force-pushed from d2928d0 to d68cada
This PR (HEAD: d68cada) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/347732 to see it.
Message from Nikita Melekhin: Patch Set 11: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Keith Randall: Patch Set 11: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Nikita Melekhin: Patch Set 11: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Keith Randall: Patch Set 11: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
Message from Nikita Melekhin: Patch Set 11: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/347732.
	finalBudget := ctx.initialInlineBudget
	if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
		// Boosts only regular functions
		finalBudget += inlineIntoForExtraBudget
	}

	return finalBudget
When the calculated value can be directly returned without being stored in an intermediary variable, it results in a more concise and straightforward function:
-	finalBudget := ctx.initialInlineBudget
-	if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
-		// Boosts only regular functions
-		finalBudget += inlineIntoForExtraBudget
-	}
-	return finalBudget
+	if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
+		// Only boost regular functions
+		return ctx.initialInlineBudget + inlineIntoForExtraBudget
+	}
+	return ctx.initialInlineBudget
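A minimal, self-contained sketch for checking that the two forms above return the same budget. Only the identifier names come from the snippet; the `inlContext` type, its `inFor` field, the method names `budgetWithTemp` and `budgetEarlyReturn`, and both constant values are assumptions made so the example runs.

```go
package main

import "fmt"

// Hypothetical values, chosen only so the sketch runs.
const (
	inlineMaxBudget          = 80
	inlineIntoForExtraBudget = 20
)

// inlContext is a stand-in for the PR's context type (assumed shape).
type inlContext struct {
	initialInlineBudget int32
	inFor               bool // assumed: true when the call site sits in a boostable FOR
}

func (ctx *inlContext) canBoostInliningIntoFor() bool { return ctx.inFor }

// budgetWithTemp mirrors the original form with an intermediate variable.
func (ctx *inlContext) budgetWithTemp() int32 {
	finalBudget := ctx.initialInlineBudget
	if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
		finalBudget += inlineIntoForExtraBudget
	}
	return finalBudget
}

// budgetEarlyReturn mirrors the suggested form with an early return.
func (ctx *inlContext) budgetEarlyReturn() int32 {
	if ctx.canBoostInliningIntoFor() && ctx.initialInlineBudget == inlineMaxBudget {
		return ctx.initialInlineBudget + inlineIntoForExtraBudget
	}
	return ctx.initialInlineBudget
}

func main() {
	cases := []inlContext{
		{inlineMaxBudget, true},     // regular budget, inside a FOR
		{inlineMaxBudget, false},    // regular budget, outside a FOR
		{inlineMaxBudget * 2, true}, // non-regular budget, inside a FOR
	}
	for _, ctx := range cases {
		fmt.Println(ctx.budgetWithTemp(), ctx.budgetEarlyReturn())
	}
}
```

Each line of output shows the same value twice, so under these assumptions the suggestion is a pure restructuring with no behavior change.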
As Than McIntosh already mentioned, it is common practice to boost
inlining into FOR loops, since a call site inside a loop could be
"hotter". This patch implements that boost.

The implementation uses a stack of FORs to recognise calls that are
in a loop. The stack is maintained while inlnode walks the tree and
holds the ancestor FORs of the node inlnode is currently visiting.

A FOR whose cost is >= inlineBigForCost (105) is considered "big".
No boost is applied inside such FORs.

Updates #17566

Results on the go1 benchmarks follow. Binary size does not increase
significantly: 10454800 -> 10475120, which is less than 0.3%.
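The loop tracking described above can be sketched roughly as follows. This is not the patch's code: the `node`, `forInfo`, and `walker` types are hypothetical stand-ins for the compiler's IR, the cost of each FOR is assumed to be precomputed, and the sketch assumes only the innermost enclosing FOR decides whether a call gets the boost. The threshold 105 is the value quoted in the commit message.

```go
package main

import "fmt"

// node is a hypothetical mini-AST; the real compiler works on its own IR.
type node struct {
	kind string // "for", "call", or "block"
	cost int    // assumed precomputed cost, meaningful for "for" nodes
	name string // callee name, meaningful for "call" nodes
	body []*node
}

const inlineBigForCost = 105 // threshold quoted in the commit message

// forInfo is what the stack records for each enclosing FOR.
type forInfo struct{ big bool }

type walker struct {
	forStack []forInfo // ancestor FORs of the node being visited
	boosted  []string  // calls that would receive the extra budget
}

// canBoost reports whether the current node is inside a FOR that is not big.
// Assumption: only the innermost FOR is consulted.
func (w *walker) canBoost() bool {
	n := len(w.forStack)
	return n > 0 && !w.forStack[n-1].big
}

// walk visits the tree depth-first, pushing a FOR before its body
// and popping it afterwards.
func (w *walker) walk(n *node) {
	if n.kind == "call" && w.canBoost() {
		w.boosted = append(w.boosted, n.name)
	}
	if n.kind == "for" {
		w.forStack = append(w.forStack, forInfo{big: n.cost >= inlineBigForCost})
	}
	for _, c := range n.body {
		w.walk(c)
	}
	if n.kind == "for" {
		w.forStack = w.forStack[:len(w.forStack)-1]
	}
}

func main() {
	tree := &node{kind: "block", body: []*node{
		{kind: "call", name: "outside"},
		{kind: "for", cost: 40, body: []*node{{kind: "call", name: "inSmallFor"}}},
		{kind: "for", cost: 200, body: []*node{{kind: "call", name: "inBigFor"}}},
	}}
	w := &walker{}
	w.walk(tree)
	fmt.Println(w.boosted) // prints [inSmallFor]
}
```

Keeping the FORs on a stack means the walker never has to search upward from a call to find its enclosing loop; the top of the stack already holds that answer.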
goos: linux
goarch: amd64
pkg: test/bench/go1
cpu: Intel(R) Xeon(R) Gold 6230N CPU @ 2.30GHz

name                      old time/op     new time/op     delta
BinaryTree17-8            2.15s ± 1%      2.17s ± 1%      ~        (p=0.065 n=6+6)
Fannkuch11-8              2.70s ± 0%      2.69s ± 0%      -0.25%   (p=0.010 n=6+4)
FmtFprintfEmpty-8         31.9ns ± 0%     31.4ns ± 0%     -1.61%   (p=0.008 n=5+5)
FmtFprintfString-8        57.0ns ± 0%     57.1ns ± 0%     +0.26%   (p=0.013 n=6+5)
FmtFprintfInt-8           65.2ns ± 0%     63.9ns ± 0%     -1.95%   (p=0.008 n=5+5)
FmtFprintfIntInt-8        103ns ± 0%      102ns ± 0%      -1.01%   (p=0.000 n=5+4)
FmtFprintfPrefixedInt-8   119ns ± 0%      118ns ± 0%      -0.50%   (p=0.008 n=5+5)
FmtFprintfFloat-8         169ns ± 0%      174ns ± 0%      +2.75%   (p=0.008 n=5+5)
FmtManyArgs-8             445ns ± 0%      447ns ± 0%      +0.46%   (p=0.002 n=6+6)
GobDecode-8               4.37ms ± 1%     4.40ms ± 0%     +0.62%   (p=0.009 n=6+6)
GobEncode-8               3.07ms ± 0%     3.04ms ± 0%     -0.78%   (p=0.004 n=5+6)
Gzip-8                    195ms ± 0%      195ms ± 0%      ~        (p=0.429 n=5+6)
Gunzip-8                  28.2ms ± 0%     28.2ms ± 0%     ~        (p=0.662 n=5+6)
HTTPClientServer-8        45.0µs ± 1%     45.4µs ± 1%     ~        (p=0.093 n=6+6)
JSONEncode-8              8.01ms ± 0%     8.03ms ± 0%     +0.31%   (p=0.008 n=5+5)
JSONDecode-8              35.3ms ± 1%     35.1ms ± 0%     -0.72%   (p=0.008 n=5+5)
Mandelbrot200-8           4.50ms ± 0%     4.49ms ± 1%     ~        (p=0.937 n=6+6)
GoParse-8                 3.03ms ± 1%     3.00ms ± 1%     ~        (p=0.180 n=6+6)
RegexpMatchEasy0_32-8     55.4ns ± 0%     53.2ns ± 3%     -3.92%   (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8     178ns ± 0%      175ns ± 1%      -1.57%   (p=0.004 n=5+6)
RegexpMatchEasy1_32-8     50.1ns ± 0%     48.3ns ± 5%     ~        (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8     271ns ± 1%      262ns ± 1%      -3.26%   (p=0.004 n=6+5)
RegexpMatchMedium_32-8    949ns ± 0%      886ns ± 7%      ~        (p=0.329 n=5+6)
RegexpMatchMedium_1K-8    27.1µs ± 7%     28.1µs ± 6%     ~        (p=0.394 n=6+6)
RegexpMatchHard_32-8      1.28µs ± 2%     1.29µs ± 0%     ~        (p=0.056 n=6+6)
RegexpMatchHard_1K-8      38.5µs ± 0%     38.4µs ± 0%     -0.25%   (p=0.009 n=6+5)
Revcomp-8                 397ms ± 0%      396ms ± 0%      ~        (p=0.429 n=6+5)
Template-8                48.1ms ± 1%     48.1ms ± 0%     ~        (p=0.222 n=5+5)
TimeParse-8               213ns ± 0%      213ns ± 0%      ~        (p=0.210 n=4+6)
TimeFormat-8              295ns ± 1%      259ns ± 0%      -12.22%  (p=0.002 n=6+6)
[Geo mean]                40.5µs          40.1µs          -1.00%

name                      old speed       new speed       delta
GobDecode-8               176MB/s ± 1%    174MB/s ± 0%    -0.61%   (p=0.009 n=6+6)
GobEncode-8               250MB/s ± 0%    252MB/s ± 0%    +0.79%   (p=0.004 n=5+6)
Gzip-8                    100MB/s ± 0%    100MB/s ± 0%    ~        (p=0.351 n=5+6)
Gunzip-8                  687MB/s ± 0%    687MB/s ± 0%    ~        (p=0.662 n=5+6)
JSONEncode-8              242MB/s ± 0%    242MB/s ± 0%    -0.31%   (p=0.008 n=5+5)
JSONDecode-8              54.9MB/s ± 1%   55.3MB/s ± 0%   +0.71%   (p=0.008 n=5+5)
GoParse-8                 19.1MB/s ± 1%   19.3MB/s ± 1%   ~        (p=0.143 n=6+6)
RegexpMatchEasy0_32-8     578MB/s ± 0%    601MB/s ± 3%    +4.10%   (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8     5.74GB/s ± 1%   5.85GB/s ± 1%   +1.90%   (p=0.002 n=6+6)
RegexpMatchEasy1_32-8     639MB/s ± 0%    663MB/s ± 4%    ~        (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8     3.78GB/s ± 1%   3.91GB/s ± 1%   +3.38%   (p=0.004 n=6+5)
RegexpMatchMedium_32-8    33.7MB/s ± 0%   36.2MB/s ± 7%   ~        (p=0.268 n=5+6)
RegexpMatchMedium_1K-8    37.9MB/s ± 6%   36.5MB/s ± 6%   ~        (p=0.411 n=6+6)
RegexpMatchHard_32-8      24.9MB/s ± 2%   24.8MB/s ± 0%   ~        (p=0.063 n=6+6)
RegexpMatchHard_1K-8      26.6MB/s ± 0%   26.7MB/s ± 0%   +0.25%   (p=0.009 n=6+5)
Revcomp-8                 640MB/s ± 0%    641MB/s ± 0%    ~        (p=0.429 n=6+5)
Template-8                40.4MB/s ± 1%   40.3MB/s ± 0%   ~        (p=0.222 n=5+5)
[Geo mean]                175MB/s         177MB/s         +1.05%