cmd/compile: boost inlining into FORs
As Than McIntosh already mentioned, it is common practice to boost inlining into FORs, since call sites inside a loop are likely to be "hotter". This patch implements that.

The implementation uses a stack of FORs to recognise which calls are inside a loop. The stack is maintained alongside the inlnode walk and holds information about the FORs that are ancestors of the current node in inlnode. Each forContext has a liveCounter that records how many nodes this FOR is an ancestor of.

The current constants are the following. A "big" FOR is one that contains >= inlineBigForNodes (50) nodes or more than inlineBigForCallNodes (5) inlinable call nodes; no boost is applied inside such FORs. All other FORs are considered small, and call sites inside them get an extra budget of inlineExtraForBudget (20).

Updates golang#17566

Results on Go1 are below. The binary size did not increase significantly: 10441232 -> 10465920, which is less than 0.3%.

goos: linux
goarch: amd64
pkg: test/bench/go1
cpu: Intel(R) Xeon(R) Gold 6230N CPU @ 2.30GHz
name                      old time/op    new time/op    delta
BinaryTree17-8             2.15s ± 1%     2.15s ± 1%    ~      (p=0.589 n=6+6)
Fannkuch11-8               2.70s ± 0%     2.70s ± 0%   -0.08%  (p=0.002 n=6+6)
FmtFprintfEmpty-8         31.9ns ± 0%    31.9ns ± 3%    ~      (p=0.907 n=6+6)
FmtFprintfString-8        57.0ns ± 0%    57.6ns ± 0%   +1.19%  (p=0.004 n=5+6)
FmtFprintfInt-8           65.2ns ± 0%    64.1ns ± 0%   -1.57%  (p=0.002 n=6+6)
FmtFprintfIntInt-8         103ns ± 0%     103ns ± 0%    ~      (p=0.079 n=5+4)
FmtFprintfPrefixedInt-8    119ns ± 0%     118ns ± 0%   -0.37%  (p=0.008 n=5+5)
FmtFprintfFloat-8          169ns ± 0%     173ns ± 0%   +2.55%  (p=0.004 n=5+6)
FmtManyArgs-8              450ns ± 1%     450ns ± 0%    ~      (p=1.000 n=6+6)
GobDecode-8               4.38ms ± 1%    4.35ms ± 1%    ~      (p=0.132 n=6+6)
GobEncode-8               3.07ms ± 0%    3.06ms ± 0%   -0.38%  (p=0.009 n=6+6)
Gzip-8                     195ms ± 0%     195ms ± 0%    ~      (p=0.095 n=5+5)
Gunzip-8                  28.2ms ± 0%    28.4ms ± 0%   +0.57%  (p=0.004 n=6+6)
HTTPClientServer-8        45.1µs ± 1%    45.3µs ± 1%    ~      (p=0.082 n=5+6)
JSONEncode-8              7.98ms ± 1%    7.94ms ± 0%   -0.47%  (p=0.015 n=6+6)
JSONDecode-8              35.4ms ± 1%    35.1ms ± 0%   -1.04%  (p=0.002 n=6+6)
Mandelbrot200-8           4.50ms ± 0%    4.50ms ± 0%    ~      (p=0.699 n=6+6)
GoParse-8                 2.98ms ± 0%    2.99ms ± 1%    ~      (p=0.095 n=5+5)
RegexpMatchEasy0_32-8     55.5ns ± 1%    52.8ns ± 2%   -4.94%  (p=0.002 n=6+6)
RegexpMatchEasy0_1K-8      178ns ± 0%     162ns ± 1%   -9.18%  (p=0.002 n=6+6)
RegexpMatchEasy1_32-8     50.1ns ± 0%    48.4ns ± 2%   -3.34%  (p=0.002 n=6+6)
RegexpMatchEasy1_1K-8      272ns ± 2%     268ns ± 1%    ~      (p=0.065 n=6+6)
RegexpMatchMedium_32-8     907ns ± 5%     897ns ± 7%    ~      (p=0.660 n=6+6)
RegexpMatchMedium_1K-8    26.5µs ± 0%    26.6µs ± 0%   +0.41%  (p=0.008 n=5+5)
RegexpMatchHard_32-8      1.28µs ± 0%    1.29µs ± 1%    ~      (p=0.167 n=6+6)
RegexpMatchHard_1K-8      38.5µs ± 0%    38.6µs ± 0%    ~      (p=0.126 n=6+5)
Revcomp-8                  398ms ± 0%     395ms ± 0%   -0.64%  (p=0.010 n=6+4)
Template-8                48.4ms ± 0%    47.8ms ± 0%   -1.30%  (p=0.008 n=5+5)
TimeParse-8                213ns ± 0%     213ns ± 0%    ~      (p=0.108 n=6+6)
TimeFormat-8               294ns ± 0%     259ns ± 0%  -11.86%  (p=0.000 n=5+6)
[Geo mean]                40.4µs         40.0µs        -1.11%

name                      old speed      new speed      delta
GobDecode-8                175MB/s ± 1%   176MB/s ± 1%    ~      (p=0.132 n=6+6)
GobEncode-8                250MB/s ± 0%   251MB/s ± 0%   +0.38%  (p=0.009 n=6+6)
Gzip-8                    99.3MB/s ± 0%  99.4MB/s ± 0%    ~      (p=0.095 n=5+5)
Gunzip-8                   687MB/s ± 0%   683MB/s ± 0%   -0.57%  (p=0.004 n=6+6)
JSONEncode-8               243MB/s ± 1%   244MB/s ± 0%   +0.47%  (p=0.015 n=6+6)
JSONDecode-8              54.8MB/s ± 1%  55.3MB/s ± 0%   +1.04%  (p=0.002 n=6+6)
GoParse-8                 19.4MB/s ± 0%  19.4MB/s ± 1%    ~      (p=0.103 n=5+5)
RegexpMatchEasy0_32-8      576MB/s ± 1%   606MB/s ± 2%   +5.21%  (p=0.002 n=6+6)
RegexpMatchEasy0_1K-8     5.75GB/s ± 0%  6.33GB/s ± 1%  +10.10%  (p=0.002 n=6+6)
RegexpMatchEasy1_32-8      639MB/s ± 0%   661MB/s ± 2%   +3.47%  (p=0.002 n=6+6)
RegexpMatchEasy1_1K-8     3.76GB/s ± 2%  3.82GB/s ± 1%    ~      (p=0.065 n=6+6)
RegexpMatchMedium_32-8    35.4MB/s ± 5%  35.7MB/s ± 7%    ~      (p=0.615 n=6+6)
RegexpMatchMedium_1K-8    38.6MB/s ± 0%  38.4MB/s ± 0%   -0.40%  (p=0.008 n=5+5)
RegexpMatchHard_32-8      25.0MB/s ± 0%  24.8MB/s ± 1%    ~      (p=0.167 n=6+6)
RegexpMatchHard_1K-8      26.6MB/s ± 0%  26.6MB/s ± 0%    ~      (p=0.238 n=5+5)
Revcomp-8                  639MB/s ± 0%   643MB/s ± 0%   +0.65%  (p=0.010 n=6+4)
Template-8                40.1MB/s ± 0%  40.6MB/s ± 0%   +1.32%  (p=0.008 n=5+5)
[Geo mean]                 176MB/s        178MB/s        +1.38%
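The loop-context bookkeeping described at the top of this message can be illustrated outside the compiler. The Go program below is a minimal sketch, not the code in this patch. node, inliner, descendants, visit, call, exampleTree and baseBudget are hypothetical names; the tree is a toy stand-in for the compiler's IR; baseBudget (80) is an assumed regular budget; and it assumes that only the innermost enclosing FOR decides whether a call is boosted. liveCounter is modelled as the number of a FOR's descendants that the walk has not visited yet, and the FOR is popped when that count reaches zero.

```go
package main

import "fmt"

// Thresholds as quoted in the commit message.
const (
	inlineBigForNodes     = 50 // a FOR with >= this many nodes is "big"
	inlineBigForCallNodes = 5  // a FOR with more inlinable calls than this is "big"
	inlineExtraForBudget  = 20 // extra budget for call sites inside a small FOR
)

// baseBudget is a hypothetical stand-in for the regular inlining budget.
const baseBudget = 80

// node is a hypothetical toy AST node, not the compiler's IR.
type node struct {
	kind      string // "func", "for", "call" or "stmt"
	name      string
	inlinable bool
	children  []*node
}

// forContext: liveCounter is the number of this FOR's descendants
// that the walk has not visited yet (an assumed interpretation).
type forContext struct {
	liveCounter int
	small       bool
}

// descendants counts all nodes below n and the inlinable calls among them.
func descendants(n *node) (nodes, calls int) {
	for _, c := range n.children {
		nodes++
		if c.kind == "call" && c.inlinable {
			calls++
		}
		cn, cc := descendants(c)
		nodes += cn
		calls += cc
	}
	return nodes, calls
}

type inliner struct {
	stack   []forContext   // enclosing FORs, innermost last
	budgets map[string]int // budget chosen for each call site, by name
}

// visit walks n in pre-order and keeps the FOR stack in step with the walk.
func (in *inliner) visit(n *node) {
	// n is a descendant of every FOR currently on the stack.
	for i := range in.stack {
		in.stack[i].liveCounter--
	}
	if n.kind == "call" && n.inlinable {
		budget := baseBudget
		// Assumption: only the innermost enclosing FOR decides the boost.
		if k := len(in.stack); k > 0 && in.stack[k-1].small {
			budget += inlineExtraForBudget
		}
		in.budgets[n.name] = budget
	}
	if n.kind == "for" {
		nodes, calls := descendants(n)
		small := nodes < inlineBigForNodes && calls <= inlineBigForCallNodes
		in.stack = append(in.stack, forContext{liveCounter: nodes, small: small})
	}
	// Pop every FOR whose descendants have all been visited.
	for len(in.stack) > 0 && in.stack[len(in.stack)-1].liveCounter == 0 {
		in.stack = in.stack[:len(in.stack)-1]
	}
	for _, c := range n.children {
		in.visit(c)
	}
}

func call(name string) *node { return &node{kind: "call", name: name, inlinable: true} }

// exampleTree: a call, a small loop, a big loop with 6 inlinable calls, a call.
func exampleTree() *node {
	big := &node{kind: "for"}
	for i := 0; i < 6; i++ {
		big.children = append(big.children, call(fmt.Sprintf("b%d", i)))
	}
	small := &node{kind: "for", children: []*node{call("inSmall"), {kind: "stmt"}}}
	return &node{kind: "func", children: []*node{call("outside"), small, big, call("after")}}
}

func main() {
	in := &inliner{budgets: map[string]int{}}
	in.visit(exampleTree())
	for _, name := range []string{"outside", "inSmall", "b0", "after"} {
		fmt.Println(name, in.budgets[name])
	}
}
```

In this toy run only inSmall gets the extra 20 (budget 100). The calls before and after the loops keep the base budget of 80, and so do the calls in the second loop, because its 6 inlinable calls exceed inlineBigForCallNodes and make it "big".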