Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: ppc64 function pointer call performance regression #42709

Closed
pmur opened this issue Nov 18, 2020 · 3 comments
Closed

cmd/compile: ppc64 function pointer call performance regression #42709

pmur opened this issue Nov 18, 2020 · 3 comments

Comments

@pmur
Copy link
Contributor

@pmur pmur commented Nov 18, 2020

There seems to be a large regression in performance when function pointer calls where changed to use the link register instead of the count register. Based on when this change went in, this likely affects all versions back to go1.14. The impact can be mitigated by setting the appropriate branch hint for this usage e.g

The regex benchmarks are a pretty good example of how much this
hint can help generic ppc64le code on a power9 machine:

name                          old time/op    new time/op     delta
Find                             606ns ± 0%      447ns ± 0%  -26.27%
FindAllNoMatches                 309ns ± 0%      205ns ± 0%  -33.72%
FindString                       609ns ± 0%      451ns ± 0%  -26.04%
FindSubmatch                     734ns ± 0%      594ns ± 0%  -19.07%
FindStringSubmatch               706ns ± 0%      574ns ± 0%  -18.83%
Literal                          177ns ± 0%      136ns ± 0%  -22.89%
NotLiteral                      4.69µs ± 0%     2.34µs ± 0%  -50.14%
MatchClass                      6.05µs ± 0%     3.26µs ± 0%  -46.08%
MatchClass_InRange              5.93µs ± 0%     3.15µs ± 0%  -46.86%
ReplaceAll                      3.15µs ± 0%     2.18µs ± 0%  -30.77%
AnchoredLiteralShortNonMatch     156ns ± 0%      109ns ± 0%  -30.61%
AnchoredLiteralLongNonMatch      192ns ± 0%      136ns ± 0%  -29.34%
AnchoredShortMatch               268ns ± 0%      209ns ± 0%  -22.00%
AnchoredLongMatch                472ns ± 0%      357ns ± 0%  -24.30%
OnePassShortA                   1.16µs ± 0%     0.87µs ± 0%  -25.03%
NotOnePassShortA                1.34µs ± 0%     1.20µs ± 0%  -10.63%
OnePassShortB                    940ns ± 0%      655ns ± 0%  -30.29%
NotOnePassShortB                 873ns ± 0%      703ns ± 0%  -19.52%
OnePassLongPrefix                258ns ± 0%      155ns ± 0%  -40.13%
OnePassLongNotPrefix             943ns ± 0%      529ns ± 0%  -43.89%
MatchParallelShared              591ns ± 0%      436ns ± 0%  -26.31%
MatchParallelCopied              596ns ± 0%      435ns ± 0%  -27.10%
QuoteMetaAll                     186ns ± 0%      186ns ± 0%   -0.16%
QuoteMetaNone                   55.9ns ± 0%     55.9ns ± 0%   +0.02%
Compile/Onepass                 9.64µs ± 0%     9.26µs ± 0%   -3.97%
Compile/Medium                  21.7µs ± 0%     20.6µs ± 0%   -4.90%
Compile/Hard                     174µs ± 0%      174µs ± 0%   +0.07%
Match/Easy0/16                  7.35ns ± 0%     7.34ns ± 0%   -0.11%
Match/Easy0/32                   116ns ± 0%       97ns ± 0%  -16.27%
Match/Easy0/1K                   592ns ± 0%      562ns ± 0%   -5.04%
Match/Easy0/32K                 12.6µs ± 0%     12.5µs ± 0%   -0.64%
Match/Easy0/1M                   556µs ± 0%      556µs ± 0%   -0.00%
Match/Easy0/32M                 17.7ms ± 0%     17.7ms ± 0%   +0.05%
Match/Easy0i/16                 7.34ns ± 0%     7.35ns ± 0%   +0.10%
Match/Easy0i/32                 2.82µs ± 0%     1.64µs ± 0%  -41.71%
Match/Easy0i/1K                 83.2µs ± 0%     48.2µs ± 0%  -42.06%
Match/Easy0i/32K                2.13ms ± 0%     1.80ms ± 0%  -15.34%
Match/Easy0i/1M                 68.1ms ± 0%     57.6ms ± 0%  -15.31%
Match/Easy0i/32M                 2.18s ± 0%      1.80s ± 0%  -17.52%
Match/Easy1/16                  7.36ns ± 0%     7.34ns ± 0%   -0.24%
Match/Easy1/32                   118ns ± 0%       96ns ± 0%  -18.72%
Match/Easy1/1K                  2.46µs ± 0%     1.58µs ± 0%  -35.65%
Match/Easy1/32K                 80.2µs ± 0%     54.6µs ± 0%  -31.92%
Match/Easy1/1M                  2.75ms ± 0%     1.88ms ± 0%  -31.66%
Match/Easy1/32M                 87.5ms ± 0%     59.8ms ± 0%  -31.62%
Match/Medium/16                 7.34ns ± 0%     7.34ns ± 0%   +0.01%
Match/Medium/32                 2.60µs ± 0%     1.50µs ± 0%  -42.61%
Match/Medium/1K                 78.1µs ± 0%     43.7µs ± 0%  -44.06%
Match/Medium/32K                2.08ms ± 0%     1.52ms ± 0%  -27.11%
Match/Medium/1M                 66.5ms ± 0%     48.6ms ± 0%  -26.96%
Match/Medium/32M                 2.14s ± 0%      1.60s ± 0%  -25.18%
Match/Hard/16                   7.35ns ± 0%     7.35ns ± 0%   +0.03%
Match/Hard/32                   3.58µs ± 0%     2.44µs ± 0%  -31.82%
Match/Hard/1K                    108µs ± 0%       75µs ± 0%  -31.04%
Match/Hard/32K                  2.79ms ± 0%     2.25ms ± 0%  -19.30%
Match/Hard/1M                   89.4ms ± 0%     72.2ms ± 0%  -19.26%
Match/Hard/32M                   2.91s ± 0%      2.37s ± 0%  -18.60%
Match/Hard1/16                  11.1µs ± 0%      8.3µs ± 0%  -25.07%
Match/Hard1/32                  21.4µs ± 0%     16.1µs ± 0%  -24.85%
Match/Hard1/1K                   658µs ± 0%      498µs ± 0%  -24.27%
Match/Hard1/32K                 12.2ms ± 0%     11.7ms ± 0%   -4.60%
Match/Hard1/1M                   391ms ± 0%      374ms ± 0%   -4.40%
Match/Hard1/32M                  12.6s ± 0%      12.0s ± 0%   -4.68%
Match_onepass_regex/16           870ns ± 0%      611ns ± 0%  -29.79%
Match_onepass_regex/32          1.58µs ± 0%     1.08µs ± 0%  -31.48%
Match_onepass_regex/1K          45.7µs ± 0%     30.3µs ± 0%  -33.58%
Match_onepass_regex/32K         1.45ms ± 0%     0.97ms ± 0%  -33.20%
Match_onepass_regex/1M          46.2ms ± 0%     30.9ms ± 0%  -33.01%
Match_onepass_regex/32M          1.46s ± 0%      0.99s ± 0%  -32.02%

name                          old alloc/op   new alloc/op    delta
Find                             0.00B           0.00B         0.00%
FindAllNoMatches                 0.00B           0.00B         0.00%
FindString                       0.00B           0.00B         0.00%
FindSubmatch                     48.0B ± 0%      48.0B ± 0%    0.00%
FindStringSubmatch               32.0B ± 0%      32.0B ± 0%    0.00%
Compile/Onepass                 4.02kB ± 0%     4.02kB ± 0%    0.00%
Compile/Medium                  9.39kB ± 0%     9.39kB ± 0%    0.00%
Compile/Hard                    84.7kB ± 0%     84.7kB ± 0%    0.00%
Match_onepass_regex/16           0.00B           0.00B         0.00%
Match_onepass_regex/32           0.00B           0.00B         0.00%
Match_onepass_regex/1K           0.00B           0.00B         0.00%
Match_onepass_regex/32K          0.00B           0.00B         0.00%
Match_onepass_regex/1M           5.00B ± 0%      3.00B ± 0%  -40.00%
Match_onepass_regex/32M           136B ± 0%        68B ± 0%  -50.00%

name                          old allocs/op  new allocs/op   delta
Find                              0.00            0.00         0.00%
FindAllNoMatches                  0.00            0.00         0.00%
FindString                        0.00            0.00         0.00%
FindSubmatch                      1.00 ± 0%       1.00 ± 0%    0.00%
FindStringSubmatch                1.00 ± 0%       1.00 ± 0%    0.00%
Compile/Onepass                   52.0 ± 0%       52.0 ± 0%    0.00%
Compile/Medium                     112 ± 0%        112 ± 0%    0.00%
Compile/Hard                       424 ± 0%        424 ± 0%    0.00%
Match_onepass_regex/16            0.00            0.00         0.00%
Match_onepass_regex/32            0.00            0.00         0.00%
Match_onepass_regex/1K            0.00            0.00         0.00%
Match_onepass_regex/32K           0.00            0.00         0.00%
Match_onepass_regex/1M            0.00            0.00         0.00%
Match_onepass_regex/32M           2.00 ± 0%       1.00 ± 0%  -50.00%

name                          old speed      new speed       delta
QuoteMetaAll                  75.2MB/s ± 0%   75.3MB/s ± 0%   +0.15%
QuoteMetaNone                  465MB/s ± 0%    465MB/s ± 0%   -0.02%
Match/Easy0/16                2.18GB/s ± 0%   2.18GB/s ± 0%   +0.10%
Match/Easy0/32                 276MB/s ± 0%    330MB/s ± 0%  +19.46%
Match/Easy0/1K                1.73GB/s ± 0%   1.82GB/s ± 0%   +5.29%
Match/Easy0/32K               2.60GB/s ± 0%   2.62GB/s ± 0%   +0.64%
Match/Easy0/1M                1.89GB/s ± 0%   1.89GB/s ± 0%   +0.00%
Match/Easy0/32M               1.89GB/s ± 0%   1.89GB/s ± 0%   -0.05%
Match/Easy0i/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.10%
Match/Easy0i/32               11.4MB/s ± 0%   19.5MB/s ± 0%  +71.48%
Match/Easy0i/1K               12.3MB/s ± 0%   21.2MB/s ± 0%  +72.62%
Match/Easy0i/32K              15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/1M               15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/32M              15.4MB/s ± 0%   18.6MB/s ± 0%  +21.21%
Match/Easy1/16                2.17GB/s ± 0%   2.18GB/s ± 0%   +0.24%
Match/Easy1/32                 271MB/s ± 0%    333MB/s ± 0%  +23.07%
Match/Easy1/1K                 417MB/s ± 0%    648MB/s ± 0%  +55.38%
Match/Easy1/32K                409MB/s ± 0%    600MB/s ± 0%  +46.88%
Match/Easy1/1M                 381MB/s ± 0%    558MB/s ± 0%  +46.33%
Match/Easy1/32M                383MB/s ± 0%    561MB/s ± 0%  +46.25%
Match/Medium/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.01%
Match/Medium/32               12.3MB/s ± 0%   21.4MB/s ± 0%  +74.13%
Match/Medium/1K               13.1MB/s ± 0%   23.4MB/s ± 0%  +78.73%
Match/Medium/32K              15.7MB/s ± 0%   21.6MB/s ± 0%  +37.23%
Match/Medium/1M               15.8MB/s ± 0%   21.6MB/s ± 0%  +36.93%
Match/Medium/32M              15.7MB/s ± 0%   21.0MB/s ± 0%  +33.67%
Match/Hard/16                 2.18GB/s ± 0%   2.18GB/s ± 0%   -0.03%
Match/Hard/32                 8.93MB/s ± 0%  13.10MB/s ± 0%  +46.70%
Match/Hard/1K                 9.48MB/s ± 0%  13.74MB/s ± 0%  +44.94%
Match/Hard/32K                11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/1M                 11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/32M                11.6MB/s ± 0%   14.2MB/s ± 0%  +22.86%
Match/Hard1/16                1.44MB/s ± 0%   1.93MB/s ± 0%  +34.03%
Match/Hard1/32                1.49MB/s ± 0%   1.99MB/s ± 0%  +33.56%
Match/Hard1/1K                1.56MB/s ± 0%   2.05MB/s ± 0%  +31.41%
Match/Hard1/32K               2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/1M                2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/32M               2.66MB/s ± 0%   2.79MB/s ± 0%   +4.89%
Match_onepass_regex/16        18.4MB/s ± 0%   26.2MB/s ± 0%  +42.41%
Match_onepass_regex/32        20.2MB/s ± 0%   29.5MB/s ± 0%  +45.92%
Match_onepass_regex/1K        22.4MB/s ± 0%   33.8MB/s ± 0%  +50.54%
Match_onepass_regex/32K       22.6MB/s ± 0%   33.9MB/s ± 0%  +49.67%
Match_onepass_regex/1M        22.7MB/s ± 0%   33.9MB/s ± 0%  +49.27%
Match_onepass_regex/32M       23.0MB/s ± 0%   33.9MB/s ± 0%  +47.14%
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 18, 2020

Change https://golang.org/cl/271337 mentions this issue: cmd/compile,cmd/asm: insert hint when using blrl on ppc64

Loading

@laboger
Copy link
Contributor

@laboger laboger commented Nov 18, 2020

We realize this is during the freeze but thought we would see if it might possibly go in since it makes a nice improvement and is a very simple fix.

Loading

@ianlancetaylor ianlancetaylor changed the title ppc64 function pointer call performance regression cmd/compile: ppc64 function pointer call performance regression Nov 19, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Nov 19, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 19, 2020

This seems like a bug fix to me. I think it's fine during the freeze. Thanks.

Loading

@gopherbot gopherbot closed this in cb674b5 Nov 19, 2020
@golang golang locked and limited conversation to collaborators Nov 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants