cmd/compile: performance regression due to mid-stack inlining changes #19386

@josharian

Measuring from ed70f37 (just before mid-stack inlining changes began) to 9fd359a (tip as of writing), I see:

name       old time/op      new time/op      delta
Template        223ms ± 3%       232ms ± 2%   +4.03%  (p=0.000 n=20+19)
Unicode         100ms ± 4%       102ms ± 4%   +1.98%  (p=0.001 n=19+20)
GoTypes         622ms ± 4%       625ms ± 4%     ~     (p=0.945 n=20+19)
SSA             4.40s ± 4%       4.54s ± 3%   +3.22%  (p=0.000 n=20+20)
Flate           129ms ± 4%       133ms ± 2%   +3.30%  (p=0.000 n=19+18)
GoParser        156ms ± 4%       160ms ± 2%   +2.62%  (p=0.000 n=18+19)
Reflect         394ms ± 3%       405ms ± 4%   +2.77%  (p=0.000 n=20+20)
Tar             117ms ± 3%       121ms ± 5%   +3.25%  (p=0.000 n=20+20)
XML             219ms ± 2%       224ms ± 2%   +2.23%  (p=0.000 n=19+20)

name       old user-ns/op   new user-ns/op   delta
Template   274user-ms ± 3%  298user-ms ± 3%   +9.06%  (p=0.000 n=20+20)
Unicode    142user-ms ± 5%  142user-ms ± 2%     ~     (p=0.916 n=19+18)
GoTypes    846user-ms ± 3%  845user-ms ± 4%     ~     (p=0.879 n=19+20)
SSA        6.55user-s ± 1%  6.93user-s ± 4%   +5.91%  (p=0.000 n=19+20)
Flate      162user-ms ± 4%  168user-ms ± 5%   +3.53%  (p=0.000 n=19+20)
GoParser   204user-ms ± 5%  206user-ms ± 6%     ~     (p=0.234 n=19+19)
Reflect    503user-ms ± 4%  514user-ms ± 5%   +2.11%  (p=0.004 n=20+20)
Tar        150user-ms ± 6%  154user-ms ± 5%   +2.89%  (p=0.002 n=20+20)
XML        282user-ms ± 2%  282user-ms ± 2%     ~     (p=0.828 n=18+20)

name       old alloc/op     new alloc/op     delta
Template       39.8MB ± 0%      42.6MB ± 0%   +6.96%  (p=0.000 n=19+20)
Unicode        31.0MB ± 0%      31.7MB ± 0%   +2.22%  (p=0.000 n=20+20)
GoTypes         116MB ± 0%       124MB ± 0%   +6.73%  (p=0.000 n=20+20)
SSA             889MB ± 0%       989MB ± 0%  +11.28%  (p=0.000 n=20+20)
Flate          25.9MB ± 0%      27.8MB ± 0%   +7.32%  (p=0.000 n=20+19)
GoParser       31.8MB ± 0%      34.3MB ± 0%   +8.02%  (p=0.000 n=19+20)
Reflect        80.9MB ± 0%      84.6MB ± 0%   +4.63%  (p=0.000 n=20+20)
Tar            27.3MB ± 0%      28.8MB ± 0%   +5.59%  (p=0.000 n=18+20)
XML            43.9MB ± 0%      47.2MB ± 0%   +7.59%  (p=0.000 n=20+20)

name       old allocs/op    new allocs/op    delta
Template         377k ± 0%        421k ± 1%  +11.67%  (p=0.000 n=19+20)
Unicode          324k ± 1%        338k ± 1%   +4.37%  (p=0.000 n=20+20)
GoTypes         1.14M ± 0%       1.28M ± 0%  +12.00%  (p=0.000 n=19+20)
SSA             7.88M ± 0%       9.14M ± 0%  +16.03%  (p=0.000 n=20+20)
Flate            239k ± 1%        267k ± 1%  +11.46%  (p=0.000 n=20+19)
GoParser         308k ± 1%        347k ± 1%  +12.66%  (p=0.000 n=19+20)
Reflect          994k ± 0%       1075k ± 0%   +8.11%  (p=0.000 n=20+19)
Tar              252k ± 0%        274k ± 1%   +8.88%  (p=0.000 n=18+20)
XML              396k ± 1%        449k ± 0%  +13.42%  (p=0.000 n=20+20)

These regressions negate a month's worth of the 1.9 toolspeed improvements, and then some. Measuring from the go1.8 tag to tip:

name       old time/op      new time/op      delta
Template        229ms ± 4%       232ms ± 2%  +1.52%  (p=0.000 n=97+19)
Unicode         100ms ± 6%       102ms ± 4%  +1.43%  (p=0.014 n=99+20)
Reflect         425ms ± 6%       405ms ± 4%  -4.71%  (p=0.000 n=99+20)
Tar             119ms ± 5%       121ms ± 5%  +1.84%  (p=0.000 n=98+20)
XML             224ms ± 4%       224ms ± 2%    ~     (p=0.424 n=98+20)

name       old user-ns/op   new user-ns/op   delta
Template   274user-ms ± 5%  298user-ms ± 3%  +8.85%  (p=0.000 n=98+20)
Unicode    138user-ms ± 7%  142user-ms ± 2%  +2.79%  (p=0.000 n=99+18)
Reflect    537user-ms ± 5%  514user-ms ± 5%  -4.28%  (p=0.000 n=99+20)
Tar        150user-ms ± 8%  154user-ms ± 5%  +2.32%  (p=0.001 n=99+20)
XML        285user-ms ± 6%  282user-ms ± 2%    ~     (p=0.161 n=97+20)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      42.6MB ± 0%  +4.49%  (p=0.000 n=97+20)
Unicode        30.6MB ± 0%      31.7MB ± 0%  +3.65%  (p=0.000 n=100+20)
Reflect        84.3MB ± 0%      84.6MB ± 0%  +0.39%  (p=0.000 n=98+20)
Tar            27.3MB ± 0%      28.8MB ± 0%  +5.57%  (p=0.000 n=98+20)
XML            44.7MB ± 0%      47.2MB ± 0%  +5.64%  (p=0.000 n=99+20)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        421k ± 1%  +4.98%  (p=0.000 n=95+20)
Unicode          331k ± 1%        338k ± 1%  +1.94%  (p=0.000 n=100+20)
Reflect         1.06M ± 0%       1.07M ± 0%  +1.78%  (p=0.000 n=99+19)
Tar              266k ± 1%        274k ± 1%  +3.03%  (p=0.000 n=98+20)
XML              417k ± 1%        449k ± 0%  +7.52%  (p=0.000 n=99+20)
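
For anyone reading the delta columns: as far as I understand benchstat, each delta is the relative change between the old and new means. Below is a minimal sketch of that arithmetic, using rounded means copied from the first set of tables (ed70f37 to 9fd359a). The helper name `percentDelta` is made up for illustration and is not part of benchstat. Because the tables show rounded means, recomputing from them can differ from the reported delta in the last digit.

```go
package main

import "fmt"

// percentDelta is a hypothetical helper (not a benchstat API). It returns
// the relative change from oldMean to newMean, expressed in percent.
func percentDelta(oldMean, newMean float64) float64 {
	return (newMean - oldMean) / oldMean * 100
}

func main() {
	// Rounded means copied from the ed70f37 -> 9fd359a tables.
	rows := []struct {
		name             string
		oldMean, newMean float64
	}{
		{"Template time/op (ms)", 223, 232},
		{"SSA alloc/op (MB)", 889, 989},
		{"SSA allocs/op (M)", 7.88, 9.14},
	}
	for _, r := range rows {
		// Print each recomputed delta with an explicit sign.
		fmt.Printf("%-24s %+.2f%%\n", r.name, percentDelta(r.oldMean, r.newMean))
	}
}
```

The allocation rows are the ones to watch: with ± 0% variance, those deltas are essentially exact, whereas the time/op deltas sit close to the noise level.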

It'd be nice to see whether some of these performance regressions can be undone. Let's revisit once mid-stack inlining is complete?

cc @davidlazar @mdempsky @randall77
