cmd/compile: NilCheckDeep performance regression #25389

Closed
TocarIP opened this issue May 14, 2018 · 5 comments

@TocarIP (Contributor) commented May 14, 2018

After 00c8e14, the NilCheckDeep benchmark shows a significant regression:

name                 old time/op    new time/op    delta
NilCheckDeep1-8         358ns ± 2%     374ns ± 1%   +4.64%  (p=0.000 n=10+10)
NilCheckDeep10-8        641ns ± 3%     690ns ± 2%   +7.69%  (p=0.000 n=10+10)
NilCheckDeep100-8      3.52µs ± 4%    4.03µs ± 3%  +14.44%  (p=0.000 n=10+10)
NilCheckDeep1000-8     39.1µs ± 2%    45.4µs ± 1%  +15.96%  (p=0.000 n=10+9)
NilCheckDeep10000-8     412µs ± 2%     484µs ± 2%  +17.53%  (p=0.000 n=10+10)

name                 old speed      new speed      delta
NilCheckDeep1-8      2.79MB/s ± 2%  2.67MB/s ± 1%   -4.48%  (p=0.000 n=10+10)
NilCheckDeep10-8     15.6MB/s ± 3%  14.5MB/s ± 2%   -7.14%  (p=0.000 n=10+10)
NilCheckDeep100-8    28.4MB/s ± 4%  24.8MB/s ± 3%  -12.64%  (p=0.000 n=10+10)
NilCheckDeep1000-8   25.5MB/s ± 2%  22.0MB/s ± 1%  -13.77%  (p=0.000 n=10+9)
NilCheckDeep10000-8  24.3MB/s ± 2%  20.7MB/s ± 2%  -14.92%  (p=0.000 n=10+10)

name                 old alloc/op   new alloc/op   delta
NilCheckDeep1-8          101B ± 0%       93B ± 0%   -7.92%  (p=0.000 n=10+10)
NilCheckDeep10-8         192B ± 0%      184B ± 0%   -4.17%  (p=0.000 n=10+10)
NilCheckDeep100-8      1.17kB ± 0%    1.16kB ± 0%   -0.68%  (p=0.000 n=10+10)
NilCheckDeep1000-8     10.3kB ± 0%    10.3kB ± 0%   -0.08%  (p=0.000 n=10+10)
NilCheckDeep10000-8     102kB ± 0%     102kB ± 0%   -0.01%  (p=0.000 n=10+10)

name                 old allocs/op  new allocs/op  delta
NilCheckDeep1-8          6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep10-8         6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep100-8        6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep1000-8       6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep10000-8      6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)

This is caused by extra time spent in zeroing 256 bytes for:

stores := make([]*Value, 0, 64)

Reducing 64 to 8 removes the degradation.
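
As a rough standalone illustration of the effect (this is not the compiler's code; value, workCap64, and workCap8 are made-up names), the sketch below contrasts a worklist pre-sized to 64 against one pre-sized to 8. With a constant cap and no escape, the backing array can live on the stack but still has to be zeroed on every call. Save as e.g. zeroing_test.go and run with go test -bench=. -benchmem; the cap-64 variant should show the extra per-call zeroing cost when the actual worklist stays small.

package zeroing

import "testing"

type value struct{ x int }

// workCap64 mimics the nilcheckelim pattern: a worklist whose backing array
// has a constant capacity of 64, so it can stay on the stack but must be
// zeroed on every call.
//go:noinline
func workCap64(in []*value) int {
	work := make([]*value, 0, 64)
	for _, v := range in {
		work = append(work, v)
	}
	return len(work)
}

// workCap8 zeroes far less per call, but any call needing more than 8
// entries has to grow the worklist (usually onto the heap).
//go:noinline
func workCap8(in []*value) int {
	work := make([]*value, 0, 8)
	for _, v := range in {
		work = append(work, v)
	}
	return len(work)
}

func BenchmarkCap64(b *testing.B) {
	in := []*value{{1}, {2}}
	for i := 0; i < b.N; i++ {
		workCap64(in)
	}
}

func BenchmarkCap8(b *testing.B) {
	in := []*value{{1}, {2}}
	for i := 0; i < b.N; i++ {
		workCap8(in)
	}
}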

@TocarIP (Contributor, Author) commented May 14, 2018

Running compilebench with the reduced size shows a compile-time degradation, so NilCheckDeep looks like a bad benchmark. Should we just remove it altogether?
cc @randall77 @ALTree

@ALTree (Member) commented May 14, 2018

Running compilebench with the reduced size shows a compile-time degradation

This doesn't surprise me. For those stack-allocation changes I always tried to choose the smallest cap constant that would cover most of the calls. Reducing the cap negates the effect and can actually make things worse: if we stack-allocate a small array (cap = 8, say) and most calls go over 8, we end up doing even more work, because the worklist then often has to be moved to the heap.
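
A tiny illustration of that spill (hypothetical names, not nilcheckelim itself): once the worklist outgrows the small constant cap, append has to allocate a larger backing array and copy the old contents, which is exactly the extra work described above.

package main

import "fmt"

type value struct{ id int }

func main() {
	// Hypothetical worklist with the small cap discussed above.
	work := make([]*value, 0, 8)
	for i := 0; i < 12; i++ {
		work = append(work, &value{id: i})
		// Once len passes 8, cap jumps (typically to 16): append has moved
		// the worklist to a freshly allocated, larger backing array.
		fmt.Printf("len=%d cap=%d\n", len(work), cap(work))
	}
}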

NilCheckDeep looks like a bad benchmark.

Even if it doesn't reflect real-world compiler performance, it could still be useful as a microbenchmark for that specific part of the compiler. The comment says:

benchmarkNilCheckDeep is a stress test of nilcheckelim.
...
Run with multiple depths to observe big-O behavior.

which makes sense. My vote is to keep it, and also keep my stack-allocate change (since it makes the compiler faster); and ignore the benchmarkNilCheckDeep regression, since it's a stress-test by its own admission.
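
For reference, "run with multiple depths" maps onto the usual sub-benchmark pattern, and the MB/s column in the numbers above is consistent with the benchmark calling b.SetBytes with the depth, so "speed" there is effectively depth processed per second. Below is a simplified, hedged sketch of that pattern only; the real benchmark constructs SSA functions inside the compiler, which the stand-ins here do not attempt.

package sketch

import (
	"fmt"
	"testing"
)

// buildChain and eliminate are stand-ins for constructing the SSA function
// and running nilcheckelim; they only model "work proportional to depth".
func buildChain(depth int) []int {
	chain := make([]int, depth)
	for i := range chain {
		chain[i] = i
	}
	return chain
}

func eliminate(chain []int) int {
	sum := 0
	for _, v := range chain {
		sum += v
	}
	return sum
}

func BenchmarkNilCheckDeepSketch(b *testing.B) {
	for _, depth := range []int{1, 10, 100, 1000, 10000} {
		b.Run(fmt.Sprint(depth), func(b *testing.B) {
			chain := buildChain(depth)
			// Reporting depth as "bytes" is what produces an MB/s column
			// like the one above, i.e. depth processed per second.
			b.SetBytes(int64(depth))
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				eliminate(chain)
			}
		})
	}
}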

ALTree added this to the Go1.11 milestone May 14, 2018

@OneOfOne (Contributor) commented May 15, 2018

@ALTree I don't have time to test right now, but wouldn't something like this help a bit? Or would it still trigger the zeroing?

var stash [64]*Value
store := stash[:0]

@ALTree (Member) commented May 15, 2018

@OneOfOne Just checked, they generate basically the exact same code. Both go on the stack and then there's a DUFFZERO $152 call.
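
A quick way to reproduce that check (a hedged sketch; Value here and the file and function names are made up): put both spellings in a file and inspect the generated code with go tool compile -S zerocheck.go, searching for DUFFZERO. On amd64, with the compiler versions discussed here, both functions should show the zeroing call, matching the observation above.

// zerocheck.go
package zerocheck

type Value struct{ id int }

// withMake mirrors the nilcheckelim pattern: a constant cap keeps the
// backing array eligible for the stack, and it is zeroed on entry.
func withMake(vals []*Value) int {
	stores := make([]*Value, 0, 64)
	for _, v := range vals {
		stores = append(stores, v)
	}
	return len(stores)
}

// withArray is the spelling suggested above; per the comment it compiles
// to essentially the same code, including the zeroing call.
func withArray(vals []*Value) int {
	var stash [64]*Value
	stores := stash[:0]
	for _, v := range vals {
		stores = append(stores, v)
	}
	return len(stores)
}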

@TocarIP (Contributor, Author) commented May 15, 2018

My vote is to keep it, and also keep my stack-allocate change (since it makes the compiler faster); and ignore the benchmarkNilCheckDeep regression, since it's a stress-test by its own admission.

Agreed, closing.

TocarIP closed this May 15, 2018

golang locked and limited conversation to collaborators May 15, 2019
