runtime: stack copying performance regressions #28678

Open
josharian opened this Issue Nov 8, 2018 · 8 comments

@josharian
Contributor

josharian commented Nov 8, 2018

The runtime stack copying benchmarks have regressed significantly since Go 1.11.

A very quick benchmark run shows:

name                old time/op  new time/op   delta
StackCopyPtr-8      88.9ms ± 1%  117.7ms ± 1%  +32.34%  (p=0.000 n=5+17)
StackCopy-8         65.2ms ± 1%   95.8ms ± 1%  +46.99%  (p=0.000 n=5+19)
StackCopyNoCache-8   107ms ± 1%    135ms ± 2%  +26.23%  (p=0.000 n=5+19)
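
For readers checking the delta column: the percentage is the relative change in mean time per operation. Below is a minimal sketch in Go, assuming the rounded means printed in the table. `pctDelta` is a hypothetical helper, not part of benchstat. Because benchstat works from the unrounded means, the values computed here can differ slightly from the printed deltas.

```go
package main

import "fmt"

// pctDelta is a hypothetical helper (not a benchstat API): it returns the
// relative change from oldT to newT, expressed in percent.
func pctDelta(oldT, newT float64) float64 {
	return (newT - oldT) / oldT * 100
}

func main() {
	// Rounded means in ms/op, copied from the table above.
	rows := []struct {
		name       string
		oldT, newT float64
	}{
		{"StackCopyPtr", 88.9, 117.7},
		{"StackCopy", 65.2, 95.8},
		{"StackCopyNoCache", 107, 135},
	}
	for _, r := range rows {
		// Prints an approximate delta; expect small differences from the
		// table because the inputs here are already rounded.
		fmt.Printf("%-17s %+.1f%%\n", r.name, pctDelta(r.oldT, r.newT))
	}
}
```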

cc @aclements

@josharian josharian added this to the Go1.12 milestone Nov 8, 2018

@aclements

Member

aclements commented Nov 8, 2018

That's surprising. You don't happen to have a bisect handy, do you? (If not, I can run one.)

@josharian

Contributor

josharian commented Nov 8, 2018

I was planning to bisect tomorrow; laptop otherwise occupied today. Go for it.

@aclements

Member

aclements commented Nov 8, 2018

Bisect started

cd runtime && mkdir -p issue-28678 && pl benchmany -d issue-28678 -order metric -benchflags '-test.run NONE -test.bench StackCopy$' go1.11..master

@aclements

Member

aclements commented Nov 8, 2018

Unsurprisingly, this was cbafcc5 "cmd/compile,runtime: implement stack objects":

$ benchstat2 bench.log 433496615f cbafcc55e8
name          old time/op  new time/op  delta
StackCopy-12  86.6ms ± 0%  94.4ms ± 1%  +8.99%  (p=0.008 n=5+5)

/cc @randall77

@josharian

Contributor

josharian commented Nov 8, 2018

I'm seeing a much larger regression here. But on a hunch, I checked, and the second half of the regression starts at c803ffc, making it a dup of #28595. cc @mknyszek as FYI.

@randall77

Contributor

randall77 commented Nov 26, 2018

I'm not seeing the difference reported here. Tip seems faster than 1.11.2.

name              old time/op  new time/op  delta
StackCopyPtr      93.3ms ± 2%  90.8ms ± 2%  -2.77%  (p=0.003 n=8+8)
StackCopy         68.5ms ± 1%  67.4ms ± 2%  -1.54%  (p=0.002 n=8+8)
StackCopyNoCache   118ms ± 5%   120ms ± 2%    ~     (p=0.083 n=8+8)

It would be strange for these benchmarks to be affected by the extra stack object code. None of them has a stack object, so the additional cost is just another funcdata call and an empty for loop per frame.

GC stack scanning might be more expensive for StackCopyPtr as there are a lot of intra-stack pointers, but stack copying won't see that cost.
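
To make the distinction above concrete: benchmarks of this kind exercise stack growth by recursing deeply on a fresh goroutine, so the runtime has to copy the stack to a larger allocation several times. The following is a minimal sketch of that pattern, not the runtime's benchmark source. The names `count`, `countp`, and `growStack` and the recursion depth are illustrative assumptions. `count` keeps only scalars in each frame, while `countp` passes along a pointer to a variable that is expected to live on the goroutine's own stack, so each frame holds an intra-stack pointer that must be adjusted when the stack moves.

```go
package main

import (
	"fmt"
	"testing"
)

// count recurses n levels deep; each frame holds only scalar data.
func count(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + count(n-1)
}

// countp recurses until *n reaches zero. If n points at a variable on the
// calling goroutine's stack (assuming escape analysis keeps it there),
// every frame carries a pointer into that same stack.
func countp(n *int) {
	if *n == 0 {
		return
	}
	*n--
	countp(n)
}

// growStack runs f on a new goroutine, which starts with a small stack,
// and waits for it to finish.
func growStack(f func()) {
	done := make(chan struct{})
	go func() {
		f()
		close(done)
	}()
	<-done
}

func main() {
	const depth = 100000 // assumed depth, chosen only to force several stack growths

	scalar := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			growStack(func() { count(depth) })
		}
	})
	ptr := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			growStack(func() {
				n := depth
				countp(&n)
			})
		}
	})
	fmt.Println("scalar frames: ", scalar)
	fmt.Println("pointer frames:", ptr)
}
```

Running this under two Go versions gives a rough before/after comparison, though absolute numbers will depend on the machine and should not be compared directly with the tables in this thread.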

@josharian

Contributor

josharian commented Nov 28, 2018

Tip is faster than 1.11 because https://go-review.googlesource.com/c/go/+/110564 went in, which offset the other slowdown. Maybe that’s fine? It might still be worth a quick look at before/after stack objects to make sure there’s no low-hanging fruit.

@andybons

Member

andybons commented Nov 28, 2018

Given the offset, removing release-blocker. This needs more investigation.
