New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: Split1 slowdown #19241

Open
TocarIP opened this Issue Feb 22, 2017 · 7 comments

Comments

Projects
None yet
5 participants
@TocarIP
Copy link
Contributor

TocarIP commented Feb 22, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.8

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"

What did you do?

Run strings/BenchmarkSplit1 benchmark

What did you expect to see?

Same performance as 1.7

What did you see instead?

3x slowdown
Split1-4 12.0ms ± 0% 52.7ms ± 2% +337.56% (p=0.002 n=6+6)

I've also reproduced this on tip. 8946502 renamed Split1 to SplitEmptySeparator, but kept benchmark the same.
This performance regression was caused by b92d39e
Investigation shows that extra time is spent in mostly GC,runtime.procyield and runtime.writebarrierptr_prewrite1. This is due to missing stack check prologue in utf8.DecodeRuneInString (disabling this optimization for DecodeRuneInString removes regression), which is called in a loop inside a benchmark.
Because of missing stack size check goroutine cannot be preempted, preventing concurent GC and resulting in worse performance.
If #10958 is fixed, this particular regression should disappear.
I'm not sure whether having more preemption points is more important than having lower call overhead.

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Feb 22, 2017

@josharian

This comment has been minimized.

Copy link
Contributor

josharian commented Feb 23, 2017

I'd argue that we should fix this by inserting pre-emption points as necessary, so that we can keep the cost savings of function prologue elimination where possible (which is most places).

@bradfitz bradfitz added this to the Go1.9Maybe milestone Mar 21, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Mar 21, 2017

/cc @dr2chase

@dr2chase

This comment has been minimized.

Copy link
Contributor

dr2chase commented Mar 23, 2017

I think the answer is probably preemption points. We have the better GOXPERIMENT for 1.9, Keith's better spill location is in, and I'm working on unrolling right now to amortize the overhead (I assume loop unrolling will have benefits, but also costs from interference with other optimizations).

@agnivade

This comment has been minimized.

Copy link
Member

agnivade commented Oct 1, 2018

@TocarIP - I don't see any BenchmarkSplit1 function any more. What am I missing ?

@TocarIP

This comment has been minimized.

Copy link
Contributor

TocarIP commented Oct 1, 2018

@agnivade Split1 was renamed into SplitEmptySeparator in 8946502

@agnivade

This comment has been minimized.

Copy link
Member

agnivade commented Oct 1, 2018

Ah thanks.

With latest tip (devel +d1f7470c21 Sat Sep 29 04:26:02 2018 +0000 linux/amd64
), things look improved now

name time/op
SplitEmptySeparator-4 34.6ms ±10%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment