-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strings: Split1 slowdown #19241
Comments
/cc @josharian @aclements |
I'd argue that we should fix this by inserting pre-emption points as necessary, so that we can keep the cost savings of function prologue elimination where possible (which is most places). |
/cc @dr2chase |
I think the answer is probably preemption points. We have the better GOXPERIMENT for 1.9, Keith's better spill location is in, and I'm working on unrolling right now to amortize the overhead (I assume loop unrolling will have benefits, but also costs from interference with other optimizations). |
@TocarIP - I don't see any |
Ah thanks. With latest tip (devel +d1f7470c21 Sat Sep 29 04:26:02 2018 +0000 linux/amd64 name time/op |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?1.8
What operating system and processor architecture are you using (
go env
)?GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
What did you do?
Run strings/BenchmarkSplit1 benchmark
What did you expect to see?
Same performance as 1.7
What did you see instead?
3x slowdown
Split1-4 12.0ms ± 0% 52.7ms ± 2% +337.56% (p=0.002 n=6+6)
I've also reproduced this on tip. 8946502 renamed Split1 to SplitEmptySeparator, but kept benchmark the same.
This performance regression was caused by b92d39e
Investigation shows that extra time is spent in mostly GC,runtime.procyield and runtime.writebarrierptr_prewrite1. This is due to missing stack check prologue in utf8.DecodeRuneInString (disabling this optimization for DecodeRuneInString removes regression), which is called in a loop inside a benchmark.
Because of missing stack size check goroutine cannot be preempted, preventing concurent GC and resulting in worse performance.
If #10958 is fixed, this particular regression should disappear.
I'm not sure whether having more preemption points is more important than having lower call overhead.
The text was updated successfully, but these errors were encountered: