I've also reproduced this on tip. 8946502 renamed Split1 to SplitEmptySeparator, but kept the benchmark itself the same.
This performance regression was caused by b92d39e.
Investigation shows that the extra time is spent mostly in GC, runtime.procyield and runtime.writebarrierptr_prewrite1. This is due to the missing stack check prologue in utf8.DecodeRuneInString, which is called in a loop inside the benchmark. Disabling this optimization for DecodeRuneInString removes the regression.
Because the stack size check is missing, the goroutine cannot be preempted, which prevents concurrent GC and results in worse performance.
If #10958 is fixed, this particular regression should disappear.
I'm not sure whether having more preemption points is more important than having lower call overhead.
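For illustration, here is a minimal sketch of the kind of loop involved. It is not the actual benchmark or library code; explodeRunes is a hypothetical helper that only mimics the pattern of calling utf8.DecodeRuneInString once per iteration. If that call carries no stack check prologue, as described above, the loop body would contain no point at which the goroutine can be preempted.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// explodeRunes is a hypothetical helper (not taken from the benchmark):
// it splits s into one substring per UTF-8 sequence.
func explodeRunes(s string) []string {
	// Preallocate so that append never has to grow the slice inside the loop.
	out := make([]string, 0, utf8.RuneCountInString(s))
	for len(s) > 0 {
		// The only call in the loop body. Invalid bytes decode with
		// size 1, so the loop always makes progress.
		_, size := utf8.DecodeRuneInString(s)
		out = append(out, s[:size])
		s = s[size:]
	}
	return out
}

func main() {
	fmt.Println(explodeRunes("héllo"))
}
```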
I think the answer is probably preemption points. We have the better GOEXPERIMENT for 1.9, Keith's better spill location is in, and I'm working on unrolling right now to amortize the overhead (I assume loop unrolling will have benefits, but also costs from interference with other optimizations).