runtime: osyield is expensive on darwin? #19409

Open
@josharian

This is a naive question about an unexpected and striking benchmarking result.

At tip, on my amd64 OS X laptop, I get:

$ go test -bench=BenchmarkSplitSingleByteSeparator -run=NONE bytes
BenchmarkSplitSingleByteSeparator-8   	     500	   2722171 ns/op

If I apply CL 37795, the execution time increases by about 66%:

$ go test -bench=BenchmarkSplitSingleByteSeparator -run=NONE bytes
BenchmarkSplitSingleByteSeparator-8   	     300	   4518575 ns/op

Note that in that CL, all that really happens is that a single function is removed from the call stack. Index checks the length of its argument, and if it is 1, then Index calls IndexByte.
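For readers who want the shape of that dispatch, here is a minimal sketch. It assumes only what the paragraph above says; `indexSketch` is a hypothetical stand-in written for illustration, not the actual `bytes.Index` source and not the code changed by the CL.

```go
package main

import (
	"bytes"
	"fmt"
)

// indexSketch is a hypothetical illustration of the dispatch described
// above: when the separator is a single byte, hand off to the byte search.
// It is not the real bytes.Index implementation.
func indexSketch(s, sep []byte) int {
	if len(sep) == 1 {
		// Single-byte separator: one extra frame on the call stack
		// compared with calling bytes.IndexByte directly.
		return bytes.IndexByte(s, sep[0])
	}
	// Any other separator length falls back to the general search.
	return bytes.Index(s, sep)
}

func main() {
	s := []byte("a,b,c")
	fmt.Println(indexSketch(s, []byte(","))) // through the wrapper: prints 1
	fmt.Println(bytes.IndexByte(s, ','))     // direct call: prints 1
}
```

Both calls return the same index; the only difference is whether the extra wrapper frame is present, which is the change the CL is described as making.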

CPU profiling indicates that basically all of the extra time is spent in runtime.usleep (called from runtime.osyield) and runtime.mach_semaphore_signal (called from runtime.notewakeup).

I'm left wondering:

(1) Is there a cheaper way to do an osyield on darwin that doesn't cost a full microsecond? (Linux appears to make a sched_yield syscall instead of calling select.)

(2) Why does removing a function call from the stack create additional calls to osyield? Can this be avoided?

cc @ianlancetaylor @aclements
