runtime: 40us pause in file-backed serve loops #15110

Open
dbentley opened this Issue Apr 4, 2016 · 3 comments

dbentley commented Apr 4, 2016

Summary: An idiomatic serve loop that reads from a file (as opposed to a socket) pauses for 40us before the request goroutine starts. We isolated this from a FUSE filesystem into a small benchmark.

The idiomatic serve loop (cf. Serve in src/net/http/server.go and Serve in bazil.org/fuse/fs/serve.go) processes each request on a new goroutine:

for {
  req := accept()
  go process(req)
}

We compared it against a no-concurrency serial server:

for { 
  req := accept()
  process(req)
}

We measured the overhead, defined as the time from the end of accept() to the beginning of process(). Median of 100k runs for each strategy:
Serial: 441ns
Goroutine: 38us
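
A minimal sketch of how such a measurement could be taken, for readers who cannot open the full benchmark. This is not the original fileping code: `measure`, `median`, and the in-memory stand-in for accept() are hypothetical, and because nothing here blocks in a read syscall it illustrates the method only and is not expected to reproduce the 38us figure.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// median returns the middle element of samples (the upper middle for an
// even count) without modifying the caller's slice.
func median(samples []time.Duration) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted[len(sorted)/2]
}

// measure records, for n requests, the delay between the end of accept()
// and the first statement of process().
func measure(n int, useGoroutine bool) []time.Duration {
	samples := make([]time.Duration, n)
	var wg sync.WaitGroup
	// process stands in for the real handler; it only records the delay.
	process := func(i int, accepted time.Time) {
		samples[i] = time.Since(accepted)
	}
	for i := 0; i < n; i++ {
		accepted := time.Now() // hypothetical: the moment accept() returned
		if useGoroutine {
			wg.Add(1)
			go func(i int, accepted time.Time) {
				defer wg.Done()
				process(i, accepted)
			}(i, accepted)
		} else {
			process(i, accepted)
		}
	}
	wg.Wait() // every goroutine has written its own sample by now
	return samples
}

func main() {
	const runs = 100000
	fmt.Println("serial:   ", median(measure(runs, false)))
	fmt.Println("goroutine:", median(measure(runs, true)))
}
```

To approximate the setup in this issue, the `time.Now()` line would sit directly after a blocking read from a file or FIFO.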

For context, we've found the rest of our logic can run in ~10us, so the goroutine scheduling overhead (38us) would be roughly 400% of the useful work.

We tried a channel:

go func() {
  for {
    process(<-ch)
  }
}()
for {
  req := accept()
  ch <- req
}

This is comparable to the goroutine strategy:
Serial: 441ns
Goroutine: 38us
Channel: 40us

Most servers read/accept from a socket, which the Go runtime implements with polling via the netpoller. FUSE reads from a file, which uses a blocking syscall. As I understand it, the Go runtime lets the thread make the syscall, and then another runtime thread notices it is blocked after ~20us and reschedules work. That delay is of the same order as the pause we measured, which makes me think they're related.

Our workaround: service one request (both accept and process) on one goroutine, and hand off accepting the next request to a new goroutine. Strategy "handoff" looks like:

func serve() {
  req := accept()
  go serve()
  process(req)
}

Serial: 441ns
Goroutine: 38us
Channel: 40us
Handoff: 671ns
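
For reference, a self-contained sketch of the handoff strategy that compiles and terminates. The fixed `reqSize`, the `process` callback, the WaitGroup bookkeeping, and the `runHandoff` helper are assumptions added for illustration; the benchmark linked below may be structured differently. The input here is an in-memory reader, so it shows the control flow rather than the timing.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
	"sync/atomic"
)

const reqSize = 16 // hypothetical fixed request size in bytes

// serve accepts one request, hands the accept loop to a fresh goroutine,
// and then processes the request on the goroutine that read it.
func serve(r io.Reader, process func([]byte), wg *sync.WaitGroup) {
	defer wg.Done()
	req := make([]byte, reqSize)
	if _, err := io.ReadFull(r, req); err != nil {
		return // EOF or read error: stop accepting
	}
	// Only one goroutine reads at a time: the next reader starts after
	// this read has completed.
	wg.Add(1)
	go serve(r, process, wg)
	process(req)
}

// runHandoff feeds n zero-filled requests through serve and returns how
// many were processed. It exists only to make the sketch testable.
func runHandoff(n int) int {
	input := bytes.NewReader(make([]byte, n*reqSize))
	var processed int64
	var wg sync.WaitGroup
	wg.Add(1)
	serve(input, func([]byte) { atomic.AddInt64(&processed, 1) }, &wg)
	wg.Wait()
	return int(atomic.LoadInt64(&processed))
}

func main() {
	fmt.Println("processed:", runHandoff(5))
}
```

Because process() may overlap with the next request's process(), handlers must be safe to run concurrently, just as with the plain `go process(req)` loop.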

I'm seeing this on Go version 1.6 darwin/amd64.

Is this a known issue? Does it affect other platforms? Is our rewriting of the serve loop a known workaround?

Our full benchmark: http://play.golang.org/p/wc6hPnN778

Build it into a binary named fileping. To run:
rm a b; mkfifo a; mkfifo b; ./bin/fileping -print -strategy handoff < b > a& ./bin/fileping -strategy serial < a > b& echo 000000000000000 > a

(Modify the -strategy argument in the first command to try different strategies)

Member

bradfitz commented Apr 4, 2016

bradfitz added this to the Unplanned milestone Apr 4, 2016

bradfitz added the Performance label Apr 4, 2016

Member

minux commented Apr 4, 2016

dbentley commented Apr 4, 2016

The benchmark that I showed has two servers/clients talking to each other. With strategy goroutine, the program takes 19s. With strategy handoff it takes 1.9s, so there is an improvement in the RTT.

Would it help if I made a benchmark that captured the latency from the client side?
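
A rough sketch of what a client-side measurement could look like. Everything in it is an assumption for illustration: `clientRTT`, `echoServer`, and `msgSize` are hypothetical names, the server is a plain serial echo loop, and `os.Pipe` stands in for the FIFOs used above, so it may not exercise the same blocking-syscall path.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sort"
	"time"
)

const msgSize = 16 // hypothetical fixed message size in bytes

// echoServer is a serial server: it reads one message and writes it back.
func echoServer(in io.Reader, out io.Writer) {
	buf := make([]byte, msgSize)
	for {
		if _, err := io.ReadFull(in, buf); err != nil {
			return // client closed the request pipe
		}
		if _, err := out.Write(buf); err != nil {
			return
		}
	}
}

// clientRTT sends n messages over a pair of pipes and returns the
// round-trip time of each one as observed by the client.
func clientRTT(n int) ([]time.Duration, error) {
	reqR, reqW, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	respR, respW, err := os.Pipe()
	if err != nil {
		reqR.Close()
		reqW.Close()
		return nil, err
	}
	go func() {
		defer reqR.Close()
		defer respW.Close()
		echoServer(reqR, respW)
	}()
	defer respR.Close()
	defer reqW.Close() // closing the request pipe lets the server exit

	msg := make([]byte, msgSize)
	samples := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		if _, err := reqW.Write(msg); err != nil {
			return nil, err
		}
		if _, err := io.ReadFull(respR, msg); err != nil {
			return nil, err
		}
		samples = append(samples, time.Since(start))
	}
	return samples, nil
}

func main() {
	samples, err := clientRTT(1000)
	if err != nil {
		fmt.Fprintln(os.Stderr, "ping failed:", err)
		os.Exit(1)
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })
	fmt.Println("median RTT:", samples[len(samples)/2])
}
```

Swapping the body of echoServer for the goroutine, channel, or handoff loop would give a per-strategy RTT as seen by the client.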

jasonbs10 added a commit to twitter-forks/fuse that referenced this issue Apr 12, 2016
