runtime: optionally (reliably) avoid netpoller #32009

Open
tamird opened this issue May 13, 2019 · 11 comments

@tamird
Contributor

commented May 13, 2019

The gVisor project implements a user-space kernel, and its implementation is performance-sensitive, which forces us to avoid the netpoller manually by staying away from certain APIs.

It would be nice to automate and enforce this avoidance, either by exposing some API that could be used to assert in a test that the netpoller has never been used, or by exposing a build tag that would guarantee that the netpoller is inactive. Concretely, as of this writing, it seems that we want to avoid ever incrementing netpollWaiters.

cc @iangudger @nlacasse @prattmic @amscanne

@andybons

Member

commented May 13, 2019

This seems like the beginnings of a proposal, but there are no concrete next steps.

What changes would you like to be made specifically?

Thanks

@randall77 @ianlancetaylor

@andybons andybons added this to the Unplanned milestone May 13, 2019

@randall77

Contributor

commented May 13, 2019

Do you want gVisor to never use the net poller full stop, or does this need to apply only to certain operations within gVisor?

The whole point of the netpoller is to be more efficient (particularly, in the # of OS threads needed) than just blocking an OS thread on each read/write. I'm curious in what circumstances netpoll needs to be avoided. Maybe we could solve that problem instead of this one.

@iangudger

Contributor

commented May 13, 2019

We never want gVisor to use netpoll, full stop. One approach we have discussed is adding to the runtime some way to detect whether netpoll was ever used, then running all of our tests and failing any that use it.
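The detect-and-fail approach described here can be sketched as follows. This is only an illustration under stated assumptions: `pollCount`, `simulatedPoll`, `namedTest`, and `offenders` are hypothetical names, and the counter is a user-level stand-in, since the runtime exposes no such hook to user code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pollCount is a hypothetical stand-in for a runtime-side usage counter.
// It is an assumption for illustration; no such public counter exists.
var pollCount int64

// simulatedPoll models a code path that would end up in the netpoller.
func simulatedPoll() { atomic.AddInt64(&pollCount, 1) }

// namedTest pairs a test name with its body.
type namedTest struct {
	name string
	run  func()
}

// offenders runs each test in order and returns the names of the tests
// during which the counter changed.
func offenders(tests []namedTest) []string {
	var bad []string
	for _, t := range tests {
		before := atomic.LoadInt64(&pollCount)
		t.run()
		if atomic.LoadInt64(&pollCount) != before {
			bad = append(bad, t.name)
		}
	}
	return bad
}

func main() {
	tests := []namedTest{
		{"pure_compute", func() {}},
		{"touches_poller", simulatedPoll},
	}
	fmt.Println(offenders(tests)) // prints "[touches_poller]"
}
```

Comparing the counter before and after each test attributes usage to a specific test, which a single process-wide check at exit would not do.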

golang.org/cl/78915 has more context on why netpoll is a problem.

@randall77

Contributor

commented May 13, 2019

Could you just check runtime.netpollInited? Or does that get set despite netpoll never being used?

> golang.org/cl/78915 has more context on why netpoll is a problem.

That CL has been merged for a long time now. I don't see any comments there about other issues besides the one that was fixed in the CL.

@iangudger

Contributor

commented May 13, 2019

> Could you just check runtime.netpollInited? Or does that get set despite netpoll never being used?

Correct. runtime.netpollInited gets initialized when files are created with os.File regardless of whether netpoll is actually used.

>> golang.org/cl/78915 has more context on why netpoll is a problem.
>
> That CL has been merged for a long time now. I don't see any comments there about other issues besides the one that was fixed in the CL.

The benefits documented in the CL are only if netpoll is not in use.

@randall77

Contributor

commented May 13, 2019

> Correct. runtime.netpollInited gets initialized when files are created with os.File regardless of whether netpoll is actually used.

Bummer.

> The benefits documented in the CL are only if netpoll is not in use.

So you want the 12% improvement to CPU usage that this CL provides? But you only get that 12% if you never use netpoll? Or are you interested in the 0.5% latency improvement?

How close is your app to those benchmarks? They are really corner case benchmarks, with lots of very quick trips into and out of the poller/channel ops/scheduler, with no work on top of that.

@iangudger

Contributor

commented May 13, 2019

It is mostly the CPU usage. gVisor is used in high-density environments. gVisor's CPU usage is much higher than Linux and we are currently looking into other options for reducing CPU usage as well.

Latency is important too though. We measure latency in nanoseconds and shaving even a few nanoseconds in a hot path can be a win for us.

@prattmic

Contributor

commented May 13, 2019

> How close is your app to those benchmarks? They are really corner case benchmarks, with lots of very quick trips into and out of the poller/channel ops/scheduler, with no work on top of that.

At the time golang.org/cl/78915 was written, it reduced total runtime of a TensorFlow model training benchmark running inside gVisor by 5% (and total CPU usage by 10%).

TensorFlow can be extremely futex heavy, as it coordinates very small units of work (size depends on the model) on a threadpool, where workers contend on resources. When an application calls futex inside gVisor and it actually blocks, that ultimately becomes a wait on a channel. Since new work is likely to be available very soon, this application becomes very sensitive to the overall latency and CPU usage of the Go scheduler when it wakes the goroutine back up.
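To make the "blocked futex becomes a wait on a channel" pattern concrete, here is a minimal sketch. It is not gVisor's implementation: `waiter`, `newWaiter`, and `roundTrip` are hypothetical names, and the measurement is only a rough illustration of the wake-to-run path that the scheduler cost sits on.

```go
package main

import (
	"fmt"
	"time"
)

// waiter is a hypothetical parking spot for one blocked goroutine,
// loosely modelling a futex wait implemented on top of a channel.
type waiter struct {
	ch chan struct{}
}

func newWaiter() *waiter { return &waiter{ch: make(chan struct{}, 1)} }

// wait blocks the calling goroutine until a wake token is available.
func (w *waiter) wait() { <-w.ch }

// wake makes one token available. The buffer holds a single token, so
// additional wakes while one is pending are dropped rather than queued.
func (w *waiter) wake() {
	select {
	case w.ch <- struct{}{}:
	default:
	}
}

// roundTrip returns the time between calling wake and the waiting
// goroutine running again.
func roundTrip() time.Duration {
	w := newWaiter()
	resumed := make(chan time.Time, 1)
	go func() {
		w.wait()
		resumed <- time.Now()
	}()
	time.Sleep(time.Millisecond) // give the goroutine time to block
	start := time.Now()
	w.wake()
	return (<-resumed).Sub(start)
}

func main() {
	fmt.Println("wake-to-run latency:", roundTrip())
}
```

When units of work are very small, this wake-to-run interval is paid on nearly every operation, which is why scheduler overhead shows up so clearly in the workload described above.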

google/gvisor#205 is a similar situation. In general, I think overall scheduler improvements (such as making netpoll cheaper) for these cases would be a possible alternative to an explicit API.

@nlacasse

commented Jun 12, 2019

Ideally we'd like to prevent the netpoller from ever running, but I'd be happy just with a way to check whether the netpoller has run.

Here are two proposals:

  1. Add a new "implementation" of netpoll that just panics, and put that behind a nonetpoll build tag.

  2. Add a boolean that is set to true whenever netpoll is called. Users that want to avoid netpoll could link against this boolean and check that it is false (likely inside tests).

Are there preferences/objections, or better alternatives to this?
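The two proposals can be contrasted in a small user-level sketch. Everything here is hypothetical: `poller`, `panicPoller`, and `trackingPoller` are stand-ins invented for illustration, not runtime types, and a real version of proposal 1 would be selected by a build tag rather than by choosing a type.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// poller is a hypothetical interface standing in for the runtime's
// internal netpoll entry point.
type poller interface {
	poll()
}

// panicPoller models proposal 1: an implementation that panics on any use.
type panicPoller struct{}

func (panicPoller) poll() { panic("netpoll called in a nonetpoll build") }

// trackingPoller models proposal 2: a flag that records whether poll ran.
type trackingPoller struct {
	used int32
}

func (p *trackingPoller) poll()          { atomic.StoreInt32(&p.used, 1) }
func (p *trackingPoller) everUsed() bool { return atomic.LoadInt32(&p.used) == 1 }

// panics reports whether calling f panics.
func panics(f func()) (did bool) {
	defer func() {
		if recover() != nil {
			did = true
		}
	}()
	f()
	return false
}

func main() {
	var p poller = panicPoller{}
	fmt.Println("proposal 1 panics on use:", panics(p.poll))

	t := &trackingPoller{}
	fmt.Println("proposal 2 used before poll:", t.everUsed())
	t.poll()
	fmt.Println("proposal 2 used after poll:", t.everUsed())
}
```

The trade-off the sketch shows: the panicking variant stops at the first use and points at the caller, while the flag variant lets the program run and leaves it to a test to check afterwards.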

@ianlancetaylor

Contributor

commented Jun 13, 2019

From my perspective this is so special purpose that it's hard to get excited about having to maintain some publicly visible API for it. I'm pretty skeptical that it would ever have more than one user.

We've talked about having some sort of runtime package stats access (#15490). Perhaps we could make sure that those stats include some data on use of the netpoller. Then your tests could use that.
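As a rough sketch of what a stats-based check could look like: Go releases after this thread include a `runtime/metrics` package that lists the metrics the runtime supports. Whether any of them covers netpoller usage is not established in this thread, so the example below only searches the list by name. The helper `metricsMatching` is a hypothetical name.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

// metricsMatching returns the names of all runtime metrics supported by
// the running Go version whose name contains substr.
func metricsMatching(substr string) []string {
	var names []string
	for _, d := range metrics.All() {
		if strings.Contains(d.Name, substr) {
			names = append(names, d.Name)
		}
	}
	return names
}

func main() {
	// The result depends on the Go version and may well be empty.
	fmt.Println("netpoll-related metrics:", metricsMatching("netpoll"))
	fmt.Println("total metrics available:", len(metricsMatching("")))
}
```

If a suitable metric existed, a test could read it before and after the code under test and fail when the value changes.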

@gopherbot

commented Jul 22, 2019

Change https://golang.org/cl/187137 mentions this issue: runtime: keep track of netpoll usage

8 participants