runtime: automatically bump RLIMIT_NOFILE on Unix #46279

Closed
bradfitz opened this issue May 20, 2021 · 27 comments

Comments

@bradfitz
Contributor

@bradfitz bradfitz commented May 20, 2021

I just read http://0pointer.net/blog/file-descriptor-limits.html which in a nutshell says:

  • don't use select
  • systemd sets the RLIMIT_NOFILE soft limit to 1024 for compatibility reasons, to not break select users
  • systemd sets the RLIMIT_NOFILE hard limit to 512K, for programs that want more (without escalation), but by raising their soft limit past 1024, they're implicitly acknowledging that select won't work.

I realize that since Go doesn't use select, the Go runtime could automatically do this fd soft limit bumping on Linux.

We do have a Select wrapper at https://pkg.go.dev/golang.org/x/sys/unix#Select, though, so perhaps we could do the same thing we did for #42347 in 18510ae (https://go-review.googlesource.com/c/go/+/299671) and do the bumping conditionally based on whether the unix.Select func is in the binary. Or cgo too, I suppose.

I suspect many users are unaware of this 512K hard limit that's free to bump up to. I certainly was unaware. (I normally have to go in and manually tweak my systemd limits instead, usually in response to problems once I hit the limit...) I think fixing it automatically would help more users than it'd hurt. (I actually can't think how it'd hurt anybody?)

I don't think we need it as a backpressure mechanism. As the blog post mentions, memory limits are already that mechanism.
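
As a rough illustration of the bump being proposed, here is a minimal user-level sketch that raises the soft limit to the hard limit with syscall.Getrlimit and syscall.Setrlimit. The helper name raiseNoFile is hypothetical, and this is not necessarily how a runtime change would be written.

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseNoFile (hypothetical helper) raises the RLIMIT_NOFILE soft limit to
// the hard limit. It returns the limits seen before and after the attempt.
func raiseNoFile() (before, after syscall.Rlimit, err error) {
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &before); err != nil {
		return before, after, err
	}
	after = before
	if before.Cur == before.Max {
		return before, after, nil // already at the hard limit, nothing to raise
	}
	want := before
	want.Cur = want.Max // raising soft up to hard needs no extra privilege
	if err = syscall.Setrlimit(syscall.RLIMIT_NOFILE, &want); err != nil {
		return before, after, err
	}
	// Read the limit back rather than trusting the value we asked for.
	err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &after)
	return before, after, err
}

func main() {
	before, after, err := raiseNoFile()
	if err != nil {
		fmt.Println("could not raise RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("soft limit: %d -> %d (hard limit %d)\n", before.Cur, after.Cur, after.Max)
}
```

If the soft limit already equals the hard limit, the helper leaves it alone and reports the same value twice.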

/cc @ianlancetaylor @aclements @rsc @randall77

@gopherbot gopherbot added this to the Proposal milestone May 20, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals May 20, 2021
@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented May 20, 2021

The limitation on select is not a kernel limitation. It's a limitation on the implementation of fd_set in glibc. And we've inherited that limitation in x/sys/unix.FdSet, but perhaps we could fix that. If we did, then we could raise the soft limit to the hard limit unconditionally.

I note that on my Debian system the soft and hard limits are both 131072. On my CentOS 6 system the soft limit is 1024 and the hard limit is 4096. On my recent Fedora system the soft limit is 1024 and the hard limit is 524288.

@bradfitz
Contributor Author

@bradfitz bradfitz commented May 20, 2021

Yeah, I saw FdSet was struct { Bits [16]int64 }. Make it opaque, with a [16]int64 used by default and a lazily allocated spill-over bitmap when (*FdSet).Set(fd int) is called with a "big" fd? Doable, but I wonder if it's worth the effort. Does anybody actually use unix.Select?
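
To make the spill-over idea concrete, here is a sketch of such a set. The type spillFdSet and its methods are hypothetical and are not part of x/sys/unix. The inline array holds 16 × 64 = 1024 bits, which is where the 1024 ceiling comes from. The sketch covers only the bookkeeping; a real Select wrapper would presumably still have to hand the kernel one contiguous bitmap.

```go
package main

import "fmt"

// inlineBits is the number of descriptors the fixed array can represent:
// 16 words of 64 bits each, i.e. 1024.
const inlineBits = 16 * 64

// spillFdSet is a hypothetical fd set with a fixed inline bitmap and a
// lazily allocated overflow bitmap for descriptors >= inlineBits.
type spillFdSet struct {
	inline [16]int64
	spill  []uint64 // nil until a "big" fd is set
}

// Set marks fd as a member of the set. Negative fds are ignored.
func (s *spillFdSet) Set(fd int) {
	if fd < 0 {
		return
	}
	if fd < inlineBits {
		s.inline[fd/64] |= 1 << (uint(fd) % 64)
		return
	}
	i := (fd - inlineBits) / 64
	if i >= len(s.spill) {
		grown := make([]uint64, i+1)
		copy(grown, s.spill)
		s.spill = grown
	}
	s.spill[i] |= 1 << (uint(fd) % 64)
}

// IsSet reports whether fd is a member of the set.
func (s *spillFdSet) IsSet(fd int) bool {
	if fd < 0 {
		return false
	}
	if fd < inlineBits {
		return s.inline[fd/64]&(1<<(uint(fd)%64)) != 0
	}
	i := (fd - inlineBits) / 64
	return i < len(s.spill) && s.spill[i]&(1<<(uint(fd)%64)) != 0
}

func main() {
	var s spillFdSet
	s.Set(5)
	s.Set(2000) // beyond the 1024-bit inline array, triggers the spill allocation
	fmt.Println(s.IsSet(5), s.IsSet(2000), s.IsSet(2001))
}
```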

We'd still need a conditional mechanism regardless for cgo I assume, as we wouldn't know whether C code was using select as easily?

FWIW, on my various Debian (buster) & Ubuntu (focal LTS, hirsute) machines here, I see 1024 & 1048576.

@bradfitz
Contributor Author

@bradfitz bradfitz commented May 20, 2021

Does anybody actually use unix.Select?

GitHub code search (https://github.com/search?l=&p=2&q=unix.Select+language%3AGo&type=Code) says it's mostly wireguard-go's rwcancel package.

(cc @zx2c4 as FYI)

@zx2c4
Contributor

@zx2c4 zx2c4 commented May 20, 2021

I'm happy to get rid of that and replace it with poll. (Want to send a patch?) Done: https://git.zx2c4.com/wireguard-go/commit/?id=a9b377e9e10eb5194c0bdff32136c11b17253bfd

This proposal sounds like a good idea, with the caveat that we probably shouldn't do it in initialization for -buildmode=shared.

@rsc
Contributor

@rsc rsc commented Nov 10, 2021

What happens today, even in programs that do nothing but file I/O (no select etc), is that if you open too many files you get errors. Auto-bumping would let those programs run longer.

If Go did it at startup, it would be inherited by non-Go programs that we fork+exec. That is a potential incompatibility, but probably not a large one. Technically, I suppose we could undo it in the subprocess between fork and exec.
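
A program that cares about this today can approximate the "undo it in the subprocess" idea from user code by letting a shell lower its own soft limit and then exec the real program. This is only a sketch: commandWithSoftNoFile is a hypothetical helper, and it assumes an sh whose ulimit builtin accepts -S and -n.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// commandWithSoftNoFile (hypothetical helper) returns a command that runs
// name with args via sh, after the shell lowers its own soft RLIMIT_NOFILE
// to soft. The exec'd program inherits that lowered soft limit.
func commandWithSoftNoFile(soft uint64, name string, args ...string) *exec.Cmd {
	script := "ulimit -S -n " + strconv.FormatUint(soft, 10) + ` && exec "$@"`
	// With sh -c, the first word after the script becomes $0 and the rest
	// become "$@", so "$@" expands to name followed by args.
	argv := append([]string{"-c", script, "sh", name}, args...)
	return exec.Command("sh", argv...)
}

func main() {
	// Ask the child shell to print the soft limit it ended up with.
	out, err := commandWithSoftNoFile(1024, "sh", "-c", "ulimit -S -n").CombinedOutput()
	if err != nil {
		fmt.Println("child failed:", err)
		return
	}
	fmt.Printf("child soft limit: %s", out)
}
```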

@rsc rsc moved this from Incoming to Active in Proposals Nov 10, 2021
@rsc
Contributor

@rsc rsc commented Nov 10, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@nightlyone
Contributor

@nightlyone nightlyone commented Nov 11, 2021

To summarize the limiting cases where we should not raise the soft limit:

  • the select implementation from glibc (or another libc such as musl?) is used
  • cgo is used, because dynamic linking can load anything with dlopen and can also cause exec calls in places we don't know about
  • the select implementation from our own syscall or unix package is used, unless that one is changed as suggested above
  • NSS (user/group lookup and DNS lookup) from glibc is used
  • and we need to reset the limit if we call the exec family of syscalls

@rsc
Contributor

@rsc rsc commented Dec 1, 2021

One problem with restoring the limit in exec is we won't know if the limit was intentionally changed by the program in the interim. What about programs that explicitly raise the limit and then exec today? We would be dropping it back down.

It seems like if we are going to raise the limit, we should just do that, not try to put it back.

I just ran into this problem with gofmt on my Mac, where the limit defaults to 256 (and gofmt was editing many files in parallel). I'd love for Go to raise the limit there too.

How much does it really matter if we raise the limit for a subprocess?

People can always set the hard limit if they want Go not to try to bump the soft limit up.

@rsc
Contributor

@rsc rsc commented Dec 8, 2021

It's pretty awful that the limit is breaking completely reasonable Go programs like gofmt -w.
It seems very wrong for gofmt to have to put a bump in.
It seems like we should bump the limit at startup - Go doesn't use select.

It's very hard to see any programs benefiting from this limit in practice anymore.
I understand that systemd can't make such a global decision, but I think Go can.

@zx2c4
Contributor

@zx2c4 zx2c4 commented Dec 8, 2021

I think that seems quite reasonable.

We can even document this in unix.Select/syscall.Select and mark them as deprecated so that editors draw attention to them, and maybe add something to go vet too. It always seems possible to move to poll or similar.

@rsc
Contributor

@rsc rsc commented Dec 15, 2021

Not sure anyone is using syscall.Select for fd's anyway.
Every time I've used it in the past decade it has been to get a sub-second-resolution sleeping API (selecting on no fds).

@rsc
Contributor

@rsc rsc commented Dec 15, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals Dec 15, 2021
@AlekSi
Contributor

@AlekSi AlekSi commented Dec 15, 2021

Should the title be updated to mention Unix or something instead of Linux?
Personally, I constantly run into that limitation on macOS; would like to see that resolved.

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Dec 15, 2021

The considerations may be different on different Unix systems. On Linux the details are somewhat specific to systemd.

It may well be appropriate to do this on macOS also, but I don't know what the tradeoffs are there. Why does macOS have a default low limit?

@AlekSi
Contributor

@AlekSi AlekSi commented Dec 15, 2021

From what I was able to find, that default goes back to the very first OS X release and probably even back to BSD. The constant is there.

Of course, not doing that on macOS is not a deal-breaker but an annoyance.

@kolyshkin
Contributor

@kolyshkin kolyshkin commented Dec 16, 2021

The only issue I am aware of that can arise when RLIMIT_NOFILE is set to a very high value is that some binaries (which may be executed from a Go program and thus inherit the limit) do something like this (pseudocode):

for fd := 3; fd < getrlimit(RLIMIT_NOFILE); fd++ {
      close(fd) // or set CLOEXEC flag
}

For a specific example, the rpm package manager used to do that (fixed by rpm-software-management/rpm@5e6f05c), and so did some older versions of Python (but I'm not sure).

Most probably this should not be an issue, since Docker also does a similar thing (moby/moby#38814), and since everyone seems to be using containers now, let's hope that issues like this are fixed (better yet, maybe some programs have even started using close_range()).

Also, this is surely not a showstopper to accept the proposal -- just something to keep in mind.
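
One way to avoid walking every number up to the limit is to enumerate only the descriptors that are actually open. The sketch below does that by reading /proc/self/fd, which is an assumption on my part (it requires Linux with /proc mounted) and is not something spelled out in this thread. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
)

// parseFDNames converts directory entry names such as "0", "1", "17" into
// sorted descriptor numbers, skipping anything that is not a non-negative
// integer.
func parseFDNames(names []string) []int {
	fds := make([]int, 0, len(names))
	for _, n := range names {
		if fd, err := strconv.Atoi(n); err == nil && fd >= 0 {
			fds = append(fds, fd)
		}
	}
	sort.Ints(fds)
	return fds
}

// openFDs lists the descriptors currently open in this process.
// Assumes Linux with /proc mounted. The result includes the descriptor
// used to read the directory itself.
func openFDs() ([]int, error) {
	d, err := os.Open("/proc/self/fd")
	if err != nil {
		return nil, err
	}
	defer d.Close()
	names, err := d.Readdirnames(-1)
	if err != nil {
		return nil, err
	}
	return parseFDNames(names), nil
}

func main() {
	fds, err := openFDs()
	if err != nil {
		fmt.Println("cannot list open descriptors:", err)
		return
	}
	// A cleanup loop over fds touches len(fds) entries, however high
	// RLIMIT_NOFILE happens to be.
	fmt.Println("open descriptors:", fds)
}
```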

@rsc rsc changed the title proposal: runtime: automatically bump RLIMIT_NOFILE on Linux? proposal: runtime: automatically bump RLIMIT_NOFILE on Unix Jan 5, 2022
@rsc rsc moved this from Likely Accept to Accepted in Proposals Jan 5, 2022
@rsc
Contributor

@rsc rsc commented Jan 5, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime: automatically bump RLIMIT_NOFILE on Unix runtime: automatically bump RLIMIT_NOFILE on Unix Jan 5, 2022
@rsc rsc removed this from the Proposal milestone Jan 5, 2022
@rsc rsc added this to the Backlog milestone Jan 5, 2022
@rsc rsc self-assigned this Feb 22, 2022
@rsc rsc removed this from the Backlog milestone Feb 22, 2022
@rsc rsc added this to the Go1.19 milestone Feb 22, 2022
@gopherbot

@gopherbot gopherbot commented Mar 14, 2022

Change https://go.dev/cl/392415 mentions this issue: os: raise open file rlimit at startup

@gopherbot

@gopherbot gopherbot commented Mar 15, 2022

Change https://go.dev/cl/393016 mentions this issue: Revert "os: raise open file rlimit at startup"

@bcmills
Member

@bcmills bcmills commented Mar 15, 2022

The test for this change is failing on at least three builders — looks like we may need to plumb in OPEN_MAX for the case where getrlimit reports RLIM_INFINITY instead of the true max.

gopherbot pushed a commit that referenced this issue Mar 15, 2022
This reverts CL 392415.

Reason for revert: new test is failing on at least darwin-amd64-10_14, darwin-amd64-10_15, and openbsd-arm64-jsing.

Updates #46279.

Change-Id: I2890b72f8ee74f31000d65f7d47b5bb0ed5d6007
Reviewed-on: https://go-review.googlesource.com/c/go/+/393016
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
@rsc
Contributor

@rsc rsc commented Mar 15, 2022

I'm confused about needing OPEN_MAX:

On my Mac with macOS 12.2:

% ulimit -n
ulimit -n 256
% ulimit -nH
ulimit -n unlimited
% ulimit -n unlimited
% ulimit -n
ulimit -n unlimited
% 

I've always used 'ulimit -n unlimited' without trouble on Macs. I wonder if the struct definitions are wrong.

@bcmills
Member

@bcmills bcmills commented Mar 15, 2022

It looks like the macOS setrlimit behavior may have changed as of 11.0.
(The 10.14 and 10.15 builders broke, but the 11.0 and 12.0 builders did not.)

@bcmills
Member

@bcmills bcmills commented Mar 15, 2022

So I suppose one option might be to try the setrlimit call with the limit as given by getrlimit, and if that fails fall back to increasing to some reasonable-but-arbitrary constant (ideally equal to OPEN_MAX)..?

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Mar 15, 2022

Some more info at #40564.

@rsc
Contributor

@rsc rsc commented Mar 16, 2022

I put in a call to sysctl kern.maxfilesperproc. Hopefully that exists on the older macOS. And I skipped the OpenBSD failure entirely. (It is not a first-class port.)

@gopherbot

@gopherbot gopherbot commented Mar 16, 2022

Change https://go.dev/cl/393354 mentions this issue: os: raise open file rlimit at startup

@gopherbot

@gopherbot gopherbot commented Mar 20, 2022

Change https://go.dev/cl/394094 mentions this issue: os: skip TestOpenFileLimit on openbsd/mips64

gopherbot pushed a commit that referenced this issue Mar 22, 2022
For #46279
For #51713

Change-Id: I444f309999bf5576449a46a9808b23cf6537e7dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/394094
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
zonyitoo added a commit to shadowsocks/shadowsocks-rust that referenced this issue Mar 26, 2022
- golang/go#46279
- http://0pointer.net/blog/file-descriptor-limits.html

The I/O runtime, Tokio and Mio, in this Project won't use select(2), so
it is safe to bump the RLIMIT_NOFILE soft limit to hard limit
automatically.
komuw added a commit to komuw/ong that referenced this issue Jun 12, 2022
@rsc rsc removed their assignment Jun 23, 2022