use infinity for LimitNOFILE #1799
Conversation
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/lgtm
what can I say ;)
Infinity in systemd files is not great in general, but it looks like containerd is OK with it, so..
In both PRs in KIND we are matching the upstream Containerd unit file at that point in time. The Containerd PR is linked here.
Oh whoops, thanks for pointing out the PR. I'd like to make a bit of an argument against this change :)

The reasoning above ignores the fact that this harms reproducibility in kind clusters. For example: things like NFS and MySQL get unhappy when the open-file limit is effectively unbounded.

Is there any way we could revisit this? Unless the perf issue is actually significant, it seems better to me to keep the explicit limit.
Sorry, I've been on vacation and not fully available.
This is the reason both changes were ultimately accepted, matching upstream.
Resource limits are something that are just not going to be reproducible in KIND and it's not a suitable tool for testing them. When you share a host kernel there are any number of leaky limits, for example inotify limits are not namespaced.
I'm not fully convinced. If we had reason to expect a standard limit on production clusters, maybe, but as it stands applications need to handle varying limits better if they're meant to be portable to different Kubernetes clusters. kind's goal is to test Kubernetes itself, and to help test applications on Kubernetes: standard, portable Kubernetes.
Sorry for the delay, I had to think about this a bit and explore alternatives. I'll make a halfhearted argument but really I'd settle for a workaround, perhaps a way to make this configurable?
Right, Kind's main page says it is primarily for testing Kubernetes, but also for local development or CI. Speaking as one of the latter types of users, it seems to me that Kind's value is in bundling up the huge complexity of a Kubernetes deployment and setting it up for me in a reproducible way. From my perspective, there's no essential difference between resource limits and any other kind of complexity that Kind encapsulates.

Nor is the fact that other leaky limits exist a great argument against improving reproducibility by pinning down some limits. (Maybe those other limits should be pinned down too!) FWIW, I think setting a limit like 1 million for LimitNOFILE is sane.

However, I'd be totally happy if there were a way for a Kind user to tweak this. I tried passing […]
Er, right, but the intention is for kind to represent, to the extent possible, 100% standard Kubernetes, so there's some confidence that if an app runs locally it will run in production. It's also very important that when we test Kubernetes, we're testing as close to standard / default as possible so we're not giving Kubernetes CI a false sense of working due to being in a non-standard configuration.

If you're running on a bog standard Kubernetes cluster using containerd as the runtime, odds are very good the containerd systemd unit matches upstream, which does not set an arbitrary fixed limit.
For the host, perhaps, but not if you install the containerd package? Unless Ubuntu is providing their own packaging and overriding this; last I checked theirs matched upstream, but it's been a while.

Regarding cloud Kubernetes providers, I checked GKE, since I work there and that work covers creating those nodes, and it is infinity as expected:

bentheelder@gke-cluster-1-default-pool-eb5c9ce6-7052 ~ $ cat /usr/lib/systemd/system/multi-user.target.wants/containerd.service | grep -i nofile
LimitNOFILE=infinity
They would be, if it was feasible, because nodes should not all be pretending to have the host's full CPU and memory etc., but it turns out nested docker containers are not particularly compatible with doing this, so kind and similarly constructed nested-container alternatives are currently a bad way to test things around resource limits.
To implement what you're describing, you'd need to set ulimit on the node, or within the application container. You should be able to do that, for example you can customize the node image you're using, or you can exec to the node and change it.
I can't say I've ever seen an application do this, do you have any pointers? I would expect reasonable applications to track their file handles, not brute-force closing them...

TLDR: I would still prefer to make changes to this unit file in containerd upstream, and kind will match them. The same problem should be reached by this application on a non-kind cluster shipping upstream containerd currently.
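(For anyone who wants to try the exec-to-the-node route mentioned above, here is a rough sketch. The node container name `kind-control-plane` and the 1048576 value are illustrative assumptions, not anything this PR prescribes.)

```sh
# Sketch: pin an explicit LimitNOFILE for containerd on a running kind node
# via a systemd drop-in, then restart containerd so it takes effect.
docker exec kind-control-plane mkdir -p /etc/systemd/system/containerd.service.d
docker exec kind-control-plane sh -c 'cat > /etc/systemd/system/containerd.service.d/10-nofile.conf <<EOF
[Service]
LimitNOFILE=1048576
EOF'
docker exec kind-control-plane systemctl daemon-reload
docker exec kind-control-plane systemctl restart containerd
# Confirm the override was picked up:
docker exec kind-control-plane systemctl show containerd -p LimitNOFILE
```

Restarting containerd briefly disrupts the node's pods, so this is more of a debugging aid than something to bake into CI.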
Ah, but the VM's kernel itself (or some other mechanism, I don't pretend to fully understand this) sets a lower limit. The existence of limits like this on most systems is probably why this problem doesn't come up more often. On GKE:

tom@gke-... ~ $ cat /proc/1/limits
Limit                     Soft Limit           Hard Limit           Units
...
Max open files            1048576              1048576              files

Thus, the decision for Kind to set the limit to infinity doesn't, in practice, reproduce the limit a process actually sees on GKE.
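(Aside: the analogous check can be run against a kind node from the host; the node container name below is the default and is an assumption.)

```sh
# Effective open-file limits of PID 1 inside the kind node container
docker exec kind-control-plane grep 'Max open files' /proc/1/limits
```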
Ah, that's news to me. Although I don't really want to test them, just make them be reliably sane.
Okay, thanks! I guess I can do the former with some Nix magic or the latter with an exec to the node.
The thing I'm wrestling with is an insufficiently sophisticated file descriptor closing system in Haskell's process library. Someone else experienced the same thing with an SFTP server.
Oh I don't think I will suggest it there, […]
Thinking about it a bit more, maybe the right place to apply consistent limits would actually be in the Docker configuration. This seems like the part that's more analogous with the GKE VM. This sort of thing can be done by passing […].

One thing that's a complete mystery to me is why it didn't work for me when I tried setting the […]
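(For reference, the Docker-daemon-level knob being discussed is presumably `default-ulimits` in `/etc/docker/daemon.json`, or the equivalent `--default-ulimit` flag to dockerd. A sketch, with 1048576 as an arbitrary illustrative value; it only affects containers created after the daemon restart, so existing kind nodes keep their old limit.)

```sh
# Sketch: give newly created containers an explicit nofile ulimit by default.
# Merge this with any existing daemon.json rather than overwriting it.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 1048576,
      "Hard": 1048576
    }
  }
}
EOF
sudo systemctl restart docker
```

After the restart, a quick check of the new default: `docker run --rm busybox sh -c 'ulimit -n'`.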
Interesting, thanks.
That's the same decision in containerd upstream? If we want to limit differently than the host, we should probably do so at the node level, but the question is if the host doesn't have a reasonable limit, what limit should we be setting for the kind node and why? This will be a breaking change for another user with a particular host limit, right now they could instead tune their host limit. Sounds like if not the haskell library, NixOS should be reconfigured?
Not to my knowledge, kind does not invoke docker with these settings so the default should take effect. Docker may need to be restarted first.

Relatedly, I think you should be able to set ulimit in the application container (not another container in the pod) if you only need to ensure a maximum (you can't raise the limit, but you can lower it) just by calling ulimit. docker-library/rabbitmq#545 (comment)

... also related kubernetes/kubernetes#3595 😞
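(A sketch of the "just call ulimit in the application container" approach: since a process can lower, but not raise, its own limit, a small wrapper around the entrypoint is enough. The binary path and the value are placeholders.)

```sh
# Sketch: lower the open-file limit, then exec the real application.
# /usr/local/bin/my-app is a stand-in for the actual entrypoint.
sh -c 'ulimit -n 1048576 && exec /usr/local/bin/my-app'
```

In a pod spec this would go in the container's `command`, wrapping whatever the image normally runs.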
see: containerd/containerd#4475