Mark net.ipv4.ip_local_reserved_ports as a safe sysctl #111144
/sig network security

/assign danwinship

/sig node
@hindessm just for clarification: you are requesting to add it to the safe sysctls list in kubernetes/pkg/kubelet/sysctl/safe_sysctls.go (lines 24 to 31 at commit 410ac59)?
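For readers following along, the referenced allowlist looked roughly like this at the time. This is a sketch, not the exact source (the list varies by release), with the proposed entry appended at the end:

```go
// Approximate contents of pkg/kubelet/sysctl/safe_sysctls.go at the
// referenced lines; not a verbatim copy of commit 410ac59.
package sysctl

// safeSysctls are namespaced sysctls that any pod may set without the
// cluster admin explicitly allowlisting them on the kubelet.
var safeSysctls = []string{
	"kernel.shm_rmid_forced",
	"net.ipv4.ip_local_port_range",
	"net.ipv4.tcp_syncookies",
	"net.ipv4.ping_group_range",
	"net.ipv4.ip_unprivileged_port_start",
	// Proposed addition from this issue:
	"net.ipv4.ip_local_reserved_ports",
}
```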
/sig node
@aojea I'd not previously looked at the source code, but yes, that is exactly what I was hoping for and would be fantastic. Thanks for your attention to this issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale

One limitation: a pod using the host network cannot use this sysctl. That restriction is already implemented, so it would be safe to allow.
Meanwhile, there is a common problem with this sysctl on old kernels (< 3.12); see moby/moby#8674 (comment) or istio/istio#36560. Users may get an error when applying this sysctl. I hit it in a testing cluster running Kubernetes v1.25 and containerd v1.6.0.
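On such kernels, one mitigation is to check whether the sysctl is even exposed in the container's network namespace before trying to apply it. A minimal, hypothetical sketch; the procfs path is the standard location for this sysctl, but the helper name is made up for illustration:

```go
package main

import (
	"fmt"
	"os"
)

// reservedPortsSysctlSupported reports whether the kernel exposes
// net.ipv4.ip_local_reserved_ports in the current network namespace.
// On old kernels the file may be absent inside a container, and
// attempts to write it will fail.
func reservedPortsSysctlSupported() bool {
	_, err := os.Stat("/proc/sys/net/ipv4/ip_local_reserved_ports")
	return err == nil
}

func main() {
	fmt.Println("supported:", reservedPortsSysctlSupported())
}
```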
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the rules listed above. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale

It would be a shame not to fix this. As I mentioned when I originally opened it, this sysctl is often needed alongside the net.ipv4.ip_local_port_range sysctl, which is already in the default safe list.
A PR for this is easy to raise; see #115374 for an example.
@hindessm just out of curiosity, how common is this? I don't remember many issues about this problem.
Apologies, my statement was probably misleading. I used the phrase "fairly common" in relation to configurations which could theoretically hit the race condition; I didn't mean to imply that the race was likely to be lost often.

The potentially problematic configuration needs multiple containers in a pod, with some containers listening on ports, other containers making outgoing connections, and the ephemeral port range overlapping with the listening ports. I suspect that pods configured like this are fairly common: sidecar containers are a popular pattern, and the ephemeral port range is often extended to a broader range such as 1024-65535. The problem occurs if a sidecar container makes any connections at startup - to obtain config, perform auth, etc. - before another container's listener has attempted to bind to a required port, since it is possible for the sidecar's connections to use the listener's port as an ephemeral port. If such a connection in the sidecar remains open, the listening container goes into crash loop backoff, repeatedly failing to bind, and something external has to kill the sidecar to force it to release the port.

Even given a potentially problematic configuration, the problem might not occur often, because the timing window is usually small and the chance of the kernel picking exactly a listener's port out of a large ephemeral range is low. The situation where we see it happening involves a server that delays binding to vital ports until it has reloaded/recovered state, together with a sidecar that opens a few long-lived connections at startup. Even with this increased timing window it doesn't happen often, but when it does, it requires intervention to resolve the crash loop backoff, as the restarted listening container will continue to fail until the sidecar releases the port.

It would be simple to avoid if the reserved ports setting were permitted on the container service we deploy onto, as we could protect the problematic ports and still get the expanded ephemeral port range. (A minimal reproduction of the collision is sketched below.)
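To make the race concrete, here is a small self-contained Go sketch (loopback only, ports chosen by the kernel) showing that once an outbound connection grabs a local port as its ephemeral source port, a listener cannot bind that port until the connection closes:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// A stand-in server so the demo needs no external network.
	srv, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer srv.Close()

	// Outbound connection: the kernel assigns an ephemeral source port
	// (the sidecar's role in the scenario described above).
	conn, err := net.Dial("tcp", srv.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	port := conn.LocalAddr().(*net.TCPAddr).Port
	fmt.Println("ephemeral source port:", port)

	// Another process that now needs to listen on that port fails with
	// "address already in use" until the outbound connection releases it.
	_, err = net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
	fmt.Println("listen error:", err)
}
```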
What would you like to be added?
The net.ipv4.ip_local_reserved_ports sysctl is namespaced, so it should be safe to permit. It is an often necessary counterpart to the net.ipv4.ip_local_port_range sysctl, which is currently in the default safe list. It would therefore be helpful to include it in the default safe list as well.

Why is this needed?
If you have a pod with multiple containers which both sets net.ipv4.ip_local_port_range to a large range (such as the commonly used '1025 65535', to maximise the number of available ports) and binds to local ports in that range, then a container doing a bind can deadlock, repeatedly failing to start, if another container has already used the local port as an ephemeral port. If the ports needed by the bind can be added to the reserved ports list, this issue can be avoided without having to unnecessarily shrink the local port range or add coupling to ensure containers start in the correct order to prevent the deadlock.
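If the sysctl were allowlisted, the fix would be a pod-level securityContext along these lines. A sketch using the k8s.io/api/core/v1 types (requires the k8s.io/api module); the reserved port numbers are made-up examples:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Widen the ephemeral range, but reserve the ports that other
	// containers in the pod need to bind (8080 and 9090 are
	// hypothetical).
	spec := corev1.PodSpec{
		SecurityContext: &corev1.PodSecurityContext{
			Sysctls: []corev1.Sysctl{
				{Name: "net.ipv4.ip_local_port_range", Value: "1025 65535"},
				{Name: "net.ipv4.ip_local_reserved_ports", Value: "8080,9090"},
			},
		},
	}
	fmt.Printf("%+v\n", spec.SecurityContext.Sysctls)
}
```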