
Mark net.ipv4.ip_local_reserved_ports as a safe sysctl #111144

Closed
hindessm opened this issue Jul 14, 2022 · 13 comments · Fixed by #115374
Labels
kind/feature Categorizes issue or PR as related to a new feature.
sig/network Categorizes an issue or PR as relevant to SIG Network.
sig/node Categorizes an issue or PR as relevant to SIG Node.
sig/security Categorizes an issue or PR as relevant to SIG Security.
triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@hindessm

What would you like to be added?

The net.ipv4.ip_local_reserved_ports sysctl is namespaced, so it should be safe to permit. It is an often-necessary counterpart to the net.ipv4.ip_local_port_range sysctl, which is currently in the default safe list. It would therefore be helpful to include it in the default safe list as well.

Why is this needed?

If you have a pod with multiple containers that sets net.ipv4.ip_local_port_range to a large range such as the commonly used '1025 65535' to maximise the number of available ports, and whose containers bind to local ports within that range, then a container doing a bind can deadlock, repeatedly failing to start because another container has already taken that port as an ephemeral port. If the ports needed by the bind can be added to the reserved ports list, this issue can be avoided without unnecessarily shrinking the local port range or adding coupling to ensure the containers start in a particular order.
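
As a minimal sketch, assuming the change is accepted, this is the kind of pod-level configuration it would enable without explicitly allowing the sysctl via the kubelet's --allowed-unsafe-sysctls flag (written with the k8s.io/api/core/v1 types; the pod name, container names, and port values are made up for illustration):

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    pod := corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{Name: "example"},
        Spec: corev1.PodSpec{
            SecurityContext: &corev1.PodSecurityContext{
                Sysctls: []corev1.Sysctl{
                    // Already on the default safe list today.
                    {Name: "net.ipv4.ip_local_port_range", Value: "1025 65535"},
                    // The sysctl this issue asks to mark safe: keep the
                    // listener's ports out of the ephemeral pool.
                    {Name: "net.ipv4.ip_local_reserved_ports", Value: "8080,9090"},
                },
            },
            Containers: []corev1.Container{
                {Name: "server", Image: "example/server"},
                {Name: "sidecar", Image: "example/sidecar"},
            },
        },
    }
    fmt.Printf("sysctls: %+v\n", pod.Spec.SecurityContext.Sysctls)
}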

@hindessm hindessm added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 14, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 14, 2022
@hindessm
Author

/sig network security

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/security Categorizes an issue or PR as relevant to SIG Security. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 14, 2022
@rajibmitra rajibmitra removed their assignment Jul 14, 2022
@dcbw
Member

dcbw commented Jul 21, 2022

/assign danwinship

@dcbw
Member

dcbw commented Jul 21, 2022

/sig node
/triage accepted
/assign aojea

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 21, 2022
@aojea
Member

aojea commented Jul 21, 2022

@hindessm just for clarification, you are requesting to add ip_local_reserved_ports to this list, right?

func SafeSysctlAllowlist() []string {
    return []string{
        "kernel.shm_rmid_forced",
        "net.ipv4.ip_local_port_range",
        "net.ipv4.tcp_syncookies",
        "net.ipv4.ping_group_range",
        "net.ipv4.ip_unprivileged_port_start",
    }
}
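
If it helps, a minimal sketch of what the requested change might look like, assuming a straightforward append to the list above; the actual fix (see #115374) may differ, for example by gating the new entry on kernel version:

func SafeSysctlAllowlist() []string {
    return []string{
        "kernel.shm_rmid_forced",
        "net.ipv4.ip_local_port_range",
        "net.ipv4.tcp_syncookies",
        "net.ipv4.ping_group_range",
        "net.ipv4.ip_unprivileged_port_start",
        // Proposed addition from this issue.
        "net.ipv4.ip_local_reserved_ports",
    }
}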

@aojea
Member

aojea commented Jul 21, 2022

/sig node
/cc @mrunalp

@hindessm
Author

hindessm commented Jul 21, 2022

@aojea I'd not previously looked at the source code but yes, that is exactly what I was hoping for and would be fantastic. Thanks for your attention to this issue.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2022
@pacoxu
Member

pacoxu commented Oct 20, 2022

/remove-lifecycle stale

There is a limitation that host-network pods cannot use it; this is already enforced, so it would be safe to allow.

  • If the pod uses hostNetwork, users will get the error Pod forbidden sysctl: "net.ipv4.ip_local_reserved_ports" not allowed with host net enabled.

Meanwhile, there is a common problem with this sysctl on old kernels (< 3.12). See moby/moby#8674 (comment) or istio/istio#36560. Users may get the error below when applying this sysctl.

Warning FailedCreatePodSandBox 5s (x25 over 42s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open /proc/sys/net/ipv4/ip_local_reserved_ports: no such file or directory: unknown

I got this in a testing cluster running Kubernetes v1.25 and containerd v1.6.0.

Linux paco 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
[root@paco ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
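
To illustrate the kernel-dependency concern, here is a standalone Go sketch (not kubelet or runc code; the helper name sysctlSupported is made up for this example) that checks whether the running kernel exposes the sysctl before asking for it:

package main

import (
    "fmt"
    "os"
    "strings"
)

// sysctlSupported reports whether the kernel exposes the given sysctl
// by checking for its /proc/sys entry.
func sysctlSupported(name string) bool {
    path := "/proc/sys/" + strings.ReplaceAll(name, ".", "/")
    _, err := os.Stat(path)
    return err == nil
}

func main() {
    if !sysctlSupported("net.ipv4.ip_local_reserved_ports") {
        // On kernels older than 3.12 this entry does not exist, which is
        // what produces the "no such file or directory" error above.
        fmt.Println("net.ipv4.ip_local_reserved_ports is not available; skipping")
        return
    }
    fmt.Println("net.ipv4.ip_local_reserved_ports is available")
}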

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 18, 2023
@hindessm
Author

/remove-lifecycle stale

It would be a shame not to fix this. As I mentioned when I originally opened it, it is often needed alongside net.ipv4.ip_local_port_range (which is in the safe list) to avoid timing issues in fairly common multi-container pod configurations that result in crash loop backoff of containers that want a port they are never going to get until the container which has used it as an ephemeral port is (manually) restarted.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2023
@pacoxu
Member

pacoxu commented Jan 28, 2023

The PR is easy to raise, like #115374.

  • However, we need approval from the SIG Network/Node sides to make sure this has no side effects.

@aojea
Member

aojea commented Jan 28, 2023

to avoid timing issues in fairly common multi-container pod configurations that result in crash loop backoff of containers that want a port they are never going to get until the container which has used it as an ephemeral port is (manually) restarted

@hindessm just out of curiosity, how common is this? I don't remember many issues about this problem.

@hindessm
Author

Apologies, my statement was probably misleading. I used the phrase "fairly common" in relation to configurations which could theoretically have the race condition; I didn't mean to imply that the race was likely to be lost often.

The potentially problematic configuration would need multiple containers in a pod, with some containers listening on ports, other containers making outgoing connections, and the ephemeral port range overlapping with the listening ports. I suspect that pods configured like this are fairly common - sidecar containers are a popular pattern, and the ephemeral port range is often extended to a broader range of ports such as 1024-65535.

The problem occurs if a sidecar container makes any connections at startup - to obtain config, perform auth, etc. - before another container's listener has attempted to bind to a required port, since it is possible for the sidecar's connections to use the listener's port as an ephemeral port. If this connection in the sidecar remains open, the listening container goes into crash loop backoff, repeatedly failing to bind, and something external has to kill the sidecar to force it to release the port.
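
To make the collision concrete, here is a standalone Go sketch (not taken from our deployment; port 8080 and the loopback addresses are arbitrary) in which an outgoing connection pinned to a local port stops a later listener from binding that port:

package main

import (
    "fmt"
    "net"
)

func main() {
    // A peer for the "sidecar" to connect to, on an arbitrary port.
    upstream, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }
    defer upstream.Close()

    // Simulate the sidecar winning the race: its outgoing connection is
    // pinned to local port 8080, the port the main container needs later.
    d := net.Dialer{LocalAddr: &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 8080}}
    conn, err := d.Dial("tcp", upstream.Addr().String())
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // The main container now tries to bind its listening port and fails
    // with "address already in use" until the sidecar connection closes.
    _, err = net.Listen("tcp", "127.0.0.1:8080")
    fmt.Println("bind result:", err)
}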

Even given a potentially problematic configuration, the problem might not occur often because:
a) the listening container wins the race to bind,
b) a vital ephemeral port isn't chosen, or
c) a vital ephemeral port is chosen but only for a short-lived connection, so the listening container perhaps restarts only once - which is unfortunate but not a big deal.
So, in a lot of configurations with the race condition, it is likely that the main container wins the race to bind to its required ports and the issue either does not occur or results in a single restart that might not be noticed.

The situation where we see it happening involves a server that delays binding to vital ports until it has reloaded/recovered state, combined with a sidecar that opens a few long-lived connections. Even with this increased timing window it doesn't happen often, but when it does it requires intervention to resolve the crash loop backoff, as the restarted listening container will continue to fail until the sidecar releases the port. It would be simple to avoid if the reserved ports setting were permitted on the container service we deploy onto, as we could protect the problematic ports and still keep the expanded ephemeral port range.
