The ip_vs kernel module does not exist in the unprivileged container /proc #4278
Hi @pekindenis, this is normal behavior (see https://github.com/torvalds/linux/blob/ceaa837f96adb69c0df0397937cd74991d5d821a/net/netfilter/ipvs/ip_vs_ctl.c#L4303). When the network namespace is owned by a non-init user namespace, these sysctls are not exposed.
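To tell which case applies from inside a container, one rough check is `/proc/self/uid_map`: in the initial user namespace it is the identity map of the whole ID range (`0 0 4294967295`). The Python sketch below is a heuristic under two assumptions: that the container's network namespace is owned by the container's user namespace (the usual setup for unprivileged LXC), and that nobody configured an identical full-range map. `parse_uid_map` and `in_initial_userns` are hypothetical helper names.

```python
from __future__ import annotations

from pathlib import Path

# uid_map of the initial user namespace: inside ID 0 -> outside ID 0, length 4294967295.
INITIAL_UID_MAP = [(0, 0, 4294967295)]


def parse_uid_map(text: str) -> list[tuple[int, ...]]:
    """Parse uid_map content into (inside_id, outside_id, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(f) for f in fields))
    return entries


def in_initial_userns(uid_map_text: str) -> bool:
    """Heuristic: True if the map is the identity map over the full ID range."""
    return parse_uid_map(uid_map_text) == INITIAL_UID_MAP


if __name__ == "__main__":
    uid_map = Path("/proc/self/uid_map")
    if uid_map.exists():
        if in_initial_userns(uid_map.read_text()):
            print("initial user namespace: IPVS sysctls should be visible")
        else:
            print("non-initial user namespace: IPVS sysctls are likely hidden")
```

In an unprivileged LXC container the map is typically a shifted range such as `0 100000 65536`, which this check reports as non-initial.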
If I use a privileged LXC container, “/proc/sys/net/ipv4/vs/” exists. My current settings:
That's because privileged containers use the host (initial) user namespace.
Can you describe a use case for that? You could try bind-mounting part of the procfs tree into the unprivileged container to show these values, but it makes no sense because you won't be able to write these sysctls.
I thought they could be mounted from the host machine with read/write access for the unprivileged LXC container.
I discovered this issue when I ran Docker Swarm in a container.
Closing, as this is a kernel restriction and not an LXC bug.
There are two issues with the procfs bind-mount solution:
IPVS itself works in unprivileged mode, but Docker wants to modify the default values of IPVS sysctls (see https://github.com/moby/libnetwork/blob/master/osl/namespace_linux.go#L679). For now, I suggest using a privileged container, but I'll put this on my to-do list, and we'll discuss IPVS containerization internally with the LXD team. cc @stgraber
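The failure a runtime hits when it tries to tweak these values can be illustrated with a small Python sketch. The mapping from a dotted sysctl name to a file under `/proc/sys` (dots become path separators) is the standard convention; `sysctl_path` and `write_sysctl` are hypothetical helpers, not Docker's code, and any value passed in is illustrative only. Opening with `os.open(..., os.O_WRONLY)` (no `O_CREAT`) means a hidden sysctl surfaces as "missing" instead of the code trying to create a file in procfs.

```python
import os
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name such as 'net.ipv4.vs.conntrack' to its procfs file."""
    return root.joinpath(*name.split("."))


def write_sysctl(name: str, value: str, root: Path = PROC_SYS) -> str:
    """Try to write a sysctl; return 'ok', 'missing' or 'read-only'.

    Hypothetical helper for illustration; it opens without O_CREAT so that a
    sysctl which is not exposed in this namespace is reported, not created.
    """
    path = sysctl_path(name, root)
    try:
        fd = os.open(path, os.O_WRONLY)
    except FileNotFoundError:
        return "missing"    # hidden, as in the unprivileged container above
    except PermissionError:
        return "read-only"  # visible but not writable from this namespace
    try:
        os.write(fd, value.encode() + b"\n")
    finally:
        os.close(fd)
    return "ok"
```

For example, `write_sysctl("net.ipv4.vs.conntrack", "1")` would be expected to return `"missing"` inside the unprivileged container described in this issue, and `"ok"` when run as root in a privileged one.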
Thank you. I've already seen that the LXD team solved a similar issue.
Yep, once I do that I'll let you know!
Let's make all IPVS sysctls visible and read-only (RO) even when the network namespace is owned by a non-initial user namespace.

Let's make a few sysctls writable:
- conntrack
- conn_reuse_mode
- expire_nodest_conn
- expire_quiescent_template

I'm trying to be conservative here to avoid introducing any security issues. We may allow more sysctls to be writable later, but let's do that on demand, when we see a real use case. This list of sysctls was chosen because I can't see any security risks in allowing them, and because Kubernetes uses [2] these specific sysctls.

This patch is motivated by a user request in the LXC project [1].

[1] lxc/lxc#4278
[2] https://github.com/kubernetes/kubernetes/blob/b722d017a34b300a2284b890448e5a605f21d01e/pkg/proxy/ipvs/proxier.go#L103

Cc: Stéphane Graber <stgraber@stgraber.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: NipaLocal <nipa@local>
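The policy described in this commit message can be summarized as a small lookup. This is a Python model of the described behavior, not the kernel implementation; the helper name `expected_mode` is hypothetical, and it assumes read-only entries appear with mode 0444 while writable ones keep the 0644 seen in the host listing.

```python
# Sysctls the commit message lists as writable from a non-initial user namespace.
WRITABLE_IN_NON_INIT_USERNS = frozenset({
    "conntrack",
    "conn_reuse_mode",
    "expire_nodest_conn",
    "expire_quiescent_template",
})


def expected_mode(sysctl: str, netns_owned_by_init_userns: bool) -> int:
    """Return the file mode the described policy implies for an IPVS sysctl.

    Assumption: writable entries keep 0o644, read-only entries become 0o444.
    """
    if netns_owned_by_init_userns or sysctl in WRITABLE_IN_NON_INIT_USERNS:
        return 0o644
    return 0o444
```

Under this model, an unprivileged container would see every entry from the host listing, but only the four listed sysctls would be writable.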
On the host, the “ip_vs” module is loaded and “/proc/sys/net/ipv4/vs/” is populated.
Inside the container, the “/proc/sys/net/ipv4/vs” directory exists, but it is empty.
root@pve01:~$ lsmod | grep ip_vs
ip_vs 155648 1 xt_ipvs
nf_conntrack 139264 10 xt_conntrack,nf_nat,xt_state,xt_nat,openvswitch,nf_conntrack_netlink,nf_conncount,xt_MASQUERADE,ip_vs,xt_REDIRECT
nf_defrag_ipv6 24576 3 nf_conntrack,openvswitch,ip_vs
libcrc32c 16384 7 nf_conntrack,nf_nat,openvswitch,btrfs,xfs,raid456,ip_vs
root@pve02:~# ls -l /proc/sys/net/ipv4/vs/
total 0
-rw-r--r-- 1 root root 0 Feb 11 21:52 am_droprate
-rw-r--r-- 1 root root 0 Feb 11 21:52 amemthresh
-rw-r--r-- 1 root root 0 Feb 11 21:52 backup_only
-rw-r--r-- 1 root root 0 Feb 11 21:52 cache_bypass
-rw-r--r-- 1 root root 0 Feb 11 21:52 conn_reuse_mode
-rw-r--r-- 1 root root 0 Feb 11 21:52 conntrack
-rw-r--r-- 1 root root 0 Feb 11 21:52 drop_entry
-rw-r--r-- 1 root root 0 Feb 11 21:52 drop_packet
-rw-r--r-- 1 root root 0 Feb 11 21:52 expire_nodest_conn
-rw-r--r-- 1 root root 0 Feb 11 21:52 expire_quiescent_template
-rw-r--r-- 1 root root 0 Feb 11 21:52 ignore_tunneled
-rw-r--r-- 1 root root 0 Feb 11 21:52 lblc_expiration
-rw-r--r-- 1 root root 0 Feb 11 21:52 lblcr_expiration
-rw-r--r-- 1 root root 0 Feb 11 21:52 nat_icmp_send
-rw-r--r-- 1 root root 0 Feb 11 21:52 pmtu_disc
-rw-r--r-- 1 root root 0 Feb 11 21:52 schedule_icmp
-rw-r--r-- 1 root root 0 Feb 11 21:52 secure_tcp
-rw-r--r-- 1 root root 0 Feb 11 21:52 sloppy_sctp
-rw-r--r-- 1 root root 0 Feb 11 21:52 sloppy_tcp
-rw-r--r-- 1 root root 0 Feb 11 21:52 snat_reroute
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_persist_mode
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_ports
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_qlen_max
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_refresh_period
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_retries
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_sock_size
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_threshold
-rw-r--r-- 1 root root 0 Feb 11 21:52 sync_version
root@container:~# ls -l /proc/sys/net/ipv4/vs/
total 0
root@container:~# lsmod | grep ip_vs
ip_vs_wrr 16384 0
ip_vs_wlc 16384 0
ip_vs_sh 16384 0
ip_vs_sed 16384 0
ip_vs_rr 16384 0
ip_vs_nq 16384 0
ip_vs_lc 16384 0
ip_vs_lblcr 16384 0
ip_vs_lblc 16384 0
ip_vs_ftp 16384 0
ip_vs_dh 16384 0
ip_vs 172032 33 ip_vs_wlc,ip_vs_rr,ip_vs_dh,ip_vs_lblcr,ip_vs_sh,ip_vs_nq,ip_vs_lblc,xt_ipvs,ip_vs_wrr,ip_vs_lc,ip_vs_sed,ip_vs_ftp
nf_nat 49152 6 xt_nat,nft_chain_nat,iptable_nat,xt_MASQUERADE,xt_REDIRECT,ip_vs_ftp
nf_conntrack 172032 8 xt_conntrack,nf_nat,xt_state,xt_nat,nf_conntrack_netlink,xt_MASQUERADE,ip_vs,xt_REDIRECT
nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs
libcrc32c 16384 7 nf_conntrack,nf_nat,dm_persistent_data,btrfs,nf_tables,ip_vs,sctp
uname -a
Linux pve02 5.15.83-1-pve #1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z) x86_64 GNU/Linux
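A quick way to reproduce the comparison above without eyeballing `ls -l` output is to read every entry under the IPVS sysctl directory. The Python sketch below is an illustrative diagnostic; `read_ipvs_sysctls` is a hypothetical helper. It returns `None` when the directory is absent and an empty dict when the directory exists but has no entries, which is the symptom seen in the unprivileged container.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

IPVS_SYSCTL_DIR = Path("/proc/sys/net/ipv4/vs")


def read_ipvs_sysctls(base: Path = IPVS_SYSCTL_DIR) -> Optional[dict[str, str]]:
    """Return {name: value} for each entry under the IPVS sysctl directory.

    None  -> directory missing (e.g. ip_vs not loaded)
    {}    -> directory present but empty (the unprivileged-container case)
    """
    if not base.is_dir():
        return None
    values = {}
    for entry in sorted(base.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except PermissionError:
            values[entry.name] = "<unreadable>"
    return values
```

On the host shown above this would be expected to return one value per listed file; inside the affected container it would return `{}`.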