-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode #1749
Conversation
Master branch: 3a029e1 |
Master branch: 3a029e1 |
c2d451f
to
21a2faf
Compare
Master branch: 3a029e1 |
21a2faf
to
3fe5adc
Compare
Master branch: 2f1aaf3 |
3fe5adc
to
a3c458e
Compare
Master branch: 2f1aaf3 |
a3c458e
to
6af29d4
Compare
Master branch: 2f1aaf3 |
6af29d4
to
f165140
Compare
Master branch: 2f1aaf3 |
f165140
to
18ba252
Compare
Fix cgroup v1 interference when non-root cgroup v2 BPF programs are used. Back in the days, commit bd1060a ("sock, cgroup: add sock->sk_cgroup") embedded per-socket cgroup information into sock->sk_cgrp_data and in order to save 8 bytes in struct sock made both mutually exclusive, that is, when cgroup v1 socket tagging (e.g. net_cls/net_prio) is used, then cgroup v2 falls back to the root cgroup in sock_cgroup_ptr() (&cgrp_dfl_root.cgrp). The assumption made was "there is no reason to mix the two and this is in line with how legacy and v2 compatibility is handled" as stated in bd1060a. However, with Kubernetes more widely supporting cgroups v2 as well nowadays, this assumption no longer holds, and the possibility of the v1/v2 mixed mode with the v2 root fallback being hit becomes a real security issue. Many of the cgroup v2 BPF programs are also used for policy enforcement, just to pick _one_ example, that is, to programmatically deny socket related system calls like connect(2) or bind(2). A v2 root fallback would implicitly cause a policy bypass for the affected Pods. In production environments, we have recently seen this case due to various circumstances: i) a different 3rd party agent and/or ii) a container runtime such as [0] in the user's environment configuring legacy cgroup v1 net_cls tags, which triggered implicitly mentioned root fallback. Another case is Kubernetes projects like kind [1] which create Kubernetes nodes in a container and also add cgroup namespaces to the mix, meaning programs which are attached to the cgroup v2 root of the cgroup namespace get attached to a non-root cgroup v2 path from init namespace point of view. And the latter's root is out of reach for agents on a kind Kubernetes node to configure. Meaning, any entity on the node setting cgroup v1 net_cls tag will trigger the bypass despite cgroup v2 BPF programs attached to the namespace root. Generally, this mutual exclusiveness does not hold anymore in today's user environments and makes cgroup v2 usage from BPF side fragile and unreliable. This fix adds proper struct cgroup pointer for the cgroup v2 case to struct sock_cgroup_data in order to address these issues; this implicitly also fixes the tradeoffs being made back then with regards to races and refcount leaks as stated in bd1060a, and removes the fallback, so that cgroup v2 BPF programs always operate as expected. [0] https://github.com/nestybox/sysbox/ [1] https://kind.sigs.k8s.io/ Fixes: bd1060a ("sock, cgroup: add sock->sk_cgroup") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Martynas Pumputis <m@lambda.lt>
Minimal set of helpers for net_cls classid cgroupv1 management in order to set an id, join from a process, initiate setup and teardown. cgroupv2 helpers are left as-is, but reused where possible. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
Minimal selftest which implements a small BPF policy program to the connect(2) hook which rejects TCP connection requests to port 60123 with EPERM. This is being attached to a non-root cgroup v2 path. The test asserts that this works under cgroup v2-only and under a mixed cgroup v1/v2 environment where net_classid is set in the former case. Before fix: # ./test_progs -t cgroup_v1v2 test_cgroup_v1v2:PASS:server_fd 0 nsec test_cgroup_v1v2:PASS:client_fd 0 nsec test_cgroup_v1v2:PASS:cgroup_fd 0 nsec test_cgroup_v1v2:PASS:server_fd 0 nsec run_test:PASS:skel_open 0 nsec run_test:PASS:prog_attach 0 nsec test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec run_test:PASS:skel_open 0 nsec run_test:PASS:prog_attach 0 nsec run_test:PASS:join_classid 0 nsec (network_helpers.c:219: errno: None) Unexpected success to connect to server test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0) #27 cgroup_v1v2:FAIL Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED After fix: # ./test_progs -t cgroup_v1v2 #27 cgroup_v1v2:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
Master branch: 0e6491b |
18ba252
to
2be9619
Compare
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=546185 irrelevant now. Closing PR. |
Pull request for series with
subject: bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=544357