Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken SELinux/LSM labelling with MPTCP and accept(2) #320

Closed
matttbe opened this issue Dec 1, 2022 · 7 comments
Closed

Broken SELinux/LSM labelling with MPTCP and accept(2) #320

matttbe opened this issue Dec 1, 2022 · 7 comments
Assignees
Labels

Comments

@matttbe
Copy link
Member

matttbe commented Dec 1, 2022

Reported by Ondrej Mosnacek on the mailing list

Hi all,

As discovered by our QE, there is a problem with how the
(userspace-facing) sockets returned by accept(2) are labeled when
using MPTCP. Currently they always end up with the label representing
the kernel (typically system_u:system_r:kernel_t:s0), white they
should inherit the context from the parent socket (the one that is
passed to accept(2)).

A minimal reproducer on a Fedora/CentOS/RHEL system:

# Install dependencies
dnf install -y mptcpd nginx curl
# Disable rules that silence some SELinux denials
semodule -DB
# Set up a dummy file to be served by nginx
echo test > /usr/share/nginx/html/testfile
chmod +r /usr/share/nginx/html/testfile
# Set up nginx to use MPTCP
sysctl -w net.mptcp.enabled=1
systemctl stop nginx
mptcpize enable nginx
systemctl start nginx
# This will fail (no reply from server)
mptcpize run curl -k -o /dev/null http://127.0.0.1/testfile
# This will show the SELinux denial that caused the failure
ausearch -i -m avc | grep httpd

It is also possible to trigger the issue by running the
selinux-testsuite [1] under mptcpize run (it will fail on the
inet_socket test in multiple places).

Based on what I could infer from the net & mptcp code, this is roughly
how it happens (may be inaccurate or incorrect - the maze of the
networking stack is not easy to navigate for me):

  1. When the server starts, the main mptcp socket is created:
    socket(2) -> ... -> socket_create() -> inet_create() ->
    mptcp_init_sock() -> __mptcp_socket_create()
  2. __mptcp_socket_create() calls mptcp_subflow_create_socket(), which
    creates another "kern" socket, which represents the initial(?)
    subflow.
  3. This subflow socket goes through security_socket_post_create() ->
    selinux_socket_post_create(), which gives it a kernel label based on
    kern == 1, which indicates that it's a kernel-internal socket.
  4. The main socket goes through its own selinux_socket_post_create(),
    which gives it the label based on the current task.
  5. Later, when the client connection is accepted via accept(2) on the
    main socket, an underlying accept operation is performed on the
    subflow socket, which is then returned directly as the result of the
    accept(2) syscall.
  6. Since this socket is cloned from the subflow socket, it inherits
    the kernel label from the original subflow socket (via
    selinux_inet_conn_request() and selinux_inet_csk_clone()).
    selinux_sock_graft() then also copies the label onto the inode
    representing the socket.
  7. When nginx later calls writev(2) on the new socket,
    selinux_file_permission() uses the inode label as the target in a
    tcp_socket::write permission check. This is denied, as in the Fedora
    policy httpd_t isn't allowed to write to kernel_t TCP sockets.

Side note: There is currently an odd conditional in sock_has_perm() in
security/selinux/hooks.c that skips SELinux permission checking for
sockets that have the kernel label, so native socket operations (such
as recv(2), send(2), recvmsg(2), ...) will not uncover this problem,
only generic file operations such as read(2), write(2), writev(2),
etc. I believe that check shouldn't be there, but that's for another
discussion...

So now the big question is: How to fix this? I can think of several
possible solutions, but neither of them seems to be the obvious
correct one:

  1. Wrap the socket cloned from the subflow socket in another socket
    (similar to how the main socket + subflow(s) are handled), which would
    be cloned from the non-kern outer socket that has the right label.
    This could have the disadvantage of adding unnecessary overhead, but
    would probably be simpler to do.
  2. Somehow ensure that the cloned socket gets the label from the main
    socket instead of the subflow socket. This would probably require
    adding a new LSM hook and I'm not sure at all what would be the best
    way to implement this.
  3. Somehow communicate the subflow socket <-> main socket relationship
    to the LSM layer so that it can switch to use the label of the main
    socket when handling an operation on a subflow socket (thus copying
    the label correctly on accept(2)). Not a great solution, as it
    requires each LSM that labels sockets to duplicate the indirection
    logic.
  4. Do not create the subflow sockets as "kern". (Not sure if that
    would be desirable.)
  5. Stop labeling kern sockets with the kernel's label on the SELinux
    side and just label them based on the current task as usual. (This
    would probably cause other issues, but maybe not...)

Any ideas, suggestions, or patches welcome!

[1] https://github.com/SELinuxProject/selinux-testsuite/

@matttbe matttbe added the bug label Dec 1, 2022
@matttbe
Copy link
Member Author

matttbe commented Dec 1, 2022

@pabeni already had a first look (notes from the last meeting):

  • the security context should be replaced by the one of the MPTCP socket
  • unfortunately, we cannot do that with the current net core base
  • it would certainly be better to create an extension of the net core to do that without having to adapt SELinux

@matttbe matttbe changed the title Broken SELinux/LSM labeling with MPTCP and accept(2) Broken SELinux/LSM labelling with MPTCP and accept(2) Dec 8, 2022
@matttbe
Copy link
Member Author

matttbe commented Jan 23, 2023

EDIT: different issue, not related to the new patches, moved to #339

@matttbe
Copy link
Member Author

matttbe commented Feb 7, 2023

@pabeni: now that the patches are in our tree, can we close this issue or do you prefer to keep it as we need to wait a bit before upstreaming them?

(We will continue listing the patches at our weekly meetings, should be fine I think, up to you)

@pabeni
Copy link

pabeni commented Feb 8, 2023

@pabeni: now that the patches are in our tree, can we close this issue or do you prefer to keep it as we need to wait a bit before upstreaming them?

I think it would be better waiting until the patches land at least into the LSM tree

@matttbe
Copy link
Member Author

matttbe commented Mar 7, 2023

From the meeting we just had:

@pabeni will send an email to Paul Moore to check what's the best to do here:

  • either we wait for the patches above to be in Linus tree (6.4)
  • or these 2 patches are carried in net-next with the rest

@matttbe
Copy link
Member Author

matttbe commented Mar 14, 2023

@pabeni by chance, did you get any feedback from Paul Moore?
Is it OK if we send only the net patches to netdev by the end of the week?

@matttbe
Copy link
Member Author

matttbe commented May 24, 2023

@matttbe matttbe closed this as completed May 24, 2023
matttbe pushed a commit that referenced this issue May 20, 2024
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: #344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: #345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: #346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants