Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: Add socket destroy capability #4848

Closed
wants to merge 7 commits into from

Conversation

kernel-patches-bot
Copy link

Pull request for series with
subject: bpf: Add socket destroy capability
version: 5
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=735459

Previously, BPF TCP iterator was acquiring fast version of sock lock that
disables the BH. This introduced a circular dependency with code paths that
later acquire sockets hash table bucket lock.
Replace the fast version of sock lock with slow that faciliates BPF
programs executed from the iterator to destroy TCP listening sockets
using the bpf_sock_destroy kfunc (implemened in follow-up commits).

Here is a stack trace that motivated this change:

```
1) sock_lock with BH disabled + bucket lock

lock_acquire+0xcd/0x330
_raw_spin_lock_bh+0x38/0x50
inet_unhash+0x96/0xd0
tcp_set_state+0x6a/0x210
tcp_abort+0x12b/0x230
bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
bpf_iter_run_prog+0x1ff/0x340
bpf_iter_tcp_seq_show+0xca/0x190
bpf_seq_read+0x177/0x450
vfs_read+0xc6/0x300
ksys_read+0x69/0xf0
do_syscall_64+0x3c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

2) sock lock with BH enable

[    1.499968]   lock_acquire+0xcd/0x330
[    1.500316]   _raw_spin_lock+0x33/0x40
[    1.500670]   sk_clone_lock+0x146/0x520
[    1.501030]   inet_csk_clone_lock+0x1b/0x110
[    1.501433]   tcp_create_openreq_child+0x22/0x3f0
[    1.501873]   tcp_v6_syn_recv_sock+0x96/0x940
[    1.502284]   tcp_check_req+0x137/0x660
[    1.502646]   tcp_v6_rcv+0xa63/0xe80
[    1.502994]   ip6_protocol_deliver_rcu+0x78/0x590
[    1.503434]   ip6_input_finish+0x72/0x140
[    1.503818]   __netif_receive_skb_one_core+0x63/0xa0
[    1.504281]   process_backlog+0x79/0x260
[    1.504668]   __napi_poll.constprop.0+0x27/0x170
[    1.505104]   net_rx_action+0x14a/0x2a0
[    1.505469]   __do_softirq+0x165/0x510
[    1.505842]   do_softirq+0xcd/0x100
[    1.506172]   __local_bh_enable_ip+0xcc/0xf0
[    1.506588]   ip6_finish_output2+0x2a8/0xb00
[    1.506988]   ip6_finish_output+0x274/0x510
[    1.507377]   ip6_xmit+0x319/0x9b0
[    1.507726]   inet6_csk_xmit+0x12b/0x2b0
[    1.508096]   __tcp_transmit_skb+0x549/0xc40
[    1.508498]   tcp_rcv_state_process+0x362/0x1180

```

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
This is a preparatory commit to remove the field. The field was
previously shared between proc fs and BPF UDP socket iterators. As the
follow-up commits will decouple the implementation for the iterators,
remove the field. As for BPF socket iterator, filtering of sockets is
exepected to be done in BPF programs.

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
This is a preparatory commit to refactor code that matches socket
attributes in iterators to a helper function, and use it in the
proc fs iterator.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Batch UDP sockets from BPF iterator that allows for overlapping locking
semantics in BPF/kernel helpers executed in BPF programs.  This facilitates
BPF socket destroy kfunc (introduced by follow-up patches) to execute from
BPF iterator programs.

Previously, BPF iterators acquired the sock lock and sockets hash table
bucket lock while executing BPF programs. This prevented BPF helpers that
again acquire these locks to be executed from BPF iterators.  With the
batching approach, we acquire a bucket lock, batch all the bucket sockets,
and then release the bucket lock. This enables BPF or kernel helpers to
skip sock locking when invoked in the supported BPF contexts.

The batching logic is similar to the logic implemented in TCP iterator:
https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com/.

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use the capability in Cilium to force
client sockets to reconnect when their remote load-balancing backends are
deleted. The other use case is on-the-fly policy enforcement where existing
socket connections prevented by policies need to be forcefully terminated.
The helper allows terminating sockets that may or may not be actively
sending traffic.

The helper is currently exposed to certain BPF iterators where users can
filter, and terminate selected sockets.  Additionally, the helper can only
be called from these BPF contexts that ensure socket locking in order to
allow synchronous execution of destroy helpers that also acquire socket
locks. The previous commit that batches UDP sockets during iteration
facilitated a synchronous invocation of the destroy helper from BPF context
by skipping taking socket locks in the destroy handler. TCP iterators
already supported batching.

The helper takes `sock_common` type argument, even though it expects, and
casts them to a `sock` pointer. This enables the verifier to allow the
sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
`sock` structs. As a comparison, BPF helpers enable this behavior with the
`ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, there is no such
option available with the verifier logic that handles kfuncs where BTF
types are inferred. Furthermore, as `sock_common` only has a subset of
certain fields of `sock`, casting pointer to the latter type might not
always be safe for certain sockets like request sockets, but these have
a special handling in the diag_destroy handlers.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
The helper will be used to programmatically retrieve,
and pass ports in userspace and kernel selftest programs.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
The test cases for destroying sockets mirror the intended usages of the
bpf_sock_destroy kfunc using iterators.

The destroy helpers set `ECONNABORTED` error code that we can validate in
the test code with client sockets. But UDP sockets have an overriding error
code from the disconnect called during abort, so the error code the
validation is only done for TCP sockets.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
@kernel-patches-bot
Copy link
Author

Upstream branch: 4ca13d1
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=735459
version: 5

@kernel-patches-bot
Copy link
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=735459 expired. Closing PR.

@kernel-patches-bot kernel-patches-bot deleted the series/735459=>bpf-next branch April 2, 2023 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants