Skip to content

Conversation

kernel-patches-daemon-bpf[bot]
Copy link

Pull request for series with
subject: selftests/bpf: Integrate test_xsk.c to test_progs framework
version: 5
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213
version: 5

@kernel-patches-review-bot
Copy link

commit c1ca2b253c61ceea2207c4f7dfd7f0b13dde0981
Author: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>

selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails

This patch changes __testapp_validate_traffic() to return TEST_FAILURE
instead of calling exit_with_error(), allowing subsequent tests to run.
It adds cleanup of sockets and umem when pthread_barrier_destroy fails.

> diff --git a/tools/testing/selftests/bpf/test_xsk.c b/tools/testing/selftests/bpf/test_xsk.c
> index 2c392d5b9..18a6ce648 100644
> --- a/tools/testing/selftests/bpf/test_xsk.c
> +++ b/tools/testing/selftests/bpf/test_xsk.c
> @@ -1772,12 +1772,12 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i
>  	err = test_spec_set_mtu(test, test->mtu);
>  	if (err) {
>  		ksft_print_msg("Error, could not set mtu.\n");
> -		exit_with_error(err);
> +		return TEST_FAILURE;
>  	}
>
>  	if (ifobj2) {
>  		if (pthread_barrier_init(&barr, NULL, 2))
> -			exit_with_error(errno);
> +			return TEST_FAILURE;
>  		pkt_stream_reset(ifobj2->xsk->pkt_stream);
>  	}
>
> @@ -1791,8 +1791,11 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i
>
>  	if (ifobj2) {
>  		pthread_barrier_wait(&barr);
> -		if (pthread_barrier_destroy(&barr))
> -			exit_with_error(errno);
> +		if (pthread_barrier_destroy(&barr)) {
> +			clean_sockets(test, ifobj1);
> +			clean_umem(test, ifobj1, NULL);
> +			return TEST_FAILURE;
> +		}

Can the running thread t0 cause problems here?  At line 1790, thread t0
is created and starts executing ifobj1->func_ptr.  When
pthread_barrier_destroy fails, the code cleans up ifobj1's sockets and
umem, then returns without joining or killing t0.  Thread t0 continues
running and may access the freed resources, potentially causing a crash.

The normal cleanup paths later in the function (lines 1806-1809) always
handle t0 with either pthread_kill or pthread_join before cleaning up
resources.

>
>  		/*Spawn TX thread */
>  		pthread_create(&t1, NULL, ifobj2->func_ptr, test);



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/18554311548

@kernel-patches-daemon-bpf
Copy link
Author

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213
version: 5

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213
version: 5

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 48a97ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213
version: 5

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 50de48a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213
version: 5

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 4f8543b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213
version: 5

AF_XDP features are tested by the test_xsk.sh script but not by the
test_progs framework. The tests used by the script are defined in
xksxceiver.c which can't be integrated in the test_progs framework as is.

Extract these test definitions from xskxceiver{.c/.h} to put them in new
test_xsk{.c/.h} files.
Keep the main() function and its unshared dependencies in xksxceiver to
avoid impacting the test_xsk.sh script which is often used to test real
hardware.
Move ksft_test_result_*() calls to xskxceiver.c to keep the kselftest's
report valid

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
bitmap is used before being initialized.

Initialize it to zero before using it.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
__testapp_validate_traffic is supposed to return an integer value that
tells if the test passed (0), failed (-1) or was skiped (2). It actually
returns a boolean in the end. This doesn't harm when the test is
successful but can lead to misinterpretation in case of failure as 1
will be returned instead of -1.

Return TEST_FAILURE (-1) in case of failure, TEST_PASS (0) otherwise.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
testapp_stats_rx_dropped() generates pkt_stream twice. The last
generated is released by pkt_stream_restore_default() at the end of the
test but we lose the pointer of the first pkt_stream.

Release the 'middle' pkt_stream when it's getting replaced to prevent
memory leaks.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
testapp_xdp_shared_umem() generates pkt_stream on each xsk from xsk_arr,
where normally xsk_arr[0] gets pkt_streams and xsk_arr[1] have them NULLed.
At the end of the test pkt_stream_restore_default() only releases
xsk_arr[0] which leads to memory leaks.

Release the missing pkt_stream at the end of testapp_xdp_shared_umem()

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
The clean-up done at the end of a test in __testapp_validate_traffic()
isn't wrapped in a function. It isn't convenient if we want to use it
somewhere else in the code.

Wrap the clean-up in two new functions : the first deletes the sockets,
the second releases the umem.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
testapp_validate_traffic() doesn't release the sockets and the umem
created by the threads if the test isn't currently in its last step.
Thus, if the swap_xsk_resources() fails before the last step, the
created resources aren't cleaned up.

Clean the sockets and the umem in case of swap_xsk_resources() failure.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
init_iface() doesn't have any return value while it can fail. In case of
failure it calls exit_on_error() which exits the application
immediately. This prevents the following tests from being run and isn't
compliant with the CI

Add a return value to init_iface() so errors can be handled more
smoothly.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
xsk_reattach_xdp calls exit_with_error() on failures. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Add a return value to the functions handling XDP attachments to handle
errors more smoothly.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
exit_with_error() is called when gettimeofday() fails. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_on_error().

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
TX and RX workers can fail in many places. These failures trigger a call
to exit_with_error() which exits the program immediately. It prevents the
following tests from running and isn't compliant with the CI.

Add return value to functions that can fail.
Handle failures more smoothly through report_failure().

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
…ails

__testapp_validate_traffic() calls exit_with_error() on failures. This
exits the program immediately. It prevents the following tests from
running and isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_with_error().
Release the resource of the 1st thread if a failure happens between its
creation and the creation of the second thread.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
If any allocation in the pkt_stream_*() helpers fail, exit_with_error() is
called. This terminates the program immediately. It prevents the following
tests from running and isn't compliant with the CI.

Return NULL in case of allocation failure.
Return TEST_FAILURE when something goes wrong in the packet generation.
Clean up the resources if a failure happens between two steps of a test.

Move exit_with_error()'s definition into xskxceiver.c as it isn't used
anywhere else now.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Some tests are flaky and fail from time to time on virtual interfaces.
Adding them to the CI would trigger lots of 'false' errors.

Remove the flaky tests from the nominal tests table so they won't be
run by the CI in upcoming patch.
Create a flaky_tests table to hold them.
Use this flaky table in xskxceiver.c to keep all the tests available
from the test_xsk.sh script.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
test_xsk.c isn't part of the test_progs framework.

Integrate the tests defined by test_xsk.c into the test_progs framework
through a new file : prog_tests/xsk.c. ZeroCopy mode isn't tested in it
as veth peers don't support it.

Move test_xsk{.c/.h} to prog_tests/.

Add the find_bit library to test_progs sources in the Makefile as it is
is used by test_xsk.c

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 7361c86
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1012213
version: 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant