How to run netdev selftests CI style

Jakub K edited this page Apr 15, 2024 · 14 revisions

netdev selftest runner

This page describes how we run selftests in netdev CI. It should be helpful for reproducing CI failures, but it's not the one and only way the tests can be run!

Netdev CI: https://netdev.bots.linux.dev/status.html

"Flakes" view: https://netdev.bots.linux.dev/flakes.html

vng

We run tests in virtme-ng: https://github.com/arighi/virtme-ng

virtme-ng builds the kernel and runs the tests.

Runner types

There are 3 runner types, which can be distinguished by looking at the "Remote" field:

  • runners called metal-${name} are netdev runners running with a normal / full performance kernel;
  • runners called metal-${name}-dbg are netdev runners running with debug kernels;
  • runners without metal- in the name are external, run by other teams / companies and reported to the system.

How to build

Building the kernel

Kernels are built with just the relevant options enabled; for instance, for net selftests:

vng --build  --config tools/testing/selftests/net/config

and for forwarding selftests we'd use the forwarding config:

vng --build  --config tools/testing/selftests/net/forwarding/config

The "-dbg" runners get extra debug options:

vng --build  \
        --config tools/testing/selftests/net/forwarding/config \
        --config kernel/configs/debug.config

vng is unreliable at detecting when the kernel needs to be rebuilt, so it's a good idea to run make mrproper before the build to delete stale artifacts. Note that make mrproper removes a lot, including your old configs!
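A clean rebuild therefore typically looks like this (a sketch combining the commands above; back up anything you want to keep before running mrproper):

```shell
# Save the current config first -- make mrproper deletes it
cp .config /tmp/config.backup

# Remove all build artifacts so the next build starts from a clean tree
make mrproper

# Rebuild with the relevant selftest config fragment
vng --build --config tools/testing/selftests/net/forwarding/config
```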

Building the tests

Tests are built separately, e.g. for forwarding:

make -C tools/testing/selftests/ TARGETS=net/forwarding

Building the tests is prone to failure because it tries to use the distro's kernel headers by default. If compilation fails, generate the kernel's uAPI headers with make headers and try again.
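Putting the two steps together, a from-scratch test build might look like this (a sketch based on the commands above):

```shell
# Install the kernel's own uAPI headers into usr/include so the
# selftests build against them rather than the distro's headers
make headers

# Then build the test target again
make -C tools/testing/selftests/ TARGETS=net/forwarding
```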

How to run

The executor runs tests one by one, e.g. for pmtu.sh:

vng -v --run . --user root --cpus 4 -- \
        make -C tools/testing/selftests TARGETS=net TEST_PROGS=pmtu.sh TEST_GEN_PROGS="" run_tests

Dealing with slow runners in performance/latency tests

The "-dbg" runners (and possibly some external runners) export the KSFT_MACHINE_SLOW=yes environment variable when the runner's performance is low. Performance and timing tests can use it to avoid reporting failures: the tests are still expected to execute all their steps (for code coverage) but to ignore missed performance targets.
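Inside a test script, the variable can be honored with a guard along these lines (a hedged sketch: only the KSFT_MACHINE_SLOW variable comes from the CI, the check_timing helper and its numbers are made up for illustration):

```shell
#!/bin/bash
# check_timing MEASURED_MS LIMIT_MS
# Hypothetical helper: pass if under the limit, downgrade a timing
# miss to XFAIL on slow machines, fail otherwise.
check_timing() {
        local measured_ms=$1 limit_ms=$2
        if [ "$measured_ms" -le "$limit_ms" ]; then
                echo "PASS"
                return 0
        fi
        if [ "$KSFT_MACHINE_SLOW" = "yes" ]; then
                # The steps still ran (code coverage); don't fail
                # a slow runner just for missing the deadline.
                echo "XFAIL: ${measured_ms}ms > ${limit_ms}ms (slow machine)"
                return 0
        fi
        echo "FAIL: ${measured_ms}ms > ${limit_ms}ms"
        return 1
}

# On a slow runner this reports XFAIL instead of FAIL:
KSFT_MACHINE_SLOW=yes check_timing 250 100
```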

Tips

Improving performance

virtme-ng will use virtiofs if virtiofsd is available on the host; virtiofs performs much better than 9p. Note that 9p is still used for directories passed to the --rwdir and --rodir options, so it's best not to use those with the whole kernel source tree; prefer --rw or --overlay-rwdir instead.
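For example, the run command shown earlier can put an overlay over just the selftests directory instead of exposing anything over 9p (a sketch reusing the options from this page):

```shell
# Writable overlay on the net selftests dir; the rest of the tree
# stays on the faster virtiofs (or 9p) read-only share
vng -v --run . --user root --cpus 4 \
        --overlay-rwdir tools/testing/selftests/net/ -- \
        make -C tools/testing/selftests TARGETS=net \
                TEST_PROGS=pmtu.sh TEST_GEN_PROGS="" run_tests
```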

When virtme-ng is launched from a container, make sure it can access /dev/kvm so it does not fall back to the slower TCG backend. For example, if you use Docker, an easy way to get KVM support is to pass --privileged to docker run.
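A hypothetical invocation (the image name and mount path are placeholders, only the --privileged flag is the point here):

```shell
# --privileged exposes host devices, including /dev/kvm, inside the
# container so vng can use KVM acceleration instead of TCG
docker run --privileged -it -v "$PWD:/src" my-build-image:latest
```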

Reproducing unstable tests

We found that the following changes increase test flakiness and can help reproduce rare failures:

  • Increase the number of CPUs with --cpus $number.
  • Disable KVM support by passing --disable-kvm to vng.
  • Generate some noise on the host: stress-ng --cpu $(nproc) --iomix $(nproc) --vm $(nproc) --vm-bytes 1G --timeout 60m
  • Reduce the priority of the VM: sudo renice -n 20 -p $(pidof qemu-system-x86_64)
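Several of these can be combined into a flake-hunting loop (a sketch built from the commands above; the iteration count and CPU count are arbitrary):

```shell
# Start host-side noise in the background first
stress-ng --cpu "$(nproc)" --iomix "$(nproc)" --vm "$(nproc)" \
        --vm-bytes 1G --timeout 60m &

# Repeat the same test under unfavorable conditions until it fails
for i in $(seq 1 20); do
        vng -v --run . --user root --cpus 8 --disable-kvm -- \
                make -C tools/testing/selftests TARGETS=net \
                        TEST_PROGS=pmtu.sh TEST_GEN_PROGS="" run_tests \
                || { echo "failed on iteration $i"; break; }
done
```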

If things are not working

  • Skip the -v option if you don't want to see kernel logs.
  • If QEMU fails to start, try using --disable-microvm.
  • Some tests try to write into the current directory, while virtme-ng gives read-only access to the host filesystem by default. You can either use OverlayFS with --overlay-rwdir tools/testing/selftests/net/, grant direct write access over 9p (can be slow if the test is IO-heavy) with --rwdir tools/testing/selftests/net/, or mount the whole host filesystem read-write with --rw (e.g. when launched from a container).
  • Some tests load kernel modules at run time, and the virtme environment uses the host's modprobe configuration. Local configuration, e.g. a module blacklist, can cause test failures. To start virtme with an empty modprobe configuration, use: mkdir modprobe.d; vng --rodir /etc/modprobe.d=modprobe.d #... other options