(2-3) qemu cross tests by daniel-noland · Pull Request #1558 · githedgehog/dataplane

daniel-noland · 2026-05-22T02:06:27Z

run cross-compiled tests under qemu-user!

A ci:+cross or ci:+cross/full label exercises the musl/gnu x aarch64/bluefield3 matrix end-to-end instead of stopping at "it linked".
Put the new qemu-user path and the existing Miri path behind a single emulated cfg, so timing-sensitive / filesystem-bound tests can declare "skip me under emulation" once instead of carrying parallel cfg_attr(miri, ...) and cfg_attr(qemu_user, ...) arms.

The cross CI job is advisory (job-level continue-on-error: true): its results post as a sticky PR comment via a small composite action, rather than turning the workflow red.

Auto-bump PRs ship the ci:+cross/full label so upstream regressions surface there too.

The goal: detect cross compile regressions more easily without blocking or slowing down CI in day-to-day operations. In the future we will likely want to make at least BF3 tests gating for PRs, but we are not quite there yet. For now I'll settle for getting a big warning when/if we, e.g., start depending on a crate which is x86_64 only.

Note

We run cross compiled tests in debug mode only! qemu-user isn't always able to emulate "higher" instructions for aarch64 (e.g. some NEON instructions) which can and will show up under the release build. I don't have a real solution to this problem so I'm going to ignore it and take what I can get 🤷

Components

scripts/test-runner.sh -- wired in via .cargo/config.toml's [target.<linux>] runners. Fast path: when the host arch matches the target and the host kernel is Linux, exec the test directly (zero qemu cost on native runs). Otherwise exec qemu-<arch>. Also forwards to .cargo-miri-wrapped when MIRI_SYSROOT is set, so the runner is a no-op under Miri.
nix/profiles.nix -- gains a host-arch parameter, threaded from default.nix. For test builds where arch != host-arch, the profile adds --cfg=emulated. --check-cfg=cfg(emulated) is always registered so cfg_attr(emulated, ...) sites don't trip unexpected_cfgs on native builds.
miri.just -- adds --cfg=emulated to RUSTFLAGS so the Miri path shares the umbrella.
.config/nextest.toml -- new cross-qemu nextest profile with slow-timeout = { period = "60s", terminate-after = 5 } and failure-output = immediate-final so a hung test in qemu does not hold the run indefinitely.
.github/workflows/dev.yml -- new cross job and an updated summary step. continue-on-error: true at the job level means needs.cross.result reports success to dependents regardless of outcome; summary therefore lists per-cell conclusions via the Actions API and only posts the sticky comment when at least one cell failed.
.github/actions/sticky-pr-comment/ -- composite action that create-or-updates a marker-keyed PR comment, with input validation and try/catch so a 5xx from the API does not fail the calling job.

Notes for reviewers

emulated deliberately conflates qemu-user with Miri. If a test wants "skip on Miri only", keep cfg_attr(miri, ignore); if it wants "skip on qemu only", check the target arch alongside. The umbrella exists for the common case (timing / fs / proc / UDS).
The cross job's matrix is max-parallel: 1 -- intentional while we learn the failure modes; revisit once the matrix is steady.
Cross failures on push / merge_group are not surfaced (sticky comments are PR-only); if main breakage tracking matters, we can add an issue or channel hook later.

Copilot

Pull request overview

This PR adds infrastructure to run (some) cross-arch test archives back on the build host under qemu-user, by introducing an emulated cfg used to downscale/ignore tests that are unreliable under emulation (and to share that behavior with Miri).

Changes:

Add a target runner wrapper (scripts/test-runner.sh) and configure Cargo to use it for Linux x86_64/aarch64 targets, falling back to qemu-user when host/target arch differs.
Introduce and propagate an emulated cfg in Nix and Miri paths, and update many tests to use #[cfg_attr(emulated, ...)] / cfg_select! { emulated => ... }.
Add CI plumbing for advisory cross failures, including a composite “sticky PR comment” action and workflow updates.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
scripts/test-runner.sh	New runner wrapper: native exec on Linux host/target match; otherwise runs via `qemu-<arch>`.
routing/src/router/rio.rs	Ignore UDS-binding tests when compiled with `cfg(emulated)`.
routing/src/rib/vrftable.rs	Disable `traced_test` under emulation to reduce overhead/noise.
routing/src/rib/nexthop.rs	Disable `traced_test` under emulation across multiple tests.
routing/src/frr/test.rs	Ignore fake-FRR UDS test under emulation; keep tracing only when not emulated.
routing/src/fib/test.rs	Treat emulation like Miri for reduced iteration counts and suppressed prints.
routing/src/config/mod.rs	Disable `traced_test` under emulation across config tests.
routing/src/atable/resolver.rs	Ignore resolver test under emulation due to `/proc`/kernel interface assumptions.
nix/profiles.nix	Add `host-arch` param; set `--cfg=emulated` for cross test archives and register cfg via `--check-cfg`.
net/src/packet/hash.rs	Reduce packet counts under emulation.
nat/src/stateless/test.rs	Disable `traced_test` under emulation for several tests.
nat/src/stateful/test.rs	Disable `traced_test` under emulation for multiple async tests.
nat/src/portfw/test.rs	Disable `traced_test` under emulation across port-forwarding tests.
nat/src/portfw/portfwtable/objects.rs	Disable `traced_test` under emulation for table object tests.
nat/src/portfw/portfwtable/access.rs	Disable `traced_test` under emulation for table access tests.
miri.just	Add shared `--cfg=emulated` for Miri runs (matching the qemu-user path).
mgmt/tests/reconcile.rs	Disable `traced_test` under emulation for mgmt integration tests.
mgmt/src/tests/mgmt.rs	Disable `traced_test` under emulation for mgmt unit test/tool.
left-right-tlcache/src/lib.rs	Reduce iteration/handle counts under emulation.
k8s-less/src/local.rs	Disable `traced_test` under emulation for ignored async test.
k8s-intf/src/bolero/support.rs	Reduce bolero iterations under emulation.
flow-filter/src/tests.rs	Disable `traced_test` under emulation across many flow-filter tests.
flow-entry/src/flow_table/table.rs	Disable traced_test-based tracing under emulation for timing-sensitive tests.
flow-entry/src/flow_table/nf_lookup.rs	Reduce packet counts under emulation.
dpdk/src/acl/config.rs	Ignore a validator test on aarch64 due to cross-bindgen macro inconsistency (documented TODO).
default.nix	Thread `host-arch` into profiles; add `qemu-user` to dev shell packages.
config/src/utils/collapse.rs	Reduce bolero generator size under emulation.
cli/src/cliproto.rs	Ignore Unix-socket communications test under emulation.
.gitignore	Ignore qemu-user core dumps (`qemu_*.core`).
.github/workflows/dev.yml	Add `ci:+cross/full` label support; add advisory cross-test run on musl; add sticky comment surfacing for cross failures and required permissions.
.github/workflows/bump.yml	Add `ci:+cross/full` label to automated bump PRs.
.github/actions/sticky-pr-comment/action.yml	New composite action to create/update a marker-keyed PR comment via github-script.
.config/nextest.toml	Add `cross-qemu` nextest profile with different failure output/termination behavior.
.cargo/config.toml	Register `emulated` cfg via `--check-cfg` and add per-target runners invoking `scripts/test-runner.sh`.

Copilot

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 2 comments.

Add the plumbing that lets cargo execute cross-compiled tests under qemu-user when the host architecture doesn't match the target: * `scripts/test-runner.sh` -- thin wrapper. When `MIRI_SYSROOT` is set we delegate to `.cargo-miri-wrapped` so a miri run on a matching host arch still goes through the miri interpreter instead of running natively under qemu. Otherwise: native exec when target == host, else `qemu-${target_machine}`. * `.cargo/config.toml` -- runner entries for the four cross triples we ship (x86_64 / aarch64 × gnu / musl). Explicit triples (not cfg-patterns) because cargo miri injects a `cfg(all())` runner and refuses to disambiguate between two cfg-pattern matches; the explicit form wins method-resolution-style and the script's `MIRI_SYSROOT` branch handles the miri case from inside. * `default.nix` -- `qemu-user` joins the dev shell so the wrapper has `qemu-x86_64` / `qemu-aarch64` on PATH locally and in CI. No CI behaviour change yet -- this just makes `cargo test --target <cross-triple>` work via emulation when run by hand. The CI step that actually exercises the path lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Add a `test` step to the cross job that runs the full nextest suite against the cross-compiled archive. Each cross binary dispatches through `scripts/test-runner.sh` (registered as the cargo runner in `.cargo/config.toml`) which delegates to `qemu-${target_machine}` when the host architecture doesn't match. Gates: * Only runs on `pull_request` runs with `ci:+cross` or `ci:+cross/full` on the labels. push / merge_group cross legs stay build-only; ISA emulation pays a real wall-clock cost and we don't want to slow the merge queue. * Gated to `matrix.recipe.args == 'dataplane'` so the test pass isn't duplicated for the `frr.dataplane` row, which differs from the `dataplane` row only in the container recipe, not in the cross target. `ci:+cross` (today's "run the cross job at all" label) implies "include the qemu test pass" -- there is currently no way to schedule cross/full builds without a label, so the same gate serves both modes. Splitting them out (e.g. opting bump.yml into cross/full automatically) is a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

The test hard-codes `3` as a "non-multiple of `RESULTS_MULTIPLIER`" which assumes the multiplier is 4 (x86_64). Under cross-compiled aarch64 builds the wrapper's `RESULTS_MULTIPLIER` constant binds to 1 -- the validator's "not-a-multiple" branch becomes unreachable because every positive integer is trivially a multiple of 1, so the validator returns `Ok` and the test panics on the `matches!` assert. Upstream DPDK defines `RTE_ACL_RESULTS_MULTIPLIER` as `XMM_SIZE / sizeof(uint32_t)`, which should be 4 on every supported ISA. Either bindgen on the cross sysroot doesn't see the right `XMM_SIZE` typedef on aarch64 or the cross headers ship a divergent `rte_vect.h`; needs a follow-up investigation. Skip on aarch64 with a TODO so the rest of the cross-aarch64 test surface can run green under qemu-user. This is the only known target-specific divergence in the dpdk binding; the rest of the sweep passes cleanly under musl+qemu. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Introduce a single `emulated` cfg as the umbrella over both emulation backends in the test surface: * `miri.just`: append `--cfg=emulated` to `RUSTFLAGS` so miri invocations set it. * `nix/profiles.nix`: set `--cfg=emulated` whenever `for-tests` is true and the target arch differs from the build host's (`is-emulated-test`). Today this covers cross-arch test builds that run under qemu-user on the lab runners. `--check-cfg=cfg (emulated)` is set unconditionally so unused branches don't trip `unexpected_cfgs`. * `default.nix`: plumb the build host's arch (`stdenv.hostPlatform.parsed.cpu.name`) through to `profiles.nix` as `host-arch`, so `is-emulated-test` compares target against actual host rather than hard-coded \`"x86_64"\`. * `.cargo/config.toml`: same `--check-cfg=cfg(emulated)` for the native dev build path. Sweep existing `cfg_attr(miri, ignore)` / `cfg(miri)` / `cfg_select! { miri => N }` sites that apply equally to qemu-user: * `routing/src/router/rio.rs`, `routing/src/frr/test.rs`, `routing/src/atable/resolver.rs`, `cli/src/cliproto.rs`: tests that bind Unix domain sockets / read kernel-state files now skip under any emulation backend, not just miri. qemu-user's epoll readiness emulation has the same gaps that justified the original miri skips. * `routing/src/fib/test.rs`, `net/src/packet/hash.rs`, `flow-entry/src/flow_table/nf_lookup.rs`, `left-right-tlcache/src/lib.rs`, `k8s-intf/src/bolero/support .rs`, `config/src/utils/collapse.rs`: per-arm iteration counts in `cfg_select!` are now keyed by `emulated` rather than `miri`. qemu-user runs at ~5-10x slowdown vs native and the original miri count (which targeted miri's much steeper slowdown) is still a sensible upper bound for qemu-user too. Pair this with two CI/runtime side changes: * `.config/nextest.toml`: new `cross-qemu` nextest profile sets a `slow-timeout = { period = "60s", terminate-after = 5 }` so a qemu hang gets killed and surfaces as a TIMEOUT in the report instead of wedging the whole run. `fail-fast` is permissive so we collect the full list of qemu-affected tests in one go. * `.github/workflows/dev.yml`: cross-job `test` step is restricted to `matrix.libc == 'musl'` because gnu's libgcc_s unwinder mis-handles qemu-user's emulated signal frames on aarch64 (every panic-unwind path turns into SIGABRT). musl's LLVM libunwind walks those frames correctly, so musl legs run the full suite clean; gnu legs stay build-only until/unless we fix the gnu unwinder story. * `.gitignore`: ignore `**/qemu_*.core` so the SIGABRT cores qemu drops when running gnu cross binaries locally don't show up in `git status`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

It is much too slow to print to stdout like this under qemu-user or miri. Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Two halves of one logical change: * `.github/workflows/dev.yml`: extend the cross job's `if:` to fire on `ci:+cross/full` in addition to `ci:+cross`, so a PR labeled only with `ci:+cross/full` (no `ci:+cross`) still gets cross. * `.github/workflows/bump.yml`: auto-apply `ci:+cross/full` to the weekly cargo-upgrades PR. The weekly cargo-upgrades PR is the right place to catch cross-arch regressions introduced by transitive dep churn (a crate dropping an aarch64 target, changing alignment, etc.). Without an opt-in label the cross job stays build-only on PR runs, so we add \`ci:+cross/full\` alongside the existing \`automated\` and \`dependencies\` labels. Today that label triggers the qemu-user test step on the existing 4-leg cross matrix; once the matrix scope split lands, the same label will expand the matrix to the full hardware x libc sweep without further changes here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

daniel-noland · 2026-05-23T02:55:43Z

see PR #1560 for an illustration of how the ci:+cross/full is now applied to bump jobs

Cross stays advisory (`continue-on-error: true`) on every PR, push, and merge_group run, so leg flakes never block merge. That's the right default for interactive PRs and the merge queue, but it makes genuine cross failures easy to miss on the weekly cargo-bump PR (auto-labeled `ci:+cross/full`), where catching upstream-driven aarch64 regressions is the whole point. Add a sticky-comment surfacing step: * New composite action `.github/actions/sticky-pr-comment` -- create-or-update a PR comment keyed by an HTML-comment marker (so subsequent runs find and update the same comment instead of spamming the thread). * New steps in the `summary` job that read the cross matrix's real per-leg conclusions from the Actions REST API (via `actions/github-script`) and, if any leg failed, post the sticky comment with a link back to the failing run. The summary job grows `pull-requests: write` (for the comment) and `actions: read` (for the API call). We can't gate on `needs.cross.result` here: at the job level, `continue-on-error: true` makes that always report `success` to dependents, which is exactly the property we want for merge gating but which also hides the failure from this step. Reading the underlying job results sidesteps the masking. The alternative -- making cross blocking when `ci:+cross/full` is present -- creates an attribution trap: a non-labeled PR that silently regresses cross would land on main, and the next cross/full-labeled bump PR would inherit the broken main and read as "the bump caused the regression." A sticky comment gives reviewers a loud in-PR signal without that misattribution risk and preserves the "leg-flake doesn't block merge" property uniformly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Signed-off-by: Daniel Noland <daniel@githedgehog.com>

daniel-noland added ci:+cross run cross compile jobs ci:+cross/full labels May 22, 2026

daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch 3 times, most recently from cfb126b to ca97529 Compare May 22, 2026 20:26

daniel-noland requested a review from Copilot May 22, 2026 20:31

Copilot started reviewing on behalf of daniel-noland May 22, 2026 20:32 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread scripts/test-runner.sh

Comment thread .github/workflows/dev.yml Outdated

daniel-noland requested a review from Copilot May 22, 2026 21:59

Copilot started reviewing on behalf of daniel-noland May 22, 2026 21:59 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread .cargo/config.toml

Comment thread nix/profiles.nix

daniel-noland self-assigned this May 22, 2026

daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from 7443aea to fae819d Compare May 22, 2026 23:17

daniel-noland changed the base branch from main to pr/daniel-noland/concurrency-facade May 22, 2026 23:18

daniel-noland force-pushed the pr/daniel-noland/concurrency-facade branch from 66c8acf to 70b51a0 Compare May 22, 2026 23:21

daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from fae819d to 340a7ac Compare May 22, 2026 23:22

daniel-noland force-pushed the pr/daniel-noland/concurrency-facade branch from 70b51a0 to 7c9c5f5 Compare May 22, 2026 23:22

daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from 340a7ac to 84519d3 Compare May 22, 2026 23:23

daniel-noland changed the title ~~(ignore for now) qemu cross tests~~ (2) qemu cross tests May 22, 2026

daniel-noland marked this pull request as ready for review May 22, 2026 23:25

daniel-noland requested a review from a team as a code owner May 22, 2026 23:25

daniel-noland requested review from sergeymatov and removed request for a team May 22, 2026 23:25

daniel-noland force-pushed the pr/daniel-noland/concurrency-facade branch from 7c9c5f5 to 93f9bb7 Compare May 23, 2026 01:06

daniel-noland and others added 6 commits May 22, 2026 19:21

test: disable trace under emulation

7ee09a9

It is much too slow to print to stdout like this under qemu-user or miri. Signed-off-by: Daniel Noland <daniel@githedgehog.com>

daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from 84519d3 to 5c63716 Compare May 23, 2026 01:22

daniel-noland mentioned this pull request May 23, 2026

(4) bump(cargo)!: 🚀 upgrades available #1560

Open

daniel-noland mentioned this pull request May 23, 2026

(4.1) ci: demonstrate ci:+cross/full fail message #1561

Open

daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from 5c63716 to a326c99 Compare May 23, 2026 03:42

daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from a326c99 to a909643 Compare May 23, 2026 04:23

bump(dplane-plugin): accept glibc-ism removal

a1c1033

Signed-off-by: Daniel Noland <daniel@githedgehog.com>

daniel-noland changed the title ~~(2) qemu cross tests~~ (2 + 3) qemu cross tests May 23, 2026

daniel-noland changed the title ~~(2 + 3) qemu cross tests~~ (2-3) qemu cross tests May 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(2-3) qemu cross tests#1558

(2-3) qemu cross tests#1558
daniel-noland wants to merge 8 commits into
pr/daniel-noland/concurrency-facadefrom
pr/daniel-noland/qemu-cross-tests

daniel-noland commented May 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

daniel-noland commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

daniel-noland commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

run cross-compiled tests under qemu-user!

Components

Notes for reviewers

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

daniel-noland commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

daniel-noland commented May 22, 2026 •

edited

Loading