Skip to content

(2-3) qemu cross tests#1558

Open
daniel-noland wants to merge 8 commits into
pr/daniel-noland/concurrency-facadefrom
pr/daniel-noland/qemu-cross-tests
Open

(2-3) qemu cross tests#1558
daniel-noland wants to merge 8 commits into
pr/daniel-noland/concurrency-facadefrom
pr/daniel-noland/qemu-cross-tests

Conversation

@daniel-noland
Copy link
Copy Markdown
Collaborator

@daniel-noland daniel-noland commented May 22, 2026

run cross-compiled tests under qemu-user!

  • A ci:+cross or ci:+cross/full label exercises the musl/gnu x aarch64/bluefield3 matrix end-to-end instead of stopping at "it linked".
  • Put the new qemu-user path and the existing Miri path behind a single emulated cfg, so timing-sensitive / filesystem-bound tests can declare "skip me under emulation" once instead of carrying parallel cfg_attr(miri, ...) and cfg_attr(qemu_user, ...) arms.

The cross CI job is advisory (job-level continue-on-error: true): its results post as a sticky PR comment via a small composite action, rather than turning the workflow red.

Auto-bump PRs ship the ci:+cross/full label so upstream regressions surface there too.

The goal: detect cross compile regressions more easily without blocking or slowing down CI in day-to-day operations. In the future we will likely want to make at least BF3 tests gating for PRs, but we are not quite there yet. For now I'll settle for getting a big warning when/if we, e.g., start depending on a crate which is x86_64 only.

Note

We run cross compiled tests in debug mode only! qemu-user isn't always able to emulate "higher" instructions for aarch64 (e.g. some NEON instructions) which can and will show up under the release build. I don't have a real solution to this problem so I'm going to ignore it and take what I can get 🤷

Components

  • scripts/test-runner.sh -- wired in via .cargo/config.toml's [target.<linux>] runners. Fast path: when the host arch matches the target and the host kernel is Linux, exec the test directly (zero qemu cost on native runs). Otherwise exec qemu-<arch>. Also forwards to .cargo-miri-wrapped when MIRI_SYSROOT is set, so the runner is a no-op under Miri.
  • nix/profiles.nix -- gains a host-arch parameter, threaded from default.nix. For test builds where arch != host-arch, the profile adds --cfg=emulated. --check-cfg=cfg(emulated) is always registered so cfg_attr(emulated, ...) sites don't trip unexpected_cfgs on native builds.
  • miri.just -- adds --cfg=emulated to RUSTFLAGS so the Miri path shares the umbrella.
  • .config/nextest.toml -- new cross-qemu nextest profile with slow-timeout = { period = "60s", terminate-after = 5 } and failure-output = immediate-final so a hung test in qemu does not hold the run indefinitely.
  • .github/workflows/dev.yml -- new cross job and an updated summary step. continue-on-error: true at the job level means needs.cross.result reports success to dependents regardless of outcome; summary therefore lists per-cell conclusions via the Actions API and only posts the sticky comment when at least one cell failed.
  • .github/actions/sticky-pr-comment/ -- composite action that create-or-updates a marker-keyed PR comment, with input validation and try/catch so a 5xx from the API does not fail the calling job.

Notes for reviewers

  • emulated deliberately conflates qemu-user with Miri. If a test wants "skip on Miri only", keep cfg_attr(miri, ignore); if it wants "skip on qemu only", check the target arch alongside. The umbrella exists for the common case (timing / fs / proc / UDS).
  • The cross job's matrix is max-parallel: 1 -- intentional while we learn the failure modes; revisit once the matrix is steady.
  • Cross failures on push / merge_group are not surfaced (sticky comments are PR-only); if main breakage tracking matters, we can add an issue or channel hook later.

@daniel-noland daniel-noland added ci:+cross run cross compile jobs ci:+cross/full labels May 22, 2026
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch 3 times, most recently from cfb126b to ca97529 Compare May 22, 2026 20:26
@daniel-noland daniel-noland requested a review from Copilot May 22, 2026 20:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds infrastructure to run (some) cross-arch test archives back on the build host under qemu-user, by introducing an emulated cfg used to downscale/ignore tests that are unreliable under emulation (and to share that behavior with Miri).

Changes:

  • Add a target runner wrapper (scripts/test-runner.sh) and configure Cargo to use it for Linux x86_64/aarch64 targets, falling back to qemu-user when host/target arch differs.
  • Introduce and propagate an emulated cfg in Nix and Miri paths, and update many tests to use #[cfg_attr(emulated, ...)] / cfg_select! { emulated => ... }.
  • Add CI plumbing for advisory cross failures, including a composite “sticky PR comment” action and workflow updates.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
scripts/test-runner.sh New runner wrapper: native exec on Linux host/target match; otherwise runs via qemu-<arch>.
routing/src/router/rio.rs Ignore UDS-binding tests when compiled with cfg(emulated).
routing/src/rib/vrftable.rs Disable traced_test under emulation to reduce overhead/noise.
routing/src/rib/nexthop.rs Disable traced_test under emulation across multiple tests.
routing/src/frr/test.rs Ignore fake-FRR UDS test under emulation; keep tracing only when not emulated.
routing/src/fib/test.rs Treat emulation like Miri for reduced iteration counts and suppressed prints.
routing/src/config/mod.rs Disable traced_test under emulation across config tests.
routing/src/atable/resolver.rs Ignore resolver test under emulation due to /proc/kernel interface assumptions.
nix/profiles.nix Add host-arch param; set --cfg=emulated for cross test archives and register cfg via --check-cfg.
net/src/packet/hash.rs Reduce packet counts under emulation.
nat/src/stateless/test.rs Disable traced_test under emulation for several tests.
nat/src/stateful/test.rs Disable traced_test under emulation for multiple async tests.
nat/src/portfw/test.rs Disable traced_test under emulation across port-forwarding tests.
nat/src/portfw/portfwtable/objects.rs Disable traced_test under emulation for table object tests.
nat/src/portfw/portfwtable/access.rs Disable traced_test under emulation for table access tests.
miri.just Add shared --cfg=emulated for Miri runs (matching the qemu-user path).
mgmt/tests/reconcile.rs Disable traced_test under emulation for mgmt integration tests.
mgmt/src/tests/mgmt.rs Disable traced_test under emulation for mgmt unit test/tool.
left-right-tlcache/src/lib.rs Reduce iteration/handle counts under emulation.
k8s-less/src/local.rs Disable traced_test under emulation for ignored async test.
k8s-intf/src/bolero/support.rs Reduce bolero iterations under emulation.
flow-filter/src/tests.rs Disable traced_test under emulation across many flow-filter tests.
flow-entry/src/flow_table/table.rs Disable traced_test-based tracing under emulation for timing-sensitive tests.
flow-entry/src/flow_table/nf_lookup.rs Reduce packet counts under emulation.
dpdk/src/acl/config.rs Ignore a validator test on aarch64 due to cross-bindgen macro inconsistency (documented TODO).
default.nix Thread host-arch into profiles; add qemu-user to dev shell packages.
config/src/utils/collapse.rs Reduce bolero generator size under emulation.
cli/src/cliproto.rs Ignore Unix-socket communications test under emulation.
.gitignore Ignore qemu-user core dumps (qemu_*.core).
.github/workflows/dev.yml Add ci:+cross/full label support; add advisory cross-test run on musl; add sticky comment surfacing for cross failures and required permissions.
.github/workflows/bump.yml Add ci:+cross/full label to automated bump PRs.
.github/actions/sticky-pr-comment/action.yml New composite action to create/update a marker-keyed PR comment via github-script.
.config/nextest.toml Add cross-qemu nextest profile with different failure output/termination behavior.
.cargo/config.toml Register emulated cfg via --check-cfg and add per-target runners invoking scripts/test-runner.sh.

Comment thread scripts/test-runner.sh
Comment thread .github/workflows/dev.yml Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 2 comments.

Comment thread .cargo/config.toml
Comment thread nix/profiles.nix
@daniel-noland daniel-noland self-assigned this May 22, 2026
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from 7443aea to fae819d Compare May 22, 2026 23:17
@daniel-noland daniel-noland changed the base branch from main to pr/daniel-noland/concurrency-facade May 22, 2026 23:18
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/concurrency-facade branch from 66c8acf to 70b51a0 Compare May 22, 2026 23:21
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from fae819d to 340a7ac Compare May 22, 2026 23:22
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/concurrency-facade branch from 70b51a0 to 7c9c5f5 Compare May 22, 2026 23:22
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from 340a7ac to 84519d3 Compare May 22, 2026 23:23
@daniel-noland daniel-noland changed the title (ignore for now) qemu cross tests (2) qemu cross tests May 22, 2026
@daniel-noland daniel-noland marked this pull request as ready for review May 22, 2026 23:25
@daniel-noland daniel-noland requested a review from a team as a code owner May 22, 2026 23:25
@daniel-noland daniel-noland requested review from sergeymatov and removed request for a team May 22, 2026 23:25
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/concurrency-facade branch from 7c9c5f5 to 93f9bb7 Compare May 23, 2026 01:06
daniel-noland and others added 6 commits May 22, 2026 19:21
Add the plumbing that lets cargo execute cross-compiled tests under
qemu-user when the host architecture doesn't match the target:

  * `scripts/test-runner.sh` -- thin wrapper.  When `MIRI_SYSROOT` is
    set we delegate to `.cargo-miri-wrapped` so a miri run on a
    matching host arch still goes through the miri interpreter
    instead of running natively under qemu.  Otherwise: native exec
    when target == host, else `qemu-${target_machine}`.
  * `.cargo/config.toml` -- runner entries for the four cross triples
    we ship (x86_64 / aarch64 × gnu / musl).  Explicit triples (not
    cfg-patterns) because cargo miri injects a `cfg(all())` runner
    and refuses to disambiguate between two cfg-pattern matches; the
    explicit form wins method-resolution-style and the script's
    `MIRI_SYSROOT` branch handles the miri case from inside.
  * `default.nix` -- `qemu-user` joins the dev shell so the wrapper
    has `qemu-x86_64` / `qemu-aarch64` on PATH locally and in CI.

No CI behaviour change yet -- this just makes `cargo test --target
<cross-triple>` work via emulation when run by hand.  The CI step
that actually exercises the path lands in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Add a `test` step to the cross job that runs the full nextest suite
against the cross-compiled archive.  Each cross binary dispatches
through `scripts/test-runner.sh` (registered as the cargo runner in
`.cargo/config.toml`) which delegates to `qemu-${target_machine}`
when the host architecture doesn't match.

Gates:

  * Only runs on `pull_request` runs with `ci:+cross` or
    `ci:+cross/full` on the labels.  push / merge_group cross legs
    stay build-only; ISA emulation pays a real wall-clock cost and
    we don't want to slow the merge queue.
  * Gated to `matrix.recipe.args == 'dataplane'` so the test pass
    isn't duplicated for the `frr.dataplane` row, which differs from
    the `dataplane` row only in the container recipe, not in the
    cross target.

`ci:+cross` (today's "run the cross job at all" label) implies
"include the qemu test pass" -- there is currently no way to
schedule cross/full builds without a label, so the same gate
serves both modes.  Splitting them out (e.g. opting bump.yml into
cross/full automatically) is a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
The test hard-codes `3` as a "non-multiple of `RESULTS_MULTIPLIER`"
which assumes the multiplier is 4 (x86_64).  Under cross-compiled
aarch64 builds the wrapper's `RESULTS_MULTIPLIER` constant binds to
1 -- the validator's "not-a-multiple" branch becomes unreachable
because every positive integer is trivially a multiple of 1, so the
validator returns `Ok` and the test panics on the `matches!` assert.

Upstream DPDK defines `RTE_ACL_RESULTS_MULTIPLIER` as
`XMM_SIZE / sizeof(uint32_t)`, which should be 4 on every supported
ISA.  Either bindgen on the cross sysroot doesn't see the right
`XMM_SIZE` typedef on aarch64 or the cross headers ship a divergent
`rte_vect.h`; needs a follow-up investigation.

Skip on aarch64 with a TODO so the rest of the cross-aarch64 test
surface can run green under qemu-user.  This is the only known
target-specific divergence in the dpdk binding; the rest of the
sweep passes cleanly under musl+qemu.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Introduce a single `emulated` cfg as the umbrella over both
emulation backends in the test surface:

  * `miri.just`: append `--cfg=emulated` to `RUSTFLAGS` so miri
    invocations set it.
  * `nix/profiles.nix`: set `--cfg=emulated` whenever `for-tests` is
    true and the target arch differs from the build host's
    (`is-emulated-test`).  Today this covers cross-arch test builds
    that run under qemu-user on the lab runners.  `--check-cfg=cfg
    (emulated)` is set unconditionally so unused branches don't
    trip `unexpected_cfgs`.
  * `default.nix`: plumb the build host's arch
    (`stdenv.hostPlatform.parsed.cpu.name`) through to
    `profiles.nix` as `host-arch`, so `is-emulated-test` compares
    target against actual host rather than hard-coded \`"x86_64"\`.
  * `.cargo/config.toml`: same `--check-cfg=cfg(emulated)` for the
    native dev build path.

Sweep existing `cfg_attr(miri, ignore)` / `cfg(miri)` / `cfg_select!
{ miri => N }` sites that apply equally to qemu-user:

  * `routing/src/router/rio.rs`, `routing/src/frr/test.rs`,
    `routing/src/atable/resolver.rs`, `cli/src/cliproto.rs`: tests
    that bind Unix domain sockets / read kernel-state files now
    skip under any emulation backend, not just miri.  qemu-user's
    epoll readiness emulation has the same gaps that justified the
    original miri skips.
  * `routing/src/fib/test.rs`, `net/src/packet/hash.rs`,
    `flow-entry/src/flow_table/nf_lookup.rs`,
    `left-right-tlcache/src/lib.rs`, `k8s-intf/src/bolero/support
    .rs`, `config/src/utils/collapse.rs`: per-arm iteration counts
    in `cfg_select!` are now keyed by `emulated` rather than `miri`.
    qemu-user runs at ~5-10x slowdown vs native and the original
    miri count (which targeted miri's much steeper slowdown) is
    still a sensible upper bound for qemu-user too.

Pair this with two CI/runtime side changes:

  * `.config/nextest.toml`: new `cross-qemu` nextest profile sets a
    `slow-timeout = { period = "60s", terminate-after = 5 }` so a
    qemu hang gets killed and surfaces as a TIMEOUT in the report
    instead of wedging the whole run.  `fail-fast` is permissive
    so we collect the full list of qemu-affected tests in one go.
  * `.github/workflows/dev.yml`: cross-job `test` step is
    restricted to `matrix.libc == 'musl'` because gnu's libgcc_s
    unwinder mis-handles qemu-user's emulated signal frames on
    aarch64 (every panic-unwind path turns into SIGABRT).  musl's
    LLVM libunwind walks those frames correctly, so musl legs run
    the full suite clean; gnu legs stay build-only until/unless we
    fix the gnu unwinder story.
  * `.gitignore`: ignore `**/qemu_*.core` so the SIGABRT cores
    qemu drops when running gnu cross binaries locally don't show
    up in `git status`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
It is much too slow to print to stdout like this under
qemu-user or miri.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Two halves of one logical change:

  * `.github/workflows/dev.yml`: extend the cross job's `if:` to fire
    on `ci:+cross/full` in addition to `ci:+cross`, so a PR labeled
    only with `ci:+cross/full` (no `ci:+cross`) still gets cross.
  * `.github/workflows/bump.yml`: auto-apply `ci:+cross/full` to the
    weekly cargo-upgrades PR.

The weekly cargo-upgrades PR is the right place to catch cross-arch
regressions introduced by transitive dep churn (a crate dropping an
aarch64 target, changing alignment, etc.).  Without an opt-in label
the cross job stays build-only on PR runs, so we add
\`ci:+cross/full\` alongside the existing \`automated\` and
\`dependencies\` labels.  Today that label triggers the qemu-user
test step on the existing 4-leg cross matrix; once the matrix
scope split lands, the same label will expand the matrix to the
full hardware x libc sweep without further changes here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland
Copy link
Copy Markdown
Collaborator Author

see PR #1560 for an illustration of how the ci:+cross/full is now applied to bump jobs

Cross stays advisory (`continue-on-error: true`) on every PR, push,
and merge_group run, so leg flakes never block merge.  That's the
right default for interactive PRs and the merge queue, but it makes
genuine cross failures easy to miss on the weekly cargo-bump PR
(auto-labeled `ci:+cross/full`), where catching upstream-driven
aarch64 regressions is the whole point.

Add a sticky-comment surfacing step:

  * New composite action `.github/actions/sticky-pr-comment` --
    create-or-update a PR comment keyed by an HTML-comment marker
    (so subsequent runs find and update the same comment instead of
    spamming the thread).

  * New steps in the `summary` job that read the cross matrix's
    real per-leg conclusions from the Actions REST API (via
    `actions/github-script`) and, if any leg failed, post the
    sticky comment with a link back to the failing run.  The
    summary job grows `pull-requests: write` (for the comment) and
    `actions: read` (for the API call).

    We can't gate on `needs.cross.result` here: at the job level,
    `continue-on-error: true` makes that always report `success` to
    dependents, which is exactly the property we want for merge
    gating but which also hides the failure from this step.  Reading
    the underlying job results sidesteps the masking.

The alternative -- making cross blocking when `ci:+cross/full` is
present -- creates an attribution trap: a non-labeled PR that
silently regresses cross would land on main, and the next
cross/full-labeled bump PR would inherit the broken main and read
as "the bump caused the regression."  A sticky comment gives
reviewers a loud in-PR signal without that misattribution risk and
preserves the "leg-flake doesn't block merge" property uniformly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/qemu-cross-tests branch from a326c99 to a909643 Compare May 23, 2026 04:23
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland daniel-noland changed the title (2) qemu cross tests (2 + 3) qemu cross tests May 23, 2026
@daniel-noland daniel-noland changed the title (2 + 3) qemu cross tests (2-3) qemu cross tests May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:+cross/full ci:+cross run cross compile jobs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants