(4.1) ci: demonstrate ci:+cross/full fail message #1561

Open
daniel-noland wants to merge 22 commits into bump/cargo-upgrades from pr/daniel-noland/intentional-regression

Conversation

@daniel-noland
Collaborator

This is NOT to be merged. The one and only point of this PR is to demonstrate the message which displays when we regress the aarch64 build or tests.

This is intended as a review/discussion tool for #1558

daniel-noland and others added 19 commits May 22, 2026 17:22
Sweep direct `use std::sync::{Arc, Mutex, RwLock, atomic::*}` imports
across the workspace to `concurrency::sync` so loom/shuttle test builds
can route through instrumented primitives via one feature flip.

Two enforcement layers:

  * `clippy.toml` extends `disallowed-types` for the lock primitives.
    parking_lot's lock types are distinct concrete types, so clippy
    sees through the `concurrency::sync` re-export without flagging
    legitimate uses.
  * `.semgrep/rules/no-std-sync-direct.yaml` covers the rest (`Arc`,
    `Weak`, atomics, `LazyLock`, `OnceLock`, `Once`, `Barrier`,
    `Condvar`) where clippy's alias resolution can't distinguish the
    facade re-export from `std::sync`. The `concurrency` crate and its
    tests are exempt by path.

`mgmt/tests/reconcile.rs` keeps a direct `std::sync::Mutex` because
bolero's `catch_unwind` needs `RefUnwindSafe`, which parking_lot's
`Mutex` doesn't impl. Documented inline with `clippy::disallowed_types`
allow + `nosemgrep` annotation.
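
The `RefUnwindSafe` point can be reproduced with std alone. The following is a minimal sketch, not code from the repository: the helper names are made up for illustration, and it only shows that a borrowed `std::sync::Mutex` is accepted by `catch_unwind`.

```rust
use std::panic::{self, RefUnwindSafe};
use std::sync::Mutex;

// Compiles only if `T` may be borrowed across a `catch_unwind` boundary.
fn assert_ref_unwind_safe<T: RefUnwindSafe>() {}

/// Illustrative helper: panics while holding the lock inside
/// `catch_unwind`, then reports whether the panic was caught and the
/// mutex ended up poisoned.
pub fn panic_is_caught_and_poisons() -> bool {
    // `std::sync::Mutex<T>` implements `RefUnwindSafe`.
    assert_ref_unwind_safe::<Mutex<u32>>();

    let m = Mutex::new(0u32);
    // The closure borrows `m`; that borrow is what needs `RefUnwindSafe`.
    let result: Result<(), _> = panic::catch_unwind(|| {
        let _guard = m.lock().unwrap();
        panic!("deliberate panic while holding the lock");
    });
    result.is_err() && m.is_poisoned()
}
```

A lock type without that impl would be rejected at the `catch_unwind` call unless wrapped in `AssertUnwindSafe`.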

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
`std::thread::Builder::spawn_scoped` is inherent on std but missing on
the loom and shuttle Builders; both ship `Scope::spawn` instead. Add a
`concurrency::thread::BuilderExt` trait with one method:

  * std: forwards to the inherent `Builder::spawn_scoped` via
    fully-qualified call (Rust's method resolution prefers the
    inherent, so the trait impl is dead but kept for symmetry).
  * shuttle / loom: discards advisory Builder config, delegates to
    `Scope::spawn`, wraps the infallible return in `Ok` to match
    std's `io::Result` signature.

`use concurrency::thread::BuilderExt;` lets call sites write
`builder.spawn_scoped(scope, f)` under every backend. Used by the
kernel driver's named scoped threads in a follow-up.
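
A rough sketch of the shape such a trait could take, using std only. `ModelBuilder` is a hypothetical stand-in for a backend builder that lacks `spawn_scoped`, and `sum_with` is an illustrative caller; neither is taken from the `concurrency` crate, whose real signatures may differ.

```rust
use std::io;
use std::thread::{self, Builder, Scope, ScopedJoinHandle};

/// Sketch of an extension trait in the spirit of `BuilderExt`.
pub trait BuilderExt {
    fn spawn_scoped<'scope, 'env, F, T>(
        self,
        scope: &'scope Scope<'scope, 'env>,
        f: F,
    ) -> io::Result<ScopedJoinHandle<'scope, T>>
    where
        F: FnOnce() -> T + Send + 'scope,
        T: Send + 'scope;
}

/// std arm: forward to the inherent method.
impl BuilderExt for Builder {
    fn spawn_scoped<'scope, 'env, F, T>(
        self,
        scope: &'scope Scope<'scope, 'env>,
        f: F,
    ) -> io::Result<ScopedJoinHandle<'scope, T>>
    where
        F: FnOnce() -> T + Send + 'scope,
        T: Send + 'scope,
    {
        // A path call on the type picks the inherent method over the trait one.
        Builder::spawn_scoped(self, scope, f)
    }
}

/// Hypothetical backend builder that has no `spawn_scoped` of its own.
pub struct ModelBuilder {
    pub name: Option<String>,
}

impl BuilderExt for ModelBuilder {
    fn spawn_scoped<'scope, 'env, F, T>(
        self,
        scope: &'scope Scope<'scope, 'env>,
        f: F,
    ) -> io::Result<ScopedJoinHandle<'scope, T>>
    where
        F: FnOnce() -> T + Send + 'scope,
        T: Send + 'scope,
    {
        let _advisory = self.name; // builder config is dropped in this arm
        Ok(scope.spawn(f)) // infallible spawn wrapped to match `io::Result`
    }
}

/// Illustrative call site that works with either builder.
pub fn sum_with<B: BuilderExt>(builder: B, data: &[i32]) -> i32 {
    thread::scope(|s| {
        builder
            .spawn_scoped(s, || data.iter().sum::<i32>())
            .expect("spawn failed")
            .join()
            .expect("worker panicked")
    })
}
```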

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Collapse the chained `shuttle_dfs -> shuttle_pct -> shuttle` features
into a single `shuttle` feature backed by `shuttle::PortfolioRunner`.
The runner drives `RandomScheduler` and `PctScheduler` in parallel;
any scheduler finding a counterexample fails the test
(`stop_on_first_failure = true`). `shuttle_dfs` becomes an additive
opt-in that adds `DfsScheduler` to the same portfolio.

`stress.rs` now has one shuttle arm instead of three, and
`#[concurrency::test]` emits one leaf per backend (`loom` / `shuttle`)
instead of three shuttle variants. Workspace consumers (`nat`,
`flow-entry`) and CI (`dev.yml`) drop the `shuttle_pct` step; the
existing `shuttle` step covers Random + PCT in one pass.

Tests previously gated `not(feature = "shuttle_pct")` to opt out of
single-threaded bodies that PCT panics on are rewritten to
`not(feature = "shuttle")` since PCT now runs in every shuttle build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Tier A of the std::thread sweep that complements the `concurrency::sync`
facade migration. Swap `use std::thread` to `use concurrency::thread`
in two test modules whose tests are candidates for
`#[concurrency::test]` conversion:

  * `routing/src/fib/test.rs` -- prerequisite for the FIB race-test
    conversion later in the stack.
  * `dpdk/src/acl/mod.rs::classify_concurrent_arc_shared` -- import
    swap only; the test runs under real DPDK EAL so the macro
    conversion is deferred.

Production threading sites (`dpdk/src/lcore.rs`, `mgmt/src/processor/
launch.rs`, `routing/src/router/rio.rs`, `dataplane/src/statistics`,
`test-utils`) are left alone -- they need real OS threads and never
compile under loom/shuttle. A later sweep adds clippy/semgrep
enforcement once production sites are also routed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Tier B of the std::thread sweep. `ThreadPortMap` keys a per-thread
`RwLock<HashMap<ThreadId, _>>` by `std::thread::current().id()`. Each
backend ships its own `ThreadId`, so a std-typed map would silently
work in production while loom/shuttle key the table by their own
thread identity. Route the import and call sites through
`concurrency::thread` so the key tracks the active backend.
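
To illustrate the keying, here is a small sketch with an assumed layout. The `backend` module stands in for `concurrency::thread` and simply re-exports std here; the `u16` value type and the `set`/`get` methods are invented for the example and are not the real `ThreadPortMap` API.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Stand-in for the facade; a model-checker build would re-export the
// backend's own `current` and `ThreadId` instead.
mod backend {
    pub use std::thread::{current, ThreadId};
}

/// Illustrative per-thread table keyed by the active backend's thread id.
pub struct ThreadPortMap {
    ports: RwLock<HashMap<backend::ThreadId, u16>>,
}

impl ThreadPortMap {
    pub fn new() -> Self {
        Self { ports: RwLock::new(HashMap::new()) }
    }

    /// Records a port for the calling thread.
    pub fn set(&self, port: u16) {
        self.ports.write().unwrap().insert(backend::current().id(), port);
    }

    /// Looks up the calling thread's port, if any.
    pub fn get(&self) -> Option<u16> {
        self.ports.read().unwrap().get(&backend::current().id()).copied()
    }
}
```

Because both the key type and the `current()` call come from the same module, swapping the re-export changes them together.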

No behavioural change under the default backend. Prerequisite for any
future loom/shuttle exercise of the NAT allocator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Replace the hand-rolled `shuttle::check_random(... 100)` wrappers in
`shuttle_tests` with `#[concurrency::test]`, which routes bodies
through `concurrency::stress` (loom `model`, shuttle
`PortfolioRunner`).

The module is renamed `concurrency_tests` and gated to `cfg(any(feature
= "shuttle", feature = "loom"))`. `FlowTable::insert` spawns a tokio
task for the flow timer, which would panic without a running runtime;
the existing `start_timer` bypass under shuttle is extended to loom.
Tokio-driven coverage of `insert` stays in `std_tests`.

`test_flow_table_timeout` is dropped from the model-checker mod: it's
single-threaded (PCT rejects), and `std_tests` already has the
authoritative `#[tokio::test(start_paused = true)]` version.

Adds `loom = ["concurrency/loom"]` to `flow-entry/Cargo.toml` so the
macro-emitted cfg arm resolves to a known feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Move `test_fib_removals` and `test_leftright_destroy_race_simple` out
of the `#[concurrency_mode(std)]` block into a sibling
`concurrency_tests` module that runs through `#[concurrency::test]`:
default backend smoke run, `loom::model` under loom, shuttle's
PortfolioRunner under shuttle.

The heavy fuzz loops (`test_concurrency_fib` /
`test_concurrency_fibtable`) stay on std -- their 100k+ packet
iteration counts are TSAN-calibrated, not for per-iteration
model-checking cost.

Iteration counts are tuned per backend via `cfg_select!`: 5 rounds
under loom/shuttle (vs 1000 on std), with a fixed reader/worker budget
under the model checkers so unbounded poll loops don't trip shuttle's
`max_steps` ceiling. `test_packet` is inlined because the original
lives in the std-gated `mod tests` and is invisible under
loom/shuttle.

Add `loom`, `shuttle`, `shuttle_dfs` features to `routing/Cargo.toml`
so the macro-emitted cfg arms resolve to known features.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Bump `concurrency::stress`'s shuttle `Config::stack_size` from the
default 32 KiB to 4 MiB. Shuttle wraps each atomic/lock primitive
with bookkeeping that pushes per-instance size into the 100-byte
range (an `AtomicBool` is ~100 bytes under shuttle), so any
non-trivial atomic-heavy body blows through the default. The
historical workaround was per-call `shuttle::Config` overrides at
1 MiB (notably in NAT's allocator tests).

One number in the dispatcher kills the per-test knob. 4 MiB carries
the heaviest workspace consumer (NAT allocator's per-block atomic
arrays) with headroom; the cost is `N workers * 4 MiB` per stress
iteration, well below CI memory pressure.

`shuttle::Config` is `#[non_exhaustive]`, so written as a mutation
of `Config::default()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Now that `concurrency::stress` carries a 4 MiB shuttle stack and the
PortfolioRunner runs Random + PCT in parallel, the custom
`shuttle_config` / `run_shuttle_random` / `run_shuttle_pct`
scaffolding in `tests_shuttle` is redundant. Replace with a single
`mod concurrency_tests` that flips each test to `#[concurrency::test]`:

  * `test_concurrent_allocations_two_ips` (was
    `..._without_shuttle`) -- two threads against distinct source IPs;
    smoke run on default backend, full coverage on
    `--features shuttle`.
  * `test_concurrent_allocations_three_workers` (was
    `..._shuttle_random` + `_pct`, collapsed) -- portfolio runs both
    schedulers in one invocation.
  * `test_ensure_shuttle_works` -- gated to model-checker backends
    only; the deliberate race only reaches the failing schedule under
    a real scheduler, the default backend's one-shot run is
    non-deterministic.

Drops the helpers plus the `Arc` / `thread` imports in `std_tests`
that they pulled in. Adds `loom = ["concurrency/loom"]` to
`nat/Cargo.toml` so the macro-emitted cfg arm resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Make `just features=loom test` build and run end-to-end across the
workspace. Earlier commits had loom working only on the `concurrency`
crate; everything else failed to compile or crashed at runtime with
stack overflow, Arc leak, or DashMap destructor panics. Bundled into
one commit because each fix was discovered by running the previous
one.

## concurrency

  * `fn sleep(_: Duration)` shim under loom (loom 0.7 doesn't model
    time; yields to the scheduler so the call still acts as a
    schedule point).
  * Enumerate `loom::thread` re-exports and shadow `spawn` with a
    4 MiB default stack. Loom 0.7's default coroutine stack is 4 KiB,
    which overflows trivially under atomic-heavy `concurrency::sync`
    types.
  * `stress` under loom wraps the body in a 4 MiB
    `Builder::spawn` so the main 4 KiB coroutine just spawns and
    joins. Costs one of loom's five thread slots.
  * `loom_scope::Scope::spawn` routes through `super::spawn` for the
    same default.
  * `Slot::load()` / `SlotOption::load()` helpers used by
    `common::cliprovider` to drop a redundant `load_full()` clone.
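
The "spawn on a larger stack, then join" idea can be sketched with std threads. This is an analogy only: it uses `std::thread::Builder`, not loom's builder, and the function names are illustrative.

```rust
use std::thread;

/// Stack size quoted in the commit message above.
pub const STACK_SIZE: usize = 4 * 1024 * 1024;

/// Runs `body` on a dedicated thread with a 4 MiB stack and joins it, so
/// the caller's own stack only has to hold the spawn and the join.
pub fn run_on_large_stack<F, T>(body: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(body)
        .expect("failed to spawn thread")
        .join()
        .expect("body panicked")
}

/// A frame-hungry body: keeps a 1 MiB buffer on the stack.
pub fn stack_heavy() -> usize {
    let buf = [1u8; 1024 * 1024];
    // `black_box` keeps the buffer from being optimized away.
    std::hint::black_box(&buf).iter().map(|&b| b as usize).sum()
}
```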

## nat

  * `cfg_attr`-gate `#![feature(arbitrary_self_types)]`: loom wraps
    `Arc<T>` in a facade newtype that isn't a blessed self-receiver,
    so `self: Arc<Self>` methods on `AllocatedIp` /
    `AllocatedPortBlock` need the unstable feature there.
  * `#[concurrency_mode(loom)]` no-op `shuffle_slice` (loom needs
    determinism for replay; shuffle is allocation-order heuristic,
    not correctness).
  * Gate `concurrency_tests` off loom: the facade's `Weak` shim holds
    a strong clone of the `Arc`, so the allocator's
    `Weak::upgrade().is_none()` liveness signal never fires and
    loom's `Arc leaked` assertion catches it.
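
Why a strong-backed `Weak` defeats the liveness check can be shown with a toy shim. `StrongBackedWeak` below is a hypothetical reduction for illustration, not the facade's actual type.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical shim: a "weak" handle that secretly holds a strong clone.
pub struct StrongBackedWeak<T>(Arc<T>);

impl<T> StrongBackedWeak<T> {
    pub fn downgrade(strong: &Arc<T>) -> Self {
        Self(Arc::clone(strong))
    }

    /// Always succeeds, because the shim itself keeps the value alive.
    pub fn upgrade(&self) -> Option<Arc<T>> {
        Some(Arc::clone(&self.0))
    }
}

/// With a real `Weak`, dropping the last `Arc` makes `upgrade` return `None`.
pub fn real_weak_observes_drop() -> bool {
    let strong = Arc::new(1u32);
    let weak: Weak<u32> = Arc::downgrade(&strong);
    drop(strong);
    weak.upgrade().is_none()
}

/// With the shim, the same sequence never reports the value as gone.
pub fn shim_observes_drop() -> bool {
    let strong = Arc::new(1u32);
    let weak = StrongBackedWeak::downgrade(&strong);
    drop(strong);
    weak.upgrade().is_none()
}
```

The first function returns `true` and the second `false`, which is the "liveness signal never fires" behaviour described above.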

## flow-entry

  * Gate `concurrency_tests` to shuttle only: `FlowTable`'s internal
    `DashMap` panics in loom's end-of-execution cleanup (sharded
    `RwLock`s don't fit loom's strict lifecycle accounting).

## routing

  * Gate `fib::test::concurrency_tests` off loom: the `left_right`
    epoch state space is too large for exhaustive search to terminate
    in reasonable time.

## dataplane

  * The binary builds an `Arc<dyn Fn(...) ...>` trait-object closure
    in `packet_processor::setup_internal`, which needs `CoerceUnsized`
    on the concrete `Arc`. Loom 0.7's `Arc` doesn't carry that trait,
    and the facade newtype can't add it. Gate the bin out of loom
    builds: extract the body to `dataplane/src/runtime.rs` and leave
    `main.rs` with a stub `main` under loom that panics if invoked.
    Library crates still get loom coverage through feature
    propagation.
  * `dataplane/src/drivers/dpdk.rs` switches `use crate::CmdArgs` to
    `use args::CmdArgs;` to follow the new module layout.

CI's loom step in `.github/workflows/dev.yml` still scopes to
`--package=dataplane-concurrency`, even though the loom-incompatible
tests across the workspace are now cfg-gated rather than
package-filtered; the package scope is no longer load-bearing for the
test invocation, only for cargo's feature unification.

Default and `--features shuttle` builds unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Leftover from the superseded dis-guard exploration; nothing references
it. Public-API drift in `concurrency::slot` for no benefit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
The import-only enforcement let `static X: std::sync::LazyLock<T> =
std::sync::LazyLock::new(...)` slip through, and one such site already
existed in `config/src/external/overlay/vpcpeering.rs`. Extend the
rule with a multi-line regex backstop that matches the facade-managed
type names in any expression position, with a leading-comment
lookahead so rustdoc intra-doc links (`/// [std::sync::Arc]`) don't
false-positive. Also expand the grouped-import regex to span multiple
lines.

Convert the offending FQN to `concurrency::sync::LazyLock`. Move the
deliberate `mgmt/tests/reconcile.rs` `nosemgrep:` onto the same line
as the `std::sync::Mutex::new` it suppresses, and annotate the
intentional `std::sync::Arc` in `concurrency::stress` (shared *across*
`loom::model` invocations, so it must remain a std Arc).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
The previous loom step ran `just test concurrency` only, which left
the workspace-wide loom compile (the whole point of the facade's
local `Weak<T>` shim and `Arc::downgrade`) unprotected. Add a
`cargo check`-equivalent step ahead of the test run so a regression
in any consumer crate fails CI directly. Tests stay scoped to the
concurrency crate -- model-checking the whole workspace under loom
is intractable. Update the inline comment to reflect the new reality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Add the plumbing that lets cargo execute cross-compiled tests under
qemu-user when the host architecture doesn't match the target:

  * `scripts/test-runner.sh` -- thin wrapper.  When `MIRI_SYSROOT` is
    set we delegate to `.cargo-miri-wrapped` so a miri run on a
    matching host arch still goes through the miri interpreter
    instead of running natively under qemu.  Otherwise: native exec
    when target == host, else `qemu-${target_machine}`.
  * `.cargo/config.toml` -- runner entries for the four cross triples
    we ship (x86_64 / aarch64 × gnu / musl).  Explicit triples (not
    cfg-patterns) because cargo miri injects a `cfg(all())` runner
    and refuses to disambiguate between two cfg-pattern matches; the
    explicit form wins method-resolution-style and the script's
    `MIRI_SYSROOT` branch handles the miri case from inside.
  * `default.nix` -- `qemu-user` joins the dev shell so the wrapper
    has `qemu-x86_64` / `qemu-aarch64` on PATH locally and in CI.
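
The wrapper's decision order, as described in the first bullet, can be modelled as a pure function. This is a sketch of the logic only: the real script is shell, and the `Runner` enum and `choose_runner` name are invented for illustration.

```rust
/// What the wrapper would hand the test binary to (illustrative model).
#[derive(Debug, PartialEq, Eq)]
pub enum Runner {
    /// Delegate to the miri wrapper.
    Miri,
    /// Execute the binary directly.
    Native,
    /// Execute under the named qemu-user binary, e.g. `qemu-aarch64`.
    Qemu(String),
}

/// Mirrors the order described above: miri first, then native, then qemu.
pub fn choose_runner(
    miri_sysroot: Option<&str>,
    host_machine: &str,
    target_machine: &str,
) -> Runner {
    if miri_sysroot.is_some() {
        // Checked first so a miri run on a matching host is not run natively.
        Runner::Miri
    } else if host_machine == target_machine {
        Runner::Native
    } else {
        Runner::Qemu(format!("qemu-{target_machine}"))
    }
}
```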

No CI behaviour change yet -- this just makes `cargo test --target
<cross-triple>` work via emulation when run by hand.  The CI step
that actually exercises the path lands in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Add a `test` step to the cross job that runs the full nextest suite
against the cross-compiled archive.  Each cross binary dispatches
through `scripts/test-runner.sh` (registered as the cargo runner in
`.cargo/config.toml`) which delegates to `qemu-${target_machine}`
when the host architecture doesn't match.

Gates:

  * Only runs on `pull_request` runs with `ci:+cross` or
    `ci:+cross/full` on the labels.  push / merge_group cross legs
    stay build-only; ISA emulation pays a real wall-clock cost and
    we don't want to slow the merge queue.
  * Gated to `matrix.recipe.args == 'dataplane'` so the test pass
    isn't duplicated for the `frr.dataplane` row, which differs from
    the `dataplane` row only in the container recipe, not in the
    cross target.

`ci:+cross` (today's "run the cross job at all" label) implies
"include the qemu test pass" -- there is currently no way to
schedule cross/full builds without a label, so the same gate
serves both modes.  Splitting them out (e.g. opting bump.yml into
cross/full automatically) is a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
The test hard-codes `3` as a "non-multiple of `RESULTS_MULTIPLIER`"
which assumes the multiplier is 4 (x86_64).  Under cross-compiled
aarch64 builds the wrapper's `RESULTS_MULTIPLIER` constant binds to
1 -- the validator's "not-a-multiple" branch becomes unreachable
because every positive integer is trivially a multiple of 1, so the
validator returns `Ok` and the test panics on the `matches!` assert.
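
The arithmetic can be checked with a toy validator. The function and error type below are illustrative stand-ins, not the wrapper's real API, and the sketch assumes a non-zero multiplier.

```rust
/// Illustrative error for a result count that is not a whole multiple.
#[derive(Debug, PartialEq, Eq)]
pub enum ResultsError {
    NotAMultiple { requested: u32, multiplier: u32 },
}

/// Accepts `requested` only if it is a multiple of `multiplier`.
/// Assumes `multiplier` is non-zero.
pub fn validate_results(requested: u32, multiplier: u32) -> Result<u32, ResultsError> {
    if requested % multiplier != 0 {
        Err(ResultsError::NotAMultiple { requested, multiplier })
    } else {
        Ok(requested)
    }
}
```

With a multiplier of 4, `validate_results(3, 4)` is rejected; with a multiplier of 1, `validate_results(3, 1)` is accepted, so a test that expects an error for `3` cannot pass.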

Upstream DPDK defines `RTE_ACL_RESULTS_MULTIPLIER` as
`XMM_SIZE / sizeof(uint32_t)`, which should be 4 on every supported
ISA.  Either bindgen on the cross sysroot doesn't see the right
`XMM_SIZE` definition on aarch64 or the cross headers ship a divergent
`rte_vect.h`; needs a follow-up investigation.

Skip on aarch64 with a TODO so the rest of the cross-aarch64 test
surface can run green under qemu-user.  This is the only known
target-specific divergence in the dpdk binding; the rest of the
sweep passes cleanly under musl+qemu.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Introduce a single `emulated` cfg as the umbrella over both
emulation backends in the test surface:

  * `miri.just`: append `--cfg=emulated` to `RUSTFLAGS` so miri
    invocations set it.
  * `nix/profiles.nix`: set `--cfg=emulated` whenever `for-tests` is
    true and the target arch differs from the build host's
    (`is-emulated-test`).  Today this covers cross-arch test builds
    that run under qemu-user on the lab runners.  `--check-cfg=cfg
    (emulated)` is set unconditionally so unused branches don't
    trip `unexpected_cfgs`.
  * `default.nix`: plumb the build host's arch
    (`stdenv.hostPlatform.parsed.cpu.name`) through to
    `profiles.nix` as `host-arch`, so `is-emulated-test` compares
    target against actual host rather than hard-coded `"x86_64"`.
  * `.cargo/config.toml`: same `--check-cfg=cfg(emulated)` for the
    native dev build path.

Sweep existing `cfg_attr(miri, ignore)` / `cfg(miri)` / `cfg_select!
{ miri => N }` sites that apply equally to qemu-user:

  * `routing/src/router/rio.rs`, `routing/src/frr/test.rs`,
    `routing/src/atable/resolver.rs`, `cli/src/cliproto.rs`: tests
    that bind Unix domain sockets / read kernel-state files now
    skip under any emulation backend, not just miri.  qemu-user's
    epoll readiness emulation has the same gaps that justified the
    original miri skips.
  * `routing/src/fib/test.rs`, `net/src/packet/hash.rs`,
    `flow-entry/src/flow_table/nf_lookup.rs`,
    `left-right-tlcache/src/lib.rs`, `k8s-intf/src/bolero/support
    .rs`, `config/src/utils/collapse.rs`: per-arm iteration counts
    in `cfg_select!` are now keyed by `emulated` rather than `miri`.
    qemu-user runs at ~5-10x slowdown vs native and the original
    miri count (which targeted miri's much steeper slowdown) is
    still a sensible upper bound for qemu-user too.
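
As a rough picture of what the swept sites look like, the sketch below keys an iteration count and a test skip on the `emulated` cfg. It uses `cfg!` rather than `cfg_select!` so it builds on stable; the counts, names, and the `allow` attributes are assumptions for the example (the real build registers the cfg via `--check-cfg` instead).

```rust
/// Iteration budget for a heavy loop; the numbers are placeholders.
#[allow(unexpected_cfgs)] // `emulated` is a custom cfg passed via RUSTFLAGS
pub fn iterations() -> usize {
    if cfg!(emulated) {
        1_000 // miri or qemu-user: keep the run short
    } else {
        100_000 // native run
    }
}

#[cfg(test)]
#[allow(unexpected_cfgs)]
mod emulation_sensitive_tests {
    // Skipped under any emulation backend, not only miri.
    #[test]
    #[cfg_attr(emulated, ignore = "needs real kernel socket behaviour")]
    fn binds_unix_socket() {
        // Body elided in this sketch.
    }
}
```

Building with `RUSTFLAGS="--cfg=emulated"` selects the short budget and marks the test as ignored.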

Pair this with three CI/runtime side changes:

  * `.config/nextest.toml`: new `cross-qemu` nextest profile sets a
    `slow-timeout = { period = "60s", terminate-after = 5 }` so a
    qemu hang gets killed and surfaces as a TIMEOUT in the report
    instead of wedging the whole run.  `fail-fast` is permissive
    so we collect the full list of qemu-affected tests in one go.
  * `.github/workflows/dev.yml`: cross-job `test` step is
    restricted to `matrix.libc == 'musl'` because gnu's libgcc_s
    unwinder mis-handles qemu-user's emulated signal frames on
    aarch64 (every panic-unwind path turns into SIGABRT).  musl's
    LLVM libunwind walks those frames correctly, so musl legs run
    the full suite clean; gnu legs stay build-only until/unless we
    fix the gnu unwinder story.
  * `.gitignore`: ignore `**/qemu_*.core` so the SIGABRT cores
    qemu drops when running gnu cross binaries locally don't show
    up in `git status`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
It is much too slow to print to stdout like this under
qemu-user or miri.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Two halves of one logical change:

  * `.github/workflows/dev.yml`: extend the cross job's `if:` to fire
    on `ci:+cross/full` in addition to `ci:+cross`, so a PR labeled
    only with `ci:+cross/full` (no `ci:+cross`) still gets cross.
  * `.github/workflows/bump.yml`: auto-apply `ci:+cross/full` to the
    weekly cargo-upgrades PR.

The weekly cargo-upgrades PR is the right place to catch cross-arch
regressions introduced by transitive dep churn (a crate dropping an
aarch64 target, changing alignment, etc.).  Without an opt-in label
the cross job stays build-only on PR runs, so we add
`ci:+cross/full` alongside the existing `automated` and
`dependencies` labels.  Today that label triggers the qemu-user
test step on the existing 4-leg cross matrix; once the matrix
scope split lands, the same label will expand the matrix to the
full hardware x libc sweep without further changes here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland daniel-noland added the dont-merge and ci:+cross/full labels May 23, 2026
@daniel-noland daniel-noland changed the title ci: demonstrate ci:+cross/full fail message (2.2) ci: demonstrate ci:+cross/full fail message May 23, 2026
@daniel-noland daniel-noland changed the title (2.2) ci: demonstrate ci:+cross/full fail message (4.1) ci: demonstrate ci:+cross/full fail message May 23, 2026
@daniel-noland daniel-noland requested a review from Copilot May 23, 2026 03:20
Contributor

Copilot AI left a comment

Pull request overview

This PR intentionally introduces an aarch64-only test failure to demonstrate the CI messaging behavior for ci:+cross/full (per the PR description, it is not intended to be merged).

Changes:

  • Add an #[cfg(target_arch = "aarch64")] unit test that always panics to force the aarch64 test cell to fail.

Comment thread args/src/lib.rs
@daniel-noland daniel-noland force-pushed the bump/cargo-upgrades branch from 8b52b70 to 0edba91 Compare May 23, 2026 03:42
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/intentional-regression branch from e3496c6 to a94d84a Compare May 23, 2026 03:43
daniel-noland and others added 2 commits May 22, 2026 22:22
Cross stays advisory (`continue-on-error: true`) on every PR, push,
and merge_group run, so leg flakes never block merge.  That's the
right default for interactive PRs and the merge queue, but it makes
genuine cross failures easy to miss on the weekly cargo-bump PR
(auto-labeled `ci:+cross/full`), where catching upstream-driven
aarch64 regressions is the whole point.

Add a sticky-comment surfacing step:

  * New composite action `.github/actions/sticky-pr-comment` --
    create-or-update a PR comment keyed by an HTML-comment marker
    (so subsequent runs find and update the same comment instead of
    spamming the thread).

  * New steps in the `summary` job that read the cross matrix's
    real per-leg conclusions from the Actions REST API (via
    `actions/github-script`) and, if any leg failed, post the
    sticky comment with a link back to the failing run.  The
    summary job grows `pull-requests: write` (for the comment) and
    `actions: read` (for the API call).

    We can't gate on `needs.cross.result` here: at the job level,
    `continue-on-error: true` makes that always report `success` to
    dependents, which is exactly the property we want for merge
    gating but which also hides the failure from this step.  Reading
    the underlying job results sidesteps the masking.
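
The two steps (read real per-leg conclusions, then create or update one marked comment) reduce to a small decision function. The sketch below models only that decision. The marker string, the struct fields, and `plan` are hypothetical; the real steps run in `actions/github-script` against the REST API.

```rust
/// Hypothetical HTML-comment marker that identifies the sticky comment.
pub const MARKER: &str = "<!-- cross-advisory -->";

/// An existing PR comment (illustrative subset of fields).
pub struct Comment {
    pub id: u64,
    pub body: String,
}

/// One cross-matrix leg with its real conclusion, e.g. "success" or "failure".
pub struct Leg {
    pub name: String,
    pub conclusion: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// No leg failed; nothing to post.
    Nothing,
    /// No marked comment exists yet; post a new one with this body.
    Create(String),
    /// A marked comment exists; overwrite it instead of adding another.
    Update(u64, String),
}

/// Decides what to do from per-leg conclusions and the existing comments.
pub fn plan(legs: &[Leg], existing: &[Comment]) -> Action {
    let failed: Vec<&str> = legs
        .iter()
        .filter(|leg| leg.conclusion == "failure")
        .map(|leg| leg.name.as_str())
        .collect();
    if failed.is_empty() {
        return Action::Nothing;
    }
    let body = format!("{MARKER}\nFailed legs: {}", failed.join(", "));
    match existing.iter().find(|c| c.body.contains(MARKER)) {
        Some(comment) => Action::Update(comment.id, body),
        None => Action::Create(body),
    }
}
```

Because the input is the list of leg conclusions rather than the job-level result, a leg marked `continue-on-error` still shows up as failed here.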

The alternative -- making cross blocking when `ci:+cross/full` is
present -- creates an attribution trap: a non-labeled PR that
silently regresses cross would land on main, and the next
cross/full-labeled bump PR would inherit the broken main and read
as "the bump caused the regression."  A sticky comment gives
reviewers a loud in-PR signal without that misattribution risk and
preserves the "leg-flake doesn't block merge" property uniformly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland daniel-noland force-pushed the bump/cargo-upgrades branch from 0edba91 to 75bd274 Compare May 23, 2026 04:23
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/intentional-regression branch from a94d84a to 52ff308 Compare May 23, 2026 04:23
@github-actions
Contributor

⚠️ Cross build advisory: failure detected ⚠️

Please investigate before merging

Failed legs: build-container/dataplane/aarch64/musl, build-container/dataplane/bluefield3/musl

@daniel-noland
Collaborator Author

⚠️ Cross build advisory: failure detected ⚠️

Please investigate before merging

Failed legs: build-container/dataplane/aarch64/musl, build-container/dataplane/bluefield3/musl

Behold! An error message!

@daniel-noland daniel-noland marked this pull request as ready for review May 23, 2026 17:59
@daniel-noland daniel-noland requested a review from a team as a code owner May 23, 2026 17:59
@daniel-noland daniel-noland requested review from Fredi-raspall and removed request for a team May 23, 2026 17:59
@daniel-noland daniel-noland self-assigned this May 23, 2026

Labels

ci:+cross/full, dont-merge


2 participants