Skip to content

Fix UI test runner flag handling and recursive provenance resolution#126

Merged
dkcumming merged 6 commits intoruntimeverification:masterfrom
cds-rs:dc/investigate-failures
Feb 28, 2026
Merged

Fix UI test runner flag handling and recursive provenance resolution#126
dkcumming merged 6 commits intoruntimeverification:masterfrom
cds-rs:dc/investigate-failures

Conversation

@cds-amal
Copy link
Collaborator

@cds-amal cds-amal commented Feb 26, 2026

This PR rewrites the UI test runner to actually respect //@ compile-flags: directives (previously silently ignored), and fixes a panic in provenance type resolution that crashed on nested struct fields.

  • Rewrite run_ui_tests.sh: flag-based CLI, build-once-then-invoke (eliminates cargo freshness overhead per test), arch-aware skip logic, optional --save-debug-output for failure triage
  • Add extract_test_flags() to both runners: parses //@ compile-flags:, //@ edition:, and //@ rustc-env: directives from test files
  • Fix get_prov_ty to recursively walk nested struct/tuple fields instead of requiring an exact byte-offset match at the top level
  • Replace panicking unwraps in get_prov_ty with graceful fallbacks

UI test results: 19 failing -> 2 failing, 2871 passing -> 2888 passing.

What was wrong

Two independent problems, both surfaced by investigating the 19 UI test failures on the master branch.

1. Test runner ignoring compile-flags directives

The UI test runners (run_ui_tests.sh, remake_ui_tests.sh) never extracted //@ compile-flags:, //@ edition:, or //@ rustc-env: directives from test files. Every test ran with bare -Zno-codegen regardless of what the test required. 17 of the 19 failures were compilation errors caused by missing flags (e.g., --edition 2021, --test, -Zunstable-options, -Ccodegen-units=1).

These tests were "failing" for the wrong reason: they weren't testing stable-mir-json at all, just failing to compile.

2. Provenance type resolution panicking on nested structs

get_prov_ty used field_for_offset (exact byte-offset match) to find which field a provenance pointer belongs to, then returned that field's type directly. This works when the pointer sits at the start of a top-level field, but breaks on nested structs.

Concrete example: TestDescAndFn has fields at offsets 0 (TestDesc) and 128 (TestFn). A pointer at byte offset 56 sits inside TestDesc (which itself has a &str field at relative offset 56). The old code looked for a field starting at exactly offset 56 in the top-level struct, found nothing, and panicked.

The fix adds field_containing_offset (range-based: finds the field with the largest start offset <= target) and makes the struct and tuple branches recurse into the containing field with a relative offset, walking down through the field hierarchy until reaching the actual pointer type.

Test runner changes

The rewritten run_ui_tests.sh:

  • Flag-based CLI: --verbose, --save-generated-output, --save-debug-output replace positional args and env vars
  • Build once: runs cargo build upfront, then invokes the binary directly with proper DYLD_LIBRARY_PATH/LD_LIBRARY_PATH setup (cuts ~2-3s of cargo freshness checking per test)
  • Arch filtering: skips tests gated on foreign architectures via path matching and //@ only-<arch> directive parsing
  • Flag extraction: extract_test_flags() parses //@ compile-flags:, //@ edition:, and //@ rustc-env: directives
  • Debug output: --save-debug-output captures stderr on failure to tests/ui/debug/<test_name>.stderr and prints a 4-line snippet inline

remake_ui_tests.sh gets the same extract_test_flags() function so regeneration respects directives too.

Remaining failures

Test Exit Root cause
tests/ui/issues/issue-26641.rs 101 rustc internal panic ("escaping bound vars"); not a stable-mir-json bug
tests/ui/sanitizer/cfi/drop-in-place.rs 101 Missing -Ccodegen-units=1 in the test file's //@ compile-flags: directive; every sibling CFI test has it, this one doesn't. Upstream test bug, not ours.

Test plan

  • cargo build clean
  • Full UI test suite on Linux: 2888 passing, 2 failing (both known, neither ours)
  • make integration-test

Copy link
Collaborator

@dkcumming dkcumming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Awesome improvements to the scripts, and awesome solving of the bugs.

I started to make a new commit that was going to enforce that $RUST_REPO was checked out to the same commit we require for stable_mir_json so the tests are compatible. It happens for docker but not locally. When doing it I got some scope creep when I starting thinking we should have a utils.sh file for shared commands - so instead let's merge this and worry about things like that later!

…llection

The `collect_alloc` function asserted that VTable and Static allocations
must have a builtin-deref pointer type. This is wrong: vtable pointers
carry a raw `*const ()` type that doesn't pass `builtin_deref(true)`,
and static allocations can have non-pointer types depending on the
static's definition.

Removing these assertions fixes 6 UI tests that were previously
panicking during alloc collection (4 thin-box tests, const-trait-to-trait,
issue-11205).
Rewrite run_ui_tests.sh: flag-based CLI (--verbose, --save-generated-output,
--save-debug-output), build once then invoke binary directly (eliminates
cargo freshness overhead per test), arch-aware skip logic for cross-platform
test lists.

Add extract_test_flags() to both runners to parse //@ compile-flags:,
//@ edition:, and //@ rustc-env: directives from test files and pass them
through to the compiler. Previously these directives were silently ignored,
causing tests to fail for the wrong reasons or pass despite being
misconfigured.

Update Makefile test-ui target for the new CLI.
get_prov_ty previously used field_for_offset (exact byte-offset match) to
find the field at a given provenance offset, then returned that field's type
directly. This broke on nested structs: a pointer at byte offset 56 inside
TestDescAndFn sits inside a nested TestDesc field that starts at offset 0,
not at offset 56 itself.

Add field_containing_offset (range-based lookup: largest field start <= target)
and make the struct and tuple branches recurse into the containing field with
a relative offset, walking down through the field hierarchy until reaching the
actual pointer type.

Replace panicking unwraps/asserts with graceful eprintln + return None for
layout failures, non-rigid types, and unexpected non-zero offsets on
builtin_deref types. Add debug_log_println! instrumentation at key resolution
points (zero cost without --features debug_log).

Moves 18 UI tests from failing to passing (compile-flags tests that now work
with the runner fix, plus tests that previously panicked in get_prov_ty).
@cds-amal cds-amal force-pushed the dc/investigate-failures branch from 16785a5 to 8c64cb6 Compare February 28, 2026 11:28
@cds-amal cds-amal requested a review from a team February 28, 2026 11:28
@dkcumming dkcumming merged commit e875066 into runtimeverification:master Feb 28, 2026
5 checks passed
@cds-amal cds-amal deleted the dc/investigate-failures branch February 28, 2026 11:36
cds-amal added a commit that referenced this pull request Feb 28, 2026
Add ensure_rustc_commit.sh: a helper sourced by both UI test scripts
that reads the expected commit from rust-toolchain.toml [metadata] and
ensures the rust checkout (regular or bare+worktree) is at that commit.

Both run_ui_tests.sh and remake_ui_tests.sh now use RUST_SRC_DIR
(set by ensure_rustc_commit.sh) instead of using the raw directory
argument directly. CI workflow updated to use yq for TOML parsing.

Cherry-picked from d021298 (spike/hex-rustc), adapted for the
PR #126 test script rework on this branch.
cds-amal added a commit to cds-rs/stable-mir-json that referenced this pull request Mar 1, 2026
…ntimeverification#124, runtimeverification#126, runtimeverification#127

Several merged PRs were missing from the changelog or lacked PR links:

- runtimeverification#127: mutability field on PtrType/RefType in TypeMetadata
- runtimeverification#124: ADR-001 (index-first graph architecture)
- runtimeverification#121: existing entries for 3-phase pipeline, AllocMap coherence, and
  dead fixup removal now link to the PR
- runtimeverification#126: existing entries for UI test runner rewrite and provenance
  resolution fixes now link to the PR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants