Fix UI test runner flag handling and recursive provenance resolution by cds-amal · Pull Request #126 · runtimeverification/stable-mir-json

cds-amal · 2026-02-26T12:55:13Z

This PR rewrites the UI test runner to actually respect //@ compile-flags: directives (previously silently ignored), and fixes a panic in provenance type resolution that crashed on nested struct fields.

Rewrite run_ui_tests.sh: flag-based CLI, build-once-then-invoke (eliminates cargo freshness overhead per test), arch-aware skip logic, optional --save-debug-output for failure triage
Add extract_test_flags() to both runners: parses //@ compile-flags:, //@ edition:, and //@ rustc-env: directives from test files
Fix get_prov_ty to recursively walk nested struct/tuple fields instead of requiring an exact byte-offset match at the top level
Replace panicking unwraps in get_prov_ty with graceful fallbacks

UI test results: 19 failing -> 2 failing, 2871 passing -> 2888 passing.

What was wrong

Two independent problems, both surfaced by investigating the 19 UI test failures on the master branch.

1. Test runner ignoring compile-flags directives

The UI test runners (run_ui_tests.sh, remake_ui_tests.sh) never extracted //@ compile-flags:, //@ edition:, or //@ rustc-env: directives from test files. Every test ran with bare -Zno-codegen regardless of what the test required. 17 of the 19 failures were compilation errors caused by missing flags (e.g., --edition 2021, --test, -Zunstable-options, -Ccodegen-units=1).

These tests were "failing" for the wrong reason: they weren't testing stable-mir-json at all, just failing to compile.

2. Provenance type resolution panicking on nested structs

get_prov_ty used field_for_offset (exact byte-offset match) to find which field a provenance pointer belongs to, then returned that field's type directly. This works when the pointer sits at the start of a top-level field, but breaks on nested structs.

Concrete example: TestDescAndFn has fields at offsets 0 (TestDesc) and 128 (TestFn). A pointer at byte offset 56 sits inside TestDesc (which itself has a &str field at relative offset 56). The old code looked for a field starting at exactly offset 56 in the top-level struct, found nothing, and panicked.

The fix adds field_containing_offset (range-based: finds the field with the largest start offset <= target) and makes the struct and tuple branches recurse into the containing field with a relative offset, walking down through the field hierarchy until reaching the actual pointer type.

Test runner changes

The rewritten run_ui_tests.sh:

Flag-based CLI: --verbose, --save-generated-output, --save-debug-output replace positional args and env vars
Build once: runs cargo build upfront, then invokes the binary directly with proper DYLD_LIBRARY_PATH/LD_LIBRARY_PATH setup (cuts ~2-3s of cargo freshness checking per test)
Arch filtering: skips tests gated on foreign architectures via path matching and //@ only-<arch> directive parsing
Flag extraction: extract_test_flags() parses //@ compile-flags:, //@ edition:, and //@ rustc-env: directives
Debug output: --save-debug-output captures stderr on failure to tests/ui/debug/<test_name>.stderr and prints a 4-line snippet inline

remake_ui_tests.sh gets the same extract_test_flags() function so regeneration respects directives too.

Remaining failures

Test	Exit	Root cause
`tests/ui/issues/issue-26641.rs`	101	rustc internal panic ("escaping bound vars"); not a stable-mir-json bug
`tests/ui/sanitizer/cfi/drop-in-place.rs`	101	Missing `-Ccodegen-units=1` in the test file's `//@ compile-flags:` directive; every sibling CFI test has it, this one doesn't. Upstream test bug, not ours.

Test plan

cargo build clean
Full UI test suite on Linux: 2888 passing, 2 failing (both known, neither ours)
make integration-test

dkcumming

Nice! Awesome improvements to the scripts, and awesome solving of the bugs.

I started to make a new commit that was going to enforce that $RUST_REPO was checked out to the same commit we require for stable_mir_json so the tests are compatible. It happens for docker but not locally. When doing it I got some scope creep when I starting thinking we should have a utils.sh file for shared commands - so instead let's merge this and worry about things like that later!

…llection The `collect_alloc` function asserted that VTable and Static allocations must have a builtin-deref pointer type. This is wrong: vtable pointers carry a raw `*const ()` type that doesn't pass `builtin_deref(true)`, and static allocations can have non-pointer types depending on the static's definition. Removing these assertions fixes 6 UI tests that were previously panicking during alloc collection (4 thin-box tests, const-trait-to-trait, issue-11205).

Rewrite run_ui_tests.sh: flag-based CLI (--verbose, --save-generated-output, --save-debug-output), build once then invoke binary directly (eliminates cargo freshness overhead per test), arch-aware skip logic for cross-platform test lists. Add extract_test_flags() to both runners to parse //@ compile-flags:, //@ edition:, and //@ rustc-env: directives from test files and pass them through to the compiler. Previously these directives were silently ignored, causing tests to fail for the wrong reasons or pass despite being misconfigured. Update Makefile test-ui target for the new CLI.

get_prov_ty previously used field_for_offset (exact byte-offset match) to find the field at a given provenance offset, then returned that field's type directly. This broke on nested structs: a pointer at byte offset 56 inside TestDescAndFn sits inside a nested TestDesc field that starts at offset 0, not at offset 56 itself. Add field_containing_offset (range-based lookup: largest field start <= target) and make the struct and tuple branches recurse into the containing field with a relative offset, walking down through the field hierarchy until reaching the actual pointer type. Replace panicking unwraps/asserts with graceful eprintln + return None for layout failures, non-rigid types, and unexpected non-zero offsets on builtin_deref types. Add debug_log_println! instrumentation at key resolution points (zero cost without --features debug_log). Moves 18 UI tests from failing to passing (compile-flags tests that now work with the runner fix, plus tests that previously panicked in get_prov_ty).

Add ensure_rustc_commit.sh: a helper sourced by both UI test scripts that reads the expected commit from rust-toolchain.toml [metadata] and ensures the rust checkout (regular or bare+worktree) is at that commit. Both run_ui_tests.sh and remake_ui_tests.sh now use RUST_SRC_DIR (set by ensure_rustc_commit.sh) instead of using the raw directory argument directly. CI workflow updated to use yq for TOML parsing. Cherry-picked from d021298 (spike/hex-rustc), adapted for the PR #126 test script rework on this branch.

…ntimeverification#124, runtimeverification#126, runtimeverification#127 Several merged PRs were missing from the changelog or lacked PR links: - runtimeverification#127: mutability field on PtrType/RefType in TypeMetadata - runtimeverification#124: ADR-001 (index-first graph architecture) - runtimeverification#121: existing entries for 3-phase pipeline, AllocMap coherence, and dead fixup removal now link to the PR - runtimeverification#126: existing entries for UI test runner rewrite and provenance resolution fixes now link to the PR

dkcumming approved these changes Feb 28, 2026

View reviewed changes

cds-amal added 6 commits February 28, 2026 06:27

Update failing / passing tsvs

054fa32

chore(changelog): add UI test runner and provenance resolution entries

868122d

Cargo fmt

8c64cb6

cds-amal force-pushed the dc/investigate-failures branch from 16785a5 to 8c64cb6 Compare February 28, 2026 11:28

cds-amal requested a review from a team February 28, 2026 11:28

dkcumming merged commit e875066 into runtimeverification:master Feb 28, 2026
5 checks passed

cds-amal deleted the dc/investigate-failures branch February 28, 2026 11:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix UI test runner flag handling and recursive provenance resolution#126

Fix UI test runner flag handling and recursive provenance resolution#126
dkcumming merged 6 commits intoruntimeverification:masterfrom
cds-rs:dc/investigate-failures

cds-amal commented Feb 26, 2026 •

edited by dkcumming

Loading

Uh oh!

dkcumming left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cds-amal commented Feb 26, 2026 • edited by dkcumming Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What was wrong

1. Test runner ignoring compile-flags directives

2. Provenance type resolution panicking on nested structs

Test runner changes

Remaining failures

Test plan

Uh oh!

dkcumming left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cds-amal commented Feb 26, 2026 •

edited by dkcumming

Loading