
Single-pass body analysis with AllocMap coherence checks#121

Merged
dkcumming merged 10 commits into runtimeverification:master from cds-rs:dc/declarative-spike
Feb 26, 2026


@cds-amal
Collaborator

@cds-amal cds-amal commented Feb 20, 2026

What's this about?

So, #120 fixed the immediate alloc-id mismatch by carrying collected items forward instead of re-fetching them. That was the right call. But it left the underlying structure intact: three separate phases (mk_item, collect_unevaluated_constant_items, collect_interned_values), each with full access to TyCtxt, each free to call inst.body() or any other side-effecting rustc query whenever it felt like it. Nothing in the types prevented that, and the bug was a direct consequence: one phase called inst.body() a second time, rustc minted fresh AllocIds, and suddenly the alloc map had ids that didn't correspond to anything in the stored bodies.

The question is: how do we make that class of bug structurally impossible, rather than just fixed for the one case we caught?

The full decision record is in ADR-002.

The restructuring

The fix is to split the pipeline into phases with type signatures that enforce the boundary:

collect_and_analyze_items(HashMap<String, Item>)
  -> (CollectedCrate, DerivedInfo)

assemble_smir(CollectedCrate, DerivedInfo) -> SmirJson

CollectedCrate holds items and unevaluated consts (the output of talking to rustc). DerivedInfo holds calls, allocs, types, and spans (the output of walking bodies). assemble_smir takes both by value and does pure data transformation; it structurally cannot call inst.body() because it has no MonoItem or Instance to call it on. That's the whole point: if you can't reach the query, you can't accidentally call it.
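To make the type-level boundary concrete, here is a minimal sketch under heavy simplification. FakeTcx, Instance, Body, and Item are hypothetical stand-ins (the real pipeline works with TyCtxt, MonoItem/Instance, and stable_mir bodies), the signatures are trimmed, and DerivedInfo carries a toy field instead of calls, allocs, types, and spans. What it is meant to show is only that assemble_smir receives owned data and nothing it could call body() on.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for rustc's query context; counts body() calls.
struct FakeTcx {
    body_calls: usize,
}

struct Body {
    text: String,
}

// Hypothetical stand-in for an Instance that can be asked for its body.
struct Instance {
    name: String,
}

impl Instance {
    // Side-effecting query: in rustc, each call may mint fresh AllocIds.
    fn body(&self, tcx: &mut FakeTcx) -> Body {
        tcx.body_calls += 1;
        Body { text: format!("body of {}", self.name) }
    }
}

struct Item {
    name: String,
    body: Body,
}

// Output of talking to rustc.
struct CollectedCrate {
    items: Vec<Item>,
}

// Output of walking bodies (toy content).
struct DerivedInfo {
    body_lengths: HashMap<String, usize>,
}

// Phase 1: the only function that holds the context and the instances.
fn collect_and_analyze_items(
    tcx: &mut FakeTcx,
    instances: Vec<Instance>,
) -> (CollectedCrate, DerivedInfo) {
    let mut items = Vec::new();
    let mut body_lengths = HashMap::new();
    for inst in instances {
        let body = inst.body(tcx); // exactly one fetch per instance
        body_lengths.insert(inst.name.clone(), body.text.len());
        items.push(Item { name: inst.name, body });
    }
    (CollectedCrate { items }, DerivedInfo { body_lengths })
}

// Phase 2: pure data transformation over owned values. No FakeTcx or
// Instance is in scope, so calling body() here would not compile.
fn assemble_smir(krate: CollectedCrate, info: DerivedInfo) -> Vec<String> {
    krate
        .items
        .into_iter()
        .map(|item| {
            let len = info.body_lengths[&item.name];
            format!("{}: {} ({} bytes)", item.name, item.body.text, len)
        })
        .collect()
}

fn main() {
    let mut tcx = FakeTcx { body_calls: 0 };
    let instances = vec![
        Instance { name: "main".into() },
        Instance { name: "helper".into() },
    ];
    let (krate, info) = collect_and_analyze_items(&mut tcx, instances);
    println!("body() called {} times", tcx.body_calls);
    for line in assemble_smir(krate, info) {
        println!("{line}");
    }
}
```

Because the side-effecting call needs both `&mut FakeTcx` and an `Instance`, and neither appears in assemble_smir's signature, a stray second body() call there becomes a compile error rather than a silent alloc-id mismatch.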

The two body-walking visitors (InternedValueCollector and UnevaluatedConstCollector) are merged into a single BodyAnalyzer that walks each body exactly once. The fixpoint loop for transitive unevaluated constant discovery is integrated: when BodyAnalyzer finds an unevaluated const, it records it; the outer loop creates the new Item (the one place inst.body() is called) and enqueues it.
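The single-walk fixpoint loop can be sketched as a worklist. This is a toy model, not the PR's code: fetch_body is a hypothetical stand-in for inst.body(), a body is reduced to the list of unevaluated const names it mentions, and this BodyAnalyzer only records those names (the real one also collects allocs, types, and spans in the same walk). The `seen` set is what keeps a const reachable twice, or through a cycle, from being fetched or walked again.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical stand-in for inst.body(): returns the unevaluated consts a
// body mentions and counts how often each body is fetched.
fn fetch_body(name: &str, calls: &mut HashMap<String, usize>) -> Vec<String> {
    *calls.entry(name.to_string()).or_insert(0) += 1;
    let refs: Vec<&str> = match name {
        "main" => vec!["C1"],
        "C1" => vec!["C2"],
        "C2" => vec!["C1"], // a cycle must not cause a second fetch
        _ => vec![],
    };
    refs.into_iter().map(String::from).collect()
}

// Hypothetical collected item: a name plus its body, fetched exactly once.
struct Item {
    name: String,
    body: Vec<String>,
}

// Toy single-pass analyzer: one walk over a body, recording the
// unevaluated consts it finds.
struct BodyAnalyzer<'a> {
    unevaluated: &'a mut Vec<String>,
}

impl BodyAnalyzer<'_> {
    fn visit_body(&mut self, body: &[String]) {
        for c in body {
            self.unevaluated.push(c.clone());
        }
    }
}

fn collect_and_analyze(roots: &[&str], calls: &mut HashMap<String, usize>) -> Vec<Item> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<Item> = VecDeque::new();
    for &root in roots {
        seen.insert(root.to_string());
        queue.push_back(Item { name: root.to_string(), body: fetch_body(root, calls) });
    }
    let mut analyzed = Vec::new();
    while let Some(item) = queue.pop_front() {
        let mut found = Vec::new();
        BodyAnalyzer { unevaluated: &mut found }.visit_body(&item.body);
        for name in found {
            // Only newly discovered consts get a body fetched and enqueued.
            if seen.insert(name.clone()) {
                let body = fetch_body(&name, calls);
                queue.push_back(Item { name, body });
            }
        }
        analyzed.push(item);
    }
    analyzed
}

fn main() {
    let mut calls = HashMap::new();
    let items = collect_and_analyze(&["main"], &mut calls);
    let order: Vec<&str> = items.iter().map(|i| i.name.as_str()).collect();
    println!("analyzed {order:?}, fetch counts {calls:?}");
}
```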

But what about catching regressions?

Turns out the existing integration tests normalize away alloc_ids (via the jq filter), so they literally cannot catch this class of bug. The golden files don't contain alloc ids at all; you could scramble every id in the output and the tests would still pass.

AllocMap replaces the bare HashMap<AllocId, ...> with a newtype that, under #[cfg(debug_assertions)], tracks every insertion and flags duplicates. After the collect/analyze phase completes, verify_coherence walks every stored Item body with an AllocIdCollector visitor and asserts that each referenced AllocId exists in the map. This catches both "walked a stale body" (missing ids) and "walked the same body twice" (duplicate insertions) at dev time; zero cost in release builds.
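A sketch of the AllocMap idea, assuming simplified types: AllocId is a plain u64, the stored value is a String, and StoredBody (a hypothetical name) is reduced to the AllocIds its constants carry, i.e. what an AllocIdCollector walk would return. The description says the real verify_coherence asserts; this one returns a Result so it can be tested. The duplicate-tracking field and its bookkeeping exist only under #[cfg(debug_assertions)]; in this sketch only that bookkeeping is gated.

```rust
use std::collections::HashMap;

// Hypothetical simplification of stable_mir's AllocId.
type AllocId = u64;

#[derive(Default)]
struct AllocMap {
    inner: HashMap<AllocId, String>,
    // Only present in debug builds.
    #[cfg(debug_assertions)]
    duplicates: Vec<AllocId>,
}

impl AllocMap {
    fn insert(&mut self, id: AllocId, data: String) {
        let was_present = self.inner.insert(id, data).is_some();
        self.note_duplicate(id, was_present);
    }

    fn contains(&self, id: AllocId) -> bool {
        self.inner.contains_key(&id)
    }

    #[cfg(debug_assertions)]
    fn note_duplicate(&mut self, id: AllocId, was_present: bool) {
        if was_present {
            self.duplicates.push(id);
        }
    }

    // Release builds: no bookkeeping at all.
    #[cfg(not(debug_assertions))]
    fn note_duplicate(&mut self, _id: AllocId, _was_present: bool) {}

    #[cfg(debug_assertions)]
    fn duplicates(&self) -> &[AllocId] {
        &self.duplicates
    }

    #[cfg(not(debug_assertions))]
    fn duplicates(&self) -> &[AllocId] {
        &[]
    }
}

// Hypothetical stand-in for a stored Item body: the AllocIds referenced by
// provenance in its constants.
struct StoredBody {
    name: String,
    referenced: Vec<AllocId>,
}

// Returns Err instead of asserting so the sketch is easy to test.
fn verify_coherence(map: &AllocMap, bodies: &[StoredBody]) -> Result<(), String> {
    if !map.duplicates().is_empty() {
        return Err(format!("duplicate insertions: {:?}", map.duplicates()));
    }
    let mut missing = Vec::new();
    for body in bodies {
        for &id in &body.referenced {
            if !map.contains(id) {
                missing.push((body.name.clone(), id));
            }
        }
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing AllocIds: {missing:?}"))
    }
}

fn main() {
    let mut map = AllocMap::default();
    map.insert(0, "bytes for \"123\"".to_string());
    let stale = vec![StoredBody { name: "f".into(), referenced: vec![0, 1] }];
    println!("{:?}", verify_coherence(&map, &stale));
}
```

A body walked from a stale fetch shows up as missing ids; a body walked twice shows up as duplicate insertions, matching the two failure modes described above.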

Other things that fell out of this

  • Static items now store their body in MonoItemKind::MonoItemStatic (collected once in mk_item), so the analysis phase never goes back to rustc for static bodies
  • get_item_details takes the pre-collected body as a parameter instead of calling inst.body() independently
  • The items_clone full HashMap clone is replaced by a HashSet of original item names (which is all the static fixup actually needed)
  • We also uncovered and fixed a very old bug in BodyAnalyzer::visit_terminator, described in a later comment on this PR

What's deleted

InternedValueCollector, UnevaluatedConstCollector, collect_interned_values, collect_unevaluated_constant_items, the InternedValues type alias, and items_clone. Good riddance.

Downstream impact

The tighter allocs representation has already shown positive downstream effects in KMIR: the proof engine can now decode allocations inline (resolving to concrete values like StringVal("123")) instead of deferring them as opaque thunks. @dkcumming's offset-u8 test went from thunking through #decodeConstant(constantKindAllo...) to directly producing toAlloc(allocId(0)), StringVal("123"). The test's expected output needed updating, but the new failure mode is grounded in actual data rather than deferred interpretation.

Test plan

  • cargo build compiles
  • cargo clippy clean
  • cargo fmt clean
  • make integration-test passes (all 28 tests, identical output)
  • KMIR downstream: test_prove_rs[offset-u8-fail] expected output updated

@cds-amal cds-amal force-pushed the dc/declarative-spike branch from 69cd6a5 to 6f6c567 Compare February 20, 2026 17:40
@cds-amal cds-amal changed the title Declarative spike Declarative collect/analyze/assemble pipeline with AllocMap coherence Feb 20, 2026
@cds-amal cds-amal marked this pull request as ready for review February 20, 2026 17:45
@cds-amal cds-amal changed the title Declarative collect/analyze/assemble pipeline with AllocMap coherence Single-pass body analysis with AllocMap coherence checks Feb 22, 2026
Member

@jberthold jberthold left a comment


Great refactoring, makes a lot of sense.
We have to find out where the tests from the rustc suite are going wrong and why, but this is a good direction.

@cds-amal cds-amal requested a review from jberthold February 24, 2026 15:55
@cds-amal
Collaborator Author

cds-amal commented Feb 25, 2026

The early-return bug in visit_terminator

FYI: @jberthold, @dkcumming :)

The symptom

Three UI tests (issue-58435-ice-with-assoc-const.rs, closure-to-fn-coercion.rs, ufcs-polymorphic-paths.rs) hit the verify_coherence assertion: alloc IDs referenced in stored bodies were missing from the alloc map. These failures were pre-existing (they occur on every commit since verify_coherence was introduced), not regressions from removing the static-item fixup.

The pattern all three tests share

Each test stores a function pointer in a constant:

// issue-58435
const ID: fn(&S<T>) -> &S<T> = |s| s;

// closure-to-fn-coercion
const FOO: fn(u8) -> u8 = |v: u8| { v };
const BAR: [fn(&mut u32); 5] = [ |_| {}, |v| *v += 1, ... ];

// ufcs-polymorphic-paths
// (many function constants used as call targets)

When rustc evaluates these constants, the resulting value is an Allocated constant: a memory allocation containing a pointer (with provenance) to the actual closure or function. In MIR, when one of these constants appears as the func operand of a Call terminator, the constant's kind is ConstantKind::Allocated, not ConstantKind::ZeroSized. (Regular direct calls, like foo() where foo is a named function, use ZeroSized constants: the function identity is encoded entirely in the type, and the value is zero-sized. But calling through a const that holds a function pointer produces an Allocated constant because the pointer value is actual data.)
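A minimal program in the same shape as these tests might look like the following (FOO, direct, and run are illustrative names, not taken from the test files). Per the explanation above, we would expect the `FOO(x)` call to appear in MIR as a Call whose func operand is an Allocated constant, while `direct(x)` uses a ZeroSized one; that expectation comes from the analysis, not from anything this snippet checks at runtime.

```rust
// A const holding a function pointer, produced by coercing a
// non-capturing closure (the pattern shared by the three UI tests).
const FOO: fn(u8) -> u8 = |v: u8| v + 1;

// An ordinary named function for comparison.
fn direct(v: u8) -> u8 {
    v + 1
}

fn run(x: u8) -> (u8, u8) {
    // Direct call: the callee is identified by its zero-sized fn item type.
    let a = direct(x);
    // Call through the const: the callee is the const-evaluated pointer value.
    let b = FOO(x);
    (a, b)
}

fn main() {
    println!("{:?}", run(41));
}
```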

The bug

BodyAnalyzer::visit_terminator (printer.rs, formerly line 1046) had this logic for Call terminators:

Call { func: Constant(ConstOperand { const_: cnst, .. }), .. } => {
    if *cnst.kind() != stable_mir::ty::ConstantKind::ZeroSized {
        return;  // <-- the bug
    }
    let inst = fn_inst_for_ty(cnst.ty(), true)
        .expect("Direct calls to functions must resolve to an instance");
    fn_inst_sym(self.tcx, Some(cnst.ty()), Some(&inst))
}

The intent was clear: if the call target isn't a ZeroSized function constant, skip the link-map resolution (you can't resolve an indirect call to a specific symbol name). The problem is that return exits visit_terminator entirely, which means self.super_terminator(term, loc) on line 1070 never runs.

super_terminator is the MIR visitor's default recursion method. It walks the terminator's operands, which is how visit_mir_const gets called on constants nested inside the terminator. By skipping it, the early return made the entire terminator subtree invisible to every collector: alloc collection, type collection, and span collection all missed everything inside that terminator's operands and arguments.

The AllocIdCollector used by verify_coherence doesn't override visit_terminator at all, so it uses the default implementation, which always calls super_terminator. It therefore recurses into the operands normally, finds the Allocated constant's provenance, and reports the alloc IDs. BodyAnalyzer never sees them because it bailed out before recursing. Hence the coherence violation: IDs in the stored body that aren't in the alloc map.

Why only these three tests?

The pattern requires a Call terminator whose function operand is a non-ZeroSized constant. This means: (a) a function pointer stored in a const item or associated const, (b) used as the direct call target in MIR. Most function calls in Rust use ZeroSized constants (the function's type alone identifies it); you only get Allocated call targets when the callee is computed from a const-evaluated value. This is relatively uncommon, which is why only 3 out of 75+ UI tests triggered the bug.

The fix

Replace return with None:

Call { func: Constant(ConstOperand { const_: cnst, .. }), .. } => {
    if *cnst.kind() != stable_mir::ty::ConstantKind::ZeroSized {
        None
    } else {
        let inst = fn_inst_for_ty(cnst.ty(), true)
            .expect("Direct calls to functions must resolve to an instance");
        fn_inst_sym(self.tcx, Some(cnst.ty()), Some(&inst))
    }
}

Now the match arm produces None for the link-map entry (no symbol to record for an indirect call), but execution falls through to update_link_map (which is a no-op for None) and then to self.super_terminator(term, loc), which recurses normally into the terminator's operands. Alloc collection, type collection, and span collection all proceed as expected.
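The control-flow difference between `return` and `None` can be reproduced with a toy visitor. Everything here is hypothetical and heavily simplified: ConstKind, Terminator, and the Visitor trait only mimic the MIR visitor convention that visit_* hooks default to super_* recursion. Checker plays the role of AllocIdCollector (no visit_terminator override), and Analyzer's `fixed` flag exists only so both versions of the match arm can be compared side by side.

```rust
use std::collections::BTreeSet;

// Toy constant kinds: only what is needed to show the bug.
enum ConstKind {
    ZeroSized,
    Allocated { alloc_ids: Vec<u64> },
}

// Toy Call terminator: a function operand plus arguments.
struct Terminator {
    func: ConstKind,
    args: Vec<ConstKind>,
}

// visit_* hooks default to super_*, and super_* recurses into operands.
trait Visitor {
    fn visit_const(&mut self, _c: &ConstKind) {}
    fn visit_terminator(&mut self, t: &Terminator) {
        self.super_terminator(t);
    }
    fn super_terminator(&mut self, t: &Terminator) {
        self.visit_const(&t.func);
        for a in &t.args {
            self.visit_const(a);
        }
    }
}

fn allocs_of(c: &ConstKind, out: &mut BTreeSet<u64>) {
    if let ConstKind::Allocated { alloc_ids } = c {
        out.extend(alloc_ids.iter().copied());
    }
}

// Like AllocIdCollector: no visit_terminator override, so it always recurses.
#[derive(Default)]
struct Checker {
    ids: BTreeSet<u64>,
}

impl Visitor for Checker {
    fn visit_const(&mut self, c: &ConstKind) {
        allocs_of(c, &mut self.ids);
    }
}

#[derive(Default)]
struct Analyzer {
    fixed: bool,
    ids: BTreeSet<u64>,
    links: Vec<&'static str>,
}

impl Visitor for Analyzer {
    fn visit_const(&mut self, c: &ConstKind) {
        allocs_of(c, &mut self.ids);
    }
    fn visit_terminator(&mut self, t: &Terminator) {
        let sym = match t.func {
            ConstKind::ZeroSized => Some("resolved_fn"),
            ConstKind::Allocated { .. } if !self.fixed => return, // the bug
            ConstKind::Allocated { .. } => None,                  // the fix
        };
        // Stand-in for update_link_map: a no-op for None.
        if let Some(s) = sym {
            self.links.push(s);
        }
        self.super_terminator(t);
    }
}

fn main() {
    // A call through a const fn pointer: func and arg both carry provenance.
    let term = Terminator {
        func: ConstKind::Allocated { alloc_ids: vec![1] },
        args: vec![ConstKind::Allocated { alloc_ids: vec![2] }],
    };
    let mut checker = Checker::default();
    checker.visit_terminator(&term);
    for fixed in [false, true] {
        let mut analyzer = Analyzer { fixed, ..Default::default() };
        analyzer.visit_terminator(&term);
        let missing: Vec<_> = checker.ids.difference(&analyzer.ids).collect();
        println!("fixed={fixed}: missing {missing:?}");
    }
}
```

With `fixed = false`, the checker sees alloc ids 1 and 2 while the analyzer sees none, which is the same kind of mismatch verify_coherence reported; with `fixed = true` the two sets agree and the indirect call still records no link-map entry.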

Why verify_coherence was the right tool here

This bug predates the declarative pipeline work; it's been present since the original BodyAnalyzer implementation. It was never caught because the old code didn't have a coherence check, and the missing allocations only affect programs with const-evaluated function pointer calls (an unusual but valid pattern).

The AllocMap coherence check, introduced as part of the pipeline restructuring, made this immediately visible: it walks the stored bodies independently of BodyAnalyzer, compares the alloc IDs it finds against the alloc map, and asserts on any mismatch. The assertion message names the specific missing AllocIds, which pointed directly at the gap. Without coherence checking, these programs would have produced silently incomplete JSON output (missing allocations, missing types, missing spans for everything inside the affected terminators).

@dkcumming dkcumming self-assigned this Feb 26, 2026
@dkcumming
Collaborator

This is honestly fantastic work, and it was great to go over it all and see the improvements! I think the only thing left to do is update passing.tsv and failing.tsv for the UI test suite.

So, the context: 9a78109 ("Avoid inst.body() duplicate call") fixed
the immediate alloc-id mismatch by carrying collected items forward
instead of re-fetching them. That was the right call, but it left the
three-phase pipeline structure intact (mk_item, then
collect_unevaluated_constant_items, then collect_interned_values).
Each phase could still freely call inst.body() or other side-effecting
rustc queries, and nothing in the types prevented it.

The fix for this is to restructure the pipeline so side-effecting rustc
queries are confined to a single function (mk_item), and everything
downstream operates on pre-collected data:

  collect_and_analyze_items(HashMap<String, Item>)
    -> (CollectedCrate, DerivedInfo)
  assemble_smir(CollectedCrate, DerivedInfo) -> SmirJson

CollectedCrate holds items and unevaluated consts (the output of rustc
interaction). DerivedInfo holds calls, allocs, types, and spans (the
output of body analysis). assemble_smir takes both by value and does
pure data transformation; it structurally cannot call inst.body()
because it has no MonoItem or Instance to call it on. That's the whole
point: if you can't reach the query, you can't accidentally call it.

The two body-walking visitors (InternedValueCollector and
UnevaluatedConstCollector) are merged into a single BodyAnalyzer that
walks each body exactly once. The fixpoint loop for transitive
unevaluated constant discovery is integrated: when BodyAnalyzer finds
an unevaluated const, it records it; the outer loop creates the new
Item (the one place inst.body() is called) and enqueues it.

But what about catching regressions? Turns out the existing integration
tests normalize away alloc_ids (via the jq filter), so they can't
catch this class of bug at all. AllocMap replaces the bare
HashMap<AllocId, ...> with a newtype that, under
#[cfg(debug_assertions)], tracks every insertion and flags duplicates.
After collect/analyze completes, verify_coherence walks every stored
Item body and asserts that each referenced AllocId exists in the map.
This catches both "walked a stale body" (missing ids) and "walked the
same body twice" (duplicate insertions) at dev time; zero cost in
release builds.

A few other cleanups that fell out of this: static items now store
their body in MonoItemKind::MonoItemStatic (collected once in mk_item),
so the analysis phase never goes back to rustc for static bodies.
get_item_details takes the pre-collected body as a parameter instead of
calling inst.body() independently. The items_clone HashMap is replaced
by a HashSet of original item names (which is all the static fixup
actually needed).

Deleted: InternedValueCollector, UnevaluatedConstCollector,
collect_interned_values, collect_unevaluated_constant_items, the
InternedValues type alias, and items_clone.

All 28 integration tests produce identical output.

Begin formal version tracking at 0.2.0. The changelog covers all
notable changes since the initial commit, with PR references. Also
includes a cargo fmt pass on printer.rs.

The same check already runs as the first step inside assemble_smir,
which is the function that actually consumes the data. No mutation
happens between the two call sites, so the one in collect_smir was
redundant.

The fixup block added statics discovered through allocation provenance
that weren't in the original mono item set. It was broken in two ways:
it violated the collection/assembly phase boundary (calling mk_item
after verify_coherence had already run), and it misclassified statics
as MonoItem::Fn, losing their eval_initializer() data.

The block never triggered across the full integration test suite.

If a genuine missing-static scenario exists, verify_coherence will now
catch it: it walks every stored body, extracts AllocIds from provenance,
and asserts each one exists in the alloc map. This produces a clear,
actionable assertion (naming the specific missing AllocIds) rather than
silently emitting a misclassified item.

Also removes the now-dead original_item_names field from CollectedCrate
and the unused AllocMap::iter method.

BodyAnalyzer::visit_terminator had an early `return` when a Call
terminator's function operand was a non-ZeroSized constant (i.e., an
indirect call through a const-evaluated function pointer). The intent
was to skip link-map resolution for indirect calls, but `return` exited
the entire method, skipping self.super_terminator(). That meant the MIR
visitor never recursed into the terminator's operands, so collect_alloc,
the type collector, and the span collector all missed everything inside
that terminator.

The bug has been present since aff2dd0 ("Map function types to names
and update output format", July 2024) and affects programs that call
through function pointers stored in constants (e.g., `const ID: fn(...)
= |s| s;` used as a call target). Three UI tests hit this pattern:
issue-58435-ice-with-assoc-const, closure-to-fn-coercion, and
ufcs-polymorphic-paths.

Fix: replace `return` with `None` so the match arm produces no link-map
entry but falls through to super_terminator for normal recursion.
Caught by verify_coherence, which walks bodies independently of
BodyAnalyzer and found alloc IDs that the analyzer never collected.
Collaborator

@dkcumming dkcumming left a comment


Fantastic!

@dkcumming dkcumming merged commit cab07e2 into runtimeverification:master Feb 26, 2026
5 checks passed
@cds-amal cds-amal deleted the dc/declarative-spike branch February 27, 2026 01:29
@ZEINO2022

You have done a wonderful, systematic, and proper job.

cds-amal added a commit to cds-rs/stable-mir-json that referenced this pull request Mar 1, 2026
…ntimeverification#124, runtimeverification#126, runtimeverification#127

Several merged PRs were missing from the changelog or lacked PR links:

- runtimeverification#127: mutability field on PtrType/RefType in TypeMetadata
- runtimeverification#124: ADR-001 (index-first graph architecture)
- runtimeverification#121: existing entries for 3-phase pipeline, AllocMap coherence, and
  dead fixup removal now link to the PR
- runtimeverification#126: existing entries for UI test runner rewrite and provenance
  resolution fixes now link to the PR