bugc: preserve invoke/return contexts through optimizer by gnidan · Pull Request #210 · ethdebug/format

gnidan · 2026-04-16T06:40:46Z

Adds packages/bugc/src/evmgen/optimizer-contexts.test.ts — a suite that compiles the same source at levels 0, 1, 2, 3 and asserts:

The bytecode still runs correctly end-to-end.
Invoke contexts (caller JUMP + callee JUMPDEST) and return contexts (continuation JUMPDEST) are present with the expected identifiers.

Also updates bugc's tail call optimization pass to preserve invoke/return debug contexts, which the original test suite identified as a gap.

Coverage

Every pass that could touch call sites or returns:

Level 1: constant folding, constant propagation, DCE
Level 2: CSE, tail call optimization, jump optimization
Level 3: block merging, return merging, read/write merging

Scenarios: simple call, constant-foldable args, multiple call sites, non-tail recursion, mutual recursion, nested calls, multiple returns of the same value (return-merging candidate), and tail-recursive call.

TCO debug context preservation

The TCO pass used to drop both invoke and return contexts on the recursive call — a deeply recursive program looked like one giant loop to the debugger and the logical call stack was lost.

Now the TCO back-edge JUMP carries a gather context with BOTH:

return: the previous iteration's return
invoke: the new iteration's call

Depth stays constant across the JUMP — one frame pops, one pushes, on the same instruction. The function's terminal RETURN pops the final iteration's frame normally. This models source-level semantics rather than the optimized control flow.
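The constant-depth claim can be checked with a small sketch of how a debugger might maintain a logical call stack from these contexts. The `Leaf` and `Context` shapes and the `applyContext` helper below are hypothetical simplifications, not the format's actual types, and applying returns before invokes inside a gather is this sketch's own choice.

```typescript
// Hypothetical, simplified context shapes (assumption: the real format
// types carry more fields, such as declaration and code target).
type Leaf =
  | { invoke: { identifier: string } }
  | { return: { identifier: string } };
type Context = Leaf | { gather: Context[] };

// Collect the invoke/return leaves of a context, descending into gather.
function leavesOf(context: Context): Leaf[] {
  if (!("gather" in context)) return [context];
  const result: Leaf[] = [];
  for (const child of context.gather) result.push(...leavesOf(child));
  return result;
}

// Apply one instruction's context to a logical call stack.
// Returns pop first, then invokes push; gather itself is unordered,
// so this ordering is a decision made by the sketch.
function applyContext(stack: string[], context: Context): string[] {
  const leaves = leavesOf(context);
  const next = [...stack];
  for (const leaf of leaves) {
    if ("return" in leaf) next.pop();
  }
  for (const leaf of leaves) {
    if ("invoke" in leaf) next.push(leaf.invoke.identifier);
  }
  return next;
}

// "count" is a made-up tail-recursive function name.
const initialCall: Context = { invoke: { identifier: "count" } };
const backEdge: Context = {
  gather: [
    { return: { identifier: "count" } },
    { invoke: { identifier: "count" } },
  ],
};
const terminalReturn: Context = { return: { identifier: "count" } };

const afterCall = applyContext([], initialCall);
const afterBackEdge = applyContext(afterCall, backEdge);
const afterReturn = applyContext(afterBackEdge, terminalReturn);
// Stack depths: 1 after the call, 1 after the back-edge, 0 after the RETURN.
```

The back-edge leaves the depth at 1 because the pop and the push land on the same instruction, and the terminal RETURN then empties the stack.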

A future transform: tailcall marker will annotate these JUMPs as TCO-produced, letting debuggers optionally render the optimization-aware view.

Implementation:

Adds Block.TailCall metadata to the jump terminator IR, populated by TCO when replacing a call terminator with a jump to the loop header.
Jump codegen attaches a gather context (return + invoke) to the JUMP when tailCall metadata is present.
patchInvokeTarget now walks gather contexts so the invoke leaf's placeholder code offset gets resolved from the function registry.
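As a rough illustration of the last point, a recursive walk over composed contexts might look like the following. This is a sketch under assumptions: the context shapes, the `offset: null` placeholder convention, the `Map`-based registry, and the name `resolveInvokeTargets` are all hypothetical and do not reflect bugc's actual code.

```typescript
// Hypothetical shapes for illustration only.
type InvokeContext = {
  invoke: { identifier: string; target: { offset: number | null } };
};
type ReturnContext = { return: { identifier: string } };
type Context = InvokeContext | ReturnContext | { gather: Context[] };

// Resolve placeholder code offsets on every invoke leaf,
// descending into gather contexts at any nesting depth.
function resolveInvokeTargets(
  context: Context,
  registry: Map<string, number>,
): Context {
  if ("gather" in context) {
    return {
      gather: context.gather.map((child) =>
        resolveInvokeTargets(child, registry),
      ),
    };
  }
  if ("invoke" in context) {
    const offset = registry.get(context.invoke.identifier);
    // Unknown function: leave the placeholder untouched.
    if (offset === undefined) return context;
    return { invoke: { ...context.invoke, target: { offset } } };
  }
  return context; // return contexts carry no code target to patch
}

// A TCO back-edge context whose invoke leaf still has a placeholder offset.
const unpatched: Context = {
  gather: [
    { return: { identifier: "f" } },
    { invoke: { identifier: "f", target: { offset: null } } },
  ],
};
const patched = resolveInvokeTargets(unpatched, new Map([["f", 42]]));
```

Because the walk recurses on every gather, nested compositions are handled without a TCO-specific branch.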

Findings from the optimizer survey

All other passes preserve contexts. Return-merging and block-merging do not drop return contexts because those contexts are emitted at continuation JUMPDESTs (whose presence is driven by the IR call terminator, which these passes don't touch).
No function inlining pass exists in bugc today, so no "inlined call loses its context" concern for the upcoming transform context design.

Adds a behavioral test suite that compiles a set of source patterns at every optimization level (0, 1, 2, 3) and:

- asserts the bytecode still runs correctly end-to-end
- counts invoke/return contexts by instruction type and function identifier, then asserts the expected shape

Covers every pass that could touch call sites or returns:

  L1: constant folding, propagation, DCE
  L2: CSE, TCO, jump optimization
  L3: block merging, return merging, R/W merging

Confirms that only tail call optimization eliminates contexts (by design: the tail call becomes a jump). All other transformations preserve invoke/return contexts across levels for simple calls, nested calls, mutual recursion, non-tail self-recursion, and multi-path returns of the same value.

This is groundwork for the transform context spec.

github-actions · 2026-04-16T06:45:17Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-16 07:48 UTC

TCO replaces a tail-recursive call terminator with a jump to the function's loop header. Previously this dropped the invoke debug context, so the recursive call became invisible to debuggers: a deeply recursive program looked like one giant loop with no logical call stack.

Now the TCO pass records a TailCall metadata block on the replacement jump terminator, and codegen attaches an invoke debug context to the generated JUMP. The context mirrors the normal caller-JUMP invoke: identity + declaration + code target, no argument pointers. patchInvokeTarget resolves the placeholder code offset from the function registry the same way it does for regular calls.

No matching return context is emitted for the TCO'd call: the tail call folds into the outer activation's return, and a future transform: tailcall marker will let the debugger reconcile the missing return when the outer function eventually returns and pops all accumulated tail frames at once.

Updates the optimizer-contexts test suite to assert the preserved invoke is present at levels 2 and 3, and that the return context intentionally does not duplicate.

Refines the TCO debug-context fix: the back-edge JUMP now carries a gather context with BOTH the previous iteration's return and the new iteration's invoke. Depth stays constant across the JUMP: one frame pops, one pushes, on the same instruction. The function's terminal RETURN then pops the final iteration's frame normally.

This models source-level semantics rather than the optimized control flow: the debugger's logical call stack matches what the programmer wrote, and transform: tailcall markers (future work) can annotate these JUMPs as TCO-produced.

Also fixes patchInvokeTarget to walk into gather contexts so the invoke leaf's placeholder code offset gets resolved from the function registry.

Test helper countCallSites updated to unwrap gather contexts and count (invoke, return) pairs on JUMPs separately from the traditional JUMPDEST buckets.
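The counting approach described for the test helper can be sketched roughly as follows. The instruction and context shapes, the bucket names, and `countCallSitesSketch` itself are assumptions made for illustration; the real countCallSites helper in the test suite may be organized differently.

```typescript
// Hypothetical shapes; the suite's real instruction and context types differ.
type Leaf =
  | { invoke: { identifier: string } }
  | { return: { identifier: string } };
type Context = Leaf | { gather: Context[] };
interface Instruction {
  mnemonic: string;
  context?: Context;
}
interface CallSiteCounts {
  invokeOnJump: number; // caller JUMP
  invokeOnJumpdest: number; // callee entry JUMPDEST
  returnOnJumpdest: number; // continuation JUMPDEST
  tailCallPairsOnJump: number; // TCO back-edge: return + invoke together
}

// Unwrap gather contexts down to their invoke/return leaves.
function leavesOf(context: Context): Leaf[] {
  if (!("gather" in context)) return [context];
  const result: Leaf[] = [];
  for (const child of context.gather) result.push(...leavesOf(child));
  return result;
}

function countCallSitesSketch(
  instructions: Instruction[],
  identifier: string,
): CallSiteCounts {
  const counts: CallSiteCounts = {
    invokeOnJump: 0,
    invokeOnJumpdest: 0,
    returnOnJumpdest: 0,
    tailCallPairsOnJump: 0,
  };
  for (const { mnemonic, context } of instructions) {
    if (!context) continue;
    const leaves = leavesOf(context);
    const hasInvoke = leaves.some(
      (leaf) => "invoke" in leaf && leaf.invoke.identifier === identifier,
    );
    const hasReturn = leaves.some(
      (leaf) => "return" in leaf && leaf.return.identifier === identifier,
    );
    if (mnemonic === "JUMP" && hasInvoke && hasReturn) {
      counts.tailCallPairsOnJump++; // counted apart from plain invokes
    } else if (mnemonic === "JUMP" && hasInvoke) {
      counts.invokeOnJump++;
    } else if (mnemonic === "JUMPDEST") {
      if (hasInvoke) counts.invokeOnJumpdest++;
      if (hasReturn) counts.returnOnJumpdest++;
    }
  }
  return counts;
}

// A made-up instruction stream for a tail-recursive function "f".
const sample: Instruction[] = [
  { mnemonic: "PUSH1" },
  { mnemonic: "JUMP", context: { invoke: { identifier: "f" } } },
  { mnemonic: "JUMPDEST", context: { invoke: { identifier: "f" } } },
  {
    mnemonic: "JUMP",
    context: {
      gather: [
        { return: { identifier: "f" } },
        { invoke: { identifier: "f" } },
      ],
    },
  },
  { mnemonic: "JUMPDEST", context: { return: { identifier: "f" } } },
];
const sampleCounts = countCallSitesSketch(sample, "f");
```

Keeping the (invoke, return) pair in its own bucket lets a test assert that TCO produced back-edge pairs without inflating the ordinary caller-JUMP count.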

gnidan

Review from architect (schema/format side).

gather-of-return-and-invoke: yes, semantically right

The "all contexts apply simultaneously" claim holds. Under
"following execution" semantics, after the back-edge JUMP both
facts are true: iteration N has returned, iteration N+1 has been
invoked. That's a conjunction, which is what gather means.

One subtlety worth noting in the commit message (not blocking):
there's an implicit ordering (return happens before invoke), but
gather expresses only conjunction, not sequence. That's fine —
the order is recoverable from the surrounding trace (prior
iteration's body preceded, new iteration's body follows) and the
future transform: tailcall marker will disambiguate further.
The gather construct itself doesn't need to encode order.

The shared machine state between the two contexts also works out:
the invoke has no argument pointers, and the stack layout at the
JUMP trace step is already the layout a normal callee-entry
JUMPDEST would see (return address + new args, destination
already popped). So the invoke's identity + target is accurate
for that state. No conflict.

stack slot 0 placeholder for return data: not OK, schema needs to change

This is the real issue. Per return.schema.yaml, data is
required:

required:
  - data

And per the TS types, data: Function.PointerRef is non-optional.
The PR satisfies the constraint by pointing at stack slot 0, but
at a TCO back-edge JUMP slot 0 is the new iteration's first
argument (or the return address, depending on the setup) — not
the intermediate return value, which doesn't materialize. A
debugger that follows this pointer gets a wrong answer labeled as
the return value.

The right fix is making return.data optional, not working
around it at the bytecode level. Rationale:

- TCO is a legitimate case where a return semantically happens
  but no value is observable at that point. The schema should
  admit this rather than force compilers to lie.
- Precedent: revert.schema.yaml already makes reason and
  panic optional on the same grounds: "a bare revert: {} is
  permitted when the compiler knows a revert occurred but has
  no further detail."
- Other legitimate use cases will emerge:
  - void functions (no return value to point at)
  - compiler-lost precision (return happened but tracking
    dropped)
  - optimized returns where the value lives in a register
    already consumed by the subsequent instruction
The change is small:

- schemas/program/context/function/return.schema.yaml: remove
  data from required.
- packages/format/src/types/program/context.ts: change
  data: Function.PointerRef to data?: Function.PointerRef
  and adjust the guard.
- Add a worked example to the schema for a no-data return.
- Update revert.mdx-style docs if applicable (probably the
  return doc page needs a mirror of revert's "Field optionality"
  section).
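To make the guard adjustment concrete, here is a minimal sketch of a type guard that accepts a return context with or without data. The `PointerRef` and `ReturnContext` shapes are hypothetical stand-ins (including the assumption that a pointer reference is an object with a `pointer` field), not the actual definitions in packages/format.

```typescript
// Hypothetical stand-ins for the format's types.
interface PointerRef {
  pointer: unknown;
}
interface ReturnContext {
  return: {
    identifier?: string;
    data?: PointerRef; // optional: a bare return carries no value pointer
  };
}

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isPointerRef(value: unknown): value is PointerRef {
  return isObject(value) && "pointer" in value;
}

// Accepts a missing `data`, but still rejects a malformed one.
function isReturn(value: unknown): value is ReturnContext {
  if (!isObject(value)) return false;
  const inner = value.return;
  if (!isObject(inner)) return false;
  const data = inner.data;
  return data === undefined || isPointerRef(data);
}

const bareReturn = { return: { identifier: "f" } };
const withData = {
  return: { identifier: "f", data: { pointer: { location: "stack", slot: 0 } } },
};
const malformed = { return: { identifier: "f", data: 42 } };
// isReturn(bareReturn) and isReturn(withData) hold; isReturn(malformed) does not.
```

The point of the sketch is that optionality only relaxes the presence check: when data is supplied, it is still validated.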

Once that lands, this PR can drop the stack-slot-0 placeholder
and emit just:

const returnCtx: Format.Program.Context.Return = {
  return: {
    identifier: tailCall.function,
    ...(declaration ? { declaration } : {}),
  },
};

Happy to open that schema change as a separate PR ahead of this
one landing, or bundle it in here — your call.

Other observations

Not blocking, but worth noting:

- The gather context's order is [returnCtx, invoke] in the
  code. Since gather is an unordered conjunction per its schema,
  this works either way, but putting return before invoke reads
  naturally ("pop, then push"). Good choice.
- A future transform: tailcall marker sitting alongside these
  two in the gather would be the ideal final shape. Design
  already leaves room for that.
- The patchInvokeTarget walking into gather contexts to resolve
  the invoke leaf is a good generalization: it means any
  composed context shape will work without special-casing TCO.

Approving the design direction. The data-optional schema fix is
the only substantive change needed before this should merge — the
current placeholder produces a subtly wrong debug trace.

Per the format change in #211 making `return.data` optional, the TCO back-edge JUMP now emits a bare return context (identifier + declaration only). The stack-slot-0 placeholder was semantically wrong anyway — that slot holds the new iteration's first argument, not the previous iteration's return value. TCO doesn't materialize the intermediate return value at all; the actual return happens at the function's terminal RETURN.

Adds a new context type annotating instructions with the compiler transformations that produced them. The value is an array of short identifiers; the list may repeat the same identifier when the transformation has been applied multiple times (e.g., ["inline", "inline"] for doubly-inlined code).

Transform is *additional* annotation. The invoke/return contexts for the logical call are still emitted at the call boundary so debuggers see the source-level call stack; the transform context tells debuggers how the call was physically realized. Consumers that ignore transform contexts get a sound source-level view from the semantic contexts alone.

v1 identifiers:

- "inline": marked instruction is part of an inlined function body; surrounding invoke/return contexts name the inlined callee.
- "tailcall": marked instruction is a tail-call-optimized back-edge JUMP or continuation, where the call was realized without pushing/popping a full activation.

The identifier set is extensible. Debuggers unfamiliar with a given identifier should preserve it as an opaque label. Order in the array is not semantically significant; the multiset is what matters.

Unblocks the final shape of TCO back-edge annotations in bugc (#210): a tail-call-optimized JUMP can now carry `gather: [return, invoke, transform: ["tailcall"]]`.

Includes:

- schemas/program/context/transform.schema.yaml
- schemas/program/context.schema.yaml: wire into the if/$ref union.
- packages/format/src/types/program/context.ts: Context.Transform interface, isTransform guard, and Transform.Identifier union preserving autocomplete for known values.
- packages/format/src/types/program/context.test.ts: register Context.isTransform with the schema guard test harness.
- packages/web/spec/program/context/transform.mdx: spec page covering role, v1 identifiers, repetition/composition, and interaction with gather.
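A rough sketch of what an extensible identifier union, a guard, and a multiset view could look like in TypeScript follows. The names `TransformIdentifier`, `Transform`, and `transformCounts` are illustrative assumptions, and the `string & {}` idiom is one common way to keep editor autocomplete for known literals; the definitions that land in #212 may differ.

```typescript
// Known v1 identifiers plus any other string. The `string & {}` member
// keeps the literal suggestions from collapsing into plain `string`.
type TransformIdentifier = "inline" | "tailcall" | (string & {});

interface Transform {
  transform: TransformIdentifier[];
}

// Structural guard: an object whose `transform` is an array of strings.
function isTransform(value: unknown): value is Transform {
  if (typeof value !== "object" || value === null) return false;
  const list = (value as { transform?: unknown }).transform;
  return Array.isArray(list) && list.every((item) => typeof item === "string");
}

// Multiset view: array order is not significant, repetition is.
function transformCounts(context: Transform): Map<string, number> {
  const counts = new Map<string, number>();
  for (const identifier of context.transform) {
    counts.set(identifier, (counts.get(identifier) ?? 0) + 1);
  }
  return counts;
}

const doublyInlined: Transform = { transform: ["inline", "inline"] };
// "future-pass" is a made-up identifier standing in for an unknown label.
const mixed: Transform = { transform: ["tailcall", "future-pass"] };
const inlineCount = transformCounts(doublyInlined).get("inline"); // 2
```

Unknown identifiers pass the guard and survive as opaque labels, which matches the extensibility rule stated in the commit message.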

gnidan added 2 commits April 16, 2026 02:55

gnidan changed the title ~~bugc: verify optimizer preserves invoke/return contexts~~ bugc: preserve invoke/return contexts through optimizer Apr 16, 2026

gnidan mentioned this pull request Apr 16, 2026

format: make return.data optional #211

Merged

4 tasks

gnidan added 2 commits April 16, 2026 03:21

Merge remote-tracking branch 'origin/main' into compiler-optimizer-in…

eb9ae9a

…voke-tests

gnidan merged commit 937e348 into main Apr 16, 2026
4 checks passed

gnidan deleted the compiler-optimizer-invoke-tests branch April 16, 2026 07:44

gnidan mentioned this pull request Apr 16, 2026

format: add transform context for compiler optimizations #212

Open

4 tasks

gnidan merged 5 commits into main from compiler-optimizer-invoke-tests
