
bugc: preserve invoke/return contexts through optimizer#210

Merged
gnidan merged 5 commits into main from compiler-optimizer-invoke-tests on Apr 16, 2026

gnidan commented Apr 16, 2026

Adds packages/bugc/src/evmgen/optimizer-contexts.test.ts — a suite that compiles the same source at levels 0, 1, 2, 3 and asserts:

  • The bytecode still runs correctly end-to-end.
  • Invoke contexts (caller JUMP + callee JUMPDEST) and return contexts (continuation JUMPDEST) are present with the expected identifiers.

Also updates bugc's tail call optimization pass to preserve invoke/return debug contexts, which the original test suite identified as a gap.

Coverage

Every pass that could touch call sites or returns:

  • Level 1: constant folding, constant propagation, DCE
  • Level 2: CSE, tail call optimization, jump optimization
  • Level 3: block merging, return merging, read/write merging

Scenarios: simple call, constant-foldable args, multiple call sites, non-tail recursion, mutual recursion, nested calls, multiple returns of the same value (return-merging candidate), and tail-recursive call.

TCO debug context preservation

The TCO pass used to drop both invoke and return contexts on the recursive call — a deeply recursive program looked like one giant loop to the debugger and the logical call stack was lost.

Now the TCO back-edge JUMP carries a gather context with BOTH:

  • return: the previous iteration's return
  • invoke: the new iteration's call

Depth stays constant across the JUMP — one frame pops, one pushes, on the same instruction. The function's terminal RETURN pops the final iteration's frame normally. This models source-level semantics rather than the optimized control flow.
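To make the depth claim concrete, here is a minimal sketch of how a debugger might replay these contexts onto a logical call stack. The context shapes are simplified stand-ins (not the real format types). Applying a gather's return leaves before its invoke leaves is an assumption of this sketch, because gather itself carries no order.

```typescript
// Simplified stand-ins for the format's context types (assumption, not the real schema).
type Invoke = { invoke: { identifier: string } };
type Return = { return: { identifier: string } };
type Context = Invoke | Return | { gather: Context[] };

// Flatten (possibly nested) gather contexts into their leaf contexts.
function leaves(ctx: Context): Array<Invoke | Return> {
  return "gather" in ctx ? ctx.gather.flatMap(leaves) : [ctx];
}

// Apply one instruction's context to the logical call stack.
// Gather is an unordered conjunction, so this sketch applies all return
// leaves (pops) before all invoke leaves (pushes), whatever the array order.
function step(stack: readonly string[], ctx?: Context): string[] {
  const next = [...stack];
  if (!ctx) return next;
  const ls = leaves(ctx);
  for (const l of ls) {
    if (!("return" in l)) continue;
    const top = next.pop();
    if (top !== l.return.identifier) {
      throw new Error(`return from ${l.return.identifier}, but top frame is ${top}`);
    }
  }
  for (const l of ls) {
    if ("invoke" in l) next.push(l.invoke.identifier);
  }
  return next;
}

// A TCO back-edge JUMP: the previous iteration returns, the next is invoked.
const backEdge: Context = {
  gather: [{ return: { identifier: "fact" } }, { invoke: { identifier: "fact" } }],
};

let stack = step(["main"], { invoke: { identifier: "fact" } }); // original call
const depths: number[] = [];
for (let i = 0; i < 3; i++) {
  stack = step(stack, backEdge); // one pop + one push on the same instruction
  depths.push(stack.length);
}
stack = step(stack, { return: { identifier: "fact" } }); // terminal RETURN
console.log(depths, stack); // depths stay at 2; stack is back to ["main"]
```

The logical stack never grows with the iteration count, yet every iteration still appears as a distinct return/invoke pair that a debugger can show.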

A future transform: tailcall marker will annotate these JUMPs as TCO-produced, letting debuggers optionally render the optimization-aware view.

Implementation:

  • Adds Block.TailCall metadata to the jump terminator IR, populated by TCO when replacing a call terminator with a jump to the loop header.
  • Jump codegen attaches a gather context (return + invoke) to the JUMP when tailCall metadata is present.
  • patchInvokeTarget now walks gather contexts so the invoke leaf's placeholder code offset gets resolved from the function registry.
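The gather-walking fix in patchInvokeTarget can be pictured as a recursive tree walk. This is a minimal sketch, not bugc's code. The context shapes, the -1 placeholder sentinel, and the Map-based function registry are all assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes; bugc's real types and the real
// patchInvokeTarget signature are not reproduced here.
type CodePointer = { location: "code"; offset: number };
type InvokeCtx = { invoke: { identifier: string; target: { pointer: CodePointer } } };
type ReturnCtx = { return: { identifier: string } };
type Ctx = InvokeCtx | ReturnCtx | { gather: Ctx[] };

// Sentinel written at codegen time, before function entry offsets are known (assumption).
const PLACEHOLDER = -1;

// Resolve placeholder invoke targets anywhere in a context tree,
// descending into gather contexts instead of only checking the top level.
function patchInvokeTarget(ctx: Ctx, registry: ReadonlyMap<string, number>): Ctx {
  if ("gather" in ctx) {
    return { gather: ctx.gather.map((child) => patchInvokeTarget(child, registry)) };
  }
  if (!("invoke" in ctx)) return ctx; // return leaves have no code target
  const { identifier, target } = ctx.invoke;
  if (target.pointer.offset !== PLACEHOLDER) return ctx; // already resolved
  const offset = registry.get(identifier);
  if (offset === undefined) throw new Error(`no entry offset for function ${identifier}`);
  return { invoke: { identifier, target: { pointer: { location: "code", offset } } } };
}

// Usage: a TCO back-edge JUMP whose invoke leaf sits inside a gather.
const registry = new Map<string, number>([["fact", 0x2a]]);
const jumpCtx: Ctx = {
  gather: [
    { return: { identifier: "fact" } },
    { invoke: { identifier: "fact", target: { pointer: { location: "code", offset: PLACEHOLDER } } } },
  ],
};
console.log(JSON.stringify(patchInvokeTarget(jumpCtx, registry)));
```

Because the walk recurses on any gather, it does not need to know that the caller was TCO; any composed context shape gets its invoke leaves resolved.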

Findings from the optimizer survey

  • All other passes preserve contexts. Return-merging and block-merging do not drop return contexts because those contexts are emitted at continuation JUMPDESTs (whose presence is driven by the IR call terminator, which these passes don't touch).
  • No function inlining pass exists in bugc today, so no "inlined call loses its context" concern for the upcoming transform context design.

Adds a behavioral test suite that compiles a set of source
patterns at every optimization level (0, 1, 2, 3) and:
  - asserts the bytecode still runs correctly end-to-end
  - counts invoke/return contexts by instruction type and
    function identifier, then asserts the expected shape

Covers every pass that could touch call sites or returns:
  L1: constant folding, propagation, DCE
  L2: CSE, TCO, jump optimization
  L3: block merging, return merging, R/W merging

Confirms that only tail call optimization eliminates
contexts (by design — the tail call becomes a jump). All
other transformations preserve invoke/return contexts
across levels for simple calls, nested calls, mutual
recursion, non-tail self-recursion, and multi-path returns
of the same value.

This is groundwork for the transform context spec.
github-actions bot commented Apr 16, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-16 07:48 UTC

gnidan added 2 commits April 16, 2026 02:55
TCO replaces a tail-recursive call terminator with a jump
to the function's loop header. Previously this dropped the
invoke debug context, so the recursive call became
invisible to debuggers — a deeply recursive program looked
like one giant loop with no logical call stack.

Now the TCO pass records a TailCall metadata block on the
replacement jump terminator, and codegen attaches an invoke
debug context to the generated JUMP. The context mirrors
the normal caller-JUMP invoke: identity + declaration +
code target, no argument pointers. patchInvokeTarget
resolves the placeholder code offset from the function
registry the same way it does for regular calls.

No matching return context is emitted for the TCO'd call —
the tail call folds into the outer activation's return, and
a future transform: tailcall marker will let the debugger
reconcile the missing return when the outer function
eventually returns and pops all accumulated tail frames at
once.

Updates the optimizer-contexts test suite to assert the
preserved invoke is present at levels 2 and 3, and that the
return context intentionally does not duplicate.

Refines the TCO debug-context fix: the back-edge JUMP now
carries a gather context with BOTH the previous iteration's
return and the new iteration's invoke. Depth stays constant
across the JUMP — one frame pops, one pushes, on the same
instruction. The function's terminal RETURN then pops the
final iteration's frame normally.

This models source-level semantics rather than the
optimized control flow: the debugger's logical call stack
matches what the programmer wrote, and transform: tailcall
markers (future work) can annotate these JUMPs as
TCO-produced.

Also fixes patchInvokeTarget to walk into gather contexts
so the invoke leaf's placeholder code offset gets resolved
from the function registry.

Test helper countCallSites updated to unwrap gather
contexts and count (invoke, return) pairs on JUMPs
separately from the traditional JUMPDEST buckets.
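The helper's bucketing might look roughly like the following. This is a hypothetical reconstruction of countCallSites that assumes a flat list of instructions, each with a mnemonic and an optional context. The bucket names are invented here, and the real test helper may be organized differently.

```typescript
// Hypothetical shapes for illustration; the real helper's types may differ.
type Leaf = { invoke: { identifier: string } } | { return: { identifier: string } };
type Context = Leaf | { gather: Context[] };
interface Instruction { mnemonic: string; context?: Context }

interface CallSiteCounts {
  invokeJumps: Map<string, number>;     // caller JUMP carrying an invoke
  invokeJumpdests: Map<string, number>; // callee-entry JUMPDEST carrying an invoke
  returnJumpdests: Map<string, number>; // continuation JUMPDEST carrying a return
  tailCallJumps: Map<string, number>;   // JUMP carrying gather(return, invoke)
}

// Unwrap (possibly nested) gather contexts into leaves.
function leaves(ctx: Context): Leaf[] {
  return "gather" in ctx ? ctx.gather.flatMap(leaves) : [ctx];
}

function bump(bucket: Map<string, number>, id: string): void {
  bucket.set(id, (bucket.get(id) ?? 0) + 1);
}

function countCallSites(instructions: readonly Instruction[]): CallSiteCounts {
  const counts: CallSiteCounts = {
    invokeJumps: new Map(),
    invokeJumpdests: new Map(),
    returnJumpdests: new Map(),
    tailCallJumps: new Map(),
  };
  for (const { mnemonic, context } of instructions) {
    if (!context) continue;
    const ls = leaves(context);
    const invokes = ls.flatMap((l): string[] => ("invoke" in l ? [l.invoke.identifier] : []));
    const returns = ls.flatMap((l): string[] => ("return" in l ? [l.return.identifier] : []));
    if (mnemonic === "JUMP" && invokes.length > 0 && returns.length > 0) {
      // TCO back-edge: count the (invoke, return) pair once, keyed by callee.
      invokes.forEach((id) => bump(counts.tailCallJumps, id));
    } else if (mnemonic === "JUMP") {
      invokes.forEach((id) => bump(counts.invokeJumps, id));
    } else if (mnemonic === "JUMPDEST") {
      invokes.forEach((id) => bump(counts.invokeJumpdests, id));
      returns.forEach((id) => bump(counts.returnJumpdests, id));
    }
  }
  return counts;
}
```

Keeping the TCO pairs in their own bucket lets a test assert that levels 2 and 3 trade one caller JUMP / continuation JUMPDEST pair for one back-edge JUMP without disturbing the other counts.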
gnidan changed the title from "bugc: verify optimizer preserves invoke/return contexts" to "bugc: preserve invoke/return contexts through optimizer" on Apr 16, 2026

gnidan left a comment


Review from architect (schema/format side).

gather-of-return-and-invoke: yes, semantically right

The "all contexts apply simultaneously" claim holds. Under
"following execution" semantics, after the back-edge JUMP both
facts are true: iteration N has returned, iteration N+1 has been
invoked. That's a conjunction, which is what gather means.

One subtlety worth noting in the commit message (not blocking):
there's an implicit ordering (return happens before invoke), but
gather expresses only conjunction, not sequence. That's fine —
the order is recoverable from the surrounding trace (prior
iteration's body preceded, new iteration's body follows) and the
future transform: tailcall marker will disambiguate further.
The gather construct itself doesn't need to encode order.

The shared machine state between the two contexts also works out:
the invoke has no argument pointers, and the stack layout at the
JUMP trace step is already the layout a normal callee-entry
JUMPDEST would see (return address + new args, destination
already popped). So the invoke's identity + target is accurate
for that state. No conflict.

stack slot 0 placeholder for return data: not OK, schema needs to change

This is the real issue. Per return.schema.yaml, data is
required:

```yaml
required:
  - data
```

And per the TS types, data: Function.PointerRef is non-optional.
The PR satisfies the constraint by pointing at stack slot 0, but
at a TCO back-edge JUMP slot 0 is the new iteration's first
argument (or the return address, depending on the setup) — not
the intermediate return value, which doesn't materialize. A
debugger that follows this pointer gets a wrong answer labeled as
the return value.

The right fix is making return.data optional, not working
around it at the bytecode level. Rationale:

  1. TCO is a legitimate case where a return semantically happens
    but no value is observable at that point. The schema should
    admit this rather than force compilers to lie.
  2. Precedent: revert.schema.yaml already makes reason and
    panic optional on the same grounds — "a bare revert: {} is
    permitted when the compiler knows a revert occurred but has
    no further detail."
  3. Other legitimate use cases will emerge:
    • void functions (no return value to point at)
    • compiler-lost precision (return happened but tracking
      dropped)
    • optimized returns where the value lives in a register
      already consumed by the subsequent instruction

The change is small:

  • schemas/program/context/function/return.schema.yaml: remove
    data from required.
  • packages/format/src/types/program/context.ts: change
    data: Function.PointerRef to data?: Function.PointerRef
    and adjust the guard.
  • Add a worked example to the schema for a no-data return.
  • Update revert.mdx-style docs if applicable (probably the
    return doc page needs a mirror of revert's "Field optionality"
    section).
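For the guard adjustment, here is a minimal sketch of a type guard that accepts a bare return context. The PointerRef and ReturnContext shapes below are simplified stand-ins, not the actual definitions in packages/format/src/types/program/context.ts.

```typescript
// Simplified stand-ins; the real Function.PointerRef is richer.
type PointerRef = { pointer: Record<string, unknown> };

interface ReturnContext {
  return: {
    identifier?: string;
    data?: PointerRef; // optional once `data` is dropped from `required`
  };
}

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isPointerRef(value: unknown): value is PointerRef {
  return isObject(value) && isObject(value.pointer);
}

// Guard adjusted for optional data: absent is fine, present must be a pointer.
function isReturn(value: unknown): value is ReturnContext {
  if (!isObject(value)) return false;
  const body = value.return;
  if (!isObject(body)) return false;
  const { identifier, data } = body;
  return (
    (identifier === undefined || typeof identifier === "string") &&
    (data === undefined || isPointerRef(data))
  );
}

console.log(isReturn({ return: { identifier: "fact" } })); // true: bare return
console.log(isReturn({ return: { identifier: "fact", data: 0 } })); // false
```

The guard still rejects a malformed `data` value, so relaxing `required` does not weaken validation of returns that do point at a value.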

Once that lands, this PR can drop the stack-slot-0 placeholder
and emit just:

```typescript
const returnCtx: Format.Program.Context.Return = {
  return: {
    identifier: tailCall.function,
    ...(declaration ? { declaration } : {}),
  },
};
```

Happy to open that schema change as a separate PR ahead of this
one landing, or bundle it in here — your call.

Other observations

Not blocking, but worth noting:

  • The gather context's order is [returnCtx, invoke] in the
    code. Since gather is an unordered conjunction per its schema,
    this works either way, but putting return before invoke reads
    naturally ("pop, then push"). Good choice.
  • A future transform: tailcall marker sitting alongside these
    two in the gather would be the ideal final shape. Design
    already leaves room for that.
  • The patchInvokeTarget walking into gather contexts to resolve
    the invoke leaf is a good generalization — it means any
    composed context shape will work without special-casing TCO.

Approving the design direction. The data-optional schema fix is
the only substantive change needed before this should merge — the
current placeholder produces a subtly wrong debug trace.

gnidan mentioned this pull request Apr 16, 2026
gnidan added 2 commits April 16, 2026 03:21
Per the format change in #211 making `return.data` optional,
the TCO back-edge JUMP now emits a bare return context
(identifier + declaration only). The stack-slot-0
placeholder was semantically wrong anyway — that slot holds
the new iteration's first argument, not the previous
iteration's return value. TCO doesn't materialize the
intermediate return value at all; the actual return happens
at the function's terminal RETURN.
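Putting the pieces together, the back-edge context after this change can be sketched as follows. TailCall, Declaration, and the numeric placeholder target are simplified assumptions modeled on the snippet in the review. The real Block.TailCall metadata and Format types may carry different fields.

```typescript
// Simplified, hypothetical shapes modeled on the review's snippet.
interface Declaration { source: { id: string }; range: { offset: number; length: number } }
interface TailCall { function: string } // stand-in for Block.TailCall metadata

type ReturnCtx = { return: { identifier: string; declaration?: Declaration } };
type InvokeCtx = {
  invoke: { identifier: string; declaration?: Declaration; target: { offset: number } };
};
type GatherCtx = { gather: Array<ReturnCtx | InvokeCtx> };

// Placeholder code offset, resolved later from the function registry (assumption).
const PLACEHOLDER = -1;

function backEdgeContext(tailCall: TailCall, declaration?: Declaration): GatherCtx {
  const decl = declaration ? { declaration } : {};
  // Bare return: identifier (+ declaration), no `data` pointer, because TCO
  // never materializes the intermediate return value.
  const returnCtx: ReturnCtx = { return: { identifier: tailCall.function, ...decl } };
  // Invoke mirrors a normal caller JUMP: identity + declaration + code target,
  // no argument pointers.
  const invokeCtx: InvokeCtx = {
    invoke: { identifier: tailCall.function, ...decl, target: { offset: PLACEHOLDER } },
  };
  // Return listed first ("pop, then push"); gather itself is unordered.
  return { gather: [returnCtx, invokeCtx] };
}

console.log(JSON.stringify(backEdgeContext({ function: "fact" })));
```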
gnidan merged commit 937e348 into main Apr 16, 2026
4 checks passed
gnidan deleted the compiler-optimizer-invoke-tests branch April 16, 2026 07:44
gnidan added a commit that referenced this pull request Apr 16, 2026
Adds a new context type annotating instructions with the
compiler transformations that produced them. The value is an
array of short identifiers; the list may repeat the same
identifier when the transformation has been applied multiple
times (e.g., ["inline", "inline"] for doubly-inlined code).

Transform is *additional* annotation. The invoke/return contexts
for the logical call are still emitted at the call boundary so
debuggers see the source-level call stack; the transform context
tells debuggers how the call was physically realized. Consumers
that ignore transform contexts get a sound source-level view
from the semantic contexts alone.

v1 identifiers:
  - "inline": marked instruction is part of an inlined function
    body; surrounding invoke/return contexts name the inlined
    callee.
  - "tailcall": marked instruction is a tail-call-optimized
    back-edge JUMP or continuation, where the call was realized
    without pushing/popping a full activation.

The identifier set is extensible. Debuggers unfamiliar with a
given identifier should preserve it as an opaque label. Order
in the array is not semantically significant — the multiset is
what matters.
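Because only the multiset of identifiers matters, a consumer comparing two transform annotations should count occurrences rather than compare arrays element by element. A minimal sketch (the function names are hypothetical):

```typescript
// Count each identifier: order is ignored, repetition is kept.
function toMultiset(ids: readonly string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const id of ids) counts.set(id, (counts.get(id) ?? 0) + 1);
  return counts;
}

// Two transform annotations are equivalent iff their multisets match.
// Identifiers are treated as opaque strings, so unknown ones compare fine.
function sameTransforms(a: readonly string[], b: readonly string[]): boolean {
  if (a.length !== b.length) return false;
  const ma = toMultiset(a);
  const mb = toMultiset(b);
  for (const [id, n] of ma) {
    if (mb.get(id) !== n) return false;
  }
  return true;
}

console.log(sameTransforms(["inline", "tailcall"], ["tailcall", "inline"])); // true
console.log(sameTransforms(["inline", "inline"], ["inline"])); // false
```

Equal lengths plus matching counts for every identifier in `a` is enough: `b` has no room left for extra identifiers.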

Unblocks the final shape of TCO back-edge annotations in
bugc (#210): a tail-call-optimized JUMP can now carry
`gather: [return, invoke, transform: ["tailcall"]]`.

Includes:
- schemas/program/context/transform.schema.yaml
- schemas/program/context.schema.yaml: wire into the if/$ref
  union.
- packages/format/src/types/program/context.ts: Context.Transform
  interface, isTransform guard, and Transform.Identifier union
  preserving autocomplete for known values.
- packages/format/src/types/program/context.test.ts: register
  Context.isTransform with the schema guard test harness.
- packages/web/spec/program/context/transform.mdx: spec page
  covering role, v1 identifiers, repetition/composition, and
  interaction with gather.
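A rough sketch of the TypeScript side follows. Whether Transform.Identifier uses the `string & {}` idiom shown here is an assumption. It is a common way to keep a string union open while preserving autocomplete. The guard is likewise a simplified stand-in for isTransform, not its actual implementation.

```typescript
// Known v1 identifiers. Intersecting `string` with `{}` keeps the union open
// to future identifiers without collapsing it to plain `string`, so editors
// still suggest the known values.
type KnownTransform = "inline" | "tailcall";
type TransformIdentifier = KnownTransform | (string & {});

interface TransformContext {
  transform: TransformIdentifier[];
}

// Simplified guard: an object whose `transform` field is an array of strings.
function isTransform(value: unknown): value is TransformContext {
  if (typeof value !== "object" || value === null || !("transform" in value)) {
    return false;
  }
  const list = (value as { transform: unknown }).transform;
  return Array.isArray(list) && list.every((id) => typeof id === "string");
}

const KNOWN: ReadonlySet<string> = new Set<string>(["inline", "tailcall"]);

// Unknown identifiers are kept as opaque labels rather than dropped.
function labels(ctx: TransformContext): string[] {
  return ctx.transform.map((id) => (KNOWN.has(id) ? id : `opaque:${id}`));
}

console.log(labels({ transform: ["tailcall", "vendor-x"] }));
```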