Skip to content

fix(codegen): two silent miscompiles on poly receivers#388

Merged
matz merged 4 commits intomatz:masterfrom
OriPekelman:fix/poly-ivar-write-and-nil-return-dispatch
May 8, 2026
Merged

fix(codegen): two silent miscompiles on poly receivers#388
matz merged 4 commits intomatz:masterfrom
OriPekelman:fix/poly-ivar-write-and-nil-return-dispatch

Conversation

@OriPekelman
Copy link
Copy Markdown
Contributor

Two codegen fixes for polymorphic receivers

Both bugs surfaced while building tep, a
spinel-AOT'd Sinatra clone. Each is a silent miscompile — code runs,
no error, wrong behaviour — and each has a tiny, deterministic repro.

Repros:

Bug 1 — obj.x = v no-ops when obj's static type is a class union

The dispatch loop in compile_call_expr's if (recv.tag == SP_TAG_OBJ)
block emits arms only for explicit user methods and attr_readers.
Setter calls (mname == "x=") on a attr_writer-registered ivar fall
through with no arm, so the surrounding if (.tag == SP_TAG_OBJ) {}
block is empty. The store never executes.

Fix: add a third arm to the loop that recognises the
attr_writer shape and emits a direct ((sp_C*)recv.v.p)->iv_x = v
inside the matching cls_id branch. Also extends
poly_dispatch_return_type to recognise setters so the result temp's
C type matches the ivar slot type (Ruby returns the rhs from x=,
not an int default).

Bug 2 — subclass override silently dropped when its return type is nil

box_value_to_poly for at == "nil" previously returned the bare
literal "sp_box_nil()", discarding the val argument. The
dispatch loop calls this with val = "sp_<Subclass>_method(...)"
the actual method call is dropped, the cls_id arm just produces
sp_box_nil(), and the body of the override never runs.

This bites any base-class method with an empty body (or any subclass
override whose final expression is puts ... / Hash[]= / similar
nil-returning side effect).

Fix: emit the val for its side effect, then yield nil:

return "((void)(" + val + "), sp_box_nil())"

The (void) cast suppresses the otherwise-benign "expression result
unused" warning under -Wall configurations.

Verification

  • Both repros now print the expected output.
  • make test passes (425/425).
  • The downstream tep project's full Sinatra-compat test suite still
    passes (57/57), and several workarounds in tep can now be removed
    (renaming Request#bodyRequest#raw_body to avoid the
    poly-write bug; placeholder 0 returns in filter base methods to
    dodge the nil-return drop).

What this doesn't cover

  • The String#index returns -1 instead of nil divergence (filed
    as a separate issue, no PR — the choice between matching CRuby
    semantics vs. documenting the divergence is a maintainer call).
  • Other patterns where method-parameter types fall back to int in
    the absence of a concrete call site (deferred until I have a
    cleaner repro).

OriPekelman and others added 2 commits May 8, 2026 16:49
A setter call `obj.x = v` on a parameter whose static type is a
class union (one or more classes have an `attr_writer :x`, others
don't) silently no-ops: the dispatch loop in compile_call_expr's
SP_TAG_OBJ block recognises only explicit user methods and
attr_readers. Setters fall through with no cls_id arm, so
`if (.tag == SP_TAG_OBJ) {}` ends up empty and the store never
executes.

Add a third arm that recognises the attr_writer shape and emits a
direct `((sp_C *)recv.v.p)->iv_x = v` for the matching cls_id.
Also extend `poly_dispatch_return_type` to recognise setters so the
result temp's C type matches the ivar slot type (Ruby's `x = v`
returns v, not int) — without this, the new arm's `_t = v;`
mistypes when the slot is e.g. a string.

Repro lives in test/poly_attr_writer.rb. Surfaced while building
the tep Sinatra-clone; `res.body = out` was a no-op when `res`
widened to Request | Response (both having the ivar).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`box_value_to_poly` for `at == "nil"` returned the bare literal
`"sp_box_nil()"`, discarding the `val` argument. The poly dispatch
loop calls this with `val = "sp_<Subclass>_method(...)"` — the
actual method call. Discarding it means: when a subclass override's
static return type is nil (e.g. its body is `puts ...` or
`hash[k] = v` whose return is nil-typed), the cls_id arm emits a
bare `sp_box_nil()` and the override never executes.

Equivalent for any base class with an empty body: empty body's
inferred return is also nil, so even the base call is dropped.

Emit the val for its side effect, then yield nil:

    return "((void)(" + val + "), sp_box_nil())"

The `(void)` cast suppresses the (benign) "expression result unused"
warning when val is e.g. a function returning int.

Repro lives in test/poly_nil_return_dispatch.rb. Surfaced building
the tep Sinatra-clone — Tep::Filter#before's empty base body made
every user-supplied filter silently no-op until we added a
placeholder return.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces regression tests for polymorphic dispatch, specifically addressing issues where setter calls on class unions were ignored and subclass overrides returning nil were skipped. The tests verify that instance variable assignments occur and that method bodies are executed. The review feedback suggests strengthening these tests by explicitly verifying the return values of the setter expressions and dispatched calls, and by adding test cases for shared instance variables with differing types to ensure proper polymorphic boxing.

Comment thread test/poly_attr_writer.rb

foo = Foo.new
bar = Bar.new
set_x(foo, 42)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The PR description mentions that poly_dispatch_return_type was extended to ensure the return value of the setter matches the assigned value's type. However, the current test only verifies the side effect (the write to the ivar) and not the return value of the assignment expression itself. It would be more robust to verify that set_x and set_body return the expected value (e.g., by wrapping the calls in puts).

Comment thread test/poly_attr_writer.rb
Comment on lines +29 to +37
class Req
attr_accessor :body
def initialize; @body = ""; end
end

class Res
attr_accessor :body
def initialize; @body = ""; end
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current test for shared ivar names uses String for both Req#body and Res#body. To fully exercise the fix in poly_dispatch_return_type (which handles matching the C type to the ivar slot), it would be beneficial to add a case where the shared ivar has different types in different classes (e.g., String in one and Integer in another). This would verify that the polymorphic return value is correctly boxed when the types in the class union diverge.

Comment thread test/poly_nil_return_dispatch.rb Outdated

h = Holder.new
h.set(Sub.new)
h.call_hook("ok")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While the test correctly verifies that the subclass override is executed via its side effect (puts), it doesn't verify that the return value of the dispatched call is correctly handled as nil. Since the fix specifically addresses nil return values in polymorphic dispatch by ensuring they are properly boxed as sp_box_nil(), adding an assertion for the return value (e.g., puts h.call_hook("ok").nil?) would provide better coverage.

OriPekelman and others added 2 commits May 8, 2026 17:41
Follow-up to the previous attr_writer-on-poly-receiver fix, surfaced
by review feedback (gemini-code-assist on PR matz#388).

When the new arm fires for a setter call whose function parameter
widened to `poly` -- because the same setter is called with two
differently-typed args from divergent call sites -- the rhs needs to
be unboxed to the slot's concrete C type for each cls_id arm. Without
this, a slot of type `int` (`mrb_int iv_slot`) ended up with the
poly's `sp_RbVal lv_v` assigned directly, which fails C compilation.

Three rhs-fitting cases now:

  slot poly + arg concrete  -> box the arg (existing path)
  slot concrete + arg poly  -> unbox the arg via unbox_poly_to (new)
  else                      -> direct assign

Test (`test/poly_attr_writer.rb`) gains a third subcase: `IBox#slot:int`
and `SBox#slot:String` sharing the slot name, with `set_slot(o, v)`
called for both. Pre-fix this didn't even compile; with the unbox path,
each cls_id arm picks the right concrete shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… effect

Follow-up per gemini-code-assist's review on PR matz#388: the
poly_nil_return_dispatch test verified that the subclass override's
side effect (a `puts`) ran, but didn't verify that the dispatched
call's return value was correctly observable as nil.

Add an explicit `result == nil` check after `h.call_hook(\"ok\")` to
confirm the boxing path threads the nil result through to the caller,
not just the side effect.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@OriPekelman
Copy link
Copy Markdown
Contributor Author

Addressed all three of @gemini-code-assist's points:

  1. Return-value coverage: every setter call in test/poly_attr_writer.rb now also captures the return and prints it (ret_foo, ret_req, ret_res, ri, rs). Confirms obj.x = v evaluates to v, not just that the store happens.

  2. Differing slot types: added a third subcase (IBox#slot:int, SBox#slot:String) to exercise the path where poly_dispatch_return_type widens to poly because the two arms disagree on slot type. Required a small follow-up in the new arm itself: when slot is concrete and the rhs is poly (because the function param widened from divergent call sites), the rhs gets unboxed via unbox_poly_to(slot_t, ...) before the store. Without this, the C compiler errors with "assigning to mrb_int from incompatible type sp_RbVal" — the IBox and SBox arms need different concrete unboxes.

  3. Nil return assertion: test/poly_nil_return_dispatch.rb now captures the result of h.call_hook("ok") and prints result == nil (true). Confirms the boxing path didn't just preserve the side effect but also yielded a properly nil-tagged value to the caller.

Re: Windows CI — file_class_methods_int_recv was already failing on master before this PR landed (see run 25562363479 on master, same failure). Pass count on Windows went from 334 to 336 with my two new tests, so the new tests pass; the pre-existing failure is unrelated.

OriPekelman added a commit to OriPekelman/spinel that referenced this pull request May 8, 2026
Per matz#391 review: extend test/poly_attr_writer.rb with three
subcases.

  1. bool slot (the original tep symptom)
  2. string slot with both classes agreeing
  3. divergent slots: IBox{int} + SBox{string} sharing the
     `assign(o, v)` call site, so spinel widens both `o` and
     `v` to poly.

Each subcase also captures and checks the assignment expression's
return value to confirm Ruby `obj.x = v` semantics (yields v).

Subcase 3 surfaced a hole in the original arm: when the rhs is
poly (`sp_RbVal`) and the slot is concrete (mrb_int / const
char *), the per-arm write needs to unbox the rhs into the
slot's C type. Without that the C compiler errors with
"assigning to const char * from sp_RbVal" -- because the
divergent ivars demand different concrete unboxes. Mirrors the
fix already in PR matz#388 for poly setter dispatch.

The arm now matches three rhs/slot shapes:
  - slot poly, rhs concrete: box rhs.
  - slot concrete, rhs poly: unbox rhs into the slot's type.
  - both agree: pass through.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@matz matz merged commit 99fb92b into matz:master May 8, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants