[Bug] rb_fiber_scheduler_blocking_operation_wait does not GC-root blocking_operation during rb_funcall #16908
Closed
samuel-williams-shopify wants to merge 0 commits into ruby:master
ea006ed to 97aa28a
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request on May 10, 2026:

…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When the fiber is suspended, blocking_operation may not be reachable via the conservative GC scan of the suspended fiber's C stack.

rb_gc_register_address pins blocking_operation in the global GC root list, which is always walked regardless of fiber state. The address is kept registered through the last implicit use of the VALUE (including all accesses via the raw C pointer derived from it), so that a compacting GC cannot move the object and leave that pointer dangling.

Confirmed by reproducing the crash in io-event CI:
./configure --enable-shared --disable-install-doc --enable-yjit

See: socketry/io-event#171, ruby#16908

Co-authored-by: Cursor <cursoragent@cursor.com>
Problem
rb_fiber_scheduler_blocking_operation_wait (scheduler.c) creates blocking_operation as a C-local VALUE, then immediately calls rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation).

If the scheduler's blocking_operation_wait implementation causes a fiber switch (e.g. via rb_fiber_scheduler_block inside a worker-pool implementation), the calling fiber's C stack is suspended. The conservative GC may not scan blocking_operation if the compiler keeps it only in a machine register on the suspended fiber rather than on the stack frame that gets walked. The object can then be collected, and get_blocking_operation(blocking_operation) at scheduler.c:1104 reads freed/reused memory.

Evidence
Confirmed experimentally in the io-event gem (PR #170):
- With GC.disable wrapping the @worker_pool.call(operation) call → crash disappears (all tests pass, including on Ruby head)
- Without GC.disable → [BUG] Segmentation fault in get_blocking_operation

This commit
Adds test/fiber/test_blocking_operation_gc.rb with two trigger modes to document the bug. The tests are expected to fail (crash) on affected Ruby head builds and pass once the GC-safety issue is resolved.

Proposed fix
rb_fiber_scheduler_blocking_operation_wait should protect blocking_operation against collection for the duration of rb_funcall. Alternatively, register blocking_operation as a GC root for the call's lifetime.

Made with Cursor
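
For reference, the GC.disable experiment listed under Evidence can be sketched roughly as below. This is a minimal illustration, not the io-event code: SketchScheduler and InlineWorkerPool are hypothetical names, the pool runs the operation inline instead of on a worker thread, and the class is not a complete fiber scheduler. Only GC.disable, GC.enable, and the @worker_pool.call(operation) call shape come from the description above. Disabling GC this way only hides the crash for diagnosis; it is not the proposed fix.

```ruby
# Hypothetical stand-in for the scheduler's worker pool (assumption):
# it simply runs the operation inline.
class InlineWorkerPool
  def call(operation)
    operation.call
  end
end

# Hypothetical, incomplete scheduler showing only the diagnostic hook.
class SketchScheduler
  def initialize(worker_pool = InlineWorkerPool.new)
    @worker_pool = worker_pool
  end

  # Diagnostic variant: keep GC off while the operation is in flight, so
  # nothing can be collected during the call.
  def blocking_operation_wait(operation)
    # GC.disable returns true if GC was already disabled before this call.
    already_disabled = GC.disable
    begin
      @worker_pool.call(operation)
    ensure
      # Restore the previous GC state even if the operation raises.
      GC.enable unless already_disabled
    end
  end
end
```

If the crash disappears with this wrapper and returns without it, that points at an object being collected during the call rather than at the operation itself, which matches the two observations under Evidence.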