[Bug] rb_fiber_scheduler_blocking_operation_wait does not GC-root blocking_operation during rb_funcall #16908

Closed
samuel-williams-shopify wants to merge 0 commits into ruby:master from samuel-williams-shopify:bug/blocking-operation-gc

@samuel-williams-shopify
Contributor

Problem

rb_fiber_scheduler_blocking_operation_wait (scheduler.c) creates blocking_operation as a C-local VALUE, then immediately calls rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation).

If the scheduler's blocking_operation_wait implementation causes a fiber switch (e.g. via rb_fiber_scheduler_block inside a worker-pool implementation), the calling fiber's C stack is suspended. The conservative GC may not scan blocking_operation if the compiler keeps it only in a machine register on the suspended fiber rather than on the stack frame that gets walked. The object is then collected, and get_blocking_operation(blocking_operation) at scheduler.c:1104 reads freed/reused memory:

[BUG] Segmentation fault at 0x0000000000000000
get_blocking_operation (scheduler.c:123)
rb_fiber_scheduler_blocking_operation_wait (scheduler.c:1104)
io_close_fptr (io.c:5765)

Evidence

Confirmed experimentally in the io-event gem (PR #170):

  • With GC.disable wrapping the @worker_pool.call(operation) call → crash disappears (all tests pass including Ruby head)
  • Without GC.disable → [BUG] Segmentation fault in get_blocking_operation
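
For reference, the GC.disable experiment above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the io-event code: `with_gc_disabled`, `FakeWorkerPool`, and `SketchScheduler` are hypothetical names made up for illustration. Only `GC.disable` / `GC.enable` and the `@worker_pool.call(operation)` call shape come from Ruby itself and from the description above. It is a diagnostic to confirm that collection is the trigger, not a proposed fix.

```ruby
# Diagnostic sketch only: run a block with GC disabled, then restore the
# previous GC state. All class and method names here are hypothetical.
def with_gc_disabled
  # GC.disable returns true if GC was already disabled before this call.
  already_disabled = GC.disable
  yield
ensure
  # Re-enable only if this helper was the one that disabled GC.
  GC.enable unless already_disabled
end

# Hypothetical stand-in for a worker pool; it simply runs the operation inline.
class FakeWorkerPool
  def call(operation)
    operation.call
  end
end

# Hypothetical scheduler fragment showing where the wrapper would sit.
class SketchScheduler
  def initialize(worker_pool = FakeWorkerPool.new)
    @worker_pool = worker_pool
  end

  def blocking_operation_wait(operation)
    with_gc_disabled { @worker_pool.call(operation) }
  end
end
```

Restoring the previous state in `ensure` keeps the experiment from leaving GC disabled if the operation raises, and from re-enabling GC when an outer caller had already disabled it.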

This commit

Adds test/fiber/test_blocking_operation_gc.rb with two trigger modes to document the bug. The tests are expected to fail (crash) on affected Ruby head builds and pass once the GC-safety issue is resolved.

Proposed fix

rb_fiber_scheduler_blocking_operation_wait should protect blocking_operation from collection for the duration of rb_funcall:

// After rb_fiber_scheduler_blocking_operation_new():
VALUE blocking_operation = rb_fiber_scheduler_blocking_operation_new(...);
VALUE result = rb_funcall(scheduler, id_blocking_operation_wait, 1, blocking_operation);
rb_fiber_scheduler_blocking_operation_t *operation = get_blocking_operation(blocking_operation);
// ... last use of operation ...
RB_GC_GUARD(blocking_operation);  // placed after the last use of the derived pointer, so the VALUE stays on the stack until here

Or alternatively, register blocking_operation as a GC root for the call's lifetime.

Made with Cursor

@samuel-williams-shopify samuel-williams-shopify force-pushed the bug/blocking-operation-gc branch from ea006ed to 97aa28a Compare May 9, 2026 13:43
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 10, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, blocking_operation may not be reachable via the
conservative GC scan of the suspended fiber's C stack.

rb_gc_register_address pins blocking_operation in the global GC root list,
which is always walked regardless of fiber state. The address is kept
registered through the last implicit use of the VALUE, including all accesses
via the raw C pointer derived from it, so that a compacting GC
cannot move the object and leave that pointer dangling.

Confirmed by reproducing the crash in io-event CI:
  ./configure --enable-shared --disable-install-doc --enable-yjit
See: socketry/io-event#171
     ruby#16908

Co-authored-by: Cursor <cursoragent@cursor.com>