Conversation

@peterzhu2118 (Member)

No description provided.

@peterzhu2118 force-pushed the pz-header-check-actions branch from 944e2c1 to 69a4c44 on December 6, 2024 at 20:55
@peterzhu2118 force-pushed the pz-header-check-actions branch from 69a4c44 to e1d07f1 on December 6, 2024 at 20:56
@peterzhu2118 merged commit c56b7f0 into main on December 6, 2024 (2 checks passed)
@peterzhu2118 deleted the pz-header-check-actions branch on December 6, 2024 at 20:58
peterzhu2118 added a commit that referenced this pull request Nov 18, 2025
We need the VM barrier in rb_gc_impl_before_fork to stop the other
Ractors: otherwise they could be allocating objects on the fast path,
which can call mmtk_add_obj_free_candidate. Since
mmtk_add_obj_free_candidate acquires a lock on obj_free_candidates in
weak_proc.rs, a fork taken while another Ractor holds that lock leaves
it locked forever in the child process, where the owning thread no
longer exists.
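
To make the failure mode concrete, here is a minimal, standalone C
sketch of the general fork-while-locked hazard. This is illustrative
code only, not the MMTk code; the mutex merely plays the role of
obj_free_candidates:

    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        pthread_mutex_lock(&lock);   /* plays the role of obj_free_candidates */
        sleep(2);                    /* hold the lock across the fork below */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        sleep(1);                    /* make sure the worker owns the lock */

        pid_t pid = fork();
        if (pid == 0) {
            /* Child: the worker thread does not exist here, but the mutex
             * is still locked, so this blocks forever -- the same shape as
             * the child hanging in get_all_obj_free_candidates below. */
            pthread_mutex_lock(&lock);
            _exit(0);                /* never reached */
        }
        waitpid(pid, NULL, 0);       /* hangs waiting for the stuck child */
        pthread_join(t, NULL);
        return 0;
    }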

For example, the following script demonstrates the issue:

    puts "Hello #{Process.pid}"

    100.times do |i|
      puts "i = #{i}"
      Ractor.new(i) do |j|
        puts "Ractor #{j} hello"
        1000.times do |k|
          s = "#{j}-#{k}" # allocate strings to hit the allocation fast path
        end
        Ractor.receive
        puts "Ractor #{j} goodbye"
      end
      pid = fork { }
      puts "Child pid is #{pid}"
      _, status = Process.waitpid2 pid
      puts status.success?
    end

    puts "Goodbye"

In the child process, we can see that it is stuck trying to acquire the
lock on obj_free_candidates:

    #5  0x00007192bfb53f10 in mmtk_ruby::weak_proc::WeakProcessor::get_all_obj_free_candidates (self=0x7192c0657498 <mmtk_ruby::BINDING+72>) at src/weak_proc.rs:52
    #6  0x00007192bfa634c3 in mmtk_ruby::api::mmtk_get_all_obj_free_candidates () at src/api.rs:295
    #7  0x00007192bfa61d50 in rb_gc_impl_shutdown_call_finalizer (objspace_ptr=0x578c17abfc50) at gc/mmtk/mmtk.c:1032
    #8  0x0000578c1601e48e in rb_ec_finalize (ec=0x578c17ac06d0) at eval.c:166
    #9  rb_ec_cleanup (ec=<optimized out>, ex=<optimized out>) at eval.c:257
    #10 0x0000578c1601ebf6 in ruby_cleanup (ex=<optimized out>) at eval.c:180
    #11 ruby_stop (ex=<optimized out>) at eval.c:292
    #12 0x0000578c16127124 in rb_f_fork (obj=<optimized out>) at process.c:4291
    #13 rb_f_fork (obj=<optimized out>) at process.c:4281
peterzhu2118 added a commit that referenced this pull request Nov 18, 2025
rb_gc_impl_before_fork locks the VM and barriers all the Ractors
before calling mmtk_before_fork. However, since rb_mmtk_block_for_gc
is a barrier point, one or more Ractors could be paused there in the
middle of a GC. mmtk_before_fork is not compatible with that: it
assumes that the MMTk workers are idle, but they are still busy
working on the GC.

This commit essentially implements a trylock: it optimistically
acquires the lock but releases it if it detects that any other Ractors
are waiting in rb_mmtk_block_for_gc.
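
A runnable toy sketch of that trylock shape follows. Every name here
is a stand-in invented for illustration; only the control flow mirrors
the fix, not the real CRuby/MMTk API:

    #include <sched.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Toy stand-ins -- none of these are the real CRuby/MMTk functions. */
    static bool gc_in_flight = true;  /* pretend a GC is running at first */
    static void vm_lock_and_barrier(void)  { puts("barrier other Ractors"); }
    static void vm_unlock_and_resume(void) { puts("release lock, resume Ractors"); }
    static bool ractor_waiting_in_block_for_gc(void) { return gc_in_flight; }
    static void stop_mmtk_workers(void)    { puts("stop idle MMTk workers"); }

    /* Optimistically barrier the Ractors, but back off if any Ractor is
     * parked at the rb_mmtk_block_for_gc barrier point, i.e. a GC is in
     * flight and the MMTk workers are not idle. */
    static void before_fork(void)
    {
        for (;;) {
            vm_lock_and_barrier();
            if (!ractor_waiting_in_block_for_gc()) {
                stop_mmtk_workers();  /* safe: the workers are idle */
                return;               /* the caller can now fork() */
            }
            /* Joining busy workers here would deadlock (see the two
             * backtraces below), so release the lock and let the GC
             * finish before retrying. */
            vm_unlock_and_resume();
            sched_yield();
            gc_in_flight = false;     /* toy: the GC finishes after one retry */
        }
    }

    int main(void) { before_fork(); return 0; }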

For example, the following script demonstrates the issue:

    puts "Hello #{Process.pid}"

    100.times do |i|
      puts "i = #{i}"
      Ractor.new(i) do |j|
        puts "Ractor #{j} hello"
        1000.times do |k|
          s = "#{j}-#{k}" # allocate strings to hit the allocation fast path
        end
        Ractor.receive
        puts "Ractor #{j} goodbye"
      end
      pid = fork { }
      puts "Child pid is #{pid}"
      _, status = Process.waitpid2 pid
      puts status.success?
    end

    puts "Goodbye"

We can see that the MMTk worker thread is waiting to stop the world so
it can start the GC:

    #4  0x00007ffff66538b1 in rb_mmtk_stop_the_world () at gc/mmtk/mmtk.c:101
    #5  0x00007ffff6d04caf in mmtk_ruby::collection::{impl#0}::stop_all_mutators<mmtk::scheduler::gc_work::{impl#14}::do_work::{closure_env#0}<mmtk::plan::immix::gc_work::ImmixGCWorkContext<mmtk_ruby::Ruby, 0>>> (_tls=..., mutator_visitor=...) at src/collection.rs:23

Meanwhile, the mutator thread is stuck in mmtk_before_fork trying to
join that worker thread:

    #4  0x00007ffff6c0b621 in std::sys::thread::unix::Thread::join () at library/std/src/sys/thread/unix.rs:134
    #5  0x00007ffff6658b6e in std::thread::JoinInner<()>::join<()> (self=...)
    #6  0x00007ffff6658d4c in std::thread::JoinHandle<()>::join<()> (self=...)
    #7  0x00007ffff665795e in mmtk_ruby::binding::RubyBinding::join_all_gc_threads (self=0x7ffff72462d0 <mmtk_ruby::BINDING+8>) at src/binding.rs:115
    #8  0x00007ffff66561a8 in mmtk_ruby::api::mmtk_before_fork () at src/api.rs:309
    #9  0x00007ffff66556ff in rb_gc_impl_before_fork (objspace_ptr=0x555555d17980) at gc/mmtk/mmtk.c:1054
    #10 0x00005555556bbc3e in rb_gc_before_fork () at gc.c:5429