
Fix busy-loop when waiting for file descriptors to close #7865

Conversation

KJTsanaktsidis
Contributor

When one thread is closing a file descriptor whilst another thread is concurrently reading it, we need to wait for the reading thread to be done with it to prevent a potential EBADF (or, worse, file descriptor reuse).

At the moment, that is done by keeping a list of threads still using the file descriptor in io_close_fptr. It then continually calls rb_thread_schedule() in fptr_finalize_flush until said list is empty.

That busy-looping seems to behave rather poorly on some OSes, particularly FreeBSD. It can cause the TestIO#test_race_gets_and_close test to fail (even with its very long 200-second timeout) because the closing thread starves out the using thread.

To fix that, I introduce the concept of struct rb_io_close_wait_list; a list of threads still using a file descriptor that we want to close. We call rb_notify_fd_close to let the thread scheduler know we're closing a FD, which fills the list with threads. Then, we call rb_notify_fd_close_wait which will block the thread until all of the still-using threads are done.

This is implemented with a condition variable sleep, so no busy-looping is required.
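
A rough sketch of the shape of that wait (simplified; the struct fields and helper names here are illustrative, not the exact code in this PR):

/* Sketch only: assumes the internal thread_native.h (native mutex/condvar
 * wrappers) and ccan/list headers from the ruby source tree. */
#include "thread_native.h"
#include "ccan/list/list.h"

struct fd_close_busy_sketch {
    struct ccan_list_head pending_fd_users; /* threads still using the fd */
    rb_nativethread_lock_t lock;
    rb_nativethread_cond_t cond;
};

/* Run by the closing thread after rb_notify_fd_close() has populated
 * pending_fd_users: sleep until the last user removes itself and signals. */
static void
wait_for_fd_unused_sketch(struct fd_close_busy_sketch *busy)
{
    rb_native_mutex_lock(&busy->lock);
    while (!ccan_list_empty(&busy->pending_fd_users)) {
        rb_native_cond_wait(&busy->cond, &busy->lock); /* no rb_thread_schedule() spin */
    }
    rb_native_mutex_unlock(&busy->lock);
}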

After applying this fix, the timeout on TestIO#test_race_gets_and_close can be reduced and it passes reliably on FreeBSD.

KJTsanaktsidis and others added 3 commits May 26, 2023 13:07
On Win32, currently, rb_nativethread_cond_t is an incomplete type
because it's a typedef for `struct rb_thread_cond_struct`. That means
you can't actually allocate an rb_nativethread_cond_t unless you also
include THREAD_IMPL_H (since it's defined in thread_win32.h)
(alternatively, including vm_core.h also works).

Move the definition of rb_thread_cond_struct into thread_native.h to
alleviate this.
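
A minimal illustration of the problem described above (the member shown is a placeholder, not the real platform definition):

/* With only the forward declaration visible, the type is incomplete: */
typedef struct rb_thread_cond_struct rb_nativethread_cond_t;
/* rb_nativethread_cond_t cond;   <-- error: storage size of 'cond' isn't known */

/* Once the struct definition also lives in thread_native.h, any file that
 * includes it can allocate the condvar directly: */
struct rb_thread_cond_struct {
    void *impl; /* placeholder member; the real fields are platform-specific */
};
rb_nativethread_cond_t cond_ok; /* now a complete type, so this compiles */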
When one thread is closing a file descriptor whilst another thread is
concurrently reading it, we need to wait for the reading thread to be
done with it to prevent a potential EBADF (or, worse, file descriptor
reuse).

At the moment, that is done by keeping a list of threads still using the
file descriptor in io_close_fptr. It then continually calls
rb_thread_schedule() in fptr_finalize_flush until said list is empty.

That busy-looping seems to behave rather poorly on some OSes,
particularly FreeBSD. It can cause the TestIO#test_race_gets_and_close
test to fail (even with its very long 200 second timeout) because the
closing thread starves out the using thread.

To fix that, I introduce the concept of struct rb_io_close_wait_list; a
list of threads still using a file descriptor that we want to close. We
call `rb_notify_fd_close` to let the thread scheduler know we're closing
a FD, which fills the list with threads. Then, we call
rb_notify_fd_close_wait which will block the thread until all of the
still-using threads are done.

This is implemented with a condition variable sleep, so no busy-looping
is required.
This test should be fixed and fast now because the closing thread sleeps
appropriately waiting for the file descriptor to be unused.
@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/fix_sched_busy_wait_when_io_close branch from 715cfa9 to 0fc9e46 on May 26, 2023 03:08
@nobu nobu merged commit d1628fe into ruby:master May 26, 2023
95 checks passed
@ioquatix
Member

I would have liked to discuss this a little more, as I have a counter-proposal to remove a large amount of this logic: #5384

@KJTsanaktsidis
Contributor Author

KJTsanaktsidis commented May 26, 2023

I hadn't seen https://bugs.ruby-lang.org/issues/18455 before. I think it makes sense that the waiting-on-close should be done per-IO, rather than per-FD. However, we'll still need this fix or something like it, right, to make the closing thread actually sleep while waiting for the other threads to finish being interrupted?

I guess the list of "threads currently blocked on this IO instance" could then be stored on the IO itself in io.c rather than this very strange dance between io.c and thread.c involving a VM-global linked list of pending IOs though.

And it looks like "store a list of threads currently blocked on this IO instance" is exactly what you did - e.g. https://github.com/ruby/ruby/pull/5384/files#diff-161b2a279f4c67a1ab075a7890ecf6f3f1d483d910910fa52b3715e25cfdcbd7R2530

However - you removed the part that blocks the closing thread from actually closing until the other threads have fully been woken up: https://github.com/ruby/ruby/pull/5384/files#diff-92194f057884b3287a3a6bf84e6e3b2bf433a556b68562799252a091744e7854L5233 (unless I've drastically misunderstood things) so it seems possible you might get spurious EBADFs?

tl;dr - I think we should do both this PR and your PR?

EDIT: I see you were planning to handle the actual close asynchronously afterwards, it seems - #5384 (comment).

Anyway - I'm more than happy for this fix to get blown away by your proposal to do this per-IO.

@ioquatix
Member

I like that you did work on this, so thanks, and I appreciate your detailed reply. It's a complex problem.

I basically have a problem if close is O(N). If we try to create 1000s or more connections, close can become a significant overhead.

I think the goal of close with read operations is to avoid using EBADF as a signal that the IO has closed. However, it's been a while since I looked at this. I'll try to revisit this today and rebase my PR.

@KJTsanaktsidis
Contributor Author

This test seems to be explicitly checking that IOError, and not EBADF, is raised if an IO is closed whilst being read.

So yeah if we can keep that property whilst avoiding the global list of IO ops (and it sounds like we can) I'm 💯 for that.

Thanks for your feedback, and let me know if I can help out with what you're doing in any way!

@ioquatix
Member

ioquatix commented May 26, 2023

One initial issue that came out of the PR was the desire/need to hide at least some (but ideally all) of rb_io. #6511

I don't know what is the best way to do this, which is both safe (compatible where possible) and good (achieving our goal of removing it completely).

If you have any thoughts on this... could be good to explore that space.

The reason it's important is that I want to introduce more complex data structures to rb_io but don't want to expose that implementation.

@mame
Member

mame commented May 31, 2023

Since this change, Solaris 10 CI has been failing. @KJTsanaktsidis @ioquatix @nobu Can you please take a look at it?

http://rubyci.s3.amazonaws.com/solaris10-gcc/ruby-master/log/20230526T070003Z.fail.html.gz

- raises IOError when stream is closed by another thread
timeout: output interval exceeds 1800.0 seconds.
timeout: the process group 15303 is alive.

@ioquatix
Member

Thanks for your report @mame. I'll check it. Also, I hope we can fix this problem by #5384.

@KJTsanaktsidis
Contributor Author

@mame are you able to point me to some details on how these solaris10 tests are configured? The chkbuild link (http://rubyci.s3.amazonaws.com/solaris10-gcc/) just gives me an S3 error page...

I rented a SPARC solaris 10 VM and got Ruby compiled with GCC 5.5 and the test in question passes for me. Do I need a machine with more or fewer CPUs perhaps?

@KJTsanaktsidis
Contributor Author

Ah I spoke too soon - I have it failing here when all of the ruby spec suite (gmake MSPECOPT=-Vfs yes-test-rubyspec) is run. So I can poke around this now.

@KJTsanaktsidis
Contributor Author

OK - I have a rough idea of what's going on here.

When the deadlock happens, there are two threads. One thread is calling IO#close, and is waiting for all other usages of that file to end in rb_notify_fd_close_wait. It's blocked in a call to rb_native_cond_wait.

(gdb) bt
#0  0xfef4b038 in __lwp_park () from /lib/libc.so.1
#1  0xfef44ec4 in cond_sleep_queue () from /lib/libc.so.1
#2  0xfef45088 in cond_wait_queue () from /lib/libc.so.1
#3  0xfef455d0 in cond_wait () from /lib/libc.so.1
#4  0xfef4560c in pthread_cond_wait () from /lib/libc.so.1
#5  0x0028d840 in rb_native_cond_wait (mutex=0xffbfd798, cond=0xffbfd7b0) at thread_pthread.c:191
#6  rb_notify_fd_close_wait (busy=0xffbfd790) at thread.c:2512
#7  0x0011b15c in call_close_wait_nogvl (arg=<optimized out>) at io.c:5430
#8  0x00292acc in rb_nogvl (flags=0, flags=0, data2=<optimized out>, ubf=<optimized out>, data1=0xffbfd790, func=0x11b14c <call_close_wait_nogvl>) at thread.c:1547
#9  rb_thread_call_without_gvl (func=0x11b14c <call_close_wait_nogvl>, data1=data1@entry=0xffbfd790, ubf=ubf@entry=0xffffffff, data2=data2@entry=0x0) at thread.c:1663
#10 0x0012abbc in fptr_finalize_flush (fptr=fptr@entry=0x2f8bcd0, noraise=noraise@entry=0, busy=busy@entry=0xffbfd790, keepgvl=0) at io.c:5479
#11 0x0012af78 in io_close_fptr (io=0, io@entry=4266864140) at io.c:5652
#12 0x0012b0bc in rb_io_close (io=4266864140) at io.c:5672
#13 0x0012b53c in rb_io_close_m (io=io@entry=4266864140) at io.c:5716
#14 0x002cc2a8 in ractor_safe_call_cfunc_0 (recv=4266864140, argc=0, argv=0x3e5758, func=0x12b510 <rb_io_close_m>) at vm_insnhelper.c:3249
#15 0x002dfd4c in vm_call_cfunc_with_frame_ (ec=0x3e52a4, reg_cfp=0x4651e8, calling=<optimized out>, argc=0, argv=0x3e5758, stack_bottom=0x3e5754) at vm_insnhelper.c:3433

The second thread is using the IO in question, blocked trying to read from it:

(gdb) bt
#0  0xfef4e874 in __pollsys () from /lib/libc.so.1
#1  0xfef3e548 in _pollsys () from /lib/libc.so.1
#2  0xfeeea298 in pselect_large_fdset () from /lib/libc.so.1
#3  0xfeeea6e0 in select_large_fdset () from /lib/libc.so.1
#4  0x0011df6c in nogvl_wait_for (timeout=0x0, events=1, fptr=0x2f8bcd0, th=<optimized out>) at io.c:12554
#5  io_internal_wait (events=1, timeout=0x0, error=11, fptr=0x2f8bcd0, thread=<optimized out>) at io.c:1144
#6  internal_read_func (ptr=ptr@entry=0xfd96f808) at io.c:1175
#7  0x00292d38 in rb_thread_io_blocking_region (func=0x11de74 <internal_read_func>, data1=data1@entry=0xfd96f808, fd=9) at thread.c:1694
#8  0x001232e8 in rb_io_read_memory (count=1, buf=0xfe538b4c, fptr=0x2f8bcd0) at io.c:1284
#9  io_bufread (fptr=0x2f8bcd0, len=1, ptr=0xfe538b4c "") at io.c:3017
#10 bufread_call (arg=arg@entry=4254529864) at io.c:3056
#11 0x000dc9e8 in rb_ensure (b_proc=0x123110 <bufread_call>, data1=4254529864, e_proc=0x258ff0 <rb_str_unlocktmp>, data2=4266888000) at eval.c:1009
#12 0x00128f58 in io_fread (fptr=0x2f8bcd0, size=<optimized out>, offset=0, str=4266888000) at io.c:3070
#13 io_read (argc=argc@entry=1, argv=argv@entry=0xfd96fe70, io=io@entry=4266864140) at io.c:3786
#14 0x002cc294 in ractor_safe_call_cfunc_m1 (recv=4266864140, argc=1, argv=0xfd96fe70, func=0x128e00 <io_read>) at vm_insnhelper.c:3242

The issue is that when the closing thread yields the GVL to begin waiting on the condvar, that begins a round of "spam all the threads that have pending interrupts with SIGVTALRM". This should wake up the thread that's blocked on the read; however, if that thread has not yet actually started calling select(2), the signal has no effect (i.e. the subsequent call to select(2) won't return EINTR). Once the main thread is blocked on the condvar, nothing else is going to produce more SIGVTALRMs, and so we deadlock.

I demonstrated that this is the issue by applying this patch (5ec9626): it forces a timer thread to be created when calling rb_notify_fd_close_wait, which can deliver more SIGVTALRMs to the reading thread and eventually make it fail with EINTR. With this applied, the tests work on my Solaris VM. This is obviously not the real solution, though.

I think the actual solution is one of two things:

  • rb_notify_fd_close_wait shouldn't block on a native condvar, but rather on some higher-level thing integrated into Ruby's thread scheduling system. An actual Ruby mutex & condition variable would work (the thread would go into sigwait-sleep whilst waiting for the condvar to be signaled); or perhaps it should be plumbed into the ubf/scheduling mechanism in some other way.
  • nogvl_wait_for should wait not only on the actual FD that was requested, but also on the sigwait FD (if it's available). This is actually what rb_thread_fd_select does (it adds the sigwait FD to the select set, and calls check_signals_nogvl if it fired). Perhaps the code for nogvl_wait_for and rb_thread_fd_select should be unified to do this? (See the sketch after this list.)
  • Both of these things maybe? @ioquatix WDYT?
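
For the second option, a rough sketch of the idea (simplified and not the committed fix; the return-value convention and helper name here are made up for illustration):

#include <sys/select.h>

/* Wait until fd is readable, but also watch the sigwait fd (when this
 * thread holds it), so a pending interrupt can still break the wait even
 * if the signal arrived before select(2) was entered. Returns 1 when fd
 * is readable, 0 when signals should be processed (the real code would
 * call check_signals_nogvl()) and the wait retried, -1 on error. */
static int
wait_readable_or_signal_sketch(int fd, int sigwait_fd)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (sigwait_fd >= 0) FD_SET(sigwait_fd, &rfds);

    int maxfd = (sigwait_fd > fd ? sigwait_fd : fd) + 1;
    if (select(maxfd, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    if (sigwait_fd >= 0 && FD_ISSET(sigwait_fd, &rfds))
        return 0; /* signal pending: process it, then retry the wait on fd */

    return FD_ISSET(fd, &rfds) ? 1 : 0;
}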

I can work on trying one or both of these tomorrow or the day after I think.

KJTsanaktsidis added a commit to KJTsanaktsidis/ruby that referenced this pull request Jun 1, 2023
Because a thread calling IO#close now blocks in a native condvar wait,
it's possible for there to be _no_ threads left to actually handle
incoming signals/ubf calls/etc.

This manifested as failing tests on Solaris 10 (SPARC), because:

* One thread called IO#close, which sent a SIGVTALRM to the other
  thread to interrupt it, and then waited on the condvar to be notified
  that the reading thread was done.
* One thread was calling IO#read, but it hadn't yet reached the actual
  call to select(2) when the SIGVTALRM arrived, so it never unblocked
  itself.

This results in a deadlock.

The fix is to use a real Ruby mutex for the close lock; that way, the
closing thread goes into sigwait-sleep and can keep trying to interrupt
the select(2) thread.

See the discussion in: ruby#7865
ioquatix pushed a commit that referenced this pull request Jun 1, 2023
Because a thread calling IO#close now blocks in a native condvar wait,
it's possible for there to be _no_ threads left to actually handle
incoming signals/ubf calls/etc.

This manifested as failing tests on Solaris 10 (SPARC), because:

* One thread called IO#close, which sent a SIGVTALRM to the other
  thread to interrupt it, and then waited on the condvar to be notified
  that the reading thread was done.
* One thread was calling IO#read, but it hadn't yet reached the actual
  call to select(2) when the SIGVTALRM arrived, so it never unblocked
  itself.

This results in a deadlock.

The fix is to use a real Ruby mutex for the close lock; that way, the
closing thread goes into sigwait-sleep and can keep trying to interrupt
the select(2) thread.

See the discussion in: #7865
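
For reference, a rough sketch of the shape of that fix (simplified; the real change in #7884 lives in thread.c and tracks pending users in a linked list, so the struct and helper names here are illustrative):

#include <ruby.h> /* rb_mutex_lock/sleep/unlock, rb_thread_wakeup, Qnil */

struct fd_close_busy_rbmutex_sketch {
    int pending_users;    /* threads still blocked on the fd */
    VALUE wakeup_mutex;   /* a Ruby Mutex object */
    VALUE closing_thread; /* the thread running IO#close */
};

/* Run by the closing thread: sleeping on a Ruby Mutex keeps it inside
 * Ruby's thread scheduler, so interrupts/signals continue to be serviced
 * and the reader stuck before select(2) keeps getting re-interrupted. */
static void
wait_until_fd_unused_sketch(struct fd_close_busy_rbmutex_sketch *busy)
{
    rb_mutex_lock(busy->wakeup_mutex);
    while (busy->pending_users > 0) {
        rb_mutex_sleep(busy->wakeup_mutex, Qnil); /* interruptible, like Thread#sleep */
    }
    rb_mutex_unlock(busy->wakeup_mutex);
}

/* Each interrupted reader, on leaving its blocking region, would decrement
 * pending_users under the mutex and wake closing_thread (e.g. with
 * rb_thread_wakeup()). */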
@mame
Member

mame commented Jun 2, 2023

@KJTsanaktsidis I have confirmed that Solaris 10 CI is now green. It's amazing that you were able to solve the problem just by reading about the phenomenon and the code!

@KJTsanaktsidis
Contributor Author

It wasn’t quite so magical 😂 - I actually found a company selling SPARC hosting and spun up a Solaris 10 server. The deadlock happened pretty reliably when running the tests, so I could just attach gdb to it.

@mame
Member

mame commented Jun 2, 2023

Aha. Thanks for spending your time and resources! It looks like #7884 fixed some other people's deadlocks as well. Thanks again!
