Skip to content

Commit

Permalink
fix deadlock on Thread#join
Browse files Browse the repository at this point in the history
because of 9720f5a

http://rubyci.s3.amazonaws.com/solaris11-sunc/ruby-master/log/20230403T130011Z.fail.html.gz

```
  1) Failure:
TestThread#test_signal_at_join [/export/home/chkbuild/chkbuild-sunc/tmp/build/20230403T130011Z/ruby/test/ruby/test_thread.rb:1488]:
Exception raised:
<#<fatal:"No live threads left. Deadlock?\n1 threads, 1 sleeps current:0x00891288 main thread:0x00891288\n* #<Thread:0xfef89a18 sleep_forever>\n   rb_thread_t:0x00891288 native:0x00000001 int:0\n   \n">>
Backtrace:
  -:30:in `join'
  -:30:in `block (3 levels) in <main>'
  -:21:in `times'
  -:21:in `block (2 levels) in <main>'.
```

The mechanism:

* Main thread (M) calls `Thread#join`
* M: calls `sleep_forever()`
* M: set `th->status = THREAD_STOPPED_FOREVER`
* M: do `checkints`
* M: handle a trap handler with `th->status = THREAD_RUNNABLE`
* M: thread switch at the end of the trap handler
* Another thread (T) will process `Thread#kill` by M.
* T: `rb_threadptr_join_list_wakeup()` at the end of T tris to wakeup M,
     but M's state is runnable because M is handling trap handler and
     just ignore the waking up and terminate T$a
* T: switch to M.
* M: after the trap handler, reset `th->status = THREAD_STOPPED_FOREVER`
     and check deadlock -> Deadlock because only M is living.

To avoid such situation, add new sleep flags `SLEEP_ALLOW_SPURIOUS`
and `SLEEP_NO_CHECKINTS` to skip any check ints.

BTW this is instentional to leave second `vm_check_ints_blocking()`
without checking `SLEEP_NO_CHECKINTS` because `SLEEP_ALLOW_SPURIOUS`
should be specified with `SLEEP_NO_CHECKINTS` and skipping this
checkints can skip any interrupts.
  • Loading branch information
ko1 committed Apr 3, 2023
1 parent 38209ff commit b2e8481
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions thread.c
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,8 @@ rb_thread_local_storage(VALUE thread)
enum SLEEP_FLAGS {
SLEEP_DEADLOCKABLE = 0x01,
SLEEP_SPURIOUS_CHECK = 0x02,
SLEEP_ALLOW_SPURIOUS = 0x04,
SLEEP_NO_CHECKINTS = 0x08,
};

static void sleep_forever(rb_thread_t *th, unsigned int fl);
Expand Down Expand Up @@ -418,6 +420,7 @@ rb_threadptr_join_list_wakeup(rb_thread_t *thread)
case THREAD_STOPPED:
case THREAD_STOPPED_FOREVER:
target_thread->status = THREAD_RUNNABLE;
break;
default:
break;
}
Expand Down Expand Up @@ -1053,7 +1056,7 @@ thread_join_sleep(VALUE arg)
rb_fiber_scheduler_block(scheduler, target_th->self, p->timeout);
}
else if (!limit) {
sleep_forever(th, SLEEP_DEADLOCKABLE);
sleep_forever(th, SLEEP_DEADLOCKABLE | SLEEP_ALLOW_SPURIOUS | SLEEP_NO_CHECKINTS);
}
else {
if (hrtime_update_expire(limit, end)) {
Expand Down Expand Up @@ -1338,7 +1341,9 @@ sleep_forever(rb_thread_t *th, unsigned int fl)

status = fl & SLEEP_DEADLOCKABLE ? THREAD_STOPPED_FOREVER : THREAD_STOPPED;
th->status = status;
RUBY_VM_CHECK_INTS_BLOCKING(th->ec);

if (!(fl & SLEEP_NO_CHECKINTS)) RUBY_VM_CHECK_INTS_BLOCKING(th->ec);

while (th->status == status) {
if (fl & SLEEP_DEADLOCKABLE) {
rb_ractor_sleeper_threads_inc(th->ractor);
Expand All @@ -1350,6 +1355,9 @@ sleep_forever(rb_thread_t *th, unsigned int fl)
if (fl & SLEEP_DEADLOCKABLE) {
rb_ractor_sleeper_threads_dec(th->ractor);
}
if (fl & SLEEP_ALLOW_SPURIOUS) {
break;
}

woke = vm_check_ints_blocking(th->ec);

Expand Down

0 comments on commit b2e8481

Please sign in to comment.