Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always deliver postponed job to main ractor #4336

Closed
wants to merge 1 commit into from

Conversation

XrXr
Copy link
Member

@XrXr XrXr commented Mar 30, 2021

Profilers use rb_postponed_job_register_one() in signal handlers, and
signals don't necessarily land on a thread that runs Ruby code and has a
thread local execution context. The MJIT worker thread doesn't have an
execution context, for example.

Without an execution context, postponed job APIs were crashing. Always
use the main ractor's execution context so postponed job APIs can work
from non Ruby threads like in older versions.

Tests courtesy of John Crepezzi and John Hawthorn.

Bug #17573


The test crashes without the code change.

Profilers use `rb_postponed_job_register_one()` in signal handlers, and
signals don't necessarily land on a thread that runs Ruby code and has a
thread local execution context. The MJIT worker thread doesn't have an
execution context, for example.

Without an execution context, postponed job APIs were crashing. Always
use the main ractor's execution context so postponed job APIs can work
from non Ruby threads like in older versions.

Tests courtesy of John Crepezzi <john.crepezzi@gmail.com> and
John Hawthorn <john@hawthorn.email>.

[Bug #17573]
@k0kubun
Copy link
Member

k0kubun commented Mar 30, 2021

https://bugs.ruby-lang.org/issues/17573#note-6
It happens with --jit because MJIT spawns a worker thread that doesn't have an execution context

I removed an MJIT-specific postponed_job at Ruby 3. Do you know which postponed_job is executed in an MJIT worker? Never mind, I haven't followed the full contexts yet and it feels like I misunderstand some contexts.

@XrXr
Copy link
Member Author

XrXr commented Mar 30, 2021

Do you know which postponed_job is executed in an MJIT worker?

It's Stackprof's signal handler that's running rb_postponed_job_register_one on MJIT's worker thread. When the OS chooses to deliver the signal to the worker thread, it basically inserts a stack frame at an arbitrary point of execution in the worker thread. GET_EC() uses thread local storage, and returns NULL on the worker thread.

So it's not an MJIT problem, really. The crashes I want to solve happen on threads that C extensions spawn. I'm using MJIT to reproduce a similar situation without the need for extra gem dependencies.

@XrXr XrXr requested a review from ko1 July 7, 2021 17:55
@ko1
Copy link
Contributor

ko1 commented Oct 27, 2021

@XrXr @jhawthorn
Sorry for long absent.

How about to mix #4108 and this PR?

diff --git a/vm_trace.c b/vm_trace.c
index 63e4c3ac1a..3bc55f2c3a 100644
--- a/vm_trace.c
+++ b/vm_trace.c
@@ -1582,6 +1582,14 @@ postponed_job_register(rb_execution_context_t *ec, rb_vm_t *vm,
     return PJRR_SUCCESS;
 }
 
+static rb_execution_context_t *
+get_valid_ec(rb_vm_t *vm)
+{
+    rb_execution_context_t *ec = GET_EC();
+    if (ec == NULL) ec = rb_vm_main_ractor_ec(vm);
+    return ec;
+}
+
 /*
  * return 0 if job buffer is full
  * Async-signal-safe
@@ -1589,8 +1597,8 @@ postponed_job_register(rb_execution_context_t *ec, rb_vm_t *vm,
 int
 rb_postponed_job_register(unsigned int flags, rb_postponed_job_func_t func, void *data)
 {
-    rb_execution_context_t *ec = GET_EC();
-    rb_vm_t *vm = rb_ec_vm_ptr(ec);
+    rb_vm_t *vm = GET_VM();
+    rb_execution_context_t *ec = get_valid_ec(vm);
 
   begin:
     switch (postponed_job_register(ec, vm, flags, func, data, MAX_POSTPONED_JOB, vm->postponed_job_index)) {
@@ -1608,8 +1616,8 @@ rb_postponed_job_register(unsigned int flags, rb_postponed_job_func_t func, void
 int
 rb_postponed_job_register_one(unsigned int flags, rb_postponed_job_func_t func, void *data)
 {
-    rb_execution_context_t *ec = GET_EC();
-    rb_vm_t *vm = rb_ec_vm_ptr(ec);
+    rb_vm_t *vm = GET_VM();
+    rb_execution_context_t *ec = get_valid_ec(vm);
     rb_postponed_job_t *pjob;
     rb_atomic_t i, index;
 

@ko1
Copy link
Contributor

ko1 commented Oct 27, 2021

It complete your test.

Advantages:

  • if ractor thread get signaled fortunately, we can measure ractor's thread
  • we do not need to modify vm_core.h

@XrXr
Copy link
Member Author

XrXr commented Oct 27, 2021

How about to mix #4108 and this PR?

Sounds good to me.

ko1 added a commit to ko1/ruby that referenced this pull request Nov 8, 2021
Postponed job can be registered from non-Ruby thread, which means
`ec` in TLS can be NULL. In this case, use main thread's `ec` instead.

See ruby#4108
and ruby#4336
ko1 added a commit that referenced this pull request Nov 9, 2021
Postponed job can be registered from non-Ruby thread, which means
`ec` in TLS can be NULL. In this case, use main thread's `ec` instead.

See #4108
and #4336
@XrXr XrXr force-pushed the sigprof-on-non-ruby-thread branch 3 times, most recently from ebc6cc4 to 89f5e36 Compare November 22, 2021 22:42
@XrXr
Copy link
Member Author

XrXr commented Nov 22, 2021

Closing as this is fixed by #5094. There is a small fixup in #5156.

@XrXr XrXr closed this Nov 22, 2021
@XrXr XrXr deleted the sigprof-on-non-ruby-thread branch November 22, 2021 22:47
matzbot pushed a commit that referenced this pull request Nov 23, 2021
…Backport #17573]

	Use valid `ec` for postponed job.

	Postponed job can be registered from non-Ruby thread, which means
	`ec` in TLS can be NULL. In this case, use main thread's `ec` instead.

	See #4108
	and #4336
	---
	 vm_trace.c | 16 ++++++++++++----
	 1 file changed, 12 insertions(+), 4 deletions(-)

	Avoid assert failure when NULL EC is expected

	After 5680c38, postponed job APIs now
	expect to be called on native threads not managed by Ruby and handles
	getting a NULL execution context. However, in debug builds the change
	runs into an assertion failure with GET_EC() which asserts that EC is
	non-NULL. Avoid the assertion failure by passing `false` for `expect_ec`
	instead as the intention is to handle when there is no EC.

	Add a test from John Crepezzi and John Hawthorn to exercise this
	situation.

	See GH-4108
	See GH-5094

	[Bug #17573]

	Co-authored-by: John Hawthorn <john@hawthorn.email>
	Co-authored-by: John Crepezzi <john.crepezzi@gmail.com>
	---
	 ext/-test-/postponed_job/postponed_job.c       | 31 ++++++++++++++++++++++++++
	 test/-ext-/postponed_job/test_postponed_job.rb |  7 ++++++
	 vm_trace.c                                     |  2 +-
	 3 files changed, 39 insertions(+), 1 deletion(-)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants