[Bug / Concurrency] CPU sampler during multi threaded operation yields error: Context cannot be entered on system threads #3013
Comments
Thanks for the report. So this message comes from truffleruby/src/main/java/org/truffleruby/core/thread/ThreadNodes.java Lines 694 to 723 in a3d9543
We need to execute the unblock function given to What changed here is that the CPUSampler now uses a system thread, where it previously used an embedder thread, and system threads don't allow entering the context. I don't know how to fix this: we need to execute the blocking function on some thread, and that requires entering the context:
We already have the @chumer @jchalou What is the envisioned solution here? Maybe tools should use embedder threads and not system threads to avoid this problem? It is a breaking change in CPUSampler from Ruby's POV. |
@eregon Tools must use system threads to correctly work with Truffle isolates. The ordinary Java thread is not able to do a downcall to the host VM. |
@tzezula Why not? By |
The problem with |
@tzezula, we handle embedder threads in polyglot isolates. The problem is with threads created by the language that are not system or polyglot threads. They are unsupported now but will only fail if you use polyglot isolates. No, embedder threads are not the solution. Embedder threads are only intended for embedders, as the name suggests.
Why would you do anything on a CPU sampler thread? Sorry, I cannot yet see why system threads are problematic here. Can you elaborate? |
So CPUSampler creates a samplerExecutionService system thread. And it |
One idea is if unblock functions do not need Sulong and do not need a Truffle context, we could execute the native function (via JNI, not NFI which needs the context) directly on the "system thread of CPUSampler", without needing to enter the Truffle context. truffleruby/lib/cext/include/ruby/thread.h Lines 94 to 131 in 7c5bef1
do not specify whether the unblock functions of rb_thread_call_without_gvl* can call rb_* or so, which would be a problem, as those need to be entered in the context because almost all of them end up running some Sulong or Ruby code. It might hold though. self-note: some cleanup at |
I chatted with @chumer. |
With asserts on it's rather clear:
but without it's not:
|
I tried creating a new thread in
And the main thread ends up stuck, waiting for that lock:
EDIT: it fixes the deadlock to not join() the thread there, so that's fine. |
I also tried just calling the unblock function via JNI (which would avoid the overhead of an extra thread and communicating with it for each sampling ThreadLocalAction). It'd be good to look at real-world usages of |
Some more thoughts here:
|
Ideally, I don't think we'd have to deal with the unblock function at all. I suppose that's what you mean with an analog to async_profiler. I don't want collecting a profile to affect the behavior of running code beyond performance overhead. Interrupting a blocking call that would naturally unblock is not ideal. With that said, I'd settle for whatever we can do. I'm finding the sampling profiler doesn't work with a real world Rails application. There's a lot of thread calls going on without the GVL in the database driver, OpenSSL, ZLib, and others. |
Yes, exactly, ideally we wouldn't need to, but currently the only way to get a guest stacktrace of another thread is to run
Note that many calls to Ideally when such a blocking call is interrupted it would check interrupts ( |
* We cannot enter TruffleContext on a system thread, so that leaves us with two options:
  * Calling the unblock function with JNI, and expect the unblock function does not call back to Ruby (not the case for all unblock functions I saw).
  * Calling the unblock function on a new Ruby Thread per context and communicate via a queue, but this seems very high overhead for every guest safepoint.
* Fixes #3013
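To make the second option concrete, here is a minimal sketch in plain Ruby of a dedicated thread that receives unblock functions over a queue and runs them. It is only an illustration of the pattern, not TruffleRuby's implementation: `UnblockWorker`, `request`, `shutdown` and `run_unblock_demo` are hypothetical names, and a pipe read stands in for a blocking native call.

```ruby
# Hypothetical sketch: none of these names exist in TruffleRuby.
class UnblockWorker
  def initialize
    @requests = Queue.new
    # An ordinary Ruby thread, so an unblock function run here may call back into Ruby.
    @worker = Thread.new do
      while (unblock_function = @requests.pop)
        unblock_function.call
      end
    end
  end

  # Called from the sampling side: enqueue and return immediately.
  # It does not join or wait on the worker thread.
  def request(unblock_function)
    @requests << unblock_function
  end

  # A nil entry ends the worker loop.
  def shutdown
    @requests << nil
    @worker.join
  end
end

def run_unblock_demo
  reader, writer = IO.pipe
  # Stand-in for a blocking call in a C extension: blocks until one byte arrives.
  blocked = Thread.new { reader.read(1) }

  worker = UnblockWorker.new
  # The "unblock function": wakes the blocked read by writing a byte.
  worker.request(-> { writer.write("x") })

  woken_by = blocked.value
  worker.shutdown
  woken_by
ensure
  reader&.close
  writer&.close
end

p run_unblock_demo
```

Every request costs a queue hand-off and a thread wake-up, which is the overhead the second bullet is concerned about when it happens at every guest safepoint.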
Fix in #3385. |
cpusampler is yielding threading errors in various runs. Example when running the thread spec tests:
jt test spec/ruby/optional/capi/thread_spec.rb -- --cpusampler
The output contains a few tens of:
[ruby] SEVERE: could not unblock thread inside blocking call in C extension because the context does not allow multithreading (Context cannot be entered on system threads.)
Another example is when running rails tests (not necessarily related to multithreading).