Deadlock with --dynamic-ui on a large enough context #9926
Comments
Mentioned offline: it's possible that this relates to the |
So, it looks like this had nothing to do with The deadlock I reproduced was:
Rather than trying to fiddle with lock ordering here, I think the approach least likely to bite us in the long run is to have logging be mostly lock free by enqueueing onto a non-blocking queue consumed by the |
### Problem

As described in #9926, we observed a deadlock with:

* One thread in `WorkunitStore::log_workunit_state`, which via our logging mechanism was trying to use the `Session` to write to stderr.
* Another thread in `Session::maybe_display_render` requesting `WorkunitStore::heavy_hitters`.

### Solution

Do not acquire the `Session` lock in a logging callback (which might occur anywhere at all, and could cause other unidentified lock interleaving): instead, enqueue for `Scheduler::execute` on the main thread to write to the `Session`.

### Result

Fixes #9926.

The particular deadlock described there is only possible now that we log workunit completions, but if this cherry-picks cleanly to `1.29.x`, we should apply it there as well to prevent any unanticipated interleaving.

[ci skip-jvm-tests]
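The idea of logging through a queue instead of taking the `Session` lock can be sketched roughly as follows. This is a minimal model, not the actual Pants code: `Session`, `QueueLogger`, `drain_logs`, and `run_demo` here are hypothetical stand-ins, and the standard library's unbounded `mpsc` channel is assumed as the queue.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the `Session`: output guarded by a lock.
struct Session {
    written: Mutex<Vec<String>>,
}

impl Session {
    fn write_stderr(&self, msg: &str) {
        self.written.lock().unwrap().push(msg.to_owned());
    }
}

/// Hypothetical logger handle: callers only enqueue and never take the Session lock.
#[derive(Clone)]
struct QueueLogger {
    tx: Sender<String>,
}

impl QueueLogger {
    fn log(&self, msg: &str) {
        // Sending on an unbounded channel does not block on the consumer.
        // A send error only means the receiver is gone, so it is ignored here.
        let _ = self.tx.send(msg.to_owned());
    }
}

/// Drains whatever is currently queued and writes it to the session.
/// Intended to be called from a single (main) thread.
fn drain_logs(rx: &Receiver<String>, session: &Session) -> usize {
    let mut count = 0;
    for msg in rx.try_iter() {
        session.write_stderr(&msg);
        count += 1;
    }
    count
}

/// Spawns a few "worker" threads that log, then drains on the calling thread.
fn run_demo() -> Vec<String> {
    let (tx, rx) = channel();
    let logger = QueueLogger { tx };
    let session = Session { written: Mutex::new(Vec::new()) };

    let workers: Vec<_> = (0..4)
        .map(|i| {
            let logger = logger.clone();
            thread::spawn(move || logger.log(&format!("workunit {} completed", i)))
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }

    drain_logs(&rx, &session);
    let mut out = session.written.lock().unwrap().clone();
    out.sort();
    out
}
```

Because only the draining thread ever touches the `Session` lock in this model, a logging call made while some other lock is held cannot take part in a lock cycle through the `Session`.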
I'm still getting deadlock even with this fix. I've attached a backtrace taken by attaching |
Full backtrace from a subsequent run that also deadlocked: |
### Problem

As reported on #9926 and shown in https://github.com/pantsbuild/pants/files/4732596/gdb.txt, threads 5 and 25 were deadlocked accessing the `WorkunitStore` (in `complete_workunit` and `poll_session_workunits`, respectively) and the GIL (to get `Node` information, and via the `poll_session_workunits` method in `interface.rs`, respectively).

### Solution

Release the GIL before interacting with the `WorkunitStore` in `poll_session_workunits`.

I'd like to refactor `interface.rs` to make this kind of issue more challenging to trigger, but this fix is self-contained.

### Result

Fixes #9926 some more.

[ci skip-jvm-tests]
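The lock-ordering problem and its fix can be modeled with two plain mutexes. This is only a sketch under assumptions: the `gil` mutex below is a stand-in for the Python GIL (the real code would release it through the Python binding's API, not a `Mutex`), `Locks` and `run_gil_demo` are hypothetical, and the two functions only borrow the names from the description above.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical model: `gil` stands in for the Python GIL,
/// `store` for the `WorkunitStore`'s internal lock.
struct Locks {
    gil: Mutex<()>,
    store: Mutex<Vec<u32>>,
}

/// Models `complete_workunit`: holds the store lock, then needs the GIL
/// (for example to read `Node` information).
fn complete_workunit(locks: &Locks, id: u32) {
    let mut store = locks.store.lock().unwrap();
    let _gil = locks.gil.lock().unwrap();
    store.push(id);
}

/// Models the fixed `poll_session_workunits`: entered with the GIL held,
/// it releases the GIL before touching the store, then reacquires it.
fn poll_session_workunits(locks: &Locks) -> Vec<u32> {
    let gil = locks.gil.lock().unwrap(); // the caller (Python) holds the GIL on entry
    drop(gil); // release the GIL before taking the store lock
    // The store guard is a temporary and is dropped at the end of this statement.
    let drained = std::mem::take(&mut *locks.store.lock().unwrap());
    let _gil = locks.gil.lock().unwrap(); // reacquire before returning to Python
    drained
}

/// Runs a producer thread against a polling loop and returns how many
/// workunits were observed in total.
fn run_gil_demo() -> usize {
    let locks = Arc::new(Locks {
        gil: Mutex::new(()),
        store: Mutex::new(Vec::new()),
    });

    let producer = {
        let locks = Arc::clone(&locks);
        thread::spawn(move || {
            for id in 0..100 {
                complete_workunit(&locks, id);
            }
        })
    };

    let mut seen = 0;
    for _ in 0..50 {
        seen += poll_session_workunits(&locks).len();
    }
    producer.join().unwrap();
    seen += poll_session_workunits(&locks).len();
    seen
}
```

In this model no thread ever waits for the store lock while holding the GIL stand-in, so the cycle "store then GIL" versus "GIL then store" cannot form. Removing the `drop(gil)` line would reintroduce the interleaving described in the backtrace.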
Running:
...reproducibly hangs the UI (and seemingly the whole process). Disabling the UI with
--no-dynamic-ui
allows progress to be made.