8361752: Double free in CompileQueue::delete_all after JDK-8357473 #26294
👋 Welcome back shade! A progress list of the required criteria for merging this PR into
@shipilev This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 86 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
I am pretty convinced this is it. But I still struggle to reproduce the failure locally. So I would appreciate it if @TobiHartmann or @dholmes-ora could give it a spin through the CI where this reproduces. Probably after JDK-8360048 lands, if that one is not a test-only bug?
I kicked off a CI run. I'll keep you posted on the results.
// Wake up all blocking task waiters to delete all remaining blocking
// tasks. This is not a performance sensitive path, so we do this
// unconditionally to simplify coding.
{
  MonitorLocker ml(Thread::current(), CompileTaskWait_lock);
  ml.notify_all();
}
What about other compiler threads that are still in the process of compiling blocking tasks? They still need the CompileTask object.
delete_all() is called by one compiler thread that has finished compilation, but other threads may not have.
I don't see any compiler thread checking the shut_down state to stop compilation.
AFAIU, that's the point of the existing protocol to force waiters to delete the task: the blocking waiter would wait for compiler thread to complete the task one way or the other. This PR makes that protocol even stronger: only blocking waiters are allowed to delete the blocking task.
Ah, your question is what happens if we notify here, and compilations are still running? Well, I think the current protocol should nominally allow waiters to wait until compilation is over and then allow them to delete the task. But then I see wait_for_completion can exit when compilation is shut down:
while (!task->is_complete() && !is_compilation_disabled_forever()) {
  ml.wait();
}
This will proceed to delete the task while compiler thread is running. Grrr. Looks to be another hole in this protocol.
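To make the hole concrete, here is a minimal single-threaded sketch of the loop condition quoted above. It is not HotSpot code: `TaskModel`, `waiter_keeps_waiting` and `waiter_would_delete_live_task` are hypothetical names introduced only for illustration, and the real state lives behind locks that this model leaves out.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the task state discussed above.
struct TaskModel {
  bool complete = false;          // set once the compiler thread is done
  bool compiler_running = false;  // compiler thread still uses the task
};

// Mirrors the quoted loop condition: the waiter keeps waiting only while
// the task is incomplete AND compilation is not disabled forever.
inline bool waiter_keeps_waiting(const TaskModel& t, bool disabled_forever) {
  return !t.complete && !disabled_forever;
}

// The hole: the waiter has left the loop (and would go on to delete the
// task) while the compiler thread may still be working on it.
inline bool waiter_would_delete_live_task(const TaskModel& t,
                                          bool disabled_forever) {
  return !waiter_keeps_waiting(t, disabled_forever) && t.compiler_running;
}
```

With `compiler_running` set and the task incomplete, the second function returns true only when `disabled_forever` is true, which is the shutdown case described in the comment.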
Can the compiler thread delete its own blocking task when it has finished? And let the Java thread resume execution when compilation is disabled, as it does now, but do nothing about the task in such a case?
I don't think that works. There is no "own" blocking task, there are nearly always two threads involved: the compiler thread and the waiter (Java) thread. Waiter is checking the task status under the lock. Logically, the last user should delete the task, that is waiter.
But I think we can handle this hole by ignoring the blocking task deletion during compiler shutdown. For the same reason described in PR body: we already leave cruft behind in that case, and it costs us quite a bit of complexity to deal with every corner case during shutdown. So it seems simpler to just drop the tasks on the floor in that corner case.
I did a variant of this in new commit, seems to still work well under stress testing. More testing is running now...
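One possible reading of "drop the tasks on the floor" is sketched below. This is an assumption about the shape of the idea, not the code from the commit: `TaskModel`, `WaiterAction` and `waiter_epilogue` are hypothetical names, and deletion is modeled as a counter so the sketch stays deterministic.

```cpp
#include <cassert>

// Hypothetical model: "deleting" a task just bumps a counter.
struct TaskModel {
  bool complete = false;
  int frees = 0;  // how many times the task was deleted in this model
};

enum class WaiterAction { Delete, Drop };

// What the waiter does after leaving its wait loop (assumed shape).
inline WaiterAction waiter_epilogue(TaskModel& t, bool shutting_down) {
  if (shutting_down) {
    // Compiler is shutting down: a compiler thread may still use the
    // task, so leave it behind instead of deleting it.
    return WaiterAction::Drop;
  }
  // Outside shutdown the wait loop only exits on completion.
  assert(t.complete);
  t.frees++;
  return WaiterAction::Delete;
}
```

The trade-off is the one named in the comment: a few tasks leak during shutdown, in exchange for not having to coordinate with compiler threads that are still running.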
FWIW, tier1-tier3, and 100 repeats of Let me know when I should kick off another round.
Thank you, that is good to know! New version handles even more obscure corner case, that I doubt would show up easily :) My Linux x86_64 server fastdebug
tier1 - tier3 and 100 repeats of TestStressBailout.java on Linux x64 & aarch64, Windows x64, and Mac x64 & aarch64 all passed.
vnkozlov left a comment
Good.
Thanks! I think I need another Review.
@iwanowww please look?
iwanowww left a comment
Looks good.
Thank you! I re-tested locally after local merge with current master, and it still works. Here goes. /integrate
Going to push as commit 9609f57.
Your commit was automatically rebased without conflicts.
See the bug for more analysis.
The short summary is that CompileQueue::delete_all walks the entire compile queue and deletes the tasks. It normally goes smoothly, unless there are blocking tasks. Then, the actual waiters have to delete the task, lest we delete the task under waiter's feet. Full deletion and blocking waits coordinate with the waiting_for_completion_count counter. This mechanism -- added by JDK-8343938 in JDK 25 to solve a similar problem -- almost works. Almost.
There is a subtle race window, where a blocking waiter could have already unparked, dropped waiting_for_completion_count to 0 and proceeded to delete the task, see CompileBroker::wait_for_completion(). Then the queue deletion code could assume there are no actual waiters on the blocking task, and proceed to delete the task again. Before JDK-8357473 this race was fairly innocuous, as the second attempt at insertion into the free list was benign. But now, CompileTask-s are delete-d, and the second attempt leads to a double free.
I suspect we can fix that by complicating the coordination protocol even further, e.g. by tracking the counters more thoroughly. But, recognizing that CompileQueue::delete_all() is basically only called from the compiler shutdown code (things are already bad), and it looks completely opportunistic (it does not delete the whole compiler threads, so skipping synchronous deletes on a few compile tasks is not a big deal), we should strive to simplify it.
This PR summarily delegates all blocking task deletes to waiters. I think it stands to reason (and can be seen in CompileBroker code) that if a blocking task is in queue, then there is a waiter that would call CompileBroker::wait_for_completion() on it.
Additional testing:
tier1
all
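The race in the summary can be replayed as a deterministic, single-threaded interleaving. The sketch below is a simplified model under stated assumptions, not the HotSpot implementation: `TaskModel`, `delete_all_counter_based`, `delete_all_delegating` and `frees_after_race` are hypothetical names, deletion is a counter, and the counter-based walk is only an approximation of the pre-fix logic as the description words it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a compile task; "delete" bumps a counter.
struct TaskModel {
  bool blocking = false;
  int waiting_for_completion_count = 0;
  int frees = 0;
};

// Waiter side: unpark, drop the counter, delete the task.
inline void waiter_unpark_and_delete(TaskModel& t) {
  t.waiting_for_completion_count--;
  t.frees++;
}

// Approximation of the pre-fix walk: a blocking task with a zero counter
// is assumed to have no waiters, so the queue deletes it.
inline void delete_all_counter_based(std::vector<TaskModel*>& queue) {
  for (TaskModel* t : queue) {
    if (!t->blocking || t->waiting_for_completion_count == 0) {
      t->frees++;
    }
  }
  queue.clear();
}

// Approach described in this PR: the walk never deletes blocking tasks;
// their waiters are the only ones allowed to do that.
inline void delete_all_delegating(std::vector<TaskModel*>& queue) {
  for (TaskModel* t : queue) {
    if (!t->blocking) {
      t->frees++;
    }
  }
  queue.clear();
}

// Replays the race: the waiter deletes first, then the queue walk runs.
inline int frees_after_race(bool delegating) {
  TaskModel t;
  t.blocking = true;
  t.waiting_for_completion_count = 1;
  std::vector<TaskModel*> queue{&t};
  waiter_unpark_and_delete(t);
  if (delegating) {
    delete_all_delegating(queue);
  } else {
    delete_all_counter_based(queue);
  }
  return t.frees;
}
```

In this model `frees_after_race(false)` counts two deletes of the same task (the double free), while `frees_after_race(true)` counts exactly one.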
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26294/head:pull/26294
$ git checkout pull/26294
Update a local copy of the PR:
$ git checkout pull/26294
$ git pull https://git.openjdk.org/jdk.git pull/26294/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26294
View PR using the GUI difftool:
$ git pr show -t 26294
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26294.diff