8311981: Test gc/stringdedup/TestStringDeduplicationAgeThreshold.java#ZGenerational timed out #15240
👋 Welcome back dholmes! A progress list of the required criteria for merging this PR into |
@dholmes-ora The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
I think this sounds like a good solution, but I haven't dug deep enough to fully appreciate all the paths taken. Please make sure to get another (R)eviewer for this change.
@dholmes-ora This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 13 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Looks good.
Thanks,
Patricio
// "external" mutex. If the try_lock fails then we assume that there is an operation
// and force the caller to check more carefully in a safer context. If we can't get
// the lock it means another thread is trying to handshake with us, so it can't
// happen during thread termination and destruction.
Why the particular mention about thread termination and destruction?
nit typo: s/musn't/mustn't/
which is preferred according to urbandictionary.com.
I also don't understand the last sentence. More accurately, I understand the first
phrase, but I don't understand the need for the second phrase.
Earlier when I mistakenly thought the no-arg version was this version with default args, I was concerned about its use in the HandshakeState destructor, but reasoned that it was safe as there couldn't be a real pending op at that time to cause contention on the lock - hence the comment in the main code. I left the comment just to add some information on when during a thread's lifetime we could hit this problem - to make it easier to reason about. I can remove it if it is causing confusion.
I see. The only thing which might be confusing is what exactly "termination" phase includes. Because when we are in Threads::remove() with a status of _thread_exiting we can still fail to acquire the lock here while trying to grab the Threads_lock.
Thanks for including the detailed call site analysis in |
Thumbs up. The call site analysis is complicated. The only ones that
bother my brain are the suspend cases, but I've convinced myself
that these are okay even if we have errant duplicate calls.
@robehn - do you have time to take a peek at this one since it is Handshake related?
Thanks for the reviews @stefank , @pchilano and @dcubed-ojdk . @dcubed-ojdk I've fixed the typo. See response to @pchilano about the comment. Thanks.
Hello, a question, if I understand the description correctly: What am I not understanding?
In |
Sorry if I didn't make it clear, as I understand the description: If the "Generation GC Thread" installs the new VM op to be executed, it will still wait until it has been executed. The "Gen GC Thread" would still be in the method "VMThread::wait_until_executed". Is my question clearer?
The reason the handshake stalls is because the However maybe there is still an issue here with The deadlock arises from that the |
Thanks! It sounds like we want the VM op requester to leave VMThread::wait_until_executed/VMOperation_lock before starting the next VM op. As you say process_if_requested may also trigger this AFAICT. |
Looking at it again. Because this is a gc thread which is not a JavaThread it will not do handshakes (process_if_requested when calling |
Thanks @xmas92 for explaining what happens after the current issue gets resolved. Thanks for the reviews @stefank , @pchilano, @dcubed-ojdk and @robehn . /integrate |
Going to push as commit f142470.
Your commit was automatically rebased without conflicts. |
@dholmes-ora Pushed as commit f142470. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Please see the JBS issue for full details on the underlying deadlock issue (credit to @stefank for discovering it) and the proposed solution (credit @pchilano and @xmas92 ). Quite simply we make HandshakeState::has_operation() non-blocking by using a try_lock and conservatively return true to indicate an operation may be pending. By not blocking we avoid the deadlock scenario. All usages of the changed code have been examined to see that they are safe with this change (they all basically just take a safe slow path to see if there really is an operation).
Testing:
Given the nature of the deadlock this testing is not sufficient to claim success as we probably only saw 1 failure in many hundreds of runs. So if anyone has suggestions for additional testing please speak up. Otherwise we are relying on "correctness by design" - we've removed a blocking condition that leads to the 3-way deadlock, and examined the code paths affected.
Thanks.
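For readers unfamiliar with the pattern, the non-blocking check described above can be sketched in standalone C++ roughly as follows. This is an illustration only, not the HotSpot change: it uses std::mutex instead of HotSpot's own Mutex class, and the names PendingOps, has_operation_slow, should_process and demo are hypothetical. The point is that a failed try_lock is reported as "an operation may be pending", and the caller then confirms on a path where blocking is safe.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a per-thread operation queue (not HotSpot code).
class PendingOps {
 public:
  void add(int op) {
    std::lock_guard<std::mutex> guard(_lock);
    _queue.push_back(op);
  }

  // Quick, non-blocking check. If the lock cannot be taken we cannot look at
  // the queue, so we conservatively answer "true" (possibly a false positive).
  bool has_operation() {
    std::unique_lock<std::mutex> guard(_lock, std::try_to_lock);
    if (!guard.owns_lock()) {
      return true;  // contended: assume something may be pending
    }
    return !_queue.empty();
  }

  // Authoritative check. This one blocks, so it must only be called from a
  // context where waiting for the lock cannot contribute to a deadlock.
  bool has_operation_slow() {
    std::lock_guard<std::mutex> guard(_lock);
    return !_queue.empty();
  }

 private:
  std::mutex _lock;
  std::deque<int> _queue;
};

// Caller pattern: a "false" from the quick check is definitive, while a
// "true" only means "go and look properly on the slow path".
bool should_process(PendingOps& ops) {
  if (!ops.has_operation()) {
    return false;
  }
  return ops.has_operation_slow();
}

// Small single-threaded walk-through of the two outcomes.
void demo() {
  PendingOps ops;
  assert(!should_process(ops));  // nothing queued
  ops.add(42);
  assert(ops.has_operation());   // quick check sees the queued op
  assert(should_process(ops));   // slow path confirms it
}
```

The trade-off in this sketch is that the quick check may report an operation that does not exist, which costs the caller one trip through the slow path but never makes it wait on the lock.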
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15240/head:pull/15240
$ git checkout pull/15240
Update a local copy of the PR:
$ git checkout pull/15240
$ git pull https://git.openjdk.org/jdk.git pull/15240/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 15240
View PR using the GUI difftool:
$ git pr show -t 15240
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15240.diff