8264393: JDK-8258284 introduced dangling TLH race #3272
👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into |
|
@dcubed-ojdk The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
|
@fisk, @robehn and @dholmes-ora - This one is a Threads-SMR race fix that I used |
Webrevs
|
dholmes-ora
left a comment
Hi Dan,
So in a nutshell, when clearing a nested TLH you can't simply install the previous TLH as the threads_hazard_ptr; instead you must set it to NULL so that it is properly set by acquire_stable_list.
This seems reasonable to me.
Thanks,
David
// thread's hazard ptr is handled by acquire_stable_list_fast_path().
// And that protocol cannot be properly done with a ThreadsList that
// might not be the current system ThreadsList.
assert(_previous->_list->_nested_handle_cnt > 0, "must be > than zero");
Nit: "> than" reads "greater than than".
Fixed.
|
@dholmes-ora - your nutshell summary is spot on. Thanks for the review. |
|
Hi Dan, As you describe it and how it looks to me, isn't the issue just that we decrement before reinstating the old list? So you just need to move the "if (_has_ref_count) {" piece of code after the potential reinstating of the previous list. Or am I missing something? Thanks for finding it! |
|
@robehn - Thanks for reviewing the fix. Yes, I think you have missed something. :-) I modeled the analysis of this race after one of your favorite race techniques Switching the decrement: |
|
I tested JDK-8264123 together with this fix (JDK-8264393) in Mach5 Tier[1-7] |
|
Hi Dan, yes thanks. So I would say, you may not install a ThreadsList into your hazard pointer if it's on the _to_delete_list. |
|
@robehn - Thanks for closing the loop on your review thread. @dholmes-ora's nutshell summary covers it: Another way to put it is that the |
|
@fisk - since I'm tweaking your code (again), I really need you to chime in on |
fisk
left a comment
Not sure how I missed this race earlier. I never originally intended for hazard pointers to be set when exiting nested ThreadsListHandles. Anyway - the problem is understood and the fix looks good.
|
@fisk - Thanks for the review. The fault is mine from when I |
|
@dcubed-ojdk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 75 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
/integrate |
|
@dcubed-ojdk Since your change was applied there have been 75 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit f259eea. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
I don't think you did, because originally we never intended them to be nested. |
I ported some 20-year-old tests using JDK-8262881 in order to help
test [~rehn]'s fix for JDK-8257831. These tests, in combination with
one piece of the fix from JDK-8257831, revealed a bug in my fix for
JDK-8258284 from back in Dec 2020.
The race revealed by the ported tests from JDK-8262881 happens
only with nested ThreadsListHandles. When TLH2 is destroyed, the
thread updates its threads_hazard_ptr from the TLH2-list to the
TLH1-list; I made this change back in 2020.12 using JDK-8258284.
The threads_hazard_ptr can be observed by a thread calling
ThreadsSMRSupport::free_list() as a stable ThreadsList at the same
time as the TLH1 destructor is decrementing the nested_handle_cnt
that permits the ThreadsList to be freed. So the thread calling
ThreadsSMRSupport::free_list() thinks it has a stable hazard ptr
(TLH1-list), but the ThreadsList that hazard ptr refers to can be
freed, which causes lots of confusion.
Update: This fix, along with the fix from JDK-8264123, was stress
tested with the new tests from JDK-8262881.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3272/head:pull/3272
$ git checkout pull/3272
Update a local copy of the PR:
$ git checkout pull/3272
$ git pull https://git.openjdk.java.net/jdk pull/3272/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3272
View PR using the GUI difftool:
$ git pr show -t 3272
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3272.diff