Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8264123: add ThreadsList.is_valid() support #3255

Closed
wants to merge 1 commit into from

Conversation

@dcubed-ojdk
Copy link
Member

@dcubed-ojdk dcubed-ojdk commented Mar 29, 2021

ThreadsLists need an is_valid() function and checks in various
places to help catch bugs where a ThreadsList is dangling.

Other minor changes:

  • change raw _threads_hazard_ptr access to used get_threads_hazard_ptr().
  • get_threads_hazard_ptr() should be const.
  • fix a couple of old typos.

Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3255/head:pull/3255
$ git checkout pull/3255

Update a local copy of the PR:
$ git checkout pull/3255
$ git pull https://git.openjdk.java.net/jdk pull/3255/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3255

View PR using the GUI difftool:
$ git pr show -t 3255

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3255.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Mar 29, 2021

👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Mar 29, 2021

/label add hotspot-runtime

@openjdk
Copy link

@openjdk openjdk bot commented Mar 29, 2021

@dcubed-ojdk
The hotspot-runtime label was successfully added.

@dcubed-ojdk dcubed-ojdk marked this pull request as ready for review Mar 29, 2021
@openjdk openjdk bot added the rfr label Mar 29, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 29, 2021

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Dan,

I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.

Any why is this all product code rather than debug?

Thanks,
David

@@ -282,6 +282,9 @@ class ScanHazardPtrGatherProtectedThreadsClosure : public ThreadClosure {
if (thread->cmpxchg_threads_hazard_ptr(NULL, current_list) == current_list) return;
}

guarantee(ThreadsList::is_valid(current_list), "current_list="

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

Why a guarantee and not an assert? What is the cost of doing this check?

This comment has been minimized.

@dcubed-ojdk

dcubed-ojdk Mar 30, 2021
Author Member

This is a guarantee() instead of an assert() because this failure mode
is rarely caught with 'release' bits and very, very rarely caught with
'fastdebug' or 'slowdebug' bits. It's a very tight race.

My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.

I don't have data on what this correctness check costs, but it's just
a value check on the new _magic field value. Very much like other
"magic" field value checks that we have all over the VM...

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

I should have edited this comment once I saw what the actual check was.

Keeping this as a guarantee initially then switching to assert seems a good strategy.

if (!Thread::is_hazard_ptr_tagged(hazard_ptr)) {
// We only validate hazard_ptrs that are not tagged since a tagged
// hazard ptr can be deleted at any time.
guarantee(ThreadsList::is_valid(hazard_ptr), "hazard_ptr=" INTPTR_FORMAT

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

Same query about guarantee versus assert

This comment has been minimized.

@dcubed-ojdk

dcubed-ojdk Mar 30, 2021
Author Member

Same answer and to reiterate the important part:

My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.

// If the hazard ptr is unverified, then ignore it since it could
// be deleted at any time now.
Comment on lines +382 to +383

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

Why the difference in comment compared to line 317/318 when the subsequent checks are identical?

This comment has been minimized.

@dcubed-ojdk

dcubed-ojdk Mar 30, 2021
Author Member

The ValidateHazardPtrsClosure is called from places that will use
the collected hazard ptr for traversal. Deletion while being traversed
would lead to the crashes that we want to avoid.

The ScanHazardPtrGatherThreadsListClosure is used to gather
hazard ptrs where the only the address value of the hazard ptr is
used and no traversal takes place. This is why I added the comment
on L300-302 above.

@@ -203,6 +205,8 @@ class ThreadsList : public CHeapObj<mtThread> {
int find_index_of_JavaThread(JavaThread* target);
JavaThread* find_JavaThread_from_java_tid(jlong java_tid) const;
bool includes(const JavaThread * const p) const;

static bool is_valid(ThreadsList* list) { return list->_magic == THREADS_LIST_MAGIC; }

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

This seems like a necessary, but not sufficient, check for validity. Unless we zap memory when these things are freed I don't see how this ensures validity.

This comment has been minimized.

@dcubed-ojdk

dcubed-ojdk Mar 30, 2021
Author Member

You missed L652 in the ThreadsList::~ThreadsList() above:

_magic = 0xDEADBEEF;

When the destructor runs, the magic value is stomped. I've captured
failures where the is_valid() check observes 0xDEADBEEF and also
failures where the ThreadsList has been recycled into something else
and the _magic value is some other value that doesn't match
THREADS_LIST_MAGIC.

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

Sorry yes I did miss it - I misread the diff due to hidden lines.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Mar 30, 2021

@dholmes-ora - Thanks for the review!

I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.

Please see my reply about the ThreadsList destructor code that you missed.

Any why is this all product code rather than debug?

Please see my reply about guarantee() versus assert().

@openjdk openjdk bot mentioned this pull request Mar 30, 2021
3 of 3 tasks complete
@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Mar 30, 2021

#3272 is now a dependent PR on this fix (woot!) so now
you can see the race that I was chasing with this validation code! There's a long analysis
note attached to "JDK-8264393 JDK-8258284 introduced dangling TLH race".

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Looks good.

Thanks,
David

@@ -282,6 +282,9 @@ class ScanHazardPtrGatherProtectedThreadsClosure : public ThreadClosure {
if (thread->cmpxchg_threads_hazard_ptr(NULL, current_list) == current_list) return;
}

guarantee(ThreadsList::is_valid(current_list), "current_list="

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

I should have edited this comment once I saw what the actual check was.

Keeping this as a guarantee initially then switching to assert seems a good strategy.

@@ -203,6 +205,8 @@ class ThreadsList : public CHeapObj<mtThread> {
int find_index_of_JavaThread(JavaThread* target);
JavaThread* find_JavaThread_from_java_tid(jlong java_tid) const;
bool includes(const JavaThread * const p) const;

static bool is_valid(ThreadsList* list) { return list->_magic == THREADS_LIST_MAGIC; }

This comment has been minimized.

@dholmes-ora

dholmes-ora Mar 30, 2021
Member

Sorry yes I did miss it - I misread the diff due to hidden lines.

@openjdk
Copy link

@openjdk openjdk bot commented Mar 30, 2021

@dcubed-ojdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264123: add ThreadsList.is_valid() support

Reviewed-by: dholmes, eosterlund, rehn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 45 new commits pushed to the master branch:

  • 0696fd0: 8263496: MetalHighContrastTheme.getControlHighlight cleanup
  • 6cf1095: 8264484: Replace uses of StringBuffer with StringBuilder in jdk.hotspot.agent
  • d2df9a7: 8264331: Use the blessed modifier order in jdk.compiler
  • 0228734: 8262470: Printed GlyphVector outline with low DPI has bad quality on Windows
  • 3997c99: 8264222: Use switch expression in jshell where possible
  • 39f0b27: 8176026: SA: Huge heap sizes cause a negative value to be displayed in the jhisto heap total
  • de495df: 8264413: Data is written to file header even if its CRC32 was calculated
  • 52d8a22: 8264054: Bad XMM performance on java.lang.MathBench.sqrtDouble
  • 16acfaf: 8012229: [lcms] Improve performance of color conversion for images with alpha channel
  • cb70ab0: 8263235: sanity/client/SwingSet/src/ColorChooserDemoTest.java failed throwing java.lang.NoClassDefFoundError
  • ... and 35 more: https://git.openjdk.java.net/jdk/compare/8cf1c62c345ce91208549294192944a1efe717cf...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Mar 30, 2021
@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Mar 31, 2021

@dholmes-ora - Thanks for closing the loop on your initial review.
@robehn and/or @fisk - I could use a second pair of eyes on this one...

@fisk
fisk approved these changes Mar 31, 2021
Copy link
Contributor

@fisk fisk left a comment

Looks good to me!

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Mar 31, 2021

@fisk - Thanks for the review!

@robehn
robehn approved these changes Apr 1, 2021
Copy link
Contributor

@robehn robehn left a comment

Hi Dan,

I have seen some other verification code which also might be performance impacting, can we got over and measure what the cost is for different verification when you gotten your bake time here? So we can take an informed decision based on what the cost vs debuggability in product is.

Looks good, thanks.

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Apr 1, 2021

@robehn - Thanks for the review!

Are you saying that I should measure the cost of these guarantee() calls before
I switch them to assert() calls in the future? If that's what you mean, I'll include
a note for that when I file the new bug. If that's not what you mean, please let me
know what you do mean.

@robehn
Copy link
Contributor

@robehn robehn commented Apr 1, 2021

@robehn - Thanks for the review!

Are you saying that I should measure the cost of these guarantee() calls before
I switch them to assert() calls in the future? If that's what you mean, I'll include
a note for that when I file the new bug. If that's not what you mean, please let me
know what you do mean.

Yes

@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Apr 1, 2021

I've filed the following new bug:

JDK-8264624 change the guarantee() calls added by JDK-8264123 to assert() calls
@dcubed-ojdk
Copy link
Member Author

@dcubed-ojdk dcubed-ojdk commented Apr 3, 2021

/integrate

@openjdk openjdk bot closed this Apr 3, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Apr 3, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 3, 2021

@dcubed-ojdk Since your change was applied there have been 74 commits pushed to the master branch:

  • e8eda65: 8264664: use text blocks in javac module tests
  • cec66cf: 8264572: ForkJoinPool.getCommonPoolParallelism() reports always 1
  • 9c283da: 8264662: ProblemList vmTestbase/jit/escape/AdaptiveBlocking/AdaptiveBlocking001/AdaptiveBlocking001.java on win-x64 with ZGC
  • eb0ac86: 8264655: Minor internal doc comment cleanup
  • 3991b32: 8264539: Improve failure message of java/nio/file/WatchService/SensitivityModifier.java
  • 4133ded: 8264658: ProblemList javax/swing/JInternalFrame/8146321/JInternalFrameIconTest.java on linux-x64
  • 220ddbd: 8264657: ProblemList java/awt/Focus/FrameMinimizeTest/FrameMinimizeTest.java on linux-x64
  • d0f3cc9: 8264656: ProblemList sun/security/ssl/SSLSocketImpl/SSLSocketImplThrowsWrongExceptions.java on linux-x64
  • 6c145c4: 8264544: Case-insensitive comparison issue with supplementary characters.
  • f60e81b: 8264548: Dependencies: ClassHierarchyWalker::is_witness() cleanups
  • ... and 64 more: https://git.openjdk.java.net/jdk/compare/8cf1c62c345ce91208549294192944a1efe717cf...master

Your commit was automatically rebased without conflicts.

Pushed as commit 9b2232b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dcubed-ojdk dcubed-ojdk deleted the dcubed-ojdk:JDK-8264123 branch Apr 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants