8253833: mutexLocker assert_locked_or_safepoint should not access VMThread state from non-VM-thread #563

Closed
wants to merge 1 commit into from

Conversation

@robehn
Contributor

@robehn robehn commented Oct 8, 2020

It is unsafe for any thread other than the VM thread to access the current VM operation.
This part of the assert is also faulty:
If we are not at a safepoint, the fact that the operation requester (the calling thread) owns the lock does not mean it is safe for the current thread.

Passes t1-5. (Also note that VMThread::vm_operation() asserts that the current thread is the VM thread, and I have seen no such assert fire.)

Thanks


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

| | Linux x64 | Windows x64 | macOS x64 |
|---|---|---|---|
| Build | ✔️ (3/3 passed) | ✔️ (2/2 passed) | ✔️ (2/2 passed) |
| Test (tier1) | ✔️ (9/9 passed) | ✔️ (9/9 passed) | ✔️ (9/9 passed) |

Issue

  • JDK-8253833: mutexLocker assert_locked_or_safepoint should not access VMThread state from non-VM-thread

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/563/head:pull/563
$ git checkout pull/563

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Oct 8, 2020

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Oct 8, 2020
@openjdk

@openjdk openjdk bot commented Oct 8, 2020

@robehn The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

@mlbridge mlbridge bot commented Oct 8, 2020

Webrevs

Contributor

@shipilev shipilev left a comment

Looks okay to me. Is it significantly different from assert_locked_or_safepoint_weak now?

@robehn
Contributor Author

@robehn robehn commented Oct 8, 2020

Looks okay to me. Is it significantly different from assert_locked_or_safepoint_weak now?

Weak just checks whether the lock is locked, while this one checks whether the current thread is the owner.
So only one line differs. Do you want me to try to share the rest of the code?

@shipilev
Contributor

@shipilev shipilev commented Oct 8, 2020

Looks okay to me. Is it significantly different from assert_locked_or_safepoint_weak now?

Weak just checks whether the lock is locked, while this one checks whether the current thread is the owner.
So only one line differs. Do you want me to try to share the rest of the code?

Ah, okay. No action needed, as long as you saw the other versions and decided not to do anything with them.
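
To make the one-line difference discussed above concrete, here is a minimal sketch of the two checks. It is a simplified model and not the HotSpot source: `ToyThread`, `ToyMutex` and the two `check_*` functions are hypothetical names, the checks return a bool instead of calling `fatal`, and the `Universe::is_fully_initialized()` bail-out is left out.

```cpp
#include <cassert>

// Hypothetical stand-ins for a thread and a mutex; not HotSpot types.
struct ToyThread { int id; };

struct ToyMutex {
    const ToyThread* owner = nullptr;  // nullptr means the lock is not held
    bool is_locked() const { return owner != nullptr; }
    bool owned_by(const ToyThread& t) const { return owner == &t; }
};

// Strong form: the *current* thread must own the lock, unless we are at a safepoint.
bool check_locked_or_safepoint(const ToyMutex& lock, const ToyThread& current,
                               bool at_safepoint) {
    if (lock.owned_by(current)) return true;
    if (at_safepoint) return true;
    return false;
}

// Weak form: it is enough that *some* thread holds the lock, or that we are at a safepoint.
bool check_locked_or_safepoint_weak(const ToyMutex& lock, bool at_safepoint) {
    if (lock.is_locked()) return true;
    if (at_safepoint) return true;
    return false;
}
```

With a lock held by thread A and no safepoint, the strong check fails for thread B while the weak check still passes. That is the only behavioral difference in this model.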

coleenp
coleenp approved these changes Oct 8, 2020
src/hotspot/share/runtime/mutexLocker.cpp
@openjdk

@openjdk openjdk bot commented Oct 8, 2020

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8253833: mutexLocker assert_locked_or_safepoint should not access VMThread state from non-VM-thread

Reviewed-by: shade, coleenp, dcubed, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 51 new commits pushed to the master branch:

  • 4b5ac3a: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions
  • ec41046: 8254348: Build fails when cds is disabled after JDK-8247536
  • e4469d2: 8247536: Support for pre-generated java.lang.invoke classes in CDS static archive
  • 7ec9c8e: 8233214: Remove runtime code not needed with CMS removed
  • 536b35b: 8254319: Shenandoah: Interpreter native-LRB needs to activate during HAS_FORWARDED
  • be26972: 8253379: [windows] Several jpackage tests failed with error code 1638
  • 52e45a3: 8229186: Improve error messages for TestStringIntrinsics failures
  • 6d2c1a6: 8254292: Update JMH devkit to 1.26
  • 2bbf8a2: 8245543: Cgroups: Incorrect detection logic on some systems (still reproducible)
  • aaa0a2a: 8254297: Zero and Minimal VMs are broken with undeclared identifier 'DerivedPointerTable' after JDK-8253180
  • ... and 41 more: https://git.openjdk.java.net/jdk/compare/1e8e543b264bb985bfee535fedc9ffe7db5ad482...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Oct 8, 2020
Member

@dcubed-ojdk dcubed-ojdk left a comment

I'm good with the change, but I had one question about
code above the code that you deleted.

src/hotspot/share/runtime/mutexLocker.cpp
@mlbridge

@mlbridge mlbridge bot commented Oct 9, 2020

Mailing list message from David Holmes on hotspot-runtime-dev:

On 9/10/2020 4:16 pm, Robbin Ehn wrote:

On Thu, 8 Oct 2020 22:11:12 GMT, David Holmes <dholmes at openjdk.org> wrote:

I was hoping that "git blame" would help, but most of this code is ancient:

da18495 (Coleen Phillimore 2019-08-22 09:51:36 -0400 166) void assert_locked_or_safepoint(const Mutex* lock) {
8153779 (J. Duke 2007-12-01 00:00:00 +0000 167) // check if this thread owns the lock (common case)
8153779 (J. Duke 2007-12-01 00:00:00 +0000 168) assert(lock != NULL, "Need non-NULL lock");
8153779 (J. Duke 2007-12-01 00:00:00 +0000 169) if (lock->owned_by_self()) return;
8153779 (J. Duke 2007-12-01 00:00:00 +0000 170) if (SafepointSynchronize::is_at_safepoint()) return;
8153779 (J. Duke 2007-12-01 00:00:00 +0000 171) if (!Universe::is_fully_initialized()) return;
8153779 (J. Duke 2007-12-01 00:00:00 +0000 172) // see if invoker of VM operation owns it
8153779 (J. Duke 2007-12-01 00:00:00 +0000 173) VM_Operation* op = VMThread::vm_operation();
8153779 (J. Duke 2007-12-01 00:00:00 +0000 174) if (op != NULL && op->calling_thread() == lock->owner()) return;
1e71f67 (David Lindholm 2015-09-29 11:02:08 +0200 175) fatal("must own lock %s", lock->name());
8153779 (J. Duke 2007-12-01 00:00:00 +0000 176) }

Update: Stripped the filename from the "git blame" output so it wasn't quite so wide...
Update: Changing to a `code` quote helped the formatting a bit...
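
For readers following the blame listing, the case the PR description calls faulty can be sketched as below. This is a simplified model under stated assumptions: `SketchThread`, `SketchMutex`, `SketchVMOp`, `old_check` and `new_check` are hypothetical names, the checks return a bool rather than calling `fatal`, the initialization bail-out is omitted, and `new_check` is only an inference from the PR title (the VM operation branch removed), since the actual diff is not shown in this conversation.

```cpp
#include <cassert>

// Hypothetical stand-ins; not HotSpot types.
struct SketchThread { int id; };
struct SketchMutex { const SketchThread* owner = nullptr; };
struct SketchVMOp { const SketchThread* calling_thread = nullptr; };

// Models the pre-change logic from the listing, including the VM operation branch.
bool old_check(const SketchMutex& lock, const SketchThread& current,
               bool at_safepoint, const SketchVMOp* current_op) {
    if (lock.owner == &current) return true;
    if (at_safepoint) return true;
    // "see if invoker of VM operation owns it"
    if (current_op != nullptr && current_op->calling_thread == lock.owner) return true;
    return false;
}

// Assumed post-change logic: the same checks without consulting the VM operation.
bool new_check(const SketchMutex& lock, const SketchThread& current,
               bool at_safepoint) {
    if (lock.owner == &current) return true;
    if (at_safepoint) return true;
    return false;
}
```

In this model, if a requester thread owns the lock and has a VM operation in progress, `old_check` passes for an unrelated thread that is not at a safepoint, while `new_check` rejects it.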

The initialization bail-out was added "1.86 99/02/19 17:05:07", but no comments or associated bug report.

The Universe is initialized before the VM thread is ready to process operations.
So while that bail-out condition holds, there can never be any VM ops.

I believe instead of:
`if (!Universe::is_fully_initialized()) return;`
We really want:
`if (Threads::number_of_threads() == 0) return;`
But I'm not sure, and it is totally out of scope.

The way these things typically work is:
- code crashes because X has not yet been initialized
- X is known to be initialized when Universe::is_fully_initialized()
=> skip if !Universe::is_fully_initialized()

X likely relates to a later check that can't be made early during VM
init. So we just have to skip it. Without knowing what X actually is we
can't say if there is a more precise expression to check. Of course X
may no longer exist. :)

Cheers,
David
-----

@mlbridge

@mlbridge mlbridge bot commented Oct 9, 2020

Mailing list message from David Holmes on hotspot-runtime-dev:

On 9/10/2020 4:21 pm, Robbin Ehn wrote:

On Thu, 8 Oct 2020 22:02:34 GMT, David Holmes <dholmes at openjdk.org> wrote:

It is unsafe for any thread other than the VM thread to access the current VM operation.
This part of the assert is also faulty:
If we are not at a safepoint, the fact that the operation requester (the calling thread) owns the lock does not mean it is safe for the current thread.
Passes t1-5. (Also note that VMThread::vm_operation() asserts that the current thread is the VM thread, and I have seen no such assert fire.)

Thanks

src/hotspot/share/runtime/mutexLocker.cpp line 170:

168: assert(lock != NULL, "Need non-NULL lock");
169: if (lock->owned_by_self()) return;
170: if (SafepointSynchronize::is_at_safepoint()) return;

This is actually inadequate as it would only be safe (in a lock sneaking sense) if at a safepoint AND in the vmThread.
Ditto for L179.

It is also safe for threads that respect the safepoint, e.g. a GC safepoint worker (the safepoint cannot end until the
safepoint workers stop). In some cases we do not use a Mutex during a safepoint when executing in such a thread.

True

But an entire overhaul of these asserts and of what the implicit rules are is out of scope here.

Yes and I think if we can't formulate the exact condition then logically
we have:

(is_at_safepoint() && is_VMThread()) || (is_at_safepoint() &&
complexConditionWeHaveToAssumeIsTrue())

which simplifies to just

is_at_safepoint()

:)
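
The simplification above can be checked mechanically. The sketch below enumerates the inputs with the complex condition fixed to true, as the argument assumes. The function names are hypothetical and only mirror the identifiers used in the expression.

```cpp
#include <cassert>

// The two-branch condition from the message, written out literally.
bool full_condition(bool at_safepoint, bool is_vm_thread, bool complex_condition) {
    return (at_safepoint && is_vm_thread) || (at_safepoint && complex_condition);
}

// The proposed simplification.
bool simplified(bool at_safepoint) {
    return at_safepoint;
}

// Returns true if both forms agree for every input, with the complex
// condition assumed to be true.
bool simplification_holds() {
    for (int sp = 0; sp < 2; ++sp) {
        for (int vm = 0; vm < 2; ++vm) {
            bool at_safepoint = (sp == 1);
            bool is_vm_thread = (vm == 1);
            if (full_condition(at_safepoint, is_vm_thread, true) != simplified(at_safepoint)) {
                return false;
            }
        }
    }
    return true;
}
```

The equivalence depends entirely on that assumption: without it the expression reduces to `at_safepoint && (is_vm_thread || complex_condition)`, which is false for a non-VM thread at a safepoint when the complex condition does not hold.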

Cheers,
David

@dcubed-ojdk
Member

@dcubed-ojdk dcubed-ojdk commented Oct 9, 2020

But an entire overhaul of these asserts and of what the implicit rules are is out of scope here.

Yes and I think if we can't formulate the exact condition then logically
we have:

(is_at_safepoint() && is_VMThread()) || (is_at_safepoint() &&
complexConditionWeHaveToAssumeIsTrue())

which simplifies to just

is_at_safepoint()

:)

I love complexConditionWeHaveToAssumeIsTrue()!!
Ahhhhh... thanks for giving me a reason to smile on Friday morning!!

@robehn
Contributor Author

@robehn robehn commented Oct 9, 2020

But an entire overhaul of these asserts and of what the implicit rules are is out of scope here.

Yes and I think if we can't formulate the exact condition then logically
we have:
(is_at_safepoint() && is_VMThread()) || (is_at_safepoint() &&
complexConditionWeHaveToAssumeIsTrue())
which simplifies to just
is_at_safepoint()
:)

I love complexConditionWeHaveToAssumeIsTrue()!!
Ahhhhh... thanks for giving me a reason to smile on Friday morning!!

:)

@robehn
Contributor Author

@robehn robehn commented Oct 9, 2020

@shipilev and @dholmes-ora, you are not listed under the "Reviewers" part of the commit message. Can you press the magic button(s) (approve?) so you get the credit?

@robehn
Contributor Author

@robehn robehn commented Oct 12, 2020

/integrate

@openjdk openjdk bot closed this Oct 12, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 12, 2020
@openjdk

@openjdk openjdk bot commented Oct 12, 2020

@robehn Since your change was applied there have been 56 commits pushed to the master branch:

  • 77c7762: 8254353: Remove unused non-product flags
  • d3069ac: 8254362: x86_32 builds fail after JDK-8253180
  • 25001c5: 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed: not in CodeBuffer memory"
  • d43f141: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847
  • cc52358: 8254335: logging/logStream.hpp includes memory/resourceArea.hpp but doesn't need it
  • 4b5ac3a: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions
  • ec41046: 8254348: Build fails when cds is disabled after JDK-8247536
  • e4469d2: 8247536: Support for pre-generated java.lang.invoke classes in CDS static archive
  • 7ec9c8e: 8233214: Remove runtime code not needed with CMS removed
  • 536b35b: 8254319: Shenandoah: Interpreter native-LRB needs to activate during HAS_FORWARDED
  • ... and 46 more: https://git.openjdk.java.net/jdk/compare/1e8e543b264bb985bfee535fedc9ffe7db5ad482...master

Your commit was automatically rebased without conflicts.

Pushed as commit 45b09a3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@robehn robehn deleted the 8253833-vm-op-access branch Oct 12, 2020
5 participants