Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8238761: Asynchronous handshakes #151

Closed
wants to merge 11 commits into from

Conversation

robehn
Copy link
Contributor

@robehn robehn commented Sep 14, 2020

This patch implements asynchronous handshake, which changes how handshakes works by default. Asynchronous handshakes are target only executed, which they may never be executed. (target may block on socket for the rest of VM lifetime) Since we have several use-cases for them we can have many handshake pending. (should be very rare) To be able handle an arbitrary amount of handshakes this patch adds a per JavaThread queue and heap allocated HandshakeOperations. It's a singly linked list where you push/insert to the end and pop/get from the front. Inserts are done via CAS on first pointer, no lock needed. Pops are done while holding the per handshake state lock, and when working on the first pointer also CAS.

The thread grabbing the handshake state lock for a JavaThread will pop and execute all handshake operations matching the filter. The JavaThread itself uses no filter and any other thread uses the filter of everything except asynchronous handshakes. In this initial change-set there is no need to do any other filtering. If needed filtering can easily be exposed as a virtual method on the HandshakeClosure, but note that filtering causes handshake operation to be done out-order. Since the filter determins who execute the operation and not the invoked method, there is now only one method to call when handshaking one thread.

Some comments about the changes:

  • HandshakeClosure uses ThreadClosure, since it neat to use the same closure for both alla JavThreads do and Handshake all threads. With heap allocating it cannot extends StackObj. I tested several ways to fix this, but those very much worse then this.

  • I added a is_handshake_safe_for for checking if it's current thread is operating on itself or the handshaker of that thread.

  • Simplified JVM TI with a JvmtiHandshakeClosure and also made them not needing a JavaThread when executing as a handshaker on a JavaThread, e.g. VM Thread can execute the handshake operation.

  • Added WB testing method.

  • Removed VM_HandshakeOneThread, the VM thread uses the same call path as direct handshakes did.

  • Changed the handshake semaphores to mutex to be able to handle deadlocks with lock ranking.

  • VM_HandshakeAllThreadsis still a VM operation, since we do support half of the threads being handshaked before a safepoint and half of them after, in many handshake all operations.

  • ThreadInVMForHandshake do not need to do a fenced transistion since this is always a transistion from unsafe to unsafe.

  • Added NoSafepointVerifyer, we are thinking about supporting safepoints inside handshake, but it's not needed at the moment. To make sure that gets well tested if added the NoSafepointVerifyer will raise eyebrows.

  • Added ttyLocker::break_tty_lock_for_safepoint(os::current_thread_id()); due to lock rank.

  • Added filtered queue and gtest for it.

Passes multiple t1-8 runs.
Been through some pre-reviwing.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/151/head:pull/151
$ git checkout pull/151

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 14, 2020

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 14, 2020

@robehn The following labels will be automatically applied to this pull request: hotspot hotspot-runtime serviceability.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label (add|remove) "label" command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Sep 14, 2020
@robehn robehn changed the title Rebase version 1.0 8238761: Asynchronous handshakes Sep 14, 2020
@robehn robehn marked this pull request as ready for review September 15, 2020 07:35
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 15, 2020
@mlbridge
Copy link

mlbridge bot commented Sep 15, 2020

@mlbridge
Copy link

mlbridge bot commented Sep 16, 2020

Mailing list message from Patricio Chilano on hotspot-dev:

Hi Robbin,

Changes look good to me! Some minor comments:

src/hotspot/share/prims/jvmtiThreadState.cpp
222:?? assert(current_thread == get_thread() ||
223:????????? SafepointSynchronize::is_at_safepoint() ||
224: get_thread()->is_handshake_safe_for(current_thread),
225:????????? "call by myself / at safepoint / at handshake");
Extra current_thread == get_thread() is already handled by
is_handshake_safe_for().

src/hotspot/share/prims/jvmtiEnvBase.cpp
Same as above.

src/hotspot/share/runtime/handshake.cpp
198:???? log_info(handshake)("Handshake \"%s\", Targeted threads: %d,
Executed by targeted threads: %d, Total completion time: " JLONG_FORMAT
" ns%s%s",
199:???????????????????????? name, targets,
200:???????????????????????? targets - vmt_executed,
In the calls to log_handshake_info() in VM_HandshakeAllThreads and
Handshake::execute() we are passing as vmt_executed the number of
handshakes that the driver executed which could be more than the targets
parameter. So the operation "targets - vmt_executed" to calculate the
handshakes executed by the targets would no longer be valid. Personally
I would just leave ProcessResult as an enum and log as before. We still
have a log_trace() in try_process(), so that already keeps track of
extra handshakes executed by the handshaker.

src/hotspot/share/runtime/handshake.cpp
387:???? NoSafepointVerifier nsv;
388:???? process_self_inner();
389:?? }
Shouldn't NoSafepointVerifier be placed before the execution of the
handshake closure so that we also cover the case when the handshake is
executed by the handshaker? As in:
// Only actually execute the operation for non terminated threads.
if (!thread->is_terminated()) {
??? NoSafepointVerifier nsv;
??? _handshake_cl->do_thread(thread);
}

src/hotspot/share/runtime/interfaceSupport.inline.hpp
156:???? // Threads shouldn't block if they are in the middle of
printing, but...
157: ttyLocker::break_tty_lock_for_safepoint(os::current_thread_id());
What's the issue of having NoSafepointVerifier inside the handshake?

Thanks!

Patricio

On 9/15/20 4:39 AM, Robbin Ehn wrote:

Added NSV
ProcessResult to enum
Fixed logging
Moved _active_handshaker to private
@robehn
Copy link
Contributor Author

robehn commented Sep 17, 2020

src/hotspot/share/prims/jvmtiThreadState.cpp
src/hotspot/share/prims/jvmtiEnvBase.cpp

Removed double checks.

src/hotspot/share/runtime/handshake.cpp
...
I would just leave ProcessResult as an enum and log as before.

Reverted to plain enum and updated logs. (better?)

src/hotspot/share/runtime/handshake.cpp
387: NoSafepointVerifier nsv;
388: process_self_inner();

I wanted a NSV to cover the process_self_inner method.
So I added a second one in suggested place:

NoSafepointVerifier nsv;
_handshake_cl->do_thread(thread);
}

src/hotspot/share/runtime/interfaceSupport.inline.hpp
156: // Threads shouldn't block if they are in the middle of
printing, but...
157: ttyLocker::break_tty_lock_for_safepoint(os::current_thread_id());
What's the issue of having NoSafepointVerifier inside the handshake?

Sorry, the issue is the lock rank. Right now the semaphore hides this issue.

Please see commit 86b83d0.

Copy link
Contributor

@pchilano pchilano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, thanks for fixing! I added some comments on the changes.

src/hotspot/share/runtime/handshake.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/handshake.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/jvmtiEnvBase.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/jvmtiEnvBase.cpp Outdated Show resolved Hide resolved
@pchilano
Copy link
Contributor

Update looks good, thanks Robbin!

Copy link
Member

@dcubed-ojdk dcubed-ojdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumbs up. I don't think I have anything that is in the must fix category.

src/hotspot/share/prims/jvmtiEnvBase.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/jvmtiEnvBase.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/jvmtiEnvBase.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/jvmtiEnvBase.hpp Show resolved Hide resolved
src/hotspot/share/utilities/filterQueue.inline.hpp Outdated Show resolved Hide resolved
src/hotspot/share/utilities/filterQueue.inline.hpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/whitebox.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/whitebox.cpp Show resolved Hide resolved
src/hotspot/share/runtime/handshake.hpp Show resolved Hide resolved
@openjdk
Copy link

openjdk bot commented Sep 18, 2020

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for more details.

After integration, the commit message for the final commit will be:

8238761: Asynchronous handshakes

Reviewed-by: pchilanomate, dcubed, dholmes, coleenp, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 31 new commits pushed to the master branch:

  • 70b0fcc: 8253728: tests fail with "assert(fr.is_compiled_frame()) failed: Wrong frame type"
  • 527b0e4: 8248984: Bump minimum boot jdk to JDK 15
  • ac15d64: 8241151: Incorrect lint warning for no definition of serialVersionUID in a record
  • d25b03e: 8253616: Change to GCC 10.2 for building on Linux at Oracle
  • 821bd08: 8253667: ProblemList tools/jlink/JLinkReproducible{,3}Test.java on linux-aarch64
  • 1ae6b53: 8252194: Add automated test for fix done in JDK-8218469
  • 77a0f39: 8253540: InterpreterRuntime::monitorexit should be a JRT_LEAF function
  • 0054c15: 8253435: Cgroup: 'stomping of _mount_path' crash if manually mounted cpusets exist
  • 8e338f6: 8253646: ZGC: Avoid overhead of sorting ZStatIterableValues on bootstrap
  • ec9bee6: 8253015: Aarch64: Move linux code out from generic CPU feature detection
  • ... and 21 more: https://git.openjdk.java.net/jdk/compare/1f5a033421bbcf803169c5c5f93314fd22b5a4a5...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 18, 2020
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Robbin,

There is still a lack of motivation for this feature in the JBS issue. What kind of handshakes need to be asynchronous? Any async operation implies that the requester doesn't care about when or even if, the operation gets executed - they are by definition fire-and-forget actions. So what are the usecases being envisaged here?

Many of the changes included here seem unrelated to, and not reliant on, async handshakes, and could be factored out to simplify the review and allow focus on the actual async handshake part e.g. the JVM TI cleanups seem they could be mostly standalone.

Specific comments below. A general concern I have is where the current thread is no longer guaranteed to be a JavaThread (which is a step in the wrong direction in relation to some of the cleanups I have planned!) and I can't see why this would be changing.

Thanks.

src/hotspot/share/prims/jvmtiEnvBase.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/whitebox.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/handshake.cpp Outdated Show resolved Hide resolved
@mlbridge
Copy link

mlbridge bot commented Sep 21, 2020

Mailing list message from David Holmes on hotspot-dev:

Correction ...

On 21/09/2020 4:16 pm, David Holmes wrote:

On Thu, 17 Sep 2020 12:07:15 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

This patch implements asynchronous handshake, which changes how handshakes works by default. Asynchronous handshakes
are target only executed, which they may never be executed. (target may block on socket for the rest of VM lifetime)
Since we have several use-cases for them we can have many handshake pending. (should be very rare) To be able handle an
arbitrary amount of handshakes this patch adds a per JavaThread queue and heap allocated HandshakeOperations. It's a
singly linked list where you push/insert to the end and pop/get from the front. Inserts are done via CAS on first
pointer, no lock needed. Pops are done while holding the per handshake state lock, and when working on the first
pointer also CAS. The thread grabbing the handshake state lock for a JavaThread will pop and execute all handshake
operations matching the filter. The JavaThread itself uses no filter and any other thread uses the filter of everything
except asynchronous handshakes. In this initial change-set there is no need to do any other filtering. If needed
filtering can easily be exposed as a virtual method on the HandshakeClosure, but note that filtering causes handshake
operation to be done out-order. Since the filter determins who execute the operation and not the invoked method, there
is now only one method to call when handshaking one thread. Some comments about the changes:
- HandshakeClosure uses ThreadClosure, since it neat to use the same closure for both alla JavThreads do and Handshake
all threads. With heap allocating it cannot extends StackObj. I tested several ways to fix this, but those very much
worse then this.

- I added a is_handshake_safe_for for checking if it's current thread is operating on itself or the handshaker of that
thread.

- Simplified JVM TI with a JvmtiHandshakeClosure and also made them not needing a JavaThread when executing as a
handshaker on a JavaThread, e.g. VM Thread can execute the handshake operation.

- Added WB testing method.

- Removed VM_HandshakeOneThread, the VM thread uses the same call path as direct handshakes did.

- Changed the handshake semaphores to mutex to be able to handle deadlocks with lock ranking.

- VM_HandshakeAllThreadsis still a VM operation, since we do support half of the threads being handshaked before a
safepoint and half of them after, in many handshake all operations.

- ThreadInVMForHandshake do not need to do a fenced transistion since this is always a transistion from unsafe to unsafe.

- Added NoSafepointVerifyer, we are thinking about supporting safepoints inside handshake, but it's not needed at the
moment. To make sure that gets well tested if added the NoSafepointVerifyer will raise eyebrows.

- Added ttyLocker::break_tty_lock_for_safepoint(os::current_thread_id()); due to lock rank.

- Added filtered queue and gtest for it.

Passes multiple t1-8 runs.
Been through some pre-reviwing.

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

Fixed double checks
Added NSV
ProcessResult to enum
Fixed logging
Moved _active_handshaker to private

src/hotspot/share/runtime/handshake.cpp line 336:

334: // and thus prevents reading stale data modified in the handshake closure
335: // by the Handshakee.
336: OrderAccess::acquire();

How/why is this deleted? Surely there are still single-thread VMops that use a handshake??

That comment was placed against the old line 336 which was the deletion
of this method:

bool Handshake::execute(HandshakeClosure* thread_cl, JavaThread* target) {

(I'll file a skara/git bug).

David
-----

@robehn
Copy link
Contributor Author

robehn commented Sep 21, 2020

There is still a lack of motivation for this feature in the JBS issue. What kind of handshakes need to be asynchronous? Any async operation implies that the requester doesn't care about when or even if, the operation gets executed - they are by definition fire-and-forget actions. So what are the usecases being envisaged here?

Added info in JBS issue:
https://bugs.openjdk.java.net/browse/JDK-8238761

All uses-cases of _suspend_flag as initial targets.
But some of them require more bits.

Many of the changes included here seem unrelated to, and not reliant on, async handshakes, and could be factored out to simplify the review and allow focus on the actual async handshake part e.g. the JVM TI cleanups seem they could be mostly standalone.

Since I kept rebasing this doing somethings I did somethings to simplify the rebasing.
I guess you are talking about the JvmtiHandshakeClosure?

Specific comments below. A general concern I have is where the current thread is no longer guaranteed to be a JavaThread (which is a step in the wrong direction in relation to some of the cleanups I have planned!) and I can't see why this would be changing.

If the VM thread emits a "handshake all" it will continuously loop the JavaThreads until op is completed.
We do not keep track which JavaThread have done which handshake.
The VM thread will execute all handshakes it finds on every JavaThread's queue.
If someone else adds a handshake to one of the JavaThreads the VM thread might execute it.

I did not see any issues while looking at the code or in testing doing this.
Many of the handshakes used to be safepoints, always executed by VM thread.
We changed that to VM thread or JavaThread it self, and we now extend that to any JavaThread.

Some of the JVM TI handshakes are a bit different, but since they must proper allocate resource in target JavaThread and not in current JavaThread, there is no issue executing the code with a non-JavaThread.

At the moment we have no dependencies on that the 'driver' is a JavaThread for any of the handshakes.
We can easily set a per Handshake typ filter (slight changes to Handshake Closure and filter function) and choose to only executed the handshake with target self and requester/any JavaThread/only VM thread.

So if we think JVM TI handshakes should only be executed by requester or target it's an easy fix.
If you think the default is wrong, it's also an easy change.

(For others following there also a planned investigation on requester only executed handshake, which is not as easy)

Thanks.

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me!

src/hotspot/share/runtime/handshake.hpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/handshake.hpp Show resolved Hide resolved
src/hotspot/share/runtime/handshake.cpp Show resolved Hide resolved
src/hotspot/share/runtime/handshake.cpp Outdated Show resolved Hide resolved
@dholmes-ora
Copy link
Member

Hi Robbin,

I've gone back to refresh myself on the previous discussions and (internal) design walk-throughs to get a better sense of these changes. Really the "asynchronous handshake" aspect is only a small part of this. The fundamental change here is that handshakes are now maintained via a per-thread queue, and those handshake operations can, in the general case, be executed by any of the target thread, the requestor (active_handshaker) thread or the VMThread. Hence the removal of the various "JavaThread::current()" assumptions. Unless constrained otherwise, any handshake operation may be executed by the VMThread so we have to take extra care to ensure the code is written to allow this. I'm a little concerned that our detour into direct-handshakes actually lulled us into a false sense of security knowing that an operation would always execute in a JavaThread, and we have now reverted that and allowed the VMThread back in. I understand why, but the change in direction here caught me by surprise (as I had forgotten the bigger picture). It may not always be obvious that the transitive closure of the code from an operation can be safely executed by a non-JavaThread.

Then on top of this generalized queuing mechanism there is a filter which allows some control over which thread may perform a given operation - at the moment the only filter isolates "async" operations which only the target thread can execute.

In addition another nuance is that when processing a given thread's handshake operation queue, different threads have different criteria for when to stop processing the queue:

  • the target thread will drain the queue completely
  • the VMThread will drain the queue of all "non-async" operations**
  • the initiator for a "direct" operation will drain the queue up to, and including, the synchronous operation they submitted
  • the initiator for an "async" operation will not process any operation themselves and will simply add to the queue and then continue on their way (hence the "asynchronous")

** I do have some concerns about latency impact on the VMThread if it is used to execute operations that didn't need to be executed by the VMThread!

I remain concerned about the terminology conflation that happens around "async handshakes". There are two aspects that need to be separated:

  • the behaviour of the thread initiating the handshake operation; and
  • which thread can execute the handshake operation

When a thread initiates a handshake operation and waits until that operation is complete (regardless of which thread performed it, or whether the initiator processed any other operations) that is a synchronous handshake operation.
When a thread initiates a handshake operation and does not wait for the operation to complete (it does the target->queue()->add(op); and continues on its way) that is an asynchronous handshake operation.

The question of whether the operation must be executed by the target thread is orthogonal to whether the operation was submitted as a synchronous or asynchronous operation.

So I have problem when you say that an asynchronous handshake operation is one that must be executed by the target thread, as this is not the right characterisation at all. It is okay to constrain things such that an async operation is always executed by the target, but that is not what makes it an async operation. In the general case there is no reason why an async operation might not be executed by the VMThread, or some other JavaThread performing a synchronous operation on the same target.

I will go back through the actual handshake code to see if there are specific things I would like to see changed, but that will have to wait until tomorrow.

Thanks,
David

@robehn
Copy link
Contributor Author

robehn commented Sep 22, 2020

Hi David, you are correct and you did a fine job summarizing this, thanks!

I will go back through the actual handshake code to see if there are specific things I would like to see changed, but that will have to wait until tomorrow.

Great thanks!

Thanks,
David

@robehn
Copy link
Contributor Author

robehn commented Sep 22, 2020

@coleenp I think you placed your comment:
"The "driver" concept is odd. Should it really be caller? Like the thread that called VMHandshake?"
On the commit or somewhere else, not a review comment, I can't reply.

Anyhow I can reply to it here:
The answer is caller makes me think of requester, but we don't know if it's the requester executing this.
Therefore I referred to the thread executing handshakes on the target/handshakee as driver.

@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 23, 2020
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review and removed hotspot-runtime hotspot-runtime-dev@openjdk.org labels Sep 23, 2020
@sspitsyn
Copy link
Contributor

sspitsyn commented Sep 23, 2020

Hi Robbin,
I've looked more at the Serviceability files.
The fix looks good in general. Nice refactoring and simplification with the JvmtiHandshakeClosure.
It is a little unexpected that the do_thread can be executed by non-JavaThread's.
Not sure, what kinds of inconvenience it can cause.
Also, adding the _completed field is somewhat inconsistent.
I'd expect it to be present or not present for almost all JVMTI handshake closures.
I hope, you can comment on why it was added in these particular cases.

@mlbridge
Copy link

mlbridge bot commented Sep 24, 2020

Mailing list message from David Holmes on hotspot-dev:

On 23/09/2020 7:37 pm, Robbin Ehn wrote:

On Wed, 23 Sep 2020 02:56:00 GMT, David Holmes <dholmes at openjdk.org> wrote:

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

Update after Coleen

src/hotspot/share/runtime/handshake.hpp line 97:

95: }
96: bool block_for_operation() {
97: return !_queue.is_empty() || _lock.is_locked();

I really don't understand the is_locked() check in this condition. ??
And the check for !empty is racy, so how do we avoid missing an in-progress addition?

A JavaThread is blocked.
A second thread have just executed a handshake operation for this JavaThread and are on line:
https://github.com/openjdk/jdk/blob/cd784a751a3153939b9284898f370160124ca610/src/hotspot/share/runtime/handshake.cpp#L510
And the queue is empty.

The JavaThread wakes up and changes state from blocked to blocked_trans.
It now checks if it's allowed to complete the transition to e.g. vm.

If a third thread adds to queue before the second thread leaves the loop it's operation can be executed.
But the JavaThread could see the queue as empty. (racey as you say)

The executor takes lock and then checks if the JavaThread is safe for processing.
The JavaThread becomes unsafe and then check if lock is locked.
If the lock is locked we must take slow path to avoid this.

We should also take slow path if there is something on queue to processes.
We are unsafe when we check queue and lock is not held, if we 'miss' that anything is on queue, it's fine.
Since any other thread cannot have seen us as safe and seen the item on queue. (since lock is not held)
Thus not allowed to process the operation.

Wow! That all definitely needs some detailed commentary.

Thanks,
David

@mlbridge
Copy link

mlbridge bot commented Sep 24, 2020

Mailing list message from David Holmes on hotspot-dev:

<trimming>

On 23/09/2020 8:11 pm, Robbin Ehn wrote:

On Wed, 23 Sep 2020 03:04:39 GMT, David Holmes <dholmes at openjdk.org> wrote:

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

Update after Coleen

src/hotspot/share/runtime/handshake.cpp line 63:

61: };
62:
63: class AsyncHandshakeOperation : public HandshakeOperation {

This doesn't quite make sense. If you have an AsyncHandshakeOperation as a distinct subclass then it should not be
possible for is_async() on a HandshakeOperation to return true - but it can because it can be passed an
AsyncHandshakeClosure when constructed. If you want async and non-async operations to be distinct types then you will
need to restrict how the base class is constructed, and provide a protected constructor that just takes an
AsyncHandShakeClosure.

This implementation code not part of the interface.

I find it hard to tell which classes form which.

By casting the AsyncHandShakeClosure to a HandshakeClosure before instantiating the HandshakeOperation you can still
get is_async() to return true. And there are a loads of other user error which can be done like stack allocating
AsyncHandshakeOperation. Protecting against all those kinds of errors requires a lot of more code.

Can we at least declare a protected constructor for HandshakeOperation
that takes the AsyncHandshakeClosure, so that an accidental creation of
"new HandShakeperation(anAsyncClosure)" will be prevented?

Thanks,
David

@robehn
Copy link
Contributor Author

robehn commented Sep 24, 2020

Hi Serguei,

Hi Robbin,
I've looked more at the Serviceability files.
The fix looks good in general. Nice refactoring and simplification with the JvmtiHandshakeClosure.

Good.

It is a little unexpected that the do_thread can be executed by non-JavaThread's.
Not sure, what kinds of inconvenience it can cause.

Reading the code I did not find any issues.
Any return values must already be allocated correctly since the executor thread do not wait around.
Thus targeted thread must be the owner of these.
Also extensive testing find no issues.
But as I said to David, we can easily change back to the previous behavior if needed by using the filter mechanism.

Also, adding the _completed field is somewhat inconsistent.
I'd expect it to be present or not present for almost all JVMTI handshake closures.
I hope, you can comment on why it was added in these particular cases.

Two of the handshakes have guarantee checks:

guarantee(op.completed(), "Handshake failed. Target thread is not alive?");

guarantee(hs.completed(), "Handshake failed: Target thread is not alive?");

This is the only placed it is used/needed.
In my previous version I just removed those guarantee because they needed extra code and was suspicious to me.
But it was requested to add them back.

I had quick look now, and from what I can tell it is not guaranteed that we do execute those handshake.
We do not run handshakes on exiting threads (is_terminated()), we set is exiting here:

set_terminated(_thread_exiting);

But we remove the JVM TI Thread state way later, here:
JvmtiExport::cleanup_thread(this);

For example the method ensure_join() which is between set_terminated and removing the JVM TI state can safepoint/handshake.

ensure_join(this);

That would trigger the guarantee.

So I believe we should not have those two guarantees and thus the _completed can be removed once again.
I think that should be handled in a separate issue, leaving this as is for now.

…ved names once more and moved non-public to private
@robehn
Copy link
Contributor Author

robehn commented Sep 24, 2020

Added comment @dholmes-ora
Added constructor @dholmes-ora
Previous renames caused confusing, renamed some methods and moved those not public to private

Running tests

Good to go?

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some naming issues to resolve and an editing pass on various new comments.

Thanks,
David


private:
// Must use AsyncHandshakeOperation when using AsyncHandshakeClosure.
HandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns) {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not understanding the way you have implemented the guard I requested. How does declaring the private constructor prevent a call to HandShakeOperation(someAsync, target) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad, fixed.

@@ -192,12 +199,12 @@ void VM_Handshake::handle_timeout() {
fatal("Handshake operation timed out");
}

static void log_handshake_info(jlong start_time_ns, const char* name, int targets, int requester_executed, const char* extra = NULL) {
static void log_handshake_info(jlong start_time_ns, const char* name, int targets, int non_self_executed, const char* extra = NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "non_self_executed" is much clearer - thanks. But this now highlights that the log message itself doesn't make complete sense as it refers to "requesting thread" when it could be a range of possible threads not just a single requesting thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log line is done by the requesting thread of the specific "Handshake "%s".
The log line only prints executed handshake operations emitted by the requesting thread.
With changes on how logs from @pchilano and name changes this is getting confusing for me also.

I renamed non_self_executed to emitted_handshakes_executed as the local variable passed into this function.

To summarize the log line prints how many of the handshake operation emitted by 'this' handshake request was done by the requesting thread (or by VM Thread on behalf of the requesting thread when doing handshake all).
This value can thus only be 0 or 1 if not executed by VM thread in a handshake all.

Operations not done by requesting thread was either done cooperatively or by targets self.

int executed_by_driver = 0;
// Keeps count on how many of own emitted handshakes
// this thread execute.
int emitted_handshakes_executed = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest:
// Keeps count of how many handshakes were actually executed
// by this thread.
int handshakes_executed = 0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only count those we emitted.
I use to count all, but @pchilano thought it was confusing that we could log more executed than emitted.
So now it only counts executed and emitted.

int executed_by_driver = 0;
// Keeps count on how many of own emitted handshakes
// this thread execute.
int emitted_handshakes_executed = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito :)

Mutex _lock;
// Set to the thread executing the handshake operation during the execution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the execution/its execution/

Or just delete "during the execution"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// MT-Unsafe, external serialization needed.
// Applies the match_func to the items in the queue until match_func returns
// true and then return false, or there is no more items and then returns
// false. Any pushed item while executing may or may not have match_func
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/pushed item/item pushed/

I know what you are trying to say about concurrent pushes but it sounds too non-deterministic - any concurrent push that happens before contains() reaches the end of the queue will have the match_func applied. So in that sense "while executing" only applies to pushes that will be seen; any push not seen had to have happened after execution was complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We insert on first ptr and we also walk from first ptr. (This is tail of the queue)
Since contains() never restart and we load first on directly on entry, no new pushes will be seen by contains().

pop() on the other hand may re-start due to failed CAS, thus all pushes up till this failed CAS will be seen.
This happens if first ptr (tail) is the items selected for popping.
A new push will change the first ptr (tail), the re-start to be able to unlinked the match.

With a deterministic match_func this newly pushed item will never be pop, since we have an older matching item. (otherwise we would never tried to CAS)

// true and then return false, or there is no more items and then returns
// false. Any pushed item while executing may or may not have match_func
// applied. The method is not re-entrant and must be executed mutually
// exclusive other contains and pops calls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// exclusive to other contains() and pop() calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

E pop() {
return pop(match_all);
}

// MT-Unsafe, external serialization needed.
// Applies the match_func to all items in the queue returns the item which
// match_func return true for and was inserted first. Any pushed item while
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Applies the match_func to each item in the queue, in order of insertion, and
// returns the first item for which match_func returns true. Returns false if there are
// no matches or the queue is empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some offline discussion, fixed.

// match_func return true for and was inserted first. Any pushed item while
// executing may or may not have be popped, if popped it was the first
// inserted match. The method is not re-entrant and must be executed mutual
// exclusive with other contains and pops calls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments on contains() to ensure pop() and contains() use consistent terminology. Thanks.

assert(_lock.owned_by_self(), "Lock must be held");
return _queue.pop();
};

static bool processor_filter(HandshakeOperation* op) {
return !op->is_asynch();
static bool non_self_queue_filter(HandshakeOperation* op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "non_self_queue_filter" is really awkward - sorry. But I think we're going to have to revisit the way filtering is named and done once we try to generalise it anyway, so I'll let this pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor Author

@robehn robehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David.


private:
// Must use AsyncHandshakeOperation when using AsyncHandshakeClosure.
HandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns) {};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad, fixed.

@@ -192,12 +199,12 @@ void VM_Handshake::handle_timeout() {
fatal("Handshake operation timed out");
}

static void log_handshake_info(jlong start_time_ns, const char* name, int targets, int requester_executed, const char* extra = NULL) {
static void log_handshake_info(jlong start_time_ns, const char* name, int targets, int non_self_executed, const char* extra = NULL) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log line is done by the requesting thread of the specific "Handshake "%s".
The log line only prints executed handshake operations emitted by the requesting thread.
With changes on how logs from @pchilano and name changes this is getting confusing for me also.

I renamed non_self_executed to emitted_handshakes_executed as the local variable passed into this function.

To summarize the log line prints how many of the handshake operation emitted by 'this' handshake request was done by the requesting thread (or by VM Thread on behalf of the requesting thread when doing handshake all).
This value can thus only be 0 or 1 if not executed by VM thread in a handshake all.

Operations not done by requesting thread was either done cooperatively or by targets self.

int executed_by_driver = 0;
// Keeps count on how many of own emitted handshakes
// this thread execute.
int emitted_handshakes_executed = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only count those we emitted.
I use to count all, but @pchilano thought it was confusing that we could log more executed than emitted.
So now it only counts executed and emitted.

int executed_by_driver = 0;
// Keeps count on how many of own emitted handshakes
// this thread execute.
int emitted_handshakes_executed = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito :)

assert(_lock.owned_by_self(), "Lock must be held");
return _queue.pop();
};

static bool processor_filter(HandshakeOperation* op) {
return !op->is_asynch();
static bool non_self_queue_filter(HandshakeOperation* op) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@@ -28,18 +28,22 @@
#include "memory/allocation.hpp"
#include "runtime/atomic.hpp"

// The FilterQueue is FIFO with the ability to skip over queued items.
// The skipping is controlled by using a filter when poping.
// It also supports lock free pushes, while poping (including contain())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// MT-Unsafe, external serialization needed.
// Applies the match_func to the items in the queue until match_func returns
// true and then return false, or there is no more items and then returns
// false. Any pushed item while executing may or may not have match_func
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We insert on first ptr and we also walk from first ptr. (This is tail of the queue)
Since contains() never restart and we load first on directly on entry, no new pushes will be seen by contains().

pop() on the other hand may re-start due to failed CAS, thus all pushes up till this failed CAS will be seen.
This happens if first ptr (tail) is the items selected for popping.
A new push will change the first ptr (tail), the re-start to be able to unlinked the match.

With a deterministic match_func this newly pushed item will never be pop, since we have an older matching item. (otherwise we would never tried to CAS)


// MT-Unsafe, external serialization needed.
// Applies the match_func to the items in the queue until match_func returns
// true and then return false, or there is no more items and then returns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// true and then return false, or there is no more items and then returns
// false. Any pushed item while executing may or may not have match_func
// applied. The method is not re-entrant and must be executed mutually
// exclusive other contains and pops calls.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

E pop() {
return pop(match_all);
}

// MT-Unsafe, external serialization needed.
// Applies the match_func to all items in the queue returns the item which
// match_func return true for and was inserted first. Any pushed item while
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some offline discussion, fixed.

@sspitsyn
Copy link
Contributor

Robbin, thank you for your answers!
There are JVMTI functions that are specified to return error code JVMTI_ERROR_THREAD_NOT_ALIVE.
As an example, see:
https://docs.oracle.com/en/java/javase/15/docs/specs/jvmti.html#GetStackTrace
The following functions are impacted by your fix:
GetOwnedMonitorInfo, GetOwnedMonitorStackDepthInfo, GetCurrentContendedMonitor, GetStackTrace, GetFrameCount, GetFrameLocation, PopFrame

I wonder, how this error code can be ever returned for these functions now.

@robehn
Copy link
Contributor Author

robehn commented Sep 28, 2020

Robbin, thank you for your answers!
There are JVMTI functions that are specified to return error code JVMTI_ERROR_THREAD_NOT_ALIVE.
As an example, see:
https://docs.oracle.com/en/java/javase/15/docs/specs/jvmti.html#GetStackTrace
The following functions are impacted by your fix:
GetOwnedMonitorInfo, GetOwnedMonitorStackDepthInfo, GetCurrentContendedMonitor, GetStackTrace, GetFrameCount, GetFrameLocation, PopFrame

I wonder, how this error code can be ever returned for these functions now.

They should have exactly the same behavior as previously.
All I did was set JVMTI_ERROR_THREAD_NOT_ALIVE as the default value for those handshakes:

_result(JVMTI_ERROR_THREAD_NOT_ALIVE) {}

Which simplifies the code.
So only in the two cases where the guarantee's are we can never return that (reset_current_location/enter_interp_only_mode)

If the guarantee's are wrong, the current code have that bug already, so I'm not adding or fixing that.

It should be exactly the same and we have test for at least some of the operations which verifies that the agent gets: JVMTI_ERROR_THREAD_NOT_ALIVE

And this passes t1-8 multiple times, so I'm pretty confident that this does not change any return value.

@sspitsyn
Copy link
Contributor

Robbin, you are right - thanks.
The JVMTI_ERROR_THREAD_NOT_ALIVE is the default value.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Robbin,
Thanks for the updates and the slack chat to clarify my misunderstanding of the queuing mechanism.

I agree that the logging statements are somewhat confusing as they were written when the processing logic was much simpler, but I understand now the count of emitted executed operations.

This all looks good to me now.
Thanks,
David

@robehn
Copy link
Contributor Author

robehn commented Sep 29, 2020

Thanks all!

/integrate

@robehn
Copy link
Contributor Author

robehn commented Sep 29, 2020

/integrate

@openjdk openjdk bot closed this Sep 29, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 29, 2020
@openjdk
Copy link

openjdk bot commented Sep 29, 2020

@robehn Since your change was applied there have been 36 commits pushed to the master branch:

  • 6d19fe6: 8253763: ParallelObjectIterator should have virtual destructor
  • 55c90a1: 6514600: AbstractAction can throw NullPointerException when clone()d
  • b659132: 8252888: Collapse G1MMUTracker class hierarchy
  • e63b90c: 8251358: Clean up Access configuration after Shenandoah barrier change
  • 9c17a35: 8253748: StressIGV tests fail with release VMs
  • 70b0fcc: 8253728: tests fail with "assert(fr.is_compiled_frame()) failed: Wrong frame type"
  • 527b0e4: 8248984: Bump minimum boot jdk to JDK 15
  • ac15d64: 8241151: Incorrect lint warning for no definition of serialVersionUID in a record
  • d25b03e: 8253616: Change to GCC 10.2 for building on Linux at Oracle
  • 821bd08: 8253667: ProblemList tools/jlink/JLinkReproducible{,3}Test.java on linux-aarch64
  • ... and 26 more: https://git.openjdk.java.net/jdk/compare/1f5a033421bbcf803169c5c5f93314fd22b5a4a5...master

Your commit was automatically rebased without conflicts.

Pushed as commit 6bddeb7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@robehn robehn deleted the 8238761-asynchrounous-handshakes branch September 29, 2020 08:58
robehn pushed a commit to robehn/jdk that referenced this pull request Dec 11, 2023
Merge jdk-22+26 into rivos/main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

6 participants