8238761: Asynchronous handshakes #151
Conversation
👋 Welcome back rehn! A progress list of the required criteria for merging this PR into
@robehn The following labels will be automatically applied to this pull request: When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing lists. If you would like to change these labels, use the
Webrevs
Mailing list message from Patricio Chilano on hotspot-dev:
Hi Robbin, Changes look good to me! Some minor comments:
- src/hotspot/share/prims/jvmtiThreadState.cpp
- src/hotspot/share/prims/jvmtiEnvBase.cpp
- src/hotspot/share/runtime/handshake.cpp
- src/hotspot/share/runtime/handshake.cpp
- src/hotspot/share/runtime/interfaceSupport.inline.hpp
Thanks! Patricio
On 9/15/20 4:39 AM, Robbin Ehn wrote:
Added NSV ProcessResult to enum. Fixed logging. Moved _active_handshaker to private.
Removed double checks.
Reverted to plain enum and updated logs. (better?)
I wanted an NSV to cover the process_self_inner method.
Sorry, the issue is the lock rank. Right now the semaphore hides this issue. Please see commit 86b83d0.
Changes look good, thanks for fixing! I added some comments on the changes.
Update looks good, thanks Robbin!
Thumbs up. I don't think I have anything that is in the must fix category.
@robehn This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for more details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 31 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
Hi Robbin,
There is still a lack of motivation for this feature in the JBS issue. What kinds of handshakes need to be asynchronous? Any async operation implies that the requester doesn't care about when, or even if, the operation gets executed - they are by definition fire-and-forget actions. So what are the use cases being envisaged here?
Many of the changes included here seem unrelated to, and not reliant on, async handshakes, and could be factored out to simplify the review and allow focus on the actual async handshake part, e.g. the JVM TI cleanups seem like they could be mostly standalone.
Specific comments below. A general concern I have is where the current thread is no longer guaranteed to be a JavaThread (which is a step in the wrong direction in relation to some of the cleanups I have planned!) and I can't see why this would be changing.
Thanks.
test/hotspot/jtreg/runtime/handshake/AsyncHandshakeWalkStackTest.java
test/hotspot/jtreg/runtime/handshake/MixedHandshakeWalkStackTest.java
test/hotspot/jtreg/runtime/handshake/MixedHandshakeWalkStackTest.java
Mailing list message from David Holmes on hotspot-dev: Correction ... On 21/09/2020 4:16 pm, David Holmes wrote:
That comment was placed against the old line 336, which was the deletion of: bool Handshake::execute(HandshakeClosure* thread_cl, JavaThread* target) { (I'll file a skara/git bug). David
Added info in JBS issue: all use-cases of _suspend_flag as initial targets.
Since I kept rebasing this, I did some things to simplify the rebasing.
If the VM thread emits a "handshake all" it will continuously loop over the JavaThreads until the op is completed. I did not see any issues with this while looking at the code or in testing. Some of the JVM TI handshakes are a bit different, but since they must properly allocate resources in the target JavaThread and not in the current JavaThread, there is no issue executing the code with a non-JavaThread. At the moment none of the handshakes depend on the 'driver' being a JavaThread. So if we think JVM TI handshakes should only be executed by the requester or target it's an easy fix. (For others following, there is also a planned investigation into requester-only-executed handshakes, which is not as easy.)
Looks mostly good to me!
Hi Robbin, I've gone back to refresh myself on the previous discussions and (internal) design walk-throughs to get a better sense of these changes. Really the "asynchronous handshake" aspect is only a small part of this. The fundamental change here is that handshakes are now maintained via a per-thread queue, and those handshake operations can, in the general case, be executed by any of the target thread, the requester (active_handshaker) thread or the VMThread. Hence the removal of the various "JavaThread::current()" assumptions. Unless constrained otherwise, any handshake operation may be executed by the VMThread, so we have to take extra care to ensure the code is written to allow this.
I'm a little concerned that our detour into direct handshakes actually lulled us into a false sense of security, knowing that an operation would always execute in a JavaThread, and we have now reverted that and allowed the VMThread back in. I understand why, but the change in direction here caught me by surprise (as I had forgotten the bigger picture). It may not always be obvious that the transitive closure of the code from an operation can be safely executed by a non-JavaThread.
Then on top of this generalized queuing mechanism there is a filter which allows some control over which thread may perform a given operation - at the moment the only filter isolates "async" operations, which only the target thread can execute. In addition, another nuance is that when processing a given thread's handshake operation queue, different threads have different criteria for when to stop processing the queue.
I do have some concerns about latency impact on the VMThread if it is used to execute operations that didn't need to be executed by the VMThread!
I remain concerned about the terminology conflation that happens around "async handshakes". There are two aspects that need to be separated:
When a thread initiates a handshake operation and waits until that operation is complete (regardless of which thread performed it, or whether the initiator processed any other operations) that is a synchronous handshake operation. The question of whether the operation must be executed by the target thread is orthogonal to whether the operation was submitted as a synchronous or asynchronous operation. So I have a problem when you say that an asynchronous handshake operation is one that must be executed by the target thread, as this is not the right characterisation at all. It is okay to constrain things such that an async operation is always executed by the target, but that is not what makes it an async operation. In the general case there is no reason why an async operation might not be executed by the VMThread, or some other JavaThread performing a synchronous operation on the same target.
I will go back through the actual handshake code to see if there are specific things I would like to see changed, but that will have to wait until tomorrow. Thanks,
Hi David, you are correct and you did a fine job summarizing this, thanks!
Great thanks!
@coleenp I think you placed your comment: Anyhow I can reply to it here:
Hi Robbin,
Mailing list message from David Holmes on hotspot-dev: On 23/09/2020 7:37 pm, Robbin Ehn wrote:
Wow! That all definitely needs some detailed commentary. Thanks,
Mailing list message from David Holmes on hotspot-dev: <trimming> On 23/09/2020 8:11 pm, Robbin Ehn wrote:
I find it hard to tell which classes form which.
Can we at least declare a protected constructor for HandshakeOperation? Thanks,
Hi Serguei,
Good.
Reading the code I did not find any issues.
Two of the handshakes have guarantee checks:
This is the only place it is used/needed. I had a quick look now, and from what I can tell it is not guaranteed that we do execute those handshakes. jdk/src/hotspot/share/runtime/thread.cpp Line 2123 in 94daf2c
But we remove the JVM TI thread state much later, here: jdk/src/hotspot/share/runtime/thread.cpp Line 2207 in 94daf2c
For example the method ensure_join(), which runs between set_terminated and removing the JVM TI state, can safepoint/handshake. jdk/src/hotspot/share/runtime/thread.cpp Line 2159 in 94daf2c
That would trigger the guarantee. So I believe we should not have those two guarantees, and thus _completed can be removed once again.
…ved names once more and moved non-public to private
Added comment. @dholmes-ora Running tests. Good to go?
Still some naming issues to resolve and an editing pass on various new comments.
Thanks,
David
private:
  // Must use AsyncHandshakeOperation when using AsyncHandshakeClosure.
  HandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns) {};
I'm really not understanding the way you have implemented the guard I requested. How does declaring the private constructor prevent a call to HandshakeOperation(someAsync, target)?
Sorry my bad, fixed.
@@ -192,12 +199,12 @@ void VM_Handshake::handle_timeout() {
   fatal("Handshake operation timed out");
 }

-static void log_handshake_info(jlong start_time_ns, const char* name, int targets, int requester_executed, const char* extra = NULL) {
+static void log_handshake_info(jlong start_time_ns, const char* name, int targets, int non_self_executed, const char* extra = NULL) {
The name "non_self_executed" is much clearer - thanks. But this now highlights that the log message itself doesn't make complete sense as it refers to "requesting thread" when it could be a range of possible threads not just a single requesting thread.
This log line is printed by the requesting thread of the specific "Handshake "%s"".
The log line only counts executed handshake operations emitted by the requesting thread.
With the logging changes from @pchilano and the name changes this is getting confusing for me also.
I renamed non_self_executed to emitted_handshakes_executed, matching the local variable passed into this function.
To summarize: the log line prints how many of the handshake operations emitted by 'this' handshake request were done by the requesting thread (or by the VM thread on behalf of the requesting thread when doing a handshake all).
This value can thus only be 0 or 1 if not executed by the VM thread in a handshake all.
Operations not done by the requesting thread were either done cooperatively or by the targets themselves.
-int executed_by_driver = 0;
+// Keeps count on how many of own emitted handshakes
+// this thread execute.
+int emitted_handshakes_executed = 0;
I suggest:
// Keeps count of how many handshakes were actually executed
// by this thread.
int handshakes_executed = 0;
We only count those we emitted.
I used to count all, but @pchilano thought it was confusing that we could log more executed than emitted.
So now it only counts operations that are both executed and emitted.
-int executed_by_driver = 0;
+// Keeps count on how many of own emitted handshakes
+// this thread execute.
+int emitted_handshakes_executed = 0;
See previous comment.
Ditto :)
Mutex _lock;
// Set to the thread executing the handshake operation during the execution.
s/the execution/its execution/
Or just delete "during the execution"
Fixed
// MT-Unsafe, external serialization needed.
// Applies the match_func to the items in the queue until match_func returns
// true and then return false, or there is no more items and then returns
// false. Any pushed item while executing may or may not have match_func
s/pushed item/item pushed/
I know what you are trying to say about concurrent pushes but it sounds too non-deterministic - any concurrent push that happens before contains() reaches the end of the queue will have the match_func applied. So in that sense "while executing" only applies to pushes that will be seen; any push not seen had to have happened after execution was complete.
We insert at the first ptr and we also walk from the first ptr. (This is the tail of the queue.)
Since contains() never restarts and we load first directly on entry, no new pushes will be seen by contains().
pop() on the other hand may re-start due to a failed CAS, thus all pushes up until this failed CAS will be seen.
This happens if the first ptr (tail) is the item selected for popping.
A new push will change the first ptr (tail), hence the re-start to be able to unlink the match.
With a deterministic match_func this newly pushed item will never be popped, since we have an older matching item. (Otherwise we would never have tried to CAS.)
// true and then return false, or there is no more items and then returns
// false. Any pushed item while executing may or may not have match_func
// applied. The method is not re-entrant and must be executed mutually
// exclusive other contains and pops calls.
// exclusive to other contains() and pop() calls.
Fixed
E pop() {
  return pop(match_all);
}

// MT-Unsafe, external serialization needed.
// Applies the match_func to all items in the queue returns the item which
// match_func return true for and was inserted first. Any pushed item while
// Applies the match_func to each item in the queue, in order of insertion, and
// returns the first item for which match_func returns true. Returns false if there are
// no matches or the queue is empty.
Some offline discussion, fixed.
// match_func return true for and was inserted first. Any pushed item while
// executing may or may not have be popped, if popped it was the first
// inserted match. The method is not re-entrant and must be executed mutual
// exclusive with other contains and pops calls.
See comments on contains() to ensure pop() and contains() use consistent terminology. Thanks.
assert(_lock.owned_by_self(), "Lock must be held");
  return _queue.pop();
};

-static bool processor_filter(HandshakeOperation* op) {
   return !op->is_asynch();
+static bool non_self_queue_filter(HandshakeOperation* op) {
The name "non_self_queue_filter" is really awkward - sorry. But I think we're going to have to revisit the way filtering is named and done once we try to generalise it anyway, so I'll let this pass.
Sure
Thanks David.
@@ -28,18 +28,22 @@
 #include "memory/allocation.hpp"
 #include "runtime/atomic.hpp"

+// The FilterQueue is FIFO with the ability to skip over queued items.
+// The skipping is controlled by using a filter when poping.
+// It also supports lock free pushes, while poping (including contain())
Fixed
Robbin, thank you for your answers! I wonder how this error code can ever be returned for these functions now.
They should have exactly the same behavior as previously. jdk/src/hotspot/share/prims/jvmtiEnvBase.hpp Line 345 in 3a95750
Which simplifies the code. So only in the two cases where the guarantees are (reset_current_location/enter_interp_only_mode) can we never return that. If the guarantees are wrong, the current code has that bug already, so I'm not adding or fixing that. It should be exactly the same, and we have tests for at least some of the operations which verify that the agent gets JVMTI_ERROR_THREAD_NOT_ALIVE. And this passes t1-8 multiple times, so I'm pretty confident that this does not change any return value.
Robbin, you are right - thanks.
Hi Robbin,
Thanks for the updates and the slack chat to clarify my misunderstanding of the queuing mechanism.
I agree that the logging statements are somewhat confusing as they were written when the processing logic was much simpler, but I understand now the count of emitted executed operations.
This all looks good to me now.
Thanks,
David
Thanks all! /integrate
/integrate
@robehn Since your change was applied there have been 36 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 6bddeb7. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
This patch implements asynchronous handshakes, which changes how handshakes work by default. Asynchronous handshakes are executed only by the target, which means they may never be executed (the target may block on a socket for the rest of the VM's lifetime). Since we have several use-cases for them, we can have many handshakes pending (should be very rare). To be able to handle an arbitrary amount of handshakes, this patch adds a per-JavaThread queue and heap-allocated HandshakeOperations. It's a singly linked list where you push/insert to the end and pop/get from the front. Inserts are done via CAS on the first pointer, no lock needed. Pops are done while holding the per-handshake-state lock, and when working on the first pointer also use CAS.
The thread grabbing the handshake state lock for a JavaThread will pop and execute all handshake operations matching the filter. The JavaThread itself uses no filter, and any other thread uses a filter of everything except asynchronous handshakes. In this initial change-set there is no need to do any other filtering. If needed, filtering can easily be exposed as a virtual method on the HandshakeClosure, but note that filtering causes handshake operations to be done out of order. Since the filter determines who executes the operation, and not the invoked method, there is now only one method to call when handshaking one thread.
Some comments about the changes:
HandshakeClosure uses ThreadClosure, since it is neat to use the same closure for both an "all JavaThreads do" and a "Handshake all threads". With heap allocation it cannot extend StackObj. I tested several ways to fix this, but those were very much worse than this.
I added is_handshake_safe_for() for checking whether the current thread is operating on itself or is the handshaker of that thread.
Simplified JVM TI with a JvmtiHandshakeClosure and also made them not need a JavaThread when executing as a handshaker on a JavaThread, e.g. the VM thread can execute the handshake operation.
Added a WB testing method.
Removed VM_HandshakeOneThread; the VM thread uses the same call path as direct handshakes did.
Changed the handshake semaphores to mutexes to be able to handle deadlocks with lock ranking.
VM_HandshakeAllThreads is still a VM operation, since we do support half of the threads being handshaked before a safepoint and half of them after, in many handshake-all operations.
ThreadInVMForHandshake does not need to do a fenced transition since this is always a transition from unsafe to unsafe.
Added NoSafepointVerifier; we are thinking about supporting safepoints inside handshakes, but it's not needed at the moment. To make sure that gets well tested if added, the NoSafepointVerifier will raise eyebrows.
Added ttyLocker::break_tty_lock_for_safepoint(os::current_thread_id()) due to lock rank.
Added the filtered queue and a gtest for it.
Passes multiple t1-8 runs.
Been through some pre-reviewing.
Download
$ git fetch https://git.openjdk.java.net/jdk pull/151/head:pull/151
$ git checkout pull/151