-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8259242: Remove ProtectionDomainSet_lock #3362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ption.java timed out
👋 Welcome back coleenp! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Coleen,
Can you explain the synchronization protocol you are using here please - I don't understand how the handshake fits into it ??
Further comments below.
Thanks,
David
assert(Thread::current()->is_Java_thread(), "only called by JavaThread"); | ||
assert_locked_or_safepoint(SystemDictionary_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the current thread must be a JavaThread then we cannot be at a safepoint and so the second assert can just check the lock is held.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
ProtectionDomainEntry* pd_set() const { return Atomic::load(&_pd_set); } | ||
void set_pd_set(ProtectionDomainEntry* entry) { Atomic::store(&_pd_set, entry); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These probably need to be load_acquire and release_store, so that a lock-free traversal can proceed correctly with a locked addition/removal. With suitable name changes for the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot that the method names have to match the implementation details. Thanks I didn't know whether they needed acquire etc.
// If there are any deleted entries, Handshake-all then they'll be | ||
// safe to remove since traversing the pd_set list does not stop for | ||
// safepoints and only JavaThreads will read the pd_set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand what you are trying to do here. This is basically a no-op handshake with every thread? Does all the threads remain in the handshake until the last one is processed or do they proceed through one at a time? The former seems better suited for a safepoint operation. The latter means this doesn't work as they may have moved on to a new removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the ServiceThread calls this code, so they won't be removed again. This is modeled after the ObjectMonitor code that does the same thing. All the threads have to hit a handshake at a safepoint polling place, which they cannot do while reading the list. After all the threads have hit the handshake, no thread can see the old entries on the list.
I thought that's what I said in the comment. What should I say to make this clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See src/hotspot/share/runtime/synchronizer.cpp:
size_t ObjectSynchronizer::deflate_idle_monitors() {
<snip>
if (current->is_Java_thread()) {
<snip>
// A JavaThread needs to handshake in order to safely free the
// ObjectMonitors that were deflated in this cycle.
HandshakeForDeflation hfd_hc;
Handshake::execute(&hfd_hc);
ProtectionDomainEntry* next() { return Atomic::load(&_next); } | ||
void set_next(ProtectionDomainEntry* entry) { Atomic::store(&_next, entry); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again these may need to be load_acquire/release_store for correct lock-free traversal. With suitable name changes for the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll add acquire/release and change the names to match the implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we collectively agree on the naming convention that set_next has to be release_set_next because the implementation uses release_store, or is this just your preference? I find this noisy in the context where these functions are being called.
The lock-free algorithm is based on the one for object monitors, except that adding and deleting entries is done with the SystemDictionary_lock held, reading entries is done with a (now) Atomic load acquires etc with no safepoint polls, and deleting the entries is done by saving the entries off to the side, until a handshake is reached, meaning none of the readers are reading the old version of the list. |
Mailing list message from David Holmes on hotspot-dev: On 7/04/2021 10:30 pm, Coleen Phillimore wrote:
Yes that is the convention that was established and in place for a Thanks, |
Mailing list message from David Holmes on hotspot-dev: On 7/04/2021 10:05 pm, Coleen Phillimore wrote:
I really don't see why we need to use handshakes here. The writers use a Thanks, |
The reader thread could be using the linked list and have just loaded -> next, while the deleting thread also has loaded -> next, deciding it's unused and calling delete on it. The reader is then swapped back in the process and looks at the _pd_cache field which was deleted or the _next field which is will point to the delete pattern. The concurrent hashtable uses globalCounter for this situation. Many of our lock free data structures delete entries in safepoints (like dictionaries). This pd_set list does the same as the ObjectMonitor code. The handshake is rare as I said in the comments. The protection domain is generally associated with the klass in the initiating loader, which stays alive while this class is alive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Coleen,
Changes LGTM. Some comments below:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing, Patricio.
I renamed this bug and pull request. |
… deletions before handshaking. Fix if to when.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Coleen,
Thanks for the additional explanations about the synchronization protocol that is being used. The use of the global handshake with the OM deletion had not really registered with me as a general technique that we would employ for safe-memory-reclamation. (I'm not sure if I find it cool or scary! :) )
So to summarise the protocol:
- the pd_set can only be set and appended to when the lock is held
- traversal of a pd_set is lock-free
- removal from a pd_set is a two phase process:
-
- Under the lock move the entry to the global delete list
-
- When appropriate the serviceThread will process the delete list, by first handshaking all threads, and then iterating through and deleting the entries
We know an entry cannot be in-use when deleted because that implies a traversal is in progress, but if a traversal is in progress then the handshake cannot have happened.
Is that correct?
I would like to see this high-level description as a block comment somewhere for clarity, as the individual pieces of code seem spread around and it is not easy (for me) to see the connections when just reading the code.
Thanks,
David
if (_delete_list == NULL) { | ||
_delete_list = new (ResourceObj::C_HEAP, mtClass) | ||
GrowableArray<ProtectionDomainEntry*>(20, mtClass); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is protecting this code to ensure the allocation can only happen once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functions is only called from the service thread. It's not reentrant.
ProtectionDomainEntry* to_delete = current; | ||
current = current->next(); | ||
delete to_delete; | ||
delete_list->push(current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested comment before this:
// Mark current for deletion, but in the meantime it can still be traversed
It is important that current is not unlinked so that in-progress traversals do not break. Though to be honest I'm unclear exactly what triggers the clearing of an entry this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the comment. The service thread could be trying to remove this entry and there's a small chance that some class loading will be looking up this entry with SystemDictionary::find_instance_klass.
// First clean cached pd lists in loaded CLDs
// It's unlikely, but some loaded classes in a dictionary might
// point to a protection_domain that has been unloaded.
// The dictionary pd_set points at entries in the ProtectionDomainCacheTable.
There's a test runtime/Dictionary/ProtectionDomainCache.java that does this. It creates a protection domain with a jar file loading with a URL class loader that's then removed. The defining class loader of the class loaded has that protection domain in its pd_set.
@coleenp This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 47 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Hi David, Yes you have the design correct. It's used for ObjectMonitors, nmethod unloading, implicitly class unloading with ZGC, and now this. The concurrent hashtable uses the globalCounter mechanism, which I briefly considered but it would have required more surgery to the code and would slightly affect read times, and that mechanism is more complicated imo. I could have done something with a lock bit/refcount for readers but that's more error prone and harder to get right. There's a new lockFreeQueue.hpp implementation but this wasn't a queue. I put a block comment before Dictionary::contains_protection_domain, the lock free reader with the NSV which describes the protocol. Hopefully this is clear. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates look good - thanks Coleen!
David
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks Coleen!
Thank you Patricio and David! |
@coleenp Since your change was applied there have been 47 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 06e6b1f. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This change makes the DictionaryEntry pd_set reads lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock. This lock is a singleton that's taken for every dictionary entry.
I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
Tested with tiers 1-3. There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3362/head:pull/3362
$ git checkout pull/3362
Update a local copy of the PR:
$ git checkout pull/3362
$ git pull https://git.openjdk.java.net/jdk pull/3362/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3362
View PR using the GUI difftool:
$ git pr show -t 3362
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3362.diff