New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8255987: JDI tests fail with com.sun.jdi.ObjectCollectedException #1595
Conversation
👋 Welcome back pliden! A progress list of the required criteria for merging this PR into |
/cc hotspot-gc |
@pliden |
Webrevs
|
debuggee.resume(); | ||
checkDebugeeAnswer_instances(className, baseInstances); | ||
debuggee.suspend(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the changes in this PR, what was triggering the (expected) collection of the objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes aren't making sense to me - but then this test is not making much sense to me either. The testArrayType logic is quite different to testClassType and now seems invalid. It suspends the VM, then calls disableCollection on all the object refs of interest, then later calls enableCollection and then resumes the VM. The calls to disableCollection/enableCollection seem pointless if GC is disabled while the VM is suspended. I suspect this was added because VM suspension was not in fact stopping the GC.
The testClassType test is doing what? I can't tell what it expects to be checking with checkDebugeeAnswer_instances, but there's no VM suspension (presently) and no disableCollection calls. ???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plummercj Nothing was explicitly triggering collection of these objects. However, the test is explicitly checking the number of objects "reachable for the purposes of garbage collection" in checkDebugeeAnswer_instances()
. The tests sets up a breakpoint (with SUSPEND_ALL), which suspends the VM. Then it creates a number of new instances and expects these to be weakly reachable. However, with this change, suspending the VM will make all objects "reachable for the purposes of garbage collection". So, to let the test continue to work as intended we need to first resume the VM, count the instances, and then suspend it again.
@dholmes-ora I have no idea why these tests are so different. The VM suspend is implicit in the breakpoint in this test, which is set up using SUSPEND_ALL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I understand now. ReferenceType.instances()
only counts objects "reachable for the purposes of garbage collection". This change in behavior does concern me a little bit. I think the expectation is that the instances created by ClassType.newInstance()
will not show up in this count unless disableCollection()
is called, even when under a "suspend all". Clearly that's the expectation of this test, so the question is whether or not it is a reasonable expectation.
Note that ClassType.newInstance()
says nothing about the state of the returned object w.r.t. GC. It makes no mention of the need to call disableCollection()
before resuming the VM, so I guess this gives us some wiggle room. However, the argument against the object being strongly reachable is comes from user asking the question "who has the strong reference that makes it strongly reachable?". It's not obvious to the user why there is a strong reference, and why it seemingly goes a way once VM.resumeAll()
is called.
I still think overall this is the right approach (baring a better approach being presented), but we may need to include some spec clarifications, and be prepared for some push back if this breaks anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow your reasoning here Chris. All ObjectReferences can be GC'd at any time unless GC has been disallowed. So a reference create via newInstance is no different to any other reference. If it is currently reachable then instances() should return it. Are you treating "reachable for the purposes of garbage collection" as-if it said "strongly reachable"? It doesn't so I think you are reading too much into this. I think there is a lot of flexibility in this API in terms of what it may return regarding weak references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read "reachable for the purposes of garbage collection" as not including objects reachable only via weak reference. So if the only reference to an object is a weak reference, which is normally what you have after calling ClassType.newInstance()
, then the object is not considered reachable. At the very least, his is how ReferenceType.instances()
is implemented, and is based on JVMTI FollowReferences().
So given that, the expectation would be that an object returned ClassType.newInstance()
would not be counted by ReferenceType.instances()
unless something is done to add a strong reference to the object, such as calling ObjectReference.disableCollection()
. Now with Per's changes a strong reference is also created with doing a VM.suspend(). The test doesn't expect this behavior, and it's understandable why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're still within what the spec says, given that the wording is so loose. But it's hard to tell if this change will be problematic for some use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with making the change and then seeing if there is any fallout from it. My guess is there won't be. I do think there is a need to cleanup the JDI and JDWP specs in a few areas w.r.t. object liveness. Another CR can be filed for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed https://bugs.openjdk.java.net/browse/JDK-8257921. Feel free to extend/improve the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll add some suggestions to the CR based on some of our recent discussions.
try { | ||
array.disableCollection(); | ||
} catch (ObjectCollectedException e) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment: "Since the VM is not suspended, the object may have been collected before disableCollection() could be called on it. Just ignore and continue doing allocations until we run out of memory."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will fix.
if (weakRef == NULL) { | ||
EXIT_ERROR(AGENT_ERROR_NULL_POINTER,"NewWeakGlobalRef"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure I agree that having a fatal error here is the right thing to do. The only other user of weakenNode()
is ObjectReference.disableCollection()
. It returns an error to the debugger if weakenNode()
returns NULL
. However, I'm not so sure that's a good thing to do here either, since it means the VM.resume()
will need to fail. Possibly the error should just be ignored, and we live with the ref staying strong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another options is to save away the weakref in the node when strengthening. This would benefit ObjectReference.disableCollection()
also, since it would no longer need to deal with a potential OOM. However, I'm not so sure it's actually worth doing. Trying to keep the debug session alive will having allocation errors is probably a fools errand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree a fatal error here seems excessive. Simply maintaining the strong ref seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to mimic what we already do in strengthenNode()
, i.e. it's a fatal error if we can't create a JNI ref. Here:
strongRef = JNI_FUNC_PTR(env,NewGlobalRef)(env, node->ref);
/*
* NewGlobalRef on a weak ref will return NULL if the weak
* reference has been collected or if out of memory.
* It never throws OOM.
* We need to distinguish those two occurrences.
*/
if ((strongRef == NULL) && !isSameObject(env, node->ref, NULL)) {
EXIT_ERROR(AGENT_ERROR_NULL_POINTER,"NewGlobalRef");
}
So it seems appropriate to do the same thing if we fail to create a JNI weak ref. Also, as @plummercj mentioned, if we can't create a JNI ref, continuing the debug session seems rather pointless as we're about to go down anyway (the next allocation in the JVM will be fatal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.
Mailing list message from David Holmes on hotspot-gc-dev: Hi Per, On 3/12/2020 11:19 pm, Per Liden wrote:
I think we can quite reasonably infer that "suspending a VM" means
You can imagine though that 25 years ago it was not an unreasonable I'm somewhat surprised that it has taken this long to discover that our
I assume that the GC folk would be horrified if I were to suggest a Doing what is suggested sounds reasonable, from a functional Thanks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall seems okay. Some comments on tests as I think the existing test logic is quite confused in places.
Thanks,
David
jobject strongRef; | ||
|
||
strongRef = strengthenNode(env, node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually trying to carefully to follow the coding style currently used in this file/library. If you have a quick look at this file you'll see the pattern above in multiple places, where as combined declaration+assignment style isn't used. So while I personally agree about this style question, I also think following the style already present in a file has precedence over introducing a new style. Don't you agree?
jweak weakRef; | ||
|
||
weakRef = weakenNode(env, node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this can be a single line.
if (weakRef == NULL) { | ||
EXIT_ERROR(AGENT_ERROR_NULL_POINTER,"NewWeakGlobalRef"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree a fatal error here seems excessive. Simply maintaining the strong ref seems reasonable.
* Pin all objects to prevent objects from being | ||
* garbage collected while the VM is suspended. | ||
*/ | ||
commonRef_pinAll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have multiple VM.suspend calls? The suspendAllCount seems to suggest that. In which case shouldn't we only pin on the 0->1 transition, and only unpin on the 1->0 transition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was something I pointed out in the pre-review, and it has been addressed in commonRef_pinAll/unpinAll
:
568 if (gdata->pinAllCount == 1) {
618 if (gdata->pinAllCount == 0) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I would not have handled it at that level, but would have had
pinAll/unpinAll operate unconditionally, but the calls to those methods
being conditional based on the suspendAllCount.David
Well, that's assuming pinAll()
will only ever be used by by suspendAll()
. One could imaging a future use, such as if VirtualMachine.disableCollection()
were ever to be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking pinAll()/unpinAll()
should stand on their own, and not implicitly depend/rely on suspendAllCount
. As @plummercj says, one could imagine we want to use these functions in other contexts in the future.
debuggee.resume(); | ||
checkDebugeeAnswer_instances(className, baseInstances); | ||
debuggee.suspend(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes aren't making sense to me - but then this test is not making much sense to me either. The testArrayType logic is quite different to testClassType and now seems invalid. It suspends the VM, then calls disableCollection on all the object refs of interest, then later calls enableCollection and then resumes the VM. The calls to disableCollection/enableCollection seem pointless if GC is disabled while the VM is suspended. I suspect this was added because VM suspension was not in fact stopping the GC.
The testClassType test is doing what? I can't tell what it expects to be checking with checkDebugeeAnswer_instances, but there's no VM suspension (presently) and no disableCollection calls. ???
Mailing list message from David Holmes on hotspot-gc-dev: On 7/12/2020 4:30 pm, Chris Plummer wrote:
Okay. I would not have handled it at that level, but would have had David |
Mailing list message from David Holmes on hotspot-gc-dev: On 7/12/2020 5:44 pm, Chris Plummer wrote:
Not really. I consider pinAll should pin-all as the name implies. The David |
Mailing list message from David Holmes on hotspot-gc-dev: On 7/12/2020 9:12 pm, Per Liden wrote:
This file uses an archaic C-style, so while I agree it would be Cheers, |
I would say that would not be in spirit of how the rest of this library is designed, with regards to nesting of calls. For example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have some reservations about the logic in some of the tests now (ie using disableCollection whilst the VM is suspended and reenabling also whilst suspended) but the logic was unclear in the first place. If necessary follow up cleanup issues could be filed here.
Thanks,
David
@pliden This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 102 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
A number of files need copyright updates. |
@plummercj Copyright fixed. |
Thanks for reviewing, @plummercj and @dholmes-ora! /integrate |
@pliden Since your change was applied there have been 107 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 79f1dfb. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This PR replaces the withdrawn PR #1348. This PR tries to fix the underlying problem, rather than fix the tests.
The problem is that a number of JDI tests create objects on the debugger side with calls to
newInstance()
. However, on the debugee side, these new instances will only be held on to by aJNIGlobalWeakRef
, which means they could be collected at any time, even beforenewInstace()
returns. A number of JDI tests get spuriousObjectCollectedException
thrown at them, which results in test failures. To make these objects stick around, a call todisableCollection()
is typically needed.However, as pointer out by @plummercj in JDK-8255987:
Most of our spuriously failing tests do actually make a call to
VirtualMachine.suspend()
, presumably to prevent objects from being garbage collected. However, the current implementation ofVirtualMachine.suspend()
will only suspend all Java threads. That is not enough to prevent objects from being garbage collected. The GC can basically run at any time, and there is no relation to whether all Java threads are suspended or not.However, as suggested by @plummercj, we could emulate the behaviour implied by the spec by letting a call to
VirtualMachine.suspend()
also convert all existing JDI objects references to be backed by a (strong)JNIGlobalRef
rather than a (weak)JNIGlobalWeakRef
. That will not prevent the GC from running, but it will prevent any object visible to a JDI client from being garbage collected. Of course, a call toVirtualMachine.resume()
would convert all references back to being weak again.This patch introduces the needed functions in
libjdwp
to "pin" and "unpin" all objects. These new functions are then used by the underpinnings ofVirtualMachine.suspend()
andVirtualMachine.resume()
to implement the behaviour described above.Note that there are still a few tests that needed adjustments to guard against
ObjectCollectionException
. These are:ArrayType/newinstance
tests.ObjectCollectedException
seems reasonable here.TestClassLoader
from being unloaded to avoid invalidating code locations.Testing:
vmTestbase/nsk/jdi
andvmTestbase/nsk/jdwp
test suites, using various GC, both in mach5 and locally.Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/1595/head:pull/1595
$ git checkout pull/1595