Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown #775

Closed
wants to merge 6 commits into from
Closed

Conversation

reinrich
Copy link
Member

@reinrich reinrich commented Oct 20, 2020

The following test cases try to provoke VMOutOfMemoryException during object reallocation because of JVMTI PopFrame / ForceEarlyReturn:

EAPopFrameNotInlinedReallocFailure
EAPopInlinedMethodWithScalarReplacedObjectsReallocFailure
EAForceEarlyReturnOfInlinedMethodWithScalarReplacedObjectsReallocFailure

For ZGC (so far) this is not 100% reliable.

Just ignoring the runs where the expected OOME was not raised was not accepted.

Summary of the now accepted solution:

  • The 3 problematic test cases are skipped if ZGC is selected.

  • They are also skipped if no OOME during object reallocation can be expected because allocations are not eliminated.

  • In consumeAllMemory, as a last step, empty LinkedList nodes are created without long array to fill up small blocks of free memory.

  • EATests.java is removed from the problem list for ZGC.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x32 Linux x64 Windows x64 macOS x64
Build ✔️ (1/1 passed) ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ❌ (1/9 failed) ✔️ (9/9 passed)

Failed test task

Issue

  • JDK-8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/775/head:pull/775
$ git checkout pull/775

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 20, 2020

👋 Welcome back rrich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 20, 2020

@reinrich The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Oct 20, 2020
@reinrich
Copy link
Member Author

/issue remove 8255072

@reinrich
Copy link
Member Author

/issue add 8255072

@reinrich reinrich marked this pull request as ready for review October 20, 2020 22:03
@openjdk
Copy link

openjdk bot commented Oct 20, 2020

@reinrich This PR does not contain any additional solved issues that can be removed. To remove the primary solved issue, simply edit the title of this PR.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 20, 2020
@openjdk
Copy link

openjdk bot commented Oct 20, 2020

@reinrich This issue is referenced in the PR title - it will now be updated.

@mlbridge
Copy link

mlbridge bot commented Oct 20, 2020

Webrevs

@plummercj
Copy link
Contributor

If the test does not throw OOME, has it actually tested anything in that case? My concern with any test that allows for what is suppose to be an occasional failure it that it will not detect if something breaks and causes that failure to happen every time, often rendering the test useless.

@reinrich
Copy link
Member Author

If the test does not throw OOME, has it actually tested anything in that
case?

If an OOME is expected then it has tested object reallocation in frames affected
by PopFrame/ForceEarlyReturn.

But there are runs where OOME is not expected. I added a new commit which skips
the test cases then.

My concern with any test that allows for what is suppose to be an
occasional failure it that it will not detect if something breaks and causes
that failure to happen every time, often rendering the test useless.

I can follow that concern. My problem is that I cannot reproduce the
failure. Note also that the OOME is successfully generated during object
reallocation a couple of times before (search "run args" in attachments
to the JBS issue).

I'd think the approach to prove the OOME during the PopFrame/ForceEarlyReturn
[1] is relatively reliable knowing how smart GCs try to be with avoiding it.

I've tried to make it even more reliable with a second commit in this pr.

Would that be ok? Maybe you would know a better way?

Thanks, Richard.

[1]

public void consumeAllMemory() {

@plummercj
Copy link
Contributor

I'm not following how the introduction of LinkedListOfLongArrays is making the OOME even more reliable. Can you please elaborate?

Is consumeAllMemory() being called from a different thread than the one doing the forceEarlyReturn? If so, you might be running into an issue because the tlab still has available memory to allocate.

@dcubed-ojdk
Copy link
Member

@reinrich - when your reviewers have agreed to a fix, please remove the
ProblemListing that I did via: #787

@reinrich
Copy link
Member Author

I'm not following how the introduction of LinkedListOfLongArrays is making the OOME even more reliable. Can you please elaborate?

The new LinkedListOfLongArrays is created by renaming LinkedList to
LinkedListOfLongArrays. The new LinkedList is a list node without payload, so it
is smaller than a LinkedListOfLongArrays node. I try to fill the last free
blocks with these. But yeah, this won't make that much of a difference.

Is consumeAllMemory() being called from a different thread than the one doing
the forceEarlyReturn? If so, you might be running into an issue because the
tlab still has available memory to allocate.

Yes, the threads are different (well observed :)). As far as I know tlabs are
retired before gc. I'd expect that they are allocated lazily. So the jdwp agent
thread that does the forceEarlyReturn should not have a tlab at hand. Certainly
these are just assumptions. It would be easy to implement the gc interface(*)
with different heuristics.

(*) Roman Kennke and Aleksey Shipilev demo'ed implementing the gc interface in a
FOSDEM talk.

@reinrich
Copy link
Member Author

@reinrich - when your reviewers have agreed to a fix, please remove the
ProblemListing that I did via: #787

I will. Thanks for doing it.

@plummercj
Copy link
Contributor

The new LinkedListOfLongArrays is created by renaming LinkedList to
LinkedListOfLongArrays. The new LinkedList is a list node without payload, so it
is smaller than a LinkedListOfLongArrays node. I try to fill the last free
blocks with these. But yeah, this won't make that much of a difference.

Ok. However, since you can't reproduce the initial problem it's hard to say if this will fix it. You would need to remove it from the problem list with this PR and see what happens.

Earlier you said:

Note also that the OOME is successfully generated during object
reallocation a couple of times before (search "run args" in attachments
to the JBS issue).

So I suppose in that case it's ok if this one test case allows for no OOME just as long as other test cases still require it.

@reinrich
Copy link
Member Author

Earlier you said:

Note also that the OOME is successfully generated during object
reallocation a couple of times before (search "run args" in attachments
to the JBS issue).

So I suppose in that case it's ok if this one test case allows for no OOME just as long as other test cases still require it.

With that I wanted to state that the probability to get the OOME is not too bad
for meaningful testing. It should be the same for the 3 test cases.

What about repeating a test case if the expected OOME is not raised and let it
fail after 10x retries?

@reinrich
Copy link
Member Author

You mentioned the possibility that the OOME is not thrown because it is another thread that consumes all memory than the one calling forceEarlyReturn() which is supposed to fail with OOME. TLAB could be an issue then. In general GC could have a heuristic in place to raise OOME in greedy threads when another thread would still be able to allocate successfully. I think I could change the debugger part to call consumeAllMemory() in the debuggee. This should be executed in the same jdwp agent thread as the later forceEarlyReturn.

But honestly I don't think it is worth it and I cannot even test if it fixes the issues. I'd prefer to skip the 3 test cases if running with ZGC. Please let me know what you prefer.

@plummercj
Copy link
Contributor

But honestly I don't think it is worth it and I cannot even test if it fixes the issues. I'd prefer to skip the 3 test cases if running with ZGC. Please let me know what you prefer.

That's one option if this only happens with ZGC. You also mentioned doing retries. I would only suggest that if you think you can limit the chances of the OOME not happening to be so unlikely that we are no likely to ever see it happen.

@reinrich
Copy link
Member Author

With the last commit the combined change is

  • The 3 problematic test cases are skipped if ZGC is selected.

  • They are also skipped if no OOME during object reallocation can be expected
    because allocations are not eliminated.

  • In consumeAllMemory, as a last step, empty LinkedList nodes are created
    without long array to fill up small blocks of free memory.

  • EATests.java is removed from the problem list for ZGC.

Copy link
Contributor

@plummercj plummercj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented Oct 22, 2020

@reinrich This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown

Reviewed-by: cjplummer, sspitsyn, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 107 new commits pushed to the master branch:

  • 5c520c3: 8255232: G1: Make G1BiasedMappedArray freeable
  • 9e5bbff: 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect
  • 5b18558: 8255243: Reinforce escape barrier interactions with ZGC conc stack processing
  • faf23de: 8255534: Shenandoah: Fix CmpP optimization wrt native-LRB
  • 579e50b: 8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32
  • 4b20e46: 8255579: x86: Use cmpq(Register,Address) in safepoint_poll
  • 72ff8e2: 8254782: Fix benchmark issues in java/lang/StringIndexOfChar.java benchmark
  • ea26ff1: 8247614: java/nio/channels/DatagramChannel/Connect.java timed out
  • 38574d5: 8255298: Remove SurvivorAlignmentInBytes functionality
  • 4031cb4: 8254189: Improve comments for StackOverFlow and fix in_xxx() functions
  • ... and 97 more: https://git.openjdk.java.net/jdk/compare/0aa3c92577687e02f941260dc0f176f9dcd4ae59...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 22, 2020
@reinrich
Copy link
Member Author

Looks good.

Thank you. I'll wait for a second review assuming it's required.

@plummercj
Copy link
Contributor

Thank you. I'll wait for a second review assuming it's required.

You might want to add the compiler and/or gc teams to the review

@reinrich
Copy link
Member Author

Following @plummercj advice to add compiler/gc teams.

/label hotspot-gc hotspot-compiler

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Oct 23, 2020
@openjdk
Copy link

openjdk bot commented Oct 23, 2020

@reinrich
The hotspot-gc label was successfully added.

The hotspot-compiler label was successfully added.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Oct 23, 2020
Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Richard,
It looks good to me.
One nit:

     public static final boolean DoEscapeAnalysis     = unbox(WB.getBooleanVMFlag("DoEscapeAnalysis"), UseJVMCICompiler);
     public static final boolean EliminateAllocations = unbox(WB.getBooleanVMFlag("EliminateAllocations"), UseJVMCICompiler); // read by debugger
     public static final boolean DeoptimizeObjectsALot   = WB.getBooleanVMFlag("DeoptimizeObjectsALot");                      // read by debugger
     public static final long BiasedLockingBulkRebiasThreshold = WB.getIntxVMFlag("BiasedLockingBulkRebiasThreshold");
     public static final long BiasedLockingBulkRevokeThreshold = WB.getIntxVMFlag("BiasedLockingBulkRevokeThreshold");
+    public static final boolean ZGCIsSelected        = GC.Z.isSelected();

There are unneeded spaces before '=' sign.

Thanks,
Serguei

@reinrich
Copy link
Member Author

Hi Serguei,

thanks for reviewing. I'll remove the whitespace.

I'm able now to reproduce the issue but only with ZGC. So far my attempts(*) to reliably get the OOME during ForceEarlyReturn/PopFrame because of object reallocation failed though. So I'm still in favour of the current solution which is: skip the 3 problematic testcases if ZGC is selected in the target vm. I'm still open for suggestions also though. I'll wait a few more days and then I'll integrate.

Thanks, Richard.

(*) I tried:

  • disable TLAB
  • call WhiteBox.fullGC() in consumeAllMemory() before the last round of allocations.
  • Check if the memory can be allocated by the thread doing the PopFrame/ForceEarlyReturn. com.sun.jdi.ThreadReference::invokeMethod() cannot be used. A target thread has to be specified and the jdwp threads are not visible through jdi. Only a dedicated native test JVMTI agent can consume all memory and then do the PopFrame/ForceEarlyReturn.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change @requires for testing with Graal to vm.graal.enabled instead of vm.jvmci

@reinrich
Copy link
Member Author

Please change @requires for testing with Graal to vm.graal.enabled instead of vm.jvmci

Sure. I've done that now.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good

@reinrich
Copy link
Member Author

Good

Thanks for reviewing. Will integrate beginning of next week.

@reinrich
Copy link
Member Author

reinrich commented Nov 3, 2020

/integrate

@openjdk openjdk bot closed this Nov 3, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 3, 2020
@openjdk
Copy link

openjdk bot commented Nov 3, 2020

@reinrich Since your change was applied there have been 185 commits pushed to the master branch:

  • b8d4e02: 8255374: Add a dropReturn MethodHandle combinator
  • 1d0bd50: 8254758: Change G1ServiceThread to be task based
  • 9a0cf58: 8255615: Zero: demote ZeroStack::abi_stack_available guarantee to assert
  • 904561e: 8255719: Zero: on return path, check for pending exception before attempting to clear it
  • 9bd836e: 8255744: Zero: handle JVM_CONSTANT_DynamicInError
  • 36998b0: 8255716: AArch64: Regression: JVM crashes if manually offline a core
  • 4107670: 8233562: [TESTBUG] Swing StyledEditorKit test bug4506788.java fails on MacOS
  • 9a36747: 8255780: Remove unused overloads of VMError::report_and_die()
  • c96a914: 8255662: ZGC: Unify nmethod closures in the heap iterator
  • aa2862a: 8255741: Zero: print signal name in unhandled signal handler
  • ... and 175 more: https://git.openjdk.java.net/jdk/compare/0aa3c92577687e02f941260dc0f176f9dcd4ae59...master

Your commit was automatically rebased without conflicts.

Pushed as commit 63461d5.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this pull request Nov 3, 2020
…ed VMOutOfMemoryException is not thrown

Reviewed-by: cjplummer, sspitsyn, kvn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
5 participants