8267246: -XX:MaxRAMPercentage=0 is unreasonable for jtreg tests on many-core machines #4062

Closed
wants to merge 2 commits into from

@DamonFool
Member

@DamonFool DamonFool commented May 17, 2021

Hi all,

vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java fails on our many-core machines due to -XX:MaxRAMPercentage=0.
This is because the make logic computes MaxRAMPercentage with integer arithmetic, so it is always 0 when JTREG_JOBS > 25 [1].

It can be reproduced on almost any machine with: make test TEST="vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java" JTREG="JOBS=26"

Setting -XX:MaxRAMPercentage=0 on many-core machines is unreasonable, and it should be fixed.

Thanks.
Best regards,
Jie

[1] https://github.com/openjdk/jdk/blob/master/make/RunTests.gmk#L741
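
Comparing integer and floating-point division for the job counts in this thread shows where the 0 comes from. This is a minimal sketch, not the actual RunTests.gmk code. It assumes the per-JVM percentage is 25 divided by JTREG_JOBS, which is consistent with the 0.78125 value reported later in this thread for 32 jobs. The helper names pct_integer and pct_awk are hypothetical.

```sh
#!/bin/sh
# Assumption: a 25% RAM budget split evenly across JTREG_JOBS test JVMs.
# pct_integer and pct_awk are hypothetical helpers, not RunTests.gmk code.

# Integer arithmetic truncates toward zero, so any job count > 25 yields 0.
pct_integer() {
  echo $(( 25 / $1 ))
}

# awk does floating-point division, so the fractional share survives.
pct_awk() {
  awk -v jobs="$1" 'BEGIN { print 25 / jobs }'
}

for jobs in 25 26 32; do
  echo "JOBS=$jobs integer=$(pct_integer "$jobs") awk=$(pct_awk "$jobs")"
done
```

With 26 jobs the integer version already yields 0, matching the JOBS=26 reproducer above, while awk keeps a usable fraction (0.961538 for 26 jobs, 0.78125 for 32).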


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8267246: -XX:MaxRAMPercentage=0 is unreasonable for jtreg tests on many-core machines

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4062/head:pull/4062
$ git checkout pull/4062

Update a local copy of the PR:
$ git checkout pull/4062
$ git pull https://git.openjdk.java.net/jdk pull/4062/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4062

View PR using the GUI difftool:
$ git pr show -t 4062

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4062.diff

@DamonFool
Member Author

@DamonFool DamonFool commented May 17, 2021

/test

@bridgekeeper

@bridgekeeper bridgekeeper bot commented May 17, 2021

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@DamonFool
Member Author

@DamonFool DamonFool commented May 17, 2021

/label add build
/cc build

@openjdk

@openjdk openjdk bot commented May 17, 2021

@DamonFool
The build label was successfully added.

@openjdk

@openjdk openjdk bot commented May 17, 2021

@DamonFool The build label was already applied.

@mlbridge

@mlbridge mlbridge bot commented May 17, 2021

Webrevs

@openjdk

@openjdk openjdk bot commented May 17, 2021

@DamonFool you need to get approval to run the tests in tier1 for commits up until 94aac13

@openjdk openjdk bot added the test-request label May 17, 2021
Contributor

@shipilev shipilev left a comment

Wait, no. That would mean that on machines with many cores, the sum of the heap sizes could exceed the physical memory size: 1% multiplied over more than 100 JTREG_JOBS is more than 100%. Since MaxRAMPercentage has lower priority than -Xmx, and the test does not expect a heap OOME, why not just put an explicit heap size in that test?

diff --git a/test/hotspot/jtreg/vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java b/test/hotspot/jtreg/vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java
index 0d5f1a1626f..4ee794fb79d 100644
--- a/test/hotspot/jtreg/vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java
+++ b/test/hotspot/jtreg/vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java
@@ -36,7 +36,7 @@
  * @build vm.mlvm.anonloader.stress.oome.metaspace.Test
  * @run driver vm.mlvm.share.IndifiedClassesBuilder
  *
- * @run main/othervm -XX:-UseGCOverheadLimit -XX:MetaspaceSize=10m -XX:MaxMetaspaceSize=20m vm.mlvm.anonloader.stress.oome.metaspace.Test
+ * @run main/othervm -Xmx1g -XX:-UseGCOverheadLimit -XX:MetaspaceSize=10m -XX:MaxMetaspaceSize=20m vm.mlvm.anonloader.stress.oome.metaspace.Test
  */
 
 package vm.mlvm.anonloader.stress.oome.metaspace;
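
The objection is about the sum: every concurrent test JVM receives the same percentage, so what matters is that percentage multiplied by the job count. The sketch below compares a hypothetical 1% floor per JVM with a floating-point split, again assuming a 25% total budget divided by JTREG_JOBS; the helper names are illustrative only.

```sh
#!/bin/sh
# Assumption: a 25% RAM budget split across JTREG_JOBS concurrent test JVMs.
# total_with_floor and total_with_awk are hypothetical helpers.

# Integer split, clamped so every JVM gets at least 1%; prints the total percent.
total_with_floor() {
  n=$1
  per_jvm=$(( 25 / n ))
  if [ "$per_jvm" -lt 1 ]; then
    per_jvm=1
  fi
  echo $(( per_jvm * n ))
}

# Floating-point split; the total stays at 25% (rounded to a whole percent).
total_with_awk() {
  awk -v n="$1" 'BEGIN { printf "%.0f\n", n * (25 / n) }'
}

for jobs in 8 64 128; do
  echo "JOBS=$jobs floor=$(total_with_floor "$jobs")% awk=$(total_with_awk "$jobs")%"
done
```

Past 100 jobs the floor version claims more than 100% of RAM (128% at 128 jobs), which is the over-commit described above, whereas the floating-point split keeps the total fixed.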

@shipilev
Contributor

@shipilev shipilev commented May 17, 2021

...maybe even -Xmx512m, you need to see what works.
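
To relate an explicit -Xmx value to the percentage-based limit, a rough conversion is physical RAM × MaxRAMPercentage / 100. This sketch ignores the JVM's other heap-sizing ergonomics (alignment, minimums, and so on); the 64 GiB machine and the approx_heap_mib helper are purely hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: approximate max heap in MiB from physical RAM (GiB) and a
# MaxRAMPercentage value, ignoring every other JVM heap-sizing rule.
approx_heap_mib() {
  awk -v ram_gib="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", ram_gib * 1024 * pct / 100 }'
}

approx_heap_mib 64 0.78125   # prints 512, the same size as -Xmx512m
approx_heap_mib 64 0         # prints 0: the percentage-based limit collapses
```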

@erikj79
Member

@erikj79 erikj79 commented May 17, 2021

The option -XX:MaxRAMPercentage can't really scale up properly unless it accepts values lower than 1. Not sure what to do about this. Even before hitting 0, we get very clunky behavior due to rounding at lower values.
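
The clunky rounding shows up in the total budget well before the per-JVM value reaches 0: truncating each JVM's share makes the combined share jump around as the job count changes. A minimal sketch, again assuming a 25% budget divided by JTREG_JOBS (an assumption, not code quoted from RunTests.gmk), with a hypothetical total_integer helper:

```sh
#!/bin/sh
# Assumption: a 25% RAM budget split across N test JVMs with integer division.
# total_integer is a hypothetical helper; it prints N * trunc(25 / N).
total_integer() {
  echo $(( (25 / $1) * $1 ))
}

for jobs in 12 13 24 26; do
  echo "JOBS=$jobs per_jvm=$(( 25 / jobs ))% total=$(total_integer "$jobs")%"
done
```

Going from 12 to 13 jobs drops the combined limit from 24% to 13% of RAM, and at 26 jobs it reaches 0.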

@shipilev
Contributor

@shipilev shipilev commented May 17, 2021

-XX:MaxRAMPercentage is actually double, so it can accept values below 1. The last time I looked into trouble like this, it was a problem with doing floating-point division in the make files -- not sure if something can be done about that.

But my point is that the failing test (is that the only test that fails?) expects some heap size to accommodate Java allocations until the expected Metaspace OOM happens. In that case, the fix should be in the test itself, because even if we add 1 to MaxRAMPercentage, it might still not be enough.

@mlbridge

@mlbridge mlbridge bot commented May 17, 2021

Mailing list message from erik.joelsson at oracle.com on build-dev:

On 2021-05-17 10:19, Aleksey Shipilev wrote:

> On Mon, 17 May 2021 13:24:16 GMT, Jie Fu <jiefu at openjdk.org> wrote:
>
>> Hi all,
>>
>> vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java fails on our many-core machines due to `-XX:MaxRAMPercentage=0`.
>> This is because `MaxRAMPercentage` will be always 0 if JTREG_JOBS > 25 [1].
>>
>> It can be reproduced by: `make test TEST="vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java" JTREG="JOBS=26"` on almost all machines.
>>
>> Setting `-XX:MaxRAMPercentage=0` on many-core machines seems unreasonable.
>> It would be better to fix it.
>>
>> Thanks.
>> Best regards,
>> Jie
>>
>> [1] https://github.com/openjdk/jdk/blob/master/make/RunTests.gmk#L741
>
> `-XX:MaxRAMPercentage` is actually `double`, so it can accept values below 1. The last time I looked into trouble like this, it was a problem with doing floating-point division in the make files -- not sure if something can be done about that.

Oh, if it's double, we can just switch to using awk to make the
calculation, just like we do for TEST_JOBS. I just did a quick check and
it produces float values.

> But my point is that the failing test -- is that the only test that fails? -- expects some heap size to accommodate Java allocations until the expected Metaspace OOM happens. In that case, the fix should be in the test itself. Because even if we do +1 to `MaxRAMPercentage`, it might still be not enough.

If this test has special needs, those should of course be handled by the
test itself.

/Erik

@mlbridge

@mlbridge mlbridge bot commented May 17, 2021

Mailing list message from Aleksey Shipilev on build-dev:

On 5/17/21 7:30 PM, erik.joelsson at oracle.com wrote:

> Oh, if it's double, we can just switch to using awk to make the
> calculation, just like we do for TEST_JOBS. I just did a quick check and
> it produces float values.

Oh, cool. Having a more precise MaxRAMPercentage would be nice.

--
Thanks,
-Aleksey

@DamonFool
Member Author

@DamonFool DamonFool commented May 17, 2021

Thanks @shipilev and @erikj79 for your review and nice suggestions.

I filed JDK-8267293 to handle the test failure and will fix the unreasonable setting of MaxRAMPercentage in this PR.

@DamonFool
Member Author

@DamonFool DamonFool commented May 18, 2021

> Mailing list message from Aleksey Shipilev on build-dev:
>
> On 5/17/21 7:30 PM, erik.joelsson at oracle.com wrote:
>
>> Oh, if it's double, we can just switch to using awk to make the
>> calculation, just like we do for TEST_JOBS. I just did a quick check and
>> it produces float values.
>
> Oh, cool. Having a more precise MaxRAMPercentage would be nice.
>
> --
> Thanks,
> -Aleksey

Hi @erikj79 and @shipilev,

The patch has been updated to use awk for the calculation.
Thanks.
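
One detail of the awk approach worth knowing: awk's print converts non-integral numbers to text using OFMT, whose default is "%.6g", so the computed percentage carries at most six significant digits, and integral results print without a decimal point. A quick check, assuming any POSIX awk:

```sh
#!/bin/sh
# Non-integral results are formatted with OFMT (default "%.6g").
awk 'BEGIN { print 25 / 26 }'              # prints 0.961538
# Integral results are printed as integers.
awk 'BEGIN { print 25 / 25 }'              # prints 1
# printf gives explicit control over the format when needed.
awk 'BEGIN { printf "%.2f\n", 25 / 26 }'   # prints 0.96
```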

@DamonFool
Member Author

@DamonFool DamonFool commented May 18, 2021

/test

@openjdk

@openjdk openjdk bot commented May 18, 2021

@DamonFool you need to get approval to run the tests in tier1 for commits up until a1e7aea

@shipilev
Contributor

@shipilev shipilev commented May 18, 2021

I ran tier1 and tier2 on my 3970X, which, having 32 cores, normally runs the tests with -XX:MaxRAMPercentage=0, and they completed fine. With this patch it runs with -XX:MaxRAMPercentage=0.78125.

Contributor

@shipilev shipilev left a comment

I am good with this change, but someone else should take a look as well.

@openjdk

@openjdk openjdk bot commented May 18, 2021

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8267246: -XX:MaxRAMPercentage=0 is unreasonable for jtreg tests on many-core machines

Reviewed-by: shade, erikj

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • fadf580: 8262952: [macos_aarch64] os::commit_memory failure
  • f8f40ab: 8230486: G1BarrierSetAssembler::g1_write_barrier_post unnecessarily pushes/pops new_val
  • 9d168e2: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses
  • 02507bc: 8267166: Remove test file vmTestbase/vm/mlvm/tools/LoadClass.java
  • ce88b33: 8266615: C2 incorrectly folds subtype checks involving an interface array
  • 894547d: 8266897: com/sun/net/httpserver/FilterTest.java fails intermittently with AssertionError
  • da7c846: 8264408: test_oopStorage no longer needs to disable some tests on WIN32
  • f6c2891: 8267229: Split runtime/Metaspace/elastic test configurations for better scalability
  • b60975d: 8267237: ARM32: bad AD file in matcher.cpp after 8266810
  • 905b41a: 8265711: C1: Intrinsify Class.getModifier method
  • ... and 12 more: https://git.openjdk.java.net/jdk/compare/dd5a84c68c4f6128c3568c6f4fc1302c6aaadf01...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label May 18, 2021
Member

@erikj79 erikj79 left a comment

Looks good to me, thanks for fixing this!

I took the change for a spin internally, running our tier1-3.

@DamonFool
Member Author

@DamonFool DamonFool commented May 18, 2021

Thanks @shipilev and @erikj79 for your review.

I'll push it tomorrow, since JDK-8267293 and JDK-8267311 are easier to reproduce with the original code.
Thanks.

@DamonFool
Member Author

@DamonFool DamonFool commented May 19, 2021

> I'll push it tomorrow, since JDK-8267293 and JDK-8267311 are easier to reproduce with the original code.

We can still reproduce JDK-8267311 as easily as before with this patch, so let's push it.

/integrate

@openjdk openjdk bot closed this May 19, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels May 19, 2021
@openjdk

@openjdk openjdk bot commented May 19, 2021

@DamonFool Since your change was applied there have been 27 commits pushed to the master branch:

  • 324defe: 8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use"
  • bdbe23b: 8265462: Handle multiple slots in the NSS Internal Module from SunPKCS11's Secmod
  • 10236e7: 8263242: serviceability/sa/ClhsdbFindPC.java cannot find MaxJNILocalCapacity with ASLR
  • e6705c0: 8266949: Check possibility to disable OperationTimedOut on Unix
  • b92c5a4: 8265292: [macos_aarch64] java/foreign/TestDowncall.java crashes with SIGBUS
  • fadf580: 8262952: [macos_aarch64] os::commit_memory failure
  • f8f40ab: 8230486: G1BarrierSetAssembler::g1_write_barrier_post unnecessarily pushes/pops new_val
  • 9d168e2: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses
  • 02507bc: 8267166: Remove test file vmTestbase/vm/mlvm/tools/LoadClass.java
  • ce88b33: 8266615: C2 incorrectly folds subtype checks involving an interface array
  • ... and 17 more: https://git.openjdk.java.net/jdk/compare/dd5a84c68c4f6128c3568c6f4fc1302c6aaadf01...master

Your commit was automatically rebased without conflicts.

Pushed as commit 0daec49.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

3 participants