Skip to content

8331185: Enable compiler memory limits in debug builds#18969

Closed
tstuefe wants to merge 19 commits intoopenjdk:masterfrom
tstuefe:compiler-default-limit
Closed

8331185: Enable compiler memory limits in debug builds#18969
tstuefe wants to merge 19 commits intoopenjdk:masterfrom
tstuefe:compiler-default-limit

Conversation

@tstuefe
Copy link
Member

@tstuefe tstuefe commented Apr 26, 2024

See [1] for previous discussions.

We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing.

The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, -XX:CompileCommand=MemLimit,... overrules the default.

Examples:

This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: -XX:CompileCommand=MemLimit,java.lang.String::*,100m

This disables the default memlimit globally: -XX:CompileCommand=MemLimit,*.*,0


The patch:

  1. adds a debug-only default memory limit of 1GB (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert.
  2. To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record.
  3. Adapted and extended tests

I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of.

Tested:

  • manually on Mac m1 (debug and release)
  • GHAs are running
  • but Oracle will do more testing before this goes in

[1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8331185: Enable compiler memory limits in debug builds (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969
$ git checkout pull/18969

Update a local copy of the PR:
$ git checkout pull/18969
$ git pull https://git.openjdk.org/jdk.git pull/18969/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18969

View PR using the GUI difftool:
$ git pr show -t 18969

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18969.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 26, 2024

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 26, 2024

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8331185: Enable compiler memory limits in debug builds

Reviewed-by: asmehra, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 9 new commits pushed to the master branch:

  • 524aaad: 8319957: PhaseOutput::code_size is unused and should be removed
  • 95d2f80: 8330016: Stress seed should be initialized for runtime stub compilation
  • 5746137: 8331771: ZGC: Remove OopMapCacheAlloc_lock ordering workaround
  • 02c95a6: 6968351: httpserver clashes with delayed TCP ACKs for low Content-Length
  • 02a799c: 8331695: Serial: DefNewGeneration:_promotion_failed used without being initialized
  • 23a72a1: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer
  • a2584a8: 8331714: Make OopMapCache installation lock-free
  • df1ff05: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store()
  • 3b8227b: 8326836: Incorrect @since tags for ClassSignature methods

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title JDK-8331185: Enable compiler memory limits in debug builds 8331185: Enable compiler memory limits in debug builds Apr 26, 2024
@openjdk
Copy link

openjdk bot commented Apr 26, 2024

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 26, 2024
@tstuefe tstuefe marked this pull request as ready for review April 26, 2024 11:31
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 26, 2024
@mlbridge
Copy link

mlbridge bot commented Apr 26, 2024

@vnkozlov
Copy link
Contributor

I submitted our testing.

Why you did not merge your latest JDK-8330625 change? Patch was applied with offsets. So I will test latest JDK.

Comment on lines 2 to 3
* Copyright (c) 2023, 2024 Red Hat, Inc. All rights reserved.
* Copyright (c) 2023, 2024 Oracle and/or its affiliates. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comma , after second year. Files header verifier failed for both test files.

@vnkozlov
Copy link
Contributor

compiler/c2/TestFindNode.java and compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java failed on aarch64 when run with stress flags:

-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers

#  Internal Error (/workspace/open/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:551), pid=912622, tid=912639
#  fatal error: c2 compiler/c2/TestFindNode::test(()V): Hit MemLimit (limit: 1073741824 now: 1073824544)

V  [libjvm.so+0x97496c]  report_fatal(VMErrorType, char const*, int, char const*, ...)+0x108  (debug.cpp:214)
V  [libjvm.so+0x8a9e2c]  CompilationMemoryStatistic::on_arena_change(long, Arena const*)+0x5bc  (compilationMemoryStatistic.cpp:551)
V  [libjvm.so+0x58ee70]  Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x10c  (arena.cpp:300)
V  [libjvm.so+0x13eb86c]  PhaseChaitin::Split(unsigned int, ResourceArea*)+0x3ac  (resourceArea.inline.hpp:35)
V  [libjvm.so+0x798a44]  PhaseChaitin::Register_Allocate()+0x554  (chaitin.cpp:553)
V  [libjvm.so+0x8ced88]  Compile::Code_Gen()+0x284  (compile.cpp:2988)
V  [libjvm.so+0x8d1128]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1588  (compile.cpp:896)
V  [libjvm.so+0x72d480]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x17c  (c2compiler.cpp:142)

Looks like Register Allocator eats memory.

@vnkozlov
Copy link
Contributor

There also few closed tests which failed with -XX:-TieredCompilation -XX:+AlwaysIncrementalInline on x64 and aarch64:

V  [libjvm.so+0xa90b95]  report_fatal(VMErrorType, char const*, int, char const*, ...)+0x105  (debug.cpp:214)
V  [libjvm.so+0x9c06dd]  CompilationMemoryStatistic::on_arena_change(long, Arena const*)+0x5cd  (compilationMemoryStatistic.cpp:551)
V  [libjvm.so+0x5f45fc]  Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x10c  (arena.cpp:300)
V  [libjvm.so+0x14e4706]  Parse::init_blocks()+0x46  (parse1.cpp:1290)
V  [libjvm.so+0x14f0aff]  Parse::Parse(JVMState*, ciMethod*, float)+0x52f  (parse1.cpp:565)
V  [libjvm.so+0x8412b9]  ParseGenerator::generate(JVMState*)+0x169  (callGenerator.cpp:99)
V  [libjvm.so+0x84482a]  PredictedCallGenerator::generate(JVMState*)+0x3aa  (callGenerator.cpp:928)
V  [libjvm.so+0x8466de]  CallGenerator::do_late_inline_helper()+0x90e  (callGenerator.cpp:704)
V  [libjvm.so+0x9e2de4]  Compile::inline_incrementally_one()+0xd4  (compile.cpp:2054)
V  [libjvm.so+0x9e3b32]  Compile::inline_incrementally(PhaseIterGVN&)+0x292  (compile.cpp:2137)
V  [libjvm.so+0x9e5b50]  Compile::Optimize()+0x340  (compile.cpp:2272)
V  [libjvm.so+0x9e9c60]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b50  (compile.cpp:863)
V  [libjvm.so+0x83ed15]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5  (c2compiler.cpp:142)

@vnkozlov
Copy link
Contributor

Closed test passed with 1200M limit

@tstuefe
Copy link
Member Author

tstuefe commented Apr 27, 2024

@vnkozlov How do you want to handle this? Are these memory usage numbers pathological or normal?

Should I increase the general limit to 1200M? Alternatively, we can also just run failing the test with a higher memory limit.

@vnkozlov
Copy link
Contributor

vnkozlov commented Apr 27, 2024

@vnkozlov How do you want to handle this? Are these memory usage numbers pathological or normal?

Should I increase the general limit to 1200M? Alternatively, we can also just run failing the test with a higher memory limit.

The closed failed test has very long chain of lambda forms and I don't think memory consumption is pathological. May be something could be done to improve it (as enhancement RFE) but it is not urgent. I will help you with increase limit for it.

I did not check which limit will allow to pass failed open tests. I will ask you to investigate and increase limit for them for now and file bugs to investigate the cause.

I also need to run later tiers (I ran only up to tier5) before integration. If I find more failing cases we may consider increase default limit.

@vnkozlov
Copy link
Contributor

I did not find any other failures in tier6-9. So let's adjust limit and file bugs for failed tests I found before. And keep default limit 1Gb.

@tstuefe
Copy link
Member Author

tstuefe commented Apr 29, 2024

I submitted our testing.

Why you did not merge your latest JDK-8330625 change? Patch was applied with offsets. So I will test latest JDK.

Ah forgot. Sorry. Thanks for testing with the latest head.

@tstuefe
Copy link
Member Author

tstuefe commented Apr 29, 2024

@vnkozlov Opened a follow up issue for compiler/c2/TestFindNode.java : https://bugs.openjdk.org/browse/JDK-8331283

@tstuefe
Copy link
Member Author

tstuefe commented Apr 29, 2024

@vnkozlov ... but I was unable to reproduce the problem with compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java on aarch64. Memory usage during compilation of the test method is ~170MB, which is fine.

I tried to reproduce it on both MacOS m1, and on a Raspberry 4 with 64bit linux.

The annoying part is that when we assert the memlimit during usage, we do not know how large the limit would have been had we continued. We could add a third option to the memlimit command that would assert, but only after we finished the compilation. However, that option would mean we lose the call stack when we reached the memlimit, which would be a valuable indicator of which code uses the memory. I also hesitate to make the compiler memory statistic more complex than it is.

@tstuefe
Copy link
Member Author

tstuefe commented Apr 29, 2024

Opened a bug to track the memory limit break in compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java. Since I cannot reproduce it, someone at Oracle should look at this: https://bugs.openjdk.org/browse/JDK-8331295

@vnkozlov
Copy link
Contributor

Thank you, @tstuefe, for filing these bugs.

One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that?

@tstuefe
Copy link
Member Author

tstuefe commented Apr 30, 2024

Thank you, @tstuefe, for filing these bugs.
One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that?

Yes, its https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly.
The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion.

Good. I think we need to push it before this PR.

@vnkozlov Openend PR: #19005

@openjdk
Copy link

openjdk bot commented May 2, 2024

@tstuefe this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout compiler-default-limit
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label May 2, 2024
// Method 1 should show up as "ok" and with the default limit, e.g.
// total NA RA result #nodes limit time type #rc thread method
// 32728 0 32728 ok - 1024M 0.045 c1 1 0x000000011b019c10 compiler/print/CompileCommandMemLimit$TestMain::method1(()J)
oa.shouldMatch("\\d+ +\\d+ +\\d+ +ok +" + numberNodesRegex + " +" + implicitMemoryLimit + " +.* +" + method1regex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor suggestion regarding the regex. I find "\s+" more readable than " +" to match multiple spaces.

@ashu-mehra
Copy link
Contributor

Just one suggestion which you may pick or ignore. Otherwise looks good.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label May 3, 2024
@tstuefe
Copy link
Member Author

tstuefe commented May 3, 2024

Just one suggestion which you may pick or ignore. Otherwise looks good.

Many thanks, @ashu-mehra !

I will actually ignore your suggestion, because I want the expression to only match spaces precisely, not whitespaces. But for any-whitespace, I usually do as you suggest.

@tstuefe
Copy link
Member Author

tstuefe commented May 3, 2024

@vnkozlov SAP did a test series and did not find any issues in their CI

@vnkozlov
Copy link
Contributor

vnkozlov commented May 3, 2024

Okay, I will run our testing too.

@vnkozlov
Copy link
Contributor

vnkozlov commented May 4, 2024

Looks like memlimit,TestFindNode::test,0 does not work. The test failed with stress flags JDK-8331283 on linux-aarch64 (Ampere). With the same call stack. I see -XX:CompileCommand=memlimit,TestFindNode::test,0 in flags passed to test.

@vnkozlov
Copy link
Contributor

vnkozlov commented May 4, 2024

I attached hs_err file to JDK-8331283

@tstuefe
Copy link
Member Author

tstuefe commented May 4, 2024

Looks like memlimit,TestFindNode::test,0 does not work. The test failed with stress flags JDK-8331283 on linux-aarch64 (Ampere). With the same call stack. I see -XX:CompileCommand=memlimit,TestFindNode::test,0 in flags passed to test.

I fixed the error, a simple typo (forgot to properly name the class in the option). Retested locally on Mac m1, confirmed that the test passes with this commit, fails without it. I am not sure what went wrong, since I did these tests beforehand. Maybe I pushed the wrong version.

That is slightly concerning, however, since the error should have come up at SAP too. I guess they don't test with all these stress options.

@vnkozlov
Copy link
Contributor

vnkozlov commented May 4, 2024

-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print - leftover from debugging?

@tstuefe
Copy link
Member Author

tstuefe commented May 7, 2024

-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print - leftover from debugging?

I tend to leave debug output in, if its not too large, to speed up any follow-up fixes I need to do later. But then, I did not do it consistently anyway, so I removed the output.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 7, 2024
@tstuefe
Copy link
Member Author

tstuefe commented May 8, 2024

Many thanks, @vnkozlov !

/integrate

@openjdk
Copy link

openjdk bot commented May 8, 2024

Going to push as commit ad78b7f.
Since your change was applied there have been 23 commits pushed to the master branch:

  • aafa15f: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails
  • edd47c1: 8308033: The jcmd thread dump related tests should test virtual threads
  • 1aebab7: 8320995: RISC-V: C2 PopCountVI
  • 0eff492: 8330278: Have SSLSocketTemplate.doClientSide use loopback address
  • c6f611c: 8331798: Remove unused arg of checkErgonomics() in TestMaxHeapSizeTools.java
  • 0e1dca7: 8331715: Remove virtual specifiers in ContiguousSpace
  • 7f29904: 8330005: RandomGeneratorFactory.getDefault() throws exception when the runtime image only has java.base module
  • 2baacfc: 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool'
  • 7b79426: 8278353: Provide Duke as default favicon in Simple Web Server
  • 466a21d: 8331863: DUIterator_Fast used before it is constructed
  • ... and 13 more: https://git.openjdk.org/jdk/compare/f308e107ce8b993641ee3d0a0d5d52bf5cd3b94e...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 8, 2024
@openjdk openjdk bot closed this May 8, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 8, 2024
@openjdk
Copy link

openjdk bot commented May 8, 2024

@tstuefe Pushed as commit ad78b7f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants