Skip to content

JDK-8267371: Concurrent gtests take too long #4110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

tstuefe
Copy link
Member

@tstuefe tstuefe commented May 19, 2021

More than half of the total runtime of the gtests is currently used by just three tests:

  • VirtualSpace.os_reserve_space_concurrent_vm
  • VirtualSpace.os_virtual_space_concurrent_vm
  • os_linux.reserve_memory_special_concurrent_vm

These tests do concurrent tests with 30 threads, with a timeout of 15 seconds each.

Since we run the gtests with several configurations (see hotspot/jtreg/gtests) these numbers multiply, which hurts especially when run as part of tier1. I think we can safely reduce the timeouts, still have good test coverage and recover some of the gtest runtime.


The patch reduces the timeout to 5 seconds, which reduces the time for one gtest by 30 seconds from 80 seconds down to 50 seconds.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4110/head:pull/4110
$ git checkout pull/4110

Update a local copy of the PR:
$ git checkout pull/4110
$ git pull https://git.openjdk.java.net/jdk pull/4110/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4110

View PR using the GUI difftool:
$ git pr show -t 4110

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4110.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 19, 2021

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 19, 2021
@openjdk
Copy link

openjdk bot commented May 19, 2021

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 19, 2021
@mlbridge
Copy link

mlbridge bot commented May 19, 2021

Webrevs

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I was kinda wondering the last time I looked at that test if 30 threads is excessive for the test as well.

@openjdk
Copy link

openjdk bot commented May 19, 2021

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8267371: Concurrent gtests take too long

Reviewed-by: shade, gziemski

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 23 new commits pushed to the master branch:

  • 66ab6d8: 8264181: javadoc tool Incorrect error message about malformed link
  • 99fcc41: 8234532: Remove ThreadLocalAllocBuffer::_fast_refill_waste since it is never set
  • 237fee8: 8267339: Temporarily disable os.release_multi_mappings_vm on macOS x64
  • 64e2479: 8267407: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java on linux-aarch64
  • 9760dba: 8267321: Use switch expression for VarHandle$AccessMode lookup
  • fdd0352: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287
  • 0b49f5a: 8267257: Shenandoah: Always deduplicate strings when it is enabled during full gc
  • 12050f0: 8266651: Convert Table method parameters from String to Content
  • e749f75: 8267304: Bump global JTReg memory limit to 768m
  • e858dd6: 8267361: JavaTokenizer reads octal numbers mistakenly
  • ... and 13 more: https://git.openjdk.java.net/jdk/compare/ff84577d72226da0bf1ce2c6d6852f3934feecf2...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 19, 2021
@tstuefe
Copy link
Member Author

tstuefe commented May 19, 2021

This looks good to me. I was kinda wondering the last time I looked at that test if 30 threads is excessive for the test as well.

I'm honestly not sure why these tests are run concurrently; it looks to me like there is no VM internal synchronization involved which we could test, this is basically just a bunch of parallel mmap()s. I may miss something here.

Had been introduced (as jtreg-started whitebox test) with https://bugs.openjdk.java.net/browse/JDK-8027237. The associated RFR thread does not discuss the run parameters.

@stefank Stefan, do you think we could reduce the number of Threads for these tests too?

@stefank
Copy link
Member

stefank commented May 19, 2021

@tstuefe Do whatever you think is appropriate for this test. I'm not particularly fond of having stress tests in our gtest suite, and would have preferred to not have them here.

@tstuefe
Copy link
Member Author

tstuefe commented May 19, 2021

Hi guys,

I am not convinced these tests make a lot of sense. I leave them in for now because they may be of use on AIX where we maintain global data structures and also possibly for stress-testing NMT (I work on JDK-8256844: "Make NMT late-initializable" which will make it possible to enable NMT in gtests, and thereby stress-testing NMT itself). Beyond that, they are just hitting mmap() without really testing VM infrastructure.

I disable reserve_memory_special_concurrent for non-large-page-runs. And I reduced the time to run further from 5 to 3 seconds, and the number of threads from 30 to 5.

Like @stefank I also think the gtests are the wrong place for these kind of tests. Jtreg tests are a better place for this, offering finer grained control and better thread facilities. I remember now we had this discussion when these tests were introduced (https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-March/049533.html).

Well, maybe we can do something about this in the future (split them out, remove them, move them back to jtreg...).

Thanks, Thomas

@gerard-ziemski
Copy link

Hi Thomas, you said The patch reduces the timeout to 5 seconds, which reduces the time for one gtest by 30 seconds from 80 seconds down to 50 seconds.

Does that mean that the 30 second saving come from some of the tests being timed out? Why then bother at all with them?

It does sound like they should move out of tier1...

@tstuefe
Copy link
Member Author

tstuefe commented May 19, 2021

Hi Thomas, you said The patch reduces the timeout to 5 seconds, which reduces the time for one gtest by 30 seconds from 80 seconds down to 50 seconds.

Does that mean that the 30 second saving come from some of the tests being timed out? Why then bother at all with them?

It does sound like they should move out of tier1...

Hi Gerard,

the test don't time out; they run a fixed amount of time. Sorry, "Timeout" was a bad phrasing here.

Moving them out of tier1 (or alternatively just getting rid of them) would be fine but as a separate RFE. The former would require either splitting gtests into several units, some tier1 worthy, some not; or to reimplement them as jtreg tests (as they had been before). Both is significantly more effort than I want to put into right now.

Thanks, Thomas

Copy link

@gerard-ziemski gerard-ziemski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to take look at the ConcurrentTestRunner to make sure I understood the parameters:

ConcurrentTestRunner(TestRunnable* const runnableArg, int nrOfThreadsArg, long testDurationMillisArg)

Looks good, thank you for fixing it!

@tstuefe
Copy link
Member Author

tstuefe commented May 19, 2021

Had to take look at the ConcurrentTestRunner to make sure I understood the parameters:

ConcurrentTestRunner(TestRunnable* const runnableArg, int nrOfThreadsArg, long testDurationMillisArg)

Looks good, thank you for fixing it!

Thank you!

@gerard-ziemski
Copy link

Just one last comment, 3000 ms is 3 seconds, not 5 seconds like you claim, but I'm OK with 3 if that works.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. It seems to reduce "real CPU time" from 10 min to 1.5 min for CONF=linux-x86_64-server-fastdebug make run-test TEST=jtreg:gtest/LargePageGtests.java.

@tstuefe
Copy link
Member Author

tstuefe commented May 19, 2021

Just one last comment, 3000 ms is 3 seconds, not 5 seconds like you claim, but I'm OK with 3 if that works.

Yes, I downgraded it first to 5secs, then to 3secs after talking to stefank. Thanks for the review!

@tstuefe
Copy link
Member Author

tstuefe commented May 19, 2021

I like it. It seems to reduce "real CPU time" from 10 min to 1.5 min for CONF=linux-x86_64-server-fastdebug make run-test TEST=jtreg:gtest/LargePageGtests.java.

Great, thanks!

/integrate

@openjdk openjdk bot closed this May 19, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 19, 2021
@openjdk
Copy link

openjdk bot commented May 19, 2021

@tstuefe Since your change was applied there have been 24 commits pushed to the master branch:

  • 38d690b: 8265262: CITime - 'other' incorrectly calculated
  • 66ab6d8: 8264181: javadoc tool Incorrect error message about malformed link
  • 99fcc41: 8234532: Remove ThreadLocalAllocBuffer::_fast_refill_waste since it is never set
  • 237fee8: 8267339: Temporarily disable os.release_multi_mappings_vm on macOS x64
  • 64e2479: 8267407: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java on linux-aarch64
  • 9760dba: 8267321: Use switch expression for VarHandle$AccessMode lookup
  • fdd0352: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287
  • 0b49f5a: 8267257: Shenandoah: Always deduplicate strings when it is enabled during full gc
  • 12050f0: 8266651: Convert Table method parameters from String to Content
  • e749f75: 8267304: Bump global JTReg memory limit to 768m
  • ... and 14 more: https://git.openjdk.java.net/jdk/compare/ff84577d72226da0bf1ce2c6d6852f3934feecf2...master

Your commit was automatically rebased without conflicts.

Pushed as commit 9820f3d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the JDK-8267371-concurrent-gtests-take-too-long branch June 29, 2021 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants