JDK-8280940: gtest os.release_multi_mappings_vm is racy #7288

Member

@tstuefe tstuefe commented Jan 31, 2022

release_multi_mappings_vm is unfortunately racy.

The original intention of the test was to check that os::release_memory() works across multiple mappings allocated with multiple calls to os::reserve_memory(). This was broken for a long time on Windows, since it relies on the implicit assumption that every platform uses mmap-like APIs under the hood. But the Windows virtual memory API (and SysV shmat, for that matter) does not work that way.

The release_multi_mappings_vm test:

  • (A) reserves a number of mappings in 4M stripes adjacent to each other
  • (B) releases them with a single call to os::release_memory
  • (C) re-allocates a range at the same address

Step (C) will fail if the os::release_memory call in (B) failed to release the mapping, which it sometimes did silently, so just checking the return code in (B) was not sufficient.

Unfortunately, it will also fail if someone concurrently mapped into the range between (B) and (C). It's rare, but it happens.

This is difficult to make completely airtight, but we could make it much more stable:

  1. Instead of releasing all stripes, just release the n middle stripes (n>1) and leave the first and last stripes reserved. Then, in (C), re-reserve the middle stripes. This tests the same thing (as long as we have multiple middle stripes) but drastically decreases the chance of some random allocation placing memory into the vacated address hole.
  2. Reduce the stripe size from today's 4M to something much smaller. Again, this reduces the chance of stray mappings being placed into the hole, since the hole will be smaller too.

The patch does just that.
It also adds a lengthy comment.
I also needed to place the NUMASwitcher line in front of the last os::release_memory. That call is there just to clean up everything before the test ends, but now it's a multi-mapping release too (we now release the front stripe, the middle stripes, and the end stripe).
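
For illustration, the middle-stripe idea can be sketched outside HotSpot with plain POSIX calls. This is a minimal sketch, not the patch itself: mmap/munmap stand in for os::reserve_memory and os::release_memory, and every name below (default_stripe_len, reserve_adjacent_stripes, release_and_rereserve_middle) is hypothetical. It also assumes a platform where one munmap call may span several mappings, and where an address hint passed to mmap is honored if the range is free.

```cpp
// Sketch only: POSIX mmap/munmap stand in for os::reserve_memory and
// os::release_memory. All function names are hypothetical, not HotSpot code.
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

static const int kProt = PROT_NONE;
static const int kFlags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;

// A deliberately small stripe (a few pages), in the spirit of item 2 above.
size_t default_stripe_len() {
  return 4 * static_cast<size_t>(sysconf(_SC_PAGESIZE));
}

// (A) Reserve num_stripes adjacent mappings of stripe_len bytes each.
// Returns the base address, or nullptr on failure.
char* reserve_adjacent_stripes(size_t stripe_len, int num_stripes) {
  const size_t total = stripe_len * num_stripes;
  // Reserve the whole range once to obtain a free address window ...
  void* base = mmap(nullptr, total, kProt, kFlags, -1, 0);
  if (base == MAP_FAILED) return nullptr;
  // ... then overlay it stripe by stripe, one mmap call per stripe.
  for (int i = 0; i < num_stripes; i++) {
    char* want = static_cast<char*>(base) + i * stripe_len;
    void* got = mmap(want, stripe_len, kProt, kFlags | MAP_FIXED, -1, 0);
    if (got != want) { munmap(base, total); return nullptr; }
  }
  return static_cast<char*>(base);
}

// Returns true if the middle stripes could be released with a single call
// and then re-reserved at the same address.
bool release_and_rereserve_middle(size_t stripe_len, int num_stripes) {
  if (num_stripes < 4) return false;  // first + last + at least 2 middle
  char* base = reserve_adjacent_stripes(stripe_len, num_stripes);
  if (base == nullptr) return false;
  char* middle = base + stripe_len;
  const size_t middle_len = stripe_len * (num_stripes - 2);

  // (B) Release all middle stripes with one call; first and last stay mapped.
  bool ok = (munmap(middle, middle_len) == 0);

  // (C) Re-reserve the hole. The address is only a hint here, so the result
  // has to be compared against the requested address.
  void* again = MAP_FAILED;
  if (ok) {
    again = mmap(middle, middle_len, kProt, kFlags, -1, 0);
    ok = (again == middle);
  }

  // Clean up whatever is still mapped.
  if (again != MAP_FAILED && again != middle) munmap(again, middle_len);
  munmap(base, stripe_len * num_stripes);
  return ok;
}
```

A caller would run something like release_and_rereserve_middle(default_stripe_len(), 4). The window between (B) and (C) is still there, but a foreign mapping would now have to land in a small hole between two reserved stripes.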

Tests:

  • GHAs
  • manually ran the test on x64 Linux
  • SAP nightlies

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8280940: gtest os.release_multi_mappings_vm is racy

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7288/head:pull/7288
$ git checkout pull/7288

Update a local copy of the PR:
$ git checkout pull/7288
$ git pull https://git.openjdk.java.net/jdk pull/7288/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7288

View PR using the GUI difftool:
$ git pr show -t 7288

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7288.diff

bridgekeeper bot commented Jan 31, 2022

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@tstuefe tstuefe force-pushed the JDK-8280940-gtest-os.release_multi_mappings_vm-is-racy branch from 817cafc to bc634f1 Compare January 31, 2022 12:22

openjdk bot commented Jan 31, 2022

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jan 31, 2022
@tstuefe tstuefe force-pushed the JDK-8280940-gtest-os.release_multi_mappings_vm-is-racy branch 2 times, most recently from e0e73df to b35b2f0 Compare February 2, 2022 08:49
@tstuefe tstuefe force-pushed the JDK-8280940-gtest-os.release_multi_mappings_vm-is-racy branch from b35b2f0 to 36bcff1 Compare February 2, 2022 09:02
@tstuefe tstuefe marked this pull request as ready for review February 3, 2022 05:20
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 3, 2022

mlbridge bot commented Feb 3, 2022

Webrevs

Member

@dcubed-ojdk dcubed-ojdk left a comment

Thumbs up.

You may want to correct 'NUMASwitchter' in the PR so that
any searches find the correct name.

// re-reserve it. This should work unless release failed.
address p2 = (address)os::attempt_reserve_memory_at((char*)p, total_range_len);
ASSERT_EQ(p2, p);
// ...re-reserve the middle stripes. This should work unless release failed.
Member

Perhaps:
s/failed/failed silently/

just to be clear.

openjdk bot commented Feb 3, 2022

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8280940: gtest os.release_multi_mappings_vm is racy

Reviewed-by: dcubed, sjohanss

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 161 new commits pushed to the master branch:

  • 9d0a4c3: 8274238: Inconsistent type for young_list_target_length()
  • 2604a88: 8281585: Remove unused imports under test/lib and jtreg/gc
  • 534e557: 8256368: Avoid repeated upcalls into Java to re-resolve MH/VH linkers/invokers
  • 95f198b: 8274980: Improve adhoc build version strings
  • c61d629: 8281553: Ensure we only require liveness from mach-nodes with barriers
  • 2597206: 8280783: Parallel: Refactor PSCardTable::scavenge_contents_parallel
  • 2632d40: 8281637: Remove unused VerifyOption_G1UseNextMarking
  • 46f5229: 8281539: IGV: schedule approximation computes immediate dominators wrongly
  • 1ef45c5: 8280799: С2: assert(false) failed: cyclic dependency prevents range check elimination
  • 483d4b9: 8281505: Add CompileCommand PrintIdealPhase
  • ... and 151 more: https://git.openjdk.java.net/jdk/compare/be9f984caec32c3fe1deef30efe40fa115409ca0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 3, 2022
Member Author

tstuefe commented Feb 4, 2022

Thanks @dcubed-ojdk!

Member

@dcubed-ojdk dcubed-ojdk left a comment

Thumbs up.

@@ -449,31 +449,54 @@ struct NUMASwitcher {
}

Not a review (I'm not an expert in the relevant area), just a couple drive-by comments. However, GitHub UI won't let me comment on the parts that I want. So getting as close as I can.

(1) The TEST_VM line should be outdented.

(2) After 8277822, I think the tracking level is always going to be > NMT_off in a debug build, so we'll only be testing in product builds. That seems problematic.

Member Author

Hi Kim,

thanks for taking a look!

> Not a review (I'm not an expert in the relevant area), just a couple drive-by comments. However, GitHub UI won't let me comment on the parts that I want. So getting as close as I can.
>
> (1) The TEST_VM line should be outdented.

Sure.

> (2) After 8277822, I think the tracking level is always going to be > NMT_off in a debug build, so we'll only be testing in product builds. That seems problematic.

We run the gtests in all NMT modes (off, summary, default), see:

/* @test id=nmt-off
 * @summary Run NMT-related gtests with NMT switched off
 * @library /test/lib
 * @modules java.base/jdk.internal.misc
 *          java.xml
 * @run main/native GTestWrapper --gtest_filter=NMT*:os* -XX:NativeMemoryTracking=off
 */

so we run these tests in debug builds with NMT=off. The NMT gtests have been introduced with 8256844 and extended to cover the os* tests with 8277822.

That said, I should take a look at 8263464, see if this still is a problem.

Cheers, Thomas

Member Author

Okay, https://bugs.openjdk.java.net/browse/JDK-8263464 is still a problem. This means that even though we can release multiple mappings with os::release_memory, NMT cannot cope with that.

AFAIK the only real example of releasing multiple segments in one go is Windows + UseNUMA, and there os::release_memory(), when called over multiple segments, just recursively releases the segments individually. So I believe this works with NMT. Still, it would be nice to fix this.

@zhengyu123 : do you think https://bugs.openjdk.java.net/browse/JDK-8263464 is difficult to fix? Should I take a stab at it or do you want to take a look?
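
As a rough illustration of the "release the segments individually" behaviour described above, here is a toy model in portable C++. It is not the HotSpot or Windows code: SegmentTable and release_range are hypothetical names, the table merely stands in for the operating system's view of separate reservations, and the up-front check that the range exactly covers whole segments is an assumption of this sketch.

```cpp
// Hypothetical model, not the HotSpot implementation: a table of reserved
// segments stands in for the OS's bookkeeping of separate reservations.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Maps segment base address -> segment length in bytes.
using SegmentTable = std::map<uintptr_t, size_t>;

// Releases [addr, addr + len) by releasing each covered segment on its own.
// Fails without side effects unless the range is tiled exactly by a run of
// adjacent whole segments.
bool release_range(SegmentTable& table, uintptr_t addr, size_t len) {
  if (len == 0) return false;
  const uintptr_t end = addr + len;

  // First pass: verify that whole segments cover the range exactly.
  uintptr_t cursor = addr;
  while (cursor < end) {
    auto it = table.find(cursor);
    if (it == table.end()) return false;          // no segment starts here
    if (it->second == 0) return false;            // malformed entry
    if (it->second > end - cursor) return false;  // segment crosses range end
    cursor += it->second;
  }

  // Second pass: release the segments one at a time.
  cursor = addr;
  while (cursor < end) {
    auto it = table.find(cursor);
    const size_t seg_len = it->second;
    table.erase(it);  // stands in for one per-segment OS release call
    cursor += seg_len;
  }
  return true;
}
```

In this model a caller sees a single release over several segments, while the bookkeeping only ever drops one segment at a time.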

There used to not be a way to run gtests with non-default -XX:whatever
arguments and the like. But it looks like you did something about that in
JDK-8251158. I had no idea!

Member Author

:) thanks. Yes, I was afraid of missing out on tests. Since then, we have embraced this pattern of XX-specific gtests. It's a nice way to get test coverage with not much effort.

Member Author

tstuefe commented Feb 14, 2022

No takers? Need a second review.

Contributor

@kstefanj kstefanj left a comment

Looks good 👍

Took the test for some repeated spins through our test-environment and all looks good.

Member Author

tstuefe commented Feb 14, 2022

Thanks @kstefanj @kimbarrett and @dcubed-ojdk!

/integrate

openjdk bot commented Feb 14, 2022

Going to push as commit f07b816.
Since your change was applied there have been 161 commits pushed to the master branch; the list is identical to the one in the pre-integration comment above.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 14, 2022
@openjdk openjdk bot closed this Feb 14, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 14, 2022

openjdk bot commented Feb 14, 2022

@tstuefe Pushed as commit f07b816.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the JDK-8280940-gtest-os.release_multi_mappings_vm-is-racy branch February 15, 2023 06:53
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
4 participants