8255917: runtime/cds/SharedBaseAddress.java failed "assert(reserved_rgn != 0LL) failed: No reserved region" #1750

Closed
wants to merge 3 commits into from

Contributor

@yminqi yminqi commented Dec 11, 2020

Hi, please review.
(This is a redo of #1657.)
On Windows, a file cannot be mapped into memory that is already reserved. When mapping the CDS archive, we first reserve enough memory and then release it just before mapping. When the CDS archive is used together with the class space, the whole reserved space has to be split into two spaces. To do that, we have to release the whole space first and then reserve the two parts again, which is problematic: another thread or the system can step in and take the released space before it is reserved again.
This fix is the first of two steps:

  1. Do not split reserved memory;
  2. Remove splitting memory.

This fix is the first step. On Windows, when the requested mapping address is used, the space for the CDS archive and the space for the class space (ccs) are reserved separately but contiguously, so there is no need to call split. If either reservation fails, we fall back to the other path, but we do not do a 'real' split of the whole reserved space; the whole region stays reserved and is released as a whole.
Also fixed: when loading the shared archive fails, the bitmap region must be unmapped, otherwise the reserved/committed size calculation for NMT is mismatched.
Fixed the reserved region name used when adding a committed region for NMT: it should use the reserved region's name, not the default "Unknown".
Added a test case for the failure path caused by a class path mismatch.

Tests: tier1-5, tier7
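
To make the race in the description concrete, here is a minimal sketch in Java. It is a toy model only, not HotSpot code: `ToyAddressSpace` and all of its methods are hypothetical names, and the "intruder" callback stands in for another thread or the OS taking address space during the window between release and re-reserve.

```java
import java.util.TreeMap;

/** Toy model of an address space: maps each reserved region's start address to its size. */
final class ToyAddressSpace {
    private final TreeMap<Long, Long> reserved = new TreeMap<>();

    /** Reserves [addr, addr + size) if it overlaps nothing; returns whether it succeeded. */
    boolean reserve(long addr, long size) {
        Long below = reserved.floorKey(addr + size - 1);
        if (below != null && below + reserved.get(below) > addr) {
            return false; // overlaps an existing reservation
        }
        reserved.put(addr, size);
        return true;
    }

    /** Releases the region that starts exactly at addr; returns whether one existed. */
    boolean release(long addr) {
        return reserved.remove(addr) != null;
    }

    /** Split by release + re-reserve; the intruder runs in the window between the two steps. */
    static boolean splitByReleaseAndReserve(ToyAddressSpace as, long base,
                                            long first, long second, Runnable intruder) {
        as.release(base);
        intruder.run(); // another thread or the OS may reserve here
        return as.reserve(base, first) && as.reserve(base + first, second);
    }

    /** Reserves two adjacent regions up front; nothing is released, so there is no window. */
    static boolean reserveSeparately(ToyAddressSpace as, long base, long first, long second) {
        if (!as.reserve(base, first)) {
            return false;
        }
        if (!as.reserve(base + first, second)) {
            as.release(base); // undo the first reservation and let the caller fall back
            return false;
        }
        return true;
    }

    /** The whole region is reserved, then split while an intruder grabs part of it. */
    static boolean demoSplitWithIntruder() {
        ToyAddressSpace as = new ToyAddressSpace();
        as.reserve(0x1000, 0x3000);
        return splitByReleaseAndReserve(as, 0x1000, 0x1000, 0x2000,
                                        () -> as.reserve(0x1800, 0x100));
    }

    /** Both regions are reserved separately; a later intruder cannot get in. */
    static boolean demoReserveSeparately() {
        ToyAddressSpace as = new ToyAddressSpace();
        boolean ok = reserveSeparately(as, 0x1000, 0x1000, 0x2000);
        boolean intruderGotIn = as.reserve(0x1800, 0x100);
        return ok && !intruderGotIn;
    }

    public static void main(String[] args) {
        System.out.println("split with intruder succeeded: " + demoSplitWithIntruder());
        System.out.println("separate reservation succeeded: " + demoReserveSeparately());
    }
}
```

In this model the split fails once the intruder has taken part of the released range, while the separate reservations never expose such a window.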


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8255917: runtime/cds/SharedBaseAddress.java failed "assert(reserved_rgn != 0LL) failed: No reserved region"

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1750/head:pull/1750
$ git checkout pull/1750

bridgekeeper bot commented Dec 11, 2020

👋 Welcome back minqi! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 11, 2020

openjdk bot commented Dec 11, 2020

@yminqi The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 11, 2020

mlbridge bot commented Dec 11, 2020

Webrevs

* @requires vm.cds
* @library /test/lib
* @compile test-classes/Hello.java
* @run main/timeout=240 MismatchedPathTriggerMemoryRelease
Member

Does this test require more than the default timeout (120s) to run?

Contributor Author

> Does this test require more than the default timeout (120s) to run?

No, I will fix it. Thanks.

Comment on lines +56 to +61
execOutput = TestCommon.exec("non-exist.jar",
                             "-Xshare:auto",
                             "-Xlog:os,cds=debug",
                             "-XX:NativeMemoryTracking=detail",
                             "-XX:SharedBaseAddress=0",
                             "Hello");
Member

Instead of TestCommon.exec, you could use TestCommon.execAuto; then there is no need to pass the -Xshare:auto argument.

Contributor Author

I changed it to execAuto and removed "-Xshare:auto", but it failed with "Error: Could not find or load main class non-exist.jar", so I will keep the original version.

Member

I think you'll need to add "-cp" before "non-exist.jar" for execAuto to work.
I'm fine to leave it as is.
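
For illustration, a rough sketch of why the execAuto attempt ended in "Could not find or load main class non-exist.jar". This is a simplified model and an assumption about the helpers, not the actual TestCommon code: `execStyle`, `execAutoStyle` and `mainClassOf` are hypothetical stand-ins, assuming that exec turns its first argument into "-cp <jar>" while execAuto passes all of its arguments through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LaunchArgsSketch {
    /** Hypothetical stand-in for exec(appJar, suffix...): the first argument becomes the class path. */
    static List<String> execStyle(String appJar, String... suffix) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-cp", appJar));
        cmd.addAll(Arrays.asList(suffix));
        return cmd;
    }

    /** Hypothetical stand-in for execAuto(suffix...): adds -Xshare:auto, no implicit class path. */
    static List<String> execAutoStyle(String... suffix) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-Xshare:auto"));
        cmd.addAll(Arrays.asList(suffix));
        return cmd;
    }

    /**
     * Simplified launcher rule: the first argument that is neither an option
     * nor the value of -cp/-classpath is taken as the main class.
     */
    static String mainClassOf(List<String> cmd) {
        for (int i = 1; i < cmd.size(); i++) {
            String arg = cmd.get(i);
            if (arg.equals("-cp") || arg.equals("-classpath")) {
                i++; // skip the class path value
                continue;
            }
            if (!arg.startsWith("-")) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(mainClassOf(execStyle("non-exist.jar", "-Xshare:auto", "Hello")));
        System.out.println(mainClassOf(execAutoStyle("non-exist.jar", "Hello")));
        System.out.println(mainClassOf(execAutoStyle("-cp", "non-exist.jar", "Hello")));
    }
}
```

Under these assumptions, the jar name is only kept out of the main-class position when "-cp" precedes it, which matches the suggestion above.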

Member

@calvinccheung calvinccheung left a comment

One more comment regarding the test. Up to you if you want to make the change.

@@ -1723,6 +1723,9 @@ bool os::release_memory(char* addr, size_t bytes) {
   } else {
     res = pd_release_memory(addr, bytes);
   }
   if (!res) {
     log_info(os)("os::release_memory(" PTR_FORMAT ", " SIZE_FORMAT ") failed", p2i(addr), bytes);
Member

I think it's better to use "os::release_memory failed (" PTR_FORMAT ", " SIZE_FORMAT ")". That way it's easy to match the error message in the test case with a simple substring test of "os::release_memory failed". Otherwise it's hard to see that the regexp in the test "os::release_memory\\(0x[0-9a-fA-F]*,\\s[0-9]*\\)\\sfailed" indeed would match the error message.
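
A small sketch comparing the two checks. The sample log lines are made up for illustration; they assume PTR_FORMAT prints a 0x-prefixed hex address and SIZE_FORMAT prints a decimal size. `ReleaseLogMatch` is a hypothetical helper, not part of the test.

```java
import java.util.regex.Pattern;

final class ReleaseLogMatch {
    // Made-up sample messages in the original and in the suggested format.
    static final String ORIGINAL =
        "os::release_memory(0x0000000800000000, 1073741824) failed";
    static final String SUGGESTED =
        "os::release_memory failed (0x0000000800000000, 1073741824)";

    // The regular expression quoted in the comment above.
    static final Pattern REGEX = Pattern.compile(
        "os::release_memory\\(0x[0-9a-fA-F]*,\\s[0-9]*\\)\\sfailed");

    /** The regex check against the original message format. */
    static boolean regexMatchesOriginal() {
        return REGEX.matcher(ORIGINAL).find();
    }

    /** The plain substring check against the suggested message format. */
    static boolean substringMatchesSuggested() {
        return SUGGESTED.contains("os::release_memory failed");
    }

    /** The plain substring check does not work on the original format. */
    static boolean substringMatchesOriginal() {
        return ORIGINAL.contains("os::release_memory failed");
    }

    public static void main(String[] args) {
        System.out.println("regex on original:       " + regexMatchesOriginal());
        System.out.println("substring on suggested:  " + substringMatchesSuggested());
        System.out.println("substring on original:   " + substringMatchesOriginal());
    }
}
```

Both checks find their message, but the substring version can be verified by eye, which is the point of the suggestion.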

Contributor Author

Thanks for the suggestion. I will simplify the check in the test. (I did test the pattern with an online Java pattern matcher and with a local test program.)

Thanks
Yumin

Member

@iklam iklam left a comment

LGTM

openjdk bot commented Dec 14, 2020

@yminqi This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255917: runtime/cds/SharedBaseAddress.java failed "assert(reserved_rgn != 0LL) failed: No reserved region"

Reviewed-by: ccheung, iklam, stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 24 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 14, 2020
Member

@calvinccheung calvinccheung left a comment

Looks good.

Contributor Author

yminqi commented Dec 14, 2020

@iklam @calvinccheung Thanks for the review. I will wait for @tstuefe's input before integrating.

Member

tstuefe commented Dec 14, 2020

Hi Yumin,

I don't understand the special handling in NMT.

Unix: We reserve a region, split it, tell NMT about the split (see posix version of os::split_reserved_memory), then later release both parts separately.
Windows:

  1. use_archive_base_addr = true: we allocate the spaces separately, later release them separately
  2. use_archive_base_addr = false: we allocate total space, leave it unsplit, then later release the total space.

Where is this new NMT coding needed?

Thanks, Thomas

Contributor Author

yminqi commented Dec 14, 2020

Hi, Thomas

> I don't understand the special handling in NMT.
>
> Unix: We reserve a region, split it, tell NMT about the split (see posix version of os::split_reserved_memory), then later release both parts separately.

There is no problem on Linux/Unix here, since the split there does not do a real split.

> Windows:
>
>   1. use_archive_base_addr = true: we allocate the spaces separately, later release them separately

Yes, the two spaces are released separately, and NMT will find exactly matching regions to remove.

>   2. use_archive_base_addr = false: we allocate total space, leave it unsplit, then later release the total space.

The space is reserved as a whole and we did not do the 'split' on it, so the release should be on the whole too. When unmapping regions, releasing a region in fact does nothing in NMT, since the region lies within the archive space. Since the new code releases the whole space (there is no call to release class_space_rs in this case), it checks whether the size is greater than archive_space_rs (with tag mtClassShared); if so, we know the whole space is being released. The only space with tag mtClassShared that is bigger than archive_space_rs should be the whole space.

> Where is this new NMT coding needed?

The code is there to make the byte counts in the snapshot correct: when all the spaces are released, the bytes in the reserved slot should be zero.

Thanks for review.

Member

tstuefe commented Dec 15, 2020

> Hi, Thomas
>
> > I don't understand the special handling in NMT.
> > Unix: We reserve a region, split it, tell NMT about the split (see posix version of os::split_reserved_memory), then later release both parts separately.
>
> There is no problem on Linux/Unix here, since the split there does not do a real split.
>
> > Windows:
> >
> >   1. use_archive_base_addr = true: we allocate the spaces separately, later release them separately
>
> Yes, the two spaces are released separately, and NMT will find exactly matching regions to remove.
>
> >   2. use_archive_base_addr = false: we allocate total space, leave it unsplit, then later release the total space.
>
> The space is reserved as a whole and we did not do the 'split' on it, so the release should be on the whole too. When unmapping regions, releasing a region in fact does nothing in NMT, since the region lies within the archive space. Since the new code releases the whole space (there is no call to release class_space_rs in this case), it checks whether the size is greater than archive_space_rs (with tag mtClassShared); if so, we know the whole space is being released. The only space with tag mtClassShared that is bigger than archive_space_rs should be the whole space.

Okay. Let's see if I understand:

For (2) and for Unix, we reserve the total area. Then we tell NMT to semantically split it up into two regions (MemTracker::record_virtual_memory_split_reserved...); later we tag the first one with "mtClassShared", the second with "mtClass". Then, if initializing the archive fails, we now release the total area. Since NMT has only the split view, it finds the mtClassShared entry at the base address, which is of smaller size. We run into the if (size > reserved_rgn->size()) { case in virtualMemoryTracker.cpp:517. We release both separately.

Right?

Also: the section at virtualMemoryTracker.cpp:509 if (reserved_rgn->contain_region(addr, size)) { should only be needed for Posix, right? We should never run into the problem that we unmap a section within another section on Windows? So, in theory, we should be able to make this a WINDOWS_ONLY(ShouldNotReachHere())?
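
To check the understanding above against something executable, here is a toy model of the book-keeping in Java. It is a sketch only and not the code in virtualMemoryTracker.cpp: `ToyRegionTracker` and its methods are hypothetical, and the release logic is one possible way to handle a request that is larger than the region found at the base address.

```java
import java.util.TreeMap;

/** Toy tracker: maps each tracked region's start address to its size and tag. */
final class ToyRegionTracker {
    private static final class Region {
        final long size;
        final String tag;

        Region(long size, String tag) {
            this.size = size;
            this.tag = tag;
        }
    }

    private final TreeMap<Long, Region> regions = new TreeMap<>();

    void reserve(long addr, long size, String tag) {
        regions.put(addr, new Region(size, tag));
    }

    /** Book-keeping only split: one tracked region becomes two adjacent ones with their own tags. */
    void split(long addr, long firstSize, String firstTag, String secondTag) {
        Region whole = regions.get(addr);
        regions.put(addr, new Region(firstSize, firstTag));
        regions.put(addr + firstSize, new Region(whole.size - firstSize, secondTag));
    }

    /**
     * Releases [addr, addr + size). If the request is larger than the region found
     * at addr, the following adjacent regions are released too, provided that
     * together they cover the request exactly.
     */
    boolean release(long addr, long size) {
        long end = addr + size;
        long cursor = addr;
        while (cursor < end) { // pass 1: the request must be covered by whole regions
            Region r = regions.get(cursor);
            if (r == null || cursor + r.size > end) {
                return false;
            }
            cursor += r.size;
        }
        regions.subMap(addr, end).clear(); // pass 2: drop every region in [addr, end)
        return true;
    }

    /** Total bytes still tracked as reserved. */
    long reservedBytes() {
        long total = 0;
        for (Region r : regions.values()) {
            total += r.size;
        }
        return total;
    }

    public static void main(String[] args) {
        ToyRegionTracker nmt = new ToyRegionTracker();
        nmt.reserve(0x1000, 0x3000, "mtClassShared");          // total area
        nmt.split(0x1000, 0x1000, "mtClassShared", "mtClass"); // semantic split only
        System.out.println("released whole area: " + nmt.release(0x1000, 0x3000));
        System.out.println("reserved bytes left: " + nmt.reservedBytes());
    }
}
```

In this model, releasing the total area after the semantic split removes both entries and leaves zero reserved bytes, which is the outcome the accounting fix is after.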

--

Going forward, I wonder whether NMT should have general support for releasing multiple mappings in one go. Since we introduced the concept of "artificial split" with MemTracker::record_virtual_memory_split_reserved... we may want to have now a corresponding "multi-release". Or, Idk, maybe just remove the MemTracker::record_virtual_memory_split_reserved and the corresponding code in NMT release? and just live with the fact that for a "shallow split" the numbers in NMT are accounted to the wrong tag (mtClassShared gets mtClass).

> > Where is this new NMT coding needed?
>
> The code is there to make the byte counts in the snapshot correct: when all the spaces are released, the bytes in the reserved slot should be zero.
>
> Thanks for review.

The rest looks good AFAICS.

For JDK17, I really hope we can simplify and streamline this coding. Maybe if we consider one or two things:

  • just do File IO always on Windows? I did make a test and I may be wrong but I could not measure any startup performance loss compared to the multiple mappings.
  • instead of many little mmap calls, just mmap the whole archive right at reservation time alongside the class space reservation. That would increase the chance that everything goes well, and reduce the complexity of the cleanup if it doesn't.
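
As a loose analogy for the two ideas, here is a sketch in Java NIO rather than in HotSpot's own C++: one method loads a file with plain file IO, the other maps the whole file with a single call. `LoadWholeFile` and the temporary file are made up for the example, and nothing here says anything about startup performance.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class LoadWholeFile {
    /** Plain file IO: read the whole file into a heap buffer. */
    static ByteBuffer readAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) >= 0) {
                // keep reading until the buffer is full or end of file is reached
            }
            buf.flip();
            return buf;
        }
    }

    /** One mapping call that covers the whole file instead of many small ones. */
    static MappedByteBuffer mapAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Writes a small temporary file and checks that both approaches see the same bytes. */
    static boolean sameContent() {
        try {
            Path tmp = Files.createTempFile("archive-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            byte[] data = new byte[64 * 1024];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            Files.write(tmp, data);
            return readAll(tmp).equals(mapAll(tmp));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same content: " + sameContent());
    }
}
```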

What do you think, does that make sense?

Thanks, Thomas

Member

@tstuefe tstuefe left a comment

LGTM

Contributor Author

yminqi commented Dec 15, 2020

@tstuefe Thanks for the review. Next, os::split_reserved_memory will be removed (https://bugs.openjdk.java.net/browse/JDK-8256213).

Contributor Author

yminqi commented Dec 15, 2020

/integrate

@openjdk openjdk bot closed this Dec 15, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 15, 2020

openjdk bot commented Dec 15, 2020

@yminqi Since your change was applied there have been 27 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit 36e2097.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@yminqi yminqi deleted the jdk-8255917 branch December 15, 2020 17:48
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
4 participants