Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8255917: runtime/cds/SharedBaseAddress.java failed "assert(reserved_rgn != 0LL) failed: No reserved region" #1657

Closed
wants to merge 35 commits into from

Conversation

@yminqi
Copy link
Contributor

@yminqi yminqi commented Dec 7, 2020

Hi, Please review
Windows mapping for file into memory could not happen to reserved memory. In mapping CDS archive we first reserve enough memory then before mapping, release them. For cds archive and using class space, need split the whole space into two, that is, release the whole reserved space and do reservation to the two split spaces again, which is problematic that there is possibility other thread or system can kick in to take the released space.
The fix is the first step of two steps:

  1. Do not split reserved memory;
  2. Remove splitting memory.
    This fix is first step, for Windows and use requested mapping address, reserved for cds archive and ccs on a contiguous space separately, so there is no need to call split. If any reservation failed, release them, go to other way, but do not do the 'real' split either. For Windows (and using class space), the reserved space will be released anyway.

Tests:tier1-5,tier7


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8255917: runtime/cds/SharedBaseAddress.java failed "assert(reserved_rgn != 0LL) failed: No reserved region"

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1657/head:pull/1657
$ git checkout pull/1657

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Dec 7, 2020

👋 Welcome back minqi! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Dec 7, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Dec 7, 2020

@yminqi The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Dec 7, 2020

Webrevs

Copy link
Member

@tstuefe tstuefe left a comment

Hi Yumin,

small request wrt to commenting.

We should test this with all permutations of NativeMemoryTracking and Xshare. But I guess the standard tests do this already?

Cheers, Thomas

src/hotspot/share/memory/metaspaceShared.cpp Outdated Show resolved Hide resolved
@yminqi
Copy link
Contributor Author

@yminqi yminqi commented Dec 7, 2020

Hi Yumin,

small request wrt to commenting.

We should test this with all permutations of NativeMemoryTracking and Xshare. But I guess the standard tests do this already?

Cheers, Thomas

Thanks for the review! There are permutation on standard tests, I will give another look if they cover all the cases again. I think it is enough so no need for a new test case but will check again.

Thanks
Yumin

// the gap reserved at the end of the archive space.
archive_space_rs = total_rs.first_part(ccs_begin_offset,
(size_t)os::vm_allocation_granularity(),
/*split=*/false);
Copy link
Member

@iklam iklam Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since /*split=*/false is passed, the region is no longer split using os::split_reserved_memory(). This means on Windows, we cannot use os::release_memory() to free the individual regions of archive_space_rs and class_space_rs.

The Windows API of VirtualFree says:

Releases the specified region of pages, or placeholder [...], dwSize must be 0 (zero), and lpAddress must point to the base address returned by the VirtualAlloc function when the region is reserved. The function fails if either of these conditions is not met.

I suggest we do this:

  • Add an extra ReservedSpace& total_rs parameter to MetaspaceShared::reserve_address_space_for_archives(). Return the total_rs when we go through this path of the code.
  • Also pass total_rs to MetaspaceShared::release_reserved_spaces. If total_rs.is_reserverd() is true, release total_rs instead of the two smaller spaces.

To make sure this PR is correct, we should add something like the following in os::release_memory(), and check for this log in test/hotspot/jtreg/runtime/cds/SharedBaseAddress.java:

if (!res) {
  log_info(os)("os::release_memory(" PTR_FORMAT ", " SIZE_FORMAT ") failed", p2i(addr), bytes);
}

Perhaps, in a separate RFE, we should add an assert in os::release_memory(), or at least change the log_info to log_warning.

Copy link
Member

@tstuefe tstuefe Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ioi,

Where is the release you worry about? Since on Windows, since 8255978, os::release_memory() notices if the region the caller wants release does not correspond exactly to a memory mapping at the OS level, and will assert. Do you see that assert?

I try to understand:

if useBaseAddress==true, Yumin now creates two separate mappings, and can release them individually
if useBaseAddress==false, there is one mapping as before, but we split now shallow. But we don't release it since we use file IO to read into it. If someone were to release one of those, we should see an assert on Windows.
(I am a tiny bit unhappy about the increasing complexity of the patch, since it negates some of the work done to simplify it back in June.)

About tracing, since 8256864 we trace Virtualxxx calls, so the tracing is already there - we trace VirtualFree() errrors for "os=info". If you only care for windows, that tracing could suffice.

Thanks, Thomas

Copy link
Member

@iklam iklam Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If useBaseAddress==false, after the total_space is successfully mapped, subsequent operations mail fail. For example,

  • mapinfo->map_regions() may fail to commit the necessary memory for doing os::read().
  • mapinfo->validate_shared_path_table() may fail because the runtime classpath is not compatible

In these cases, we need to call MetaspaceShared::release_reserved_spaces().

Do you see that assert?

Hmm, I think we should add a new test for this specifically: java -XX:ArchiveRelocationMode=1 -cp mispatched.jar to force the failure in mapinfo->validate_shared_path_table().

If you only care for windows, that tracing could suffice.

Since we are changing the split parameter for all platforms, I think we should test for all platforms, not just windows.

Copy link
Member

@iklam iklam Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test case can be something like this:

$ java -XX:DumpLoadedClassList=HelloWorld.classlist -cp HelloWorld.jar HelloWorld
Hello World
$ java -Xshare:dump -XX:SharedClassListFile=HelloWorld.classlist -cp HelloWorld.jar \
       -XX:SharedArchiveFile=tmp.jsa -XX:SharedBaseAddress=0 
$ touch XXX.jar
$ java -Xshare:auto -XX:SharedArchiveFile=tmp.jsa -cp XXX.jar:HelloWorld.jar -Xlog:cds \
       -showversion HelloWorld
...
[0.034s][info][cds] Archive(s) were created with -XX:SharedBaseAddress=0. Always map at os-selected address.
[0.034s][info][cds] Try to map archive(s) at an alternative address
[0.034s][info][cds] Mapped static  region #0 at base 0x00007f21f7800000 top 0x00007f21f7806000 (MiscCode)
[0.034s][info][cds] Mapped static  region #1 at base 0x00007f21f7806000 top 0x00007f21f7a1d000 (ReadWrite)
[0.034s][info][cds] Mapped static  region #2 at base 0x00007f21f7a1d000 top 0x00007f21f7daf000 (ReadOnly)
[0.048s][info][cds] UseSharedSpaces: shared class paths mismatch (hint: enable -Xlog:class+path=info to diagnose the failure)
[0.048s][info][cds] Unmapping region #0 at base 0x00007f21f7800000 (MiscCode)
[0.048s][info][cds] Unmapping region #1 at base 0x00007f21f7806000 (ReadWrite)
[0.048s][info][cds] Unmapping region #2 at base 0x00007f21f7a1d000 (ReadOnly)
[0.048s][info][cds] UseSharedSpaces: Unable to map shared spaces
java version "16-internal" 2021-03-16
Java(TM) SE Runtime Environment (slowdebug build 16-internal+0-adhoc.iklam.open)
Java HotSpot(TM) 64-Bit Server VM (slowdebug build 16-internal+0-adhoc.iklam.open, mixed mode)
Hello World

You can set a breakpoint at MetaspaceShared::release_reserved_spaces() to make sure it's called with total_rs.

Note that -Xshare:auto must be used.

If -Xshare:on is used, the VM will exit immediately without calling MetaspaceShared::release_reserved_spaces(). Most of the CDS tests are executed with -Xshare:on. That's why we didn't see the assert on Windows with Yumin's earlier patch -- in Mach5 tier 4, we run test/hotspot/jtreg/runtime/cds/appcds/BootClassPathMismatch.java with -XX:ArchiveRelocationMode=1, but the test itself would use with -Xshare:on.

Copy link
Contributor Author

@yminqi yminqi Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When jar mismatched, say, -cp non-exist.jar (We could not use -cp no-exitst.jar:hello.jar since, once hello.jar the dump time jar found and matched, the non-exist.jar will be ignored, test showed it is OK), the output:
[0.035s][info ][cds] Mapped static region #0 at base 0x00007f98e3800000 top 0x00007f98e3806000 (MiscCode)
[0.035s][info ][cds] Mapped static region #1 at base 0x00007f98e3806000 top 0x00007f98e3a1e000 (ReadWrite)
[0.035s][info ][cds] Mapped static region #2 at base 0x00007f98e3a1e000 top 0x00007f98e3db2000 (ReadOnly)
[0.049s][info ][cds] UseSharedSpaces: shared class paths mismatch (hint: enable -Xlog:class+path=info to diagnose the failure)
[0.049s][info ][cds] Unmapping region #0 at base 0x00007f98e3800000 (MiscCode)
[0.049s][info ][cds] Unmapping region #1 at base 0x00007f98e3806000 (ReadWrite)
[0.049s][info ][cds] Unmapping region #2 at base 0x00007f98e3a1e000 (ReadOnly)
[0.049s][debug][cds] Released shared space 0x00007f98e3800000
[0.049s][debug][cds] Released shared space (archive + class) 0x00007f98e3800000
[0.049s][info ][cds] UseSharedSpaces: Unable to map shared spaces

The repeat will be eliminated next version from here:
1489 } else {
1490 unmap_archive(static_mapinfo);
1491 unmap_archive(dynamic_mapinfo);
1492 log_debug(cds)("Released shared space " INTPTR_FORMAT, p2i(total_space_rs.base()));
1493 release_reserved_spaces(total_space_rs, archive_space_rs, class_space_rs);
1494 }

Copy link
Member

@iklam iklam Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that if you use -cp non-exist.jar:hello.jar, the non-exist.jar will be ignored because it doesn't exist, and CDS will not treat the classpath as a mismatch.

However, in my example, I created the XXX.jar file, so CDS will treat the classpath as a mismatch.

shipilev and others added 20 commits Dec 8, 2020
…cided on incompatible initial and minimum heap sizes

Reviewed-by: tschatzl, sjohanss
…verification

Reviewed-by: thartmann, neliasso
Reviewed-by: tschatzl, stefank
… and not abstract

Reviewed-by: mchung, darcy
… a offset-computing method handle

Reviewed-by: mcimadamore, chegar
Co-authored-by: Vicente Romero <vromero@openjdk.org>
Co-authored-by: Harold Seigel <hseigel@openjdk.org>
Reviewed-by: lfoltan, mchung, alanb, mcimadamore, chegar
…ngly marked oops

Reviewed-by: shade, rkennke
…args record component

Reviewed-by: mcimadamore
…in size computation for heap segments

Reviewed-by: jvernee, chegar
Reviewed-by: asemenyuk, almatvee, shade
@yminqi
Copy link
Contributor Author

@yminqi yminqi commented Dec 8, 2020

Please check 03. 02 is generated when merge with most current and remote head not updated correctly. After set remote head correct, 03 is regenerated and is correct one for review. Thanks

…sed after archive loading failed; Unmap bitmap after archive failure. Fixed reserved region name for adding reserved region.
@yminqi
Copy link
Contributor Author

@yminqi yminqi commented Dec 11, 2020

This branch has many conflicts, something wrong since push-02, closed this PR and will send a single patch in new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment