8338912: CDS: Segmented roots array #20858

Closed

shipilev
Member

@shipilev shipilev commented Sep 4, 2024

Attempting to drop the min region alignment with JDK-8337828 highlights an interesting problem. The roots array we create at dump time can easily be larger than the min region alignment. We are currently "lucky" that none of our tests hit this limit. AFAICS, about 128K classes would be enough to hit the current 1M min region alignment. Dropping the min region alignment to 256K makes the test fail with "only" 30K classes, JDK-8338856.

We can slice that heap root array, and thus untie the roots count from the min region alignment. I am submitting something that works, but this might not be the final form for it. I would like @iklam to poke holes in this approach :)
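The sizing arithmetic behind these numbers can be sketched roughly as follows. This is a minimal sketch, not code from the patch: the 16-byte array header, the 8-byte reference size, and the names `max_elems_per_segment` and `segments_needed` are assumptions made for illustration. Under those assumptions a 1M array holds 131070 roots, which happens to match one of the values in the map file diff quoted later in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative assumptions, not values taken from the patch:
// a 16-byte object array header and 8-byte (uncompressed) references.
constexpr std::size_t kArrayHeaderBytes = 16;
constexpr std::size_t kRefBytes = 8;

// Largest number of roots that fits in one array of at most max_bytes.
constexpr std::size_t max_elems_per_segment(std::size_t max_bytes) {
  return (max_bytes - kArrayHeaderBytes) / kRefBytes;
}

// Number of segments needed to hold roots_count roots (rounding up).
inline std::size_t segments_needed(std::size_t roots_count, std::size_t max_bytes) {
  assert(max_bytes > kArrayHeaderBytes);
  const std::size_t per_segment = max_elems_per_segment(max_bytes);
  return (roots_count + per_segment - 1) / per_segment;
}

// With a 1M limit one array holds 131070 roots; with 256K it holds 32766.
static_assert(max_elems_per_segment(1024 * 1024) == 131070, "1M segment");
static_assert(max_elems_per_segment(256 * 1024) == 32766, "256K segment");
```

Once the array is sliced, the roots count only determines how many segments are needed, not how large any single allocation has to be.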

Additional testing:

  • macos-aarch64-server-fastdebug, runtime/cds
  • linux-aarch64-server-fastdebug, all
  • linux-x86_64-server-fastdebug, all

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8338912: CDS: Segmented roots array (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20858/head:pull/20858
$ git checkout pull/20858

Update a local copy of the PR:
$ git checkout pull/20858
$ git pull https://git.openjdk.org/jdk.git pull/20858/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20858

View PR using the GUI difftool:
$ git pr show -t 20858

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20858.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Sep 4, 2024

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Sep 4, 2024

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8338912: CDS: Segmented roots array

Reviewed-by: ccheung, iklam

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 112 new commits pushed to the master branch:

  • 89ca89c: 8338626: ClassLoaderExt::process_jar_manifest() should allow / separator on Windows
  • 3e0da58: 8333843: Provide guidelines on MemorySegment to read strings with known lengths
  • 3c4d15b: 8334301: Errors in jpackage man page
  • 4d01178: 8339927: Man page update for deprecating jhsdb debugd for removal
  • bd44cf8: 8330302: strace004 can still fail
  • 8a4ea09: 8336492: Regression in lambda serialization
  • 358ff19: 8339727: Open source several AWT focus tests - series 1
  • 0c36177: 8340089: Simplify SegmentBulkOperations::powerOfProperty
  • bacd046: 8321010: RISC-V: C2 RoundVF
  • 5709c37: 8340081: Test java/foreign/TestLinker.java failed failed: missing permission java.lang.foreign.native.threshold.power.fill
  • ... and 102 more: https://git.openjdk.org/jdk/compare/1353601dcc8f9ec3e12dea21dc61b3585a154b13...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 4, 2024
@openjdk

openjdk bot commented Sep 4, 2024

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Sep 4, 2024
Member

@iklam iklam left a comment

I think the design is good. I have some suggestions for readability and simplification.

Member

@iklam iklam left a comment

Looks good. Just some nits on naming, etc.

Contributor

@jianglizhou jianglizhou left a comment

@shipilev Glad to see this change. As mentioned in yesterday's premain meeting, we ran into the single roots array scalability issue trivially when experimenting with real world applications back in 2021. At the time, I reworked it to use a 'linked-roots-array' solution to accommodate a large number of roots. If the required size was larger than the limit, multiple 'roots' arrays were allocated. The last element in the current 'roots' array contained the next 'roots' array. The last element in the last 'roots' array was NULL. The multi-roots-array solution works with GC automatically, and resolves the scalability problem. Your segmented roots array change is in the same direction and is necessary to make CDS workable for real world usages.
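The 'linked-roots-array' idea described above can be modeled roughly like this. It is a simplified sketch under stated assumptions: `RootsChunk` and `get_root` are hypothetical names, and the reserved last element is modeled as a separate `next` pointer rather than as an actual array slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one 'roots' array: the payload slots plus a link
// that plays the role of the reserved last element described above.
struct RootsChunk {
  std::vector<const void*> slots;  // roots stored in this chunk
  RootsChunk* next = nullptr;      // next chunk, or nullptr for the last one
};

// Walks the chain to find the root with the given global index.
// Returns nullptr if the index is past the end of the chain.
inline const void* get_root(const RootsChunk* head, std::size_t index) {
  for (const RootsChunk* c = head; c != nullptr; c = c->next) {
    if (index < c->slots.size()) {
      return c->slots[index];
    }
    index -= c->slots.size();  // skip this chunk and keep walking
  }
  return nullptr;
}
```

Lookup cost grows with the number of chunks in this model, which is one trade-off against a layout where the segment can be computed directly from the index.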

@shipilev
Member Author

shipilev commented Sep 9, 2024

@shipilev Glad to see this change. As mentioned in yesterday's premain meeting, we ran into the single roots array scalability issue trivially when experimenting with real world applications back in 2021.

Good to know this is not only the problem with our tests, but also a real-world issue!

At the time, I reworked it to use a 'linked-roots-array' solution to accommodate a large number of roots. If the required size was larger than the limit, multiple 'roots' arrays were allocated. The last element in the current 'roots' array contained the next 'roots' array.

Right, I actually started with something like this: an array of slices, like Object[][]. But then I quickly realized that CDS manipulates "objects" outside of the heap, so constructing any object graph draws massive frowns from the GC barrier code, and it comes with a headache for relocation. Notably, storing a reference to a segment anywhere, such as in a first-level array, is problematic.

What we arrived at here is basically Object[][], but without a first-level array: we just use the layout encoded by HeapRootSegments to identify where the segments are instead.
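A rough sketch of how a root index can be resolved without a first-level array, assuming equally sized segments laid out back to back. `SegmentLayout`, `RootLocation`, `locate`, and `segment_count` are hypothetical names for illustration and are not the actual `HeapRootSegments` interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the layout description: every segment holds up
// to max_elems roots, so the segment for an index can be computed directly.
struct SegmentLayout {
  std::size_t roots_count;  // total number of roots
  std::size_t max_elems;    // capacity of every segment except possibly the last
};

struct RootLocation {
  std::size_t segment;  // which segment holds the root
  std::size_t offset;   // position inside that segment
};

// Maps a global root index to its segment and in-segment position.
inline RootLocation locate(const SegmentLayout& layout, std::size_t index) {
  assert(index < layout.roots_count);
  return RootLocation{index / layout.max_elems, index % layout.max_elems};
}

// Number of segments implied by the layout (rounding up).
inline std::size_t segment_count(const SegmentLayout& layout) {
  return (layout.roots_count + layout.max_elems - 1) / layout.max_elems;
}
```

Because the mapping is pure arithmetic over the layout, no reference to a segment has to be stored inside another object.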

@shipilev
Member Author

shipilev commented Sep 9, 2024

I don't quite understand the Windows DeterministicDump failure. Is there anything specific about CDS and Windows that makes it only fail there?

@iklam
Member

iklam commented Sep 9, 2024

I don't quite understand the Windows DeterministicDump failure. Is there anything specific about CDS and Windows that makes it only fail there?

This kind of failure can usually be diagnosed by diffing the map files that are generated by the test case. Let me try to reproduce it on my side.

@iklam
Member

iklam commented Sep 9, 2024

I don't quite understand the Windows DeterministicDump failure. Is there anything specific about CDS and Windows that makes it only fail there?

This kind of failure can usually be diagnosed by diffing the map files that are generated by the test case. Let me try to reproduce it on my side.

Hmm, since you haven't changed the logic of how the archive is created, except that you moved the allocation of the root array(s) from the end to the beginning of the heap objects, I think it's probably caused by this:

class HeapRootSegments {
private:
  size_t _base_offset;
  size_t _count;
  int _roots_count;
  int _max_size_in_bytes;
  int _max_size_in_elems;
  // uninitialized padding on Windows

Maybe explicitly add an int _unused there?
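To make the padding visible, here is a small sketch that mirrors the field order quoted above. It assumes a typical 64-bit ABI where `size_t` is 8 bytes and `int` is 4 bytes; the struct names and the `padding_bytes` constant are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Same field order as the class quoted above (names without underscores).
struct Segments {
  std::size_t base_offset;
  std::size_t count;
  int roots_count;
  int max_size_in_bytes;
  int max_size_in_elems;
  // On a typical 64-bit ABI, 4 bytes of tail padding follow here.
};

// Variant with the padding turned into a named field that can be zeroed.
struct SegmentsExplicit {
  std::size_t base_offset;
  std::size_t count;
  int roots_count;
  int max_size_in_bytes;
  int max_size_in_elems;
  int unused;
};

// Bytes occupied by padding: object size minus the sum of the field sizes.
constexpr std::size_t padding_bytes =
    sizeof(Segments) - (2 * sizeof(std::size_t) + 3 * sizeof(int));
```

With the explicit field, every byte of the object belongs to a named member, so ordinary initialization covers the whole representation.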

Member

@iklam iklam left a comment

Looks good to me. Just two small nits.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 9, 2024
@iklam
Member

iklam commented Sep 10, 2024

I don't quite understand the Windows DeterministicDump failure. Is there anything specific about CDS and Windows that makes it only fail there?

This kind of failure can usually be diagnosed by diffing the map files that are generated by the test case. Let me try to reproduce it on my side.

Hmm, since you haven't changed the logic of how the archive is created, except that you moved the allocation of the root array(s) from the end to the beginning of the heap objects, I think it's probably caused by this:

class HeapRootSegments {
private:
  size_t _base_offset;
  size_t _count;
  int _roots_count;
  int _max_size_in_bytes;
  int _max_size_in_elems;
  // uninitialized padding on Windows

Maybe explicitly add an int _unused there?

Diff of the map files confirms this:

- heap_root_segments.roots_count:         2628
- heap_root_segments.seg_max_size_elems:  1048576
- heap_root_segments.seg_max_size_bytes:  131070
....
131c131
< 0x0000000000000300:   0010000000000a44 0000020a0001fffe 0000000000000002 000000000000f022   
---
> 0x0000000000000300:   0010000000000a44 000001e20001fffe 0000000000000002 000000000000f022   

    2628 = 0x00000a44
 1048576 = 0x00100000
  131070 = 0x0001fffe

The 0x000001e2 is garbage.
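The two dump lines can be decoded by splitting each 64-bit word into 32-bit halves. This is a small sketch, assuming the map file prints each word as a native little-endian 64-bit value; `lo32`, `hi32`, and the constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Low and high 32-bit halves of a 64-bit word as printed in the map file.
constexpr std::uint32_t lo32(std::uint64_t word) {
  return static_cast<std::uint32_t>(word);
}
constexpr std::uint32_t hi32(std::uint64_t word) {
  return static_cast<std::uint32_t>(word >> 32);
}

constexpr std::uint64_t kWord0 = 0x0010000000000a44ULL;   // identical in both dumps
constexpr std::uint64_t kWord1A = 0x0000020a0001fffeULL;  // first dump
constexpr std::uint64_t kWord1B = 0x000001e20001fffeULL;  // second dump

static_assert(lo32(kWord0) == 2628, "0x00000a44");
static_assert(hi32(kWord0) == 1048576, "0x00100000");
static_assert(lo32(kWord1A) == 131070, "0x0001fffe");
static_assert(lo32(kWord1A) == lo32(kWord1B), "the int fields match");
static_assert(hi32(kWord1A) != hi32(kWord1B), "only the upper half differs");
```

The three `int` values are identical in both dumps; only the upper half of the second word, which corresponds to no field, changes between runs.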

@shipilev
Member Author

Diff of the map files confirms this:

- heap_root_segments.roots_count:         2628
- heap_root_segments.seg_max_size_elems:  1048576
- heap_root_segments.seg_max_size_bytes:  131070
....
131c131
< 0x0000000000000300:   0010000000000a44 0000020a0001fffe 0000000000000002 000000000000f022   
---
> 0x0000000000000300:   0010000000000a44 000001e20001fffe 0000000000000002 000000000000f022   

    2628 = 0x00000a44
 1048576 = 0x00100000
  131070 = 0x0001fffe

The 0x000001e2 is garbage.

Whoa, that looks fragile! It feels safer to memset(0) the entire header then? Whack-a-mole-ing the alignment paddings every time we add a field is not convenient.

@shipilev
Member Author

Whoa, that looks fragile! It feels safer to memset(0) the entire header then? Whack-a-mole-ing the alignment paddings every time we add a field is not convenient.

Testing memset here, but would probably integrate it separately as JDK-8339830.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 10, 2024
@shipilev
Member Author

Whoa, that looks fragile! It feels safer to memset(0) the entire header then? Whack-a-mole-ing the alignment paddings every time we add a field is not convenient.

Testing memset here, but would probably integrate it separately as JDK-8339830.

Hm, maybe I am misunderstanding this. I see there is the initialization here:

  _header = (FileMapHeader*)os::malloc(header_size, mtInternal);
  memset((void*)_header, 0, header_size);
  _header->populate(this,

...and here we read the header in its entirety:

  _header = (FileMapHeader*)os::malloc(gen_header->_header_size, mtInternal);
  os::lseek(fd, 0, SEEK_SET); // reset to begin of the archive
  size_t size = gen_header->_header_size;
  size_t n = ::read(fd, (void*)_header, (unsigned int)size);

I'll look around map files to understand it better.

@iklam
Member

iklam commented Sep 10, 2024

Diff of the map files confirms this:

- heap_root_segments.roots_count:         2628
- heap_root_segments.seg_max_size_elems:  1048576
- heap_root_segments.seg_max_size_bytes:  131070
....
131c131
< 0x0000000000000300:   0010000000000a44 0000020a0001fffe 0000000000000002 000000000000f022   
---
> 0x0000000000000300:   0010000000000a44 000001e20001fffe 0000000000000002 000000000000f022   

    2628 = 0x00000a44
 1048576 = 0x00100000
  131070 = 0x0001fffe

The 0x000001e2 is garbage.

Whoa, that looks fragile! It feels safer to memset(0) the entire header then? Whack-a-mole-ing the alignment paddings every time we add a field is not convenient.

Calling memset on the header first is not going to help. The random bits are added by this call:

void set_heap_root_segments(HeapRootSegments segments) { _heap_root_segments = segments; }

segments is passed in by value on the stack. Apparently VC++ will not zero out the unused bits when pushing segments onto the stack, but will insist on copying the unused bits when doing the assignment.

Maybe you can overload the assignment operator to copy the fields individually. This will stop VC++ from copying the garbage, so it will leave the unused slot in its original (zeroed) state.
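A field-by-field assignment operator along those lines might look like the sketch below. The `Segments` type and the `dirty_tail_bytes` helper are hypothetical. Note that the C++ standard leaves padding contents unspecified, so this relies on the common behavior that storing to individual members does not touch the padding bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type with the same field layout as the class discussed here.
struct Segments {
  std::size_t base_offset;
  std::size_t count;
  int roots_count;
  int max_size_in_bytes;
  int max_size_in_elems;

  // Copy field by field so that the destination's padding bytes are not
  // overwritten with whatever the source happens to hold there.
  Segments& operator=(const Segments& other) {
    base_offset = other.base_offset;
    count = other.count;
    roots_count = other.roots_count;
    max_size_in_bytes = other.max_size_in_bytes;
    max_size_in_elems = other.max_size_in_elems;
    return *this;
  }
};

// Counts nonzero bytes after the last field, i.e. in the tail padding.
inline std::size_t dirty_tail_bytes(const Segments& s) {
  unsigned char raw[sizeof(Segments)];
  std::memcpy(raw, &s, sizeof(Segments));
  const std::size_t start = offsetof(Segments, max_size_in_elems) + sizeof(int);
  std::size_t dirty = 0;
  for (std::size_t i = start; i < sizeof(Segments); i++) {
    if (raw[i] != 0) dirty++;
  }
  return dirty;
}
```

The downside, raised elsewhere in this thread, is that the operator has to be kept in sync by hand whenever a field is added.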

@shipilev
Member Author

shipilev commented Sep 10, 2024

segments is passed in by value on the stack. Apparently VC++ will not zero out the unused bits when pushing segments onto the stack, but will insist on copying the unused bits when doing the assignment.

Maybe you can overload the assignment operator to copy the fields individually. This will stop VC++ from copying the garbage, so it will leave the unused slot in its original (zeroed) state.

Hrmpf. I thought implicit copy constructors do this right. Apparently not. Let me try and define the explicit copy constructor + assignment operator...

@shipilev
Member Author

shipilev commented Sep 10, 2024

Hrmpf. I thought implicit copy constructors do this right. Apparently not. Let me try and define the explicit copy constructor + assignment operator...

All right, I think this works. We still explicitly default the trivial copy constructor and assignment operator, which allows them to copy the entire representation, but we also make sure that the representation has no garbage in its gaps.

I am running this through our testing. @iklam, if you can run this through your tests, it would be helpful too.
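One way to get a representation without garbage in its gaps while keeping the copies trivial is to zero the whole object in the constructor before setting the fields. The sketch below illustrates that general technique under stated assumptions. It is not the code from the patch, and `Segments` and `same_bytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical class with the same field layout as the one discussed here.
class Segments {
  std::size_t _base_offset;
  std::size_t _count;
  int _roots_count;
  int _max_size_in_bytes;
  int _max_size_in_elems;

public:
  // Zero the whole representation first, padding included, then set the
  // fields in the body (a member initializer list would be wiped by memset).
  Segments(std::size_t base_offset, std::size_t count, int roots_count,
           int max_size_in_bytes, int max_size_in_elems) {
    std::memset(static_cast<void*>(this), 0, sizeof(*this));
    _base_offset = base_offset;
    _count = count;
    _roots_count = roots_count;
    _max_size_in_bytes = max_size_in_bytes;
    _max_size_in_elems = max_size_in_elems;
  }

  // Copies stay trivial, so they may copy the representation as a whole.
  Segments(const Segments&) = default;
  Segments& operator=(const Segments&) = default;

  int roots_count() const { return _roots_count; }
};

static_assert(std::is_trivially_copyable<Segments>::value,
              "copies are expected to remain trivial");

// True if two objects have byte-identical representations.
inline bool same_bytes(const Segments& a, const Segments& b) {
  return std::memcmp(&a, &b, sizeof(Segments)) == 0;
}
```

Two objects built from the same arguments then compare equal byte for byte, which is the property a deterministic dump needs.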

@iklam
Member

iklam commented Sep 10, 2024

Hrmpf. I thought implicit copy constructors do this right. Apparently not. Let me try and define the explicit copy constructor + assignment operator...

All right, I think this works. We still explicitly default the trivial copy constructor and assignment operator, which allows them to copy the entire representation, but we also make sure that the representation has no garbage in its gaps.

I am running this through our testing. @iklam, if you can run this through your tests, it would be helpful too.

I ran bacc5d8 in our CI for tiers 1-4 and didn't see any regressions.

@shipilev
Member Author

shipilev commented Sep 11, 2024

Our testing passes here as well. I also changed MIN_GC_REGION_ALIGNMENT = 256K locally, and ran runtime/cds tests with -XX:-UseCompressedOops, which was one of the scenarios that used to run into problems. These pass as well.

I think we are done here. We need a second Reviewer, maybe @calvinccheung?

Member

@calvinccheung calvinccheung left a comment

Looks good. I have a couple of minor suggestions.
I'm also running tiers 1 - 4 testing with your patch. Results look good so far.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 12, 2024
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 13, 2024
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 13, 2024
@shipilev
Member Author

Thanks for the reviews!

/integrate

@openjdk

openjdk bot commented Sep 16, 2024

Going to push as commit dc00eb8.
Since your change was applied there have been 123 commits pushed to the master branch:

  • 74add0e: 8340105: Expose BitMap::print_on in release builds
  • 0e0f10f: 8340102: Move assert-only loop in OopMapSort::sort under debug macro
  • a0794e0: 8339639: Opensource few AWT PopupMenu tests
  • a8f143c: 8306679: com/sun/jdi/InterruptHangTest.java asserts with -Xcomp -Dmain.wrapper=Virtual options
  • c91fa27: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass
  • fa502ec: 8339943: Frame not disposed in java/awt/dnd/DropActionChangeTest.java
  • fdfe503: 8335288: SunPKCS11 initialization will call C_GetMechanismInfo on unsupported mechanisms
  • 3aa8338: 8340075: Autoconf bundle cannot run on read-only filesystem
  • 37bf589: 8339847: Broken link to the dieharder distribution website in SplittableRandom
  • 89c172a: 8340082: Use inline return tag in java.base
  • ... and 113 more: https://git.openjdk.org/jdk/compare/1353601dcc8f9ec3e12dea21dc61b3585a154b13...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 16, 2024
@openjdk openjdk bot closed this Sep 16, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 16, 2024
@openjdk

openjdk bot commented Sep 16, 2024

@shipilev Pushed as commit dc00eb8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8338912-cds-segmented-roots branch January 8, 2025 12:46
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
4 participants