Skip to content

8301106: Allow archived Java strings to be moved by GC #12607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

iklam
Copy link
Member

@iklam iklam commented Feb 16, 2023

Background:

Currently, the archived java strings are mapped in the G1 "closed archive" region. This essentially pins all the strings in memory.

As a prerequisite for (JDK-8296263), this PR removes the requirement of pinning the archived strings. This will allow the CDS archive heap to be mapped in garbage collectors that do not support object pinning.

Code changes:

  • The archived strings are referenced through an objArray (_shared_strings_array) to keep them alive. As a result, it's no longer necessary to pin them.
  • Since it's possible for the GC to move these strings, the _shared_table in stringTable.cpp is modified to store a 32-bit index for each archived string. This index is used to retrieve the archived string from _shared_strings_array at runtime.

Note that CDS has a limit on the size of archived objArrays. When there's a large number of strings, we use a two-level table. See the comments around _shared_strings_array in the header file.

Testing

Tiers 1 - 4


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8301106: Allow archived Java strings to be moved by GC

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12607/head:pull/12607
$ git checkout pull/12607

Update a local copy of the PR:
$ git checkout pull/12607
$ git pull https://git.openjdk.org/jdk.git pull/12607/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12607

View PR using the GUI difftool:
$ git pr show -t 12607

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12607.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 16, 2023

👋 Welcome back iklam! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 16, 2023
@openjdk
Copy link

openjdk bot commented Feb 16, 2023

@iklam The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Feb 16, 2023
@mlbridge
Copy link

mlbridge bot commented Feb 16, 2023

Webrevs

void HeapShared::archive_strings() {
oop shared_strings_array = StringTable::init_shared_table(_dumped_interned_strings);
bool success = archive_reachable_objects_from(1, _default_subgraph_info, shared_strings_array, /*is_closed_archive=*/ false);
guarantee(success, "shared strings array should not point to any unachivable objects");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: unachivable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working through this. A few comments.

void HeapShared::archive_strings() {
oop shared_strings_array = StringTable::init_shared_table(_dumped_interned_strings);
bool success = archive_reachable_objects_from(1, _default_subgraph_info, shared_strings_array, /*is_closed_archive=*/ false);
guarantee(success, "shared strings array must point to only archivable objects");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could cause this to fail? Do we need a more graceful bailout in release builds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should never fail. I could've used an assert but made it a guarantee since this is rather new code and I am paranoid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what stops us from hitting this:

        // Don't archive a subgraph root that's too big. For archives static fields, that's OK
        // as the Java code will take care of initializing this field dynamically.
        return false;

in HeapShared::archive_reachable_objects_from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code to make it structurally impossibly to reach this point, and added asserts to check for that. Explanations are here:

void HeapShared::archive_strings() {
  oop shared_strings_array = StringTable::init_shared_table(_dumped_interned_strings);
  bool success = archive_reachable_objects_from(1, _default_subgraph_info, shared_strings_array, /*is_closed_archive=*/ false);
  // We must succeed because:
  // - _dumped_interned_strings do not contain any large strings.
  // - StringTable::init_shared_table() doesn't create any large arrays.
  assert(success, "shared strings array must not point to arrays or strings that are too large to archive");
  StringTable::set_shared_strings_array_index(append_root(shared_strings_array));
}

There are other asserts in the handling of _dumped_interned_strings and shared_strings_array to check their object sizes.

@@ -73,7 +73,7 @@ const size_t REHASH_LEN = 100;
const double CLEAN_DEAD_HIGH_WATER_MARK = 0.5;

#if INCLUDE_CDS_JAVA_HEAP
bool StringTable::_two_dimensional_shared_strings_array = false;
bool StringTable::_is_two_dimensional_shared_strings_array = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name seems excessive, but you're the one who had to keep typing it in.

@mlbridge
Copy link

mlbridge bot commented Feb 21, 2023

Mailing list message from Ioi Lam on hotspot-runtime-dev:

On 2/20/2023 5:16 PM, David Holmes wrote:

I used the scope to mark that the two archiving operations must be done within the scope of

{
init_seen_objects_table();
....
delete_seen_objects_table();
}

And the `delete_seen_objects_table()` must be called before the rest of the function can proceed.
That seems an unusual convention when the code executes sequentially anyway.

I moved the code block to a new function.

Thanks
- Ioi

//
// [bits 31 .. 14][ bits 13 .. 0 ]
// primary_index secondary_index
const static int _secondary_array_index_bits = 14;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure secondary array bits are selected such that the secondary array is less than ArchiveHeapWriter::MIN_GC_REGION_ALIGNMENT.
But I believe it would make the code more robust if we calculate the secondary array length dynamically by determining the appropriate length of the array for a given size.
For instance by querying objArrayOopDesc for the appropriate length of the array for an object of ArchiveHeapWriter::MIN_GC_REGION_ALIGNMENT bytes, and then set the _secondary_array_index_bits according to the result.
It would also alleviate the problem of how to best divide the 32 bits between secondary and primary array.
Does that make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted _secondary_array_index_bits to be a compile-time constant so that the related code can be optimized by the C++ compiler. However, to calculate it at compile time would require the use of constexpr, but objArrayOopDesc::object_size(int) can't easily be made constexpr because it eventually reads the runtime constant heapOopSize.

Instead, I added a function StringTable::verify_secondary_array_index_bits() to check that its value isn't too big.

Note that I used 14 as that will be the eventually value when we support Shenandoah. Also, using a large value means the SharedStringsStress needs a larger data set, so I stick with 14.

This will still support up to 28M strings so it's should be OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iklam thanks for adding those comments in StringTable::verify_secondary_array_index_bits.

…ng for large objects; added StringTable::verify_secondary_array_index_bits()
@ashu-mehra
Copy link
Contributor

lgtm!

// refer to more than 16384 * 16384 = 26M interned strings! Not a practical concern
// but bail out for safety.
log_error(cds)("Too many strings to be archived: " SIZE_FORMAT, _items_count);
os::_exit(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I hadn't really noticed that the CDS code has introduced its own direct exit path using os::_exit in a number of places. Why does it do this instead of using the existing VM exit paths?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is explained in metaspaceShared.cpp, but perhaps we should consolidate all these direct exit calls (in a separate PR) to an utility function so it's easy to track (and explain) all the calls.

  // There may be pending VM operations. We have changed some global states
  // (such as vmClasses::_klasses) that may cause these VM operations
  // to fail. For safety, forget these operations and exit the VM directly.
  os::_exit(0);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vm_direct_exit would seem to be okay in that case.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing further from me. Changes look good. Thanks.

const static int _secondary_array_index_mask = _secondary_array_max_length - 1;

// make sure _secondary_array_index_bits is not too big
static void verify_secondary_array_index_bits() PRODUCT_RETURN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be NOT_DEBUG_RETURN, as the function definition is DEBUG only.

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 29, 2023

@iklam This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@iklam iklam changed the title 8301106 allow cds interned strings to be moved by gc 8301106: Allow archived Java strings to be moved by GC Mar 29, 2023
@openjdk
Copy link

openjdk bot commented Mar 29, 2023

@iklam This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8301106: Allow archived Java strings to be moved by GC

Reviewed-by: dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 7 new commits pushed to the master branch:

  • 69152c3: 8305202: Fix Copyright Header in ZonedDateTimeFormatterBenchmark
  • 438c969: 8304976: Optimize DateTimeFormatterBuilder.ZoneTextPrinterParser.getTree()
  • d063b89: 8303392: Runtime.exec and ProcessBuilder.start should use System logger
  • be764a7: 8302814: Delete unused CountLoopEnd instruct with CmpX
  • 34f4d7f: 8304759: Add BitMap iterators
  • 42df1a9: 8304991: Redundant hyphen in @param results in double-dash in javadocs
  • e3855d0: 8304840: Dangling CharacterCodingException in a few javadoc descriptions

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 29, 2023
@iklam
Copy link
Member Author

iklam commented Mar 29, 2023

Thanks @ashu-mehra and @dholmes-ora for the review. I merged with latest repo and passed tiers 1~4.
/integrate

@openjdk
Copy link

openjdk bot commented Mar 29, 2023

Going to push as commit b524a74.
Since your change was applied there have been 8 commits pushed to the master branch:

  • 9643f65: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC
  • 69152c3: 8305202: Fix Copyright Header in ZonedDateTimeFormatterBenchmark
  • 438c969: 8304976: Optimize DateTimeFormatterBuilder.ZoneTextPrinterParser.getTree()
  • d063b89: 8303392: Runtime.exec and ProcessBuilder.start should use System logger
  • be764a7: 8302814: Delete unused CountLoopEnd instruct with CmpX
  • 34f4d7f: 8304759: Add BitMap iterators
  • 42df1a9: 8304991: Redundant hyphen in @param results in double-dash in javadocs
  • e3855d0: 8304840: Dangling CharacterCodingException in a few javadoc descriptions

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 29, 2023
@openjdk openjdk bot closed this Mar 29, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 29, 2023
@openjdk
Copy link

openjdk bot commented Mar 29, 2023

@iklam Pushed as commit b524a74.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants