Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8312018: Improve zero-base-optimized reservation of class space #14867

Conversation

tstuefe
Copy link
Member

@tstuefe tstuefe commented Jul 13, 2023

TL;DR This patch introduces a new reservation API to reserve memory in low address space; depending on the OS, it may use optimized placement techniques. That new API is used to optimize the placement of class space and CDS for zero-based encoding.

A future RFE will use the same API to optimize the zero-based heap reservation and thereby consolidate a lot of coding. We also plan to use this API in other places, e.g. for Shenandoah CollectionSet reservation.


With CDS off or at dump time, we currently attempt to optimize class space location by reserving in low address ranges.

We do this by examining the java heap end (which has been allocated at that point) and, if that had been allocated in lower address regions, attempt to allocate adjacent to it. Essentially, we piggyback on what we hope for is an optimized heap placement. If that fails, we attempt to map at HeapBaseMinAddress.

This approach has many disadvantages:

  • it depends on the VM either using CompressedOops and getting a zero-based heap or the region around HeapBaseMinAddress being free.

  • HeapBaseMinAddress is an odd choice: it is 2G on all platforms, for reasons unknown to me, but that denies us half of the valuable low address range below 4G right away.

  • We only get 1 shot. It's either one of these two addresses.

  • And we only use this strategy for CDS=off or CDS=dumptime; we don't use it for the CDS-runtime-fallback-case when attaching to the primary attach point failed.

  • It assumes narrow Klass encoding uses the same geometry (bit size, shift) as narrow Oops, which is not guaranteed with future developments (lilliput).

  • It actually reduces the chance of getting a zero-based java heap. This is because when attempting to place the heap, we leave a gap for what we assume will be the later class space. That gap is CompressedClassSpaceSize bytes, which is often grossly over-dimensioned. A zero-based heap is more valuable than a zero-based class space. Therefore the heap should get the best chance of low-address heap reservation.

  • It introduces an unnecessary dependency between heap reservation, narrow Oop encoding, and class space reservation. That makes the code base brittle.

  • Getting the heap region to place class space adjacent to it is actually tricky. We lack a common get-heaprange-API because ZGC. This code misuses the CompressedOops interface. But CompressedOops is the encoding range and thus only loosely correlated to the heap range (the latter must contain the former). The fact that CompressedOops::end() returns the heap range end can be seen as aberration since the actual encoding range end usually far outreaches the heap range end. But as long this code relies on it returning the heap range end, we cannot fix that.


This RFE instead proposes a different approach:

  • Let us have an API, os::attempt_reserve_memory_below(address max, ...). This API will do its best to reserve memory with a given size and alignment below a given max address. It will, on supporting OSes, attempt to use OS-specific means to find a suitable address space hole to place the reservation in. Otherwise, it will do the typical ladder-reservation approach with an adjustable maximum number of tries.

  • Let's use this API to reserve zero-base-friendly class space. Let's remove all knowledge of heap and CompressedOops. Now we are independent of what the heap does. It may or may not be located in lower address ranges. If it is, the new API will work around it and find a gap to place the class space if it is not, even better.

  • Let's remove the "leave a gap for class space" logic from Heap reservation. We don't need it. It is harmful: Heap should have the best chance for zero-based - if I only can have one, I rather have a zero-based heap than a zero-based class space.

The end result will be a JVM with a much better chance to get zero-based class space and zero-based heap; we will have removed dependencies between heap and class space; we will have an API that can be used for similar problems (e.g. an obvious future enhancement would be to use this new reservation API for zero-based heap reservation as well, and other places could use it too, eg. Shenandoah CollectionSet reservation).


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8312018: Improve zero-base-optimized reservation of class space (Enhancement - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14867/head:pull/14867
$ git checkout pull/14867

Update a local copy of the PR:
$ git checkout pull/14867
$ git pull https://git.openjdk.org/jdk.git pull/14867/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14867

View PR using the GUI difftool:
$ git pr show -t 14867

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14867.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 13, 2023

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jul 13, 2023

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jul 13, 2023
@tstuefe tstuefe force-pushed the JDK-8312018-Improve-zero-base-optimized-reservation-of-class-space branch 5 times, most recently from 194e7c4 to 814675f Compare July 13, 2023 17:38
@tstuefe tstuefe force-pushed the JDK-8312018-Improve-zero-base-optimized-reservation-of-class-space branch from 814675f to 22fe1e4 Compare July 13, 2023 17:41
@tstuefe tstuefe marked this pull request as ready for review July 14, 2023 10:00
@tstuefe
Copy link
Member Author

tstuefe commented Jul 14, 2023

Pinging @iklam

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 14, 2023
@mlbridge
Copy link

mlbridge bot commented Jul 14, 2023

Webrevs

Copy link
Contributor

@rkennke rkennke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar enough with these places of HotSpot (and the OSes, for that matter), but I have questions/comments.

src/hotspot/share/memory/metaspace.cpp Outdated Show resolved Hide resolved
src/hotspot/os/linux/os_linux.cpp Show resolved Hide resolved
@tstuefe
Copy link
Member Author

tstuefe commented Jul 14, 2023

I am not familiar enough with these places of HotSpot (and the OSes, for that matter), but I have questions/comments.

Thank you Roman. I worked in your feedback; while testing I found an off-by-one, and a minor flaw with tracing.

About your procfs question, this should be safe. We only do this once, at start, and have a reasonable fallback. Note that hotspot already reads from this file for other purposes, it seems to work well.

Copy link
Contributor

@rkennke rkennke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, but somebody else who is more familiar with these things should review it as well. Thank you!

@openjdk
Copy link

openjdk bot commented Jul 14, 2023

@tstuefe This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 14, 2023
}

// ...failing that, reserve anywhere, but let platform do optimized placement:
// ...otherwise let JVM chose the best placing:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/chose/choose/

Comment on lines 183 to 187
// Convert pointer to uintptr_t
inline uintptr_t p2u(const volatile void* p) {
return (uintptr_t) p;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems overkill for one use.

@tstuefe
Copy link
Member Author

tstuefe commented Jul 18, 2023

Thanks @dholmes-ora . I worked in your feedback.

@iklam
Copy link
Member

iklam commented Jul 18, 2023

I think we should document the interaction with "-Xshare:dump". Maybe we should add comments that the value of CompressedKlass::base() is irrelevant to the dumped CDS archive when running "java -Xshared:dump", because of this code.

I.e., if CDS is enabled, we always use a non-zero based encoding.

narrowKlass ArchiveBuilder::get_requested_narrow_klass(Klass* k) {
  assert(DumpSharedSpaces, "sanity");
  k = get_buffered_klass(k);
  Klass* requested_k = to_requested(k);
  address narrow_klass_base = _requested_static_archive_bottom; // runtime encoding base == runtime mapping start
  const int narrow_klass_shift = ArchiveHeapWriter::precomputed_narrow_klass_shift;
  return CompressedKlassPointers::encode_not_null(requested_k, narrow_klass_base, narrow_klass_shift);
}

Comment on lines +1775 to +1787
char* os::get_lowest_attach_address() {
return (char*)os::vm_allocation_granularity();
}

char* os::get_highest_attach_address() {
return (char*)(
#ifdef _LP64
(128 * 1024 * G)
#else
SIZE_MAX
#endif
- os::vm_page_size());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what "attach" means in this sense. If it's the usable address range, wouldn't it need to be OS-specific?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, highest and lowest usable address. You are right in that its OS specific. I'll move these to the respective OS files.

// cause the reserved range to nestle alongside the heap.
{
// First try for zero-base zero-shift (lower 4G); failing that, try for zero-based with max shift (lower 32G)
constexpr int num_tries = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_tries should be computed instead of hard-coded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? In pre-existing code that does similar things, we always hardcode them implicitly (typically by attempt-mapping from A->B in hardcoded stride C). And how would I calculate it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I am a bit unsure about the meaning of the num_tries parameter. It looks like Windows and Linux will scan the system's memory map and look for the first hole that's larger than size. However, if there are a lot of small holes at lower addresses, then doing 8 tries won't find a large enough block, even though such a block may exist below unscaled_max.

The default implementation will try to find a free block with fixed steps. In this case, it seems like

num_tries = (unscaled_max + size - 1) / size;

would be more appropriate. Otherwise if size changes in the future (to be smaller), again you won't be able to find an appropriate block, even if one exists.

So I am not sure if the caller passing in an arbitrary num_tries parameter is a good idea. Maybe the API needs to be redesigned.

@tstuefe
Copy link
Member Author

tstuefe commented Jul 19, 2023

I think we should document the interaction with "-Xshare:dump". Maybe we should add comments that the value of CompressedKlass::base() is irrelevant to the dumped CDS archive when running "java -Xshared:dump", because of this code.

Okay. Maybe in a different RFE? Since its a bit tangential to what this patch does.

I.e., if CDS is enabled, we always use a non-zero based encoding.

Not necessarily. If CDS is enabled and we don't get the preferred mapping address, we will fallback to traditional Klass range reservation and potentially go zero based. With this patch, that path is optimized too.

@tstuefe tstuefe marked this pull request as draft July 25, 2023 14:38
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 25, 2023
@tstuefe
Copy link
Member Author

tstuefe commented Jul 26, 2023

@iklam @rkennke @dholmes-ora Thanks for your feedback. I'll close this PR in favor of a fresh one, since I did some considerable changes to the API and I don't want to flood your mailboxes with skara spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org
4 participants