Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8344665: Refactor PartialArrayState allocation for reuse #22287

Closed
wants to merge 13 commits into from

Conversation

kimbarrett
Copy link

@kimbarrett kimbarrett commented Nov 20, 2024

This change splits the existing PartialArrayStateAllocator class into an
allocator class and a manager class. The allocator class is per worker
thread. The manager class provides the memory management context for a
group of allocators.

This change is in preparation for some other refactorings around partial array
state handling. That work is intended to make it easier for various
collections to adopt the use of that mechanism for chunking the processing of
large objArrays.

The new implementation for the memory management context is based on the
existing one, with an Arena per worker, now controlled by the manager object.
Potential improvements to that can be explored in the future. Some ideas
include improvements to the Arena API or a single thread-safe Arena variant
(trading slower arena allocation (which is the slow path) for less memory
usage).

G1 has a single manager, reused by each young/mixed GC. Associated state
allocators are nested in the per-worker structures, so deleted at the end of
the collection. The manager is reset at the end of the collection to allow the
memory to be recycled. It is planned that the STW full collector will also use
this manager when converted to use PartialArrayState. So it will be reused by
all STW collections.

ParallelGC has a single manager, reused by each young collection. Because the
per-worker promotion managers are never destroyed, their nested state
allocators are never destroyed. So the manager is not reset, instead leaving
previously allocated states in the allocator free lists for use by the next
collection. This means the full collector won't be able to use the same
manager object as the young collectors.

Testing: mach5 tier1-5


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8344665: Refactor PartialArrayState allocation for reuse (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287
$ git checkout pull/22287

Update a local copy of the PR:
$ git checkout pull/22287
$ git pull https://git.openjdk.org/jdk.git pull/22287/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22287

View PR using the GUI difftool:
$ git pr show -t 22287

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22287.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 20, 2024

👋 Welcome back kbarrett! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 20, 2024

@kimbarrett This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8344665: Refactor PartialArrayState allocation for reuse

Reviewed-by: tschatzl, ayang, iwalulya, zgu

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 94 new commits pushed to the master branch:

  • 5cc150c: 8342979: Start of release updates for JDK 25
  • 85fedbf: 8344607: Link Time Optimization - basic support for clang
  • 5a0899f: 8345302: Building microbenchmarks require larger Java heap
  • 1ece4f9: 8345514: Should use internal class name when calling ClassLoader.getResourceAsByteArray
  • ef8da28: 8345591: [aarch64] macroAssembler_aarch64.cpp compile fails ceil_log2 not declared
  • 7513b13: 8328944: NMT reports "unknown" memory
  • 691e692: 8345565: Remove remaining SecurityManager motivated APIs from sun.reflect.util
  • 97b8a09: 8345339: JFR: Missing javadoc for RecordingStream::onMetadata
  • 456c71d: 8343699: [aarch64] Bug in MacroAssembler::klass_decode_mode()
  • 308357c: 8345578: New test in JDK-8343622 fails with a promoted build
  • ... and 84 more: https://git.openjdk.org/jdk/compare/eac00f6d112b24b62b067a1e9cee342ab07ef021...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 20, 2024
@openjdk
Copy link

openjdk bot commented Nov 20, 2024

@kimbarrett The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Nov 20, 2024
@mlbridge
Copy link

mlbridge bot commented Nov 20, 2024

Webrevs

@kimbarrett
Copy link
Author

/label remove hotspot
/label add hotspot-gc

@openjdk openjdk bot removed the hotspot hotspot-dev@openjdk.org label Nov 20, 2024
@openjdk
Copy link

openjdk bot commented Nov 20, 2024

@kimbarrett
The hotspot label was successfully removed.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Nov 20, 2024
@openjdk
Copy link

openjdk bot commented Nov 20, 2024

@kimbarrett
The hotspot-gc label was successfully added.

@kimbarrett
Copy link
Author

kimbarrett commented Nov 20, 2024

I'm not sure why skara initially labelled this as hotspot. Re-labelled as hotspot-gc. [later] Oh, it was because of
the jfr test.

@kimbarrett
Copy link
Author

Some offline discussion with @albertnetymk led to some changes in the manager's
tracking of constructed and destructed associated allocators that I think is
simpler, and makes some parts debug-only.

Arena* _arenas;

// Limit on the number of allocators this manager supports.
uint _num_allocators;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_max_num_allocators; the comment helps, but we can add max to the name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed num_allocators to max_allocators (with or without leading underscore) throughout.

// Encodes 2 values, in an atomic unit.
// - low half: allocators constructed
// - high half: allocators destructed (debug only)
volatile CounterState _counters;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a lot of code overhead to accomodate debugging! However, if there is no easier approach, then not a blocker for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the motivation for this encoding is missing from this PR. It's not immediately clear why these two counters need to be combined in this way. Kim outlined some rationale during our offline discussion, but for the benefit of other reviewers and future readers of this code, this rationale should be documented alongside the encoding. Having those arguments written down would help assess whether the additional complexity is justified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using a union/struct instead of all that manual masking and shifting?

I agree that encoding them in this way should give a rationale for that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using some combination of union/struct a couple of different ways, but
the the result seemd significantly worse than what I published. But some
simplifications since those attempts, plus an idea for a different way of
structuring things, and I've now got something union/struct-based that seems
better.

It could be slightly further improved in syntax if I was willing to drop the
CounterState::_cf member name and make it an anonymous struct. That's a C11
feature that is provided as a C++ extension by all of our supported compilers.

You might not want to look at the difference between the two commits very
much, but instead look at the updated complete change.

Copy link
Author

@kimbarrett kimbarrett Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also expanded commentary for PartialArrayStateManager::_counters. Maybe that will help?

//
// - releasing: When an allocator is destroyed the manager transitions to this
// phase. It remains in this phase until all extent allocators associated with
// this manager have been destroyed. During this phase, new allocators man not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// this manager have been destroyed. During this phase, new allocators man not
// this manager have been destroyed. During this phase, new allocators may not

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Encodes 2 values, in an atomic unit.
// - low half: allocators constructed
// - high half: allocators destructed (debug only)
volatile CounterState _counters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using a union/struct instead of all that manual masking and shifting?

I agree that encoding them in this way should give a rationale for that.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 2, 2024
Copy link
Member

@albertnetymk albertnetymk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments/suggestions.

Comment on lines 176 to 181
uint _max_allocators;

// CounterValues is an integral type large enough to encode a pair of
// allocator counters as a single unit for atomic manipulation.
using CounterValues = LP64_ONLY(uint64_t) NOT_LP64(uint32_t);
using Counter = LP64_ONLY(uint32_t) NOT_LP64(uint16_t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the max-value has type uint, using the larger type on both 32/64 bit systems should be simpler and it should not cause any noticeable perf regression, since registering/releasing allocators should be infrequent. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed 16bits of worker threads was quite sufficient for a 32bit platform.
And I misremembered and thought 32bit platforms couldn't be relied upon for a
64bit atomic add and maybe other 64bit operations. And this code is definitely
not super performance critical. So yeah, I could drop the platform-conditional
definition of Counter. I don't think it makes much difference to the code. I
guess the type aliases could be dropped and just use bare uint32/64_t. Not
sure that's actually an improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unifying 32 and 64 bit system is an improvement -- being able to reason with concrete types. As for type aliases, it's rather subjective; I find uint*_t more familiar, but up to you.

// number of constructed allocators and one is the number of destructed
// allocators. The counters are atomic to permit concurrent construction,
// and to permit concurrent destruction. They are an atomic unit to detect
// and reject mixing the two phases, without concern for questions of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that this library can detect and reject misuse (such as mixing two phases), but I’m not sure why so much effort was spent preventing this. None of the existing users of the library are expected to mix phases in the near future. Could we instead document that mixing two phases is not permitted, and if someone chooses to do so, they do so at their own risk?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, only 2 of the nearly a dozen(?) potential clients are using this. I'm
not sure that none of them are going to have workers that do some of their
setup after being started. Hence the desire to support concurrency. And if
that, then I feel better about it if there's some usage validation. But maybe
it would be better to just throw a lock at the problem. And if it turns out
none of the use-cases end up needing that concurrency, then I won't object to
a little bit of code simplification.

@kimbarrett
Copy link
Author

Since y'all (especially @albertnetymk ) seem to really dislike the phase checking,
here's a version without.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 3, 2024
Copy link
Member

@albertnetymk albertnetymk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the simplification.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 3, 2024
Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still good.

: _arenas(NEW_C_HEAP_ARRAY(Arena, max_allocators, mtGC)),
_max_allocators(max_allocators),
_registered_allocators(0),
_released_allocators(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_released_allocators is a debug only variable, should fail in release build

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted. It seems I haven't done a release build since my final touch-up to make it debug-only.
I'll push an update once it's been through our CI.

head->~FreeListEntry();
p = head;
}
return ::new (p) PartialArrayState(src, dst, index, length, initial_refcount);
}

void PartialArrayStateAllocator::Impl::release(uint worker_id, PartialArrayState* state) {
void PartialArrayStateAllocator::release(PartialArrayState* state) {
size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why release order is needed here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the usual reference counting dance. Except, where did the
acquire disappear to? There should be an acquire on the refcount == 0 branch!
Looks like I accidentally deleted it. Sigh. Not too surprisingly, lots of
tests were run without noticing that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. My next question is that, if PartialArrayState is ever crossed thread boundaries, it is through job stealing via task queues. Can we depend on barriers of task queue to ensure the memory safety? because we don't need any additional barriers for objects popped/stole from task queues in other places.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's needed. The purpose of this release/acquire pair is to ensure there
is a happens-before chain between the use of a State and it's cleanup/reuse.
Transfers through the taskqueue don't help with that.

In other places, we have operations on one side of the taskqueue that need to
be ordered wrto operations on the other side of the taskqueue. We don't have
that here.

Consider two threads which have both obtained access to a State. (At least
one of them must have obtained it via stealing from another thread.) Assume
these are the last two references to the State (it's refcount == 2), and no
further tasks for it will be needed. These two threads will use the State
(getting source/destination, claiming a chunk), and then release the State,
decrementing its refcount. So one of them will decrement to 0. We need to
ensure the cleanup that follows that can't corrupt the use by the accesses
made by the other thread, by ensuring those accesses happen-before the
cleanup. There is no intervening taskqueue manipulation in this scenario.
The operations we need to order are all on the same side of the taskqueue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 3, 2024
Copy link
Contributor

@zhengyu123 zhengyu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 4, 2024
@kimbarrett
Copy link
Author

Thanks for all the reviews and discussion.

@kimbarrett
Copy link
Author

/integrate

@openjdk
Copy link

openjdk bot commented Dec 5, 2024

Going to push as commit dbf48a5.
Since your change was applied there have been 94 commits pushed to the master branch:

  • 5cc150c: 8342979: Start of release updates for JDK 25
  • 85fedbf: 8344607: Link Time Optimization - basic support for clang
  • 5a0899f: 8345302: Building microbenchmarks require larger Java heap
  • 1ece4f9: 8345514: Should use internal class name when calling ClassLoader.getResourceAsByteArray
  • ef8da28: 8345591: [aarch64] macroAssembler_aarch64.cpp compile fails ceil_log2 not declared
  • 7513b13: 8328944: NMT reports "unknown" memory
  • 691e692: 8345565: Remove remaining SecurityManager motivated APIs from sun.reflect.util
  • 97b8a09: 8345339: JFR: Missing javadoc for RecordingStream::onMetadata
  • 456c71d: 8343699: [aarch64] Bug in MacroAssembler::klass_decode_mode()
  • 308357c: 8345578: New test in JDK-8343622 fails with a promoted build
  • ... and 84 more: https://git.openjdk.org/jdk/compare/eac00f6d112b24b62b067a1e9cee342ab07ef021...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 5, 2024
@openjdk openjdk bot closed this Dec 5, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 5, 2024
@openjdk
Copy link

openjdk bot commented Dec 5, 2024

@kimbarrett Pushed as commit dbf48a5.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants