8344665: Refactor PartialArrayState allocation for reuse #22287
Conversation
👋 Welcome back kbarrett! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
@kimbarrett This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 94 new commits pushed to the target branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message into the target branch, type /integrate in a new comment.
@kimbarrett The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
/label remove hotspot
I'm not sure why skara initially labelled this as …
Some offline discussion with @albertnetymk led to some changes in the manager's …
Arena* _arenas;

// Limit on the number of allocators this manager supports.
uint _num_allocators;
_max_num_allocators; the comment helps, but we can add max to the name.
Changed num_allocators to max_allocators (with or without leading underscore) throughout.
// Encodes 2 values, in an atomic unit.
// - low half: allocators constructed
// - high half: allocators destructed (debug only)
volatile CounterState _counters;
Seems like a lot of code overhead to accommodate debugging! However, if there is no easier approach, then not a blocker for me.
I feel the motivation for this encoding is missing from this PR. It's not immediately clear why these two counters need to be combined in this way. Kim outlined some rationale during our offline discussion, but for the benefit of other reviewers and future readers of this code, this rationale should be documented alongside the encoding. Having those arguments written down would help assess whether the additional complexity is justified.
What about using a union/struct instead of all that manual masking and shifting?
I agree that encoding them in this way should give a rationale for that.
I tried using some combination of union/struct a couple of different ways, but
the result seemed significantly worse than what I published. But with some
simplifications since those attempts, plus an idea for a different way of
structuring things, I've now got something union/struct-based that seems
better.
It could be slightly further improved in syntax if I was willing to drop the
CounterState::_cf member name and make it an anonymous struct. That's a C11
feature that is provided as a C++ extension by all of our supported compilers.
You might not want to look at the difference between the two commits very
much, but instead look at the updated complete change.
I also expanded the commentary for PartialArrayStateManager::_counters. Maybe that will help?
//
// - releasing: When an allocator is destroyed the manager transitions to this
// phase. It remains in this phase until all extent allocators associated with
// this manager have been destroyed. During this phase, new allocators man not
- // this manager have been destroyed. During this phase, new allocators man not
+ // this manager have been destroyed. During this phase, new allocators may not
Fixed.
Some minor comments/suggestions.
uint _max_allocators;

// CounterValues is an integral type large enough to encode a pair of
// allocator counters as a single unit for atomic manipulation.
using CounterValues = LP64_ONLY(uint64_t) NOT_LP64(uint32_t);
using Counter = LP64_ONLY(uint32_t) NOT_LP64(uint16_t);
Given the max-value has type uint, using the larger type on both 32/64-bit systems should be simpler, and it should not cause any noticeable perf regression, since registering/releasing allocators should be infrequent. WDYT?
I assumed 16 bits of worker threads was quite sufficient for a 32-bit platform.
And I misremembered and thought 32-bit platforms couldn't be relied upon for a
64-bit atomic add and maybe other 64-bit operations. And this code is definitely
not super performance critical. So yeah, I could drop the platform-conditional
definition of Counter. I don't think it makes much difference to the code. I
guess the type aliases could be dropped in favor of bare uint32/64_t. Not
sure that's actually an improvement.
I think unifying 32- and 64-bit systems is an improvement -- being able to reason with concrete types. As for type aliases, it's rather subjective; I find uint*_t more familiar, but up to you.
// number of constructed allocators and one is the number of destructed
// allocators. The counters are atomic to permit concurrent construction,
// and to permit concurrent destruction. They are an atomic unit to detect
// and reject mixing the two phases, without concern for questions of
It's nice that this library can detect and reject misuse (such as mixing two phases), but I’m not sure why so much effort was spent preventing this. None of the existing users of the library are expected to mix phases in the near future. Could we instead document that mixing two phases is not permitted, and if someone chooses to do so, they do so at their own risk?
So far, only 2 of the nearly a dozen(?) potential clients are using this. I'm
not sure that none of them are going to have workers that do some of their
setup after being started. Hence the desire to support concurrency. And if
that, then I feel better about it if there's some usage validation. But maybe
it would be better to just throw a lock at the problem. And if it turns out
none of the use-cases end up needing that concurrency, then I won't object to
a little bit of code simplification.
Since y'all (especially @albertnetymk) seem to really dislike the phase checking, …
Thank you for the simplification.
still good.
: _arenas(NEW_C_HEAP_ARRAY(Arena, max_allocators, mtGC)),
  _max_allocators(max_allocators),
  _registered_allocators(0),
  _released_allocators(0)
_released_allocators is a debug-only variable; this should fail in a release build.
Well spotted. It seems I haven't done a release build since my final touch-up to make it debug-only.
I'll push an update once it's been through our CI.
    head->~FreeListEntry();
    p = head;
  }
  return ::new (p) PartialArrayState(src, dst, index, length, initial_refcount);
}

- void PartialArrayStateAllocator::Impl::release(uint worker_id, PartialArrayState* state) {
+ void PartialArrayStateAllocator::release(PartialArrayState* state) {
  size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release);
Could you explain why release order is needed here?
This is part of the usual reference counting dance. Except, where did the
acquire disappear to? There should be an acquire on the refcount == 0 branch!
Looks like I accidentally deleted it. Sigh. Not too surprisingly, lots of
tests were run without noticing that.
Makes sense. My next question is: if a PartialArrayState ever crosses thread boundaries, it does so through job stealing via task queues. Can we depend on the task queue's barriers to ensure memory safety? We don't need any additional barriers for objects popped/stolen from task queues in other places.
Yes, it's needed. The purpose of this release/acquire pair is to ensure there
is a happens-before chain between the use of a State and its cleanup/reuse.
Transfers through the taskqueue don't help with that.
In other places, we have operations on one side of the taskqueue that need to
be ordered with respect to operations on the other side of the taskqueue. We
don't have that here.
Consider two threads which have both obtained access to a State. (At least
one of them must have obtained it via stealing from another thread.) Assume
these are the last two references to the State (its refcount == 2), and no
further tasks for it will be needed. These two threads will use the State
(getting source/destination, claiming a chunk), and then release the State,
decrementing its refcount. So one of them will decrement to 0. We need to
ensure the cleanup that follows can't corrupt the accesses
made by the other thread, by ensuring those accesses happen-before the
cleanup. There is no intervening taskqueue manipulation in this scenario.
The operations we need to order are all on the same side of the taskqueue.
I see.
LGTM
Thanks for all the reviews and discussion.
/integrate
Going to push as commit dbf48a5.
Your commit was automatically rebased without conflicts.
@kimbarrett Pushed as commit dbf48a5. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
This change splits the existing PartialArrayStateAllocator class into an
allocator class and a manager class. The allocator class is per worker
thread. The manager class provides the memory management context for a
group of allocators.
This change is in preparation for some other refactorings around partial array
state handling. That work is intended to make it easier for various
collections to adopt the use of that mechanism for chunking the processing of
large objArrays.
The new implementation for the memory management context is based on the
existing one, with an Arena per worker, now controlled by the manager object.
Potential improvements to that can be explored in the future. Some ideas
include improvements to the Arena API or a single thread-safe Arena variant
(trading slower arena allocation (which is the slow path) for less memory
usage).
G1 has a single manager, reused by each young/mixed GC. Associated state
allocators are nested in the per-worker structures, so deleted at the end of
the collection. The manager is reset at the end of the collection to allow the
memory to be recycled. It is planned that the STW full collector will also use
this manager when converted to use PartialArrayState. So it will be reused by
all STW collections.
ParallelGC has a single manager, reused by each young collection. Because the
per-worker promotion managers are never destroyed, their nested state
allocators are never destroyed. So the manager is not reset, instead leaving
previously allocated states in the allocator free lists for use by the next
collection. This means the full collector won't be able to use the same
manager object as the young collectors.
Testing: mach5 tier1-5
Reviewing

Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287
$ git checkout pull/22287
Update a local copy of the PR:
$ git checkout pull/22287
$ git pull https://git.openjdk.org/jdk.git pull/22287/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 22287
View PR using the GUI difftool:
$ git pr show -t 22287
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22287.diff
Using Webrev
Link to Webrev Comment