
8254739: G1: Optimize evacuation failure for regions with few failed objects #5181



@Hamlin-Li Hamlin-Li commented Aug 19, 2021

This is an attempt to optimize evacuation failure handling for regions.
I record every object that fails evacuation, per region (in G1EvacuationFailureObjsInHR), and then iterate over these objects directly in RemoveSelfForwardPtrHRClosure (the iteration actually consists of compacting, sorting, and iterating the recorded objects).
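A minimal single-threaded sketch of the idea described above, assuming failed objects are identified by their word offset within the region: record each failure during evacuation, then sort once and visit only the recorded objects in ascending address order. `RegionSketch`, `record_failed` and `iterate_failed` are hypothetical names, not HotSpot code; the real change records from multiple GC worker threads and works with oops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one heap region. Objects that failed evacuation
// are identified by their word offset from the region bottom.
struct RegionSketch {
  // Offsets recorded during evacuation, in arbitrary (discovery) order.
  std::vector<uint32_t> failed;

  // Evacuation phase: remember an object that could not be copied.
  void record_failed(uint32_t word_offset) { failed.push_back(word_offset); }

  // Post-evacuation phase ("Remove Self Forwards"): sort once, then apply the
  // closure only to the recorded objects, in ascending address order, instead
  // of walking every object in the region.
  template <typename Closure>
  void iterate_failed(Closure&& do_object) {
    std::sort(failed.begin(), failed.end());
    for (uint32_t offset : failed) {
      do_object(offset);
    }
  }
};
```

The benefit grows with region size and shrinks with the number of failed objects: a region with a handful of failures is visited in a handful of steps instead of a full object walk.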

I have tested it with the following parameters:

  • -XX:ParallelGCThreads=1/32/64
  • -XX:G1EvacuationFailureALotInterval=1
  • -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000

It reduces "Remove Self Forwards" time in all cases, and in most cases it also reduces "Evacuate Collection Set" time.

It causes some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as 2. To mitigate this, we could count the objects that failed evacuation per region and stop recording them once the count hits some limit. But I'm not sure this is necessary, as I think such a condition is too extreme to occur in a real environment, although I'm not certain.
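The cut-off mentioned above could look roughly like the following sketch. `CappedFailureRecorder` and its `limit` parameter are hypothetical; the only idea shown is that once a region has too many failed objects, the recorded list is discarded and the region falls back to the existing whole-region walk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-region recorder with a cut-off. Once more than `limit`
// objects have failed, recording stops and post-processing should walk the
// whole region instead, which is cheaper than sorting a huge list.
class CappedFailureRecorder {
 public:
  explicit CappedFailureRecorder(size_t limit) : _limit(limit) {}

  void record(uint32_t word_offset) {
    _count++;
    if (_count <= _limit) {
      _offsets.push_back(word_offset);
    } else if (!_offsets.empty()) {
      // The list will no longer be used; release its memory.
      _offsets.clear();
      _offsets.shrink_to_fit();
    }
  }

  // True if the region must be processed with a full walk.
  bool needs_full_region_walk() const { return _count > _limit; }

  size_t failed_count() const { return _count; }
  const std::vector<uint32_t>& offsets() const { return _offsets; }

 private:
  size_t _limit;
  size_t _count = 0;
  std::vector<uint32_t> _offsets;
};
```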


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8254739: G1: Optimize evacuation failure for regions with few failed objects

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181
$ git checkout pull/5181

Update a local copy of the PR:
$ git checkout pull/5181
$ git pull https://git.openjdk.java.net/jdk pull/5181/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5181

View PR using the GUI difftool:
$ git pr show -t 5181

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5181.diff


@bridgekeeper bridgekeeper bot commented Aug 19, 2021

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Aug 19, 2021

@openjdk openjdk bot commented Aug 19, 2021

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc label Aug 19, 2021

@mlbridge mlbridge bot commented Aug 19, 2021

Contributor

@tschatzl tschatzl commented Aug 23, 2021

Initial feedback: we do not care about cases like G1EvacuationFailureALotCount at this time: for current evacuation failure the amount of failed objects is supposed to be small (due to G1HeapReservePercent), and if they are large, we are likely going into a full gc anyway.
When reusing this mechanism for region pinning, there is unfortunately no particular limit on the live objects to be expected apart from something like "1/2 the objects" as this can occur with any region in the young gen (we can just skip selecting old gen regions for evacuation in that case), so performance should be good for that use case, more so than now.

One problem I can see now with performance is the use of a linked list for storing the regions. Using such requires quite a lot of memory allocation (every Node object takes twice the needed data, i.e. the reference plus next link, plus low-level malloc overhead), and obviously traversing all links is kind of slow.
Since the only thing that needs to be done concurrently (and fast) is adding an element, an option could be something like a segmented array of HeapWords (per region?), i.e. a list of arrays of elements. (Further, since this is per HeapRegion the memory cost could be cut in half easily by using 32 bit offsets). That would probably also be faster when "compacting" all these failed arrays into a single array, and faster when deallocating (just a few of these arrays).
Something like G1CardSetAllocator with an element type of G1CardSetArray, or the dirty card queue set allocation (BufferNode::Allocator for the allocation-only part); unfortunately the code isn't generic enough to be used as is for your purpose.
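To make the segmented-array suggestion concrete, here is a hedged sketch of an append-mostly array of 32-bit offsets. The class name, the segment length of 256, and the use of `std::mutex` for growing are assumptions for illustration; this is not G1CardSetAllocator or BufferNode::Allocator. Appending threads claim a slot with an atomic `fetch_add` and only take the lock when the current segment is full; `join()` copies everything into one contiguous array and must only run after all appending threads have finished.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical segmented array of 32-bit region offsets. Appends may run
// concurrently; join() must only be called after all appenders are done.
class SegmentedOffsetArray {
 public:
  static constexpr size_t kSegmentLength = 256;  // assumed, not tuned

  void append(uint32_t offset) {
    while (true) {
      Segment* seg = _current.load(std::memory_order_acquire);
      if (seg != nullptr) {
        // Claim a slot; an index past the end just means "segment is full".
        size_t idx = seg->claimed.fetch_add(1, std::memory_order_relaxed);
        if (idx < kSegmentLength) {
          seg->slots[idx] = offset;
          return;
        }
      }
      // Slow path: install a new segment unless another thread already did.
      std::lock_guard<std::mutex> guard(_grow_lock);
      if (_current.load(std::memory_order_relaxed) == seg) {
        _segments.push_back(std::make_unique<Segment>());
        _current.store(_segments.back().get(), std::memory_order_release);
      }
    }
  }

  // Copy all segments into one contiguous array (the "compacting" step).
  std::vector<uint32_t> join() const {
    std::vector<uint32_t> out;
    for (const auto& seg : _segments) {
      const size_t cap = kSegmentLength;
      size_t used = std::min(seg->claimed.load(std::memory_order_relaxed), cap);
      out.insert(out.end(), seg->slots, seg->slots + used);
    }
    return out;
  }

  size_t num_segments() const { return _segments.size(); }

 private:
  struct Segment {
    std::atomic<size_t> claimed{0};
    uint32_t slots[kSegmentLength];
  };

  std::atomic<Segment*> _current{nullptr};
  std::mutex _grow_lock;                            // guards growth of _segments
  std::vector<std::unique_ptr<Segment>> _segments;  // owns all segments
};
```

Each entry costs 4 bytes, whereas a linked-list node holding a 64-bit reference plus a 64-bit next pointer needs 16 bytes before any malloc overhead. Freeing is also cheap: only a few segments are released instead of one node per object.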

I am also not really convinced that the failed object lists should be attached to HeapRegion, I would rather prefer an extra array as this is something that is usually not needed (and only required for regions with failure, which are very few typically), and only needed during young gc. (I intend to add something like a container for HeapRegion specific data that is only needed and used during gc at some point, moving similar existing misplaced data out of HeapRegion there then).

Overall we can add some cut-off for whole-region iteration at some point as you described as needed later I think.

Do you have a breakdown of the time taken by the parts of the G1EvacuationFailureObjsInHR::iterate() method, to see where most time is spent? I.e. what overhead does the linked list impose compared to the actual "useful" work? Do you have an idea about the total memory overhead? I can imagine it can be significant in this case.

I understand this is a bit hard to measure in a realistic environment, because while -XX:G1EvacuationFailureALot is good for implementing and debugging, the distribution of failures is probably a lot different from real evacuation failures. Maybe I could help you find some options to configure some existing stress tests (GCBasher, ...) to cause evacuation failures?

Later I will follow up with a more thorough look.

Hth,
Thomas

Author

@Hamlin-Li Hamlin-Li commented Aug 25, 2021

Thanks a lot for the detailed review and suggestions.

> Initial feedback: we do not care about cases like G1EvacuationFailureALotCount at this time: for current evacuation failure the amount of failed objects is supposed to be small (due to G1HeapReservePercent), and if they are large, we are likely going into a full gc anyway.
> When reusing this mechanism for region pinning, there is unfortunately no particular limit on the live objects to be expected apart from something like "1/2 the objects" as this can occur with any region in the young gen (we can just skip selecting old gen regions for evacuation in that case), so performance should be good for that use case, more so than now.

I agree.

> One problem I can see now with performance is the use of a linked list for storing the regions. Using such requires quite a lot of memory allocation (every Node object takes twice the needed data, i.e. the reference plus next link, plus low-level malloc overhead), and obviously traversing all links is kind of slow.
> Since the only thing that needs to be done concurrently (and fast) is adding an element, an option could be something like a segmented array of HeapWords (per region?), i.e. a list of arrays of elements. (Further, since this is per HeapRegion the memory cost could be cut in half easily by using 32 bit offsets). That would probably also be faster when "compacting" all these failed arrays into a single array, and faster when deallocating (just a few of these arrays).
> Something like G1CardSetAllocator with an element type of G1CardSetArray, or the dirty card queue set allocation (BufferNode::Allocator for the allocation-only part); unfortunately the code isn't generic enough to be used as is for your purpose.

I have written a simple version of the "array" for this specific use case; it does not implement a buffer to cache the "nodes" used in the "array". It seems to perform better than my previous linked-list version (I only measured the end-to-end time).
Maybe later I could make it more generic or merge it with the card set one? If that's feasible, should we do it in a separate issue?

> I am also not really convinced that the failed object lists should be attached to HeapRegion, I would rather prefer an extra array as this is something that is usually not needed (and only required for regions with failure, which are very few typically), and only needed during young gc. (I intend to add something like a container for HeapRegion specific data that is only needed and used during gc at some point, moving similar existing misplaced data out of HeapRegion there then).

I agree; we could refactor this thoroughly. Could we do it in a separate issue? Please point me to the issue if you already track this.

> Overall we can add some cut-off for whole-region iteration at some point as you described as needed later I think.

Yes.

> Do you have a breakdown of the time taken by the parts of the G1EvacuationFailureObjsInHR::iterate() method, to see where most time is spent? I.e. what overhead does the linked list impose compared to the actual "useful" work? Do you have an idea about the total memory overhead? I can imagine it can be significant in this case.

I will measure the time of the new implementation with the "array" and update the information later.

> I understand this is a bit hard to measure in a realistic environment, because while -XX:G1EvacuationFailureALot is good for implementing and debugging, the distribution of failures is probably a lot different from real evacuation failures. Maybe I could help you find some options to configure some existing stress tests (GCBasher, ...) to cause evacuation failures?

Thanks a lot, that would be very helpful.


Contributor

@tschatzl tschatzl commented Aug 25, 2021

Hi,

> One problem I can see now with performance is the use of a linked list for storing the regions. Using such requires quite a lot of memory allocation (every Node object takes twice the needed data, i.e. the reference plus next link, plus low-level malloc overhead), and obviously traversing all links is kind of slow.
> Since the only thing that needs to be done concurrently (and fast) is adding an element, an option could be something like a segmented array of HeapWords (per region?), i.e. a list of arrays of elements. (Further, since this is per HeapRegion the memory cost could be cut in half easily by using 32 bit offsets). That would probably also be faster when "compacting" all these failed arrays into a single array, and faster when deallocating (just a few of these arrays).
> Something like G1CardSetAllocator with an element type of G1CardSetArray, or the dirty card queue set allocation (BufferNode::Allocator for the allocation-only part); unfortunately the code isn't generic enough to be used as is for your purpose.

> I have written a simple version of the "array" for this specific use case; it does not implement a buffer to cache the "nodes" used in the "array". It seems to perform better than my previous linked-list version (I only measured the end-to-end time).
> Maybe later I could make it more generic or merge it with the card set one? If that's feasible, should we do it in a separate issue?

Yes, we can merge this later; there is the related JDK-8267834: Refactor G1CardSetAllocator and BufferNode::Allocator to use a common base class, which may provide a useful base.

> I am also not really convinced that the failed object lists should be attached to HeapRegion, I would rather prefer an extra array as this is something that is usually not needed (and only required for regions with failure, which are very few typically), and only needed during young gc. (I intend to add something like a container for HeapRegion specific data that is only needed and used during gc at some point, moving similar existing misplaced data out of HeapRegion there then).

> I agree; we could refactor this thoroughly. Could we do it in a separate issue? Please point me to the issue if you already track this.

I am currently working on separating the G1 young collection algorithm from G1CollectedHeap in JDK-8253343: Extract G1 Young GC algorithm related code from G1CollectedHeap (prototype available here); after that I intend to try to localize young-collection-only data structures as suggested. I filed JDK-8272978: Factor out g1 young collection specific data structures, where a note about this data structure could be added.

> Overall we can add some cut-off for whole-region iteration at some point as you described as needed later I think.

> Yes.

I filed JDK-8272977 so we do not forget about this.

> Do you have a breakdown of the time taken by the parts of the G1EvacuationFailureObjsInHR::iterate() method, to see where most time is spent? I.e. what overhead does the linked list impose compared to the actual "useful" work? Do you have an idea about the total memory overhead? I can imagine it can be significant in this case.

> I will measure the time of the new implementation with the "array" and update the information later.

Thanks,
Thomas

Author

@Hamlin-Li Hamlin-Li commented Aug 26, 2021

The test based on the new version (segmented array) shows that most of the iteration time (more than 90%) is spent in "iterate_internal", compacting costs almost no time, and less than 10% of the time is spent on sorting.

I also attached the perf data to the JBS bug, for "end to end" time, "pause young" time, "Evacuate Collection Set" time, "Post Evacuate Collection Set" time, and "Remove Self Forwards" time. Generally I think the new implementation works well for G1EvacuationFailureALotCount == 1/2/..., not just for G1EvacuationFailureALotCount >= 10.

(Although I'm not sure why this optimization also gets better "Evacuate Collection Set" times than the original; in my mind it should cost more, because we record more information during evacuation. Am I missing something here?)

Contributor

@tschatzl tschatzl commented Aug 31, 2021

Just fyi, I am looking at the change, but it's a relatively significant patch and I want to do some local testing too.

Contributor

@tschatzl tschatzl commented Sep 10, 2021

Performance is good as is, particularly the Remove self-forwards phase is much much faster now. I added a figure to the CR here. Really nice.

There are a few existing anomalies with this change (not newly introduced; the old code is just as bad), see JDK-8273309.

There are a few things that need to be improved:

  • we talked about this before, but I do not think G1EvacuationFailureObjsInHR, which is only used on evacuation failure, should be put into HeapRegion. At least at the moment, it is almost never used.
  • I do not like the use and the implementation of G1EvacuationFailureObjsInHR: it recreates, with fewer features and in a non-fitting style, something that is done better elsewhere (mark stack, DCQS buffer allocator, and in particular the remembered set card set allocator).
    Examples of features we would really, really want are reuse of available chunks, concurrent freeing of chunks, automatic resizing of blocks on demand to decrease allocations, and potentially other things.
  • as far as I understand the implementation of G1EvacuationFailureObjsInHR ;) (there is zero documentation about the basic structure), it seems to allocate quite a lot of memory that is almost never used. Even in the case of evacuation failure it's likely to be almost empty.
    G1EvacuationFailureObjsInHR seems to basically be a preallocated ArrayList of chunks (the _array_list member).
    Afaict it uses about 1/256 of the total heap size (i.e. roughly 0.4%), which is way, way too much (1 / (256 * sizeof(HeapWord)) * sizeof(pointer); see G1EvacuationFailureObjsInHR.cpp:86 and the Array constructor in G1EvacuationFailureObjsInHR.hpp:124). We will also get into trouble with startup time because of this, I'm sure. That alone, tbh, makes it a no-go, if I did not miscalculate.
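The memory estimate above can be checked with a little arithmetic. Assuming a 64-bit VM where both HeapWord and a pointer are 8 bytes, reserving one pointer per 256 HeapWords costs heap_size / 256 bytes, i.e. about 0.39% of the heap. The helpers below are hypothetical illustrations of that formula only.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions taken from the review comment: a 64-bit VM, where a HeapWord
// and a pointer are both 8 bytes, and one pointer-sized slot is reserved per
// 256 HeapWords of heap.
constexpr uint64_t kHeapWordSize = 8;
constexpr uint64_t kPointerSize = 8;
constexpr uint64_t kWordsPerSlot = 256;

// Bytes reserved up front for a heap of heap_bytes bytes:
// heap_bytes / (256 * sizeof(HeapWord)) * sizeof(pointer) == heap_bytes / 256.
constexpr uint64_t preallocated_bytes(uint64_t heap_bytes) {
  return heap_bytes / (kWordsPerSlot * kHeapWordSize) * kPointerSize;
}

// The same overhead expressed as a percentage of the heap.
constexpr double overhead_percent() {
  return 100.0 * kPointerSize / (kWordsPerSlot * kHeapWordSize);
}
```

For example, a 32 GB heap would reserve 128 MB up front, even though the structure is almost never used.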

Let's work on renaming and reusing the infrastructure provided by the remembered set (G1CardSetAllocator etc) in g1CardSetMemory.hpp.

Author

@Hamlin-Li Hamlin-Li commented Sep 11, 2021

Agreed. I will first work on https://bugs.openjdk.java.net/browse/JDK-8273626, which refactors G1CardSetAllocator and related classes to support element sizes smaller than the pointer size. The PR is #5478.

@Hamlin-Li Hamlin-Li closed this Sep 29, 2021
@Hamlin-Li Hamlin-Li deleted the speedup-iterate-evac-failure-objs-in-one-region branch Sep 29, 2021
@Hamlin-Li Hamlin-Li restored the speedup-iterate-evac-failure-objs-in-one-region branch Sep 29, 2021
Author

@Hamlin-Li Hamlin-Li commented Sep 29, 2021

It seems I accidentally closed the PR; reopening it.

@Hamlin-Li Hamlin-Li reopened this Sep 29, 2021
Contributor

@tschatzl tschatzl left a comment

There are quite a few comments from me about this change, and at some point I decided to provide a change that would cover these and more suggestions I had (note that this change might contradict some of the other suggestions - sometimes this happens when actually tinkering and seeing the whole code); communicating via text is too cumbersome. The commit containing the direction I think the change should go is available at b793f5b .

Mostly it is about hiding the code for preparing and iterating the sorted list of objects that failed evacuation.

Maybe G1EvacFailureObjsInHR should be renamed to something like G1EvacFailedObjectsSet (I do not think the HR postfix is important).

As for the other question, for now I think it is good to keep the G1EvacFailureObjsInHR in HeapRegion.

Please have a look.

Fwiw, I did some cursory testing of this change, currently running tier1-5 in our internal CI without issues so far (~half done). Still there might be bugs :)

src/hotspot/share/gc/g1/g1EvacFailure.cpp
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.hpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.cpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.hpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.hpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.hpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.cpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.cpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.cpp (outdated)
src/hotspot/share/gc/g1/g1EvacFailureObjsInHR.hpp (outdated)
Author

@Hamlin-Li Hamlin-Li commented Oct 29, 2021

Thanks a lot, Thomas, I like the refactoring :) Based on your comments and code, I made some adjustments.
Regarding renaming visit_elem: since it is a callback invoked by G1SegmentedArrayBuffer, I added some comments instead.

BTW, not sure why, but it seems git does not recognize the rename from g1EvacFailureObjsInHR.cpp to g1EvacFailureObjectsSet.cpp.

Author

@Hamlin-Li Hamlin-Li commented Oct 29, 2021

@tschatzl Hi Thomas, is there any way I can reproduce this compilation error (drop_all reference) locally?
I use "bash configure; make images CONF=rel", and there is no error on linux x86_64.

return cast_to_oop(_bottom + offset);
}

G1EvacFailureObjectsSet::OffsetInRegion G1EvacFailureObjectsSet::cast_to_offset(oop obj) const {
Contributor

@tschatzl tschatzl Oct 29, 2021

This should probably be to_offset to match from_offset (forgot that) - this is not a cast to me (changing the interpretation of a bit pattern) but a transformation (changing the bit pattern for storage savings).

So cast seems to be the wrong word to me here.
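A sketch of the to_offset/from_offset pair, assuming a region's size in words fits in 32 bits; `HeapWordSketch` and `RegionOffsets` are simplified stand-ins, not the HotSpot types. It illustrates why "transformation" is the better word: the stored value is a different bit pattern (a word offset from the region bottom), not a reinterpretation of the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for a heap word on a 64-bit VM.
using HeapWordSketch = uint64_t;

// Hypothetical helper converting between addresses inside one region and
// compact 32-bit word offsets from the region bottom.
class RegionOffsets {
 public:
  RegionOffsets(HeapWordSketch* bottom, size_t words)
    : _bottom(bottom), _words(words) {}

  // Transformation, not a cast: store half the bits of a 64-bit pointer.
  uint32_t to_offset(const HeapWordSketch* addr) const {
    assert(addr >= _bottom);
    size_t offset = static_cast<size_t>(addr - _bottom);
    assert(offset < _words);
    return static_cast<uint32_t>(offset);
  }

  // Inverse transformation back to an address within the region.
  HeapWordSketch* from_offset(uint32_t offset) const {
    assert(offset < _words);
    return _bottom + offset;
  }

 private:
  HeapWordSketch* _bottom;
  size_t _words;
};
```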

G1EvacFailureObjectsSet::G1EvacFailureObjectsSet(uint region_idx, HeapWord* bottom) :
DEBUG_ONLY(_region_idx(region_idx) COMMA)
_bottom(bottom),
_offsets("", &_alloc_options, &_free_buffer_list) {
Contributor

@tschatzl tschatzl Oct 29, 2021

Fwiw, I recently removed the first parameter of G1SegmentedArray. Please make sure you merge with tip before pushing.

friend class G1SegmentedArray<OffsetInRegion, mtGC>;
friend class G1SegmentedArrayBuffer<mtGC>;

G1EvacFailureObjectsSet* _collector;
Contributor

@tschatzl tschatzl Oct 29, 2021

Probably rename _collector to _objects_set or so to match the type better.


// Helper class to join, sort and iterate over the previously collected segmented
// array of objects that failed evacuation.
class G1EvacFailureObjectsIterator {
Contributor

@tschatzl tschatzl Oct 29, 2021

Note that this is not an Iterator in the C++ STL sense, so it is a good idea not to name it like that. I understand that Hotspot code (and even the recent remembered set code - there is a CR to fix that) adds to the confusion, but it would be nice not to add to it.

#ifdef ASSERT
// Callback of G1SegmentedArrayBuffer::iterate_elems
// Verify a single element in a segment node
void visit_elem(void* elem) {
uint* ptr = (uint*)elem;
_collector->assert_is_valid_offset(*ptr);
}
#endif
Contributor

@tschatzl tschatzl Oct 29, 2021

I intentionally removed this visit_elem method in the original suggestion because I thought that to be too much checking. It is kind of obvious that we only store data in here.
We check that these are valid offsets both when writing to and reading from the array, so this seems to be unnecessary triple-checking.

Fwiw, in Hotspot code we do not use the visit_ prefix but do_ (and pass something that ends with a Closure, not a Visitor in the iterate* methods).
I am aware that in design pattern lingo this is the Visitor pattern, but Hotspot is probably much older than that and it is somewhat jarring to have some code use this terminology and others use another one.

A change here needs to be discussed with a wider audience.
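For readers unfamiliar with the HotSpot convention mentioned above, a simplified sketch: the callback type is named `...Closure` and its method uses a `do_` prefix, modeled on HotSpot's `ObjectClosure::do_object`. `ObjSketch`, `ObjectClosureSketch`, `CountObjectsClosure` and `iterate_objects` are hypothetical stand-ins; HotSpot passes an `oop`, not a struct pointer.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a heap object.
struct ObjSketch { int id; };

// HotSpot style: the callback type ends in "Closure" and its method uses the
// do_ prefix, rather than a "Visitor" with visit_ methods.
class ObjectClosureSketch {
 public:
  virtual ~ObjectClosureSketch() = default;
  virtual void do_object(ObjSketch* obj) = 0;
};

// Example closure that counts the objects it is applied to.
class CountObjectsClosure : public ObjectClosureSketch {
 public:
  int count = 0;
  void do_object(ObjSketch*) override { count++; }
};

// An iterate* function takes the closure and applies it to each element.
void iterate_objects(std::vector<ObjSketch>& objs, ObjectClosureSketch* cl) {
  for (ObjSketch& o : objs) {
    cl->do_object(&o);
  }
}
```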


void HeapRegion::iterate_evac_failure_objs(ObjectClosure* closure) {
_evac_failure_objs.iterate(closure);
}
Contributor

@tschatzl tschatzl Oct 29, 2021

(This one is fine to be placed in the cpp file - the additional single call per HeapRegion does not matter)

@@ -554,6 +557,11 @@ class HeapRegion : public CHeapObj<mtGC> {

// Update the region state after a failed evacuation.
void handle_evacuation_failure();
// Records evac failure objs during evaucation, this will help speed up iteration
Contributor

@tschatzl tschatzl Oct 29, 2021

s/evaucation/evacuation/

Please try to avoid "evac failure objs" in text, as it seems grammatically strange to me and isn't particularly obvious to readers not working on this all the time. Just say "Record an object that failed evacuation within this region." I do not think the second part of the sentence, about speeding something up, is necessary here.

// Records evac failure objs during evaucation, this will help speed up iteration
// of these objs later in *remove self forward* phase of post evacuation.
void record_evac_failure_obj(oop obj);
// Iterates evac failure objs which are recorded during evcauation.
Contributor

@tschatzl tschatzl Oct 29, 2021

s/evcauation/evacuation/

Better would probably be "Applies the given closure to all previously recorded objects that failed evacuation, in ascending address order."

G1EvacFailureObjectsSet::OffsetInRegion G1EvacFailureObjectsSet::cast_to_offset(oop obj) const {
const HeapWord* o = cast_from_oop<const HeapWord*>(obj);
size_t offset = pointer_delta(o, _bottom);
assert_is_valid_offset(offset);
Contributor

@tschatzl tschatzl Oct 29, 2021

The from_offset call below already does this assert.

friend class G1SegmentedArray<OffsetInRegion, mtGC>;
friend class G1SegmentedArrayBuffer<mtGC>;
Contributor

@tschatzl tschatzl Oct 29, 2021

I prefer to make visit_* public here instead of the friends. This is an internal class not visible outside, and we actually use it as a closure.

Contributor

@tschatzl tschatzl commented Oct 29, 2021

> @tschatzl Hi Thomas, is there any way I can reproduce this compilation error (drop_all reference) locally? I use "bash configure; make images CONF=rel", and there is no error on linux x86_64.

Probably try to compile with --disable-precompiled-headers (i.e. add to configure options). I had this issue yesterday too, but it went away after some changes (and I've been running our CI successfully with my changes).

Dug into that a bit deeper - can you try
tschatzl@bc79db0 ?

The g1SegmentedArray.inline.hpp was not included in heapregion.inline.hpp, and that test was missing tons of includes anyway. Also we should clean up the .hpp files to not use methods from .inline.hpp - if so, they need to be moved to the .inline.hpp file too.

Author

@Hamlin-Li Hamlin-Li commented Oct 29, 2021

I tried --disable-precompiled-headers before; sometimes it helps to locate this kind of issue, but not this time.
I will try your modification later.
Thanks a lot

Author

@Hamlin-Li Hamlin-Li commented Nov 3, 2021

Kind reminder.

Contributor

@tschatzl tschatzl left a comment

I think this is good now - great!

There is one minor comment for some refactoring that you may want to consider.

join_and_sort();
iterate_internal(closure);
Contributor

@tschatzl tschatzl Nov 3, 2021

I would probably move the array allocation and freeing here instead of having them at the start of join_and_sort and the end of iterate_internal, respectively. Otherwise there is a hidden dependency on the first method allocating and the second freeing the array; it also looks cleaner, as allocation and deallocation are obvious and on the same call level.

I.e.

  void iterate(ObjectClosure* closure) {
    uint num = _segments->num_allocated_nodes();
    _offset_array = NEW_C_HEAP_ARRAY(OffsetInRegion, num, mtGC);

    join_and_sort();
    iterate_internal(closure);

    FREE_C_HEAP_ARRAY(OffsetInRegion, _offset_array);
  }
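The same allocate, use, free shape can be sketched in standard C++, with a `std::unique_ptr` standing in for the `NEW_C_HEAP_ARRAY`/`FREE_C_HEAP_ARRAY` pair and nested vectors standing in for the segmented array; all names here are hypothetical. Because `iterate()` owns the temporary array, `join_and_sort()` and `iterate_internal()` only fill and read it, so there is no hidden allocate/free dependency between them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: segments are modeled as vectors of 32-bit offsets.
class IterationHelperSketch {
 public:
  explicit IterationHelperSketch(const std::vector<std::vector<uint32_t>>& segments)
    : _segments(segments) {}

  template <typename Closure>
  void iterate(Closure&& closure) {
    size_t num = 0;
    for (const auto& s : _segments) num += s.size();

    // Allocation and release happen at this call level only.
    std::unique_ptr<uint32_t[]> array(new uint32_t[num]);
    _offset_array = array.get();
    _array_length = 0;

    join_and_sort();
    assert(_array_length == num);
    iterate_internal(closure);

    _offset_array = nullptr;  // `array` frees the memory when it goes out of scope
  }

 private:
  // Copy every segment into the temporary array, then sort by offset.
  void join_and_sort() {
    for (const auto& s : _segments) {
      std::copy(s.begin(), s.end(), _offset_array + _array_length);
      _array_length += s.size();
    }
    std::sort(_offset_array, _offset_array + _array_length);
  }

  // Apply the closure in ascending offset order.
  template <typename Closure>
  void iterate_internal(Closure& closure) {
    for (size_t i = 0; i < _array_length; i++) closure(_offset_array[i]);
  }

  const std::vector<std::vector<uint32_t>>& _segments;
  uint32_t* _offset_array = nullptr;
  size_t _array_length = 0;
};
```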

Author

@Hamlin-Li Hamlin-Li Nov 3, 2021

Thanks Thomas, good catch.


@openjdk openjdk bot commented Nov 3, 2021

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8254739: G1: Optimize evacuation failure for regions with few failed objects

Reviewed-by: tschatzl, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 52 new commits pushed to the master branch:

  • d95299a: 8276634: Remove usePlainDatagramSocketImpl option from the test DatagramChannel/SendReceiveMaxSize.java
  • 3c0faa7: 8276173: Clean up and remove unneeded casts in HeapDumper
  • 323d201: 8275506: Rename allocated_on_stack to allocated_on_stack_or_embedded
  • 96c396b: 8276151: AArch64: Incorrect result for double to int vector conversion
  • 7281861: 8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416
  • 8e17ce0: 8275185: Remove dead code and clean up jvmstat LocalVmManager
  • 396132f: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds
  • 9ad4d3d: 8276025: Hotspot's libsvml.so may conflict with user dependency
  • e21b5c7: 8276650: GenGraphs does not produce deterministic output
  • 7b1916e: 8233557: [TESTBUG] DoubleClickTitleBarTest.java fails on macOs
  • ... and 42 more: https://git.openjdk.java.net/jdk/compare/bb92fb02ca8c5795989065a9037748dc39ed77db...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Nov 3, 2021
Contributor

@tschatzl tschatzl left a comment

Still good.

Author

@Hamlin-Li Hamlin-Li commented Nov 4, 2021

Is anyone else available to have a look at this change? Thanks

Member

@albertnetymk albertnetymk left a comment

Only some subjective and minor comments.


template <class Elem, MEMFLAGS flag>
template <typename BufferClosure>
void G1SegmentedArray<Elem, flag>::iterate_nodes(BufferClosure& cloure) const {
Member

@albertnetymk albertnetymk Nov 5, 2021

Typo: cloure.

Author

@Hamlin-Li Hamlin-Li Nov 5, 2021

good catch.

void G1EvacFailureObjectsSet::iterate(ObjectClosure* closure) {
assert_at_safepoint();

G1EvacFailureObjectsIterationHelper helper(this);
helper.iterate(closure);

_offsets.drop_all();
}
Member

@albertnetymk albertnetymk Nov 5, 2021

Having some destructive operations (drop_all) inside a method named iterate could come as a surprise, IMO. If I understand this correctly, the following would be problematic.

evac_failed_objects.iterate(closure1);
...
evac_failed_objects.iterate(closure2);

Author

@Hamlin-Li Hamlin-Li Nov 5, 2021

drop_all just returns the buffers to the free list; it does not destroy them. So iterating multiple times is OK, because the next time it will get memory from the free list or allocate a new buffer. Hope this answers your question.

Member

@albertnetymk albertnetymk Nov 5, 2021

Since drop_all() resets all counters (e.g. _num_allocated_nodes), the subsequent iteration will think the array is empty, won't it?

Author

@Hamlin-Li Hamlin-Li Nov 5, 2021

For example, suppose we modify the code as below:

void HeapRegion::iterate_evac_failure_objs(ObjectClosure* closure) {
_evac_failure_objs.iterate(closure);
_evac_failure_objs.iterate(closure);
}

On the second iteration everything will be empty, so iterate_nodes, QuickSort::sort and iterate_internal will all be no-ops, and these no-ops do no harm.
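The double-iteration scenario can be modeled in a few lines. In this hypothetical sketch, DestructiveOffsetSet is not a HotSpot class. Its iterate() sorts, applies the closure, and then clears its contents the way drop_all() does. The second call therefore visits nothing: harmless, as the author says, but possibly not what a reader of a method named iterate would expect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model of the reviewed behavior: iteration followed by an
// unconditional drop, mirroring _offsets.drop_all() at the end of iterate().
class DestructiveOffsetSet {
 public:
  void record(unsigned offset) { offsets_.push_back(offset); }

  // Returns the number of elements the closure was applied to.
  std::size_t iterate(const std::function<void(unsigned)>& closure) {
    std::sort(offsets_.begin(), offsets_.end());
    for (unsigned off : offsets_) closure(off);
    std::size_t visited = offsets_.size();
    offsets_.clear();  // ~ drop_all(): any later call sees an empty set
    return visited;
  }

 private:
  std::vector<unsigned> offsets_;
};
```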

Contributor

@tschatzl tschatzl Nov 5, 2021

@Hamlin-Li: I think @albertnetymk's concern is that an iterate method typically does not modify the list itself, so this is surprising for readers. The documentation does not mention it either. I do not think he believes this will cause a VM failure.

Maybe change HeapRegion::iterate_evac_failure_objs to call a (new) drop() method on _evac_failure_objs?

I think such a change would solve Albert's concerns.

An alternative could be renaming iterate to something else.
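A sketch of the first option, assuming a std::vector-backed set. OffsetSet, drop() and the free function iterate_evac_failure_objs are illustrative stand-ins for G1EvacFailureObjectsSet and HeapRegion::iterate_evac_failure_objs, not their actual code. Iteration may sort internally but never discards entries; the caller releases them explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical split API: iterate() keeps the entries, drop() releases them.
class OffsetSet {
 public:
  void record(unsigned offset) {
    offsets_.push_back(offset);
    sorted_ = false;
  }

  // Non-destructive: may reorder entries (lazy sort) but never discards them,
  // so repeated calls visit the same offsets in the same order.
  void iterate(const std::function<void(unsigned)>& closure) {
    if (!sorted_) {
      std::sort(offsets_.begin(), offsets_.end());
      sorted_ = true;
    }
    for (unsigned off : offsets_) closure(off);
  }

  // Explicit release step (stands in for drop_all()).
  void drop() {
    offsets_.clear();
    sorted_ = true;
  }

  std::size_t size() const { return offsets_.size(); }

 private:
  std::vector<unsigned> offsets_;
  bool sorted_ = true;
};

// Caller side: the destructive step is now visible where it happens.
inline void iterate_evac_failure_objs(OffsetSet& set,
                                      const std::function<void(unsigned)>& closure) {
  set.iterate(closure);
  set.drop();
}
```

With this split, calling iterate twice visits the same objects twice, and releasing the memory is an explicit step at the call site. Renaming, the other option, keeps a single call but puts the side effect into the method name instead.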

Member

@albertnetymk albertnetymk Nov 5, 2021

Thanks, Thomas, for unpacking my concern more precisely. By "problematic" I meant that the second iteration will not do what developers expect it to do, not necessarily that the VM will crash.

Author

@Hamlin-Li Hamlin-Li Nov 5, 2021

Thanks for the clarification, Albert and Thomas. I see your point, and it makes sense to me.

This patch has been blocking some other issues for a while, and I think it's better to find a good solution for Albert's concern separately (adding a drop seems a little redundant to me :)).

If you don't mind, could I do this refinement later in a separate issue? Thanks

Member

@albertnetymk albertnetymk Nov 5, 2021

It's a rather minor issue as I stated originally; I am fine either way.

Author

@Hamlin-Li Hamlin-Li Nov 5, 2021

Thanks, I created JDK-8276721 to track it.

Author

@Hamlin-Li Hamlin-Li commented Nov 5, 2021

I think the Windows build failure is unrelated to this change, so I will push it.

Author

@Hamlin-Li Hamlin-Li commented Nov 5, 2021

Thanks @tschatzl @albertnetymk for your reviews.

/integrate


@openjdk openjdk bot commented Nov 5, 2021

Going to push as commit ed7ecca.
Since your change was applied there have been 62 commits pushed to the master branch:

  • 59c3dcc: 8276746: Add section on reproducible builds in building.md
  • 0e0dd33: 8276129: PretouchTask should page-align the chunk size
  • a472433: 8276572: Fake libsyslookup.so library causes tooling issues
  • b01f107: 8276252: java/nio/channels/Channels/TransferTo.java failed with OOM java heap space error
  • 92d2176: 8273967: gtest os.dll_address_to_function_and_library_name_vm fails on macOS12
  • a74a839: 8276571: C2: pass compilation options as structure
  • c393ee8: 8276632: Use blessed modifier order in security-libs code
  • 7023b3f: 8276628: Use blessed modifier order in serviceability code
  • b933136: 8276641: Use blessed modifier order in jshell
  • 0616d86: 8276635: Use blessed modifier order in compiler code
  • ... and 52 more: https://git.openjdk.java.net/jdk/compare/bb92fb02ca8c5795989065a9037748dc39ed77db...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 5, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Nov 5, 2021

@openjdk openjdk bot commented Nov 5, 2021

@Hamlin-Li Pushed as commit ed7ecca.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@Hamlin-Li Hamlin-Li deleted the speedup-iterate-evac-failure-objs-in-one-region branch Nov 10, 2021
Labels
hotspot-gc integrated
3 participants