Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8252103: Parallel heap inspection for ParallelScavengeHeap #25

Closed
wants to merge 15 commits into from

Conversation

linzang
Copy link
Contributor

@linzang linzang commented Sep 6, 2020


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x64 Linux x86 Windows x64 macOS x64
Build ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (9/9 passed) ✔️ (9/9 passed) ✔️ (9/9 passed)

Issue

  • JDK-8252103: Parallel heap inspection for ParallelScavengeHeap

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/25/head:pull/25
$ git checkout pull/25

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Sep 6, 2020

👋 Welcome back lzang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Sep 6, 2020

@linzang The following labels will be automatically applied to this pull request: hotspot-gc. When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label (add|remove) "label" command.

@openjdk openjdk bot added the hotspot-gc label Sep 6, 2020
@openjdk openjdk bot added the rfr label Sep 6, 2020
@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 6, 2020

Webrevs

@linzang
Copy link
Contributor Author

@linzang linzang commented Sep 8, 2020

/label add "serviceablility"

@openjdk
Copy link

@openjdk openjdk bot commented Sep 8, 2020

@linzang Usage: /label <add|remove> [label[, label, ...]]wherelabel` is an additional classification that should be applied to this PR. These labels are valid:

  • serviceability
  • hotspot
  • sound
  • hotspot-compiler
  • kulla
  • i18n
  • shenandoah
  • jdk
  • javadoc
  • 2d
  • security
  • swing
  • hotspot-runtime
  • jmx
  • build
  • nio
  • beans
  • core-libs
  • compiler
  • net
  • hotspot-gc
  • hotspot-jfr
  • awt

@linzang
Copy link
Contributor Author

@linzang linzang commented Sep 8, 2020

/label add serviceability

@openjdk openjdk bot added the serviceability label Sep 8, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Sep 8, 2020

@linzang
The serviceability label was successfully added.

@linzang
Copy link
Contributor Author

@linzang linzang commented Sep 11, 2020

Dear All,
May I ask your help to review this PR? Thanks!
-Lin

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 9, 2020

@linzang This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Oct 9, 2020

Hi Lin,

Sorry for not getting to this sooner. One of the reasons is that I haven't had time to explore a better solution, but the above notification triggered me to at least share my thoughts.

I would like us to avoid having the logic that check how many workers are doing the work and instead have some mechanism that claim different chunks of work. You can compare it a bit to G1 where we have the HeapRegionClaimer that make sure only one thread handles a given region. This claimer needs to be a bit different and allow claiming of eden, to-space, from-space and then multiple chunks of old. But I believe such solution would both be more efficient (since all threads can help out on old in the end) and easier to follow (no special cases).

So basically the PSScavengeParallelObjectIterator need to set up this claimer and pass it down to the workers and all workers than try to do all work, but only the one getting the claim will do the actual work. What do you think about this approach? Do you understand what I'm after?

@linzang
Copy link
Contributor Author

@linzang linzang commented Oct 9, 2020

Hi @kstefanj, Thanks a lot for your comments.
I think there can be a reclaimer that treat eden space, from space, to space and "blocks" in old space as parallel iteration candidates, then every workers try to claim their ownership of the candidate that it is going to scan. So there is no need to check worker numbers and Id for processing different spaces.
what do you think?

BTW, Paul (hohensee@amazon.com) also helped review this patch and send the comment in mail list, I will paste his comments here and then try to polish a patch with all of your comments considered.

@linzang
Copy link
Contributor Author

@linzang linzang commented Oct 9, 2020

Here is the review comments from Paul, copied from hotspot-gc-dev digest:

Message: 3
Date: Mon, 28 Sep 2020 22:39:38 +0000
From: "Hohensee, Paul" hohensee@amazon.com
To: Lin Zang lzang@openjdk.java.net,
"hotspot-gc-dev@openjdk.java.net" hotspot-gc-dev@openjdk.java.net,
"serviceability-dev@openjdk.java.net"
serviceability-dev@openjdk.java.net
Subject: RE: RFR: 8252103: Parallel heap inspection for
ParallelScavengeHeap
Message-ID: 25A4DC80-74A8-4C99-AEF4-7E989317B18A@amazon.com
Content-Type: text/plain; charset="utf-8"

I'm not a GC specialist, but your approach looks reasonable to me.

parallelScavengeHeap.cpp:

"less that 1 workers" -> "less that 1 worker"

psOldGen.cpp:

">=2" -> ">= 2"

"thread_num-2 worker" -> "(thread_num -2) workers"

"cross blocks" -> "crosses blocks"

"that the object start address locates in the related block" -> "that owns the block within which the object starts"

blocks_iterate_parallel() relies on BLOCK_SIZE being an integral multiple of a start_array() block (i.e., ObjectStartArray::block_size). I'd add an assert to that effect.

The comment on the definition of BLOCK_SIZE is "64 KB", but you're doing pointer arithmetic in terms of HeapWords, which > means the actual byte value is either 256KB or 512KB for 32- and 64-bit respectively. You could use

const int BLOCK_SIZE = (64 * 1024 / HeapWordSize);

"// There is objects" -> "There are objects"

"reture" -> "return"

", here only focus objects" -> ". Here only focus on objects"

It's a matter of taste, but imo the loop in blocks_iterate_parallel() would be more clear as

for (HeapWord* p = start; p < end; p += oop(p)->size()) {
cl->do_object(oop(p));
}

psOldGen.hpp:

blocks_iterate_parallel() is pure implementation and I don't see a reference outside psOldGen.cpp. So, it should be private, > not public.

psYoungGen.cpp:

"<=1" -> "<= 1"

Thanks,
Paul

Thanks for Paul's help and effort for reviewing, I will make a refined patch based on your comments.

@linzang linzang force-pushed the jmap-par branch 2 times, most recently from 3672513 to 05f3c64 Compare Oct 19, 2020
@linzang
Copy link
Contributor Author

@linzang linzang commented Oct 19, 2020

Dear @stefank,
I have update this PR that use a claimer to help worker thread do parallel iteration. would you like to help review again?
Thanks,
Lin

@openjdk
Copy link

@openjdk openjdk bot commented Oct 20, 2020

@linzang Usage: /label <add|remove> [label[, label, ...]]wherelabel` is an additional classification that should be applied to this PR. These labels are valid:

  • serviceability
  • hotspot
  • sound
  • hotspot-compiler
  • kulla
  • i18n
  • shenandoah
  • jdk
  • javadoc
  • 2d
  • security
  • swing
  • hotspot-runtime
  • jmx
  • build
  • nio
  • beans
  • core-libs
  • compiler
  • net
  • hotspot-gc
  • hotspot-jfr
  • awt

@openjdk
Copy link

@openjdk openjdk bot commented Oct 20, 2020

@linzang The serviceability label was already applied.

@openjdk
Copy link

@openjdk openjdk bot commented Oct 20, 2020

@linzang To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command.

Applicable Labels
  • 2d
  • awt
  • beans
  • build
  • compiler
  • core-libs
  • hotspot
  • hotspot-compiler
  • hotspot-gc
  • hotspot-jfr
  • hotspot-runtime
  • i18n
  • javadoc
  • jdk
  • jmx
  • kulla
  • net
  • nio
  • security
  • serviceability
  • shenandoah
  • sound
  • swing

@openjdk openjdk bot removed the serviceability label Oct 20, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 20, 2020

@linzang Usage: /label <add|remove> [label[, label, ...]]wherelabel` is an additional classification that should be applied to this PR. These labels are valid:

  • serviceability
  • hotspot
  • sound
  • hotspot-compiler
  • kulla
  • i18n
  • shenandoah
  • jdk
  • javadoc
  • 2d
  • security
  • swing
  • hotspot-runtime
  • jmx
  • build
  • nio
  • beans
  • core-libs
  • compiler
  • net
  • hotspot-gc
  • hotspot-jfr
  • awt

@openjdk openjdk bot added the serviceability label Oct 20, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 20, 2020

@linzang
The serviceability label was successfully added.

@openjdk
Copy link

@openjdk openjdk bot commented Oct 20, 2020

@linzang Usage: /label <add|remove> [label[, label, ...]]wherelabel` is an additional classification that should be applied to this PR. These labels are valid:

  • serviceability
  • hotspot
  • sound
  • hotspot-compiler
  • kulla
  • i18n
  • shenandoah
  • jdk
  • javadoc
  • 2d
  • security
  • swing
  • hotspot-runtime
  • jmx
  • build
  • nio
  • beans
  • core-libs
  • compiler
  • net
  • hotspot-gc
  • hotspot-jfr
  • awt

@openjdk
Copy link

@openjdk openjdk bot commented Oct 20, 2020

@linzang The serviceability label was already applied.

@tschatzl
Copy link
Contributor

@tschatzl tschatzl commented Nov 3, 2020

I tried to prototype a version that tries to keep the ParallelScavengeHeap / PSOldGen / MutableSpace abstraction, but your suggestion to keep the block_iterate method in PSOldGen is probably best.

While doing that I also touched up comments and naming a bit. The result is at tschatzl@cdfe0db.

Curiously Stefan told me when looking at this, it is closer to your initial code for claim indices: it starts indices for old gen with 2.

:)

He mentioned that he would be good with either variant, with me slightly favouring that original one. Either way, generally the current code is serviceable, but maybe the above changes of mine (mostly naming) could be incorporated in this change to not let them go to waste.

Copy link
Member

@albertnetymk albertnetymk left a comment

Not a review, just some minor comments in passing.

while (claimer->claim_and_get_block(&block_index)) {
if (block_index == HeapBlockClaimer::EdenIndex) {
young_gen()->eden_space()->object_iterate(cl);
} else if (block_index == HeapBlockClaimer::SurvivorIndex) {
young_gen()->from_space()->object_iterate(cl);
young_gen()->to_space()->object_iterate(cl);
} else {
old_gen()->block_iterate(cl, (size_t)block_index);
}
}
Copy link
Member

@albertnetymk albertnetymk Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure seems to suggest that each if case could be hit in every iteration, but actually only the third case will be revisited more than once. I wonder if this could be made more explicit.

Copy link
Contributor Author

@linzang linzang Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Albert! Correct, the OldGen part will be revisit much for frequently. do you have any clue to optimize this? I was considering the marco like "likely()" but I am not sure whether it is acceptable for hotspot.

Copy link
Member

@albertnetymk albertnetymk Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think readability is the issue here; below is my attempt with the new claim_and_get_block. Don't think there's any performance diff though.

size_t block_index = claimer->claim_and_get_block();
if (block_index == HeapBlockClaimer::EdenIndex) {
  young_gen()->eden_space()->object_iterate(cl);
  block_index = claimer->claim_and_get_block();
}
if (block_index == HeapBlockClaimer::SurvivorIndex) {
  young_gen()->from_space()->object_iterate(cl);
  young_gen()->to_space()->object_iterate(cl);
  block_index = claimer->claim_and_get_block();
}
while (block_index != HeapBlockClaimer::InvalidIndex) {
  old_gen()->object_iterate_block(cl, block_index - HeapBlockClaimer::NumNonOldGenClaims);
  block_index = claimer->claim_and_get_block();
}

PS: I didn't test the code, so please take it as a quick sketch.

Copy link
Contributor Author

@linzang linzang Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Albert,
Thanks for suggestion and I get your point. I have a little concern with this change: it presumes that EdenIndex and SurvivorIndex are fix to be valued as the begining of the indices (0 and 1), otherwise there are chances that youngGen may never get iterated. I am not sure whether this assumption should be kept (or at least add the info in comments?) , although the current implementation of using size_t for indices follows the assumption.

Thanks,
Lin

Copy link
Member

@albertnetymk albertnetymk Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it presumes that EdenIndex and SurvivorIndex are fix to be valued as the begining of the indices (0 and 1)

I don't think that's problematic; the assumption, the iteration follows the order of eden, from, to, and old-gen, actually makes the code easier to read, I believe. Having said that, I have no strong opinion on this; it's up to you.

Copy link
Contributor Author

@linzang linzang Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the order of iteration following eden,from,to and old-gen sounds reasonable. Thanks for explaination, will made the change in the new commit.

Copy link
Member

@albertnetymk albertnetymk Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the change.

bool claim_and_get_block(ssize_t* block_index) {
assert(block_index != NULL, "Invalid index pointer");
*block_index = Atomic::fetch_and_add(&_claimed_index, 1);
ssize_t iterable_blocks = (ssize_t)ParallelScavengeHeap::heap()->old_gen()->iterable_blocks();
if (*block_index >= iterable_blocks) {
return false;
}
return true;
}
Copy link
Member

@albertnetymk albertnetymk Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the interface would be cleaner with returning block_index directly, i.e. ssize_t claim_and_get_block(). When the index goes out of range, return an InvalidIndex, which could be defined as -3, like EdenIndex above.

Copy link
Contributor Author

@linzang linzang Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion. I have thought about it when writing this method, but I was not able to find a value of InvalidIndex. After reconsider it, plus that I will make block_index to be unsigned to avoid signed/unsigned conversion, I think might be the InvalidIndex could be max_size_t (that is (size_t)-1). what do you think?

Copy link
Member

@albertnetymk albertnetymk Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new version looks very nice; thank you for the change.

@linzang
Copy link
Contributor Author

@linzang linzang commented Nov 4, 2020

Dear @tschatzl ,
Thanks for your proposed fix, I also prefer it because it could avoid the conversion between signed and unsigned. I will make a new commit to incorporate it.

Thanks!
Lin

tschatzl and others added 2 commits Nov 4, 2020
- use unsigned data type for claim index, removing a few casts
- renamings
- some minor code simplifications
@tschatzl
Copy link
Contributor

@tschatzl tschatzl commented Nov 4, 2020

I recommend changing the (size_t)-1 assignment to SIZE_MAX.

While there is (to my knowledge) no compiler (that supports the JVM) that does not use two's complement for integer representation, officially only with C++20 (or later) the result of this operation is exactly defined. We know that other code just does the cast as this does ;), but if you use SIZE_MAX, the comment above the definition of InvalidIndex can also be removed then.

I would also move InvalidIndex as the first in the list of constants because it's not really related to numbers assigned because of the spaces (while NumNonOldGenClaims is).

Lgtm otherwise. Thanks for bearing with us.

@linzang
Copy link
Contributor Author

@linzang linzang commented Nov 4, 2020

Dear @tschatzl @kstefanj and @albertnetymk,
So appericated for your suggestion and guidance. I have upload a new commit.
would you like to help review it again.

BRs,
Lin

Co-authored-by: Stefan Johansson <54407259+kstefanj@users.noreply.github.com>
Copy link
Contributor

@kstefanj kstefanj left a comment

I'm good with the current state of the code. Will do some additional testing before being ready to sponsor though.

@openjdk
Copy link

@openjdk openjdk bot commented Nov 4, 2020

@linzang This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8252103: Parallel heap inspection for ParallelScavengeHeap

Reviewed-by: sjohanss, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 119 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kstefanj, @tschatzl) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Nov 4, 2020
Copy link
Contributor

@tschatzl tschatzl left a comment

Please remove the size_t cast before SIZE_MAX. SIZE_MAX defines the maximum value that a size_t can hold, so it must be of that type.

Approving with that minor fix request. @kstefanj will sponsor it.

@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Nov 5, 2020

@linzang the testing looks good and we are ready to take this in. Before doing /integrate please merge the latest changes into your branch, I will issue the sponsor command once this is done.

@linzang
Copy link
Contributor Author

@linzang linzang commented Nov 5, 2020

Hi @kstefanj,
Thanks for help push it .
The latest changes has already at branch jmap-par on linzang/jdk. I will add "/integrate"
BRs,
Lin

@linzang
Copy link
Contributor Author

@linzang linzang commented Nov 5, 2020

/integrate

@openjdk openjdk bot added the sponsor label Nov 5, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Nov 5, 2020

@linzang
Your change (at version 4cb0866) is now ready to be sponsored by a Committer.

@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Nov 5, 2020

@linzang, sorry I was unclear. I meant merge the latest master, but since this change is not touching to many files and no conflicts are reported I guess its fine to let the bots to the automatic rebase.

@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Nov 5, 2020

/sponsor

@openjdk openjdk bot closed this Nov 5, 2020
@openjdk openjdk bot added integrated and removed sponsor ready labels Nov 5, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Nov 5, 2020

@kstefanj @linzang Since your change was applied there have been 119 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit a6ce6a5.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk openjdk bot removed the rfr label Nov 5, 2020
@linzang linzang deleted the jmap-par branch Nov 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc integrated test-request
5 participants