Skip to content

8247928: Refactor G1ConcurrentMarkThread for mark abort #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

tschatzl
Copy link
Contributor

@tschatzl tschatzl commented Sep 7, 2020

Hi all,

this is a continuation of the review of the same CR at https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030794.html; the current change is "Version 3" above, the relevant email is https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030812.html copied verbatim here again:

Hi Kim,

On 03.09.20 19:17, Kim Barrett wrote:

On Sep 3, 2020, at 6:07 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
On 03.09.20 09:24, Kim Barrett wrote:

It seems backward that the phase functions return true to indicate aborted,
overflow, or whatever, and return false for successful completion. And
there's no mention of what the return values are in the description in the
header file.

I think the code is easier to read without the need to always negate the result. Ie. in the following example

if (!phase1) return;
if (!phase2) return;

the "!" detracts imho from reading the steps.

Another option I can see could be a macro hiding all this, e.g.

EXEC(phase1);
EXEC(phase2);

but I kind of dislike that even more.

Added documentation.

I agree a macro is not the way to go here. I'm less bothered by the need for
lots of preceeding "!" than you are, but I certainly see your point. So I'm
okay with the code as is, now that there's a comment to set expectations.

http://cr.openjdk.java.net/~tschatzl/8247928/webrev.1_to_2 (diff)
http://cr.openjdk.java.net/~tschatzl/8247928/webrev.2 (full)

I like the overall approach a lot. Thanks for doing this cleanup.

Although I have a number of comments, there's nothing major here.


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
190 uint iter = 1;
191 while (true) {
...
209 log_info(gc, marking)("Concurrent Mark Restart for Mark Stack Overflow (iteration #%u)",
210 iter++);

I think that's the only use of iter, and that it's not used after the loop.
How about instead

for (uint iter = 1; true; ++iter) {

and remove the increment in the logging statement.

Done.


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
191 while (true) {
...
211 }
212
213 Ticks mark_end = Ticks::now();
214 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms",
215 mark_start.seconds(), mark_end.seconds(),
216 (mark_end - mark_start).seconds() * 1000.0);

This is a change to the logging. Some consequences I like, others I'm not so
sure about. You talked about some of this in the RFR description, but I
want to go through it carefully here.

  • This now includes the remark pause in the time, where previously that
    seemed to be intentionally excluded. I don't know the rationale for that
    exclusion, but it was pretty clearly intentional since it intentionally (per
    comment) contorted the code somewhat. So it's hard to decide whether this is
    okay. The remark time could be excluded though by capturing the mark_end
    time just before calling subphase_remark().

Since within the "Concurrent Mark" time span there can be any number of
other (gc) stw pauses, I am sceptical how subtracting only the Remark
pause this could be anything but misleading.

It could be certainly interesting to provide some concurrent duration
for the whole marking, but I think this out of scope here.

I do not know what the old "Concurrent Mark" end message actually
measured: if you look, the first message will not contain remark pause,
the next ones after a mark restart do contain them (so all time
excluding the last Remark pause). It is neither just the concurrent time
nor the full marking time. Not sure how useful that is.

This code simplifies the message to something like: from start of
marking related activity until all marking activity fully completed.

Actually there are still a few inconsistencies in the
(gc=info,gc+marking=info) log output. Let me give an example:

[23.160s][info][gc ] GC(225) Concurrent Cycle
[23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks
[23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks 0.009ms
[23.160s][info][gc,marking] GC(225) Concurrent Scan Root Regions
[23.209s][info][gc,marking] GC(225) Concurrent Scan Root Regions 48.811ms
[23.209s][info][gc,marking] GC(225) Concurrent Mark (23.209s)

^-- only the "Concurrent Mark" messages has start time - I think it is
superfluous, just repeating the output of UL.

[23.209s][info][gc,marking] GC(225) Concurrent Mark From Roots
[...]
[24.792s][info][gc,marking] GC(225) Concurrent Mark From Roots 1583.035ms
[24.792s][info][gc,marking] GC(225) Concurrent Preclean
[24.793s][info][gc,marking] GC(225) Concurrent Preclean 0.379ms
[24.886s][info][gc ] GC(225) Pause Remark 552M->225M(760M) 93.108ms
[24.886s][info][gc,marking] GC(225) Concurrent Mark (23.209s, 24.886s)
1676.831ms

^-- same here with start/end times.

[24.886s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
[...]
[31.592s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
6706.292ms
[31.594s][info][gc ] GC(225) Pause Cleanup 451M->451M(1024M) 0.774ms
[31.594s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark
[31.600s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark 6.298ms
[31.600s][info][gc ] GC(225) Concurrent Cycle 8439.504ms

I fixed those two in the next patch next week I will publish using a
github PR...

Another inconsistency is that the Concurrent Mark Abort message does not
contain any duration information (but UL captures the "end" time stamp).
But I would like to defer that (existing) issue.

  • It only does this logging once, rather than on each iteration. That seems
    good; as you pointed out in the RFR, multiples are pretty likely handled
    wrong by some uses.

  • This logging no longer happens if marking is aborted by remark. That's
    now consistent with other abort cases, so probably an improvement.

Agree to both.


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
137 while (!should_terminate()) {
138 if (wait_for_next_cycle()) {
139 break;
140 }

This could instead just be

while (!wait_for_next_cycle()) { ...

But I already thought the sense of the wait_for_next_cycle result was
backward, and this makes an even stronger case for that. I think better
would be to reverse the sense of the result and use

while (wait_for_next_cycle()) { ...

+1. Done.


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
65 bool phase_concurrent_cycle_start();

Unused and undefined.

Done.


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
63 // ConcurrentGCBreakpoints must not be placed before the the root
64 // region scan phase too for this reason.

This comment seems misplaced here. I think it belongs in the implementation
of full_concurrent_cycle_do, in conjunction with the comment about not
returning before the scan root regions phase.

I think the rest of the comment about STS &etc also seems misplaced here and
should be similarly moved (more or less where it used to be).


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
66 bool phase_clear_cld_claimed_marks();

The result of this function is unused, because of the restriction against
returning before the scan root regions phase. I think I'd prefer it return
void, to make it clear there's nothing to be done with it. Even though that
makes its signature different from all the phase functions.


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
81 // Perform a full concurrent cycle.

This comment seems kind of vacuous. The function also in some sense doesn't
even do that, since it doesn't include the start/end parts.


src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
70 bool mark_loop_needs_restart() const;

This one seems out of order or otherwise misplaced here. Seems like it
should be after subphase_remark. Or maybe it shouldn't be in the phase
function section at all.


All fixed.

Testing: tier1 currently running.

Thanks,
Thomas


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8247928: Refactor G1ConcurrentMarkThread for mark abort

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/44/head:pull/44
$ git checkout pull/44

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 7, 2020

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 7, 2020
@openjdk
Copy link

openjdk bot commented Sep 7, 2020

@tschatzl The following label will be automatically applied to this pull request: hotspot-gc.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label (add|remove) "label" command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Sep 7, 2020
@mlbridge
Copy link

mlbridge bot commented Sep 7, 2020

Webrevs

Copy link
Contributor

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on GitHub as well 👍

@openjdk
Copy link

openjdk bot commented Sep 7, 2020

@tschatzl This change now passes all automated pre-integration checks. In addition to the automated checks, the change must also fulfill all project specific requirements

After integration, the commit message will be:

8247928: Refactor G1ConcurrentMarkThread for mark abort

Reviewed-by: sjohanss, kbarrett
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /issue command.

Since the source branch of this PR was last updated there have been 46 commits pushed to the master branch:

  • 7ccf435: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
  • d236cf4: 8252995: Non-PCH builds broken by JDK-8250961
  • 6a00534: 8250961: Move Universe::update_heap_info_at_gc to CollectedHeap
  • f78f780: 8252889: Obsolete -XX:+InsertMemBarAfterArraycopy
  • f933961: 8252980: comment only changes extracted from JDK-8247281
  • 4333942: 8250217: com.sun.tools.javac.api.JavacTaskImpl swallows compiler exceptions potentially producing false positive test results
  • 5166094: 8252957: Wrong comment in CgroupV1Subsystem::cpu_quota
  • 6329de4: 8248532: Every time I change keyboard language at my MacBook, Java crashes
  • d560964: 8252794: Creation of JNIMethodBlock should be done with a leaf lock
  • 5fef8dd: 8235229: Compilation against a modular, multi-release JAR erroneous with --release
  • ... and 36 more: https://git.openjdk.java.net/jdk/compare/e25f2a5715bd6a7b1ca4c7aab597c32b1f2e1367...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge master into your branch, and then specify the current head hash when integrating, like this: /integrate 7ccf4358256a0fef895e4081f90b04f71d21bf9d.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 7, 2020
@tschatzl
Copy link
Contributor Author

/reviewer 2

@tschatzl
Copy link
Contributor Author

@kimbarrett : could you have a look as the latest change is due to your comments? Thanks.

@openjdk
Copy link

openjdk bot commented Sep 10, 2020

@tschatzl Syntax: /reviewer (add|remove) [@user | openjdk-user]+. For example:

  • /reviewer add @openjdk-bot
  • /reviewer add duke
  • /reviewer add @user1 @user2

@tschatzl
Copy link
Contributor Author

/integrate

@openjdk openjdk bot closed this Sep 10, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 10, 2020
@openjdk
Copy link

openjdk bot commented Sep 10, 2020

@tschatzl Since your change was applied there have been 46 commits pushed to the master branch:

  • 7ccf435: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
  • d236cf4: 8252995: Non-PCH builds broken by JDK-8250961
  • 6a00534: 8250961: Move Universe::update_heap_info_at_gc to CollectedHeap
  • f78f780: 8252889: Obsolete -XX:+InsertMemBarAfterArraycopy
  • f933961: 8252980: comment only changes extracted from JDK-8247281
  • 4333942: 8250217: com.sun.tools.javac.api.JavacTaskImpl swallows compiler exceptions potentially producing false positive test results
  • 5166094: 8252957: Wrong comment in CgroupV1Subsystem::cpu_quota
  • 6329de4: 8248532: Every time I change keyboard language at my MacBook, Java crashes
  • d560964: 8252794: Creation of JNIMethodBlock should be done with a leaf lock
  • 5fef8dd: 8235229: Compilation against a modular, multi-release JAR erroneous with --release
  • ... and 36 more: https://git.openjdk.java.net/jdk/compare/e25f2a5715bd6a7b1ca4c7aab597c32b1f2e1367...master

Your commit was automatically rebased without conflicts.

Pushed as commit 8db3335.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the 8247928-refactor-concurrent-mark-thread branch September 10, 2020 11:12
openjdk-notifier bot pushed a commit that referenced this pull request Oct 5, 2021
r18 should not be used as it is reserved as platform register. Linux is
fine with userspace using it, but Windows and also recently macOS (
openjdk/jdk11u-dev#301 (comment) )
are actually using it on the kernel side.

The macro assembler uses the bit pattern `0x7fffffff` (== `r0-r30`) to
specify which registers to spill; fortunately this helper is only used
here:
https://github.com/openjdk/jdk/blob/c05dc268acaf87236f30cf700ea3ac778e3b20e5/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp#L1400-L1404

I haven't seen causing this particular instance any issues in practice
_yet_, presumably because it looks hard to align the stars in order to
trigger a problem (between stp and ldp of r18 a transition to kernel
space must happen *and* the kernel needs to do something with r18). But
jdk11u-dev has more usages of the `::pusha`/`::popa` macro and that
causes troubles as explained in the link above.

Output of `-XX:+PrintInterpreter` before this change:
```
----------------------------------------------------------------------
method entry point (kind = native)  [0x0000000138809b00, 0x000000013880a280]  1920 bytes
--------------------------------------------------------------------------------
  0x0000000138809b00:   ldr x2, [x12, #16]
  0x0000000138809b04:   ldrh    w2, [x2, #44]
  0x0000000138809b08:   add x24, x20, x2, uxtx #3
  0x0000000138809b0c:   sub x24, x24, #0x8
[...]
  0x0000000138809fa4:   stp x16, x17, [sp, #128]
  0x0000000138809fa8:   stp x18, x19, [sp, #144]
  0x0000000138809fac:   stp x20, x21, [sp, #160]
[...]
  0x0000000138809fc0:   stp x30, xzr, [sp, #240]
  0x0000000138809fc4:   mov x0, x28
 ;; 0x10864ACCC
  0x0000000138809fc8:   mov x9, #0xaccc                 // #44236
  0x0000000138809fcc:   movk    x9, #0x864, lsl #16
  0x0000000138809fd0:   movk    x9, #0x1, lsl #32
  0x0000000138809fd4:   blr x9
  0x0000000138809fd8:   ldp x2, x3, [sp, #16]
[...]
  0x0000000138809ff4:   ldp x16, x17, [sp, #128]
  0x0000000138809ff8:   ldp x18, x19, [sp, #144]
  0x0000000138809ffc:   ldp x20, x21, [sp, #160]
```

After:
```
----------------------------------------------------------------------
method entry point (kind = native)  [0x0000000108e4db00, 0x0000000108e4e280]  1920 bytes

--------------------------------------------------------------------------------
  0x0000000108e4db00:   ldr x2, [x12, #16]
  0x0000000108e4db04:   ldrh    w2, [x2, #44]
  0x0000000108e4db08:   add x24, x20, x2, uxtx #3
  0x0000000108e4db0c:   sub x24, x24, #0x8
[...]
  0x0000000108e4dfa4:   stp x16, x17, [sp, #128]
  0x0000000108e4dfa8:   stp x19, x20, [sp, #144]
  0x0000000108e4dfac:   stp x21, x22, [sp, #160]
[...]
  0x0000000108e4dfbc:   stp x29, x30, [sp, #224]
  0x0000000108e4dfc0:   mov x0, x28
 ;; 0x107E4A06C
  0x0000000108e4dfc4:   mov x9, #0xa06c                 // #41068
  0x0000000108e4dfc8:   movk    x9, #0x7e4, lsl #16
  0x0000000108e4dfcc:   movk    x9, #0x1, lsl #32
  0x0000000108e4dfd0:   blr x9
  0x0000000108e4dfd4:   ldp x2, x3, [sp, #16]
[...]
  0x0000000108e4dff0:   ldp x16, x17, [sp, #128]
  0x0000000108e4dff4:   ldp x19, x20, [sp, #144]
  0x0000000108e4dff8:   ldp x21, x22, [sp, #160]
[...]
```
fg1417 pushed a commit to fg1417/jdk that referenced this pull request Aug 17, 2022
After JDK-8283091, the loop below can be vectorized partially.
Statement 1 can be vectorized but statement 2 can't.
```
// int[] iArr; long[] lArrFld; int i1,i2;
for (i1 = 6; i1 < 227; i1++) {
  iArr[i1] += lArrFld[i1]++; // statement 1
  iArr[i1 + 1] -= (i2++); // statement 2
}
```

But we got incorrect results because the vector packs of iArr are
scheduled incorrectly like:
```
...
load_vector XMM1,[R8 + openjdk#16 + R11 << openjdk#2]
movl    RDI, [R8 + openjdk#20 + R11 << openjdk#2] # int
load_vector XMM2,[R9 + openjdk#8 + R11 << openjdk#3]
subl    RDI, R11    # int
vpaddq  XMM3,XMM2,XMM0  ! add packedL
store_vector [R9 + openjdk#8 + R11 << openjdk#3],XMM3
vector_cast_l2x  XMM2,XMM2  !
vpaddd  XMM1,XMM2,XMM1  ! add packedI
addl    RDI, openjdk#228   # int
movl    [R8 + openjdk#20 + R11 << openjdk#2], RDI # int
movl    RBX, [R8 + openjdk#24 + R11 << openjdk#2] # int
subl    RBX, R11    # int
addl    RBX, openjdk#227   # int
movl    [R8 + openjdk#24 + R11 << openjdk#2], RBX # int
...
movl    RBX, [R8 + openjdk#40 + R11 << openjdk#2] # int
subl    RBX, R11    # int
addl    RBX, openjdk#223   # int
movl    [R8 + openjdk#40 + R11 << openjdk#2], RBX # int
movl    RDI, [R8 + openjdk#44 + R11 << openjdk#2] # int
subl    RDI, R11    # int
addl    RDI, openjdk#222   # int
movl    [R8 + openjdk#44 + R11 << openjdk#2], RDI # int
store_vector [R8 + openjdk#16 + R11 << openjdk#2],XMM1
...
```
simplified as:
```
load_vector iArr in statement 1
unvectorized loads/stores in statement 2
store_vector iArr in statement 1
```
We cannot pick the memory state from the first load for LoadI pack
here, as the LoadI vector operation must load the new values in memory
after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2).
We must take the memory state of the last load where we have assigned
new values ('iArr[i1 + 1] - (i2++)') to the iArr array.

In JDK-8240281, we picked the memory state of the first load. Different
from the scenario in JDK-8240281, the store, which is dependent on an
earlier load here, is in a pack to be scheduled and the LoadI pack
depends on the last_mem. As designed[2], to schedule the StoreI pack,
all memory operations in another single pack should be moved in the same
direction. We know that the store in the pack depends on one of loads in
the LoadI pack, so the LoadI pack should be scheduled before the StoreI
pack. And the LoadI pack depends on the last_mem, so the last_mem must
be scheduled before the LoadI pack and also before the store pack.
Therefore, we need to take the memory state of the last load for the
LoadI pack here.

To fix it, the pack adds additional checks while picking the memory state
of the first load. When the store locates in a pack and the load pack
relies on the last_mem, we shouldn't choose the memory state of the
first load but choose the memory state of the last load.

[1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
[2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232

Jira: ENTLLT-5482
Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8
CustomizedGitHooks: yes
asotona added a commit to asotona/jdk that referenced this pull request Nov 4, 2022
Co-authored-by: Christian Stein <sormuras@gmail.com>
asotona added a commit to asotona/jdk that referenced this pull request Nov 4, 2022
Co-authored-by: Christian Stein <sormuras@gmail.com>
asotona added a commit to asotona/jdk that referenced this pull request Feb 6, 2023
Co-authored-by: Christian Stein <sormuras@gmail.com>
caojoshua pushed a commit to caojoshua/jdk that referenced this pull request Jun 1, 2023
1. bootstrap-version: java -Xcomp -version with PEA. currently, it's still
   broken due to JVM-2007 and JVM-1987.
2. hotspot-tier1: execute tier1 tests with PEA. we still have 50~
   errors.

Co-authored-by: Xin Liu <xxinliu@amazon.com>
robehn pushed a commit to robehn/jdk that referenced this pull request Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants