8309953: Strengthen and optimize oopDesc age methods #14456

Closed
wants to merge 3 commits into from


shipilev
Member

@shipilev shipilev commented Jun 13, 2023

See the RFE for discussion. Basically, there is little reason to load the mark word twice when a single load will do.
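To make the idea concrete, here is a minimal, self-contained C++ sketch of the two shapes. It is not HotSpot code: MarkModel, ObjModel and their members are hypothetical, and the bit layout (lock bits in [1:0] with the "unlocked" bit at bit 0, a 4-bit age in [6:3], a displaced header reachable by clearing the lock bits) is an assumption loosely modeled on the AArch64 disassembly in this PR. It only illustrates the load pattern: the single-load version takes one snapshot of the mark word and makes every decision against it. It says nothing about whether concurrent callers are safe.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified mark-word model; NOT HotSpot's markWord.
// Assumed layout: lock bits in [1:0] (bit 0 = "unlocked"), 4-bit age in [6:3].
struct MarkModel {
  std::uintptr_t value;

  static constexpr std::uintptr_t kLockMask = 0x3;
  static constexpr std::uintptr_t kUnlockedBit = 0x1;
  static constexpr int kAgeShift = 3;
  static constexpr std::uintptr_t kAgeMask = 0xF;

  // Assumption: when the "unlocked" bit is clear, the header has been
  // displaced and the word, minus its lock bits, points at the saved header.
  bool has_displaced_header() const { return (value & kUnlockedBit) == 0; }

  MarkModel displaced_header() const {
    assert(has_displaced_header());
    return *reinterpret_cast<const MarkModel*>(value & ~kLockMask);
  }

  unsigned age() const {
    return static_cast<unsigned>((value >> kAgeShift) & kAgeMask);
  }
};

// Hypothetical object holding only a mark word.
struct ObjModel {
  std::atomic<std::uintptr_t> mark_word;

  explicit ObjModel(std::uintptr_t m) : mark_word(m) {}

  MarkModel mark() const {
    return MarkModel{mark_word.load(std::memory_order_relaxed)};
  }

  // Two-load shape: the check and the use each reload the mark word, so
  // they may observe different values if the word changes in between.
  unsigned age_two_loads() const {
    if (mark().has_displaced_header()) {
      return mark().displaced_header().age();
    }
    return mark().age();
  }

  // Single-load shape: one load; the check and the use agree by construction.
  unsigned age_one_load() const {
    MarkModel m = mark();
    return m.has_displaced_header() ? m.displaced_header().age() : m.age();
  }
};
```

Both versions return the same age when nobody touches the mark word; the difference is only in how many loads are issued and whether the check and the use can disagree.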

Additional testing:

  • Eyeballing generated code
  • Linux x86_64 fastdebug tier1 tier2 tier3
  • Linux AArch64 fastdebug tier1 tier2 tier3

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8309953: Strengthen and optimize oopDesc age methods (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14456/head:pull/14456
$ git checkout pull/14456

Update a local copy of the PR:
$ git checkout pull/14456
$ git pull https://git.openjdk.org/jdk.git pull/14456/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14456

View PR using the GUI difftool:
$ git pr show -t 14456

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14456.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 13, 2023

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 13, 2023
@openjdk

openjdk bot commented Jun 13, 2023

@shipilev The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jun 13, 2023
@mlbridge

mlbridge bot commented Jun 13, 2023

Webrevs

Contributor

@TheRealMDoerr TheRealMDoerr left a comment


LGTM.

@openjdk

openjdk bot commented Jun 13, 2023

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8309953: Strengthen and optimize oopDesc age methods

Reviewed-by: mdoerr, rkennke, tschatzl, stefank

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • 3e0bbd2: 8285368: Overhaul doc-comment inheritance
  • 3eeb681: 8167252: Some of Charset.availableCharsets() does not contain itself
  • 653a8d0: 8310129: SetupNativeCompilation LIBS should match the order of the other parameters
  • 947f149: 8308444: LoadStoreNode::result_not_used() is too conservative
  • 8b4af46: 8309974: some JVMCI tests fail when VM options include -XX:+EnableJVMCI
  • 0038491: 8309978: [x64] Fix useless padding
  • 5f3613e: 8309960: ParallelGC young collections very slow in DelayInducer
  • 83d9267: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr'
  • de8aca2: 8307907: [ppc] Remove RTM locking implementation
  • 4c0e164: 8309717: C2: Remove Arena::move_contents usage
  • ... and 12 more: https://git.openjdk.org/jdk/compare/bd79db3930f192f6742e29a63a6d1c3bc3dd3385...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 13, 2023
Contributor

@rkennke rkennke left a comment


Looks good as a minor performance optimization. I wouldn't claim that this actually fixes a race, though: this code is only safe at a safepoint to begin with (we might even want to assert that). If it were called outside of a safepoint, a competing thread might unlock a stack-lock, and reaching for the displaced header would blow up.

@rkennke
Contributor

rkennke commented Jun 13, 2023

BTW, GC code is full of idioms where we repeatedly load the mark word even though a single load would be better, for example: if (o->is_forwarded()) { ... o->forwardee() ... }
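The same single-load rewrite can be applied to the forwarding idiom. A minimal sketch, again on a hypothetical model rather than HotSpot's real oopDesc/markWord: FwdObjModel and its members are illustrative names, and the sketch assumes a forwarded object has both lock bits set (0b11) with the rest of the word holding the forwardee address.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical object model, NOT HotSpot's oopDesc. Assumption: a forwarded
// object has lock bits 0b11 and the rest of the word is the forwardee address.
struct FwdObjModel {
  static constexpr std::uintptr_t kLockMask = 0x3;
  static constexpr std::uintptr_t kMarkedValue = 0x3;

  std::atomic<std::uintptr_t> mark_word;

  explicit FwdObjModel(std::uintptr_t m) : mark_word(m) {}

  std::uintptr_t load_mark() const {
    return mark_word.load(std::memory_order_relaxed);
  }

  static bool is_forwarded(std::uintptr_t m) {
    return (m & kLockMask) == kMarkedValue;
  }

  // Decodes the forwardee address by clearing the lock bits.
  static FwdObjModel* forwardee(std::uintptr_t m) {
    assert(is_forwarded(m));
    return reinterpret_cast<FwdObjModel*>(m & ~kLockMask);
  }

  // Repeated-load idiom: the check and the decode each read the mark word.
  FwdObjModel* resolve_two_loads() {
    if (is_forwarded(load_mark())) {
      return forwardee(load_mark());
    }
    return this;
  }

  // Single-load idiom: read once, then check and decode from that snapshot.
  FwdObjModel* resolve_one_load() {
    std::uintptr_t m = load_mark();
    return is_forwarded(m) ? forwardee(m) : this;
  }
};
```

Making the check and decode helpers take the already-loaded word, instead of re-reading the object, is what lets callers share one load.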

@dholmes-ora
Member

I think this issue is overstated as the code is not intended to be thread-safe in the way suggested. So it is just a micro-optimisation, the value of which has not been shown, and which makes the source code somewhat clunky IMO.

@shipilev
Member Author

shipilev commented Jun 14, 2023

Moving the perf observations into a separate comment.

Sample generated code (AArch64) for oopDesc::age can be seen if we turn that method from an inline method into a regular one:

# Before

000000000080f440 <oopDesc::age>:
  80f440: ff 83 00 d1   sub     sp, sp, #32
  80f444: fd 7b 01 a9   stp     x29, x30, [sp, #16]
  80f448: fd 43 00 91   add     x29, sp, #16
  80f44c: 08 00 40 f9   ldr     x8, [x0]          ; <-- first mark load
  80f450: 89 27 00 d0   adrp    x9, 0xd01000 
  80f454: 1f 20 03 d5   nop     
  80f458: 29 95 4a b9   ldr     w9, [x9, #2708]
  80f45c: 0a 05 40 92   and     x10, x8, #0x3
  80f460: 5f 09 00 f1   cmp     x10, #2
  80f464: ea 17 9f 1a   cset    w10, eq
  80f468: 1f 01 40 f2   tst     x8, #0x1
  80f46c: e8 17 9f 1a   cset    w8, eq
  80f470: 3f 09 00 71   cmp     w9, #2
  80f474: 48 01 88 1a   csel    w8, w10, w8, eq
  80f478: 1f 05 00 71   cmp     w8, #1
  80f47c: 21 01 00 54   b.ne    0x80f4a0
  80f480: 08 00 40 f9   ldr     x8, [x0]          ; <-- second mark load
  80f484: e8 07 00 f9   str     x8, [sp, #8]
  80f488: e0 23 00 91   add     x0, sp, #8
  80f48c: c4 ed fd 97   bl      0x78ab9c
  80f490: 00 18 03 53   ubfx    w0, w0, #3, #4
  80f494: fd 7b 41 a9   ldp     x29, x30, [sp, #16]
  80f498: ff 83 00 91   add     sp, sp, #32
  80f49c: c0 03 5f d6   ret     
  80f4a0: 00 00 40 f9   ldr     x0, [x0]
  80f4a4: 00 18 03 53   ubfx    w0, w0, #3, #4
  80f4a8: fd 7b 41 a9   ldp     x29, x30, [sp, #16]
  80f4ac: ff 83 00 91   add     sp, sp, #32
  80f4b0: c0 03 5f d6   ret    

# After

000000000080f480 <oopDesc::age>:
  80f480: ff 83 00 d1   sub     sp, sp, #32
  80f484: fd 7b 01 a9   stp     x29, x30, [sp, #16]
  80f488: fd 43 00 91   add     x29, sp, #16
  80f48c: 00 00 40 f9   ldr     x0, [x0]          ; <-- load mark once
  80f490: e0 07 00 f9   str     x0, [sp, #8]
  80f494: 88 27 00 d0   adrp    x8, 0xd01000  
  80f498: 1f 20 03 d5   nop     
  80f49c: 08 95 4a b9   ldr     w8, [x8, #2708]
  80f4a0: 09 04 40 92   and     x9, x0, #0x3
  80f4a4: 3f 09 00 f1   cmp     x9, #2
  80f4a8: e9 17 9f 1a   cset    w9, eq
  80f4ac: 1f 00 40 f2   tst     x0, #0x1
  80f4b0: ea 17 9f 1a   cset    w10, eq
  80f4b4: 1f 09 00 71   cmp     w8, #2
  80f4b8: 28 01 8a 1a   csel    w8, w9, w10, eq
  80f4bc: 1f 05 00 71   cmp     w8, #1
  80f4c0: 61 00 00 54   b.ne    0x80f4cc 
  80f4c4: e0 23 00 91   add     x0, sp, #8
  80f4c8: c5 ed fd 97   bl      0x78abdc
  80f4cc: 00 18 03 53   ubfx    w0, w0, #3, #4
  80f4d0: fd 7b 41 a9   ldp     x29, x30, [sp, #16]
  80f4d4: ff 83 00 91   add     sp, sp, #32 
  80f4d8: c0 03 5f d6   ret     

Note how the method tail also folds: both paths now share a single ubfx/ldp/add/ret sequence, which saves 24 bytes in the instruction stream.
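The byte count can be checked directly against the two listings: AArch64 instructions are a fixed 4 bytes, so each function's size follows from its first and last instruction addresses.

```latex
\begin{aligned}
N_{\text{before}} &= \frac{\mathtt{0x80f4b0} - \mathtt{0x80f440}}{4} + 1 = \frac{112}{4} + 1 = 29
  \quad\Rightarrow\quad 29 \times 4 = 116\ \text{bytes} \\
N_{\text{after}}  &= \frac{\mathtt{0x80f4d8} - \mathtt{0x80f480}}{4} + 1 = \frac{88}{4} + 1 = 23
  \quad\Rightarrow\quad 23 \times 4 = 92\ \text{bytes} \\
116 - 92 &= 24\ \text{bytes} = 6\ \text{instructions}
\end{aligned}
```

The six instructions that disappear are the reload at 0x80f480 and the separate fast-path tail at 0x80f4a0..0x80f4b0 (ldr, ubfx, ldp, add, ret).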

On Linux x86_64, it gives an edge on a benchmark that stresses Serial Full GC:

public class Retain {
	static final int RETAINED = Integer.getInteger("retained", 10_000_000);
	static final int GCS      = Integer.getInteger("gcs", 100);

	static Object[] OBJECTS = new Object[RETAINED];

	public static void main(String... args) {
		for (int t = 0; t < GCS; t++) {
			for (int c = 0; c < RETAINED; c++) {
				OBJECTS[c] = new Object();
			}
			System.gc();
		}
	}
}
% build/linux-x86_64-server-release/images/jdk/bin/java -Xms1g -Xmx1g -XX:+AlwaysPreTouch -Xlog:gc -XX:+UseSerialGC Retain.java

# Before
[0.026s][info][gc] Using Serial
[1.229s][info][gc] GC(0) Pause Full (System.gc()) 235M->193M(989M) 343.793ms
[1.638s][info][gc] GC(1) Pause Full (System.gc()) 356M->195M(989M) 373.058ms
[2.043s][info][gc] GC(2) Pause Full (System.gc()) 349M->200M(989M) 372.158ms
[2.455s][info][gc] GC(3) Pause Full (System.gc()) 356M->193M(989M) 378.475ms
[2.865s][info][gc] GC(4) Pause Full (System.gc()) 346M->193M(989M) 376.314ms
[3.274s][info][gc] GC(5) Pause Full (System.gc()) 347M->193M(989M) 375.874ms
[3.680s][info][gc] GC(6) Pause Full (System.gc()) 348M->193M(989M) 372.843ms
[4.086s][info][gc] GC(7) Pause Full (System.gc()) 349M->193M(989M) 372.976ms
[4.492s][info][gc] GC(8) Pause Full (System.gc()) 349M->193M(989M) 372.891ms
[4.901s][info][gc] GC(9) Pause Full (System.gc()) 349M->193M(989M) 375.721ms

# After
[0.024s][info][gc] Using Serial
[1.219s][info][gc] GC(0) Pause Full (System.gc()) 235M->193M(989M) 337.456ms
[1.620s][info][gc] GC(1) Pause Full (System.gc()) 356M->195M(989M) 366.105ms
[2.020s][info][gc] GC(2) Pause Full (System.gc()) 349M->200M(989M) 365.884ms
[2.425s][info][gc] GC(3) Pause Full (System.gc()) 356M->193M(989M) 371.869ms
[2.828s][info][gc] GC(4) Pause Full (System.gc()) 346M->193M(989M) 369.387ms
[3.231s][info][gc] GC(5) Pause Full (System.gc()) 347M->193M(989M) 369.007ms
[3.630s][info][gc] GC(6) Pause Full (System.gc()) 348M->193M(989M) 366.324ms
[4.030s][info][gc] GC(7) Pause Full (System.gc()) 349M->193M(989M) 366.069ms
[4.429s][info][gc] GC(8) Pause Full (System.gc()) 349M->193M(989M) 365.901ms
[4.831s][info][gc] GC(9) Pause Full (System.gc()) 349M->193M(989M) 368.800ms

So, Full GC is about 1..2% faster, for a small change :)
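As a rough cross-check of that estimate, the ten pause times from each log can simply be averaged. This small C++ sketch does only that arithmetic on the numbers copied from the logs above; it is a single run with no variance analysis, so it is not a statistically rigorous comparison.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>

// Full GC pause times in ms, GC(0)..GC(9), copied from the two logs above.
constexpr std::array<double, 10> kBeforeMs = {
    343.793, 373.058, 372.158, 378.475, 376.314,
    375.874, 372.843, 372.976, 372.891, 375.721};
constexpr std::array<double, 10> kAfterMs = {
    337.456, 366.105, 365.884, 371.869, 369.387,
    369.007, 366.324, 366.069, 365.901, 368.800};

// Arithmetic mean of the pause times.
template <std::size_t N>
double mean_ms(const std::array<double, N>& pauses) {
  static_assert(N > 0, "need at least one sample");
  return std::accumulate(pauses.begin(), pauses.end(), 0.0) /
         static_cast<double>(N);
}

// Relative reduction of the mean pause, in percent (positive = faster).
double improvement_percent() {
  const double before = mean_ms(kBeforeMs);
  const double after = mean_ms(kAfterMs);
  return (before - after) / before * 100.0;
}
```

On these numbers the mean pause drops from about 371.4 ms to about 364.7 ms, a reduction of roughly 1.8%, which sits inside the 1..2% range quoted above.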

@dholmes-ora
Member

The issue is not whether we can construct a benchmark that demonstrates a gain with this kind of micro-optimisation, but whether the micro-optimisation provides sufficient gain in general to trade off against the reduction in code readability. I'm dubious about the value but will leave it to the GC folk to make the call. Thanks for the updated info.

/label add hotspot-gc

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Jun 15, 2023
@openjdk

openjdk bot commented Jun 15, 2023

@dholmes-ora
The hotspot-gc label was successfully added.

@shipilev
Member Author

Seems like we are good, as I see approvals from 3 GC engineers :)

/integrate

@openjdk

openjdk bot commented Jun 15, 2023

Going to push as commit 4a5475c.
Since your change was applied there have been 23 commits pushed to the master branch:

  • 79ff72a: 8308499: Test vmTestbase/nsk/jdi/MethodExitRequest/addClassExclusionFilter/filter001/TestDescription.java failed: VMDisconnectedException
  • 3e0bbd2: 8285368: Overhaul doc-comment inheritance
  • 3eeb681: 8167252: Some of Charset.availableCharsets() does not contain itself
  • 653a8d0: 8310129: SetupNativeCompilation LIBS should match the order of the other parameters
  • 947f149: 8308444: LoadStoreNode::result_not_used() is too conservative
  • 8b4af46: 8309974: some JVMCI tests fail when VM options include -XX:+EnableJVMCI
  • 0038491: 8309978: [x64] Fix useless padding
  • 5f3613e: 8309960: ParallelGC young collections very slow in DelayInducer
  • 83d9267: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr'
  • de8aca2: 8307907: [ppc] Remove RTM locking implementation
  • ... and 13 more: https://git.openjdk.org/jdk/compare/bd79db3930f192f6742e29a63a6d1c3bc3dd3385...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 15, 2023
@openjdk openjdk bot closed this Jun 15, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 15, 2023
@openjdk

openjdk bot commented Jun 15, 2023

@shipilev Pushed as commit 4a5475c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8309953-oopdesc-age branch August 10, 2023 08:54
Labels
hotspot hotspot-dev@openjdk.org hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated

6 participants