8326385: [aarch64] C2: lightweight locking nodes kill the box register without specifying this effect #18183
lgtm.
- tier1-7 (linux-aarch64 and macosx-x64) with -XX:LockingMode=2.
I guess this was meant to say macosx-aarch64.
Thanks for reviewing, Axel!
Right, good catch, updated in the description.
I've got a question.
Also, what about the other arches?
@@ -16019,33 +16019,33 @@ instruct cmpFastUnlock(rFlagsReg cr, iRegP object, iRegP box, iRegPNoSp tmp, iRe
   ins_pipe(pipe_serial);
 %}

-instruct cmpFastLockLightweight(rFlagsReg cr, iRegP object, iRegP box, iRegPNoSp tmp, iRegPNoSp tmp2)
+instruct cmpFastLockLightweight(rFlagsReg cr, iRegP object, iRegP box, iRegPNoSp tmp, iRegPNoSp tmp2, iRegPNoSp tmp3)
Do we need to specify the box register at all, if we never use it? It means that the register allocator assigns an actual register to it, right? This could be a problem in workloads that are both locking-intensive and with high register pressure. You may just not see it with dacapo, etc, because aarch64 has so many registers to begin with.
There is, unfortunately, no ADL construction to specify that an operand such as box is not used at all. Yes, the register allocator will assign a register to box. Avoiding this would require fairly intrusive changes in C2 (add lightweight locking-specific, single-input versions of the FastLock and FastUnlock nodes and adapt all the C2 logic that deals with them), which I think would be best addressed in a separate RFE (assuming the additional register pressure is a problem in practice). Furthermore, to my understanding, the box operand is likely to be needed again in the context of Lilliput's OMWorld sub-project.
RISC-V and x64/x86 both bind the box to a specific register, so it can be (and is) specified as USE_KILL. PPC64 (and aarch64 after this PR) uses an extra register allocation and does not kill the box.
I believe that would require rewriting large parts of the C2 FastLockNode. It is modelled as a CmpNode.
 %{
   predicate(LockingMode == LM_LIGHTWEIGHT);
   match(Set cr (FastLock object box));
-  effect(TEMP tmp, TEMP tmp2);
+  effect(TEMP tmp, TEMP tmp2, TEMP tmp3);
Why not use box as the temp instead of introducing a separate temp?
-  effect(TEMP tmp, TEMP tmp2, TEMP tmp3);
+  effect(TEMP tmp, TEMP tmp2, TEMP box);
Making an input TEMP will crash inside C2 when doing register allocation.
assert(opcnt < numopnds) failed: Accessing non-existent operand
V [libjvm.dylib+0xcc650c] MachNode::in_RegMask(unsigned int) const+0x1f0
V [libjvm.dylib+0x3d136c] PhaseChaitin::gather_lrg_masks(bool)+0x1130
V [libjvm.dylib+0x3cef9c] PhaseChaitin::Register_Allocate()+0x150
V [libjvm.dylib+0x4c3860] Compile::Code_Gen()+0x1f4
V [libjvm.dylib+0x4c17a4] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1388
V [libjvm.dylib+0x38abf0] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1e0
V [libjvm.dylib+0x4df028] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x854
V [libjvm.dylib+0x4de46c] CompileBroker::compiler_thread_loop()+0x348
V [libjvm.dylib+0x8c10e4] JavaThread::thread_main_inner()+0x1dc
V [libjvm.dylib+0x117f7f8] Thread::call_run()+0xf4
V [libjvm.dylib+0xe53724] thread_native_entry(Thread*)+0x138
C [libsystem_pthread.dylib+0x7034] _pthread_start+0x88
Maybe this can be resolved and support for TEMP input registers can be added.
Thanks for reviewing, Axel and Dean! And thanks Axel for trying out Dean's suggestion!
/integrate
Going to push as commit 07acc0b.
Your commit was automatically rebased without conflicts.
@robcasloz Pushed as commit 07acc0b. You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
This changeset introduces a third TEMP register for the intermediate computations in the cmpFastLockLightweight and cmpFastUnlockLightweight aarch64 ADL instructions, instead of using the legacy box register. This prevents potential overwrites, and consequent erroneous uses, of box.

Introducing a new TEMP seems conceptually simpler (and not necessarily worse from a performance perspective) than pre-assigning box an arbitrary register and marking it as USE_KILL, an alternative also suggested in the JBS issue description. Compared to mainline, the changeset does not lead to any statistically significant regression in a set of locking-intensive benchmarks from DaCapo, Renaissance, SPECjvm2008, and SPECjbb2015.

Testing
- tier1-7 (linux-aarch64 and macosx-aarch64) with -XX:LockingMode=2.
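The hazard described in the PR summary (an instruction overwriting box without declaring that effect) can be illustrated with a toy model. This is only a sketch and not HotSpot code: the RegFile type, the toy_fast_lock function, and the register indices are all hypothetical, and the "locking sequence" is reduced to a single scratch write. It shows that when the scratch register aliases box, a later reader of box sees a clobbered value, whereas a dedicated third temporary leaves box intact.

```cpp
#include <array>
#include <cstdint>

// Toy register file (hypothetical); indices stand in for physical registers.
using RegFile = std::array<std::uint64_t, 8>;

// Hypothetical stand-in for the lightweight-locking sequence: it needs one
// scratch register for intermediate values and never reads the box register.
void toy_fast_lock(RegFile& regs, int object, int scratch) {
    // The intermediate computation overwrites whatever the scratch register held.
    regs[scratch] = regs[object] ^ 0x1;
}

// Returns the value that a later instruction would read from the box register.
// scratch_is_box == true models the old behaviour (box reused as scratch
// without declaring the kill); false models a separate temporary (tmp3).
std::uint64_t box_after_lock(bool scratch_is_box) {
    RegFile regs{};
    const int object = 0, box = 1, tmp3 = 2;
    regs[object] = 0x1000;
    regs[box] = 0xB0B0;  // value the register allocator assumes stays live
    toy_fast_lock(regs, object, scratch_is_box ? box : tmp3);
    return regs[box];
}
```

In this model, box_after_lock(true) no longer returns the original box value, while box_after_lock(false) does. The real register allocator has the same blind spot whenever an instruction's effects are under-specified, which is why the change either needs a declared kill (USE_KILL on a fixed register) or a separate TEMP.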
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18183/head:pull/18183
$ git checkout pull/18183
Update a local copy of the PR:
$ git checkout pull/18183
$ git pull https://git.openjdk.org/jdk.git pull/18183/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18183
View PR using the GUI difftool:
$ git pr show -t 18183
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18183.diff