Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8293351: Add second tmp register to aarch64 BarrierSetAssembler::load_at #10161

Closed
wants to merge 6 commits into from

Conversation

xmas92
Copy link
Member

@xmas92 xmas92 commented Sep 5, 2022

Add a second tmp register to the BarrierSetAssembler::load_at GC API for aarch64.

Today G1 and Shenandoah uses a second temporary register. This will also be the case for generational ZGC.

Testing: Oracle platforms tier 1-3


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8293351: Add second tmp register to aarch64 BarrierSetAssembler::load_at

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10161/head:pull/10161
$ git checkout pull/10161

Update a local copy of the PR:
$ git checkout pull/10161
$ git pull https://git.openjdk.org/jdk pull/10161/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10161

View PR using the GUI difftool:
$ git pr show -t 10161

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10161.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 5, 2022

👋 Welcome back aboldtch! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 5, 2022
@openjdk
Copy link

openjdk bot commented Sep 5, 2022

@xmas92 The following labels will be automatically applied to this pull request:

  • hotspot
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Sep 5, 2022
@mlbridge
Copy link

mlbridge bot commented Sep 5, 2022

Webrevs

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is s welcome improvement. Thanks.

@openjdk
Copy link

openjdk bot commented Sep 5, 2022

@xmas92 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8293351: Add second tmp register to aarch64 BarrierSetAssembler::load_at

Reviewed-by: aph, tschatzl, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 98 new commits pushed to the master branch:

  • 37234c8: 8291912: Introduce per-allocation target struct for members in G1PLABAllocator
  • 1e1db5d: 8292591: Experimentally add back barrier-less Java thread transitions
  • dfc16e0: 8292302: Windows GetLastError value overwritten by ThreadLocalStorage::thread
  • 8bd79d3: 8170305: URLConnection doesn't handle HTTP/1.1 1xx (informational) messages
  • 9cd3e35: 4834298: JFileChooser.getSelectedFiles() failed with multi-selection and double-click
  • ec2629c: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON
  • cbee0bc: 8292587: AArch64: Support SVE fabd instruction
  • 68645eb: 8293566: RISC-V: Clean up push and pop registers
  • 526eb54: 8293669: SA: Remove unnecssary "InstanceStackChunkKlass: InstanceStackChunkKlass" output when scanning heap
  • 41ce658: 8292225: Rename ArchiveBuilder APIs related to source and buffered addresses
  • ... and 88 more: https://git.openjdk.org/jdk/compare/e945619ddd38091eaa010f65472141443b8f214d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph, @tschatzl, @RealFYang) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 5, 2022
@@ -2463,7 +2463,7 @@ void MacroAssembler::verify_heapbase(const char* msg) {
}
#endif

void MacroAssembler::resolve_jobject(Register value, Register thread, Register tmp) {
void MacroAssembler::resolve_jobject(Register value, Register tmp1, Register tmp2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to rename the second parameter for its declaration in file macroAssembler_aarch64.hpp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch 👍 . Fixed

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind a few more tweaks? Thanks.

@@ -69,7 +70,7 @@ class G1BarrierSetAssembler: public ModRefBarrierSetAssembler {
#endif

void load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
Register dst, Address src, Register tmp1, Register tmp_thread);
Register dst, Address src, Register tmp1, Register tmp2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need similar modification for function definition in file barrierSetAssembler_aarch64.cpp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 . Fixed

@@ -288,8 +288,7 @@ void BarrierSetAssembler::c2i_entry_barrier(MacroAssembler* masm) {
__ stp(r10, r11, Address(__ pre(sp, -2 * wordSize)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to save r11 then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As resolve_weak_handle may call into the VM r11 may still be clobbered. The reason that rscratch[1/2] was not used previously was because some GC in the resolve_weak_handle call graph used them implicitly. This is all explicit now so rscratch is used instead as it seems like it is the way the GC API has gone, with all registers being explicit.

There is still some room for improvement here with regards to being more consistent with what registers are used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"As resolve_weak_handle may call into the VM r11 may still be clobbered. "
-- I don't quite understand this. If this is the case, shouldn't r11 be saved & restored immediate before and after the VM call together with other live caller-save registers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? Actually looking at it a second time it seems like r0-r7 and r9-r17 are pushed to the stack at the vm call I was looking at.
I am probably missing something. I do not fully understand the reason for pushing the registers at this point, not earlier or later. Would have to investigate further. I do however think that such a change belongs to a separate issue/PR. This PR should not change the instruction emission, only which registers are used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason is that register r11 was used as temp register (and thus gets clobbered) by resolve_weak_handle previously. That's why this register was saved/restored here. But this has changed since it not used here anymore with your change. I am OK if you want to handle that in another PR. Just reminds. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into it some further. You are correct. The reason I thought it was important was because this change hard crashed.

+ __ str(r10, Address(__ pre(sp, -wordSize)));
- __ stp(r10, r11, Address(__ pre(sp, -2 * wordSize)));

But I guess it is because I am misaligning the stack or there is some other reason sp cannot be bumped by 8, only 16. As this change is fine.

+ __ str(r10, Address(__ pre(sp, -2 * wordSize)));
- __ stp(r10, r11, Address(__ pre(sp, -2 * wordSize)));

Maybe using the RegSets and push would do this correctly, or it is some special invariant for these adapters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stackpointer needs 128 bit/64 byte alignment. use RegSets and MacroAssembler::push(RegSet, sp);, which will only push pairs of register. In this case: stp(r10, zr, Address(__ pre(sp, -2 * wordSize)));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix using MacroAssembler::push(RegSet, sp);

@@ -46,6 +46,7 @@ class G1BarrierSetAssembler: public ModRefBarrierSetAssembler {
Register pre_val,
Register thread,
Register tmp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use tmp1 as the definition site does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Fixed. Made all two temporary register names consistent across g1BarrierSetAssembler

@@ -274,6 +275,7 @@ void G1BarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorator
dst /* pre_val */,
rthread /* thread */,
tmp1 /* tmp */,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tmp1 and tmp2 the comments, to match the paramter names, would also be "/* tmp1 /" and "/ tmp2 */".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

@@ -296,6 +298,7 @@ void G1BarrierSetAssembler::oop_store_at(MacroAssembler* masm, DecoratorSet deco
tmp2 /* pre_val */,
rthread /* thread */,
tmp1 /* tmp */,
rscratch2 /* tmp */,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"tmp1 /* tmp1 /," and "rscratch2 / tmp2 */," to match the signature argument names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

@@ -288,8 +288,7 @@ void BarrierSetAssembler::c2i_entry_barrier(MacroAssembler* masm) {
__ stp(r10, r11, Address(__ pre(sp, -2 * wordSize)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stackpointer needs 128 bit/64 byte alignment. use RegSets and MacroAssembler::push(RegSet, sp);, which will only push pairs of register. In this case: stp(r10, zr, Address(__ pre(sp, -2 * wordSize)));

@mlbridge
Copy link

mlbridge bot commented Sep 7, 2022

Mailing list message from Andrew Haley on hotspot-dev:

On 9/6/22 08:28, Axel Boldt-Christmas wrote:

Maybe using the RegSets and `push` would do this correctly

It would. Using a RegSet for push and pop also reduces the risk of error.

@xmas92
Copy link
Member Author

xmas92 commented Sep 8, 2022

@shqking @stooart-mon Your feedback have been addressed with code changes. Is there anything more you would like to add?
@tschatzl @theRealAph @RealFYang The only change since your approval has been to change comments and argument names, as well as changed from stp/ldp to push/pop with RegSet for correctness.

@xmas92
Copy link
Member Author

xmas92 commented Sep 13, 2022

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 13, 2022
@openjdk
Copy link

openjdk bot commented Sep 13, 2022

@xmas92
Your change (at version 1572b08) is now ready to be sponsored by a Committer.

@stefank
Copy link
Member

stefank commented Sep 13, 2022

/sponsor

@openjdk
Copy link

openjdk bot commented Sep 13, 2022

Going to push as commit 725f41f.
Since your change was applied there have been 98 commits pushed to the master branch:

  • 37234c8: 8291912: Introduce per-allocation target struct for members in G1PLABAllocator
  • 1e1db5d: 8292591: Experimentally add back barrier-less Java thread transitions
  • dfc16e0: 8292302: Windows GetLastError value overwritten by ThreadLocalStorage::thread
  • 8bd79d3: 8170305: URLConnection doesn't handle HTTP/1.1 1xx (informational) messages
  • 9cd3e35: 4834298: JFileChooser.getSelectedFiles() failed with multi-selection and double-click
  • ec2629c: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON
  • cbee0bc: 8292587: AArch64: Support SVE fabd instruction
  • 68645eb: 8293566: RISC-V: Clean up push and pop registers
  • 526eb54: 8293669: SA: Remove unnecssary "InstanceStackChunkKlass: InstanceStackChunkKlass" output when scanning heap
  • 41ce658: 8292225: Rename ArchiveBuilder APIs related to source and buffered addresses
  • ... and 88 more: https://git.openjdk.org/jdk/compare/e945619ddd38091eaa010f65472141443b8f214d...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 13, 2022
@openjdk openjdk bot closed this Sep 13, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Sep 13, 2022
@openjdk
Copy link

openjdk bot commented Sep 13, 2022

@stefank @xmas92 Pushed as commit 725f41f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

7 participants