@robehn
Contributor

@robehn robehn commented May 15, 2025

Hi, please consider.

This adds the byte and halfword atomic memory operations (Zabha) - https://github.com/riscv/riscv-zabha.
All AMO instructions, except load-reserved and store-conditional, can also be performed on naturally aligned halfwords and bytes (i.e. the extension does not add lr.h/b or sc.h/b). This includes amocas if the Zacas extension is present.

The majority of this patch is to support amocas.h/b. We are now really starting to feel the pain of all these extensions, as a 16- or 8-bit CAS can now be done in three different ways:

  • lr.w/sc.w 'narrow' CAS (no extension)
  • amocas.w 'narrow' CAS (Zacas)
  • amocas.h/b (Zacas + Zabha)
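
For readers unfamiliar with the term, a 'narrow' CAS updates a halfword or byte by running a full-word CAS on the enclosing aligned word and masking in the new bits. The sketch below illustrates that idea in portable C++ with std::atomic. It is not the HotSpot implementation: narrow_cas16, half_index and the little-endian halfword layout are assumptions made for the example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, not HotSpot code: emulate a 16-bit compare-and-swap
// using only a 32-bit CAS on the enclosing aligned word.
// `half_index` selects the low (0) or high (1) halfword, assuming a
// little-endian layout. Returns the halfword observed before the operation.
inline uint16_t narrow_cas16(std::atomic<uint32_t>& word, unsigned half_index,
                             uint16_t expected, uint16_t desired) {
  const unsigned shift = half_index * 16;
  const uint32_t mask = uint32_t{0xFFFF} << shift;
  uint32_t old_word = word.load(std::memory_order_relaxed);
  for (;;) {
    const uint16_t old_half = static_cast<uint16_t>((old_word & mask) >> shift);
    if (old_half != expected) {
      return old_half;  // comparison failed, nothing is written
    }
    // Keep the neighbouring halfword, replace only the selected one.
    const uint32_t new_word = (old_word & ~mask) | (uint32_t{desired} << shift);
    // On failure compare_exchange_weak reloads old_word, so the loop retries
    // with the freshly observed value.
    if (word.compare_exchange_weak(old_word, new_word,
                                   std::memory_order_seq_cst,
                                   std::memory_order_relaxed)) {
      return old_half;
    }
  }
}
```

The mask-and-retry loop is the cost that a native halfword or byte CAS avoids, which is presumably why most of the patch goes into supporting amocas.h/b.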

There is no hwprobe support yet.

Ran t1-3 with Zacas+Zabha and t1 without Zabha in qemu.

Thanks, Robbin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25252/head:pull/25252
$ git checkout pull/25252

Update a local copy of the PR:
$ git checkout pull/25252
$ git pull https://git.openjdk.org/jdk.git pull/25252/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25252

View PR using the GUI difftool:
$ git pr show -t 25252

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25252.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 15, 2025

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 15, 2025

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8356159: RISC-V: Add Zabha

Reviewed-by: fyang, fjiang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 3 new commits pushed to the master branch:

  • 4618374: 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value
  • 78a392a: 8356880: ZGC: Backoff in ZLiveMap::reset spin-loop
  • c1a81cf: 8358284: doc/testing.html is not up to date after JDK-8355003

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented May 15, 2025

@robehn The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 15, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label May 15, 2025
@mlbridge

mlbridge bot commented May 15, 2025

Member

@RealFYang RealFYang left a comment

Nice cleanup! Some minor comments after a cursory look. Thanks.

@RealFYang
Member

I witnessed performance regressions after this change on two of my OoO machines (around 7%-10% on P550 and SG2042) when doing a quick specjbb2005 test. Neither machine has the Zacas or Zabha extension. The issue disappears if I revert the changes in file src/hotspot/cpu/riscv/assembler_riscv.hpp. I haven't figured out the reason.

@robehn
Contributor Author

robehn commented May 19, 2025

I witnessed performance regressions after this change on two of my OoO machines (around 7%-10% on P550 and SG2042) when doing a quick specjbb2005 test. Neither machine has the Zacas or Zabha extension. The issue disappears if I revert the changes in file src/hotspot/cpu/riscv/assembler_riscv.hpp. I haven't figured out the reason.

Hmz.. that makes no sense to me :)

@RealFYang
Member

RealFYang commented May 19, 2025

Seems there is a difference in the encoding of Rs2 for sc_w/d:

Before:
   patch_reg((address)&insn, 15, Rs2);

After:
   patch((address)&insn, 24, 20, Rs2);

Is this correct?

@robehn
Contributor Author

robehn commented May 19, 2025

Yes, Rs2 is at 20->24 : https://riscv-software-src.github.io/riscv-unified-db/manual/html/isa/isa_20240411/insts/sc.w.html
We have several instructions where Rs1 and Rs2 are in the wrong places.

I don't understand why this is not seen in the gtests, i.e. whether there is some issue here.

I'll investigate, thanks!
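
To make the field positions under discussion concrete, here is a minimal sketch of how an sc.w instruction word is laid out according to the RISC-V A extension: rs2 (the value to store) sits in bits 24:20 and rs1 (the address) in bits 19:15. The names put_field and encode_sc_w are hypothetical stand-ins for illustration, not the HotSpot assembler API.

```cpp
#include <cstdint>

// aq is bit 26 and rl is bit 25, so the pair is written as a 2-bit field.
enum class Aqrl : uint32_t { relaxed = 0b00, rl = 0b01, aq = 0b10, aqrl = 0b11 };

// Hypothetical helper: place `value` into bits [hi:lo] of `insn`.
constexpr uint32_t put_field(uint32_t insn, unsigned hi, unsigned lo, uint32_t value) {
  const uint32_t width_mask =
      (hi - lo == 31) ? 0xFFFFFFFFu : ((1u << (hi - lo + 1)) - 1u);
  return (insn & ~(width_mask << lo)) | ((value & width_mask) << lo);
}

// Hypothetical encoder for sc.w rd, rs2, (rs1).
constexpr uint32_t encode_sc_w(unsigned rd, unsigned rs1_addr, unsigned rs2_value,
                               Aqrl order) {
  uint32_t insn = 0;
  insn = put_field(insn, 6, 0, 0b0101111);    // AMO major opcode
  insn = put_field(insn, 11, 7, rd);          // rd: receives the success code
  insn = put_field(insn, 14, 12, 0b010);      // funct3: word width
  insn = put_field(insn, 19, 15, rs1_addr);   // rs1: address register
  insn = put_field(insn, 24, 20, rs2_value);  // rs2: register holding the new value
  insn = put_field(insn, 26, 25, static_cast<uint32_t>(order));  // aq:rl
  insn = put_field(insn, 31, 27, 0b00011);    // funct5: SC
  return insn;
}
```

For example, encode_sc_w(10, 11, 12, Aqrl::relaxed) corresponds to sc.w a0, a2, (a1) and yields 0x18C5A52F.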

@robehn
Contributor Author

robehn commented May 19, 2025

I found it!

void sc_w(Register Rd, Register Rs2, Register Rs1, Aqrl memory_order = aqrl)
void NAME(Register Rd, Register Rs1, Register Rs2, Aqrl memory_order = relaxed)

lr/sc had relaxed as the default memory ordering while the other atomics had aqrl; I accidentally upgraded lr/sc to aqrl.

Which explains why it's working, but slower!

Thanks!

EDIT:

amocas now defaults to aqrl, and it seems wrong for CAS to have a different default ordering than lr/sc.
But I'll not address that here; instead I'll just make sure we have the same defaults as today.
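
As an illustration of the mechanism described here (not a diagnosis of the regression), the sketch below shows how changing only a default argument changes the emitted instruction word for any call site that omits the ordering. All names (Ordering, with_ordering, emit_sc_old_default, emit_sc_new_default) are hypothetical.

```cpp
#include <cstdint>

enum class Ordering : uint32_t { relaxed = 0b00, rl = 0b01, aq = 0b10, aqrl = 0b11 };

// Bits 26:25 of an A-extension instruction hold aq:rl.
constexpr uint32_t with_ordering(uint32_t insn, Ordering o) {
  return (insn & ~(0x3u << 25)) | (static_cast<uint32_t>(o) << 25);
}

// Two hypothetical emitters that differ only in their default ordering.
constexpr uint32_t emit_sc_old_default(uint32_t base, Ordering o = Ordering::relaxed) {
  return with_ordering(base, o);
}
constexpr uint32_t emit_sc_new_default(uint32_t base, Ordering o = Ordering::aqrl) {
  return with_ordering(base, o);
}

// Variant without a default: every call site has to state its ordering.
constexpr uint32_t emit_sc_explicit(uint32_t base, Ordering o) {
  return with_ordering(base, o);
}
```

A call that omits the argument silently picks up the stronger ordering after such a change, while a call that passes the ordering explicitly is unaffected. Dropping the default altogether turns that silent change into a compile error.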

@RealFYang
Member

RealFYang commented May 19, 2025

Yes, Rs2 is at 20->24 : https://riscv-software-src.github.io/riscv-unified-db/manual/html/isa/isa_20240411/insts/sc.w.html We have several instructions where Rs1 and Rs2 are in the wrong places.

I think I know what's going on here.
The riscv spec says: SC.W conditionally writes a word in rs2 to the address in rs1.
Although the encoding of rs1 and rs2 doesn't look correct in jdk head, the callers of sc_w/d still work by swapping the two params: address and new value. This means that you also need the following add-on changes in this PR:

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 2d3edbb1bee..da47edec785 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3823,11 +3823,11 @@ void MacroAssembler::store_conditional(Register dst,
                                        Assembler::Aqrl release) {
   switch (size) {
     case int64:
-      sc_d(dst, new_val, addr, release);
+      sc_d(dst, addr, new_val, release);
       break;
     case int32:
     case uint32:
-      sc_w(dst, new_val, addr, release);
+      sc_w(dst, addr, new_val, release);
       break;
     default:
       ShouldNotReachHere();
@@ -3908,7 +3908,7 @@ void MacroAssembler::cmpxchg_narrow_value(Register addr, Register expected,

     andr(scratch0, result, scratch1); // scratch1 is ~mask
     orr(scratch0, scratch0, new_val);
-    sc_w(scratch0, scratch0, aligned_addr, release);
+    sc_w(scratch0, aligned_addr, scratch0, release);
     bnez(scratch0, retry);
   }

@@ -3980,7 +3980,7 @@ void MacroAssembler::weak_cmpxchg_narrow_value(Register addr, Register expected,

     andr(scratch0, result, scratch1); // scratch1 is ~mask
     orr(scratch0, scratch0, new_val);
-    sc_w(scratch0, scratch0, aligned_addr, release);
+    sc_w(scratch0, aligned_addr, scratch0, release);
     bnez(scratch0, fail);
   }

@RealFYang
Member

I found it!

void sc_w(Register Rd, Register Rs2, Register Rs1, Aqrl memory_order = aqrl)
void NAME(Register Rd, Register Rs1, Register Rs2, Aqrl memory_order = relaxed)

lr/sc had relaxed as the default memory ordering while the other atomics had aqrl; I accidentally upgraded lr/sc to aqrl.

Which explains why it's working, but slower!

Thanks!

EDIT:

amocas now defaults to aqrl, and it seems wrong for CAS to have a different default ordering than lr/sc. But I'll not address that here; instead I'll just make sure we have the same defaults as today.

Seems you can default to aqrl, as I didn't see a caller which uses the default value for now.

@robehn
Contributor Author

robehn commented May 19, 2025

Yes, Rs2 is at 20->24 : https://riscv-software-src.github.io/riscv-unified-db/manual/html/isa/isa_20240411/insts/sc.w.html We have several instructions where Rs1 and Rs2 are in the wrong places.

I think I know what's going on here. The riscv spec says: SC.W conditionally writes a word in rs2 to the address in rs1. Although the encoding of rs1 and rs2 doesn't look correct in jdk head, the callers of sc_w/d still work by swapping the two params: address and new value. This means that you also need the following add-on changes in this PR:

That is why they are already swapped in signature:

void sc_w(Register Rd, Register Rs2, Register Rs1, Aqrl memory_order = aqrl)
void NAME(Register Rd, Register Rs1, Register Rs2, Aqrl memory_order = relaxed)

Otherwise gtests wouldn't work.
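
The equivalence being argued here can be sketched as follows. This is a reading of the thread, not the actual assembler source: the old emitter is assumed to name its parameters (Rd, Rs1, Rs2) but to patch the one called Rs2 at bit 15 and the other at bit 20, while the new emitter takes (Rd, Rs2, Rs1) and patches each into its architectural field. The names sc_w_old, sc_w_new and kScWBase are hypothetical.

```cpp
#include <cstdint>

// sc.w with all register fields and aq/rl cleared:
// funct5 = 0b00011 (SC), funct3 = 0b010 (.w), opcode = 0b0101111 (AMO).
constexpr uint32_t kScWBase = 0x1800202Fu;

// Assumed old behaviour: parameter names do not match the fields they fill.
constexpr uint32_t sc_w_old(unsigned Rd, unsigned Rs1, unsigned Rs2) {
  return kScWBase | (Rd << 7) | (Rs2 << 15) | (Rs1 << 20);
}

// New behaviour: names match the fields, parameter order swapped in the signature.
constexpr uint32_t sc_w_new(unsigned Rd, unsigned Rs2, unsigned Rs1) {
  return kScWBase | (Rd << 7) | (Rs1 << 15) | (Rs2 << 20);
}
```

With dst = 10, new_val = 12 and addr = 11, a caller written as sc_w(dst, new_val, addr) produces the same word from both versions, which is consistent with existing callers and gtests continuing to pass without being touched.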

@RealFYang
Member

RealFYang commented May 19, 2025

Yes, Rs2 is at 20->24 : https://riscv-software-src.github.io/riscv-unified-db/manual/html/isa/isa_20240411/insts/sc.w.html We have several instructions where Rs1 and Rs2 are in the wrong places.

I think I know what's going on here. The riscv spec says: SC.W conditionally writes a word in rs2 to the address in rs1. Although the encoding of rs1 and rs2 doesn't look correct in jdk head, the callers of sc_w/d still work by swapping the two params: address and new value. This means that you also need the following add-on changes in this PR:

That is why they are already swapped in signature:

void sc_w(Register Rd, Register Rs2, Register Rs1, Aqrl memory_order = aqrl)
void NAME(Register Rd, Register Rs1, Register Rs2, Aqrl memory_order = relaxed)

Otherwise gtests wouldn't work.

Ah, I see. Maybe it's better to change the caller's param passing to reflect this?
But I still don't understand the performance impact here. I suppose that changing the default memory ordering won't make a difference here? Because I see no one uses the default value for now.

@robehn
Contributor Author

robehn commented May 19, 2025

Ah, I see. Maybe it's better to change the caller's param passing to reflect this? But I still don't understand the performance impact here. I suppose that changing the default memory ordering won't make a difference here? Because no one uses the default value for now.

Yes, I can change the callers instead. I didn't find any uses of the default memory ordering either, so I'll remove the default as no one uses it.

I'm going to diff the assembly to see if I can find something that explains the performance issue.

@robehn
Contributor Author

robehn commented May 20, 2025

I think I got them all. I also took the opportunity to change the default on lr/sc, as we don't use the default.
"aqrl" is a much safer default; using relaxed should be explicit IMHO.

Member

@RealFYang RealFYang left a comment

The performance issue is gone after the update. Several minor comments remain. Thanks.

Member

@RealFYang RealFYang left a comment

A few more minor comments. Looks good otherwise.

@robehn
Contributor Author

robehn commented May 28, 2025

Thanks for keeping up the review! Let me know!

Member

@RealFYang RealFYang left a comment

Latest version looks good to me. Thanks for the update.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 29, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label May 30, 2025
Member

@RealFYang RealFYang left a comment

Still good. Thanks!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 30, 2025
@robehn
Contributor Author

robehn commented May 30, 2025

Still good. Thanks!

Thank you for your review and patience!

Member

@feilongjiang feilongjiang left a comment

Looks good!

@robehn
Contributor Author

robehn commented Jun 4, 2025

/integrate

@robehn
Contributor Author

robehn commented Jun 4, 2025

Thanks @feilongjiang @RealFYang

@openjdk

openjdk bot commented Jun 4, 2025

Going to push as commit dc96160.
Since your change was applied there have been 41 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 4, 2025
@openjdk openjdk bot closed this Jun 4, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 4, 2025
@openjdk

openjdk bot commented Jun 4, 2025

@robehn Pushed as commit dc96160.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
