[LoongArch] Improve codegen for atomic cmpxchg ops #69339

Merged
merged 1 commit into llvm:main from cmpxchg-failure-only on Oct 19, 2023

Conversation

SixWeining
Contributor

PR #67391 improved atomic codegen by handling the memory ordering specified on the cmpxchg instruction: an acquire barrier needs to be generated when that ordering includes an acquire operation. This PR improves the codegen further by handling only the failure ordering.
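
For context, a cmpxchg carries two orderings, one for the success path and one for the failure path. A minimal C++ sketch of the case the updated tests cover (acquire on success, relaxed/monotonic on failure); the function name here is only illustrative, and the effect on the emitted barrier is taken from the updated atomic-cmpxchg.ll checks in this PR:

#include <atomic>

bool try_update(std::atomic<int> &v, int expected, int desired) {
  // Success ordering: acquire; failure ordering: relaxed (monotonic in LLVM IR).
  // With this change, the failure path no longer needs an acquire barrier, so
  // per the updated tests LoongArch emits the weaker "dbar 1792" there instead
  // of "dbar 20".
  return v.compare_exchange_strong(expected, desired,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed);
}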

@llvmbot
Collaborator

llvmbot commented Oct 17, 2023

@llvm/pr-subscribers-backend-loongarch

Author: Lu Weining (SixWeining)

Changes

PR #67391 improved atomic codegen by handling memory ordering specified by the cmpxchg instruction. An acquire barrier needs to be generated when memory ordering includes an acquire operation. This PR improves the codegen further by only handling the failure ordering.


Full diff: https://github.com/llvm/llvm-project/pull/69339.diff

4 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp (+2-2)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+4-3)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+46-9)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll (+4-4)
diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
index b348cb56c136199..18a532b55ee5a92 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
@@ -571,11 +571,11 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
     BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
   }
 
-  AtomicOrdering Ordering =
+  AtomicOrdering FailureOrdering =
       static_cast<AtomicOrdering>(MI.getOperand(IsMasked ? 6 : 5).getImm());
   int hint;
 
-  switch (Ordering) {
+  switch (FailureOrdering) {
   case AtomicOrdering::Acquire:
   case AtomicOrdering::AcquireRelease:
   case AtomicOrdering::SequentiallyConsistent:
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index ecdbaf92e56c1a9..334daccab1e8ba0 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -4184,8 +4184,9 @@ LoongArchTargetLowering::shouldExpandAtomicCmpXchgInIR(
 Value *LoongArchTargetLowering::emitMaskedAtomicCmpXchgIntrinsic(
     IRBuilderBase &Builder, AtomicCmpXchgInst *CI, Value *AlignedAddr,
     Value *CmpVal, Value *NewVal, Value *Mask, AtomicOrdering Ord) const {
-  Value *Ordering =
-      Builder.getIntN(Subtarget.getGRLen(), static_cast<uint64_t>(Ord));
+  AtomicOrdering FailOrd = CI->getFailureOrdering();
+  Value *FailureOrdering =
+      Builder.getIntN(Subtarget.getGRLen(), static_cast<uint64_t>(FailOrd));
 
   // TODO: Support cmpxchg on LA32.
   Intrinsic::ID CmpXchgIntrID = Intrinsic::loongarch_masked_cmpxchg_i64;
@@ -4196,7 +4197,7 @@ Value *LoongArchTargetLowering::emitMaskedAtomicCmpXchgIntrinsic(
   Function *MaskedCmpXchg =
       Intrinsic::getDeclaration(CI->getModule(), CmpXchgIntrID, Tys);
   Value *Result = Builder.CreateCall(
-      MaskedCmpXchg, {AlignedAddr, CmpVal, NewVal, Mask, Ordering});
+      MaskedCmpXchg, {AlignedAddr, CmpVal, NewVal, Mask, FailureOrdering});
   Result = Builder.CreateTrunc(Result, Builder.getInt32Ty());
   return Result;
 }
diff --git a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
index a892ae65f610d03..3f67494bb284ae6 100644
--- a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
@@ -1814,7 +1814,7 @@ def PseudoMaskedAtomicLoadMin32 : PseudoMaskedAMMinMax;
 
 class PseudoCmpXchg
     : Pseudo<(outs GPR:$res, GPR:$scratch),
-             (ins GPR:$addr, GPR:$cmpval, GPR:$newval, grlenimm:$ordering)> {
+             (ins GPR:$addr, GPR:$cmpval, GPR:$newval, grlenimm:$fail_order)> {
   let Constraints = "@earlyclobber $res,@earlyclobber $scratch";
   let mayLoad = 1;
   let mayStore = 1;
@@ -1828,7 +1828,7 @@ def PseudoCmpXchg64 : PseudoCmpXchg;
 def PseudoMaskedCmpXchg32
     : Pseudo<(outs GPR:$res, GPR:$scratch),
              (ins GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask,
-              grlenimm:$ordering)> {
+              grlenimm:$fail_order)> {
   let Constraints = "@earlyclobber $res,@earlyclobber $scratch";
   let mayLoad = 1;
   let mayStore = 1;
@@ -1846,6 +1846,43 @@ class AtomicPat<Intrinsic intrin, Pseudo AMInst>
     : Pat<(intrin GPR:$addr, GPR:$incr, GPR:$mask, timm:$ordering),
           (AMInst GPR:$addr, GPR:$incr, GPR:$mask, timm:$ordering)>;
 
+// These atomic cmpxchg PatFrags only care about the failure ordering.
+// The PatFrags defined by multiclass `ternary_atomic_op_ord` in
+// TargetSelectionDAG.td care about the merged memory ordering that is the
+// stronger one between success and failure. But for LoongArch LL-SC we only
+// need to care about the failure ordering as explained in PR #67391. So we
+// define these PatFrags that will be used to define cmpxchg pats below.
+multiclass ternary_atomic_op_failure_ord {
+  def NAME#_failure_monotonic : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+      (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+    AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+    return Ordering == AtomicOrdering::Monotonic;
+  }]>;
+  def NAME#_failure_acquire : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+      (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+    AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+    return Ordering == AtomicOrdering::Acquire;
+  }]>;
+  def NAME#_failure_release : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+      (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+    AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+    return Ordering == AtomicOrdering::Release;
+  }]>;
+  def NAME#_failure_acq_rel : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+      (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+    AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+    return Ordering == AtomicOrdering::AcquireRelease;
+  }]>;
+  def NAME#_failure_seq_cst : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+      (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+    AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+    return Ordering == AtomicOrdering::SequentiallyConsistent;
+  }]>;
+}
+
+defm atomic_cmp_swap_32 : ternary_atomic_op_failure_ord;
+defm atomic_cmp_swap_64 : ternary_atomic_op_failure_ord;
+
 let Predicates = [IsLA64] in {
 def : AtomicPat<int_loongarch_masked_atomicrmw_xchg_i64,
                 PseudoMaskedAtomicSwap32>;
@@ -1908,24 +1945,24 @@ def : AtomicPat<int_loongarch_masked_atomicrmw_umin_i64,
 // AtomicOrdering.h.
 multiclass PseudoCmpXchgPat<string Op, Pseudo CmpXchgInst,
                             ValueType vt = GRLenVT> {
-  def : Pat<(vt (!cast<PatFrag>(Op#"_monotonic") GPR:$addr, GPR:$cmp, GPR:$new)),
+  def : Pat<(vt (!cast<PatFrag>(Op#"_failure_monotonic") GPR:$addr, GPR:$cmp, GPR:$new)),
             (CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 2)>;
-  def : Pat<(vt (!cast<PatFrag>(Op#"_acquire") GPR:$addr, GPR:$cmp, GPR:$new)),
+  def : Pat<(vt (!cast<PatFrag>(Op#"_failure_acquire") GPR:$addr, GPR:$cmp, GPR:$new)),
             (CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 4)>;
-  def : Pat<(vt (!cast<PatFrag>(Op#"_release") GPR:$addr, GPR:$cmp, GPR:$new)),
+  def : Pat<(vt (!cast<PatFrag>(Op#"_failure_release") GPR:$addr, GPR:$cmp, GPR:$new)),
             (CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 5)>;
-  def : Pat<(vt (!cast<PatFrag>(Op#"_acq_rel") GPR:$addr, GPR:$cmp, GPR:$new)),
+  def : Pat<(vt (!cast<PatFrag>(Op#"_failure_acq_rel") GPR:$addr, GPR:$cmp, GPR:$new)),
             (CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 6)>;
-  def : Pat<(vt (!cast<PatFrag>(Op#"_seq_cst") GPR:$addr, GPR:$cmp, GPR:$new)),
+  def : Pat<(vt (!cast<PatFrag>(Op#"_failure_seq_cst") GPR:$addr, GPR:$cmp, GPR:$new)),
             (CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 7)>;
 }
 
 defm : PseudoCmpXchgPat<"atomic_cmp_swap_32", PseudoCmpXchg32>;
 defm : PseudoCmpXchgPat<"atomic_cmp_swap_64", PseudoCmpXchg64, i64>;
 def : Pat<(int_loongarch_masked_cmpxchg_i64
-            GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$ordering),
+            GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$fail_order),
           (PseudoMaskedCmpXchg32
-            GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$ordering)>;
+            GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$fail_order)>;
 
 def : PseudoMaskedAMMinMaxPat<int_loongarch_masked_atomicrmw_max_i64,
                               PseudoMaskedAtomicLoadMax32>;
diff --git a/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll b/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
index 76f9ebed0d93ba4..417c865f6383ffb 100644
--- a/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
+++ b/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
@@ -129,7 +129,7 @@ define void @cmpxchg_i8_acquire_monotonic(ptr %ptr, i8 %cmp, i8 %val) nounwind {
 ; LA64-NEXT:    beqz $a5, .LBB4_1
 ; LA64-NEXT:    b .LBB4_4
 ; LA64-NEXT:  .LBB4_3:
-; LA64-NEXT:    dbar 20
+; LA64-NEXT:    dbar 1792
 ; LA64-NEXT:  .LBB4_4:
 ; LA64-NEXT:    ret
   %res = cmpxchg ptr %ptr, i8 %cmp, i8 %val acquire monotonic
@@ -162,7 +162,7 @@ define void @cmpxchg_i16_acquire_monotonic(ptr %ptr, i16 %cmp, i16 %val) nounwin
 ; LA64-NEXT:    beqz $a5, .LBB5_1
 ; LA64-NEXT:    b .LBB5_4
 ; LA64-NEXT:  .LBB5_3:
-; LA64-NEXT:    dbar 20
+; LA64-NEXT:    dbar 1792
 ; LA64-NEXT:  .LBB5_4:
 ; LA64-NEXT:    ret
   %res = cmpxchg ptr %ptr, i16 %cmp, i16 %val acquire monotonic
@@ -181,7 +181,7 @@ define void @cmpxchg_i32_acquire_monotonic(ptr %ptr, i32 %cmp, i32 %val) nounwin
 ; LA64-NEXT:    beqz $a4, .LBB6_1
 ; LA64-NEXT:    b .LBB6_4
 ; LA64-NEXT:  .LBB6_3:
-; LA64-NEXT:    dbar 20
+; LA64-NEXT:    dbar 1792
 ; LA64-NEXT:  .LBB6_4:
 ; LA64-NEXT:    ret
   %res = cmpxchg ptr %ptr, i32 %cmp, i32 %val acquire monotonic
@@ -200,7 +200,7 @@ define void @cmpxchg_i64_acquire_monotonic(ptr %ptr, i64 %cmp, i64 %val) nounwin
 ; LA64-NEXT:    beqz $a4, .LBB7_1
 ; LA64-NEXT:    b .LBB7_4
 ; LA64-NEXT:  .LBB7_3:
-; LA64-NEXT:    dbar 20
+; LA64-NEXT:    dbar 1792
 ; LA64-NEXT:  .LBB7_4:
 ; LA64-NEXT:    ret
   %res = cmpxchg ptr %ptr, i64 %cmp, i64 %val acquire monotonic
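
For reference, the net effect of the expandAtomicCmpXchg change and the TableGen immediates above can be summarized with a small sketch. This is not the literal backend code (the helper name is made up, and the full hint selection is not shown in this diff), but the enum values match LLVM's AtomicOrdering.h and the dbar hints match the updated atomic-cmpxchg.ll checks:

#include <cstdint>

enum class AtomicOrdering : uint64_t {
  NotAtomic = 0, Unordered = 1, Monotonic = 2, // 3 is reserved for Consume
  Acquire = 4, Release = 5, AcquireRelease = 6, SequentiallyConsistent = 7
};

// Sketch: pick the "dbar" hint for the barrier on the CAS failure path.
// Acquire-class failure orderings keep the acquire barrier (dbar 20, i.e.
// 0b10100); everything else can use the weaker load-load hint (dbar 1792,
// i.e. 0x700).
int failureBarrierHint(AtomicOrdering FailureOrdering) {
  switch (FailureOrdering) {
  case AtomicOrdering::Acquire:
  case AtomicOrdering::AcquireRelease:
  case AtomicOrdering::SequentiallyConsistent:
    return 0b10100; // 20
  default:
    return 0x700;   // 1792
  }
}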

@SixWeining
Contributor Author

@xen0n @xry111 What do you think?

Member

@heiher heiher left a comment


Thank you.

@SixWeining SixWeining merged commit 78abc45 into llvm:main Oct 19, 2023
4 checks passed
@SixWeining SixWeining deleted the cmpxchg-failure-only branch October 26, 2023 11:10
wangliu-iscas pushed a commit to plctlab/patchwork-gcc that referenced this pull request Nov 14, 2023
This is isomorphic to the LLVM changes [1-2].

On LoongArch, the LL and SC instructions have memory barrier semantics:

- LL: <memory-barrier> + <load-exclusive>
- SC: <store-conditional> + <memory-barrier>

But the compare-and-swap operation is allowed to fail, and if it fails
the SC instruction is not executed, so the acquire semantics cannot be
guaranteed. Therefore, an acquire barrier needs to be generated when
failure_memorder includes an acquire operation.

On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
barrier.  So it is always enough for acquire semantics.  OTOH, if
acquire semantics are not needed, we still need "dbar 0x700" as the
load-load barrier, as in all LL-SC loops.

[1]:llvm/llvm-project#67391
[2]:llvm/llvm-project#69339

gcc/ChangeLog:

	* config/loongarch/loongarch.cc
	(loongarch_memmodel_needs_release_fence): Remove.
	(loongarch_cas_failure_memorder_needs_acquire): New static
	function.
	(loongarch_print_operand): Redefine 'G' for the barrier on CAS
	failure.
	* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
	Remove the redundant barrier before the LL instruction, and
	emit an acquire barrier on failure if needed by
	failure_memorder.
	(atomic_cas_value_cmp_and_7_<mode>): Likewise.
	(atomic_cas_value_add_7_<mode>): Remove the unnecessary barrier
	before the LL instruction.
	(atomic_cas_value_sub_7_<mode>): Likewise.
	(atomic_cas_value_and_7_<mode>): Likewise.
	(atomic_cas_value_xor_7_<mode>): Likewise.
	(atomic_cas_value_or_7_<mode>): Likewise.
	(atomic_cas_value_nand_7_<mode>): Likewise.
	(atomic_cas_value_exchange_7_<mode>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/cas-acquire.c: New test.
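
To illustrate why an acquire barrier is needed on the failure path at all, here is a hypothetical C++ message-passing example. It is an illustration of the issue only, not the contents of the new cas-acquire.c test: the losing thread relies entirely on the failure ordering of its cmpxchg to observe the winner's earlier store.

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;
std::atomic<int> owner{0}; // 0 = unclaimed, 1 = writer, 2 = reader

void writer() {
  payload = 42; // plain store, published by the release CAS below
  int expected = 0;
  owner.compare_exchange_strong(expected, 1, std::memory_order_release,
                                std::memory_order_relaxed);
}

void reader() {
  int expected = 0;
  bool claimed = owner.compare_exchange_strong(
      expected, 2, std::memory_order_relaxed,
      std::memory_order_acquire); // failure ordering stronger than success
                                  // is allowed since C++17
  if (!claimed && expected == 1) {
    // The CAS failed because the writer already claimed `owner`. Only the
    // FAILURE ordering (acquire) orders this thread's read of `payload`
    // after the writer's release; on LoongArch that acquire is exactly what
    // the barrier on the failure path of the LL-SC loop provides.
    assert(payload == 42);
  }
}

int main() {
  std::thread t1(writer), t2(reader);
  t1.join();
  t2.join();
  return 0;
}
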
nstester pushed a commit to nstester/gcc that referenced this pull request Nov 15, 2023
kraj pushed a commit to kraj/gcc that referenced this pull request Nov 16, 2023
Backported for fixing the acquire semantic issue which is known to
cause troubles on LA664.

(cherry picked from commit 4d86dc5)
kraj pushed a commit to kraj/gcc that referenced this pull request Nov 16, 2023
eebssk1 pushed a commit to eebssk1/m_gcc that referenced this pull request Nov 28, 2023
Blackhex pushed a commit to Windows-on-ARM-Experiments/gcc-woarm64 that referenced this pull request Dec 18, 2023