[SystemZ] Properly support 16 byte atomic int/fp types and ops. #73134

JonPsson1 · 2023-11-22T15:10:18Z

Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value of 128. It now produces IR instead of calls to __atomic instrinsics for 16 bytes as well. FP loads are first loaded as i128 and then casted to fp128.
Atomic __int128 (and long double) variables are aligned to 16 bytes (like gcc 14).
AtomicExpand pass now expands also 16 byte operations.
tests for __atomic builtins for all integer widths, with test for i128 in both align=8 and align=16 cases.
Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free / __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c
shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now that the backend is acepting 16 byte atomics, 16 byte FP atomicrmw:s now also get expanded by AtomicExpand. The default (and used) shouldCastAtomicRMWIInIR() says that if the type is FP, it is casted to integer (see atomicrmw-xchg-07.ll).
TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step smaller integer types should also be possible to handle this way instead of in backend.

Original patch rebased.
Remove the regalloc handling for CDSG loops.
Tests improved.

llvmbot · 2023-11-22T15:10:48Z

@llvm/pr-subscribers-clang

Author: Jonas Paulsson (JonPsson1)

Changes

Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value of 128. It now produces IR instead of calls to __atomic instrinsics for 16 bytes as well. FP loads are first loaded as i128 and then casted to fp128.
Atomic __int128 (and long double) variables are aligned to 16 bytes (like gcc 14).
AtomicExpand pass now expands also 16 byte operations.
tests for __atomic builtins for all integer widths, with test for i128 in both align=8 and align=16 cases.
Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free / __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c
shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now that the backend is acepting 16 byte atomics, 16 byte FP atomicrmw:s now also get expanded by AtomicExpand. The default (and used) shouldCastAtomicRMWIInIR() says that if the type is FP, it is casted to integer (see atomicrmw-xchg-07.ll).
TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step smaller integer types should also be possible to handle this way instead of in backend.

Original patch rebased.
Remove the regalloc handling for CDSG loops.
Tests improved.

Patch is 83.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73134.diff

12 Files Affected:

(modified) clang/lib/Basic/Targets/SystemZ.h (+1-1)
(added) clang/test/CodeGen/SystemZ/atomic-alignment.c (+35)
(added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c (+257)
(added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c (+301)
(added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i16.c (+219)
(added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i32.c (+219)
(added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i64.c (+219)
(added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i8.c (+219)
(added) clang/test/CodeGen/SystemZ/gnu-atomic_is_lock_free.c (+71)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+5-1)
(modified) llvm/test/CodeGen/SystemZ/atomicrmw-ops-i128.ll (+452-44)
(modified) llvm/test/CodeGen/SystemZ/atomicrmw-xchg-07.ll (+21-16)

diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index 9ba255745cf2cc5..e4ec338880f2109 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -60,7 +60,7 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
       resetDataLayout("E-m:e-i1:8:16-i8:8:16-i64:64-f128:64"
                       "-v128:64-a:8:16-n32:64");
     }
-    MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;
+    MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
     HasStrictFP = true;
   }
 
diff --git a/clang/test/CodeGen/SystemZ/atomic-alignment.c b/clang/test/CodeGen/SystemZ/atomic-alignment.c
new file mode 100644
index 000000000000000..da478842ca31b2b
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/atomic-alignment.c
@@ -0,0 +1,35 @@
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O3 -emit-llvm %s -o - | FileCheck %s
+//
+// Test alignment of 128 bit Atomic int/fp types, as well as loading
+// from memory with a simple addition. The fp128 is loaded as i128 and
+// then casted.
+
+// CHECK: @Atomic_int128 = {{.*}} i128 0, align 16
+// CHECK: @Atomic_fp128 = {{.*}} fp128 0xL00000000000000000000000000000000, align 16
+
+// CHECK-LABEL:  @f1
+// CHECK:      %atomic-load = load atomic i128, ptr @Atomic_int128 seq_cst, align 16
+// CHECK-NEXT: %add = add nsw i128 %atomic-load, 1
+// CHECK-NEXT: store i128 %add, ptr %agg.result, align 8
+// CHECK-NEXT: ret void
+
+// CHECK-LABEL:  @f2
+// CHECK:      %atomic-load = load atomic i128, ptr @Atomic_fp128 seq_cst, align 16
+// CHECK-NEXT: %0 = bitcast i128 %atomic-load to fp128
+// CHECK-NEXT: %add = fadd fp128 %0, 0xL00000000000000003FFF000000000000
+// CHECK-NEXT: store fp128 %add, ptr %agg.result, align 8
+// CHECK-NEXT: ret void
+
+
+#include <stdatomic.h>
+
+_Atomic __int128    Atomic_int128;
+_Atomic long double Atomic_fp128;
+
+__int128 f1() {
+  return Atomic_int128 + 1;
+}
+
+long double f2() {
+  return Atomic_fp128 + 1.0;
+}
diff --git a/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c
new file mode 100644
index 000000000000000..e3db2063312d2b4
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c
@@ -0,0 +1,257 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O1 -emit-llvm %s -o - | FileCheck %s
+//
+// Test GNU atomic builtins for __int128 aligned to 16 bytes, which should be
+// expanded to LLVM I/R by the front end.
+
+#include <stdatomic.h>
+#include <stdint.h>
+
+__int128 Ptr __attribute__((aligned(16)));
+__int128 Ret __attribute__((aligned(16)));
+__int128 Val __attribute__((aligned(16)));
+__int128 Exp __attribute__((aligned(16)));
+__int128 Des __attribute__((aligned(16)));
+
+// CHECK-LABEL: @f1(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load atomic i128, ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
+__int128 f1() {
+  return __atomic_load_n(&Ptr, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f2(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load atomic i128, ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP0]], ptr @Ret, align 16
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f2() {
+  __atomic_load(&Ptr, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f3(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store atomic i128 [[TMP0]], ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    ret void
+//
+void f3() {
+  __atomic_store_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f4(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16
+// CHECK-NEXT:    store atomic i128 [[TMP0]], ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    ret void
+//
+void f4() {
+  __atomic_store(&Ptr, &Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f5(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xchg ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f5() {
+  return __atomic_exchange_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f6(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xchg ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr @Ret, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f6() {
+  __atomic_exchange(&Ptr, &Val, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f7(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Des, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr @Exp, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = cmpxchg ptr @Ptr, i128 [[TMP1]], i128 [[TMP0]] seq_cst seq_cst, align 16
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { i128, i1 } [[TMP2]], 1
+// CHECK-NEXT:    br i1 [[TMP3]], label [[CMPXCHG_CONTINUE:%.*]], label [[CMPXCHG_STORE_EXPECTED:%.*]]
+// CHECK:       cmpxchg.store_expected:
+// CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { i128, i1 } [[TMP2]], 0
+// CHECK-NEXT:    store i128 [[TMP4]], ptr @Exp, align 16
+// CHECK-NEXT:    br label [[CMPXCHG_CONTINUE]]
+// CHECK:       cmpxchg.continue:
+// CHECK-NEXT:    ret i1 [[TMP3]]
+//
+_Bool f7() {
+  return __atomic_compare_exchange_n(&Ptr, &Exp, Des, 0,
+                                     memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Exp, align 16
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr @Des, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = cmpxchg ptr @Ptr, i128 [[TMP0]], i128 [[TMP1]] seq_cst seq_cst, align 16
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { i128, i1 } [[TMP2]], 1
+// CHECK-NEXT:    br i1 [[TMP3]], label [[CMPXCHG_CONTINUE:%.*]], label [[CMPXCHG_STORE_EXPECTED:%.*]]
+// CHECK:       cmpxchg.store_expected:
+// CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { i128, i1 } [[TMP2]], 0
+// CHECK-NEXT:    store i128 [[TMP4]], ptr @Exp, align 16
+// CHECK-NEXT:    br label [[CMPXCHG_CONTINUE]]
+// CHECK:       cmpxchg.continue:
+// CHECK-NEXT:    ret i1 [[TMP3]]
+//
+_Bool f8() {
+  return __atomic_compare_exchange(&Ptr, &Exp, &Des, 0,
+                                   memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f9(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw add ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = add i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f9() {
+  return __atomic_add_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f10(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw sub ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = sub i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f10() {
+  return __atomic_sub_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f11(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw and ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f11() {
+  return __atomic_and_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f12(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xor ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = xor i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f12() {
+  return __atomic_xor_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f13(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw or ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = or i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f13() {
+  return __atomic_or_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f14(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw nand ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    [[TMP3:%.*]] = xor i128 [[TMP2]], -1
+// CHECK-NEXT:    store i128 [[TMP3]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f14() {
+  return __atomic_nand_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f15(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw add ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f15() {
+  return __atomic_fetch_add(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f16(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw sub ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f16() {
+  return __atomic_fetch_sub(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f17(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw and ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f17() {
+  return __atomic_fetch_and(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f18(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xor ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f18() {
+  return __atomic_fetch_xor(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f19(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw or ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f19() {
+  return __atomic_fetch_or(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f20(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw nand ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f20() {
+  return __atomic_fetch_nand(&Ptr, Val, memory_order_seq_cst);
+}
diff --git a/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c
new file mode 100644
index 000000000000000..e38e6572bd58f4e
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c
@@ -0,0 +1,301 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O1 -emit-llvm %s -o - | FileCheck %s
+//
+// Test GNU atomic builtins for __int128 (with default alignment of 8 bytes
+// only), resulting in libcalls.
+
+#include <stdatomic.h>
+#include <stdint.h>
+
+__int128 Ptr;
+__int128 Ret;
+__int128 Val;
+__int128 Exp;
+__int128 Des;
+
+// TODO: This test and several more below have the unnecessary use of an alloca
+// remaining. This is due to 369c9b7, which changes the behavior of the MemCpyOpt
+// pass. It seems that a 'writable' attribute should now be added to the argument
+// in order for this optimization to proceed.
+
+// CHECK-LABEL: @f1(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    call void @__atomic_load(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[ATOMIC_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 8, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f1() {
+  return __atomic_load_n(&Ptr, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f2(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @__atomic_load(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Ret, i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Ret, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f2() {
+  __atomic_load(&Ptr, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f3(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_store(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[DOTATOMICTMP]], i32 noundef signext 5)
+// CHECK-NEXT:    ret void
+//
+void f3() {
+  __atomic_store_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f4(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @__atomic_store(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Val, i32 noundef signext 5)
+// CHECK-NEXT:    ret void
+//
+void f4() {
+  __atomic_store(&Ptr, &Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f5(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[DOTATOMICTMP]], ptr noundef nonnull [[ATOMIC_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f5() {
+  return __atomic_exchange_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f6(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @__atomic_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Val, ptr noundef nonnull @Ret, i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Ret, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f6() {
+  __atomic_exchange(&Ptr, &Val, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f7(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Des, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Exp, ptr noundef nonnull [[DOTATOMICTMP]], i32 noundef signext 5, i32 noundef signext 5)
+// CHECK-NEXT:    ret i1 [[CALL]]
+//
+_Bool f7() {
+  return __atomic_compare_exchange_n(&Ptr, &Exp, Des, 0,
+                                     memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call zeroext i1 @__atomic_compare_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Exp, ptr noundef nonnull @Des, i32 noundef signext 5, i32 noundef signext 5)
+// CHECK-NEXT:    ret i1 [[CALL]]
+//
+_Bool f8() {
+  return __atomic_compare_exchange(&Ptr, &Exp, &Des, 0,
+                                   memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f9(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_fetch_add_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = add i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f9() {
+  return __atomic_add_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f10(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_fetch_sub_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = sub i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f10() {
+  return __atomic_sub_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f11(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_fetch_and_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f11() {
+  return __atomic_and_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f12(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-...
[truncated]

uweigand

Not sure about the is_lock_free results, see inline comments.

What does recent GCC return for those?

Everything else is looking good to me.

uweigand · 2023-11-24T12:22:28Z

clang/test/CodeGen/SystemZ/gnu-atomic_is_lock_free.c

+
+// CHECK-LABEL: @fun_PtrAl16_is_lock_free(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call zeroext i1 @__atomic_is_lock_free(i64 noundef 16, ptr noundef nonnull @Ptr_Al16) #[[ATTR2]]


Here I would have expected true, assuming the compiler correctly figures out the object is 16-byte aligned.

fun_PtrAl16_is_lock_free:
GCC returns true

uweigand · 2023-11-24T13:00:31Z

clang/test/CodeGen/SystemZ/gnu-atomic_is_lock_free.c

+// CHECK-NEXT:    ret i1 true
+//
+_Bool fun_noptr_is_lock_free() {
+  return __atomic_is_lock_free(16, 0);


But this seems actually incorrect - when using default assumptions, the operation should not be lock-free ...

fun_noptr_is_lock_free:
GCC returns true.

GCC: (GNU) 14.0.0 20231121 (experimental)

Hmm, it seems that GCC is returning true for all of these in this test case, probably because the way it is written: the Ptr_Al8 is placed after the 16-byte aligned Ptr_Al16, so Ptr_Al8 also gets the greater (over)alignment.

Maybe I should try these in two separate files instead.

…ons. - Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value of 128. It now produces IR instead of calls to __atomic instrinsics for 16 bytes as well. FP loads are first loaded as i128 and then casted to fp128. - Atomic __int128 (and long double) variables are aligned to 16 bytes (like gcc 14). - AtomicExpand pass now expands also 16 byte operations. - tests for __atomic builtins for all integer widths, with test for i128 in both align=8 and align=16 cases. - Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free / __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c - shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now that the backend is acepting 16 byte atomics, 16 byte FP atomicrmw:s now also get expanded by AtomicExpand. The default (and used) shouldCastAtomicRMWIInIR() says that if the type is FP, it is casted to integer (see atomicrmw-xchg-07.ll). - TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step smaller integer types should also be possible to handle this way instead of in backend. Original patch rebased. Remove the regalloc handling for CDSG loops. Tests improved.

JonPsson1 · 2023-11-24T14:14:31Z

clang/test/CodeGen/SystemZ/gnu-atomic_is_lock_free-i128-16Al.c

+
+#include <stdatomic.h>
+#include <stdint.h>
+


The first two functions here should return true..?

JonPsson1 · 2023-11-24T14:18:13Z

clang/test/CodeGen/SystemZ/gnu-atomic_is_lock_free-i128-16Al.c

+// CHECK-LABEL: @fun2
+// CHECK:       ret i1 true
+_Bool fun2() {
+  return __atomic_is_lock_free(16, 0);


But this seems actually incorrect - when using default assumptions, the operation should not be lock-free ...

JonPsson1 · 2023-11-24T14:21:04Z

Not sure about GCC here, as far as I can see, it is returning true for both of the 8-byte aligned cases in gnu-atomic_is_lock_free-i128-8Al.c, also now that the alignment is actually 8...

GCC is returning true for all in gnu-atomic_is_lock_free-i128-16Al.c, except for except for emitting a call to __c11_atomic_is_lock_free.

JonPsson1 · 2023-11-25T10:34:49Z

Sorry for the confusion of the files: the tests for __atomic_is_lock_free() and friends are now in back a single file atomic_is_lock_free-i128.c. The C library call is also included as it also changes behavior with this patch.
Current test output in CHECKs, with previous problems remaining.
I see that these can be called for long double / Atomic as well, and clang is now changing output with the patch. GCC however seems to return 'true' for any such calls, regardless of alignment / type, so I suspect this is not really expected to work. IIUC, the builtins are only intended for use with integral types. The library call may not be limited to ints. Waiting with these until later.

uweigand

This version LGTM now.

JonPsson1 requested review from uweigand and iii-i November 22, 2023 15:10

JonPsson1 self-assigned this Nov 22, 2023

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 22, 2023

uweigand reviewed Nov 24, 2023

View reviewed changes

JonPsson and others added 2 commits November 24, 2023 14:49

Rework testing of gnu-atomic_is_lock_free

0432ffb

JonPsson1 force-pushed the Atomic_i128 branch from e392e34 to 0432ffb Compare November 24, 2023 14:13

JonPsson1 commented Nov 24, 2023

View reviewed changes

Rework testing again, adding the C library call as well.

41c8fc6

JonPsson1 added 2 commits November 27, 2023 18:14

Fix testing (IP)

56f321d

Rework testing again.

63d4bd5

uweigand approved these changes Dec 4, 2023

View reviewed changes

JonPsson1 merged commit c568927 into llvm:main Dec 5, 2023
3 checks passed

JonPsson1 deleted the Atomic_i128 branch December 5, 2023 16:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SystemZ] Properly support 16 byte atomic int/fp types and ops. #73134

[SystemZ] Properly support 16 byte atomic int/fp types and ops. #73134

JonPsson1 commented Nov 22, 2023

llvmbot commented Nov 22, 2023

uweigand left a comment •

edited

uweigand Nov 24, 2023

JonPsson1 Nov 24, 2023

uweigand Nov 24, 2023

JonPsson1 Nov 24, 2023

JonPsson1 Nov 24, 2023

JonPsson1 Nov 24, 2023

JonPsson1 Nov 24, 2023

JonPsson1 commented Nov 24, 2023

JonPsson1 commented Nov 25, 2023

uweigand left a comment


		#include <stdatomic.h>
		#include <stdint.h>

[SystemZ] Properly support 16 byte atomic int/fp types and ops. #73134

[SystemZ] Properly support 16 byte atomic int/fp types and ops. #73134

Conversation

JonPsson1 commented Nov 22, 2023

llvmbot commented Nov 22, 2023

uweigand left a comment • edited

Choose a reason for hiding this comment

uweigand Nov 24, 2023

Choose a reason for hiding this comment

JonPsson1 Nov 24, 2023

Choose a reason for hiding this comment

uweigand Nov 24, 2023

Choose a reason for hiding this comment

JonPsson1 Nov 24, 2023

Choose a reason for hiding this comment

JonPsson1 Nov 24, 2023

Choose a reason for hiding this comment

JonPsson1 Nov 24, 2023

Choose a reason for hiding this comment

JonPsson1 Nov 24, 2023

Choose a reason for hiding this comment

JonPsson1 commented Nov 24, 2023

JonPsson1 commented Nov 25, 2023

uweigand left a comment

Choose a reason for hiding this comment

uweigand left a comment •

edited