[SystemZ] Properly support 16 byte atomic int/fp types and ops. #73134
@llvm/pr-subscribers-clang Author: Jonas Paulsson (JonPsson1) Changes
Original patch rebased. Patch is 83.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73134.diff 12 Files Affected:
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index 9ba255745cf2cc5..e4ec338880f2109 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -60,7 +60,7 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
resetDataLayout("E-m:e-i1:8:16-i8:8:16-i64:64-f128:64"
"-v128:64-a:8:16-n32:64");
}
- MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;
+ MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
HasStrictFP = true;
}
diff --git a/clang/test/CodeGen/SystemZ/atomic-alignment.c b/clang/test/CodeGen/SystemZ/atomic-alignment.c
new file mode 100644
index 000000000000000..da478842ca31b2b
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/atomic-alignment.c
@@ -0,0 +1,35 @@
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O3 -emit-llvm %s -o - | FileCheck %s
+//
+// Test alignment of 128 bit Atomic int/fp types, as well as loading
+// from memory with a simple addition. The fp128 is loaded as i128 and
+// then casted.
+
+// CHECK: @Atomic_int128 = {{.*}} i128 0, align 16
+// CHECK: @Atomic_fp128 = {{.*}} fp128 0xL00000000000000000000000000000000, align 16
+
+// CHECK-LABEL: @f1
+// CHECK: %atomic-load = load atomic i128, ptr @Atomic_int128 seq_cst, align 16
+// CHECK-NEXT: %add = add nsw i128 %atomic-load, 1
+// CHECK-NEXT: store i128 %add, ptr %agg.result, align 8
+// CHECK-NEXT: ret void
+
+// CHECK-LABEL: @f2
+// CHECK: %atomic-load = load atomic i128, ptr @Atomic_fp128 seq_cst, align 16
+// CHECK-NEXT: %0 = bitcast i128 %atomic-load to fp128
+// CHECK-NEXT: %add = fadd fp128 %0, 0xL00000000000000003FFF000000000000
+// CHECK-NEXT: store fp128 %add, ptr %agg.result, align 8
+// CHECK-NEXT: ret void
+
+
+#include <stdatomic.h>
+
+_Atomic __int128 Atomic_int128;
+_Atomic long double Atomic_fp128;
+
+__int128 f1() {
+ return Atomic_int128 + 1;
+}
+
+long double f2() {
+ return Atomic_fp128 + 1.0;
+}
diff --git a/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c
new file mode 100644
index 000000000000000..e3db2063312d2b4
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c
@@ -0,0 +1,257 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O1 -emit-llvm %s -o - | FileCheck %s
+//
+// Test GNU atomic builtins for __int128 aligned to 16 bytes, which should be
+// expanded to LLVM I/R by the front end.
+
+#include <stdatomic.h>
+#include <stdint.h>
+
+__int128 Ptr __attribute__((aligned(16)));
+__int128 Ret __attribute__((aligned(16)));
+__int128 Val __attribute__((aligned(16)));
+__int128 Exp __attribute__((aligned(16)));
+__int128 Des __attribute__((aligned(16)));
+
+// CHECK-LABEL: @f1(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load atomic i128, ptr @Ptr seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT: ret void
+//
+__int128 f1() {
+ return __atomic_load_n(&Ptr, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f2(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load atomic i128, ptr @Ptr seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP0]], ptr @Ret, align 16
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f2() {
+ __atomic_load(&Ptr, &Ret, memory_order_seq_cst);
+ return Ret;
+}
+
+// CHECK-LABEL: @f3(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: store atomic i128 [[TMP0]], ptr @Ptr seq_cst, align 16
+// CHECK-NEXT: ret void
+//
+void f3() {
+ __atomic_store_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f4(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16
+// CHECK-NEXT: store atomic i128 [[TMP0]], ptr @Ptr seq_cst, align 16
+// CHECK-NEXT: ret void
+//
+void f4() {
+ __atomic_store(&Ptr, &Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f5(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw xchg ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f5() {
+ return __atomic_exchange_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f6(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw xchg ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr @Ret, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f6() {
+ __atomic_exchange(&Ptr, &Val, &Ret, memory_order_seq_cst);
+ return Ret;
+}
+
+// CHECK-LABEL: @f7(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Des, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr @Exp, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = cmpxchg ptr @Ptr, i128 [[TMP1]], i128 [[TMP0]] seq_cst seq_cst, align 16
+// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { i128, i1 } [[TMP2]], 1
+// CHECK-NEXT: br i1 [[TMP3]], label [[CMPXCHG_CONTINUE:%.*]], label [[CMPXCHG_STORE_EXPECTED:%.*]]
+// CHECK: cmpxchg.store_expected:
+// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { i128, i1 } [[TMP2]], 0
+// CHECK-NEXT: store i128 [[TMP4]], ptr @Exp, align 16
+// CHECK-NEXT: br label [[CMPXCHG_CONTINUE]]
+// CHECK: cmpxchg.continue:
+// CHECK-NEXT: ret i1 [[TMP3]]
+//
+_Bool f7() {
+ return __atomic_compare_exchange_n(&Ptr, &Exp, Des, 0,
+ memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f8(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Exp, align 16
+// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr @Des, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = cmpxchg ptr @Ptr, i128 [[TMP0]], i128 [[TMP1]] seq_cst seq_cst, align 16
+// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { i128, i1 } [[TMP2]], 1
+// CHECK-NEXT: br i1 [[TMP3]], label [[CMPXCHG_CONTINUE:%.*]], label [[CMPXCHG_STORE_EXPECTED:%.*]]
+// CHECK: cmpxchg.store_expected:
+// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { i128, i1 } [[TMP2]], 0
+// CHECK-NEXT: store i128 [[TMP4]], ptr @Exp, align 16
+// CHECK-NEXT: br label [[CMPXCHG_CONTINUE]]
+// CHECK: cmpxchg.continue:
+// CHECK-NEXT: ret i1 [[TMP3]]
+//
+_Bool f8() {
+ return __atomic_compare_exchange(&Ptr, &Exp, &Des, 0,
+ memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f9(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw add ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = add i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f9() {
+ return __atomic_add_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f10(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw sub ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = sub i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f10() {
+ return __atomic_sub_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f11(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw and ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f11() {
+ return __atomic_and_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f12(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw xor ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = xor i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f12() {
+ return __atomic_xor_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f13(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw or ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = or i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f13() {
+ return __atomic_or_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f14(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw nand ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: [[TMP3:%.*]] = xor i128 [[TMP2]], -1
+// CHECK-NEXT: store i128 [[TMP3]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f14() {
+ return __atomic_nand_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f15(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw add ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f15() {
+ return __atomic_fetch_add(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw sub ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f16() {
+ return __atomic_fetch_sub(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f17(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw and ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f17() {
+ return __atomic_fetch_and(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f18(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw xor ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f18() {
+ return __atomic_fetch_xor(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f19(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw or ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f19() {
+ return __atomic_fetch_or(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f20(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = atomicrmw nand ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f20() {
+ return __atomic_fetch_nand(&Ptr, Val, memory_order_seq_cst);
+}
diff --git a/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c
new file mode 100644
index 000000000000000..e38e6572bd58f4e
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c
@@ -0,0 +1,301 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O1 -emit-llvm %s -o - | FileCheck %s
+//
+// Test GNU atomic builtins for __int128 (with default alignment of 8 bytes
+// only), resulting in libcalls.
+
+#include <stdatomic.h>
+#include <stdint.h>
+
+__int128 Ptr;
+__int128 Ret;
+__int128 Val;
+__int128 Exp;
+__int128 Des;
+
+// TODO: This test and several more below have the unnecessary use of an alloca
+// remaining. This is due to 369c9b7, which changes the behavior of the MemCpyOpt
+// pass. It seems that a 'writable' attribute should now be added to the argument
+// in order for this optimization to proceed.
+
+// CHECK-LABEL: @f1(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: call void @__atomic_load(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[ATOMIC_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 8, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f1() {
+ return __atomic_load_n(&Ptr, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f2(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void @__atomic_load(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Ret, i32 noundef signext 5)
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Ret, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f2() {
+ __atomic_load(&Ptr, &Ret, memory_order_seq_cst);
+ return Ret;
+}
+
+// CHECK-LABEL: @f3(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: call void @__atomic_store(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[DOTATOMICTMP]], i32 noundef signext 5)
+// CHECK-NEXT: ret void
+//
+void f3() {
+ __atomic_store_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f4(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void @__atomic_store(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Val, i32 noundef signext 5)
+// CHECK-NEXT: ret void
+//
+void f4() {
+ __atomic_store(&Ptr, &Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f5(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: call void @__atomic_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[DOTATOMICTMP]], ptr noundef nonnull [[ATOMIC_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f5() {
+ return __atomic_exchange_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f6(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void @__atomic_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Val, ptr noundef nonnull @Ret, i32 noundef signext 5)
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Ret, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f6() {
+ __atomic_exchange(&Ptr, &Val, &Ret, memory_order_seq_cst);
+ return Ret;
+}
+
+// CHECK-LABEL: @f7(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Des, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Exp, ptr noundef nonnull [[DOTATOMICTMP]], i32 noundef signext 5, i32 noundef signext 5)
+// CHECK-NEXT: ret i1 [[CALL]]
+//
+_Bool f7() {
+ return __atomic_compare_exchange_n(&Ptr, &Exp, Des, 0,
+ memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f8(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[CALL:%.*]] = tail call zeroext i1 @__atomic_compare_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Exp, ptr noundef nonnull @Des, i32 noundef signext 5, i32 noundef signext 5)
+// CHECK-NEXT: ret i1 [[CALL]]
+//
+_Bool f8() {
+ return __atomic_compare_exchange(&Ptr, &Exp, &Des, 0,
+ memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f9(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: call void @__atomic_fetch_add_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = add i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f9() {
+ return __atomic_add_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f10(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: call void @__atomic_fetch_sub_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = sub i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f10() {
+ return __atomic_sub_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f11(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: call void @__atomic_fetch_and_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT: store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT: ret void
+//
+__int128 f11() {
+ return __atomic_and_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f12(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP:%.*]] = alloca i128, align 8
+// CHECK-...
[truncated]
Not sure about the is_lock_free results, see inline comments.
What does recent GCC return for those?
Everything else is looking good to me.
// CHECK-LABEL: @fun_PtrAl16_is_lock_free(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[CALL:%.*]] = tail call zeroext i1 @__atomic_is_lock_free(i64 noundef 16, ptr noundef nonnull @Ptr_Al16) #[[ATTR2]]
Here I would have expected true, assuming the compiler correctly figures out the object is 16-byte aligned.
fun_PtrAl16_is_lock_free:
GCC returns true
// CHECK-NEXT: ret i1 true
//
_Bool fun_noptr_is_lock_free() {
  return __atomic_is_lock_free(16, 0);
But this seems actually incorrect - when using default assumptions, the operation should not be lock-free ...
fun_noptr_is_lock_free:
GCC returns true.
GCC: (GNU) 14.0.0 20231121 (experimental)
Hmm, it seems that GCC is returning true for all of these in this test case, probably because of the way it is written: Ptr_Al8 is placed after the 16-byte aligned Ptr_Al16, so Ptr_Al8 also ends up with the greater (over)alignment.
Maybe I should try these in two separate files instead.
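The layout effect suspected above can be checked by looking at the actual address of each object rather than at its declared alignment. The following is a minimal sketch, not code from the patch: the two globals are assumed stand-ins for the test's Ptr_Al16 and Ptr_Al8 (their real declarations are not shown in this thread), and is_16_aligned and check_overaligned_object are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the globals in the test file. */
__int128 Ptr_Al16 __attribute__((aligned(16)));
/* Default alignment: 8 bytes on s390x, possibly 16 on other targets. */
__int128 Ptr_Al8;

/* Hypothetical helper: returns 1 if the address is a multiple of 16. */
int is_16_aligned(const void *p) {
  return ((uintptr_t)p % 16) == 0;
}

/* Only the explicitly over-aligned object is guaranteed to pass this check.
   Ptr_Al8 may or may not land on a 16-byte boundary depending on where the
   compiler and linker place it. */
void check_overaligned_object(void) {
  assert(is_16_aligned(&Ptr_Al16));
}
```

Calling is_16_aligned(&Ptr_Al8) in the same translation unit shows whether the default-aligned object happened to land on a 16-byte boundary, which is a separate question from the alignment the compiler is allowed to assume for it.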
…ons.
- Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value of 128. It now produces IR instead of calls to __atomic intrinsics for 16 bytes as well. FP loads are first loaded as i128 and then cast to fp128.
- Atomic __int128 (and long double) variables are aligned to 16 bytes (like gcc 14).
- AtomicExpand pass now also expands 16 byte operations.
- Tests for __atomic builtins for all integer widths, with tests for i128 in both align=8 and align=16 cases.
- Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free / __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c
- shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now that the backend is accepting 16 byte atomics, 16 byte FP atomicrmw:s now also get expanded by AtomicExpand. The default (and used) shouldCastAtomicRMWIInIR() says that if the type is FP, it is cast to integer (see atomicrmw-xchg-07.ll).
- TODO: With this patch, the AtomicExpand pass handles expansion of i128 atomicrmw:s. As a next step, smaller integer types should also be possible to handle this way instead of in the backend.
Original patch rebased. Remove the regalloc handling for CDSG loops. Tests improved.
e392e34 to 0432ffb
#include <stdatomic.h>
#include <stdint.h>
The first two functions here should return true..?
// CHECK-LABEL: @fun2
// CHECK: ret i1 true
_Bool fun2() {
  return __atomic_is_lock_free(16, 0);
But this seems actually incorrect - when using default assumptions, the operation should not be lock-free ...
Not sure about GCC here. As far as I can see, it is returning true for both of the 8-byte aligned cases in gnu-atomic_is_lock_free-i128-8Al.c, even now that the alignment is actually 8. GCC is returning true for all cases in gnu-atomic_is_lock_free-i128-16Al.c, except for emitting a call to __c11_atomic_is_lock_free.
Sorry for the confusion of the files: the tests for __atomic_is_lock_free() and friends are now back in a single file, atomic_is_lock_free-i128.c. The C library call is also included, as its behavior also changes with this patch.
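For readers less familiar with the queries being tested, here is a small sketch of the two GNU builtins discussed in this thread. It is not taken from the test file, and the function names are hypothetical. Passing 0 as the pointer asks the compiler to assume typical alignment for an object of that size. The 16-byte case only uses __atomic_always_lock_free, which is always a compile-time constant, so the sketch does not depend on libatomic; its result is target dependent and is deliberately not stated here.

```c
#include <assert.h>

/* Compile-time answer for int; folds to a constant. */
int int_always_lock_free(void) {
  return __atomic_always_lock_free(sizeof(int), 0);
}

/* May become a runtime library call when the compiler cannot decide;
   for int on common targets it is expected to fold to true. */
int int_is_lock_free(void) {
  return __atomic_is_lock_free(sizeof(int), 0);
}

/* 16-byte query with default alignment assumptions: the case debated in
   the review. The value depends on the target and compiler. */
int i128_always_lock_free(void) {
  return __atomic_always_lock_free(16, 0);
}

/* If an operation is always lock-free, the runtime query must agree. */
void check_int_queries_agree(void) {
  if (int_always_lock_free())
    assert(int_is_lock_free());
}
```

Replacing the 0 with the address of a specific object lets the compiler take that object's known alignment into account, which is the distinction between the fun_PtrAl16_is_lock_free and fun_noptr_is_lock_free cases above.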
This version LGTM now.
Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value of 128. It now produces IR instead of calls to __atomic intrinsics for 16 bytes as well. FP loads are first loaded as i128 and then cast to fp128.
Atomic __int128 (and long double) variables are aligned to 16 bytes (like gcc 14).
AtomicExpand pass now also expands 16 byte operations.
Tests for __atomic builtins for all integer widths, with tests for i128 in both align=8 and align=16 cases.
Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free / __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c
shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now that the backend is accepting 16 byte atomics, 16 byte FP atomicrmw:s now also get expanded by AtomicExpand. The default (and used) shouldCastAtomicRMWIInIR() says that if the type is FP, it is cast to integer (see atomicrmw-xchg-07.ll).
TODO: With this patch, the AtomicExpand pass handles expansion of i128 atomicrmw:s. As a next step, smaller integer types should also be possible to handle this way instead of in the backend.
Original patch rebased.
Remove the regalloc handling for CDSG loops.
Tests improved.
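The cast-to-integer handling of a floating-point exchange mentioned above can be pictured at the C level as follows. This is only an illustrative sketch under stated assumptions: it uses a 64-bit double and uint64_t as a stand-in for fp128 and i128 so that it builds on targets without 16-byte atomics, the function name atomic_xchg_double_via_int is hypothetical, and it is not the code AtomicExpand generates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit double");

/* Atomically replaces the double stored (as raw bits) in *slot and
   returns the previous value. The FP value is reinterpreted as an
   integer, exchanged with an integer atomic, and reinterpreted back. */
double atomic_xchg_double_via_int(uint64_t *slot, double new_val) {
  uint64_t new_bits, old_bits;
  double old_val;

  memcpy(&new_bits, &new_val, sizeof new_bits);   /* FP -> integer bits */
  old_bits = __atomic_exchange_n(slot, new_bits, __ATOMIC_SEQ_CST);
  memcpy(&old_val, &old_bits, sizeof old_val);    /* integer bits -> FP */
  return old_val;
}
```

The memcpy calls play the role of the bitcast: no value conversion happens, only a reinterpretation of the same bits, so the atomic operation itself never needs to know the payload is floating point.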