Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SystemZ] Properly support 16 byte atomic int/fp types and ops. #73134

Merged
merged 5 commits into from
Dec 5, 2023

Conversation

JonPsson1
Copy link
Contributor

  • Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value of 128. It now produces IR instead of calls to __atomic instrinsics for 16 bytes as well. FP loads are first loaded as i128 and then casted to fp128.

  • Atomic __int128 (and long double) variables are aligned to 16 bytes (like gcc 14).

  • AtomicExpand pass now expands also 16 byte operations.

  • tests for __atomic builtins for all integer widths, with test for i128 in both align=8 and align=16 cases.

  • Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free / __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c

  • shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now that the backend is acepting 16 byte atomics, 16 byte FP atomicrmw:s now also get expanded by AtomicExpand. The default (and used) shouldCastAtomicRMWIInIR() says that if the type is FP, it is casted to integer (see atomicrmw-xchg-07.ll).

  • TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step smaller integer types should also be possible to handle this way instead of in backend.

Original patch rebased.
Remove the regalloc handling for CDSG loops.
Tests improved.

@JonPsson1 JonPsson1 self-assigned this Nov 22, 2023
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 22, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 22, 2023

@llvm/pr-subscribers-clang

Author: Jonas Paulsson (JonPsson1)

Changes
  • Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value of 128. It now produces IR instead of calls to __atomic instrinsics for 16 bytes as well. FP loads are first loaded as i128 and then casted to fp128.

  • Atomic __int128 (and long double) variables are aligned to 16 bytes (like gcc 14).

  • AtomicExpand pass now expands also 16 byte operations.

  • tests for __atomic builtins for all integer widths, with test for i128 in both align=8 and align=16 cases.

  • Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free / __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c

  • shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now that the backend is acepting 16 byte atomics, 16 byte FP atomicrmw:s now also get expanded by AtomicExpand. The default (and used) shouldCastAtomicRMWIInIR() says that if the type is FP, it is casted to integer (see atomicrmw-xchg-07.ll).

  • TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step smaller integer types should also be possible to handle this way instead of in backend.

Original patch rebased.
Remove the regalloc handling for CDSG loops.
Tests improved.


Patch is 83.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73134.diff

12 Files Affected:

  • (modified) clang/lib/Basic/Targets/SystemZ.h (+1-1)
  • (added) clang/test/CodeGen/SystemZ/atomic-alignment.c (+35)
  • (added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c (+257)
  • (added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c (+301)
  • (added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i16.c (+219)
  • (added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i32.c (+219)
  • (added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i64.c (+219)
  • (added) clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i8.c (+219)
  • (added) clang/test/CodeGen/SystemZ/gnu-atomic_is_lock_free.c (+71)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+5-1)
  • (modified) llvm/test/CodeGen/SystemZ/atomicrmw-ops-i128.ll (+452-44)
  • (modified) llvm/test/CodeGen/SystemZ/atomicrmw-xchg-07.ll (+21-16)
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index 9ba255745cf2cc5..e4ec338880f2109 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -60,7 +60,7 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
       resetDataLayout("E-m:e-i1:8:16-i8:8:16-i64:64-f128:64"
                       "-v128:64-a:8:16-n32:64");
     }
-    MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;
+    MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
     HasStrictFP = true;
   }
 
diff --git a/clang/test/CodeGen/SystemZ/atomic-alignment.c b/clang/test/CodeGen/SystemZ/atomic-alignment.c
new file mode 100644
index 000000000000000..da478842ca31b2b
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/atomic-alignment.c
@@ -0,0 +1,35 @@
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O3 -emit-llvm %s -o - | FileCheck %s
+//
+// Test alignment of 128 bit Atomic int/fp types, as well as loading
+// from memory with a simple addition. The fp128 is loaded as i128 and
+// then casted.
+
+// CHECK: @Atomic_int128 = {{.*}} i128 0, align 16
+// CHECK: @Atomic_fp128 = {{.*}} fp128 0xL00000000000000000000000000000000, align 16
+
+// CHECK-LABEL:  @f1
+// CHECK:      %atomic-load = load atomic i128, ptr @Atomic_int128 seq_cst, align 16
+// CHECK-NEXT: %add = add nsw i128 %atomic-load, 1
+// CHECK-NEXT: store i128 %add, ptr %agg.result, align 8
+// CHECK-NEXT: ret void
+
+// CHECK-LABEL:  @f2
+// CHECK:      %atomic-load = load atomic i128, ptr @Atomic_fp128 seq_cst, align 16
+// CHECK-NEXT: %0 = bitcast i128 %atomic-load to fp128
+// CHECK-NEXT: %add = fadd fp128 %0, 0xL00000000000000003FFF000000000000
+// CHECK-NEXT: store fp128 %add, ptr %agg.result, align 8
+// CHECK-NEXT: ret void
+
+
+#include <stdatomic.h>
+
+_Atomic __int128    Atomic_int128;
+_Atomic long double Atomic_fp128;
+
+__int128 f1() {
+  return Atomic_int128 + 1;
+}
+
+long double f2() {
+  return Atomic_fp128 + 1.0;
+}
diff --git a/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c
new file mode 100644
index 000000000000000..e3db2063312d2b4
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-16Al.c
@@ -0,0 +1,257 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O1 -emit-llvm %s -o - | FileCheck %s
+//
+// Test GNU atomic builtins for __int128 aligned to 16 bytes, which should be
+// expanded to LLVM I/R by the front end.
+
+#include <stdatomic.h>
+#include <stdint.h>
+
+__int128 Ptr __attribute__((aligned(16)));
+__int128 Ret __attribute__((aligned(16)));
+__int128 Val __attribute__((aligned(16)));
+__int128 Exp __attribute__((aligned(16)));
+__int128 Des __attribute__((aligned(16)));
+
+// CHECK-LABEL: @f1(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load atomic i128, ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
+__int128 f1() {
+  return __atomic_load_n(&Ptr, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f2(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load atomic i128, ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP0]], ptr @Ret, align 16
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f2() {
+  __atomic_load(&Ptr, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f3(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store atomic i128 [[TMP0]], ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    ret void
+//
+void f3() {
+  __atomic_store_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f4(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16
+// CHECK-NEXT:    store atomic i128 [[TMP0]], ptr @Ptr seq_cst, align 16
+// CHECK-NEXT:    ret void
+//
+void f4() {
+  __atomic_store(&Ptr, &Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f5(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xchg ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f5() {
+  return __atomic_exchange_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f6(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xchg ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr @Ret, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f6() {
+  __atomic_exchange(&Ptr, &Val, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f7(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Des, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr @Exp, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = cmpxchg ptr @Ptr, i128 [[TMP1]], i128 [[TMP0]] seq_cst seq_cst, align 16
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { i128, i1 } [[TMP2]], 1
+// CHECK-NEXT:    br i1 [[TMP3]], label [[CMPXCHG_CONTINUE:%.*]], label [[CMPXCHG_STORE_EXPECTED:%.*]]
+// CHECK:       cmpxchg.store_expected:
+// CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { i128, i1 } [[TMP2]], 0
+// CHECK-NEXT:    store i128 [[TMP4]], ptr @Exp, align 16
+// CHECK-NEXT:    br label [[CMPXCHG_CONTINUE]]
+// CHECK:       cmpxchg.continue:
+// CHECK-NEXT:    ret i1 [[TMP3]]
+//
+_Bool f7() {
+  return __atomic_compare_exchange_n(&Ptr, &Exp, Des, 0,
+                                     memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Exp, align 16
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr @Des, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = cmpxchg ptr @Ptr, i128 [[TMP0]], i128 [[TMP1]] seq_cst seq_cst, align 16
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { i128, i1 } [[TMP2]], 1
+// CHECK-NEXT:    br i1 [[TMP3]], label [[CMPXCHG_CONTINUE:%.*]], label [[CMPXCHG_STORE_EXPECTED:%.*]]
+// CHECK:       cmpxchg.store_expected:
+// CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { i128, i1 } [[TMP2]], 0
+// CHECK-NEXT:    store i128 [[TMP4]], ptr @Exp, align 16
+// CHECK-NEXT:    br label [[CMPXCHG_CONTINUE]]
+// CHECK:       cmpxchg.continue:
+// CHECK-NEXT:    ret i1 [[TMP3]]
+//
+_Bool f8() {
+  return __atomic_compare_exchange(&Ptr, &Exp, &Des, 0,
+                                   memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f9(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw add ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = add i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f9() {
+  return __atomic_add_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f10(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw sub ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = sub i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f10() {
+  return __atomic_sub_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f11(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw and ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f11() {
+  return __atomic_and_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f12(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xor ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = xor i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f12() {
+  return __atomic_xor_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f13(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw or ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = or i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f13() {
+  return __atomic_or_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f14(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw nand ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    [[TMP3:%.*]] = xor i128 [[TMP2]], -1
+// CHECK-NEXT:    store i128 [[TMP3]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f14() {
+  return __atomic_nand_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f15(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw add ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f15() {
+  return __atomic_fetch_add(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f16(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw sub ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f16() {
+  return __atomic_fetch_sub(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f17(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw and ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f17() {
+  return __atomic_fetch_and(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f18(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw xor ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f18() {
+  return __atomic_fetch_xor(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f19(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw or ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f19() {
+  return __atomic_fetch_or(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f20(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 16, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = atomicrmw nand ptr @Ptr, i128 [[TMP0]] seq_cst, align 16
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f20() {
+  return __atomic_fetch_nand(&Ptr, Val, memory_order_seq_cst);
+}
diff --git a/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c
new file mode 100644
index 000000000000000..e38e6572bd58f4e
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/gnu-atomic-builtins-i128-8Al.c
@@ -0,0 +1,301 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple s390x-linux-gnu -O1 -emit-llvm %s -o - | FileCheck %s
+//
+// Test GNU atomic builtins for __int128 (with default alignment of 8 bytes
+// only), resulting in libcalls.
+
+#include <stdatomic.h>
+#include <stdint.h>
+
+__int128 Ptr;
+__int128 Ret;
+__int128 Val;
+__int128 Exp;
+__int128 Des;
+
+// TODO: This test and several more below have the unnecessary use of an alloca
+// remaining. This is due to 369c9b7, which changes the behavior of the MemCpyOpt
+// pass. It seems that a 'writable' attribute should now be added to the argument
+// in order for this optimization to proceed.
+
+// CHECK-LABEL: @f1(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    call void @__atomic_load(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[ATOMIC_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 8, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f1() {
+  return __atomic_load_n(&Ptr, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f2(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @__atomic_load(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Ret, i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Ret, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f2() {
+  __atomic_load(&Ptr, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f3(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_store(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[DOTATOMICTMP]], i32 noundef signext 5)
+// CHECK-NEXT:    ret void
+//
+void f3() {
+  __atomic_store_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f4(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @__atomic_store(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Val, i32 noundef signext 5)
+// CHECK-NEXT:    ret void
+//
+void f4() {
+  __atomic_store(&Ptr, &Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f5(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull [[DOTATOMICTMP]], ptr noundef nonnull [[ATOMIC_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP1]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f5() {
+  return __atomic_exchange_n(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f6(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @__atomic_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Val, ptr noundef nonnull @Ret, i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Ret, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f6() {
+  __atomic_exchange(&Ptr, &Val, &Ret, memory_order_seq_cst);
+  return Ret;
+}
+
+// CHECK-LABEL: @f7(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Des, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[DOTATOMICTMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Exp, ptr noundef nonnull [[DOTATOMICTMP]], i32 noundef signext 5, i32 noundef signext 5)
+// CHECK-NEXT:    ret i1 [[CALL]]
+//
+_Bool f7() {
+  return __atomic_compare_exchange_n(&Ptr, &Exp, Des, 0,
+                                     memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call zeroext i1 @__atomic_compare_exchange(i64 noundef 16, ptr noundef nonnull @Ptr, ptr noundef nonnull @Exp, ptr noundef nonnull @Des, i32 noundef signext 5, i32 noundef signext 5)
+// CHECK-NEXT:    ret i1 [[CALL]]
+//
+_Bool f8() {
+  return __atomic_compare_exchange(&Ptr, &Exp, &Des, 0,
+                                   memory_order_seq_cst, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f9(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_fetch_add_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = add i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f9() {
+  return __atomic_add_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f10(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_fetch_sub_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = sub i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f10() {
+  return __atomic_sub_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f11(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr @Val, align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store i128 [[TMP0]], ptr [[INDIRECT_ARG_TEMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    call void @__atomic_fetch_and_16(ptr nonnull sret(i128) align 8 [[TMP]], ptr noundef nonnull @Ptr, ptr noundef nonnull [[INDIRECT_ARG_TEMP]], i32 noundef signext 5)
+// CHECK-NEXT:    [[TMP1:%.*]] = load i128, ptr [[TMP]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = and i128 [[TMP1]], [[TMP0]]
+// CHECK-NEXT:    store i128 [[TMP2]], ptr [[AGG_RESULT:%.*]], align 8, !tbaa [[TBAA2]]
+// CHECK-NEXT:    ret void
+//
+__int128 f11() {
+  return __atomic_and_fetch(&Ptr, Val, memory_order_seq_cst);
+}
+
+// CHECK-LABEL: @f12(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP:%.*]] = alloca i128, align 8
+// CHECK-...
[truncated]

Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the is_lock_free results, see inline comments.

What does recent GCC return for those?

Everything else is looking good to me.


// CHECK-LABEL: @fun_PtrAl16_is_lock_free(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[CALL:%.*]] = tail call zeroext i1 @__atomic_is_lock_free(i64 noundef 16, ptr noundef nonnull @Ptr_Al16) #[[ATTR2]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would have expected true, assuming the compiler correctly figures out the object is 16-byte aligned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fun_PtrAl16_is_lock_free:
GCC returns true

// CHECK-NEXT: ret i1 true
//
_Bool fun_noptr_is_lock_free() {
return __atomic_is_lock_free(16, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this seems actually incorrect - when using default assumptions, the operation should not be lock-free ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fun_noptr_is_lock_free:
GCC returns true.

GCC: (GNU) 14.0.0 20231121 (experimental)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems that GCC is returning true for all of these in this test case, probably because the way it is written: the Ptr_Al8 is placed after the 16-byte aligned Ptr_Al16, so Ptr_Al8 also gets the greater (over)alignment.

Maybe I should try these in two separate files instead.

JonPsson and others added 2 commits November 24, 2023 14:49
…ons.

- Clang FE now has MaxAtomicPromoteWidth and MaxAtomicInlineWidth with a value
  of 128. It now produces IR instead of calls to __atomic instrinsics for 16
  bytes as well. FP loads are first loaded as i128 and then casted to fp128.
- Atomic __int128 (and long double) variables are aligned to 16 bytes
  (like gcc 14).
- AtomicExpand pass now expands also 16 byte operations.

- tests for __atomic builtins for all integer widths, with test for i128 in
  both align=8 and align=16 cases.
- Resulting behavior of __atomic_is_lock_free / __atomic_always_lock_free /
  __c11_atomic_is_lock_free is tested in gnu-atomic_is_lock_free.c
- shouldExpandAtomicRMWInIR() was already returning true for any FP type. Now
  that the backend is acepting 16 byte atomics, 16 byte FP atomicrmw:s now also
  get expanded by AtomicExpand. The default (and used)
  shouldCastAtomicRMWIInIR() says that if the type is FP, it is casted to
  integer (see atomicrmw-xchg-07.ll).
- TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s.
  As a next step smaller integer types should also be possible to handle this
  way instead of in backend.

Original patch rebased.
Remove the regalloc handling for CDSG loops.
Tests improved.

#include <stdatomic.h>
#include <stdint.h>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first two functions here should return true..?

// CHECK-LABEL: @fun2
// CHECK: ret i1 true
_Bool fun2() {
return __atomic_is_lock_free(16, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this seems actually incorrect - when using default assumptions, the operation should not be lock-free ...

@JonPsson1
Copy link
Contributor Author

Not sure about GCC here, as far as I can see, it is returning true for both of the 8-byte aligned cases in gnu-atomic_is_lock_free-i128-8Al.c, also now that the alignment is actually 8...

GCC is returning true for all in gnu-atomic_is_lock_free-i128-16Al.c, except for except for emitting a call to __c11_atomic_is_lock_free.

@JonPsson1
Copy link
Contributor Author

Sorry for the confusion of the files: the tests for __atomic_is_lock_free() and friends are now in back a single file atomic_is_lock_free-i128.c. The C library call is also included as it also changes behavior with this patch.
Current test output in CHECKs, with previous problems remaining.
I see that these can be called for long double / Atomic as well, and clang is now changing output with the patch. GCC however seems to return 'true' for any such calls, regardless of alignment / type, so I suspect this is not really expected to work. IIUC, the builtins are only intended for use with integral types. The library call may not be limited to ints. Waiting with these until later.

Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version LGTM now.

@JonPsson1 JonPsson1 merged commit c568927 into llvm:main Dec 5, 2023
3 checks passed
@JonPsson1 JonPsson1 deleted the Atomic_i128 branch December 5, 2023 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants