[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs #160659

ro-i · 2025-09-25T08:06:58Z

Setting the prescriptiveness of the num_threads clause to 'strict' and having a corresponding check (with message and severity clauses) does not align well with how OpenMP should be handled for GPUs.

The num_threads expression may be an arbitrary integer expression which is evaluated on the target, in correspondance to the OpenMP spec. This prevents the check from being done before launching the kernel, especially considering that the num_threads clause is associated with the parallel directive and that there may be multiple parallel directives with different num_threads clauses in a single target region. Acting on the result of the 'strict' check on the GPU would require doing I/O on the GPU, which can introduce performance regressions. Delaying any actions resulting from the 'strict' check and doing them on the host after executing the target region involves additional data copies and is not really semantically correct.

For now, the 'strict' modifier for the num_threads clause and its associated message and severity clause are set to be unsupported on GPUs. Targets other than GPUs still support the aforementioned features in the context of an OpenMP target region.

Setting the prescriptiveness of the num_threads clause to 'strict' and having a corresponding check (with message and severity clauses) does not align well with how OpenMP should be handled for GPUs. The num_threads expression may be an arbitrary integer expression which is evaluated on the target, in correspondance to the OpenMP spec. This prevents the check from being done before launching the kernel, especially considering that the num_threads clause is associated with the parallel directive and that there may be multiple parallel directives with different num_threads clauses in a single target region. Acting on the result of the 'strict' check on the GPU would require doing I/O on the GPU, which can introduce performance regressions. Delaying any actions resulting from the 'strict' check and doing them on the host after executing the target region involves additional data copies and is not really semantically correct. For now, the 'strict' modifier for the num_threads clause and its associated message and severity clause are set to be unsupported on GPUs. Targets other than GPUs still support the aforementioned features in the context of an OpenMP target region.

llvmbot · 2025-09-25T08:07:38Z

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-clang-codegen

Author: Robert Imschweiler (ro-i)

Changes

Setting the prescriptiveness of the num_threads clause to 'strict' and having a corresponding check (with message and severity clauses) does not align well with how OpenMP should be handled for GPUs.

The num_threads expression may be an arbitrary integer expression which is evaluated on the target, in correspondance to the OpenMP spec. This prevents the check from being done before launching the kernel, especially considering that the num_threads clause is associated with the parallel directive and that there may be multiple parallel directives with different num_threads clauses in a single target region. Acting on the result of the 'strict' check on the GPU would require doing I/O on the GPU, which can introduce performance regressions. Delaying any actions resulting from the 'strict' check and doing them on the host after executing the target region involves additional data copies and is not really semantically correct.

For now, the 'strict' modifier for the num_threads clause and its associated message and severity clause are set to be unsupported on GPUs. Targets other than GPUs still support the aforementioned features in the context of an OpenMP target region.

Patch is 1.47 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160659.diff

14 Files Affected:

(modified) clang/include/clang/Basic/DiagnosticCommonKinds.td (+2)
(modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+14-8)
(modified) clang/lib/CodeGen/CGOpenMPRuntime.h (+10-4)
(modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+23-1)
(modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.h (+11-1)
(modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+11-3)
(added) clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp (+108)
(modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+6074-6214)
(removed) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_strict_codegen.cpp (-3541)
(added) clang/test/OpenMP/nvptx_parallel_num_threads_strict_messages.cpp (+108)
(added) clang/test/OpenMP/target_parallel_num_threads_strict_codegen.cpp (+1828)
(removed) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_strict_codegen.cpp (-1447)
(removed) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_strict_codegen.cpp (-1911)
(modified) llvm/include/llvm/Frontend/OpenMP/OMP.td (+22)

diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td b/clang/include/clang/Basic/DiagnosticCommonKinds.td
index 0bd8a423c393e..9c6f7d717cdd6 100644
--- a/clang/include/clang/Basic/DiagnosticCommonKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td
@@ -433,6 +433,8 @@ def err_omp_more_one_clause : Error<
   "directive '#pragma omp %0' cannot contain more than one '%1' clause%select{| with '%3' name modifier| with 'source' dependence}2">;
 def err_omp_required_clause : Error<
   "directive '#pragma omp %0' requires the '%1' clause">;
+def err_omp_gpu_unsupported_clause: Error<"clause '%0' is currently not supported on a GPU">;
+def err_omp_gpu_unsupported_modifier_for_clause: Error<"modifier '%0' is currently not supported on a GPU for the '%1' clause">;
 
 // Static Analyzer Core
 def err_unknown_analyzer_checker_or_package : Error<
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a503aaf613e30..f4962f932aaa8 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -2703,7 +2703,8 @@ llvm::Value *CGOpenMPRuntime::emitForNext(CodeGenFunction &CGF,
 }
 
 llvm::Value *CGOpenMPRuntime::emitMessageClause(CodeGenFunction &CGF,
-                                                const Expr *Message) {
+                                                const Expr *Message,
+                                                SourceLocation Loc) {
   if (!Message)
     return llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
   return CGF.EmitScalarExpr(Message);
@@ -2713,11 +2714,13 @@ llvm::Value *
 CGOpenMPRuntime::emitMessageClause(CodeGenFunction &CGF,
                                    const OMPMessageClause *MessageClause) {
   return emitMessageClause(
-      CGF, MessageClause ? MessageClause->getMessageString() : nullptr);
+      CGF, MessageClause ? MessageClause->getMessageString() : nullptr,
+      MessageClause->getBeginLoc());
 }
 
 llvm::Value *
-CGOpenMPRuntime::emitSeverityClause(OpenMPSeverityClauseKind Severity) {
+CGOpenMPRuntime::emitSeverityClause(OpenMPSeverityClauseKind Severity,
+                                    SourceLocation Loc) {
   // OpenMP 6.0, 10.4: "If no severity clause is specified then the effect is
   // as if sev-level is fatal."
   return llvm::ConstantInt::get(CGM.Int32Ty,
@@ -2727,13 +2730,15 @@ CGOpenMPRuntime::emitSeverityClause(OpenMPSeverityClauseKind Severity) {
 llvm::Value *
 CGOpenMPRuntime::emitSeverityClause(const OMPSeverityClause *SeverityClause) {
   return emitSeverityClause(SeverityClause ? SeverityClause->getSeverityKind()
-                                           : OMPC_SEVERITY_unknown);
+                                           : OMPC_SEVERITY_unknown,
+                            SeverityClause->getBeginLoc());
 }
 
 void CGOpenMPRuntime::emitNumThreadsClause(
     CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc,
     OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
-    const Expr *Message) {
+    SourceLocation SeverityLoc, const Expr *Message,
+    SourceLocation MessageLoc) {
   if (!CGF.HaveInsertPoint())
     return;
   llvm::SmallVector<llvm::Value *, 4> Args(
@@ -2745,8 +2750,8 @@ void CGOpenMPRuntime::emitNumThreadsClause(
   RuntimeFunction FnID = OMPRTL___kmpc_push_num_threads;
   if (Modifier == OMPC_NUMTHREADS_strict) {
     FnID = OMPRTL___kmpc_push_num_threads_strict;
-    Args.push_back(emitSeverityClause(Severity));
-    Args.push_back(emitMessageClause(CGF, Message));
+    Args.push_back(emitSeverityClause(Severity, SeverityLoc));
+    Args.push_back(emitMessageClause(CGF, Message, MessageLoc));
   }
   CGF.EmitRuntimeCall(
       OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), FnID), Args);
@@ -12263,7 +12268,8 @@ llvm::Value *CGOpenMPSIMDRuntime::emitForNext(CodeGenFunction &CGF,
 void CGOpenMPSIMDRuntime::emitNumThreadsClause(
     CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc,
     OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
-    const Expr *Message) {
+    SourceLocation SeverityLoc, const Expr *Message,
+    SourceLocation MessageLoc) {
   llvm_unreachable("Not supported in SIMD-only mode");
 }
 
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index eb04eceee236c..ba76ba6b5f523 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -1049,11 +1049,13 @@ class CGOpenMPRuntime {
                                    Address UB, Address ST);
 
   virtual llvm::Value *emitMessageClause(CodeGenFunction &CGF,
-                                         const Expr *Message);
+                                         const Expr *Message,
+                                         SourceLocation Loc);
   virtual llvm::Value *emitMessageClause(CodeGenFunction &CGF,
                                          const OMPMessageClause *MessageClause);
 
-  virtual llvm::Value *emitSeverityClause(OpenMPSeverityClauseKind Severity);
+  virtual llvm::Value *emitSeverityClause(OpenMPSeverityClauseKind Severity,
+                                          SourceLocation Loc);
   virtual llvm::Value *
   emitSeverityClause(const OMPSeverityClause *SeverityClause);
 
@@ -1069,7 +1071,9 @@ class CGOpenMPRuntime {
       CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc,
       OpenMPNumThreadsClauseModifier Modifier = OMPC_NUMTHREADS_unknown,
       OpenMPSeverityClauseKind Severity = OMPC_SEVERITY_fatal,
-      const Expr *Message = nullptr);
+      SourceLocation SeverityLoc = SourceLocation(),
+      const Expr *Message = nullptr,
+      SourceLocation MessageLoc = SourceLocation());
 
   /// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32
   /// global_tid, int proc_bind) to generate code for 'proc_bind' clause.
@@ -1956,7 +1960,9 @@ class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {
       CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc,
       OpenMPNumThreadsClauseModifier Modifier = OMPC_NUMTHREADS_unknown,
       OpenMPSeverityClauseKind Severity = OMPC_SEVERITY_fatal,
-      const Expr *Message = nullptr) override;
+      SourceLocation SeverityLoc = SourceLocation(),
+      const Expr *Message = nullptr,
+      SourceLocation MessageLoc = SourceLocation()) override;
 
   /// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32
   /// global_tid, int proc_bind) to generate code for 'proc_bind' clause.
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 44a091e1b3c75..f3d3fa359c5f4 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -899,10 +899,32 @@ void CGOpenMPRuntimeGPU::emitProcBindClause(CodeGenFunction &CGF,
   // Nothing to do.
 }
 
+llvm::Value *CGOpenMPRuntimeGPU::emitMessageClause(CodeGenFunction &CGF,
+                                                   const Expr *Message,
+                                                   SourceLocation Loc) {
+  CGM.getDiags().Report(Loc, diag::err_omp_gpu_unsupported_clause)
+      << getOpenMPClauseName(OMPC_message);
+}
+
+llvm::Value *
+CGOpenMPRuntimeGPU::emitSeverityClause(OpenMPSeverityClauseKind Severity,
+                                       SourceLocation Loc) {
+  CGM.getDiags().Report(Loc, diag::err_omp_gpu_unsupported_clause)
+      << getOpenMPClauseName(OMPC_severity);
+}
+
 void CGOpenMPRuntimeGPU::emitNumThreadsClause(
     CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc,
     OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
-    const Expr *Message) {
+    SourceLocation SeverityLoc, const Expr *Message,
+    SourceLocation MessageLoc) {
+  if (Modifier == OMPC_NUMTHREADS_strict) {
+    CGM.getDiags().Report(Loc,
+                          diag::err_omp_gpu_unsupported_modifier_for_clause)
+        << "strict" << getOpenMPClauseName(OMPC_num_threads);
+    return;
+  }
+
   // Nothing to do.
 }
 
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
index 665221b7d7890..810d6aa082156 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -162,6 +162,14 @@ class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
                           llvm::omp::ProcBindKind ProcBind,
                           SourceLocation Loc) override;
 
+  // Currently unsupported on the device.
+  llvm::Value *emitMessageClause(CodeGenFunction &CGF, const Expr *Message,
+                                 SourceLocation Loc) override;
+
+  // Currently unsupported on the device.
+  virtual llvm::Value *emitSeverityClause(OpenMPSeverityClauseKind Severity,
+                                          SourceLocation Loc) override;
+
   /// Emits call to void __kmpc_push_num_threads(ident_t *loc, kmp_int32
   /// global_tid, kmp_int32 num_threads) to generate code for 'num_threads'
   /// clause.
@@ -169,7 +177,9 @@ class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
       CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc,
       OpenMPNumThreadsClauseModifier Modifier = OMPC_NUMTHREADS_unknown,
       OpenMPSeverityClauseKind Severity = OMPC_SEVERITY_fatal,
-      const Expr *Message = nullptr) override;
+      SourceLocation SeverityLoc = SourceLocation(),
+      const Expr *Message = nullptr,
+      SourceLocation MessageLoc = SourceLocation()) override;
 
   /// This function ought to emit, in the general case, a call to
   // the openmp runtime kmpc_push_num_teams. In NVPTX backend it is not needed
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index d72cd8fbfd608..ba9c7c60144e6 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -1622,22 +1622,30 @@ static void emitCommonOMPParallelDirective(
   // if sev-level is fatal."
   OpenMPSeverityClauseKind Severity = OMPC_SEVERITY_fatal;
   clang::Expr *Message = nullptr;
+  SourceLocation SeverityLoc = SourceLocation();
+  SourceLocation MessageLoc = SourceLocation();
+
   llvm::Function *OutlinedFn =
       CGF.CGM.getOpenMPRuntime().emitParallelOutlinedFunction(
           CGF, S, *CS->getCapturedDecl()->param_begin(), InnermostKind,
           CodeGen);
+
   if (const auto *NumThreadsClause = S.getSingleClause<OMPNumThreadsClause>()) {
     CodeGenFunction::RunCleanupsScope NumThreadsScope(CGF);
     NumThreads = CGF.EmitScalarExpr(NumThreadsClause->getNumThreads(),
                                     /*IgnoreResultAssign=*/true);
     Modifier = NumThreadsClause->getModifier();
-    if (const auto *MessageClause = S.getSingleClause<OMPMessageClause>())
+    if (const auto *MessageClause = S.getSingleClause<OMPMessageClause>()) {
       Message = MessageClause->getMessageString();
-    if (const auto *SeverityClause = S.getSingleClause<OMPSeverityClause>())
+      MessageLoc = MessageClause->getBeginLoc();
+    }
+    if (const auto *SeverityClause = S.getSingleClause<OMPSeverityClause>()) {
       Severity = SeverityClause->getSeverityKind();
+      SeverityLoc = SeverityClause->getBeginLoc();
+    }
     CGF.CGM.getOpenMPRuntime().emitNumThreadsClause(
         CGF, NumThreads, NumThreadsClause->getBeginLoc(), Modifier, Severity,
-        Message);
+        SeverityLoc, Message, MessageLoc);
   }
   if (const auto *ProcBindClause = S.getSingleClause<OMPProcBindClause>()) {
     CodeGenFunction::RunCleanupsScope ProcBindScope(CGF);
diff --git a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp b/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
new file mode 100644
index 0000000000000..0b48e553b5c08
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
@@ -0,0 +1,108 @@
+// RUN: %clang_cc1 -DF1 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
+// RUN: %clang_cc1 -DF1 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
+// RUN: %clang_cc1 -DF2 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
+// RUN: %clang_cc1 -DF2 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
+// RUN: %clang_cc1 -DF3 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
+// RUN: %clang_cc1 -DF3 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
+
+#ifndef TARGET
+// expected-no-diagnostics
+#endif
+
+#ifdef F3
+template<typename tx>
+tx ftemplate(int n) {
+  tx a = 0;
+
+#ifdef TARGET
+  // expected-error@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause}}
+#endif
+  #pragma omp parallel num_threads(strict: tx(20)) severity(fatal) message("msg")
+  {
+  }
+
+  short b = 1;
+#ifdef TARGET
+  // expected-error@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause}}
+#endif
+  #pragma omp parallel num_threads(strict: b) severity(warning) message("msg")
+  {
+    a += b;
+  }
+
+  return a;
+}
+#endif
+
+#ifdef F2
+static
+int fstatic(int n) {
+
+#ifdef TARGET
+  // expected-error@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause}}
+#endif
+  #pragma omp target parallel num_threads(strict: n) message("msg")
+  {
+  }
+
+#ifdef TARGET
+  // expected-error@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause}}
+#endif
+  #pragma omp target parallel num_threads(strict: 32+n) severity(warning)
+  {
+  }
+
+  return n+1;
+}
+#endif
+
+#ifdef F1
+struct S1 {
+  double a;
+
+  int r1(int n){
+    int b = 1;
+
+#ifdef TARGET
+    // expected-error@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause}}
+#endif
+    #pragma omp parallel num_threads(strict: n-b) severity(warning) message("msg")
+    {
+      this->a = (double)b + 1.5;
+    }
+
+#ifdef TARGET
+    // expected-error@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause}}
+#endif
+    #pragma omp parallel num_threads(strict: 1024) severity(fatal)
+    {
+      this->a = 2.5;
+    }
+
+    return (int)a;
+  }
+};
+#endif
+
+int bar(int n){
+  int a = 0;
+
+#ifdef F1
+  #pragma omp target
+  {
+    S1 S;
+    a += S.r1(n);
+  }
+#endif
+
+#ifdef F2
+  a += fstatic(n);
+#endif
+
+#ifdef F3
+  #pragma omp target
+  a += ftemplate<int>(n);
+#endif
+
+  return a;
+}
diff --git a/clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp b/clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp
index 04dd9c0e9c69f..bf979d79fc61b 100644
--- a/clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp
+++ b/clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp
@@ -7,14 +7,6 @@
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
 
-// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK3
-// RUN: %clang_cc1 -DOMP60 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
-// RUN: %clang_cc1 -DOMP60 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK3
-
-// RUN: %clang_cc1 -DOMP60 -verify -fopenmp-simd -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
-// RUN: %clang_cc1 -DOMP60 -fopenmp-simd -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
-// RUN: %clang_cc1 -DOMP60 -fopenmp-simd -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
-
 // RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK5
 // RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK5
@@ -31,14 +23,6 @@
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
 
-// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple aarch64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK11
-// RUN: %clang_cc1 -DOMP60 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
-// RUN: %clang_cc1 -DOMP60 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK11
-
-// RUN: %clang_cc1 -DOMP60 -verify -fopenmp-simd -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple aarch64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
-// RUN: %clang_cc1 -DOMP60 -fopenmp-simd -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
-// RUN: %clang_cc1 -DOMP60 -fopenmp-simd -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
-
 // RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple aarch64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK13
 // RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple aarch64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=...
[truncated]

ro-i · 2025-09-25T12:46:24Z

(Transforms/LoopVectorize/AArch64/epilogue-vectorization-fix-scalar-resume-values.ll is currently broken, unrelated to this PR)

jhuber6

Should this be an error or a warning. Considering that it's basically ignored it might be easier to make it a warning so that the code doesn't refuse to compile when someone's swapping something from the CPU to GPU.

ro-i · 2025-09-25T12:54:31Z

That's a good point. I think the argument for the error was that it makes it clearer that this unsupported. A warning might be easy to overlook for larger builds, which might then lead the user believe that everything went as they expected.

jhuber6 · 2025-09-25T18:32:16Z

We emit warnings for stuff like #pragma omp simd on the GPU, which is also unsupported. Probably best to keep this as a warning.

ro-i · 2025-09-26T12:14:36Z

ok, made it a warning

jhuber6 · 2025-09-26T13:05:12Z

Fix Misc/warning-flags.c

ro-i · 2025-09-26T13:44:32Z

I think OpenMPClauses might be the most appropriate warning group? OpenMPTarget would be the other choice, ig, but it seems to be rather used in other situations

…#160659) Setting the prescriptiveness of the num_threads clause to 'strict' and having a corresponding check (with message and severity clauses) does not align well with how OpenMP should be handled for GPUs. The num_threads expression may be an arbitrary integer expression which is evaluated on the target, in correspondance to the OpenMP spec. This prevents the check from being done before launching the kernel, especially considering that the num_threads clause is associated with the parallel directive and that there may be multiple parallel directives with different num_threads clauses in a single target region. Acting on the result of the 'strict' check on the GPU would require doing I/O on the GPU, which can introduce performance regressions. Delaying any actions resulting from the 'strict' check and doing them on the host after executing the target region involves additional data copies and is not really semantically correct. For now, the 'strict' modifier for the num_threads clause and its associated message and severity clause are set to be unsupported on GPUs. Targets other than GPUs still support the aforementioned features in the context of an OpenMP target region.

ro-i requested review from mjklemm, jhuber6 and kparzysz September 25, 2025 08:06

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. flang:openmp clang:openmp OpenMP related changes to Clang labels Sep 25, 2025

add return nullptr

393665b

ro-i requested review from alexey-bataev and shiltian September 25, 2025 12:46

jhuber6 reviewed Sep 25, 2025

View reviewed changes

error -> warning

00e4724

fix warning group

0d1361a

jhuber6 approved these changes Sep 26, 2025

View reviewed changes

jhuber6 merged commit 814a3a6 into main Sep 26, 2025
9 checks passed

jhuber6 deleted the users/ro-i/omp-gpu-strict-unsupported branch September 26, 2025 18:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs #160659

[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs #160659

Uh oh!

ro-i commented Sep 25, 2025

Uh oh!

llvmbot commented Sep 25, 2025 •

edited

Loading

Uh oh!

ro-i commented Sep 25, 2025

Uh oh!

jhuber6 left a comment

Uh oh!

ro-i commented Sep 25, 2025

Uh oh!

jhuber6 commented Sep 25, 2025

Uh oh!

ro-i commented Sep 26, 2025

Uh oh!

jhuber6 commented Sep 26, 2025

Uh oh!

ro-i commented Sep 26, 2025

Uh oh!

Uh oh!

Uh oh!

[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs #160659

[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs #160659

Uh oh!

Conversation

ro-i commented Sep 25, 2025

Uh oh!

llvmbot commented Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ro-i commented Sep 25, 2025

Uh oh!

jhuber6 left a comment

Choose a reason for hiding this comment

Uh oh!

ro-i commented Sep 25, 2025

Uh oh!

jhuber6 commented Sep 25, 2025

Uh oh!

ro-i commented Sep 26, 2025

Uh oh!

jhuber6 commented Sep 26, 2025

Uh oh!

ro-i commented Sep 26, 2025

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Sep 25, 2025 •

edited

Loading