[flang][NVPTX] Add initial support to the NVPTX target #71992

fabianmcg · 2023-11-10T22:24:34Z

This patch adds initial support to the NVPTX target, enabling flang to produce OpenMP offload code for NVPTX targets.
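
For readers skimming the diff, the feature-string logic this patch adds in `FrontendActions.cpp` can be sketched without any LLVM dependencies. The following is only an illustrative approximation: the function name `nvptxFeatureString` is hypothetical, it uses `std::map` and `std::vector` in place of `llvm::StringMap` and `llvm::SmallVector`, and the guard against entries shorter than two characters is an addition that the patch itself does not have.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical, dependency-free approximation of the patch's
// getExplicitAndImplicitNVPTXTargetFeatures helper (not the actual code).
std::string nvptxFeatureString(const std::string &cpu,
                               const std::vector<std::string> &featuresAsWritten) {
  std::map<std::string, bool> features;
  bool hasPtxVersion = false;

  // Record user-specified features; the leading '+' or '-' is the on/off flag.
  for (const std::string &feature : featuresAsWritten) {
    if (feature.size() < 2)
      continue; // Assumption of this sketch: skip malformed entries.
    const std::string key = feature.substr(1);
    features[key] = (feature[0] == '+');
    // Remember whether the user already chose a PTX version.
    if (key.rfind("ptx", 0) == 0)
      hasPtxVersion = true;
  }

  // Default to ptx61 when no PTX version was given, as the patch does.
  if (!hasPtxVersion)
    features["ptx61"] = true;

  // The compute capability (for example sm_80) is itself a feature.
  features[cpu] = true;

  // Emit "+name" or "-name" entries, sorted, joined by commas.
  std::vector<std::string> entries;
  for (const auto &[name, enabled] : features)
    entries.push_back((enabled ? "+" : "-") + name);
  std::sort(entries.begin(), entries.end());

  std::string joined;
  for (const std::string &entry : entries) {
    if (!joined.empty())
      joined += ",";
    joined += entry;
  }
  return joined;
}
```

Under these assumptions, calling `nvptxFeatureString("sm_80", {})` yields `+ptx61,+sm_80`, which matches the `target_features` string checked by the `NVPTX` prefix in `target_cpu_features.f90`.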

llvmbot · 2023-11-10T22:25:02Z

@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-codegen

Author: Fabian Mora (fabianmcg)

Changes

This patch adds initial support to the NVPTX target, enabling flang to produce OpenMP offload code for NVPTX targets.

Patch is 20.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71992.diff

15 Files Affected:

(modified) flang/lib/Frontend/FrontendActions.cpp (+41)
(modified) flang/lib/Optimizer/CodeGen/Target.cpp (+30)
(modified) flang/test/Driver/omp-driver-offload.f90 (+42)
(modified) flang/test/Fir/target-rewrite-boxchar.fir (+1)
(modified) flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90 (+3-1)
(modified) flang/test/Lower/OpenMP/FIR/target_cpu_features.f90 (+4-1)
(modified) flang/test/Lower/OpenMP/omp-is-gpu.f90 (+3-1)
(modified) flang/test/Lower/OpenMP/target_cpu_features.f90 (+3-1)
(modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90 (+1-2)
(modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90 (+1-2)
(modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90 (+1-2)
(modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90 (+1-2)
(modified) openmp/libomptarget/test/offloading/fortran/basic_target_region.f90 (+1-2)
(modified) openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90 (+1-2)
(modified) openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90 (+1-2)

diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 73c00c8679c7ec6..18e312469f3ee63 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -175,6 +175,45 @@ getExplicitAndImplicitAMDGPUTargetFeatures(CompilerInstance &ci,
   return llvm::join(featuresVec, ",");
 }
 
+// Get feature string which represents combined explicit target features
+// for NVPTX and the target features specified by the user/
+// TODO: Have a more robust target conf like `clang/lib/Basic/Targets/NVPTX.cpp`
+static std::string
+getExplicitAndImplicitNVPTXTargetFeatures(CompilerInstance &ci,
+                                          const TargetOptions &targetOpts,
+                                          const llvm::Triple triple) {
+  llvm::StringRef cpu = targetOpts.cpu;
+  llvm::StringMap<bool> implicitFeaturesMap;
+  std::string errorMsg;
+  bool ptxVer = false;
+
+  // Add target features specified by the user
+  for (auto &userFeature : targetOpts.featuresAsWritten) {
+    llvm::StringRef userKeyString(llvm::StringRef(userFeature).drop_front(1));
+    implicitFeaturesMap[userKeyString.str()] = (userFeature[0] == '+');
+    // Check if the user provided a PTX version
+    if (userKeyString.startswith("ptx"))
+      ptxVer = true;
+  }
+
+  // Set the default PTX version to `ptx61` if none was provided.
+  // TODO: set the default PTX version based on the chip.
+  if (!ptxVer)
+    implicitFeaturesMap["ptx61"] = true;
+
+  // Set the compute capability.
+  implicitFeaturesMap[cpu.str()] = true;
+
+  llvm::SmallVector<std::string> featuresVec;
+  for (auto &implicitFeatureItem : implicitFeaturesMap) {
+    featuresVec.push_back((llvm::Twine(implicitFeatureItem.second ? "+" : "-") +
+                           implicitFeatureItem.first().str())
+                              .str());
+  }
+  llvm::sort(featuresVec);
+  return llvm::join(featuresVec, ",");
+}
+
 // Produces the string which represents target feature
 static std::string getTargetFeatures(CompilerInstance &ci) {
   const TargetOptions &targetOpts = ci.getInvocation().getTargetOpts();
@@ -188,6 +227,8 @@ static std::string getTargetFeatures(CompilerInstance &ci) {
   // them to the target features specified by the user
   if (triple.isAMDGPU()) {
     return getExplicitAndImplicitAMDGPUTargetFeatures(ci, targetOpts, triple);
+  } else if (triple.isNVPTX()) {
+    return getExplicitAndImplicitNVPTXTargetFeatures(ci, targetOpts, triple);
   }
   return llvm::join(targetOpts.featuresAsWritten.begin(),
                     targetOpts.featuresAsWritten.end(), ",");
diff --git a/flang/lib/Optimizer/CodeGen/Target.cpp b/flang/lib/Optimizer/CodeGen/Target.cpp
index 83e7fa9b440bed2..bb893277cb4d21d 100644
--- a/flang/lib/Optimizer/CodeGen/Target.cpp
+++ b/flang/lib/Optimizer/CodeGen/Target.cpp
@@ -621,6 +621,33 @@ struct TargetAMDGPU : public GenericTarget<TargetAMDGPU> {
 };
 } // namespace
 
+//===----------------------------------------------------------------------===//
+// NVPTX linux target specifics.
+//===----------------------------------------------------------------------===//
+
+namespace {
+struct TargetNVPTX : public GenericTarget<TargetNVPTX> {
+  using GenericTarget::GenericTarget;
+
+  // Default size (in bits) of the index type for strings.
+  static constexpr int defaultWidth = 64;
+
+  CodeGenSpecifics::Marshalling
+  complexArgumentType(mlir::Location loc, mlir::Type eleTy) const override {
+    CodeGenSpecifics::Marshalling marshal;
+    TODO(loc, "handle complex argument types");
+    return marshal;
+  }
+
+  CodeGenSpecifics::Marshalling
+  complexReturnType(mlir::Location loc, mlir::Type eleTy) const override {
+    CodeGenSpecifics::Marshalling marshal;
+    TODO(loc, "handle complex return types");
+    return marshal;
+  }
+};
+} // namespace
+
 //===----------------------------------------------------------------------===//
 // LoongArch64 linux target specifics.
 //===----------------------------------------------------------------------===//
@@ -708,6 +735,9 @@ fir::CodeGenSpecifics::get(mlir::MLIRContext *ctx, llvm::Triple &&trp,
   case llvm::Triple::ArchType::amdgcn:
     return std::make_unique<TargetAMDGPU>(ctx, std::move(trp),
                                           std::move(kindMap));
+  case llvm::Triple::ArchType::nvptx64:
+    return std::make_unique<TargetNVPTX>(ctx, std::move(trp),
+                                         std::move(kindMap));
   case llvm::Triple::ArchType::loongarch64:
     return std::make_unique<TargetLoongArch64>(ctx, std::move(trp),
                                                std::move(kindMap));
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index bfdc3f6f4d4726b..ad50723b0e3a795 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -67,6 +67,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-threads-oversubscription \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-threads-oversubscription \
+! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
 ! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -74,6 +79,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-teams-oversubscription  \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-teams-oversubscription  \
+! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
 ! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -81,6 +91,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-no-nested-parallelism  \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-no-nested-parallelism  \
+! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR
 ! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -88,6 +103,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-no-thread-state \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-no-thread-state \
+! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE
 ! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -95,6 +115,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-target-debug \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-target-debug \
+! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
 ! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -102,6 +127,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-target-debug \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-target-debug \
+! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
 ! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90"
 
 ! RUN: %flang -S -### %s -o %t 2>&1 \
@@ -111,6 +141,13 @@
 ! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \
 ! RUN: -fopenmp-assume-no-thread-state \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL
+! RUN: %flang -S -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-target-debug -fopenmp-assume-threads-oversubscription \
+! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \
+! RUN: -fopenmp-assume-no-thread-state \
+! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL
 ! CHECK-RTL-ALL: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription"
 ! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism"
 ! CHECK-RTL-ALL: {{.*}}.f90"
@@ -120,6 +157,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-version=45 \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-version=45 \
+! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION
 ! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90"
 
 ! Test diagnostic error when host IR file is non-existent 
diff --git a/flang/test/Fir/target-rewrite-boxchar.fir b/flang/test/Fir/target-rewrite-boxchar.fir
index e66fa6041630380..b87cb35b46eb6c4 100644
--- a/flang/test/Fir/target-rewrite-boxchar.fir
+++ b/flang/test/Fir/target-rewrite-boxchar.fir
@@ -3,6 +3,7 @@
 // RUN: fir-opt --target-rewrite="target=aarch64-unknown-linux-gnu" %s | FileCheck %s --check-prefix=INT64
 // RUN: fir-opt --target-rewrite="target=powerpc64le-unknown-linux-gnu" %s | FileCheck %s --check-prefix=INT64
 // RUN: fir-opt --target-rewrite="target=amdgcn-amd-amdhsa" %s | FileCheck %s --check-prefix=INT64
+// RUN: fir-opt --target-rewrite="target=nvptx64-nvidia-cuda" %s | FileCheck %s --check-prefix=INT64
 // RUN: fir-opt --target-rewrite="target=loongarch64-unknown-linux-gnu" %s | FileCheck %s --check-prefix=INT64
 
 // Test that we rewrite the signatures and bodies of functions that take boxchar
diff --git a/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90 b/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90
index b702fc2c5a7e253..ac8d24974801570 100644
--- a/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90
+++ b/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90
@@ -1,9 +1,11 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 
 !RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-fir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -triple nvptx64-nvidia-cuda -emit-fir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
 !RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-fir -o - %s | FileCheck %s
 
 !RUN: not %flang_fc1 -triple amdgcn-amd-amdhsa -emit-fir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
+!RUN: not %flang_fc1 -triple nvptx64-nvidia-cuda -emit-fir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
 !RUN: not bbc -fopenmp -fopenmp-is-gpu -emit-fir %s -o - 2>&1 | FileCheck %s --check-prefix=BBC-ERROR
 
 !CHECK: module attributes {{{.*}}omp.is_gpu = true
diff --git a/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90 b/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90
index c6159342c023aa4..179b71b3f0cfa5c 100644
--- a/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90
+++ b/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90
@@ -1,5 +1,7 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 !RUN: %flang_fc1 -emit-fir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -emit-hlfir -triple nvptx64-nvidia-cuda -target-cpu sm_80 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck --check-prefix=NVPTX %s
+
 
 !===============================================================================
 ! Target_Enter Simple
@@ -10,6 +12,7 @@
 !CHECK-SAME: +dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,
 !CHECK-SAME: +gfx8-insts,+gfx9-insts,+gws,+image-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,
 !CHECK-SAME: +wavefrontsize64">
+!NVPTX: omp.target = #omp.target<target_cpu = "sm_80", target_features = "+ptx61,+sm_80">
 !CHECK-LABEL: func.func @_QPomp_target_simple()
 subroutine omp_target_simple
   ! Directive needed to prevent subroutine from being filtered out when
diff --git a/flang/test/Lower/OpenMP/omp-is-gpu.f90 b/flang/test/Lower/OpenMP/omp-is-gpu.f90
index 12d0e4e869fba5a..3e6daeb522d7789 100644
--- a/flang/test/Lower/OpenMP/omp-is-gpu.f90
+++ b/flang/test/Lower/OpenMP/omp-is-gpu.f90
@@ -1,9 +1,11 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 
 !RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -triple nvptx64-nvidia-cuda -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
 !RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -o - %s | FileCheck %s
 
 !RUN: not %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
+!RUN: not %flang_fc1 -triple nvptx64-nvidia-cuda -emit-hlfir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
 !RUN: not bbc -fopenmp -fopenmp-is-gpu -emit-hlfir %s -o - 2>&1 | FileCheck %s --check-prefix=BBC-ERROR
 
 !CHECK: module attributes {{{.*}}omp.is_gpu = true
diff --git a/flang/test/Lower/OpenMP/target_cpu_features.f90 b/flang/test/Lower/OpenMP/target_cpu_features.f90
index 46fb14efad5c03c..ea1e5e38fca88ef 100644
--- a/flang/test/Lower/OpenMP/target_cpu_features.f90
+++ b/flang/test/Lower/OpenMP/target_cpu_features.f90
@@ -1,5 +1,6 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -emit-hlfir -triple nvptx64-nvidia-cuda -target-cpu sm_80 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck --check-prefix=NVPTX %s
 
 !===============================================================================
 ! Target_Enter Simple
@@ -10,6 +11,7 @@
 !CHECK-SAME: +dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,
 !CHECK-SAME: +gfx8-insts,+gfx9-insts,+gws,+image-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,
 !CHECK-SAME: +wavefrontsize64">
+!NVPTX: omp.target = #omp.target<target_cpu = "sm_80", target_features = "+ptx61,+sm_80">
 !CHECK-LABEL: func.func @_QPomp_target_simple()
 subroutine omp_target_simple
   ! Directive needed to prevent subroutine from being filtered out when
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90
index 58f5379e330ec03..993b91d4eb623e9 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of arrays with provided lower 
 ! and upper bounds as specified by OpenMP's sectioning
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90
index e3df7983e6b5c18..669d3674926f696 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of a regular array explicitly
 ! passed within a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90
index abc2763d4a30cca..c87d6ee24aed3ef 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of a regular array explicitly
 ! passed within a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90
index d3c799ff3334f4d..9b10e4c7650d05a 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of a regular array explicitly
 ! passed within a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90 b/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90
index 295452b0698a660..6423ac765670d48 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90
@@ -1,6 +1,5 @@
 ! Basic offloading test with a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90 b/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90
index f5e3ae00653a9ab..d2e59d93a0209ec 100644
--- a/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90
+++ b/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90
@@ -1,8 +1,7 @@
 ! Offloading test with a target region mapping a declare target
 ! Fortran array writing some values to it and checking the host
 ! correctly receives the updates made on the device.
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90 b/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90
index b4c793ca06cf798..884acb275a0eb47 100644
--- a/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90
+++ b/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90
@@ -2,8 +2,7 @@
 ! declare target For...
[truncated]

fabianmcg · 2023-11-16T12:20:00Z

Ping for review.

kiranchandramohan

Thanks for this contribution. This LGTM.

Have you tested the offloading tests on an Nvidia GPU?

kiranchandramohan · 2023-11-16T12:42:44Z

flang/lib/Frontend/FrontendActions.cpp

@@ -175,6 +175,45 @@ getExplicitAndImplicitAMDGPUTargetFeatures(CompilerInstance &ci,
  return llvm::join(featuresVec, ",");
 }

+// Get feature string which represents combined explicit target features
+// for NVPTX and the target features specified by the user/
+// TODO: Have a more robust target conf like `clang/lib/Basic/Targets/NVPTX.cpp`


If this code can be refactored and placed in llvm/lib/Frontend/Driver then both clang and flang can share it.

fabianmcg · 2023-11-16

Sure, it should be doable; AMDGPU also needs a similar refactor. However, I'm thinking about (and waiting on) adding them to llvm/offload, as MLIR would also benefit from these refactors.

fabianmcg · 2023-11-16T12:59:43Z

> Have you tested the offloading tests on an Nvidia GPU?

Yes, on an NVIDIA V100.

TIFitis

Looks good!

[flang][NVPTX] Add initial support to the NVPTX target

dd7ada8

This patch adds initial support to the NVPTX target, enabling Flang to produce OpenMP offload code for NVPTX targets.

llvmbot added the flang:driver, flang, flang:fir-hlfir, flang:openmp, flang:codegen, and openmp:libomptarget labels Nov 10, 2023

fabianmcg requested review from clementval, kiranchandramohan, jdoerfert and jeanPerier November 10, 2023 22:28

AndiH mentioned this pull request Nov 13, 2023

OpenMP on NVIDIA in Flang AndiH/gpu-lang-compat#3

Open

kiranchandramohan requested review from skatrak, jsjodin, agozillon, TIFitis and banach-space November 16, 2023 12:34

kiranchandramohan approved these changes Nov 16, 2023

kiranchandramohan reviewed Nov 16, 2023

TIFitis approved these changes Nov 16, 2023

fabianmcg merged commit be9fa9d into llvm:main Nov 16, 2023
9 checks passed

fabianmcg deleted the flang-nvptx branch November 16, 2023 16:43

sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023

[flang][NVPTX] Add initial support to the NVPTX target (llvm#71992)

7665591

This patch adds initial support to the NVPTX target, enabling `flang` to produce OpenMP offload code for NVPTX targets.

zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023

[flang][NVPTX] Add initial support to the NVPTX target (llvm#71992)

064fafb

This patch adds initial support to the NVPTX target, enabling `flang` to produce OpenMP offload code for NVPTX targets.
