Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][NVPTX] Add initial support to the NVPTX target #71992

Merged
merged 1 commit into from Nov 16, 2023

Conversation

fabianmcg
Copy link
Contributor

This patch adds initial support to the NVPTX target, enabling flang to produce OpenMP offload code for NVPTX targets.

This patch adds initial support to the NVPTX target, enabling Flang to produce
OpenMP offload code for NVPTX targets.
@llvmbot llvmbot added flang:driver flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp flang:codegen openmp:libomptarget OpenMP offload runtime labels Nov 10, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 10, 2023

@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-codegen

Author: Fabian Mora (fabianmcg)

Changes

This patch adds initial support to the NVPTX target, enabling flang to produce OpenMP offload code for NVPTX targets.


Patch is 20.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71992.diff

15 Files Affected:

  • (modified) flang/lib/Frontend/FrontendActions.cpp (+41)
  • (modified) flang/lib/Optimizer/CodeGen/Target.cpp (+30)
  • (modified) flang/test/Driver/omp-driver-offload.f90 (+42)
  • (modified) flang/test/Fir/target-rewrite-boxchar.fir (+1)
  • (modified) flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90 (+3-1)
  • (modified) flang/test/Lower/OpenMP/FIR/target_cpu_features.f90 (+4-1)
  • (modified) flang/test/Lower/OpenMP/omp-is-gpu.f90 (+3-1)
  • (modified) flang/test/Lower/OpenMP/target_cpu_features.f90 (+3-1)
  • (modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90 (+1-2)
  • (modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90 (+1-2)
  • (modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90 (+1-2)
  • (modified) openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90 (+1-2)
  • (modified) openmp/libomptarget/test/offloading/fortran/basic_target_region.f90 (+1-2)
  • (modified) openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90 (+1-2)
  • (modified) openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90 (+1-2)
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 73c00c8679c7ec6..18e312469f3ee63 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -175,6 +175,45 @@ getExplicitAndImplicitAMDGPUTargetFeatures(CompilerInstance &ci,
   return llvm::join(featuresVec, ",");
 }
 
+// Get feature string which represents combined explicit target features
+// for NVPTX and the target features specified by the user/
+// TODO: Have a more robust target conf like `clang/lib/Basic/Targets/NVPTX.cpp`
+static std::string
+getExplicitAndImplicitNVPTXTargetFeatures(CompilerInstance &ci,
+                                          const TargetOptions &targetOpts,
+                                          const llvm::Triple triple) {
+  llvm::StringRef cpu = targetOpts.cpu;
+  llvm::StringMap<bool> implicitFeaturesMap;
+  std::string errorMsg;
+  bool ptxVer = false;
+
+  // Add target features specified by the user
+  for (auto &userFeature : targetOpts.featuresAsWritten) {
+    llvm::StringRef userKeyString(llvm::StringRef(userFeature).drop_front(1));
+    implicitFeaturesMap[userKeyString.str()] = (userFeature[0] == '+');
+    // Check if the user provided a PTX version
+    if (userKeyString.startswith("ptx"))
+      ptxVer = true;
+  }
+
+  // Set the default PTX version to `ptx61` if none was provided.
+  // TODO: set the default PTX version based on the chip.
+  if (!ptxVer)
+    implicitFeaturesMap["ptx61"] = true;
+
+  // Set the compute capability.
+  implicitFeaturesMap[cpu.str()] = true;
+
+  llvm::SmallVector<std::string> featuresVec;
+  for (auto &implicitFeatureItem : implicitFeaturesMap) {
+    featuresVec.push_back((llvm::Twine(implicitFeatureItem.second ? "+" : "-") +
+                           implicitFeatureItem.first().str())
+                              .str());
+  }
+  llvm::sort(featuresVec);
+  return llvm::join(featuresVec, ",");
+}
+
 // Produces the string which represents target feature
 static std::string getTargetFeatures(CompilerInstance &ci) {
   const TargetOptions &targetOpts = ci.getInvocation().getTargetOpts();
@@ -188,6 +227,8 @@ static std::string getTargetFeatures(CompilerInstance &ci) {
   // them to the target features specified by the user
   if (triple.isAMDGPU()) {
     return getExplicitAndImplicitAMDGPUTargetFeatures(ci, targetOpts, triple);
+  } else if (triple.isNVPTX()) {
+    return getExplicitAndImplicitNVPTXTargetFeatures(ci, targetOpts, triple);
   }
   return llvm::join(targetOpts.featuresAsWritten.begin(),
                     targetOpts.featuresAsWritten.end(), ",");
diff --git a/flang/lib/Optimizer/CodeGen/Target.cpp b/flang/lib/Optimizer/CodeGen/Target.cpp
index 83e7fa9b440bed2..bb893277cb4d21d 100644
--- a/flang/lib/Optimizer/CodeGen/Target.cpp
+++ b/flang/lib/Optimizer/CodeGen/Target.cpp
@@ -621,6 +621,33 @@ struct TargetAMDGPU : public GenericTarget<TargetAMDGPU> {
 };
 } // namespace
 
+//===----------------------------------------------------------------------===//
+// NVPTX linux target specifics.
+//===----------------------------------------------------------------------===//
+
+namespace {
+struct TargetNVPTX : public GenericTarget<TargetNVPTX> {
+  using GenericTarget::GenericTarget;
+
+  // Default size (in bits) of the index type for strings.
+  static constexpr int defaultWidth = 64;
+
+  CodeGenSpecifics::Marshalling
+  complexArgumentType(mlir::Location loc, mlir::Type eleTy) const override {
+    CodeGenSpecifics::Marshalling marshal;
+    TODO(loc, "handle complex argument types");
+    return marshal;
+  }
+
+  CodeGenSpecifics::Marshalling
+  complexReturnType(mlir::Location loc, mlir::Type eleTy) const override {
+    CodeGenSpecifics::Marshalling marshal;
+    TODO(loc, "handle complex return types");
+    return marshal;
+  }
+};
+} // namespace
+
 //===----------------------------------------------------------------------===//
 // LoongArch64 linux target specifics.
 //===----------------------------------------------------------------------===//
@@ -708,6 +735,9 @@ fir::CodeGenSpecifics::get(mlir::MLIRContext *ctx, llvm::Triple &&trp,
   case llvm::Triple::ArchType::amdgcn:
     return std::make_unique<TargetAMDGPU>(ctx, std::move(trp),
                                           std::move(kindMap));
+  case llvm::Triple::ArchType::nvptx64:
+    return std::make_unique<TargetNVPTX>(ctx, std::move(trp),
+                                         std::move(kindMap));
   case llvm::Triple::ArchType::loongarch64:
     return std::make_unique<TargetLoongArch64>(ctx, std::move(trp),
                                                std::move(kindMap));
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index bfdc3f6f4d4726b..ad50723b0e3a795 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -67,6 +67,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-threads-oversubscription \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-threads-oversubscription \
+! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
 ! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -74,6 +79,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-teams-oversubscription  \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-teams-oversubscription  \
+! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
 ! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -81,6 +91,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-no-nested-parallelism  \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-no-nested-parallelism  \
+! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR
 ! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -88,6 +103,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-assume-no-thread-state \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-assume-no-thread-state \
+! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE
 ! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -95,6 +115,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-target-debug \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-target-debug \
+! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
 ! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90"
 
 ! RUN: %flang -### %s -o %t 2>&1 \
@@ -102,6 +127,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-target-debug \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-target-debug \
+! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
 ! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90"
 
 ! RUN: %flang -S -### %s -o %t 2>&1 \
@@ -111,6 +141,13 @@
 ! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \
 ! RUN: -fopenmp-assume-no-thread-state \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL
+! RUN: %flang -S -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-target-debug -fopenmp-assume-threads-oversubscription \
+! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \
+! RUN: -fopenmp-assume-no-thread-state \
+! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL
 ! CHECK-RTL-ALL: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" "-fopenmp-assume-teams-oversubscription"
 ! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism"
 ! CHECK-RTL-ALL: {{.*}}.f90"
@@ -120,6 +157,11 @@
 ! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
 ! RUN: -fopenmp-version=45 \
 ! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=sm_70 \
+! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
+! RUN: -fopenmp-version=45 \
+! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION
 ! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90"
 
 ! Test diagnostic error when host IR file is non-existent 
diff --git a/flang/test/Fir/target-rewrite-boxchar.fir b/flang/test/Fir/target-rewrite-boxchar.fir
index e66fa6041630380..b87cb35b46eb6c4 100644
--- a/flang/test/Fir/target-rewrite-boxchar.fir
+++ b/flang/test/Fir/target-rewrite-boxchar.fir
@@ -3,6 +3,7 @@
 // RUN: fir-opt --target-rewrite="target=aarch64-unknown-linux-gnu" %s | FileCheck %s --check-prefix=INT64
 // RUN: fir-opt --target-rewrite="target=powerpc64le-unknown-linux-gnu" %s | FileCheck %s --check-prefix=INT64
 // RUN: fir-opt --target-rewrite="target=amdgcn-amd-amdhsa" %s | FileCheck %s --check-prefix=INT64
+// RUN: fir-opt --target-rewrite="target=nvptx64-nvidia-cuda" %s | FileCheck %s --check-prefix=INT64
 // RUN: fir-opt --target-rewrite="target=loongarch64-unknown-linux-gnu" %s | FileCheck %s --check-prefix=INT64
 
 // Test that we rewrite the signatures and bodies of functions that take boxchar
diff --git a/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90 b/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90
index b702fc2c5a7e253..ac8d24974801570 100644
--- a/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90
+++ b/flang/test/Lower/OpenMP/FIR/omp-is-gpu.f90
@@ -1,9 +1,11 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 
 !RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-fir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -triple nvptx64-nvidia-cuda -emit-fir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
 !RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-fir -o - %s | FileCheck %s
 
 !RUN: not %flang_fc1 -triple amdgcn-amd-amdhsa -emit-fir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
+!RUN: not %flang_fc1 -triple nvptx64-nvidia-cuda -emit-fir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
 !RUN: not bbc -fopenmp -fopenmp-is-gpu -emit-fir %s -o - 2>&1 | FileCheck %s --check-prefix=BBC-ERROR
 
 !CHECK: module attributes {{{.*}}omp.is_gpu = true
diff --git a/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90 b/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90
index c6159342c023aa4..179b71b3f0cfa5c 100644
--- a/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90
+++ b/flang/test/Lower/OpenMP/FIR/target_cpu_features.f90
@@ -1,5 +1,7 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 !RUN: %flang_fc1 -emit-fir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -emit-hlfir -triple nvptx64-nvidia-cuda -target-cpu sm_80 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck --check-prefix=NVPTX %s
+
 
 !===============================================================================
 ! Target_Enter Simple
@@ -10,6 +12,7 @@
 !CHECK-SAME: +dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,
 !CHECK-SAME: +gfx8-insts,+gfx9-insts,+gws,+image-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,
 !CHECK-SAME: +wavefrontsize64">
+!NVPTX: omp.target = #omp.target<target_cpu = "sm_80", target_features = "+ptx61,+sm_80">
 !CHECK-LABEL: func.func @_QPomp_target_simple()
 subroutine omp_target_simple
   ! Directive needed to prevent subroutine from being filtered out when
diff --git a/flang/test/Lower/OpenMP/omp-is-gpu.f90 b/flang/test/Lower/OpenMP/omp-is-gpu.f90
index 12d0e4e869fba5a..3e6daeb522d7789 100644
--- a/flang/test/Lower/OpenMP/omp-is-gpu.f90
+++ b/flang/test/Lower/OpenMP/omp-is-gpu.f90
@@ -1,9 +1,11 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 
 !RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -triple nvptx64-nvidia-cuda -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
 !RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -o - %s | FileCheck %s
 
 !RUN: not %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
+!RUN: not %flang_fc1 -triple nvptx64-nvidia-cuda -emit-hlfir -fopenmp %s -o - 2>&1 | FileCheck %s --check-prefix=FLANG-ERROR
 !RUN: not bbc -fopenmp -fopenmp-is-gpu -emit-hlfir %s -o - 2>&1 | FileCheck %s --check-prefix=BBC-ERROR
 
 !CHECK: module attributes {{{.*}}omp.is_gpu = true
diff --git a/flang/test/Lower/OpenMP/target_cpu_features.f90 b/flang/test/Lower/OpenMP/target_cpu_features.f90
index 46fb14efad5c03c..ea1e5e38fca88ef 100644
--- a/flang/test/Lower/OpenMP/target_cpu_features.f90
+++ b/flang/test/Lower/OpenMP/target_cpu_features.f90
@@ -1,5 +1,6 @@
-!REQUIRES: amdgpu-registered-target
+!REQUIRES: amdgpu-registered-target, nvptx-registered-target
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: %flang_fc1 -emit-hlfir -triple nvptx64-nvidia-cuda -target-cpu sm_80 -fopenmp -fopenmp-is-target-device %s -o - | FileCheck --check-prefix=NVPTX %s
 
 !===============================================================================
 ! Target_Enter Simple
@@ -10,6 +11,7 @@
 !CHECK-SAME: +dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,
 !CHECK-SAME: +gfx8-insts,+gfx9-insts,+gws,+image-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,
 !CHECK-SAME: +wavefrontsize64">
+!NVPTX: omp.target = #omp.target<target_cpu = "sm_80", target_features = "+ptx61,+sm_80">
 !CHECK-LABEL: func.func @_QPomp_target_simple()
 subroutine omp_target_simple
   ! Directive needed to prevent subroutine from being filtered out when
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90
index 58f5379e330ec03..993b91d4eb623e9 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of arrays with provided lower 
 ! and upper bounds as specified by OpenMP's sectioning
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90
index e3df7983e6b5c18..669d3674926f696 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of a regular array explicitly
 ! passed within a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90
index abc2763d4a30cca..c87d6ee24aed3ef 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of a regular array explicitly
 ! passed within a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90 b/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90
index d3c799ff3334f4d..9b10e4c7650d05a 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90
@@ -1,7 +1,6 @@
 ! Basic offloading test of a regular array explicitly
 ! passed within a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90 b/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90
index 295452b0698a660..6423ac765670d48 100644
--- a/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90
+++ b/openmp/libomptarget/test/offloading/fortran/basic_target_region.f90
@@ -1,6 +1,5 @@
 ! Basic offloading test with a target region
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90 b/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90
index f5e3ae00653a9ab..d2e59d93a0209ec 100644
--- a/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90
+++ b/openmp/libomptarget/test/offloading/fortran/declare-target-array-in-target-region.f90
@@ -1,8 +1,7 @@
 ! Offloading test with a target region mapping a declare target
 ! Fortran array writing some values to it and checking the host
 ! correctly receives the updates made on the device.
-! REQUIRES: flang, amdgcn-amd-amdhsa
-! UNSUPPORTED: nvptx64-nvidia-cuda
+! REQUIRES: flang, amdgcn-amd-amdhsa, nvptx64-nvidia-cuda
 ! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
 ! UNSUPPORTED: aarch64-unknown-linux-gnu
 ! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
diff --git a/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90 b/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90
index b4c793ca06cf798..884acb275a0eb47 100644
--- a/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90
+++ b/openmp/libomptarget/test/offloading/fortran/double-target-call-with-declare-target.f90
@@ -2,8 +2,7 @@
 ! declare target For...
[truncated]

@fabianmcg
Copy link
Contributor Author

Ping for review.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution. This LGTM.

Have you tested the offloading tests on an Nvidia GPU?

@@ -175,6 +175,45 @@ getExplicitAndImplicitAMDGPUTargetFeatures(CompilerInstance &ci,
return llvm::join(featuresVec, ",");
}

// Get feature string which represents combined explicit target features
// for NVPTX and the target features specified by the user/
// TODO: Have a more robust target conf like `clang/lib/Basic/Targets/NVPTX.cpp`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code can be refactored and placed in llvm/lib/Frontend/Driver then both clang and flang can share it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it should be doable, AMDGPU also needs a similar refactor, however, I'm thinking (and waiting) in adding them to llvm/offload as MLIR would also benefit from these refactors.

@fabianmcg
Copy link
Contributor Author

Have you tested the offloading tests on an Nvidia GPU?

Yes, on an NVIDIA V100.

Copy link
Member

@TIFitis TIFitis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@fabianmcg fabianmcg merged commit be9fa9d into llvm:main Nov 16, 2023
9 checks passed
@fabianmcg fabianmcg deleted the flang-nvptx branch November 16, 2023 16:43
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
This patch adds initial support to the NVPTX target, enabling `flang` to
produce OpenMP offload code for NVPTX targets.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
This patch adds initial support to the NVPTX target, enabling `flang` to
produce OpenMP offload code for NVPTX targets.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:codegen flang:driver flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants