Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][runtime] Enable more code for offload device builds. #67489

Merged
merged 3 commits into from Sep 27, 2023

Conversation

vzakhari
Copy link
Contributor

@vzakhari vzakhari commented Sep 26, 2023

I extended the "closure" of the device code containing the initial
transformational.cpp. The device side of the library should not be
complete at least for some APIs. For example, I tested with C OpenMP
code calling BesselJnX0 with a nullptr descriptor that failed with
a runtime error when executing on a GPU.

I added --expt-relaxed-constexpr for NVCC compiler to avoid multiple
warnings about missing __attribute__((device)) on constexpr methods
coming from C++ header files.

I extended the "closure" of the device code containing the initial
transformational.cpp. The device side of the library should not be
complete at least for some APIs. For example, I tested with C OpenMP
code calling BesselJnX0 with a nullptr descriptor that failed with
a runtime error when executing on a GPU.

I added `--expt-relaxed-constexpr` for NVCC compiler to avoid multiple
warnings about missing __attribute__((device)) on constexpr methods
coming from C++ header files.
@llvmbot llvmbot added flang:runtime flang Flang issues not falling into any other category labels Sep 26, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 26, 2023

@llvm/pr-subscribers-flang-runtime

Changes

I extended the "closure" of the device code containing the initial
transformational.cpp. The device side of the library should not be
complete at least for some APIs. For example, I tested with C OpenMP
code calling BesselJnX0 with a nullptr descriptor that failed with
a runtime error when executing on a GPU.

I added --expt-relaxed-constexpr for NVCC compiler to avoid multiple
warnings about missing __attribute__((device)) on constexpr methods
coming from C++ header files.


Patch is 36.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67489.diff

12 Files Affected:

  • (modified) flang/include/flang/Runtime/api-attrs.h (+23)
  • (modified) flang/include/flang/Runtime/descriptor.h (+4-4)
  • (modified) flang/include/flang/Runtime/memory.h (+99-6)
  • (modified) flang/include/flang/Runtime/type-code.h (+9-9)
  • (modified) flang/runtime/CMakeLists.txt (+8)
  • (modified) flang/runtime/ISO_Fortran_util.h (+5-5)
  • (modified) flang/runtime/derived.h (+6-4)
  • (modified) flang/runtime/descriptor.cpp (+29-24)
  • (modified) flang/runtime/terminator.cpp (+60-17)
  • (modified) flang/runtime/terminator.h (+58-13)
  • (modified) flang/runtime/type-code.cpp (+7-2)
  • (modified) flang/runtime/type-info.h (+42-42)
diff --git a/flang/include/flang/Runtime/api-attrs.h b/flang/include/flang/Runtime/api-attrs.h
index a866625a7b95ba4..0768682cadbdcbb 100644
--- a/flang/include/flang/Runtime/api-attrs.h
+++ b/flang/include/flang/Runtime/api-attrs.h
@@ -42,6 +42,18 @@
 #endif
 #endif /* !defined(RT_EXT_API_GROUP_END) */
 
+/*
+ * RT_OFFLOAD_API_GROUP_BEGIN/END pair is placed around definitions
+ * of functions that can be referenced in other modules of Flang
+ * runtime. For OpenMP offload these functions are made "declare target"
+ * making sure they are compiled for the target even though direct
+ * references to them from other "declare target" functions may not
+ * be seen. Host-only functions should not be put in between these
+ * two macros.
+ */
+#define RT_OFFLOAD_API_GROUP_BEGIN RT_EXT_API_GROUP_BEGIN
+#define RT_OFFLOAD_API_GROUP_END RT_EXT_API_GROUP_END
+
 /*
  * RT_VAR_GROUP_BEGIN/END pair is placed around definitions
  * of module scope variables referenced by Flang runtime (directly
@@ -88,4 +100,15 @@
 #endif
 #endif /* !defined(RT_CONST_VAR_ATTRS) */
 
+/*
+ * RT_DEVICE_COMPILATION is defined for any device compilation.
+ * Note that it can only be used reliably with compilers that perform
+ * separate host and device compilations.
+ */
+#if ((defined(__CUDACC__) || defined(__CUDA__)) && defined(__CUDA_ARCH__)) || (defined(_OPENMP) && (defined(__AMDGCN__) || defined(__NVPTX__)))
+#define RT_DEVICE_COMPILATION 1
+#else
+#undef RT_DEVICE_COMPILATION
+#endif
+
 #endif /* !FORTRAN_RUNTIME_API_ATTRS_H_ */
diff --git a/flang/include/flang/Runtime/descriptor.h b/flang/include/flang/Runtime/descriptor.h
index 62a8d123bf2ee06..09077ec849acee0 100644
--- a/flang/include/flang/Runtime/descriptor.h
+++ b/flang/include/flang/Runtime/descriptor.h
@@ -181,19 +181,19 @@ class Descriptor {
       ISO::CFI_attribute_t attribute = CFI_attribute_other);
 
   // CUDA_TODO: Clang does not support unique_ptr on device.
-  static OwningPtr<Descriptor> Create(TypeCode t, std::size_t elementBytes,
+  static RT_API_ATTRS OwningPtr<Descriptor> Create(TypeCode t, std::size_t elementBytes,
       void *p = nullptr, int rank = maxRank,
       const SubscriptValue *extent = nullptr,
       ISO::CFI_attribute_t attribute = CFI_attribute_other,
       int derivedTypeLenParameters = 0);
-  static OwningPtr<Descriptor> Create(TypeCategory, int kind, void *p = nullptr,
+  static RT_API_ATTRS OwningPtr<Descriptor> Create(TypeCategory, int kind, void *p = nullptr,
       int rank = maxRank, const SubscriptValue *extent = nullptr,
       ISO::CFI_attribute_t attribute = CFI_attribute_other);
-  static OwningPtr<Descriptor> Create(int characterKind,
+  static RT_API_ATTRS OwningPtr<Descriptor> Create(int characterKind,
       SubscriptValue characters, void *p = nullptr, int rank = maxRank,
       const SubscriptValue *extent = nullptr,
       ISO::CFI_attribute_t attribute = CFI_attribute_other);
-  static OwningPtr<Descriptor> Create(const typeInfo::DerivedType &dt,
+  static RT_API_ATTRS OwningPtr<Descriptor> Create(const typeInfo::DerivedType &dt,
       void *p = nullptr, int rank = maxRank,
       const SubscriptValue *extent = nullptr,
       ISO::CFI_attribute_t attribute = CFI_attribute_other);
diff --git a/flang/include/flang/Runtime/memory.h b/flang/include/flang/Runtime/memory.h
index 0afe5250169d0b4..579ba78a1c93b20 100644
--- a/flang/include/flang/Runtime/memory.h
+++ b/flang/include/flang/Runtime/memory.h
@@ -12,19 +12,22 @@
 #ifndef FORTRAN_RUNTIME_MEMORY_H_
 #define FORTRAN_RUNTIME_MEMORY_H_
 
+#include "flang/Runtime/api-attrs.h"
+#include <cassert>
 #include <memory>
+#include <type_traits>
 
 namespace Fortran::runtime {
 
 class Terminator;
 
-[[nodiscard]] void *AllocateMemoryOrCrash(
+[[nodiscard]] RT_API_ATTRS void *AllocateMemoryOrCrash(
     const Terminator &, std::size_t bytes);
 template <typename A> [[nodiscard]] A &AllocateOrCrash(const Terminator &t) {
   return *reinterpret_cast<A *>(AllocateMemoryOrCrash(t, sizeof(A)));
 }
-void FreeMemory(void *);
-template <typename A> void FreeMemory(A *p) {
+RT_API_ATTRS void FreeMemory(void *);
+template <typename A> RT_API_ATTRS void FreeMemory(A *p) {
   FreeMemory(reinterpret_cast<void *>(p));
 }
 template <typename A> void FreeMemoryAndNullify(A *&p) {
@@ -32,11 +35,101 @@ template <typename A> void FreeMemoryAndNullify(A *&p) {
   p = nullptr;
 }
 
-template <typename A> struct OwningPtrDeleter {
-  void operator()(A *p) { FreeMemory(p); }
+// Very basic implementation mimicking std::unique_ptr.
+// It should work for any offload device compiler.
+// It uses a fixed memory deleter based on FreeMemory(),
+// and does not support array objects with runtime length.
+template <typename A>
+class OwningPtr {
+public:
+  using pointer_type = A *;
+
+  OwningPtr() = default;
+  RT_API_ATTRS explicit OwningPtr(pointer_type p) : ptr_(p) {}
+  RT_API_ATTRS OwningPtr(const OwningPtr &) = delete;
+  RT_API_ATTRS OwningPtr& operator=(const OwningPtr &) = delete;
+  RT_API_ATTRS OwningPtr(OwningPtr &&other) {
+    ptr_ = other.ptr_;
+    other.ptr_ = pointer_type();
+  }
+  RT_API_ATTRS OwningPtr &operator=(OwningPtr &&other) {
+    if (this != &other) {
+      delete_ptr(ptr_);
+      ptr_ = other.ptr_;
+      other.ptr_ = pointer_type();
+    }
+    return *this;
+  }
+  constexpr RT_API_ATTRS OwningPtr(std::nullptr_t) : OwningPtr() { }
+
+  // Delete the pointer, if owns one.
+  RT_API_ATTRS ~OwningPtr() {
+    if (ptr_ != pointer_type()) {
+      delete_ptr(ptr_);
+      ptr_ = pointer_type();
+    }
+  }
+
+  // Release the ownership.
+  RT_API_ATTRS pointer_type release() {
+    pointer_type p = ptr_;
+    ptr_ = pointer_type();
+    return p;
+  }
+
+  // Replace the pointer.
+  RT_API_ATTRS void reset(pointer_type p = pointer_type()) {
+    std::swap(ptr_, p);
+    if (p != pointer_type()) {
+      // Delete the owned pointer.
+      delete_ptr(p);
+    }
+  }
+
+  // Exchange the pointer with another object.
+  RT_API_ATTRS void swap(OwningPtr &other) {
+    std::swap(ptr_, other.ptr_);
+  }
+
+  // Get the stored pointer.
+  RT_API_ATTRS pointer_type get() const {
+    return ptr_;
+  }
+
+  RT_API_ATTRS explicit operator bool() const {
+    return get() == pointer_type() ? false : true;
+  }
+
+  RT_API_ATTRS typename std::add_lvalue_reference<A>::type operator*() const {
+    assert(get() != pointer_type());
+    return *get();
+  }
+
+  RT_API_ATTRS pointer_type operator->() const {
+    return get();
+  }
+
+private:
+  RT_API_ATTRS void delete_ptr(pointer_type p) {
+    FreeMemory(p);
+  }
+  pointer_type ptr_{};
 };
 
-template <typename A> using OwningPtr = std::unique_ptr<A, OwningPtrDeleter<A>>;
+template <typename X, typename Y>
+inline RT_API_ATTRS bool operator!=(const OwningPtr<X> &x, const OwningPtr<Y> &y) {
+  return x.get() != y.get();
+}
+
+template <typename X>
+inline RT_API_ATTRS bool operator!=(const OwningPtr<X> &x, std::nullptr_t) {
+  return (bool)x;
+}
+
+template <typename X>
+inline RT_API_ATTRS bool operator!=(std::nullptr_t, const OwningPtr<X> &x) {
+  return (bool)x;
+}
 
 template <typename A> class SizedNew {
 public:
diff --git a/flang/include/flang/Runtime/type-code.h b/flang/include/flang/Runtime/type-code.h
index fb18dba54980f69..172355609e26128 100644
--- a/flang/include/flang/Runtime/type-code.h
+++ b/flang/include/flang/Runtime/type-code.h
@@ -26,29 +26,29 @@ class TypeCode {
 
   RT_API_ATTRS int raw() const { return raw_; }
 
-  constexpr bool IsValid() const {
+  constexpr RT_API_ATTRS bool IsValid() const {
     return raw_ >= CFI_type_signed_char && raw_ <= CFI_TYPE_LAST;
   }
-  constexpr bool IsInteger() const {
+  constexpr RT_API_ATTRS bool IsInteger() const {
     return raw_ >= CFI_type_signed_char && raw_ <= CFI_type_ptrdiff_t;
   }
-  constexpr bool IsReal() const {
+  constexpr RT_API_ATTRS bool IsReal() const {
     return raw_ >= CFI_type_half_float && raw_ <= CFI_type_float128;
   }
-  constexpr bool IsComplex() const {
+  constexpr RT_API_ATTRS bool IsComplex() const {
     return raw_ >= CFI_type_half_float_Complex &&
         raw_ <= CFI_type_float128_Complex;
   }
-  constexpr bool IsCharacter() const {
+  constexpr RT_API_ATTRS bool IsCharacter() const {
     return raw_ == CFI_type_char || raw_ == CFI_type_char16_t ||
         raw_ == CFI_type_char32_t;
   }
-  constexpr bool IsLogical() const {
+  constexpr RT_API_ATTRS bool IsLogical() const {
     return raw_ == CFI_type_Bool ||
         (raw_ >= CFI_type_int_least8_t && raw_ <= CFI_type_int_least64_t);
   }
-  constexpr bool IsDerived() const { return raw_ == CFI_type_struct; }
-  constexpr bool IsIntrinsic() const { return IsValid() && !IsDerived(); }
+  constexpr RT_API_ATTRS bool IsDerived() const { return raw_ == CFI_type_struct; }
+  constexpr RT_API_ATTRS bool IsIntrinsic() const { return IsValid() && !IsDerived(); }
 
   RT_API_ATTRS std::optional<std::pair<TypeCategory, int>>
   GetCategoryAndKind() const;
@@ -65,7 +65,7 @@ class TypeCode {
       return thisCK && thatCK && *thisCK == *thatCK;
     }
   }
-  bool operator!=(TypeCode that) const { return !(*this == that); }
+  RT_API_ATTRS bool operator!=(TypeCode that) const { return !(*this == that); }
 
 private:
   ISO::CFI_type_t raw_{CFI_type_other};
diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt
index 5b23065a32d1699..e7d416749219ef6 100644
--- a/flang/runtime/CMakeLists.txt
+++ b/flang/runtime/CMakeLists.txt
@@ -150,7 +150,10 @@ option(FLANG_EXPERIMENTAL_CUDA_RUNTIME
 
 # List of files that are buildable for all devices.
 set(supported_files
+  descriptor.cpp
+  terminator.cpp
   transformational.cpp
+  type-code.cpp
   )
 
 if (FLANG_EXPERIMENTAL_CUDA_RUNTIME)
@@ -175,6 +178,11 @@ if (FLANG_EXPERIMENTAL_CUDA_RUNTIME)
       -Xclang -fcuda-allow-variadic-functions
       )
   endif()
+  if ("${CMAKE_CUDA_COMPILER_ID}" MATCHES "NVIDIA")
+    set(CUDA_COMPILE_OPTIONS
+      --expt-relaxed-constexpr
+      )
+  endif()
   set_source_files_properties(${supported_files} PROPERTIES COMPILE_OPTIONS
     "${CUDA_COMPILE_OPTIONS}"
     )
diff --git a/flang/runtime/ISO_Fortran_util.h b/flang/runtime/ISO_Fortran_util.h
index 7d527bfd65789d8..d63cda8931f37b7 100644
--- a/flang/runtime/ISO_Fortran_util.h
+++ b/flang/runtime/ISO_Fortran_util.h
@@ -18,15 +18,15 @@
 #include <cstdlib>
 
 namespace Fortran::ISO {
-static inline constexpr bool IsCharacterType(CFI_type_t ty) {
+static inline constexpr RT_API_ATTRS bool IsCharacterType(CFI_type_t ty) {
   return ty == CFI_type_char || ty == CFI_type_char16_t ||
       ty == CFI_type_char32_t;
 }
-static inline constexpr bool IsAssumedSize(const CFI_cdesc_t *dv) {
+static inline constexpr RT_API_ATTRS bool IsAssumedSize(const CFI_cdesc_t *dv) {
   return dv->rank > 0 && dv->dim[dv->rank - 1].extent == -1;
 }
 
-static inline std::size_t MinElemLen(CFI_type_t type) {
+static inline RT_API_ATTRS std::size_t MinElemLen(CFI_type_t type) {
   auto typeParams{Fortran::runtime::TypeCode{type}.GetCategoryAndKind()};
   if (!typeParams) {
     Fortran::runtime::Terminator terminator{__FILE__, __LINE__};
@@ -38,7 +38,7 @@ static inline std::size_t MinElemLen(CFI_type_t type) {
       typeParams->first, typeParams->second);
 }
 
-static inline int VerifyEstablishParameters(CFI_cdesc_t *descriptor,
+static inline RT_API_ATTRS int VerifyEstablishParameters(CFI_cdesc_t *descriptor,
     void *base_addr, CFI_attribute_t attribute, CFI_type_t type,
     std::size_t elem_len, CFI_rank_t rank, const CFI_index_t extents[],
     bool external) {
@@ -77,7 +77,7 @@ static inline int VerifyEstablishParameters(CFI_cdesc_t *descriptor,
   return CFI_SUCCESS;
 }
 
-static inline void EstablishDescriptor(CFI_cdesc_t *descriptor, void *base_addr,
+static inline RT_API_ATTRS void EstablishDescriptor(CFI_cdesc_t *descriptor, void *base_addr,
     CFI_attribute_t attribute, CFI_type_t type, std::size_t elem_len,
     CFI_rank_t rank, const CFI_index_t extents[]) {
   descriptor->base_addr = base_addr;
diff --git a/flang/runtime/derived.h b/flang/runtime/derived.h
index 747a93303e0dbc0..6b9ea907fda9b8b 100644
--- a/flang/runtime/derived.h
+++ b/flang/runtime/derived.h
@@ -11,6 +11,8 @@
 #ifndef FORTRAN_RUNTIME_DERIVED_H_
 #define FORTRAN_RUNTIME_DERIVED_H_
 
+#include "flang/Runtime/api-attrs.h"
+
 namespace Fortran::runtime::typeInfo {
 class DerivedType;
 }
@@ -21,21 +23,21 @@ class Terminator;
 
 // Perform default component initialization, allocate automatic components.
 // Returns a STAT= code (0 when all's well).
-int Initialize(const Descriptor &, const typeInfo::DerivedType &, Terminator &,
+RT_API_ATTRS int Initialize(const Descriptor &, const typeInfo::DerivedType &, Terminator &,
     bool hasStat = false, const Descriptor *errMsg = nullptr);
 
 // Call FINAL subroutines, if any
-void Finalize(
+RT_API_ATTRS void Finalize(
     const Descriptor &, const typeInfo::DerivedType &derived, Terminator *);
 
 // Call FINAL subroutines, deallocate allocatable & automatic components.
 // Does not deallocate the original descriptor.
-void Destroy(const Descriptor &, bool finalize, const typeInfo::DerivedType &,
+RT_API_ATTRS void Destroy(const Descriptor &, bool finalize, const typeInfo::DerivedType &,
     Terminator *);
 
 // Return true if the passed descriptor is for a derived type
 // entity that has a dynamic (allocatable, automatic) component.
-bool HasDynamicComponent(const Descriptor &);
+RT_API_ATTRS bool HasDynamicComponent(const Descriptor &);
 
 } // namespace Fortran::runtime
 #endif // FORTRAN_RUNTIME_DERIVED_H_
diff --git a/flang/runtime/descriptor.cpp b/flang/runtime/descriptor.cpp
index ab6460708e9b68f..043b73255ab6686 100644
--- a/flang/runtime/descriptor.cpp
+++ b/flang/runtime/descriptor.cpp
@@ -20,14 +20,16 @@
 
 namespace Fortran::runtime {
 
-Descriptor::Descriptor(const Descriptor &that) { *this = that; }
+RT_OFFLOAD_API_GROUP_BEGIN
 
-Descriptor &Descriptor::operator=(const Descriptor &that) {
+RT_API_ATTRS Descriptor::Descriptor(const Descriptor &that) { *this = that; }
+
+RT_API_ATTRS Descriptor &Descriptor::operator=(const Descriptor &that) {
   std::memcpy(this, &that, that.SizeInBytes());
   return *this;
 }
 
-void Descriptor::Establish(TypeCode t, std::size_t elementBytes, void *p,
+RT_API_ATTRS void Descriptor::Establish(TypeCode t, std::size_t elementBytes, void *p,
     int rank, const SubscriptValue *extent, ISO::CFI_attribute_t attribute,
     bool addendum) {
   Terminator terminator{__FILE__, __LINE__};
@@ -58,33 +60,33 @@ void Descriptor::Establish(TypeCode t, std::size_t elementBytes, void *p,
 
 namespace {
 template <TypeCategory CAT, int KIND> struct TypeSizeGetter {
-  constexpr std::size_t operator()() const {
+  constexpr RT_API_ATTRS std::size_t operator()() const {
     CppTypeFor<CAT, KIND> arr[2];
     return sizeof arr / 2;
   }
 };
 } // namespace
 
-std::size_t Descriptor::BytesFor(TypeCategory category, int kind) {
+RT_API_ATTRS std::size_t Descriptor::BytesFor(TypeCategory category, int kind) {
   Terminator terminator{__FILE__, __LINE__};
   return ApplyType<TypeSizeGetter, std::size_t>(category, kind, terminator);
 }
 
-void Descriptor::Establish(TypeCategory c, int kind, void *p, int rank,
+RT_API_ATTRS void Descriptor::Establish(TypeCategory c, int kind, void *p, int rank,
     const SubscriptValue *extent, ISO::CFI_attribute_t attribute,
     bool addendum) {
   Establish(TypeCode(c, kind), BytesFor(c, kind), p, rank, extent, attribute,
       addendum);
 }
 
-void Descriptor::Establish(int characterKind, std::size_t characters, void *p,
+RT_API_ATTRS void Descriptor::Establish(int characterKind, std::size_t characters, void *p,
     int rank, const SubscriptValue *extent, ISO::CFI_attribute_t attribute,
     bool addendum) {
   Establish(TypeCode{TypeCategory::Character, characterKind},
       characterKind * characters, p, rank, extent, attribute, addendum);
 }
 
-void Descriptor::Establish(const typeInfo::DerivedType &dt, void *p, int rank,
+RT_API_ATTRS void Descriptor::Establish(const typeInfo::DerivedType &dt, void *p, int rank,
     const SubscriptValue *extent, ISO::CFI_attribute_t attribute) {
   Establish(TypeCode{TypeCategory::Derived, 0}, dt.sizeInBytes(), p, rank,
       extent, attribute, true);
@@ -94,7 +96,7 @@ void Descriptor::Establish(const typeInfo::DerivedType &dt, void *p, int rank,
   new (a) DescriptorAddendum{&dt};
 }
 
-OwningPtr<Descriptor> Descriptor::Create(TypeCode t, std::size_t elementBytes,
+RT_API_ATTRS OwningPtr<Descriptor> Descriptor::Create(TypeCode t, std::size_t elementBytes,
     void *p, int rank, const SubscriptValue *extent,
     ISO::CFI_attribute_t attribute, int derivedTypeLenParameters) {
   std::size_t bytes{SizeInBytes(rank, true, derivedTypeLenParameters)};
@@ -105,33 +107,33 @@ OwningPtr<Descriptor> Descriptor::Create(TypeCode t, std::size_t elementBytes,
   return OwningPtr<Descriptor>{result};
 }
 
-OwningPtr<Descriptor> Descriptor::Create(TypeCategory c, int kind, void *p,
+RT_API_ATTRS OwningPtr<Descriptor> Descriptor::Create(TypeCategory c, int kind, void *p,
     int rank, const SubscriptValue *extent, ISO::CFI_attribute_t attribute) {
   return Create(
       TypeCode(c, kind), BytesFor(c, kind), p, rank, extent, attribute);
 }
 
-OwningPtr<Descriptor> Descriptor::Create(int characterKind,
+RT_API_ATTRS OwningPtr<Descriptor> Descriptor::Create(int characterKind,
     SubscriptValue characters, void *p, int rank, const SubscriptValue *extent,
     ISO::CFI_attribute_t attribute) {
   return Create(TypeCode{TypeCategory::Character, characterKind},
       characterKind * characters, p, rank, extent, attribute);
 }
 
-OwningPtr<Descriptor> Descriptor::Create(const typeInfo::DerivedType &dt,
+RT_API_ATTRS OwningPtr<Descriptor> Descriptor::Create(const typeInfo::DerivedType &dt,
     void *p, int rank, const SubscriptValue *extent,
     ISO::CFI_attribute_t attribute) {
   return Create(TypeCode{TypeCategory::Derived, 0}, dt.sizeInBytes(), p, rank,
       extent, attribute, dt.LenParameters());
 }
 
-std::size_t Descriptor::SizeInBytes() const {
+RT_API_ATTRS std::size_t Descriptor::SizeInBytes() const {
   const DescriptorAddendum *addendum{Addendum()};
   return sizeof *this - sizeof(Dimension) + raw_.rank * sizeof(Dimension) +
       (addendum ? addendum->SizeInBytes() : 0);
 }
 
-std::size_t Descriptor::Elements() const {
+RT_API_ATTRS std::size_t Descriptor::Elements() const {
   int n{rank()};
   std::size_t elements{1};
   for (int j{0}; j < n; ++j) {
@@ -140,7 +142,7 @@ std::size_t Descriptor::Elements() const {
   return elements;
 }
 
-int Descriptor::Allocate() {
+RT_API_ATTRS int Descriptor::Allocate() {
   std::size_t byteSize{Elements() * ElementBytes()};
   // Zero size allocation is possible in Fortran and the resulting
   // descriptor must be allocated/associated. Since std::malloc(0)
@@ -162,7 +164,7 @@ int Descriptor::Allocate() {
   return 0;
 }
 
-int Descriptor::Destroy(
+RT_API_ATTRS int Descriptor::Destroy(
     bool finalize, bool destroyPointers, Terminator *terminator) {
   if (!destroyPointers && raw_.attribute == CFI_attribute_pointer) {
     return StatOk;
@@ -178,9 +180,9 @@ int Descriptor::Destroy(
   }
 }
 
-int Descriptor::Deallocate() { return ISO::CFI_deallocate(&raw_); }
+RT_API_ATTRS int Descriptor::Deallocate() { return ISO::CFI_deallocate(&raw_); }
 
-bool Descriptor::DecrementSubscripts(
+RT_API_ATTRS bool Descriptor::DecrementSubscripts(
     SubscriptValue *subscript, const int *permutation) const {
   for (int j{raw_.rank - 1}; j >= 0; --j) {
     int k{permutation ? permutation[j] : j};
@@ -193,7 +195,7 @@ bool Descriptor::DecrementSubscripts(
   return false;
 }
 
-std::size_t Descriptor::ZeroBasedElementNumber(
+RT_API_ATTRS std::size_t Descriptor::ZeroBasedElementNumber(
     const SubscriptValue *subscript, const int *permutation) const {
   std::size_t result{0};
   std::size_t coefficient{1};
@@ -206,7 +208,7 @@ std::size_t Descriptor::ZeroBasedElementNumber(
   return result;
 }
 
-bool Descriptor::EstablishPointerSection(const Descriptor &source,
+RT_API_ATTRS bool Descriptor::EstablishPointerSection(const Descriptor &source,
     const SubscriptValue *lower, const SubscriptValue *upper,
     const SubscriptValue *stride) {
   *this = source;
@@ -232,7 +234,7 @@ bool Descriptor::EstablishPointerSection(const Descriptor &source,
   return CFI_section(&raw_, &source.raw_, lower, upper, stride) == CFI_SUCCESS;
 }
 
-void Descriptor::Check() const {
+RT_API_ATTRS void Descriptor::Check()...
[truncated]

}

RT_API_ATTRS explicit operator bool() const {
return get() == pointer_type() ? false : true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return get() != pointer_type{}; should suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix it.

RT_API_ATTRS OwningPtr& operator=(const OwningPtr &) = delete;
RT_API_ATTRS OwningPtr(OwningPtr &&other) {
ptr_ = other.ptr_;
other.ptr_ = pointer_type();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and throughout: pointer_type{} is more current C++ usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry about the style. Will fix it.


RT_API_ATTRS void Terminator::CrashHeader() const {
#if defined(RT_DEVICE_COMPILATION)
std::printf("\nfatal Fortran runtime error");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fprintf(stderr, ...) here and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device compilers (at least some of them) do not support stderr. In general, only a plain printf is supported on the device, e.g. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#formatted-output

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know, but stderr is being used in Terminator::CrashArgs() above, so I was hopeful that the limitation had been addressed. But now your host Terminator::CrashArgs() calls a CrashHeader() and CrashFooter() that write to stdout while still printing the arguments via vfprintf(stderr, ...). And I don't see an implementation of Terminator::CrashArgs() for the device, but I might just be confused by the diff here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to comment out Terminator::CrashArgs for the device compilation (with #if !defined(RT_DEVICE_COMPILATION) at line 15), since neither stderr nor va_list are supported. There are direct uses of CrashArgs in not yet enabled parts of the library, and I will have to deal with them as well.

Note that the crashHandler support is also not enabled for the device compilation. It seems to be used only for in-tree testing, so I thought it would not be a problem (at least for now).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I misreading the new code? CrashArgs() calls CrashHeader() & CrashFooter() and they write to stdout. But the vfprintf() writes to stderr. Is that not a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CrashHeader and CrashFooter have different implementations for host and device. On the device they only use std::printf, and on the host they use all the other functions to write to stderr. CrashArgs does call them, but this only happens on the host, so everything is printed to stderr.

The code is different under defined(RT_DEVICE_COMPILATION) macro check.

@github-actions
Copy link

github-actions bot commented Sep 26, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor Author

@vzakhari vzakhari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comments, @klausler! I will fix the C++ style and formatting.

RT_API_ATTRS OwningPtr& operator=(const OwningPtr &) = delete;
RT_API_ATTRS OwningPtr(OwningPtr &&other) {
ptr_ = other.ptr_;
other.ptr_ = pointer_type();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry about the style. Will fix it.

}

RT_API_ATTRS explicit operator bool() const {
return get() == pointer_type() ? false : true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix it.


RT_API_ATTRS void Terminator::CrashHeader() const {
#if defined(RT_DEVICE_COMPILATION)
std::printf("\nfatal Fortran runtime error");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device compilers (at least some of them) do not support stderr. In general, only a plain printf is supported on the device, e.g. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#formatted-output

@vzakhari vzakhari merged commit 4bdec58 into llvm:main Sep 27, 2023
3 checks passed
legrosbuffle pushed a commit to legrosbuffle/llvm-project that referenced this pull request Sep 29, 2023
)

I extended the "closure" of the device code containing the initial
transformational.cpp. The device side of the library should not be
complete at least for some APIs. For example, I tested with C OpenMP
code calling BesselJnX0 with a nullptr descriptor that failed with
a runtime error when executing on a GPU.

I added `--expt-relaxed-constexpr` for NVCC compiler to avoid multiple
warnings about missing `__attribute__((device))` on constexpr methods
coming from C++ header files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:runtime flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants