[flang][runtime] Added noinline for some functions in device build. #93128

Merged
vzakhari merged 2 commits into llvm:main on May 23, 2024

Conversation

vzakhari
Contributor

This helps reduce the compilation time spent in the device
compiler's optimizer and then in the code generator. Since the
F18 runtime is going to be distributed as LLVM bitcode (BC) for
some targets (the same way the LLVM liboffload device library is
distributed) and linked into the user's offload code, the
compilation time of the produced LLVM BC will be critical.

@vzakhari vzakhari requested a review from klausler May 23, 2024 03:17
@llvmbot llvmbot added flang:runtime flang Flang issues not falling into any other category labels May 23, 2024
@llvmbot
Member

llvmbot commented May 23, 2024

@llvm/pr-subscribers-flang-runtime

Author: Slava Zakharin (vzakhari)

Changes

This helps reduce the compilation time spent in the device
compiler's optimizer and then in the code generator. Since the
F18 runtime is going to be distributed as LLVM bitcode (BC) for
some targets (the same way the LLVM liboffload device library is
distributed) and linked into the user's offload code, the
compilation time of the produced LLVM BC will be critical.


Full diff: https://github.com/llvm/llvm-project/pull/93128.diff

3 Files Affected:

  • (modified) flang/include/flang/Common/api-attrs.h (+22)
  • (modified) flang/include/flang/Common/visit.h (+3-2)
  • (modified) flang/runtime/terminator.h (+1-1)
diff --git a/flang/include/flang/Common/api-attrs.h b/flang/include/flang/Common/api-attrs.h
index 04ee307326ac9..d73e60996bc81 100644
--- a/flang/include/flang/Common/api-attrs.h
+++ b/flang/include/flang/Common/api-attrs.h
@@ -156,4 +156,26 @@
 #define RT_DIAG_DISABLE_CALL_HOST_FROM_DEVICE_WARN
 #endif /* !defined(__CUDACC__) */
 
+/*
+ * RT_DEVICE_NOINLINE may be used for non-performance critical
+ * functions that should not be inlined to minimize the amount
+ * of code that needs to be processed by the device compiler's
+ * optimizer.
+ */
+#ifndef __has_attribute
+#define __has_attribute(x) 0
+#endif
+#if __has_attribute(noinline)
+#define RT_NOINLINE_ATTR __attribute__((noinline))
+#else
+#define RT_NOINLINE_ATTR
+#endif
+#if (defined(__CUDACC__) || defined(__CUDA__)) && defined(__CUDA_ARCH__)
+#define RT_DEVICE_NOINLINE RT_NOINLINE_ATTR
+#define RT_DEVICE_NOINLINE_HOST_INLINE
+#else
+#define RT_DEVICE_NOINLINE
+#define RT_DEVICE_NOINLINE_HOST_INLINE inline
+#endif
+
 #endif /* !FORTRAN_RUNTIME_API_ATTRS_H_ */
diff --git a/flang/include/flang/Common/visit.h b/flang/include/flang/Common/visit.h
index d867338be7e0f..6d51e47522882 100644
--- a/flang/include/flang/Common/visit.h
+++ b/flang/include/flang/Common/visit.h
@@ -30,7 +30,7 @@ namespace log2visit {
 
 template <std::size_t LOW, std::size_t HIGH, typename RESULT, typename VISITOR,
     typename... VARIANT>
-inline RT_API_ATTRS RESULT Log2VisitHelper(
+RT_DEVICE_NOINLINE_HOST_INLINE RT_API_ATTRS RESULT Log2VisitHelper(
     VISITOR &&visitor, std::size_t which, VARIANT &&...u) {
   if constexpr (LOW + 7 >= HIGH) {
     switch (which - LOW) {
@@ -68,7 +68,8 @@ inline RT_API_ATTRS RESULT Log2VisitHelper(
 }
 
 template <typename VISITOR, typename... VARIANT>
-inline RT_API_ATTRS auto visit(VISITOR &&visitor, VARIANT &&...u)
+RT_DEVICE_NOINLINE_HOST_INLINE RT_API_ATTRS auto visit(
+    VISITOR &&visitor, VARIANT &&...u)
     -> decltype(visitor(std::get<0>(std::forward<VARIANT>(u))...)) {
   using Result = decltype(visitor(std::get<0>(std::forward<VARIANT>(u))...));
   if constexpr (sizeof...(u) == 1) {
diff --git a/flang/runtime/terminator.h b/flang/runtime/terminator.h
index 59a47ce93e7c9..609f059d6e092 100644
--- a/flang/runtime/terminator.h
+++ b/flang/runtime/terminator.h
@@ -54,7 +54,7 @@ class Terminator {
   // to regular printf for the device compilation.
   // Try to keep the inline implementations as small as possible.
   template <typename... Args>
-  [[noreturn]] RT_API_ATTRS const char *Crash(
+  [[noreturn]] RT_DEVICE_NOINLINE RT_API_ATTRS const char *Crash(
       const char *message, Args... args) const {
 #if !defined(RT_DEVICE_COMPILATION)
     // Invoke handler set up by the test harness.

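The visit.h hunks show only the signatures of `Log2VisitHelper` and `visit`, so the following is a simplified sketch of the general idea, not the code in the Flang tree. It assumes the helper narrows the active variant index by bisection, and it omits the `switch` that the real helper uses once a range is small (`LOW + 7 >= HIGH` in the diff). The names `BisectVisit`, `SketchVisit`, and `KindOf` are hypothetical, only a single variant is handled, and a valueless variant is not handled. Each recursion level is a separate template instantiation, which suggests why keeping such helpers from being inlined on the device could shrink the code the optimizer has to process.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical bisection helper: narrows [LOW, HIGH] until one alternative
// remains, then calls the visitor on that alternative.
template <std::size_t LOW, std::size_t HIGH, typename RESULT, typename VISITOR,
    typename VARIANT>
inline RESULT BisectVisit(VISITOR &&visitor, std::size_t which, VARIANT &&u) {
  if constexpr (LOW == HIGH) {
    assert(which == LOW);
    return visitor(std::get<LOW>(std::forward<VARIANT>(u)));
  } else {
    constexpr std::size_t mid{(LOW + HIGH) / 2};
    if (which <= mid) {
      return BisectVisit<LOW, mid, RESULT>(
          std::forward<VISITOR>(visitor), which, std::forward<VARIANT>(u));
    } else {
      return BisectVisit<mid + 1, HIGH, RESULT>(
          std::forward<VISITOR>(visitor), which, std::forward<VARIANT>(u));
    }
  }
}

// Hypothetical single-variant entry point. The result type is taken from
// visiting alternative 0, mirroring the decltype in the diff's visit().
template <typename VISITOR, typename VARIANT>
inline auto SketchVisit(VISITOR &&visitor, VARIANT &&u)
    -> decltype(visitor(std::get<0>(std::forward<VARIANT>(u)))) {
  using Result = decltype(visitor(std::get<0>(std::forward<VARIANT>(u))));
  constexpr std::size_t high{
      std::variant_size_v<std::decay_t<VARIANT>> - 1};
  return BisectVisit<0, high, Result>(
      std::forward<VISITOR>(visitor), u.index(), std::forward<VARIANT>(u));
}

// Hypothetical visitor returning the position of the active alternative.
struct KindOf {
  int operator()(int) const { return 0; }
  int operator()(double) const { return 1; }
  int operator()(const std::string &) const { return 2; }
};
```

With a `std::variant<int, double, std::string>` holding a `double`, `SketchVisit(KindOf{}, v)` reaches the `double` overload after two index comparisons.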

github-actions bot commented May 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@vzakhari vzakhari merged commit 208544f into llvm:main May 23, 2024
3 checks passed
Labels
flang:runtime flang Flang issues not falling into any other category