- 
                Notifications
    
You must be signed in to change notification settings  - Fork 15.1k
 
[clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) #165777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| 
          
 @llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang Author: Grigory Pastukhov (grigorypas) ChangesIntroduces the  The attribute takes a single unsigned integer argument representing the This is part 1 of 3. Future PRs will: 
 Full diff: https://github.com/llvm/llvm-project/pull/165777.diff 5 Files Affected: 
 diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 749f531ec9ab1..1ccd659e49e63 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1984,6 +1984,13 @@ def Flatten : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def FlattenDeep : InheritableAttr {
+  let Spellings = [Clang<"flatten_deep">];
+  let Subjects = SubjectList<[Function], ErrorDiag>;
+  let Args = [UnsignedArgument<"MaxDepth">];
+  let Documentation = [FlattenDeepDocs];
+}
+
 def Format : InheritableAttr {
   let Spellings = [GCC<"format">];
   let Args = [IdentifierArgument<"Type">, IntArgument<"FormatIdx">,
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 2fdd041c1b46e..f4280531019f5 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -4032,6 +4032,29 @@ callee is unavailable or if the callee has the ``noinline`` attribute.
   }];
 }
 
+def FlattenDeepDocs : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+The ``flatten_deep`` attribute causes calls within the attributed function and
+their transitive callees to be inlined up to a specified depth, unless it is
+impossible to do so (for example if the body of the callee is unavailable or if
+the callee has the ``noinline`` attribute).
+
+This attribute takes a single unsigned integer argument representing the maximum
+depth of the call tree to inline. For example, ``__attribute__((flatten_deep(3)))``
+will inline all calls within the function, then inline all calls within those
+inlined functions (depth 2), and then inline all calls within those functions
+(depth 3).
+
+.. code-block:: c++
+
+  __attribute__((flatten_deep(3)))
+  void process_data() {
+    // All calls up to 3 levels deep in the call tree will be inlined
+  }
+  }];
+}
+
 def FormatDocs : Documentation {
   let Category = DocCatFunction;
   let Content = [{
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 964a2a791e18f..1a78dfce6e1f3 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3695,6 +3695,24 @@ static void handleInitPriorityAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(::new (S.Context) InitPriorityAttr(S.Context, AL, prioritynum));
 }
 
+static void handleFlattenDeepAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
+  Expr *E = AL.getArgAsExpr(0);
+  uint32_t maxDepth;
+  if (!S.checkUInt32Argument(AL, E, maxDepth)) {
+    AL.setInvalid();
+    return;
+  }
+
+  if (maxDepth == 0) {
+    S.Diag(AL.getLoc(), diag::err_attribute_argument_is_zero)
+        << AL << E->getSourceRange();
+    AL.setInvalid();
+    return;
+  }
+
+  D->addAttr(::new (S.Context) FlattenDeepAttr(S.Context, AL, maxDepth));
+}
+
 ErrorAttr *Sema::mergeErrorAttr(Decl *D, const AttributeCommonInfo &CI,
                                 StringRef NewUserDiagnostic) {
   if (const auto *EA = D->getAttr<ErrorAttr>()) {
@@ -7236,6 +7254,9 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
   case ParsedAttr::AT_Format:
     handleFormatAttr(S, D, AL);
     break;
+  case ParsedAttr::AT_FlattenDeep:
+    handleFlattenDeepAttr(S, D, AL);
+    break;
   case ParsedAttr::AT_FormatMatches:
     handleFormatMatchesAttr(S, D, AL);
     break;
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index ab4153a64f028..da6152dbff3a5 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -86,6 +86,7 @@
 // CHECK-NEXT: ExternalSourceSymbol ((SubjectMatchRule_record, SubjectMatchRule_enum, SubjectMatchRule_enum_constant, SubjectMatchRule_field, SubjectMatchRule_function, SubjectMatchRule_namespace, SubjectMatchRule_objc_category, SubjectMatchRule_objc_implementation, SubjectMatchRule_objc_interface, SubjectMatchRule_objc_method, SubjectMatchRule_objc_property, SubjectMatchRule_objc_protocol, SubjectMatchRule_record, SubjectMatchRule_type_alias, SubjectMatchRule_variable))
 // CHECK-NEXT: FlagEnum (SubjectMatchRule_enum)
 // CHECK-NEXT: Flatten (SubjectMatchRule_function)
+// CHECK-NEXT: FlattenDeep (SubjectMatchRule_function)
 // CHECK-NEXT: FunctionReturnThunks (SubjectMatchRule_function)
 // CHECK-NEXT: GNUInline (SubjectMatchRule_function)
 // CHECK-NEXT: HIPManaged (SubjectMatchRule_variable)
diff --git a/clang/test/Sema/attr-flatten-deep.c b/clang/test/Sema/attr-flatten-deep.c
new file mode 100644
index 0000000000000..92bc792424332
--- /dev/null
+++ b/clang/test/Sema/attr-flatten-deep.c
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+
+// Test basic usage - valid
+__attribute__((flatten_deep(3)))
+void test_valid() {
+}
+
+// Test attribute on non-function - should error
+__attribute__((flatten_deep(3))) int x; // expected-error {{'flatten_deep' attribute only applies to functions}}
+
+// Test depth = 0 - should error (depth must be >= 1)
+__attribute__((flatten_deep(0))) // expected-error {{'flatten_deep' attribute must be greater than 0}}
+void test_depth_zero() {
+}
 | 
    
| 
          
 ✅ With the latest revision this PR passed the C/C++ code formatter.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is the motivation for this?
This also kind of changes the semantics of alwaysinline, which states the functions should be inlined always unless illegal. I guess this can be viewed as making inlining illegal in such circumstances, but a LangRef update would probably be good either way.
          
 Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side.   | 
    
| 
           I'm not sure this design is really practical for users. Trying to make the user count out the depth seems like it'll be difficult to use effectively: it's hard to count, and the user has no direct control over which calls are affected. And code refactoring is likely to break the intent. Could you describe a bit more what led you to this particular solution?  | 
    
          
 Thank you for your feedback and for raising these concerns.  | 
    
          
 You said patch 2 will update the alwaysinliner pass.  
 So you want to completely flatten functions but not completely flatten functions? What exactly is the use case of flattening these functions?  | 
    
          
 Thank you for the feedback! Let me clarify the design: 
 | 
    
          
 Around  
 Whatever you want to call it, it doesn't impact correctness and is thus a form of cost heuristic. Whether it is a heuristic around runtime cost, code size cost, or compile time cost, it's still a cost heuristic. 
 This sounds like a nightmare to maintain. You probably want to remove it a new profile can be collected so the inliner can actually make proper decisions around cache pressure/call overhead removal/code folding due to inlining as this could very easily blow up icache pressure. Finding a good value that balances compile times and performance would also require tuning making maintenance even more difficult.  | 
    
          
 Fair point on the max depth parameter - you're right that it's a cost heuristic in the sense that it's a compile-time cost consideration rather than a correctness issue. The key distinction I wanted to make is that it's not encoding runtime performance heuristics into the IR (like "inline if hot" or "inline if small"), but rather a compile-time safety mechanism. But I understand your characterization. Regarding the PGO and stale profile concerns - I appreciate the point about maintenance complexity. Let me provide some additional context on why this can be useful: In sampling-based PGO (AutoFDO) workflows, profiles are often collected continuously from production, which means they're inherently somewhat stale since they're based on previous builds. This is actually beneficial for reducing release latency and broader FDO adoption, but it does create a challenge when new functions are added to hot paths - we consistently see regressions in these cases. Without this feature, the only option is waiting for the next release cycle to get proper profile coverage, which can result in noticeable performance loss. For these scenarios,   | 
    
          
 I'm familiar with how AutoFDO pipelines work and assumed that was your use case. I still don't think this is a very good solution. I feel like you're going to be much better off just rolling the profile. You can also selectively add  I'm not going to block this, but I would like more agreement that this is a good idea and effective solution to the problem at hand, ideally through an RFC.  | 
    
          
 This seems like a natural extension of existing flatten attribute. 
 I can see this being useful -- for some perf critical path, sometimes it's desirable to inline everything and avoid calls. Without such extension, this currently cannot be achieved reliably without side effects outside of a particular call tree. That said, I agree that using this to handle PGO stale profile may not be sustainable -- i won't do that. But disregard the PGO use case mentioned above, I think having the extension just for perf critical path is enough of a justification. FWIW, I've personally used existing flatten attempting to flatten the entire call tree for critical path, but only to realize it only inlines one level. I'd argue that the literal meaning of flatten is reduce to one level, not reduce by level. I think this extension at least would fill the gap for such use case to reduce to one level.  | 
    
Introduces the
flatten_deepattribute as an extension to the existingflattenattribute. Whileflattenonly inlines immediate callsites,flatten_deepenables inlining of function calls and their transitivecallees up to a specified depth, effectively providing full flattening
of the call tree.
The attribute takes a single unsigned integer argument representing the
maximum call tree depth to inline. For example, flatten_deep(3) will
inline all calls within a function, then inline all calls within those
inlined functions (depth 2), and then inline all calls within those
functions (depth 3).
This is part 1 of 2. The next PR will implement inlining logic in the AlwaysInliner pass