Skip to content

Conversation

@grigorypas
Copy link
Member

@grigorypas grigorypas commented Oct 30, 2025

Introduces the flatten_deep attribute as an extension to the existing
flatten attribute. While flatten only inlines immediate callsites,
flatten_deep enables inlining of function calls and their transitive
callees up to a specified depth, effectively providing full flattening
of the call tree.

The attribute takes a single unsigned integer argument representing the
maximum call tree depth to inline. For example, flatten_deep(3) will
inline all calls within a function, then inline all calls within those
inlined functions (depth 2), and then inline all calls within those
functions (depth 3).

This is part 1 of 2. The next PR will implement inlining logic in the AlwaysInliner pass

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 30, 2025
@grigorypas grigorypas requested a review from WenleiHe October 30, 2025 20:24
@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Grigory Pastukhov (grigorypas)

Changes

Introduces the flatten_deep attribute as an extension to the existing
flatten attribute. While flatten only inlines immediate callsites,
flatten_deep enables inlining of function calls and their transitive
callees up to a specified depth, effectively providing full flattening
of the call tree.

The attribute takes a single unsigned integer argument representing the
maximum call tree depth to inline. For example, flatten_deep(3) will
inline all calls within a function, then inline all calls within those
inlined functions (depth 2), and then inline all calls within those
functions (depth 3).

This is part 1 of 3. Future PRs will:

  • Add a corresponding LLVM IR attribute
  • Implement inlining logic in the AlwaysInliner pass

Full diff: https://github.com/llvm/llvm-project/pull/165777.diff

5 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+7)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+23)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+21)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
  • (added) clang/test/Sema/attr-flatten-deep.c (+14)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 749f531ec9ab1..1ccd659e49e63 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1984,6 +1984,13 @@ def Flatten : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def FlattenDeep : InheritableAttr {
+  let Spellings = [Clang<"flatten_deep">];
+  let Subjects = SubjectList<[Function], ErrorDiag>;
+  let Args = [UnsignedArgument<"MaxDepth">];
+  let Documentation = [FlattenDeepDocs];
+}
+
 def Format : InheritableAttr {
   let Spellings = [GCC<"format">];
   let Args = [IdentifierArgument<"Type">, IntArgument<"FormatIdx">,
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 2fdd041c1b46e..f4280531019f5 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -4032,6 +4032,29 @@ callee is unavailable or if the callee has the ``noinline`` attribute.
   }];
 }
 
+def FlattenDeepDocs : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+The ``flatten_deep`` attribute causes calls within the attributed function and
+their transitive callees to be inlined up to a specified depth, unless it is
+impossible to do so (for example if the body of the callee is unavailable or if
+the callee has the ``noinline`` attribute).
+
+This attribute takes a single unsigned integer argument representing the maximum
+depth of the call tree to inline. For example, ``__attribute__((flatten_deep(3)))``
+will inline all calls within the function, then inline all calls within those
+inlined functions (depth 2), and then inline all calls within those functions
+(depth 3).
+
+.. code-block:: c++
+
+  __attribute__((flatten_deep(3)))
+  void process_data() {
+    // All calls up to 3 levels deep in the call tree will be inlined
+  }
+  }];
+}
+
 def FormatDocs : Documentation {
   let Category = DocCatFunction;
   let Content = [{
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 964a2a791e18f..1a78dfce6e1f3 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3695,6 +3695,24 @@ static void handleInitPriorityAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(::new (S.Context) InitPriorityAttr(S.Context, AL, prioritynum));
 }
 
+static void handleFlattenDeepAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
+  Expr *E = AL.getArgAsExpr(0);
+  uint32_t maxDepth;
+  if (!S.checkUInt32Argument(AL, E, maxDepth)) {
+    AL.setInvalid();
+    return;
+  }
+
+  if (maxDepth == 0) {
+    S.Diag(AL.getLoc(), diag::err_attribute_argument_is_zero)
+        << AL << E->getSourceRange();
+    AL.setInvalid();
+    return;
+  }
+
+  D->addAttr(::new (S.Context) FlattenDeepAttr(S.Context, AL, maxDepth));
+}
+
 ErrorAttr *Sema::mergeErrorAttr(Decl *D, const AttributeCommonInfo &CI,
                                 StringRef NewUserDiagnostic) {
   if (const auto *EA = D->getAttr<ErrorAttr>()) {
@@ -7236,6 +7254,9 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
   case ParsedAttr::AT_Format:
     handleFormatAttr(S, D, AL);
     break;
+  case ParsedAttr::AT_FlattenDeep:
+    handleFlattenDeepAttr(S, D, AL);
+    break;
   case ParsedAttr::AT_FormatMatches:
     handleFormatMatchesAttr(S, D, AL);
     break;
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index ab4153a64f028..da6152dbff3a5 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -86,6 +86,7 @@
 // CHECK-NEXT: ExternalSourceSymbol ((SubjectMatchRule_record, SubjectMatchRule_enum, SubjectMatchRule_enum_constant, SubjectMatchRule_field, SubjectMatchRule_function, SubjectMatchRule_namespace, SubjectMatchRule_objc_category, SubjectMatchRule_objc_implementation, SubjectMatchRule_objc_interface, SubjectMatchRule_objc_method, SubjectMatchRule_objc_property, SubjectMatchRule_objc_protocol, SubjectMatchRule_record, SubjectMatchRule_type_alias, SubjectMatchRule_variable))
 // CHECK-NEXT: FlagEnum (SubjectMatchRule_enum)
 // CHECK-NEXT: Flatten (SubjectMatchRule_function)
+// CHECK-NEXT: FlattenDeep (SubjectMatchRule_function)
 // CHECK-NEXT: FunctionReturnThunks (SubjectMatchRule_function)
 // CHECK-NEXT: GNUInline (SubjectMatchRule_function)
 // CHECK-NEXT: HIPManaged (SubjectMatchRule_variable)
diff --git a/clang/test/Sema/attr-flatten-deep.c b/clang/test/Sema/attr-flatten-deep.c
new file mode 100644
index 0000000000000..92bc792424332
--- /dev/null
+++ b/clang/test/Sema/attr-flatten-deep.c
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+
+// Test basic usage - valid
+__attribute__((flatten_deep(3)))
+void test_valid() {
+}
+
+// Test attribute on non-function - should error
+__attribute__((flatten_deep(3))) int x; // expected-error {{'flatten_deep' attribute only applies to functions}}
+
+// Test depth = 0 - should error (depth must be >= 1)
+__attribute__((flatten_deep(0))) // expected-error {{'flatten_deep' attribute must be greater than 0}}
+void test_depth_zero() {
+}

@wlei-llvm wlei-llvm requested a review from pcc October 30, 2025 21:42
@llvmbot llvmbot added clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir llvm:transforms labels Oct 31, 2025
@grigorypas grigorypas changed the title [clang] Add flatten_deep attribute for depth-limited inlining (1/3) [clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) Oct 31, 2025
@github-actions
Copy link

github-actions bot commented Oct 31, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the motivation for this?

This also kind of changes the semantics of alwaysinline, which states the functions should be inlined always unless illegal. I guess this can be viewed as making inlining illegal in such circumstances, but a LangRef update would probably be good either way.

@grigorypas
Copy link
Member Author

What exactly is the motivation for this?

This also kind of changes the semantics of alwaysinline, which states the functions should be inlined always unless illegal. I guess this can be viewed as making inlining illegal in such circumstances, but a LangRef update would probably be good either way.

Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. alwaysinline should still mean the same thing.

@efriedma-quic
Copy link
Collaborator

I'm not sure this design is really practical for users. Trying to make the user count out the depth seems like it'll be difficult to use effectively: it's hard to count, and the user has no direct control over which calls are affected. And code refactoring is likely to break the intent.

Could you describe a bit more what led you to this particular solution?

@grigorypas
Copy link
Member Author

I'm not sure this design is really practical for users. Trying to make the user count out the depth seems like it'll be difficult to use effectively: it's hard to count, and the user has no direct control over which calls are affected. And code refactoring is likely to break the intent.

Could you describe a bit more what led you to this particular solution?

Thank you for your feedback and for raising these concerns.
To clarify, our primary use case at Meta is to completely flatten functions by inlining the entire call tree. The max depth parameter is not intended as a core part of the user workflow, but rather as a safeguard to prevent issues if the call tree happens to be extremely deep.

@boomanaiden154
Copy link
Contributor

Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. alwaysinline should still mean the same thing.

You said patch 2 will update the alwaysinliner pass. alwaysinline has previously always inlined a function unless it was illegal to do so. You're now maybe not inlining depending on the flatten_deep attribute, which seems like a cost heuristic encoded in the IR to me.

To clarify, our primary use case at Meta is to completely flatten functions by inlining the entire call tree. The max depth parameter is not intended as a core part of the user workflow, but rather as a safeguard to prevent issues if the call tree happens to be extremely deep.

So you want to completely flatten functions but not completely flatten functions? What exactly is the use case of flattening these functions?

@grigorypas
Copy link
Member Author

Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. alwaysinline should still mean the same thing.

You said patch 2 will update the alwaysinliner pass. alwaysinline has previously always inlined a function unless it was illegal to do so. You're now maybe not inlining depending on the flatten_deep attribute, which seems like a cost heuristic encoded in the IR to me.

To clarify, our primary use case at Meta is to completely flatten functions by inlining the entire call tree. The max depth parameter is not intended as a core part of the user workflow, but rather as a safeguard to prevent issues if the call tree happens to be extremely deep.

So you want to completely flatten functions but not completely flatten functions? What exactly is the use case of flattening these functions?

Thank you for the feedback! Let me clarify the design:

alwaysinline Semantics Are Preserved

The alwaysinline semantics are not being changed. The original alwaysinline logic is applied first and takes precedence. The flatten_deep logic runs in the same pass but is applied at the end, after the standard alwaysinline processing. If a function has alwaysinline, it will be inlined according to the existing rules (unless illegal to do so), completely independent of any flatten_deep attributes.
You can see this in the suggested implementation here: https://github.com/grigorypas/llvm-project/tree/full_flattening

flatten_deep as a Natural Extension of flatten

flatten_deep(N) is a natural extension of the existing flatten attribute. While they differ in implementation, the motivation is similar:

  • flatten: Inlines all immediate callsites (single level) - implemented at frontend by marking direct calls with alwaysinline
  • flatten_deep(N): Inlines recursively/transitively up to N levels deep - requires backend support to propagate through the call tree
    Importantly, full/deep flattening cannot be achieved today with existing attributes. You can't achieve transitive inlining across the entire call tree with current mechanisms.

Max Depth as a Safeguard

The max depth parameter is not a cost heuristic - it's a safety limit:

  • Primary use case: Complete flattening of the call tree (large N)
  • Max depth parameter: A safeguard to prevent compile-time explosions with unexpectedly deep call trees
    This is similar to other compiler safety limits (e.g., -fconstexpr-depth=N) - we want to flatten the entire call tree in normal cases, but need a circuit breaker for pathological edge cases.

Use Case

This feature is useful for performance-critical code where eliminating call overhead across the entire call tree is beneficial, such as:

  • Deeply nested hot paths in performance-sensitive applications
  • PGO scenarios with stale profiles: When adding new functions to hot paths, flatten_deep(N) may help where default bottom-up inlining decisions rely on incomplete or stale profile data
    Does this clarification address your concerns?

@boomanaiden154
Copy link
Contributor

Does this clarification address your concerns?

Around alwaysinline semantics, yes. Thanks for the clarification. That seems much more reasonable.

The max depth parameter is not a cost heuristic - it's a safety limit:

Whatever you want to call it, it doesn't impact correctness and is thus a form of cost heuristic. Whether it is a heuristic around runtime cost, code size cost, or compile time cost, it's still a cost heuristic.

PGO scenarios with stale profiles: When adding new functions to hot paths, flatten_deep(N) may help where default bottom-up inlining decisions rely on incomplete or stale profile data

This sounds like a nightmare to maintain. You probably want to remove it a new profile can be collected so the inliner can actually make proper decisions around cache pressure/call overhead removal/code folding due to inlining as this could very easily blow up icache pressure. Finding a good value that balances compile times and performance would also require tuning making maintenance even more difficult.

@grigorypas
Copy link
Member Author

Does this clarification address your concerns?

Around alwaysinline semantics, yes. Thanks for the clarification. That seems much more reasonable.

The max depth parameter is not a cost heuristic - it's a safety limit:

Whatever you want to call it, it doesn't impact correctness and is thus a form of cost heuristic. Whether it is a heuristic around runtime cost, code size cost, or compile time cost, it's still a cost heuristic.

PGO scenarios with stale profiles: When adding new functions to hot paths, flatten_deep(N) may help where default bottom-up inlining decisions rely on incomplete or stale profile data

This sounds like a nightmare to maintain. You probably want to remove it a new profile can be collected so the inliner can actually make proper decisions around cache pressure/call overhead removal/code folding due to inlining as this could very easily blow up icache pressure. Finding a good value that balances compile times and performance would also require tuning making maintenance even more difficult.

Fair point on the max depth parameter - you're right that it's a cost heuristic in the sense that it's a compile-time cost consideration rather than a correctness issue. The key distinction I wanted to make is that it's not encoding runtime performance heuristics into the IR (like "inline if hot" or "inline if small"), but rather a compile-time safety mechanism. But I understand your characterization.

Regarding the PGO and stale profile concerns - I appreciate the point about maintenance complexity. Let me provide some additional context on why this can be useful:

In sampling-based PGO (AutoFDO) workflows, profiles are often collected continuously from production, which means they're inherently somewhat stale since they're based on previous builds. This is actually beneficial for reducing release latency and broader FDO adoption, but it does create a challenge when new functions are added to hot paths - we consistently see regressions in these cases. Without this feature, the only option is waiting for the next release cycle to get proper profile coverage, which can result in noticeable performance loss.

For these scenarios, flatten_deep(N) isn't intended as a permanent solution - it can be automatically cleaned up when profiles are refreshed (e.g., via automated codemod rules). It's more of a tactical tool for specific performance-critical paths rather than a blanket strategy.

@boomanaiden154
Copy link
Contributor

Let me provide some additional context on why this can be useful:

I'm familiar with how AutoFDO pipelines work and assumed that was your use case. I still don't think this is a very good solution. I feel like you're going to be much better off just rolling the profile. You can also selectively add flatten at certain points in the callstack.

I'm not going to block this, but I would like more agreement that this is a good idea and effective solution to the problem at hand, ideally through an RFC.

@WenleiHe
Copy link
Member

WenleiHe commented Nov 4, 2025

but I would like more agreement that this is a good idea

This seems like a natural extension of existing flatten attribute.

Deeply nested hot paths in performance-sensitive applications

I can see this being useful -- for some perf critical path, sometimes it's desirable to inline everything and avoid calls. Without such extension, this currently cannot be achieved reliably without side effects outside of a particular call tree.

That said, I agree that using this to handle PGO stale profile may not be sustainable -- i won't do that.

But disregard the PGO use case mentioned above, I think having the extension just for perf critical path is enough of a justification.

FWIW, I've personally used existing flatten attempting to flatten the entire call tree for critical path, but only to realize it only inlines one level. I'd argue that the literal meaning of flatten is reduce to one level, not reduce by level. I think this extension at least would fill the gap for such use case to reduce to one level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:ir llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants