[clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) #165777

grigorypas · 2025-10-30T20:24:01Z

Introduces the flatten_deep attribute as an extension to the existing
flatten attribute. While flatten only inlines immediate callsites,
flatten_deep enables inlining of function calls and their transitive
callees up to a specified depth, effectively providing full flattening
of the call tree.

The attribute takes a single unsigned integer argument representing the
maximum call tree depth to inline. For example, flatten_deep(3) will
inline all calls within a function, then inline all calls within those
inlined functions (depth 2), and then inline all calls within those
functions (depth 3).

This is part 1 of 2. The next PR will implement inlining logic in the AlwaysInliner pass

llvmbot · 2025-10-30T20:24:44Z

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Grigory Pastukhov (grigorypas)

Changes

Introduces the flatten_deep attribute as an extension to the existing
flatten attribute. While flatten only inlines immediate callsites,
flatten_deep enables inlining of function calls and their transitive
callees up to a specified depth, effectively providing full flattening
of the call tree.

The attribute takes a single unsigned integer argument representing the
maximum call tree depth to inline. For example, flatten_deep(3) will
inline all calls within a function, then inline all calls within those
inlined functions (depth 2), and then inline all calls within those
functions (depth 3).

This is part 1 of 3. Future PRs will:

Add a corresponding LLVM IR attribute
Implement inlining logic in the AlwaysInliner pass

Full diff: https://github.com/llvm/llvm-project/pull/165777.diff

5 Files Affected:

(modified) clang/include/clang/Basic/Attr.td (+7)
(modified) clang/include/clang/Basic/AttrDocs.td (+23)
(modified) clang/lib/Sema/SemaDeclAttr.cpp (+21)
(modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
(added) clang/test/Sema/attr-flatten-deep.c (+14)

diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 749f531ec9ab1..1ccd659e49e63 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1984,6 +1984,13 @@ def Flatten : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def FlattenDeep : InheritableAttr {
+  let Spellings = [Clang<"flatten_deep">];
+  let Subjects = SubjectList<[Function], ErrorDiag>;
+  let Args = [UnsignedArgument<"MaxDepth">];
+  let Documentation = [FlattenDeepDocs];
+}
+
 def Format : InheritableAttr {
   let Spellings = [GCC<"format">];
   let Args = [IdentifierArgument<"Type">, IntArgument<"FormatIdx">,
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 2fdd041c1b46e..f4280531019f5 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -4032,6 +4032,29 @@ callee is unavailable or if the callee has the ``noinline`` attribute.
   }];
 }
 
+def FlattenDeepDocs : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+The ``flatten_deep`` attribute causes calls within the attributed function and
+their transitive callees to be inlined up to a specified depth, unless it is
+impossible to do so (for example if the body of the callee is unavailable or if
+the callee has the ``noinline`` attribute).
+
+This attribute takes a single unsigned integer argument representing the maximum
+depth of the call tree to inline. For example, ``__attribute__((flatten_deep(3)))``
+will inline all calls within the function, then inline all calls within those
+inlined functions (depth 2), and then inline all calls within those functions
+(depth 3).
+
+.. code-block:: c++
+
+  __attribute__((flatten_deep(3)))
+  void process_data() {
+    // All calls up to 3 levels deep in the call tree will be inlined
+  }
+  }];
+}
+
 def FormatDocs : Documentation {
   let Category = DocCatFunction;
   let Content = [{
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 964a2a791e18f..1a78dfce6e1f3 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3695,6 +3695,24 @@ static void handleInitPriorityAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(::new (S.Context) InitPriorityAttr(S.Context, AL, prioritynum));
 }
 
+static void handleFlattenDeepAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
+  Expr *E = AL.getArgAsExpr(0);
+  uint32_t maxDepth;
+  if (!S.checkUInt32Argument(AL, E, maxDepth)) {
+    AL.setInvalid();
+    return;
+  }
+
+  if (maxDepth == 0) {
+    S.Diag(AL.getLoc(), diag::err_attribute_argument_is_zero)
+        << AL << E->getSourceRange();
+    AL.setInvalid();
+    return;
+  }
+
+  D->addAttr(::new (S.Context) FlattenDeepAttr(S.Context, AL, maxDepth));
+}
+
 ErrorAttr *Sema::mergeErrorAttr(Decl *D, const AttributeCommonInfo &CI,
                                 StringRef NewUserDiagnostic) {
   if (const auto *EA = D->getAttr<ErrorAttr>()) {
@@ -7236,6 +7254,9 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
   case ParsedAttr::AT_Format:
     handleFormatAttr(S, D, AL);
     break;
+  case ParsedAttr::AT_FlattenDeep:
+    handleFlattenDeepAttr(S, D, AL);
+    break;
   case ParsedAttr::AT_FormatMatches:
     handleFormatMatchesAttr(S, D, AL);
     break;
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index ab4153a64f028..da6152dbff3a5 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -86,6 +86,7 @@
 // CHECK-NEXT: ExternalSourceSymbol ((SubjectMatchRule_record, SubjectMatchRule_enum, SubjectMatchRule_enum_constant, SubjectMatchRule_field, SubjectMatchRule_function, SubjectMatchRule_namespace, SubjectMatchRule_objc_category, SubjectMatchRule_objc_implementation, SubjectMatchRule_objc_interface, SubjectMatchRule_objc_method, SubjectMatchRule_objc_property, SubjectMatchRule_objc_protocol, SubjectMatchRule_record, SubjectMatchRule_type_alias, SubjectMatchRule_variable))
 // CHECK-NEXT: FlagEnum (SubjectMatchRule_enum)
 // CHECK-NEXT: Flatten (SubjectMatchRule_function)
+// CHECK-NEXT: FlattenDeep (SubjectMatchRule_function)
 // CHECK-NEXT: FunctionReturnThunks (SubjectMatchRule_function)
 // CHECK-NEXT: GNUInline (SubjectMatchRule_function)
 // CHECK-NEXT: HIPManaged (SubjectMatchRule_variable)
diff --git a/clang/test/Sema/attr-flatten-deep.c b/clang/test/Sema/attr-flatten-deep.c
new file mode 100644
index 0000000000000..92bc792424332
--- /dev/null
+++ b/clang/test/Sema/attr-flatten-deep.c
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+
+// Test basic usage - valid
+__attribute__((flatten_deep(3)))
+void test_valid() {
+}
+
+// Test attribute on non-function - should error
+__attribute__((flatten_deep(3))) int x; // expected-error {{'flatten_deep' attribute only applies to functions}}
+
+// Test depth = 0 - should error (depth must be >= 1)
+__attribute__((flatten_deep(0))) // expected-error {{'flatten_deep' attribute must be greater than 0}}
+void test_depth_zero() {
+}

github-actions · 2025-10-31T17:44:04Z

✅ With the latest revision this PR passed the C/C++ code formatter.

boomanaiden154

What exactly is the motivation for this?

This also kind of changes the semantics of alwaysinline, which states the functions should be inlined always unless illegal. I guess this can be viewed as making inlining illegal in such circumstances, but a LangRef update would probably be good either way.

grigorypas · 2025-10-31T18:10:56Z

What exactly is the motivation for this?

This also kind of changes the semantics of alwaysinline, which states the functions should be inlined always unless illegal. I guess this can be viewed as making inlining illegal in such circumstances, but a LangRef update would probably be good either way.

Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. alwaysinline should still mean the same thing.

efriedma-quic · 2025-10-31T18:21:58Z

I'm not sure this design is really practical for users. Trying to make the user count out the depth seems like it'll be difficult to use effectively: it's hard to count, and the user has no direct control over which calls are affected. And code refactoring is likely to break the intent.

Could you describe a bit more what led you to this particular solution?

grigorypas · 2025-10-31T18:30:15Z

I'm not sure this design is really practical for users. Trying to make the user count out the depth seems like it'll be difficult to use effectively: it's hard to count, and the user has no direct control over which calls are affected. And code refactoring is likely to break the intent.

Could you describe a bit more what led you to this particular solution?

Thank you for your feedback and for raising these concerns.
To clarify, our primary use case at Meta is to completely flatten functions by inlining the entire call tree. The max depth parameter is not intended as a core part of the user workflow, but rather as a safeguard to prevent issues if the call tree happens to be extremely deep.

boomanaiden154 · 2025-10-31T20:03:50Z

Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. alwaysinline should still mean the same thing.

You said patch 2 will update the alwaysinliner pass. alwaysinline has previously always inlined a function unless it was illegal to do so. You're now maybe not inlining depending on the flatten_deep attribute, which seems like a cost heuristic encoded in the IR to me.

To clarify, our primary use case at Meta is to completely flatten functions by inlining the entire call tree. The max depth parameter is not intended as a core part of the user workflow, but rather as a safeguard to prevent issues if the call tree happens to be extremely deep.

So you want to completely flatten functions but not completely flatten functions? What exactly is the use case of flattening these functions?

grigorypas · 2025-11-03T20:51:09Z

Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. alwaysinline should still mean the same thing.

You said patch 2 will update the alwaysinliner pass. alwaysinline has previously always inlined a function unless it was illegal to do so. You're now maybe not inlining depending on the flatten_deep attribute, which seems like a cost heuristic encoded in the IR to me.

To clarify, our primary use case at Meta is to completely flatten functions by inlining the entire call tree. The max depth parameter is not intended as a core part of the user workflow, but rather as a safeguard to prevent issues if the call tree happens to be extremely deep.

So you want to completely flatten functions but not completely flatten functions? What exactly is the use case of flattening these functions?

Thank you for the feedback! Let me clarify the design:

`alwaysinline` Semantics Are Preserved

The alwaysinline semantics are not being changed. The original alwaysinline logic is applied first and takes precedence. The flatten_deep logic runs in the same pass but is applied at the end, after the standard alwaysinline processing. If a function has alwaysinline, it will be inlined according to the existing rules (unless illegal to do so), completely independent of any flatten_deep attributes.
You can see this in the suggested implementation here: https://github.com/grigorypas/llvm-project/tree/full_flattening

`flatten_deep` as a Natural Extension of `flatten`

flatten_deep(N) is a natural extension of the existing flatten attribute. While they differ in implementation, the motivation is similar:

flatten: Inlines all immediate callsites (single level) - implemented at frontend by marking direct calls with alwaysinline
flatten_deep(N): Inlines recursively/transitively up to N levels deep - requires backend support to propagate through the call tree
Importantly, full/deep flattening cannot be achieved today with existing attributes. You can't achieve transitive inlining across the entire call tree with current mechanisms.

Max Depth as a Safeguard

The max depth parameter is not a cost heuristic - it's a safety limit:

Primary use case: Complete flattening of the call tree (large N)
Max depth parameter: A safeguard to prevent compile-time explosions with unexpectedly deep call trees
This is similar to other compiler safety limits (e.g., -fconstexpr-depth=N) - we want to flatten the entire call tree in normal cases, but need a circuit breaker for pathological edge cases.

Use Case

This feature is useful for performance-critical code where eliminating call overhead across the entire call tree is beneficial, such as:

Deeply nested hot paths in performance-sensitive applications
PGO scenarios with stale profiles: When adding new functions to hot paths, flatten_deep(N) may help where default bottom-up inlining decisions rely on incomplete or stale profile data
Does this clarification address your concerns?

boomanaiden154 · 2025-11-03T21:42:39Z

Does this clarification address your concerns?

Around alwaysinline semantics, yes. Thanks for the clarification. That seems much more reasonable.

The max depth parameter is not a cost heuristic - it's a safety limit:

Whatever you want to call it, it doesn't impact correctness and is thus a form of cost heuristic. Whether it is a heuristic around runtime cost, code size cost, or compile time cost, it's still a cost heuristic.

PGO scenarios with stale profiles: When adding new functions to hot paths, flatten_deep(N) may help where default bottom-up inlining decisions rely on incomplete or stale profile data

This sounds like a nightmare to maintain. You probably want to remove it a new profile can be collected so the inliner can actually make proper decisions around cache pressure/call overhead removal/code folding due to inlining as this could very easily blow up icache pressure. Finding a good value that balances compile times and performance would also require tuning making maintenance even more difficult.

grigorypas · 2025-11-04T00:07:28Z

Does this clarification address your concerns?

Around alwaysinline semantics, yes. Thanks for the clarification. That seems much more reasonable.

The max depth parameter is not a cost heuristic - it's a safety limit:

Whatever you want to call it, it doesn't impact correctness and is thus a form of cost heuristic. Whether it is a heuristic around runtime cost, code size cost, or compile time cost, it's still a cost heuristic.

PGO scenarios with stale profiles: When adding new functions to hot paths, flatten_deep(N) may help where default bottom-up inlining decisions rely on incomplete or stale profile data

This sounds like a nightmare to maintain. You probably want to remove it a new profile can be collected so the inliner can actually make proper decisions around cache pressure/call overhead removal/code folding due to inlining as this could very easily blow up icache pressure. Finding a good value that balances compile times and performance would also require tuning making maintenance even more difficult.

Fair point on the max depth parameter - you're right that it's a cost heuristic in the sense that it's a compile-time cost consideration rather than a correctness issue. The key distinction I wanted to make is that it's not encoding runtime performance heuristics into the IR (like "inline if hot" or "inline if small"), but rather a compile-time safety mechanism. But I understand your characterization.

Regarding the PGO and stale profile concerns - I appreciate the point about maintenance complexity. Let me provide some additional context on why this can be useful:

In sampling-based PGO (AutoFDO) workflows, profiles are often collected continuously from production, which means they're inherently somewhat stale since they're based on previous builds. This is actually beneficial for reducing release latency and broader FDO adoption, but it does create a challenge when new functions are added to hot paths - we consistently see regressions in these cases. Without this feature, the only option is waiting for the next release cycle to get proper profile coverage, which can result in noticeable performance loss.

For these scenarios, flatten_deep(N) isn't intended as a permanent solution - it can be automatically cleaned up when profiles are refreshed (e.g., via automated codemod rules). It's more of a tactical tool for specific performance-critical paths rather than a blanket strategy.

boomanaiden154 · 2025-11-04T00:50:09Z

Let me provide some additional context on why this can be useful:

I'm familiar with how AutoFDO pipelines work and assumed that was your use case. I still don't think this is a very good solution. I feel like you're going to be much better off just rolling the profile. You can also selectively add flatten at certain points in the callstack.

I'm not going to block this, but I would like more agreement that this is a good idea and effective solution to the problem at hand, ideally through an RFC.

WenleiHe · 2025-11-04T07:24:26Z

but I would like more agreement that this is a good idea

This seems like a natural extension of existing flatten attribute.

Deeply nested hot paths in performance-sensitive applications

I can see this being useful -- for some perf critical path, sometimes it's desirable to inline everything and avoid calls. Without such extension, this currently cannot be achieved reliably without side effects outside of a particular call tree.

That said, I agree that using this to handle PGO stale profile may not be sustainable -- i won't do that.

But disregard the PGO use case mentioned above, I think having the extension just for perf critical path is enough of a justification.

FWIW, I've personally used existing flatten attempting to flatten the entire call tree for critical path, but only to realize it only inlines one level. I'd argue that the literal meaning of flatten is reduce to one level, not reduce by level. I think this extension at least would fill the gap for such use case to reduce to one level.

Add flatten_deep attribute to clang

f841323

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 30, 2025

grigorypas requested a review from WenleiHe October 30, 2025 20:24

grigorypas requested review from apolloww and wlei-llvm October 30, 2025 20:24

wlei-llvm requested a review from pcc October 30, 2025 21:42

Add flatten_deep attribute in LLVM

b92b7cf

llvmbot added clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir llvm:transforms labels Oct 31, 2025

grigorypas changed the title ~~[clang] Add flatten_deep attribute for depth-limited inlining (1/3)~~ [clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) Oct 31, 2025

Fix format issue

afe4616

boomanaiden154 reviewed Oct 31, 2025

View reviewed changes

Merge branch 'main' into add_flatten_deep_to_clang

138944c

WenleiHe requested a review from yuxuanchen1997 November 4, 2025 07:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) #165777

[clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) #165777

grigorypas commented Oct 30, 2025 •

edited

Loading

Uh oh!

llvmbot commented Oct 30, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Oct 31, 2025 •

edited

Loading

Uh oh!

boomanaiden154 left a comment

Uh oh!

grigorypas commented Oct 31, 2025

Uh oh!

efriedma-quic commented Oct 31, 2025

Uh oh!

grigorypas commented Oct 31, 2025

Uh oh!

boomanaiden154 commented Oct 31, 2025

Uh oh!

grigorypas commented Nov 3, 2025

Uh oh!

boomanaiden154 commented Nov 3, 2025

Uh oh!

grigorypas commented Nov 4, 2025

Uh oh!

boomanaiden154 commented Nov 4, 2025

Uh oh!

WenleiHe commented Nov 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) #165777

Are you sure you want to change the base?

[clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) #165777

Conversation

grigorypas commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

boomanaiden154 left a comment

Choose a reason for hiding this comment

Uh oh!

grigorypas commented Oct 31, 2025

Uh oh!

efriedma-quic commented Oct 31, 2025

Uh oh!

grigorypas commented Oct 31, 2025

Uh oh!

boomanaiden154 commented Oct 31, 2025

Uh oh!

grigorypas commented Nov 3, 2025

alwaysinline Semantics Are Preserved

flatten_deep as a Natural Extension of flatten

Max Depth as a Safeguard

Use Case

Uh oh!

boomanaiden154 commented Nov 3, 2025

Uh oh!

grigorypas commented Nov 4, 2025

Uh oh!

boomanaiden154 commented Nov 4, 2025

Uh oh!

WenleiHe commented Nov 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

grigorypas commented Oct 30, 2025 •

edited

Loading

llvmbot commented Oct 30, 2025 •

edited

Loading

github-actions bot commented Oct 31, 2025 •

edited

Loading

`alwaysinline` Semantics Are Preserved

`flatten_deep` as a Natural Extension of `flatten`