
Commit b74c6d2

[InlineFunction] Disable emission of alignment assumptions by default
In D74183 clang started emitting alignment for sret parameters unconditionally. This caused a 1.5% compile-time regression on tramp3d-v4. The reason is that we now generate many instances of IR like

    %ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
    %maskedptr = and i64 %ptrint, 3
    %maskcond = icmp eq i64 %maskedptr, 0
    tail call void @llvm.assume(i1 %maskcond)

to preserve the alignment information during inlining.

Based on IR analysis, these assumptions also regress optimization. The attached phase-ordering test case illustrates two issues: one is instruction-count-based optimization heuristics, which are affected by the four additional instructions of the assumption; the other is the blocking of SROA due to ptrtoint casts (PR45763).

We already encountered the same problem in Rust, where we (unlike Clang) generally prefer to emit alignment information absolutely everywhere it is available. We were only able to do this after hardcoding -preserve-alignment-assumptions-during-inlining=false, because we were seeing significant optimization and compile-time regressions otherwise.

This patch disables -preserve-alignment-assumptions-during-inlining by default, because we should not be punishing people for adding more alignment annotations. Once the assume bundle work shakes out and we can represent (and use) alignment assumptions using assume bundles, it should be possible to re-enable this with reduced overhead.

Differential Revision: https://reviews.llvm.org/D76886
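The check these assumptions encode is a simple power-of-two mask test on the pointer's address bits. A minimal Python sketch of the predicate (illustrative only; the function name is invented, and this is not LLVM code):

```python
def alignment_mask_check(addr: int, align: int) -> bool:
    """Mirror the IR pattern from the commit message:
    %maskedptr = and i64 %ptrint, align-1
    %maskcond  = icmp eq i64 %maskedptr, 0
    For align 4 the mask is 3, matching `and i64 %ptrint, 3`.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    # The address is align-aligned iff its low log2(align) bits are zero.
    return (addr & (align - 1)) == 0
```

For example, `alignment_mask_check(0x1000, 4)` holds, while `alignment_mask_check(0x1002, 4)` does not, since its low two bits are nonzero.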
1 parent c671345 commit b74c6d2

File tree

2 files changed: +118 -1 lines


llvm/lib/Transforms/Utils/InlineFunction.cpp

Lines changed: 4 additions & 1 deletion
@@ -79,9 +79,12 @@ EnableNoAliasConversion("enable-noalias-to-md-conversion", cl::init(true),
     cl::Hidden,
     cl::desc("Convert noalias attributes to metadata during inlining."));
 
+// Disabled by default, because the added alignment assumptions may increase
+// compile-time and block optimizations. This option is not suitable for use
+// with frontends that emit comprehensive parameter alignment annotations.
 static cl::opt<bool>
     PreserveAlignmentAssumptions("preserve-alignment-assumptions-during-inlining",
-        cl::init(true), cl::Hidden,
+        cl::init(false), cl::Hidden,
         cl::desc("Convert align attributes to assumptions during inlining."));
 
 static cl::opt<bool> UpdateReturnAttributes(
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -O2 -preserve-alignment-assumptions-during-inlining=0 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-OFF,FALLBACK-0
+; RUN: opt -S -O2 -preserve-alignment-assumptions-during-inlining=1 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-ON,FALLBACK-1
+; RUN: opt -S -O2 < %s | FileCheck %s --check-prefixes=CHECK,ASSUMPTIONS-OFF,FALLBACK-DEFAULT
+
+target datalayout = "e-p:64:64-p5:32:32-A5"
+
+; This illustrates an optimization difference caused by instruction counting
+; heuristics, which are affected by the additional instructions of the
+; alignment assumption.
+
+define internal i1 @callee1(i1 %c, i64* align 8 %ptr) {
+  store volatile i64 0, i64* %ptr
+  ret i1 %c
+}
+
+define void @caller1(i1 %c, i64* align 1 %ptr) {
+; ASSUMPTIONS-OFF-LABEL: @caller1(
+; ASSUMPTIONS-OFF-NEXT:    br i1 [[C:%.*]], label [[TRUE2:%.*]], label [[FALSE2:%.*]]
+; ASSUMPTIONS-OFF:       true2:
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 0, i64* [[PTR:%.*]], align 8
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 2, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    ret void
+; ASSUMPTIONS-OFF:       false2:
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 0, i64* [[PTR]], align 8
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 -1, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    store volatile i64 3, i64* [[PTR]], align 4
+; ASSUMPTIONS-OFF-NEXT:    ret void
+;
+; ASSUMPTIONS-ON-LABEL: @caller1(
+; ASSUMPTIONS-ON-NEXT:    br i1 [[C:%.*]], label [[TRUE1:%.*]], label [[FALSE1:%.*]]
+; ASSUMPTIONS-ON:       true1:
+; ASSUMPTIONS-ON-NEXT:    [[C_PR:%.*]] = phi i1 [ false, [[FALSE1]] ], [ true, [[TMP0:%.*]] ]
+; ASSUMPTIONS-ON-NEXT:    [[PTRINT:%.*]] = ptrtoint i64* [[PTR:%.*]] to i64
+; ASSUMPTIONS-ON-NEXT:    [[MASKEDPTR:%.*]] = and i64 [[PTRINT]], 7
+; ASSUMPTIONS-ON-NEXT:    [[MASKCOND:%.*]] = icmp eq i64 [[MASKEDPTR]], 0
+; ASSUMPTIONS-ON-NEXT:    tail call void @llvm.assume(i1 [[MASKCOND]])
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 0, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    br i1 [[C_PR]], label [[TRUE2:%.*]], label [[FALSE2:%.*]]
+; ASSUMPTIONS-ON:       false1:
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 1, i64* [[PTR]], align 4
+; ASSUMPTIONS-ON-NEXT:    br label [[TRUE1]]
+; ASSUMPTIONS-ON:       true2:
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 2, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    ret void
+; ASSUMPTIONS-ON:       false2:
+; ASSUMPTIONS-ON-NEXT:    store volatile i64 3, i64* [[PTR]], align 8
+; ASSUMPTIONS-ON-NEXT:    ret void
+;
+  br i1 %c, label %true1, label %false1
+
+true1:
+  %c2 = call i1 @callee1(i1 %c, i64* %ptr)
+  store volatile i64 -1, i64* %ptr
+  store volatile i64 -1, i64* %ptr
+  store volatile i64 -1, i64* %ptr
+  store volatile i64 -1, i64* %ptr
+  store volatile i64 -1, i64* %ptr
+  br i1 %c2, label %true2, label %false2
+
+false1:
+  store volatile i64 1, i64* %ptr
+  br label %true1
+
+true2:
+  store volatile i64 2, i64* %ptr
+  ret void
+
+false2:
+  store volatile i64 3, i64* %ptr
+  ret void
+}
+
+; This test illustrates that alignment assumptions may prevent SROA.
+; See PR45763.
+
+define internal void @callee2(i64* noalias sret align 8 %arg) {
+  store i64 0, i64* %arg, align 8
+  ret void
+}
+
+define amdgpu_kernel void @caller2() {
+; ASSUMPTIONS-OFF-LABEL: @caller2(
+; ASSUMPTIONS-OFF-NEXT:    ret void
+;
+; ASSUMPTIONS-ON-LABEL: @caller2(
+; ASSUMPTIONS-ON-NEXT:    [[ALLOCA:%.*]] = alloca i64, align 8, addrspace(5)
+; ASSUMPTIONS-ON-NEXT:    [[CAST:%.*]] = addrspacecast i64 addrspace(5)* [[ALLOCA]] to i64*
+; ASSUMPTIONS-ON-NEXT:    [[PTRINT:%.*]] = ptrtoint i64* [[CAST]] to i64
+; ASSUMPTIONS-ON-NEXT:    [[MASKEDPTR:%.*]] = and i64 [[PTRINT]], 7
+; ASSUMPTIONS-ON-NEXT:    [[MASKCOND:%.*]] = icmp eq i64 [[MASKEDPTR]], 0
+; ASSUMPTIONS-ON-NEXT:    call void @llvm.assume(i1 [[MASKCOND]])
+; ASSUMPTIONS-ON-NEXT:    ret void
+;
+  %alloca = alloca i64, align 8, addrspace(5)
+  %cast = addrspacecast i64 addrspace(5)* %alloca to i64*
+  call void @callee2(i64* sret align 8 %cast)
+  ret void
+}
