-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64][GISel] Avoid scalarizing G_IMPLICIT_DEF and G_FREEZE in the Legalizer #88469
Conversation
…IT_DEF It does not make sense to scalarize G_FREEZE as it leads to the generation of pairs of G_UNMERGE_VALUES and G_BUILD_VECTORs which are difficult to optimize especially when operations like G_TRUNC operate before G_FREEZE but after G_UNMERGE_VALUES. Instead, it is better to legalize G_FREEZE like any other vector type would be, as it gets lowered to a COPY during instruction selection anyways. This is an issue that was encountered when looking at the TSVC benchmark, where the legalization of G_FREEZE would cause generation of unnecessary MOVs that adversely affected the performance.
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-aarch64 Author: Dhruv Chawla (dc03-work) ChangesIt does not make sense to scalarize G_FREEZE as it leads to the generation Instead, it is better to legalize G_FREEZE like any other vector type This is an issue that was encountered when looking at the TSVC Full diff: https://github.com/llvm/llvm-project/pull/88469.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 96ded69905f7cc..bae999c8fd61a7 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -87,8 +87,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
const bool HasCSSC = ST.hasCSSC();
const bool HasRCPC3 = ST.hasRCPC3();
- getActionDefinitionsBuilder(
- {G_IMPLICIT_DEF, G_FREEZE, G_CONSTANT_FOLD_BARRIER})
+ getActionDefinitionsBuilder({G_IMPLICIT_DEF, G_CONSTANT_FOLD_BARRIER})
.legalFor({p0, s8, s16, s32, s64})
.legalFor(PackedVectorAllTypeList)
.widenScalarToNextPow2(0)
@@ -106,6 +105,18 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
return std::make_pair(0, EltTy);
});
+ getActionDefinitionsBuilder(G_FREEZE)
+ .legalFor({p0, s8, s16, s32, s64})
+ .legalFor(PackedVectorAllTypeList)
+ .widenScalarToNextPow2(0)
+ .clampScalar(0, s8, s64)
+ .moreElementsToNextPow2(0)
+ .widenVectorEltsToVectorMinSize(0, 64)
+ .clampNumElements(0, v8s8, v16s8)
+ .clampNumElements(0, v4s16, v8s16)
+ .clampNumElements(0, v2s32, v4s32)
+ .clampNumElements(0, v2s64, v2s64);
+
getActionDefinitionsBuilder(G_PHI)
.legalFor({p0, s16, s32, s64})
.legalFor(PackedVectorAllTypeList)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
index be674d79b54f1d..0f9086fd8972fa 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
@@ -130,3 +130,93 @@ body: |
%freeze:_(s2) = G_FREEZE %x
%ext:_(s64) = G_ZEXT %freeze
$x0 = COPY %ext(s64)
+...
+---
+name: test_freeze_v4s1
+body: |
+ bb.0.entry:
+ liveins: $q0
+ ; CHECK-LABEL: name: test_freeze_v4s1
+ ; CHECK: liveins: $q0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(s8) = G_IMPLICIT_DEF
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s8>) = G_BUILD_VECTOR [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8)
+ ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<8 x s16>) = G_ANYEXT [[BUILD_VECTOR]](<8 x s8>)
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(<4 x s16>), [[UV1:%[0-9]+]]:_(<4 x s16>) = G_UNMERGE_VALUES [[ANYEXT]](<8 x s16>)
+ ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(<4 x s16>) = G_FREEZE [[UV]]
+ ; CHECK-NEXT: [[ANYEXT1:%[0-9]+]]:_(<4 x s32>) = G_ANYEXT [[FREEZE]](<4 x s16>)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32)
+ ; CHECK-NEXT: %ext:_(<4 x s32>) = G_AND [[ANYEXT1]], [[BUILD_VECTOR1]]
+ ; CHECK-NEXT: $q0 = COPY %ext(<4 x s32>)
+ %x:_(<4 x s1>) = G_IMPLICIT_DEF
+ %freeze:_(<4 x s1>) = G_FREEZE %x
+ %ext:_(<4 x s32>) = G_ZEXT %freeze
+ $q0 = COPY %ext(<4 x s32>)
+...
+---
+name: test_freeze_v3s8
+body: |
+ bb.0.entry:
+ liveins: $q0
+ ; CHECK-LABEL: name: test_freeze_v3s8
+ ; CHECK: liveins: $q0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(s8) = G_IMPLICIT_DEF
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s8>) = G_BUILD_VECTOR [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8)
+ ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<8 x s16>) = G_ANYEXT [[BUILD_VECTOR]](<8 x s8>)
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(<4 x s16>), [[UV1:%[0-9]+]]:_(<4 x s16>) = G_UNMERGE_VALUES [[ANYEXT]](<8 x s16>)
+ ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(<4 x s16>) = G_FREEZE [[UV]]
+ ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(s16), [[UV3:%[0-9]+]]:_(s16), [[UV4:%[0-9]+]]:_(s16), [[UV5:%[0-9]+]]:_(s16) = G_UNMERGE_VALUES [[FREEZE]](<4 x s16>)
+ ; CHECK-NEXT: %undef:_(s32) = G_IMPLICIT_DEF
+ ; CHECK-NEXT: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[UV2]](s16)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
+ ; CHECK-NEXT: %ext0:_(s32) = G_AND [[ANYEXT1]], [[C]]
+ ; CHECK-NEXT: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[UV3]](s16)
+ ; CHECK-NEXT: %ext1:_(s32) = G_AND [[ANYEXT2]], [[C]]
+ ; CHECK-NEXT: [[ANYEXT3:%[0-9]+]]:_(s32) = G_ANYEXT [[UV4]](s16)
+ ; CHECK-NEXT: %ext2:_(s32) = G_AND [[ANYEXT3]], [[C]]
+ ; CHECK-NEXT: %res:_(<4 x s32>) = G_BUILD_VECTOR %ext0(s32), %ext1(s32), %ext2(s32), %undef(s32)
+ ; CHECK-NEXT: $q0 = COPY %res(<4 x s32>)
+ %x:_(<3 x s8>) = G_IMPLICIT_DEF
+ %freeze:_(<3 x s8>) = G_FREEZE %x
+ %ext:_(<3 x s32>) = G_ZEXT %freeze
+ %undef:_(s32) = G_IMPLICIT_DEF
+ %ext0:_(s32), %ext1:_(s32), %ext2:_(s32) = G_UNMERGE_VALUES %ext
+ %res:_(<4 x s32>) = G_BUILD_VECTOR %ext0, %ext1, %ext2, %undef
+ $q0 = COPY %res(<4 x s32>)
+...
+---
+name: test_freeze_v4s1_select
+body: |
+ bb.0.entry:
+ liveins: $q0, $q1
+ ; CHECK-LABEL: name: test_freeze_v4s1_select
+ ; CHECK: liveins: $q0, $q1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32)
+ ; CHECK-NEXT: [[FCMP:%[0-9]+]]:_(<4 x s32>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(olt), [[COPY]](<4 x s32>), [[BUILD_VECTOR]]
+ ; CHECK-NEXT: [[FCMP1:%[0-9]+]]:_(<4 x s32>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(ogt), [[COPY1]](<4 x s32>), [[COPY]]
+ ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[FCMP1]](<4 x s32>)
+ ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(<4 x s16>) = G_FREEZE [[TRUNC]]
+ ; CHECK-NEXT: [[TRUNC1:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[FCMP]](<4 x s32>)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(<4 x s16>) = G_AND [[TRUNC1]], [[FREEZE]]
+ ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<4 x s32>) = G_ANYEXT [[AND]](<4 x s16>)
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C1]](s32), [[C1]](s32), [[C1]](s32), [[C1]](s32)
+ ; CHECK-NEXT: [[AND1:%[0-9]+]]:_(<4 x s32>) = G_AND [[ANYEXT]], [[BUILD_VECTOR1]]
+ ; CHECK-NEXT: $q0 = COPY [[AND1]](<4 x s32>)
+ %1:_(<4 x s32>) = COPY $q0
+ %2:_(<4 x s32>) = COPY $q1
+ %3:_(s32) = G_CONSTANT i32 0
+ %4:_(<4 x s32>) = G_BUILD_VECTOR %3, %3, %3, %3
+ %5:_(s1) = G_CONSTANT i1 false
+ %6:_(<4 x s1>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(olt), %1:_(<4 x s32>), %4:_
+ %7:_(<4 x s1>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(ogt), %2:_(<4 x s32>), %1:_
+ %8:_(<4 x s1>) = G_FREEZE %7
+ %9:_(<4 x s1>) = G_AND %6, %8
+ %10:_(<4 x s32>) = G_ZEXT %9
+ $q0 = COPY %10
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index 200e9d19d58d25..c82052154a147f 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -131,9 +131,8 @@
# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
#
# DEBUG-NEXT: G_FREEZE (opcode {{[0-9]+}}): 1 type index, 0 imm indices
-# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
-# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. the first uncovered type index: {{[0-9]+}}, OK
+# DEBUG-NEXT: .. the first uncovered imm index: {{[0-9]+}}, OK
# DEBUG-NEXT: G_CONSTANT_FOLD_BARRIER (opcode {{[0-9]+}}): 1 type index, 0 imm indices
# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
|
The updates to the test cases are in dca4dfc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fundamentally G_FREEZE and G_IMPLICIT_DEF have the same property of being synthetic compiler operations that need to be legal for all register types. They really should have the same legalization rules. It may make sense to replace the current set of rules, but I think they should be moved together
The reason I separated G_FREEZE out was because I found your commit about implementing fewerElementsIf() for G_IMPLICIT_DEF (3dddb16#diff-973c8a5764852eef4468cf4ba0dd6b70a2e0bf769640c9ff728c1f3613908abc) and I was not sure if this was being done for any special reason. In an ideal world, both G_FREEZE and G_IMPLICIT_DEF would just be able to copy the types (after legalization) of the input operand and of any uses respectively, however I don't think that is possible with the current design of the legalizer. |
These was a lot less implemented back then, and it was easier to implement one piece at a time. Intrinsically, these should have the same rules. |
This commit is so old, I'm not sure freeze was even in the IR yet |
Pre-commit failures look unrelated to the PR - from what I can see MSVC is running out of memory while building flang and it is crashing there. |
It does not make sense to scalarize G_FREEZE as it leads to the generation
of pairs of G_UNMERGE_VALUES and G_BUILD_VECTORs which are difficult to
optimize especially when operations like G_TRUNC operate before G_FREEZE
but after G_UNMERGE_VALUES.
Instead, it is better to legalize G_FREEZE like any other vector type
would be, as it gets lowered to a COPY during instruction selection
anyways.
This is an issue that was encountered when looking at the TSVC
benchmark, where the legalization of G_FREEZE would cause generation of
unnecessary MOVs that adversely affected the performance.