[AArch64][GISel] Avoid scalarizing G_IMPLICIT_DEF and G_FREEZE in the Legalizer #88469

dc03-work · 2024-04-12T03:38:36Z

It does not make sense to scalarize G_FREEZE as it leads to the generation
of pairs of G_UNMERGE_VALUES and G_BUILD_VECTORs which are difficult to
optimize especially when operations like G_TRUNC operate before G_FREEZE
but after G_UNMERGE_VALUES.

Instead, it is better to legalize G_FREEZE like any other vector type
would be, as it gets lowered to a COPY during instruction selection
anyways.

This is an issue that was encountered when looking at the TSVC
benchmark, where the legalization of G_FREEZE would cause generation of
unnecessary MOVs that adversely affected the performance.

…IT_DEF It does not make sense to scalarize G_FREEZE as it leads to the generation of pairs of G_UNMERGE_VALUES and G_BUILD_VECTORs which are difficult to optimize especially when operations like G_TRUNC operate before G_FREEZE but after G_UNMERGE_VALUES. Instead, it is better to legalize G_FREEZE like any other vector type would be, as it gets lowered to a COPY during instruction selection anyways. This is an issue that was encountered when looking at the TSVC benchmark, where the legalization of G_FREEZE would cause generation of unnecessary MOVs that adversely affected the performance.

llvmbot · 2024-04-12T03:39:03Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Dhruv Chawla (dc03-work)

Changes

It does not make sense to scalarize G_FREEZE as it leads to the generation
of pairs of G_UNMERGE_VALUES and G_BUILD_VECTORs which are difficult to
optimize especially when operations like G_TRUNC operate before G_FREEZE
but after G_UNMERGE_VALUES.

Instead, it is better to legalize G_FREEZE like any other vector type
would be, as it gets lowered to a COPY during instruction selection
anyways.

This is an issue that was encountered when looking at the TSVC
benchmark, where the legalization of G_FREEZE would cause generation of
unnecessary MOVs that adversely affected the performance.

Full diff: https://github.com/llvm/llvm-project/pull/88469.diff

3 Files Affected:

(modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+13-2)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir (+90)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir (+2-3)

diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 96ded69905f7cc..bae999c8fd61a7 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -87,8 +87,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
   const bool HasCSSC = ST.hasCSSC();
   const bool HasRCPC3 = ST.hasRCPC3();
 
-  getActionDefinitionsBuilder(
-      {G_IMPLICIT_DEF, G_FREEZE, G_CONSTANT_FOLD_BARRIER})
+  getActionDefinitionsBuilder({G_IMPLICIT_DEF, G_CONSTANT_FOLD_BARRIER})
       .legalFor({p0, s8, s16, s32, s64})
       .legalFor(PackedVectorAllTypeList)
       .widenScalarToNextPow2(0)
@@ -106,6 +105,18 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
             return std::make_pair(0, EltTy);
           });
 
+  getActionDefinitionsBuilder(G_FREEZE)
+      .legalFor({p0, s8, s16, s32, s64})
+      .legalFor(PackedVectorAllTypeList)
+      .widenScalarToNextPow2(0)
+      .clampScalar(0, s8, s64)
+      .moreElementsToNextPow2(0)
+      .widenVectorEltsToVectorMinSize(0, 64)
+      .clampNumElements(0, v8s8, v16s8)
+      .clampNumElements(0, v4s16, v8s16)
+      .clampNumElements(0, v2s32, v4s32)
+      .clampNumElements(0, v2s64, v2s64);
+
   getActionDefinitionsBuilder(G_PHI)
       .legalFor({p0, s16, s32, s64})
       .legalFor(PackedVectorAllTypeList)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
index be674d79b54f1d..0f9086fd8972fa 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
@@ -130,3 +130,93 @@ body: |
     %freeze:_(s2) = G_FREEZE %x
     %ext:_(s64) = G_ZEXT %freeze
     $x0 = COPY %ext(s64)
+...
+---
+name:            test_freeze_v4s1
+body: |
+  bb.0.entry:
+    liveins: $q0
+    ; CHECK-LABEL: name: test_freeze_v4s1
+    ; CHECK: liveins: $q0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(s8) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s8>) = G_BUILD_VECTOR [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8)
+    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<8 x s16>) = G_ANYEXT [[BUILD_VECTOR]](<8 x s8>)
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(<4 x s16>), [[UV1:%[0-9]+]]:_(<4 x s16>) = G_UNMERGE_VALUES [[ANYEXT]](<8 x s16>)
+    ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(<4 x s16>) = G_FREEZE [[UV]]
+    ; CHECK-NEXT: [[ANYEXT1:%[0-9]+]]:_(<4 x s32>) = G_ANYEXT [[FREEZE]](<4 x s16>)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32)
+    ; CHECK-NEXT: %ext:_(<4 x s32>) = G_AND [[ANYEXT1]], [[BUILD_VECTOR1]]
+    ; CHECK-NEXT: $q0 = COPY %ext(<4 x s32>)
+    %x:_(<4 x s1>) = G_IMPLICIT_DEF
+    %freeze:_(<4 x s1>) = G_FREEZE %x
+    %ext:_(<4 x s32>) = G_ZEXT %freeze
+    $q0 = COPY %ext(<4 x s32>)
+...
+---
+name:            test_freeze_v3s8
+body: |
+  bb.0.entry:
+    liveins: $q0
+    ; CHECK-LABEL: name: test_freeze_v3s8
+    ; CHECK: liveins: $q0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(s8) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s8>) = G_BUILD_VECTOR [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8), [[DEF]](s8)
+    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<8 x s16>) = G_ANYEXT [[BUILD_VECTOR]](<8 x s8>)
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(<4 x s16>), [[UV1:%[0-9]+]]:_(<4 x s16>) = G_UNMERGE_VALUES [[ANYEXT]](<8 x s16>)
+    ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(<4 x s16>) = G_FREEZE [[UV]]
+    ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(s16), [[UV3:%[0-9]+]]:_(s16), [[UV4:%[0-9]+]]:_(s16), [[UV5:%[0-9]+]]:_(s16) = G_UNMERGE_VALUES [[FREEZE]](<4 x s16>)
+    ; CHECK-NEXT: %undef:_(s32) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[UV2]](s16)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
+    ; CHECK-NEXT: %ext0:_(s32) = G_AND [[ANYEXT1]], [[C]]
+    ; CHECK-NEXT: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[UV3]](s16)
+    ; CHECK-NEXT: %ext1:_(s32) = G_AND [[ANYEXT2]], [[C]]
+    ; CHECK-NEXT: [[ANYEXT3:%[0-9]+]]:_(s32) = G_ANYEXT [[UV4]](s16)
+    ; CHECK-NEXT: %ext2:_(s32) = G_AND [[ANYEXT3]], [[C]]
+    ; CHECK-NEXT: %res:_(<4 x s32>) = G_BUILD_VECTOR %ext0(s32), %ext1(s32), %ext2(s32), %undef(s32)
+    ; CHECK-NEXT: $q0 = COPY %res(<4 x s32>)
+    %x:_(<3 x s8>) = G_IMPLICIT_DEF
+    %freeze:_(<3 x s8>) = G_FREEZE %x
+    %ext:_(<3 x s32>) = G_ZEXT %freeze
+    %undef:_(s32) = G_IMPLICIT_DEF
+    %ext0:_(s32), %ext1:_(s32), %ext2:_(s32) = G_UNMERGE_VALUES %ext
+    %res:_(<4 x s32>) = G_BUILD_VECTOR %ext0, %ext1, %ext2, %undef
+    $q0 = COPY %res(<4 x s32>)
+...
+---
+name:            test_freeze_v4s1_select
+body: |
+  bb.0.entry:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: test_freeze_v4s1_select
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32)
+    ; CHECK-NEXT: [[FCMP:%[0-9]+]]:_(<4 x s32>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(olt), [[COPY]](<4 x s32>), [[BUILD_VECTOR]]
+    ; CHECK-NEXT: [[FCMP1:%[0-9]+]]:_(<4 x s32>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(ogt), [[COPY1]](<4 x s32>), [[COPY]]
+    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[FCMP1]](<4 x s32>)
+    ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(<4 x s16>) = G_FREEZE [[TRUNC]]
+    ; CHECK-NEXT: [[TRUNC1:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[FCMP]](<4 x s32>)
+    ; CHECK-NEXT: [[AND:%[0-9]+]]:_(<4 x s16>) = G_AND [[TRUNC1]], [[FREEZE]]
+    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<4 x s32>) = G_ANYEXT [[AND]](<4 x s16>)
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C1]](s32), [[C1]](s32), [[C1]](s32), [[C1]](s32)
+    ; CHECK-NEXT: [[AND1:%[0-9]+]]:_(<4 x s32>) = G_AND [[ANYEXT]], [[BUILD_VECTOR1]]
+    ; CHECK-NEXT: $q0 = COPY [[AND1]](<4 x s32>)
+    %1:_(<4 x s32>) = COPY $q0
+    %2:_(<4 x s32>) = COPY $q1
+    %3:_(s32) = G_CONSTANT i32 0
+    %4:_(<4 x s32>) = G_BUILD_VECTOR %3, %3, %3, %3
+    %5:_(s1) = G_CONSTANT i1 false
+    %6:_(<4 x s1>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(olt), %1:_(<4 x s32>), %4:_
+    %7:_(<4 x s1>) = nnan ninf nsz arcp contract afn reassoc G_FCMP floatpred(ogt), %2:_(<4 x s32>), %1:_
+    %8:_(<4 x s1>) = G_FREEZE %7
+    %9:_(<4 x s1>) = G_AND %6, %8
+    %10:_(<4 x s32>) = G_ZEXT %9
+    $q0 = COPY %10
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index 200e9d19d58d25..c82052154a147f 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -131,9 +131,8 @@
 # DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 #
 # DEBUG-NEXT: G_FREEZE (opcode {{[0-9]+}}): 1 type index, 0 imm indices
-# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
-# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. the first uncovered type index: {{[0-9]+}}, OK
+# DEBUG-NEXT: .. the first uncovered imm index: {{[0-9]+}}, OK
 
 # DEBUG-NEXT: G_CONSTANT_FOLD_BARRIER (opcode {{[0-9]+}}): 1 type index, 0 imm indices
 # DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}

dc03-work · 2024-04-12T03:39:59Z

The updates to the test cases are in dca4dfc.

arsenm

Fundamentally G_FREEZE and G_IMPLICIT_DEF have the same property of being synthetic compiler operations that need to be legal for all register types. They really should have the same legalization rules. It may make sense to replace the current set of rules, but I think they should be moved together

dc03-work · 2024-04-13T04:25:06Z

Fundamentally G_FREEZE and G_IMPLICIT_DEF have the same property of being synthetic compiler operations that need to be legal for all register types. They really should have the same legalization rules. It may make sense to replace the current set of rules, but I think they should be moved together

The reason I separated G_FREEZE out was because I found your commit about implementing fewerElementsIf() for G_IMPLICIT_DEF (3dddb16#diff-973c8a5764852eef4468cf4ba0dd6b70a2e0bf769640c9ff728c1f3613908abc) and I was not sure if this was being done for any special reason.

In an ideal world, both G_FREEZE and G_IMPLICIT_DEF would just be able to copy the types (after legalization) of the input operand and of any uses respectively, however I don't think that is possible with the current design of the legalizer.

arsenm · 2024-04-17T18:26:58Z

Fundamentally G_FREEZE and G_IMPLICIT_DEF have the same property of being synthetic compiler operations that need to be legal for all register types. They really should have the same legalization rules. It may make sense to replace the current set of rules, but I think they should be moved together

and I was not sure if this was being done for any special reason.

These was a lot less implemented back then, and it was easier to implement one piece at a time. Intrinsically, these should have the same rules.

arsenm · 2024-04-17T18:35:16Z

These was a lot less implemented back then, and it was easier to implement one piece at a time. Intrinsically, these should have the same rules.

This commit is so old, I'm not sure freeze was even in the IR yet

well

dc03-work · 2024-04-23T16:07:03Z

Pre-commit failures look unrelated to the PR - from what I can see MSVC is running out of memory while building flang and it is crashing there.

dc03-work added 2 commits April 12, 2024 09:05

[AArch64][GISel] Update legalize_freeze.mir for G_FREEZE patch

f42a3fe

dc03-work requested review from aemerson, madhur13490 and davemgreen April 12, 2024 03:38

llvmbot added backend:AArch64 llvm:globalisel labels Apr 12, 2024

madhur13490 requested a review from arsenm April 12, 2024 04:36

arsenm reviewed Apr 12, 2024

View reviewed changes

dc03-work mentioned this pull request Apr 17, 2024

[GISel][CombinerHelper] Combine op(trunc(x), trunc(y)) -> trunc(op(x, y)) #89023

Merged

Disable scalarization for G_IMPLICIT_DEF and G_CONSTANT_FOLD_BARRIER as

9429624

well

dc03-work requested a review from arsenm April 23, 2024 03:39

dc03-work changed the title ~~[AArch64][GISel] Separate legalize actions for G_FREEZE from G_IMPLICIT_DEF~~ [AArch64][GISel] Avoid scalarizing G_IMPLICIT_DEF and G_FREEZE in the Legalizer Apr 23, 2024

arsenm approved these changes Apr 23, 2024

View reviewed changes

dc03-work merged commit 143be6a into llvm:main Apr 23, 2024
3 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64][GISel] Avoid scalarizing G_IMPLICIT_DEF and G_FREEZE in the Legalizer #88469

[AArch64][GISel] Avoid scalarizing G_IMPLICIT_DEF and G_FREEZE in the Legalizer #88469

dc03-work commented Apr 12, 2024

llvmbot commented Apr 12, 2024 •

edited

dc03-work commented Apr 12, 2024

arsenm left a comment

dc03-work commented Apr 13, 2024

arsenm commented Apr 17, 2024

arsenm commented Apr 17, 2024

dc03-work commented Apr 23, 2024

[AArch64][GISel] Avoid scalarizing G_IMPLICIT_DEF and G_FREEZE in the Legalizer #88469

[AArch64][GISel] Avoid scalarizing G_IMPLICIT_DEF and G_FREEZE in the Legalizer #88469

Conversation

dc03-work commented Apr 12, 2024

llvmbot commented Apr 12, 2024 • edited

dc03-work commented Apr 12, 2024

arsenm left a comment

Choose a reason for hiding this comment

dc03-work commented Apr 13, 2024

arsenm commented Apr 17, 2024

arsenm commented Apr 17, 2024

dc03-work commented Apr 23, 2024

llvmbot commented Apr 12, 2024 •

edited