Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GlobalIsel][AArch64] more legal icmps #78239

Merged
merged 7 commits into from
Jan 17, 2024
Merged

Conversation

tschuett
Copy link
Member

In #78181 the godbolt (https://llvm.godbolt.org/z/vMsnxMf1v) crashed with GlobalIsel.

LLVM ERROR: unable to legalize instruction: %90:(<3 x s32>) = G_ICMP intpred(uge), %15:(<3 x s32>), %0:_ (in function: vec3_i32)

In llvm#78181 the godbolt (https://llvm.godbolt.org/z/vMsnxMf1v) crashed with GlobalIsel.

LLVM ERROR: unable to legalize instruction: %90:_(<3 x s32>) = G_ICMP intpred(uge), %15:_(<3 x s32>), %0:_ (in function: vec3_i32)
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 16, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Thorsten Schütt (tschuett)

Changes

In #78181 the godbolt (https://llvm.godbolt.org/z/vMsnxMf1v) crashed with GlobalIsel.

LLVM ERROR: unable to legalize instruction: %90:(<3 x s32>) = G_ICMP intpred(uge), %15:(<3 x s32>), %0:_ (in function: vec3_i32)


Full diff: https://github.com/llvm/llvm-project/pull/78239.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (+8)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+6-1)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/legalize-cmp.mir (+47-7)
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 91d2497fdb7e20..22df0fd34c057b 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -5314,6 +5314,14 @@ LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
     Observer.changedInstr(MI);
     return Legalized;
   }
+  case TargetOpcode::G_ICMP: {
+    Observer.changingInstr(MI);
+    moreElementsVectorSrc(MI, MoreTy, 2);
+    moreElementsVectorSrc(MI, MoreTy, 3);
+    moreElementsVectorDst(MI, MoreTy, 0);
+    Observer.changedInstr(MI);
+    return Legalized;
+  }
   default:
     return UnableToLegalize;
   }
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index e94f9d0c68ffe7..7357d666f1f80b 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -524,7 +524,11 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .minScalarOrEltIf(
           [=](const LegalityQuery &Query) { return Query.Types[1] == v2p0; }, 0,
           s64)
-      .clampNumElements(0, v2s32, v4s32);
+      .moreElementsToNextPow2(0)
+      .clampMaxNumElements(0, s64, 2)
+      .clampMaxNumElements(0, s32, 4)
+      .clampMaxNumElements(0, s16, 8)
+      .clampMaxNumElements(0, s8, 16);
 
   getActionDefinitionsBuilder(G_FCMP)
       // If we don't have full FP16 support, then scalarize the elements of
@@ -863,6 +867,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
           },
           0, s8)
       .minScalarOrElt(0, s8) // Worst case, we need at least s8.
+      .moreElementsToNextPow2(1)
       .clampMaxNumElements(1, s64, 2)
       .clampMaxNumElements(1, s32, 4)
       .clampMaxNumElements(1, s16, 8)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-cmp.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-cmp.mir
index e9b3aa0a3a8fd8..b590f8bfd6c111 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-cmp.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-cmp.mir
@@ -56,7 +56,9 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors:
-  ; CHECK: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   RET_ReallyLR
   bb.1:
     %0:_(s128) = G_IMPLICIT_DEF
@@ -93,7 +95,9 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors:
-  ; CHECK: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   RET_ReallyLR
   bb.1:
     %lhs:_(s128) = G_IMPLICIT_DEF
@@ -132,7 +136,9 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors:
-  ; CHECK: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   RET_ReallyLR
   bb.1:
     %lhs:_(s88) = G_IMPLICIT_DEF
@@ -171,7 +177,9 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors:
-  ; CHECK: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   RET_ReallyLR
   bb.1:
     %lhs:_(s88) = G_IMPLICIT_DEF
@@ -210,7 +218,9 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors:
-  ; CHECK: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   RET_ReallyLR
   bb.1:
     %lhs:_(s96) = G_IMPLICIT_DEF
@@ -272,7 +282,9 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors:
-  ; CHECK: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   RET_ReallyLR
   bb.1:
     %lhs:_(s318) = G_IMPLICIT_DEF
@@ -318,7 +330,9 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors:
-  ; CHECK: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   RET_ReallyLR
   bb.1:
     %lhs:_(s158) = G_IMPLICIT_DEF
@@ -330,3 +344,29 @@ body:             |
     successors:
   bb.3:
     RET_ReallyLR
+...
+---
+name:            test_3xs32_eq_pr_78181
+tracksRegLiveness: true
+body:             |
+  bb.1:
+    liveins: $x0
+    ; CHECK-LABEL: name: test_3xs32_eq_pr_78181
+    ; CHECK: liveins: $x0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %const:_(s32) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR %const(s32), %const(s32), %const(s32), %const(s32)
+    ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR %const(s32), %const(s32), %const(s32), %const(s32)
+    ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(<4 x s32>) = G_ICMP intpred(eq), [[BUILD_VECTOR]](<4 x s32>), [[BUILD_VECTOR1]]
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 1
+    ; CHECK-NEXT: [[EVEC:%[0-9]+]]:_(s32) = G_EXTRACT_VECTOR_ELT [[ICMP]](<4 x s32>), [[C]](s64)
+    ; CHECK-NEXT: $w0 = COPY [[EVEC]](s32)
+    ; CHECK-NEXT: RET_ReallyLR
+    %const:_(s32) = G_IMPLICIT_DEF
+    %rhs:_(<3 x s32>) = G_BUILD_VECTOR %const(s32), %const(s32), %const(s32)
+    %lhs:_(<3 x s32>) = G_BUILD_VECTOR %const(s32), %const(s32), %const(s32)
+    %cmp:_(<3 x s32>) = G_ICMP intpred(eq), %lhs(<3 x s32>), %rhs
+    %1:_(s32) = G_CONSTANT i32 1
+    %2:_(s32) = G_EXTRACT_VECTOR_ELT %cmp(<3 x s32>), %1(s32)
+    $w0 = COPY %2(s32)
+    RET_ReallyLR

%const:_(s32) = G_IMPLICIT_DEF
%rhs:_(<3 x s32>) = G_BUILD_VECTOR %const(s32), %const(s32), %const(s32)
%lhs:_(<3 x s32>) = G_BUILD_VECTOR %const(s32), %const(s32), %const(s32)
%cmp:_(<3 x s32>) = G_ICMP intpred(eq), %lhs(<3 x s32>), %rhs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test vectors of all the touched element types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused. The error was that it failed to legalize <3 x s32>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you changed clampMaxNumElements on more element types than that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use now clampNumElements to limit lower and upper bound. There are now some clamp tests.

; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s32>) = G_CONCAT_VECTORS [[UV]](<2 x s32>), [[UV3]](<2 x s32>)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improved legalization of G_EXTRACT_VECTOR_ELT.

; CHECK-GI-NEXT: warning: Instruction selection used fallback path for abs_8b
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for abs_16b
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for abs_4h
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for abs_8h
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no fallbacks, but code differences.

@@ -2,12 +2,6 @@
; RUN: llc -mtriple=aarch64-none-eabi -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc -mtriple=aarch64-none-eabi -global-isel -global-isel-abort=2 -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI

; CHECK-GI: warning: Instruction selection used fallback path for v3i64_i64
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i64_i64
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v3i32_i32
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no fallbacks, but code differences.

; CHECK-SD-NEXT: cmgt v0.8h, v2.8h, v0.8h
; CHECK-SD-NEXT: bsl v1.16b, v5.16b, v7.16b
; CHECK-SD-NEXT: bsl v0.16b, v4.16b, v6.16b
; CHECK-SD-NEXT: ret
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They look the same. The order of cmgt differs.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ICmp part looks OK to me so long as we have proper testing for it (which I added a little while ago).

@@ -863,6 +867,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
},
0, s8)
.minScalarOrElt(0, s8) // Worst case, we need at least s8.
.moreElementsToNextPow2(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExtractVector should probably be in a separate pr. I was looking into some of the vector operations a little while ago, but I think that still needs cleaning up a bit. I had a patch for fcmp too which I will try and upload.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need the change for the icmp tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, because they extract a single lane for the tests. It sounds harmless enough to keep then.

@@ -2,21 +2,6 @@
; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck -check-prefixes=CHECK,CHECK-SD %s
; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI

; CHECK-GI: warning: Instruction selection used fallback path for uabd16b_rdx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can all of these really be removed? If so you can remove the -global-isel-abort=2 from the command line.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can still be:

; CHECK-GI:       warning: Instruction selection used fallback path for abs_8b
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_16b
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_4h
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_8h
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_2s
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_4s
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_1d
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_1d_honestly
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fabds
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fabdd
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uabd_i64

Observer.changingInstr(MI);
moreElementsVectorSrc(MI, MoreTy, 2);
moreElementsVectorSrc(MI, MoreTy, 3);
moreElementsVectorDst(MI, MoreTy, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much this matters, but this looks like it is assuming that the three types are all the same, not that one is a predicate vector (i1 elements). Neon will work with full vector lane mask results, so it should work OK there, but possible not for other architectures with vector predicate register like SVE and MVE.

Copy link
Member Author

@tschuett tschuett Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. I should be the SrcTy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MoreTy comes from here:

LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,

Properly supporting predicates will take more work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might need to be similar to the trunc code:

LLT SrcTy = LLT::fixed_vector(
, but it might be better to wait until a vector target needs it, and just add a TODO at the moment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, SVE and moreElements is incompatible. I cannot change the length of a scalable vector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2,12 +2,6 @@
; RUN: llc -mtriple=aarch64-none-eabi -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc -mtriple=aarch64-none-eabi -global-isel -global-isel-abort=2 -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this remove -global-isel-abort=2 now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to solve that issue in a different PR. I got failures without global-isel-abort:

LLVM ERROR: unable to legalize instruction: %31:_(<3 x s32>) = G_ASHR %99:_, %101:_(<3 x s32>) (in function: v3i32_i32)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't some of the "fallback" lines remain then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restored the fallback. I will look into G_ASHR.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can update the fallback lines in the vabs test then this LGTM. I can rebase the fcmp patch I have on top of it, and hopefully make use of the LegalizerHelper change. Thanks

@@ -2,21 +2,6 @@
; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck -check-prefixes=CHECK,CHECK-SD %s
; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI

; CHECK-GI: warning: Instruction selection used fallback path for uabd16b_rdx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can still be:

; CHECK-GI:       warning: Instruction selection used fallback path for abs_8b
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_16b
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_4h
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_8h
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_2s
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_4s
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_1d
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for abs_1d_honestly
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fabds
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fabdd
; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uabd_i64

@tschuett tschuett merged commit 67dc6e9 into llvm:main Jan 17, 2024
3 of 4 checks passed
@tschuett tschuett deleted the gisel-leglize-cmps2 branch January 17, 2024 21:23
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
In llvm#78181 the godbolt
(https://llvm.godbolt.org/z/vMsnxMf1v) crashed with GlobalIsel.

LLVM ERROR: unable to legalize instruction: %90:_(<3 x s32>) = G_ICMP
intpred(uge), %15:_(<3 x s32>), %0:_ (in function: vec3_i32)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants