
[AArch64][GlobalISel] Take abs scalar codegen closer to SDAG #84886

Merged · 1 commit · Mar 21, 2024

Conversation

madhur13490
Contributor

This patch improves codegen for the scalar (<128-bit) versions of the
llvm.abs intrinsic by using the existing non-XOR-based lowering.
This takes the generated code closer to SDAG.

Codegen with GISel for >128-bit types is not very good with this method,
so it is not used there.
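The two lowering shapes being compared look roughly like this when written as MachineIRBuilder calls. This is only a conceptual sketch with placeholder names (Dst, Src, Ty, CondTy), not the verbatim LegalizerHelper code:

// Conceptual sketch only: the two G_ABS lowering shapes discussed in this
// patch, expressed as MachineIRBuilder calls.
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// Existing XOR-based form: abs(x) = (x + (x >>s (BW-1))) ^ (x >>s (BW-1))
static void buildAbsAddXor(MachineIRBuilder &B, Register Dst, Register Src,
                           LLT Ty) {
  auto ShiftAmt = B.buildConstant(Ty, Ty.getSizeInBits() - 1);
  auto Sign = B.buildAShr(Ty, Src, ShiftAmt);
  auto Add = B.buildAdd(Ty, Src, Sign);
  B.buildXor(Dst, Add, Sign);
}

// Compare/negate form this patch moves scalars towards:
// abs(x) = (x >s (0 - x)) ? x : (0 - x), which maps onto AArch64 cmp + cneg.
static void buildAbsCmpNeg(MachineIRBuilder &B, Register Dst, Register Src,
                           LLT Ty, LLT CondTy) {
  auto Zero = B.buildConstant(Ty, 0);
  auto Neg = B.buildSub(Ty, Zero, Src);
  auto Cmp = B.buildICmp(CmpInst::ICMP_SGT, CondTy, Src, Neg);
  B.buildSelect(Dst, Cmp, Src, Neg);
}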

@llvmbot
Collaborator

llvmbot commented Mar 12, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: Madhur Amilkanthwar (madhur13490)

Changes

This patch improves codegen for the scalar (<128-bit) versions of the
llvm.abs intrinsic by using the existing non-XOR-based lowering.
This takes the generated code closer to SDAG.

Codegen with GISel for >128-bit types is not very good with this method,
so it is not used there.


Full diff: https://github.com/llvm/llvm-project/pull/84886.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+8)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir (+18-10)
  • (modified) llvm/test/CodeGen/AArch64/abs.ll (+15-15)
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 36adada2796531..6aa43f8b14a782 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1021,6 +1021,12 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .clampNumElements(0, v2s32, v4s32)
       .clampNumElements(0, v2s64, v2s64)
       .moreElementsToNextPow2(0)
+      // Do SMax based lowering for < 128 bits.
+      .customIf(
+          [=](const LegalityQuery &Q) {
+            LLT SrcTy = Q.Types[0];
+            return SrcTy.isScalar() && SrcTy.getSizeInBits() < 128;
+          })
       .lower();
 
   // For fadd reductions we have pairwise operations available. We treat the
@@ -1262,6 +1268,8 @@ bool AArch64LegalizerInfo::legalizeCustom(
     return legalizeDynStackAlloc(MI, Helper);
   case TargetOpcode::G_PREFETCH:
     return legalizePrefetch(MI, Helper);
+  case TargetOpcode::G_ABS:
+    return Helper.lowerAbsToMaxNeg(MI);
   }
 
   llvm_unreachable("expected switch to return");
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir
index 3123e304116fe5..cd27d4498e0ef9 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir
@@ -8,11 +8,12 @@ body:             |
   bb.0:
     ; CHECK-LABEL: name: abs_s32
     ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
-    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 31
-    ; CHECK-NEXT: [[ASHR:%[0-9]+]]:_(s32) = G_ASHR [[COPY]], [[C]](s64)
-    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[COPY]], [[ASHR]]
-    ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s32) = G_XOR [[ADD]], [[ASHR]]
-    ; CHECK-NEXT: $w0 = COPY [[XOR]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C]], [[COPY]]
+    ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(sgt), [[COPY]](s32), [[SUB]]
+    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s32), [[COPY]], [[SUB]]
+    ; CHECK-NEXT: $w0 = COPY [[SELECT]](s32)
+    ;
     ; CHECK-CSSC-LABEL: name: abs_s32
     ; CHECK-CSSC: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
     ; CHECK-CSSC-NEXT: [[ABS:%[0-9]+]]:_(s32) = G_ABS [[COPY]]
@@ -28,11 +29,12 @@ body:             |
   bb.0:
     ; CHECK-LABEL: name: abs_s64
     ; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
-    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 63
-    ; CHECK-NEXT: [[ASHR:%[0-9]+]]:_(s64) = G_ASHR [[COPY]], [[C]](s64)
-    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[COPY]], [[ASHR]]
-    ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s64) = G_XOR [[ADD]], [[ASHR]]
-    ; CHECK-NEXT: $x0 = COPY [[XOR]](s64)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
+    ; CHECK-NEXT: [[SUB:%[0-9]+]]:_(s64) = G_SUB [[C]], [[COPY]]
+    ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(sgt), [[COPY]](s64), [[SUB]]
+    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:_(s64) = G_SELECT [[ICMP]](s32), [[COPY]], [[SUB]]
+    ; CHECK-NEXT: $x0 = COPY [[SELECT]](s64)
+    ;
     ; CHECK-CSSC-LABEL: name: abs_s64
     ; CHECK-CSSC: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
     ; CHECK-CSSC-NEXT: [[ABS:%[0-9]+]]:_(s64) = G_ABS [[COPY]]
@@ -55,6 +57,7 @@ body:             |
     ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<4 x s16>) = G_ABS [[COPY]]
     ; CHECK-NEXT: $d0 = COPY [[ABS]](<4 x s16>)
     ; CHECK-NEXT: RET_ReallyLR implicit $d0
+    ;
     ; CHECK-CSSC-LABEL: name: abs_v4s16
     ; CHECK-CSSC: liveins: $d0
     ; CHECK-CSSC-NEXT: {{  $}}
@@ -82,6 +85,7 @@ body:             |
     ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<8 x s16>) = G_ABS [[COPY]]
     ; CHECK-NEXT: $q0 = COPY [[ABS]](<8 x s16>)
     ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    ;
     ; CHECK-CSSC-LABEL: name: abs_v8s16
     ; CHECK-CSSC: liveins: $q0
     ; CHECK-CSSC-NEXT: {{  $}}
@@ -109,6 +113,7 @@ body:             |
     ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<2 x s32>) = G_ABS [[COPY]]
     ; CHECK-NEXT: $d0 = COPY [[ABS]](<2 x s32>)
     ; CHECK-NEXT: RET_ReallyLR implicit $d0
+    ;
     ; CHECK-CSSC-LABEL: name: abs_v2s32
     ; CHECK-CSSC: liveins: $d0
     ; CHECK-CSSC-NEXT: {{  $}}
@@ -136,6 +141,7 @@ body:             |
     ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<4 x s32>) = G_ABS [[COPY]]
     ; CHECK-NEXT: $q0 = COPY [[ABS]](<4 x s32>)
     ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    ;
     ; CHECK-CSSC-LABEL: name: abs_v4s32
     ; CHECK-CSSC: liveins: $q0
     ; CHECK-CSSC-NEXT: {{  $}}
@@ -163,6 +169,7 @@ body:             |
     ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<8 x s8>) = G_ABS [[COPY]]
     ; CHECK-NEXT: $d0 = COPY [[ABS]](<8 x s8>)
     ; CHECK-NEXT: RET_ReallyLR implicit $d0
+    ;
     ; CHECK-CSSC-LABEL: name: abs_v4s8
     ; CHECK-CSSC: liveins: $d0
     ; CHECK-CSSC-NEXT: {{  $}}
@@ -190,6 +197,7 @@ body:             |
     ; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<16 x s8>) = G_ABS [[COPY]]
     ; CHECK-NEXT: $q0 = COPY [[ABS]](<16 x s8>)
     ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    ;
     ; CHECK-CSSC-LABEL: name: abs_v16s8
     ; CHECK-CSSC: liveins: $q0
     ; CHECK-CSSC-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AArch64/abs.ll b/llvm/test/CodeGen/AArch64/abs.ll
index e00f70b94e3b42..bfedd2bce8c86c 100644
--- a/llvm/test/CodeGen/AArch64/abs.ll
+++ b/llvm/test/CodeGen/AArch64/abs.ll
@@ -15,9 +15,9 @@ define i8 @abs_i8(i8 %a){
 ; CHECK-GI-LABEL: abs_i8:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sxtb w8, w0
-; CHECK-GI-NEXT:    asr w8, w8, #7
-; CHECK-GI-NEXT:    add w9, w0, w8
-; CHECK-GI-NEXT:    eor w0, w9, w8
+; CHECK-GI-NEXT:    neg w9, w0
+; CHECK-GI-NEXT:    cmp w8, w9, sxtb
+; CHECK-GI-NEXT:    cneg w0, w0, le
 ; CHECK-GI-NEXT:    ret
 entry:
   %res = call i8 @llvm.abs.i8(i8 %a, i1 0)
@@ -36,9 +36,9 @@ define i16 @abs_i16(i16 %a){
 ; CHECK-GI-LABEL: abs_i16:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sxth w8, w0
-; CHECK-GI-NEXT:    asr w8, w8, #15
-; CHECK-GI-NEXT:    add w9, w0, w8
-; CHECK-GI-NEXT:    eor w0, w9, w8
+; CHECK-GI-NEXT:    neg w9, w0
+; CHECK-GI-NEXT:    cmp w8, w9, sxth
+; CHECK-GI-NEXT:    cneg w0, w0, le
 ; CHECK-GI-NEXT:    ret
 entry:
   %res = call i16 @llvm.abs.i16(i16 %a, i1 0)
@@ -55,9 +55,9 @@ define i32 @abs_i32(i32 %a){
 ;
 ; CHECK-GI-LABEL: abs_i32:
 ; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    asr w8, w0, #31
-; CHECK-GI-NEXT:    add w9, w0, w8
-; CHECK-GI-NEXT:    eor w0, w9, w8
+; CHECK-GI-NEXT:    neg w8, w0
+; CHECK-GI-NEXT:    cmp w0, w8
+; CHECK-GI-NEXT:    cneg w0, w0, le
 ; CHECK-GI-NEXT:    ret
 entry:
   %res = call i32 @llvm.abs.i32(i32 %a, i1 0)
@@ -74,9 +74,9 @@ define i64 @abs_i64(i64 %a){
 ;
 ; CHECK-GI-LABEL: abs_i64:
 ; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    asr x8, x0, #63
-; CHECK-GI-NEXT:    add x9, x0, x8
-; CHECK-GI-NEXT:    eor x0, x9, x8
+; CHECK-GI-NEXT:    neg x8, x0
+; CHECK-GI-NEXT:    cmp x0, x8
+; CHECK-GI-NEXT:    cneg x0, x0, le
 ; CHECK-GI-NEXT:    ret
 entry:
   %res = call i64 @llvm.abs.i64(i64 %a, i1 0)
@@ -248,9 +248,9 @@ define <1 x i32> @abs_v1i32(<1 x i32> %a){
 ; CHECK-GI-LABEL: abs_v1i32:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    fmov w8, s0
-; CHECK-GI-NEXT:    asr w9, w8, #31
-; CHECK-GI-NEXT:    add w8, w8, w9
-; CHECK-GI-NEXT:    eor w8, w8, w9
+; CHECK-GI-NEXT:    neg w9, w8
+; CHECK-GI-NEXT:    cmp w8, w9
+; CHECK-GI-NEXT:    cneg w8, w8, le
 ; CHECK-GI-NEXT:    fmov s0, w8
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-GI-NEXT:    ret


github-actions bot commented Mar 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -1021,6 +1021,11 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.clampNumElements(0, v2s32, v4s32)
.clampNumElements(0, v2s64, v2s64)
.moreElementsToNextPow2(0)
// Do SMax based lowering for < 128 bits.
.customIf([=](const LegalityQuery &Q) {
LLT SrcTy = Q.Types[0];
Contributor

Should the default implementation of lower make a better guess based on legal operations?

Contributor

The customIf action should also move up, below the legal cases.

@tschuett
Member

Do you have an mca proof that smax is better than xor?

@madhur13490
Contributor Author

Do you have an mca proof that smax is better than xor?

I understand the question, but my intention with this patch is to take the codegen closer to what SDAG is doing. However, if you look at the i8 case, it is not quite there yet, and I am also not sure the final assembly is right. Ideally, it should just be cmp+cneg (which is what SDAG is doing).

If we get it closer to SDAG's codegen, the benefit would be visible: GlobalISel's current lowering uses 3 instructions whereas SDAG uses just cmp+cneg.
I am looking for suggestions around this.

@aemerson
Contributor

I agree with Matt, lower() should be checking for the legality of SMAX and use that if available, falling back to the current XOR method, see TargetLowering::expandABS(). But for AArch64 that's only legal with the CSSC extension, so you'll need to check why we generate the compare lowering in DAG at all.
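For context, the DAG-side expansion being referenced roughly follows this strategy. This is a paraphrased sketch (the function name is illustrative and the real TargetLowering::expandABS handles more cases):

// Rough paraphrase of the DAG expansion idea, not the exact upstream code:
// prefer smax(x, 0 - x) when SMAX is legal/custom, otherwise fall back to
// the sra/add/xor trick.
static SDValue expandAbsSketch(SDNode *N, SelectionDAG &DAG,
                               const TargetLowering &TLI) {
  SDLoc DL(N);
  EVT VT = N->getValueType(0);
  SDValue Op = N->getOperand(0);

  if (TLI.isOperationLegalOrCustom(ISD::SMAX, VT)) {
    SDValue Zero = DAG.getConstant(0, DL, VT);
    SDValue Neg = DAG.getNode(ISD::SUB, DL, VT, Zero, Op);
    return DAG.getNode(ISD::SMAX, DL, VT, Op, Neg);
  }

  // Fallback: abs(x) = (x + sra(x, BW-1)) ^ sra(x, BW-1)
  EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
  SDValue ShAmt = DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, ShVT);
  SDValue Sign = DAG.getNode(ISD::SRA, DL, VT, Op, ShAmt);
  SDValue Add = DAG.getNode(ISD::ADD, DL, VT, Op, Sign);
  return DAG.getNode(ISD::XOR, DL, VT, Add, Sign);
}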

@davemgreen
Collaborator

Given we have a csneg instruction, do we just want cmp x, 0; csneg x, x? i.e. I would expect us to want to lower to a compare+neg+select (which isn't either of the existing lowerAbs* methods, and might not be easy to tell generically that we want to generate code for, if there is not an obvious way to check for csneg).

@madhur13490
Contributor Author

Given we have a csneg instruction, do we just want cmp x, 0; csneg x, x? i.e. I would expect us to want to lower to a compare+neg+select (which isn't either of the existing lowerAbs* methods, and might not be easy to tell generically that we want to generate code for, if there is not an obvious way to check for csneg).

@davemgreen Yes, I was experimenting with this idea locally after this patch, but was running into a G_SELECT legalization issue. I have fixed it, and the new commit introduces cneg-based lowering. Please have a look.

@arsenm @aemerson Given that there is a different approach now, do you still think lower() should be modified? If so, how do we reliably detect? Can you please elaborate a bit more?

@arsenm
Contributor

arsenm commented Mar 14, 2024

@arsenm @aemerson Given that there is a different approach now, do you still think lower() should be modified? If so, how do we reliably detect? Can you please elaborate a bit more?

Yes. Currently the lower implementation just calls lowerAbsToAddXor. You can conditionally call your new implementation based on isLegalOrCustom for the type with the relevant opcodes used in the expansion
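A sketch of what that could look like (illustrative only; lowerAbsToCNeg is the new helper discussed in this PR, and the compare-result type used in the queries is a guess, which is exactly what later comments debate):

// Illustrative sketch of a legality-guided default lowering for G_ABS inside
// LegalizerHelper::lower(); not the code that was merged.
case TargetOpcode::G_ABS: {
  LLT Ty = MRI.getType(MI.getOperand(0).getReg());
  // Guess at the compare result type; GlobalISel has no getSetCCResultType
  // equivalent (see the later discussion in this review).
  LLT CondTy = LLT::scalar(1);
  if (LI.isLegalOrCustom({TargetOpcode::G_ICMP, {CondTy, Ty}}) &&
      LI.isLegalOrCustom({TargetOpcode::G_SELECT, {Ty, CondTy}}) &&
      LI.isLegalOrCustom({TargetOpcode::G_SUB, {Ty}}))
    return lowerAbsToCNeg(MI); // compare/negate/select form
  return lowerAbsToAddXor(MI); // existing add+xor default
}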

@tschuett
Member

Even if SMAX and lowerAbsToAddXor are legal, shouldn't we have the means to say for FOO cores, we prefer SMAX, and for BAR cores we prefer Xor?

@arsenm
Contributor

arsenm commented Mar 14, 2024

Even if SMAX and lowerAbsToAddXor are legal, shouldn't we have the means to say for FOO cores, we prefer SMAX, and for BAR cores we prefer Xor?

You can always put the decision logic in custom lowering, but I think the default lower should try to guess a reasonable default

@madhur13490
Contributor Author

@arsenm @aemerson Given that there is a different approach now, do you still think lower() should be modified? If so, how do we reliably detect? Can you please elaborate a bit more?

Yes. Currently the lower implementation just calls lowerAbsToAddXor. You can conditionally call your new implementation based on isLegalOrCustom for the type with the relevant opcodes used in the expansion

I have updated. Please have a look.

case G_ABS:
case G_ABS: {
LLT Ty = MRI.getType(MI.getOperand(0).getReg());
if (LI.isLegalOrCustom({G_ABS, Ty}))
Contributor

Checking the G_ABS legality doesn't make sense in the G_ABS lowering implementation. You should instead check the operations used in the expansion.

Collaborator

I think the problem might be that there is no good way with the isLegal calls to check if there is a single instruction that does "select and conditional neg" at the same time, in the way csneg would do.

Contributor Author

Maybe I can check for G_SELECT?

Contributor

For reference, the DAG has this chain of implementations:

SDValue TargetLowering::expandABS(SDNode *N, SelectionDAG &DAG,

Contributor Author

Checking for G_SELECT and G_ICMP leads to an assertion when compiling abs.ll for AMDGPU:
const T &llvm::ArrayRef<llvm::LLT>::operator[](size_t) const [T = llvm::LLT]: Assertion Index < Length && "Invalid index!" failed.

Contributor Author

LLT Ty = MRI.getType(MI.getOperand(0).getReg());
dbgs() << LI.isLegalOrCustom({G_ICMP, Ty});

This leads to an assert on the llvm.abs.ll test, and it doesn't need my patch to trigger.
@arsenm Thoughts?
If this is uncovering a latent issue, then I would like to go with legalizeCustom() and follow @davemgreen's suggestion. This also aligns with what RISCV is doing.

Contributor

Checking for them how?

The specific instruction AArch64 wants to use is only indirectly related

Contributor Author

Found that the assert triggers for the llvm.abs.i64 case in AMDGPU/GlobalISel/llvm.abs.ll. Constraining the lowering to < 8 bytes avoids the assert. Do you know what might be causing this?

Contributor

Don't know what that has to do with abs, but this isn't a well-formed query for G_ICMP legality. G_ICMP has 2 type indexes.
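In other words, a well-formed query has to supply both type indexes, e.g. (illustrative; the choice of result type is up to the target):

// G_ICMP has two type indexes: index 0 is the boolean result type and
// index 1 is the operand type, so the legality query needs both.
LLT Ty = MRI.getType(MI.getOperand(0).getReg());
LLT CondTy = LLT::scalar(32); // or s1, depending on what the target prefers
bool Ok = LI.isLegalOrCustom({TargetOpcode::G_ICMP, {CondTy, Ty}});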

Contributor Author

Hmm... I was looking at how G_SSUBSAT is using G_SMIN and followed the same approach.
Anyway, updated the branch to check both type indexes.

Collaborator

@davemgreen davemgreen left a comment

I think considering these are fairly AArch64 specific instructions, it might be best to just make this custom, in the same way we have it for SDAG.

@@ -515,6 +515,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
getActionDefinitionsBuilder(G_ICMP)
.legalFor({{s32, s32},
{s32, s64},
{s64, s64},
Collaborator

I think this and the select below are marking an s64 condition code as legal, which I'm not sure is what we would want. We have the clampScalar(0, s32, s32) below.

Contributor Author

Well, the s64 case doesn't work without this. Why wouldn't we want this to be legal?

Collaborator

I believe this is the flag value between the cmp+select. I'm not sure what the ramifications of making it i64 would be, but it might be better to keep it as i32 if we can, to avoid awkward cases like select(anyext(cmp(..)), ..).

The nodes should be revisited until they are legal. It looks like G_SELECT isn't handling narrowScalar yet, probably because it hasn't come up before:

LegalizerHelper::LegalizeResult
LegalizerHelper::narrowScalarSelect(MachineInstr &MI, unsigned TypeIdx,
                                    LLT NarrowTy) {
  if (TypeIdx != 0)
    return UnableToLegalize;

Hopefully if that is handled correctly for the icmp/sel, it will work better?
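A rough sketch of the alternative shape being described (illustrative only; the real AArch64 G_ICMP rules have more cases than this): keep the compare result clamped to s32 instead of marking {s64, s64} legal, and rely on the legalizer revisiting the select:

// Illustrative only, not the actual AArch64LegalizerInfo rules: the compare
// result (type index 0) stays s32, so the value between the cmp and the
// select remains i32 rather than becoming a legal s64 condition.
getActionDefinitionsBuilder(G_ICMP)
    .legalFor({{s32, s32}, {s32, s64}, {s32, p0}})
    .widenScalarToNextPow2(1)
    .clampScalar(1, s32, s64)  // operand type
    .clampScalar(0, s32, s32); // condition/result type stays s32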

Contributor Author

{s64, s64} is legal for G_SELECT, but then the index is clamped to s32; I don't understand why this is done.
Taking inspiration from that, I made {s64, s64} legal for G_ICMP.

case G_ABS:
case G_ABS: {
LLT Ty = MRI.getType(MI.getOperand(0).getReg());
if (LI.isLegalOrCustom({G_ABS, Ty}))
Collaborator

I think the problem might be that there is no good way with the isLegal calls to check if there is a single instruction that does "select and conditional neg" at the same time, in the way csneg would do.

@madhur13490
Contributor Author

I think considering these are fairly AArch64 specific instructions, it might be best to just make this custom, in the same way we have it for SDAG.

Do you mean just use customIf for scalars and not have anything in lower()?

@madhur13490 force-pushed the madhur13490/cneg branch 2 times, most recently from 0c41e1a to 49d5931 on March 18, 2024 at 14:48
@tschuett self-requested a review on March 18, 2024 at 19:40
case G_ABS: {
LLT Ty = MRI.getType(MI.getOperand(0).getReg());
if (Ty.isScalar() && LI.isLegalOrCustom({G_ICMP, {Ty, Ty}}))
return lowerAbsToCNeg(MI);
Member

You probably should check for:

isLegalOrCustom({G_ICMP, {Ty, Ty}}) &&
isLegalOrCustom({G_SELECT, {Ty, Ty}}) &&
isLegalOrCustom({G_SUB, {Ty}}) &&
isLegalOrCustom({G_CONSTANT, {Ty}})

Collaborator

I personally don't know if there is a good check that captures what we would want through the isLegalOrCustom interface. I think many targets would want the ashr/AddXor form even if they have legal sub+sel. I would remove this code and just keep the AArch64LegalizerInfo::legalizeCustom method.

Contributor

Reusing the same type for the compare result isn't right, but we don't have a great API for this. We kind of want the equivalent of getSetCCResultType. A more awkward solution would be to query with s1 and then look through the legalize actions until you find transforms on the original. Failing that, using i1 as the basis of the query is a more reasonable choice I think

Member

My main point is that we should do something akin to the DAG. We check for the legality of icmp and then build a select without checking whether it is legal or not. The icmp check alone is insufficient.

Contributor Author

@tschuett How would that handle @arsenm's concern?

I agree with @davemgreen's approach and will update the patch accordingly. I would rather not touch lower() for now and instead handle this in legalizeCustom(). Given that we don't have the right APIs, the right approach is to handle this separately.

Member

These are independent issues. We have no API to query the result type from compares. But building a Select without checking whether it is legal to do so may cause trouble.

Contributor Author

But that would still reuse the same type (as @arsenm pointed out), wouldn't it?

"Reusing the same type for the compare result isn't right"

So how would it solve the core issue?
I understand the lack of an API, and that is why I would like to avoid changing lower() in this patch.

@arsenm
Contributor

arsenm commented Mar 19, 2024

Do you mean just use customIf for scalars and not have anything in lower()?

Regardless of what AArch64 wants or does, we should still try to pick a preferred lowering in the default implementation of lower.

This patch improves codegen for the scalar (<128-bit) versions of the
llvm.abs intrinsic by using the existing non-XOR-based lowering.
This takes the generated code closer to SDAG.

Codegen with GISel for >128-bit types is not very good with this method,
so it is not used there.
@madhur13490
Contributor Author

Updated the patch as discussed above. It also handles the G_SELECT issue.

Collaborator

@davemgreen davemgreen left a comment

This LGTM. Thanks.

@madhur13490 merged commit 7bb87d5 into llvm:main on Mar 21, 2024
4 checks passed
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024

This patch improves codegen for the scalar (<128-bit) versions of the
llvm.abs intrinsic by using the existing non-XOR-based lowering.
This takes the generated code closer to SDAG.

Codegen with GISel for >128-bit types is not very good with this method,
so it is not used there.