[AArch64][GlobalISel] Take abs scalar codegen closer to SDAG #84886
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-globalisel

Author: Madhur Amilkanthwar (madhur13490)

Changes: This patch improves codegen for the scalar (< 128 bits) version of the llvm.abs intrinsic. Codegen with GISel for > 128-bit types is not very good, so those are left to the existing lowering.

Full diff: https://github.com/llvm/llvm-project/pull/84886.diff — 3 Files Affected:
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 36adada2796531..6aa43f8b14a782 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1021,6 +1021,12 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.clampNumElements(0, v2s32, v4s32)
.clampNumElements(0, v2s64, v2s64)
.moreElementsToNextPow2(0)
+ // Do SMax based lowering for < 128 bits.
+ .customIf(
+ [=](const LegalityQuery &Q) {
+ LLT SrcTy = Q.Types[0];
+ return SrcTy.isScalar() && SrcTy.getSizeInBits() < 128;
+ })
.lower();
// For fadd reductions we have pairwise operations available. We treat the
@@ -1262,6 +1268,8 @@ bool AArch64LegalizerInfo::legalizeCustom(
return legalizeDynStackAlloc(MI, Helper);
case TargetOpcode::G_PREFETCH:
return legalizePrefetch(MI, Helper);
+ case TargetOpcode::G_ABS:
+ return Helper.lowerAbsToMaxNeg(MI);
}
llvm_unreachable("expected switch to return");
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir
index 3123e304116fe5..cd27d4498e0ef9 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-abs.mir
@@ -8,11 +8,12 @@ body: |
bb.0:
; CHECK-LABEL: name: abs_s32
; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 31
- ; CHECK-NEXT: [[ASHR:%[0-9]+]]:_(s32) = G_ASHR [[COPY]], [[C]](s64)
- ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[COPY]], [[ASHR]]
- ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s32) = G_XOR [[ADD]], [[ASHR]]
- ; CHECK-NEXT: $w0 = COPY [[XOR]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C]], [[COPY]]
+ ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(sgt), [[COPY]](s32), [[SUB]]
+ ; CHECK-NEXT: [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[ICMP]](s32), [[COPY]], [[SUB]]
+ ; CHECK-NEXT: $w0 = COPY [[SELECT]](s32)
+ ;
; CHECK-CSSC-LABEL: name: abs_s32
; CHECK-CSSC: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-CSSC-NEXT: [[ABS:%[0-9]+]]:_(s32) = G_ABS [[COPY]]
@@ -28,11 +29,12 @@ body: |
bb.0:
; CHECK-LABEL: name: abs_s64
; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 63
- ; CHECK-NEXT: [[ASHR:%[0-9]+]]:_(s64) = G_ASHR [[COPY]], [[C]](s64)
- ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[COPY]], [[ASHR]]
- ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s64) = G_XOR [[ADD]], [[ASHR]]
- ; CHECK-NEXT: $x0 = COPY [[XOR]](s64)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
+ ; CHECK-NEXT: [[SUB:%[0-9]+]]:_(s64) = G_SUB [[C]], [[COPY]]
+ ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(sgt), [[COPY]](s64), [[SUB]]
+ ; CHECK-NEXT: [[SELECT:%[0-9]+]]:_(s64) = G_SELECT [[ICMP]](s32), [[COPY]], [[SUB]]
+ ; CHECK-NEXT: $x0 = COPY [[SELECT]](s64)
+ ;
; CHECK-CSSC-LABEL: name: abs_s64
; CHECK-CSSC: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
; CHECK-CSSC-NEXT: [[ABS:%[0-9]+]]:_(s64) = G_ABS [[COPY]]
@@ -55,6 +57,7 @@ body: |
; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<4 x s16>) = G_ABS [[COPY]]
; CHECK-NEXT: $d0 = COPY [[ABS]](<4 x s16>)
; CHECK-NEXT: RET_ReallyLR implicit $d0
+ ;
; CHECK-CSSC-LABEL: name: abs_v4s16
; CHECK-CSSC: liveins: $d0
; CHECK-CSSC-NEXT: {{ $}}
@@ -82,6 +85,7 @@ body: |
; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<8 x s16>) = G_ABS [[COPY]]
; CHECK-NEXT: $q0 = COPY [[ABS]](<8 x s16>)
; CHECK-NEXT: RET_ReallyLR implicit $q0
+ ;
; CHECK-CSSC-LABEL: name: abs_v8s16
; CHECK-CSSC: liveins: $q0
; CHECK-CSSC-NEXT: {{ $}}
@@ -109,6 +113,7 @@ body: |
; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<2 x s32>) = G_ABS [[COPY]]
; CHECK-NEXT: $d0 = COPY [[ABS]](<2 x s32>)
; CHECK-NEXT: RET_ReallyLR implicit $d0
+ ;
; CHECK-CSSC-LABEL: name: abs_v2s32
; CHECK-CSSC: liveins: $d0
; CHECK-CSSC-NEXT: {{ $}}
@@ -136,6 +141,7 @@ body: |
; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<4 x s32>) = G_ABS [[COPY]]
; CHECK-NEXT: $q0 = COPY [[ABS]](<4 x s32>)
; CHECK-NEXT: RET_ReallyLR implicit $q0
+ ;
; CHECK-CSSC-LABEL: name: abs_v4s32
; CHECK-CSSC: liveins: $q0
; CHECK-CSSC-NEXT: {{ $}}
@@ -163,6 +169,7 @@ body: |
; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<8 x s8>) = G_ABS [[COPY]]
; CHECK-NEXT: $d0 = COPY [[ABS]](<8 x s8>)
; CHECK-NEXT: RET_ReallyLR implicit $d0
+ ;
; CHECK-CSSC-LABEL: name: abs_v4s8
; CHECK-CSSC: liveins: $d0
; CHECK-CSSC-NEXT: {{ $}}
@@ -190,6 +197,7 @@ body: |
; CHECK-NEXT: [[ABS:%[0-9]+]]:_(<16 x s8>) = G_ABS [[COPY]]
; CHECK-NEXT: $q0 = COPY [[ABS]](<16 x s8>)
; CHECK-NEXT: RET_ReallyLR implicit $q0
+ ;
; CHECK-CSSC-LABEL: name: abs_v16s8
; CHECK-CSSC: liveins: $q0
; CHECK-CSSC-NEXT: {{ $}}
diff --git a/llvm/test/CodeGen/AArch64/abs.ll b/llvm/test/CodeGen/AArch64/abs.ll
index e00f70b94e3b42..bfedd2bce8c86c 100644
--- a/llvm/test/CodeGen/AArch64/abs.ll
+++ b/llvm/test/CodeGen/AArch64/abs.ll
@@ -15,9 +15,9 @@ define i8 @abs_i8(i8 %a){
; CHECK-GI-LABEL: abs_i8:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: sxtb w8, w0
-; CHECK-GI-NEXT: asr w8, w8, #7
-; CHECK-GI-NEXT: add w9, w0, w8
-; CHECK-GI-NEXT: eor w0, w9, w8
+; CHECK-GI-NEXT: neg w9, w0
+; CHECK-GI-NEXT: cmp w8, w9, sxtb
+; CHECK-GI-NEXT: cneg w0, w0, le
; CHECK-GI-NEXT: ret
entry:
%res = call i8 @llvm.abs.i8(i8 %a, i1 0)
@@ -36,9 +36,9 @@ define i16 @abs_i16(i16 %a){
; CHECK-GI-LABEL: abs_i16:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: sxth w8, w0
-; CHECK-GI-NEXT: asr w8, w8, #15
-; CHECK-GI-NEXT: add w9, w0, w8
-; CHECK-GI-NEXT: eor w0, w9, w8
+; CHECK-GI-NEXT: neg w9, w0
+; CHECK-GI-NEXT: cmp w8, w9, sxth
+; CHECK-GI-NEXT: cneg w0, w0, le
; CHECK-GI-NEXT: ret
entry:
%res = call i16 @llvm.abs.i16(i16 %a, i1 0)
@@ -55,9 +55,9 @@ define i32 @abs_i32(i32 %a){
;
; CHECK-GI-LABEL: abs_i32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: asr w8, w0, #31
-; CHECK-GI-NEXT: add w9, w0, w8
-; CHECK-GI-NEXT: eor w0, w9, w8
+; CHECK-GI-NEXT: neg w8, w0
+; CHECK-GI-NEXT: cmp w0, w8
+; CHECK-GI-NEXT: cneg w0, w0, le
; CHECK-GI-NEXT: ret
entry:
%res = call i32 @llvm.abs.i32(i32 %a, i1 0)
@@ -74,9 +74,9 @@ define i64 @abs_i64(i64 %a){
;
; CHECK-GI-LABEL: abs_i64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: asr x8, x0, #63
-; CHECK-GI-NEXT: add x9, x0, x8
-; CHECK-GI-NEXT: eor x0, x9, x8
+; CHECK-GI-NEXT: neg x8, x0
+; CHECK-GI-NEXT: cmp x0, x8
+; CHECK-GI-NEXT: cneg x0, x0, le
; CHECK-GI-NEXT: ret
entry:
%res = call i64 @llvm.abs.i64(i64 %a, i1 0)
@@ -248,9 +248,9 @@ define <1 x i32> @abs_v1i32(<1 x i32> %a){
; CHECK-GI-LABEL: abs_v1i32:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: fmov w8, s0
-; CHECK-GI-NEXT: asr w9, w8, #31
-; CHECK-GI-NEXT: add w8, w8, w9
-; CHECK-GI-NEXT: eor w8, w8, w9
+; CHECK-GI-NEXT: neg w9, w8
+; CHECK-GI-NEXT: cmp w8, w9
+; CHECK-GI-NEXT: cneg w8, w8, le
; CHECK-GI-NEXT: fmov s0, w8
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-GI-NEXT: ret
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 7ec0ad3 to d1e00f3.
@@ -1021,6 +1021,11 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
      .clampNumElements(0, v2s32, v4s32)
      .clampNumElements(0, v2s64, v2s64)
      .moreElementsToNextPow2(0)
+     // Do SMax based lowering for < 128 bits.
+     .customIf([=](const LegalityQuery &Q) {
+       LLT SrcTy = Q.Types[0];
Should the default implementation of lower make a better guess based on legal operations?
The customIf action should also go up below the legal cases
Do you have an mca proof that smax is better than xor?
I understand the question, but my intention with this patch is to take the codegen closer to what SDAG is doing. However, if you look at the i8 case, it is not quite there yet, and I am also not sure the final assembly is right. Ideally it should be just cmp+cneg (which is what SDAG is doing). If we get it closer to SDAG's codegen, the benefit would be visible: GlobalISel's current lowering produces 3 instructions whereas SDAG uses just cmp+cneg.
I agree with Matt,
Given we have a csneg instruction, do we just want
Force-pushed from d1e00f3 to 62e994b.
@davemgreen Yes, I was experimenting with this idea locally after this patch but was running into a G_SELECT legalization issue. I have fixed it now, and the new commit introduces cneg-based lowering. Please have a look. @arsenm @aemerson Given that there is a different approach now, do you still think
Yes. Currently the lower implementation just calls lowerAbsToAddXor. You can conditionally call your new implementation based on isLegalOrCustom for the type with the relevant opcodes used in the expansion |
Even if SMAX and lowerAbsToAddXor are legal, shouldn't we have the means to say for FOO cores, we prefer SMAX, and for BAR cores we prefer Xor?
You can always put the decision logic in custom lowering, but I think the default lower should try to guess a reasonable default.
Force-pushed from 62e994b to 3ab319f.
I have updated. Please have a look.
-  case G_ABS:
+  case G_ABS: {
+    LLT Ty = MRI.getType(MI.getOperand(0).getReg());
+    if (LI.isLegalOrCustom({G_ABS, Ty}))
Checking the G_ABS legality doesn't make sense in the G_ABS lowering implementation. You should need to check the operations in the expansion
I think the problem might be that there is no good way with the isLegal calls to check if there is a single instruction that does "select and conditional neg" at the same time, in the way csneg would do.
Maybe I can check for G_SELECT?
For reference, the DAG has this chain of implementations:
SDValue TargetLowering::expandABS(SDNode *N, SelectionDAG &DAG,
Checking for G_SELECT and G_ICMP leads to assertion when compiled for AMDGPU in abs.ll.
const T &llvm::ArrayRef<llvm::LLT>::operator[](size_t) const [T = llvm::LLT]: Assertion Index < Length && "Invalid index!" failed.
LLT Ty = MRI.getType(MI.getOperand(0).getReg());
dbgs() << LI.isLegalOrCustom({G_ICMP, Ty});
Leads to an assert for llvm.abs.ll test. This doesn't need my patch to trigger an assert.
@arsenm Thoughts?
If this is uncovering a latent issue, then I would like to go with legalizeCustom() and follow @davemgreen's suggestion. This also aligns with what RISCV is doing.
Checking for them how?
The specific instruction AArch64 wants to use is only indirectly related
Found that the assert is triggering for the llvm.abs.i64 case in AMDGPU/GlobalISel/llvm.abs.ll. Constraining the lowering to < 8 bytes has no assert. Do you know what might be causing this?
Don't know what that has to do with abs, but this isn't a well formed query for G_ICMP legality. G_ICMP has 2 type indexes
Hmm... I was looking at how G_SSUBSAT is using G_SMIN and followed the same approach.
Anyway, updated the branch to check both type indexes.
I think considering these are fairly AArch64 specific instructions, it might be best to just make this custom, in the same way we have it for SDAG.
@@ -515,6 +515,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
   getActionDefinitionsBuilder(G_ICMP)
       .legalFor({{s32, s32},
                  {s32, s64},
+                 {s64, s64},
I think this, and the select below, is marking an s64 condition code as legal, which I'm not sure is what we would want. We have the clampScalar(0, s32, s32) below.
Well, the s64 case doesn't work without this. Why don't we want this to be legal?
I believe this is the flags between the cmp+select. I'm not sure what the ramifications of making them i64 would be, but it might be better if we can keep them as i32 if we can, to avoid awkward cases like select(anyext(cmp(..)), ..).
The nodes should be re-visited until they are legal. It looks like G_SELECT isn't handling narrowScalar yet, probably because it hasn't come up before:
LegalizerHelper::LegalizeResult
LegalizerHelper::narrowScalarSelect(MachineInstr &MI, unsigned TypeIdx,
LLT NarrowTy) {
if (TypeIdx != 0)
return UnableToLegalize;
Hopefully if that is handled correctly for the icmp/sel, it will work better?
{s64, s64} is legal for G_SELECT, but then the index is clamped to s32; I don't understand why that is done. Taking inspiration from this, I made {s64, s64} legal for G_ICMP.
-  case G_ABS:
+  case G_ABS: {
+    LLT Ty = MRI.getType(MI.getOperand(0).getReg());
+    if (LI.isLegalOrCustom({G_ABS, Ty}))
I think the problem might be that there is no good way with the isLegal calls to check if there is a single instruction that does "select and conditional neg" at the same time, in the way csneg would do.
Do you mean just use
Force-pushed from 0c41e1a to 49d5931.
case G_ABS: {
  LLT Ty = MRI.getType(MI.getOperand(0).getReg());
  if (Ty.isScalar() && LI.isLegalOrCustom({G_ICMP, {Ty, Ty}}))
    return lowerAbsToCNeg(MI);
You probably should check for:
isLegalOrCustom({G_ICMP, {Ty, Ty}}) &&
isLegalOrCustom({G_SELECT, {Ty, Ty}}) &&
isLegalOrCustom({G_SUB, {Ty}}) &&
isLegalOrCustom({G_CONSTANT, {Ty}})
I personally don't know if there is a good check that captures what we would want through the isLegalOrCustom interface. I think many targets would want the ashr/AddXor form even if they have legal sub+sel. I would remove this code and just keep the AArch64LegalizerInfo::legalizeCustom method.
Reusing the same type for the compare result isn't right, but we don't have a great API for this. We kind of want the equivalent of getSetCCResultType. A more awkward solution would be to query with s1 and then look through the legalize actions until you find transforms on the original. Failing that, using i1 as the basis of the query is a more reasonable choice I think
My main point is that we should do something akin to the DAG. We check for the legality of icmp and then build a select without checking whether it is legal or not. The icmp check is insufficient.
@tschuett How would that handle @arsenm's concern?
I agree with @davemgreen's approach and will update the patch accordingly. I would not like to touch lower() as of now and will handle it in legalizeCustom(). Given that we don't have the right APIs, the right approach would be to handle this separately.
These are independent issues. We have no API to query the result type from compares. But building a Select without checking whether it is legal to do so may cause trouble.
But that would still reuse the same type (as @arsenm pointed out), wouldn't it?
"Reusing the same type for the compare result isn't right"
So how would it solve the core issue?
I understand the lack of an API, and thus I would like to avoid changing lower() in this patch.
Regardless of what AArch64 wants or does, we should still try to pick a preferred lowering in the default implementation of lower.
This patch improves codegen for the scalar (< 128 bits) version of the llvm.abs intrinsic by using the existing non-XOR based lowering. This takes the generated code closer to SDAG. Codegen with GISel for > 128-bit types is not very good with this method, so it is not used there.
Force-pushed from 49d5931 to 8e7134f.
Updated the patch as discussed above. It also handles the G_SELECT issue.
This LGTM. Thanks.
This patch improves codegen for the scalar (< 128 bits) version of the llvm.abs intrinsic by using the existing non-XOR based lowering. This takes the generated code closer to SDAG. Codegen with GISel for > 128-bit types is not very good with this method, so it is not used there.