Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Fix mul combine for MUL24 #79110

Merged
merged 5 commits into from
Jan 29, 2024
Merged

Conversation

Pierre-vh
Copy link
Contributor

MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.

Fixes SWDEV-436654

MUL24 can now return  a i64 for i32 operands, but the combine was never updated to handle this case.
Extend the operand when rewriting the ADD to handle it.
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 23, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

MUL24 can now return a i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it.

Fixes SWDEV-436654


Full diff: https://github.com/llvm/llvm-project/pull/79110.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+2-2)
  • (added) llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll (+47)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 55d95154c75878b..109e86eb4117a2f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -4246,12 +4246,12 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,
   // operands, so we have to place the mul in the LHS
   if (SDValue MulOper = IsFoldableAdd(N0)) {
     SDValue MulVal = DAG.getNode(N->getOpcode(), DL, VT, N1, MulOper);
-    return DAG.getNode(ISD::ADD, DL, VT, MulVal, N1);
+    return DAG.getNode(ISD::ADD, DL, VT, MulVal, DAG.getZExtOrTrunc(N1, DL, VT));
   }
 
   if (SDValue MulOper = IsFoldableAdd(N1)) {
     SDValue MulVal = DAG.getNode(N->getOpcode(), DL, VT, N0, MulOper);
-    return DAG.getNode(ISD::ADD, DL, VT, MulVal, N0);
+    return DAG.getNode(ISD::ADD, DL, VT, MulVal, DAG.getZExtOrTrunc(N0, DL, VT));
   }
 
   // Skip if already mul24.
diff --git a/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll b/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll
new file mode 100644
index 000000000000000..624c8aa859c0939
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/mul-combine-crash.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s
+
+; Checks that the DAG mul combine can handle a MUL24 with a i32 and i64
+; operand.
+
+define i64 @test(i64 %x, i32 %z) {
+; CHECK-LABEL: test:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_and_b32_e32 v0, 0xff, v0
+; CHECK-NEXT:    v_and_b32_e32 v2, 1, v2
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[0:1]
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %a = add i64 %x, 0
+  %b = and i64 %a, 255
+  %c = and i32 %z, 1
+  %d = add nuw nsw i32 %c, 1
+  %e = zext nneg i32 %d to i64
+  %f = mul nuw nsw i64 %b, %e
+  %g = add nuw nsw i64 %f, 0
+  ret i64 %g
+}
+
+define i64 @test_swapped(i64 %x, i32 %z) {
+; CHECK-LABEL: test_swapped:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_and_b32_e32 v0, 0xff, v0
+; CHECK-NEXT:    v_and_b32_e32 v2, 1, v2
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[0:1]
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %a = add i64 %x, 0
+  %b = and i64 %a, 255
+  %c = and i32 %z, 1
+  %d = add nuw nsw i32 %c, 1
+  %e = zext nneg i32 %d to i64
+  %f = mul nuw nsw i64 %e, %b
+  %g = add nuw nsw i64 %f, 0
+  ret i64 %g
+}

Copy link

github-actions bot commented Jan 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"mul24 x, (add y, 1) -> add (mul24 x, y), x" does not seem safe at all. You know that y+1 is a 24-bit value, but that does not guarantee that y is a 24-bit value.

@Pierre-vh
Copy link
Contributor Author

"mul24 x, (add y, 1) -> add (mul24 x, y), x" does not seem safe at all. You know that y+1 is a 24-bit value, but that does not guarantee that y is a 24-bit value.

Ah indeed, good catch.
Should the combine just be removed? (cc @arsenm - I think you wrote this one)

@jayfoad
Copy link
Contributor

jayfoad commented Jan 24, 2024

"mul24 x, (add y, 1) -> add (mul24 x, y), x" does not seem safe at all. You know that y+1 is a 24-bit value, but that does not guarantee that y is a 24-bit value.

Ah indeed, good catch. Should the combine just be removed?

I guess it's safe for a regular mul, so maybe move the "Skip if already mul24" up a bit?

@arsenm
Copy link
Contributor

arsenm commented Jan 25, 2024

This probably needs work to extend for the mul24 cases, with some known bits checks. Probably should just skip for it

@@ -4254,10 +4254,6 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,
return DAG.getNode(ISD::ADD, DL, VT, MulVal, N0);
}

// Skip if already mul24.
if (N->getOpcode() != ISD::MUL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should turn this into an assert? Is there definitely nothing else in performMulCombine that can operate on mul24?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is anything else it can do at first glance

@Pierre-vh Pierre-vh requested a review from arsenm January 29, 2024 12:28
%d = add nuw nsw i32 %c, 1
%e = zext nneg i32 %d to i64
%f = mul nuw nsw i64 %e, %b
%g = add nuw nsw i64 %f, 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the adds of 0 really necessary? Can the test merge in with the existing test?

@Pierre-vh Pierre-vh merged commit ce72f78 into llvm:main Jan 29, 2024
3 of 4 checks passed
@Pierre-vh Pierre-vh deleted the fix-mul-comb branch January 29, 2024 15:37
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 28, 2024
MUL24 can now return a i64 for i32 operands, but the combine was never
updated to handle this case. Extend the operand when rewriting the ADD
to handle it.

Fixes SWDEV-436654

Change-Id: I0d0bf5b5f45b901f2ec60bdefb9597233d1ce482
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants