Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DXIL] Implement pow lowering #86733

Merged
merged 1 commit into from
Mar 28, 2024
Merged

[DXIL] Implement pow lowering #86733

merged 1 commit into from
Mar 28, 2024

Conversation

farzonl
Copy link
Member

@farzonl farzonl commented Mar 26, 2024

closes #86179

  • DXILIntrinsicExpansion.cpp - add the pow expansion to exp2(y*log2(x))

closes llvm#86179
- `DXILIntrinsicExpansion.cpp` - add the pow expansion to exp2(y*log2(x))
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 26, 2024

@llvm/pr-subscribers-backend-directx

Author: Farzon Lotfi (farzonl)

Changes

closes #86179

  • DXILIntrinsicExpansion.cpp - add the pow expansion to exp2(y*log2(x))

Full diff: https://github.com/llvm/llvm-project/pull/86733.diff

3 Files Affected:

  • (modified) llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp (+23)
  • (added) llvm/test/CodeGen/DirectX/pow-vec.ll (+15)
  • (added) llvm/test/CodeGen/DirectX/pow.ll (+29)
diff --git a/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp b/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp
index 3cc375edabde92..3e2d10f5ee7a23 100644
--- a/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp
+++ b/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp
@@ -37,6 +37,7 @@ static bool isIntrinsicExpansion(Function &F) {
   case Intrinsic::exp:
   case Intrinsic::log:
   case Intrinsic::log10:
+  case Intrinsic::pow:
   case Intrinsic::dx_any:
   case Intrinsic::dx_clamp:
   case Intrinsic::dx_uclamp:
@@ -197,6 +198,26 @@ static bool expandLog10Intrinsic(CallInst *Orig) {
   return expandLogIntrinsic(Orig, numbers::ln2f / numbers::ln10f);
 }
 
+static bool expandPowIntrinsic(CallInst *Orig) {
+
+  Value *X = Orig->getOperand(0);
+  Value *Y = Orig->getOperand(1);
+  Type *Ty = X->getType();
+  IRBuilder<> Builder(Orig->getParent());
+  Builder.SetInsertPoint(Orig);
+
+  auto *Log2Call =
+      Builder.CreateIntrinsic(Ty, Intrinsic::log2, {X}, nullptr, "elt.log2");
+  auto *Mul = Builder.CreateFMul(Log2Call, Y);
+  auto *Exp2Call =
+      Builder.CreateIntrinsic(Ty, Intrinsic::exp2, {Mul}, nullptr, "elt.exp2");
+  Exp2Call->setTailCall(Orig->isTailCall());
+  Exp2Call->setAttributes(Orig->getAttributes());
+  Orig->replaceAllUsesWith(Exp2Call);
+  Orig->eraseFromParent();
+  return true;
+}
+
 static bool expandRcpIntrinsic(CallInst *Orig) {
   Value *X = Orig->getOperand(0);
   IRBuilder<> Builder(Orig->getParent());
@@ -270,6 +291,8 @@ static bool expandIntrinsic(Function &F, CallInst *Orig) {
     return expandLogIntrinsic(Orig);
   case Intrinsic::log10:
     return expandLog10Intrinsic(Orig);
+  case Intrinsic::pow:
+    return expandPowIntrinsic(Orig);
   case Intrinsic::dx_any:
     return expandAnyIntrinsic(Orig);
   case Intrinsic::dx_uclamp:
diff --git a/llvm/test/CodeGen/DirectX/pow-vec.ll b/llvm/test/CodeGen/DirectX/pow-vec.ll
new file mode 100644
index 00000000000000..781fa5b8cb2409
--- /dev/null
+++ b/llvm/test/CodeGen/DirectX/pow-vec.ll
@@ -0,0 +1,15 @@
+; RUN: opt -S  -dxil-intrinsic-expansion  < %s | FileCheck %s
+
+; Make sure dxil operation function calls for pow are generated for float and half.
+
+; CHECK-LABEL: pow_float4
+; CHECK: call <4 x float> @llvm.log2.v4f32(<4 x float>  %a)
+; CHECK: fmul <4 x float> %{{.*}}, %b
+; CHECK: call <4 x float> @llvm.exp2.v4f32(<4 x float>  %{{.*}})
+define noundef <4 x float> @pow_float4(<4 x float> noundef %a, <4 x float> noundef %b) {
+entry:
+  %elt.pow = call <4 x float> @llvm.pow.v4f32(<4 x float> %a, <4 x float> %b)
+  ret <4 x float> %elt.pow
+}
+
+declare <4 x float> @llvm.pow.v4f32(<4 x float>,<4 x float>)
diff --git a/llvm/test/CodeGen/DirectX/pow.ll b/llvm/test/CodeGen/DirectX/pow.ll
new file mode 100644
index 00000000000000..25ce0fe731d0ba
--- /dev/null
+++ b/llvm/test/CodeGen/DirectX/pow.ll
@@ -0,0 +1,29 @@
+; RUN: opt -S  -dxil-intrinsic-expansion  < %s | FileCheck %s --check-prefixes=CHECK,EXPCHECK
+; RUN: opt -S  -dxil-op-lower  < %s | FileCheck %s --check-prefixes=CHECK,DOPCHECK
+
+; Make sure dxil operation function calls for pow are generated.
+
+define noundef float @pow_float(float noundef %a, float noundef %b) {
+entry:
+; DOPCHECK: call float @dx.op.unary.f32(i32 23, float %a)
+; EXPCHECK: call float @llvm.log2.f32(float %a)
+; CHECK: fmul float %{{.*}}, %b
+; DOPCHECK: call float @dx.op.unary.f32(i32 21, float %{{.*}})
+; EXPCHECK: call float @llvm.exp2.f32(float %{{.*}})
+  %elt.pow = call float @llvm.pow.f32(float %a, float %b)
+  ret float %elt.pow
+}
+
+define noundef half @pow_half(half noundef %a, half noundef %b) {
+entry:
+; DOPCHECK: call half @dx.op.unary.f16(i32 23, half %a)
+; EXPCHECK: call half @llvm.log2.f16(half %a)
+; CHECK: fmul half %{{.*}}, %b
+; DOPCHECK: call half @dx.op.unary.f16(i32 21, half %{{.*}})
+; EXPCHECK: call half @llvm.exp2.f16(half %{{.*}})
+  %elt.pow = call half @llvm.pow.f16(half %a, half %b)
+  ret half %elt.pow
+}
+
+declare half @llvm.pow.f16(half,half)
+declare float @llvm.pow.f32(float,float)

Comment on lines +214 to +215
Exp2Call->setTailCall(Orig->isTailCall());
Exp2Call->setAttributes(Orig->getAttributes());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there test coverage for these tail call / attributes values being copied from Orig to Exp2Call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was copying an existing pattern, but you raise an interesting point. We don't seem to test for attributes in any of our tests. Should I remove these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know...it's the sort of thing that looks like it's important and worth doing, but what do I know?! If you can't think of a way to write a test that observes whether or not these have been copied properly then maybe it shouldn't be there. Maybe this is something we should deal with in a follow-up issue, since it sounds like it is more widespread than just the code touched in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll get some clarification from folks and do a broad fix if needed.

@farzonl farzonl merged commit 36b8643 into llvm:main Mar 28, 2024
6 checks passed
@farzonl farzonl deleted the dxil-pow-lowering branch March 28, 2024 16:32
@ilovepi
Copy link
Contributor

ilovepi commented Apr 24, 2024

Hi, I'm seeing an assertion failure that seems to be due to this patch.
I bisected it to this patch. I'm including a reproducer (not reduced) and my bisect script.

Failing bot: https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.arm64-debug-subbuild/b8749763985778255185/overview
Logs: https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8749760927129551041/+/u/clang/build/stdout

I bisected with

git bisect good  0f61051f541a5b8cfce25c84262dfdbadb9ca688
git bisect bad d3f6a88a1fb36f94c71940514e576821c6cc3ade
git bisect run bisect.sh

I checked the good and bad SHA's multiple times, and the good commit is one commit prior to yours.

Note if you want to use the bisect script you may need to update paths. The .cpp and other .sh are from the generated reproducer, so should work if you update the path to clang in the script.

I have creduce running now, so I can probably post a smaller reproducer in a while.

In the meantime, can you take a look or possibly revert this until a fix is ready?

repro.zip

@farzonl
Copy link
Member Author

farzonl commented Apr 24, 2024

Hi, I'm seeing an assertion failure that seems to be due to this patch. I bisected it to this patch. I'm including a reproducer (not reduced) and my bisect script.

Failing bot: https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.arm64-debug-subbuild/b8749763985778255185/overview Logs: https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8749760927129551041/+/u/clang/build/stdout

I bisected with

git bisect good  0f61051f541a5b8cfce25c84262dfdbadb9ca688
git bisect bad d3f6a88a1fb36f94c71940514e576821c6cc3ade
git bisect run bisect.sh

I checked the good and bad SHA's multiple times, and the good commit is one commit prior to yours.

Note if you want to use the bisect script you may need to update paths. The .cpp and other .sh are from the generated reproducer, so should work if you update the path to clang in the script.

I have creduce running now, so I can probably post a smaller reproducer in a while.

In the meantime, can you take a look or possibly revert this until a fix is ready?

repro.zip

This seems very unlikely. This change is in an experimental backend that isn’t enabled on any build bots. I don't see this file (llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp) compiled at all in the logs provided. But if you can provide instructions on how to cross build for fuscia/arm64 I’ll investigate.

@ilovepi
Copy link
Contributor

ilovepi commented Apr 24, 2024

While I'd normally agree, building clang w/ your commit hits the assertion, and one commit prior does not.

The reproducer script should work for you. Fuchsia is always available in the compiler as a target. Everything else should just work, so long as the aarch64 backend is enabled. The reproducer is a preprocessed cpp file.

@ilovepi
Copy link
Contributor

ilovepi commented Apr 24, 2024

Also, I totally agree that this seems unlikely for what its worth, but I don't know how to reconcile the assertion not hitting on the prior SHA. Also, thanks for taking a look.

@ilovepi
Copy link
Contributor

ilovepi commented Apr 24, 2024

But if you can provide instructions on how to cross build for fuscia/arm64 I’ll investigate.

The bot in question is cross compiling a normal linux toolchain for use in building Fuchsia on arm64 hosts. You should be able to replicate this issues w/ any x86_64 build of clang AFAIK.

@ilovepi
Copy link
Contributor

ilovepi commented Apr 24, 2024

@farzonl I'm sorry to say that I think you can ignore my issue. I found an error in my bisect script, and discovered that the toolchain I'd pointed the script at was pointed to a clang that wasn't getting updated w/ the clang build. Because of that, when I ran the reproducer script it still pointed to a broken (but not updated version of clang) and I assumed your commit was still bad. So I'm very sorry for wasting your time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[DXIL] Add pow intrinsic Lowering
5 participants