[GlobalISel] Constant-fold G_PTR_ADD with different type sizes #81473
Conversation
All other opcodes in the list are constrained to have the same type on both operands, but not G_PTR_ADD. Fixes llvm#81464
@llvm/pr-subscribers-llvm-globalisel
Author: Pierre van Houtryve (Pierre-vh)
Changes: All other opcodes in the list are constrained to have the same type on both operands, but not G_PTR_ADD. Fixes #81464
Full diff: https://github.com/llvm/llvm-project/pull/81473.diff (2 files affected):
diff --git a/llvm/lib/CodeGen/GlobalISel/Utils.cpp b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
index 26fd12f9e51c43..d693316dc6e9d7 100644
--- a/llvm/lib/CodeGen/GlobalISel/Utils.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
@@ -660,8 +660,15 @@ std::optional<APInt> llvm::ConstantFoldBinOp(unsigned Opcode,
default:
break;
case TargetOpcode::G_ADD:
- case TargetOpcode::G_PTR_ADD:
return C1 + C2;
+ case TargetOpcode::G_PTR_ADD: {
+ // Types can be of different width here.
+ if (C1.getBitWidth() < C2.getBitWidth())
+ return C1.zext(C2.getBitWidth()) + C2;
+ if (C1.getBitWidth() > C2.getBitWidth())
+ return C2.zext(C1.getBitWidth()) + C1;
+ return C1 + C2;
+ }
case TargetOpcode::G_AND:
return C1 & C2;
case TargetOpcode::G_ASHR:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-extract-vector-load.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-extract-vector-load.mir
new file mode 100644
index 00000000000000..13be65612fa855
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-extract-vector-load.mir
@@ -0,0 +1,20 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn -run-pass=amdgpu-prelegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: test_ptradd_crash
+tracksRegLiveness: true
+body: |
+ bb.0:
+ ; CHECK-LABEL: name: test_ptradd_crash
+ ; CHECK: [[C:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+ ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[C]](p1) :: (load (s32), addrspace 1)
+ ; CHECK-NEXT: $sgpr0 = COPY [[LOAD]](s32)
+ ; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $sgpr0
+ %1:_(p1) = G_CONSTANT i64 0
+ %3:_(s32) = G_CONSTANT i32 0
+ %0:_(<4 x s32>) = G_LOAD %1 :: (load (<4 x s32>) from `ptr addrspace(1) null`, addrspace 1)
+ %2:_(s32) = G_EXTRACT_VECTOR_ELT %0, %3
+ $sgpr0 = COPY %2
+ SI_RETURN_TO_EPILOG implicit $sgpr0
+...
@llvm/pr-subscribers-backend-amdgpu
Author: Pierre van Houtryve (Pierre-vh)
Changes: (same description and full diff as above)
I am not yet convinced.
The pointer and the offset have different types and probably different sizes. p8 could be some weird 128-bit pointer in address space 8.
What do you mean? This handles both cases, when C1 > C2 and when C2 > C1, and the tests now cover both issues as well.
Does …
Or do you need a …
Depends on the operation; if you have …
// truncate in this case.
if (C1.getBitWidth() < C2.getBitWidth())
  return C1 + C2.trunc(C1.getBitWidth());
return C1 + C2;
Just use C2.zextOrTrunc unconditionally? Also, why is zero extending correct? (Why not sign extending for example?)
I think sext is probably more correct as offsets can be negative, right?
I don't think "more correct" is good enough - this needs to be properly specified somewhere, preferably where G_PTR_ADD is defined.
SEXT is the correct one. It's what the legalizer does for widenScalar on G_PTR_ADD.
OK.
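For reference, the resolution this thread converges on can be sketched as a single sign-extending resize of the offset to the pointer's width. This is a minimal sketch of that direction, not necessarily the committed code:

  case TargetOpcode::G_PTR_ADD: {
    // The result must have the pointer constant's width (C1).
    // Sign-extend or truncate the offset (C2) to match, mirroring what
    // the legalizer's widenScalar does for G_PTR_ADD offsets.
    return C1 + C2.sextOrTrunc(C1.getBitWidth());
  }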
Is constant-folding G_PTR_ADD target-independent? Does it depend on the address space? Do AMDGPU and AArch64 do the same for all address spaces?
Very good question, and I believe the answer is yes: it depends on the AS. It's safe for most address spaces, but if the AS is special (e.g. a fat pointer) it's probably not safe. I think we're missing a hook somewhere to ask the target whether it's safe to trivially fold G_PTR_ADD as a …
You should check the DataLayout for non-integral address spaces.
I just noticed it's checked by the caller. That function isn't called for G_PTR_ADD on non-integral address spaces.
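For concreteness, such a caller-side guard might look like the following. This is a sketch only; MF, MRI, and PtrReg are assumed to be in scope at the call site:

  // Constant-folding pointer arithmetic is only sound when the address
  // space is integral, i.e. pointers behave like plain integers.
  const DataLayout &DL = MF.getDataLayout();
  unsigned AS = MRI.getType(PtrReg).getAddressSpace();
  if (DL.isNonIntegralAddressSpace(AS))
    return std::nullopt; // leave non-integral G_PTR_ADDs alone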
SI_RETURN_TO_EPILOG implicit $sgpr0
...
# Tries to emit a foldable G_PTR_ADD with (p1, s128) operands.
Does this actually succeed in creating G_PTR_ADD with s128 RHS? If not, maybe it would be better to enforce in MachineVerifier that the RHS is no wider than the LHS?
It should; MachineVerifier doesn't check for it and MachineIRBuilder doesn't complain. I can make a follow-up patch to enforce RHS <= LHS if you want.
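Such a follow-up check in MachineVerifier might look roughly like this. A hypothetical sketch; the exact message and placement inside verifyPreISelGenericInstruction are assumptions:

  // In the G_PTR_ADD case: reject an offset wider than the pointer.
  LLT PtrTy = MRI->getType(MI->getOperand(1).getReg());
  LLT OffsetTy = MRI->getType(MI->getOperand(2).getReg());
  if (PtrTy.isValid() && OffsetTy.isValid() &&
      OffsetTy.getSizeInBits() > PtrTy.getSizeInBits())
    report("G_PTR_ADD offset must not be wider than the pointer", MI);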
No strong opinion. If we do anything, maybe we should enforce that the RHS has to be the width of the index size for the address space.
If we do anything, maybe we should enforce that the RHS has to be the width of the index size for the address space.
I wonder if G_PTR_ADD has the same semantics as IR getelementptr, where only the low-order index-size bits of the pointer value are affected. @nikic do you know?
As we lower getelementptr to G_PTR_ADD, I'd expect so...
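Under that assumption (which the thread leaves unconfirmed), a fold respecting getelementptr's index-size semantics would update only the low index-size bits of the pointer. A sketch; DL, AS, C1, and C2 are assumed to be in scope, and the index size is never larger than the pointer width:

  // Only the low IdxBits of the pointer participate in the addition;
  // any high (non-address) bits of C1 are preserved untouched.
  unsigned IdxBits = DL.getIndexSizeInBits(AS);
  APInt Low = C1.trunc(IdxBits) + C2.sextOrTrunc(IdxBits);
  APInt Result = C1;
  Result.insertBits(Low, 0); // splice the updated low bits back in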