
[GlobalISel] Constant-fold G_PTR_ADD with different type sizes #81473

Merged

merged 4 commits into llvm:main on Feb 22, 2024

Conversation

Pierre-vh
Contributor

All other opcodes in the list are constrained to have the same type on both operands, but not G_PTR_ADD.

Fixes #81464

@llvmbot
Collaborator

llvmbot commented Feb 12, 2024

@llvm/pr-subscribers-llvm-globalisel

Author: Pierre van Houtryve (Pierre-vh)

Changes

All other opcodes in the list are constrained to have the same type on both operands, but not G_PTR_ADD.

Fixes #81464


Full diff: https://github.com/llvm/llvm-project/pull/81473.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/Utils.cpp (+8-1)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-extract-vector-load.mir (+20)
diff --git a/llvm/lib/CodeGen/GlobalISel/Utils.cpp b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
index 26fd12f9e51c43..d693316dc6e9d7 100644
--- a/llvm/lib/CodeGen/GlobalISel/Utils.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
@@ -660,8 +660,15 @@ std::optional<APInt> llvm::ConstantFoldBinOp(unsigned Opcode,
   default:
     break;
   case TargetOpcode::G_ADD:
-  case TargetOpcode::G_PTR_ADD:
     return C1 + C2;
+  case TargetOpcode::G_PTR_ADD: {
+    // Types can be of different width here.
+    if (C1.getBitWidth() < C2.getBitWidth())
+      return C1.zext(C2.getBitWidth()) + C2;
+    if (C1.getBitWidth() > C2.getBitWidth())
+      return C2.zext(C1.getBitWidth()) + C1;
+    return C1 + C2;
+  }
   case TargetOpcode::G_AND:
     return C1 & C2;
   case TargetOpcode::G_ASHR:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-extract-vector-load.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-extract-vector-load.mir
new file mode 100644
index 00000000000000..13be65612fa855
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-extract-vector-load.mir
@@ -0,0 +1,20 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn -run-pass=amdgpu-prelegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name:            test_ptradd_crash
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: test_ptradd_crash
+    ; CHECK: [[C:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+    ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[C]](p1) :: (load (s32), addrspace 1)
+    ; CHECK-NEXT: $sgpr0 = COPY [[LOAD]](s32)
+    ; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $sgpr0
+    %1:_(p1) = G_CONSTANT i64 0
+    %3:_(s32) = G_CONSTANT i32 0
+    %0:_(<4 x s32>) = G_LOAD %1 :: (load (<4 x s32>) from `ptr addrspace(1) null`, addrspace 1)
+    %2:_(s32) = G_EXTRACT_VECTOR_ELT %0, %3
+    $sgpr0 = COPY %2
+    SI_RETURN_TO_EPILOG implicit $sgpr0
+...

@llvmbot
Collaborator

llvmbot commented Feb 12, 2024

@llvm/pr-subscribers-backend-amdgpu

@tschuett
Member

I am not yet convinced.

G_PTR_ADD p8 %5, s32 %6

The pointer and the offset have different types and probably different sizes; p8 could be some weird 128-bit pointer in address space 8.
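For illustration, a minimal sketch of the types in that hypothetical (the LLT calls are real API; the scenario itself is just the example above):

#include "llvm/CodeGenTypes/LowLevelType.h" // LLT (path in recent LLVM trees)
using namespace llvm;

// Hypothetical: a 128-bit pointer in address space 8 offset by a
// 32-bit scalar, so the two G_PTR_ADD operands differ in width.
LLT PtrTy = LLT::pointer(/*AddressSpace=*/8, /*SizeInBits=*/128); // p8
LLT OffTy = LLT::scalar(32);                                      // s32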

@Pierre-vh
Contributor Author

I am not yet convinced.

G_PTR_ADD p8 %5, s32 %6

The pointer and the offset have different types and probably different sizes. p8 could be some weird 128bit pointer in address space 8.

What do you mean? This handles both cases, C1 wider than C2 and C2 wider than C1, and the tests now cover both as well.

@tschuett
Member

Does LLT::pointer(8, 128) guarantee that I may perform plain integer arithmetic on the pointer?

@tschuett
Member

Or do you need a G_PTRTOINT first?

@Pierre-vh
Contributor Author

It depends on the operation: with G_PTR_ADD you can directly increment the pointer.
If you G_PTRTOINT first, you can use G_ADD, but then the LHS and RHS types must match, so you need to ext/trunc the offset as needed.
G_PTR_ADD is like a GEP, so you can use a different type for the offset. A sketch of both forms follows below.
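A hedged sketch of the two forms with MachineIRBuilder (the builder calls are real; the register and type choices are illustrative, not from the patch):

#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// Illustrative: Ptr is a 64-bit p1 pointer, Off is an s32 offset.
void buildBothForms(MachineIRBuilder &B, Register Ptr, Register Off) {
  LLT P1 = LLT::pointer(1, 64);
  LLT S64 = LLT::scalar(64);

  // Direct form: G_PTR_ADD tolerates an offset type that differs from
  // the pointer type, like a GEP index.
  B.buildPtrAdd(P1, Ptr, Off);

  // Integer form: G_ADD needs matching types, so the offset must be
  // extended first (sign-extended, per the discussion below).
  auto Int = B.buildPtrToInt(S64, Ptr);
  auto OffExt = B.buildSExt(S64, Off);
  auto Sum = B.buildAdd(S64, Int, OffExt);
  B.buildIntToPtr(P1, Sum);
}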

// truncate in this case.
if (C1.getBitWidth() < C2.getBitWidth())
  return C1 + C2.trunc(C1.getBitWidth());
return C1 + C2;
Contributor

Just use C2.zextOrTrunc unconditionally? Also, why is zero extending correct? (Why not sign extending for example?)

Contributor Author

I think sext is probably more correct as offsets can be negative, right?

Contributor

I don't think "more correct" is good enough - this needs to be properly specified somewhere, preferably where G_PTR_ADD is defined.

Contributor Author

Pierre-vh commented Feb 12, 2024

SEXT is the correct one. It's what the legalizer does for widenScalar on G_PTR_ADD.
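Concretely, a minimal sketch of the sext-based fold in the style of the patch (not necessarily the exact merged code):

case TargetOpcode::G_PTR_ADD:
  // The offset may be narrower or wider than the pointer. Sign-extend
  // or truncate it to the pointer's width, matching what the legalizer
  // does when widening a G_PTR_ADD offset.
  return C1 + C2.sextOrTrunc(C1.getBitWidth());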

Contributor

OK.

@tschuett
Member

Is constant-folding G_PTR_ADD target-independent? Does it depend on the address space? Do AMDGPU and AArch64 do the same for all address spaces?

@Pierre-vh
Contributor Author

Is constant-folding G_PTR_ADD target-independent? Does it depend on the address space? Do AMDGPU and AArch64 do the same for all address spaces?

Very good question, and I believe the answer is yes, it depends on the address space. It's safe for most address spaces, but if the AS is special (e.g. a fat pointer) it's probably not safe. I think we're missing a hook somewhere to ask the target whether it's safe to trivially fold a G_PTR_ADD as a G_ADD. cc @arsenm?
That, or I'm misunderstanding how G_PTR_ADD is supposed to work.

@arsenm
Contributor

arsenm commented Feb 12, 2024

You should check the DataLayout for nonintegral address spaces

@Pierre-vh
Contributor Author

You should check the DataLayout for nonintegral address spaces

I just noticed it's checked by the caller: that function isn't called for G_PTR_ADD on non-integral address spaces.
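For context, a hedged sketch of the kind of guard being discussed (DataLayout::isNonIntegralAddressSpace is real API; the wrapper function here is illustrative):

#include "llvm/IR/DataLayout.h"
using namespace llvm;

// Folding a G_PTR_ADD as plain integer arithmetic is only sound when
// the address space has integral pointers; non-integral address spaces
// (e.g. some fat-pointer schemes) must be skipped.
bool canFoldPtrAdd(const DataLayout &DL, unsigned AddrSpace) {
  return !DL.isNonIntegralAddressSpace(AddrSpace);
}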

SI_RETURN_TO_EPILOG implicit $sgpr0
...

# Tries to emit a foldable G_PTR_ADD with (p1, s128) operands.
Contributor

Does this actually succeed in creating G_PTR_ADD with s128 RHS? If not, maybe it would be better to enforce in MachineVerifier that the RHS is no wider than the LHS?

Contributor Author

It should; MachineVerifier doesn't check for it and MachineIRBuilder doesn't complain.

I can make a follow-up patch to enforce RHS <= LHS if you want.

Contributor

No strong opinion. If we do anything, maybe we should enforce that the RHS has to be the width of the index size for the address space.

Contributor

If we do anything, maybe we should enforce that the RHS has to be the width of the index size for the address space.

#84352
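For reference, a rough sketch of what such a verifier rule could look like (illustrative only; see #84352 for the actual change):

// Sketch of a MachineVerifier-style check, e.g. inside
// verifyPreISelGenericInstruction (names per current MachineVerifier):
case TargetOpcode::G_PTR_ADD: {
  LLT PtrTy = MRI->getType(MI->getOperand(1).getReg());
  LLT OffTy = MRI->getType(MI->getOperand(2).getReg());
  if (OffTy.getSizeInBits() > PtrTy.getSizeInBits())
    report("G_PTR_ADD offset must not be wider than the pointer", MI);
  break;
}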

@jayfoad
Contributor

jayfoad commented Feb 16, 2024

I wonder if G_PTR_ADD has the same semantics as IR getelementptr, where only the low-order index-size bits of the pointer value are affected. @nikic do you know?

@nikic
Contributor

nikic commented Feb 16, 2024

I wonder if G_PTR_ADD has the same semantics as IR getelementptr, where only the low order index-size bits of the pointer value are affected. @nikic do you know?

As we lower getelementptr to G_PTR_ADD, I'd expect so...
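If G_PTR_ADD does inherit getelementptr's index semantics, a fold honoring them would only touch the low index-size bits of the pointer. A sketch under that assumption (not something this patch implements):

#include "llvm/ADT/APInt.h"
using namespace llvm;

// Hypothetical fold under GEP-style semantics: only the low IndexBits
// bits of the pointer participate in the addition; the high bits pass
// through unchanged. Assumes IndexBits <= Ptr.getBitWidth().
APInt foldPtrAddIndexBits(const APInt &Ptr, const APInt &Off,
                          unsigned IndexBits) {
  APInt Low = Ptr.trunc(IndexBits) + Off.sextOrTrunc(IndexBits);
  APInt Res = Ptr;
  Res.insertBits(Low, /*bitPosition=*/0);
  return Res;
}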


Pierre-vh merged commit 4235e44 into llvm:main on Feb 22, 2024
3 of 4 checks passed
Successfully merging this pull request may close these issues:

[AMDGPU][GlobalISel] Assertion failure in constant folding in buildPtrAdd (#81464)