Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InferAddressSpaces] Fix constant replace to avoid modifying other functions #70611

Merged

Conversation

wenju-he
Copy link
Contributor

A constant value is unique in llvm context. InferAddressSpaces was
replacing its users in other functions as well. This leads to unexpected
behavior in our downstream use case after the pass.

InferAddressSpaces is a function passe, so it shall not modify functions
other than currently processed one.

Co-authored-by: Abhinav Gaba abhinav.gaba@intel.com

…nctions

A constant value is unique in llvm context. InferAddressSpaces was
replacing its users in other functions as well. This leads to unexpected
behavior in our downstream use case after the pass.

InferAddressSpaces is a function passe, so it shall not modify functions
other than currently processed one.

Co-authored-by: Abhinav Gaba <abhinav.gaba@intel.com>

@x = addrspace(1) global i32 0, align 4

define spir_func void @f2() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can drop all the spir_funcs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

; CHECK-NEXT: call spir_func void @f1(ptr noundef %x2)

; Ensure that the pass hasn't run on f3 yet.
; CHECK: IR Dump After InferAddressSpacesPass on f3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you reproduce by adding optnone to f3 instead of relying on debug printing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, adding optnone to f3 works. Thank you for the suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -334,6 +335,15 @@ template<> struct simplify_type<User::const_op_iterator> {
}
};

template <> struct GraphTraits<User *> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised you need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my other comment for why df is needed. Adding GraphTraits is a convenient way to iterate over user using df_begin/df_end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, this should not be in a public IR header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, this should not be in a public IR header.

I've reverted the change in this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with putting this here? Seems nicer than inlining yet another DFS in another place

Copy link
Contributor Author

@wenju-he wenju-he Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I measured llvm-project compile time impact of this change to llvm/IR/User.h on intel icx 8358 cpu.

Build command:
cmake -GNinja -DLLVM_ENABLE_PROJECTS="llvm;clang;clang-tools-extra" -DLLVM_INCLUDE_TESTS=ON -DLLVM_BUILD_TESTS=ON -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release ../llvm -DBUILD_SHARED_LIBS=OFF -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DCMAKE_INSTALL_PREFIX=install && time ninja -j40

Ref: (https://github.com/llvm/llvm-project.git c0d78c4)
Exp: (https://github.com/llvm/llvm-project.git c0d78c4) + change to User.h

OS: RHEL9 (note: llvm-project build on this system also builds https://github.com/KhronosGroup/SPIRV-LLVM-Translator as a project inside llvm/project folder)

REF:
real 10m39.052s
user 398m36.074s
sys 19m26.839s

real 10m39.058s
user 398m31.903s
sys 19m18.405s

real 10m40.249s
user 398m43.022s
sys 19m52.814s

real 10m40.110s
user 398m36.880s
sys 19m59.264s

real 10m40.802s
user 398m59.073s
sys 19m59.007s

EXP:
real 10m39.910s
user 398m55.339s
sys 19m36.278s

real 10m40.319s
user 398m52.378s
sys 19m38.137s

real 10m41.001s
user 399m15.809s
sys 19m57.681s

real 10m40.889s
user 399m8.594s
sys 19m55.882s

real 10m39.135s
user 398m30.025s
sys 19m51.427s

OS: Ubuntu 22

REF:
real 10m2.783s
user 376m47.038s
sys 18m35.310s

real 10m2.983s
user 376m47.032s
sys 18m46.517s

real 10m3.059s
user 376m52.423s
sys 18m45.694s

EXP:
real 10m2.730s
user 376m46.556s
sys 18m44.566s

real 10m3.007s
user 376m48.369s
sys 18m46.733s

real 10m2.860s
user 376m44.655s
sys 18m51.154s

Summary:
On RHEL9 build time is slowed down by less than ~1 second. On Ubuntu 22 there is no obvious change of compile time.

@nikic could we add GraphTraits<User *> to User.h since compile time impact is not obvious?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added 3 more runs on RHEL9 today, the compile time diff is very small.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with putting this here? Seems nicer than inlining yet another DFS in another place.

I have a couple of concerns about exposing it here.

The first is that "DFS walk on User *" is not a well-defined operation. You can perform a DFS walk either in the direction of users or in the direction of operands. Keep in mind that, despite the name, the defining property of a User is that it has operands, not that it has users. Having users is a property of Value *.

The second is that even if we say that it should be a walk in the direction of users, we still have the choice between doing the walk on User * or on Use &.

The third is that doing an unbounded DFS walk on values is just a bad idea in general, and we should not encourage it.

For the specific problem here, I'd consider expanding all constant expressions in the function upfront, and then not having to deal with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allowed bitcasts between address spaces with the same size, we could drop addrspacecast constantexprs altogether

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the specific problem here, I'd consider expanding all constant expressions in the function upfront, and then not having to deal with it.

is it right that I can use convertUsersOfConstantsToInstructions to expand constantexp to instructions? However, convertUsersOfConstantsToInstructions changes other functions as well. An option to add an addition function parameter to convertUsersOfConstantsToInstructions to ensure only constexpr users in the function is expanded. @nikic WDYT of this option?

convertUsersOfConstantsToInstructions is also doing a DFS on user of User.

C->replaceAllUsesWith(Replace);
VMap[C] = Replace;
for (User *U : make_early_inc_range(C->users())) {
for (auto It = df_begin(U), E = df_end(U); It != E;) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need df ordering?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A constant could be used nested inside another constant, an example is

%gep0 = getelementptr inbounds double, ptr getelementptr ([648 x double], ptr addrspacecast (ptr addrspace(3) @lds to ptr), i64 0, i64 384), i64 %idx0

We can't replace constant within another constant directly because it may create new constant and in turn change other functions. So I need df to find the user instruction within the function and then replace the user in the instruction.

yubingex007-a11y pushed a commit that referenced this pull request Nov 1, 2023
#70610)

If function return value's type is pointer, we can try to collect flat
address expression from it.
This PR also fixes noop_ptrint_pair_ce2 in noop-ptrint-pair.ll in #70611
@wenju-he wenju-he marked this pull request as ready for review November 1, 2023 07:01
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 1, 2023

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-transforms

Author: Wenju He (wenju-he)

Changes

A constant value is unique in llvm context. InferAddressSpaces was
replacing its users in other functions as well. This leads to unexpected
behavior in our downstream use case after the pass.

InferAddressSpaces is a function passe, so it shall not modify functions
other than currently processed one.

Co-authored-by: Abhinav Gaba <abhinav.gaba@intel.com>


Full diff: https://github.com/llvm/llvm-project/pull/70611.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp (+32-2)
  • (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/noop-ptrint-pair.ll (+1-1)
  • (added) llvm/test/Transforms/InferAddressSpaces/optnone-func-unchanged.ll (+32)
diff --git a/llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp b/llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp
index 28fe1b5e75327e6..1bf50d79e5331e9 100644
--- a/llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp
+++ b/llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp
@@ -477,7 +477,7 @@ void InferAddressSpacesImpl::appendsFlatAddressExpressionToPostorderStack(
 }
 
 // Returns all flat address expressions in function F. The elements are ordered
-// ordered in postorder.
+// in postorder.
 std::vector<WeakTrackingVH>
 InferAddressSpacesImpl::collectFlatAddressExpressions(Function &F) const {
   // This function implements a non-recursive postorder traversal of a partial
@@ -1170,6 +1170,8 @@ bool InferAddressSpacesImpl::rewriteWithNewAddressSpaces(
   }
 
   SmallVector<Instruction *, 16> DeadInstructions;
+  ValueToValueMapTy VMap;
+  ValueMapper VMapper(VMap, RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
 
   // Replaces the uses of the old address expressions with the new ones.
   for (const WeakTrackingVH &WVH : Postorder) {
@@ -1188,7 +1190,30 @@ bool InferAddressSpacesImpl::rewriteWithNewAddressSpaces(
       if (C != Replace) {
         LLVM_DEBUG(dbgs() << "Inserting replacement const cast: " << Replace
                           << ": " << *Replace << '\n');
-        C->replaceAllUsesWith(Replace);
+        SmallVector<User *, 16> WorkList;
+        for (User *U : make_early_inc_range(C->users())) {
+          if (auto *I = dyn_cast<Instruction>(U)) {
+            if (I->getFunction() == F)
+              I->replaceUsesOfWith(C, Replace);
+          } else {
+            WorkList.append(U->user_begin(), U->user_end());
+          }
+        }
+        if (!WorkList.empty()) {
+          VMap[C] = Replace;
+          DenseSet<User *> Visited{WorkList.begin(), WorkList.end()};
+          while (!WorkList.empty()) {
+            User *U = WorkList.pop_back_val();
+            if (auto *I = dyn_cast<Instruction>(U)) {
+              if (I->getFunction() == F)
+                VMapper.remapInstruction(*I);
+              continue;
+            }
+            for (User *U2 : U->users())
+              if (Visited.insert(U2).second)
+                WorkList.push_back(U2);
+          }
+        }
         V = Replace;
       }
     }
@@ -1214,6 +1239,11 @@ bool InferAddressSpacesImpl::rewriteWithNewAddressSpaces(
       // Skip if the current user is the new value itself.
       if (CurUser == NewV)
         continue;
+
+      if (auto *CurUserI = dyn_cast<Instruction>(CurUser);
+          CurUserI && CurUserI->getFunction() != F)
+        continue;
+
       // Handle more complex cases like intrinsic that need to be remangled.
       if (auto *MI = dyn_cast<MemIntrinsic>(CurUser)) {
         if (!MI->isVolatile() && handleMemIntrinsicPtrUse(MI, V, NewV))
diff --git a/llvm/test/Transforms/InferAddressSpaces/AMDGPU/noop-ptrint-pair.ll b/llvm/test/Transforms/InferAddressSpaces/AMDGPU/noop-ptrint-pair.ll
index b6713e96bed3ff5..422ac0dfd2cd470 100644
--- a/llvm/test/Transforms/InferAddressSpaces/AMDGPU/noop-ptrint-pair.ll
+++ b/llvm/test/Transforms/InferAddressSpaces/AMDGPU/noop-ptrint-pair.ll
@@ -70,7 +70,7 @@ define ptr @noop_ptrint_pair_ce2() {
 }
 
 ; COMMON-LABEL: @noop_ptrint_pair_ce2_vec(
-; AMDGCN-NEXT: ret <2 x ptr> <ptr addrspacecast (ptr addrspace(1) @g to ptr), ptr inttoptr (i64 ptrtoint (ptr addrspace(3) @l to i64) to ptr)>
+; AMDGCN-NEXT: ret <2 x ptr> <ptr inttoptr (i64 ptrtoint (ptr addrspace(1) @g to i64) to ptr), ptr inttoptr (i64 ptrtoint (ptr addrspace(3) @l to i64) to ptr)>
 ; NOTTI-NEXT: ret <2 x ptr> <ptr inttoptr (i64 ptrtoint (ptr addrspace(1) @g to i64) to ptr), ptr inttoptr (i64 ptrtoint (ptr addrspace(3) @l to i64) to ptr)>
 define <2 x ptr> @noop_ptrint_pair_ce2_vec() {
   ret <2 x ptr> <ptr inttoptr (i64 ptrtoint (ptr addrspace(1) @g to i64) to ptr), ptr inttoptr (i64 ptrtoint (ptr addrspace(3) @l to i64) to ptr)>
diff --git a/llvm/test/Transforms/InferAddressSpaces/optnone-func-unchanged.ll b/llvm/test/Transforms/InferAddressSpaces/optnone-func-unchanged.ll
new file mode 100644
index 000000000000000..b0213f7486928ae
--- /dev/null
+++ b/llvm/test/Transforms/InferAddressSpaces/optnone-func-unchanged.ll
@@ -0,0 +1,32 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -assume-default-is-flat-addrspace -S -passes=infer-address-spaces < %s 2>&1 | FileCheck %s
+
+@g = addrspace(1) global i32 0, align 4
+
+define ptr @f2() {
+; CHECK-LABEL: define ptr @f2() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[X2:%.*]] = addrspacecast ptr addrspace(1) @g to ptr
+; CHECK-NEXT:    ret ptr [[X2]]
+;
+entry:
+  %x1 = addrspacecast ptr addrspacecast (ptr addrspace(1) @g to ptr) to ptr addrspace(1)
+  %x2 = addrspacecast ptr addrspace(1) %x1 to ptr
+  ret ptr %x2
+}
+
+define ptr @f3() #0 {
+; CHECK-LABEL: define ptr @f3(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[X1:%.*]] = addrspacecast ptr addrspacecast (ptr addrspace(1) @g to ptr) to ptr addrspace(1)
+; CHECK-NEXT:    [[X2:%.*]] = addrspacecast ptr addrspace(1) [[X1]] to ptr
+; CHECK-NEXT:    ret ptr [[X2]]
+;
+entry:
+  %x1 = addrspacecast ptr addrspacecast (ptr addrspace(1) @g to ptr) to ptr addrspace(1)
+  %x2 = addrspacecast ptr addrspace(1) %x1 to ptr
+  ret ptr %x2
+}
+
+attributes #0 = { noinline optnone }

@wenju-he wenju-he requested review from arsenm and nikic November 1, 2023 07:42
ret ptr %x2
}

define ptr @f3() #0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a comment explaining why f3 is here. Maybe also rename it to something descriptive

@@ -334,6 +335,15 @@ template<> struct simplify_type<User::const_op_iterator> {
}
};

template <> struct GraphTraits<User *> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with putting this here? Seems nicer than inlining yet another DFS in another place

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if we could eliminate ConstantExpr addrspacecasts from the IR altogether, which would avoid most of the complexity here. I would also somewhat prefer to push this DFS into a helper function, but can live with it inline as-is

@wenju-he
Copy link
Contributor Author

I think it would be better if we could eliminate ConstantExpr addrspacecasts from the IR altogether, which would avoid most of the complexity here. I would also somewhat prefer to push this DFS into a helper function, but can live with it inline as-is

thank you for the review. I'll draft another PR to modify convertUsersOfConstantsToInstructions to allow change in a function only, so that DFS is not need here.

@yubingex007-a11y yubingex007-a11y merged commit fe146e9 into llvm:main Nov 13, 2023
6 checks passed
@wenju-he wenju-he deleted the InferAddressSpaces-function-change branch November 13, 2023 05:30
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…nctions (llvm#70611)

A constant value is unique in llvm context. InferAddressSpaces was
replacing its users in other functions as well. This leads to unexpected
behavior in our downstream use case after the pass.

InferAddressSpaces is a function passe, so it shall not modify functions
other than currently processed one.

Co-authored-by: Abhinav Gaba <abhinav.gaba@intel.com>

---------

Co-authored-by: Abhinav Gaba <abhinav.gaba@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants