Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DeadStoreElimination] Optimize tautological assignments #75744

Merged
merged 1 commit into from
Feb 14, 2024

Conversation

BK1603
Copy link
Contributor

@BK1603 BK1603 commented Dec 17, 2023

If a store is dominated by a condition that ensures that the value being stored in a memory location is already present at that memory location, consider the store a noop.

Fixes #63419

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 17, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Shreyansh Chouhan (BK1603)

Changes

If a store is dominated by a condition that ensures that the value being stored in a memory location is already present at that memory location, consider the store a noop.

Fixes #63419

Needed help with two things:

  1. I feel that the code can be written in a separate function, but I was not able to come up with a succinct name for it 😅
  2. How do I go about writing a test for this. I tried writing a separate one and calling update_test_checks.py but the script crashed when it tried to run opt it seems:
bk1603@<!-- -->192 /w/llvm-project (DeadStoreEliminationOpt) [1]&gt; /work/llvm-project/llvm/utils/update_test_checks.py /work/llvm-project/llvm/test/Transforms/DeadStoreElimination/assume.ll
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: opt -passes=dse -S
^CTraceback (most recent call last):
  File "/work/llvm-project/llvm/utils/update_test_checks.py", line 333, in &lt;module&gt;
    main()
  File "/work/llvm-project/llvm/utils/update_test_checks.py", line 157, in main
    raw_tool_output = common.invoke_tool(
                      ^^^^^^^^^^^^^^^^^^^
  File "/work/llvm-project/llvm/utils/UpdateTestChecks/common.py", line 465, in invoke_tool
    stdout = subprocess.check_output(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 550, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 1196, in communicate
    stdout = self.stdout.read()
             ^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

cc @nikic for review


Full diff: https://github.com/llvm/llvm-project/pull/75744.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp (+30)
diff --git a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
index dd0a290252dae3..0455fa1770e50a 100644
--- a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
@@ -1926,6 +1926,36 @@ struct DSEState {
       if (InitC && InitC == StoredConstant)
         return MSSA.isLiveOnEntryDef(
             MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(Def, BatchAA));
+
+      // If there is a dominating condition, that ensures that the value
+      // being stored in a memory location is already present at the
+      // memory location, the store is a noop.
+      BasicBlock *StoreBB = DefI->getParent();
+      auto *StorePtr = Store->getOperand(1);
+
+      DomTreeNode *IDom = DT.getNode(StoreBB)->getIDom();
+      auto *TI = IDom->getBlock()->getTerminator();
+      ICmpInst::Predicate Pred;
+      BasicBlock *TrueBB, *FalseBB;
+
+      if (!match(TI, m_Br(m_ICmp(Pred, m_Load(m_Specific(StorePtr)),
+                                 m_Specific(StoredConstant)),
+                          TrueBB, FalseBB)))
+        return false;
+
+      MemoryAccess *LastMod =
+          MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(Def, BatchAA);
+      DomTreeNode *CDom = DT.getNode(LastMod->getBlock());
+      if (!DT.dominates(CDom, IDom))
+        return false;
+
+      if (Pred == ICmpInst::ICMP_EQ && StoreBB == TrueBB)
+        return true;
+
+      if (Pred == ICmpInst::ICMP_NE && StoreBB == FalseBB)
+        return true;
+
+      return false;
     }
 
     if (!Store)

@nikic
Copy link
Contributor

nikic commented Dec 17, 2023

At a guess, it crashes because IDom can be nullptr (e.g. a store in the entry block will not have an idom).

@BK1603
Copy link
Contributor Author

BK1603 commented Dec 17, 2023

@nikic, you were right, I ran this under gdb and it seems that Store itself can be empty. I wonder when is that expected.

I've fixed the patch and added the test case as well. Please review.

Thanks

@nikic nikic requested a review from fhahn December 19, 2023 14:04
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Dec 23, 2023
@BK1603 BK1603 force-pushed the DeadStoreEliminationOpt branch 3 times, most recently from e464dc0 to db3710d Compare December 23, 2023 17:59
@BK1603 BK1603 requested a review from nikic December 23, 2023 19:17
@BK1603
Copy link
Contributor Author

BK1603 commented Dec 29, 2023

Ping

@BK1603 BK1603 force-pushed the DeadStoreEliminationOpt branch 2 times, most recently from ad7578e to bfe0692 Compare January 2, 2024 19:07
@nikic
Copy link
Contributor

nikic commented Jan 3, 2024

It looks like this patch introduces or exposes a miscompile in clang. There are crashes in stage2 binaries: http://llvm-compile-time-tracker.com/show_error.php?commit=0a5d24448bdf49ec6d7bc37bd8fe40203659b065

@BK1603 BK1603 force-pushed the DeadStoreEliminationOpt branch 2 times, most recently from c2c6835 to 3427de7 Compare January 6, 2024 17:53
@BK1603
Copy link
Contributor Author

BK1603 commented Jan 13, 2024

Hi @nikic, I am able to build target stage2 locally without any issues after the latest push.

-- Build files have been written to: /work/llvm-project/release/tools/clang/stage2-bins
[3806/3809] Performing build step for 'stage2'
[4941/4941] Linking CXX executable bin/obj2yaml
[3807/3809] No install step for 'stage2'
[3809/3809] Completed 'stage2'

I'm not sure how the job is triggered for remote builds, can you please retrigger them with the latest push?

@nikic
Copy link
Contributor

nikic commented Jan 13, 2024

Just tried again, and there are still crashes. To clarify though, stage2 itself does build successfully, but when you try to build https://github.com/llvm/llvm-test-suite using this stage2 it crashes.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks correct to me.

@BK1603 BK1603 force-pushed the DeadStoreEliminationOpt branch 4 times, most recently from a764d59 to 23d953a Compare January 13, 2024 16:13
@BK1603
Copy link
Contributor Author

BK1603 commented Jan 13, 2024

Hi @nikic, might have to retrigger the build for it to pass. It failed while fetching the commit.

Just tried again, and there are still crashes. To clarify though, stage2 itself does build successfully, but when you try to build https://github.com/llvm/llvm-test-suite using this stage2 it crashes.

Can replicate this locally. I'm investigating.

// Check if there is a dominating condition, that implies that the value
// being stored in a ptr is already present in the ptr.
bool dominatingConditionImpliesValue(MemoryDef *Def) {
auto *StoreI = dyn_cast<StoreInst>(Def->getMemoryInst());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when dyn_cast returns nullptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only reach this function if the store is valid (we check for Store before calling this)

bool dominatingConditionImpliesValue(MemoryDef *Def) {
auto *StoreI = dyn_cast<StoreInst>(Def->getMemoryInst());
BasicBlock *StoreBB = StoreI->getParent();
Value *StorePtr = StoreI->getPointerOperand();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We don't need StorePtr and StoreVal until later, so we might move these down.

@@ -1931,6 +1976,9 @@ struct DSEState {
if (!Store)
return false;

if (dominatingConditionImpliesValue(Def))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should pass the Store instruction here to avoid confusion in the callee. That will also remove a dyn_cast you have there.

@BK1603
Copy link
Contributor Author

BK1603 commented Jan 23, 2024

Just an update, still looking into this (I only get time for this during weekends.)

The crash is happening during the RegAllocGreedy pass. Specifically when we call InterferenceCache::get

InterferenceCache::Entry *InterferenceCache::get(MCRegister PhysReg) {
  unsigned char E = PhysRegEntries[PhysReg.id()];
  if (E < CacheEntries && Entries[E].getPhysReg() == PhysReg) {
    if (!Entries[E].valid(LIUArray, TRI))
      Entries[E].revalidate(LIUArray, TRI);
    return &Entries[E];
  }
  // No valid entry exists, pick the next round-robin entry.
  E = RoundRobin;
  if (++RoundRobin == CacheEntries)
    RoundRobin = 0;
  for (unsigned i = 0; i != CacheEntries; ++i) {
    // Skip entries that are in use.
    if (Entries[E].hasRefs()) {
      if (++E == CacheEntries)
        E = 0;
      continue;
    }
    Entries[E].reset(PhysReg, LIUArray, TRI, MF);
    PhysRegEntries[PhysReg] = E;
    return &Entries[E];
  }
  llvm_unreachable("Ran out of interference cache entries.");
}

Instead of getting the actual pointer to the cache entry, we are getting the index of the entry. This later causes a segfault when we try to call setEntry on this returned index. (It in turn calls Entry->addRef and entry becomes an invalid address.) From what I can understand right now, this happens because
the condition Entries[E].hasRefs() ends up always being true. The variable RefCount is unsigned, and it underflows and wraps around to a value greater than 0 again, and we are never able to go to the Entries[e].reset(PhysReg, LIUArray, TRI, MF) line.

@BK1603
Copy link
Contributor Author

BK1603 commented Feb 12, 2024

There are two issues with the code.

  1. The marked store in the following code is getting optimized, which should not happen? (As per my expectations, the match should fail in this case and we should exit the optimization function immediately.) I am unable to reproduce this using a simpler test case.
  %33 = getelementptr %"struct.llvm::RAGreedy::GlobalSplitCandidate", ptr %19, i64 -1, i32 2
  %34 = getelementptr %"struct.llvm::RAGreedy::GlobalSplitCandidate", ptr %19, i64 -1, i32 2, i32 1
  store ptr null, ptr %34, align 8, !tbaa !550
  %35 = load ptr, ptr %33, align 8, !tbaa !552
  %36 = icmp eq ptr %35, null
  br i1 %36, label %41, label %37

37:                                               ; preds = %32
  %38 = getelementptr inbounds %"class.llvm::InterferenceCache::Entry", ptr %35, i64 0, i32 2
  %39 = load i32, ptr %38, align 8, !tbaa !494
  %40 = add i32 %39, -1
  store i32 %40, ptr %38, align 8, !tbaa !494  
  br label %41

41:                                               ; preds = %37, %32
  store ptr null, ptr %33, align 8, !tbaa !552            ; >>>>>>>>>>>>>>>>>>>>>>>>>>>>> get's removed
  %42 = icmp eq ptr %20, %11
  br i1 %42, label %43, label %18, !llvm.loop !553
  1. We cannot optimise the store in cases where either of the true/false blocks post dominates the icmp condition. (For reasons similar to not doing so when both blocks are the same.)

@nikic What are your thoughts on the optimisation happening in 1? Is it expected? Any ideas as to why I am unable to reproduce it using the following test case (in noop-stores.ll):

define void @remove_tautological_store_bug(ptr %x) {
; CHECK-LABEL: @remove_tautological_store_bug(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[X1:%.*]] = load ptr, ptr [[X:%.*]], align 8
; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr [[X1]], null
; CHECK-NEXT:    br i1 [[CMP]], label [[IF_EQ:%.*]], label [[IF_ELSE:%.*]]
; CHECK:       if.else:
; CHECK-NEXT:    br label [[IF_EQ]]
; CHECK:       if.eq:
; CHECK-NEXT:    store ptr null, ptr [[X]], align 8
; CHECK-NEXT:    br label [[END:%.*]]
; CHECK:       end:
; CHECK-NEXT:    ret void
;
entry:                                               ; preds = %31, %26
  %x1 = load ptr, ptr %x, align 8
  %cmp = icmp eq ptr %x1, null
  br i1 %cmp, label %if.eq, label %if.else

if.else:                                               ; preds = %32
  br label %if.eq

if.eq:                                               ; preds = %37, %32
  store ptr null, ptr %x, align 8
  br label %end

end:
  ret void
}

cc @hiraditya

@nikic
Copy link
Contributor

nikic commented Feb 12, 2024

We cannot optimise the store in cases where either of the true/false blocks post dominates the icmp condition. (For reasons similar to not doing so when both blocks are the same.)

Duh, sorry for missing that. What you want to do is replace checks like StoreBB != BI->getSuccessor(0) with !DT.dominates(BasicBlockEdge(BI->getParent(), BI->getSuccessor(0)), StoreBB).

@nikic What are your thoughts on the optimisation happening in 1? Is it expected? Any ideas as to why I am unable to reproduce it using the following test case (in noop-stores.ll):

I applied your patch and tried that example, and the store does get (incorrectly) optimized away for me.

@BK1603
Copy link
Contributor Author

BK1603 commented Feb 13, 2024

I have updated the commit with a fix and a new test case for checking 2

As per my expectations, the match should fail in this case and we should exit the optimization function immediately.

This isn't correct, I was confused by ptr types here but essentially the match would pass (we are comparing a ptr (%35) loaded from another ptr (%33) against null, and the storing null in the originial ptr (%33), which does satisfy at least conditional constraint.)

The latest commit should solve stage 2 bin crashes. (I tried building stage 3 locally, and was able to get to the link step, without any crashes during compilation. But I cannot link the generated binaries due to memory constraints on my laptop.)

@nikic
Copy link
Contributor

nikic commented Feb 13, 2024

@nikic
Copy link
Contributor

nikic commented Feb 13, 2024

Please also update the PR description.

@BK1603 BK1603 force-pushed the DeadStoreEliminationOpt branch 2 times, most recently from ddb2b7d to a7cae25 Compare February 13, 2024 21:31
If a store is immediately dominated by a condition that ensures that the value
being stored in a memory location is already present at that memory location,
consider the store a noop.

Fixes llvm#63419
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missed optimization of tautological assignment of a global variable
4 participants