Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BasicBlockUtils] Remove redundant llvm.dbg instructions after blocks to reduce compile time #89069

Merged
merged 1 commit into from
Apr 26, 2024

Conversation

coderchenlin
Copy link
Contributor

@coderchenlin coderchenlin commented Apr 17, 2024

this patch is to fix the compile time for some cases, before this change, some targets (riscv-64, ve) will spend much more compile time on this case (https://godbolt.org/z/rrov17cTo). With this change, the compile time was reduced a lot.

Fixes #89073

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 17, 2024

@llvm/pr-subscribers-llvm-transforms

Author: CL (coderchenlin)

Changes

merged.


Full diff: https://github.com/llvm/llvm-project/pull/89069.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+3)
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 5aa59acfa6df99..ca9d7649ea9826 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -333,6 +333,9 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
   // Finally, erase the old block and update dominator info.
   DeleteDeadBlock(BB, DTU);
 
+  // Remove redundant "llvm.dbg" instrunctions after blocks merged.
+  RemoveRedundantDbgInstrs(PredBB);
+
   return true;
 }
 

@coderchenlin
Copy link
Contributor Author

this patch helps to remove redundant llvm.dbgs after blocks merged.

@coderchenlin
Copy link
Contributor Author

the issue is there : #89073

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test showing the impact of this change

@@ -333,6 +333,9 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
// Finally, erase the old block and update dominator info.
DeleteDeadBlock(BB, DTU);

// Remove redundant "llvm.dbg" instrunctions after blocks merged.
RemoveRedundantDbgInstrs(PredBB);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to do a forward and backward scan over the BB, even if there are no debug intrinsics in the block. Can it only be called when we cloned blocks with debug intrinsics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, @fhahn , I will check it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fhahn, I added a condition to control it (only execute with the "-g" flag ). Could you please review it again?

@coderchenlin coderchenlin force-pushed the br-remove-redundant-dbg branch 2 times, most recently from 5bd126f to c4e39bc Compare April 18, 2024 07:55
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! The change isn't really specific to LoopUnroll, but any code that uses MergeBlockIntoPredecessor; could you update the title and also update the PR descriptions .

Please also include Fixes https://github.com/llvm/llvm-project/issues/89073 at the end of the PR description, so the issue gets auto-closed when the PR is merged.

Please also add a loop-unroll test based on the issue.

@coderchenlin coderchenlin changed the title [LoopUnroll] Remove redundant llvm.dbg instructions after blocks [BasicBlockUtils] Remove redundant llvm.dbg instructions after blocks Apr 19, 2024
@coderchenlin coderchenlin changed the title [BasicBlockUtils] Remove redundant llvm.dbg instructions after blocks [BasicBlockUtils] Remove redundant llvm.dbg instructions after blocks to reduce compile time Apr 19, 2024
@coderchenlin
Copy link
Contributor Author

@fhahn , thanks for the review, the title and the PR description has been updated.
I'm not very sure the case is necessary, because it's not about any optimization passes, I strongly believe it won't have any bad impact if the "RemoveRedundantDbgInstrs" is ok. For another reason, I want to add the c case to check compile-time, but it seems not appropriate to add into the directory llvm/test/

@coderchenlin
Copy link
Contributor Author

image
the case: https://godbolt.org/z/rrov17cTo
the compile time was reduced a lot,

@coderchenlin coderchenlin requested a review from fhahn April 20, 2024 03:04
@coderchenlin
Copy link
Contributor Author

@jmorse @SLTozer , Hello! guys! I have submitted a PR to fix compile-time. Could you help me review this PR.

@jmorse jmorse requested review from SLTozer and OCHyams April 22, 2024 10:08
@jmorse
Copy link
Member

jmorse commented Apr 22, 2024

Thanks for the patch @coderchenlin ! I'm on holiday for a long while, but Stephen or Orlando may be able to review the patch.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also update the description of the PR and include Fixes https://github.com/llvm/llvm-project/issues/89073?

@@ -333,6 +333,10 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
// Finally, erase the old block and update dominator info.
DeleteDeadBlock(BB, DTU);

// Remove redundant "llvm.dbg" instrunctions after blocks merged.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: after blocks have been merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the advice is acceptable, and the comment has been modified.

@SLTozer
Copy link
Contributor

SLTozer commented Apr 22, 2024

The compile time impact of this change looks to be more-or-less neutral on the compile time tracker - at ReleaseLTO-g, the geomean compile time change is +0.02%. Since it looks like this solves some particularly pathological cases though it's probably worth having - so SGTM as long as @fhahn has no more comments and is happy with the current state of the patch.

@coderchenlin coderchenlin requested a review from fhahn April 23, 2024 01:20
@fhahn
Copy link
Contributor

fhahn commented Apr 23, 2024

Please also add a loop-unroll test based on the issue.

@coderchenlin I might have missed it somewhere, but is it possible add a loop-unroll test based on the reproducer from #89073?

@coderchenlin
Copy link
Contributor Author

Please also add a loop-unroll test based on the issue.

@coderchenlin I might have missed it somewhere, but is it possible add a loop-unroll test based on the reproducer from #89073?

@fhahn, Thanks! I can add the IR case based on the reproducer from #89073, but I need some time to reduce the llvm-ir of the case. As @SLTozer said, the case is a little pathological, and the IR is also strange, I will try to reduce it and make it clearer

Before that, if you want to check the effect of this patch, please test this case llvm/test/Transforms/LoopUnroll/debug-info.ll with this command:
opt runtime-loop1.ll -S -o - -passes=loop-unroll -debug-only=basicblock-utils
it can remove 2 redundant llvm.dbg instructions.

@coderchenlin
Copy link
Contributor Author

@fhahn, could you help me check this newly added case?
I reduced the IR generated by the #89073 and made some changes to make it clearer.

the loop body unrolls 5 times (full unroll) and generates the same 5 llvm.dbg instructions, with this change, there only has 1 llvm.dbg

@@ -0,0 +1,51 @@
; RUN: opt < %s -S -mtriple=riscv64 -passes=loop-unroll | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch! Does the test require -mtriple=riscv64 / target triple = "riscv64"? If not, please remove the references to RISCV. Otherwise it needs to be moved to llvm/test/Transforms/LoopUnroll/RISCV/, as it will fail on systems that don't build the RISCV backend.

declare void @llvm.dbg.value(metadata, metadata, metadata) #1

attributes #0 = { "target-cpu"="generic-rv64" }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare void @llvm.dbg.value(metadata, metadata, metadata) #1

attributes #0 = { "target-cpu"="generic-rv64" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed?


for.body: ; preds = %for.body, %entry
; There should be only one "llvm.dbg.vale" after loop unrolling
; CHECK: tail call void @llvm.dbg.value(metadata i32 0, metadata !13, metadata !DIExpression()), !dbg !17
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention this earlier, but this checks that there is at least one call to llvm.dbg.value, not exactly one.

Multiple ways to go about this, .e.g

CHECK: call void @llvm.dbg.value
CHECK-NOT:  call void @llvm.dbg.value

or matching the full generated IR using llvm/utils/update_test_checks.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to just check for call void @llvm.dbg.value, as otherwise the test would be unnecessary brittle, e.g. needing updating if the metadata numbering slightly changes

@coderchenlin coderchenlin force-pushed the br-remove-redundant-dbg branch 2 times, most recently from f8369ee to 5d394a7 Compare April 24, 2024 10:52
have been merged.

When compiling with the "-g" and enabling loop-unroll optimization, the
redundant "llvm.dbg" instructions sometimes will be generated. It has
a significant impact on compilation time in some cases.

Fixes llvm#89073.
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@coderchenlin
Copy link
Contributor Author

@fhahn, Thanks for the review, All tests passed, can this patch merge?

@fhahn fhahn merged commit 2e3e086 into llvm:main Apr 26, 2024
4 checks passed
Copy link

@coderchenlin Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@slackito
Copy link
Collaborator

slackito commented May 3, 2024

We've bisected a compilation time increase to this patch. It seems to introduce quadratic behavior in SimplifyCFG. The source code of a possible reproducer looks like this:

#include <map>
#include <string>

void regression() {
  static const std::map<int, std::string>* m = new std::map<int, std::string>({
      {1, "1"},       {2, "2"},       {3, "3"},       {4, "4"},
      // Many more such nodes in the map. With around 1200 nodes I see a
      // slowdown of ~20 seconds.
  });
  (void)m;
}

Unfortunately, reproducing the problem from this source code requires a profile I can't share, so the source code is useful for illustrative purposes only.

However, I was able to dump the IR using -mllvm -print-after=sample-profile to get a standalone reproducer (see slowdown.ll.txt attached, Github didn't like a .ll attachment). You should be able to verify the slowdown by running opt -O1 -o /dev/null slowdown.ll. Apologies for the big file. I've tried to reduce it but it looks like a quadratic scaling problem so the input needs to reach a certain size before the performance begins to be noticeably problematic.

I'm also attaching two --time-trace profiles, before and after this change. You can visualize them using the Chrome profile viewer at chrome://tracing or https://ui.perfetto.dev. Hopefully the stack of passes in the flame graph will be more helpful to diagnose the problem.

before.json
after.json
slowdown.ll.txt

@aeubanks
Copy link
Contributor

aeubanks commented May 3, 2024

seems like we're doing a linear scan of dbg instructions every time we call MergeBlockIntoPredecessor, which otherwise looks to be more constant-time-ish in the number of instructions in the block than linear-time-ish. the compile time blow-up is in SimplifyCFG, and the call to MergeBlockIntoPredecessor in SimplifyCFGOpt::simplifyOnce is probably the issue.

the call to RemoveRedundantDbgInstrs should probably be somewhere more targeted, rather than something commonly called like MergeBlockIntoPredecessor that may be called on blocks containing some set of instructions multiple times. e.g. once per-block in some pass that doesn't modify the CFG, or once after the CFG is finalized.

can we revert this and then land a more targeted solution?

slackito added a commit that referenced this pull request May 3, 2024
…r blocks to reduce compile time (#89069)"

This reverts commit 2e3e086. It caused
quadratic slowdown at compilation time in some cases. See the comments
in the original PR: #89069
@slackito
Copy link
Collaborator

slackito commented May 3, 2024

Reverted in 2cde0e2

@coderchenlin
Copy link
Contributor Author

seems like we're doing a linear scan of dbg instructions every time we call MergeBlockIntoPredecessor, which otherwise looks to be more constant-time-ish in the number of instructions in the block than linear-time-ish. the compile time blow-up is in SimplifyCFG, and the call to MergeBlockIntoPredecessor in SimplifyCFGOpt::simplifyOnce is probably the issue.

the call to RemoveRedundantDbgInstrs should probably be somewhere more targeted, rather than something commonly called like MergeBlockIntoPredecessor that may be called on blocks containing some set of instructions multiple times. e.g. once per-block in some pass that doesn't modify the CFG, or once after the CFG is finalized.

can we revert this and then land a more targeted solution?

Thanks for the new case, I will try to find another way to fix the original case. @slackito @aeubanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BasicBlockUtils]: remove redundant llvm.dbgs intrunctions to fix compile time error.
7 participants