Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CVP] Check whether the default case is reachable #79993

Merged
merged 5 commits into from
Jan 31, 2024

Conversation

dtcxzyw
Copy link
Member

@dtcxzyw dtcxzyw commented Jan 30, 2024

This patch eliminates unreachable default cases using context-sensitive range information.
Related patch: #76295

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 30, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch eliminates unreachable default cases using context-sensitive range information.
Related patch: #76295


Full diff: https://github.com/llvm/llvm-project/pull/79993.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp (+31)
  • (modified) llvm/test/Transforms/CorrelatedValuePropagation/basic.ll (+6-5)
  • (added) llvm/test/Transforms/CorrelatedValuePropagation/switch.ll (+299)
diff --git a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
index 9235850de92f3..e1a5d83e4b96c 100644
--- a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
+++ b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
@@ -371,6 +371,7 @@ static bool processSwitch(SwitchInst *I, LazyValueInfo *LVI,
   { // Scope for SwitchInstProfUpdateWrapper. It must not live during
     // ConstantFoldTerminator() as the underlying SwitchInst can be changed.
     SwitchInstProfUpdateWrapper SI(*I);
+    unsigned ReachableCaseCount = 0;
 
     for (auto CI = SI->case_begin(), CE = SI->case_end(); CI != CE;) {
       ConstantInt *Case = CI->getCaseValue();
@@ -407,6 +408,35 @@ static bool processSwitch(SwitchInst *I, LazyValueInfo *LVI,
 
       // Increment the case iterator since we didn't delete it.
       ++CI;
+      ++ReachableCaseCount;
+    }
+
+    BasicBlock *DefaultDest = SI->getDefaultDest();
+    if (!isa<UnreachableInst>(DefaultDest->getFirstNonPHIOrDbg())) {
+      ConstantRange CR = LVI->getConstantRangeAtUse(I->getOperandUse(0),
+                                                    /*UndefAllowed*/ false);
+      // The default dest is unreachable if all cases are covered.
+      if (!CR.isSizeLargerThan(ReachableCaseCount)) {
+        BasicBlock *NewUnreachableBB =
+            BasicBlock::Create(BB->getContext(), "default.unreachable",
+                               BB->getParent(), DefaultDest);
+        new UnreachableInst(BB->getContext(), NewUnreachableBB);
+
+        bool RemoveOldBB = --SuccessorsCount[DefaultDest] == 0;
+
+        if (RemoveOldBB)
+          DefaultDest->removePredecessor(BB);
+        SI->setDefaultDest(NewUnreachableBB);
+
+        if (RemoveOldBB)
+          DTU.applyUpdatesPermissive(
+              {{DominatorTree::Delete, BB, DefaultDest}});
+        DTU.applyUpdatesPermissive(
+            {{DominatorTree::Insert, BB, NewUnreachableBB}});
+
+        ++NumDeadCases;
+        Changed = true;
+      }
     }
   }
 
@@ -1227,6 +1257,7 @@ CorrelatedValuePropagationPass::run(Function &F, FunctionAnalysisManager &AM) {
   if (!Changed) {
     PA = PreservedAnalyses::all();
   } else {
+    assert(DT->verify());
     PA.preserve<DominatorTreeAnalysis>();
     PA.preserve<LazyValueAnalysis>();
   }
diff --git a/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll b/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll
index 6227a5c822b10..9941d4c070d39 100644
--- a/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll
+++ b/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll
@@ -442,7 +442,7 @@ define i32 @switch_range(i32 %cond) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[S:%.*]] = urem i32 [[COND:%.*]], 3
 ; CHECK-NEXT:    [[S1:%.*]] = add nuw nsw i32 [[S]], 1
-; CHECK-NEXT:    switch i32 [[S1]], label [[UNREACHABLE:%.*]] [
+; CHECK-NEXT:    switch i32 [[S1]], label [[DEFAULT_UNREACHABLE:%.*]] [
 ; CHECK-NEXT:      i32 1, label [[EXIT1:%.*]]
 ; CHECK-NEXT:      i32 2, label [[EXIT2:%.*]]
 ; CHECK-NEXT:      i32 3, label [[EXIT1]]
@@ -451,6 +451,8 @@ define i32 @switch_range(i32 %cond) {
 ; CHECK-NEXT:    ret i32 1
 ; CHECK:       exit2:
 ; CHECK-NEXT:    ret i32 2
+; CHECK:       default.unreachable:
+; CHECK-NEXT:    unreachable
 ; CHECK:       unreachable:
 ; CHECK-NEXT:    ret i32 0
 ;
@@ -513,10 +515,9 @@ define i8 @switch_defaultdest_multipleuse(i8 %t0) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[O:%.*]] = or i8 [[T0:%.*]], 1
 ; CHECK-NEXT:    [[R:%.*]] = srem i8 1, [[O]]
-; CHECK-NEXT:    switch i8 [[R]], label [[EXIT:%.*]] [
-; CHECK-NEXT:      i8 0, label [[EXIT]]
-; CHECK-NEXT:      i8 1, label [[EXIT]]
-; CHECK-NEXT:    ]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       default.unreachable:
+; CHECK-NEXT:    unreachable
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret i8 0
 ;
diff --git a/llvm/test/Transforms/CorrelatedValuePropagation/switch.ll b/llvm/test/Transforms/CorrelatedValuePropagation/switch.ll
new file mode 100644
index 0000000000000..4d396b5422401
--- /dev/null
+++ b/llvm/test/Transforms/CorrelatedValuePropagation/switch.ll
@@ -0,0 +1,299 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -passes=correlated-propagation -S | FileCheck %s
+
+define i32 @test_unreachable_default(i32 noundef %num) {
+; CHECK-LABEL: define i32 @test_unreachable_default(
+; CHECK-SAME: i32 noundef [[NUM:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[NUM]], -120
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[SUB]], 3
+; CHECK-NEXT:    [[COND:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 2
+; CHECK-NEXT:    switch i32 [[COND]], label [[DEFAULT_UNREACHABLE:%.*]] [
+; CHECK-NEXT:      i32 0, label [[SW_BB:%.*]]
+; CHECK-NEXT:      i32 1, label [[SW_BB2:%.*]]
+; CHECK-NEXT:      i32 2, label [[SW_BB4:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @call0()
+; CHECK-NEXT:    br label [[CLEANUP:%.*]]
+; CHECK:       sw.bb2:
+; CHECK-NEXT:    [[CALL3:%.*]] = call i32 @call1()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       sw.bb4:
+; CHECK-NEXT:    [[CALL5:%.*]] = call i32 @call2()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       default.unreachable:
+; CHECK-NEXT:    unreachable
+; CHECK:       sw.default:
+; CHECK-NEXT:    [[CALL6:%.*]] = call i32 @call3()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       cleanup:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ [[CALL6]], [[SW_DEFAULT:%.*]] ], [ [[CALL5]], [[SW_BB4]] ], [ [[CALL3]], [[SW_BB2]] ], [ [[CALL]], [[SW_BB]] ]
+; CHECK-NEXT:    ret i32 [[RETVAL_0]]
+;
+entry:
+  %sub = add i32 %num, -120
+  %cmp = icmp ult i32 %sub, 3
+  %cond = select i1 %cmp, i32 %sub, i32 2
+  switch i32 %cond, label %sw.default [
+  i32 0, label %sw.bb
+  i32 1, label %sw.bb2
+  i32 2, label %sw.bb4
+  ]
+
+sw.bb:
+  %call = call i32 @call0()
+  br label %cleanup
+
+sw.bb2:
+  %call3 = call i32 @call1()
+  br label %cleanup
+
+sw.bb4:
+  %call5 = call i32 @call2()
+  br label %cleanup
+
+sw.default:
+  %call6 = call i32 @call3()
+  br label %cleanup
+
+cleanup:
+  %retval.0 = phi i32 [ %call6, %sw.default ], [ %call5, %sw.bb4 ], [ %call3, %sw.bb2 ], [ %call, %sw.bb ]
+  ret i32 %retval.0
+}
+
+define i32 @test_unreachable_default_shared_edge(i32 noundef %num) {
+; CHECK-LABEL: define i32 @test_unreachable_default_shared_edge(
+; CHECK-SAME: i32 noundef [[NUM:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[NUM]], -120
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[SUB]], 3
+; CHECK-NEXT:    [[COND:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 2
+; CHECK-NEXT:    switch i32 [[COND]], label [[DEFAULT_UNREACHABLE:%.*]] [
+; CHECK-NEXT:      i32 0, label [[SW_BB:%.*]]
+; CHECK-NEXT:      i32 1, label [[SW_BB2:%.*]]
+; CHECK-NEXT:      i32 2, label [[SW_BB4:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @call0()
+; CHECK-NEXT:    br label [[CLEANUP:%.*]]
+; CHECK:       sw.bb2:
+; CHECK-NEXT:    [[CALL3:%.*]] = call i32 @call1()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       default.unreachable:
+; CHECK-NEXT:    unreachable
+; CHECK:       sw.bb4:
+; CHECK-NEXT:    [[CALL5:%.*]] = call i32 @call2()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       cleanup:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ [[CALL5]], [[SW_BB4]] ], [ [[CALL3]], [[SW_BB2]] ], [ [[CALL]], [[SW_BB]] ]
+; CHECK-NEXT:    ret i32 [[RETVAL_0]]
+;
+entry:
+  %sub = add i32 %num, -120
+  %cmp = icmp ult i32 %sub, 3
+  %cond = select i1 %cmp, i32 %sub, i32 2
+  switch i32 %cond, label %sw.bb4 [
+  i32 0, label %sw.bb
+  i32 1, label %sw.bb2
+  i32 2, label %sw.bb4
+  ]
+
+sw.bb:
+  %call = call i32 @call0()
+  br label %cleanup
+
+sw.bb2:
+  %call3 = call i32 @call1()
+  br label %cleanup
+
+sw.bb4:
+  %call5 = call i32 @call2()
+  br label %cleanup
+
+cleanup:
+  %retval.0 = phi i32 [ %call5, %sw.bb4 ], [ %call3, %sw.bb2 ], [ %call, %sw.bb ]
+  ret i32 %retval.0
+}
+
+; Negative tests
+
+define i32 @test_reachable_default(i32 noundef %num) {
+; CHECK-LABEL: define i32 @test_reachable_default(
+; CHECK-SAME: i32 noundef [[NUM:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[NUM]], -120
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[SUB]], 3
+; CHECK-NEXT:    [[COND:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 4
+; CHECK-NEXT:    switch i32 [[COND]], label [[SW_DEFAULT:%.*]] [
+; CHECK-NEXT:      i32 0, label [[SW_BB:%.*]]
+; CHECK-NEXT:      i32 1, label [[SW_BB2:%.*]]
+; CHECK-NEXT:      i32 2, label [[SW_BB4:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @call0()
+; CHECK-NEXT:    br label [[CLEANUP:%.*]]
+; CHECK:       sw.bb2:
+; CHECK-NEXT:    [[CALL3:%.*]] = call i32 @call1()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       sw.bb4:
+; CHECK-NEXT:    [[CALL5:%.*]] = call i32 @call2()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       sw.default:
+; CHECK-NEXT:    [[CALL6:%.*]] = call i32 @call3()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       cleanup:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ [[CALL6]], [[SW_DEFAULT]] ], [ [[CALL5]], [[SW_BB4]] ], [ [[CALL3]], [[SW_BB2]] ], [ [[CALL]], [[SW_BB]] ]
+; CHECK-NEXT:    ret i32 [[RETVAL_0]]
+;
+entry:
+  %sub = add i32 %num, -120
+  %cmp = icmp ult i32 %sub, 3
+  %cond = select i1 %cmp, i32 %sub, i32 4
+  switch i32 %cond, label %sw.default [
+  i32 0, label %sw.bb
+  i32 1, label %sw.bb2
+  i32 2, label %sw.bb4
+  ]
+
+sw.bb:
+  %call = call i32 @call0()
+  br label %cleanup
+
+sw.bb2:
+  %call3 = call i32 @call1()
+  br label %cleanup
+
+sw.bb4:
+  %call5 = call i32 @call2()
+  br label %cleanup
+
+sw.default:
+  %call6 = call i32 @call3()
+  br label %cleanup
+
+cleanup:
+  %retval.0 = phi i32 [ %call6, %sw.default ], [ %call5, %sw.bb4 ], [ %call3, %sw.bb2 ], [ %call, %sw.bb ]
+  ret i32 %retval.0
+}
+
+define i32 @test_unreachable_default_cond_may_be_undef(i32 %num) {
+; CHECK-LABEL: define i32 @test_unreachable_default_cond_may_be_undef(
+; CHECK-SAME: i32 [[NUM:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[NUM]], -120
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[SUB]], 3
+; CHECK-NEXT:    [[COND:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 2
+; CHECK-NEXT:    switch i32 [[COND]], label [[SW_DEFAULT:%.*]] [
+; CHECK-NEXT:      i32 0, label [[SW_BB:%.*]]
+; CHECK-NEXT:      i32 1, label [[SW_BB2:%.*]]
+; CHECK-NEXT:      i32 2, label [[SW_BB4:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @call0()
+; CHECK-NEXT:    br label [[CLEANUP:%.*]]
+; CHECK:       sw.bb2:
+; CHECK-NEXT:    [[CALL3:%.*]] = call i32 @call1()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       sw.bb4:
+; CHECK-NEXT:    [[CALL5:%.*]] = call i32 @call2()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       sw.default:
+; CHECK-NEXT:    [[CALL6:%.*]] = call i32 @call3()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       cleanup:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ [[CALL6]], [[SW_DEFAULT]] ], [ [[CALL5]], [[SW_BB4]] ], [ [[CALL3]], [[SW_BB2]] ], [ [[CALL]], [[SW_BB]] ]
+; CHECK-NEXT:    ret i32 [[RETVAL_0]]
+;
+entry:
+  %sub = add i32 %num, -120
+  %cmp = icmp ult i32 %sub, 3
+  %cond = select i1 %cmp, i32 %sub, i32 2
+  switch i32 %cond, label %sw.default [
+  i32 0, label %sw.bb
+  i32 1, label %sw.bb2
+  i32 2, label %sw.bb4
+  ]
+
+sw.bb:
+  %call = call i32 @call0()
+  br label %cleanup
+
+sw.bb2:
+  %call3 = call i32 @call1()
+  br label %cleanup
+
+sw.bb4:
+  %call5 = call i32 @call2()
+  br label %cleanup
+
+sw.default:
+  %call6 = call i32 @call3()
+  br label %cleanup
+
+cleanup:
+  %retval.0 = phi i32 [ %call6, %sw.default ], [ %call5, %sw.bb4 ], [ %call3, %sw.bb2 ], [ %call, %sw.bb ]
+  ret i32 %retval.0
+}
+
+define i32 @test_default_is_already_unreachable(i32 %num) {
+; CHECK-LABEL: define i32 @test_default_is_already_unreachable(
+; CHECK-SAME: i32 [[NUM:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[NUM]], -120
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[SUB]], 3
+; CHECK-NEXT:    [[COND:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 2
+; CHECK-NEXT:    switch i32 [[COND]], label [[SW_DEFAULT:%.*]] [
+; CHECK-NEXT:      i32 0, label [[SW_BB:%.*]]
+; CHECK-NEXT:      i32 1, label [[SW_BB2:%.*]]
+; CHECK-NEXT:      i32 2, label [[SW_BB4:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @call0()
+; CHECK-NEXT:    br label [[CLEANUP:%.*]]
+; CHECK:       sw.bb2:
+; CHECK-NEXT:    [[CALL3:%.*]] = call i32 @call1()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       sw.bb4:
+; CHECK-NEXT:    [[CALL5:%.*]] = call i32 @call2()
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       sw.default:
+; CHECK-NEXT:    unreachable
+; CHECK:       cleanup:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ [[CALL5]], [[SW_BB4]] ], [ [[CALL3]], [[SW_BB2]] ], [ [[CALL]], [[SW_BB]] ]
+; CHECK-NEXT:    ret i32 [[RETVAL_0]]
+;
+entry:
+  %sub = add i32 %num, -120
+  %cmp = icmp ult i32 %sub, 3
+  %cond = select i1 %cmp, i32 %sub, i32 2
+  switch i32 %cond, label %sw.default [
+  i32 0, label %sw.bb
+  i32 1, label %sw.bb2
+  i32 2, label %sw.bb4
+  ]
+
+sw.bb:
+  %call = call i32 @call0()
+  br label %cleanup
+
+sw.bb2:
+  %call3 = call i32 @call1()
+  br label %cleanup
+
+sw.bb4:
+  %call5 = call i32 @call2()
+  br label %cleanup
+
+sw.default:
+  unreachable
+
+cleanup:
+  %retval.0 = phi i32 [ %call5, %sw.bb4 ], [ %call3, %sw.bb2 ], [ %call, %sw.bb ]
+  ret i32 %retval.0
+}
+
+declare i32 @call0()
+declare i32 @call1()
+declare i32 @call2()
+declare i32 @call3()

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jan 30, 2024
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@XChy XChy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dtcxzyw dtcxzyw merged commit a034e65 into llvm:main Jan 31, 2024
3 of 4 checks passed
@dtcxzyw dtcxzyw deleted the perf/cvp-unreachable-default branch January 31, 2024 05:11
@eaeltsin
Copy link
Contributor

eaeltsin commented Feb 9, 2024

Heads-up - this might be the reason for the significant increase of generated protocol buffers compilation times (~5x).

We are looking for a reproducer, but maybe you have ideas right away?

@alexfh
Copy link
Contributor

alexfh commented Feb 9, 2024

Well, actually it's not a 5x increase, it's an increase from ~7s to "I got bored to wait" minutes (thus, either a superlinear from code size or an infinite loop). I ran clang under perf, and it seems to spend a lot of time in:

        __libc_start_main                                                                                                                                                                                                                           ▒        main                                                                                                                                                                                                                                        ▒        clang_main(int, char**, llvm::ToolContext const&)                                                                                                                                                                                           ▒        clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__u::pair<int, clang::driver::Command const*> >&)                                                                                         ▒        clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__u::pair<int, clang::driver::Command const*> >&, bool) const                                                                             ▒        clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const                                                                                                                       ▒        clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__u::optional<llvm::StringRef> >, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> >*, bool*) const                                               ▒        llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>)                                                                                                                                                                          ▒        void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__u::optional<llvm::StringRef> >, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> >*, bool*) const:▒        ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&)                                                                                                                                                               ▒        cc1_main(llvm::ArrayRef<char const*>, char const*, void*)                                                                                                                                                                                   ▒        clang::ExecuteCompilerInvocation(clang::CompilerInstance*)                                                                                                                                                                                  ▒        clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)                                                                                                                                                                              ▒        clang::FrontendAction::Execute()                                                                                                                                                                                                            ▒
        clang::ParseAST(clang::Sema&, bool, bool)                                                                                                                                                                                                   ▒
        clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&)                                                                                                                                                                           ▒
        clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::Int▒
        llvm::legacy::PassManagerImpl::run(llvm::Module&)                                                                                                                                                                                           ▒
        llvm::FPPassManager::runOnModule(llvm::Module&)                                                                                                                                                                                             ▒
        llvm::FPPassManager::runOnFunction(llvm::Function&)                                                                                                                                                                                         ▒
      - llvm::MachineFunctionPass::runOnFunction(llvm::Function&)                                                                                                                                                                                   ▒
         - 45.61% (anonymous namespace)::TailDuplicateBase::runOnMachineFunction(llvm::MachineFunction&)                                                                                                                                            ▒
              llvm::TailDuplicator::tailDuplicateBlocks()                                                                                                                                                                                           ▒
              llvm::TailDuplicator::shouldTailDuplicate(bool, llvm::MachineBasicBlock&)                                                                                                                                                             ▒

@nikic
Copy link
Contributor

nikic commented Feb 9, 2024

Probably another instance of #78582.

@alexfh
Copy link
Contributor

alexfh commented Feb 9, 2024

It finally finished after 1031s (increased from 7s before this patch).

@alexfh
Copy link
Contributor

alexfh commented Feb 9, 2024

A bit longer run under perf gives:

-   99.98%     0.00%  clang    clang               [.] clang_main(int, char**, llvm::ToolContext const&)                                                                                                                                            ▒
   - clang_main(int, char**, llvm::ToolContext const&)                                                                                                                                                                                              ▒
      - 99.98% clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__u::pair<int, clang::driver::Command const*> >&)                                                                                  ▒
           clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__u::pair<int, clang::driver::Command const*> >&, bool) const                                                                          ▒
           clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const                                                                                                                    ▒
           clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__u::optional<llvm::StringRef> >, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> >*, bool*) const                                            ▒
           llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>)                                                                                                                                                                       ▒
           void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__u::optional<llvm::StringRef> >, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> >*, bool*) con▒
         - ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&)                                                                                                                                                            ▒
            - 99.98% cc1_main(llvm::ArrayRef<char const*>, char const*, void*)                                                                                                                                                                      ▒
               - 99.98% clang::ExecuteCompilerInvocation(clang::CompilerInstance*)                                                                                                                                                                  ▒
                  - 99.98% clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)                                                                                                                                                           ▒
                     - 99.98% clang::FrontendAction::Execute()                                                                                                                                                                                      ▒
                        - clang::CodeGenAction::ExecuteAction()                                                                                                                                                                                     ▒
                           - 99.95% clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, cla▒
                              - 99.28% llvm::legacy::PassManagerImpl::run(llvm::Module&)                                                                                                                                                            ▒
                                 - 99.28% llvm::FPPassManager::runOnModule(llvm::Module&)                                                                                                                                                           ▒
                                    - 99.28% llvm::FPPassManager::runOnFunction(llvm::Function&)                                                                                                                                                    ▒
                                       - 99.17% llvm::MachineFunctionPass::runOnFunction(llvm::Function&)                                                                                                                                           ▒
                                          - 60.87% (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction(llvm::MachineFunction&)                                                                                                       ▒
                                             - 60.87% (anonymous namespace)::MachineBlockPlacement::buildCFGChains()                                                                                                                                ▒
                                                - 60.79% (anonymous namespace)::MachineBlockPlacement::buildLoopChains(llvm::MachineLoop const&)                                                                                                    ▒
                                                     55.20% llvm::MachineBranchProbabilityInfo::getEdgeProbability(llvm::MachineBasicBlock const*, llvm::MachineBasicBlock const*) const                                                            ▒
                                                   - 4.24% (anonymous namespace)::MachineBlockPlacement::TopFallThroughFreq(llvm::MachineBasicBlock const*, llvm::SmallSetVector<llvm::MachineBasicBlock const*, 16u> const&)                       ▒
                                                        0.80% llvm::MachineBasicBlock::getSuccProbability(std::__u::__wrap_iter<llvm::MachineBasicBlock* const*>) const                                                                             ▒
                                                     1.22% llvm::MachineBasicBlock::getSuccProbability(std::__u::__wrap_iter<llvm::MachineBasicBlock* const*>) const                                                                                ▒
                                          - 12.05% (anonymous namespace)::TailDuplicateBase::runOnMachineFunction(llvm::MachineFunction&)                                                                                                           ▒
                                             - 12.05% llvm::TailDuplicator::tailDuplicateBlocks()                                                                                                                                                   ▒
                                                  10.40% llvm::TailDuplicator::shouldTailDuplicate(bool, llvm::MachineBasicBlock&)                                                                                                                  ▒
                                                - 1.65% llvm::TailDuplicator::tailDuplicateAndUpdate(bool, llvm::MachineBasicBlock*, llvm::MachineBasicBlock*, llvm::SmallVectorImpl<llvm::MachineBasicBlock*>*, llvm::function_ref<void (llvm::Mach▒
                                                     0.91% llvm::MachineBasicBlock::isSuccessor(llvm::MachineBasicBlock const*) const                                                                                                               ▒
                                          - 8.89% (anonymous namespace)::BranchFolderPass::runOnMachineFunction(llvm::MachineFunction&)                                                                                                             ▒
                                             - llvm::BranchFolder::OptimizeFunction(llvm::MachineFunction&, llvm::TargetInstrInfo const*, llvm::TargetRegisterInfo const*, llvm::MachineLoopInfo*, bool)                                            ▒
                                                  4.09% llvm::MachineBasicBlock::hasEHPadSuccessor() const                                                                                                                                          ▒
                                                  3.11% llvm::MachineBasicBlock::mayHaveInlineAsmBr() const                                                                                                                                         ▒
                                                - 1.40% llvm::BranchFolder::OptimizeBranches(llvm::MachineFunction&)                                                                                                                                ▒
                                                   - 1.40% llvm::BranchFolder::OptimizeBlock(llvm::MachineBasicBlock*)                                                                                                                              ▒
                                                      + 1.35% llvm::MachineBasicBlock::canFallThrough()                                                                                                                                             ▒
                                          - 7.91% (anonymous namespace)::PHIElimination::runOnMachineFunction(llvm::MachineFunction&)                                                                                                               ▒
                                               7.34% llvm::LiveVariables::isLiveOut(llvm::Register, llvm::MachineBasicBlock const&)                                                                                                                 ▒
                                          - 5.95% llvm::MachineCycleInfoWrapperPass::runOnMachineFunction(llvm::MachineFunction&)                                                                                                                   ▒
                                               5.95% llvm::GenericCycleInfoCompute<llvm::GenericSSAContext<llvm::MachineFunction> >::run(llvm::MachineBasicBlock*)                                                                                  ▒
                                          + 0.93% llvm::MachineBlockFrequencyInfo::runOnMachineFunction(llvm::MachineFunction&)                                                                                                                     ▒
                                          + 0.76% (anonymous namespace)::MachineSinking::runOnMachineFunction(llvm::MachineFunction&)                                                                                                               ▒
                              + 0.67% (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >&, std::__u::unique_▒

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 9, 2024

It finally finished after 1031s (increased from 7s before this patch).

Could you please share the reproducer?

@alexfh
Copy link
Contributor

alexfh commented Feb 10, 2024

I've been trying to prepare a shareable test case, but so far I only have a few megabytes of C++ with internal code.

@DianQK
Copy link
Member

DianQK commented Feb 10, 2024

I've been trying to prepare a shareable test case, but so far I only have a few megabytes of C++ with internal code.

Could you check the difference between your code and #78578?

@alexfh
Copy link
Contributor

alexfh commented Feb 10, 2024

The original code is coming from a different file from what @dwblaikie used to prepare the reduced example in #76669 (comment).

@alexfh
Copy link
Contributor

alexfh commented Feb 10, 2024

I've got an example IR that shows the compile time regression (not as dramatic as 7->1000 s, but still around 25x increase):

$ ./clang-fast -O2 -Xclang=-ftime-report -c extracted.ll
...
===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 1.8437 seconds (1.8437 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.6917 (100.0%)   0.1520 (100.0%)   1.8437 (100.0%)   1.8437 (100.0%)  Clang front-end timer
   1.6917 (100.0%)   0.1520 (100.0%)   1.8437 (100.0%)   1.8437 (100.0%)  Total

$ ./clang-slow -O2 -Xclang=-ftime-report -c extracted.ll
...
===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  49.6733 (100.0%)   0.3947 (100.0%)  50.0680 (100.0%)  50.0719 (100.0%)  Code Generation Time
  49.6733 (100.0%)   0.3947 (100.0%)  50.0680 (100.0%)  50.0719 (100.0%)  Total
...
===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 49.7508 seconds (49.7547 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  15.4055 ( 31.2%)   0.0120 (  3.1%)  15.4174 ( 31.0%)  15.4187 ( 31.0%)  Eliminate PHI nodes for register allocation
  12.8087 ( 25.9%)   0.0000 (  0.0%)  12.8087 ( 25.7%)  12.8097 ( 25.7%)  Control Flow Optimizer
   8.8482 ( 17.9%)   0.0040 (  1.0%)   8.8522 ( 17.8%)   8.8529 ( 17.8%)  Machine Cycle Info Analysis
   2.9137 (  5.9%)   0.1394 ( 36.0%)   3.0531 (  6.1%)   3.0533 (  6.1%)  Early Tail Duplication
   2.4706 (  5.0%)   0.0000 (  0.0%)   2.4706 (  5.0%)   2.4707 (  5.0%)  Branch Probability Basic Block Placement
   1.2901 (  2.6%)   0.0000 (  0.0%)   1.2901 (  2.6%)   1.2902 (  2.6%)  Machine code sinking
...
===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 50.1103 seconds (50.1142 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  49.7156 (100.0%)   0.3947 (100.0%)  50.1103 (100.0%)  50.1142 (100.0%)  Clang front-end timer
  49.7156 (100.0%)   0.3947 (100.0%)  50.1103 (100.0%)  50.1142 (100.0%)  Total

extracted.zip

@alexfh
Copy link
Contributor

alexfh commented Feb 10, 2024

Hopefully, this helps understanding what's wrong. Please let me know if you have ideas on how to proceed or need a longer time to resolve the problem.

@DianQK
Copy link
Member

DianQK commented Feb 10, 2024

Hopefully, this helps understanding what's wrong. Please let me know if you have ideas on how to proceed or need a longer time to resolve the problem.

I looked at it briefly, and it seems to be a different example. That's useful. But I'm on vacation, so this is going to be a little slow. But with your IR, I think we should be close to an answer.

@alexfh
Copy link
Contributor

alexfh commented Feb 10, 2024

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

@DianQK
Copy link
Member

DianQK commented Feb 10, 2024

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

When I'm on vacation, I usually do something in the evening. If you want to revert it, you'd better ask @dtcxzyw. @dtcxzyw should be more familiar with the backend than I am. Perhaps he would like to look into this as well (but I'm sure he's on vacation as well). Happy too! :3

@DianQK
Copy link
Member

DianQK commented Feb 10, 2024

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

I'll add you to #78582, and I'll update the follow-up here. The similarity between the two IRs is that the replicated BBs have many predecessors and successors. It's just that this IR is represented as a loop. (This number has also increased from 128 to 1547.)

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 10, 2024

I fixed the regression by changing the default case from %default.unreachable to another BB :(

@DianQK
Copy link
Member

DianQK commented Feb 10, 2024

I fixed the regression by changing the default case from %default.unreachable to another BB :(

An “arbitrarily meaningless” BB?

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 10, 2024

I fixed the regression by changing the default case from %default.unreachable to another BB :(

An “arbitrarily meaningless” BB?

Yeah :)

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 10, 2024

I confirmed that the regression can be fixed by #78582.

@DianQK
Copy link
Member

DianQK commented Feb 10, 2024

I fixed the regression by changing the default case from %default.unreachable to another BB :(

An “arbitrarily meaningless” BB?

Yeah :)

I have considered a similar approach in my optimization PR. I'm pretty sure this has to be the wrong approach. Since unreachable provides more facts, this will definitely help with the final machine code. Also see #78582 (comment).

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 10, 2024

I fixed the regression by changing the default case from %default.unreachable to another BB :(

An “arbitrarily meaningless” BB?

Yeah :)

I have considered a similar approach in my optimization PR. I'm pretty sure this has to be the wrong approach. Since unreachable provides more facts, this will definitely help with the final machine code. Also see #78582 (comment).

I'm just saying that the key to this problem is the unreachable default case.

@DianQK
Copy link
Member

DianQK commented Feb 10, 2024

I fixed the regression by changing the default case from %default.unreachable to another BB :(

An “arbitrarily meaningless” BB?

Yeah :)

I have considered a similar approach in my optimization PR. I'm pretty sure this has to be the wrong approach. Since unreachable provides more facts, this will definitely help with the final machine code. Also see #78582 (comment).

I'm just saying that the key to this problem is the unreachable default case.

I get it. Please ignore me. :3

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 10, 2024

I found this PR does the same thing as https://reviews.llvm.org/D106056.

@eaeltsin
Copy link
Contributor

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

Hi, this is causing quite a significant number of failures in our internal testing - many protocol buffer generated files are timing out. We really appreciate it if you submit the fix soon or revert this until the fix is ready. Thanks!

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 13, 2024

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

Hi, this is causing quite a significant number of failures in our internal testing - many protocol buffer generated files are timing out. We really appreciate it if you submit the fix soon or revert this until the fix is ready. Thanks!

Potential fix: #78582

@eaeltsin
Copy link
Contributor

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

Hi, this is causing quite a significant number of failures in our internal testing - many protocol buffer generated files are timing out. We really appreciate it if you submit the fix soon or revert this until the fix is ready. Thanks!

Potential fix: #78582

Yes, I see, but reading through comments, I don't understand how close it is to being submitted. And we really need to mitigate the issue soon.

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Feb 13, 2024

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

Hi, this is causing quite a significant number of failures in our internal testing - many protocol buffer generated files are timing out. We really appreciate it if you submit the fix soon or revert this until the fix is ready. Thanks!

Potential fix: #78582

Yes, I see, but reading through comments, I don't understand how close it is to being submitted. And we really need to mitigate the issue soon.

That is fair. I will revert it today.

dtcxzyw added a commit to dtcxzyw/llvm-project that referenced this pull request Feb 13, 2024
dtcxzyw added a commit that referenced this pull request Feb 13, 2024
…81585)

This reverts commit a034e65.

Some protobuf users reported that this patch caused a significant
compile-time regression because `TailDuplicator` works poorly with a
specific pattern.

We will reland it once the codegen issue is fixed.
@eaeltsin
Copy link
Contributor

Thanks for looking! I'd say if it's not convenient to prepare a fix by around Tuesday, it would likely be better to revert (I can take care of this) until you're back from the holidays. Happy New Year, btw! ;)

Hi, this is causing quite a significant number of failures in our internal testing - many protocol buffer generated files are timing out. We really appreciate it if you submit the fix soon or revert this until the fix is ready. Thanks!

Potential fix: #78582

Yes, I see, but reading through comments, I don't understand how close it is to being submitted. And we really need to mitigate the issue soon.

That is fair. I will revert it today.

Thank you! We appreciate it.

@topperc
Copy link
Collaborator

topperc commented Feb 16, 2024

I'm also seeing an issue from this patch. I have a switch with all values 0-126 and a default. All 128 cases jump to basic blocks that contain a branch to a common merge point. The common merge point contains a phi that picks a constant based on how it was reached. I believe this constant is the original switch value + 5. Except for the default case which is 0. This makes it a candidate for LinearMapKind in SimplifyCFG's SwitchLookupTable::BuildLookup.

This patch determines the default case is unreachable. Somehow that allowed jump threading to thread case 126. This prevented SimplifyCFG from doing the lookup table conversion.

Correction, it turns out I had modified JumpThreading in my downstream to prevent it from threading this kind of lookup table. I guess the unreachable broke my modified JumpThreading. So this isn't a regression upstream.

@dtcxzyw dtcxzyw self-assigned this Mar 13, 2024
@DianQK
Copy link
Member

DianQK commented May 25, 2024

#78582 should have unlocked this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants