Reenable JumpTableToSwitch pass by default #83229

Open · wants to merge 1 commit into main

Conversation

alexander-shaposhnikov
Collaborator

This is a recommit of commit 1069823 (#82546) with the default value of JumpTableSizeThreshold decreased from 10 to 5. A comment with the supporting data points has been added to JumpTableToSwitch.cpp. A sketch of the transformation the pass performs follows the test plan below.

Test plan:

  1. ninja check-all
  2. Built LLVM (with LLVM_ENABLE_PROJECTS="bolt;mlir;compiler-rt;flang;llvm;clang;lld;clang-tools-extra") with the bootstrapped version of Clang.
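
For readers unfamiliar with the pass, here is a minimal C++-level sketch of the transformation, not taken from the patch: the pass itself operates on LLVM IR, and all names below are hypothetical.

```cpp
#include <cstdio>

static int f0() { return 10; }
static int f1() { return 11; }
static int f2() { return 12; }

// Before: an indirect call through a constant table of function pointers
// (the "jump table" pattern the pass looks for).
static int (*const kTable[])() = {f0, f1, f2};
int call_via_table(unsigned idx) { return kTable[idx](); }

// After: conceptually, the pass rewrites the indirect call into a switch
// over direct calls, which later passes (notably the inliner) can then
// optimize further.
int call_via_switch(unsigned idx) {
  switch (idx) {
  case 0: return f0();
  case 1: return f1();
  case 2: return f2();
  default: __builtin_unreachable();
  }
}

int main() {
  std::printf("%d %d\n", call_via_table(1), call_via_switch(2));
  return 0;
}
```

The pass can be toggled and tuned with the -enable-jump-table-to-switch and -jump-table-to-switch-size-threshold options shown in the diff below (forwarded via -mllvm when invoking clang).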

@llvmbot
Collaborator

llvmbot commented Feb 28, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Alexander Shaposhnikov (alexander-shaposhnikov)


Full diff: https://github.com/llvm/llvm-project/pull/83229.diff

9 Files Affected:

  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+1-1)
  • (modified) llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp (+9-1)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+1-5)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+1)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 142bd50b3798e0..17b55b63ac03cf 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -247,7 +247,7 @@ static cl::opt<bool>
 
 static cl::opt<bool> EnableJumpTableToSwitch(
     "enable-jump-table-to-switch",
-    cl::desc("Enable JumpTableToSwitch pass (default = off)"));
+    cl::desc("Enable JumpTableToSwitch pass (default = on)"), cl::init(true));
 
 // This option is used in simplifying testing SampleFDO optimizations for
 // profile loading.
diff --git a/llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp b/llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp
index f9712dbbf68a6a..5ff7836f98ab6c 100644
--- a/llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp
+++ b/llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp
@@ -18,11 +18,19 @@
 
 using namespace llvm;
 
+// The choice of this threshold is motivated by the following data points:
+// 1. Chromium successfully builds with JumpTableSizeThreshold=10, and this
+// build includes many third-party C++ projects.
+// 2. Flang experiences several compilation slow-down issues with
+// JumpTableSizeThreshold=9, builds successfully with JumpTableSizeThreshold=8.
+// On several investigated pathological examples the expansion of jump tables
+// introduced more than 100K new callsites where inlining happened.
+// 3. TODO: Consider increasing this value.
 static cl::opt<unsigned>
     JumpTableSizeThreshold("jump-table-to-switch-size-threshold", cl::Hidden,
                            cl::desc("Only split jump tables with size less or "
                                     "equal than JumpTableSizeThreshold."),
-                           cl::init(10));
+                           cl::init(5));
 
 // TODO: Consider adding a cost model for profitability analysis of this
 // transformation. Currently we replace a jump table with a switch if all the
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index 51fb93daa4dfa6..285077ff8e31a9 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -71,10 +71,6 @@
 ; RUN:     -passes='default<O3>' -S  %s 2>&1 \
 ; RUN:     | FileCheck %s --check-prefixes=CHECK-O,CHECK-DEFAULT,CHECK-O3,%llvmcheckext,CHECK-EP-OPTIMIZER-LAST,CHECK-O23SZ
 
-; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
-; RUN:     -passes='default<O3>' -enable-jump-table-to-switch -S  %s 2>&1 \
-; RUN:     | FileCheck %s --check-prefixes=CHECK-O,CHECK-DEFAULT,CHECK-O3,CHECK-JUMP-TABLE-TO-SWITCH,CHECK-O23SZ,%llvmcheckext
-
 ; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
 ; RUN:     -passes='default<O3>' -enable-matrix -S  %s 2>&1 \
 ; RUN:     | FileCheck %s --check-prefixes=CHECK-O,CHECK-DEFAULT,CHECK-O3,CHECK-O23SZ,%llvmcheckext,CHECK-MATRIX
@@ -155,7 +151,7 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
-; CHECK-JUMP-TABLE-TO-SWITCH-NEXT: Running pass: JumpTableToSwitchPass
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 064362eabbf839..29a4d79037427e 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -90,6 +90,7 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 19a44867e434ac..bf06782c86f862 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -78,6 +78,7 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index ac80a31d8fd4bc..0cc61121de01ce 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -86,6 +86,7 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index 6486639e07b49c..0e5839797afe9e 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -121,6 +121,7 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index 09f9f0f48baddb..68c2e581463000 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -118,6 +118,7 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis on foo
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 47bdbfd2d357d4..8311a009711d1e 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -90,6 +90,7 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass

@DavidSpickett
Collaborator

DavidSpickett commented Feb 28, 2024

Edit: I forgot to rebuild, ignore me :)

@DavidSpickett
Collaborator

Remembering to rebuild this time, I tried a few thresholds:

  • 5: takes at least half an hour on a single file; I killed it before it could finish, since that is going to break the 20-minute timeout anyway.
  • 4: same.
  • 3: the longest time on a single file is around 10 minutes, and the build completes in a reasonable time.

I'm not sure a jump table size of 3 is even worth looking at, but it's what tames the build.

For context, this is happening on a server with 160 Ampere Altra (Neoverse N1) cores, so it can do plenty of work while waiting for flang files to compile, which helps the overall build time. We may also be bumping up against single-core performance; perhaps that is why you were able to use threshold 5. It does not seem like a good idea to introduce something that even high-end hardware struggles with, even if Flang's design might not be ideal.

If there is a Google codebase that looks roughly like flang but does not have these problems, perhaps we can compare and refactor flang to match, but that's a whole other task.

Again, the bot's config is available in the build logs (https://lab.llvm.org/buildbot/#/builders/179/builds/9480) if you want to see whether there's some flag that is making this worse, but in general it's going to be what's in the flang sources, as you've already seen.

@fhahn
Contributor

fhahn commented Feb 28, 2024

Remembering to rebuild this time, I tried a few thresholds:

  • 5: takes at least half an hour on a single file; I killed it before it could finish, since that is going to break the 20-minute timeout anyway.
  • 4: same.
  • 3: the longest time on a single file is around 10 minutes, and the build completes in a reasonable time.

I'm not sure a jump table size of 3 is even worth looking at, but it's what tames the build.

How do the build times change relative to without the pass being enabled, i.e., what does the pass contribute to the overall compile time?

It does not seem like a good idea to introduce something that even high end hardware struggles with, even if Flang's design might not be ideal.

yeah, I don't think we should enable something by default that causes excessive compile-time increases for valid input.

@DavidSpickett
Collaborator

How do the build times change relative to without the pass being enabled, i.e., what does the pass contribute to the overall compile time?

I'll get pass on/off and various threshold times for one of the large pre-processed files.

@DavidSpickett
Collaborator

DavidSpickett commented Feb 28, 2024

With the pass disabled, or JumpTableSizeThreshold at 8 or below, compiling the file from #82546 (comment) takes between 1 and 2 minutes.

With JumpTableSizeThreshold=9 it has, so far, taken 40 minutes. I'll leave it running and see what the final time is.

That is not exactly a nuanced result, but there it is. Perhaps that file only has tables of a certain size; that would certainly explain why lowering the threshold to 5 got further into the build but then got stuck on another file that probably has a different distribution of sizes.

llvm-project/flang/lib/Semantics/check-do-forall.cpp and llvm-project/flang/lib/Lower/ConvertExpr.cpp are two such later files I saw.

@fhahn
Contributor

fhahn commented Feb 28, 2024

@DavidSpickett thanks for confirming. Before adjusting the threshold, I think it would be good to make sure there are no compile-time explosions. Seems like the flang codebase is great at uncovering those in this particular case.

@DavidSpickett
Collaborator

The final time for compiling the reproducer with JumpTableSizeThreshold=9 was 1h52m17s.

If you make reproducers from the problematic files, I am happy to try them on our hardware if that helps.

Also, if you find anything that Flang could do to reduce its compile time (regardless of this pass), I have colleagues working on it who might be able to implement that. We're probably stuck with the overall design, but perhaps there are some small things we can do to help the compiler.

@alexander-shaposhnikov
Collaborator Author

@DavidSpickett - would you be so kind as to try building Flang from the latest source on your machine and see how long it takes? Use a bootstrapped version of Clang with the optimization enabled (this needs cl::init(true) added to EnableJumpTableToSwitch), but leave the threshold unchanged.

@DavidSpickett
Collaborator

clang 15 (release package) building -> clang 19 (88a733f with cl::init(true), release with asserts) building -> flang (release)

It took around 6.5 hours before I killed it. The same build without enabling the pass takes 15 minutes.

@alexander-shaposhnikov
Collaborator Author

alexander-shaposhnikov commented Apr 26, 2024

@DavidSpickett - thanks, just want to clarify: is flang (release) = the current trunk?
(the changes in Flang landed yesterday)

P.S. What would be the best way for me to reproduce this (if any)? I.e., is there any way to trigger this particular buildbot before the commit lands?

@nikic
Contributor

nikic commented Apr 29, 2024

FWIW, I think that even if the specific flang issue is no longer present due to source changes in flang, we shouldn't re-enable this pass without some analysis of why that compile-time explosion happened in LLVM terms, and a mitigation to avoid it -- beyond just lowering the threshold to dodge specific examples. It seems plausible that it could still happen in other cases with the lowered threshold.

@alexander-shaposhnikov
Collaborator Author

alexander-shaposhnikov commented Apr 29, 2024

@nikic - I've looked (just a bit) into what's going on (I will probably look more thoroughly when I have time; currently I'm overloaded with other work).
IIRC, the problematic examples contained ~10K+ jump tables, which got expanded into ~100K+ calls; from there the compilation slowed down considerably (in particular, inlining took a significant amount of time). The pass itself was fast.

  • It'd be useful to better understand why, e.g., this issue does or does not happen with clang + libstdc++ / gcc + libstdc++. Conceptually, the transformation performed by the pass can be done at the source level, and the LLVM IR for the report above (after optimization) should be somewhat similar to what one would get from libstdc++.
    I don't have an answer to this question. One thing that came up: with gcc, flang has been using common::visit instead of std::visit, and this potentially might explain the difference (I wanted to clarify this bit in the comment above; it's still unclear to me). See the sketch of the std::visit pattern after this list.
  • If the combination clang + libstdc++ works with common::visit replaced by std::visit, this might hint that there is a better place in the pipeline for this transformation. If it doesn't, then perhaps the input is somewhat pathological.
  • I guess another way to attack the problem would be to try to change libc++'s std::variant/std::visit. The optimization is still useful in that case (because the pattern occurs outside of libc++ too), but std::variant/std::visit are responsible for a very large portion of it.
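
To make the std::visit connection concrete, here is a self-contained sketch of the kind of pattern at play. The types and names are hypothetical; the assumption (consistent with the discussion above, but not verified here) is that libc++-style std::visit dispatches through a constant table of function pointers, one per alternative, which is exactly the shape JumpTableToSwitch expands.

```cpp
#include <cstdio>
#include <type_traits>
#include <variant>

// A miniature stand-in for the variant-heavy node types in flang.
using Node = std::variant<int, double, const char *>;

// std::visit over Node typically dispatches through a generated table of
// function pointers indexed by the variant's active alternative.
// JumpTableToSwitch turns that indirect call into a switch of direct
// calls; per the report above, with ~10K+ such visit sites this created
// ~100K+ new direct callsites for the inliner to chew on.
int classify(const Node &n) {
  return std::visit(
      [](const auto &v) -> int {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, int>)
          return 0;
        else if constexpr (std::is_same_v<T, double>)
          return 1;
        else
          return 2;
      },
      n);
}

int main() {
  Node a = 42, b = 3.14, c = "fir";
  std::printf("%d %d %d\n", classify(a), classify(b), classify(c));
  return 0;
}
```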

@DavidSpickett
Collaborator

@DavidSpickett - thanks, just want to clarify: is flang (release) = the current trunk?
(the changes in Flang landed yesterday)

Yes, it was 88a733f for clang, and then I used the same commit for flang.
In case I was too early for those changes, I'll try another build overnight.

You mentioned libstdc++ vs. libcxx; this build is using libstdc++. We do have a builder that uses libcxx, but it's single stage.

P.S. What would be the best way for me to reproduce this (if any)? I.e., is there any way to trigger this particular buildbot before the commit lands?

There is no way to run a pre-merge branch on a buildbot (but one day I hope).

The buildbot is running a docker container based on Ubuntu Focal. Our image is https://hub.docker.com/layers/linaro/ci-arm64-tcwg-llvmbot-ubuntu/focal/images/sha256-19e78b36b34f63c27ee53cdc92b6d38c4158167ba32856924442fc652e719c60?context=explore / https://git.linaro.org/ci/dockerfiles.git/tree/focal-arm64-tcwg-base/focal-arm64-tcwg-llvmbot/Dockerfile, but there shouldn't be anything in there that's majorly different from the base Ubuntu, so try that first.

If it's known to be AArch64-specific and you don't have access to a machine, I can set up a container for you on one of ours. Email me at my commits address if you want to do that (and you / your employer are comfortable with that).

It may reproduce on an AWS Graviton; I'm checking now and am pretty sure it will. That's another option if you have access to AWS.

@DavidSpickett
Collaborator

Also, if you need someone to dig more into Flang specifically, I can ask one of my Linaro colleagues to look into this (I just report problems; they actually work on it :) ).

@DavidSpickett
Collaborator

It did reproduce on a Graviton. At -j4 (due to RAM constraints), the build without the pass took 2h10m, and with it I killed the build after almost 5 hours. So this is not machine-specific; perhaps it is architecture-specific, in that AArch64 has more inlining to do?

On the original machine, the overnight build is at 14 hours and counting, and a few processes got killed by the system, so I'll stop it there. Not a particularly useful result, but I was curious whether it would complete.

@alexander-shaposhnikov
Collaborator Author

@DavidSpickett - thanks a lot, this is very useful (I'm still busy but hope to get back to this)
