Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SelectionDAG's SimplifySelectOps can take a pathological amount of time #60132

Open
Keno opened this issue Jan 18, 2023 · 1 comment
Open

SelectionDAG's SimplifySelectOps can take a pathological amount of time #60132

Keno opened this issue Jan 18, 2023 · 1 comment

Comments

@Keno
Copy link
Member

Keno commented Jan 18, 2023

Consider the attached .bc file. This file is quite large (about 4MB uncompressed,
about 300k IR statements, all in one function with very large BBs). As such, it is no surprise that
compiling it may take same time. However, on LLVM master the compile time (llc A2.bc)
is excessive (I haven't actually been patient enough to let it finish yet, but
at least three hours).

The compile time improves dramatically if I disable the SimplifySelectOps load reassociation:

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index dd7f692ede90..17f96ab2efe5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -24969,6 +24969,7 @@ bool DAGCombiner::SimplifySelectOps(SDNode *TheSelect, SDValue LHS,
                                       LLD->getBasePtr().getValueType()))
       return false;
 
+    return false;
 
     // Check that the select condition doesn't reach either load.  If so,
     // folding this will induce a cycle into the DAG.  If not, this is safe to

With this patch applied, the whole compilation finishes in about 25s, which seems
reasonable to me for this size IR.

I think there are some low-hanging fruits to improve performance slightly, but the
overall problem here is algorithmic. The
SDNode::hasPredecessorHelper(LLD, Visited, Worklist)) check that is used to avoid
introducing cycles needs to essentially walk the entire function, which in functions
as large as this is very expensive.

I think there's a couple of options here:

  1. Try some sort of different search heuristic. For example, we might do a BFS of successors instead from both points to establish a local topological order. The intuition here is that while in the common case the two nodes are not each other's predecessors, they likely flow into common successors, making that a more fruitful search direction.
  2. We could try maintaining an incremental topological order. The changes that DAGCombine makes are fairly local, which avoids some of the usual worst case behavior in topological maintenance algorithms.
  3. We could maybe only run this optimization after legalization, when we already have a topological order.
  4. We could just put a MaxNodes cutoff into this optimization. This is probably the easiest option, but not particularly satisfying.

I'm of course also working on making the frontend emit something nicer here and maybe have it split the function into a few parts, but it would be good to fix it in both places if possible.

A2.bc.tar.gz

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 20, 2023

  1. We could just put a MaxNodes cutoff into this optimization. This is probably the easiest option, but not particularly satisfying.

hasPredecessorHelper already takes an optional MaxSteps limit that we could make use of

staticfloat added a commit to JuliaLang/llvm-project that referenced this issue Mar 28, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
vchuravy pushed a commit to JuliaLang/llvm-project that referenced this issue Apr 16, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
(cherry picked from commit 2105735)
vchuravy pushed a commit to JuliaLang/llvm-project that referenced this issue Apr 16, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
vchuravy pushed a commit to JuliaLang/llvm-project that referenced this issue Apr 30, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
maleadt pushed a commit to JuliaLang/llvm-project that referenced this issue Aug 2, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
maleadt pushed a commit to JuliaLang/llvm-project that referenced this issue Aug 2, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
vchuravy pushed a commit to JuliaLang/llvm-project that referenced this issue Oct 6, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
giordano pushed a commit to JuliaLang/llvm-project that referenced this issue Dec 27, 2023
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
giordano pushed a commit to JuliaLang/llvm-project that referenced this issue Mar 8, 2024
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
(cherry picked from commit 8a2c8f5)
giordano pushed a commit to JuliaLang/llvm-project that referenced this issue May 2, 2024
`SimplifySelectOps` is a late optimization in LLVM that attempts to
translate `select(C, load(A), load(B))` into `load(select(C, A, B))`.
However, in order for it to do this optimization, it needs to check that
`C` does not depend on the result of `load(A)` or `load(B)`.
Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not
have a topological order of statements computed at this stage of the
compiler, so LLVM needs to iterate through all statements in the
function in order to perform this legality check. For large functions,
this is extremely expensive, accounting for the majority of all
compilation time for such functions. On the other hand, the optimization
itself is minor, allowing at most the elision of one additional load
(and doesn't fire particularly often, because the middle end can perform
similar optimizations). Until there is a proper solution in LLVM, simply
disable this optimizations, making LLVM several orders of magnitude
faster on real world benchmarks.

X-ref: llvm#60132
(cherry picked from commit 8a2c8f5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants