SelectionDAG's SimplifySelectOps can take a pathological amount of time #60132

Keno · 2023-01-18T22:16:18Z

Consider the attached .bc file. This file is quite large (about 4MB uncompressed,
about 300k IR statements, all in one function with very large BBs). As such, it is no surprise that
compiling it may take same time. However, on LLVM master the compile time (llc A2.bc)
is excessive (I haven't actually been patient enough to let it finish yet, but
at least three hours).

The compile time improves dramatically if I disable the SimplifySelectOps load reassociation:

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index dd7f692ede90..17f96ab2efe5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -24969,6 +24969,7 @@ bool DAGCombiner::SimplifySelectOps(SDNode *TheSelect, SDValue LHS,
                                       LLD->getBasePtr().getValueType()))
       return false;
 
+    return false;
 
     // Check that the select condition doesn't reach either load.  If so,
     // folding this will induce a cycle into the DAG.  If not, this is safe to

With this patch applied, the whole compilation finishes in about 25s, which seems
reasonable to me for this size IR.

I think there are some low-hanging fruits to improve performance slightly, but the
overall problem here is algorithmic. The
SDNode::hasPredecessorHelper(LLD, Visited, Worklist)) check that is used to avoid
introducing cycles needs to essentially walk the entire function, which in functions
as large as this is very expensive.

I think there's a couple of options here:

Try some sort of different search heuristic. For example, we might do a BFS of successors instead from both points to establish a local topological order. The intuition here is that while in the common case the two nodes are not each other's predecessors, they likely flow into common successors, making that a more fruitful search direction.
We could try maintaining an incremental topological order. The changes that DAGCombine makes are fairly local, which avoids some of the usual worst case behavior in topological maintenance algorithms.
We could maybe only run this optimization after legalization, when we already have a topological order.
We could just put a MaxNodes cutoff into this optimization. This is probably the easiest option, but not particularly satisfying.

I'm of course also working on making the frontend emit something nicer here and maybe have it split the function into a few parts, but it would be good to fix it in both places if possible.

A2.bc.tar.gz

The text was updated successfully, but these errors were encountered:

RKSimon · 2023-01-20T16:02:54Z

We could just put a MaxNodes cutoff into this optimization. This is probably the easiest option, but not particularly satisfying.

hasPredecessorHelper already takes an optional MaxSteps limit that we could make use of

`SimplifySelectOps` is a late optimization in LLVM that attempts to translate `select(C, load(A), load(B))` into `load(select(C, A, B))`. However, in order for it to do this optimization, it needs to check that `C` does not depend on the result of `load(A)` or `load(B)`. Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not have a topological order of statements computed at this stage of the compiler, so LLVM needs to iterate through all statements in the function in order to perform this legality check. For large functions, this is extremely expensive, accounting for the majority of all compilation time for such functions. On the other hand, the optimization itself is minor, allowing at most the elision of one additional load (and doesn't fire particularly often, because the middle end can perform similar optimizations). Until there is a proper solution in LLVM, simply disable this optimizations, making LLVM several orders of magnitude faster on real world benchmarks. X-ref: llvm#60132

`SimplifySelectOps` is a late optimization in LLVM that attempts to translate `select(C, load(A), load(B))` into `load(select(C, A, B))`. However, in order for it to do this optimization, it needs to check that `C` does not depend on the result of `load(A)` or `load(B)`. Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not have a topological order of statements computed at this stage of the compiler, so LLVM needs to iterate through all statements in the function in order to perform this legality check. For large functions, this is extremely expensive, accounting for the majority of all compilation time for such functions. On the other hand, the optimization itself is minor, allowing at most the elision of one additional load (and doesn't fire particularly often, because the middle end can perform similar optimizations). Until there is a proper solution in LLVM, simply disable this optimizations, making LLVM several orders of magnitude faster on real world benchmarks. X-ref: llvm#60132 (cherry picked from commit 2105735)

`SimplifySelectOps` is a late optimization in LLVM that attempts to translate `select(C, load(A), load(B))` into `load(select(C, A, B))`. However, in order for it to do this optimization, it needs to check that `C` does not depend on the result of `load(A)` or `load(B)`. Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not have a topological order of statements computed at this stage of the compiler, so LLVM needs to iterate through all statements in the function in order to perform this legality check. For large functions, this is extremely expensive, accounting for the majority of all compilation time for such functions. On the other hand, the optimization itself is minor, allowing at most the elision of one additional load (and doesn't fire particularly often, because the middle end can perform similar optimizations). Until there is a proper solution in LLVM, simply disable this optimizations, making LLVM several orders of magnitude faster on real world benchmarks. X-ref: llvm#60132

`SimplifySelectOps` is a late optimization in LLVM that attempts to translate `select(C, load(A), load(B))` into `load(select(C, A, B))`. However, in order for it to do this optimization, it needs to check that `C` does not depend on the result of `load(A)` or `load(B)`. Unfortunately (unlikely Julia and LLVM at the IR level), LLVM does not have a topological order of statements computed at this stage of the compiler, so LLVM needs to iterate through all statements in the function in order to perform this legality check. For large functions, this is extremely expensive, accounting for the majority of all compilation time for such functions. On the other hand, the optimization itself is minor, allowing at most the elision of one additional load (and doesn't fire particularly often, because the middle end can perform similar optimizations). Until there is a proper solution in LLVM, simply disable this optimizations, making LLVM several orders of magnitude faster on real world benchmarks. X-ref: llvm#60132 (cherry picked from commit 8a2c8f5)

github-actions bot added the new issue label Jan 18, 2023

Keno added llvm:compiletime llvm:codegen and removed new issue labels Jan 18, 2023

Keno mentioned this issue Feb 15, 2023

Carry patch to avoid excessive LLVM time JuliaLang/julia#48681

Closed

EugeneZelenko added slow-compile and removed llvm:compiletime labels Apr 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SelectionDAG's SimplifySelectOps can take a pathological amount of time #60132

SelectionDAG's SimplifySelectOps can take a pathological amount of time #60132

Keno commented Jan 18, 2023 •

edited by VoltrexKeyva

RKSimon commented Jan 20, 2023

SelectionDAG's SimplifySelectOps can take a pathological amount of time #60132

SelectionDAG's SimplifySelectOps can take a pathological amount of time #60132

Comments

Keno commented Jan 18, 2023 • edited by VoltrexKeyva

RKSimon commented Jan 20, 2023

Keno commented Jan 18, 2023 •

edited by VoltrexKeyva