Undesired(?) slow down in ssa_opt_alias #7432
Comments
Here is another file which went from a dozen seconds to compile in Erlang/OTP 25 to minutes in Erlang/OTP 26: https://gist.github.com/josevalim/94bfd65e8eaf892ed700df349838796a It is an Elixir file but I can convert it to Erlang if necessary. I suspect it is exactly the same issue, so the file in the issue description should be enough. I will edit this comment once profiling finishes. EDIT: Yup, also
|
@jhogberg, I missed this the first time around. I will take a look. |
@josevalim, that's not strange, the alias analysis pass is fairly costly. When you change You can disable the alias analysis pass by running |
@frej that makes perfect sense. We were just short-circuiting part of the analysis then, thanks! |
@frej benchmarking the code points to the sets operations being the source of the slowdown. I assume we build sets with all variables in a function and, for large functions, that's part of the problem? In any case, in

    kills_is([I|Is], Live0, KillsMap0, Blk) ->
        {Live, Key} = case I of
                          #b_set{dst=Dst} ->
                              {sets:del_element(Dst, Live0), Dst};
                          _ ->
                              {Live0, {terminator, Blk}}
                      end,
        Uses = sets:from_list(beam_ssa:used(I), [{version,2}]),
        RemainingUses = sets:union(Live0, Uses),
        Killed = sets:subtract(RemainingUses, Live0),
        KillsMap = KillsMap0#{Key => Killed},
        kills_is(Is, sets:union(Live, Killed), KillsMap, Blk);

I think

    RemainingUses = sets:union(Live0, Uses),
    Killed = sets:subtract(RemainingUses, Live0),

is the same as:

    Killed = sets:subtract(Uses, Live0),

And, I assume

    Killed = sets:from_list([L || L <- beam_ssa:used(I), not sets:is_element(L, Live0)], [{version, 2}]),

? |
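The three formulations discussed above can be compared side by side. The following is only a minimal sketch: the module name `killset_equiv` and its function names are made up for illustration, and a plain list of atoms stands in for the result of `beam_ssa:used(I)`. It is not code from the compiler.

```erlang
-module(killset_equiv).
-export([via_union/2, via_subtract/2, via_filter/2]).

%% Uses is a plain list of variables (a stand-in for beam_ssa:used(I)),
%% Live0 is a version 2 set. All names here are hypothetical.

%% Original shape: build the union first, then subtract Live0.
via_union(Uses, Live0) ->
    UsesSet = sets:from_list(Uses, [{version, 2}]),
    sets:subtract(sets:union(Live0, UsesSet), Live0).

%% Simplified: subtracting Live0 from the union leaves exactly
%% the elements of Uses that are not in Live0.
via_subtract(Uses, Live0) ->
    sets:subtract(sets:from_list(Uses, [{version, 2}]), Live0).

%% Filter the (usually short) use list with membership tests,
%% so only the small result set is built.
via_filter(Uses, Live0) ->
    sets:from_list([V || V <- Uses, not sets:is_element(V, Live0)],
                   [{version, 2}]).
```

With `Live0` holding `a` and `b` and `Uses` being `[b, c, d]`, all three functions return a set containing `c` and `d`. Whether the filter variant is actually faster depends on the relative sizes of `Uses` and `Live0`, which this sketch does not measure.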
Thinking about the algorithm, my understanding is that you want to know when a variable/label is no longer used by computing the killsets. Today that is implemented with a forward pass to compute all variables (that's the expensive one, as it builds a large set), and then an additional pass. Could it perhaps be done with a single backward pass?

If you do a backward pass, the first time a variable appears is the last time it is used; therefore it belongs to the killset of that instruction and you store it in the set of killed variables (call this the killed-but-not-defined set). As you traverse up and reach the definitions of those variables, you can remove them from the set. That's meant to be the benefit of this approach: you won't have to build very large sets for very large programs (unless the variables' lifetimes span the whole program).

The issue is branching code. You need to know if a label is defined within that branch or before, so now you are back to two passes: the first pass computes the killsets (alongside the killed-but-not-defined set) and also stores all variables that are defined but not killed in their own set (those are either unused or used within a branch). Those two sets should not get very large. All of this happens without going inside branches.

Then we do another pass with the goal of traversing branching code. Inside the branching code, we have the same process, starting with a backward pass, such that when we see a variable for the first time (i.e. its last use):
The first pass and the pass within branches can be unified by setting the "killed-but-not-defined" set to be the variables given as arguments (so perhaps a better name is the "do-not-kill" set). Perhaps I am over-simplifying or over-complicating the problem, or perhaps this is already what you are doing, but in case it helps somehow, there it goes (if it doesn't, apologies). |
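For straight-line code, the backward pass described above could look roughly like the sketch below. It rests on several assumptions: instructions are simplified to hypothetical `{Dst, Uses}` tuples rather than `beam_ssa` records, the module name `backward_kills` is invented, and branching is not handled at all. The `LiveOut` argument plays the role of the "do-not-kill" set.

```erlang
-module(backward_kills).
-export([kills/2]).

%% Is is a list of {Dst, Uses} tuples in program order (a hypothetical
%% simplification, not beam_ssa records). LiveOut lists the variables
%% that must not be killed because they are needed after the block.
%% Returns a map from Dst to the variables whose last use is there.
kills(Is, LiveOut) ->
    Seen0 = sets:from_list(LiveOut, [{version, 2}]),
    kills_rev(lists:reverse(Is), Seen0, #{}).

kills_rev([], _Seen, Acc) ->
    Acc;
kills_rev([{Dst, Uses} | Is], Seen0, Acc) ->
    %% Walking backwards, a use that has not been seen yet is the
    %% last use of that variable, so it dies at this instruction.
    Killed = lists:usort([V || V <- Uses, not sets:is_element(V, Seen0)]),
    Seen1 = sets:union(Seen0, sets:from_list(Killed, [{version, 2}])),
    %% Above its definition, Dst no longer needs to be tracked, which
    %% keeps the set small when variables have short lifetimes.
    Seen = sets:del_element(Dst, Seen1),
    kills_rev(Is, Seen, Acc#{Dst => Killed}).
```

For example, `backward_kills:kills([{x, [a]}, {y, [x, a]}, {z, [y]}], [z])` returns `#{x => [], y => [a, x], z => [y]}`: `a` and `x` are last used by the instruction defining `y`, and `y` is last used by the one defining `z`.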
@josevalim I'll incorporate the equivalence (A ∪ B) ∖ A <=> B ∖ A, but unfortunately it doesn't produce a noticeable speedup for the example. The alias pass uses the standard fixpoint algorithm for calculating the liveness of all variables in the function, and then the liveness is used to calculate the kill-sets. For this example we have over 800 variables and the liveness calculation needs 248 iterations before it converges. Doing an experiment where I do the killset calculation twice [edit: from the liveness information] doesn't change the execution time in any noticeable way. Doing the liveness calculation twice increases the total runtime by around 30%. So your idea to just calculate the kill sets directly looks attractive. |
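To illustrate why the iteration count matters, here is a textbook-style sketch of a fixpoint liveness calculation. It is not the compiler's liveness pass: the module name `liveness_fixpoint` and the block format `{Defs, Uses, Successors}` are assumptions made for this example, and `Uses` is assumed to list only variables read before any definition in the same block.

```erlang
-module(liveness_fixpoint).
-export([live_in/1]).

%% Blocks maps a label to {Defs, Uses, Successors} (a hypothetical
%% format). Returns {LiveIn, Iterations}, where LiveIn maps each label
%% to the ordset of variables live on entry to that block.
live_in(Blocks) ->
    Init = maps:map(fun(_Label, _Blk) -> ordsets:new() end, Blocks),
    iterate(Blocks, Init, 1).

iterate(Blocks, LiveIn0, N) ->
    LiveIn = maps:fold(
               fun(Label, {Defs, Uses, Succs}, Acc) ->
                       %% live-out = union of the successors' live-in sets
                       Out = ordsets:union([maps:get(S, Acc) || S <- Succs]),
                       %% live-in = uses + (live-out - defs)
                       In = ordsets:union(
                              ordsets:from_list(Uses),
                              ordsets:subtract(Out, ordsets:from_list(Defs))),
                       Acc#{Label => In}
               end, LiveIn0, Blocks),
    case LiveIn of
        LiveIn0 -> {LiveIn, N};              % nothing changed: converged
        _ -> iterate(Blocks, LiveIn, N + 1)  % sweep all blocks again
    end.
```

Every sweep visits every block and performs set unions and subtractions, so the cost grows with both the number of variables and the number of iterations needed to converge, which is consistent with the observation above that repeating the liveness calculation is what increases the runtime.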
I also found this paper which defines an implementation similar to the one I outlined above, without fixpoint computation, but which is also capable of handling loops: https://inria.hal.science/inria-00558509/PDF/RR-7503.pdf |
Nice! Section 4 covers the algorithm I'm using for liveness. The reason I'm going the long way over liveness is that I had a liveness pass already lying around, so I didn't look into a specialized algorithm. I ran profiling on the modules compiled by |
Variables which die in a basic block cannot influence the alias status of variables in successor blocks. By pruning dead variables, which are not part of a parent-child derivation relationship of live variables, the size of the active sharing state is reduced. The reduction in size speeds up subsequent aa_merge_ss/3 operations. Combined with the improved kill-set calculation and the changed data structure for describing the alias status of variables, this patch provides a substantial reduction in time required for alias analysis. For the set of modules compiled by `scripts/diffable` the time spent in alias analysis is reduced by approximately 55%. For the example in Issue erlang#7432 [1], provided by José Valim, which has a large number of variables, the reduction is even more dramatic. The time spent in the alias analysis pass is reduced by 97%. [1] erlang#7432
@josevalim, a quick update. I've spent some time on this over the last few weeks and have a branch which provides a 55% speedup for the set of modules compiled by |
Very exciting! Thanks for the work and sharing the updates! |
Variables which die in a basic block cannot influence the alias status of variables in successor blocks. By pruning dead variables, which are not part of a parent-child derivation relationship of live variables, the size of the active sharing state is reduced. The reduction in size speeds up subsequent `aa_merge_ss/3` operations. Combined with improved kill-set calculation (9ad5c5377cb5779f9c581f2549f0f48d224d0664) and an improved data structure for describing the alias status of variables (c389665), this patch provides a substantial reduction in time required for alias analysis. For the set of modules compiled by `scripts/diffable` the time spent in alias analysis is reduced by approximately 55%. For the example in Issue erlang#7432 [1], provided by José Valim, which has a large number of variables, the reduction is even more dramatic. The time spent in the alias analysis pass is reduced by 97%. [1] erlang#7432 Closes: erlang#7432
Variables which die in a basic block cannot influence the alias status of variables in successor blocks. By pruning dead variables, which are not part of a parent-child derivation relationship of live variables, the size of the active sharing state is reduced. The reduction in size speeds up subsequent `aa_merge_ss/3` operations. Combined with improved kill-set calculation (8545471) and an improved data structure for describing the alias status of variables (c389665), this patch provides a substantial reduction in time required for alias analysis. For the set of modules compiled by `scripts/diffable` the time spent in alias analysis is reduced by approximately 55%. For the example in Issue erlang#7432 [1], provided by José Valim, which has a large number of variables, the reduction is even more dramatic. The time spent in the alias analysis pass is reduced by 97%. [1] erlang#7432 Closes: erlang#7432
Variables which die in a basic block cannot influence the alias status of variables in successor blocks. By pruning dead variables, which are not part of a parent-child derivation relationship of live variables, the size of the active sharing state is reduced. The reduction in size speeds up subsequent `aa_merge_ss/3` operations. Combined with improved kill-set calculation (8545471) and an improved data structure for describing the alias status of variables (c389665), this patch provides a substantial reduction of the time required for alias analysis. For the set of modules compiled by `scripts/diffable` the time spent in alias analysis is reduced by approximately 55%. For the example in Issue erlang#7432 [1], provided by José Valim, which has a large number of variables, the reduction is even more dramatic. The time spent in the alias analysis pass is reduced by 97%. [1] erlang#7432 Closes: erlang#7432
Describe the bug
Take the Erlang file from this gist: https://gist.github.com/josevalim/c59e1d706daf6e863f6f2745a04fb810
Now compile it with `time erlc +time Elixir.HelloTest.erl`. On my machine, it takes 0.430s and `ssa_opt_alias` reports:
Now search for `whatever:a(` (on lines 150 and 183) and replace it simply by `a(` (i.e. convert the remote call into a local call). It runs much faster, in 0.270s, where:
This slowdown does not happen on Erlang/OTP 25. At a glance, I wouldn't expect swapping a local call for a remote one to cause such a slowdown, but I could be wrong.
Also note the larger the code contents, the slower it gets. :)
Affected versions
Erlang/OTP 26.
Additional context
Originally reported here: elixir-lang/elixir#12696