Replace beam_dead with beam_ssa_dead #1955

bjorng · 2018-09-17T08:12:11Z

This pull request introduces the new beam_ssa_dead pass and removes the beam_dead and beam_split compiler passes. There are also new and improved optimizations in beam_ssa_opt and beam_ssa_type. A few of the optimizations in beam_dead have been moved to beam_jump, beam_a, and beam_peep.

In general, the new optimizations improve the code in many more places than beam_dead does. In a few cases, beam_dead could find optimizations that the new optimization passes fail to find.

For more details, see the individual commit messages.

When creating a phi node for the common exit block of a receive, the code failed to take into account that there could be more than one predecessor to the exit block for each remove_message. Rename exit_predessor/3 to exit_predessors/3 and make it return a list of the predecessors.
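
A minimal Python sketch of why the phi node needs a list of predecessors. It uses a hypothetical CFG model (a dict from block label to successor labels) rather than the compiler's real block records, and the names exit_predecessors and build_exit_phi are illustrative only. Every block that branches to the exit block contributes its own phi argument, so a single remove_message path can contribute several.

```python
from __future__ import annotations


def exit_predecessors(cfg: dict[int, list[int]], exit_label: int) -> list[int]:
    """Return every block label that branches to exit_label, in label order."""
    return sorted(lbl for lbl, succs in cfg.items() if exit_label in succs)


def build_exit_phi(cfg: dict[int, list[int]], exit_label: int,
                   value_of: dict[int, str]) -> list[tuple[str, int]]:
    """Build phi arguments as (value, predecessor) pairs, one per predecessor.

    value_of maps each predecessor label to the value it passes to the exit block.
    """
    return [(value_of[p], p) for p in exit_predecessors(cfg, exit_label)]


# Hypothetical CFG: the path after one remove_message splits into blocks 5 and 6,
# both of which reach exit block 9; block 7 follows a second remove_message.
cfg = {3: [5, 6], 5: [9], 6: [9], 7: [9], 9: []}
print(build_exit_phi(cfg, 9, {5: "a", 6: "b", 7: "c"}))
# prints [('a', 5), ('b', 6), ('c', 7)]
```

Returning a single predecessor per remove_message would have produced a phi node with only one of the entries for blocks 5 and 6.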

When optimizations get more powerful, beam_validator must keep up.

Add normalize/1 to simplify optimizations.

Since beam_ssa:linearize/1 may remove blocks that are unreachable, adjust phi nodes to make sure that they don't refer to discarded blocks or to blocks that no longer branch to the phi node in question.

Add trim_unreachable/1 to remove unreachable blocks and adjust phi nodes.
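
A rough Python model of what trim_unreachable/1 is described as doing. The Block class and the assumption that the entry block has label 0 are illustrative, not the beam_ssa data structures. Blocks not reachable from the entry are dropped, and each phi argument is kept only if its predecessor survives and still branches to the block holding the phi.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    # phi destination -> list of (value, predecessor label)
    phis: dict[str, list[tuple[str, int]]] = field(default_factory=dict)
    # labels this block branches to
    succs: list[int] = field(default_factory=list)


def reachable(blocks: dict[int, Block], entry: int = 0) -> set[int]:
    """Depth-first search from the entry block."""
    seen: set[int] = set()
    stack = [entry]
    while stack:
        lbl = stack.pop()
        if lbl not in seen:
            seen.add(lbl)
            stack.extend(blocks[lbl].succs)
    return seen


def trim_unreachable(blocks: dict[int, Block], entry: int = 0) -> dict[int, Block]:
    """Drop unreachable blocks, plus phi arguments whose predecessor was dropped
    or no longer branches to the block holding the phi. Returns new blocks."""
    live = reachable(blocks, entry)
    result = {}
    for lbl in sorted(live):
        blk = blocks[lbl]
        phis = {dst: [(v, p) for v, p in args
                      if p in live and lbl in blocks[p].succs]
                for dst, args in blk.phis.items()}
        result[lbl] = Block(phis=phis, succs=list(blk.succs))
    return result


blocks = {
    0: Block(succs=[1, 4]),
    1: Block(succs=[3]),
    2: Block(succs=[3]),   # unreachable: nothing branches here
    4: Block(succs=[]),    # reachable, but no longer branches to block 3
    3: Block(phis={"_5": [("x", 1), ("y", 2), ("z", 4)]}),
}
trimmed = trim_unreachable(blocks)
print(sorted(trimmed), trimmed[3].phis)
# prints [0, 1, 3, 4] {'_5': [('x', 1)]}
```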

It is faster to use cerl_sets instead of gb_sets to keep track of seen blocks.

The 'move' instruction can be eliminated in code such as:

    {test,is_eq_exact,{f,42},[{x,0},{atom,value}]}.
    {move,{atom,value},{x,0}}.

Move that optimization from beam_dead to beam_a. The optimization will be simpler because the 'move' instruction has not yet been moved into a block. Getting rid of 'move' earlier will also save work for later passes.

Also move the optimization that eliminates instructions such as the following from beam_dead to beam_a:

    {test,is_eq_exact,{f,42},[{x,0},{x,0}]}.
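
Both patterns can be sketched as a peephole pass over a list of instructions. This is a Python model, not beam_a's code: instructions are tuples mirroring the Erlang terms above, is_literal and is_eq_exact are hypothetical helpers, and only the two patterns described here are handled.

```python
def is_literal(operand):
    # Operands mirror Erlang terms: ("x", 0) is a register, ("atom", "value") a literal.
    return operand[0] in ("atom", "integer", "literal")


def is_eq_exact(ins):
    return ins[0] == "test" and ins[1] == "is_eq_exact"


def peephole(instrs):
    out = []
    for ins in instrs:
        # {test,is_eq_exact,Fail,[R,R]} always succeeds, so it can be dropped.
        if is_eq_exact(ins) and ins[3][0] == ins[3][1]:
            continue
        # A test falls through only on success, so directly after
        # {test,is_eq_exact,Fail,[R,Lit]} register R already holds Lit
        # and {move,Lit,R} is redundant.
        if (ins[0] == "move" and out and is_eq_exact(out[-1])
                and is_literal(ins[1]) and out[-1][3] == [ins[2], ins[1]]):
            continue
        out.append(ins)
    return out


code = [
    ("test", "is_eq_exact", ("f", 42), [("x", 0), ("atom", "value")]),
    ("move", ("atom", "value"), ("x", 0)),
    ("test", "is_eq_exact", ("f", 42), [("x", 0), ("x", 0)]),
    ("return",),
]
print(peephole(code))
```

Only a 'move' that immediately follows the test is removed; if a label came in between, another path could reach the 'move' and the register contents would no longer be known.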

This functionality will soon be needed.

Those optimizations are unsafe if beam_dead has been run before.

A select_val instruction that tests whether a register is a boolean, like this:

    {select_val,Reg,{f,Fail},{list,[{atom,true},Lbl,{atom,false},Lbl]}}.

can be replaced with an is_boolean test:

    {test,is_boolean,{f,Fail},[Reg]}.
    {jump,{f,Lbl}}.

This optimization is currently done in beam_dead. However, if it is done in beam_peep, it can catch more opportunities, because after beam_jump has run, labels that were previously different have been coalesced.
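
A Python sketch of the rewrite, using tuples that mirror the instruction terms above (the function name is hypothetical). The second instruction shows why running it after beam_jump helps. Assuming labels 10 and 11 lead to identical code and beam_jump coalesces them, `different` turns into `same` and then qualifies.

```python
def select_val_to_is_boolean(ins):
    """Rewrite a true/false select_val whose two target labels are identical.

    Returns a list of replacement instructions ([ins] if no rewrite applies).
    """
    if ins[0] != "select_val":
        return [ins]
    _, reg, fail, (_, pairs) = ins
    if len(pairs) != 4:
        return [ins]
    values = {pairs[0], pairs[2]}
    if values == {("atom", "true"), ("atom", "false")} and pairs[1] == pairs[3]:
        return [("test", "is_boolean", fail, [reg]), ("jump", pairs[1])]
    return [ins]


same = ("select_val", ("x", 0), ("f", 3),
        ("list", [("atom", "true"), ("f", 10), ("atom", "false"), ("f", 10)]))
different = ("select_val", ("x", 0), ("f", 3),
             ("list", [("atom", "true"), ("f", 10), ("atom", "false"), ("f", 11)]))
print(select_val_to_is_boolean(same))
print(select_val_to_is_boolean(different))
```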

Add more instructions to the list of instructions that can be safely removed if their values are not used. This is necessary for correctness when doing more aggressive optimizations. Without this change, the 'succeeded' instruction could be optimized away, leaving just the instruction it guarded followed by an unconditional branch, which beam_ssa_codegen does not know how to handle. Here is an example:

    _3 = bs_start_match _1
    br label 13

By adding bs_start_match to the list, the bs_start_match instruction will be removed too. (If the result of bs_start_match is actually used, the succeeded instruction would not be removed.)

While we are at it, rename the misnamed function is_pure/1 to no_side_effect/1 and move it to beam_ssa. is_pure/1 is a bad name because bif:get has no side effect, but is not pure.
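
A simplified Python model of the dead-code removal described above. REMOVABLE is an illustrative subset, not the actual list behind no_side_effect/1, and instructions are modeled as (dst, op, args) tuples. Running it with and without bs_start_match in the removable set shows the problem: without it, bs_start_match is left behind with nothing checking its result.

```python
# Illustrative subset of operations that may be removed when their result is
# unused (in the compiler, this is what no_side_effect/1 in beam_ssa decides).
REMOVABLE = frozenset({"succeeded", "bs_start_match"})


def remove_unused(instrs, terminator_uses, removable=REMOVABLE):
    """Backward dead-code elimination over one block.

    instrs: list of (dst, op, args); variables are strings starting with "_".
    terminator_uses: variables read by the block's terminator.
    """
    used = set(terminator_uses)
    kept = []
    for dst, op, args in reversed(instrs):
        if dst not in used and op in removable:
            continue  # result never read and the operation has no side effect
        kept.append((dst, op, args))
        used.update(a for a in args if a.startswith("_"))
    kept.reverse()
    return kept


# _3 = bs_start_match _1; _4 = succeeded _3; then "br label 13" reads nothing.
block = [("_3", "bs_start_match", ["_1"]), ("_4", "succeeded", ["_3"])]
print(remove_unused(block, []))                            # both removed
print(remove_unused(block, [], removable={"succeeded"}))   # bs_start_match left behind
```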

Nested cases can lead to code such as this:

    10:
      _1 = phi {literal value1, label 8}, {Var, label 9}
      br 11

    11:
      _2 = phi {_1, label 10}, {literal false, label 3}

The phi nodes can be coalesced like this:

    11:
      _2 = phi {literal value1, label 8}, {Var, label 9}, {literal false, label 3}

Coalescing can help other optimizations, and can in some cases reduce register shuffling (if the phi variables for the two phi nodes happen to be allocated to different registers).
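
The coalescing step can be sketched in Python, with phi arguments modeled as (value, predecessor) pairs. The preconditions listed in the docstring are assumptions of this sketch; the actual pass may check more or different conditions.

```python
def coalesce_phis(outer, inner_var, inner_label, inner):
    """Merge the phi `inner_var = phi inner` (defined in block inner_label)
    into `outer`, a phi in the successor block.

    Assumes block inner_label holds only the inner phi plus an unconditional
    branch, that inner_var has no other uses, and that the predecessors of
    inner_label will be retargeted to branch directly to the outer block.
    Returns None if coalescing would list a predecessor twice.
    """
    outer_preds = {pred for _, pred in outer}
    if any(pred in outer_preds for _, pred in inner):
        return None
    merged = []
    for value, pred in outer:
        if value == inner_var and pred == inner_label:
            merged.extend(inner)  # splice in the inner phi's arguments
        else:
            merged.append((value, pred))
    return merged


# The example from the commit message: _1 is defined in block 10, _2 in block 11.
inner = [("literal value1", 8), ("Var", 9)]
outer = [("_1", 10), ("literal false", 3)]
print(coalesce_phis(outer, "_1", 10, inner))
```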

When the argument for a #b_switch{} comes from a phi node with only literal values, the switch list can be pruned to contain only the possible values. It may also be possible to eliminate the failure label. Also simplify a switch whose list contains a single value, as well as a switch that can be replaced with an is_boolean test.
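
A Python sketch of the pruning and of the single-value simplification. The branch forms returned by simplify ("br", "br_eq", "switch") are hypothetical stand-ins for the SSA terminators, and `None` models a failure label that can no longer be reached.

```python
def prune_switch(phi_values, fail, cases):
    """Prune a switch whose argument comes from a phi node with only literals.

    phi_values: the literal values the switch argument can take.
    cases: list of (value, target label).
    Returns (fail, cases); fail becomes None when every possible value has
    a case, meaning the failure label can no longer be reached.
    """
    possible = set(phi_values)
    kept = [(value, lbl) for value, lbl in cases if value in possible]
    covered = {value for value, _ in kept}
    return (None if covered == possible else fail), kept


def simplify(fail, cases):
    """Replace a degenerate switch with a simpler branch (hypothetical forms)."""
    if not cases:
        return ("br", fail)                      # no case can match
    if fail is None and len(cases) == 1:
        return ("br", cases[0][1])               # only one possible target
    if len(cases) == 1:
        value, lbl = cases[0]
        return ("br_eq", value, lbl, fail)       # a single '=:=' test
    return ("switch", fail, cases)


cases = [("a", 10), ("b", 11), ("c", 12)]
print(prune_switch(["a", "b"], 1, cases))  # prints (None, [('a', 10), ('b', 11)])
print(simplify(*prune_switch(["a"], 1, cases)))       # prints ('br', 10)
print(simplify(*prune_switch(["a", "d"], 1, cases)))  # prints ('br_eq', 'a', 10, 1)
```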

This optimization, which works on the SSA format, will replace the similar optimization in beam_dead. See the comment for an explanation of what the new optimization does.

Phi nodes with only literals are fairly common, so it's worthwhile to optimize this case.

Not doing CSE for tuple_size/1 seems to generate slightly better code in most cases.

The floating point optimization relies heavily on the block order in the linearized representation. A new optimization could easily break it, for example so that no `fcheckerror` instructions were emitted. Rewrite the optimization to avoid dependencies on the linear block order.

Remove the following clause from the `fun` clauses in arith_op_types/2 because it cannot possibly match:

    (_, any) -> number;

Here is why it cannot match: The second argument is the accumulator for lists:foldl/3. Its initial value is `unknown`. None of the clauses will update the accumulator to `any`, including the clause that matches `any` in the first argument; that clause sets the accumulator to `number`. Thus, the accumulator (second argument) can never be `any`, and the clause can never match. QED.

In beam_ssa_type, do substitutions similar to what ssa_opt_misc does to get rid of variables that evaluate to constant values. That somewhat simplifies the code of beam_ssa_type, and could improve performance of the compiler since instructions and variables are eliminated, reducing the amount of work for later passes.
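
A minimal Python illustration of constant substitution. Here constants come only from folding `+` and `*` on numeric literals, which is an assumption of the sketch; beam_ssa_type derives them from inferred types. The instruction and operand encoding, including the op names, is hypothetical.

```python
import operator

# Hypothetical foldable operations; the real pass relies on type inference.
FOLDABLE = {"+": operator.add, "*": operator.mul}


def is_num_lit(operand):
    return (isinstance(operand, tuple) and operand[0] == "lit"
            and isinstance(operand[1], (int, float)))


def substitute_constants(instrs):
    """instrs: list of (dst, op, args). Variables are strings, literals are
    ("lit", value). Instructions that evaluate to a constant are dropped and
    later uses of their destination are replaced by the literal."""
    subst = {}
    out = []
    for dst, op, args in instrs:
        args = [subst.get(a, a) for a in args]
        if op in FOLDABLE and len(args) == 2 and all(is_num_lit(a) for a in args):
            subst[dst] = ("lit", FOLDABLE[op](args[0][1], args[1][1]))
            continue
        out.append((dst, op, args))
    return out, subst


instrs = [
    ("_2", "+", [("lit", 1), ("lit", 2)]),
    ("_3", "*", ["_2", ("lit", 10)]),
    ("_4", "put_tuple", ["_3", "_0"]),
]
out, subst = substitute_constants(instrs)
print(out)  # prints [('_4', 'put_tuple', [('lit', 30), '_0'])]
```

Each substitution removes both an instruction and a variable, which is the source of the reduced work for later passes mentioned above.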

Omitting `kill` instructions before BIFs that throw exceptions will reduce the code size. This optimization supersedes the same optimizations in beam_dead.

Add beam_ssa_dead to perform the main optimizations done by beam_dead:

* Shortcut branches that jump to another block with a branch. If it can be seen that the second branch will always branch to a specific block, replace the target of the first branch.
* Combine nested sequences of '=:=' tests and switch instructions operating on the same variable into a single switch.

Judging from diffs of the compiler output, beam_ssa_dead finds many more opportunities for optimizations than beam_dead, although it does not find all opportunities that beam_dead does. In total, beam_ssa_dead is such an improvement over beam_dead that there is no reason to keep beam_dead as well as beam_ssa_dead.

Note that beam_ssa_dead does not attempt to optimize away redundant bs_context_binary instructions, because that instruction will be superseded by new instructions in the near future.
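
The first optimization (branch shortcutting) can be modeled in Python as follows. The Br and Switch classes are hypothetical stand-ins for SSA terminators, and only blocks containing nothing but a terminator are followed; this sketch ignores other cases, such as two-way branches on the result of '=:=' tests.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Br:
    target: int                 # unconditional branch


@dataclass
class Switch:
    var: str
    cases: dict[str, int]       # literal value -> target label
    fail: int


def shortcut(empty_blocks: dict[int, Br | Switch], target: int,
             known: dict[str, str]) -> int:
    """Follow branches through blocks that contain nothing but a terminator.

    empty_blocks maps such blocks to their terminator; any other label stops
    the walk. known maps variables to values established on the path so far.
    Returns the label the original branch can jump to directly.
    """
    seen: set[int] = set()
    while target in empty_blocks and target not in seen:
        seen.add(target)                         # guard against loops
        term = empty_blocks[target]
        if isinstance(term, Br):
            target = term.target
        elif term.var in known:
            target = term.cases.get(known[term.var], term.fail)
        else:
            break                                # outcome not known statically
    return target


# Block 1 ends in: switch X [a -> 2, b -> 3], fail 9.
# Block 2 switches on X again; block 3 just jumps to 7.
empty = {2: Switch("X", {"a": 4, "c": 5}, 6), 3: Br(7)}
print(shortcut(empty, 2, {"X": "a"}))  # prints 4: block 1's 'a' edge can target 4
print(shortcut(empty, 3, {"X": "b"}))  # prints 7: block 1's 'b' edge can target 7
```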

Most of the optimizations in beam_dead have been superseded by the optimizations in beam_ssa_dead. The forward/1 pass of beam_dead has been moved to beam_jump. The beam_split pass splits blocks that contain instructions with non-zero labels. Because there are no optimizations left that optimize instructions within blocks, beam_block never needs to put such instructions into blocks in the first place. beam_split also moved 'move' instructions out of blocks to help beam_dead. That is no longer necessary since beam_dead no longer exists.

lib/compiler/src/beam_ssa_opt.erl

bjorng added 30 commits September 12, 2018 14:19

beam_ssa_type: Remove repeated clauses in meet/2

48b844b

beam_validator: Handle types for unary '-' or '+'

d96778a

beam_validator: Validate the literals in select_val

0975c1e

beam_validator: Infer more types

e2a939d

beam_ssa: Add normalize/1

d355182

Optimize 'and' and 'or' instructions

be092fb

Use beam_ssa:normalize/1 in beam_ssa_type

4fa8e38

beam_ssa: Extend linearize/1 to also adjust phi nodes

b21098e

beam_ssa: Add trim_unreachable/1

5ebd2c4

beam_ssa: Optimize linearize/1 and rpo/2

ecbe132

Introduce the beam_jump:instr_labels/1 function

56a6c76

Fix unsafe optimization in beam_dead

71182c8

beam_ssa_opt: Add an optimization of tuple_size/1

4edd266

beam_ssa_opt: Slightly optimize performance of live optimization

b3b195f

beam_ssa_opt: Slightly optimize compile-time performance of CSE

b241241

beam_ssa_opt: Don't do CSE for tuple_size/1

f254487

beam_ssa_type: Infer types for more instructions and BIFs

1e81c5e

Cover more code in beam_ssa_type

354c64f

beam_ssa_codegen: Don't emit kill instructions before exit BIFs

ec1f35c

bjorng added the team:VM and enhancement labels Sep 17, 2018

bjorng self-assigned this Sep 17, 2018

bjorng requested a review from jhogberg September 17, 2018 08:12

jhogberg approved these changes Sep 17, 2018

Review conversation on lib/compiler/src/beam_ssa_opt.erl marked as resolved

bjorng merged commit 70cb897 into erlang:master Sep 19, 2018

bjorng deleted the bjorn/compiler/beam_ssa_dead branch September 19, 2018 07:46
