
270x and 36000x faster beam_ssa_bool and ssa_opt_sink passes from simple change #5686

Closed
drathier opened this issue Feb 6, 2022 · 7 comments
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM


@drathier

drathier commented Feb 6, 2022

Describe the bug
This is a follow-up to #5140 (comment), namely:

The beam_bool and ssa_opt_sink passes are very fast when compiling dump@ps.erl, but slow for full@ps. If you want us to try to speed up those passes, we would need yet another version of your module.

This one-line change to the dump@ps.erl test file from #5140 (the first line below is the original, the second is the changed version) triggers a pathological case in both the beam_ssa_bool and ssa_opt_sink passes:

codecDump() -> fun (_@151) ->
codecDump() -> fun ({potato,_@151}) ->
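The change only affects the fun's head. Before it, the fun returned by codecDump/0 accepts any argument; after it, the head only matches a two-element tuple tagged `potato`, so the compiled fun has to test its argument and gains a failure path that raises `function_clause`. A minimal illustration of that difference (the module `potato_demo` and its functions are hypothetical, not taken from the gist):

```erlang
-module(potato_demo).
-export([any_arg/0, potato_arg/0]).

%% Like the original line: the fun accepts any argument.
any_arg() ->
    fun (X) -> X end.

%% Like the changed line: the fun head only matches {potato, X};
%% any other argument raises a function_clause error.
potato_arg() ->
    fun ({potato, X}) -> X end.
```

For example, `(potato_demo:potato_arg())({potato, 1})` returns `1`, while `(potato_demo:potato_arg())(carrot)` fails with `function_clause`.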

The new file is available here: https://gist.github.com/drathier/709f7c5be3de1c1c345dc9fa93227dc2#file-booloptsink-ps-erl

To Reproduce
I ran this with the OTP 24.2 erlc, not any of the fixed builds from #5140, which is why stack trimming (the beam_trim pass) is disabled with no_stack_trimming.
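The same timing output can also be requested from an Erlang shell through compile:file/2. A minimal sketch, assuming you want the same two options as the erlc command line below (`time` and `no_stack_trimming`); `report` and `binary` are standard compile options added here so that errors are printed and no .beam file is written. The module and function names are hypothetical.

```erlang
-module(time_passes).
-export([compile_with_timing/1]).

%% Compiles File, printing the time and memory used by each compiler
%% pass (the `time` option) and skipping the beam_trim pass
%% (`no_stack_trimming`), as in the erlc run below. With `binary`, the
%% result is {ok, Module, Beam} and nothing is written to disk.
compile_with_timing(File) ->
    compile:file(File, [time, no_stack_trimming, report, binary]).
```

For example, `time_passes:compile_with_timing("dump@ps.erl")` should print a per-pass table like the ones in this issue.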

~/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-3cf560cf86da1ea4f9b1337e4f333cd787bd731b$ date ; time ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump@ps.erl ; date 
Sun Feb  6 17:20:25 CET 2022
Compiling dump@ps
 remove_file                   :      0.004 s       8.8 kB
 parse_module                  :      0.177 s    5464.8 kB
 transform_module              :      0.000 s    5464.8 kB
 lint_module                   :      0.018 s    5465.1 kB
 compile_directives            :      0.000 s    5465.1 kB
 expand_records                :      0.007 s    5465.1 kB
 core                          :     15.987 s   41614.2 kB
 sys_core_fold                 :      1.146 s   37987.0 kB
 sys_core_alias                :      0.016 s   37987.0 kB
 core_transforms               :      0.000 s   37987.0 kB
 sys_core_bsm                  :      0.013 s   37987.0 kB
 v3_kernel                     :      0.234 s   39501.6 kB
 beam_kernel_to_ssa            :      0.129 s   25512.6 kB
 beam_ssa_bool                 :      0.100 s   25512.6 kB
 beam_ssa_share                :      0.030 s   25512.6 kB
 beam_ssa_recv                 :      0.001 s   25512.6 kB
 beam_ssa_bsm                  :      0.113 s   25605.6 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.108 s  98 %
    annotate_context_parameters:      0.001 s   1 %
    combine_matches            :      0.001 s   1 %
    accept_context_args        :      0.000 s   0 %
    skip_outgoing_tail_extracti:      0.000 s   0 %
 beam_ssa_funs                 :      0.039 s   25605.6 kB
 beam_ssa_opt                  :      2.939 s   25682.9 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      1.026 s  37 %
    ssa_opt_live               :      0.649 s  23 %
    ssa_opt_type_continue      :      0.354 s  13 %
    ssa_opt_dead               :      0.227 s   8 %
    ssa_opt_cse                :      0.103 s   4 %
    ssa_opt_merge_blocks       :      0.080 s   3 %
    ssa_opt_tail_phis          :      0.047 s   2 %
    ssa_opt_trim_unreachable   :      0.042 s   2 %
    ssa_opt_linearize          :      0.038 s   1 %
    ssa_opt_element            :      0.036 s   1 %
    ssa_opt_split_blocks       :      0.031 s   1 %
    ssa_opt_record             :      0.030 s   1 %
    ssa_opt_coalesce_phis      :      0.026 s   1 %
    ssa_opt_tail_calls         :      0.026 s   1 %
    ssa_opt_float              :      0.018 s   1 %
    ssa_opt_bsm                :      0.006 s   0 %
    ssa_opt_blockify           :      0.006 s   0 %
    ssa_opt_ne                 :      0.004 s   0 %
    ssa_opt_tuple_size         :      0.004 s   0 %
    ssa_opt_sink               :      0.003 s   0 %
    ssa_opt_get_tuple_element  :      0.003 s   0 %
    ssa_opt_bsm_shortcut       :      0.002 s   0 %
    ssa_opt_sw                 :      0.002 s   0 %
    ssa_opt_bc_size            :      0.002 s   0 %
    ssa_opt_try                :      0.001 s   0 %
    ssa_opt_bs_puts            :      0.001 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.080 s   25682.9 kB
 beam_ssa_pre_codegen          :     45.037 s   35647.5 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :     30.430 s  68 %
    find_yregs                 :      7.902 s  18 %
    reserve_yregs              :      4.538 s  10 %
    live_intervals             :      0.983 s   2 %
    reserve_regs               :      0.203 s   0 %
    linear_scan                :      0.201 s   0 %
    use_set_tuple_element      :      0.154 s   0 %
    turn_yregs                 :      0.114 s   0 %
    frame_size                 :      0.113 s   0 %
    number_instructions        :      0.087 s   0 %
    sanitize                   :      0.084 s   0 %
    assert_no_critical_edges   :      0.057 s   0 %
    opt_get_list               :      0.050 s   0 %
    copy_retval                :      0.050 s   0 %
    fix_bs                     :      0.041 s   0 %
    match_fail_instructions    :      0.004 s   0 %
    fix_receives               :      0.001 s   0 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      1.582 s   15231.0 kB
 beam_validator_strong         :      1.545 s   15231.0 kB
 beam_a                        :      0.009 s   15138.0 kB
 beam_block                    :      0.013 s   17779.6 kB
 beam_jump                     :      0.040 s   17779.6 kB
 beam_peep                     :      0.011 s   17779.6 kB
 beam_clean                    :      0.007 s   17779.6 kB
 beam_flatten                  :      0.003 s   15138.0 kB
 beam_z                        :      0.004 s   10328.9 kB
 beam_validator_weak           :      1.529 s   10328.9 kB
 beam_asm                      :      0.072 s       9.2 kB
 save_binary                   :      0.001 s       9.1 kB
dump@ps.erl:3:2: Warning: export_all flag enabled - all functions will be exported
%    3| -compile(export_all).
%     |  ^

ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump@ps.erl  66.78s user 2.34s system 95% cpu 1:12.14 total
Sun Feb  6 17:21:37 CET 2022
~/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-3cf560cf86da1ea4f9b1337e4f333cd787bd731b$ date ; time ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump2pattern@ps.erl ; date
Sun Feb  6 17:27:13 CET 2022
Compiling dump2pattern@ps
 remove_file                   :      0.000 s       9.3 kB
 parse_module                  :      0.180 s    5465.5 kB
 transform_module              :      0.000 s    5465.5 kB
 lint_module                   :      0.017 s    5465.9 kB
 compile_directives            :      0.000 s    5466.0 kB
 expand_records                :      0.007 s    5466.0 kB
 core                          :     15.958 s   58395.5 kB
 sys_core_fold                 :      0.889 s   53530.8 kB
 sys_core_alias                :      0.030 s   53530.8 kB
 core_transforms               :      0.000 s   53530.8 kB
 sys_core_bsm                  :      0.009 s   53530.8 kB
 v3_kernel                     :      0.214 s   55294.6 kB
 beam_kernel_to_ssa            :      0.143 s   29588.7 kB
 beam_ssa_bool                 :     27.002 s   29588.7 kB
 beam_ssa_share                :      0.023 s   29588.7 kB
 beam_ssa_recv                 :      0.001 s   29588.7 kB
 beam_ssa_bsm                  :      0.105 s   29681.7 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.099 s  98 %
    annotate_context_parameters:      0.001 s   1 %
    combine_matches            :      0.001 s   1 %
    accept_context_args        :      0.000 s   0 %
    skip_outgoing_tail_extracti:      0.000 s   0 %
 beam_ssa_funs                 :      0.049 s   29681.7 kB
 beam_ssa_opt                  :    112.304 s   29757.8 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_sink               :    107.902 s  96 %
    ssa_opt_type_start         :      0.961 s   1 %
    ssa_opt_type_continue      :      0.923 s   1 %
    ssa_opt_live               :      0.909 s   1 %
    ssa_opt_dead               :      0.688 s   1 %
    ssa_opt_cse                :      0.233 s   0 %
    ssa_opt_tail_phis          :      0.120 s   0 %
    ssa_opt_merge_blocks       :      0.096 s   0 %
    ssa_opt_trim_unreachable   :      0.051 s   0 %
    ssa_opt_record             :      0.041 s   0 %
    ssa_opt_tail_calls         :      0.034 s   0 %
    ssa_opt_split_blocks       :      0.033 s   0 %
    ssa_opt_element            :      0.033 s   0 %
    ssa_opt_linearize          :      0.032 s   0 %
    ssa_opt_coalesce_phis      :      0.023 s   0 %
    ssa_opt_float              :      0.018 s   0 %
    ssa_opt_ne                 :      0.009 s   0 %
    ssa_opt_tuple_size         :      0.007 s   0 %
    ssa_opt_blockify           :      0.006 s   0 %
    ssa_opt_bsm                :      0.004 s   0 %
    ssa_opt_try                :      0.004 s   0 %
    ssa_opt_get_tuple_element  :      0.004 s   0 %
    ssa_opt_bs_puts            :      0.002 s   0 %
    ssa_opt_sw                 :      0.002 s   0 %
    ssa_opt_bsm_shortcut       :      0.001 s   0 %
    ssa_opt_bc_size            :      0.001 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.077 s   29757.8 kB
 beam_ssa_pre_codegen          :     44.902 s   39719.3 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :     30.782 s  69 %
    find_yregs                 :      7.563 s  17 %
    reserve_yregs              :      4.384 s  10 %
    live_intervals             :      0.980 s   2 %
    reserve_regs               :      0.197 s   0 %
    linear_scan                :      0.186 s   0 %
    use_set_tuple_element      :      0.148 s   0 %
    frame_size                 :      0.117 s   0 %
    sanitize                   :      0.116 s   0 %
    number_instructions        :      0.086 s   0 %
    turn_yregs                 :      0.085 s   0 %
    copy_retval                :      0.071 s   0 %
    opt_get_list               :      0.060 s   0 %
    assert_no_critical_edges   :      0.060 s   0 %
    fix_bs                     :      0.035 s   0 %
    match_fail_instructions    :      0.005 s   0 %
    fix_receives               :      0.001 s   0 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      1.475 s   18807.2 kB
 beam_validator_strong         :      1.594 s   18807.2 kB
 beam_a                        :      0.011 s   18714.2 kB
 beam_block                    :      0.014 s   21355.8 kB
 beam_jump                     :      0.074 s   21355.7 kB
 beam_peep                     :      0.026 s   21355.7 kB
 beam_clean                    :      0.011 s   21355.7 kB
 beam_flatten                  :      0.003 s   18714.1 kB
 beam_z                        :      0.005 s   11580.8 kB
 beam_validator_weak           :      1.650 s   11580.8 kB
 beam_asm                      :      0.082 s       9.8 kB
 save_binary                   :      0.000 s      12.0 kB
/Users/drathier/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-3cf560cf86da1ea4f9b1337e4f333cd787bd731b/dump2pattern@ps.beam: Module name 'dump@ps' does not match file name 'dump2pattern@ps'
dump2pattern@ps.erl:3:2: Warning: export_all flag enabled - all functions will be exported
%    3| -compile(export_all).
%     |  ^

ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump2pattern@ps.erl  199.84s user 2.99s system 97% cpu 3:27.81 total
Sun Feb  6 17:30:41 CET 2022
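The factors in the title come from comparing the two logs above: beam_ssa_bool goes from 0.100 s to 27.002 s, and ssa_opt_sink (a sub pass of beam_ssa_opt) from 0.003 s to 107.902 s. A quick sketch of the arithmetic (the `speedup` module is hypothetical; the numbers are copied from the logs):

```erlang
-module(speedup).
-export([ratio/2, report/0]).

%% Slowdown factor between two timings of the same pass.
ratio(Slow, Fast) when Fast > 0 ->
    Slow / Fast.

%% {Pass, BaselineSeconds, ChangedSeconds}, taken from the two logs:
%% dump@ps.erl (baseline) and dump2pattern@ps.erl (one-line change).
report() ->
    [{Pass, round(ratio(Slow, Fast))}
     || {Pass, Fast, Slow} <- [{beam_ssa_bool, 0.100, 27.002},
                               {ssa_opt_sink, 0.003, 107.902}]].
```

`speedup:report()` returns `[{beam_ssa_bool,270},{ssa_opt_sink,35967}]`. Because the logs only have millisecond resolution, the 0.003 s baseline could be anywhere from about 0.0025 s to 0.0035 s, so the ssa_opt_sink factor is really somewhere between roughly 31000x and 43000x.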

Affected versions
Only tested with OTP 24.2.

Hope it's fun to debug!

@drathier drathier added the bug Issue is reported as a bug label Feb 6, 2022
@drathier drathier changed the title 270x and 36000x slower beam_ssa_bool and ssa_opt_sink passes from simple change 270x and 36000x faster beam_ssa_bool and ssa_opt_sink passes from simple change Feb 6, 2022
@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Feb 7, 2022
@bjorng
Contributor

bjorng commented Feb 7, 2022

I can't reproduce the problem with the master branch. Both beam_bool and ssa_opt_sink are very fast (well under a second each). When I compile with OTP 24.2, I see the same slowdowns as you do.

I suggest that you try to compile your full module with the master branch and see whether the problem remains.

@drathier
Author

drathier commented Feb 7, 2022

Okay, will do!

bjorng added a commit to bjorng/otp that referenced this issue Feb 7, 2022
For the example in erlang#5686, the time for running the `place_frames`
sub pass of `beam_ssa_pre_codegen` is reduced from about 30 seconds
to less than 20 seconds on my computer.
bjorng added a commit to bjorng/otp that referenced this issue Feb 7, 2022
For the example in erlang#5686, the time for running the `place_frames`
sub pass of `beam_ssa_pre_codegen` is reduced from about 30 seconds
to less than 20 seconds on my computer, and the time for each of
the `beam_ssa_bool` and `ssa_opt_sink` passes is almost halved.
@bjorng
Contributor

bjorng commented Feb 7, 2022

I didn't like the long time for the place_frames pass, so I took another look at the dominator calculation and the result is #5690. It is a simple enough change to include in OTP 24.3.

It turns out that the dominator calculation partly explains the slow beam_ssa_bool and ssa_opt_sink passes.
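For readers unfamiliar with the term: a block D dominates block B if every path from the function entry to B passes through D, and B's immediate dominator is the closest such D. Below is a sketch of one well-known way to compute immediate dominators, the iterative algorithm from Cooper, Harvey and Kennedy's "A Simple, Fast Dominance Algorithm". It is not the compiler's code, and the thread does not say which algorithm #5690 uses; the CFG representation (a map from block label to successor labels) and the module name are assumptions.

```erlang
-module(dom_sketch).
-export([idoms/2]).

%% Immediate dominators via the iterative Cooper-Harvey-Kennedy algorithm.
%% Succs :: #{Label => [Label]}, Entry :: Label.
%% Returns #{Label => IDom} for every block reachable from Entry;
%% the entry block is mapped to itself.
idoms(Succs, Entry) ->
    RPO = rpo(Succs, Entry),
    Index = maps:from_list(lists:zip(RPO, lists:seq(0, length(RPO) - 1))),
    Preds = preds(Succs, RPO),
    iterate(tl(RPO), Preds, Index, #{Entry => Entry}).

%% Prepending each block when its DFS visit finishes yields the blocks
%% in reverse postorder, with Entry first.
rpo(Succs, Entry) ->
    {Order, _Seen} = dfs(Entry, Succs, [], #{}),
    Order.

dfs(L, Succs, Acc, Seen) ->
    case Seen of
        #{L := _} ->
            {Acc, Seen};
        #{} ->
            {Acc1, Seen1} =
                lists:foldl(fun(S, {A, Sn}) -> dfs(S, Succs, A, Sn) end,
                            {Acc, Seen#{L => true}},
                            maps:get(L, Succs, [])),
            {[L | Acc1], Seen1}
    end.

%% Predecessor lists for the reachable blocks.
preds(Succs, RPO) ->
    lists:foldl(
      fun(L, Acc0) ->
              lists:foldl(
                fun(S, Acc) ->
                        maps:update_with(S, fun(Ps) -> [L | Ps] end, [L], Acc)
                end, Acc0, maps:get(L, Succs, []))
      end, #{}, RPO).

%% Sweep the blocks in reverse postorder until no idom changes.
iterate(Blocks, Preds, Index, Doms0) ->
    {Doms, Changed} = lists:foldl(
                        fun(B, Acc) -> update(B, Preds, Index, Acc) end,
                        {Doms0, false}, Blocks),
    case Changed of
        true -> iterate(Blocks, Preds, Index, Doms);
        false -> Doms
    end.

update(B, Preds, Index, {Doms, Changed}) ->
    %% Only predecessors that already have an idom take part; in reverse
    %% postorder at least one of them (the DFS parent) always does.
    [First | Rest] = [P || P <- maps:get(B, Preds), maps:is_key(P, Doms)],
    New = lists:foldl(fun(P, D) -> intersect(P, D, Doms, Index) end,
                      First, Rest),
    case Doms of
        #{B := New} -> {Doms, Changed};
        #{} -> {Doms#{B => New}, true}
    end.

%% Walk both fingers up the current dominator tree until they meet;
%% the finger with the larger reverse-postorder index is the deeper one.
intersect(A, A, _Doms, _Index) ->
    A;
intersect(A, B, Doms, Index) ->
    case maps:get(A, Index) > maps:get(B, Index) of
        true -> intersect(maps:get(A, Doms), B, Doms, Index);
        false -> intersect(A, maps:get(B, Doms), Doms, Index)
    end.
```

For the diamond `#{0 => [1, 2], 1 => [3], 2 => [3], 3 => []}` with entry 0, every block's immediate dominator is 0. Each sweep touches every block, so a CFG that needs many sweeps, or a representation that makes each lookup expensive, quickly becomes costly in very large functions.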

@drathier
Copy link
Author

drathier commented Feb 8, 2022

I can reproduce this with a clean build of your latest changes on macOS:

~/drathier/otp$ g log -n 1                                         19:53:37.752
commit e1a64609ca0ff5750baed7918bfd4fe2ae6f5567 (HEAD -> bjorn/compiler/optimize-dominators, bjorn/bjorn/compiler/optimize-dominators)
Author: Björn Gustavsson <bjorn@erlang.org>
Date:   Mon Feb 7 14:17:31 2022 +0100

    Update primary bootstrap

You can see that place_frames is faster now.

With the old test file, beam_ssa_bool and ssa_opt_sink are fast.

~/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-999c038c12ee94ee20f5041be919ebfcbfcd2a73$ date ; time ERL_COMPILER_OPTIONS="[time]" ~/drathier/otp/bin/erlc dump@ps.erl; date                                             19:32:38.876
Tue Feb  8 19:32:50 CET 2022
Compiling dump@ps
 remove_file                   :      0.000 s       8.7 kB
 parse_module                  :      0.184 s    5464.7 kB
 transform_module              :      0.000 s    5464.7 kB
 lint_module                   :      0.015 s    5465.0 kB
 compile_directives            :      0.000 s    5465.1 kB
 expand_records                :      0.008 s    5465.1 kB
 core                          :     15.796 s   41614.2 kB
 sys_core_fold                 :      1.117 s   37987.0 kB
 sys_core_alias                :      0.015 s   37987.0 kB
 core_transforms               :      0.000 s   37987.0 kB
 sys_core_bsm                  :      0.007 s   37987.0 kB
 v3_kernel                     :      0.217 s   39501.6 kB
 beam_kernel_to_ssa            :      0.090 s   25512.5 kB
 beam_ssa_bool                 :      0.100 s   25512.5 kB
 beam_ssa_share                :      0.030 s   25512.5 kB
 beam_ssa_recv                 :      0.001 s   25512.5 kB
 beam_ssa_bsm                  :      0.107 s   25605.5 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.102 s  98 %
    annotate_context_parameters:      0.001 s   1 %
    combine_matches            :      0.001 s   1 %
    skip_outgoing_tail_extracti:      0.000 s   0 %
    accept_context_args        :      0.000 s   0 %
 beam_ssa_funs                 :      0.042 s   25605.5 kB
 beam_ssa_opt                  :      2.648 s   25682.9 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      0.951 s  38 %
    ssa_opt_live               :      0.687 s  27 %
    ssa_opt_type_continue      :      0.265 s  11 %
    ssa_opt_dead               :      0.189 s   8 %
    ssa_opt_cse                :      0.087 s   3 %
    ssa_opt_merge_blocks       :      0.067 s   3 %
    ssa_opt_tail_phis          :      0.037 s   1 %
    ssa_opt_element            :      0.036 s   1 %
    ssa_opt_trim_unreachable   :      0.034 s   1 %
    ssa_opt_split_blocks       :      0.027 s   1 %
    ssa_opt_linearize          :      0.027 s   1 %
    ssa_opt_record             :      0.024 s   1 %
    ssa_opt_coalesce_phis      :      0.022 s   1 %
    ssa_opt_tail_calls         :      0.019 s   1 %
    ssa_opt_float              :      0.010 s   0 %
    ssa_opt_bsm                :      0.004 s   0 %
    ssa_opt_blockify           :      0.004 s   0 %
    ssa_opt_sink               :      0.004 s   0 %
    ssa_opt_ne                 :      0.004 s   0 %
    ssa_opt_tuple_size         :      0.003 s   0 %
    ssa_opt_get_tuple_element  :      0.002 s   0 %
    ssa_opt_bs_puts            :      0.002 s   0 %
    ssa_opt_bsm_shortcut       :      0.001 s   0 %
    ssa_opt_sw                 :      0.001 s   0 %
    ssa_opt_try                :      0.001 s   0 %
    ssa_opt_bc_size            :      0.001 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.069 s   25682.9 kB
 beam_ssa_pre_codegen          :     10.526 s   35647.2 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :      4.427 s  42 %
    reserve_yregs              :      4.312 s  41 %
    live_intervals             :      0.648 s   6 %
    linear_scan                :      0.172 s   2 %
    reserve_regs               :      0.158 s   2 %
    find_yregs                 :      0.150 s   1 %
    frame_size                 :      0.104 s   1 %
    turn_yregs                 :      0.101 s   1 %
    copy_retval                :      0.090 s   1 %
    use_set_tuple_element      :      0.080 s   1 %
    number_instructions        :      0.077 s   1 %
    sanitize                   :      0.073 s   1 %
    assert_no_critical_edges   :      0.045 s   0 %
    opt_get_list               :      0.044 s   0 %
    fix_bs                     :      0.023 s   0 %
    match_fail_instructions    :      0.003 s   0 %
    fix_receives               :      0.001 s   0 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      1.531 s   15231.0 kB
 beam_validator_strong         :      1.728 s   15231.0 kB
 beam_a                        :      0.008 s   15138.0 kB
 beam_block                    :      0.012 s   17779.6 kB
 beam_jump                     :      0.049 s   17779.6 kB
 beam_peep                     :      0.010 s   17779.6 kB
 beam_clean                    :      0.015 s   17779.6 kB
 beam_trim                     :    398.711 s   17774.0 kB
 beam_flatten                  :      0.004 s   15132.4 kB
 beam_z                        :      0.009 s   10323.3 kB
 beam_validator_weak           :      1.703 s   10323.3 kB
 beam_asm                      :      0.074 s       9.1 kB
 save_binary                   :      0.002 s       9.1 kB
dump@ps.erl:3:2: Warning: export_all flag enabled - all functions will be exported
%    3| -compile(export_all).
%     |  ^

ERL_COMPILER_OPTIONS="[time]" ~/drathier/otp/bin/erlc dump@ps.erl  360.17s user 21.06s system 87% cpu 7:15.54 total
Tue Feb  8 19:40:05 CET 2022

And with the new test file, beam_ssa_bool takes 16s and ssa_opt_sink takes 68s:

~/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-999c038c12ee94ee20f5041be919ebfcbfcd2a73$ date ; time ERL_COMPILER_OPTIONS="[time]" ~/drathier/otp/bin/erlc booloptsink@ps.erl; date                                      19:40:05.910
Tue Feb  8 19:41:00 CET 2022
Compiling booloptsink@ps
 remove_file                   :      0.000 s       9.2 kB
 parse_module                  :      0.177 s    5465.4 kB
 transform_module              :      0.000 s    5465.4 kB
 lint_module                   :      0.017 s    5465.8 kB
 compile_directives            :      0.000 s    5465.9 kB
 expand_records                :      0.007 s    5465.9 kB
 core                          :     15.779 s   56297.9 kB
 sys_core_fold                 :      0.937 s   51588.2 kB
 sys_core_alias                :      0.032 s   51588.2 kB
 core_transforms               :      0.000 s   51588.2 kB
 sys_core_bsm                  :      0.008 s   51588.2 kB
 v3_kernel                     :      0.257 s   53320.9 kB
 beam_kernel_to_ssa            :      0.108 s   29079.8 kB
 beam_ssa_bool                 :     15.996 s   29079.8 kB
 beam_ssa_share                :      0.062 s   29079.8 kB
 beam_ssa_recv                 :      0.002 s   29079.8 kB
 beam_ssa_bsm                  :      0.126 s   29172.8 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.117 s  98 %
    combine_matches            :      0.001 s   1 %
    annotate_context_parameters:      0.001 s   1 %
    accept_context_args        :      0.001 s   0 %
    skip_outgoing_tail_extracti:      0.000 s   0 %
 beam_ssa_funs                 :      0.050 s   29172.8 kB
 beam_ssa_opt                  :     73.603 s   29248.9 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_sink               :     68.555 s  93 %
    ssa_opt_type_continue      :      1.127 s   2 %
    ssa_opt_live               :      1.090 s   1 %
    ssa_opt_type_start         :      1.054 s   1 %
    ssa_opt_dead               :      0.772 s   1 %
    ssa_opt_cse                :      0.215 s   0 %
    ssa_opt_tail_phis          :      0.208 s   0 %
    ssa_opt_merge_blocks       :      0.085 s   0 %
    ssa_opt_trim_unreachable   :      0.050 s   0 %
    ssa_opt_record             :      0.049 s   0 %
    ssa_opt_linearize          :      0.039 s   0 %
    ssa_opt_element            :      0.038 s   0 %
    ssa_opt_split_blocks       :      0.035 s   0 %
    ssa_opt_tail_calls         :      0.030 s   0 %
    ssa_opt_coalesce_phis      :      0.029 s   0 %
    ssa_opt_float              :      0.014 s   0 %
    ssa_opt_ne                 :      0.012 s   0 %
    ssa_opt_tuple_size         :      0.007 s   0 %
    ssa_opt_sw                 :      0.007 s   0 %
    ssa_opt_bsm                :      0.006 s   0 %
    ssa_opt_blockify           :      0.006 s   0 %
    ssa_opt_try                :      0.004 s   0 %
    ssa_opt_get_tuple_element  :      0.003 s   0 %
    ssa_opt_bs_puts            :      0.003 s   0 %
    ssa_opt_bc_size            :      0.002 s   0 %
    ssa_opt_bsm_shortcut       :      0.001 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.089 s   29248.9 kB
 beam_ssa_pre_codegen          :     27.237 s   39211.5 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :     20.941 s  77 %
    reserve_yregs              :      4.185 s  15 %
    live_intervals             :      0.669 s   2 %
    linear_scan                :      0.217 s   1 %
    find_yregs                 :      0.205 s   1 %
    reserve_regs               :      0.188 s   1 %
    frame_size                 :      0.152 s   1 %
    use_set_tuple_element      :      0.138 s   1 %
    turn_yregs                 :      0.136 s   0 %
    sanitize                   :      0.116 s   0 %
    number_instructions        :      0.072 s   0 %
    assert_no_critical_edges   :      0.060 s   0 %
    copy_retval                :      0.053 s   0 %
    opt_get_list               :      0.044 s   0 %
    fix_bs                     :      0.034 s   0 %
    match_fail_instructions    :      0.006 s   0 %
    fix_receives               :      0.001 s   0 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      1.549 s   18360.2 kB
 beam_validator_strong         :      1.545 s   18360.2 kB
 beam_a                        :      0.010 s   18267.2 kB
 beam_block                    :      0.012 s   20908.8 kB
 beam_jump                     :      0.071 s   20908.7 kB
 beam_peep                     :      0.009 s   20908.7 kB
 beam_clean                    :      0.010 s   20908.7 kB
 beam_trim                     :    382.341 s   20903.1 kB
 beam_flatten                  :      0.008 s   18261.5 kB
 beam_z                        :      0.008 s   11418.7 kB
 beam_validator_weak           :      1.384 s   11418.7 kB
 beam_asm                      :      0.056 s       9.7 kB
 save_binary                   :      0.000 s      11.8 kB
/Users/drathier/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-999c038c12ee94ee20f5041be919ebfcbfcd2a73/booloptsink@ps.beam: Module name 'dump@ps' does not match file name 'booloptsink@ps'
booloptsink@ps.erl:3:2: Warning: export_all flag enabled - all functions will be exported
%    3| -compile(export_all).
%     |  ^

ERL_COMPILER_OPTIONS="[time]" ~/drathier/otp/bin/erlc booloptsink@ps.erl  451.65s user 15.10s system 89% cpu 8:42.34 total
Tue Feb  8 19:49:42 CET 2022
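When comparing runs like these, it can help to extract the per-pass times mechanically instead of scanning the logs by eye. A rough sketch, assuming the `name : N.NNN s ...` line layout shown in these logs (the module and function names are hypothetical; this is not part of OTP):

```erlang
-module(pass_times).
-export([parse/1, slowest/2]).

%% Extracts {PassName, Seconds} from compiler `time` output such as
%% " beam_ssa_bool   :   15.996 s   29079.8 kB". Sub-pass lines
%% ("ssa_opt_sink : 68.555 s 93 %") are included; all other lines
%% (headers, warnings, shell output) are skipped.
parse(Text) ->
    lists:filtermap(fun parse_line/1, string:split(Text, "\n", all)).

parse_line(Line) ->
    case re:run(Line, "^\\s*([a-z_]+)\\s*:\\s*([0-9]+\\.[0-9]+) s",
                [{capture, all_but_first, list}]) of
        {match, [Name, Secs]} -> {true, {Name, list_to_float(Secs)}};
        nomatch -> false
    end.

%% The N entries with the largest times, slowest first.
slowest(N, Times) ->
    lists:sublist(lists:reverse(lists:keysort(2, Times)), N).
```

Applied to the log just above, `pass_times:slowest(3, ...)` would list beam_trim, beam_ssa_opt and ssa_opt_sink.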

I doubt it's macOS-specific, though. Could you give it another try? Both files are in the gist; maybe you only ran one of them?

@bjorng
Contributor

bjorng commented Feb 9, 2022

I did use the newest file, the one with the potato matching in the fun head.

Commit e1a6460 is based on maint, not master, because it is a simple enough change that can be safely included in OTP 24.3. (I have now merged bjorn/compiler/optimize-dominators to maint and master.)

I get times very similar to yours when I compile using maint. When I compile using master, some of the other changes we have made to the compiler, for some reason, make beam_ssa_bool and ssa_opt_sink fast.

@drathier
Author

drathier commented Feb 9, 2022

I get it now. I must have mixed up maint and main (== master) in my head. Happy to confirm that this issue has already been fixed, as you said!

These are the timings for the original file that triggered this performance problem in #5140:

$ date ; time ERL_COMPILER_OPTIONS="[time]" ~/drathier/otp/bin/erlc output/full@ps.erl; date 
 
Wed Feb  9 10:16:55 CET 2022
Compiling output/full@ps.erl
 remove_file                   :      0.000 s       7.2 kB
 parse_module                  :      0.400 s   13530.4 kB
 transform_module              :      0.000 s   13530.4 kB
 lint_module                   :      0.045 s   13530.4 kB
 compile_directives            :      0.000 s   13530.5 kB
 expand_records                :      0.019 s   13530.5 kB
 core                          :     24.514 s  422755.4 kB
 sys_core_fold                 :      2.749 s  385366.7 kB
 sys_core_alias                :      0.056 s  385366.7 kB
 core_transforms               :      0.000 s  385366.7 kB
 sys_core_bsm                  :      0.029 s  385366.7 kB
 v3_kernel                     :      0.688 s  393307.4 kB
 beam_kernel_to_ssa            :      0.257 s  134582.5 kB
 beam_ssa_bool                 :      0.549 s  134578.4 kB
 beam_ssa_share                :      0.051 s  134578.4 kB
 beam_ssa_recv                 :      0.003 s  134578.4 kB
 beam_ssa_bsm                  :      0.256 s  134773.2 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.242 s  98 %
    combine_matches            :      0.002 s   1 %
    annotate_context_parameters:      0.001 s   1 %
    accept_context_args        :      0.001 s   0 %
    skip_outgoing_tail_extracti:      0.001 s   0 %
 beam_ssa_opt                  :     10.865 s  135966.2 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      4.820 s  46 %
    ssa_opt_live               :      1.529 s  15 %
    ssa_opt_type_continue      :      0.951 s   9 %
    ssa_opt_dead               :      0.810 s   8 %
    ssa_opt_sink               :      0.742 s   7 %
    ssa_opt_cse                :      0.315 s   3 %
    ssa_opt_merge_blocks       :      0.170 s   2 %
    ssa_opt_tail_phis          :      0.169 s   2 %
    ssa_opt_ranges             :      0.162 s   2 %
    ssa_opt_trim_unreachable   :      0.118 s   1 %
    ssa_opt_element            :      0.101 s   1 %
    ssa_opt_linearize          :      0.096 s   1 %
    ssa_opt_redundant_br       :      0.087 s   1 %
    ssa_opt_split_blocks       :      0.084 s   1 %
    ssa_opt_record             :      0.077 s   1 %
    ssa_opt_coalesce_phis      :      0.068 s   1 %
    ssa_opt_tail_literals      :      0.061 s   1 %
    ssa_opt_float              :      0.027 s   0 %
    ssa_opt_try                :      0.022 s   0 %
    ssa_opt_bsm                :      0.014 s   0 %
    ssa_opt_blockify           :      0.014 s   0 %
    ssa_opt_tuple_size         :      0.013 s   0 %
    ssa_opt_bs_create_bin      :      0.010 s   0 %
    ssa_opt_ne                 :      0.010 s   0 %
    ssa_opt_get_tuple_element  :      0.007 s   0 %
    ssa_opt_bc_size            :      0.004 s   0 %
    ssa_opt_bsm_shortcut       :      0.002 s   0 %
    ssa_opt_sw                 :      0.002 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.179 s  135966.2 kB
 beam_ssa_pre_codegen          :     40.797 s  159384.9 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :     25.852 s  63 %
    reserve_yregs              :      9.932 s  24 %
    live_intervals             :      1.929 s   5 %
    linear_scan                :      0.472 s   1 %
    reserve_regs               :      0.386 s   1 %
    find_yregs                 :      0.378 s   1 %
    frame_size                 :      0.347 s   1 %
    use_set_tuple_element      :      0.290 s   1 %
    turn_yregs                 :      0.280 s   1 %
    sanitize                   :      0.239 s   1 %
    number_instructions        :      0.192 s   0 %
    assert_no_critical_edges   :      0.137 s   0 %
    copy_retval                :      0.113 s   0 %
    opt_get_list               :      0.081 s   0 %
    fix_bs                     :      0.071 s   0 %
    fix_receives               :      0.012 s   0 %
    expand_match_fail          :      0.006 s   0 %
 beam_ssa_codegen              :      3.361 s  102391.7 kB
 beam_validator_strong         :      4.538 s  102391.7 kB
 beam_a                        :      0.028 s  102159.4 kB
 beam_block                    :      0.116 s  108252.8 kB
 beam_jump                     :      0.178 s  108240.6 kB
 beam_clean                    :      0.019 s  108240.6 kB
 beam_trim                     :      3.265 s  108227.5 kB
 beam_flatten                  :      0.029 s  102136.7 kB
 beam_z                        :      0.047 s   48939.0 kB
 beam_validator_weak           :      3.759 s   48939.0 kB
 beam_asm                      :      0.204 s       7.3 kB
 save_binary                   :      0.002 s       7.2 kB
ERL_COMPILER_OPTIONS="[time]" ~/drathier/otp/bin/erlc   88.78s user 4.89s system 94% cpu 1:39.47 total
Wed Feb  9 10:18:34 CET 2022

This is a massive improvement over what we had back in August. Big thanks!

@drathier drathier closed this as completed Feb 9, 2022
@bjorng
Contributor

bjorng commented Feb 9, 2022

Thanks for the bug reports. They helped us improve the compiler.


4 participants