New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite BSM optimizations in the new SSA-based intermediate format #1958

Merged
merged 9 commits into from Sep 28, 2018

Conversation

Projects
None yet
3 participants
@jhogberg
Contributor

jhogberg commented Sep 19, 2018

This PR improves the bit-syntax match optimization pass, leveraging the new SSA intermediate format to perform much more aggressive optimizations.

While mostly done this is still a work in progress, and I'm posting it a bit early because I need help adding HiPE support for a few new instructions. @kostis, @margnus1: could you lend me a hand?

  • bs_start_match3/4 : As bs_start_match2, but without "position slots."
  • bs_get_position/3 : Stores the current position in a register as an integer.
  • bs_set_position/2 : Restores position from a register.
  • bs_get_tail/3 : Returns the tail of a context at its current position, similar to bs_context_to_binary but non-destructive.

Some highlights:

Sub-binary creation can now be delayed within function bodies

Sub-binaries aren't created until they're actually used as binaries, making it significantly easier to write optimized happy paths. The following code used to disable reuse altogether:

    delayed(<<A, Rest/binary>>) ->
        case bar(A) of
            %% Match context will be reused here.
            true -> delayed(Rest);
            %% Sub-binary for 'Rest' will be created here.
            false -> Rest
        end.

This also applies when an alias is taken for the entire match:

    aliased(<<A, Rest/binary>>=Alias) ->
        case bar(A) of
            %% Match context will be reused here.
            true -> aliased(Rest);
            %% Sub-binary for 'Alias' will be created here.
            false -> Alias
        end.

The cost of backtracking is significantly reduced

Match contexts only keep track of their current position, and backtrack positions are now stored in plain registers instead of "slots" on the context. This means that passing them to another function or using them near a try block will no longer disable reuse, improving performance for modules that do a lot of backtracking.

    %% Example taken from the uri_string module
    parse_hier(<<"//"/utf8, Rest/binary>>), URI) ->
        try parse_userinfo(Rest, URI) of
            {T, URI1} ->
                %% Match context will be reused here.
                Userinfo = calculate_parsed_userinfo(Rest, T),
                %% Sub-binary will be created here.
                {Rest, URI1#{userinfo => Userinfo}}
        catch
            throw:{_,_,_} ->
                %% Match context will be reused here
                {T, URI1} = parse_host(Rest, URI),
                Host = calculate_parsed_host_port(Rest, T),
                %% Sub-binary will be created here.
                {Rest, URI1#{host => remove_brackets(Host)}}
        end;

As an added benefit this also avoids creating a new match context when a function needs more slots than the context it was given, which was a very surprising (but thankfully uncommon) performance sink.

Improved context reuse within function bodies

All matches operating on the same binary are combined into a single match sequence, greatly reducing the number of match contexts created.

    %% Example taken from the string module
    bin_search_str_1(Bin0, Start, First, SearchCPs) ->
        <<_:Start/binary, Bin/binary>> = Bin0,
        case binary:match(Bin, First) of
            nomatch -> {nomatch, byte_size(Bin0), []};
            {Where0, _} ->
                Where = Start+Where0,
                %% We already started a match on Bin0, so its context
                %% gets reused
                <<Keep:Where/binary, Cs0/binary>> = Bin0,
                case prefix_1(Cs0, SearchCPs) of
                    nomatch ->
                        %% This is a continuation on the match on Bin0, so we
                        %% reuse it here too. Note that this happens even though
                        %% it was passed to prefix_1 above.
                        <<_/utf8, Cs/binary>> = Cs0,
                        KeepSz = byte_size(Bin0) - byte_size(Cs),
                        %% ... and of course it's reused here as well.
                        bin_search_str_1(Bin0, KeepSz, First, SearchCPs);
                    [] ->
                        {Keep, [Cs0], <<>>};
                    Rest ->
                        {Keep, [Cs0], Rest}
                end
        end.

Wrappers no longer disable context reuse

Wrapper functions used to disable reuse in even the simplest cases, but no longer:

    example(<<0,1,2,Rest/binary>>) ->
        %% Match context will be reused here.
        sum_bytes(Rest).

    sum_bytes(Bin) ->
        sum_bytes_1(Bin, 0).
    sum_bytes_1(<<N, Rest/binary>>, Acc) ->
        sum_bytes_1(Rest, Acc + N);
    sum_bytes_1(<<>>, Acc) ->
        Acc.

@jhogberg jhogberg self-assigned this Sep 19, 2018

jhogberg added some commits May 8, 2018

Support using match contexts from Y registers
The upcoming beam_ssa_bsm pass allows match contexts to be used
across function calls that take said context as an argument, which
means it's fairly common for them to end up in Y registers.
Remove match context reuse annotations from core/kernel passes
The upcoming beam_ssa_bsm pass makes this redundant.
beam_ssa: Add helper functions and export more types
get_anno/3: as get_anno but with a default value
definitions/1-2: returns a map of variable definitions (#b_set{})
uses/1-2: returns a map of all uses of a given variable
mapfold_blocks_rpo/4: mapfolds over blocks
@jhogberg

This comment has been minimized.

Show comment
Hide comment
@jhogberg

jhogberg Sep 26, 2018

Contributor

I've fixed a performance regression on 32-bit platforms and cleaned up a few cosmetic issues. It doesn't look like there's much left to do so I'll merge it on Friday if nothing new pops up. HiPE is still not fixed but there's plenty of time until OTP 22.

I've also got some benchmarks now. The old pass did a pretty good job and a lot of our code was written with its limitations in mind (hurting readability in some cases), so most of the benchmarks are unchanged, but the ones I set out to improve have seen substantial gains.

The blue line is this PR and the green line is maint:

image
image

Contributor

jhogberg commented Sep 26, 2018

I've fixed a performance regression on 32-bit platforms and cleaned up a few cosmetic issues. It doesn't look like there's much left to do so I'll merge it on Friday if nothing new pops up. HiPE is still not fixed but there's plenty of time until OTP 22.

I've also got some benchmarks now. The old pass did a pretty good job and a lot of our code was written with its limitations in mind (hurting readability in some cases), so most of the benchmarks are unchanged, but the ones I set out to improve have seen substantial gains.

The blue line is this PR and the green line is maint:

image
image

@kostis

This comment has been minimized.

Show comment
Hide comment
@kostis

kostis Sep 26, 2018

Contributor

I would have assumed that there would be a line also for master, in order to also see the baseline of this PR -- and the performance improvement it offers.

Also, since presumably you have an easy way to create these graphs, perhaps it would be a good idea to include HiPE on maint (and maybe on master if it works on these two programs) in order to have points of reference for that too.

It's a super-busy period right now, but today we briefly talked with @margnus1 about at least providing some guidance to what parts of HiPE need to be changed and how. Let's see whether we will be able to do this next week...

Contributor

kostis commented Sep 26, 2018

I would have assumed that there would be a line also for master, in order to also see the baseline of this PR -- and the performance improvement it offers.

Also, since presumably you have an easy way to create these graphs, perhaps it would be a good idea to include HiPE on maint (and maybe on master if it works on these two programs) in order to have points of reference for that too.

It's a super-busy period right now, but today we briefly talked with @margnus1 about at least providing some guidance to what parts of HiPE need to be changed and how. Let's see whether we will be able to do this next week...

@jhogberg

This comment has been minimized.

Show comment
Hide comment
@jhogberg

jhogberg Sep 27, 2018

Contributor

I would have assumed that there would be a line also for master, in order to also see the baseline of this PR -- and the performance improvement it offers.

Also, since presumably you have an easy way to create these graphs, perhaps it would be a good idea to include HiPE on maint (and maybe on master if it works on these two programs) in order to have points of reference for that too.

The graphs were taken from the nightly builds and are of maint+PRs and master+PRs, there were no other PRs in master that touched related parts so I simplified the blue line as "this PR." I've included historical benchmarks below and it's pretty easy to spot when the PR was included:

image
image

You can also see that master was a lot slower than maint. This was mainly caused by the SSA rewrite changing how the beam code looks which confused the old pass a bit. Practically all benchmarks related to binary matching have been improved by this PR compared to master, but compared to maint it's roughly unchanged aside from the above benchmarks.

This is a bit unfortunate. Your code no longer has to be arcane to be fast (or at least less arcane), but our benchmarks won't show that because they tend to look at things we've written with the old pass in mind.

It's a super-busy period right now, but today we briefly talked with @margnus1 about at least providing some guidance to what parts of HiPE need to be changed and how. Let's see whether we will be able to do this next week...

Sounds good.

Contributor

jhogberg commented Sep 27, 2018

I would have assumed that there would be a line also for master, in order to also see the baseline of this PR -- and the performance improvement it offers.

Also, since presumably you have an easy way to create these graphs, perhaps it would be a good idea to include HiPE on maint (and maybe on master if it works on these two programs) in order to have points of reference for that too.

The graphs were taken from the nightly builds and are of maint+PRs and master+PRs, there were no other PRs in master that touched related parts so I simplified the blue line as "this PR." I've included historical benchmarks below and it's pretty easy to spot when the PR was included:

image
image

You can also see that master was a lot slower than maint. This was mainly caused by the SSA rewrite changing how the beam code looks which confused the old pass a bit. Practically all benchmarks related to binary matching have been improved by this PR compared to master, but compared to maint it's roughly unchanged aside from the above benchmarks.

This is a bit unfortunate. Your code no longer has to be arcane to be fast (or at least less arcane), but our benchmarks won't show that because they tend to look at things we've written with the old pass in mind.

It's a super-busy period right now, but today we briefly talked with @margnus1 about at least providing some guidance to what parts of HiPE need to be changed and how. Let's see whether we will be able to do this next week...

Sounds good.

jhogberg and others added some commits May 8, 2018

Rewrite BSM optimizations in the new SSA-based intermediate format
This commit improves the bit-syntax match optimization pass,
leveraging the new SSA intermediate format to perform much more
aggressive optimizations. Some highlights:

* Watch contexts can be reused even after being passed to a
  function or being used in a try block.
* Sub-binaries are no longer eagerly extracted, making it far
  easier to keep "happy paths" free from binary creation.
* Trivial wrapper functions no longer disable context reuse.
beam_ssa_opt: Eliminate redundant match alignment tests
The beam_ssa_bsm pass welds chained matches together, but the match
expressions themselves are unchanged and if there's a tail
alignment check it will be done each time. This subpass figures out
the checks we've already done and deletes the redundant ones.
beam_ssa_pre_codegen: Remove unused variable aliasing support
Remove the variable aliasing support that was needed for the
old beam_bsm pass.
Remove unused instruction bs_context_to_binary from the compiler
This has been superseded by bs_get_tail/3. Note that it is NOT
removed from the emulator or beam_disasm, as old modules are still
legal.

@jhogberg jhogberg merged commit 7d941c5 into erlang:master Sep 28, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
license/cla Contributor License Agreement is signed.
Details
@kostis

This comment has been minimized.

Show comment
Hide comment
@kostis

kostis Sep 29, 2018

Contributor

Why where these tests deleted instead of testing them with the no_bsm3 flag as the comment suggests? So that there are tests for that flag also - at least for the time being. Or are these tests somewhere else?

Contributor

kostis commented on afa36d2 Sep 29, 2018

Why where these tests deleted instead of testing them with the no_bsm3 flag as the comment suggests? So that there are tests for that flag also - at least for the time being. Or are these tests somewhere else?

This comment has been minimized.

Show comment
Hide comment
@bjorng

bjorng Oct 1, 2018

Contributor

no_bsm3 is tested in the compiler test suite.

f9dcf72 clones bs_match_SUITE to bs_match_r21_SUITE and compiles it with the r21 option (which implies no_bsm3).

Contributor

bjorng replied Oct 1, 2018

no_bsm3 is tested in the compiler test suite.

f9dcf72 clones bs_match_SUITE to bs_match_r21_SUITE and compiles it with the r21 option (which implies no_bsm3).

This comment has been minimized.

Show comment
Hide comment
@kostis

kostis Oct 1, 2018

Contributor

Are you running the tests in the compiler test suite also with HiPE?

My comment has to do with the fact that this commit simply disabled all bs_tests from the hipe test suite -- and also removed one more test from another dir -- for no good reason. Roughly one third of all hipe tests have been disabled by this commit and we have no idea whether there are any regressions. A better alternative would have been to add a no_bsm3 (or r21) compiler option to the bs_SUITE hipe tests and still have them tested with native code compilation instead of simply disabling them.

I suggest that this commit is reverted and a change along the lines suggested above gets added for the time being. In general, removing tests from test suites is not a good idea.

Contributor

kostis replied Oct 1, 2018

Are you running the tests in the compiler test suite also with HiPE?

My comment has to do with the fact that this commit simply disabled all bs_tests from the hipe test suite -- and also removed one more test from another dir -- for no good reason. Roughly one third of all hipe tests have been disabled by this commit and we have no idea whether there are any regressions. A better alternative would have been to add a no_bsm3 (or r21) compiler option to the bs_SUITE hipe tests and still have them tested with native code compilation instead of simply disabling them.

I suggest that this commit is reverted and a change along the lines suggested above gets added for the time being. In general, removing tests from test suites is not a good idea.

@jhogberg

This comment has been minimized.

Show comment
Hide comment
@jhogberg

jhogberg Oct 1, 2018

Contributor

I removed the tests because we already know that they fail hard and we don't want to litter our test result page with them. git rm was the simplest way to make sure they're completely out of the picture.

This is a temporary measure and we always planned to revert it in the branch that reintroduces support, I didn't mention it in the commit message because I figured it was obvious.

In either case I hope this won't remain a problem for long, the faster I get help the faster this gets resolved.

Contributor

jhogberg commented Oct 1, 2018

I removed the tests because we already know that they fail hard and we don't want to litter our test result page with them. git rm was the simplest way to make sure they're completely out of the picture.

This is a temporary measure and we always planned to revert it in the branch that reintroduces support, I didn't mention it in the commit message because I figured it was obvious.

In either case I hope this won't remain a problem for long, the faster I get help the faster this gets resolved.

@kostis

This comment has been minimized.

Show comment
Hide comment
@kostis

kostis Oct 1, 2018

Contributor

Please try to understand what I am complaining about.

The commit that removed these tests (afa36d2) is actually the same that suggested to use the option no_bsm3 to run programs that contain binary matching in native code. How do we know/ensure that this option works (and will continue to work)? Surely not by git removing all tests that exercise the native code compiler on programs containing binary matching, even if that "is the simplest way to do things."

So, independently of whether the HiPE compiler is fixed to work for the new instructions, there needs to be a way to test that the no_bsm3 / r21 path to generating native code works. I kindly ask you to revert this commit now and restore these tests with a no_bsm3 compiler option for them.

Contributor

kostis commented Oct 1, 2018

Please try to understand what I am complaining about.

The commit that removed these tests (afa36d2) is actually the same that suggested to use the option no_bsm3 to run programs that contain binary matching in native code. How do we know/ensure that this option works (and will continue to work)? Surely not by git removing all tests that exercise the native code compiler on programs containing binary matching, even if that "is the simplest way to do things."

So, independently of whether the HiPE compiler is fixed to work for the new instructions, there needs to be a way to test that the no_bsm3 / r21 path to generating native code works. I kindly ask you to revert this commit now and restore these tests with a no_bsm3 compiler option for them.

@bjorng

This comment has been minimized.

Show comment
Hide comment
@bjorng

bjorng Oct 1, 2018

Contributor

The r21 and no_bsm3 options are totally unsupported. The only reason for them being there in the first place is for some test suites that test backward compatibility to previous releases. It was our intention that the single mention of no_bsm3 should be reverted well before the release of OTP 22. The change to the documentation was only intended for the heroes that test out the master branch before the release of OTP 22.

Therefore, we have no interest that the hipe compiler in OTP 22 can handle code that has been compiled with no_bsm3. Unless you have a good reason otherwise, I suggest that you remove the support for the old binary matching instructions in master and only implement support for the new ones.

Also, note that there is at least one test case in hipe_SUITE that will crash the emulator if the old binary optimizations are disabled. (The one I mailed about earlier; see https://gist.github.com/bjorng/3666b4133b0d242ddcf1aa0626a67178.) There could be even more problematic test cases, because no_bsm3 totally disables the old binary match optimizations. So if we reverted the commit now, we would have to add an additional commit to disable at least one test case, and we would be testing a configuration that is never intended to be supported or released in OTP 22. I don't really see the point of doing that.

My suggestion (unless you have very good reasons otherwise), is that you revert the commit yourself as the first thing in your branch that implements the support for the new binary matching instructions.

Contributor

bjorng commented Oct 1, 2018

The r21 and no_bsm3 options are totally unsupported. The only reason for them being there in the first place is for some test suites that test backward compatibility to previous releases. It was our intention that the single mention of no_bsm3 should be reverted well before the release of OTP 22. The change to the documentation was only intended for the heroes that test out the master branch before the release of OTP 22.

Therefore, we have no interest that the hipe compiler in OTP 22 can handle code that has been compiled with no_bsm3. Unless you have a good reason otherwise, I suggest that you remove the support for the old binary matching instructions in master and only implement support for the new ones.

Also, note that there is at least one test case in hipe_SUITE that will crash the emulator if the old binary optimizations are disabled. (The one I mailed about earlier; see https://gist.github.com/bjorng/3666b4133b0d242ddcf1aa0626a67178.) There could be even more problematic test cases, because no_bsm3 totally disables the old binary match optimizations. So if we reverted the commit now, we would have to add an additional commit to disable at least one test case, and we would be testing a configuration that is never intended to be supported or released in OTP 22. I don't really see the point of doing that.

My suggestion (unless you have very good reasons otherwise), is that you revert the commit yourself as the first thing in your branch that implements the support for the new binary matching instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment