
Rewrite BSM optimizations in the new SSA-based intermediate format #1958

Merged — 9 commits into erlang:master on Sep 28, 2018

Conversation

jhogberg
Contributor

This PR improves the bit-syntax match optimization pass, leveraging the new SSA intermediate format to perform much more aggressive optimizations.

While mostly done, this is still a work in progress, and I'm posting it a bit early because I need help adding HiPE support for a few new instructions. @kostis, @margnus1: could you lend me a hand?

  • bs_start_match3/4: as bs_start_match2, but without "position slots."
  • bs_get_position/3: stores the current position in a register as an integer.
  • bs_set_position/2: restores the position from a register.
  • bs_get_tail/3: returns the tail of a context at its current position, similar to bs_context_to_binary but non-destructive.
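To make the semantics concrete, here is a small Erlang sketch (module and function names are purely illustrative, not the real BEAM instructions) that models a match context as a {Binary, BitPosition} pair:

```erlang
-module(bsm_pos_sketch).
-export([demo/0]).

%% Illustrative model only: a match context is a {Bin, BitPos} pair.
%% get_position reads the position into a plain term ...
get_position({_Bin, Pos}) -> Pos.

%% ... set_position restores it, replacing the old "slot" mechanism ...
set_position({Bin, _Old}, Pos) -> {Bin, Pos}.

%% ... and get_tail returns the rest of the binary from the current
%% position without invalidating the context (unlike bs_context_to_binary).
get_tail({Bin, Pos}) ->
    <<_:Pos/bits, Tail/bits>> = Bin,
    Tail.

demo() ->
    Ctx0 = {<<1,2,3>>, 0},
    Saved = get_position(Ctx0),       %% save a backtrack position (0)
    Ctx1 = set_position(Ctx0, 8),     %% advance past the first byte
    Ctx2 = set_position(Ctx1, Saved), %% backtrack to the saved position
    {get_tail(Ctx1), get_tail(Ctx2)}. %% {<<2,3>>, <<1,2,3>>}
```

Because the saved position is an ordinary integer in a register, it can live in Y registers or be passed around freely, which is what removes the old "slot" limitations.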

Some highlights:

Sub-binary creation can now be delayed within function bodies

Sub-binaries aren't created until they're actually used as binaries, making it significantly easier to write optimized happy paths. The following code used to disable reuse altogether:

    delayed(<<A, Rest/binary>>) ->
        case bar(A) of
            %% Match context will be reused here.
            true -> delayed(Rest);
            %% Sub-binary for 'Rest' will be created here.
            false -> Rest
        end.

This also applies when an alias is taken for the entire match:

    aliased(<<A, Rest/binary>>=Alias) ->
        case bar(A) of
            %% Match context will be reused here.
            true -> aliased(Rest);
            %% Sub-binary for 'Alias' will be created here.
            false -> Alias
        end.

The cost of backtracking is significantly reduced

Match contexts only keep track of their current position, and backtrack positions are now stored in plain registers instead of "slots" on the context. This means that passing them to another function or using them near a try block will no longer disable reuse, improving performance for modules that do a lot of backtracking.

    %% Example taken from the uri_string module
    parse_hier(<<"//"/utf8, Rest/binary>>, URI) ->
        try parse_userinfo(Rest, URI) of
            {T, URI1} ->
                %% Match context will be reused here.
                Userinfo = calculate_parsed_userinfo(Rest, T),
                %% Sub-binary will be created here.
                {Rest, URI1#{userinfo => Userinfo}}
        catch
            throw:{_,_,_} ->
                %% Match context will be reused here.
                {T, URI1} = parse_host(Rest, URI),
                Host = calculate_parsed_host_port(Rest, T),
                %% Sub-binary will be created here.
                {Rest, URI1#{host => remove_brackets(Host)}}
        end;

As an added benefit this also avoids creating a new match context when a function needs more slots than the context it was given, which was a very surprising (but thankfully uncommon) performance sink.

Improved context reuse within function bodies

All matches operating on the same binary are combined into a single match sequence, greatly reducing the number of match contexts created.

    %% Example taken from the string module
    bin_search_str_1(Bin0, Start, First, SearchCPs) ->
        <<_:Start/binary, Bin/binary>> = Bin0,
        case binary:match(Bin, First) of
            nomatch -> {nomatch, byte_size(Bin0), []};
            {Where0, _} ->
                Where = Start+Where0,
                %% We already started a match on Bin0, so its context
                %% gets reused
                <<Keep:Where/binary, Cs0/binary>> = Bin0,
                case prefix_1(Cs0, SearchCPs) of
                    nomatch ->
                        %% This is a continuation on the match on Bin0, so we
                        %% reuse it here too. Note that this happens even though
                        %% it was passed to prefix_1 above.
                        <<_/utf8, Cs/binary>> = Cs0,
                        KeepSz = byte_size(Bin0) - byte_size(Cs),
                        %% ... and of course it's reused here as well.
                        bin_search_str_1(Bin0, KeepSz, First, SearchCPs);
                    [] ->
                        {Keep, [Cs0], <<>>};
                    Rest ->
                        {Keep, [Cs0], Rest}
                end
        end.

Wrappers no longer disable context reuse

Wrapper functions used to disable reuse in even the simplest cases, but no longer:

    example(<<0,1,2,Rest/binary>>) ->
        %% Match context will be reused here.
        sum_bytes(Rest).

    sum_bytes(Bin) ->
        sum_bytes_1(Bin, 0).
    sum_bytes_1(<<N, Rest/binary>>, Acc) ->
        sum_bytes_1(Rest, Acc + N);
    sum_bytes_1(<<>>, Acc) ->
        Acc.

@jhogberg jhogberg added team:VM Assigned to OTP team VM enhancement labels Sep 19, 2018
@jhogberg jhogberg self-assigned this Sep 19, 2018
@jhogberg jhogberg force-pushed the john/compiler/ssa-bsm-opt branch 2 times, most recently from 6e56fcb to 671119e Compare September 19, 2018 11:42
The upcoming beam_ssa_bsm pass allows match contexts to be used
across function calls that take said context as an argument, which
means it's fairly common for them to end up in Y registers.
The upcoming beam_ssa_bsm pass makes this redundant.
  • get_anno/3: as get_anno, but with a default value
  • definitions/1-2: returns a map of variable definitions (#b_set{})
  • uses/1-2: returns a map of all uses of a given variable
  • mapfold_blocks_rpo/4: mapfolds over blocks
@jhogberg
Contributor Author

jhogberg commented Sep 26, 2018

I've fixed a performance regression on 32-bit platforms and cleaned up a few cosmetic issues. It doesn't look like there's much left to do so I'll merge it on Friday if nothing new pops up. HiPE is still not fixed but there's plenty of time until OTP 22.

I've also got some benchmarks now. The old pass did a pretty good job and a lot of our code was written with its limitations in mind (hurting readability in some cases), so most of the benchmarks are unchanged, but the ones I set out to improve have seen substantial gains.

The blue line is this PR and the green line is maint:

[benchmark graphs: blue = this PR, green = maint]

@kostis
Contributor

kostis commented Sep 26, 2018

I would have assumed that there would be a line also for master, in order to also see the baseline of this PR -- and the performance improvement it offers.

Also, since presumably you have an easy way to create these graphs, perhaps it would be a good idea to include HiPE on maint (and maybe on master if it works on these two programs) in order to have points of reference for that too.

It's a super-busy period right now, but today we briefly talked with @margnus1 about at least providing some guidance to what parts of HiPE need to be changed and how. Let's see whether we will be able to do this next week...

@jhogberg
Contributor Author

jhogberg commented Sep 27, 2018

> I would have assumed that there would be a line also for master, in order to also see the baseline of this PR -- and the performance improvement it offers.
>
> Also, since presumably you have an easy way to create these graphs, perhaps it would be a good idea to include HiPE on maint (and maybe on master if it works on these two programs) in order to have points of reference for that too.

The graphs were taken from the nightly builds and are of maint+PRs and master+PRs; there were no other PRs in master that touched related parts, so I simplified the blue line as "this PR." I've included historical benchmarks below, and it's pretty easy to spot when the PR was included:

[historical benchmark graphs]

You can also see that master was a lot slower than maint. This was mainly caused by the SSA rewrite changing how the beam code looks, which confused the old pass a bit. Practically all benchmarks related to binary matching have been improved by this PR compared to master, but compared to maint it's roughly unchanged aside from the above benchmarks.

This is a bit unfortunate. Your code no longer has to be arcane to be fast (or at least less arcane), but our benchmarks won't show that because they tend to look at things we've written with the old pass in mind.

> It's a super-busy period right now, but today we briefly talked with @margnus1 about at least providing some guidance to what parts of HiPE need to be changed and how. Let's see whether we will be able to do this next week...

Sounds good.

jhogberg and others added 6 commits September 28, 2018 11:31
This commit improves the bit-syntax match optimization pass,
leveraging the new SSA intermediate format to perform much more
aggressive optimizations. Some highlights:

* Match contexts can be reused even after being passed to a
  function or being used in a try block.
* Sub-binaries are no longer eagerly extracted, making it far
  easier to keep "happy paths" free from binary creation.
* Trivial wrapper functions no longer disable context reuse.
The beam_ssa_bsm pass welds chained matches together, but the match
expressions themselves are unchanged, and if there's a tail
alignment check it will be done each time. This subpass figures out
the checks we've already done and deletes the redundant ones.
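As a hypothetical illustration (not taken from the commit): both function heads below match tails of the same binary, so once the pass welds them into one match sequence, the alignment check implied by the inner `Rest/binary` tail is already known to hold and can be deleted.

```erlang
-module(weld_sketch).
-export([outer/1]).

%% Hypothetical example. `Rest/binary` requires the tail to be a whole
%% number of bytes; once outer/1 has verified that, repeating the check
%% in inner/2 on the same (welded) match sequence is redundant.
outer(<<A:8, Rest/binary>>) -> inner(A, Rest).

inner(A, <<B:8, Rest/binary>>) -> {A, B, Rest};
inner(A, <<>>) -> {A, done}.
```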
Remove the variable aliasing support that was needed for the
old beam_bsm pass.
This has been superseded by bs_get_tail/3. Note that it is NOT
removed from the emulator or beam_disasm, as old modules are still
legal.
@jhogberg jhogberg merged commit 7d941c5 into erlang:master Sep 28, 2018
@jhogberg
Contributor Author

jhogberg commented Oct 1, 2018

I removed the tests because we already know that they fail hard and we don't want to litter our test result page with them. git rm was the simplest way to make sure they're completely out of the picture.

This is a temporary measure and we always planned to revert it in the branch that reintroduces support, I didn't mention it in the commit message because I figured it was obvious.

In either case I hope this won't remain a problem for long; the faster I get help, the faster this gets resolved.

@kostis
Contributor

kostis commented Oct 1, 2018

Please try to understand what I am complaining about.

The commit that removed these tests (afa36d2) is actually the same that suggested to use the option no_bsm3 to run programs that contain binary matching in native code. How do we know/ensure that this option works (and will continue to work)? Surely not by git removing all tests that exercise the native code compiler on programs containing binary matching, even if that "is the simplest way to do things."

So, independently of whether the HiPE compiler is fixed to work for the new instructions, there needs to be a way to test that the no_bsm3 / r21 path to generating native code works. I kindly ask you to revert this commit now and restore these tests with a no_bsm3 compiler option for them.

@bjorng
Contributor

bjorng commented Oct 1, 2018

The r21 and no_bsm3 options are totally unsupported. The only reason for them being there in the first place is for some test suites that test backward compatibility to previous releases. It was our intention that the single mention of no_bsm3 should be reverted well before the release of OTP 22. The change to the documentation was only intended for the heroes that test out the master branch before the release of OTP 22.

Therefore, we have no interest in having the hipe compiler in OTP 22 handle code that has been compiled with no_bsm3. Unless you have a good reason otherwise, I suggest that you remove the support for the old binary matching instructions in master and only implement support for the new ones.

Also, note that there is at least one test case in hipe_SUITE that will crash the emulator if the old binary optimizations are disabled. (The one I mailed about earlier; see https://gist.github.com/bjorng/3666b4133b0d242ddcf1aa0626a67178.) There could be even more problematic test cases, because no_bsm3 totally disables the old binary match optimizations. So if we reverted the commit now, we would have to add an additional commit to disable at least one test case, and we would be testing a configuration that is never intended to be supported or released in OTP 22. I don't really see the point of doing that.

My suggestion (unless you have very good reasons otherwise), is that you revert the commit yourself as the first thing in your branch that implements the support for the new binary matching instructions.
