
Optimize binary matching for fixed-width segments #6259

Merged
merged 1 commit into from
Sep 2, 2022

@bjorng bjorng commented Aug 30, 2022

Consider this function:

    foo(<<A:6, B:6, C:6, D:6>>) ->
        {A, B, C, D}.
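To reproduce the listings, the function can be compiled from a small module. The `{anno,[4,{file,"t.erl"}]}` annotations suggest a file named `t.erl` with the function on line 4. The layout below is an assumption chosen to match that; the module name `t` is inferred, not stated in the original. Running `erlc -S t.erl` writes the BEAM assembly listing to `t.S`.

```erlang
-module(t).
-export([foo/1]).

foo(<<A:6, B:6, C:6, D:6>>) ->
    %% Matches only a binary of exactly 24 bits, split into four
    %% 6-bit unsigned big-endian integers. (Comment placed here so
    %% foo/1 stays on line 4, as the listing's annotations suggest.)
    {A, B, C, D}.
```

For example, `t:foo(<<255,3,240>>)` returns `{63,48,15,48}`, and a 32-bit binary fails with `function_clause`.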

The compiler in Erlang/OTP 25 and earlier would generate the following
code for doing the binary matching:

    {test,bs_start_match3,{f,1},1,[{x,0}],{x,1}}.
    {bs_get_position,{x,1},{x,0},2}.
    {test,bs_get_integer2,
          {f,3},
          2,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,2}}.
    {test,bs_get_integer2,
          {f,3},
          3,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,3}}.
    {test,bs_get_integer2,
          {f,3},
          4,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,4}}.
    {test,bs_get_integer2,
          {f,3},
          5,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,5}}.
    {test,bs_test_tail2,{f,3},[{x,1},0]}.

That is, there would be one instruction for each segment being
matched. Having separate match instructions for each segment makes it
difficult for the JIT to do any serious optimization. Currently, when
matching a segment with a size that is not a multiple of 8, the JIT
will generate code that calls a helper function. Common sizes such as
8, 16, and 32 are specially optimized with inline code in the x86 JIT
and in the non-JIT BEAM VM.

This pull request introduces a new bs_match instruction for matching
integer and binary segments of fixed size. Here is the generated code
for the example:

    {test,bs_start_match3,{f,1},1,[{x,0}],{x,1}}.
    {bs_get_position,{x,1},{x,0},2}.
    {bs_match,{f,3},
              {x,1},
              {commands,[{ensure_exactly,24},
                         {integer,2,{literal,[]},6,1,{x,2}},
                         {integer,3,{literal,[]},6,1,{x,3}},
                         {integer,4,{literal,[]},6,1,{x,4}},
                         {integer,5,{literal,[]},6,1,{x,5}}]}}.

Having only one instruction for the matching allows the JIT to
generate faster code. The generated code will do the following:

  • Test that the size of the binary being matched is exactly 24 bits.

  • Read 24 bits from the binary into a temporary CPU register.

  • For each segment, extract the integer from the temporary register
    by shifting and masking.
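The three steps can be mimicked at the Erlang level. The sketch below is a hypothetical hand-written equivalent of foo/1 (the module name shift_mask is made up for illustration). It is not what the JIT emits, since the JIT works on CPU registers in machine code, but it shows the same idea: a single 24-bit read whose pattern only matches a binary of exactly 24 bits, followed by a shift and mask per field.

```erlang
-module(shift_mask).
-export([foo/1]).

%% Hypothetical Erlang-level analogue of the described strategy.
%% The pattern <<Word:24>> only matches a binary of exactly 24 bits
%% (the size test) and reads all 24 bits at once (the single load).
foo(<<Word:24>>) ->
    %% Extract each 6-bit field: shift it down, then mask with 63 (2#111111).
    {(Word bsr 18) band 63,
     (Word bsr 12) band 63,
     (Word bsr 6) band 63,
     Word band 63}.
```

For any 24-bit input this should return the same tuple as the segment pattern `<<A:6, B:6, C:6, D:6>>`.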

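Because of the aforementioned optimization for certain common
segment sizes, the main part of the Base64 encoding in the base64
module is currently implemented in the following non-intuitive way,
matching whole bytes and then splitting the combined 24-bit value
into 6-bit groups by shifting and masking: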

    encode_binary(<<B1:8, B2:8, B3:8, Ls/bits>>, A) ->
        BB = (B1 bsl 16) bor (B2 bsl 8) bor B3,
        encode_binary(Ls,
                      <<A/bits,(b64e(BB bsr 18)):8,
                        (b64e((BB bsr 12) band 63)):8,
                        (b64e((BB bsr 6) band 63)):8,
                        (b64e(BB band 63)):8>>)

With the new optimization, it is now possible to express the Base64
encoding in a more natural way, by matching the 6-bit groups directly,
which is also faster than the old implementation:

    encode_binary(<<B1:6, B2:6, B3:6, B4:6, Ls/bits>>, A) ->
        encode_binary(Ls,
                      <<A/bits,
                        (b64e(B1)):8,
                        (b64e(B2)):8,
                        (b64e(B3)):8,
                        (b64e(B4)):8>>)
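For comparison, both styles can be wrapped into a small runnable module. This is a sketch under stated assumptions, not the actual base64 module. The module name b64_sketch and the b64e/1 helper are hypothetical stand-ins (the real helper in base64 is internal and may be implemented differently). Only inputs whose length is a multiple of 3 bytes are handled, so `=` padding is omitted.

```erlang
-module(b64_sketch).
-export([encode_old/1, encode_new/1]).

%% Hypothetical stand-in for base64's internal b64e/1: maps a 6-bit
%% value (0..63) to its character in the standard Base64 alphabet.
b64e(X) ->
    binary:at(<<"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/">>, X).

%% Old style: match three whole bytes, then split the 24 bits by shift/mask.
encode_old(Bin) -> encode_old(Bin, <<>>).

encode_old(<<B1:8, B2:8, B3:8, Ls/bits>>, A) ->
    BB = (B1 bsl 16) bor (B2 bsl 8) bor B3,
    encode_old(Ls,
               <<A/bits,(b64e(BB bsr 18)):8,
                 (b64e((BB bsr 12) band 63)):8,
                 (b64e((BB bsr 6) band 63)):8,
                 (b64e(BB band 63)):8>>);
encode_old(<<>>, A) ->
    %% Sketch only: input sizes that are not a multiple of 3 bytes
    %% (which would need padding) fail with function_clause.
    A.

%% New style: match the four 6-bit groups directly.
encode_new(Bin) -> encode_new(Bin, <<>>).

encode_new(<<B1:6, B2:6, B3:6, B4:6, Ls/bits>>, A) ->
    encode_new(Ls,
               <<A/bits,
                 (b64e(B1)):8,
                 (b64e(B2)):8,
                 (b64e(B3)):8,
                 (b64e(B4)):8>>);
encode_new(<<>>, A) ->
    A.
```

For such inputs both functions should agree with base64:encode/1; for example, encoding `<<"Man">>` gives `<<"TWFu">>`.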

@bjorng bjorng added team:VM Assigned to OTP team VM enhancement labels Aug 30, 2022
@bjorng bjorng self-assigned this Aug 30, 2022
@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Aug 30, 2022
github-actions bot commented Aug 30, 2022

CT Test Results

4 files, 378 suites, 55m 21s ⏱️
2 073 tests: 2 022 passed ✔️, 51 skipped 💤, 0 failed
5 901 runs:  5 832 passed ✔️, 69 skipped 💤, 0 failed

Results for commit 4f0ec73.


@bjorng bjorng force-pushed the bjorn/bs_match/OTP-18137 branch from 773c922 to b7cce53 Compare September 1, 2022 04:38
@bjorng bjorng force-pushed the bjorn/bs_match/OTP-18137 branch from b7cce53 to 4f0ec73 Compare September 2, 2022 03:52
@bjorng bjorng merged commit a1d02a0 into erlang:master Sep 2, 2022
@bjorng bjorng deleted the bjorn/bs_match/OTP-18137 branch September 2, 2022 08:22