Optimize binary matching for fixed-width segments #6259
Merged
CT Test Results: 4 files, 378 suites, 55m 21s. Results for commit 4f0ec73. This comment has been updated with latest results.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot
bjorng force-pushed the bjorn/bs_match/OTP-18137 branch from 773c922 to b7cce53 on September 1, 2022 04:38
Consider this function:

    foo(<<A:6, B:6, C:6, D:6>>) ->
        {A, B, C, D}.

The compiler in Erlang/OTP 25 and earlier would generate the following code for doing the binary matching:

    {test,bs_start_match3,{f,1},1,[{x,0}],{x,1}}.
    {bs_get_position,{x,1},{x,0},2}.
    {test,bs_get_integer2,
          {f,3},
          2,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,2}}.
    {test,bs_get_integer2,
          {f,3},
          3,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,3}}.
    {test,bs_get_integer2,
          {f,3},
          4,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,4}}.
    {test,bs_get_integer2,
          {f,3},
          5,
          [{x,1},
           {integer,6},
           1,
           {field_flags,[{anno,[4,{file,"t.erl"}]},unsigned,big]}],
          {x,5}}.
    {test,bs_test_tail2,{f,3},[{x,1},0]}.

That is, there would be one instruction for each segment being matched. Having separate match instructions for each segment makes it difficult for the JIT to do any serious optimization. Currently, when matching a segment with a size that is not a multiple of 8, the JIT will generate code that calls a helper function. Common sizes such as 8, 16, and 32 are specially optimized with inline code in the x86 JIT and in the non-JIT BEAM VM.

This commit introduces a new `bs_match` instruction for matching of integer and binary segments of fixed size. Here is the generated code for the example:

    {test,bs_start_match3,{f,1},1,[{x,0}],{x,1}}.
    {bs_get_position,{x,1},{x,0},2}.
    {bs_match,{f,3},
              {x,1},
              {commands,[{ensure_exactly,24},
                         {integer,2,{literal,[]},6,1,{x,2}},
                         {integer,3,{literal,[]},6,1,{x,3}},
                         {integer,4,{literal,[]},6,1,{x,4}},
                         {integer,5,{literal,[]},6,1,{x,5}}]}}.

Having only one instruction for the matching allows the JIT to generate faster code. The generated code will do the following:

* Test that the size of the binary being matched is exactly 24 bits.
* Read 24 bits from the binary into a temporary CPU register.
* For each segment, extract the integer from the temporary register by shifting and masking.

Because of the aforementioned optimization for certain common segment sizes, the main part of the Base64 encoding in the `base64` module is currently implemented in the following non-intuitive way:

    encode_binary(<<B1:8, B2:8, B3:8, Ls/bits>>, A) ->
        BB = (B1 bsl 16) bor (B2 bsl 8) bor B3,
        encode_binary(Ls,
                      <<A/bits,(b64e(BB bsr 18)):8,
                        (b64e((BB bsr 12) band 63)):8,
                        (b64e((BB bsr 6) band 63)):8,
                        (b64e(BB band 63)):8>>)

With the new optimization, it is now possible to express the Base64 encoding in a more natural way, which is also faster than before:

    encode_binary(<<B1:6, B2:6, B3:6, B4:6, Ls/bits>>, A) ->
        encode_binary(Ls,
                      <<A/bits,
                        (b64e(B1)):8,
                        (b64e(B2)):8,
                        (b64e(B3)):8,
                        (b64e(B4)):8>>)
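The three steps listed above, and the equivalence of the two Base64 formulations, can be modelled in plain Erlang. The module below is only an illustrative sketch, not the code the JIT emits and not the actual `base64` implementation: the module name `bs_match_sketch`, the functions `foo_shift_mask/1`, `encode_old/1` and `encode_new/1`, the terminating `<<>>` clauses, and the guard-based `b64e/1` are all assumptions made for this example, and the encoders only accept input whose size is a multiple of 3 bytes (no padding).

```erlang
-module(bs_match_sketch).
-export([foo/1, foo_shift_mask/1, encode_old/1, encode_new/1]).

%% The function from the description: four 6-bit segments.
foo(<<A:6, B:6, C:6, D:6>>) ->
    {A, B, C, D}.

%% Hand-written model of the steps described for bs_match:
%% 1. test that the binary is exactly 24 bits,
%% 2. read all 24 bits into one integer,
%% 3. extract each 6-bit field by shifting and masking.
foo_shift_mask(Bin) when bit_size(Bin) =:= 24 ->
    <<Word:24>> = Bin,
    {Word bsr 18,
     (Word bsr 12) band 63,
     (Word bsr 6) band 63,
     Word band 63}.

%% Old style: match three bytes, then shift and mask by hand.
encode_old(Bin) -> encode_old(Bin, <<>>).

encode_old(<<B1:8, B2:8, B3:8, Ls/bits>>, A) ->
    BB = (B1 bsl 16) bor (B2 bsl 8) bor B3,
    encode_old(Ls, <<A/bits,
                     (b64e(BB bsr 18)):8,
                     (b64e((BB bsr 12) band 63)):8,
                     (b64e((BB bsr 6) band 63)):8,
                     (b64e(BB band 63)):8>>);
encode_old(<<>>, A) ->
    A.

%% New style: match four 6-bit segments directly.
encode_new(Bin) -> encode_new(Bin, <<>>).

encode_new(<<B1:6, B2:6, B3:6, B4:6, Ls/bits>>, A) ->
    encode_new(Ls, <<A/bits,
                     (b64e(B1)):8,
                     (b64e(B2)):8,
                     (b64e(B3)):8,
                     (b64e(B4)):8>>);
encode_new(<<>>, A) ->
    A.

%% Hypothetical stand-in for the b64e/1 helper: maps a 6-bit
%% value to its character in the standard Base64 alphabet.
b64e(X) when X < 26 -> X + $A;
b64e(X) when X < 52 -> X - 26 + $a;
b64e(X) when X < 62 -> X - 52 + $0;
b64e(62) -> $+;
b64e(63) -> $/.
```

For example, `bs_match_sketch:foo(<<"Man">>)` and `bs_match_sketch:foo_shift_mask(<<"Man">>)` both return `{19,22,5,46}`, and both encoders turn `<<"Man">>` into `<<"TWFu">>`. Any performance difference between the two styles depends on the OTP release and is not something this sketch measures.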
bjorng force-pushed the bjorn/bs_match/OTP-18137 branch from b7cce53 to 4f0ec73 on September 2, 2022 03:52
Labels
enhancement
team:VM
Assigned to OTP team VM
testing
currently being tested, tag is used by OTP internal CI