
private_append optimization for binaries #6804

Merged: 11 commits merged into erlang:master from frej/private-append on Feb 10, 2023

Conversation

@frej (Contributor, Author) commented Feb 6, 2023

When a binary is grown by appending data to it using bs_create_bin,
a considerable performance improvement can be achieved if the append
can be done using the destructive private_append instead of
append. Using the private_append flavor of bs_create_bin is only
possible when the binary being extended has been created by
bs_writable_binary or by another bs_create_bin using
private_append. As private_append is destructive, an additional
requirement is that there is only one live reference to the binary
being extended.

This set of patches implements the compiler optimization described above together with its dependencies.
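
For illustration, a minimal sketch (not taken from the PR) of the kind of accumulator loop this optimization targets; the accumulator starts as <<>> and is always the leftmost fragment of the construction:

encode_bytes(Bytes) ->
    encode_bytes(Bytes, <<>>).

encode_bytes([B|T], Acc) ->
    %% Acc is the leftmost fragment and has a single live reference,
    %% so this append can be rewritten to private_append.
    encode_bytes(T, <<Acc/binary, B:8>>);
encode_bytes([], Acc) ->
    Acc.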

github-actions bot commented Feb 6, 2023

CT Test Results

2 files · 295 suites · 12m 10s ⏱️
761 tests: 757 passed ✔️, 2 skipped 💤, 2 failed
4 708 runs: 4 702 passed ✔️, 2 skipped 💤, 4 failed

For more details on these failures, see this check.

Results for commit 858fff4.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.


// Erlang/OTP Github Action Bot

@jhogberg jhogberg requested a review from bjorng February 6, 2023 16:39
@jhogberg jhogberg added the team:VM (Assigned to OTP team VM) and enhancement labels Feb 6, 2023
@bjorng bjorng self-assigned this Feb 7, 2023
bs_test_tail ->
    SS1;
build_stacktrace ->
    SS1; % XXXX
Contributor:

What does XXXX mean? If this line still needs attention, please change it to something clearer, such as TODO: ....

Contributor (Author):

It means I should have done a git grep XXXX before pushing :) I use it as a marker for things I should take a second look at.

At this particular location my thinking is that build_stacktrace is special in that it is only executed when we're exiting, and additionally its only argument isn't a term which can contain a binary. I could do an aa_set_aliased([Dst|Args], SS1), but I think that is redundant. Correct? I added a comment about that.

Contributor (Author):

Any verdict on build_stacktrace @bjorng ? The updated comment is in 0af6c67.

Contributor:

build_stacktrace is not only used when exiting. The code that follows build_stacktrace can do anything, for example print the stack trace to a log file and continue executing.

The argument can potentially contain binaries:

/*
* Building a symbolic representation of a saved stack trace. Note that
* the exception object 'exc', unless NIL, points to a cons cell which
* holds the given args and the quick-saved data (encoded as a bignum).
*
* If the bignum is negative, the given args is a complete stacktrace.
*/
Eterm
build_stacktrace(Process* c_p, Eterm exc) {

Contributor (Author):

So it's even worse, anything live at this point can be captured and aliased.

Contributor:

Sort of: a stack trace (currently) only contains the arguments of the operation that crashed, so we only need to consider the arguments of all operations that can crash into the current exception handler.

I'm not sure that tracking it would be worth it though, so maybe the optimization should just give up when encountering these. I don't think that doing so would cost too much since it's pretty rare to use try/catch with captured stack traces in the middle of a hot loop that appends binaries.

Contributor (Author):

That's not too bad. At least it can't alias terms which are not known when the analysis encounters build_stacktrace. Analyzing exactly which operations can crash into the handler sounds complicated, but just aliasing everything is quite simple. Hopefully users will wrap the loop with the try/catch and not have it in the body. For example (with the fix in 63fd18d):

stacktrace() ->
    X = ..., % something unique
    try
        ex:code_that_fails(),
        X % still unique
    catch
        _:_:Stacktrace ->
            io:format("Stacktrace: ~p~n", [Stacktrace]),
            X % now conservatively aliased
    end.

Implementing this correction does not remove any private-append transforms in the set of files compiled by diffable.

Are there any plans to include more in the stack trace, i.e. arguments to functions higher up the stack? That would require the stack-crawler (or whatever it is called) to deep-copy terms.

AlreadyFound = maps:get(Fun, Found0, []),
Found = Found0#{Fun => [{append,Dst,Var}|AlreadyFound]},
find_appends_is(Is, Fun, Found);
_ ->
Contributor:

Nitpick:

Suggested change:
-    _ ->
+    false ->

Contributor (Author):

Fixed.

@bjorng (Contributor) commented Feb 7, 2023

Some benchmark results on my Intel iMac:

Erlang/OTP 25.2:

== Testing with 1 MB ==
fun base64:encode/1: 1000 iterations in 16307 ms: 61 it/sec
fun base64:decode/1: 1000 iterations in 18402 ms: 54 it/sec

Current master branch:

== Testing with 1 MB ==
fun base64:encode/1: 1000 iterations in 7832 ms: 127 it/sec
fun base64:decode/1: 1000 iterations in 10878 ms: 91 it/sec

This PR:

== Testing with 1 MB ==
fun base64:encode/1: 1000 iterations in 5555 ms: 180 it/sec
fun base64:decode/1: 1000 iterations in 6433 ms: 155 it/sec

That makes both encode and decode of Base64 almost three times as fast compared to OTP 25.

Comment on lines 525 to 528
aa_set_status(#b_local{}, _Status, State) ->
State;
aa_set_status(#b_remote{}, _Status, State) ->
State;
Contributor:

Not covered by our tests. Can this happen? Is there a test case missing?

Contributor (Author):

I can't create an example which covers it, so away it goes! (fix in beb1977)

Comment on lines 486 to 492
{{hd,H},{tl,T}} ->
{pair,H,T};
{{tl,T},{hd,H}} ->
{pair,H,T};
Contributor:

Not covered.

Contributor (Author):

Currently the type analyzer does not handle the case when you "misuse" cons cells. For example:

b(L) ->
    b(L, [<<>>|<<>>]).

b([H|T], [A|B]) ->
    b([H|T], [<<A/binary,H:8>>|<<B/binary,H:8>>]);
b([], Acc) ->
    Acc.

Here the type analyzer will conclude that the type of the accumulator is nonempty_improper_list(bitstring(8,appendable), bitstring(8)) and not nonempty_improper_list(bitstring(256,appendable), bitstring(256,appendable)) which would be expected. If the type analysis pass is fixed so both fields of the cons are transformed by private append, we will hit this code with the example above.

Apart from a couple of lines dealing with NIFs (I will fix that), all other non-covered lines in beam_ssa_private_append.erl are due to this type analysis fault, as we never try to transform pairs of binaries.

When I last looked at this, during the autumn, I seem to recall that the type was less specific, i.e. nonempty_improper_list(bitstring(8,appendable), any), so maybe this can now be fixed. I'll have a look.

Contributor (Author) @frej commented Feb 9, 2023:

This is now fixed in d1211f5; beam_ssa_type:type/5 had problems handling bs_create_bin when the first fragment was typed as a #t_union{} containing an appendable bitstring. The same error was in beam_ssa_private_append:is_appendable/2 (fixed in 6952cb6).

Additionally, 6952cb6 adds a test which exercises the hd/tl lines above [not the case; the code-coverage percentage is correct], thus getting the coverage to 99% (the remaining non-covered lines are due to the NIF problem mentioned below).

Contributor (Author):

{{tl,T},{hd,H}} never happens: patches are stored in an orddict, so only {{hd,H},{tl,T}} can occur. The non-reachable code is removed in b33d99e.
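
For reference, a minimal shell session (illustrative, not from the PR) showing why the hd patch always comes first: an orddict keeps its entries sorted in term order, and the atom hd sorts before tl:

1> orddict:to_list(orddict:from_list([{tl,t}, {hd,h}])).
[{hd,h},{tl,t}]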

@michalmuskala (Contributor) commented:

This is an awesome improvement! The results for base64 are really impressive.

Is it possible to "pre-allocate" the binary? In many cases, like base64, you actually know exactly how big the result is going to be - it would be great if there were a way to communicate this information to the runtime system. This could probably improve the results even further.

@bjorng (Contributor) commented Feb 7, 2023

Added to our daily builds along with an updated primary bootstrap.

@bjorng (Contributor) commented Feb 7, 2023

Is it possible to "pre-allocate" the binary?

Yes, it would be possible, and it is already done for binary comprehensions for which the exact size can be calculated beforehand. This PR does not attempt to do any pre-allocation (the existing analysis for pre-allocation used for binary comprehensions is not general enough), but it is something we will consider doing in OTP 27.
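
For example, in a comprehension such as the following the result is exactly one byte per element, so the total size can be computed before any element is built (illustrative sketch, not from the PR):

bytes_to_bin(List) ->
    %% The result size is exactly length(List) bytes, so the runtime can
    %% allocate the whole result binary up front.
    << <<B:8>> || B <- List >>.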

@michalmuskala (Contributor) commented:

I don't think it even needs to be fully automated. I think giving control to the user would also work. Additionally, I don't think getting the size "wrong" would lead to any unsafety - the binary will either be re-allocated, or just remain with some unused bytes at the end, wasting some memory. In particular, an API like binary:empty_appendable(100) could be useful.

There are probably cases where automated analysis wouldn't be useful, but optimising for the most common case could be extremely beneficial. For example, for JSON, when encoding a string we know the result will be at least as long as the input string plus some padding (some implementations have heuristics for that, e.g. over-allocate by 10%), so we can start the allocation much larger and avoid a couple of rounds of re-allocations, even if the final result is a little off.
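
A sketch of that heuristic, assuming the hypothetical binary:empty_appendable/1 proposed above (the escape rules are simplified for illustration):

encode_string(Str) when is_binary(Str) ->
    %% Over-allocate by ~10% plus the two surrounding quotes, so that
    %% most strings are encoded without any round of re-allocation.
    Hint = byte_size(Str) + byte_size(Str) div 10 + 2,
    Acc = binary:empty_appendable(Hint), %% hypothetical API, see above
    escape(Str, <<Acc/binary, $">>).

escape(<<$", T/binary>>, Acc) -> escape(T, <<Acc/binary, $\\, $">>);
escape(<<C, T/binary>>, Acc)  -> escape(T, <<Acc/binary, C>>);
escape(<<>>, Acc)             -> <<Acc/binary, $">>.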

@max-au (Contributor) commented Feb 7, 2023

Binary preallocation is really useful for protocol serialisation. For example, protobuf and Thrift could be sped up significantly, and avoid cumbersome double transformation (term -> iolist -> binary).

@bjorng (Contributor) commented Feb 8, 2023

I don't think it even needs to be fully automated. I think giving control to the user would also work. Additionally, I don't think getting the size "wrong" would lead to any unsafety - the binary will either be re-allocated, or just remain with some unused bytes at the end, wasting some memory.

Yes, getting the size wrong is safe. The compiler's size calculation is extremely conservative to avoid allocating a huge binary that is not actually needed, not because it would be unsafe.

In particular, an API like binary:empty_appendable(100) could be useful.

Yes, that would work.

@garazdawi (Contributor) commented:

In particular, an API like binary:empty_appendable(100) could be useful.

list_to_binary(List) ->
    %% Creates an empty binary with a size hint (hypothetical syntax).
    list_to_binary(List, <<length(List)/size-hint>>).
list_to_binary([H|T], Acc) ->
    %% Keep the accumulator as the leftmost fragment so each step is an
    %% append and the bytes come out in list order.
    list_to_binary(T, <<Acc/binary,H>>);
list_to_binary([], Acc) ->
    Acc.

@frej frej force-pushed the frej/private-append branch 2 times, most recently from 27623e8 to d8cfd8f Compare February 8, 2023 13:24
@frej (Contributor, Author) commented Feb 8, 2023

I have pushed a squashed version of all the fixups on fixups. If you want to pull this branch for nightly testing, @bjorng, now is a good time.

@bjorng (Contributor) commented Feb 8, 2023

Added the updated branch along with an updated bootstrap to our daily builds.

@frej (Contributor, Author) commented Feb 9, 2023

While working to increase the code coverage in beam_ssa_alias.erl, I (re-)discovered that for modules containing NIFs, beam_ssa_opt:module/2 completely gives up on creating a function-db for all functions, not just the ones calling NIFs. As beam_ssa_alias.erl is written to gracefully handle NIFs but needs the caller/callee information provided by the function-db, this of course leads to some dead code (the predicates …), as we won't hit the code paths where we deal with partial information.

Modifying beam_ssa_opt to build a partial function-db seems dangerous less than two days before code freeze, so (@bjorng, @jhogberg) should I excise the code dealing with NIFs in this PR, or should I leave it in? As far as I can tell, this is two lines of code, and I would be replacing them with TODO comments.

@bjorng (Contributor) commented Feb 9, 2023

Don't worry about the dead code for this release candidate.

Can you squash the commits again? I will add it to our daily builds and we plan to merge tomorrow.

@frej (Contributor, Author) commented Feb 9, 2023

Can you squash the commits again? I will add it to our daily builds and we plan to merge tomorrow.

Will do within an hour or so, right now I'm in a meeting (where I'm supposed to be presenting).

@frej frej force-pushed the frej/private-append branch 2 times, most recently from 7d9a6cb to b33d99e Compare February 9, 2023 14:01
@frej (Contributor, Author) commented Feb 9, 2023

It took a bit longer, but it is now squashed and pushed, @bjorng.

@bjorng (Contributor) commented Feb 9, 2023

Thanks! Added to our daily builds.

When a bitstring is used as an accumulator and is being grown by
appending new data to it, the runtime system can make the append
operation very efficient if it is guaranteed that the bitstring being
extended (see the sketch after this list):

1) Originated as a `<<>>`.

2) The append operation is done with `bs_create_bin` with the
   accumulator as the leftmost fragment.

3) The append does not change the unit of the bitstring.

4) There is only one live pointer to the accumulator on the heap or in
   VM registers.
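
A sketch (not from the PR) of a loop that violates condition 2: the accumulator is the rightmost fragment, so each step is a prepend rather than an append, and the optimization cannot apply:

rev_bytes([B|T], Acc) ->
    %% Acc is not the leftmost fragment: B is prepended, so the
    %% accumulator does not stay appendable.
    rev_bytes(T, <<B:8, Acc/binary>>);
rev_bytes([], Acc) ->
    Acc.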

This patch extends the type analysis pass to track 1) and 2). This is
done by extending the `t_bitstring` record with an `appendable` flag
which is updated alongside the `unit` field. The basic principle is
that `<<>>` literals are constructed with the flag set and operations
which capture a pointer to the binary clear it. The functions to
calculate the least upper bound and greatest lower bound are also
updated to account for the appendable status.
Avoid any assumptions on the number and kind of annotations before a
call. Instead just track the last seen result type annotation and
apply it when calls are found.
Annotate ret instructions with the type they return; skip the
annotation if nothing is yet known about the variable. The annotation
will be used by the alias analysis pass.
Make the SSA pretty printer print annotations on terminators. This
helps with debugging optimizations which depend on annotated
terminators.
For debugging in type-dependent optimization passes it is useful to
have a type pretty printer available in order to generate readable
trace and debug printouts.
This commit inserts an empty analysis pass which will eventually be
extended to do a full alias analysis.
Implement kill-set calculation as a prerequisite for alias analysis.
Calculate the killset for all functions. A killset allows us to look
up the variables which die at a code location.
Perform an alias analysis of all reachable functions, alias
information is added as annotations on the SSA code.

Alias analysis is done by an algorithm inspired by Kotzmann and
Mössenböck's 2005 algorithm for Escape Analysis
(https://www.usenix.org/events/vee05/full_papers/p111-kotzmann.pdf),
particularly their escape equivalent sets. But in contrast to
Kotzmann and Mössenböck, instead of just tracking escaping values
we track if a value in a variable is unique and/or aliased.

A variable is said to be unique if it currently is the only live
variable pointing to a particular term on the heap. Literals and
non-boxed terms are considered unique.

A variable is said to be aliased if it points to a heap term which can
be reached by other means than the boxed pointer in the variable.
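
An illustrative sketch (not from the PR) of the two notions:

f(X) ->
    A = [X],    %% unique: A holds the only live reference to the cons cell
    T = {A, A}, %% aliased: the cons cell is now also reachable through T
    {A, T}.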

The alias analysis is performed by traversing the functions in the
module and their code. For each operation the uniqueness and alias
status are updated. The unique/aliased status is maintained in a map
which maps a variable to either a status or another variable, thus
constructing equivalent sets in the same way as Kotzmann and
Mössenböck.

When the analysis finishes each instruction is annotated with
information about which of its arguments are unique or aliased.
Extend the BEAM SSA pretty printer to pretty-print annotations
regarding alias and unique status.
When a binary is grown by appending data to it using `bs_create_bin`,
a considerable performance improvement can be achieved if the append
can be done using the destructive `private_append` instead of
`append`. Using the `private_append` flavor of `bs_create_bin` is only
possible when the binary being extended has been created by
`bs_writable_binary` or by another `bs_create_bin` using
`private_append`. As `private_append` is destructive, an additional
requirement is that there is only one live reference to the binary
being extended.

This commit implements a new SSA optimization pass which finds
suitable code sequences which iteratively grow a binary starting
with `<<>>` and rewrites them to start with an initial value created
by `bs_writable_binary` and to use `private_append` for `bs_create_bin`.
@frej (Contributor, Author) commented Feb 10, 2023

Looking at other recently pushed PRs, it looks like I'm not the only one who gets tripped up by the "Test Erlang/OTP (system)" step of the CI, so unless someone tells me differently, I will consider the CI failure not-my-fault™.

(@bjorng and @jhogberg, you have my phone number in an email if you need my urgent attention this afternoon)

@bjorng bjorng merged commit a092c49 into erlang:master Feb 10, 2023
@frej frej deleted the frej/private-append branch February 13, 2023 11:16