-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
private_append
optimization for binaries
#6804
Conversation
CT Test Results 2 files 295 suites 12m 10s ⏱️ For more details on these failures, see this check. Results for commit 858fff4. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
lib/compiler/src/beam_ssa_alias.erl
Outdated
bs_test_tail -> | ||
SS1; | ||
build_stacktrace -> | ||
SS1; % XXXX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does XXXX
mean? If this line still needs attention, please change it to something clearer, such as TODO: ...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means i should have done a git grep XXXX
before pushing :) I use at as a marker for things I should take a second look at.
At this particular location my thinking is that build_stacktrace
is special in that it is only executed when we're exiting and additionally its only argument isn't a term which can contain a binary. I could do a aa_set_aliased([Dst|Args], SS1)
but I think that is redundant. Correct? I added a comment about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build_stacktrace
is not only used when exiting. The code that follows build_stacktrace
can do anything, for example print the stack trace to a log file and continue executing.
The argument can potentially contain binaries:
otp/erts/emulator/beam/beam_common.c
Lines 1181 to 1189 in a231184
/* | |
* Building a symbolic representation of a saved stack trace. Note that | |
* the exception object 'exc', unless NIL, points to a cons cell which | |
* holds the given args and the quick-saved data (encoded as a bignum). | |
* | |
* If the bignum is negative, the given args is a complete stacktrace. | |
*/ | |
Eterm | |
build_stacktrace(Process* c_p, Eterm exc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's even worse, anything live at this point can be captured and aliased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of: a stack trace (currently) only contains the arguments of the operation that crashed, so we only need to consider the arguments of all operations that can crash into the current exception handler.
I'm not sure that tracking it would be worth it though, so maybe the optimization should just give up when encountering these. I don't think that doing so would cost too much since it's pretty rare to use try/catch
with captured stack traces in the middle of a hot loop that appends binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not too bad. At least it can't alias terms which are not known when the analysis encounters build_stacktrace
. Analyzing exactly which operations can crash into the handler sounds complicated, but just aliasing everything is quite simple.Hopefully users will wrap the loop with the try catch and not have the in the body. For example (with the fix in 63fd18d):
stacktrace() ->
X = ..., % something unique
try
ex:code_that_fails(),
X % still unique
catch
_:_:Stacktrace ->
io:format("Stacktrace: ~p~n", [Stacktrace]),
X % now conservatively aliased
end.
Implementing this correction does not remove any private-append transforms in the set of files compiled by diffable.
Are there any plans to include more in the stack trace, i.e. arguments to functions higher on the stack? That would require the stack-crawler (or what it is called) to deep copy terms.
AlreadyFound = maps:get(Fun, Found0, []), | ||
Found = Found0#{Fun => [{append,Dst,Var}|AlreadyFound]}, | ||
find_appends_is(Is, Fun, Found); | ||
_ -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick:
_ -> | |
false -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
Some benchmark results on my Intel iMac: Erlang/OTP 25.2:
Current
This PR:
That makes both encode and decode of Base64 almost three times as fast compared to OTP 25. |
lib/compiler/src/beam_ssa_alias.erl
Outdated
aa_set_status(#b_local{}, _Status, State) -> | ||
State; | ||
aa_set_status(#b_remote{}, _Status, State) -> | ||
State; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not covered by our tests. Can this happen? Is there a test case missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't create an example which covers it, so away it goes! (fix in beb1977)
{{hd,H},{tl,T}} -> | ||
{pair,H,T}; | ||
{{tl,T},{hd,H}} -> | ||
{pair,H,T}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the type analyzer does not handle the case when you "misuse" cons cells. For example:
b(L) ->
b(L, [<<>>|<<>>]).
b([H|T], [A|B]) ->
b([H|T], [<<A/binary,H:8>>|<<B/binary,H:8>>]);
b([], Acc) ->
Acc.
Here the type analyzer will conclude that the type of the accumulator is nonempty_improper_list(bitstring(8,appendable), bitstring(8))
and not nonempty_improper_list(bitstring(256,appendable), bitstring(256,appendable))
which would be expected. If the type analysis pass is fixed so both fields of the cons are transformed by private append, we will hit this code with the example above.
Apart from a couple of lines dealing with nifs (will fix that), all other non-covered lines in beam_ssa_private_append.erl are due to this type analysis fault as we never try to transform pairs of binaries.
When I last looked at this, during the autumn, I seem to recall that the type was less specific, i.e. nonempty_improper_list(bitstring(8,appendable), any)
so maybe this can now be fixed. I'll have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now fixed in d1211f5, beam_ssa_type:type/5
had problems handling bs_create_bin
when the
first fragment was typed as a #t_union{} containing an appendable bitstring. The same error was in beam_ssa_private_append:is_appendable/2` (fixed in 6952cb6).
Additionally 6952cb6 adds a tests which exercises [not the case, the code coverage percent is correct], this getting the coverage to 99% (the remaining non-covered lines are due to the NIF problem mentioned below).ht
/tl
lines above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{{tl,T},{hd,H}}
never happens, patches are stored in an orddict, so only {{hd,H},{tl,T}}
can occur. So the non-reachable code is removed in b33d99e.
This is an awesome improvement! The results for Is it possible to "pre-allocate" the binary? In many cases, like |
Added to our daily builds along with an updated primary bootstrap. |
Yes, it would be possible, and it is already done for binary comprehensions for which the exact size can be calculated beforehand. This PR does not attempt to do any pre-allocating (the existing analysis for pre-allocation used for binary comprehensions is not general enough), but is something we will consider doing in OTP 27. |
I don't think it even needs to be fully automated. I think giving control to the user would also work. Additionally, I don't think getting the size "wrong" would lead to any unsafely - the binary will either be re-allocated, or just remain with some unused bytes at the end wasting some memory. In particular, an API like |
Binary preallocation is really useful for protocol serialisation. For example, protobuf and Thrift could be sped up significantly, and avoid cumbersome double transformation (term -> iolist -> binary). |
Yes, getting the size wrong is safe. The compiler's size calculation is extremely conservative to avoid allocating a huge binary that is not actually needed, not because it would be unsafe.
Yes, that would work. |
list_to_binary(List) ->
list_to_binary(List, <<length(List)/size-hint>>). %% Creates an empty binary with a size hint.
list_to_binary([H|T], Acc) ->
list_to_binary(T, <<H,Acc/binary>>);
list_to_binary([], Acc) ->
Acc. |
27623e8
to
d8cfd8f
Compare
I have pushed a squashed version of all the fixups on fixups. If you want to pull this branch for nightly testing @bjorng, now is good time. |
Added the updated branch along with an updated bootstrap to our daily builds. |
While working to increase the code-coverage in Modifying |
Don't worry about the dead code for this release candidate. Can you squash the commits again? I will add it to our daily builds and we plan to merge tomorrow. |
Will do within an hour or so, right now I'm in a meeting (where I'm supposed to be presenting). |
7d9a6cb
to
b33d99e
Compare
Took a bit longer, squashed and pushed @bjorng. |
Thanks! Added to our daily builds. |
f9539dd
to
b33d99e
Compare
When a bitstring is used as an accumulator and is being grown by appending new data to it, the runtime system can make the append operation very efficent if it is guaranteed that the bitstring being extended: 1) Originated as a `<<>>`. 2) The append operation is done with `bs_create_bin` with the accumulator as the leftmost fragment. 3) The append does not change the unit of the bitstring. 4) There is only one live pointer to the accumulator on the heap or in VM registers. This patch extends the type analysis pass to track 1) and 2). This is done by extending the `t_bitstring` record with an `appendable` flag which is updated alongside with the `unit` field. The basic principle is that `<<>>` literals are constructed with the flag set and operations which capture a pointer to the binary clears it. The functions to calculate the least upper bound and greatest lower bound are also updated to account for the appendable status.
Avoid any assumptions on the number and kind of annotations before a call. Instead just track the last seen result type annotation and apply it when calls are found.
Annotate ret-instruction with the type they return, skip the annotation if nothing is yet known about the variable. The annotation will be used by the alias analysis pass.
Make the SSA pretty printer print annotations on terminators. This helps with debugging optimizations which depend on annotated terminators.
For debugging in type-dependent optimization passes it is useful to have a type pretty printer available in order to generate readable trace and debug printouts.
This commit inserts an empty analysis pass which will eventually be extended to do a full alias analysis.
Implement alias analysis as a prerequisite for alias analysis.
Calculate the killset for all functions. A killset allows us to look up the variables which die at a code location.
b33d99e
to
c002140
Compare
Perform an alias analysis of all reachable functions, alias information is added as annotations on the SSA code. Alias analysis is done by an algorithm inspired by Kotzmann and Mössenböck's 2005 algorithm for Escape Analysis (https://www.usenix.org/events/vee05/full_papers/p111-kotzmann.pdf), particularly their escape equivalent sets. But in contrast to Kotzmann and Mössenböck, instead of just tracking escaping values we track if a value in a variable is unique and/or aliased. A variable is said to be unique if it currently is the only live variable pointing to a particular term on the heap. Literals and non-boxed terms are considered unique. A variable is said to be aliased if it points to a heap term which can be reached by other means than the boxed pointer in the variable. The alias analysis is performed by traversing the functions in the module and their code. For each operation the uniqueness and alias status are updated. The unique/aliased status is maintained in a map which maps a variable to a either a status or another variable. Thus constructing equivalent sets in the same way a Kotzmann and Mössenböck. When the analysis finishes each instruction is annotated with information about which of its arguments are unique or aliased.
Extend the BEAM SSA pretty printer to pretty-print annotations regarding alias and unique status.
When a binary is grown by appending data to it using `bs_create_bin`, a considerable performance improvement can be achieved if the append can be done using the destructive `private_append` instead of `append`. Using `private_append` flavor of `bs_create_bin` is only possible when the binary being extended has been created by `bs_writable_binary` or by another `bs_create_bin` using `private_append`. As `private_append` is destructive, an additional requirement is that there is only one live reference to the binary being being extended. This commit implements a new SSA optimization pass which finds suitable code sequences which iteratively grow a binaries starting with `<<>>` and rewrites them to start with an initial value created by `bs_writable_binary` and use `private_append` for `bs_create_bin`.
c002140
to
858fff4
Compare
Looking at other recently pushed PRs it looks like I'm not the only one who gets tripped on the "Test Erlang/OTP (system)" step of the CI, so unless someone tells me differently, I will consider the CI failure not-my-fault™. (@bjorng and @jhogberg, you have my phone number in an email if you need my urgent attention this afternoon) |
When a binary is grown by appending data to it using
bs_create_bin
,a considerable performance improvement can be achieved if the append
can be done using the destructive
private_append
instead ofappend
. Usingprivate_append
flavor ofbs_create_bin
is onlypossible when the binary being extended has been created by
bs_writable_binary
or by anotherbs_create_bin
usingprivate_append
. Asprivate_append
is destructive, an additionalrequirement is that there is only one live reference to the binary
being being extended.
This set of patches implements the compiler optimization described above together with its dependencies.