Do not allocate a new tuple when the value is the same #1890
This patch optimizes tuple operations so that no new tuple is allocated when an element is replaced by the exact same value in memory. Imagine this very common idiom:
Record#my_record{key = compute_new_value(Value, Condition)}
where:
compute_new_value(X, true) -> X + 1;
compute_new_value(X, false) -> X.
In many cases, we are not changing the value in `key`; however, the code prior to this patch would still allocate a new tuple. This optimization changes that.
The cost of the optimization is minimal, as it only adds a C array access plus a pointer comparison. The major benefit is reduced GC pressure, because the allocation is avoided.
We have only changed erlang:setelement/3. The benchmarks create a tuple and perform the same operation roughly 20000 times, once replacing the key with the same value and once with a different value.
For a tuple with 4 elements, replacing the fourth element 20000 times went from 557us to 388us. For a tuple with 8 elements, replacing the fourth element 20000 times went from 641us to 421us.
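The idiom above can also be guarded by hand at the source level. The following is a minimal sketch, not the patch itself: the module name, the record fields, and the helper `update_key/2` are hypothetical, and `=:=` compares terms structurally, whereas the patch compares pointers inside the VM.

```erlang
-module(same_value_sketch).
-export([compute_new_value/2, update_key/2]).

%% Hypothetical record, used only for illustration.
-record(my_record, {key, other}).

%% The helper from the idiom: bump the value only when the condition holds.
compute_new_value(Value, true) -> Value + 1;
compute_new_value(Value, false) -> Value.

%% Return the record untouched when the new value equals the old one,
%% so no record update is performed; otherwise do a normal record update.
update_key(#my_record{key = Old} = Record, New) when New =:= Old ->
    Record;
update_key(#my_record{} = Record, New) ->
    Record#my_record{key = New}.
```

Every call site has to remember to go through such a helper, which is why doing the check once inside `erlang:setelement/3` is attractive.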
Unfortunately this patch does not work as is. That's because the compiler may use the dsetelement operation, which assumes that setelement returns a new tuple and therefore performs operations in place. In order to solve this, we need to change how the compiler works. One option is to break the current setelement followed by a dsetelement in two: one operation that copies and checks the tuple size, followed by multiple dsetelement operations. Therefore a The other option is to replace the dsetelement operation with a new operation that is similar to how I have sent a PR to get the discussion started and figure out if there is a way to move this optimization forward. Thank you! |
I will ping @bjorng because, even though the final impact of this change is on the VM, we will need to change the compiler to achieve it (if we want to make this change in the first place). :) |
That seems to be the best way to do it. That would speed up updating of records. We have considered adding such new instructions anyway as an optimization. When loading old code that uses |
Do you think it would speed up even when updating a single element? Or do you think we should still keep setelement for single updates and updates where the position is not known at compile time? |
The update in itself would not be faster. But since the instruction would not clobber the X registers, there could possibly be an indirect speedup because of less register shuffling and possibly smaller stack frame. So it would probably be worth using the new instruction for updating only a single element at a known position.
We must definitely keep it for updates where the position is not known at compile time. It must also be kept in case it is applied. Users could also expect that calls to |
I always forget about traceability. I think this introduces a dilemma. We don't have a syntax operation for updating tuples. The record syntax converts to setelement calls and we can trigger the dsetelement optimization by hand, which means loss of traceability:
bar(Tuple) ->
    erlang:setelement(1, erlang:setelement(2, Tuple, bar), baz).
emits:
In fact, we leverage the fact that the optimization can be triggered by hand in Elixir's implementation of records. However, the optimization above only applies when the values being set are safe and if the updates happen in a certain order. I believe we can easily lift the order restriction. We can lift the safety restriction too, but it means a slight change of error semantics in cases like this:
erlang:setelement(1, erlang:setelement(20, Tuple, bar), error(oops)).
Note that the inner setelement operation is meant to be out of bounds. If we lift the safety restriction, it means we have to reorder the operations above so we have:
Var1 = bar,
Var2 = error(oops),
erlang:setelement(1, erlang:setelement(20, Tuple, Var1), Var2).
Now we get a different error than the original one. This can be problematic in the face of any side effects. On one hand, we want to optimize code as much as possible. On the other hand, we want to keep the semantics clear, including tracing semantics. Assuming we are going to introduce a new operation, we need to choose when this new operation will be invoked. I believe we have to answer some questions:
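The change in error semantics described above can be observed with a small module. This is only a sketch: the module and function names are hypothetical, and it assumes the left-to-right argument evaluation that the compiler performs in practice.

```erlang
-module(error_order_sketch).
-export([original/1, reordered/1, reason/1]).

%% Original nesting: the inner setelement/3 runs before error(oops).
original(Tuple) ->
    erlang:setelement(1, erlang:setelement(20, Tuple, bar), error(oops)).

%% Values hoisted first, as lifting the safety restriction would require.
reordered(Tuple) ->
    Var1 = bar,
    Var2 = error(oops),
    erlang:setelement(1, erlang:setelement(20, Tuple, Var1), Var2).

%% Run a fun and report the error reason it raises, if any.
reason(Fun) ->
    try Fun() of
        Value -> {ok, Value}
    catch
        error:Reason -> {error, Reason}
    end.
```

Under those assumptions, `reason(fun() -> original({a, b}) end)` should report `badarg` from the out-of-bounds index, while the reordered version should report `oops`.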
However, applying rules 1 and 3 means we lose traceability in even more cases. One option to circumvent this is to introduce a new node in the Erlang Abstract Format for updating multiple tuple elements in a row. With this we can keep traceability, but it adds an AST node that doesn't have a syntactical representation. It also means that all code written to leverage dsetel will no longer do so unless it is explicitly updated to use the new node. I have a simpler proposal though: to always apply the optimization whenever the position is known at compile time and it is safe. As far as I know, this happens in other cases, such as invoking |
I think it's fine for the In a function like:
foo(X, Y, Z) ->
    setelement(1, setelement(2, X, Z), Y).
We're able to only trace the |
I think I agree. Explicit use of
I agree.
I am not sure that I understand you completely. Do you mean that we should not change the evaluation order to make the optimization possible in more cases? For example:
Which is equivalent to:
In this case, the optimization should not be applied. Is that what you meant? |
Yes. I mean that we should not change the evaluation order to make the optimization possible in more cases. In this case:
foo(Tuple) ->
    T1 = setelement(20, Tuple, bar),
    T2 = bar(),
    setelement(1, T1, T2).
We will end up invoking the new operation twice, copying the tuple twice: once to change the element at position 20 and once more to change the element at position 1. While in this case:
foo(Tuple) ->
    T2 = bar(),
    T1 = setelement(20, Tuple, bar),
    setelement(1, T1, T2).
The operation is called only once and it will perform both changes at once, effectively copying the tuple a single time. Here is my proposal:
We will need a compiler pass that will convert the |
A potential way to get around the issue of not changing the error semantics would be to emit a sort-of |
Good. I agree. I have a suggestion for a slightly different implementation. I suggest that it should be more like The reason for that is that before updating an Erlang record, there will always be code that checks the size and record tag of the tuple. Having the check in the tuple update operation too would be unnecessary. How would that work for Elixir? Do you call |
To expand on what @michalmuskala said. Imagine this operation:
erlang:setelement(
    1,
    erlang:setelement(
        10,
        erlang:setelement(
            5,
            Tuple,
            some_fun()
        ),
        another_fun()
    ),
    final_fun()
)
In this case it becomes this:
A1 = some_fun(),
B1 = erlang:setelement(5, Tuple, A1),
B2 = another_fun(),
C1 = erlang:setelement(10, B1, B2),
C2 = final_fun(),
erlang:setelement(1, C1, C2).
As we can see, we can't merge those operations because we have function calls in between the setelement calls. However, @michalmuskala suggested that we split the data validation from the data updating, in order to be able to merge them. So we could rewrite the code above to:
A1 = some_fun(),
assert_tuple_size(Tuple, 5),
B2 = another_fun(),
assert_tuple_size(Tuple, 10),
C2 = final_fun(),
%% assert_tuple_size(Tuple, 1) - no need to assert as we already checked for 10
update_tuple_elems(Tuple, 10 B2, 5 A1, 1 C2).
It feels like this could work. However I don't want to bite off more than I can chew :) so I suggest going ahead with the earlier proposal and we can consider this once that is in place. |
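To check that splitting validation from updating gives the same result as the nested calls, the example above can be emulated with ordinary functions. This is a sketch under assumptions: `assert_tuple_size` and `update_tuple_elems` are the hypothetical operations named in the discussion, written here as plain Erlang functions, the list-of-pairs argument is an illustrative choice, and `some_fun/0`, `another_fun/0` and `final_fun/0` are stand-ins.

```erlang
-module(merged_update_sketch).
-export([nested/1, merged/1]).

%% Stand-ins for the function calls in the example.
some_fun() -> a.
another_fun() -> b.
final_fun() -> c.

%% Hypothetical size check: fails the way setelement/3 would for a bad index.
assert_tuple_size(Tuple, Index)
  when is_tuple(Tuple), Index >= 1, Index =< tuple_size(Tuple) ->
    ok;
assert_tuple_size(_Tuple, _Index) ->
    error(badarg).

%% Hypothetical merged update. It only emulates the result: folding over
%% setelement/3 still copies once per element, unlike a real instruction.
update_tuple_elems(Tuple, Updates) ->
    lists:foldl(fun({Index, Value}, Acc) -> setelement(Index, Acc, Value) end,
                Tuple, Updates).

%% The original nested calls.
nested(Tuple) ->
    erlang:setelement(
      1,
      erlang:setelement(10, erlang:setelement(5, Tuple, some_fun()), another_fun()),
      final_fun()).

%% The split form: checks interleaved with the calls, one update at the end.
merged(Tuple) ->
    A1 = some_fun(),
    ok = assert_tuple_size(Tuple, 5),
    B2 = another_fun(),
    ok = assert_tuple_size(Tuple, 10),
    C2 = final_fun(),
    update_tuple_elems(Tuple, [{10, B2}, {5, A1}, {1, C2}]).
```

Because the checks stay interleaved with the calls, a tuple that is too small fails at the same point in both versions, after the same functions have run.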
Sorry @bjorng, we were writing at the same time. I think we got to similar conclusions through different means? I think we can make it so We could enable those optimizations only if the assertions are already there but I would instinctively prefer if we can apply the optimization in as many cases as possible. Do you see any downsides? PS: In Elixir we don't check the tuple but we could make it so. Thanks for asking! |
I don't have a strong opinion, but I think I would slightly prefer to keep a
Only really the (lack of) traceability of explicit
OK. So automatically adding |
I see. For the single setelement call, converting a setelement/3 into a size assertion + update means we are doing more work. Question: couldn't we keep them as two separate operations in the compiler and merge them into a single one in the loader?
To me this is a downside. Especially because there is code that would be optimized today but it will no longer be optimized if we require an explicit size assertion.
I am not worried about the Elixir compiler in this case. :) I will be happy to change it to do whatever the Erlang compiler requires. I took a quick look at the OTP and Elixir codebases and the number of setelement calls is very small. So we are mostly talking about records here and, since we control the code emitted by records, any of the options is likely fine. To summarize, we have three different approaches:
Anyway, I have enough information to move forward and we can resume this discussion once I have made a bit more progress. I will go with 1 or 2, depending on the implementation complexity and, once a PR is sent, we can discuss accordingly. Thanks! |
I would prefer 2 or 3. I am not sure that 3 is actually that much more complex or harder to implement, but who knows what turns up when one actually starts implementing it...
Yes, an actual implementation will be useful to continue the discussion. I assume that you are aware of our ongoing work with the SSA intermediate format in https://github.com/bjorng/otp/tree/bjorn/compiler/ssa? Part of your implementation will have to change when we merge that branch (probably before the end of August or in early September). |
Beautiful. Since we have about 1 year to get this in, I will wait for the SSA pass to be merged and then resume work. I will close this for now. Thanks! |
NOTE: This patch is wrong. Do not merge, see the first comment.