jit: Optimized code generation for select_val #3043

Merged: 2 commits merged into erlang:master on Feb 17, 2021

Conversation

@frej (Contributor) commented Feb 8, 2021

Two patches optimizing the code generation for select_val:

  • Optimize the case when a select_val has exactly two choices which branch to the same label or a fail destination and where the values only differ by one bit.

  • The i_select_val_bins instruction has previously been implemented by a call to a global subroutine which does a binary search in a table. This patch changes the code generator to emit the binary search as code.

Optimize the case when a select_val has exactly two choices which
branch to the same label or a fail destination and where the values
only differ by one bit.

The optimization makes use of the observation that (V == X || V == Y)
is equivalent to (V | (X ^ Y)) == (X | Y) when (X ^ Y) has only one
bit set, which allows the select_val to be implemented with only one
compare. This optimization is unconditionally performed by both GCC
and LLVM on x86_64.
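The rewrite described above can be sketched in C as follows. This is an illustrative sketch, not code from the JIT itself; the function name is made up for this example. The precondition is that x and y differ in exactly one bit, i.e. popcount(x ^ y) == 1.

```c
#include <stdint.h>

/* Single-compare test for (v == x || v == y), valid only when
 * x and y differ in exactly one bit. Setting that differing bit
 * in v maps both x and y to the same value, x | y, and maps no
 * other value there. */
static int matches_either(uint64_t v, uint64_t x, uint64_t y)
{
    /* (v == x || v == y)  <=>  (v | (x ^ y)) == (x | y)
       when x ^ y has exactly one bit set. */
    return (v | (x ^ y)) == (x | y);
}
```

The one-bit precondition matters: with x = 1 and y = 4, x ^ y = 5 has two bits set, and v = 5 would then satisfy (v | 5) == 5 even though it matches neither value.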

When this optimization is implemented in the code generator, the
change in performance compared to the unmodified system varies
depending on the probability distribution of the value the select_val
operates on. For illustration, consider a {select_val, X, Fail,
[{1,Dest}, {2,Dest}]} instruction where the input consists only of
ones. In this case, a slowdown of 8% is observed. If the same
instruction is fed only twos, the slowdown decreases to around 3%. If
the input is changed to an even mix of 1, 2, and 3, the performance
change depends strongly on how well the branch predictor functions in
the unmodified system. If the input is a repeated sequence, i.e.
(123)+ or 1+2+3+, the slowdown is between 3% and 4%. If the input is
completely random, so that the branch predictor cannot help, the
optimization provides a speedup of ~25%.

For the larger benchmarks in ebench's `small` class, the only
statistically significant performance changes are that `decode` is ~4%
faster and `fib` is ~2.5% faster.

The i_select_val_bins instruction has previously been implemented by a
call to a global subroutine which does a binary search in a
table. This patch changes the code generator to emit the binary search
as code.
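For reference, the search the legacy global subroutine performs can be sketched as a binary search over a sorted table of tagged values. This is an illustrative sketch, not the emulator's actual code; this patch makes the JIT emit an equivalent search as inline compare-and-branch code instead of calling such a routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Binary search for key in a sorted table of n tagged values.
 * Returns the index of the matching entry (whose destination label
 * would be taken), or -1 to signal the fail destination. */
static ptrdiff_t select_val_bins(uint64_t key,
                                 const uint64_t *table, size_t n)
{
    size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid] < key)
            lo = mid + 1;
        else if (table[mid] > key)
            hi = mid;
        else
            return (ptrdiff_t)mid;  /* matching destination found */
    }
    return -1;                      /* no match: take the fail destination */
}
```

Emitting this search inline lets the generated code use immediate operands for the keys where they fit, which is why the speedups below depend on whether the tagged keys fit in 16- or 32-bit immediates.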

The changed code generation strategy provides varying speedups
compared to the table search, depending on the number of choices and
size of the values searched for, but is in no case slower than the
legacy implementation.

For i_select_val_bins instructions with more than 11 and fewer than 33
choices, a speedup of more than 40% can be expected. For a larger
number of choices (up to 4096) where the largest key's tagged form
fits in a 16 bit immediate, a speedup of between 40% and 65% can be
expected: peak performance, at a speedup of more than 60%, occurs at
a table size of 512 elements and falls off to 40% when the table
contains 4096 keys. If the key values are too large to fit in a 16
bit immediate, performance falls off as the table grows: for large
keys (the tagged value does not fit in 32 bits), peak performance
occurs at a table size of 256 choices with a speedup of 70%, but
falls off to 25% when the number of choices is increased to 4096.

For small keys (the tagged small integer fits in a 16 bit word),
memory use is reduced by around 5%. With large keys (the tagged value
does not fit in 32 bits), the size required for an i_select_val_bins
instruction can increase by up to 10%.
@bjorng bjorng added enhancement team:VM Assigned to OTP team VM labels Feb 9, 2021
@bjorng bjorng self-assigned this Feb 9, 2021
@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Feb 9, 2021
@bjorng bjorng changed the title jit: Optimized code generaton for select_val jit: Optimized code generation for select_val Feb 9, 2021
@garazdawi (Contributor) left a comment


lgtm

@bjorng bjorng merged commit e22ec8b into erlang:master Feb 17, 2021
@bjorng (Contributor) commented Feb 17, 2021

Thanks for your pull request.

@frej frej deleted the frej/select branch February 17, 2021 13:42