JIT: Minor optimizations #6298

Merged 7 commits on Sep 16, 2022

@bjorng bjorng commented Sep 14, 2022

This PR improves the code generation of the JIT in several ways.

The most important is eliminating reads from a BEAM register written by the previous BEAM instruction when its contents are already available in a CPU register.

For example, consider:

foo(<<X:16>>) ->
    (X band 7) bor 1.

The BEAM code for the expression body looks like this:

    {gc_bif,'band',
            {f,0},
            3,
            [{tr,{x,2},{t_integer,{0,65535}}},{integer,7}],
            {x,0}}.
    {gc_bif,'bor',{f,0},1,[{tr,{x,0},{t_integer,{0,7}}},{integer,1}],{x,0}}.

The result of the band operation is written to X register 0 ({x,0}). The bor operation then reads X register 0.

The JIT in Erlang/OTP 25 generates the following code:

# i_band_ssjd
    mov rsi, qword ptr [rbx+16]
    mov eax, 127
    and rax, rsi
    mov qword ptr [rbx], rax
# i_bor_jssd
    mov rsi, qword ptr [rbx]
    mov eax, 31
    or rax, rsi
    mov qword ptr [rbx], rax

Here we can see that the last instruction of band stores the result to memory (mov qword ptr [rbx], rax). The very first thing bor does is to read back the value (albeit to another CPU register).
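
The constants 127 and 31 in the listing above are the tagged forms of the small integers 7 and 1 from the BEAM code. As a rough sketch, assuming the usual 64-bit layout for small integers (value shifted left by four bits, low four bits set to the tag 0b1111), the encoding can be modelled as follows. The helper names are illustrative, and the model only covers non-negative values.

```python
# Sketch of the tagged small-integer encoding (assumed layout:
# value in the upper bits, 4-bit tag 0b1111 in the low bits).
SMALL_TAG = 0b1111
TAG_BITS = 4


def make_small(n: int) -> int:
    """Return the tagged machine word for the non-negative small integer n."""
    return (n << TAG_BITS) | SMALL_TAG


def small_value(word: int) -> int:
    """Recover the integer from a tagged word."""
    return word >> TAG_BITS


def tagged_band(a: int, b: int) -> int:
    """AND of two tagged words; the shared tag bits survive unchanged."""
    return a & b


def tagged_bor(a: int, b: int) -> int:
    """OR of two tagged words; the shared tag bits survive unchanged."""
    return a | b
```

Because both operands carry the same tag bits, `and` and `or` can work directly on the tagged words without untagging, which is why the native code needs no shifts.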

The new optimization will rewrite the first instruction of bor to get the value from the rax register:

# i_bor_jssd
    mov rsi, rax
    mov eax, 31
    or rax, rsi
    mov qword ptr [rbx], rax

But that is not actually the code that will be emitted because there is another optimization in this pull request that avoids loading a small integer operand into a register for logical operations. Here is the actual code for the two BEAM instructions:

# i_band_ssjd
    and eax, 127
    mov qword ptr [rbx], rax
# i_bor_jssd
    or rax, 31
    mov qword ptr [rbx], rax

As it happened, the result of the instruction before band (not shown) was available in the rax register. Since one operand for band is the small integer 7 (127 in tagged form), it can be used as an immediate operand for the and instruction. Similarly, the bor can also be reduced to two instructions because the result from band happened to be available in exactly the register where it was needed.
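
The interplay of the two optimizations can be modelled with a toy code generator. This is only a sketch under simplifying assumptions: it remembers at most one X register whose value is known to be in rax, it ignores type tests, failure labels and garbage collection, and it always uses 64-bit operands (the real output above uses `and eax, 127` for the first instruction). `ToyEmitter`, `x_reg_addr` and `compile_example` are hypothetical names, not functions from the JIT source.

```python
def x_reg_addr(index: int) -> str:
    """Memory operand for BEAM X register `index` (8 bytes each, based at rbx)."""
    offset = 8 * index
    return "[rbx]" if offset == 0 else f"[rbx+{offset}]"


class ToyEmitter:
    """Toy emitter that remembers which X register rax currently mirrors."""

    def __init__(self, x_reg_in_rax=None):
        self.lines = []
        self.x_reg_in_rax = x_reg_in_rax  # X register index, or None if unknown

    def emit_bitop(self, label, op, src, tagged_imm, dst):
        self.lines.append(f"# {label}")
        if self.x_reg_in_rax != src:
            # The value is not known to be in rax, so load it from memory.
            self.lines.append(f"mov rax, qword ptr {x_reg_addr(src)}")
        # The tagged constant is used directly as an immediate operand.
        self.lines.append(f"{op} rax, {tagged_imm}")
        self.lines.append(f"mov qword ptr {x_reg_addr(dst)}, rax")
        # After the store, rax mirrors the destination register.
        self.x_reg_in_rax = dst


def compile_example(x_reg_in_rax=None):
    """Emit code for (X band 7) bor 1 with X in {x,2} and the result in {x,0}."""
    emitter = ToyEmitter(x_reg_in_rax)
    emitter.emit_bitop("i_band_ssjd", "and", 2, 127, 0)
    emitter.emit_bitop("i_bor_jssd", "or", 0, 31, 0)
    return emitter.lines
```

With `compile_example(2)`, meaning the previous instruction left {x,2} in rax, both BEAM instructions come out as two native instructions each. With `compile_example()`, the band starts with a load from `[rbx+16]`, while the bor still skips its load.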

@bjorng bjorng added team:VM Assigned to OTP team VM enhancement testing currently being tested, tag is used by OTP internal CI labels Sep 14, 2022
@bjorng bjorng self-assigned this Sep 14, 2022

github-actions bot commented Sep 14, 2022

CT Test Results

       3 files     132 suites   47m 27s ⏱️
1 496 tests 1 445 ✔️ 51 💤 0
1 887 runs  1 818 ✔️ 69 💤 0

Results for commit a4e80bd.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

@garazdawi garazdawi left a comment

lgtm.

I wonder what the overhead of the comment function and its arguments is when we don't actually emit any log... maybe the compiler is smart enough to optimize it away.

@bjorng bjorng force-pushed the bjorn/jit/minor-optimizations branch from 47f762f to a4e80bd Compare September 15, 2022 06:10

bjorng commented Sep 15, 2022

The compiler will not optimize away the call to ubif2mfa() because it is in another compilation unit and it can't know whether the call has any side effects.

I have updated the annotation code to test whether logging is enabled before doing anything potentially expensive. I sneaked in a few other improvements and simplifications.
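
The guard can be illustrated with a small analogue, written here in Python rather than the JIT's own implementation language. `comment`, `describe_bif` and `logging_enabled` are hypothetical stand-ins, not the names used in this PR. The point is that arguments are evaluated before the call, so an expensive lookup runs even when nothing is logged unless the caller tests the flag first.

```python
calls = {"describe_bif": 0}  # counts how often the expensive lookup runs
logging_enabled = False      # stand-in for "is the assembly log being written?"
log = []


def describe_bif(bif_id):
    """Stand-in for a potentially expensive lookup (hypothetical)."""
    calls["describe_bif"] += 1
    return f"bif_{bif_id}/2"


def comment(text):
    """Record an annotation, but only when logging is on."""
    if logging_enabled:
        log.append(text)


def annotate_unguarded(bif_id):
    # The argument is computed even though comment() will discard it.
    comment(describe_bif(bif_id))


def annotate_guarded(bif_id):
    # Test the flag first, so the lookup is skipped when logging is off.
    if logging_enabled:
        comment(describe_bif(bif_id))
```

With logging off, `annotate_unguarded` still pays for the lookup, while `annotate_guarded` does not.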

bjorng and others added 7 commits September 16, 2022 10:06
Don't refetch a BEAM register whose contents already happen to be in a
CPU register (left there from the previous instruction).

This optimization makes the most difference for the x86_64 JIT, where it
is applied about 57,000 times in the Erlang/OTP code base.

The AArch64 JIT keeps the 6 lowest-numbered X registers in CPU
registers, but the optimization is still applied about 10,000 times in
the Erlang/OTP code base.

Co-authored-by: John Högberg <john@erlang.org>

* Avoid some register shuffling for the `+` and `-` operators.

* Use instructions with immediate operands for the bitwise operators
  to eliminate `mov` instructions.

The TEST instruction with byte-size operands is shorter than
the TEST instruction with four-byte operands.
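
To make the size difference concrete, here is a sketch that assembles the register-direct forms of `test` by hand. It assumes the standard x86-64 encodings (`F6 /0 ib` for byte operands, `F7 /0 id` for 32-bit operands) and is restricted to registers that need no REX prefix. The function names are illustrative and not part of the JIT.

```python
import struct


def encode_test_r8_imm8(reg: int, imm: int) -> bytes:
    """test r8, imm8 for al/cl/dl/bl (reg 0..3): opcode F6 /0 ib."""
    assert 0 <= reg <= 3 and 0 <= imm <= 0xFF
    modrm = 0xC0 | reg  # mod=11 (register direct), reg field=0, rm=reg
    return bytes([0xF6, modrm, imm])


def encode_test_r32_imm32(reg: int, imm: int) -> bytes:
    """test r32, imm32 for eax..edi (reg 0..7): opcode F7 /0 id."""
    assert 0 <= reg <= 7 and 0 <= imm <= 0xFFFFFFFF
    modrm = 0xC0 | reg
    return bytes([0xF7, modrm]) + struct.pack("<I", imm)
```

For example, `test cl, 15` encodes to three bytes and `test ecx, 15` to six. When the mask fits in the low byte, both set the zero flag identically for the bits of interest, so the shorter form can be used.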
@bjorng bjorng force-pushed the bjorn/jit/minor-optimizations branch from a4e80bd to 78b4d1d Compare September 16, 2022 08:07
@bjorng bjorng merged commit 32006e4 into erlang:master Sep 16, 2022
@bjorng bjorng deleted the bjorn/jit/minor-optimizations branch September 16, 2022 08:08