JIT: Minor optimizations #6298
CT Test Results: 3 files, 132 suites, 47m 27s. Results for commit a4e80bd.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot
lgtm.
I wonder what the overhead of the `comment` function and its arguments is when we actually don't emit any log... maybe the compiler is smart enough to optimize it away.
47f762f to a4e80bd
The compiler will not optimize away the call to the `comment` function. I have updated the annotation code to test whether logging is enabled before doing anything potentially expensive. I also sneaked in a few other improvements and simplifications.
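The idea of testing the logging flag before doing expensive work can be sketched as a toy model in Python. This is not the JIT's actual code: `Assembler`, `format_operand`, `annotate_move`, and the callable-based `comment` are hypothetical names chosen only to show how formatting can be skipped entirely when logging is off.

```python
# Toy model only: all names here are hypothetical, not taken from the JIT.

class Assembler:
    def __init__(self, logging_enabled):
        self.logging_enabled = logging_enabled
        self.log = []
        self.format_calls = 0  # counts how often the expensive path runs

    def format_operand(self, operand):
        # Stand-in for potentially expensive formatting work.
        self.format_calls += 1
        kind, value = operand
        return f"{{{kind},{value}}}"

    def comment(self, make_text):
        # Check the flag first; build the text only if it will be emitted.
        if not self.logging_enabled:
            return
        self.log.append(make_text())


def annotate_move(asm, src, dst):
    # The lambda defers formatting until comment() knows logging is on.
    asm.comment(
        lambda: f"move {asm.format_operand(src)} -> {asm.format_operand(dst)}"
    )
```

With `Assembler(False)`, calling `annotate_move` never reaches `format_operand`, so the cost of a disabled annotation is a single flag test.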
Don't refetch a BEAM register whose contents already happen to be in a CPU register (left there by the previous instruction).

This optimization makes the most difference for the x86_64 JIT, where it is applied about 57,000 times in the Erlang/OTP code base. The AArch64 JIT keeps the 6 lowest-numbered X registers in CPU registers, but the optimization is still applied about 10,000 times in the Erlang/OTP code base.

Co-authored-by: John Högberg <john@erlang.org>
* Avoid some register shuffling for the `+` and `-` operators.
* Use instructions with immediate operands for the bitwise operators to eliminate `mov` instructions.
The TEST instruction with byte-size operands is shorter than the TEST instruction with four-byte operands.
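The size difference can be checked by encoding both forms by hand. The sketch below is a minimal illustration in Python, not code from the JIT; `encode_test` is a hypothetical helper that covers only the register-direct `F6 /0 ib` and `F7 /0 id` forms of x86 `TEST` for `cl`/`ecx`, `dl`/`edx`, and `bl`/`ebx`. The accumulator has its own shorter encodings, which are left out here.

```python
import struct

# Register numbers used in the ModRM byte (cl/ecx, dl/edx, bl/ebx).
LOW_REGS = {"c": 1, "d": 2, "b": 3}


def encode_test(reg, imm, byte_sized):
    """Encode TEST reg, imm for the register-direct form (hypothetical helper)."""
    modrm = 0xC0 | LOW_REGS[reg]  # mod=11 (register), opcode extension /0
    if byte_sized:
        # F6 /0 ib: TEST r/m8, imm8
        return bytes([0xF6, modrm]) + struct.pack("<B", imm)
    # F7 /0 id: TEST r/m32, imm32
    return bytes([0xF7, modrm]) + struct.pack("<I", imm)
```

For example, `encode_test("c", 0x0F, True)` yields three bytes while `encode_test("c", 0x0F, False)` yields six, because the four-byte form must carry a full 32-bit immediate.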
a4e80bd to 78b4d1d
This PR improves the code generation of the JIT in several ways.
The most important one eliminates the read of a BEAM register that was written by the previous BEAM instruction when its contents are still available in a CPU register.
For example, consider:
The BEAM code for the expression body looks like this:
The result of the `band` operation is written to X-register 0 (`{x,0}`). The `bor` operation then reads X-register 0.

The JIT in Erlang/OTP 25 generates the following code:
Here we can see that the last instruction of `band` stores the result to memory (`mov qword ptr [rbx], rax`). The very first thing `bor` does is to read back the value (albeit to another CPU register).

The new optimization will rewrite the first instruction of `bor` to get the value from the `rax` register:

But that is not actually the code that will be emitted because there is another optimization in this pull request that avoids loading a small integer operand into a register for logical operations. Here is the actual code for the two BEAM instructions:
As it happened, the result of the instruction before `band` (not shown) was available in the `rax` register. Since one operand for `band` is the integer 127, it can be used as an immediate operand for the `and` instruction. Similarly, the `bor` can also be reduced to two instructions because the result from `band` happened to be available in exactly the register it needed to be.
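The bookkeeping behind this optimization can be illustrated with a toy model in Python. It is only a sketch and not the JIT's implementation: `Emitter`, `slot`, and `clobber` are hypothetical names, and the layout in which X register n lives at `[rbx + 8*n]` is an assumption extrapolated from the `mov qword ptr [rbx], rax` store to `{x,0}` quoted above.

```python
# Toy model only: names and the memory layout are assumptions, not JIT code.

def slot(beam_reg):
    # Assumed layout: X register n lives at [rbx + 8*n], so {x,0} is [rbx].
    kind, n = beam_reg
    assert kind == "x"
    return "qword ptr [rbx]" if n == 0 else f"qword ptr [rbx+{8 * n}]"


class Emitter:
    def __init__(self, reuse_enabled):
        self.reuse_enabled = reuse_enabled
        self.code = []
        self.cached = None  # (beam_reg, cpu_reg) known to hold the same value

    def store(self, cpu_reg, beam_reg):
        # Write the BEAM register and remember which CPU register holds it.
        self.code.append(f"mov {slot(beam_reg)}, {cpu_reg}")
        self.cached = (beam_reg, cpu_reg)

    def load(self, beam_reg, wanted_cpu_reg):
        hit = (self.reuse_enabled and self.cached is not None
               and self.cached[0] == beam_reg)
        if hit:
            held_in = self.cached[1]
            if held_in != wanted_cpu_reg:
                # Register-to-register move replaces the memory read.
                self.code.append(f"mov {wanted_cpu_reg}, {held_in}")
            return  # already in the wanted register: emit nothing
        self.code.append(f"mov {wanted_cpu_reg}, {slot(beam_reg)}")
        self.cached = (beam_reg, wanted_cpu_reg)

    def clobber(self):
        # Anything that may overwrite CPU registers invalidates the cache.
        self.cached = None
```

In this model, a `store("rax", ("x", 0))` followed by `load(("x", 0), "rsi")` emits a memory read when `reuse_enabled` is false and `mov rsi, rax` when it is true. Asking for the value in `rax` itself emits no instruction at all, which corresponds to the case where the result happened to be in exactly the register that was needed.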