Implement BeamAsm - a JIT for Erlang/OTP #2745
Conversation
Congratulations, can't believe this is happening! Now RIIR it, jk.
Wow, thanks for this work, this is awesome to see.
Awesome! 👍
Just can't wait for the release to have a test on my own, thanks for the great work.
Just curious, why asmjit instead of LLVM? Smaller dependency?
LLVM is much slower at generating code than asmjit. LLVM can do a lot more, but its main purpose is not to be a JIT compiler. With asmjit we get full control over all the register allocation and can do a lot of simplifications when generating code. On the downside, we don't get any of LLVM's built-in optimizations. We also considered using DynASM, but found the tooling that comes with asmjit to be better.
My bad. I was referring to ORCJIT:
So was I. LLVM is still too slow for what we want to do here. We used LLVM in other JIT attempts, but we always had issues with the time it takes for the compiler to run. The sea-of-nodes approach that LLVM IR uses just can't be as fast as we need it to be, even if you disable all of the optimizations. Maybe if you emit machine instructions instead of LLVM IR, but even then I'm doubtful. Also, as you mention, the size of the dependency plays in. Adding tens of megabytes and having to support any faults in LLVM is a huge undertaking.
This is great work you did, there is no doubt. But honestly, my Erlang VMs run for hours. I don't really care about startup time (https://github.com/scalaris-team/scalaris).
Very cool.
Sorry for the side-tracking, as I could look into that myself, but I have a small question: is the implementation modular, done in a way where we could later experiment with different JIT backends, such as Rust's Cranelift?
Many people do care, and not just those whose instances are short-lived. We don't want to make that trade-off at the moment.
Yep, it was one of our design goals. Changing to a different assembler is pretty straightforward, and going down the IR route shouldn't be too difficult either, albeit tedious since you would need to re-implement every instruction.
Super cool to know! And I'm super happy to see progress in that part of the Erlang VM. Now if only Ericsson was hiring remote… 😎
Please don't get me wrong. This is no criticism. I am REALLY happy to see this coming. |
Hello! First of all, great work! I'm the maintainer of the meta-erlang project. I'm testing this PR using a set of QEMU machines, checking whether I can run common Erlang applications that meta-erlang supports (like tsung, yaws, emq, rabbitmq, ...). That is time consuming, but I will post my results here once I get there. Thanks.
Force-pushed from 52f03e0 to 29aa848.
New changes have been force pushed. I've renamed the … I've also added the function … The branch has also been rebased onto the current latest master branch.
I've added a description of the files involved in the JIT for those interested.
Thanks @garazdawi and team, this is awesome! Some feedback from trying:
I feel like I've missed something obvious -- e.g. is there a threshold after which the JIT will be activated? Can I somehow check whether the executed code is actually JITted? Also, let me know if I should provide this or more detailed feedback somewhere else. I'm happy to help or to dig into this more.
This is done in preparation for BeamAsm. Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
We do this in order to align the C and C++ code to use similar formatting options when using clang-format.
Compiler enhancements in OTP 22 and later have rendered the `literal_type_tests/1` test case ineffective.
Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Dan Gudmundsson <dgud@erlang.org> Co-authored-by: Björn Gustavsson <bjorn@erlang.org>
Co-authored-by: Lukas Larsson <lukas@erlang.org> Co-authored-by: Björn Gustavsson <bjorn@erlang.org> Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
Co-authored-by: Lukas Larsson <lukas@erlang.org> Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
Using a perf dump is superior to using a perf map, as we are able to use `perf annotate`, which means we can view which x86 assembly instruction was using the most CPU.
perf record -k mono erl +JPperf true
perf inject --jit -i perf.data -o perf.jitted.data
perf report -M intel -i perf.jitted.data
The implementation was inspired by the mono repo: https://github.com/mono/mono/blob/master/mono/mini/mini-runtime.c It should be easy to add support for Erlang source file and line mapping if we want to do that.
Merged for release in OTP-24. Please continue testing and reporting issues either in this PR, erlang-questions, or bugs.erlang.org. We have a fix for the issue experienced by @dominicletz; there will be a new PR with that "soon".
Opening the PR as suggested here: erlang/otp#2745 (comment)
@garazdawi JIT is awesome! I tried some benchmarks, which achieved a more than 50% performance increase!
OTP 23 ~= 30s, OTP 23 (HiPE) ~= 9s, OTP 24 (JIT) ~= 15.5s. Btw, the test result in Go is 7 seconds.
This link is broken, JFYI.
Should be fixed now.
What happened to scalaris, by the way? Looks cool, but no new release for over 5 years. Is it dead?
This PR introduces BeamAsm, a JIT compiler for the Erlang VM.
Implementation
BeamAsm provides load-time conversion of Erlang BEAM instructions into native code on x86-64. This allows the loader to eliminate all instruction-dispatch overhead and to specialize each instruction on its argument types.
BeamAsm does not do any cross-instruction optimizations, and the X and Y register arrays work the same as when interpreting BEAM instructions. This allows the Erlang run-time system to remain largely unchanged, except in places that need to work with loaded BEAM instructions, such as code loading, tracing, and a few others.
BeamAsm uses asmjit to generate native code at run time. Only small parts of asmjit's Assembler API are used. At the moment asmjit only supports the x86 32/64-bit assembler, but work is ongoing to also support 64-bit ARM.
For a more lengthy description of how the implementation works, you can view the internal documentation of BeamAsm.
Performance
How much faster is BeamAsm than the interpreter? That will depend a lot on what your application is doing.
For example, the number of Estones as computed by the estone benchmark suite becomes about 50% larger, meaning about 50% more work can be done during the same time period. Individual benchmarks within the estone suite vary from a 170% increase (pattern matching) to no change at all (huge messages). So, not surprisingly, computation-heavy workloads can show quite a large gain, while communication-heavy workloads remain about the same.
If we run the JSON benchmarks found in Poison or Jason, BeamAsm achieves anything from a 30% to 130% increase (averaging about 70%) in the number of iterations per second across all Erlang/Elixir implementations. For some benchmarks, BeamAsm is even faster than jiffy, the pure C implementation.
More complex applications tend to see a more moderate performance increase; for instance, RabbitMQ is able to handle 30% to 50% more messages per second, depending on the scenario.
Profiling/Debugging
One of the great things about executing native code is that some of the utilities used to profile C/C++/Rust/Go can also be used to profile Erlang code. For instance, this is what a run of perf on Linux can look like:
[perf output screenshot omitted]
There are more details in the internal documentation of BeamAsm on how to achieve this.
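For convenience, the perf workflow from the jitdump commit earlier in this PR, collected in one place (this assumes a Linux perf build with JIT-dump support and an emulator started with `+JPperf true`; see the internal documentation for details):

```shell
# Record with a monotonic clock so samples line up with the JIT dump.
perf record -k mono erl +JPperf true

# Merge the JIT-generated code information into the recording.
perf inject --jit -i perf.data -o perf.jitted.data

# View the report, including the JITted Erlang code.
perf report -M intel -i perf.jitted.data
```

With the jitdump approach, `perf annotate` can also show which x86 instruction within a JITted function used the most CPU.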
Drawbacks
Loading native code uses more memory. We expect the loaded code to be about 10% larger when using BeamAsm than when using the interpreter.
This PR includes a major rewrite of how the Erlang code loader works. The new loader does not include HiPE support, which means that it will not be possible to run HiPE compiled code in OTP-24.
We are still looking for anyone who wants to maintain HiPE so that it can continue to push the boundary of what high-performance Erlang looks like.
Try it out!
We are looking for any feedback you can provide about the functionality and performance of BeamAsm. To compile it, you need a relatively modern C++ compiler and an operating system that allows memory to be executable and writable at the same time (which is most OSes, with OpenBSD a notable exception).
If you are on Windows, you can download installers here:
Note that these are built from our internal nightly tests, so they contain more changes than what this PR includes.