
Implement BeamAsm - a JIT for Erlang/OTP #2745

Merged 32 commits on Sep 22, 2020

Conversation

@garazdawi (Contributor) commented Sep 11, 2020

This PR introduces BeamAsm, a JIT compiler for the Erlang VM.

Implementation

BeamAsm provides load-time conversion of Erlang BEAM instructions into native code on x86-64. This allows the loader to eliminate all instruction dispatch overhead and to specialize each instruction based on its argument types.

BeamAsm does not do any cross-instruction optimizations, and the X and Y register arrays work the same as when interpreting BEAM instructions. This allows the Erlang run-time system to remain largely unchanged, except in places that need to work with loaded BEAM instructions, such as code loading and tracing.

BeamAsm uses asmjit to generate native code at run time. Only a small part of asmjit's Assembler API is used. At the moment asmjit only supports x86 32/64-bit assembly, but work is ongoing to also support 64-bit ARM.

For a more in-depth description of how the implementation works, see the internal documentation of BeamAsm.

Performance

How much faster is BeamAsm than the interpreter? That will depend a lot on what your application is doing.

For example, the number of Estones computed by the estone benchmark suite increases by about 50%, meaning about 50% more work can be done in the same time period. Individual benchmarks within the estone suite vary from a 170% increase (pattern matching) to no change at all (huge messages). So, not surprisingly, computation-heavy workloads can show quite a large gain, while communication-heavy workloads remain about the same.

If we run the JSON benchmarks found in Poison or Jason, BeamAsm achieves anything from a 30% to a 130% increase (averaging about 70%) in the number of iterations per second across all Erlang/Elixir implementations. For some benchmarks, BeamAsm is even faster than the pure C implementation, jiffy.

More complex applications tend to see a more moderate performance increase; for instance, RabbitMQ is able to handle 30% to 50% more messages per second, depending on the scenario.

Profiling/Debugging

One of the great things about executing native code is that some of the utilities used to profile C/C++/Rust/Go can be used to profile Erlang code. For instance, this is what a run of perf on Linux can look like:

(screenshot: perf-beamasm — perf report of JIT-compiled Erlang code)

There are more details in the internal documentation of BeamAsm on how to achieve this.
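
In short, the workflow is the one spelled out in one of the commit messages further down in this PR; it assumes a perf built with JIT support and an emulator started with the `+JPperf true` flag:

    perf record -k mono erl +JPperf true
    perf inject --jit -i perf.data -o perf.jitted.data
    perf report -M intel -i perf.jitted.data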

Drawbacks

Loading native code uses more memory. We expect the loaded code to be about 10% larger when using BeamAsm than when using the interpreter.

This PR includes a major rewrite of how the Erlang code loader works. The new loader does not include HiPE support, which means that it will not be possible to run HiPE-compiled code in OTP-24.

We are still looking for anyone who wants to maintain HiPE so that it can continue to push the boundary on what high-performance Erlang looks like.

Try it out!

We are looking for any feedback you can provide about the functionality and performance of BeamAsm. To compile it you need a relatively modern C++ compiler and an operating system that allows memory to be both executable and writable at the same time (which most OSs do; OpenBSD is a notable exception).

If you are on Windows, you can download installers here:

Note that these are built from our internal nightly tests, so they contain more changes than this PR includes.

@garazdawi garazdawi changed the title Beamasm Implement BeamAsm - a JIT for Erlang/OTP Sep 11, 2020
@bjorng bjorng added the team:VM Assigned to OTP team VM label Sep 11, 2020
@nox (Contributor) commented Sep 11, 2020

Congratulations, can't believe this is happening! Now RIIR it, jk.

@isubasinghe commented

Wow, thanks for this work, this is awesome to see.

@gilbertwong96 (Contributor) commented

Awesome! 👍

@dqzjh0319 commented

Just can't wait for the release to try it out on my own. Thanks for the great work.

@tschuett commented Sep 12, 2020

Just curious: why asmjit instead of LLVM? A smaller dependency?
Awesome work, btw.

@garazdawi (Contributor, Author) commented

LLVM is much slower at generating code than asmjit. LLVM can do a lot more, but its main purpose is not to be a JIT compiler.

With asmjit we get full control over all the register allocation and can do a lot of simplifications when generating code.

On the downside, we don't get any of LLVM's built-in optimizations.

We also considered using dynasm, but found the tooling that comes with asmjit to be better.

@tschuett commented Sep 12, 2020

My bad. I was referring to ORCJIT:
https://llvm.org/docs/ORCv2.html

@garazdawi (Contributor, Author) commented

So was I. LLVM is still too slow for what we want to do here. We used LLVM in other JIT attempts, but we always had issues with how long the compiler takes to run. The general-purpose IR approach that LLVM uses just can't be as fast as we need it to be, even if you disable all of the optimizations. Maybe if you emit machine instructions instead of LLVM IR, but even then I'm doubtful.

Also, as you mention, the size of the dependency does play a part. Adding tens of megabytes and having to support any faults in LLVM is a huge undertaking.

@tschuett commented Sep 12, 2020

This is great work you did. There is no doubt.

But honestly, my Erlang VMs run for hours. I don't really care about startup time (https://github.com/scalaris-team/scalaris).

@SisMaker commented

very cool

@nox (Contributor) commented Sep 13, 2020

Sorry for the side-tracking, as I could look into this myself, but I have a small question: is the implementation modular, done in a way where we could later experiment with different JIT backends, such as Rust's Cranelift?

@jhogberg (Contributor) commented

> But honestly, my Erlang VMs run for hours. I don't really care about startup time (https://github.com/scalaris-team/scalaris).

Many people do care, and not just those whose instances are short-lived. We don't want to make that trade-off at the moment.

> Sorry for the side-tracking, as I could look into this myself, but I have a small question: is the implementation modular, done in a way where we could later experiment with different JIT backends, such as Rust's Cranelift?

Yep, it was one of our design goals. Changing to a different assembler is pretty straightforward, and going down the IR route shouldn't be too difficult either, albeit tedious, since you'd need to re-implement every instruction.

@nox (Contributor) commented Sep 13, 2020

> Yep, it was one of our design goals. Changing to a different assembler is pretty straightforward, and going down the IR route shouldn't be too difficult either, albeit tedious, since you'd need to re-implement every instruction.

Super cool to know! And I'm super happy to see progress in that part of the Erlang VM. Now if only Ericsson were hiring remote… 😎

@tschuett commented

> > But honestly, my Erlang VMs run for hours. I don't really care about startup time (https://github.com/scalaris-team/scalaris).
>
> Many people do care, and not just those whose instances are short-lived. We don't want to make that trade-off at the moment.

Please don't get me wrong, this is not criticism. I am REALLY happy to see this coming.

@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Sep 14, 2020
@garazdawi garazdawi removed the testing currently being tested, tag is used by OTP internal CI label Sep 14, 2020
@joaohf (Contributor) commented Sep 14, 2020

Hello

First of all, great work!

I'm the maintainer of the meta-erlang project. I'm testing this PR on a set of QEMU machines, checking whether I can run the common Erlang applications that meta-erlang supports (like tsung, yaws, emq, rabbitmq, ...). That is time-consuming, but I will post my results here once I get there.

Thanks

@garazdawi (Contributor, Author) commented

New changes have been force-pushed.

I've renamed the FLAVOR of the JIT to jit instead of asm. I've also expanded the internal documentation to address some of the questions and problems that I've heard about.

I've also added the function erlang:system_info(emu_flavor), which returns jit if the current emulator is a JIT emulator.
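
For example, in a shell started from a JIT-enabled build (an interpreter build returns emu instead):

    1> erlang:system_info(emu_flavor).
    jit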

The branch has also been rebased onto the current latest master.

@garazdawi (Contributor, Author) commented Sep 16, 2020

I've added a description of the files involved in the jit for those interested.

@dominicletz (Contributor) commented Sep 16, 2020

Thanks @garazdawi and team, this is awesome! Some feedback from trying it out:

1. It's super easy to get running on Linux ❤️

       kerl build git https://github.com/garazdawi/otp.git beamasm 24.jit
       kerl install 24.jit 24.jit
       . ./24.jit/activate

2. I've run some tests and system_info(emu_flavor) returns jit 👍

3. I couldn't really measure any performance differences though -- I test-ran my pure-BEAM sha256 implementation, as I thought it must have gotten faster, but 😭

(screenshot: Selection_315 — benchmark timings)

I feel like I've missed something obvious -- e.g., is there a threshold after which the JIT is activated? Can I somehow check whether the executed code is really jitted?

Also, let me know if I should provide this or more detailed feedback somewhere else. I'm happy to help or to dig into this more.

garazdawi and others added 9 commits September 21, 2020 16:40
This is done in preparation for BeamAsm.

Co-authored-by: John Högberg <john@erlang.org>
Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
Co-authored-by: John Högberg <john@erlang.org>
Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
We do this in order to align the C and C++ code to
use similar formatting options when using clang-format.
Compiler enhancements in OTP 22 and later have rendered
the `literal_type_tests/1` test case ineffective.
garazdawi and others added 7 commits September 22, 2020 07:51
Co-authored-by: John Högberg <john@erlang.org>
Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
Co-authored-by: Björn Gustavsson <bjorn@erlang.org>
Co-authored-by: Lukas Larsson <lukas@erlang.org>
Co-authored-by: Björn Gustavsson <bjorn@erlang.org>
Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
Co-authored-by: Lukas Larsson <lukas@erlang.org>
Co-authored-by: John Högberg <john@erlang.org>
Co-authored-by: Dan Gudmundsson <dgud@erlang.org>
Using perf dump is superior to using perf map, as it lets us use `perf annotate`, which means we can see which x86 assembly instruction was using the most CPU.

    perf record -k mono erl +JPperf true
    perf inject --jit -i perf.data -o perf.jitted.data
    perf report -M intel -i perf.jitted.data

The implementation was inspired by the mono repo:

https://github.com/mono/mono/blob/master/mono/mini/mini-runtime.c

It should be easy to add support for Erlang source file and
line mapping if we want to do that.
@garazdawi (Contributor, Author) commented

Merged for release in OTP-24. Please continue testing and reporting issues, either in this PR, on erlang-questions, or at bugs.erlang.org.

We have a fix for the issue experienced by @dominicletz; there will be a new PR with that "soon".

@garazdawi garazdawi deleted the beamasm branch September 22, 2020 06:36
samuelpordeus added a commit to samuelpordeus/asmjit that referenced this pull request Sep 28, 2020
Opening the PR as suggested here: erlang/otp#2745 (comment)
@GeraldXv commented

@garazdawi The JIT is awesome! I tried some benchmarks, which achieved more than a 50% performance increase! The Fibonacci test falls short, though:

-module(fib).
-export([loop_test/1]).

loop_test(Num) ->
    F = fun Loop(N) ->
                case N of
                    0 -> ok;
                    _ ->
                        fib(35),
                        Loop(N - 1)
                end
        end,
    F(Num).

fib(X) when X < 2 ->
    1;
fib(X) when X >= 2 ->
    fib(X - 1) + fib(X - 2).

Erlang/OTP 23 [erts-11.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V11.1 (abort with ^G)
1> c(fib).
{ok,fib}
2> timer:tc(fun() -> fib:loop_test(100) end).
{29970315,ok}
3> c(fib, [native]).
{ok,fib}
4> timer:tc(fun() -> fib:loop_test(100) end).
{8904689,ok}

Erlang/OTP 24 [DEVELOPMENT] [erts-11.1.3] [source-4981289bf8] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]

Eshell V11.1.3 (abort with ^G)
1> c(fib).
{ok,fib}
2> timer:tc(fun() -> fib:loop_test(100) end).
{15541481,ok}

OTP 23 ~= 30s, OTP 23 (HiPE) ~= 9s, OTP 24 (JIT) ~= 15.5s

Btw, the same test in Go takes 7 seconds.

@peaceful-james commented

> I've added a description of the files involved in the jit for those interested.

This link is broken, JFYI.

@garazdawi (Contributor, Author) commented

> This link is broken, JFYI.

Should be fixed now.

@heri16 commented Aug 21, 2021

> This is great work you did. There is no doubt.
>
> But honestly, my Erlang VMs run for hours. I don't really care about startup time (https://github.com/scalaris-team/scalaris).

What happened to Scalaris, by the way? It looks cool, but there has been no new release for over 5 years. Is it dead?
