JIT extension #8110

Closed
xorrvin opened this issue Feb 10, 2024 · 10 comments
Assignees
Labels
enhancement team:VM Assigned to OTP team VM

Comments

@xorrvin

xorrvin commented Feb 10, 2024

Hi there,

I was wondering whether you would ever consider a JIT option other than asmjit. The rationale:

  • asmjit still requires you to write assembler by hand in the VM;
  • support for miscellaneous architectures. I've read the comments saying you initially didn't plan to add anything but amd64, but people are already asking about RISC-V (Erlang RISC-V Architecture JIT Support #7498), and this list can only grow in the future. With asmjit (which is an amazing library without a doubt) you're limited to its architecture support, which is rather scarce. Given OTP's excellent portability, it seems to me that having a universal JIT would increase platform adoption much more.

So the question is: would you consider implementing something like a copy-and-patch JIT? For example, that's how it was done in Python:
python/cpython#113465 (first comment also contains links to the original paper, talk and slides)

I understand that changing the JIT backend is by no means an easy task. For example, OTP has historically run well on the SPARC architecture, but the only option to increase performance there was HiPE, which has been removed from upstream. With copy-and-patch, you would get beyond-HiPE levels of performance without the extra maintenance burden, since all the heavy lifting is done by the C compiler and only a very small portion of platform-specific code is required for the initial setup.

Thank you!

@okeuday
Contributor

okeuday commented Feb 11, 2024

@xorrvin A brief view of the history leading to the current JIT is at https://www.erlang.org/blog/the-road-to-the-jit/

@xorrvin
Author

xorrvin commented Feb 11, 2024

@okeuday Thanks, I have read this before. However, I still believe this chapter hasn't been closed yet...

@kobalicek

kobalicek commented Feb 11, 2024

As the author of AsmJit, I'm really wondering what an alternative approach would look like, considering that a solution that used LLVM was abandoned for a reason. I'm going to elaborate a little.

First a Little Bit of History

When I started the AsmJit project, the goal was to avoid an intermediate representation. At the time of writing the first lines (around 2007-2008), all projects that did JIT compilation had their own assemblers, written in either C or C++, and many projects used forks of earlier projects. In general, the approach to writing a JIT at that time was to write your own assembler, which is not fun anymore considering how complex today's processors are and how many instructions they support (both general purpose and SIMD). Extracting these assemblers from their projects was possible, but since they only implemented the features required for a single JIT, they were incomplete. That's why I decided to release AsmJit and initially used the phrase "Complete X86/X64 Assembler" to highlight the goals.

Intermediate Representation

Although I think that maybe 80% of common features can be abstracted into a simple IR, I think a simple IR is not enough if you are serious about performance. For example, the AArch64 backend can take advantage of ARM's shifter in many instructions, and that's a very handy feature. So you can either make that part of your IR or have optimization passes that rewrite the existing code to use these features. And doing that is essentially like writing an optimizing compiler, which is a serious undertaking.

My opinion is that if you don't need IR, don't go for it.
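
To make the shifter point concrete, here is a minimal sketch of the kind of pass a simple IR would need in order to use AArch64's shifted-register operands. The toy IR, the `Instr` type, and `fold_shifts` are hypothetical names invented for this illustration; they are not part of AsmJit or the Erlang JIT. The sketch also assumes straight-line code in which temporaries are not live past the end of the sequence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    """Toy IR instruction (hypothetical, for illustration only)."""
    op: str                     # "lsl" or "add"
    dst: str
    src1: str
    src2: Optional[str] = None  # second register operand of "add"
    shift: int = 0              # "lsl": shift amount; "add": shift applied to src2


def reads(ins: Instr) -> set:
    """Registers read by an instruction."""
    return {r for r in (ins.src1, ins.src2) if r is not None}


def is_read_later(code: List[Instr], start: int, reg: str) -> bool:
    """True if `reg` is read at or after `start` before being overwritten."""
    for ins in code[start:]:
        if reg in reads(ins):
            return True
        if ins.dst == reg:
            return False
    return False  # assumption: nothing is live after the sequence ends


def fold_shifts(code: List[Instr]) -> List[Instr]:
    """Fuse `lsl t, x, #n ; add d, y, t` into `add d, y, x, lsl #n`."""
    out: List[Instr] = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (cur.op == "lsl" and nxt is not None and nxt.op == "add"
                and nxt.src2 == cur.dst and nxt.src1 != cur.dst
                and nxt.shift == 0
                and (nxt.dst == cur.dst
                     or not is_read_later(code, i + 2, cur.dst))):
            # The temporary is dead after the add, so the shift can be folded.
            out.append(Instr("add", nxt.dst, nxt.src1, cur.src1, cur.shift))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


def fmt(ins: Instr) -> str:
    """Render an instruction in AArch64-like syntax."""
    if ins.op == "lsl":
        return f"lsl {ins.dst}, {ins.src1}, #{ins.shift}"
    suffix = f", lsl #{ins.shift}" if ins.shift else ""
    return f"add {ins.dst}, {ins.src1}, {ins.src2}{suffix}"
```

Even this single pattern needs a liveness check to stay correct, which hints at why growing such passes starts to resemble writing an optimizing compiler.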

Low Latency Compilation

Since Erlang's JIT compiler compiles everything on startup to make the JIT architecture simpler, what really matters is the performance of the compilation itself, as every additional pass slows you down. For example, AsmJit provides an optional register allocator (Compiler), which is pretty good and many people use it for simplicity, but when you use this tool it reduces the compilation throughput by more than 60% - and I would like to note here that AsmJit's register allocator is not an optimizing pass - it just scans the code and translates virtual registers to physical ones (inserting spills/loads where needed). Optimization passes take much longer, possibly even orders of magnitude longer.
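
As a rough illustration of what "scan the code and translate virtual registers to physical ones" involves, here is a deliberately simplified sketch. It is not AsmJit's algorithm: the instruction tuples and the `allocate` function are hypothetical, the code is assumed to be straight-line, and a spilled value is simply referenced as a stack slot operand instead of being reloaded into a register.

```python
from typing import Dict, List, Sequence, Tuple

# (opcode, destination, sources) over virtual register names such as "v0".
Instr = Tuple[str, str, Tuple[str, ...]]


def allocate(code: Sequence[Instr], phys: Sequence[str]) -> List[Instr]:
    """Map virtual registers to physical ones in two linear scans."""
    # Scan 1: remember the last instruction that mentions each virtual register.
    last_use: Dict[str, int] = {}
    for i, (_, dst, srcs) in enumerate(code):
        for v in (dst, *srcs):
            last_use[v] = i

    free = list(phys)          # physical registers currently unused
    loc: Dict[str, str] = {}   # virtual register -> physical register or slot
    next_slot = 0
    out: List[Instr] = []

    def release(v: str) -> None:
        where = loc.pop(v)
        if where in phys:
            free.append(where)

    # Scan 2: rewrite each instruction with assigned locations.
    for i, (op, dst, srcs) in enumerate(code):
        new_srcs = tuple(loc[s] for s in srcs)
        # Sources that die here free their register before dst is assigned.
        for s in dict.fromkeys(srcs):  # ordered de-duplication
            if last_use[s] == i:
                release(s)
        if dst not in loc:
            if free:
                loc[dst] = free.pop(0)
            else:
                # Out of registers: spill to a fresh stack slot (simplified).
                loc[dst] = f"[sp+{8 * next_slot}]"
                next_slot += 1
        out.append((op, loc[dst], new_srcs))
        if last_use[dst] == i:  # the result is never read again
            release(dst)
    return out
```

Even this toy version walks the code twice and keeps per-register bookkeeping, which gives a feel for why any extra pass shows up in compilation throughput.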

Which Architectures Actually Matter and Which Will in the Future?

Today, the whole market is dominated by X86_64 and AArch64 while RISC-V is slowly emerging. A lot of software is optimized for X86_64 and AArch64, because companies have actually built high-performance CPUs utilizing these ISAs. Market share of these is simply strong enough to cover the cost of the development (and for me the motivation to keep supporting these ISAs in AsmJit).

But is RISC-V today strong enough to make people actually optimize projects for it? I've already had a discussion about RISC-V support in AsmJit, however, if there are no parties willing to sponsor the effort it's a sign for me that companies are not interested that much at the moment, and I'm just going to wait until this changes.

BTW, in this case I don't think there will be 100 more architectures to cover in the future. New architectures don't appear every day, and software is generally very slow to get optimized for new ISAs (this usually takes decades, not years). What is also interesting is that software is commonly not optimized even for the available extensions of existing ISAs.

Copy & Patch Compilation

You can do copy & patch with AsmJit too; and as a bonus you get code that you can actually debug. I don't personally like the approach used by Python JIT as I think it's not worth the few % they gained from it. And the biggest problem is that since you don't control the code generation, you cannot really aim higher. Compare that with a more complex approach taken by Facebook here:

Cinder uses AsmJit and actually optimizes the code, but that unfortunately requires much more than copying existing pre-assembled pieces.

Conclusion

In general, I think that if anyone believes the current JIT architecture can be significantly improved by replacing a crucial library, it would be great to propose an alternative and evaluate the cost. However, I think such an alternative could be difficult to find, as AsmJit has a unique spot in the JIT compilation universe - it doesn't compete with optimizing compilers such as LLVM, because its purpose is to offer low-latency compilation instead, without abstractions, and without optimization passes that can make compilation an order of magnitude slower.

BTW I'm not discouraging anyone from trying to do things differently, but I'm pretty sure that it's not worth the effort considering that the current JIT is working great. Supporting more architectures directly in AsmJit would be much more beneficial for Erlang and all projects that use AsmJit.

@xorrvin
Author

xorrvin commented Feb 11, 2024

@kobalicek Thanks for the detailed answer! I'm in no way undermining your efforts and I think that AsmJit is an amazing piece of technology. However, with AsmJit one is forced to operate in the model of the target architecture: you need to know specific assembly instructions, their layout, the register map and so on. Granted, it's nicely abstracted, but you still emit the code manually. With copy-and-patch, however, you write templates which are processed by an optimizing compiler before you even run your program. During the JIT generation phase, you fetch the template and update the pointers, and that's it. That's also pretty fast, but the main focus here is on abstraction: instead of assembly, you write C and leave the rest to the compiler. Also, with AsmJit, once there's support for a new architecture (say, RISC-V), the target application doesn't get that support "automagically": you still need to write hand-picked instructions for every opcode. Compare it with copy-and-patch, where you only need to add the initial scaffolding for symbol relocations, and then you can reuse all your existing templates (because they're in C).

For example, check out this alternative Lua compiler, which went even further and generates the whole JIT framework as part of the compilation step: https://sillycross.github.io/2022/11/22/2022-11-22/

To sum it up: copy-and-patch gives more performance for the effort invested than AsmJit, simply because of this ability to abstract, but it will certainly never match the absolute arch-specific performance AsmJit can give.
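
The "fetch the template and update the pointers" step described above can be sketched in a few lines. This is only an illustration under stated assumptions: in a real copy-and-patch JIT the stencils and their hole offsets are extracted from compiler-produced object files, whereas here three tiny x86-64 encodings are written by hand to stand in for them. `Stencil`, `STENCILS`, and `copy_and_patch` are hypothetical names, and the resulting bytes are only assembled into a buffer, never executed.

```python
import struct
from typing import Dict, List, NamedTuple, Optional, Tuple


class Stencil(NamedTuple):
    code: bytes                      # precompiled bytes with a zeroed hole
    hole: Optional[Tuple[int, str]]  # (byte offset, struct format) or None


# Hand-written stand-ins for compiler-generated templates (assumption).
STENCILS: Dict[str, Stencil] = {
    # mov rax, imm64  ->  48 B8 <8-byte immediate>
    "load_const": Stencil(bytes([0x48, 0xB8]) + bytes(8), (2, "<q")),
    # add rax, imm32  ->  48 05 <4-byte immediate>
    "add_const": Stencil(bytes([0x48, 0x05]) + bytes(4), (2, "<i")),
    # ret             ->  C3
    "ret": Stencil(bytes([0xC3]), None),
}


def copy_and_patch(program: List[Tuple[str, Optional[int]]]) -> bytes:
    """Concatenate stencils and patch each hole with its operand."""
    buf = bytearray()
    for op, operand in program:
        stencil = STENCILS[op]
        base = len(buf)
        buf += stencil.code  # copy
        if stencil.hole is not None:
            offset, fmt = stencil.hole
            struct.pack_into(fmt, buf, base + offset, operand)  # patch
    return bytes(buf)
```

For example, `copy_and_patch([("load_const", 40), ("add_const", 2), ("ret", None)])` yields the byte sequence for `mov rax, 40; add rax, 2; ret`. The JIT-time work is a copy plus a few stores, while the quality of the code inside each stencil is fixed by whatever produced it.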

@kobalicek

kobalicek commented Feb 11, 2024

To my knowledge, the biggest problem with copy&patch in this case is that there is no way to further optimize the generated JIT code, because you don't control the code generation: there is nothing to improve once it's implemented.

Would it work? I think it would, but would it be better than what we have today? Here I'm skeptical...

But... The work I have been involved in for the past decade was all about architecture-specific optimizations and SIMD, so I definitely see problems from a different perspective. And fortunately there are not that many architectures to optimize for, so architecture-specific optimizations still pay off in my case :)

In the case of Erlang, I think it's all about the expected performance of the JIT and its maintenance. The current JIT is simple, you have full control over the code generation, and you have AsmJit tooling, which I think is really helpful when writing JITs. But maybe the biggest advantage is that you can still improve the generated code if you want to, and I expect to see improvements in this area.

My conclusion is that any other solution would have to offer a better runtime performance, because regressing in this area is definitely not acceptable once you have raised the bar.

@xorrvin
Author

xorrvin commented Feb 11, 2024

I understand. I wonder if OTP could be extended in such a way that AsmJit is the primary JIT on supported arches and copy-and-patch is used for all the rest, kind of multi-tier. Since my primary interest here is non-x86 and non-ARM arch support, the interpreter is currently the only way to go (compared to a few versions ago, when HiPE was present). But I can definitely imagine this approach being too much of a maintenance burden.
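
A minimal sketch of the selection logic such a multi-tier setup implies, assuming a hypothetical `select_backend` helper. This is not how OTP's build or runtime actually chooses its execution engine; the architecture names are the strings commonly returned by Python's `platform.machine()`.

```python
import platform
from typing import Optional

# Architectures covered by the hand-written AsmJit backend in this sketch.
ASMJIT_ARCHES = {"x86_64", "amd64", "aarch64", "arm64"}


def select_backend(machine: Optional[str] = None,
                   have_stencils: bool = True) -> str:
    """Pick the best available execution tier for an architecture."""
    machine = (machine or platform.machine()).lower()
    if machine in ASMJIT_ARCHES:
        return "asmjit"            # primary tier: hand-tuned code generation
    if have_stencils:
        return "copy-and-patch"    # fallback tier: precompiled C templates
    return "interpreter"           # last resort


print(select_backend("riscv64"))   # copy-and-patch
```

The maintenance concern is visible even here: every opcode would need to behave identically across all three tiers.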

bjorng added the team:VM (Assigned to OTP team VM) label Feb 12, 2024
@lin72h

lin72h commented Feb 12, 2024

@kobalicek Thanks for your insightful reply. Just curious: how does asmjit compare to xbyak?

@kobalicek

@lin72h I have never used xbyak myself, so I'm not sure I can compare these two clearly, but I can try. AsmJit as a project is structured differently. It's not a "single header library" for a single purpose, instead, AsmJit's goals are to provide a foundation that can be used to create tools on top of it, and most importantly, it tries to share this foundation across all backends.

Most built-in features that AsmJit provides are using that foundation. For example AsmJit's Logger is nothing more than a formatter that queries AsmJit's instruction database. AsmJit's Builder concept is nothing more than storing instruction representation in nodes, and finally AsmJit's Compiler is nothing more than Builder enhanced with register allocation. Basically, with AsmJit you can choose between different code generation abstractions and pick the one that is the most suitable for your job. And if picked wrongly at the beginning you can still change that later with a little effort as all the interfaces are pretty similar.

The foundation for all of these tools is the "compressed" instruction database that AsmJit provides, together with the structs that define instructions and their operands. Instructions and operands can be created dynamically, processed, transformed, introspected, and passed to emitters such as Assembler, Builder, and Compiler.
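
The layering described above (Assembler emits directly, Builder stores instruction nodes, Compiler is Builder plus register allocation) can be mimicked with a toy model. The classes below only borrow the names for readability; they are hypothetical Python stand-ins, not AsmJit's C++ API, and a fixed `reg_map` dictionary replaces real register allocation.

```python
from typing import Dict, List, Tuple


class Assembler:
    """Emits each instruction immediately (here: as a line of text)."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def emit(self, op: str, *operands: str) -> None:
        self.lines.append(f"{op} {', '.join(operands)}".strip())


class Builder:
    """Stores instructions as nodes so they can be inspected or rewritten."""

    def __init__(self) -> None:
        self.nodes: List[Tuple[str, Tuple[str, ...]]] = []

    def emit(self, op: str, *operands: str) -> None:
        self.nodes.append((op, operands))

    def finalize(self, asm: Assembler) -> None:
        for op, operands in self.nodes:
            asm.emit(op, *operands)


class Compiler(Builder):
    """Builder plus a step that maps virtual registers to physical ones."""

    def __init__(self, reg_map: Dict[str, str]) -> None:
        super().__init__()
        self.reg_map = reg_map  # stand-in for a real register allocator

    def finalize(self, asm: Assembler) -> None:
        for op, operands in self.nodes:
            asm.emit(op, *(self.reg_map.get(o, o) for o in operands))
```

Because all three expose the same `emit` interface in this sketch, code written against one can be moved to another with little effort, which mirrors the point about switching abstractions later.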

I personally mostly use AsmJit's Compiler tool for code generation at the moment, because I got tired of allocating registers manually. Another feature I value is AsmJit's logging and error handling. From my experience, it's only a matter of time before you hit a crash or some other issue in JIT code, and logging and error handling are the first aid: they give you an overview of the generated code.

BTW I have big plans for AsmJit and many ideas about improving it further. My only wish is that all the big companies that actually use it commercially hit the "sponsor" button on GitHub :)

@lin72h

lin72h commented Feb 12, 2024

@kobalicek, thank you for the explanation. It makes me want to learn more about asmjit, and I'm looking forward to hearing about your ambitious future plans. Big companies should sponsor it and use it as a core technology in their stack.

jhogberg self-assigned this Feb 19, 2024
@jhogberg
Contributor

Thanks for raising this issue! We're unlikely to do this for the reasons @kobalicek pointed out, and adding to that, the gains from experimenting with this approach in the past were quite small. It's just not worth the effort, I'm afraid. :-(


6 participants