tracing jit #220

Open
rurban opened this Issue Nov 11, 2016 · 1 comment


rurban commented Nov 11, 2016

The jit will be a tracing jit, not a method jit.
A tracing jit is slower, much more complex and needs more profiling state,
but it needs much less memory, especially on such dynamic apps with lots of dead code that is never executed.
We want to trace calls and loops, similar to v8. For us memory is more important than performance.
A perl5 jit does not have many benefits, as the ops are way too dynamic, so we mostly just win on the icache, by having the op calls laid out one after another in memory instead of jumping to random heap locations. With more and more type information a real jit that goes into the ops would become worthwhile, e.g. for typed native arrays or native arithmetic.
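
A minimal sketch of the kind of profiling state such call and loop tracing needs, similar to LuaJIT-style hotcounts. None of this is cperl code; the names jit_hotcount, JIT_HOT_LOOP, JIT_HOT_CALL and jit_record_trace are hypothetical:

#include <stdint.h>

/* Hypothetical thresholds: how often a loop back-edge or a sub call
   must be hit before we start recording a trace. */
#define JIT_HOT_LOOP  56
#define JIT_HOT_CALL  112

typedef struct {
    uint16_t loop;   /* decremented on every backward branch */
    uint16_t call;   /* decremented on every entersub */
} jit_hotcount;

/* Hypothetical hook: start recording ops from this point on. */
extern void jit_record_trace(void *start_op);

/* Called from the runloop on a backward branch; when the counter
   underflows, the loop is hot and we begin recording a trace. */
static void jit_hot_backedge(jit_hotcount *hc, void *target_op)
{
    if (--hc->loop == 0) {
        hc->loop = JIT_HOT_LOOP;
        jit_record_trace(target_op);
    }
}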

dynasm is currently the easiest, as it allows more archs, and moar already uses it. But you have to write your insns manually, not abstractly as in libjit or asmjit; on the other hand it supports other, more important abstractions, like types and slots, ...
One nice thing would be to replace the dynasm.lua preprocessor with a simple perl script; this would need at most 2 days.
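
For illustration only, a rough sketch of what hand-written DynASM instructions look like; this .dasc fragment is not from cperl, and the function name and register choices are made up. The lines starting with "|" are what the dynasm.lua preprocessor (or a replacement perl script) translates into dasm_put() calls against the generated action list:

/* jit_add.dasc -- must be run through the dynasm preprocessor first */
#include "dasm_proto.h"
#include "dasm_x86.h"

|.arch x64
|.actionlist jit_actions

/* Emit code for "return arg0 + arg1" by hand: every instruction is
   written out explicitly, unlike the abstract IR of libjit or asmjit. */
static void jit_emit_add(dasm_State **Dst)
{
    | mov rax, rdi   // first argument (SysV x64 ABI)
    | add rax, rsi   // second argument
    | ret
}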

See e.g. https://github.com/imasahiro/rujit/ for a memory-hungry tracing jit that is 2-3x faster.
We will always have the fallback to use the huge llvm jit, but I'm sceptical that it's fast enough with its overhead. We'll test it first, as libcperl.bc can be imported and used for LTO and inlining.
The experimental Guile tracing jit nash looks better: https://github.com/8c6794b6/guile-tjit-documentation/blob/master/nash.pdf

We also need the jit for the ffi, so we can omit libffi and just go with the jit.
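
For comparison, this is roughly what a generic libffi call looks like today, with argument types and values marshalled through descriptor arrays on every call; a jit would instead emit a direct native call with the arguments already in registers. The target function strlen is only an example:

#include <ffi.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    ffi_cif cif;
    ffi_type *argtypes[] = { &ffi_type_pointer };
    const char *s        = "hello";
    void *argvalues[]    = { &s };
    ffi_arg result;

    /* describe the signature of size_t strlen(const char *) */
    if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 1,
                     &ffi_type_ulong, argtypes) == FFI_OK) {
        ffi_call(&cif, FFI_FN(strlen), &result, argvalues);
        printf("strlen = %lu\n", (unsigned long)result);
    }
    return 0;
}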

@rurban rurban self-assigned this Nov 11, 2016

@rurban rurban added the enhancement label Nov 11, 2016

@rurban rurban changed the title from jit to tracing jit Nov 11, 2016

rurban pushed a commit that referenced this issue Nov 30, 2016

Reini Urban
use utf8 Script - declare unicode mixed script confusables
Document the new unicode mixed script confusable security
restriction. Declare valid unicode scripts via use utf8 arguments.
This bug was introduced with 5.16.

See #220.

rurban commented Jan 15, 2017

But first we will start with a very simple method jit in LLVM, to benchmark the cost/benefit ratio for the simple

PL_op = Perl_pp_enter(aTHX);
PL_op = Perl_pp_nextstate(aTHX);
...
PL_op = Perl_pp_leave(aTHX);

linearization, and do the simplest and easiest op optimizations first, especially for nextstate, which is currently the most costly op, mostly because of the unneeded stack reset on every single line. The jit knows the stack depth for most simple ops and can easily bypass that (#18). The jit also knows about locals and tainted vars.

Then we can start counting calls and loops, and switch between the jit and the bytecode runloop when beneficial. The question is whether the LLVM optimizer can inline the ops, or whether it needs their IR. E.g. unladen_swallow needed to compile a complete libpython.bc runtime, and still needed a huge and slow LLVM abstraction library to emit the IR.

See the feature/gh220-llvmjit branch.
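
A minimal sketch, using the LLVM C API, of how such a linearized block of pp calls could be emitted. This is not the code from the feature/gh220-llvmjit branch; the function name jitted_block is made up and the aTHX interpreter argument is left out for brevity:

#include <llvm-c/Core.h>

/* Emit a function that calls a fixed sequence of pp ops back to back,
   replacing the indirect PL_op->op_ppaddr dispatch of the runloop. */
static LLVMValueRef emit_linear_block(LLVMModuleRef mod)
{
    LLVMTypeRef opp   = LLVMPointerType(LLVMInt8Type(), 0); /* OP*, as i8* */
    LLVMTypeRef pp_ty = LLVMFunctionType(opp, NULL, 0, 0);  /* OP *(*)(void) */

    const char *ops[] = { "Perl_pp_enter", "Perl_pp_nextstate",
                          "Perl_pp_leave" };

    LLVMValueRef fn = LLVMAddFunction(mod, "jitted_block", pp_ty);
    LLVMBuilderRef b = LLVMCreateBuilder();
    LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));

    LLVMValueRef last = LLVMConstPointerNull(opp);
    for (unsigned i = 0; i < sizeof(ops) / sizeof(ops[0]); i++) {
        LLVMValueRef pp = LLVMAddFunction(mod, ops[i], pp_ty);
        last = LLVMBuildCall(b, pp, NULL, 0, "op");  /* PL_op = Perl_pp_xxx() */
    }
    LLVMBuildRet(b, last);   /* return the last PL_op */
    LLVMDisposeBuilder(b);
    return fn;
}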

@rurban rurban added this to To Do in cperl Dec 16, 2017
