Description
Feature or enhancement
Proposal:
To maximize traces executed, we should also trace from `RESUME` in the JIT.
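For context, `RESUME` is the instruction at the start of a Python function body (CPython 3.11+), while loops end in a `JUMP_BACKWARD`. The sketch below uses the standard `dis` module to show that a recursive function has no backward jump of its own, so a JIT that only starts tracing at backward jumps has nowhere to start inside it. The helper names `fib`, `count_up`, and `opnames` are illustrative and not part of the branch.

```python
import dis
import sys


def fib(n):
    # Recursive, so there is no loop and no backward jump in this body.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def count_up(n):
    # A plain for-loop, which compiles to a backward jump on CPython 3.11+.
    total = 0
    for i in range(n):
        total += i
    return total


def opnames(func):
    """Return the names of the bytecode instructions in func."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    print(sys.version_info[:2])
    for func in (fib, count_up):
        names = opnames(func)
        # Report whether each candidate trace entry point is present.
        print(func.__name__,
              "RESUME" in names,
              "JUMP_BACKWARD" in names)
```

Exact opcode names vary between CPython versions, so treat the output as version-dependent.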
Some things I learnt along the way of implementing this:
- We want a higher warmup threshold for function entry than for `JUMP_BACKWARD`. Thanks to correspondence from CF Bolz-Tereick, I learnt that PyPy sets function warmup to be 60% higher than loop warmup. IIRC, LuaJIT sets it to 2x. I chose 2x. This number could use further investigation.
- We want to disable the check for recursion, and let the `first_instr == instr` check and the trace stack overflow/underflow/out-of-space checks handle it. This will allow us to automatically transform recursive functions into an iterative-like form. For example, this is recursive fibonacci on my branch (look at how nice it is!):
- We want to avoid compiling short trunk/root traces that don't end in a loop. The idea is that entering/exiting JIT code is expensive. I chose a minimum trace length of 100 based on some benchmarking I did. This number could use further investigation. This doesn't apply to side/branch traces: they are entered from other jitted code, so there is no penalty for entering them (other than a jump).
- We need a significantly more aggressive exponential backoff for function entry than our current scheme provides. The cost of a function entry optimization attempt is high enough that it shows up as a 6% slowdown in `bm_coroutines`. Thus I'm only making it try-once for now.
- We need to trace into function executors too, to avoid shortening the length of loop traces.
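The first, third, and fourth points can be summarized as a toy policy model. This is only a sketch of the heuristics as described here, not the code on the branch: `EntryPoint`, `keep_trace`, and `LOOP_WARMUP = 16` are hypothetical, and only the 2x ratio, the length of 100, and the try-once rule come from the list above.

```python
from dataclasses import dataclass

# Hypothetical numbers for illustration; only the 2x ratio and the length
# of 100 come from the discussion above.
LOOP_WARMUP = 16                      # assumed, not CPython's real threshold
FUNCTION_WARMUP = 2 * LOOP_WARMUP     # function entry warms up 2x slower
MIN_ROOT_TRACE_LENGTH = 100


@dataclass
class EntryPoint:
    """Toy model of one place where tracing may start."""
    kind: str                 # "loop" (JUMP_BACKWARD) or "function" (RESUME)
    counter: int = 0
    gave_up: bool = False

    def threshold(self) -> int:
        return FUNCTION_WARMUP if self.kind == "function" else LOOP_WARMUP

    def should_trace(self) -> bool:
        """Count one execution; return True once the entry point is hot."""
        if self.gave_up:
            return False
        self.counter += 1
        return self.counter >= self.threshold()

    def record_failure(self) -> None:
        """Function entries get one attempt; loops restart warmup.

        Restarting is a simplification of the existing backoff scheme.
        """
        if self.kind == "function":
            self.gave_up = True
        else:
            self.counter = 0


def keep_trace(length: int, closes_loop: bool, is_side_trace: bool) -> bool:
    """Decide whether a finished trace is worth compiling."""
    if is_side_trace or closes_loop:
        # Side traces are entered from jitted code; loops amortize entry cost.
        return True
    return length >= MIN_ROOT_TRACE_LENGTH
```

For example, under these assumptions `keep_trace(40, closes_loop=False, is_side_trace=False)` rejects a short root trace, while the same length is accepted for a side trace or a trace that closes a loop.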
Why am I bundling all these in a single PR instead of separating them out for benchmarking? Well, if any one of the above is missing, `RESUME` tracing becomes ineffective in some pathological cases, and may cause an overall slowdown! We need all of them at the same time for speedups to show!
Preliminary benchmarks on my computer: https://gist.github.com/Fidget-Spinner/8a972a8bcac52d0cf25249564e12d762
All of this is partly thanks to Mark's implementation of graphviz executors. Without them I would never have learnt these things. I figured these points out by manually inspecting the traces of many benchmarks.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response