Tracing JITs and modern CPUs: double trouble, or, a problem shared is a problem halved? #5
Lately while hacking Snabb Switch I am spending a lot of time getting familiar with two mysterious technologies: trace-based just-in-time compilers and the latest Intel CPU microarchitectures.
Each one is complex enough to make your head hurt. Is it madness to have to contend with both at the same time? Maybe. However, I am starting to see symmetry and to enjoy thinking about them in combination rather than in isolation.
Tracing just-in-time compilers work by creating chunks of code ("traces") with peculiar characteristics (slightly simplified):
CPUs can execute code blindingly fast while it is "on trace": that is, when you can keep the CPU running on one such block of code for a significant amount of time e.g. 100 nanoseconds. The trace compiler can make a whole new class of optimizations because it knows exactly which instructions will execute and exactly how control will flow.
Code runs slower when it does not stay on-trace. This extremely specialized code generation is less effective when several traces have to be patched together. So there is a major benefit to be had from keeping the trace compiler happy -- and a penalty to be paid when you do something to piss it off.
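As a rough sketch of what a trace looks like (my own toy illustration in C, not code from the post): the compiler emits one guard up front, then specialized straight-line code, and leaves through a "side exit" back to slower generic code when the guard fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a trace (hypothetical, for illustration only):
   a dynamically typed "add" that the JIT has only ever observed
   on integers. The trace checks that assumption once (the guard)
   and then runs specialized straight-line code with no dispatch. */
typedef struct { bool is_int; int64_t i; double d; } value;

/* Returns false to model a side exit: control leaves the trace
   and falls back to the slower generic path. */
static bool traced_add(const value *a, const value *b, value *out) {
    if (!a->is_int || !b->is_int)
        return false;            /* guard failed: leave the trace */
    out->is_int = true;
    out->i = a->i + b->i;        /* specialized hot path */
    return true;
}
```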
I want to have a really strong mental model of how code is compiled to traces. I am slowly getting there: I have even caught myself writing C code as if it were going to be trace compiled (which frankly would be very handy). However, this is a long journey, and in the meantime some of the optimization techniques are really surprising.
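The habit looks something like this (a hypothetical sketch, not code from Snabb Switch): keep the inner loop's body free of data-dependent control flow so there is one dominant path, which is exactly the shape a trace compiler wants to record.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "trace-style" C: the hot loop has a single
   straight-line body with no branches, so both a tracing JIT
   and the CPU's branch predictor see one dominant path. */
static uint64_t sum_bytes(const uint8_t *buf, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];           /* no data-dependent branches */
    return sum;
}
```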
Consider these optimization tips:
Extreme, right? I mean, what is the point of having an
On the face of it you might think that tracing JITs are an anomaly that will soon disappear, like programming in a straitjacket. Then you would go back to your favourite static compiler or method-based JIT and use all the loops and branches that you damned well please.
Here is the rub: modern CPUs also have a long do-and-don't list for maximizing performance at the machine-code level. This sounds bad, because if you are already stretching your brain to make the JIT happy then the last thing you want is another set of complex rules to follow. However, in practice the demands of the JIT and the CPU seem to be surprisingly well aligned, and thinking about satisfying one actually helps you to satisfy the other.
Here are a few rules from the Intel Optimization Reference Manual for Haswell that seem to be on point:
There is even a hardware trace cache in the CPU that attempts to do some of the same optimizations as a software tracing JIT to improve performance.
So what does it all mean? I don't know for sure yet but I am really enjoying thinking it through.
I like to think that effort spent on making the JIT happy is also making the CPU happy. Then with a happy CPU we can better reap the benefits of mechanical sympathy and achieve seemingly impossible performance for more applications. Sure, a trace compiler takes some effort to please, but it is a lot more helpful and transparent than dealing with the CPU directly.
In any case tracing JITs and modern CPU microarchitectures are both extremely interesting technologies and the study of one does stimulate a lot of interesting ideas about the other.
Just for your intellectual curiosity, I'd suggest looking at a different approach: a new CPU architecture, in particular the Mill CPU architecture. I strongly recommend the talks about the belt, memory, prediction, metadata and pipelining.
They combine a statically scheduled CPU with a 'sufficiently smart compiler' to try to get 'on trace' performance on general-purpose code.
P.S.: I'm not associated with the Mill CPU in any way, just interested.
One of the worst things that can happen in a CPU, performance-wise, is a branch misprediction. When that happens, the pipeline has to be thrown away and refilled from scratch.
As @chuesler said, frequent branch mispredictions are very expensive. That is a very good observation that is valid well outside the field of tracing JITs. Avoiding (hard to predict) branches in inner loops and either making them unnecessary or replacing them with predicated instructions can yield enormous performance improvements.
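A minimal illustration of that replacement (my sketch, not from the comment): instead of branching on the comparison, feed its 0/1 result straight into the arithmetic, which compilers typically lower to a plain add or conditional move rather than a jump.

```c
#include <stddef.h>

/* Branchy version: one hard-to-predict branch per element
   when the data is random. */
static size_t count_below_branchy(const int *a, size_t n, int t) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] < t)
            c++;
    return c;
}

/* Branchless version: the comparison result (0 or 1) is added
   directly, so the loop body has no data-dependent branch. */
static size_t count_below(const int *a, size_t n, int t) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (a[i] < t);
    return c;
}
```

Both functions compute the same count; on random input the branchless one avoids the misprediction penalty entirely.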
You can easily see this with quicksort: during partitioning, in the theoretically ideal case (picking the median as pivot), each item has a 50-50 chance of going "left" or "right", which is the branch predictor's worst nightmare. Thus, you can make quicksort faster by intentionally choosing a bad pivot. I experimented with this (sorting a random distribution of 0..n-1 so that picking the perfect pivot is free) at https://github.com/lorenzhs/quicksort-pivot-imbalance . However, choosing a bad pivot intentionally is not the conclusion; instead, for certain kinds of data, you can do much better even in comparison-based sorting using (for example) SuperScalarSampleSort.
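The classification trick behind such algorithms can be sketched branch-free like this (my own illustration, not the linked code): write each element unconditionally to both output buffers and advance the cursors by the 0/1 comparison result, so partitioning involves no conditional jump at all.

```c
#include <stddef.h>

/* Branch-free two-way partition sketch: every element is stored
   to both buffers, but only one cursor advances. The comparison
   result drives arithmetic, not a branch, so random data causes
   no mispredictions. `lo` and `hi` must each hold n elements. */
static size_t partition_branchless(const int *in, size_t n, int pivot,
                                   int *lo, int *hi) {
    size_t l = 0, h = 0;
    for (size_t i = 0; i < n; i++) {
        int ge = in[i] >= pivot;   /* 0 or 1, no jump */
        lo[l] = in[i];
        hi[h] = in[i];
        l += 1 - ge;
        h += ge;
    }
    return l;                      /* count of elements < pivot */
}
```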
Indeed, tracing JITs are more intriguing than method JITs (like eBPF's, which I study these days). For tracing JITs I found myself starting off with LuaJIT and going through the DynASM implementation. A relevant link is this small discussion, which also covers loops: http://www.freelists.org/post/luajit/How-does-LuaJITs-trace-compiler-work,1 . The goals of a JIT, whether tracing or method-based, are the same as those of a static compiler, hence they follow the suggested optimizations for the underlying architecture.
@orlp I am happy to focus on Intel CPUs now. It is a really exciting time. They are increasing the number of cores, increasing the power of each core with more execution units and exponentially more SIMD bandwidth, and adding exotic features like hardware transactional memory. Their documentation is excellent too. I can't wait for Skylake CPUs with 512-bit SIMD registers to hit the market later this year.