The big difference here is that OS X uses a small-code PIC code model, which doesn't place code in the low 4G of the address space. So we need to switch everything to be PC-relative, and do some shenanigans to try to get a code allocation near our .text segment. In particular, since the allocator just starts at the binary and works upwards, we need to make sure that we allocate the code buffer *before* we do the 4G map for the image's address space.
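A minimal sketch of the ordering described above, assuming a 64-bit POSIX host and plain mmap() for both mappings. The names fits_rel32, alloc_code_buffer, reserve_guest_space and setup_address_space are hypothetical, not functions from this tree. It only illustrates "code buffer first, 4G reservation second" plus a check that the buffer is reachable with a signed 32-bit PC-relative displacement.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON under strict -std= modes on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* True if a signed 32-bit PC-relative displacement at `from` can reach `to`.
 * Assumes a 64-bit uintptr_t. */
bool fits_rel32(uintptr_t from, uintptr_t to)
{
    int64_t delta = (int64_t)(to - from);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Hypothetical helper: map the translated-code buffer. It is mapped
 * read/write only; making it executable is outside this sketch. */
void *alloc_code_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: reserve 4G of address space for the guest image
 * without committing any memory. */
void *reserve_guest_space(void)
{
    size_t len = (size_t)4 << 30;
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Allocate the code buffer BEFORE the 4G reservation, so an allocator that
 * works upwards from the binary has not yet been pushed 4G away.
 * Returns true only if both mappings succeed and the code buffer is within
 * rel32 range of this function (used here as a stand-in for .text). */
bool setup_address_space(void **code_buf, size_t code_len, void **guest_base)
{
    assert(code_len > 0);
    *code_buf = alloc_code_buffer(code_len);  /* step 1: near .text, we hope */
    if (*code_buf == NULL)
        return false;
    *guest_base = reserve_guest_space();      /* step 2: the big mapping */
    if (*guest_base == NULL)
        return false;
    return fits_rel32((uintptr_t)&setup_address_space, (uintptr_t)*code_buf);
}
```

Whether the first mapping actually lands near .text depends on the host's allocator. On a system that places anonymous mappings elsewhere, setup_address_space would return false and the caller would need another strategy.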
Generate the PC-checking prologue for every frag, but store a pointer to the code immediately after the check as the frag's code buffer in the frag cache, so that we skip the check most of the time. Give bt_translate_and_run an additional 'exact' parameter, which indicates whether it should chain to the start of a cfrag or to the PC-checking prologue. Add a new bt_continue_ic, which is like bt_continue_chain but unsets the 'exact' flag.
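A rough model of the two entry points, not the real data structures: struct cfrag's layout and the names frag_cache_entry, chain_target and enter_frag are assumptions made for illustration. It also assumes 'exact' means the caller already knows the target PC matches, so it can enter past the check, while a non-exact caller (such as an inline cache) enters at the prologue and gets the PC verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment layout: [PC check][translated body]. */
struct cfrag {
    uint32_t guest_pc;     /* guest PC this frag was translated for */
    const uint8_t *code;   /* start of generated code, i.e. the PC check */
    size_t prologue_len;   /* size in bytes of the PC check */
};

/* What the frag cache stores: the address just past the PC check. */
const uint8_t *frag_cache_entry(const struct cfrag *f)
{
    assert(f != NULL);
    return f->code + f->prologue_len;
}

/* Pick the address to chain to. Exact chains skip the check;
 * non-exact chains enter at the prologue. */
const uint8_t *chain_target(const struct cfrag *f, bool exact)
{
    return exact ? frag_cache_entry(f) : f->code;
}

enum enter_result { RAN_BODY, PC_MISMATCH };

/* Models what happens on entry: only the non-exact path runs the check,
 * and a failed check falls back to a normal lookup. */
enum enter_result enter_frag(const struct cfrag *f, bool exact,
                             uint32_t actual_pc)
{
    if (!exact && f->guest_pc != actual_pc)
        return PC_MISMATCH;
    return RAN_BODY;
}
```

The prologue costs a few bytes per frag, but it lets a guessed target be used safely: a wrong guess is caught by the check instead of running the wrong code.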
memory as we generate code.
arithmetic instructions. Everything else is still emulated.