Description
The movw and movt instructions were introduced with Thumb-2 (ARMv6T2) and are available on every ARMv7 core; they exist to improve the performance of immediate generation.
An instruction sequence like
movw r0, #0xabcd
movt r0, #0xefff
can be retired in a single cycle by some of the more modern Cortex-A chips (both the 32- and 64-bit varieties).
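As a rough illustration of how a 32-bit immediate maps onto that pair, here is a minimal Go sketch. The helper name splitMovwMovt is hypothetical and is not part of the Go toolchain; it only shows the halfword split, not how the assembler would actually encode anything.

```go
package main

import "fmt"

// splitMovwMovt is a hypothetical helper: it returns the low halfword
// (the movw immediate) and the high halfword (the movt immediate) of imm.
func splitMovwMovt(imm uint32) (lo, hi uint16) {
	return uint16(imm), uint16(imm >> 16)
}

func main() {
	lo, hi := splitMovwMovt(0xefffabcd)
	fmt.Printf("movw r0, #0x%04x\n", lo) // low 16 bits; movw also zeroes the upper half
	fmt.Printf("movt r0, #0x%04x\n", hi) // high 16 bits; the lower half is left untouched
}
```

For 0xefffabcd this prints the same two instructions as the sequence above.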
The ARMv8 equivalent (for 0xb01dfacedebac1e0) would be
movz x0, #0xc1e0
movk x0, #0xdeba, lsl #16
movk x0, #0xface, lsl #32
movk x0, #0xb01d, lsl #48
and can be retired in as little as one or two cycles as well (but this time at a small code size penalty). Keep in mind that ARM loads, particularly in ARMv8, have result latencies of more than a single cycle, so this sequence of movs is likely faster even without the hardware optimization if the immediate is used... immediately.
I'd also expect a small but measurable reduction in L1 dcache misses, since building the immediate in registers instead of loading it from memory takes the dcache out of the equation entirely.
I'm not familiar enough with the toolchain to know whether or not obj would get grumpy if MOVW assembled to a different number of instructions depending on the immediate.
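To make the "different number of instructions" concern concrete, here is a minimal Go sketch of one way a 64-bit immediate could be expanded into a movz/movk sequence. The function movSequence and its policy of skipping zero halfwords are assumptions made for illustration; this is not how obj works today, and a real implementation would also want to consider movn and logical-immediate encodings.

```go
package main

import "fmt"

// movSequence is a hypothetical helper that returns a movz/movk sequence
// building imm in reg. Zero halfwords are skipped, so the length of the
// result depends on the immediate.
func movSequence(reg string, imm uint64) []string {
	var seq []string
	for shift := uint(0); shift < 64; shift += 16 {
		chunk := uint16(imm >> shift)
		if chunk == 0 {
			continue // movz already cleared these bits
		}
		op := "movk" // keep the halfwords set so far
		if len(seq) == 0 {
			op = "movz" // first instruction zeroes the rest of the register
		}
		ins := fmt.Sprintf("%s %s, #0x%04x", op, reg, chunk)
		if shift != 0 {
			ins += fmt.Sprintf(", lsl #%d", shift)
		}
		seq = append(seq, ins)
	}
	if len(seq) == 0 {
		// imm == 0 still needs one instruction.
		seq = append(seq, fmt.Sprintf("movz %s, #0x0", reg))
	}
	return seq
}

func main() {
	for _, ins := range movSequence("x0", 0xb01dfacedebac1e0) {
		fmt.Println(ins)
	}
}
```

With 0xb01dfacedebac1e0 this emits the four-instruction sequence shown earlier, while an immediate such as 0x10000 needs only a single movz, which is exactly the variable-length behavior the question is about.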