Primarily stresses the NEON VABA instruction with all bits set in its
arguments, which seems to be the most power-hungry one. Additionally
stresses branches and unaligned LDR instructions with all the bells and
whistles in use (conditional execution, shifted index and base register
writeback).

Both 32-bit and 64-bit variants are implemented. The 32-bit variant is
slightly more power-hungry because ARM64 does not support conditional
execution or base register writeback for this particular variant of the
LDR instruction.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>