Some critical pieces of the runtime package make extensive use of atomic operations (e.g.
runtime.lock/unlock, semaphores, the scheduler, the memory allocator, the GC). It may make
sense to add compiler intrinsics for atomic memory accesses (ATOMIC_LOAD/STORE/XADD/XCHG/CAS),
similar to the PREFETCH intrinsic. The intrinsics should generate inline code and should not
disturb the register allocator as badly as function calls do.
The implementation details are to be discussed.
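
To make the five operations listed above concrete, here is a minimal sketch that uses the user-level sync/atomic functions as stand-ins. This is only an illustration of the semantics: the intrinsics proposed here are for the runtime's own use, their exact form is not specified above, and the helper name atomicOps is hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// atomicOps runs each of the five operations once on a single word.
// The name and the return layout are hypothetical, chosen for illustration.
func atomicOps() (loaded, added, old uint32, swapped bool, final uint32) {
	var v uint32

	// ATOMIC_STORE: publish a value.
	atomic.StoreUint32(&v, 10)

	// ATOMIC_LOAD: read the value back.
	loaded = atomic.LoadUint32(&v)

	// ATOMIC_XADD: add a delta. Note that sync/atomic returns the NEW value,
	// whereas the x86 XADD instruction itself yields the old one.
	added = atomic.AddUint32(&v, 5)

	// ATOMIC_XCHG: store a new value and get the previous one.
	old = atomic.SwapUint32(&v, 100)

	// ATOMIC_CAS: replace 100 with 7 only if the word still holds 100.
	swapped = atomic.CompareAndSwapUint32(&v, 100, 7)

	final = atomic.LoadUint32(&v)
	return
}

func main() {
	loaded, added, old, swapped, final := atomicOps()
	fmt.Println(loaded, added, old, swapped, final)
}
```

Each of these calls currently has function-call shape at the source level; the point of the proposal is that the compiler would emit the corresponding instruction sequence inline instead.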