max-sanchez Fixes %8-shift endian issue, PTX ASM for rotr32
Fixed an endian-related issue in the %8-shift path that was causing rotations to be performed incorrectly. Also added optimized PTX assembly for rotr32. This has no meaningful effect on sm_32+, but sm_30 and below may benefit.
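
For reference, a minimal host-side sketch in plain C of what rotr32 is expected to compute. This is not the code from this commit and not the PTX version. It assumes "%8-shift" refers to rotation amounts that are a multiple of 8 bits, which can be done by moving whole bytes; rotr32_bytes is a hypothetical helper written only to show where byte order matters. Comparing it against the plain shift-based rotr32 is one way to catch an endian mistake of this kind.

```c
#include <stdint.h>

/* Reference rotate-right of a 32-bit word by n bits (n taken modulo 32). */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    /* The second mask keeps the left shift in range when n == 0. */
    return (x >> n) | (x << ((32u - n) & 31u));
}

/*
 * Hypothetical byte-granular variant (an assumption, not the commit's code):
 * when n is a multiple of 8, the rotation is a pure byte permutation.
 * b[0] is the LEAST significant byte; indexing from the other end is the
 * kind of byte-order slip that rotates in the wrong direction.
 */
static uint32_t rotr32_bytes(uint32_t x, unsigned n)
{
    uint8_t b[4];
    uint32_t r = 0;
    unsigned k = (n / 8u) & 3u;   /* number of whole bytes to rotate by */
    unsigned i;

    for (i = 0; i < 4; i++)
        b[i] = (uint8_t)(x >> (8u * i));

    /* Result byte i comes from source byte (i + k) mod 4. */
    for (i = 0; i < 4; i++)
        r |= (uint32_t)b[(i + k) & 3u] << (8u * i);

    return r;
}
```

With x = 0x11223344, both functions should agree for n = 0, 8, 16 and 24. A byte-order mix-up in the byte-wise path shows up as a mismatch for n = 8 and n = 24.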

Have yet to find a block, so this may still not work.
Latest commit c2fcc23 Mar 11, 2016