Micro-optimize pop_lsb() for 64bit case
On Intel, perhaps due to 'lea' instruction this way of zeroing the lsb of *b seems faster than a shift+negate. On perft (where any speed difference is magnified) I got a 6% speed up on my Intel i5 64bit. Suggested by Hongzhi Cheng. No functional change.