NumPy's types aren't doing much here besides eating CPU time
and providing simple wraparound, which a plain modulo can do instead.
This commit also unrolls a couple of loops.
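
For illustration only, a minimal sketch of the modulo-based wraparound
on plain Python ints. The 32-bit width and the helper names add_u32 and
mul_u32 are assumptions made for this example, not taken from the
changed code:

```python
# Hypothetical helpers: emulate unsigned 32-bit wraparound with plain ints.
# The 32-bit width is an assumption; use the width the real code needs.
MOD32 = 2**32


def add_u32(a: int, b: int) -> int:
    """Add two ints and wrap the result into the range [0, 2**32)."""
    return (a + b) % MOD32


def mul_u32(a: int, b: int) -> int:
    """Multiply two ints and wrap the result into the range [0, 2**32)."""
    return (a * b) % MOD32


if __name__ == "__main__":
    print(add_u32(0xFFFFFFFF, 1))  # wraps past the 32-bit maximum
    print(mul_u32(0xFFFFFFFF, 2))
```

Python ints are arbitrary precision, so the intermediate sum or product
never overflows; the modulo alone brings the value back into range.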