by rayneolivetti:
Chapter 10 of the Intel 64 and IA-32 Architectures Optimization Reference Manual
describes use cases for several SSE4 instructions that could speed up various routines
in the strings, bytes, strconv and runtime packages.
By the way, these instructions are currently missing from 6l/6a.
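
For illustration only, here is a minimal sketch of the kind of scalar loop this is about. The function name indexAnyByte is hypothetical and not taken from the standard library; it just mirrors the byte-set search that PCMPESTRI performs in its "equal any" mode over 16 bytes at a time.

```go
package main

// indexAnyByte is a hypothetical scalar baseline, not a standard library
// function. It returns the index of the first byte in s that also appears
// in set, or -1 if no byte of s is in set.
//
// An SSE4.2 version could load set into one XMM register and scan s in
// 16-byte chunks with PCMPESTRI ("equal any" aggregation) instead of
// comparing one byte at a time.
func indexAnyByte(s, set []byte) int {
	for i, c := range s {
		for _, m := range set {
			if c == m {
				return i
			}
		}
	}
	return -1
}
```

Calling indexAnyByte([]byte("hello, world"), []byte(";,")) returns 5, the position of the comma. An assembly version would only be possible once 6a and 6l accept the instruction.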