I understand that expanding for the 128B case gives better performance, but since the 8B case is already handled, why hasn't 64B been implemented? For small sizes in particular, won't performance fall straight back to the base version?
Or did I miss something?
Thanks!