For various reasons, I don't want to make the block size larger than 128.
At the same time, it looks like the unpack function could use 256-bit registers to reduce the number of loads, stores, and instructions.
Am I wrong? Or does this idea not provide a speedup?
Or was that simply not the goal?
Maybe it's not a good idea to mix different register widths; I'm not sure.
But at least for blocks with an even bit width, it should be possible to use only 256-bit instructions/registers.
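To make the load count concrete, here is a rough sketch of the arithmetic behind the question. It assumes a block of 128 integers packed tightly at a fixed bit width, so one block occupies 128 * bit_width / 8 bytes. The names `BLOCK_VALUES`, `packed_block_bytes`, `loads_128_only`, `loads_prefer_256`, and `load_plan` are hypothetical and not taken from the library. It only counts loads; it says nothing about whether the library's data layout allows 256-bit unpacking or whether that would actually be faster.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one block holds 128 integers, as in the question above. */
#define BLOCK_VALUES 128u

/* Bytes occupied by one tightly packed block at the given bit width (1..32). */
size_t packed_block_bytes(unsigned bit_width) {
    assert(bit_width >= 1 && bit_width <= 32);
    return (size_t)BLOCK_VALUES * bit_width / 8;
}

/* Number of loads when the block is read only with 128-bit (16-byte) registers. */
size_t loads_128_only(unsigned bit_width) {
    return packed_block_bytes(bit_width) / 16;
}

/* Hypothetical load plan: how many 256-bit and 128-bit loads cover the block. */
typedef struct {
    size_t wide_256;   /* 32-byte loads */
    size_t narrow_128; /* leftover 16-byte loads (0 or 1) */
} load_plan;

/* Use 256-bit loads wherever possible and a 128-bit load for any remainder. */
load_plan loads_prefer_256(unsigned bit_width) {
    size_t bytes = packed_block_bytes(bit_width);
    load_plan plan;
    plan.wide_256 = bytes / 32;
    plan.narrow_128 = (bytes % 32) / 16;
    return plan;
}
```

Under these assumptions, an even bit width needs only 256-bit loads (half as many as the 128-bit version), while an odd bit width leaves one 16-byte remainder, which is the case where register widths would have to be mixed.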
This discussion was converted from issue #30 on July 13, 2023 18:28.