This is me scribbling down a pair of optimization notes:
The first is that, if we peel off the last iteration of gemm/convolution/... reduction loops where we'll need to do masked loads (e.g. because the K dimension isn't known to be evenly divisible by the reduction tile size), we won't need to mask any loads except in that peeled iteration. This costs us code size, but might be a win in perf - someone should go try it.
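To make the peeling idea concrete, here's a rough CPU-side sketch in plain C++ rather than actual compiler output. Everything in it is made up for illustration: `kTile`, `fma_tile_unmasked`, `fma_tile_masked`, and `dot_peeled` are hypothetical names, and the "masked load" is simulated by reading out-of-range lanes as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduction tile size along K (an assumption for this sketch).
constexpr std::size_t kTile = 4;

// Stand-in for an unmasked tile load + multiply-accumulate: no bounds checks.
inline float fma_tile_unmasked(const float* a, const float* b) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < kTile; ++i) acc += a[i] * b[i];
    return acc;
}

// Stand-in for a masked tile load: lanes at or beyond `valid` read as zero
// and the underlying memory is never touched for those lanes.
inline float fma_tile_masked(const float* a, const float* b, std::size_t valid) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < kTile; ++i) {
        const float av = (i < valid) ? a[i] : 0.0f;
        const float bv = (i < valid) ? b[i] : 0.0f;
        acc += av * bv;
    }
    return acc;
}

// Dot product over K with the last iteration peeled out of the loop.
// Only the peeled iteration pays for masking.
float dot_peeled(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t k = a.size();
    if (k == 0) return 0.0f;

    const std::size_t num_tiles = (k + kTile - 1) / kTile;  // ceil(k / kTile)
    float acc = 0.0f;

    // Main loop: every iteration except the last is a full tile.
    for (std::size_t t = 0; t + 1 < num_tiles; ++t) {
        acc += fma_tile_unmasked(a.data() + t * kTile, b.data() + t * kTile);
    }

    // Peeled last iteration: may be partial, so it uses the masked path.
    const std::size_t last = (num_tiles - 1) * kTile;
    acc += fma_tile_masked(a.data() + last, b.data() + last, k - last);
    return acc;
}
```

The loop body gets duplicated (that's the code size cost), but the steady-state iterations carry no mask computation at all.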
The second note is that, if the mask condition is a relatively clean function of the wave/workgroup/... IDs, we should consider emitting `if (__unlikely(needs masking)) { slow path } else { fast path }` fairly early during codegen. That costs us code size, but most parts of the kernel then don't have to pay for inefficient loads.
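Here's a similarly rough sketch of the early-branch idea, again simulated on the CPU. The workgroup is modeled as a plain loop index, `kWgTile`, `scale_tile`, and `scale_all` are hypothetical names, and `UNLIKELY` is a stand-in for whatever hint the real codegen would use (here it maps to `__builtin_expect` on GCC/Clang and to nothing elsewhere).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

// Hypothetical number of elements handled per workgroup (an assumption).
constexpr std::size_t kWgTile = 4;

// Simulates the work of one workgroup: dst[i] = s * src[i] over its tile.
// The masking decision depends only on the workgroup ID and the problem
// size, so it is made once up front instead of on every load.
void scale_tile(const std::vector<float>& src, std::vector<float>& dst,
                std::size_t wg_id, float s) {
    const std::size_t n = src.size();
    const std::size_t base = wg_id * kWgTile;
    const bool needs_masking = base + kWgTile > n;

    if (UNLIKELY(needs_masking)) {
        // Slow path: per-element bounds check, standing in for masked loads.
        for (std::size_t i = 0; i < kWgTile; ++i) {
            if (base + i < n) dst[base + i] = s * src[base + i];
        }
    } else {
        // Fast path: the whole tile is in bounds, no masking at all.
        for (std::size_t i = 0; i < kWgTile; ++i) {
            dst[base + i] = s * src[base + i];
        }
    }
}

// Launches one simulated workgroup per tile, rounding the tile count up.
std::vector<float> scale_all(const std::vector<float>& src, float s) {
    std::vector<float> dst(src.size(), 0.0f);
    const std::size_t num_wgs = (src.size() + kWgTile - 1) / kWgTile;
    for (std::size_t wg = 0; wg < num_wgs; ++wg) scale_tile(src, dst, wg, s);
    return dst;
}
```

With 6 elements and a tile of 4, only the second workgroup takes the slow path; the first one runs the unmasked body.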