You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There's currently almost no attempt to optimize performance for short inputs - e.g. tweaking the serialization factor or even the core algorithm to allow for better parallelizing cases in which we can fill entire blocks the way we would like to.
This would require scrutiny of the kernels, and no less importantly the launch configuration resolution code (both the general mechanism and the kernel-specific parts).
The text was updated successfully, but these errors were encountered:
There's currently almost no attempt to optimize performance for short inputs - e.g. tweaking the serialization factor or even the core algorithm to allow for better parallelizing cases in which we can fill entire blocks the way we would like to.
This would require scrutiny of the kernels, and no less importantly the launch configuration resolution code (both the general mechanism and the kernel-specific parts).
The text was updated successfully, but these errors were encountered: