@neoblizz suggests that we implement persistent-thread (PT) versions of our operators. CUDA's support for persistent-thread programming is growing, cooperative groups already provides good synchronization capabilities, and we can reasonably expect this support to improve in future hardware and software. The specific benefit is kernel fusion: under a PT model, two operators can run back to back inside a single PT kernel instead of as separate kernel launches.
@neoblizz also notes that we should consider providing PT alternatives to our current operators, implemented at the device level. User programs would not need to change; instead, a command-line switch could select between the PT operator and the current non-PT operator.
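To make the proposal concrete, here is a minimal sketch of what a fused PT path next to the existing non-PT path might look like. It is not taken from our codebase: `op_a`, `op_b`, `fused_pt`, and `apply_ops` are hypothetical stand-ins for two operators and their dispatcher, and the `use_pt` boolean is assumed to be set from a command-line switch (for example a hypothetical `--pt` flag). The PT kernel launches only as many blocks as can be co-resident on the device, loops over the data with a grid stride, and uses a cooperative-groups grid-wide barrier where the non-PT path has a kernel boundary.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical non-PT operators: each one is its own kernel launch.
__global__ void op_a(const float* in, float* tmp, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) tmp[i] = 2.0f * in[i];
}
__global__ void op_b(const float* tmp, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = tmp[i] + tmp[(i + 1) % n];
}

// Hypothetical PT kernel: both operators fused into one launch.
__global__ void fused_pt(const float* in, float* tmp, float* out, int n) {
  cg::grid_group grid = cg::this_grid();
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = gridDim.x * blockDim.x;
  // Operator A, grid-stride loop over all elements.
  for (int i = tid; i < n; i += stride) tmp[i] = 2.0f * in[i];
  // Grid-wide barrier replaces the kernel boundary between A and B.
  grid.sync();
  // Operator B reads values that other blocks wrote before the barrier.
  for (int i = tid; i < n; i += stride) out[i] = tmp[i] + tmp[(i + 1) % n];
}

// Hypothetical dispatcher: use_pt would come from a command-line switch.
// Error checking is omitted for brevity.
std::vector<float> apply_ops(bool use_pt, const std::vector<float>& h_in) {
  int n = static_cast<int>(h_in.size());
  size_t bytes = n * sizeof(float);
  float *d_in, *d_tmp, *d_out;
  cudaMalloc(&d_in, bytes);
  cudaMalloc(&d_tmp, bytes);
  cudaMalloc(&d_out, bytes);
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  const int block = 256;
  int dev = 0, coop = 0, sms = 0, per_sm = 0;
  cudaGetDevice(&dev);
  cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);

  if (use_pt && coop) {
    // Size the grid so that every block is resident at the same time,
    // which grid.sync() requires.
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&per_sm, fused_pt, block, 0);
    void* args[] = {&d_in, &d_tmp, &d_out, &n};
    cudaLaunchCooperativeKernel((void*)fused_pt, dim3(sms * per_sm),
                                dim3(block), args, 0, 0);
  } else {
    // Current model (also the fallback when cooperative launch is unsupported).
    int grid = (n + block - 1) / block;
    op_a<<<grid, block>>>(d_in, d_tmp, n);
    op_b<<<grid, block>>>(d_tmp, d_out, n);
  }
  cudaDeviceSynchronize();

  std::vector<float> h_out(n);
  cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_tmp);
  cudaFree(d_out);
  return h_out;
}
```

With this shape, a caller only ever invokes `apply_ops`, so user programs stay unchanged and the switch alone picks the implementation. Both paths should produce the same output for the same input. Note that grid-wide synchronization needs a device that supports cooperative launch, and older CUDA toolkits may also require compiling with `-rdc=true`.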