Description
This is just a placeholder to discuss the idea of implementing smaller kernels.
A lot of pointers already exist related to this:
- Stefan's WIP PR #242, "WIP - Split kernels and more"
- issue #12, about CUDA graphs
- issue #11, about CUDA streams
- issue #175, '"xxx" function interface: further separation of data access and calculations?', about the internal APIs
One of the main prerequisites for smaller kernels is that each ixxx/oxxx function and each ffv function must be able to take pointers to large buffers covering many events and do the indexing itself. This is discussed in #175 (comment), for instance. At present, only the ixxx/oxxx functions are able to find an event in the input array; their outputs, and all inputs and outputs of the ffv functions, refer in CUDA to a single event. This is the first thing that must change to allow smaller kernels.
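To make the target interface concrete, here is a minimal sketch in plain C++ of what "every function indexes into many-event buffers" could look like. All names (`ixxxLike`, `oxxxLike`, `ffvLike`, `computeAmplitudes`), the array-of-structures layout and the sizes `kNp4` and `kNw` are hypothetical, and the arithmetic is a placeholder rather than real physics. It is not the existing code, only an illustration of the data flow: each stage is a separate pass over all events, which in CUDA would become a separate kernel with one thread per event.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout constants, chosen only for this illustration.
constexpr std::size_t kNp4 = 4; // momentum components per event (E, px, py, pz)
constexpr std::size_t kNw = 6;  // components per wavefunction

// Stand-in for an ixxx-like function: it receives the buffers for ALL events
// and locates its own event in both the input and the output.
void ixxxLike(const double* allMomenta, double* allWfs, std::size_t ievt) {
  const double* p = allMomenta + ievt * kNp4; // this event's momentum
  double* w = allWfs + ievt * kNw;            // this event's wavefunction
  for (std::size_t i = 0; i < kNw; ++i) w[i] = p[i % kNp4]; // placeholder arithmetic
}

// Stand-in for an oxxx-like function, with the same many-event interface.
void oxxxLike(const double* allMomenta, double* allWfs, std::size_t ievt) {
  const double* p = allMomenta + ievt * kNp4;
  double* w = allWfs + ievt * kNw;
  for (std::size_t i = 0; i < kNw; ++i) w[i] = 2.0 * p[i % kNp4]; // placeholder arithmetic
}

// Stand-in for an ffv-like function: inputs and output are also many-event
// buffers, so it no longer depends on single-event data from a caller.
void ffvLike(const double* allWfsA, const double* allWfsB, double* allAmps, std::size_t ievt) {
  const double* a = allWfsA + ievt * kNw;
  const double* b = allWfsB + ievt * kNw;
  double sum = 0.0;
  for (std::size_t i = 0; i < kNw; ++i) sum += a[i] * b[i]; // placeholder arithmetic
  allAmps[ievt] = sum;
}

// Three "small kernels" run one after the other. Each loop over events plays
// the role of one CUDA kernel launch; intermediate results live in buffers
// that hold all events instead of in per-thread local variables.
std::vector<double> computeAmplitudes(const std::vector<double>& momenta, std::size_t nevt) {
  assert(momenta.size() == nevt * kNp4);
  std::vector<double> wfsA(nevt * kNw), wfsB(nevt * kNw), amps(nevt);
  for (std::size_t ievt = 0; ievt < nevt; ++ievt) ixxxLike(momenta.data(), wfsA.data(), ievt);
  for (std::size_t ievt = 0; ievt < nevt; ++ievt) oxxxLike(momenta.data(), wfsB.data(), ievt);
  for (std::size_t ievt = 0; ievt < nevt; ++ievt) ffvLike(wfsA.data(), wfsB.data(), amps.data(), ievt);
  return amps;
}
```

The point of the sketch is that the intermediate wavefunction buffers (`wfsA`, `wfsB`) are sized for all events, so the stages can be split without passing single-event data between them. A structure-of-arrays layout would only change the pointer arithmetic inside each function, not the interface.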