Full representation of launch configurations + launch-with-full-config support #564

eyalroz · 2023-12-31T23:05:34Z

Since CUDA 12, the driver finally supports a proper launch configuration object, with a bunch of flags and features:

CUresult cuLaunchKernelEx(const CUlaunchConfig* config, CUfunction f, void** kernelParams, void** extra);

with the launch config being:

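typedef struct CUlaunchConfig_st {
    unsigned int gridDimX;        // grid width, in blocks
    unsigned int gridDimY;        // grid height, in blocks
    unsigned int gridDimZ;        // grid depth, in blocks
    unsigned int blockDimX;       // block X dimension, in threads
    unsigned int blockDimY;       // block Y dimension, in threads
    unsigned int blockDimZ;       // block Z dimension, in threads
    unsigned int sharedMemBytes;  // dynamic shared memory per block, in bytes
    CUstream hStream;             // stream on which to launch
    CUlaunchAttribute* attrs;     // array of launch attributes (may be NULL when numAttrs is 0)
    unsigned int numAttrs;        // number of elements in attrs
} CUlaunchConfig;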

Each attribute has an ID and a value in a union, and here is the current list of IDs:

CU_LAUNCH_ATTRIBUTE_IGNORE
CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW
CU_LAUNCH_ATTRIBUTE_COOPERATIVE
CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY
CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION
CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE
CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION
CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_EVENT
CU_LAUNCH_ATTRIBUTE_PRIORITY
CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN_MAP
CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN
CU_LAUNCH_ATTRIBUTE_LAUNCH_COMPLETION_EVENT

Some of these concern launch-related or scheduling-related events, support for which should be tracked in a separate missing-feature issue.
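
To make the "ID plus value in a union" layout concrete, here is a minimal sketch of how a wrapper might marshal a higher-level launch configuration into a flat attribute array. Everything in it is an assumption made for illustration: the `sketch` namespace, `attr_id`, `attr`, `launch_config` and `marshal()` are hypothetical stand-ins so that the example compiles without the CUDA toolkit; they are not the driver's types and not this library's actual API.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical, simplified stand-in for the driver's attribute ID enum
enum class attr_id { ignore, cooperative, cluster_dimension, priority };

// Hypothetical stand-in for an attribute: an ID plus a value held in a union
struct attr {
    attr_id id;
    union {
        int cooperative;                          // non-zero: cooperative launch
        struct { unsigned x, y, z; } cluster_dim; // cluster shape, in blocks
        int priority;
    } value;
};

struct dims { unsigned x, y, z; };

// Hypothetical high-level configuration, as a wrapper library might hold it
struct launch_config {
    dims grid  {1, 1, 1};
    dims block {1, 1, 1};
    bool block_cooperation {false};
    bool has_cluster {false};
    dims cluster {1, 1, 1};
};

// Emits one attribute per feature that was actually requested
std::vector<attr> marshal(const launch_config& cfg)
{
    std::vector<attr> attrs;
    if (cfg.block_cooperation) {
        attr a{};
        a.id = attr_id::cooperative;
        a.value.cooperative = 1;
        attrs.push_back(a);
    }
    if (cfg.has_cluster) {
        // A cluster with a zero-sized dimension makes no sense
        assert(cfg.cluster.x > 0 && cfg.cluster.y > 0 && cfg.cluster.z > 0);
        attr a{};
        a.id = attr_id::cluster_dimension;
        a.value.cluster_dim.x = cfg.cluster.x;
        a.value.cluster_dim.y = cfg.cluster.y;
        a.value.cluster_dim.z = cfg.cluster.z;
        attrs.push_back(a);
    }
    return attrs;
}

} // namespace sketch
```

With the real driver types, the resulting array's data pointer and size would go into the `attrs` and `numAttrs` fields of `CUlaunchConfig` before the call to `cuLaunchKernelEx()`, and the vector has to outlive that call.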

…upport

* Now supporting CUDA-12-introduced launch attributes, including remote memory space, programmatic completion, launch completion events and clusters
* Avoiding a bit of code duplication in kernel launching (but an increase in duplication due to the unavailability of `cuLaunchKernelEx()` with attribute support before CUDA 12)
* New multi-wrapper-impl file for launch configurations - for marshalling launch attributes

* launch config <-> device validation now checks for block cooperation support when that's requested
* Refactored and re-located some of the launch config validation code
* Added: `device_t` method for checking block cooperation support
* Now properly validating grid dimensions to ensure we don't exceed the maxima
* Made sure the code paths inwards from the non-detail_ launching functions to the actual CUDA API calls all have appropriate validation calls
* In the error_handling example - now making the faulty launch configuration device-specific, otherwise we don't apply the valid-block-size limit
* Comment and spacing tweaks
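
The kind of launch config <-> device validation described in this commit message could look roughly like the sketch below. It is a guess at the general shape, not the library's actual code: `validation_sketch`, `device_limits`, `launch_config`, `validate()` and `is_valid()` are hypothetical names, and in real code the limits would be queried from the device rather than passed in by hand.

```cpp
#include <stdexcept>

namespace validation_sketch {

struct dims { unsigned x, y, z; };

// Hypothetical snapshot of the device properties relevant to a launch
struct device_limits {
    dims max_grid;
    dims max_block;
    unsigned max_threads_per_block;
    bool supports_block_cooperation;
};

// Hypothetical launch configuration: just the fields being validated
struct launch_config {
    dims grid;
    dims block;
    bool block_cooperation;
};

// Throws std::invalid_argument on the first violated constraint
void validate(const launch_config& cfg, const device_limits& lim)
{
    auto has_zero = [](const dims& d) { return d.x == 0 || d.y == 0 || d.z == 0; };
    auto exceeds  = [](const dims& d, const dims& max) {
        return d.x > max.x || d.y > max.y || d.z > max.z;
    };

    if (has_zero(cfg.grid) || has_zero(cfg.block)) {
        throw std::invalid_argument("grid and block dimensions must be non-zero");
    }
    if (exceeds(cfg.grid, lim.max_grid)) {
        throw std::invalid_argument("grid dimensions exceed the device maxima");
    }
    if (exceeds(cfg.block, lim.max_block)) {
        throw std::invalid_argument("block dimensions exceed the device maxima");
    }
    // Widen before multiplying, so that the product cannot overflow
    unsigned long long threads = 1ull * cfg.block.x * cfg.block.y * cfg.block.z;
    if (threads > lim.max_threads_per_block) {
        throw std::invalid_argument("too many threads per block");
    }
    if (cfg.block_cooperation && !lim.supports_block_cooperation) {
        throw std::invalid_argument("device does not support block cooperation");
    }
}

// Non-throwing convenience wrapper around validate()
bool is_valid(const launch_config& cfg, const device_limits& lim)
{
    try { validate(cfg, lim); return true; }
    catch (const std::invalid_argument&) { return false; }
}

} // namespace validation_sketch
```

Running such a check on every path from the public launch functions down to the API call means a bad configuration fails with a descriptive message instead of a generic launch error.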

eyalroz added this to the Full CUDA 12 Support milestone Dec 31, 2023

eyalroz added the task label Dec 31, 2023

eyalroz added a commit that referenced this issue Jan 29, 2024

Regards #564: WIP - representation of launch config struct fields (launch config attributes)

860ea58

eyalroz added a commit that referenced this issue Jan 29, 2024

Regards #564: WIP - representation of launch config struct fields (launch config attributes)

d0dcaa9

eyalroz added a commit that referenced this issue Jan 29, 2024

Regards #564: WIP - representation of launch config struct fields (launch config attributes)

12ac341

eyalroz added a commit that referenced this issue Jan 29, 2024

Regards #564: WIP - representation of launch config struct fields (launch config attributes)

5f01d8c

eyalroz added the resolved-on-development label Feb 9, 2024

eyalroz closed this as completed in 5b0e27f Mar 1, 2024

eyalroz mentioned this issue Mar 16, 2024

Support setting kernel block cluster dimensions #484

Closed
