Full representation of launch configurations + launch-with-full-config support #564

Closed
eyalroz opened this issue Dec 31, 2023 · 0 comments

eyalroz commented Dec 31, 2023

Since CUDA 12, the driver finally supports a proper launch configuration object, with a bunch of flags and features:

CUresult cuLaunchKernelEx(const CUlaunchConfig* config, CUfunction f, void** kernelParams, void** extra);

with the launch config being:

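typedef struct CUlaunchConfig_st {
    /* Members are listed alphabetically, as in the driver API reference;
       this is not necessarily their declaration order in cuda.h. */
    CUlaunchAttribute *attrs;     /* array of numAttrs launch attributes */
    unsigned int blockDimX;
    unsigned int blockDimY;
    unsigned int blockDimZ;
    unsigned int gridDimX;
    unsigned int gridDimY;
    unsigned int gridDimZ;
    CUstream hStream;             /* stream to launch on */
    unsigned int numAttrs;        /* number of entries in attrs */
    unsigned int sharedMemBytes;  /* dynamic shared memory per block, in bytes */
} CUlaunchConfig;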

Each attribute has an ID and a value in a union, and here is the current list of IDs:

CU_LAUNCH_ATTRIBUTE_IGNORE
CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW
CU_LAUNCH_ATTRIBUTE_COOPERATIVE
CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY
CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION
CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE
CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION
CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_EVENT
CU_LAUNCH_ATTRIBUTE_PRIORITY
CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN_MAP
CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN
CU_LAUNCH_ATTRIBUTE_LAUNCH_COMPLETION_EVENT

Some of these concern launch-related or scheduling-related events (support for which should be tracked in a separate missing-feature issue).
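
To make the "ID plus a value in a union" layout concrete, here is a minimal sketch of marshalling a few high-level launch options into an attribute array. It deliberately does not include `cuda.h`, so that it compiles anywhere: `attr_id`, `attr_value`, `launch_attr`, `launch_options` and `marshal()` are all hypothetical stand-ins that only mimic the shape of `CUlaunchAttribute`, and they cover just three of the IDs listed above. Real code would use the driver's own types and hand the resulting array to `cuLaunchKernelEx()` via `attrs` and `numAttrs`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins mimicking the shape of the driver's attribute type;
// these are NOT the definitions from cuda.h.
enum class attr_id { cooperative, cluster_dimension, priority };

union attr_value {
    int cooperative;                          // non-zero: cooperative launch
    struct { unsigned x, y, z; } cluster_dim; // cluster size, in blocks
    int priority;
};

struct launch_attr {
    attr_id    id;
    attr_value value;
};

// High-level options, as a wrapper library might hold them (assumed layout).
struct launch_options {
    bool     cooperative;
    bool     has_cluster_dims;
    unsigned cluster_x, cluster_y, cluster_z;
    bool     has_priority;
    int      priority;
};

// Emit one attribute per option that is actually set; unset options
// contribute nothing, so the array stays as short as possible.
inline std::vector<launch_attr> marshal(const launch_options& opts)
{
    std::vector<launch_attr> attrs;
    if (opts.cooperative) {
        launch_attr a{};
        a.id = attr_id::cooperative;
        a.value.cooperative = 1;
        attrs.push_back(a);
    }
    if (opts.has_cluster_dims) {
        // A cluster with a zero-sized dimension makes no sense.
        assert(opts.cluster_x > 0 && opts.cluster_y > 0 && opts.cluster_z > 0);
        launch_attr a{};
        a.id = attr_id::cluster_dimension;
        a.value.cluster_dim.x = opts.cluster_x;
        a.value.cluster_dim.y = opts.cluster_y;
        a.value.cluster_dim.z = opts.cluster_z;
        attrs.push_back(a);
    }
    if (opts.has_priority) {
        launch_attr a{};
        a.id = attr_id::priority;
        a.value.priority = opts.priority;
        attrs.push_back(a);
    }
    return attrs;
}
```

With the real driver types, the vector's `data()` and `size()` would play the roles of `attrs` and `numAttrs`; the vector has to outlive the launch call, since the configuration only holds a pointer to it.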

eyalroz added this to the Full CUDA 12 Support milestone Dec 31, 2023
eyalroz added the task label Dec 31, 2023
eyalroz added a commit that referenced this issue Jan 29, 2024
eyalroz added a commit that referenced this issue Feb 3, 2024
…upport

* Now supporting CUDA-12-introduced launch attributes, including remote memory space, programmatic completion, launch completion events and clusters
* Avoiding a bit of code duplication in kernel launching (but with an increase in duplication due to the unavailability of `cuLaunchKernelEx()` with attribute support before CUDA 12)
* New multi-wrapper-impl file for launch configurations, for marshalling launch attributes
eyalroz added a commit that referenced this issue Feb 9, 2024
* launch config <-> device validation now checks for block cooperation support when that's requested
* Refactored and re-located some of the launch config validation code
* Added: `device_t` method for checking block cooperation support
* Now properly validating grid dimensions to ensure we don't exceed the maxima
* Made sure the code paths inwards from the non-detail_ launching functions to the actual CUDA API calls all have appropriate validation calls
* Comment and spacing tweaks
eyalroz added a commit that referenced this issue Feb 10, 2024
* launch config <-> device validation now checks for block cooperation support when that's requested
* Refactored and re-located some of the launch config validation code
* Added: `device_t` method for checking block cooperation support
* Now properly validating grid dimensions to ensure we don't exceed the maxima
* Made sure the code paths inwards from the non-detail_ launching functions to the actual CUDA API calls all have appropriate validation calls
* In the error_handling example - now making the faulty launch configuration device-specific, otherwise we don't apply the valid-block-size limit
* Comment and spacing tweaks
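
The validation steps listed in these commit messages can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's actual code: `device_limits`, `launch_config` and `validate()` are hypothetical names, and a real implementation would query the limits from the device rather than take them from a plain struct.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical description of what a device supports; real code would
// obtain these values from the driver instead.
struct device_limits {
    unsigned max_grid_dim[3];
    unsigned max_block_dim[3];
    unsigned max_threads_per_block;
    bool     supports_block_cooperation;
};

// Hypothetical launch configuration (not the library's own type).
struct launch_config {
    unsigned grid_dim[3];
    unsigned block_dim[3];
    bool     block_cooperation;
};

// Throws std::invalid_argument if the configuration cannot be launched
// on a device with the given limits; returns normally otherwise.
inline void validate(const launch_config& cfg, const device_limits& dev)
{
    unsigned long long threads_per_block = 1;
    for (int i = 0; i < 3; ++i) {
        if (cfg.grid_dim[i] == 0 || cfg.grid_dim[i] > dev.max_grid_dim[i])
            throw std::invalid_argument(
                "grid dimension " + std::to_string(i) + " out of range");
        if (cfg.block_dim[i] == 0 || cfg.block_dim[i] > dev.max_block_dim[i])
            throw std::invalid_argument(
                "block dimension " + std::to_string(i) + " out of range");
        threads_per_block *= cfg.block_dim[i];
    }
    // Each axis may be within bounds while the product is still too large.
    if (threads_per_block > dev.max_threads_per_block)
        throw std::invalid_argument("too many threads per block");
    if (cfg.block_cooperation && !dev.supports_block_cooperation)
        throw std::invalid_argument(
            "block cooperation requested but not supported by the device");
}
```

Calling such a check on every path from the public launching functions down to the raw API call, as the commit message describes, means a bad configuration is reported with a specific message instead of surfacing as a generic launch failure.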
eyalroz closed this as completed in 5b0e27f Mar 1, 2024