-
Notifications
You must be signed in to change notification settings - Fork 808
[SYCL] Fix and enable bindless image copy tests on L0 #20096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Fix and enable bindless image copy tests on L0 #20096
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCL RT changes LGTM.
NIT: It would be nice to have unit tests to check that we pass proper input_types depending on ext_oneapi_copy usage, but since there are e2e tests and we don't have existing unit tests for this functionality, it seems excessive for this pr.
- name: MEM_TO_MEM | ||
desc: "Memory to Memory" | ||
- name: IMAGE_TO_IMAGE | ||
desc: "Image handle to image handle" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I don't mind having a new enum, I do wonder if it would make sense to just fold it into ur__exp_image_copy_flags_t
. You could effectively do it in 4 bits:
Bit 0 => 0 if source is host, 1 if source is device
Bit 1 => 0 if destination is host, 1 if destination is device
Bit 2 => 0 if input is mem, 1 if input is image
Bit 3 => 0 if output is mem, 1 if output is image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed possible and that is exactly kind of feedback/direction/preference I'm looking for from @intel/unified-runtime-reviewers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the enum, it's going to be more readable both in code and in traces.
- name: MEM_TO_MEM | ||
desc: "Memory to Memory" | ||
- name: IMAGE_TO_IMAGE | ||
desc: "Image handle to image handle" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the enum, it's going to be more readable both in code and in traces.
…and-enable-bindless-image-copy-tests-on-l0
CUDA adapter failure wasn't present in the second to last commit and the last commit only tweaked E2E test. As such, I will proceed with merge (pending approval from @steffenlarsen)
|
|
The
sycl_ext_oneapi_bindless_images
extension specification documents multiple overloads for copying between bindless images depending on source and destination types (i.e. image mem handle to USM, USM to USM, etc.).At Unified Runtime level, there is a single function
urBindlessImagesImageCopyExp
which implements all kinds of copies. However, at lower levels such as CUDA and Level Zero there are several different APIs for different kinds of copies. To understand which one to call, a "copy direction" argument is passed.However, that "copy direction" information is useless for the L0 path, because L0 API is not based on a direction, but it is based on input types (mem handle vs USM). What we have today is tailored to CUDA and if we change copy directions to be tailored to L0, then CUDA breaks (because input types also matter there, but to lesser extent).
In order to fix all issues with bindless image copies and make them work across all adapters, a new argument is introduced into
urBindlessImagesImageCopyExp
to specify types of source and destination arguments. L0 UR adapter makes a decision about which API to call solely on this information, ignoring "copy direction".CUDA & HIP adapters are left as-is - they still use "copy direction" information and sometimes try to guess the type of input arguments to call the right API. With the new argument introduced by this patch, those adapters could be refactored to reduce amount of guessing, but that is outside of the scope of this PR.