-
Notifications
You must be signed in to change notification settings - Fork 808
Description
Describe the bug
The failure is discovered in #19819 while trying to make bindless image copies work on Intel GPUs:
FAIL: SYCL :: bindless_images/copies/copy_subregion_2D.cpp (876 of 1892)
******************** TEST 'SYCL :: bindless_images/copies/copy_subregion_2D.cpp' FAILED ********************
Exit Code: 1
Command Output (stdout):
--
# RUN: at line 6
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 6
env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
# executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
# .---command stdout------------
# | Result mismatch at index 36! Expected: 36, Actual: 0
# | copy_image_mem_handle_to_usm test failed
# | Result mismatch at index 36! Expected: 36, Actual: 0
# | copy_usm_to_image_mem_handle test failed
# | Result mismatch at index 36! Expected: 36, Actual: 0
# | copy_usm_to_usm test failed
# | Tests failed
# `-----------------------------
# error: command failed with exit status: 1
--
Default adapter for the platform is v1
and the test passes with it on Arc
To reproduce
Build the test as usual, use the environment from the log above to run it.
Environment
- OS: Linux
- Target device and vendor: Intel(R) Arc(TM) A750 Graphics
- DPC++ version: [SYCL][E2E] Drop CUDA requirement from bindless image tests #19819, because today bindless image copies don't work on Windows
- Dependencies version: NEO 25.31.34666.3
Additional context
Looking at
llvm/sycl/test-e2e/bindless_images/copies/copy_subregion_2D.cpp
Lines 87 to 90 in d43f90a
void copy_image_mem_handle_to_usm(const syclexp::image_descriptor &desc, | |
const std::vector<float> &dataIn, | |
sycl::device dev, sycl::queue q, | |
std::vector<float> &out) { |
- Copy from host pointer into an image handle (initialize image on device)
// Copy host input data to device. |
2. Copy from the image handle to a device USM (i.e. the main purpose of the sub-test)
// Copy data from device to device, using four sub-region copies. |
3. Copy from the device USM back to host for verification
// Copy device data back to host. |
The last 3rd step is where everything breaks. I printed contents of the host memory after each copy using the following snippet:
{
q.wait_and_throw();
std::cout << "test 1";
for (int i = 0; i < out.size(); ++i) {
if (i % desc.width == 0)
std::cout << std::endl;
std::cout << out[i] << " ";
}
std::cout << std::endl;
}
And the output looks like:
test 1
0 1 2 3 4 5 0 0 0 0 0 0
12 13 14 15 16 17 0 0 0 0 0 0
24 25 26 27 28 29 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
test 2
0 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
test 3
0 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
72 73 74 75 76 77 0 0 0 0 0 0
84 85 86 87 88 89 0 0 0 0 0 0
96 97 98 99 100 101 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
test 4
0 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99 100 101 102 103 104 105 106 107
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
Result mismatch at index 36! Expected: 36, Actual: 0
copy_image_mem_handle_to_usm test failed
I.e. only half of the requested lines is being copied. If I change the copy extent height to the whole image instead of a half, then it just copies half the image, i.e. the L0 runtime consistently copies only half the data.
The code which invokes the copy is the same between V1 and V2 adapter, we just arrive to it through different code paths. The length of those different code paths is not that huge, but it is still unclear what is the root cause here and why only one adapter is affected.