Skip to content

bindless_images/copies/copy_subregion_2D.cpp fails on Arc with V2 L0 adapter #20004

@AlexeySachkov

Description

@AlexeySachkov

Describe the bug

The failure is discovered in #19819 while trying to make bindless image copies work on Intel GPUs:

  FAIL: SYCL :: bindless_images/copies/copy_subregion_2D.cpp (876 of 1892)
  ******************** TEST 'SYCL :: bindless_images/copies/copy_subregion_2D.cpp' FAILED ********************
  Exit Code: 1
  
  Command Output (stdout):
  --
  # RUN: at line 6
  env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
  # executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
  # note: command had no output on stdout or stderr
  # RUN: at line 6
  env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
  # executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/bindless_images/copies/Output/copy_subregion_2D.cpp.tmp.out
  # .---command stdout------------
  # | Result mismatch at index 36! Expected: 36, Actual: 0
  # | copy_image_mem_handle_to_usm test failed
  # | Result mismatch at index 36! Expected: 36, Actual: 0
  # | copy_usm_to_image_mem_handle test failed
  # | Result mismatch at index 36! Expected: 36, Actual: 0
  # | copy_usm_to_usm test failed
  # | Tests failed
  # `-----------------------------
  # error: command failed with exit status: 1
  
  --

Default adapter for the platform is v1 and the test passes with it on Arc

To reproduce

Build the test as usual, use the environment from the log above to run it.

Environment

Additional context

Looking at

void copy_image_mem_handle_to_usm(const syclexp::image_descriptor &desc,
const std::vector<float> &dataIn,
sycl::device dev, sycl::queue q,
std::vector<float> &out) {
sub-test, it does 3 main things:

  1. Copy from host pointer into an image handle (initialize image on device)

// Copy host input data to device.

2. Copy from the image handle to a device USM (i.e. the main purpose of the sub-test)

// Copy data from device to device, using four sub-region copies.

3. Copy from the device USM back to host for verification

// Copy device data back to host.

The last 3rd step is where everything breaks. I printed contents of the host memory after each copy using the following snippet:

{                                       
  q.wait_and_throw();                   
  std::cout << "test 1";                
  for (int i = 0; i < out.size(); ++i) {
    if (i % desc.width == 0)            
      std::cout << std::endl;           
    std::cout << out[i] << " ";         
  }                                     
  std::cout << std::endl;               
}                                       

And the output looks like:

test 1
0 1 2 3 4 5 0 0 0 0 0 0
12 13 14 15 16 17 0 0 0 0 0 0
24 25 26 27 28 29 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
test 2
0 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
test 3
0 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
72 73 74 75 76 77 0 0 0 0 0 0
84 85 86 87 88 89 0 0 0 0 0 0
96 97 98 99 100 101 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
test 4
0 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99 100 101 102 103 104 105 106 107
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
Result mismatch at index 36! Expected: 36, Actual: 0
copy_image_mem_handle_to_usm test failed

I.e. only half of the requested lines is being copied. If I change the copy extent height to the whole image instead of a half, then it just copies half the image, i.e. the L0 runtime consistently copies only half the data.

The code which invokes the copy is the same between V1 and V2 adapter, we just arrive to it through different code paths. The length of those different code paths is not that huge, but it is still unclear what is the root cause here and why only one adapter is affected.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions