Matrix failures on BMG Linux #20594

@sarnex

Description

Describe the bug

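Two Matrix end-to-end tests abort with UR_RESULT_ERROR_DEVICE_LOST on BMG Linux in this CI run:

https://github.com/intel/llvm/actions/runs/19151375733/job/54743282591?pr=20542

Failing tests:

    SYCL :: Matrix/joint_matrix_bfloat16_accumulator.cpp
    SYCL :: Matrix/joint_matrix_half_accumulator.cpp
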
  ********************
  FAIL: SYCL :: Matrix/joint_matrix_bfloat16_accumulator.cpp (1688 of 1911)
  ******************** TEST 'SYCL :: Matrix/joint_matrix_bfloat16_accumulator.cpp' FAILED ********************
  Exit Code: -6
  
  Command Output (stdout):
  --
  # RUN: at line 21
  env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 21
  env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 22
  env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 22
  env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 23
  env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # executed command: env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # .---command stderr------------
  # | terminate called after throwing an instance of 'sycl::_V1::exception'
  # |   what():  level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
  # `-----------------------------
  # error: command failed with exit status: -6
  
  --
  
  ********************
  FAIL: SYCL :: Matrix/joint_matrix_half_accumulator.cpp (1761 of 1911)
  ******************** TEST 'SYCL :: Matrix/joint_matrix_half_accumulator.cpp' FAILED ********************
  Exit Code: -6
  
  Command Output (stdout):
  --
  # RUN: at line 22
  env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 22
  env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 23
  env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 23
  env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | B packed:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # RUN: at line 24
  env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # executed command: env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
  # .---command stdout------------
  # | B row major:
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 8 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 16 x 16 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 16 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 1 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 16 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # | Testing: 32 x 64 x 32 [TM x TN x TK]
  # `-----------------------------
  # .---command stderr------------
  # | terminate called after throwing an instance of 'sycl::_V1::exception'
  # |   what():  level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
  # `-----------------------------
  # error: command failed with exit status: -6
  
  --
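
Because the log above is long, a small script can help pull out the configuration that actually failed. The following is only a sketch, assuming the lit output format shown in this issue (`FAIL:`, `Exit Code:`, `# executed command:`, and the `# .---command stderr` blocks). The function name `summarize_lit_failures`, the `SAMPLE` excerpt, and the `./a.out` binary path are hypothetical and used for illustration only. It also assumes that a negative exit code is the usual Python convention for "terminated by that signal", so -6 is mapped to signal 6.

```python
import re
import signal

# Shortened, hypothetical excerpt in the same shape as the log in this issue.
SAMPLE = """\
FAIL: SYCL :: Matrix/joint_matrix_half_accumulator.cpp (1761 of 1911)
Exit Code: -6
# executed command: env UR_LOADER_USE_LEVEL_ZERO_V2=0 ./a.out
# .---command stdout------------
# | B row major:
# | B packed:
# `-----------------------------
# executed command: env IGC_JointMatrixLoadStoreOpt=1 UR_LOADER_USE_LEVEL_ZERO_V2=0 ./a.out
# .---command stdout------------
# | B row major:
# `-----------------------------
# .---command stderr------------
# | terminate called after throwing an instance of 'sycl::_V1::exception'
# |   what():  level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
# `-----------------------------
"""


def summarize_lit_failures(log_text):
    """Return one summary dict per 'FAIL:' entry found in lit output."""
    summaries = []
    current = None
    block = None  # "stdout", "stderr", or None when outside an output block
    for raw in log_text.splitlines():
        line = raw.strip()
        m = re.match(r"FAIL: (.+) \(\d+ of \d+\)$", line)
        if m:
            current = {"test": m.group(1), "exit_code": None, "signal": None,
                       "last_run": None, "last_stdout": None, "stderr": []}
            summaries.append(current)
            block = None
            continue
        if current is None:
            continue
        m = re.match(r"Exit Code: (-?\d+)$", line)
        if m:
            code = int(m.group(1))
            current["exit_code"] = code
            if code < 0:
                try:
                    current["signal"] = signal.Signals(-code).name
                except ValueError:
                    pass  # unknown signal number on this platform
        elif line.startswith("# executed command: "):
            # A new command starts: keep only the output of the latest one.
            current["last_run"] = line[len("# executed command: "):]
            current["last_stdout"] = None
            current["stderr"] = []
        elif line.startswith("# .---command stdout"):
            block = "stdout"
        elif line.startswith("# .---command stderr"):
            block = "stderr"
        elif line.startswith("# `---"):
            block = None
        elif line.startswith("# |") and block == "stdout":
            current["last_stdout"] = line[3:].strip()
        elif line.startswith("# |") and block == "stderr":
            current["stderr"].append(line[3:].strip())
    return summaries


if __name__ == "__main__":
    for s in summarize_lit_failures(SAMPLE):
        print(s["test"], s["exit_code"], s["signal"])
        print("  last command:", s["last_run"])
        print("  last stdout :", s["last_stdout"])
        print("  stderr      :", s["stderr"][-1] if s["stderr"] else None)
```

Applied to the full log, a summary like this should point at the `IGC_JointMatrixLoadStoreOpt=1` command with `UR_LOADER_USE_LEVEL_ZERO_V2=0` for both tests, and show that stdout stops after the "B row major" cases.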

To reproduce

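  1. Build the SYCL end-to-end tests Matrix/joint_matrix_bfloat16_accumulator.cpp and Matrix/joint_matrix_half_accumulator.cpp.
  2. Run the resulting binary on BMG Linux with the configuration from the last RUN line in the log above:
     env IGC_JointMatrixLoadStoreOpt=1 UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu <test>.tmp.out
  3. Observed: the process aborts (exit code -6) with "level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)". stdout stops after the "B row major" cases; no "B packed" output appears.
  4. Expected: the run completes as it does without IGC_JointMatrixLoadStoreOpt and with IGC_JointMatrixLoadStoreOpt=2, both of which pass for UR_LOADER_USE_LEVEL_ZERO_V2=0 and 1. The IGC_JointMatrixLoadStoreOpt=1 run with UR_LOADER_USE_LEVEL_ZERO_V2=1 does not appear in the log, since the test stops at the first failing command.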

Environment

  • OS: Linux
  • Target device and vendor: Intel GPU (BMG), Level Zero backend
  • DPC++ version: not recorded here; see the linked CI run
  • Dependencies version: not recorded here; see the linked CI run
