
(ROCM) Failed to distribute matmul in sdxl-turbo unet #17495

Open
monorimet opened this issue May 23, 2024 · 1 comment
Labels
bug 🐞 (Something isn't working), codegen/rocm (ROCm code generation compiler backend)

Comments

monorimet (Collaborator) commented May 23, 2024

What happened?

Module-level error:

<unknown>:0: error: cannot get concrete layout for contraction
<stdin>:2704:12: error: 'func.func' op failed to distribute
    %445 = torch.aten.convolution %338, %439, %440, %441, %442, %443, %false_465, %444, %int1_468 : !torch.vtensor<[1,320,64,64],f16>, !torch.vtensor<[640,320,1,1],f16>, !torch.vtensor<[640],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,640,64,64],f16>
           ^
<stdin>:1718:10: note: called from
    %6 = call @forward(%0, %1, %2, %3, %4, %5) : (!torch.vtensor<[1,4,128,128],f16>, !torch.vtensor<[1,64,2048],f16>, !torch.vtensor<[1,1280],f16>, !torch.vtensor<[1,6],f16>, !torch.vtensor<[1],f16>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,4,128,128],f16>
         ^
<stdin>:2704:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>
    %445 = torch.aten.convolution %338, %439, %440, %441, %442, %443, %false_465, %444, %int1_468 : !torch.vtensor<[1,320,64,64],f16>, !torch.vtensor<[640,320,1,1],f16>, !torch.vtensor<[640],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,640,64,64],f16>
           ^
<stdin>:1718:10: note: called from
    %6 = call @forward(%0, %1, %2, %3, %4, %5) : (!torch.vtensor<[1,4,128,128],f16>, !torch.vtensor<[1,64,2048],f16>, !torch.vtensor<[1,1280],f16>, !torch.vtensor<[1,6],f16>, !torch.vtensor<[1],f16>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,4,128,128],f16>
         ^

Dispatch-level error:

iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: 1
Diagnostics:
failed to translate executables
/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:26:10: error: cannot get concrete layout for contraction
          %11 = arith.addf %out, %10 : f32 loc("/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/compiled_scheduled_unet_run_forward$async_dispatch_50.mlir":26:10)
         ^
/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:9:6: error: 'func.func' op failed to distribute
      func.func @run_forward$async_dispatch_50_matmul_like_64x64x640x320_f16xf16xf32() attributes {translation_info = #iree_codegen.translation_info<LLVMGPUVectorDistribute workgroup_size = [128, 2, 1] subgroup_size = 64, {mma_schedule = #iree_gpu.mma_schedule<intrinsic = #iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, subgroup_m_count = 2, subgroup_n_count = 2>}>} {
     ^
/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:2:2: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>
  hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>) {
 ^

Reproducer:
attention_and_matmul_spec_mfma.mlir
compiled_scheduled_unet_run_forward$async_dispatch_50.mlir

Invoked with:
 iree-compile \
   --iree-input-type=torch \
   --iree-vm-bytecode-module-output-format=flatbuffer-binary \
   --iree-hal-target-backends=rocm \
   --mlir-print-debuginfo \
   --mlir-print-op-on-diagnostic=false \
   --iree-hal-target-backends=rocm \
   --iree-rocm-target-chip=gfx942 \
   --iree-opt-const-eval=false \
   --iree-rocm-waves-per-eu=2 \
   --iree-hal-dump-executable-files-to=/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches \
   --iree-flow-enable-aggressive-fusion \
   --iree-global-opt-enable-fuse-horizontal-contractions=true \
   --iree-opt-aggressively-propagate-transposes=true \
   --iree-global-opt-propagate-transposes=true \
   --iree-opt-outer-dim-concat=true \
   --iree-vm-target-truncate-unsupported-floats \
   --iree-llvmgpu-enable-prefetch=true \
   --iree-opt-data-tiling=false \
   --iree-codegen-gpu-native-math-precision=true \
   --iree-rocm-waves-per-eu=2 \
   --iree-codegen-llvmgpu-use-vector-distribution=true \
   --iree-preprocessing-pass-pipeline="builtin.module(iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics))" \
   --iree-codegen-transform-dialect-library=/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/attention_and_matmul_spec_mfma.mlir \
   --compile-from=flow \
   compiled_scheduled_unet_run_forward\$async_dispatch_50.mlir
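
For a quicker repro, a pared-down invocation should hit the same failure. The sketch below keeps only the flags from the command above that look relevant to this dispatch; it is untested and assumes the two attached reproducer files sit in the current directory:

 iree-compile \
   --iree-hal-target-backends=rocm \
   --iree-rocm-target-chip=gfx942 \
   --iree-rocm-waves-per-eu=2 \
   --iree-codegen-llvmgpu-use-vector-distribution=true \
   --iree-codegen-transform-dialect-library=attention_and_matmul_spec_mfma.mlir \
   --compile-from=flow \
   compiled_scheduled_unet_run_forward\$async_dispatch_50.mlir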

Steps to reproduce your issue

No response

What component(s) does this issue relate to?

Compiler

Version information

commit 2a2a4d0 (HEAD -> main, origin/main, origin/HEAD)

Additional context

This is a new version of the IR in which the batch dim is 1, so we likely have a vector distribution issue for this op at this shape.
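
For reference, the dispatch name matmul_like_64x64x640x320_f16xf16xf32 suggests a contraction with two parallel dims of 64 (the flattened spatial dims of the 1x1 conv), N = 640, K = 320, and f16 operands accumulating into f32. The snippet below is a hand-written sketch of that shape, not the actual dispatch contents; the names and indexing maps are my guesses:

#lhs = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
#rhs = affine_map<(d0, d1, d2, d3) -> (d3, d2)>
#out = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
func.func @matmul_like_64x64x640x320(%lhs: tensor<64x64x320xf16>,
                                     %rhs: tensor<320x640xf16>,
                                     %acc: tensor<64x64x640xf32>) -> tensor<64x64x640xf32> {
  %0 = linalg.generic {indexing_maps = [#lhs, #rhs, #out],
                       iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
      ins(%lhs, %rhs : tensor<64x64x320xf16>, tensor<320x640xf16>)
      outs(%acc : tensor<64x64x640xf32>) {
  ^bb0(%a: f16, %b: f16, %out: f32):
    // f16 multiply, f32 accumulate -- matching the `arith.addf %out, %10 : f32`
    // the layout analysis points at in the diagnostics above.
    %1 = arith.extf %a : f16 to f32
    %2 = arith.extf %b : f16 to f32
    %3 = arith.mulf %1, %2 : f32
    %4 = arith.addf %out, %3 : f32
    linalg.yield %4 : f32
  } -> tensor<64x64x640xf32>
  return %0 : tensor<64x64x640xf32>
}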

monorimet added the bug 🐞 and codegen/rocm labels on May 23, 2024
monorimet (Collaborator, Author) commented:

FWIW this also fails, on the same dispatch, with vector distribution explicitly disabled:

<unknown>:0: error: LLVM Translation failed for operation: builtin.unrealized_conversion_cast
/home/eagarvey/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:2:2: error: failed to translate the MLIR LLVM dialect to the native llvm::Module
  hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>) {
 ^
/home/eagarvey/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:2:2: error: failed to serialize executable for target backend rocm
  hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>) {
 ^
/home/eagarvey/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:1:0: error: failed to serialize executables
