What happened?

Module-level error:

```
<unknown>:0: error: cannot get concrete layout for contraction
<stdin>:2704:12: error: 'func.func' op failed to distribute
  %445 = torch.aten.convolution %338, %439, %440, %441, %442, %443, %false_465, %444, %int1_468 : !torch.vtensor<[1,320,64,64],f16>, !torch.vtensor<[640,320,1,1],f16>, !torch.vtensor<[640],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,640,64,64],f16>
  ^
<stdin>:1718:10: note: called from
  %6 = call @forward(%0, %1, %2, %3, %4, %5) : (!torch.vtensor<[1,4,128,128],f16>, !torch.vtensor<[1,64,2048],f16>, !torch.vtensor<[1,1280],f16>, !torch.vtensor<[1,6],f16>, !torch.vtensor<[1],f16>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,4,128,128],f16>
  ^
<stdin>:2704:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>
  %445 = torch.aten.convolution %338, %439, %440, %441, %442, %443, %false_465, %444, %int1_468 : !torch.vtensor<[1,320,64,64],f16>, !torch.vtensor<[640,320,1,1],f16>, !torch.vtensor<[640],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,640,64,64],f16>
  ^
<stdin>:1718:10: note: called from
  %6 = call @forward(%0, %1, %2, %3, %4, %5) : (!torch.vtensor<[1,4,128,128],f16>, !torch.vtensor<[1,64,2048],f16>, !torch.vtensor<[1,1280],f16>, !torch.vtensor<[1,6],f16>, !torch.vtensor<[1],f16>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,4,128,128],f16>
  ^
```
Dispatch-level error:

```
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: 1
Diagnostics:
failed to translate executables
/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:26:10: error: cannot get concrete layout for contraction
  %11 = arith.addf %out, %10 : f32 loc("/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/compiled_scheduled_unet_run_forward$async_dispatch_50.mlir":26:10)
  ^
/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:9:6: error: 'func.func' op failed to distribute
  func.func @run_forward$async_dispatch_50_matmul_like_64x64x640x320_f16xf16xf32() attributes {translation_info = #iree_codegen.translation_info<LLVMGPUVectorDistribute workgroup_size = [128, 2, 1] subgroup_size = 64, {mma_schedule = #iree_gpu.mma_schedule<intrinsic = #iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, subgroup_m_count = 2, subgroup_n_count = 2>}>} {
  ^
/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:2:2: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>
  hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>) {
  ^
```
Reproducer:

- attention_and_matmul_spec_mfma.mlir
- compiled_scheduled_unet_run_forward$async_dispatch_50.mlir
Invoked with (the pass-pipeline argument is quoted here; the parentheses would otherwise be a shell syntax error, so quoting was presumably lost when the command was pasted):

```shell
iree-compile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary \
  --iree-hal-target-backends=rocm --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false \
  --iree-hal-target-backends=rocm --iree-rocm-target-chip=gfx942 --iree-opt-const-eval=false \
  --iree-rocm-waves-per-eu=2 \
  --iree-hal-dump-executable-files-to=/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches \
  --iree-flow-enable-aggressive-fusion --iree-global-opt-enable-fuse-horizontal-contractions=true \
  --iree-opt-aggressively-propagate-transposes=true --iree-global-opt-propagate-transposes=true \
  --iree-opt-outer-dim-concat=true --iree-vm-target-truncate-unsupported-floats \
  --iree-llvmgpu-enable-prefetch=true --iree-opt-data-tiling=false \
  --iree-codegen-gpu-native-math-precision=true --iree-rocm-waves-per-eu=2 \
  --iree-codegen-llvmgpu-use-vector-distribution=true \
  --iree-preprocessing-pass-pipeline="builtin.module(iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics))" \
  --iree-codegen-transform-dialect-library=/home/monorimet/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/attention_and_matmul_spec_mfma.mlir \
  --compile-from=flow compiled_scheduled_unet_run_forward\$async_dispatch_50.mlir
```
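For triage it may be enough to compile just the attached dispatch with a reduced flag set. This is a sketch, not the command from the issue: which of the original flags are actually required to trigger the failure is an assumption, so fall back to the full invocation above if this does not reproduce.

```shell
# Hypothetical reduced reproducer: keeps only the ROCm target selection, the
# transform-dialect spec, and the vector-distribution flag from the full
# command above; every other flag is dropped on the assumption it is not
# needed to hit the "cannot get concrete layout for contraction" error.
iree-compile \
  --iree-hal-target-backends=rocm \
  --iree-rocm-target-chip=gfx942 \
  --iree-rocm-waves-per-eu=2 \
  --iree-codegen-llvmgpu-use-vector-distribution=true \
  --iree-codegen-transform-dialect-library=attention_and_matmul_spec_mfma.mlir \
  --compile-from=flow \
  compiled_scheduled_unet_run_forward\$async_dispatch_50.mlir
```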
Steps to reproduce your issue

No response

What component(s) does this issue relate to?

Compiler

Version information

commit 2a2a4d0 (HEAD -> main, origin/main, origin/HEAD)
Additional context

This is a new version of the IR, where the batch dim is 1. Hence, we likely have a vector-distribution issue for this shape of this op.
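If it is easier to triage from the torch level than from the flow dispatch, the failing op can be pulled into a standalone module. The following is a minimal sketch reconstructed from the module-level diagnostic above: the operand and result shapes are verbatim, but the stride/padding/dilation/output_padding values are assumptions, since the original list operands (%441 through %444) are not shown in the trace. Stride 1, padding 0, dilation 1 is at least consistent with a 1x1 kernel mapping [1,320,64,64] to [1,640,64,64] with unchanged spatial dims.

```mlir
// Hypothetical standalone reproducer (not taken from the issue) isolating the
// torch.aten.convolution that fails to distribute. Shapes match the diagnostic;
// the stride/padding/dilation/output_padding constants below are assumed.
func.func @forward(%input: !torch.vtensor<[1,320,64,64],f16>,
                   %weight: !torch.vtensor<[640,320,1,1],f16>,
                   %bias: !torch.vtensor<[640],f16>) -> !torch.vtensor<[1,640,64,64],f16> {
  %int0 = torch.constant.int 0
  %int1 = torch.constant.int 1
  %false = torch.constant.bool false
  %stride = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
  %padding = torch.prim.ListConstruct %int0, %int0 : (!torch.int, !torch.int) -> !torch.list<int>
  %dilation = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
  %out_pad = torch.prim.ListConstruct %int0, %int0 : (!torch.int, !torch.int) -> !torch.list<int>
  %conv = torch.aten.convolution %input, %weight, %bias, %stride, %padding, %dilation, %false, %out_pad, %int1 : !torch.vtensor<[1,320,64,64],f16>, !torch.vtensor<[640,320,1,1],f16>, !torch.vtensor<[640],f16>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,640,64,64],f16>
  return %conv : !torch.vtensor<[1,640,64,64],f16>
}
```

Compiled with the same `--iree-input-type=torch` and ROCm target flags as the full command above, this may lower to the same 64x64x640x320 matmul-like dispatch, assuming the pad-to-intrinsics preprocessing treats the isolated op the same way it does in the full model.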
FWIW, this also fails with vector distribution explicitly disabled, in the same dispatch:
```
<unknown>:0: error: LLVM Translation failed for operation: builtin.unrealized_conversion_cast
/home/eagarvey/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:2:2: error: failed to translate the MLIR LLVM dialect to the native llvm::Module
  hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>) {
  ^
/home/eagarvey/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:2:2: error: failed to serialize executable for target backend rocm
  hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {mma_intrinsics = [#iree_gpu.mma_layout<MFMA_F16_16x16x16_F32>, #iree_gpu.mma_layout<MFMA_F16_32x32x8_F32>], target_arch = "gfx942", ukernels = "none", waves_per_eu = 2 : i64}>) {
  ^
/home/eagarvey/SHARK/models/stabilityai_sdxl_turbo_1_64_1024x1024_fp16_gfx942/EulerDiscrete_unet_4_dispatches/configured_compiled_scheduled_unet_run_forward$async_dispatch_50.mlir:1:0: error: failed to serialize executables
```
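(Presumably this run used `--iree-codegen-llvmgpu-use-vector-distribution=false` in place of `=true` in the command above; the exact invocation for this run is not shown, so treat that as an assumption.)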
I think SDXL for ROCm is working fine now. Closing.