
adreno and mali vulkan unable to compile Unet models #12708

Closed
powderluv opened this issue Mar 21, 2023 · 10 comments
Assignees
Labels
codegen/spirv SPIR-V code generation compiler backend enhancement ➕ New feature or request

Comments

@powderluv
Collaborator

powderluv commented Mar 21, 2023

With the unet:

What happened?

anush@MacBook-Pro SHARK-Runtime % ../iree-build/install/bin/iree-compile --iree-vulkan-target-triple=adreno-unknown-android30 --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' -o adreno_unet.vmfb ~/Downloads/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base_torch.mlir
<eval_with_key>.204:7:13: error: failed to legalize operation 'arith.sitofp' that was explicitly marked illegal
<eval_with_key>.204:15:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Shader, Float16, Int16, Int8, StorageBuffer16BitAccess, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer], [SPV_KHR_16bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, Qualcomm:IntegratedGPU, #spirv.resource_limits<max_compute_shared_memory_size = 32768, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 64], subgroup_size = 64, cooperative_matrix_properties_nv = []>>}>
<eval_with_key>.204:15:12: error: failed to serialize executables

Steps to reproduce your issue

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

What component(s) does this issue relate to?

No response

Version information

No response

Additional context

No response

@powderluv powderluv added bug 🐞 Something isn't working awaiting-triage labels Mar 21, 2023
@powderluv powderluv changed the title adreno-unknown-android30 unable to compile Unet models adreno and mali vulkan unable to compile Unet models Mar 21, 2023
@powderluv
Collaborator Author

The same happens for Mali:

anush@MacBook-Pro SHARK-Runtime % ../iree-build/install/bin/iree-compile --iree-vulkan-target-triple=valhall-g78-android30 --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' -o adreno_unet.vmfb ~/Downloads/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base_torch.mlir
<eval_with_key>.204:7:13: error: failed to legalize operation 'arith.sitofp' that was explicitly marked illegal
<eval_with_key>.204:15:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Shader, Float16, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, ARM:IntegratedGPU, #spirv.resource_limits<max_compute_shared_memory_size = 32768, max_compute_workgroup_invocations = 512, max_compute_workgroup_size = [512, 512, 512], subgroup_size = 16, cooperative_matrix_properties_nv = []>>}>
<eval_with_key>.204:15:12: error: failed to serialize executables

@antiagainst
Contributor

Oh yeah, this is failing on arith.sitofp .. : i64 to f16, given those phones won't have native support for int64. We need to enhance wide-integer emulation to support this op.

@antiagainst antiagainst added codegen/spirv SPIR-V code generation compiler backend enhancement ➕ New feature or request and removed awaiting-triage bug 🐞 Something isn't working labels Mar 21, 2023
@benvanik
Collaborator

(We should also make sure we actually need i64; all those bits are obviously not required, and we can propagate that back. Someone really needs to get our existing narrowing passes working with integers. :)

@antiagainst
Contributor

We are converting it into fp16, whose maximum value is 65504; that's an indication that it does not really need to be int64 (at least for this particular dispatch):

func.func @forward_dispatch_0_generic_2x160() {
  %c0 = arith.constant 0 : index
  %c65536 = arith.constant 65536 : index
  %cst = arith.constant 0.000000e+00 : f16
  %cst_0 = arith.constant -9.21033954 : f32
  %cst_1 = arith.constant 1.600000e+02 : f16
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<f16>>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c65536) : !flow.dispatch.tensor<writeonly:tensor<2x160xf16>>
  %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f16>> -> tensor<f16>
  %3 = tensor.empty() : tensor<2x160xf16>
  %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> ()>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<f16>) outs(%3 : tensor<2x160xf16>) {
  ^bb0(%in: f16, %out: f16):
    %5 = linalg.index 1 : index
    %6 = arith.index_cast %5 : index to i64
    %7 = arith.sitofp %6 : i64 to f16
    %8 = arith.addf %7, %cst : f16
    %9 = arith.truncf %cst_0 : f32 to f16
    %10 = arith.mulf %8, %9 : f16
    %11 = arith.divf %10, %cst_1 : f16
    %12 = math.exp %11 : f16
    %13 = arith.mulf %in, %12 : f16
    linalg.yield %13 : f16
  } -> tensor<2x160xf16>
  flow.dispatch.tensor.store %4, %1, offsets = [0, 0], sizes = [2, 160], strides = [1, 1] : tensor<2x160xf16> -> !flow.dispatch.tensor<writeonly:tensor<2x160xf16>>
  return
}
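The range argument above can be sanity-checked in plain Python (a hypothetical check, not part of the issue thread): the dispatch only feeds linalg.index values 0..159 through arith.sitofp, and every integer of that size is exactly representable in IEEE binary16, so narrowing the index cast from i64 to a smaller width cannot change any result here.

```python
import struct

def to_f16(x: float) -> float:
    # Round-trip through IEEE binary16 ('e' struct format, Python 3.6+)
    # to observe f16 rounding behavior.
    return struct.unpack('<e', struct.pack('<e', x))[0]

# f16 represents every integer up to 2048 exactly, so indices 0..159
# survive the conversion unchanged; no i64 range is needed.
for i in range(160):
    assert to_f16(float(i)) == float(i)
```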

Though these large models are really pushing it to the extreme there. I recall we have had correctness issues demoting int64 to int32, but I cannot recall whether it was this particular model.

@powderluv could you try --iree-flow-demote-i64-to-i32 to see if everything works fine? (It at least compiles fine for me. Also note that the above options, particularly those preprocessing steps, are meant for RDNA, so they likely won't be good here.)
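The suggested experiment would look something like this (a sketch only: the flag comes from the comment above, the target triple and paths are taken from the original report, and the RDNA-oriented preprocessing pipeline is dropped):

```shell
../iree-build/install/bin/iree-compile \
  --iree-vulkan-target-triple=adreno-unknown-android30 \
  --iree-input-type=none \
  --iree-hal-target-backends=vulkan \
  --iree-flow-demote-i64-to-i32 \
  -o adreno_unet.vmfb \
  ~/Downloads/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base_torch.mlir
```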

@benvanik
Collaborator

this looks like our own problem with linalg.index - with just a bit of work our numeric analysis/narrowing should be able to handle at least turning

    %5 = linalg.index 1 : index
    %6 = arith.index_cast %5 : index to i64
    %7 = arith.sitofp %6 : i64 to f16

into

    %5 = linalg.index 1 : index
    %6 = arith.index_cast %5 : index to i16
    %7 = arith.sitofp %6 : i16 to f16

(or whatever)

(and then of course we'll want to make sure we aren't emitting loops with i64s either, but so long as they are index and we have index -> i32 it'll probably be fine)

@powderluv
Collaborator Author

Sounds good, will give it a try. I'll drop the preprocessing for now.

@kuhar
Member

kuhar commented Mar 22, 2023

Missing wide int emulation pattern: https://reviews.llvm.org/D146597
And separately, another reason to look into integer narrowing...
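The general idea behind such an emulation pattern can be sketched in plain Python: split the i64 into a signed high word and an unsigned low word, convert each 32-bit half, and recombine. This is only an illustration of the shape of the technique, not the exact upstream pattern in D146597.

```python
def sitofp_i64_via_i32(x: int) -> float:
    # Emulate an i64 -> float conversion using only 32-bit halves.
    assert -2**63 <= x < 2**63
    hi = x >> 32            # signed high word (arithmetic shift)
    lo = x & 0xFFFFFFFF     # unsigned low word
    # float(hi) * 2^32 + float(lo) reassembles the original value.
    # (For magnitudes beyond 2^53, double rounding can differ from a
    # single correctly rounded conversion; a real pattern handles that.)
    return float(hi) * 2.0**32 + float(lo)

assert sitofp_i64_via_i32(160) == 160.0
assert sitofp_i64_via_i32(-1) == -1.0
assert sitofp_i64_via_i32(1 << 40) == float(1 << 40)
```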

@julianwa julianwa added this to the Sprint: Compilation WS 1 milestone Apr 5, 2023
@allieculp allieculp removed this from the Sprint: Compilation Q2 2023 milestone Apr 18, 2023
@allieculp

@powderluv Dropping this to P2 for now, but please flag if this is needed sooner.

@kuhar
Member

kuhar commented Apr 19, 2023

this looks like our own problem with linalg.index - with just a bit of work our numeric analysis/narrowing should be able to handle at least turning
...

FYI @benvanik, @matthias-springer landed ValueBounds support for linalg.index which should allow us to perform this optimization: https://reviews.llvm.org/D148598

@antiagainst
Contributor

The particular issue reported here is fixed now; I just verified that the model compiles successfully with the given command line.
