
adreno and mali vulkan unable to compile Unet models #12708

Closed
powderluv opened this issue Mar 21, 2023 · 10 comments
Assignees
Labels
codegen/spirv SPIR-V code generation compiler backend enhancement ➕ New feature or request

Comments

@powderluv
Collaborator

powderluv commented Mar 21, 2023

With the unet:

What happened?

anush@MacBook-Pro SHARK-Runtime % ../iree-build/install/bin/iree-compile --iree-vulkan-target-triple=adreno-unknown-android30 --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' -o adreno_unet.vmfb ~/Downloads/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base_torch.mlir
<eval_with_key>.204:7:13: error: failed to legalize operation 'arith.sitofp' that was explicitly marked illegal
<eval_with_key>.204:15:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Shader, Float16, Int16, Int8, StorageBuffer16BitAccess, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer], [SPV_KHR_16bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, Qualcomm:IntegratedGPU, #spirv.resource_limits<max_compute_shared_memory_size = 32768, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 64], subgroup_size = 64, cooperative_matrix_properties_nv = []>>}>
<eval_with_key>.204:15:12: error: failed to serialize executables

Steps to reproduce your issue

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

What component(s) does this issue relate to?

No response

Version information

No response

Additional context

No response

@powderluv powderluv added bug 🐞 Something isn't working awaiting-triage labels Mar 21, 2023
@powderluv powderluv changed the title adreno-unknown-android30 unable to compile Unet models adreno and mali vulkan unable to compile Unet models Mar 21, 2023
@powderluv
Collaborator Author

The same happens for Mali:

anush@MacBook-Pro SHARK-Runtime % ../iree-build/install/bin/iree-compile --iree-vulkan-target-triple=valhall-g78-android30 --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' -o adreno_unet.vmfb ~/Downloads/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base_torch.mlir
<eval_with_key>.204:7:13: error: failed to legalize operation 'arith.sitofp' that was explicitly marked illegal
<eval_with_key>.204:15:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Shader, Float16, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer, DotProduct, DotProductInputAll, DotProductInput4x8BitPacked, DotProductInput4x8Bit], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, ARM:IntegratedGPU, #spirv.resource_limits<max_compute_shared_memory_size = 32768, max_compute_workgroup_invocations = 512, max_compute_workgroup_size = [512, 512, 512], subgroup_size = 16, cooperative_matrix_properties_nv = []>>}>
<eval_with_key>.204:15:12: error: failed to serialize executables

@antiagainst
Contributor

Oh yeah, this is failing on arith.sitofp .. : i64 to f16, given those phones won't have native support for int64. We need to enhance wide-integer emulation to support this op.

@antiagainst antiagainst added codegen/spirv SPIR-V code generation compiler backend enhancement ➕ New feature or request and removed awaiting-triage bug 🐞 Something isn't working labels Mar 21, 2023
@benvanik
Collaborator

(We should also make sure we actually need i64; all those bits are obviously not required, and we can propagate that back. Someone really needs to get our existing narrowing passes working with integers. :)

@antiagainst
Contributor

We are converting it into fp16, whose maximum value is 65504; that's an indication that it does not really need to be int64 (at least for this particular dispatch):

func.func @forward_dispatch_0_generic_2x160() {
  %c0 = arith.constant 0 : index
  %c65536 = arith.constant 65536 : index
  %cst = arith.constant 0.000000e+00 : f16
  %cst_0 = arith.constant -9.21033954 : f32
  %cst_1 = arith.constant 1.600000e+02 : f16
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<f16>>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c65536) : !flow.dispatch.tensor<writeonly:tensor<2x160xf16>>
  %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f16>> -> tensor<f16>
  %3 = tensor.empty() : tensor<2x160xf16>
  %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> ()>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<f16>) outs(%3 : tensor<2x160xf16>) {
  ^bb0(%in: f16, %out: f16):
    %5 = linalg.index 1 : index
    %6 = arith.index_cast %5 : index to i64
    %7 = arith.sitofp %6 : i64 to f16
    %8 = arith.addf %7, %cst : f16
    %9 = arith.truncf %cst_0 : f32 to f16
    %10 = arith.mulf %8, %9 : f16
    %11 = arith.divf %10, %cst_1 : f16
    %12 = math.exp %11 : f16
    %13 = arith.mulf %in, %12 : f16
    linalg.yield %13 : f16
  } -> tensor<2x160xf16>
  flow.dispatch.tensor.store %4, %1, offsets = [0, 0], sizes = [2, 160], strides = [1, 1] : tensor<2x160xf16> -> !flow.dispatch.tensor<writeonly:tensor<2x160xf16>>
  return
}
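The range argument above can be sanity-checked in plain Python (a hypothetical check, not part of the issue thread): the dispatch only feeds linalg.index values 0..159 through arith.sitofp, and every integer of that size is exactly representable in IEEE binary16, so narrowing the index cast from i64 to a smaller width cannot change any result here.

```python
import struct

def to_f16(x: float) -> float:
    # Round-trip through IEEE binary16 ('e' struct format, Python 3.6+)
    # to observe f16 rounding behavior.
    return struct.unpack('<e', struct.pack('<e', x))[0]

# f16 represents every integer up to 2048 exactly, so indices 0..159
# survive the conversion unchanged; no i64 range is needed.
for i in range(160):
    assert to_f16(float(i)) == float(i)
```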

Though these large models are really pushing it to the extreme there. I recall we have had correctness issues demoting int64 to int32, but I cannot recall whether it was this particular model.

@powderluv could you try --iree-flow-demote-i64-to-i32 to see if everything works fine? (It at least compiles fine for me. Also note that the above options, particularly those preprocessing steps, are meant for RDNA, so they likely won't be good here.)
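The suggested experiment would look something like this (a sketch only: the flag comes from the comment above, the target triple and paths are taken from the original report, and the RDNA-oriented preprocessing pipeline is dropped):

```shell
../iree-build/install/bin/iree-compile \
  --iree-vulkan-target-triple=adreno-unknown-android30 \
  --iree-input-type=none \
  --iree-hal-target-backends=vulkan \
  --iree-flow-demote-i64-to-i32 \
  -o adreno_unet.vmfb \
  ~/Downloads/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base_torch.mlir
```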

@benvanik
Collaborator

this looks like our own problem with linalg.index - with just a bit of work our numeric analysis/narrowing should be able to handle at least turning

    %5 = linalg.index 1 : index
    %6 = arith.index_cast %5 : index to i64
    %7 = arith.sitofp %6 : i64 to f16

into

    %5 = linalg.index 1 : index
    %6 = arith.index_cast %5 : index to i16
    %7 = arith.sitofp %6 : i16 to f16

(or whatever)

(and then of course we'll want to make sure we aren't emitting loops with i64s either, but so long as they are index and we have index -> i32 it'll probably be fine)

@powderluv
Collaborator Author

Sounds good, will give it a try. I'll drop the preprocessing for now.

@kuhar
Member

kuhar commented Mar 22, 2023

Missing wide int emulation pattern: https://reviews.llvm.org/D146597
And separately, another reason to look into integer narrowing...
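The general idea behind such an emulation pattern can be sketched in plain Python: split the i64 into a signed high word and an unsigned low word, convert each 32-bit half, and recombine. This is only an illustration of the shape of the technique, not the exact upstream pattern in D146597.

```python
def sitofp_i64_via_i32(x: int) -> float:
    # Emulate an i64 -> float conversion using only 32-bit halves.
    assert -2**63 <= x < 2**63
    hi = x >> 32            # signed high word (arithmetic shift)
    lo = x & 0xFFFFFFFF     # unsigned low word
    # float(hi) * 2^32 + float(lo) reassembles the original value.
    # (For magnitudes beyond 2^53, double rounding can differ from a
    # single correctly rounded conversion; a real pattern handles that.)
    return float(hi) * 2.0**32 + float(lo)

assert sitofp_i64_via_i32(160) == 160.0
assert sitofp_i64_via_i32(-1) == -1.0
assert sitofp_i64_via_i32(1 << 40) == float(1 << 40)
```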

@julianwa julianwa added this to the Sprint: Compilation WS 1 milestone Apr 5, 2023
@allieculp allieculp removed this from the Sprint: Compilation Q2 2023 milestone Apr 18, 2023
@allieculp

@powderluv Dropping this to P2 for now, but please flag if this is needed sooner.

@kuhar
Member

kuhar commented Apr 19, 2023

this looks like our own problem with linalg.index - with just a bit of work our numeric analysis/narrowing should be able to handle at least turning
...

FYI @benvanik, @matthias-springer landed ValueBounds support for linalg.index which should allow us to perform this optimization: https://reviews.llvm.org/D148598

@antiagainst
Contributor

The particular issue reported here is fixed now; I just verified that the model compiles successfully with the given command line.
