Adreno and Mali Vulkan unable to compile UNet models #12708
Same happens for Mali
Oh yeah, this is failing on
(should also make sure we need i64 - all those bits are obviously not required and we can propagate that back - someone really needs to get our existing narrowing passes working with integers :)
We are converting it into fp16, whose maximum value is 65504; that's an indication that it does not really need int64 (at least for this particular dispatch):

```mlir
func.func @forward_dispatch_0_generic_2x160() {
  %c0 = arith.constant 0 : index
  %c65536 = arith.constant 65536 : index
  %cst = arith.constant 0.000000e+00 : f16
  %cst_0 = arith.constant -9.21033954 : f32
  %cst_1 = arith.constant 1.600000e+02 : f16
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<f16>>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c65536) : !flow.dispatch.tensor<writeonly:tensor<2x160xf16>>
  %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f16>> -> tensor<f16>
  %3 = tensor.empty() : tensor<2x160xf16>
  %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> ()>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<f16>) outs(%3 : tensor<2x160xf16>) {
  ^bb0(%in: f16, %out: f16):
    %5 = linalg.index 1 : index
    %6 = arith.index_cast %5 : index to i64
    %7 = arith.sitofp %6 : i64 to f16
    %8 = arith.addf %7, %cst : f16
    %9 = arith.truncf %cst_0 : f32 to f16
    %10 = arith.mulf %8, %9 : f16
    %11 = arith.divf %10, %cst_1 : f16
    %12 = math.exp %11 : f16
    %13 = arith.mulf %in, %12 : f16
    linalg.yield %13 : f16
  } -> tensor<2x160xf16>
  flow.dispatch.tensor.store %4, %1, offsets = [0, 0], sizes = [2, 160], strides = [1, 1] : tensor<2x160xf16> -> !flow.dispatch.tensor<writeonly:tensor<2x160xf16>>
  return
}
```

Though these large models are really pushing it to the extreme there. I recall we have had correctness issues demoting int64 to int32, but cannot recall whether it was this particular model. @powderluv could you give
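To make the numeric argument above concrete (this check is mine, not from the thread): the index feeding the `sitofp` ranges over dimension 1 of the `2x160` tensor, so

$$
0 \le \texttt{linalg.index}\ 1 \le 159 < 2^{11} = 2048 < 2^{15} - 1 = 32767,
$$

where $2^{11}$ bounds the integers that are exactly representable in f16 (10 fraction bits plus the implicit leading bit) and $2^{15}-1$ is the i16 maximum. Every index value here converts exactly, so i16 is already sufficient and the i64 is pure overhead.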
this looks like our own problem with linalg.index - with just a bit of work our numeric analysis/narrowing should be able to handle at least turning

```mlir
%5 = linalg.index 1 : index
%6 = arith.index_cast %5 : index to i64
%7 = arith.sitofp %6 : i64 to f16
```

into

```mlir
%5 = linalg.index 1 : index
%6 = arith.index_cast %5 : index to i16
%7 = arith.sitofp %6 : i16 to f16
```

(or whatever) (and then of course we'll want to make sure we aren't emitting loops with i64s either, but so long as they are index and we have index -> i32 it'll probably be fine)
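For illustration, a minimal sketch of what such a narrowing rewrite could look like as an MLIR C++ pattern. This is not IREE's actual pass; `fitsInI16` is a hypothetical stand-in for whatever integer range analysis would prove the bound:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical range check: a real implementation would query integer
// range analysis. Placeholder returns false so the pattern never fires.
static bool fitsInI16(Value v) {
  (void)v;
  return false;
}

// Sketch only: rewrite index_cast(index -> i64) + sitofp(i64 -> f16) to go
// through i16 when the index is provably small (e.g. linalg.index over a
// dimension of size 160).
struct NarrowIndexSIToFP : OpRewritePattern<arith::SIToFPOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(arith::SIToFPOp op,
                                PatternRewriter &rewriter) const override {
    auto castOp = op.getIn().getDefiningOp<arith::IndexCastOp>();
    if (!castOp || !castOp.getIn().getType().isIndex())
      return failure();
    if (!fitsInI16(castOp.getIn()))
      return failure();
    // Cast the index directly to i16, then convert to float from there.
    Value narrow = rewriter.create<arith::IndexCastOp>(
        op.getLoc(), rewriter.getIntegerType(16), castOp.getIn());
    rewriter.replaceOpWithNewOp<arith::SIToFPOp>(op, op.getType(), narrow);
    return success();
  }
};
```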
Sounds good. Will give it a try. I'll drop the preprocessing for now.
Missing wide int emulation pattern: https://reviews.llvm.org/D146597
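For context, the emulation machinery in question lives upstream in MLIR's arith dialect. A minimal sketch of how a backend can apply it, modeled loosely on IREE's SPIR-V i64 emulation; the pass wrapper here is illustrative, while `arith::WideIntEmulationConverter` and `populateArithWideIntEmulationPatterns` are the upstream API names:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/Arith/Transforms/Passes.h"
#include "mlir/Dialect/Arith/Transforms/WideIntEmulationConverter.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Rewrites arith ops on i64 into ops on pairs of i32s (vector<2xi32>) for
// targets that only support 32-bit integers, such as many mobile GPUs.
static LogicalResult emulateWideIntegers(Operation *root) {
  MLIRContext *ctx = root->getContext();
  arith::WideIntEmulationConverter converter(/*widestIntSupportedByTarget=*/32);

  // Any arith op whose types the converter leaves unchanged is legal;
  // everything touching i64 must be rewritten.
  ConversionTarget target(*ctx);
  target.addDynamicallyLegalDialect<arith::ArithDialect>(
      [&](Operation *op) { return converter.isLegal(op); });

  RewritePatternSet patterns(ctx);
  arith::populateArithWideIntEmulationPatterns(converter, patterns);
  return applyPartialConversion(root, target, std::move(patterns));
}
```

An op like the `arith.sitofp : i64 to f16` in the dispatch above only compiles this way if the emulation set has a pattern for it, which is what the linked revision addresses.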
@powderluv Dropping this to P2 for now, but please flag if this is needed sooner.
FYI @benvanik, @matthias-springer landed
The particular issue reported here is fixed now; I just verified that the model can be compiled successfully with the given command line.
What happened?

Steps to reproduce your issue

With the unet:

What component(s) does this issue relate to?

No response

Version information

No response

Additional context

No response