[Flow] Make sink reshapes changes less conservative. #17706
Conversation
Force-pushed 3a50abd to e3066d6
Abbreviated Benchmark Summary @ commit 319360d130b43a73ace1fba52c56f4b90db2b5f0 (vs. base 7090f64b7bd60a597ee6c61b8ff0d624153cb7f0):
- Regressed latencies 🚩
- Improved latencies 🎉 [top 3 out of 13 results shown]
- Regressed Stream IR dispatch count (# of cmd.dispatch ops) 🚩
Force-pushed 6cae1e9 to 5c9dac6
Approving, but the dispatch regression of 330 -> 380 needs investigation before landing.
(Four resolved review threads on compiler/src/iree/compiler/Dialect/Flow/Transforms/SinkReshapes.cpp, now outdated.)
Have been trying to repro it. Can't repro locally.
I think the regression can be seen here, in the IR after the change:

```mlir
%expanded_181 = tensor.expand_shape %43 [[0, 1], [2]] output_shape [1, 4, 768] : tensor<4x768xf32> into tensor<1x4x768xf32>
%expanded_182 = tensor.expand_shape %46 [[0, 1]] output_shape [1, 4] : tensor<4xf32> into tensor<1x4xf32>
%expanded_183 = tensor.expand_shape %45 [[0, 1]] output_shape [1, 4] : tensor<4xf32> into tensor<1x4xf32>
%47 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1)>,
                     affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d0, d1)>,
                     affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
    iterator_types = ["parallel", "parallel", "parallel"]}
    ins(%expanded_181, %expanded_182, %cst_60, %cst_59, %expanded_183
        : tensor<1x4x768xf32>, tensor<1x4xf32>, tensor<768xf32>, tensor<768xf32>, tensor<1x4xf32>)
    outs(%7 : tensor<1x4x768xf32>) {
  ^bb0(%in: f32, %in_584: f32, %in_585: f32, %in_586: f32, %in_587: f32, %out: f32):
    %355 = arith.divf %in_584, %cst_9 : f32
    %356 = arith.addf %355, %cst_8 : f32
    %357 = math.rsqrt %356 : f32
    %358 = arith.mulf %357, %in_585 : f32
    %359 = arith.mulf %in_587, %358 : f32
    %360 = arith.subf %in_586, %359 : f32
    %361 = arith.mulf %in, %358 : f32
    %362 = arith.addf %361, %360 : f32
    linalg.yield %362 : f32
} -> tensor<1x4x768xf32>
%collapsed_184 = tensor.collapse_shape %47 [[0, 1], [2]] : tensor<1x4x768xf32> into tensor<4x768xf32>
```

It should be possible to sink the expanded dims and fold with the collapse shape, but since each input indexing map is a projection the […]
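For illustration, here is a hedged sketch (hypothetical IR, not from the PR) of what the folded form could look like if the expand_shapes were sunk and merged with the collapse_shape: the generic then runs directly on the unexpanded operands over a 2-d iteration domain, with the unit dim dropped from every indexing map. `%out2d`, `%folded`, and the `%t*` names are assumptions; `%cst_8`/`%cst_9` are the same scalar constants as in the snippet above.

```mlir
// Hypothetical folded form: dims [0, 1] of the 3-d domain collapse into one,
// so each indexing map loses the leading unit dimension.
%out2d = tensor.empty() : tensor<4x768xf32>  // assumed init, standing in for a collapsed %7
%folded = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>,
                     affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d1)>,
                     affine_map<(d0, d1) -> (d0)>,
                     affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]}
    ins(%43, %46, %cst_60, %cst_59, %45
        : tensor<4x768xf32>, tensor<4xf32>, tensor<768xf32>, tensor<768xf32>, tensor<4xf32>)
    outs(%out2d : tensor<4x768xf32>) {
  ^bb0(%in: f32, %in_584: f32, %in_585: f32, %in_586: f32, %in_587: f32, %out: f32):
    %t0 = arith.divf %in_584, %cst_9 : f32
    %t1 = arith.addf %t0, %cst_8 : f32
    %t2 = math.rsqrt %t1 : f32
    %t3 = arith.mulf %t2, %in_585 : f32
    %t4 = arith.mulf %in_587, %t3 : f32
    %t5 = arith.subf %in_586, %t4 : f32
    %t6 = arith.mulf %in, %t3 : f32
    %t7 = arith.addf %t6, %t5 : f32
    linalg.yield %t7 : f32
} -> tensor<4x768xf32>
```

In this form no reshape separates the generic from its producers or its consumer, so it could land in a single dispatch instead of leaving the collapse_shape as a barrier.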
Thanks Ian... let me look into this a little bit more.
Force-pushed 41ed0e1 to 9bb15c6
Seems mostly okay to me, but I'm not sure about some of the logic. It seems like there could be some edge cases that won't be handled right.
(Three resolved review threads on compiler/src/iree/compiler/Dialect/Flow/Transforms/SinkReshapes.cpp, now outdated.)
While deciding if a reshape needs "sinking", for a `tensor.expand_shape` -> `linalg.*` sequence, the first check was whether the `linalg.*` operation could already fuse with one of its existing producers. That check was overly aggressive: the fusion only kicks in when the iteration domains match. Eventually the actual dispatch formation logic needs to be commoned into a single place to do this better, but that is kicked to a follow-up.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
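As a hedged illustration (hypothetical IR, not from the PR) of the case the old check got wrong: below, the consumer does have an existing producer `%p`, but `%p` iterates over a (4, 768) domain while the consumer iterates over (1, 4, 768). Because the domains do not match, dispatch formation will not actually fuse them, so declining to sink the `tensor.expand_shape` on that basis just strands the reshape between two dispatches.

```mlir
func.func @domain_mismatch(%a: tensor<4x768xf32>, %b: tensor<4x768xf32>)
    -> tensor<1x4x768xf32> {
  %init2d = tensor.empty() : tensor<4x768xf32>
  // Existing producer of the consumer below; iteration domain (4, 768).
  %p = linalg.abs ins(%a : tensor<4x768xf32>)
         outs(%init2d : tensor<4x768xf32>) -> tensor<4x768xf32>
  // Candidate reshape to sink past the consumer.
  %e = tensor.expand_shape %b [[0, 1], [2]] output_shape [1, 4, 768]
         : tensor<4x768xf32> into tensor<1x4x768xf32>
  %init3d = tensor.empty() : tensor<1x4x768xf32>
  // Consumer; iteration domain (1, 4, 768) != (4, 768), so it will not fuse
  // with %p even though both ops are elementwise.
  %c = linalg.generic {
      indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
                       affine_map<(d0, d1, d2) -> (d1, d2)>,
                       affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
      iterator_types = ["parallel", "parallel", "parallel"]}
      ins(%e, %p : tensor<1x4x768xf32>, tensor<4x768xf32>)
      outs(%init3d : tensor<1x4x768xf32>) {
    ^bb0(%in0: f32, %in1: f32, %out: f32):
      %add = arith.addf %in0, %in1 : f32
      linalg.yield %add : f32
  } -> tensor<1x4x768xf32>
  return %c : tensor<1x4x768xf32>
}
```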
Force-pushed 9bb15c6 to 253aecc
Force-pushed b808b9b to f8163b2
Ok, folks. I am landing this with the one regression in the number of dispatches. I am checking some out-of-tree models and this seems to do better.
While deciding if a reshape needs "sinking", for a `tensor.expand_shape` -> `linalg.*` sequence, the first check was whether the `linalg.*` operation could already fuse with one of its existing producers. That check was overly aggressive: the fusion only kicks in when the iteration domains match. Eventually the actual dispatch formation logic needs to be commoned into a single place to do this better, but that is kicked to a follow-up.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Signed-off-by: Lubo Litchev <lubol@google.com>