[mlir][linalg] Add pattern to clean unused results after fusion #158627
@@ -1079,4 +1079,49 @@ module {
// CHECK-NOT: linalg.generic
// CHECK: tensor.expand_shape
// CHECK: linalg.generic {{.*}}, iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "parallel", "reduction"]}
// CHECK-SAME: ins(%[[ARG0]], %[[FUSED]]#1 : tensor<1x1x2x1xf32>, tensor<4x1x1x1xf32>)

// -----

// CHECK-LABEL: @drop_unused_results
// CHECK-SAME: [[ARG0:%[a-zA-Z0-9]+]]: tensor<64xf32>, [[ARG1:%[a-zA-Z0-9]+]]: tensor<1x56x56x64xf32>
func.func @drop_unused_results(%arg0: tensor<64xf32>, %arg1: tensor<1x56x56x64xf32>) -> tensor<1x56x56x64xf32> {
  %cst = arith.constant 3.40282347E+38 : f32
  %cst_0 = arith.constant 0.000000e+00 : f32
  // CHECK: [[OUT:%[a-zA-Z0-9]+]] = tensor.empty() : tensor<1x56x56x64xf32>
  %0 = tensor.empty() : tensor<1x56x56x64xf32>
  // CHECK: [[RES:%[0-9]+]] = linalg.generic {{.*}} ins([[ARG0]], [[ARG1]] : tensor<64xf32>, tensor<1x56x56x64xf32>) outs([[OUT]] : tensor<1x56x56x64xf32>)
  %1:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg0 : tensor<64xf32>) outs(%arg1, %0 : tensor<1x56x56x64xf32>, tensor<1x56x56x64xf32>) {
[Inline review thread on the linalg.generic above]

Comment 1: This input is, in theory, wrong. I understand your pattern makes the semantics of the operation "right", but for an operation with all "parallel" iterator types, the values of the `outs` operands should not be read in the body. We could make this explicitly a verifier error as well.

Comment 2: Well, having said that, there is a pattern…

Comment 3: [Taking over from Pavel, whose internship is now finished.] Ack. This input was produced by some tensor fusion pattern, but I need to investigate whether it is an upstream pattern or not. I've put this PR in draft for the time being while I check whether an upstream pattern caused this invalid IR. All I know is that we seem to call populateMoveInitOperandsToInput implicitly via LinalgFoldUnitExtentDimsPass, and when the pattern added by this patch is removed we get worse code generation. I'll update once we've found the root cause. Thanks for the review so far!

Comment 4: @MaheshRavishankar, where is the documentation that this IR is invalid? I couldn't find anything in the online Linalg dialect documentation about the `outs` operands not being readable for parallel-only iterator maps. Is there a verifier that checks that?
  ^bb0(%in: f32, %out: f32, %out_1: f32):
    %2 = arith.addf %in, %out : f32
    %3 = arith.minimumf %2, %cst : f32
    %4 = arith.maximumf %3, %cst_0 : f32
    linalg.yield %2, %4 : f32, f32
  } -> (tensor<1x56x56x64xf32>, tensor<1x56x56x64xf32>)
  // CHECK: -> tensor<1x56x56x64xf32>
  // CHECK: return [[RES]] : tensor<1x56x56x64xf32>
  return %1#1 : tensor<1x56x56x64xf32>
}
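For reference, a minimal sketch of IR that would satisfy the CHECK lines of this test after the rewrite. Only the ins/outs operand lists, the single result, and the returned value are actually checked by the test; the payload, SSA names, and block-argument names below are illustrative assumptions. Since the init of the dropped result, %arg1, is still read in the payload, it is shown moved to the ins list:

  // Sketch only (not part of the test): one plausible post-rewrite form.
  %res = linalg.generic {
      indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d3)>,
                       affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>,
                       affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>],
      iterator_types = ["parallel", "parallel", "parallel", "parallel"]}
      ins(%arg0, %arg1 : tensor<64xf32>, tensor<1x56x56x64xf32>)
      outs(%0 : tensor<1x56x56x64xf32>) {
    ^bb0(%in: f32, %in_1: f32, %out: f32):
      // %in_1 carries the value formerly read through the dropped out %arg1.
      %a = arith.addf %in, %in_1 : f32
      %b = arith.minimumf %a, %cst : f32
      %c = arith.maximumf %b, %cst_0 : f32
      linalg.yield %c : f32
  } -> tensor<1x56x56x64xf32>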

// -----

// CHECK-LABEL: @swap_drop_unused_results
// CHECK-SAME: [[ARG0:%[a-zA-Z0-9]+]]: tensor<64xf32>, [[ARG1:%[a-zA-Z0-9]+]]: tensor<1x56x56x64xf32>
func.func @swap_drop_unused_results(%arg0: tensor<64xf32>, %arg1: tensor<1x56x56x64xf32>) -> tensor<1x56x56x64xf32> {
  %cst = arith.constant 3.40282347E+38 : f32
  %cst_0 = arith.constant 0.000000e+00 : f32
  // CHECK: [[OUT:%[a-zA-Z0-9]+]] = tensor.empty() : tensor<1x56x56x64xf32>
  %0 = tensor.empty() : tensor<1x56x56x64xf32>
  // CHECK: [[RES:%[0-9]+]] = linalg.generic {{.*}} ins([[ARG0]] : tensor<64xf32>) outs([[OUT]] : tensor<1x56x56x64xf32>)
  %1:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg0 : tensor<64xf32>) outs(%arg1, %0 : tensor<1x56x56x64xf32>, tensor<1x56x56x64xf32>) {
  ^bb0(%in: f32, %out_1: f32, %out: f32):
    %2 = arith.addf %in, %out : f32
    %3 = arith.minimumf %2, %cst : f32
    %4 = arith.maximumf %3, %cst_0 : f32
    linalg.yield %2, %4 : f32, f32
  } -> (tensor<1x56x56x64xf32>, tensor<1x56x56x64xf32>)
  // CHECK: -> tensor<1x56x56x64xf32>
  // CHECK: return [[RES]] : tensor<1x56x56x64xf32>
  return %1#0 : tensor<1x56x56x64xf32>
}
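Likewise, a sketch of IR consistent with this test's CHECK lines. Again, only the operand lists and the single result are checked; the payload below is an assumption that mirrors the computation feeding the returned result %1#0, with the minimumf/maximumf chain that fed the dropped result removed:

  // Sketch only (not part of the test): one plausible post-rewrite form.
  %res = linalg.generic {
      indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d3)>,
                       affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>],
      iterator_types = ["parallel", "parallel", "parallel", "parallel"]}
      ins(%arg0 : tensor<64xf32>)
      outs(%0 : tensor<1x56x56x64xf32>) {
    ^bb0(%in: f32, %out: f32):
      // %out corresponds to the tensor.empty init, as in the original payload.
      %a = arith.addf %in, %out : f32
      linalg.yield %a : f32
  } -> tensor<1x56x56x64xf32>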