[CPU] The number of dispatches regresses with folding pad op unit dims #16835

Closed
hanhanW opened this issue Mar 19, 2024 · 1 comment · Fixed by #16930

Labels: codegen (Shared code generation infrastructure and dialects)

hanhanW commented Mar 19, 2024

This is caused by llvm/llvm-project@60e562d. That change bubbles up additional tensor.expand_shape ops, which become a fusion barrier between the unpack op and the generic op. We do not see any reshape ops after outlining dispatches because we run CollapseDims after FormDispatchRegion; dispatch formation creates the snippet below and the reshapes are then folded away.

  %75 = flow.dispatch.region -> (tensor<784x96xf32>) {
    %unpack = tensor.unpack %74 outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %72 : tensor<49x6x16x16xf32> -> tensor<784x96xf32>
    flow.return %unpack : tensor<784x96xf32>
  }
  %expanded_100 = tensor.expand_shape %75 [[0, 1], [2]] : tensor<784x96xf32> into tensor<28x28x96xf32>
  %76 = tensor.empty() : tensor<28x28x96xf32>
  %collapsed_101 = tensor.collapse_shape %expanded_100 [[0, 1], [2]] : tensor<28x28x96xf32> into tensor<784x96xf32>
  %77 = flow.dispatch.region -> (tensor<784x96xf32>) {
    %cst_253 = arith.constant 0.166666672 : f32
    %cst_254 = arith.constant 0.000000e+00 : f32
    %cst_255 = arith.constant 6.000000e+00 : f32
    %cst_256 = arith.constant 3.000000e+00 : f32
    %330 = tensor.empty() : tensor<784x96xf32>
    %331 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%collapsed_101, %cst_36 : tensor<784x96xf32>, tensor<96xf32>) outs(%330 : tensor<784x96xf32>) {
    ^bb0(%in: f32, %in_258: f32, %out: f32):
      %332 = arith.addf %in, %in_258 : f32
      %333 = arith.addf %332, %cst_256 : f32
      %334 = arith.minimumf %333, %cst_255 : f32
      %335 = arith.maximumf %334, %cst_254 : f32
      %336 = arith.mulf %332, %335 : f32
      %337 = arith.mulf %336, %cst_253 : f32
      linalg.yield %337 : f32
    } -> tensor<784x96xf32>
    %expanded_257 = tensor.expand_shape %331 [[0, 1], [2]] : tensor<784x96xf32> into tensor<28x28x96xf32>
    flow.return %331 : tensor<784x96xf32>
  }
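
Note that %expanded_100 and %collapsed_101 above are exact inverses, so they fold to %75 once nothing separates them; at that point the unpack and the elementwise generic can land in a single dispatch. A rough sketch of the desired IR (the SSA names and the elided generic body are illustrative, not from the actual dump):

  %75 = flow.dispatch.region -> (tensor<784x96xf32>) {
    %unpack = tensor.unpack %74 outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %72 : tensor<49x6x16x16xf32> -> tensor<784x96xf32>
    %init = tensor.empty() : tensor<784x96xf32>
    %fused = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%unpack, %cst_36 : tensor<784x96xf32>, tensor<96xf32>) outs(%init : tensor<784x96xf32>) {
      // ... same body as above: add the bias, then the clamp-and-scale sequence ...
    } -> tensor<784x96xf32>
    flow.return %fused : tensor<784x96xf32>
  }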

There are a couple of potential solutions:

  1. Add propagation of pack/unpack across reshapes to fusion on tensors; that is also where the reshape bubbling happens.
  2. Move the collapse-dims pass around so that we get a chance to fold the reshape ops away.

UPDATE: (1) does not work because it would reshape the packed dimensions.
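
To see why (1) would reshape the packed dimensions, use the shapes from the snippet: bubbling the expand above the unpack would need the 784 rows (tiled as 49 outer x 16 inner) to split into 28 x 28, and 28 is not a multiple of the inner tile size 16. A hypothetical illustration:

  // Hypothetical IR if the expand_shape were bubbled above the unpack:
  //   %e = tensor.expand_shape %74 ... : tensor<49x6x16x16xf32> into ...
  //   %u = tensor.unpack %e ... inner_tiles = [16, 16] : ... -> tensor<28x28x96xf32>
  // Splitting 784 rows (49 tiles of 16) into 28 x 28 cuts across the
  // 16-row inner tiles, so the expand would have to reshape the packed
  // tile dimensions themselves, which the propagation patterns cannot do.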

hanhanW added the codegen label Mar 19, 2024
hanhanW self-assigned this Mar 19, 2024

hanhanW commented Mar 19, 2024

#16753

hanhanW added a commit that referenced this issue Apr 2, 2024: …ion (#16930)

It reduces the number of dispatches by up to 20% in the benchmark suite.

Fixes #16835