[CPU] The number of dispatches regresses with folding pad op unit dims #16835

Closed
hanhanW opened this issue Mar 19, 2024 · 1 comment · Fixed by #16930

Labels: codegen (Shared code generation infrastructure and dialects)

hanhanW commented Mar 19, 2024

This is caused by llvm/llvm-project@60e562d. That change bubbles up additional tensor.expand_shape ops, which become a fusion barrier between the unpack op and the generic op. We do not see any reshape ops after outlining dispatches because we run CollapseDims after FormDispatchRegion; dispatch formation creates the snippet below and the reshapes are then folded away.

  %75 = flow.dispatch.region -> (tensor<784x96xf32>) {
    %unpack = tensor.unpack %74 outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %72 : tensor<49x6x16x16xf32> -> tensor<784x96xf32>
    flow.return %unpack : tensor<784x96xf32>
  }
  %expanded_100 = tensor.expand_shape %75 [[0, 1], [2]] : tensor<784x96xf32> into tensor<28x28x96xf32>
  %76 = tensor.empty() : tensor<28x28x96xf32>
  %collapsed_101 = tensor.collapse_shape %expanded_100 [[0, 1], [2]] : tensor<28x28x96xf32> into tensor<784x96xf32>
  %77 = flow.dispatch.region -> (tensor<784x96xf32>) {
    %cst_253 = arith.constant 0.166666672 : f32
    %cst_254 = arith.constant 0.000000e+00 : f32
    %cst_255 = arith.constant 6.000000e+00 : f32
    %cst_256 = arith.constant 3.000000e+00 : f32
    %330 = tensor.empty() : tensor<784x96xf32>
    %331 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%collapsed_101, %cst_36 : tensor<784x96xf32>, tensor<96xf32>) outs(%330 : tensor<784x96xf32>) {
    ^bb0(%in: f32, %in_258: f32, %out: f32):
      %332 = arith.addf %in, %in_258 : f32
      %333 = arith.addf %332, %cst_256 : f32
      %334 = arith.minimumf %333, %cst_255 : f32
      %335 = arith.maximumf %334, %cst_254 : f32
      %336 = arith.mulf %332, %335 : f32
      %337 = arith.mulf %336, %cst_253 : f32
      linalg.yield %337 : f32
    } -> tensor<784x96xf32>
    %expanded_257 = tensor.expand_shape %331 [[0, 1], [2]] : tensor<784x96xf32> into tensor<28x28x96xf32>
    flow.return %331 : tensor<784x96xf32>
  }
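
Note that %expanded_100 and %collapsed_101 above are exact inverses, so they fold to %75 once nothing separates them; at that point the unpack and the elementwise generic can land in a single dispatch. A rough sketch of the desired IR (the SSA names and the elided generic body are illustrative, not from the actual dump):

  %75 = flow.dispatch.region -> (tensor<784x96xf32>) {
    %unpack = tensor.unpack %74 outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %72 : tensor<49x6x16x16xf32> -> tensor<784x96xf32>
    %init = tensor.empty() : tensor<784x96xf32>
    %fused = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%unpack, %cst_36 : tensor<784x96xf32>, tensor<96xf32>) outs(%init : tensor<784x96xf32>) {
      // ... same body as above: add the bias, then the clamp-and-scale sequence ...
    } -> tensor<784x96xf32>
    flow.return %fused : tensor<784x96xf32>
  }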

There are a couple of potential solutions:

  1. Add propagation of pack/unpack across reshapes to fusion on tensors; that is also where the reshape bubbling happens.
  2. Move the collapse-dims pass around so that we get a chance to fold the reshape ops away.

UPDATE: (1) does not work because it would reshape the packed dimensions.
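
To see why (1) would reshape the packed dimensions, use the shapes from the snippet: bubbling the expand above the unpack would need the 784 rows (tiled as 49 outer x 16 inner) to split into 28 x 28, and 28 is not a multiple of the inner tile size 16. A hypothetical illustration:

  // Hypothetical IR if the expand_shape were bubbled above the unpack:
  //   %e = tensor.expand_shape %74 ... : tensor<49x6x16x16xf32> into ...
  //   %u = tensor.unpack %e ... inner_tiles = [16, 16] : ... -> tensor<28x28x96xf32>
  // Splitting 784 rows (49 tiles of 16) into 28 x 28 cuts across the
  // 16-row inner tiles, so the expand would have to reshape the packed
  // tile dimensions themselves, which the propagation patterns cannot do.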

hanhanW added the codegen label Mar 19, 2024
hanhanW self-assigned this Mar 19, 2024

hanhanW commented Mar 19, 2024

#16753

hanhanW added a commit that referenced this issue Apr 2, 2024: …ion (#16930)

It reduces the number of dispatches by up to 20% in the benchmark suite.

Fixes #16835