CollapseInsertionChains doesn't handle linear polynomial #578

j2kun · 2024-03-29T18:36:57Z

Using the linear_polynomial test from #575

bazel run --noallow_analysis_cache_discard //tools:heir-opt -- --secretize=entry-function=linear_polynomial --wrap-generic --canonicalize --cse --heir-simd-vectorizer $PWD/tests/heir_simd_vectorizer/linear_polynomial_64.mlir

Leaves in a sequence of extract/insert chains

    %0 = secret.generic ins(%arg0, %arg1, %arg2, %arg3 : !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>) {
    ^bb0(%arg4: tensor<64xi16>, %arg5: tensor<64xi16>, %arg6: tensor<64xi16>, %arg7: tensor<64xi16>):
      %1 = arith.muli %arg4, %arg6 : tensor<64xi16>
      %2 = arith.subi %1, %arg7 : tensor<64xi16>
      %3 = arith.subi %2, %arg5 : tensor<64xi16>
      %extracted = tensor.extract %3[%c0] : tensor<64xi16>
      %inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16>
      %4 = arith.subi %arg7, %1 : tensor<64xi16>
      %5 = arith.subi %4, %arg5 : tensor<64xi16>
      %extracted_0 = tensor.extract %5[%c1] : tensor<64xi16>
      %inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<64xi16>
      %extracted_2 = tensor.extract %5[%c2] : tensor<64xi16>
      %inserted_3 = tensor.insert %extracted_2 into %inserted_1[%c2] : tensor<64xi16>
      %extracted_4 = tensor.extract %5[%c3] : tensor<64xi16>
      %inserted_5 = tensor.insert %extracted_4 into %inserted_3[%c3] : tensor<64xi16>
      ...

Should be converted to a single rotation.

The text was updated successfully, but these errors were encountered:

j2kun · 2024-03-29T19:01:28Z

Looks like this is the problem: the item inserted into index zero comes from a different tensor than the rest of the insertions.

%3 = arith.subi %2, %arg5 : tensor<64xi16>
%extracted = tensor.extract %3[%c0] : tensor<64xi16>   // <--- extracts from %3
%inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16>
%4 = arith.subi %arg7, %1 : tensor<64xi16>
%5 = arith.subi %4, %arg5 : tensor<64xi16>
%extracted_0 = tensor.extract %5[%c1] : tensor<64xi16>   // <--- extracts from %5
%inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<64xi16>
...

j2kun · 2024-03-29T19:08:40Z

It looks like during insert-rotate something causes the order of the operands to the very first sub op to swap place

%1 = arith.muli %arg4, %arg6 : tensor<64xi16>
%2 = arith.subi %1, %arg7 : tensor<64xi16>  // arg7 should come first!
%3 = arith.subi %2, %arg5 : tensor<64xi16>
%extracted = tensor.extract %3[%c0] : tensor<64xi16>
%inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16>
%4 = arith.muli %arg4, %arg6 : tensor<64xi16>
%5 = arith.subi %arg7, %4 : tensor<64xi16>
%6 = arith.subi %5, %arg5 : tensor<64xi16>

Compare this to unrolled, but before insert-rotate:

%extracted = tensor.extract %arg4[%c0] : tensor<64xi16>
%extracted_0 = tensor.extract %arg5[%c0] : tensor<64xi16>
%extracted_1 = tensor.extract %arg6[%c0] : tensor<64xi16>
%extracted_2 = tensor.extract %arg7[%c0] : tensor<64xi16>
%1 = arith.muli %extracted, %extracted_1 : i16
%2 = arith.subi %extracted_2, %1 : i16   // extracted_2 comes first
%3 = arith.subi %2, %extracted_0 : i16
%inserted = tensor.insert %3 into %arg7[%c0] : tensor<64xi16>
%4 = affine.apply #map(%c0)
%extracted_3 = tensor.extract %arg4[%4] : tensor<64xi16>
%extracted_4 = tensor.extract %arg5[%4] : tensor<64xi16>
%extracted_5 = tensor.extract %arg6[%4] : tensor<64xi16>
%extracted_6 = tensor.extract %arg7[%4] : tensor<64xi16>
%5 = arith.muli %extracted_3, %extracted_5 : i16
%6 = arith.subi %extracted_6, %5 : i16   // extracted_6 comes first
%7 = arith.subi %6, %extracted_4 : i16
%inserted_7 = tensor.insert %7 into %inserted[%4] : tensor<64xi16>

j2kun self-assigned this Mar 29, 2024

j2kun added bug Something isn't working dialect: tensor_ext Issues concerning the tensor_ext dialect labels Mar 29, 2024

This was referenced Mar 29, 2024

Determine if nested AlignRotations patterns are still necessary #579

Open

Fix bad operand ordering causing collapse-insertion-chains to fail on examples with arith.subi #580

Merged

copybara-service bot closed this as completed in #580 Apr 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CollapseInsertionChains doesn't handle linear polynomial #578

CollapseInsertionChains doesn't handle linear polynomial #578

j2kun commented Mar 29, 2024

j2kun commented Mar 29, 2024

j2kun commented Mar 29, 2024 •

edited

Loading

CollapseInsertionChains doesn't handle linear polynomial #578

CollapseInsertionChains doesn't handle linear polynomial #578

Comments

j2kun commented Mar 29, 2024

j2kun commented Mar 29, 2024

j2kun commented Mar 29, 2024 • edited Loading

j2kun commented Mar 29, 2024 •

edited

Loading