Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CollapseInsertionChains doesn't handle linear polynomial #578

Closed
j2kun opened this issue Mar 29, 2024 · 2 comments · Fixed by #580
Closed

CollapseInsertionChains doesn't handle linear polynomial #578

j2kun opened this issue Mar 29, 2024 · 2 comments · Fixed by #580
Assignees
Labels
bug Something isn't working dialect: tensor_ext Issues concerning the tensor_ext dialect

Comments

@j2kun
Copy link
Collaborator

j2kun commented Mar 29, 2024

Using the linear_polynomial test from #575

bazel run --noallow_analysis_cache_discard //tools:heir-opt -- --secretize=entry-function=linear_polynomial --wrap-generic --canonicalize --cse --heir-simd-vectorizer $PWD/tests/heir_simd_vectorizer/linear_polynomial_64.mlir

Leaves in a sequence of extract/insert chains

    %0 = secret.generic ins(%arg0, %arg1, %arg2, %arg3 : !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>) {
    ^bb0(%arg4: tensor<64xi16>, %arg5: tensor<64xi16>, %arg6: tensor<64xi16>, %arg7: tensor<64xi16>):
      %1 = arith.muli %arg4, %arg6 : tensor<64xi16>
      %2 = arith.subi %1, %arg7 : tensor<64xi16>
      %3 = arith.subi %2, %arg5 : tensor<64xi16>
      %extracted = tensor.extract %3[%c0] : tensor<64xi16>
      %inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16>
      %4 = arith.subi %arg7, %1 : tensor<64xi16>
      %5 = arith.subi %4, %arg5 : tensor<64xi16>
      %extracted_0 = tensor.extract %5[%c1] : tensor<64xi16>
      %inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<64xi16>
      %extracted_2 = tensor.extract %5[%c2] : tensor<64xi16>
      %inserted_3 = tensor.insert %extracted_2 into %inserted_1[%c2] : tensor<64xi16>
      %extracted_4 = tensor.extract %5[%c3] : tensor<64xi16>
      %inserted_5 = tensor.insert %extracted_4 into %inserted_3[%c3] : tensor<64xi16>
      ...

Should be converted to a single rotation.

@j2kun j2kun self-assigned this Mar 29, 2024
@j2kun j2kun added bug Something isn't working dialect: tensor_ext Issues concerning the tensor_ext dialect labels Mar 29, 2024
@j2kun
Copy link
Collaborator Author

j2kun commented Mar 29, 2024

Looks like this is the problem: the item inserted into index zero comes from a different tensor than the rest of the insertions.

%3 = arith.subi %2, %arg5 : tensor<64xi16>
%extracted = tensor.extract %3[%c0] : tensor<64xi16>   // <--- extracts from %3
%inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16>
%4 = arith.subi %arg7, %1 : tensor<64xi16>
%5 = arith.subi %4, %arg5 : tensor<64xi16>
%extracted_0 = tensor.extract %5[%c1] : tensor<64xi16>   // <--- extracts from %5
%inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<64xi16>
...

@j2kun
Copy link
Collaborator Author

j2kun commented Mar 29, 2024

It looks like during insert-rotate something causes the order of the operands to the very first sub op to swap place

%1 = arith.muli %arg4, %arg6 : tensor<64xi16>
%2 = arith.subi %1, %arg7 : tensor<64xi16>  // arg7 should come first!
%3 = arith.subi %2, %arg5 : tensor<64xi16>
%extracted = tensor.extract %3[%c0] : tensor<64xi16>
%inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16>
%4 = arith.muli %arg4, %arg6 : tensor<64xi16>
%5 = arith.subi %arg7, %4 : tensor<64xi16>
%6 = arith.subi %5, %arg5 : tensor<64xi16>

Compare this to unrolled, but before insert-rotate:

%extracted = tensor.extract %arg4[%c0] : tensor<64xi16>
%extracted_0 = tensor.extract %arg5[%c0] : tensor<64xi16>
%extracted_1 = tensor.extract %arg6[%c0] : tensor<64xi16>
%extracted_2 = tensor.extract %arg7[%c0] : tensor<64xi16>
%1 = arith.muli %extracted, %extracted_1 : i16
%2 = arith.subi %extracted_2, %1 : i16   // extracted_2 comes first
%3 = arith.subi %2, %extracted_0 : i16
%inserted = tensor.insert %3 into %arg7[%c0] : tensor<64xi16>
%4 = affine.apply #map(%c0)
%extracted_3 = tensor.extract %arg4[%4] : tensor<64xi16>
%extracted_4 = tensor.extract %arg5[%4] : tensor<64xi16>
%extracted_5 = tensor.extract %arg6[%4] : tensor<64xi16>
%extracted_6 = tensor.extract %arg7[%4] : tensor<64xi16>
%5 = arith.muli %extracted_3, %extracted_5 : i16
%6 = arith.subi %extracted_6, %5 : i16   // extracted_6 comes first
%7 = arith.subi %6, %extracted_4 : i16
%inserted_7 = tensor.insert %7 into %inserted[%4] : tensor<64xi16>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dialect: tensor_ext Issues concerning the tensor_ext dialect
Projects
None yet
1 participant