We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using the linear_polynomial test from #575
linear_polynomial
bazel run --noallow_analysis_cache_discard //tools:heir-opt -- --secretize=entry-function=linear_polynomial --wrap-generic --canonicalize --cse --heir-simd-vectorizer $PWD/tests/heir_simd_vectorizer/linear_polynomial_64.mlir
Leaves in a sequence of extract/insert chains
%0 = secret.generic ins(%arg0, %arg1, %arg2, %arg3 : !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>, !secret.secret<tensor<64xi16>>) { ^bb0(%arg4: tensor<64xi16>, %arg5: tensor<64xi16>, %arg6: tensor<64xi16>, %arg7: tensor<64xi16>): %1 = arith.muli %arg4, %arg6 : tensor<64xi16> %2 = arith.subi %1, %arg7 : tensor<64xi16> %3 = arith.subi %2, %arg5 : tensor<64xi16> %extracted = tensor.extract %3[%c0] : tensor<64xi16> %inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16> %4 = arith.subi %arg7, %1 : tensor<64xi16> %5 = arith.subi %4, %arg5 : tensor<64xi16> %extracted_0 = tensor.extract %5[%c1] : tensor<64xi16> %inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<64xi16> %extracted_2 = tensor.extract %5[%c2] : tensor<64xi16> %inserted_3 = tensor.insert %extracted_2 into %inserted_1[%c2] : tensor<64xi16> %extracted_4 = tensor.extract %5[%c3] : tensor<64xi16> %inserted_5 = tensor.insert %extracted_4 into %inserted_3[%c3] : tensor<64xi16> ...
Should be converted to a single rotation.
The text was updated successfully, but these errors were encountered:
Looks like this is the problem: the item inserted into index zero comes from a different tensor than the rest of the insertions.
%3 = arith.subi %2, %arg5 : tensor<64xi16> %extracted = tensor.extract %3[%c0] : tensor<64xi16> // <--- extracts from %3 %inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16> %4 = arith.subi %arg7, %1 : tensor<64xi16> %5 = arith.subi %4, %arg5 : tensor<64xi16> %extracted_0 = tensor.extract %5[%c1] : tensor<64xi16> // <--- extracts from %5 %inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<64xi16> ...
Sorry, something went wrong.
It looks like during insert-rotate something causes the order of the operands to the very first sub op to swap place
sub
%1 = arith.muli %arg4, %arg6 : tensor<64xi16> %2 = arith.subi %1, %arg7 : tensor<64xi16> // arg7 should come first! %3 = arith.subi %2, %arg5 : tensor<64xi16> %extracted = tensor.extract %3[%c0] : tensor<64xi16> %inserted = tensor.insert %extracted into %arg7[%c0] : tensor<64xi16> %4 = arith.muli %arg4, %arg6 : tensor<64xi16> %5 = arith.subi %arg7, %4 : tensor<64xi16> %6 = arith.subi %5, %arg5 : tensor<64xi16>
Compare this to unrolled, but before insert-rotate:
%extracted = tensor.extract %arg4[%c0] : tensor<64xi16> %extracted_0 = tensor.extract %arg5[%c0] : tensor<64xi16> %extracted_1 = tensor.extract %arg6[%c0] : tensor<64xi16> %extracted_2 = tensor.extract %arg7[%c0] : tensor<64xi16> %1 = arith.muli %extracted, %extracted_1 : i16 %2 = arith.subi %extracted_2, %1 : i16 // extracted_2 comes first %3 = arith.subi %2, %extracted_0 : i16 %inserted = tensor.insert %3 into %arg7[%c0] : tensor<64xi16> %4 = affine.apply #map(%c0) %extracted_3 = tensor.extract %arg4[%4] : tensor<64xi16> %extracted_4 = tensor.extract %arg5[%4] : tensor<64xi16> %extracted_5 = tensor.extract %arg6[%4] : tensor<64xi16> %extracted_6 = tensor.extract %arg7[%4] : tensor<64xi16> %5 = arith.muli %extracted_3, %extracted_5 : i16 %6 = arith.subi %extracted_6, %5 : i16 // extracted_6 comes first %7 = arith.subi %6, %extracted_4 : i16 %inserted_7 = tensor.insert %7 into %inserted[%4] : tensor<64xi16>
AlignRotations
arith.subi
j2kun
Successfully merging a pull request may close this issue.
Using the
linear_polynomial
test from #575Leaves in a sequence of extract/insert chains
Should be converted to a single rotation.
The text was updated successfully, but these errors were encountered: