Demo elementwise lowering of tensor<poly> to loops #504

j2kun · 2024-03-07T20:45:45Z

Adds the MemRefElementTypeInterface to the polynomial type, and demonstrates the lowering to affine loops

bazel run //tools:heir-opt -- --convert-elementwise-to-linalg --one-shot-bufferize --convert-linalg-to-affine-loops $PWD/tests/polynomial/elementwise.mlir

The strange part is that lowering linalg to loops requires de-tensorizing the inputs to linalg.generic, which inserts bufferization.to_memref and bufferization.to_tensor as materializations. To completely eliminate those requires converting everything to memrefs, see https://github.com/google/heir/blob/main/tools/heir-opt.cpp#L85-L99

I tried running --polynomial-to-standard after this, but ran into another issue

/home/j2kun/fhe/heir/tests/polynomial/elementwise.mlir:8:8: 
error: failed to legalize unresolved materialization from 
'!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>' 
to 'tensor<1024xi25>' that remained live after conversion
  %0 = polynomial.add(%arg0, %arg1) : tensor<2x!poly>
       ^
/home/j2kun/fhe/heir/tests/polynomial/elementwise.mlir:8:8: 
note: see current operation: %7 = "builtin.unrealized_conversion_cast"(%6) :
 (!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>) -> tensor<1024xi25>
/home/j2kun/fhe/heir/tests/polynomial/elementwise.mlir:8:8: 
note: see existing live user here: %8 = arith.extsi %6 : tensor<1024xi25> to tensor<1024xi26>

AlexanderViand-Intel · 2024-03-07T23:02:03Z

The strange part is that lowering linalg to loops requires de-tensorizing the inputs to linalg.generic, which inserts bufferization.to_memref and bufferization.to_tensor as materializations. To completely eliminate those requires converting everything to memrefs, see https://github.com/google/heir/blob/main/tools/heir-opt.cpp#L85-L99

That's a bit unfortunate, but it's great to see that there is a way after all ❤️

AlexanderViand-Intel · 2024-03-08T01:12:35Z

include/Dialect/Polynomial/IR/PolynomialTypes.td


 // A base class for all types in this dialect
-class Polynomial_Type<string name, string typeMnemonic>
-    : TypeDef<Polynomial_Dialect, name> {
+class Polynomial_Type<string name, string typeMnemonic, list<Trait> traits = []>


Adding a pass-through list of traits is, imho, a good pattern to follow for any tablegen type class we define, so we should consider just adding this to all of them by default.

AlexanderViand-Intel

👍

For reference, this is how far we get after this when using the tensor-conversion (and unrealized conversion cast, for debugging) augmented -poynomial-to-standard (c.f. #143):

func.func @test_bin_ops(%arg0: tensor<2x1024xi25>, %arg1: tensor<2x1024xi25>) -> tensor<2x1024xi25> {
  %0 = builtin.unrealized_conversion_cast %arg1 : tensor<2x1024xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %1 = builtin.unrealized_conversion_cast %arg0 : tensor<2x1024xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %2 = bufferization.to_memref %0 : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
  %3 = bufferization.to_memref %1 : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  affine.for %arg2 = 0 to 2 {
    %6 = affine.load %3[%arg2] : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
    %7 = builtin.unrealized_conversion_cast %6 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %8 = affine.load %2[%arg2] : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
    %9 = builtin.unrealized_conversion_cast %8 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %cst = arith.constant dense<33538049> : tensor<1024xi26>
    %10 = arith.extsi %7 : tensor<1024xi25> to tensor<1024xi26>
    %11 = arith.extsi %9 : tensor<1024xi25> to tensor<1024xi26>
    %12 = arith.addi %10, %11 : tensor<1024xi26>
    %13 = arith.remsi %12, %cst : tensor<1024xi26>
    %14 = arith.trunci %13 : tensor<1024xi26> to tensor<1024xi25>
    %15 = builtin.unrealized_conversion_cast %14 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
    affine.store %15, %alloc[%arg2] : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  }
  %4 = bufferization.to_tensor %alloc : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %5 = builtin.unrealized_conversion_cast %4 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> to tensor<2x1024xi25>
  return %5 : tensor<2x1024xi25>
}

Again, the issue seems to be that operations from affine and memref aren't marked as (dynamically) legal, so the type converter is never called on them (I think this might also be the root cause for #505?)

Demo elementwise lowering of tensor<poly> to loops

5757187

j2kun requested a review from AlexanderViand-Intel March 7, 2024 20:45

j2kun mentioned this pull request Mar 7, 2024

PolyToStandard: handling tensors of poly? #143

Closed

j2kun requested a review from asraa March 8, 2024 00:29

AlexanderViand-Intel reviewed Mar 8, 2024

View reviewed changes

AlexanderViand-Intel approved these changes Mar 8, 2024

View reviewed changes

j2kun added the pull_ready Indicates whether a PR is ready to pull. The copybara worker will import for internal testing label Mar 8, 2024

copybara-service bot merged commit 322754e into google:main Mar 8, 2024
8 of 9 checks passed

AlexanderViand-Intel mentioned this pull request Mar 9, 2024

Add support for tensors in -polynomial-to-standard #508

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Demo elementwise lowering of tensor<poly> to loops #504

Demo elementwise lowering of tensor<poly> to loops #504

j2kun commented Mar 7, 2024

AlexanderViand-Intel commented Mar 7, 2024

AlexanderViand-Intel Mar 8, 2024

AlexanderViand-Intel left a comment

Demo elementwise lowering of tensor<poly> to loops #504

Demo elementwise lowering of tensor<poly> to loops #504

Conversation

j2kun commented Mar 7, 2024

AlexanderViand-Intel commented Mar 7, 2024

AlexanderViand-Intel Mar 8, 2024

Choose a reason for hiding this comment

AlexanderViand-Intel left a comment

Choose a reason for hiding this comment