Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demo elementwise lowering of tensor<poly> to loops #504

Merged
merged 1 commit into from
Mar 8, 2024

Conversation

j2kun
Copy link
Collaborator

@j2kun j2kun commented Mar 7, 2024

Adds the MemRefElementTypeInterface to the polynomial type, and demonstrates the lowering to affine loops

bazel run //tools:heir-opt -- --convert-elementwise-to-linalg --one-shot-bufferize --convert-linalg-to-affine-loops $PWD/tests/polynomial/elementwise.mlir

The strange part is that lowering linalg to loops requires de-tensorizing the inputs to linalg.generic, which inserts bufferization.to_memref and bufferization.to_tensor as materializations. To completely eliminate those requires converting everything to memrefs, see https://github.com/google/heir/blob/main/tools/heir-opt.cpp#L85-L99

I tried running --polynomial-to-standard after this, but ran into another issue

/home/j2kun/fhe/heir/tests/polynomial/elementwise.mlir:8:8: 
error: failed to legalize unresolved materialization from 
'!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>' 
to 'tensor<1024xi25>' that remained live after conversion
  %0 = polynomial.add(%arg0, %arg1) : tensor<2x!poly>
       ^
/home/j2kun/fhe/heir/tests/polynomial/elementwise.mlir:8:8: 
note: see current operation: %7 = "builtin.unrealized_conversion_cast"(%6) :
 (!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>) -> tensor<1024xi25>
/home/j2kun/fhe/heir/tests/polynomial/elementwise.mlir:8:8: 
note: see existing live user here: %8 = arith.extsi %6 : tensor<1024xi25> to tensor<1024xi26>

@AlexanderViand-Intel
Copy link
Collaborator

The strange part is that lowering linalg to loops requires de-tensorizing the inputs to linalg.generic, which inserts bufferization.to_memref and bufferization.to_tensor as materializations. To completely eliminate those requires converting everything to memrefs, see https://github.com/google/heir/blob/main/tools/heir-opt.cpp#L85-L99

That's a bit unfortunate, but it's great to see that there is a way after all ❤️

@j2kun j2kun requested a review from asraa March 8, 2024 00:29

// A base class for all types in this dialect
class Polynomial_Type<string name, string typeMnemonic>
: TypeDef<Polynomial_Dialect, name> {
class Polynomial_Type<string name, string typeMnemonic, list<Trait> traits = []>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a pass-through list of traits is, imho, a good pattern to follow for any tablegen type class we define, so we should consider just adding this to all of them by default.

Copy link
Collaborator

@AlexanderViand-Intel AlexanderViand-Intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

For reference, this is how far we get after this when using the tensor-conversion (and unrealized conversion cast, for debugging) augmented -poynomial-to-standard (c.f. #143):

func.func @test_bin_ops(%arg0: tensor<2x1024xi25>, %arg1: tensor<2x1024xi25>) -> tensor<2x1024xi25> {
  %0 = builtin.unrealized_conversion_cast %arg1 : tensor<2x1024xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %1 = builtin.unrealized_conversion_cast %arg0 : tensor<2x1024xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %2 = bufferization.to_memref %0 : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
  %3 = bufferization.to_memref %1 : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  affine.for %arg2 = 0 to 2 {
    %6 = affine.load %3[%arg2] : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
    %7 = builtin.unrealized_conversion_cast %6 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %8 = affine.load %2[%arg2] : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, strided<[?], offset: ?>>
    %9 = builtin.unrealized_conversion_cast %8 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %cst = arith.constant dense<33538049> : tensor<1024xi26>
    %10 = arith.extsi %7 : tensor<1024xi25> to tensor<1024xi26>
    %11 = arith.extsi %9 : tensor<1024xi25> to tensor<1024xi26>
    %12 = arith.addi %10, %11 : tensor<1024xi26>
    %13 = arith.remsi %12, %cst : tensor<1024xi26>
    %14 = arith.trunci %13 : tensor<1024xi26> to tensor<1024xi25>
    %15 = builtin.unrealized_conversion_cast %14 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
    affine.store %15, %alloc[%arg2] : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  }
  %4 = bufferization.to_tensor %alloc : memref<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %5 = builtin.unrealized_conversion_cast %4 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> to tensor<2x1024xi25>
  return %5 : tensor<2x1024xi25>
}

Again, the issue seems to be that operations from affine and memref aren't marked as (dynamically) legal, so the type converter is never called on them (I think this might also be the root cause for #505?)

@j2kun j2kun added the pull_ready Indicates whether a PR is ready to pull. The copybara worker will import for internal testing label Mar 8, 2024
@copybara-service copybara-service bot merged commit 322754e into google:main Mar 8, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pull_ready Indicates whether a PR is ready to pull. The copybara worker will import for internal testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants