v1.2.21

Released by @lisaong on 20 Feb 09:58

What's Changed

  • Merged PR 3101: [build] install pkg-config for macOS buddy builds.
    [Lisa Ong]

    Fixes the macOS packaging build failure:

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=47235&view=results

  • Merged PR 3098: [nfc] Move vectorization code to separate files.
    [Mason Remy]

    Moves the vectorization code out of ExecutionPlanToAffineLoweringPass,
    in preparation for splitting vectorization into a separate pass that
    can run later in the pipeline than it currently does.

  • Merged PR 3100: Adds CMake dependencies to acc-translate to ensure
    correct build. [Kern Handa]

  • Merged PR 3095: Remove duplicate SubArray class. [Mason Remy]

  • Merged PR 3073: vectorize masked load store. [JUBI TANEJA]

    This PR adds vectorization support for a masked buffer fill, where the output is larger than the input: the loop body performs a conditional load and a vector store.

    Given the nest:

            @nest.iteration_logic
            def _nest():
                def store_value():
                    Output[i] = Input[i]
                def store_zero():
                    Output[i] = 0
                _If(i < N_input, store_value).Else(store_zero)
    

    The unoptimized MLIR is as follows:

      %c0_i32 = arith.constant 0 : i32
      %c5 = arith.constant 5 : index
      "accv.lambda"() ({
        affine.for %arg2 = 0 to 8 {
          %0 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          scf.if %0 {
            %1 = affine.load %arg0[%arg2] : memref<5xi32>
            affine.store %1, %arg1[%arg2] : memref<8xi32>
          } else {
            affine.store %c0_i32, %arg1[%arg2] : memref<8xi32>
          }
        }
    

    Vectorizing this for loop yields the following MLIR (simplified):

      %c5 = arith.constant 5 : index
      %cst = arith.constant dense<false> : vector<8xi1>
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c2 = arith.constant 2 : index
      %c3 = arith.constant 3 : index
      %c4 = arith.constant 4 : index
      %c6 = arith.constant 6 : index
      %c7 = arith.constant 7 : index
      %c0_i32 = arith.constant 0 : i32
      "accv.lambda"() ({
        affine.for %arg2 = 0 to 8 step 8 {
    
          %7 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %9 = "accv.cmp"(%0, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %11 = "accv.cmp"(%1, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %13 = "accv.cmp"(%2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %15 = "accv.cmp"(%3, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %17 = "accv.cmp"(%4, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %19 = "accv.cmp"(%5, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %21 = "accv.cmp"(%6, %c5) {predicate = 2 : i64} : (index, index) -> i1
    
          %23 = memref.reinterpret_cast %arg0 to offset: [0], sizes: [5], strides: [1] : memref<5xi32> to memref<5xi32>
          %24 = vector.transfer_read %23[%arg2], %c0_i32, %22 : memref<5xi32>, vector<8xi32>
    
          %25 = memref.reinterpret_cast %arg1 to offset: [0], sizes: [8], strides: [1] : memref<8xi32> to memref<8xi32>
          vector.store %24, %25[%arg2] : memref<8xi32>, vector<8xi32>
        }
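
    For reference, here is a small NumPy sketch of what the vectorized
    form computes (an illustration of the semantics only, not Accera's
    implementation): the per-lane compares build a mask, the masked read
    substitutes the padding value for out-of-range lanes, and a single
    vector store writes the result.

      import numpy as np

      N_input, N_output = 5, 8
      Input = np.arange(1, N_input + 1, dtype=np.int32)
      Output = np.empty(N_output, dtype=np.int32)

      # Per-lane mask, like the eight accv.cmp results above.
      mask = np.arange(N_output) < N_input        # [T T T T T F F F]

      # Masked load: in-range lanes read Input, the rest get the padding
      # value 0 (the role of %c0_i32 in the vector.transfer_read).
      padded = np.zeros(N_output, dtype=np.int32)
      padded[:N_input] = Input

      # One vector store of the selected lanes.
      Output[:] = np.where(mask, padded, 0)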
    
  • Merged PR 3093: Add meaningful error messages for C++ exceptions.
    [Captain Jack Sparrow]

  • Merged PR 3092: Add type size getter utility. [Captain Jack Sparrow]

  • Merged PR 3074: Add rudimentary pass to fix redundant load/store
    issue. [Chuck Jacobs]

    This PR adds a simple pattern to ValueSimplifyPass that looks for the redundant load/store pattern we often see at the end of kernels and removes it.
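
    As a toy illustration of the pattern (a hypothetical mini-IR, not
    Accera's actual pass machinery): a load that immediately follows a
    store to the same location can be replaced by forwarding the stored
    value.

      # Hypothetical mini-IR: ("store", value, addr) writes value to addr,
      # and ("load", dst, addr) reads addr into dst.
      def forward_stores(ops):
          result = []
          for op in ops:
              prev = result[-1] if result else None
              if (op[0] == "load" and prev is not None
                      and prev[0] == "store" and prev[2] == op[2]):
                  # The load reads back what was just stored, so replace
                  # it with a direct copy of the stored value.
                  result.append(("copy", op[1], prev[1]))
              else:
                  result.append(op)
          return result

      ops = [("store", "%v", "%tmp"), ("load", "%w", "%tmp")]
      print(forward_stores(ops))
      # [('store', '%v', '%tmp'), ('copy', '%w', '%v')]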

  • Merged PR 3075: Enable fast_exp operation. [Chuck Jacobs]

    This PR makes a few changes to enable the fast_exp operation (see the
    sketch at the end of this entry):

    • Adds fast_exp to the Python DSL
    • Enables vectorization of the abs instruction (which fast_exp uses)

    It also makes a couple of other minor changes:

    • Improves auto-naming of nest indices
    • Better support for using custom LLVM builds with Accera
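
    For intuition, here is a minimal NumPy sketch of one common fast-exp
    scheme (Schraudolph's bit trick). This is an illustration only, not
    Accera's fast_exp implementation, and the function name is
    hypothetical.

      import numpy as np

      def fast_exp_sketch(x):
          # Schraudolph's trick: approximate 2^(x / ln 2) by building the
          # float32 bit pattern directly from a scaled-and-biased integer.
          x = np.asarray(x, dtype=np.float32)
          a = np.float32((1 << 23) / np.log(2.0))  # 2^23 / ln(2)
          b = np.int32(1064866805)   # 127 * 2^23, minus an error-reducing bias
          return ((a * x).astype(np.int32) + b).view(np.float32)

      print(fast_exp_sketch(1.0))  # ~2.77 vs. e = 2.718..., a few percent off
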
  • Merged PR 3088: Support dynamic sub_array shape, split_dim size.
    [Mason Remy]

    This still requires that the sizes be static before lowering, but it
    supports dynamic sizes temporarily, until the function is inlined
    into an outer static function.
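
    As a plain-Python analogy of that flow (an analogy only, not the
    Accera API): the helper below takes a symbolic size, and the size
    only becomes static once the helper is inlined into a caller with
    known constants.

      import numpy as np

      def fill_sub_view(buffer, start, size):
          # `size` is dynamic here, like a sub_array with a dynamic shape.
          view = buffer[start:start + size]  # NumPy slice: a view, not a copy
          view[:] = 0

      def outer(buffer):
          # Static caller: after inlining, start=2 and size=4 are
          # compile-time constants, so every shape is static again.
          fill_sub_view(buffer, 2, 4)

      data = np.ones(8, dtype=np.int32)
      outer(data)
      print(data)  # [1 1 0 0 0 0 1 1]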

  • Merged PR 3078: Adds reinterpret_cast functionality to Array. [Kern
    Handa]
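
    As a NumPy analogy of what a reinterpret cast does (an analogy only,
    not the new Array API): the same underlying buffer is re-typed or
    re-shaped without copying.

      import numpy as np

      a = np.zeros(8, dtype=np.float32)
      b = a.view(np.int32)     # same bytes, reinterpreted as int32
      c = a.reshape(2, 4)      # same buffer, different shape
      assert b.base is a and c.base is a  # no copies were made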

  • Merged PR 3070: Fixes for sub_array and _split_dimension. [Mason Remy]

    This fixes the sub array and split dim ops to work with the Accera
    codebase that has been updated around them. Some MemoryLayout
    assumptions were getting in the way and have been disabled in the
    short term; longer term, our memory layout behavior should more
    closely match what MLIR affine maps can represent, enabling more
    generalized dynamic support.

  • Merged PR 3063: Refactor Dimension with C++ backend container class
    and a few other fixes. [Captain Jack Sparrow]

    • Refactor Dimension with C++ backend container (ScalarDimension)
    • Enable output scalar variables
    • Fix dynamic sized TEMP arrays
  • Merged PR 3072: Bump hatlib version to 0.0.34, skip unsupported test
    on arm64 macOS, minor targets doc update. [Lisa Ong]

    Update the hatlib version, since there is no incompatibility.

Full Changelog: v1.2.20...v1.2.21