v1.2.21

Released by @lisaong on 20 Feb 09:58

What's Changed

  • Merged PR 3101: [build] install pkg-config for macOS buddy builds.
    [Lisa Ong]

    Fixes the macOS packaging build failure:

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=47235&view=results

  • Merged PR 3098: [nfc] Move vectorization code to separate files.
    [Mason Remy]

    Moves the vectorization code out of ExecutionPlanToAffineLoweringPass,
    in preparation for splitting vectorization into a separate pass that
    can run later in the pipeline than it currently does.

  • Merged PR 3100: Adds CMake dependencies to acc-translate to ensure
    correct build. [Kern Handa]

  • Merged PR 3095: Remove duplicate SubArray class. [Mason Remy]

  • Merged PR 3073: vectorize masked load store. [JUBI TANEJA]

    This PR adds vectorization support for a masked buffer fill, where the output is larger than the input: the loop body performs a conditional load and a vector store.

    Given the nest:

            @nest.iteration_logic
            def _nest():
                def store_value():
                    Output[i] = Input[i]
                def store_zero():
                    Output[i] = 0
                _If(i < N_input, store_value).Else(store_zero)
    

    The unoptimized MLIR is as follows:

      %c0_i32 = arith.constant 0 : i32
      %c5 = arith.constant 5 : index
      "accv.lambda"() ({
        affine.for %arg2 = 0 to 8 {
          %0 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          scf.if %0 {
            %1 = affine.load %arg0[%arg2] : memref<5xi32>
            affine.store %1, %arg1[%arg2] : memref<8xi32>
          } else {
            affine.store %c0_i32, %arg1[%arg2] : memref<8xi32>
          }
        }
    

    Vectorizing this for loop yields the following MLIR (simplified):

      %c5 = arith.constant 5 : index
      %cst = arith.constant dense<false> : vector<8xi1>
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c2 = arith.constant 2 : index
      %c3 = arith.constant 3 : index
      %c4 = arith.constant 4 : index
      %c6 = arith.constant 6 : index
      %c7 = arith.constant 7 : index
      %c0_i32 = arith.constant 0 : i32
      "accv.lambda"() ({
        affine.for %arg2 = 0 to 8 step 8 {
    
          %7 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %9 = "accv.cmp"(%0, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %11 = "accv.cmp"(%1, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %13 = "accv.cmp"(%2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %15 = "accv.cmp"(%3, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %17 = "accv.cmp"(%4, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %19 = "accv.cmp"(%5, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %21 = "accv.cmp"(%6, %c5) {predicate = 2 : i64} : (index, index) -> i1
    
          %23 = memref.reinterpret_cast %arg0 to offset: [0], sizes: [5], strides: [1] : memref<5xi32> to memref<5xi32>
          %24 = vector.transfer_read %23[%arg2], %c0_i32, %22 : memref<5xi32>, vector<8xi32>
    
          %25 = memref.reinterpret_cast %arg1 to offset: [0], sizes: [8], strides: [1] : memref<8xi32> to memref<8xi32>
          vector.store %24, %25[%arg2] : memref<8xi32>, vector<8xi32>
        }
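
    For reference, here is a small NumPy sketch of what the vectorized
    form computes (an illustration of the semantics only, not Accera's
    implementation): the per-lane compares build a mask, the masked read
    substitutes the padding value for out-of-range lanes, and a single
    vector store writes the result.

      import numpy as np

      N_input, N_output = 5, 8
      Input = np.arange(1, N_input + 1, dtype=np.int32)
      Output = np.empty(N_output, dtype=np.int32)

      # Per-lane mask, like the eight accv.cmp results above.
      mask = np.arange(N_output) < N_input        # [T T T T T F F F]

      # Masked load: in-range lanes read Input, the rest get the padding
      # value 0 (the role of %c0_i32 in the vector.transfer_read).
      padded = np.zeros(N_output, dtype=np.int32)
      padded[:N_input] = Input

      # One vector store of the selected lanes.
      Output[:] = np.where(mask, padded, 0)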
    
  • Merged PR 3093: Add meaningful error messages for C++ exceptions.
    [Captain Jack Sparrow]

  • Merged PR 3092: Add type size getter utility. [Captain Jack Sparrow]

  • Merged PR 3074: Add rudimentary pass to fix redundant load/store
    issue. [Chuck Jacobs]

    This PR adds a simple pattern to ValueSimplifyPass that looks for the redundant load/store pattern we often see at the end of kernels and removes it.
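
    As a toy illustration of the pattern (a hypothetical mini-IR, not
    Accera's actual pass machinery): a load that immediately follows a
    store to the same location can be replaced by forwarding the stored
    value.

      # Hypothetical mini-IR: ("store", value, addr) writes value to addr,
      # and ("load", dst, addr) reads addr into dst.
      def forward_stores(ops):
          result = []
          for op in ops:
              prev = result[-1] if result else None
              if (op[0] == "load" and prev is not None
                      and prev[0] == "store" and prev[2] == op[2]):
                  # The load reads back what was just stored, so replace
                  # it with a direct copy of the stored value.
                  result.append(("copy", op[1], prev[1]))
              else:
                  result.append(op)
          return result

      ops = [("store", "%v", "%tmp"), ("load", "%w", "%tmp")]
      print(forward_stores(ops))
      # [('store', '%v', '%tmp'), ('copy', '%w', '%v')]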

  • Merged PR 3075: Enable fast_exp operation. [Chuck Jacobs]

    This PR makes a few changes to enable the fast_exp operation (see the
    sketch at the end of this entry):

    • Adds fast_exp to the Python DSL
    • Enables vectorization of the abs instruction (which fast_exp uses)

    It also makes a couple of other minor changes:

    • Improves auto-naming of nest indices
    • Better support for using custom LLVM builds with Accera
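
    For intuition, here is a minimal NumPy sketch of one common fast-exp
    scheme (Schraudolph's bit trick). This is an illustration only, not
    Accera's fast_exp implementation, and the function name is
    hypothetical.

      import numpy as np

      def fast_exp_sketch(x):
          # Schraudolph's trick: approximate 2^(x / ln 2) by building the
          # float32 bit pattern directly from a scaled-and-biased integer.
          x = np.asarray(x, dtype=np.float32)
          a = np.float32((1 << 23) / np.log(2.0))  # 2^23 / ln(2)
          b = np.int32(1064866805)   # 127 * 2^23, minus an error-reducing bias
          return ((a * x).astype(np.int32) + b).view(np.float32)

      print(fast_exp_sketch(1.0))  # ~2.77 vs. e = 2.718..., a few percent off
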
  • Merged PR 3088: Support dynamic sub_array shape, split_dim size.
    [Mason Remy]

    This still requires that the sizes be static before lowering, but it
    supports dynamic sizes temporarily, until the function is inlined
    into an outer static function.
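
    As a plain-Python analogy of that flow (an analogy only, not the
    Accera API): the helper below takes a symbolic size, and the size
    only becomes static once the helper is inlined into a caller with
    known constants.

      import numpy as np

      def fill_sub_view(buffer, start, size):
          # `size` is dynamic here, like a sub_array with a dynamic shape.
          view = buffer[start:start + size]  # NumPy slice: a view, not a copy
          view[:] = 0

      def outer(buffer):
          # Static caller: after inlining, start=2 and size=4 are
          # compile-time constants, so every shape is static again.
          fill_sub_view(buffer, 2, 4)

      data = np.ones(8, dtype=np.int32)
      outer(data)
      print(data)  # [1 1 0 0 0 0 1 1]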

  • Merged PR 3078: Adds reinterpret_cast functionality to Array. [Kern
    Handa]
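
    As a NumPy analogy of what a reinterpret cast does (an analogy only,
    not the new Array API): the same underlying buffer is re-typed or
    re-shaped without copying.

      import numpy as np

      a = np.zeros(8, dtype=np.float32)
      b = a.view(np.int32)     # same bytes, reinterpreted as int32
      c = a.reshape(2, 4)      # same buffer, different shape
      assert b.base is a and c.base is a  # no copies were made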

  • Merged PR 3070: Fixes for sub_array and _split_dimension. [Mason Remy]

    This fixes the sub array and split dim ops to work with the Accera
    codebase that has been updated around them. Some MemoryLayout
    assumptions were getting in the way and have been disabled in the
    short term; longer term, our memory layout behavior should more
    closely match what MLIR affine maps can represent, enabling more
    generalized dynamic support.

  • Merged PR 3063: Refactor Dimension with C++ backend container class
    and a few other fixes. [Captain Jack Sparrow]

    • Refactor Dimension with C++ backend container (ScalarDimension)
    • Enable output scalar variables
    • Fix dynamic sized TEMP arrays
  • Merged PR 3072: Bump hatlib version to 0.0.34, skip unsupported test
    on arm64 macOS, minor targets doc update. [Lisa Ong]

    Update the hatlib version, since there is no incompatibility.

Full Changelog: v1.2.20...v1.2.21