Squashed commit of the following:
commit 37efd7c8223542c3d953f6127308542013c159b8
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Fri Mar 18 00:34:18 2022 +0000

    Merged PR 2439: Downstream doc changes from github/main

    Squashed commit of the following:

    commit 8a6e553
    Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
    Date:   Sat Feb 26 16:50:57 2022 +0500

        complete refactoring of introduction.md file in manual docs (#15)

        * Feedback addressed

        * Addressed the pending comments

    commit 329d695
    Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
    Date:   Fri Feb 25 21:37:19 2022 +0500

        Complete refactoring of file array.md and simple affine loop nests.md file in manual docs (#16)

        * complete refactoring of introduction.md file

        * completed array.md and simple affine loop nests.md files

        * Took care of extra semicolon

    commit 04af790
    Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
    Date:   Tue Feb 22 05:42:22 2022 +0500

        README.md refactoring (#13)

        * initial commit

        * Worked on README.md up to the Goals of Accera section; took the liberty of changing some headings, restructuring the paragraphs, and adding one more goal

        * Feedback addressed regarding README.md file

        * Took care of the last comment and completed the whole file from my side

        Co-authored-by: Lisa Ong <11318241+lisaong@users.noreply.github.com>

commit 356872bf787b3b076ac45aa86d2275ffcd15364e
Author: Abdul Dakkak <adakkak@microsoft.com>
Date:   Thu Mar 17 12:35:33 2022 +0000

    Merged PR 2440: Enable tensorization for Rocm target

commit 5557ff59f398ddad818e9c5b93cd00408bd7637c
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Mar 16 22:03:29 2022 +0000

    Merged PR 2470: Adds support for the execution of GPU (CUDA only) functions via hat

commit fb803a9fbaf0bfa7f809f5bdd8366629febb9bd0
Author: Denny Sun <dennys@microsoft.com>
Date:   Wed Mar 16 20:18:23 2022 +0000

    Merged PR 2467: Adding multiple functions in package.add() can't work with stateful auxiliary metadata and index_map

    These bugs all involve Python objects shared among different functions, such as auxiliary metadata and a schedule's indexes. When package.add() is called to add multiple parameterized functions, we add the functions one by one and then emit them one by one; at each step the state of the shared Python object changes, so only the first function added is emitted correctly. To make _add_function work, these shared Python objects need to be made stateless.

    Related work items: #3662
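The failure mode described above can be reproduced with a minimal sketch (the names `aux`, `add_functions_buggy`, and `add_functions_fixed` are illustrative, not Accera's actual API): when one mutable dict is shared across additions, every "added" function aliases the final state, so only one distinct variant survives. Copying the object per function makes it effectively stateless.

```python
import copy

def add_functions_buggy(aux, variants):
    added = []
    for v in variants:
        aux["variant"] = v      # mutates the one shared dict
        added.append(aux)       # every entry aliases the same object
    return added

def add_functions_fixed(aux, variants):
    added = []
    for v in variants:
        local = copy.deepcopy(aux)  # per-function snapshot: no shared state
        local["variant"] = v
        added.append(local)
    return added
```

In the buggy version all entries report the last variant; the fixed version preserves each function's own metadata.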

commit e149bac1147d160b05aa55ad8ef4416423c20925
Author: Mason Remy <masonr@microsoft.com>
Date:   Wed Mar 16 06:31:10 2022 +0000

    Merged PR 2469: Convert 'Local' memory space to 'Private'

    Convert 'Local' memory space to 'Private'

commit 65363d35f7a31dfc682366ba70caaf301806a44b
Author: Mason Remy <masonr@microsoft.com>
Date:   Wed Mar 16 02:41:31 2022 +0000

    Merged PR 2463: Enable specifying double buffer memory space

    Enable specifying double buffer memory space

commit f80b46af2b12689ff617ba3a491fee6ae9aad010
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Mar 16 01:57:46 2022 +0000

    Merged PR 2468: Move to VS2022 for builds

    Move to VS2022 for builds

commit 0870cb27ccbe52fa8182b960140f5b6d562ab929
Author: Abdul Dakkak <adakkak@microsoft.com>
Date:   Tue Mar 15 14:01:15 2022 +0000

    Merged PR 2465: extend gpu target spec

    extend gpu target spec

commit 07088ecd0700fee16efdab677a581aa47a6a8690
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Mar 15 09:30:22 2022 +0000

    Merged PR 2464: Compute a stable hash for function name suffixes

    Create a stable hash using md5 and json serialization of these stringized entries:
    - Array args: shape, type, role, layout
    - parameter dictionary
    - Target

    Example output:
    ```
    test_unequal_iteration_space_fusing_1 (__main__.DSLTest_04Fusing) ... DEBUG:root:Adding wrapped function
    DEBUG:root:Adding wrapped function
    Building function fusing_test_32d12fb1a01061ec
    DEBUG:root:Detected logic function _ uses indices i,j
    DEBUG:root:Detected logic function _ uses indices i,j
    Building function _debug_check_allclose_16_16_4cfd65a8b606655b
    ```
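A hash built this way is stable because JSON serialization with sorted keys yields a canonical byte string for the same logical content. A rough sketch (the entry names are illustrative, not the exact fields Accera serializes):

```python
import hashlib
import json

def stable_suffix(entries: dict) -> str:
    # Canonical serialization: sorted keys means the same dict content
    # produces the same byte string, hence the same digest, across runs.
    canonical = json.dumps(entries, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()[:16]

suffix = stable_suffix({
    "args": [{"shape": [16, 16], "type": "float32", "role": "INPUT"}],
    "parameters": {},
    "target": "HOST",
})
```

The 16-hex-character suffix can then be appended to a base name, as in the `fusing_test_...` output above.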

commit 63e82be5e7b92f750fdf6c19347609c119cc5642
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Mar 15 00:25:13 2022 +0000

    Merged PR 2460: [nfc] Fix build.sh setting for vcpkg debug builds

commit d5ca516084dd68966e8c14b6d64d4402f572349a
Author: Mason Remy <masonr@microsoft.com>
Date:   Mon Mar 14 19:53:46 2022 +0000

    Merged PR 2461: Replace MemoryType with MemorySpace for consistency

    Replace MemoryType with MemorySpace for consistency

commit fdb503611bd235ca59c7769bd0d752519ce42bf5
Author: Mason Remy <masonr@microsoft.com>
Date:   Mon Mar 14 18:42:45 2022 +0000

    Merged PR 2416: Implement initial thrifty caching support

    Implement initial thrifty caching support

    - This is a simple brute-force approach where each thrifty cache is
      examined element-by-element alongside the array it is caching to check
      whether there is a stride of 1 between every access
    - Currently this thrifty analysis and the potential erasing of thrifty
      caches happens after the cache ops have been created. This is due to
      needing the cache mapping to have already run in order to support
      hierarchical caching scenarios. Eventually this should be refactored
      and the thrifty analysis should be used to prevent creating the cache
      ops, but that is a larger refactor than the scope for this task.
    - When creating affine loads and stores into caches, this change also
      tacks on some attributes onto the load/store ops to indicate how the
      original load or store accessed the base array. Since the base array
      -> cache position mapping is not always invertible (consider
      coefficient cache layout cases), this is one of the only ways to
      encode this information. Unfortunately, canonicalization on affine
      load/store ops will scrub away these attributes, so any reliance on
      them has to occur before a canonicalization pass. Similarly, the
      MakeCacheOps' record of which operands in their accesses are the base
      array positions depends on the operand list being unchanged; however,
      canonicalization may remove operands if it determines they are not
      used. While this is fine for the load/store op itself, any assumption
      like "base array indices are at positions N...N+K in the operand list"
      is no longer valid.

    Related work items: #3575
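The brute-force check can be modeled as: enumerate the cached region element-by-element in cache order, compute each element's offset in the base array from the array's layout strides, and require consecutive offsets to differ by exactly 1. A minimal sketch of that idea (not the MLIR implementation; `region_shape` and `base_strides` are illustrative parameters):

```python
from itertools import product

def thrifty_cache_is_redundant(region_shape, base_strides):
    """True if walking the cached region in cache (row-major) order touches
    the base array with a stride of exactly 1 between consecutive accesses,
    i.e. the cache would be a contiguous copy and can be elided."""
    offsets = [
        sum(i * s for i, s in zip(idx, base_strides))
        for idx in product(*(range(n) for n in region_shape))
    ]
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))
```

For example, a row-major 2x3 region of a row-major array (strides (3, 1)) is contiguous, so its thrifty cache is elided; the same region of a column-major array (strides (1, 2)) is not.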

commit 3591856bf285c90195eae7431a2c25314820669f
Author: Kern Handa <kerha@microsoft.com>
Date:   Mon Mar 14 04:31:13 2022 +0000

    Merged PR 2459: Changes the order of the LLVM_SETUP_VARIANT detection

    Changes the order of the LLVM_SETUP_VARIANT detection

commit fa1a527b549bd15431d59ca7c4946562d485a3fa
Author: Kern Handa <kerha@microsoft.com>
Date:   Sat Mar 12 00:50:39 2022 +0000

    Merged PR 2458: Fixes building with clang++ on Linux/WSL

    Fixes building with clang++ on Linux/WSL

commit a8b98da932216aa74b8356e44191eb0b247d227e
Author: Mason Remy <masonr@microsoft.com>
Date:   Sat Mar 12 00:08:40 2022 +0000

    Merged PR 2438: Support for double-buffer caching

    Support for double-buffer caching

    - Adds plumbing from python dsl for double_buffer flag to cache API
    - Implements double buffering by hoisting the initial cache fill outside
      of the cache trigger loop parent, then creating a prologue subnest
      that fills a temporary buffer with the (i+1)'th iteration's data and an
      epilogue subnest that moves that temporary buffer's data into the main
      cache buffer. The last iteration of the trigger loop parent is
      unswitched, and no cache filling is done in that iteration.
    - On GPU, the temporary buffer is allocated in private memory; if the
      cache is in shared memory, each thread holds its own contribution to
      the cache in its own private memory buffer until the epilogue fill
      nest.
    - Barrier ops are hoisted out of conditionals to avoid the potential for
      deadlocks. The conditionals introduced in this PR should be
      always-true or always-false, but this is added as a safety measure.
      Currently the hoisting is naive: any barrier within a conditional is
      erased, and barriers are placed before and after the conditional block.
      This is not correct for all future conditional scenarios, since any
      operations within the conditional that depend on the barrier existing
      will be broken; however, it works for how conditionals are used
      currently and can be improved over time.

    Related work items: #3659
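Abstracting away the subnests, the transformation described above has this shape (a schematic sketch under the stated assumptions; `fill` and `compute` stand in for the cache fill nests and the cached work):

```python
def double_buffered_loop(fill, compute, n):
    """Overlap the cache fill for iteration i+1 with the compute of iteration i."""
    cache = fill(0)                # initial fill hoisted above the trigger loop
    for i in range(n - 1):
        nxt = fill(i + 1)          # prologue: fetch next iteration into a temp buffer
        compute(cache, i)          # work on the current cache contents
        cache = nxt                # epilogue: promote the temp buffer to the cache
    compute(cache, n - 1)          # last iteration unswitched: no fill issued
```

Each fill for iteration i+1 is issued before the compute of iteration i, which is what allows the two to overlap on hardware.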

commit b6db90faabf919b46b32eb822bf5620450797bab
Author: Denny Sun <dennys@microsoft.com>
Date:   Fri Mar 11 00:39:58 2022 +0000

    Merged PR 2450: Automatically add parameter dict as auxiliary data

    Automatically add parameter dict as auxiliary data

    Related work items: #3662

commit 52dadbfa73c4db94928bb17723184e7d16f93305
Author: Kern Handa <kerha@microsoft.com>
Date:   Thu Mar 10 16:49:53 2022 +0000

    Merged PR 2456: Updates CUDA source emission based on testing with nvrtc

    Updates CUDA source emission based on testing with nvrtc

commit 9c48b11b59b5a38f00c0f5ffb371ad2232b14e00
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Mar 9 21:54:55 2022 +0000

    Merged PR 2453: Sets CPU targets to default to openmp

    Sets CPU targets to default to openmp

commit 40fe9516f6c946ba72434cba286033b16bc4476b
Author: Abdul Dakkak <adakkak@microsoft.com>
Date:   Wed Mar 9 14:02:43 2022 +0000

    Merged PR 2443: Add FP16 support

    Preparation for adding MFMA support for CUDA, which only operates on FP16

commit 6b79fdc5f060bb7dbf1d97a74ad334a248090dc6
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Mar 9 08:48:12 2022 +0000

    Merged PR 2452: Updates GPU source emitting path to emit host launcher and device function pairs

commit 4a345df664d45c2015585cf1a51449afae955617
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Mar 9 02:17:17 2022 +0000

    Merged PR 2451: Updates IR util ResolveExec[Target,Runtime] to allow for exact matches

    Updates IR util ResolveExec[Target,Runtime] to allow for exact matches

commit 710efe2cb7eb95eaac4e6400dbf847ae0440745b
Author: Kern Handa <kerha@microsoft.com>
Date:   Tue Mar 8 23:44:01 2022 +0000

    Merged PR 2447: Makes Vulkan-specific behavior predicated on Runtime

    Makes Vulkan-specific behavior predicated on Runtime

commit 5ae4ae88ee7a92c069f2789f25724943d6444259
Author: Kern Handa <kerha@microsoft.com>
Date:   Tue Mar 8 23:03:46 2022 +0000

    Merged PR 2446: Updates Runtime enum in Targets.py to be more comprehensive

    Updates Runtime enum in Targets.py to be more comprehensive

commit 52c7d6355cbdb448c65876c3d840b3953c410f27
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Mar 8 12:42:02 2022 +0000

    Merged PR 2449: [Cleanup] Replace "rc*_" prefixes with "acc*_" prefixes in tablegen'ed code

    For *.td, perform the following replacements for ops:

    s/rcv_/accv_/g
    s/rc_/acc_/g
    s/rcxp_/accxp_/g
    s/rcln_/accln_/g
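Applied in that order, the substitutions are safe because neither `rcxp_` nor `rcln_` contains `rc_` as a substring, so the earlier `rc_` replacement cannot clobber them. A sketch of the same rename as a script (the directory traversal is illustrative):

```python
from pathlib import Path

# Same substitutions, in the same order as the sed expressions above.
REPLACEMENTS = [("rcv_", "accv_"), ("rc_", "acc_"),
                ("rcxp_", "accxp_"), ("rcln_", "accln_")]

def rename_prefixes(text: str) -> str:
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text

def rewrite_td_files(root: str) -> None:
    # Apply the prefix rename to every *.td file under root.
    for td in Path(root).rglob("*.td"):
        td.write_text(rename_prefixes(td.read_text()))
```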

commit d345616611e8294863ca7df7f609db899b203b9c
Author: Abdul Dakkak <adakkak@microsoft.com>
Date:   Tue Mar 8 09:03:09 2022 +0000

    Merged PR 2448: fix typo in the condition for mod in range analysis

    fix typo in the condition for mod in range analysis

commit c18aee909e83656a9650bdfc1a1a167687c0d7e2
Author: Abdul Dakkak <adakkak@microsoft.com>
Date:   Mon Mar 7 23:04:23 2022 +0000

    Merged PR 2445: Fix bind command when index is further split

commit 62d10e9214f4be7ad31e5507002957b78a1f3b76
Author: Abdul Dakkak <adakkak@microsoft.com>
Date:   Mon Mar 7 21:11:11 2022 +0000

    Merged PR 2444: add range remainder

    add range remainder

commit a77c9c0a24b6f66e7563ad8269542ee75b2cab15
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Mar 4 05:07:01 2022 +0000

    Merged PR 2441: Fix APInt usage in RangeValueOptimizePass

    Run the RangeValueOptimizePass as part of acc-to-llvm

commit 5b9e7020ad774447a4970a823b1103656d0d2e93
Merge: e6088d9 1dba1b7
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Mar 4 02:02:51 2022 +0000

    Merged PR 2442: Move ExecutionOptions to ir lib and create arrayattr <-> struct utils

    Move ExecutionOptions to ir lib and create arrayattr <-> struct utils

commit 1dba1b7e4e50d343f03dde1b1527bafdef1bed82
Author: Mason Remy <masonr@microsoft.com>
Date:   Thu Mar 3 14:59:49 2022 -0800

    simplify target passthrough layer

commit e6088d9b8ebe36792c508c8b88b72eb42414e41a
Merge: 9f9f912 7dc3591
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Thu Mar 3 22:45:41 2022 +0000

    Merged PR 2430: Remove unnecessary barrier ops

    This PR adds an optimization pass that removes redundant / unnecessary barrier ops around shared memory usage.

    The optimization pass in this PR is pretty simple and has a couple of limitations:
    - it only works on straight-line code (that is, when all the loads, stores, and barriers are at the same loop level as each other).
    - it considers all accesses to a specific array to be conflicts (that is, any write to an array followed by a read of that array will want to have a barrier in between them, even if the writes and reads are to different elements in the array)

    I should be following up with a PR that deals with barrier and memory ops at different loop levels pretty soon after this.

    Related work items: #3648
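Under those two limitations, the pass reduces to a simple scan: a barrier is kept only when some array written since the last kept barrier is read before the next barrier. A much-simplified model over straight-line op lists (whole-array conflict granularity, as described above; not the actual MLIR pass):

```python
def optimize_barriers(ops):
    """ops: list of ("load", arr), ("store", arr), or ("barrier",) tuples."""
    kept = []
    pending_writes = set()              # arrays written since the last kept barrier
    for i, op in enumerate(ops):
        if op[0] != "barrier":
            if op[0] == "store":
                pending_writes.add(op[1])
            kept.append(op)
            continue
        upcoming_reads = set()          # reads before the next barrier
        for later in ops[i + 1:]:
            if later[0] == "barrier":
                break
            if later[0] == "load":
                upcoming_reads.add(later[1])
        if pending_writes & upcoming_reads:   # write -> read conflict: keep it
            kept.append(op)
            pending_writes.clear()            # this barrier synchronizes the writes
    return kept
```

On the shape of the `barrier_test_1` case below (two stores to a shared cache, then a read), this model keeps only the barrier separating the last store from the read.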

commit 8a0c0aa82bed26547757579b56fe82f5f9f54d77
Author: Mason Remy <masonr@microsoft.com>
Date:   Thu Mar 3 13:33:27 2022 -0800

    Move ExecutionOptions to ir lib and create arrayattr <-> struct utils

commit 7dc3591080644c5c906454e4605585a6e2a7c650
Author: Charles Jacobs <cjacobs@microsoft.com>
Date:   Thu Mar 3 13:31:02 2022 -0800

    PR comments
Lisa Ong committed Mar 18, 2022
1 parent 458225d commit 2060948
Showing 134 changed files with 8,355 additions and 2,301 deletions.
2 changes: 1 addition & 1 deletion .azure/win-accera.yml
@@ -253,7 +253,7 @@ steps:
workingDirectory: "$(Build.SourcesDirectory)/"

- script: |
-call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
+call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
set PATH=%VULKAN_SDK%\bin;%PATH%
python -m accera.test.smoke_test
displayName: Smoke test
4 changes: 2 additions & 2 deletions .azure/win-pr.yml
@@ -39,7 +39,7 @@ steps:
continueOnError: false
inputs:
workingDirectory: 'build\RelWithDebInfo'
-cmakeArgs: '..\.. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_LIT_ARGS=-vv -G"Visual Studio 16 2019" -Ax64 -DLLVM_SETUP_VARIANT=$(LLVM_SETUP_VARIANT)'
+cmakeArgs: '..\.. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_LIT_ARGS=-vv -G"Visual Studio 17 2022" -Ax64 -DLLVM_SETUP_VARIANT=$(LLVM_SETUP_VARIANT)'
condition: eq( variables['Agent.OS'], 'Windows_NT' )

- task: CMake@1
@@ -70,7 +70,7 @@ steps:
workingDirectory: "$(Build.SourcesDirectory)/"

- script: |
-call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
+call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
python -m pip install -r $(Build.SourcesDirectory)/accera/onnx-emitter/test/requirements.txt
ctest -C RelWithDebInfo -T test -VV -LE benchmark
displayName: Run all ctest targets
8 changes: 4 additions & 4 deletions CMake/LLVMSetup.cmake
@@ -7,14 +7,14 @@
####################################################################################################
#
# Gets the following variables:
#
#
# LLVM_SETUP_VARIANT: An optional environment variable or CMake define
# that specifies the LLVM build source:
# LLVM_SETUP_VARIANT="Default" - uses vcpkg to acquire LLVM
# Pre-requisite: `vcpkg install accera-llvm` or
# `vcpkg install accera-llvm:x64-windows`
#
# LLVM_SETUP_VARIANT="Conan" - uses Conan to acquire LLVM
# LLVM_SETUP_VARIANT="Conan" - uses Conan to acquire LLVM
# (for internal use only)
#
# Sets the following variables:
@@ -34,10 +34,10 @@
# Include guard so we don't try to find or download LLVM more than once
include_guard()

set(LLVM_SETUP_VARIANT "Default" CACHE STRING "Source for LLVM binaries")
if(DEFINED ENV{LLVM_SETUP_VARIANT})
set(LLVM_SETUP_VARIANT $ENV{LLVM_SETUP_VARIANT} )
set(LLVM_SETUP_VARIANT $ENV{LLVM_SETUP_VARIANT} CACHE STRING "" FORCE)
endif()
set(LLVM_SETUP_VARIANT "Default" CACHE STRING "Source for LLVM binaries")

message(STATUS "Using LLVMSetup${LLVM_SETUP_VARIANT}.cmake")

9 changes: 7 additions & 2 deletions CMakeLists.txt
@@ -35,8 +35,10 @@ option(USE_MKL "Build with Intel MKL" OFF)

option(USE_LIBCXX "Build with libc++ if using the Clang compiler" OFF)
if(CMAKE_CXX_COMPILER_ID STREQUAL Clang)
if(USE_LIBCXX OR (CMAKE_HOST_SYSTEM_NAME STREQUAL Darwin))
add_compile_options(-stdlib=libc++)
link_libraries(-lc++ -lc++abi)
endif(USE_LIBCXX OR (CMAKE_HOST_SYSTEM_NAME STREQUAL Darwin))
endif(CMAKE_CXX_COMPILER_ID STREQUAL Clang)

# Try to create a compilation database, which is useful to have when working
@@ -156,11 +158,14 @@ else()
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -ggdb3")
set(CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO} -ggdb3")
if(${CMAKE_CXX_COMPILER_ID} STREQUAL Clang)
if(CMAKE_BUILD_TYPE STREQUAL Debug)
# Set options for Control Flow Integrity
add_compile_options(-fsanitize=cfi)
endif(CMAKE_BUILD_TYPE STREQUAL Debug)

add_compile_options(-Wno-backslash-newline-escape)
add_compile_options(-Wno-self-assign)
add_compile_options(-fcolor-diagnostics)
# Set options for Control Flow Integrity
add_compile_options(-fsanitize=cfi)
# Enable Shadow Stack mitigation
add_compile_options(-fsanitize=shadow-call-stack)
# Exit after the first 2 errors are reported
54 changes: 54 additions & 0 deletions accera/acc-opt/test/barrier_opt.mlir
@@ -0,0 +1,54 @@
// RUN: acc-opt --verify-each=false --optimize-barriers %s | FileCheck %s

// CHECK-LABEL: module @barrier_test_1
// CHECK: %2 = "accv.alloc"()
// CHECK-NEXT: %3 = "accv.alloc"() : () -> memref<16xf32, 3>
// CHECK-NEXT: %4 = affine.load %arg0[symbol(%0) + symbol(%1) * 16] : memref<1xf32>
// CHECK-NEXT: affine.store %4, %2[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
// CHECK-NEXT: %5 = affine.load %arg1[symbol(%0) + symbol(%1) * 16] : memref<1xf32>
// CHECK-NEXT: affine.store %5, %2[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
// CHECK-NEXT: "accv.barrier"() {scope = "Block"} : () -> ()
// CHECK-NEXT: %6 = affine.load %2[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
// CHECK-NEXT: %7 = affine.load %3[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
// CHECK-NEXT: %8 = "accv.bin_op"(%6, %7) {predicate = 0 : i64} : (f32, f32) -> f32
// CHECK-NEXT: affine.store %8, %arg2[symbol(%0) + symbol(%1) * 16] : memref<1xf32>
// CHECK: accv.return
module @barrier_test_1 attributes {llvm.data_layout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} {
accv.module "barrier_test_1" {
accv.func nested @barrier_test_1_d9502818_impl_8438933964186859281(%arg0: memref<1xf32>, %arg1: memref<1xf32>, %arg2: memref<1xf32>) attributes {exec_target = 0 : i64} {
"accv.lambda"() ( {
%0 = "gpu.thread_id"() {dimension = "x"} : () -> index
%1 = "gpu.block_id"() {dimension = "x"} : () -> index
affine.for %arg3 = 0 to 1 {
affine.for %arg4 = 0 to 1 {
affine.for %arg5 = 0 to 1 {
affine.for %arg6 = 0 to 1 {
%2 = "accv.alloc"() : () -> memref<16xf32, 3>
%3 = "accv.alloc"() : () -> memref<16xf32, 3>
"accv.barrier"() {scope = "Block"} : () -> ()
%4 = affine.load %arg0[symbol(%0) + symbol(%1) * 16] : memref<1xf32>
affine.store %4, %2[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
"accv.barrier"() {scope = "Block"} : () -> ()
%5 = affine.load %arg1[symbol(%0) + symbol(%1) * 16] : memref<1xf32>
affine.store %5, %2[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
"accv.barrier"() {scope = "Block"} : () -> ()
%6 = affine.load %2[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
%7 = affine.load %3[symbol(%0) + symbol(%1) * 16] : memref<16xf32, 3>
%8 = "accv.bin_op"(%6, %7) {predicate = 0 : i64} : (f32, f32) -> f32
affine.store %8, %arg2[symbol(%0) + symbol(%1) * 16] : memref<1xf32>
"accv.barrier"() {scope = "Block"} : () -> ()
} {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i,5}">, kernels = ["_"], accv_gpu_map = "ThreadY", subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 1]}
} {begin = 0 : i64, end = 16 : i64, index = #accln<"index{i_i,3}">, accv_gpu_map = "ThreadX", subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 16]}
} {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,4}">, accv_gpu_map = "BlockY", subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [16, 16]}
} {begin = 0 : i64, end = 256 : i64, index = #accln<"index{i_o,2}">, accv_gpu_map = "BlockX", subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [16, 256]}
accv.return
}) {exec_target = 1 : i64, gpu_launch = [16 : index, 16 : index, 1 : index, 16 : index, 16 : index, 1 : index], sym_name = "NestFunction_0", type = () -> ()} : () -> ()
accv.return
}
accv.func @barrier_test_1_d9502818(%arg0: memref<1xf32>, %arg1: memref<1xf32>, %arg2: memref<1xf32>) attributes {accv.base_name = "barrier_test_1", accv.emit_header_decl, accv.emit_raw_pointer_api, exec_target = 0 : i64} {
accv.launch_func @barrier_test_1_d9502818_impl_8438933964186859281(%arg0, %arg1, %arg2) {exec_target = 0 : i64, gpu_launch = "gpu_launch"} : (memref<1xf32>, memref<1xf32>, memref<1xf32>) -> ()
accv.return
}
}
}

96 changes: 96 additions & 0 deletions accera/acc-opt/test/thrifty_caching.mlir
@@ -0,0 +1,96 @@
// RUN: acc-opt --verify-each=false --pass-pipeline="accv.module(accv.func(loopnest-to-value-func))" %s | FileCheck %s

// This function has two caches initially, both marked thrifty, and one of them should
// get elided based on thrifty checks but the other should not

// This is the graph at the LoopNestToValueFuncPass_Subpasses_0_10_Canonicalize.mlir stage,
// which is the last canonicalize stage before the thrifty checks. The subpasses
// before the thrifty phase create ops that the thrifty check depends on; those
// ops must not be canonicalized before the check runs
module @test_thrifty_caching_simple_input_cache attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} {
accv.module "test_thrifty_caching_simple_input_cache" {
accv.func nested @test_thrifty_caching_simple_input_cache_1127a105_impl_6891397719071098712(%arg0: memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, %arg1: memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, %arg2: memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>) attributes {exec_target = 0 : i64} {
%0 = accln.sym_index {name = "i_i"} #accln<"index{i_i,4}">
%1 = accln.sym_index {name = "i_o"} #accln<"index{i_o,3}">
%2 = accln.sym_index {name = "k_o"} #accln<"index{k_o,7}">
%3 = accln.sym_index {name = "j_i"} #accln<"index{j_i,6}">
%4 = accln.sym_index {name = "k_i"} #accln<"index{k_i,8}">
%5 = accln.sym_index {name = "j_o"} #accln<"index{j_o,5}">
"accv.lambda"() ( {
%6 = "accxp.make_cache"() {memorySpace = 0 : i64, multiCacheAccessIndices = [], offsetAccessIndices = [], offsetArrayToCacheAccessMap = affine_map<(d0) -> (d0)>} : () -> memref<?xf32, 3>
%7 = "accxp.begin_cache_region"(%arg0, %6, %arg0, %1, %2, %0, %4, %1, %2) {activeBlockCache, cacheAccessMaps = {manualCacheDimOrder = [0, 1]}, cacheHierarchyLevel = 0 : i64, cacheIndex = #accln<"index{i_i,4}">, cacheRegionBaseIndices = [[#accln<"index{i,0}">], [#accln<"index{k,2}">]], cacheRegionRelevantIndexRanges = [#accln<"indexrange{i_i,4}={0:4:1}">, #accln<"indexrange{k_i,8}={0:32:1}">], dimReorderCache, id = 0 : i64, operand_segment_sizes = dense<[1, 1, 1, 4, 2]> : vector<5xi32>, thrifty, triggerIndex = #accln<"index{i_i,4}">} : (memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, memref<?xf32, 3>, memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, index, index, index, index, index, index) -> index
"accxp.end_cache_region"(%7) : (index) -> ()
%8 = "accxp.make_cache"() {memorySpace = 0 : i64, multiCacheAccessIndices = [], offsetAccessIndices = [], offsetArrayToCacheAccessMap = affine_map<(d0) -> (d0)>} : () -> memref<?xf32, 3>
%9 = "accxp.begin_cache_region"(%arg1, %8, %arg1, %5, %2, %3, %4, %5) {activeBlockCache, cacheAccessMaps = {manualCacheDimOrder = [0, 1]}, cacheHierarchyLevel = 0 : i64, cacheIndex = #accln<"index{k_o,7}">, cacheRegionBaseIndices = [[#accln<"index{k,2}">], [#accln<"index{j,1}">], [#accln<"index{k,2}">]], cacheRegionRelevantIndexRanges = [#accln<"indexrange{k_o,7}={0:32:32}">, #accln<"indexrange{j_i,6}={0:16:1}">, #accln<"indexrange{k_i,8}={0:32:1}">], dimReorderCache, id = 1 : i64, operand_segment_sizes = dense<[1, 1, 1, 4, 1]> : vector<5xi32>, thrifty, triggerIndex = #accln<"index{k_o,7}">} : (memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, memref<?xf32, 3>, memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, index, index, index, index, index) -> index
"accxp.end_cache_region"(%9) : (index) -> ()
affine.for %arg3 = 0 to 32 step 4 {
affine.for %arg4 = 0 to 32 step 16 {
%10 = "accxp.begin_cache_region"(%arg1, %8, %arg1, %arg4, %2, %3, %4, %arg4) {activeBlockCache, cacheAccessMaps = {manualCacheDimOrder = [0, 1]}, cacheHierarchyLevel = 0 : i64, cacheIndex = #accln<"index{k_o,7}">, cacheRegionBaseIndices = [[#accln<"index{k,2}">], [#accln<"index{j,1}">], [#accln<"index{k,2}">]], cacheRegionRelevantIndexRanges = [#accln<"indexrange{k_o,7}={0:32:32}">, #accln<"indexrange{j_i,6}={0:16:1}">, #accln<"indexrange{k_i,8}={0:32:1}">], dimReorderCache, id = 1 : i64, operand_segment_sizes = dense<[1, 1, 1, 4, 1]> : vector<5xi32>, thrifty, triggerIndex = #accln<"index{k_o,7}">} : (memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, memref<?xf32, 3>, memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, index, index, index, index, index) -> index
affine.for %arg5 = 0 to 32 step 32 {
%11 = "accxp.begin_cache_region"(%arg0, %6, %arg0, %arg3, %arg5, %0, %4, %arg3, %arg5) {activeBlockCache, cacheAccessMaps = {manualCacheDimOrder = [0, 1]}, cacheHierarchyLevel = 0 : i64, cacheIndex = #accln<"index{i_i,4}">, cacheRegionBaseIndices = [[#accln<"index{i,0}">], [#accln<"index{k,2}">]], cacheRegionRelevantIndexRanges = [#accln<"indexrange{i_i,4}={0:4:1}">, #accln<"indexrange{k_i,8}={0:32:1}">], dimReorderCache, id = 0 : i64, operand_segment_sizes = dense<[1, 1, 1, 4, 2]> : vector<5xi32>, thrifty, triggerIndex = #accln<"index{i_i,4}">} : (memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, memref<?xf32, 3>, memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>, index, index, index, index, index, index) -> index
affine.for %arg6 = 0 to 4 {
affine.for %arg7 = 0 to 16 {
affine.for %arg8 = 0 to 32 {
%12 = affine.load %arg0[%arg3 + %arg6, %arg5 + %arg8] : memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>
%13 = affine.load %arg1[%arg5 + %arg8, %arg4 + %arg7] : memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>
%14 = "accv.bin_op"(%12, %13) {predicate = 2 : i64} : (f32, f32) -> f32
%15 = affine.load %arg2[%arg3 + %arg6, %arg4 + %arg7] : memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>
%16 = "accv.bin_op"(%15, %14) {predicate = 0 : i64} : (f32, f32) -> f32
affine.store %16, %arg2[%arg3 + %arg6, %arg4 + %arg7] : memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>
%17 = affine.load %arg2[%arg3 + %arg6, %arg4 + %arg7] : memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>
affine.store %17, %arg2[%arg3 + %arg6, %arg4 + %arg7] : memref<32x32xf32, affine_map<(d0, d1) -> (d0 * 32 + d1)>>
} {begin = 0 : i64, end = 32 : i64, index = #accln<"index{k_i,8}">, kernels = ["_"], subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 1]}
} {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i,6}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 32]}
} {begin = 0 : i64, end = 4 : i64, index = #accln<"index{i_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 32]}
"accxp.end_cache_region"(%11) : (index) -> ()
} {begin = 0 : i64, end = 32 : i64, index = #accln<"index{k_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 32]}
"accxp.end_cache_region"(%10) : (index) -> ()
} {begin = 0 : i64, end = 32 : i64, index = #accln<"index{j_o,5}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 32]}
} {begin = 0 : i64, end = 32 : i64, index = #accln<"index{i_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 32, 32]}
accv.return
}) {exec_target = 0 : i64, sym_name = "NestFunction_0", type = () -> ()} : () -> ()
accv.return
}
}
}

// CHECK: #map = affine_map<(d0, d1) -> (d0 * 32 + d1)>
// CHECK: module @test_thrifty_caching_simple_input_cache attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} {
// CHECK: accv.module "test_thrifty_caching_simple_input_cache" {
// CHECK: "accv.global"() {sym_name = "cache_3", type = memref<32x16xf32, 3>} : () -> ()
// CHECK: accv.func nested @test_thrifty_caching_simple_input_cache_1127a105_impl_6891397719071098712(%arg0: memref<32x32xf32, #map>, %arg1: memref<32x32xf32, #map>, %arg2: memref<32x32xf32, #map>) attributes {exec_target = 0 : i64} {
// CHECK: "accv.lambda"() ( {
// CHECK: %0 = "accv.ref_global"() {global_name = @cache_3} : () -> memref<32x16xf32, 3>
// CHECK: affine.for %arg3 = 0 to 32 step 4 {
// CHECK: affine.for %arg4 = 0 to 32 step 16 {
// CHECK: "accv.lambda"() ( {
// CHECK: affine.for %arg5 = 0 to 32 {
// CHECK: affine.for %arg6 = 0 to 16 {
// CHECK: %1 = affine.load %arg1[%arg5, %arg4 + %arg6] : memref<32x32xf32, #map>
// CHECK: affine.store %1, %0[%arg5, %arg6] : memref<32x16xf32, 3>
// CHECK: } {accxp.access_bounds_check, begin = 0 : i64, end = 16 : i64, index = #accln<"index{j,5}">, kernels = ["cache_internal_loopnest_kernel_active_block_copy"], scheduledIndex = #accln<"index{j,5}">, subdomainIndexOrder = [#accln<"index{i,4}">, #accln<"index{j,5}">], subdomainSize = [1, 1]}
// CHECK: } {accxp.access_bounds_check, begin = 0 : i64, end = 32 : i64, index = #accln<"index{i,4}">, scheduledIndex = #accln<"index{i,4}">, subdomainIndexOrder = [#accln<"index{i,4}">, #accln<"index{j,5}">], subdomainSize = [1, 16]}
// CHECK: accv.return
// CHECK: }) {exec_target = 0 : i64, sym_name = "NestFunction_2", type = () -> ()} : () -> ()
// CHECK: affine.for %arg5 = 0 to 4 {
// CHECK: affine.for %arg6 = 0 to 16 {
// CHECK: affine.for %arg7 = 0 to 32 {
// CHECK: %1 = affine.load %arg0[%arg3 + %arg5, %arg7] : memref<32x32xf32, #map>
// CHECK: %2 = affine.load %0[%arg7, %arg6] : memref<32x16xf32, 3>
// CHECK: %3 = "accv.bin_op"(%1, %2) {predicate = 2 : i64} : (f32, f32) -> f32
// CHECK: %4 = affine.load %arg2[%arg3 + %arg5, %arg4 + %arg6] : memref<32x32xf32, #map>
// CHECK: %5 = "accv.bin_op"(%4, %3) {predicate = 0 : i64} : (f32, f32) -> f32
// CHECK: affine.store %5, %arg2[%arg3 + %arg5, %arg4 + %arg6] : memref<32x32xf32, #map>
// CHECK: %6 = affine.load %arg2[%arg3 + %arg5, %arg4 + %arg6] : memref<32x32xf32, #map>
// CHECK: affine.store %6, %arg2[%arg3 + %arg5, %arg4 + %arg6] : memref<32x32xf32, #map>
// CHECK: } {begin = 0 : i64, end = 32 : i64, index = #accln<"index{k_i,8}">, kernels = ["_"], subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 1]}
// CHECK: } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i,6}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 32]}
// CHECK: } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{i_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 32]}
// CHECK: } {begin = 0 : i64, end = 32 : i64, index = #accln<"index{j_o,5}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 32]}
// CHECK: } {begin = 0 : i64, end = 32 : i64, index = #accln<"index{i_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 32, 32]}
// CHECK: accv.return
// CHECK: }) {exec_target = 0 : i64, sym_name = "NestFunction_0", type = () -> ()} : () -> ()
// CHECK: accv.return
// CHECK: }
// CHECK: }
// CHECK: }