Squashed commit of the following:
commit 6654c774fb0d2d6fac760b911a547b4e66b23127
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Wed Apr 27 00:44:53 2022 +0000

    Merged PR 2522: Generalize array indexing in tensorized GEMM

    This PR generalizes the MFMA tensorization pass to improve the handling of code in the innermost loop. It recognizes more ways of writing the GEMM kernel, and rejects many ill-formed GEMM kernels.

    There are also a number of tests.

    This PR doesn't yet generalize to batch-GEMM, where the matrices (typically) have 3 indices.

    Related work items: #3676
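
The innermost-loop pattern such a pass recognizes is the canonical GEMM accumulation, C[i, j] += A[i, k] * B[k, j]. A minimal plain-Python sketch of that form (illustrative only; Accera kernels are written with its nest/schedule API, not raw loops):

```python
# Canonical GEMM inner-loop form that a tensorization pass typically
# pattern-matches: C[i][j] += A[i][k] * B[k][j].
# Plain-Python illustration, not Accera's actual kernel syntax.

def gemm(A, B, C):
    M, K = len(A), len(A[0])
    N = len(B[0])
    for i in range(M):
        for j in range(N):
            for k in range(K):  # innermost accumulation loop
                C[i][j] += A[i][k] * B[k][j]
    return C
```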

commit 4d030709101f3653712b805bd8f3698e0e293bd3
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Apr 26 17:50:18 2022 +0000

    Merged PR 2551: [nfc][ci] Switch hosted pipelines to 1ES hosted pool

    * The Linux1ESPool is created to support internal builds of LLVM

    * Fix regression in pipeline due to overzealous .dockerignore

commit 9b9d6b4b77c46b12788665412b9d0d1c2ff62d18
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Apr 26 10:43:28 2022 +0000

    Merged PR 2550: [nfc] [docs] Merge changes from GitHub remote

    In preparation for merge from ADO to GitHub for Case Studies publishing

commit c1298946d18fb785788c556ea2959b9438f9c6b7
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Apr 26 08:10:47 2022 +0000

    Merged PR 2549: [Compliance] Switching from Dockerhub to ACR for third party containers

    Updating Dockerfile references

commit 0c7a3610ba082e82e554297bdadbf9579b094745
Author: Denny Sun <dennys@microsoft.com>
Date:   Tue Apr 26 04:40:05 2022 +0000

    Merged PR 2548: Add README file for case studies

    The README file contains a table linking each case study to its external repository.

commit edbc50edd00efe8f12a675735d7e52371e43f7b1
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Apr 25 23:49:15 2022 +0000

    Merged PR 2546: [dev] [nfc] Natively support macOS/arm64 for development

    Limited to local development scenarios (LLVM_SETUP_VARIANT=Default)

    No plans to release pip packages until there is CI support

    Verified on: Big Sur (MacOSX 12.3 arm64) / Python 3.10

commit 166e333a3d10b77c804dc3edc1c71bfc5716c768
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Mon Apr 25 17:50:22 2022 +0000

    Merged PR 2543: Add precomputed offset map optimization for tensorization (no caching)

    - Add flag to tensorize() to enable optimization (off by default)
    - Optimization only affects load/store of accumulator (C) argument
    - Supports all 4 mfma shapes

    Related work items: #3671
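
As a rough sketch of the idea (names and shapes here are illustrative assumptions, not Accera's actual MFMA layout): the flat offsets of the accumulator (C) elements can be computed once per tile and then reused for every load/store, instead of re-evaluating the index expression each time.

```python
# Hedged sketch of a precomputed-offset-map optimization for the
# accumulator argument: build the offset table once, reuse it for every
# load/store. Shapes and names are illustrative.

def make_offset_map(rows, cols, row_stride):
    # Flat offsets of a rows x cols tile in a row-major buffer.
    return [r * row_stride + c for r in range(rows) for c in range(cols)]

def accumulate_tile(C_flat, offsets, contributions):
    # Apply one accumulation step using the precomputed offsets.
    for off, v in zip(offsets, contributions):
        C_flat[off] += v
```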

commit e11c4d4e87bbae87f7cb9035eff8e6af650c9d1a
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Sun Apr 24 01:00:41 2022 +0000

    Merged PR 2542: An assortment of minor fixes

    This PR is a hodgepodge of tiny fixes. I'm happy to split it up into separate PRs if a kitchen-sink PR is too gross.

    The specific things are:
    - Add 2 new target models to `Targets.py` (that correspond to my local dev boxes)
    - Change the snapshot IR format for sub-passes to use the same format as the top-level passes (that is, not "generic" format)
    - Print a warning message if `check_correctness` skips a correctness check because no hat file was generated
    - Add a "minimum version" constraint to `requirements.txt` for `hatlib`

commit 8da7903ac9b6d8612711593308e49a7a3e82678d
Author: Kern Handa <kerha@microsoft.com>
Date:   Sat Apr 23 23:59:53 2022 +0000

    Merged PR 2545: Unifies CUDA and CPP enum values to SOURCE for Package.Format

    Related work items: #3679

commit fe2c40fa8f1c28dcf47e1533223457fd3e6bf195
Author: Kern Handa <kerha@microsoft.com>
Date:   Sat Apr 23 23:17:43 2022 +0000

    Merged PR 2544: [nfc] Removes now unnecessary ldebug output

commit 32090d786ce13299bb77a6675c3478b3d7cdf48c
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Apr 22 21:31:01 2022 +0000

    Merged PR 2527: Enable vectorized shared memory write

    - This adds the mod simplification support needed for vectorizing
      shared memory writes
    - Also refactors some of the affine simplification code slightly to
      share common code between the floordiv and mod simplifications

    Related work items: #3586, #3661, #3689
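
The mod simplification can be checked exhaustively for the index shapes used in the new tests: with a < 16, b < 16 and c < 4, the a-term of (a*128 + b*4 + c) mod 64 is a multiple of 64, and the remaining terms never reach 64, so the mod disappears entirely. A brute-force check in plain Python:

```python
# Brute-force verification of the mod simplification exercised by the
# new affine_simplification.mlir tests:
#   (a*128 + b*4 + c) mod 64 == b*4 + c  for a < 16, b < 16, c < 4
# (a*128 is a multiple of 64, and b*4 + c <= 63 < 64).
for a in range(16):
    for b in range(16):
        for c in range(4):
            assert (a * 128 + b * 4 + c) % 64 == b * 4 + c
```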

commit 0eb698af118b94bf3f4d4862a142c86055f8b7bb
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Apr 22 19:13:27 2022 +0000

    Merged PR 2526: Enable GPU global read vectorization

    - Implements a floordiv simplification that enables better recognition
      of vectorizable loads and stores

    Related work items: #3661, #3690
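
The floordiv simplification can likewise be checked exhaustively for the shapes used in the new tests: with a < 16, b < 16 and c < 4, the b and c terms of (a*128 + b*4 + c) floordiv 64 sum to at most 63 and never affect the quotient, so the expression reduces to a*2.

```python
# Brute-force verification of the floordiv simplification exercised by
# the affine_simplification.mlir tests:
#   (a*128 + b*4 + c) floordiv 64 == a*2  for a < 16, b < 16, c < 4
# (the b and c terms sum to at most 63, below the divisor).
for a in range(16):
    for b in range(16):
        for c in range(4):
            assert (a * 128 + b * 4 + c) // 64 == a * 2
```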

commit df849f066ff6c2c82c796d9b48e3bea6390c7877
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Fri Apr 22 06:03:27 2022 +0000

    Merged PR 2541: Fix a few issues with GEMM benchmarking script

    This PR fixes a couple of errors:
    - there was a bug in the GEMM kernel
    - sometimes hatlib would fail to return a compiled function without throwing an exception; these cases are now flagged as "uncompilable"

    It makes a couple of other tweaks:
    - it fails if the `alpha` and `beta` parameters aren't `1.0` and `0.0`
    - it culls some variants with known-uncompilable tensorization parameters before trying to compile them
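
A hedged sketch of this kind of pre-compile culling (the function and field names are assumptions for illustration, not the benchmarking script's actual API):

```python
# Illustrative sketch of culling benchmark variants before compiling:
# reject unsupported alpha/beta combinations up front rather than
# spending time compiling them. Field names are assumptions.

def cull_variants(variants):
    kept = []
    for v in variants:
        if v.get("alpha") != 1.0 or v.get("beta") != 0.0:
            continue  # the script only supports alpha=1.0, beta=0.0
        kept.append(v)
    return kept
```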

commit 339253767ae4bb4f7e5c323f77fc938ba1a4ab92
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Fri Apr 22 01:26:53 2022 +0000

    Merged PR 2538: Fix std::pair unpacking issue in TensorizeAffineForOpConversion

    In debug builds, we were getting garbage values for warpSizeX and warpSizeY, resulting in division-by-zero errors in the emitted .cu files.

commit 075c83247d34bfd9fb291e4ea6b9df059a94993a
Author: Denny Sun <dennys@microsoft.com>
Date:   Fri Apr 22 00:26:56 2022 +0000

    Merged PR 2536: Parameter supports most of the arithmetic/binary/unary operations defined in operator lib

    Parameters support the basic arithmetic operations (+, -, *, //, %); for example, the user can write the following code:

    fma_unit_count, vector_size = acc.create_parameters(2)
    jjj = schedule.split(jj, fma_unit_count * vector_size)
    jjjj = schedule.split(jjj, vector_size)

    Related work items: #3692
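
The effect of the two splits can be sanity-checked in plain Python with concrete values standing in for the parameters: iterating the split loop nest visits exactly the same indices, in the same order, as the original flat loop.

```python
# Plain-Python analogue of the parameterized double split, with concrete
# stand-ins for the fma_unit_count and vector_size parameters. The split
# loop nest covers the same index space as the original loop.
fma_unit_count, vector_size = 2, 4
N = 32  # loop extent, divisible by fma_unit_count * vector_size

original = list(range(N))
split = []
for jj in range(0, N, fma_unit_count * vector_size):            # outer tile
    for jjj in range(jj, jj + fma_unit_count * vector_size, vector_size):
        for jjjj in range(jjj, jjj + vector_size):              # innermost
            split.append(jjjj)

assert split == original
```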

commit 6d5e71899c6fb606e32ec46ee871ae1af25d3cd6
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Thu Apr 21 18:22:12 2022 +0000

    Merged PR 2539: [nfc][docs] Merging commits from Github/main

    commit ee28126a338d905eb5931038d3c5daba6ead3811
    Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
    Date:   Wed Apr 20 21:35:20 2022 +0800

        Update arrow label positions (#35)

        * [nfc] [doc] Update arrow label positions

        * make arrowhead more visible

        * nfc

    commit ddcecaaffd9dd0861999a6d29443dc7c37d79665
    Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
    Date:   Wed Apr 20 21:34:40 2022 +0800

        demo fixes for hatlib 0.0.11 (#36)
Lisa Ong committed Apr 27, 2022
1 parent 89850bf commit 5b0f142
Showing 56 changed files with 3,337 additions and 522 deletions.
1 change: 1 addition & 0 deletions .azure/llvm-canary.yml
@@ -13,6 +13,7 @@ pool:
container:
# Container with the latest available vcpkg LLVM port + patches
image: $(CONTAINER_REGISTRY)/accera-llvm-ubuntu:latest
endpoint: acceracontainers

steps:
- script: |
5 changes: 4 additions & 1 deletion .azure/manylinux/Dockerfile
@@ -4,7 +4,10 @@
# Usage: call docker build from the root of this repository
# docker build -f .azure\manylinux\Dockerfile . -t registry_name/accera-llvm-manylinux2014:latest
####################################################################################################
FROM quay.io/pypa/manylinux2014_x86_64:latest

# cf: quay.io/pypa/manylinux2014_x86_64:2022-04-24-d28e73e
# cf. https://quay.io/repository/pypa/manylinux2010_x86_64?tab=tags
FROM acceracontainers.azurecr.io/pypa/manylinux2014_x86_64:2022-04-24-d28e73e

ADD .azure/manylinux/scripts /tmp/scripts
ADD requirements.txt /tmp/scripts/requirements.txt
2 changes: 1 addition & 1 deletion .azure/manylinux/manylinux-llvm.yml
@@ -6,7 +6,7 @@ trigger:
include:
- external/llvm

pool: LinuxScaleSetAgentPool
pool: Linux1ESPool

steps:
- script: |
4 changes: 3 additions & 1 deletion .azure/rocm/Dockerfile
@@ -7,7 +7,9 @@

# https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
ARG ROCMVER=5.1.1-ub20
FROM amddcgpuce/rocm:${ROCMVER}

# cf: amddcgpuce/rocm:${ROCMVER}
FROM acceracontainers.azurecr.io/rocm:${ROCMVER}

ARG ROCMVER
RUN echo "ROCm Version: " ${ROCMVER}
3 changes: 2 additions & 1 deletion .dockerignore
@@ -1,3 +1,4 @@
build/
external/
external/vcpkg/downloads
external/vcpkg/buildtrees
*.egg-info
15 changes: 15 additions & 0 deletions CMake/BuildTargetSetup.cmake
@@ -0,0 +1,15 @@
####################################################################################################
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in the project root for license information.
####################################################################################################

if(APPLE)
# cf. https://discourse.cmake.org/t/how-to-determine-which-architectures-are-available-apple-m1/2401/10
# on macOS "uname -m" returns the architecture (x86_64 or arm64)
execute_process(
COMMAND uname -m
RESULT_VARIABLE result
OUTPUT_VARIABLE OSX_NATIVE_ARCH
OUTPUT_STRIP_TRAILING_WHITESPACE
)
endif()
9 changes: 7 additions & 2 deletions CMakeLists.txt
@@ -88,6 +88,8 @@ set(PACKAGE_ROOT ${ACCERA_EXTERNAL_DIR})
# Set up install location in build directory
set(CMAKE_INSTALL_PREFIX ${CMAKE_BINARY_DIR}/install)

include(BuildTargetSetup)

if(USE_MKL)
include(MKLSetup)
else()
@@ -174,11 +176,14 @@ else()
else() # GCC
add_compile_options(-Wno-ignored-attributes)
add_compile_options(-fdiagnostics-color=always)
# Set options for Control Flow Integrity
add_compile_options(-fcf-protection)
add_compile_options(-Wl,dynamicbase)
# Enable Shadow Stack mitigation
add_compile_options(-Wshadow)

if(NOT ${OSX_NATIVE_ARCH} STREQUAL "arm64")
# Set options for Control Flow Integrity (not supported on macos/arm64)
add_compile_options(-fcf-protection)
endif()
endif()
endif()

138 changes: 138 additions & 0 deletions accera/acc-opt/test/affine_simplification.mlir
@@ -0,0 +1,138 @@
// RUN: acc-opt --verify-each=false --acc-affine-simplify %s | FileCheck %s

module @test_accera_affine_simplification {
accv.module "test_accera_affine_simplification" {

// FloorDiv simplification tests

// CHECK-LABEL accv.func nested @test_simplify_floordiv_no_terms_strides
accv.func nested @test_simplify_floordiv_no_terms_strides(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<32xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 4 {
// CHECK: %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
%1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
// CHECK: affine.store %1, %0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
affine.store %1, %0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}

// CHECK-LABEL accv.func nested @test_simplify_floordiv_no_terms_range
accv.func nested @test_simplify_floordiv_no_terms_range(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<32xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 5 { // This range being 5 will prevent the simplification from removing this term
// CHECK: %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
%1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
// CHECK: affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}

// CHECK-LABEL accv.func nested @test_simplify_floordiv_one_term
accv.func nested @test_simplify_floordiv_one_term(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<32xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 4 {
// CHECK: %1 = affine.load %arg0[%arg1 * 2 + (%arg2 * 48) floordiv 32] : memref<32xf32>
%1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 48 + %arg3) floordiv 32] : memref<32xf32>
// CHECK: affine.store %1, %0[%arg1 * 2 + (%arg2 * 48) floordiv 32] : memref<32xf32>
affine.store %1, %0[(%arg1 * 64 + %arg2 * 48 + %arg3) floordiv 32] : memref<32xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}

// CHECK-LABEL accv.func nested @test_simplify_floordiv_two_terms
accv.func nested @test_simplify_floordiv_two_terms(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<32xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 4 {
// CHECK: %1 = affine.load %arg0[%arg1 * 2] : memref<32xf32>
%1 = affine.load %arg0[(%arg1 * 128 + %arg2 * 4 + %arg3) floordiv 64] : memref<32xf32>
// CHECK: affine.store %1, %0[%arg1 * 2] : memref<32xf32>
affine.store %1, %0[(%arg1 * 128 + %arg2 * 4 + %arg3) floordiv 64] : memref<32xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}

// Mod simplification tests

// CHECK-LABEL accv.func nested @test_simplify_mod_no_terms_strides
accv.func nested @test_simplify_mod_no_terms_strides(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<32xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 4 {
// CHECK: %1 = affine.load %arg0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
%1 = affine.load %arg0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
// CHECK: affine.store %1, %0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
affine.store %1, %0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}

// CHECK-LABEL accv.func nested @test_simplify_mod_no_terms_range
accv.func nested @test_simplify_mod_no_terms_range(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<32xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 5 { // This range being 5 will prevent the simplification from removing this term
// CHECK: %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
%1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
// CHECK: affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}

// CHECK-LABEL accv.func nested @test_simplify_mod_one_term
accv.func nested @test_simplify_mod_one_term(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<32xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 4 {
// CHECK: %1 = affine.load %arg0[%arg3 + (%arg1 * 68 + %arg2 * 48) mod 32] : memref<32xf32>
%1 = affine.load %arg0[(%arg1 * 68 + %arg2 * 48 + %arg3) mod 32] : memref<32xf32>
// CHECK: affine.store %1, %0[%arg3 + (%arg1 * 68 + %arg2 * 48) mod 32] : memref<32xf32>
affine.store %1, %0[(%arg1 * 68 + %arg2 * 48 + %arg3) mod 32] : memref<32xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}

// CHECK-LABEL accv.func nested @test_simplify_mod_all_terms
accv.func nested @test_simplify_mod_all_terms(%arg0: memref<64xf32>) attributes {exec_target = 0 : i64} {
%0 = memref.alloc() : memref<64xf32>
affine.for %arg1 = 0 to 16 {
affine.for %arg2 = 0 to 16 {
affine.for %arg3 = 0 to 4 {
// CHECK: %1 = affine.load %arg0[%arg3 + %arg2 * 4] : memref<64xf32>
%1 = affine.load %arg0[(%arg1 * 128 + %arg2 * 4 + %arg3) mod 64] : memref<64xf32>
// CHECK: affine.store %1, %0[%arg3 + %arg2 * 4] : memref<64xf32>
affine.store %1, %0[(%arg1 * 128 + %arg2 * 4 + %arg3) mod 64] : memref<64xf32>
} {begin = 0 : i64, end = 4 : i64}
} {begin = 0 : i64, end = 16 : i64}
} {begin = 0 : i64, end = 16 : i64}
accv.return
}
}
}
40 changes: 20 additions & 20 deletions accera/acc-opt/test/barrier_opt_tests/barrier_opt_test_generator.py
@@ -10,7 +10,7 @@
def build_package(plan, args, name):
package = acc.Package()
package.add(plan, args=args, base_name=name)
package.build(name, format=acc.Package.Format.MLIR_VERBOSE | acc.Package.Format.CUDA, output_dir="build")
package.build(name, format=acc.Package.Format.MLIR_VERBOSE | acc.Package.Format.DEFAULT, output_dir="build")


def barrier():
@@ -31,7 +31,7 @@ def barrier_trivial_test_1():
@nest.iteration_logic
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

shA[i] = A[i]
A[i] *= 2.0
B[i] = shA[i]
@@ -55,7 +55,7 @@ def barrier_single_warp_test_1():
@nest.iteration_logic
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([N]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

barrier()
shA[i] = A[i]
barrier()
@@ -85,7 +85,7 @@ def barrier_single_warp_test_2():
@nest.iteration_logic
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([N]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

barrier()
shA[i] = A[i]
barrier()
@@ -117,7 +117,7 @@ def barrier_single_warp_test_3():
@nest.iteration_logic
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([N]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

barrier()
shA[i] = A[i]
barrier()
@@ -150,7 +150,7 @@ def barrier_multi_warp_test_1():
@nest.iteration_logic
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([N]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

barrier()
shA[i] = A[i]
barrier()
@@ -182,9 +182,9 @@ def barrier_seq_test_1():

@nest.iteration_logic
def _():
# Performs excessive barriers.
# Performs excessive barriers.
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

barrier()
shA[i] = A[i]
barrier()
@@ -214,10 +214,10 @@ def barrier_seq_test_2():

@nest.iteration_logic
def _():
# Performs excessive barriers.
# Performs excessive barriers.
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))
shB = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

barrier()
shA[i] = A[i]
barrier()
@@ -247,10 +247,10 @@ def barrier_seq_test_3():

@nest.iteration_logic
def _():
# Performs excessive barriers.
# Performs excessive barriers.
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))
shB = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

shB[i] = A[i]
barrier()
shA[i] = A[i]
@@ -317,7 +317,7 @@ def barrier_if_test_2():
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))
shB = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

def if_block():
barrier()
shB[i] = A[i]
@@ -416,9 +416,9 @@ def else_block():
barrier()
shB[i] = B[i]
barrier()

acc._lang_python._lang._If(i < acc._lang_python._lang.as_index(N), if_block).Else(else_block)

barrier()
shA[i] = A[i]
barrier()
@@ -448,7 +448,7 @@ def barrier_loop_test_1():
@nest.iteration_logic
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

start = acc.Scalar(0)
stop = acc.Scalar(32)
step = acc.Scalar(1)
@@ -490,7 +490,7 @@ def barrier_loop_test_2():
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))
shB = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

start = acc.Scalar(0)
stop = acc.Scalar(32)
step = acc.Scalar(1)
@@ -533,7 +533,7 @@ def barrier_loop_test_3():
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))
shB = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

start = acc.Scalar(0)
stop = acc.Scalar(32)
step = acc.Scalar(1)
@@ -594,7 +594,7 @@ def barrier_loop_test_4():
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))
shB = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

start = acc.Scalar(0)
stop = acc.Scalar(32)
step = acc.Scalar(1)
@@ -650,7 +650,7 @@ def barrier_loop_test_5():
@nest.iteration_logic
def _():
shA = acc.NativeArray(acc.Allocate(type=acc.ScalarType.float32, layout=acc._lang_python._MemoryLayout([blocksize]).set_memory_space(acc._lang_python._lang._MemorySpace.SHARED)))

start = acc.Scalar(0)
stop = acc.Scalar(32)
step = acc.Scalar(1)
