commit 6654c774fb0d2d6fac760b911a547b4e66b23127
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Wed Apr 27 00:44:53 2022 +0000

    Merged PR 2522: Generalize array indexing in tensorized GEMM

    This PR generalizes the MFMA tensorization pass to improve the handling of code in the innermost loop. It recognizes more ways of writing the GEMM kernel, and rejects many ill-formed GEMM kernels. There are also a number of tests.

    This PR doesn't yet generalize to batch-GEMM, where the matrices (typically) have 3 indices.

    Related work items: #3676

commit 4d030709101f3653712b805bd8f3698e0e293bd3
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Apr 26 17:50:18 2022 +0000

    Merged PR 2551: [nfc][ci] Switch hosted pipelines to 1ES hosted pool

    * The Linux1ESPool is created to support internal builds of LLVM
    * Fix regression in pipeline due to overzealous .dockerignore

commit 9b9d6b4b77c46b12788665412b9d0d1c2ff62d18
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Apr 26 10:43:28 2022 +0000

    Merged PR 2550: [nfc] [docs] Merge changes from GitHub remote

    In preparation for merge from ADO to GitHub for Case Studies publishing

commit c1298946d18fb785788c556ea2959b9438f9c6b7
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Apr 26 08:10:47 2022 +0000

    Merged PR 2549: [Compliance] Switching from Dockerhub to ACR for third party containers

    Updating Dockerfile references

commit 0c7a3610ba082e82e554297bdadbf9579b094745
Author: Denny Sun <dennys@microsoft.com>
Date:   Tue Apr 26 04:40:05 2022 +0000

    Merged PR 2548: Add README file for case studies

    README file has a table where each case study points to the external repo link.

commit edbc50edd00efe8f12a675735d7e52371e43f7b1
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Apr 25 23:49:15 2022 +0000

    Merged PR 2546: [dev] [nfc] Natively support macOS/arm64 for development

    Limited to local development scenarios (LLVM_SETUP_VARIANT=Default)
    No plans to release pip packages until there is CI support
    Verified on: Big Sur (MacOSX 12.3 arm64) / Python 3.10

commit 166e333a3d10b77c804dc3edc1c71bfc5716c768
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Mon Apr 25 17:50:22 2022 +0000

    Merged PR 2543: Add precomputed offset map optimization for tensorization (no caching)

    - Add flag to tensorize() to enable optimization (off by default)
    - Optimization only affects load/store of accumulator (C) argument
    - Supports all 4 mfma shapes

    Related work items: #3671

commit e11c4d4e87bbae87f7cb9035eff8e6af650c9d1a
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Sun Apr 24 01:00:41 2022 +0000

    Merged PR 2542: An assortment of minor fixes

    This PR is a hodgepodge of tiny fixes. I'm happy to split it up into separate PRs if a kitchen-sink PR is too gross.

    The specific things are:
    - Add 2 new target models to `Targets.py` (that correspond to my local dev boxes)
    - Change the snapshot IR format for sub-passes to use the same format as the top-level passes (that is, not "generic" format)
    - Print a warning message if `check_correctness` skips a correctness check because no hat file was generated
    - Add a "minimum version" constraint to `requirements.txt` for `hatlib`

commit 8da7903ac9b6d8612711593308e49a7a3e82678d
Author: Kern Handa <kerha@microsoft.com>
Date:   Sat Apr 23 23:59:53 2022 +0000

    Merged PR 2545: Unifies CUDA and CPP enum values to SOURCE for Package.Format

    Unifies CUDA and CPP enum values to SOURCE for Package.Format

    Related work items: #3679

commit fe2c40fa8f1c28dcf47e1533223457fd3e6bf195
Author: Kern Handa <kerha@microsoft.com>
Date:   Sat Apr 23 23:17:43 2022 +0000

    Merged PR 2544: [nfc] Removes now unnecessary ldebug output

    [nfc] Removes now unnecessary ldebug output

commit 32090d786ce13299bb77a6675c3478b3d7cdf48c
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Apr 22 21:31:01 2022 +0000

    Merged PR 2527: Enable vectorized shared memory write

    Enable vectorized shared memory write
    - This adds mod simplification support needed for vectorizing shared memory writes
    - Also refactors some of the affine simplification code slightly to share some common code between the floordiv and mod simplifications

    Related work items: #3586, #3661, #3689

commit 0eb698af118b94bf3f4d4862a142c86055f8b7bb
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Apr 22 19:13:27 2022 +0000

    Merged PR 2526: Enable GPU global read vectorization

    Enable GPU global read vectorization
    - Implements a floor div simplification that enables better recognition of vectorizable load and stores

    Related work items: #3661, #3690

commit df849f066ff6c2c82c796d9b48e3bea6390c7877
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Fri Apr 22 06:03:27 2022 +0000

    Merged PR 2541: Fix a few issues with GEMM benchmarking script

    This PR fixes a couple of errors:
    - there was a bug in the GEMM kernel
    - sometimes hatlib would fail to return a compiled function, but not throw an exception. These are now flagged as "uncompilable"

    It makes a couple of other tweaks:
    - it fails if the `alpha` and `beta` parameters aren't `1.0` and `0.0`
    - it culls some variants with known-uncompilable tensorization parameters before trying to compile them

commit 339253767ae4bb4f7e5c323f77fc938ba1a4ab92
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Fri Apr 22 01:26:53 2022 +0000

    Merged PR 2538: Fix std::pair unpacking issue in TensorizeAffineForOpConversion

    In debug builds, we are getting garbage values for warpSizeX and warpSizeY, resulting in division by 0 errors in the emitted .cu files

commit 075c83247d34bfd9fb291e4ea6b9df059a94993a
Author: Denny Sun <dennys@microsoft.com>
Date:   Fri Apr 22 00:26:56 2022 +0000

    Merged PR 2536: Parameter supports most of the arithmetic/binary/unary operations defined in operator lib

    Parameter supports the basic arithmetic operations (+, -, *, //, %), for example, the user can write the following code:

        fma_unit_count, vector_size = acc.create_parameters(2)
        jjj = schedule.split(jj, fma_unit_count * vector_size)
        jjjj = schedule.split(jjj, vector_size)

    Related work items: #3692

commit 6d5e71899c6fb606e32ec46ee871ae1af25d3cd6
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Thu Apr 21 18:22:12 2022 +0000

    Merged PR 2539: [nfc][docs] Merging commits from Github/main

commit ee28126a338d905eb5931038d3c5daba6ead3811
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date:   Wed Apr 20 21:35:20 2022 +0800

    Update arrow label positions (#35)

    * [nfc] [doc] Update arrow label positions
    * make arrowhead more visible
    * nfc

commit ddcecaaffd9dd0861999a6d29443dc7c37d79665
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date:   Wed Apr 20 21:34:40 2022 +0800

    demo fixes for hatlib 0.0.11 (#36)
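The PR 2536 entry above describes parameters that can be combined with `+`, `-`, `*`, `//` and `%` before they have values. One way such deferred arithmetic could work is sketched below using Python operator overloading. This is not Accera's implementation: `DelayedParam`, its `set`/`get` methods, and the local `create_parameters` helper are hypothetical names invented for this illustration.

```python
import operator


class DelayedParam:
    """Hypothetical stand-in for a parameter whose value is bound later."""

    def __init__(self, compute=None):
        self._value = None
        self._compute = compute  # set only for derived parameters

    def set(self, value):
        self._value = value

    def get(self):
        if self._compute is not None:
            return self._compute()
        if self._value is None:
            raise ValueError("parameter has not been given a value")
        return self._value

    def _derive(self, other, op, reflected=False):
        # Build a new parameter that evaluates lazily, when get() is called.
        def compute():
            rhs = other.get() if isinstance(other, DelayedParam) else other
            return op(rhs, self.get()) if reflected else op(self.get(), rhs)

        return DelayedParam(compute)

    def __add__(self, other):
        return self._derive(other, operator.add)

    def __sub__(self, other):
        return self._derive(other, operator.sub)

    def __mul__(self, other):
        return self._derive(other, operator.mul)

    def __floordiv__(self, other):
        return self._derive(other, operator.floordiv)

    def __mod__(self, other):
        return self._derive(other, operator.mod)

    def __radd__(self, other):
        return self._derive(other, operator.add, reflected=True)

    def __rsub__(self, other):
        return self._derive(other, operator.sub, reflected=True)

    def __rmul__(self, other):
        return self._derive(other, operator.mul, reflected=True)

    def __rfloordiv__(self, other):
        return self._derive(other, operator.floordiv, reflected=True)

    def __rmod__(self, other):
        return self._derive(other, operator.mod, reflected=True)


def create_parameters(count):
    """Hypothetical helper that only mirrors the shape of the call in the commit message."""
    return tuple(DelayedParam() for _ in range(count))


fma_unit_count, vector_size = create_parameters(2)
split_size = fma_unit_count * vector_size  # nothing is evaluated yet
fma_unit_count.set(4)
vector_size.set(8)
print(split_size.get())  # prints 32
```

The derived parameter holds a closure instead of a number, so the product is computed only once both inputs have been bound. That matches the usage in the commit message, where a split size is written before any concrete value is chosen.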
Lisa Ong committed Apr 27, 2022
1 parent 89850bf, commit 5b0f142
Showing 56 changed files with 3,337 additions and 522 deletions.
@@ -6,7 +6,7 @@ trigger:
 include:
 - external/llvm

-pool: LinuxScaleSetAgentPool
+pool: Linux1ESPool

 steps:
 - script: |
@@ -1,3 +1,4 @@
 build/
-external/
+external/vcpkg/downloads
+external/vcpkg/buildtrees
 *.egg-info
@@ -0,0 +1,15 @@
+####################################################################################################
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See LICENSE in the project root for license information.
+####################################################################################################
+
+if(APPLE)
+  # cf. https://discourse.cmake.org/t/how-to-determine-which-architectures-are-available-apple-m1/2401/10
+  # on macOS "uname -m" returns the architecture (x86_64 or arm64)
+  execute_process(
+    COMMAND uname -m
+    RESULT_VARIABLE result
+    OUTPUT_VARIABLE OSX_NATIVE_ARCH
+    OUTPUT_STRIP_TRAILING_WHITESPACE
+  )
+endif()
@@ -0,0 +1,138 @@
+// RUN: acc-opt --verify-each=false --acc-affine-simplify %s | FileCheck %s
+
+module @test_accera_affine_simplification {
+  accv.module "test_accera_affine_simplification" {
+
+    // FloorDiv simplification tests
+
+    // CHECK-LABEL: accv.func nested @test_simplify_floordiv_no_terms_strides
+    accv.func nested @test_simplify_floordiv_no_terms_strides(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<32xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 4 {
+            // CHECK: %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
+            %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
+            // CHECK: affine.store %1, %0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
+            affine.store %1, %0[(%arg1 * 64 + %arg2 * 33 + %arg3 * 31) floordiv 32] : memref<32xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+
+    // CHECK-LABEL: accv.func nested @test_simplify_floordiv_no_terms_range
+    accv.func nested @test_simplify_floordiv_no_terms_range(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<32xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 5 { // This range being 5 will prevent the simplification from removing this term
+            // CHECK: %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
+            %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
+            // CHECK: affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
+            affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) floordiv 32] : memref<32xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+
+    // CHECK-LABEL: accv.func nested @test_simplify_floordiv_one_term
+    accv.func nested @test_simplify_floordiv_one_term(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<32xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 4 {
+            // CHECK: %1 = affine.load %arg0[%arg1 * 2 + (%arg2 * 48) floordiv 32] : memref<32xf32>
+            %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 48 + %arg3) floordiv 32] : memref<32xf32>
+            // CHECK: affine.store %1, %0[%arg1 * 2 + (%arg2 * 48) floordiv 32] : memref<32xf32>
+            affine.store %1, %0[(%arg1 * 64 + %arg2 * 48 + %arg3) floordiv 32] : memref<32xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+
+    // CHECK-LABEL: accv.func nested @test_simplify_floordiv_two_terms
+    accv.func nested @test_simplify_floordiv_two_terms(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<32xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 4 {
+            // CHECK: %1 = affine.load %arg0[%arg1 * 2] : memref<32xf32>
+            %1 = affine.load %arg0[(%arg1 * 128 + %arg2 * 4 + %arg3) floordiv 64] : memref<32xf32>
+            // CHECK: affine.store %1, %0[%arg1 * 2] : memref<32xf32>
+            affine.store %1, %0[(%arg1 * 128 + %arg2 * 4 + %arg3) floordiv 64] : memref<32xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+
+    // Mod simplification tests
+
+    // CHECK-LABEL: accv.func nested @test_simplify_mod_no_terms_strides
+    accv.func nested @test_simplify_mod_no_terms_strides(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<32xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 4 {
+            // CHECK: %1 = affine.load %arg0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
+            %1 = affine.load %arg0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
+            // CHECK: affine.store %1, %0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
+            affine.store %1, %0[(%arg1 * 68 + %arg2 * 33 + %arg3 * 31) mod 32] : memref<32xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+
+    // CHECK-LABEL: accv.func nested @test_simplify_mod_no_terms_range
+    accv.func nested @test_simplify_mod_no_terms_range(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<32xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 5 { // This range being 5 will prevent the simplification from removing this term
+            // CHECK: %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
+            %1 = affine.load %arg0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
+            // CHECK: affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
+            affine.store %1, %0[(%arg1 * 64 + %arg2 * 4 + %arg3) mod 32] : memref<32xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+
+    // CHECK-LABEL: accv.func nested @test_simplify_mod_one_term
+    accv.func nested @test_simplify_mod_one_term(%arg0: memref<32xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<32xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 4 {
+            // CHECK: %1 = affine.load %arg0[%arg3 + (%arg1 * 68 + %arg2 * 48) mod 32] : memref<32xf32>
+            %1 = affine.load %arg0[(%arg1 * 68 + %arg2 * 48 + %arg3) mod 32] : memref<32xf32>
+            // CHECK: affine.store %1, %0[%arg3 + (%arg1 * 68 + %arg2 * 48) mod 32] : memref<32xf32>
+            affine.store %1, %0[(%arg1 * 68 + %arg2 * 48 + %arg3) mod 32] : memref<32xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+
+    // CHECK-LABEL: accv.func nested @test_simplify_mod_all_terms
+    accv.func nested @test_simplify_mod_all_terms(%arg0: memref<64xf32>) attributes {exec_target = 0 : i64} {
+      %0 = memref.alloc() : memref<64xf32>
+      affine.for %arg1 = 0 to 16 {
+        affine.for %arg2 = 0 to 16 {
+          affine.for %arg3 = 0 to 4 {
+            // CHECK: %1 = affine.load %arg0[%arg3 + %arg2 * 4] : memref<64xf32>
+            %1 = affine.load %arg0[(%arg1 * 128 + %arg2 * 4 + %arg3) mod 64] : memref<64xf32>
+            // CHECK: affine.store %1, %0[%arg3 + %arg2 * 4] : memref<64xf32>
+            affine.store %1, %0[(%arg1 * 128 + %arg2 * 4 + %arg3) mod 64] : memref<64xf32>
+          } {begin = 0 : i64, end = 4 : i64}
+        } {begin = 0 : i64, end = 16 : i64}
+      } {begin = 0 : i64, end = 16 : i64}
+      accv.return
+    }
+  }
+}
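The index rewrites expected by the CHECK lines in the test file above can be sanity-checked outside the compiler by brute force over the loop ranges. The sketch below is only an illustration and not part of the Accera test suite; `equivalent` and the `CASES` table are hypothetical names. It assumes all induction variables are non-negative, so Python's `//` and `%` agree with affine `floordiv` and `mod`.

```python
from itertools import product


def equivalent(original, simplified, ranges):
    """Return True if both index functions agree at every point of the loop nest."""
    return all(
        original(*ivs) == simplified(*ivs)
        for ivs in product(*(range(n) for n in ranges))
    )


# Loop extents used by most tests: %arg1 in [0,16), %arg2 in [0,16), %arg3 in [0,4).
RANGES = (16, 16, 4)

# (test name, original index, index expected after simplification)
CASES = [
    ("floordiv_one_term",
     lambda i, j, k: (i * 64 + j * 48 + k) // 32,
     lambda i, j, k: i * 2 + (j * 48) // 32),
    ("floordiv_two_terms",
     lambda i, j, k: (i * 128 + j * 4 + k) // 64,
     lambda i, j, k: i * 2),
    ("mod_one_term",
     lambda i, j, k: (i * 68 + j * 48 + k) % 32,
     lambda i, j, k: k + (i * 68 + j * 48) % 32),
    ("mod_all_terms",
     lambda i, j, k: (i * 128 + j * 4 + k) % 64,
     lambda i, j, k: k + j * 4),
]

for name, original, simplified in CASES:
    print(name, equivalent(original, simplified, RANGES))

# The *_no_terms_range tests widen %arg3 to [0,5). Dropping or hoisting %arg3
# is then no longer valid, which is why those tests expect no change.
print(equivalent(
    lambda i, j, k: (i * 64 + j * 4 + k) // 32,
    lambda i, j, k: (i * 64 + j * 4) // 32,
    (16, 16, 5),
))
```

Each of the four cases agrees over the full iteration space, while the widened-range check fails (for example at `%arg2 = 7`, `%arg3 = 4`, where `4 * 7 + 4` reaches 32).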