Demo fixes for hatlib 0.0.11 #36

lisaong · 2022-04-20T13:34:28Z

Describe the pull request

What does your PR fix?

Fixes the demos to match the signature changes to `hat.load` in hatlib 0.0.11
Is this a documentation-only fix?

Yes

If you are still working on the PR, open it as a Draft: https://github.blog/2019-02-14-introducing-draft-pull-requests/

commit 6654c774fb0d2d6fac760b911a547b4e66b23127
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Wed Apr 27 00:44:53 2022 +0000

Merged PR 2522: Generalize array indexing in tensorized GEMM

This PR generalizes the MFMA tensorization pass to improve the handling of code in the innermost loop. It recognizes more ways of writing the GEMM kernel, and rejects many ill-formed GEMM kernels. There are also a number of tests.

This PR doesn't yet generalize to batch-GEMM, where the matrices (typically) have 3 indices.

Related work items: #3676

commit 4d030709101f3653712b805bd8f3698e0e293bd3
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 26 17:50:18 2022 +0000

Merged PR 2551: [nfc][ci] Switch hosted pipelines to 1ES hosted pool

* The Linux1ESPool is created to support internal builds of LLVM
* Fix regression in pipeline due to overzealous .dockerignore

commit 9b9d6b4b77c46b12788665412b9d0d1c2ff62d18
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 26 10:43:28 2022 +0000

Merged PR 2550: [nfc] [docs] Merge changes from GitHub remote

In preparation for merge from ADO to GitHub for Case Studies publishing

commit c1298946d18fb785788c556ea2959b9438f9c6b7
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 26 08:10:47 2022 +0000

Merged PR 2549: [Compliance] Switching from Dockerhub to ACR for third party containers

Updating Dockerfile references

commit 0c7a3610ba082e82e554297bdadbf9579b094745
Author: Denny Sun <dennys@microsoft.com>
Date: Tue Apr 26 04:40:05 2022 +0000

Merged PR 2548: Add README file for case studies

README file has a table where each case study points to the external repo link.
commit edbc50edd00efe8f12a675735d7e52371e43f7b1
Author: Lisa Ong <onglisa@microsoft.com>
Date: Mon Apr 25 23:49:15 2022 +0000

Merged PR 2546: [dev] [nfc] Natively support macOS/arm64 for development

Limited to local development scenarios (LLVM_SETUP_VARIANT=Default)
No plans to release pip packages until there is CI support
Verified on: Big Sur (MacOSX 12.3 arm64) / Python 3.10

commit 166e333a3d10b77c804dc3edc1c71bfc5716c768
Author: Ritwik Das <ritdas@microsoft.com>
Date: Mon Apr 25 17:50:22 2022 +0000

Merged PR 2543: Add precomputed offset map optimization for tensorization (no caching)

- Add flag to tensorize() to enable optimization (off by default)
- Optimization only affects load/store of accumulator (C) argument
- Supports all 4 mfma shapes

Related work items: #3671

commit e11c4d4e87bbae87f7cb9035eff8e6af650c9d1a
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Sun Apr 24 01:00:41 2022 +0000

Merged PR 2542: An assortment of minor fixes

This PR is a hodgepodge of tiny fixes. I'm happy to split it up into separate PRs if a kitchen-sink PR is too gross.

The specific things are:
- Add 2 new target models to `Targets.py` (that correspond to my local dev boxes)
- Change the snapshot IR format for sub-passes to use the same format as the top-level passes (that is, not "generic" format)
- Print a warning message if `check_correctness` skips a correctness check because no hat file was generated
- Add a "minimum version" constraint to `requirements.txt` for `hatlib`

commit 8da7903ac9b6d8612711593308e49a7a3e82678d
Author: Kern Handa <kerha@microsoft.com>
Date: Sat Apr 23 23:59:53 2022 +0000

Merged PR 2545: Unifies CUDA and CPP enum values to SOURCE for Package.Format

Unifies CUDA and CPP enum values to SOURCE for Package.Format

Related work items: #3679

commit fe2c40fa8f1c28dcf47e1533223457fd3e6bf195
Author: Kern Handa <kerha@microsoft.com>
Date: Sat Apr 23 23:17:43 2022 +0000

Merged PR 2544: [nfc] Removes now unnecessary ldebug output

[nfc] Removes now unnecessary ldebug output

commit 32090d786ce13299bb77a6675c3478b3d7cdf48c
Author: Mason Remy <masonr@microsoft.com>
Date: Fri Apr 22 21:31:01 2022 +0000

Merged PR 2527: Enable vectorized shared memory write

Enable vectorized shared memory write
- This adds mod simplification support needed for vectorizing shared memory writes
- Also refactors some of the affine simplification code slightly to share some common code between the floordiv and mod simplifications

Related work items: #3586, #3661, #3689

commit 0eb698af118b94bf3f4d4862a142c86055f8b7bb
Author: Mason Remy <masonr@microsoft.com>
Date: Fri Apr 22 19:13:27 2022 +0000

Merged PR 2526: Enable GPU global read vectorization

Enable GPU global read vectorization
- Implements a floor div simplification that enables better recognition of vectorizable loads and stores

Related work items: #3661, #3690

commit df849f066ff6c2c82c796d9b48e3bea6390c7877
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Fri Apr 22 06:03:27 2022 +0000

Merged PR 2541: Fix a few issues with GEMM benchmarking script

This PR fixes a couple of errors:
- there was a bug in the GEMM kernel
- sometimes hatlib would fail to return a compiled function, but not throw an exception. These are now flagged as "uncompilable"

It makes a couple of other tweaks:
- it fails if the `alpha` and `beta` parameters aren't `1.0` and `0.0`
- it culls some variants with known-uncompilable tensorization parameters before trying to compile them

commit 339253767ae4bb4f7e5c323f77fc938ba1a4ab92
Author: Lisa Ong <onglisa@microsoft.com>
Date: Fri Apr 22 01:26:53 2022 +0000

Merged PR 2538: Fix std::pair unpacking issue in TensorizeAffineForOpConversion

In debug builds, we are getting garbage values for warpSizeX and warpSizeY, resulting in division by 0 errors in the emitted .cu files

commit 075c83247d34bfd9fb291e4ea6b9df059a94993a
Author: Denny Sun <dennys@microsoft.com>
Date: Fri Apr 22 00:26:56 2022 +0000

Merged PR 2536: Parameter supports most of the arithmetic/binary/unary operations defined in operator lib

Parameter supports the basic arithmetic operations (+, -, *, //, %). For example, the user can write the following code:

fma_unit_count, vector_size = acc.create_parameters(2)
jjj = schedule.split(jj, fma_unit_count * vector_size)
jjjj = schedule.split(jjj, vector_size)

Related work items: #3692

commit 6d5e71899c6fb606e32ec46ee871ae1af25d3cd6
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Apr 21 18:22:12 2022 +0000

Merged PR 2539: [nfc][docs] Merging commits from Github/main

commit ee28126a338d905eb5931038d3c5daba6ead3811
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date: Wed Apr 20 21:35:20 2022 +0800

Update arrow label positions (#35)

* [nfc] [doc] Update arrow label positions
* make arrowhead more visible
* nfc

commit ddcecaaffd9dd0861999a6d29443dc7c37d79665
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date: Wed Apr 20 21:34:40 2022 +0800

demo fixes for hatlib 0.0.11 (#36)
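PR 2536 above lets parameters take part in arithmetic such as `fma_unit_count * vector_size` before their values are known. One common way to get that behavior in Python is operator overloading that records an expression tree and evaluates it once values are bound. The sketch below assumes that approach; `Node`, `Param`, `BinOp` and `resolve` are hypothetical names and this is not Accera's `Parameter` implementation.

```python
import operator
from typing import Callable, Dict, Union


class Node:
    """Base class for a delayed integer expression (hypothetical sketch)."""

    def _binop(self, other: "Operand", op: Callable[[int, int], int], reverse: bool = False) -> "BinOp":
        # Record the operation instead of computing it; `reverse` handles `3 * p` style calls.
        return BinOp(other, self, op) if reverse else BinOp(self, other, op)

    # The five operators listed in PR 2536: +, -, *, //, %
    def __add__(self, other): return self._binop(other, operator.add)
    def __radd__(self, other): return self._binop(other, operator.add, reverse=True)
    def __sub__(self, other): return self._binop(other, operator.sub)
    def __rsub__(self, other): return self._binop(other, operator.sub, reverse=True)
    def __mul__(self, other): return self._binop(other, operator.mul)
    def __rmul__(self, other): return self._binop(other, operator.mul, reverse=True)
    def __floordiv__(self, other): return self._binop(other, operator.floordiv)
    def __rfloordiv__(self, other): return self._binop(other, operator.floordiv, reverse=True)
    def __mod__(self, other): return self._binop(other, operator.mod)
    def __rmod__(self, other): return self._binop(other, operator.mod, reverse=True)


Operand = Union[int, Node]


class Param(Node):
    """A named placeholder whose integer value is supplied later."""

    def __init__(self, name: str) -> None:
        self.name = name


class BinOp(Node):
    """A recorded binary operation over two operands."""

    def __init__(self, lhs: Operand, rhs: Operand, op: Callable[[int, int], int]) -> None:
        self.lhs, self.rhs, self.op = lhs, rhs, op


def resolve(value: Operand, bindings: Dict[Param, int]) -> int:
    """Evaluate an expression once every Param in it has a concrete value."""
    if isinstance(value, Param):
        return bindings[value]
    if isinstance(value, BinOp):
        return value.op(resolve(value.lhs, bindings), resolve(value.rhs, bindings))
    return value  # already a plain integer
```

In the PR's example, the split size `fma_unit_count * vector_size` stays symbolic until concrete values arrive (in Accera, through the `parameters=` argument of `package.add`, as later commits in this log show); here that step corresponds to calling `resolve` with a bindings dictionary.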

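PRs 2526 and 2527 above (and PR 2596 later in this log) simplify `floordiv` and `mod` by using known index ranges, such as the range of a GPU thread id. The core rule can be sketched as follows, assuming the operand is a single index with a known inclusive range `[lo, hi]` and the divisor is a positive constant. This is a simplified illustration with hypothetical function names, not Accera's affine simplification code: if every value in the range lands in the same quotient bucket, `x floordiv d` is a constant and `x mod d` is `x` minus a constant.

```python
from typing import Optional


def simplify_floordiv(lo: int, hi: int, d: int) -> Optional[int]:
    """Return q if x // d == q for every integer x in [lo, hi]; otherwise None."""
    if d <= 0 or lo > hi:
        raise ValueError("expected a positive divisor and lo <= hi")
    q_lo, q_hi = lo // d, hi // d
    # Floor division is monotonic in x, so equal quotients at both ends of the
    # range mean the quotient is the same constant everywhere in between.
    return q_lo if q_lo == q_hi else None


def simplify_mod(lo: int, hi: int, d: int) -> Optional[int]:
    """Return c if x % d == x - c for every integer x in [lo, hi]; otherwise None."""
    q = simplify_floordiv(lo, hi, d)
    # x % d == x - d * (x // d), so a constant quotient q gives the offset c = q * d.
    return None if q is None else q * d
```

For a thread id bound to `[0, 31]`, `tid floordiv 32` folds to `0` and `tid mod 32` folds to `tid`; removing those operations from an index expression can make contiguous accesses easier for a vectorizer to recognize.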
commit f3a1a2becb6740ae8cf7873b5029c6df140f5c19
Author: Kern Handa <kerha@microsoft.com>
Date: Tue Jul 12 16:52:41 2022 +0000

Merged PR 2744: [doc] Fixes link in reference/functions/cast.md, revs version on all docs

[doc] Fixes link in reference/functions/cast.md

commit 23f4c8fbf2415b02e8b0090a76380d34790205fa
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Jul 12 05:55:48 2022 +0000

Merged PR 2743: [DSL] Document implicit casting rules and the explicit `cast` function

* Document implicit casting rules implemented by !2693
* Promote `acc.cast` to a documented function to give the user control to override implicit casting behavior

commit 3ec63b62705327a65decc4da7ec4cb5412dc7299
Author: Kern Handa <kerha@microsoft.com>
Date: Mon Jul 11 23:57:23 2022 +0000

Merged PR 2739: Updates ROCM tensorization pattern to handle casting

Updates ROCM tensorization pattern to handle casting

commit 60c082dd38ff1b0bc030a7e28dc19f553bad9099
Author: Mason Remy <masonr@microsoft.com>
Date: Mon Jul 11 22:58:42 2022 +0000

Merged PR 2643: Some fixes for last major array caching in tensorization

Some fixes for last major array caching in tensorization

commit 812c3065b7d4d6c9d716acf4fb1df4be66ef101d
Author: Kern Handa <kerha@microsoft.com>
Date: Mon Jul 11 20:43:12 2022 +0000

Merged PR 2693: Updates DSL codegen to implicitly cast if possible

Updates DSL codegen to implicitly cast if possible

commit 6ed316e50e8f9e398f9ee6b8bfa8e6aa05fbffb1
Author: Ritwik Das <ritdas@microsoft.com>
Date: Sat Jul 9 05:52:22 2022 +0000

Merged PR 2735: Pass multiple input files as comma-separated list to benchmark tool

https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41588&view=logs&j=d78921a4-2f18-50b0-77ad-4c6803f3371b&t=f97c60f6-ada7-5ec9-5ea1-510216c408e9

Above pipeline did not run the 2nd set of input sizes since the 1st process did not exit until pipeline timeout was hit. After the fix, we will always have a single job.
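PR 2735 above passes several input files to the benchmark tool as one comma-separated argument, so a single process (and a single pipeline job) handles all of them. A minimal sketch of that argument handling with `argparse` follows; the `--input` flag name and the `run_all` helper are hypothetical and are not the tool's actual interface.

```python
import argparse
from typing import Callable, Dict, List, Optional


def comma_separated(value: str) -> List[str]:
    """argparse `type=` converter: 'a.csv, b.csv' -> ['a.csv', 'b.csv'] (empty items dropped)."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # `--input` is a hypothetical flag name used only for this sketch.
    parser = argparse.ArgumentParser(description="Hypothetical GEMM benchmark driver")
    parser.add_argument("--input", type=comma_separated, required=True,
                        help="comma-separated list of input size files")
    return parser.parse_args(argv)


def run_all(input_files: List[str], benchmark: Callable[[str], float]) -> Dict[str, float]:
    """Benchmark every input file inside this one process, so one job covers them all."""
    return {path: benchmark(path) for path in input_files}
```

For example, `run_all(parse_args(["--input", "small.csv,large.csv"]).input, bench)` would process both files in one process, where `bench` stands in for whatever per-file routine the real tool runs. Keeping the loop inside one process avoids the situation described in the PR, where a second process never started because the first one did not exit before the pipeline timeout.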
commit e5010caebc5a135e40464a06432a5cf1fc965203
Author: Ritwik Das <ritdas@microsoft.com>
Date: Mon Jun 27 23:32:49 2022 +0000

Merged PR 2721: Remove unnecessary logging in benchmarks

Remove unnecessary logging in benchmarks

commit e0c5945d3ef218a5be858bc0934274793972abdb
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Jun 21 01:12:02 2022 +0000

Merged PR 2674: Support emitting runtime array sizes in the Value DSL

* Minimum set of changes to support runtime sizes in the Value DSL without transformations
* Add a ScalarDimension type (name TBC) which is aliased to Scalar
* Support variable ends in MemoryLayout, ScheduledLoopOp, RangeValueAnalysis
* Use mlir::ShapedType::kDynamicSize and mlir::ShapedType::kDynamicStrideOrOffset as sentinel values, following the pattern in MemRefOps, TensorOps, etc.
* TODO: E2E verification in the next PR
* TODO: Python DSL changes in the next PR

Output of mlir-translate for the runtime_sizes_all case, where %21, %22 and %23 are the runtime sizes for M, N, and K:

```
define void @NestMatMul(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, float* %7, float* %8, i64 %9, i64 %10, i64 %11, i64 %12, i64 %13, float* %14, float* %15, i64 %16, i64 %17, i64 %18, i64 %19, i64 %20, i64 %21, i64 %22, i64 %23) !dbg !3 {
  br label %25, !dbg !7

25:  ; preds = %57, %24
  %26 = phi i64 [ %58, %57 ], [ 0, %24 ]
  %27 = icmp slt i64 %26, %21, !dbg !9
  br i1 %27, label %28, label %59, !dbg !10

28:  ; preds = %25
  br label %29, !dbg !11

29:  ; preds = %55, %28
  %30 = phi i64 [ %56, %55 ], [ 0, %28 ]
  %31 = icmp slt i64 %30, %22, !dbg !12
  br i1 %31, label %32, label %57, !dbg !13

32:  ; preds = %29
  br label %33, !dbg !14

33:  ; preds = %36, %32
  %34 = phi i64 [ %54, %36 ], [ 0, %32 ]
  %35 = icmp slt i64 %34, %23, !dbg !15
  br i1 %35, label %36, label %55, !dbg !16

36:  ; preds = %33
  %37 = mul i64 %26, %5, !dbg !17
  %38 = add i64 %37, %34, !dbg !18
  %39 = getelementptr float, float* %1, i64 %38, !dbg !19
  %40 = load float, float* %39, align 4, !dbg !20
  %41 = mul i64 %34, %12, !dbg !21
  %42 = add i64 %41, %30, !dbg !22
  %43 = getelementptr float, float* %8, i64 %42, !dbg !23
  %44 = load float, float* %43, align 4, !dbg !24
  %45 = fmul float %40, %44, !dbg !25
  %46 = mul i64 %26, %19, !dbg !26
  %47 = add i64 %46, %30, !dbg !27
  %48 = getelementptr float, float* %15, i64 %47, !dbg !28
  %49 = load float, float* %48, align 4, !dbg !29
  %50 = fadd float %49, %45, !dbg !30
  %51 = mul i64 %26, %19, !dbg !31
  %52 = add i64 %51, %30, !dbg !32
  %53 = getelementptr float, float* %15, i64 %52, !dbg !33
  store float %50, float* %53, align 4, !dbg !34
  %54 = add i64 %34, 1, !dbg !35
  br label %33, !dbg !36

55:  ; preds = %33
  %56 = add i64 %30, 1, !dbg !37
  br label %29, !dbg !38

57:  ; preds = %29
  %58 = add i64 %26, 1, !dbg !39
  br label %25, !dbg !40

59:  ; preds = %25
  ret void, !dbg !41
}
```

Related work items: #3716, #3717

commit 51a07e5c60009c47c3b375b402ac96f47619ca8f
Author: Ritwik Das <ritdas@microsoft.com>
Date: Tue Jun 21 00:18:02 2022 +0000

Merged PR 2682: Add nvidia device optimized sizes and some benchmark fixes

Add nvidia dev opt sizes and some bench fixes

commit 6325b5e5bc68136d29e4a65d657699a4e781214d
Author: Ritwik Das <ritdas@microsoft.com>
Date: Sat Jun 18 17:59:50 2022 +0000

Merged PR 2676: Add automated weekly rocm baseline benchmark

https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41316&view=logs&j=4f7f213a-5f0f-58b0-1189-99ef12faf0d8&t=687344d2-d6b6-5d8c-dd9d-6aab558fd96c
https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41314&view=logs&j=4f7f213a-5f0f-58b0-1189-99ef12faf0d8

commit 940e599ff7026e7c41cb1b2566eec44d70709e96
Author: Ritwik Das <ritdas@microsoft.com>
Date: Fri Jun 17 16:34:22 2022 +0000

Merged PR 2673: Add automated weekly baseline benchmarks on Nvidia GPU

commit 1a521c78783538e233230ecc90d2bc347092e0da
Author: Ritwik Das <ritdas@microsoft.com>
Date: Thu Jun 16 00:07:26 2022 +0000

Merged PR 2657: Add conversion pass from gpu ops to rocdl ops

- switch to gpu dialect for gpu index ops
- add conversion pass from gpu dialect to rocdl

commit 3c3271deb2a2a42bac5e6356f28d7e04e6ff7678
Author: Ritwik Das <ritdas@microsoft.com>
Date: Wed Jun 15 02:52:50 2022 +0000

Merged PR 2652: Add integer tensor ops support for AMD targets

- int mfma ops
- tests
- static_cast in c++

Related work items: #3727

commit 8b96d139067119de37ab6bcd1ecb0382ce7f46b3
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Jun 14 20:25:33 2022 +0000

Merged PR 2650: [release] Docs version to 1.2.6, sync Github to ADO

Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Jun 14 17:04:53 2022 +0800

bump docs version to 1.2.6

commit 4014c176e8e5eb270404915c1bdc7404d657d9f6
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sat Jun 4 07:11:15 2022 +0800

Bump bottle from 0.12.19 to 0.12.20 in /tools/viz (#44)

Bumps [bottle](https://github.com/bottlepy/bottle) from 0.12.19 to 0.12.20.
- [Release notes](https://github.com/bottlepy/bottle/releases)
- [Changelog](https://github.com/bottlepy/bottle/blob/master/docs/changelog.rst)
- [Commits](https://github.com/bottlepy/bottle/compare/0.12.19...0.12.20)

commit 72ede2b19b5e21f3d96fc800bcc1ed30d69c389c
Author: Ritwik Das <ritdas@microsoft.com>
Date: Mon Jun 13 23:17:59 2022 +0000

Merged PR 2624: Add more MMA shapes for CUDA

Add more MMA shapes for CUDA
- 32x8x16
- 8x32x16

commit f237d89d12e4935af98c85b8a6ef5cdd2777272f
Author: Lisa Ong <onglisa@microsoft.com>
Date: Mon Jun 13 08:19:08 2022 +0000

Merged PR 2644: Enable CUDA benchmarks only for A6000

* Manually set the Target.Model user capability on agents running A6000
* Update benchmarking pipelines to demand A6000s

https://docs.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml#feedback

commit ad46d3c4bd399b1bd3fa6b24a90aa29f1f0bf685
Author: Ritwik Das <ritdas@microsoft.com>
Date: Fri Jun 10 19:44:02 2022 +0000

Merged PR 2634: Remove couple more big gemm sizes

Remove couple more big gemm sizes

commit 21638e465b02e4730842f14ac08f3caa28942b65
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Jun 9 23:23:10 2022 +0000

Merged PR 2626: [refactor] Moving debug mode to its own lowering pass

Move the emitting of the debug mode wrapper function out of MLIREmitterContext into a lowering pass. This makes it easier to expand debug mode in the future.

commit 91c178f348c61b45a7505ee69da068f55536cdc9
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Jun 9 17:20:50 2022 +0000

Merged PR 2633: Bump hatlib to 0.0.19 to unblock CUDA T4 devices

https://github.com/microsoft/hat/releases/tag/v0.0.19

commit 209effd66c47a36d7399a83a286447f9bc70f5b4
Author: Ritwik Das <ritdas@microsoft.com>
Date: Thu Jun 9 09:22:03 2022 +0000

Merged PR 2630: Add batched gemm support with tensorization

Related work items: #3677

commit 12cb73e175bb590e7a1b575b0ae75a3d99b812ff
Author: Ritwik Das <ritdas@microsoft.com>
Date: Thu Jun 9 07:12:17 2022 +0000

Merged PR 2631: Add cosmosdb key env var and shuffle gemm sizes

- Add env var for ACCOUNT_KEY
- shuffle gemm sizes from small to big
- remove correctness check from big inputs and fp16

commit bb4de9008fdfbc36768e922ce05efe5e28733cfd
Author: JUBI TANEJA <jubitaneja@microsoft.com>
Date: Thu Jun 9 00:02:41 2022 +0000

Merged PR 2607: Infrastructure for plan.auto() to support a basic none cache heuristics approach

Infrastructure for plan.auto() to support a basic none cache heuristics approach

This is a basic approach to test parameterization of cache arguments, index and layout. User only needs to specify the source they want to cache, and AutoPlanner's NoneCacheHeuristics algorithm will synthesize the remaining parameters for caching with possible set of values.

**Overall idea at DSL level:**

Given input -
schedule.reorder(i, j, k, ii, jj, kk)
plan.auto(accera.algorithms.NoneCacheHeuristics(source = B, index = j))

Internally, auto() invokes cache and adds two functions with a unique value of layout.
plan.cache(source = B, index = j, layout = {FIRST_MAJOR, LAST_MAJOR})

**Important change in this PR:**
- Add a new algorithms module in Accera
- Do not delay resolution of delayed parameters to get the value, instead it now allows setting parameters with a possible set of values and this can be passed between heuristics and plan object. Check: Parameter.py
- Parameters constructed by heuristics are termed as "heuristic parameters". They are not available to the external users of Accera, but just named separately in the implementation to differentiate them from user-defined "parameters".

**Limitation/Changes coming in the subsequent PRs:**
- Allow user-defined parameters and heuristic parameters both for AutoPlanner test cases. For now, the code only focuses on testing AutoPlanner without any user-defined parameters that one can create using API: `create_parameters`.
- Documentation of AutoPlanner -- design goals, tutorial, API description, etc. is coming in the next PR.

commit c437a48aede39d2bb8dcb963de6c3c3b5c6ac682
Author: Mason Remy <masonr@microsoft.com>
Date: Tue Jun 7 07:30:34 2022 +0000

Merged PR 2600: Refactor MFMA indexing calculations

Refactor MFMA indexing calculations
- Use the iteration space position when determining MFMA computation locations rather than computing the position from the thread id
- Construct the full subschedules for AMD MFMA ops so that the bound loop indices are ordered appropriately for the MFMA op being invoked
- Update unit tests accordingly.

The schedule changes may need to be moved to an under-the-hood feature of tensorization

commit ea74434226671de7dac68dd70eb01c6142a1257e
Author: Ritwik Das <ritdas@microsoft.com>
Date: Mon Jun 6 23:29:26 2022 +0000

Merged PR 2627: Raise error for invalid block dimensions

Raise error for invalid block dimensions based on target info

Related work items: #3715

commit 1ccd939a800ec565835c5b4917d3411aa526f992
Author: Lisa Ong <onglisa@microsoft.com>
Date: Mon Jun 6 17:12:42 2022 +0000

Merged PR 2625: [nfc] Block debug mode for unsupported GPU targets

Debug mode is not yet supported for GPU targets
* Fail early
* Update documentation

commit e757df5e8792d358d37ab8da3fe90d1310485274
Author: Ritwik Das <ritdas@microsoft.com>
Date: Fri Jun 3 21:36:08 2022 +0000

Merged PR 2622: Fix dependencies for benchmark tools

Fix dependencies for benchmark tools

commit 34a6f3f061ded0a0e0620393f120ce0e6431bc92
Author: Ritwik Das <ritdas@microsoft.com>
Date: Fri Jun 3 18:16:30 2022 +0000

Merged PR 2604: Add bfloat16 support for tensor ops on rocm

Add bfloat16 support for tensor ops on cuda and rocm

Related work items: #3713

commit f5c864eaed962e7e3775e732abbf5e73df8e6c4b
Author: Lisa Ong <onglisa@microsoft.com>
Date: Fri Jun 3 00:41:50 2022 +0000

Merged PR 2621: Merge changes from Github repo

commit 5b5f5eff6b0c5422f026634153b5de219db2c628
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date: Fri Jun 3 00:14:04 2022 +0800

[ci] Fix out of disk space errors for CI workflow (#43)

* Split into debug/release builds
* Cleanup vcpkg buildtrees

commit b7135deb61766e52b4583db0800926799084dafe
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date: Thu Jun 2 22:21:56 2022 +0800

Update ci.yml

Cleanup build folder

commit b59a60435eb72334898f0ca227ad24a15e003498
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Thu Jun 2 16:20:50 2022 +0800

Bump urllib3 from 1.25.8 to 1.26.5 in /tools/benchmarkers (#42)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit cf9868ebfb0346903335d55d745eefdc051d9cbd
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Jun 2 22:00:11 2022 +0000

Merged PR 2620: Upgrade GPU self-hosted agents to g++-10

The stock g++-9 from Ubuntu 20.04 crashes when compiling pybind11 alongside mlir/Dialect/IR/Affine/AffineOp.h.

This change updates to g++-10 for the self-hosted images only, as this issue only affects images that we build for ROCm and CUDA. Azure DevOps agents will continue to run on their pre-installed g++-9.

commit efdff3963b07469af391ffc950c6b751d41b81df
Author: Denny Sun <dennys@microsoft.com>
Date: Thu Jun 2 04:06:57 2022 +0000

Merged PR 2619: Parameterize Plan.bind

```
P0, P1, P2, P3, P4, P5 = create_parameters()

plan.bind(mapping={
    P0: P3,
    P1: P4,
    P2: P5
})

package.add(
    plan,
    args=(A, B, C),
    parameters={
        P0: i,
        P1: j,
        P2: k,
        P3: v100.GridUnit.BLOCK_X,
        P4: v100.GridUnit.THREAD_X,
        P5: v100.GridUnit.THREAD_Y,
    },
    base_name=test_name)
```

Related work items: #3708

commit 5c0fdfde2512baed8a37bde8c50c4bce649930b3
Author: Mason Remy <masonr@microsoft.com>
Date: Wed Jun 1 20:17:00 2022 +0000

Merged PR 2599: Support parameterizing caches based on memory space

Support parameterizing caches based on memory space
- Identifies bound indices that the cache should be parameterized on, rather than shaped by. E.g., for a private memory cache inserted at a gpu block level, the computed memory space will not be the full active block at that level, but the portion derived from loops that weren't bound to gpu thread dims.
- Adds some BoundProcessorOp utilities and shares some common binding code

commit 2c4cceca16084554b4af458525fc68f06503bea1
Author: Ritwik Das <ritdas@microsoft.com>
Date: Wed Jun 1 08:46:52 2022 +0000

Merged PR 2618: Fix memory allocation bug during benchmark verification

Fix memory allocation bug during benchmark verification

commit 2977b3b905109d2405524d7e6eb1583eed9f2d7d
Author: Lisa Ong <onglisa@microsoft.com>
Date: Wed Jun 1 04:19:38 2022 +0000

Merged PR 2617: [nfc] [doc] Fix typo and re-sync models table

commit 53dfbdb0e42250cf877da0c0e04f93acedc5caf4
Author: Denny Sun <dennys@microsoft.com>
Date: Wed Jun 1 03:38:27 2022 +0000

Merged PR 2616: Formatting Python code a bit for better readability

1. Some functions have a long list of parameters, add line wrap
2. Separate external imports from internal ones

commit 92565fa4495e828efb698078e30a5233d343f287
Author: Ritwik Das <ritdas@microsoft.com>
Date: Tue May 31 07:41:56 2022 +0000

Merged PR 2614: Remove redundant variable and cosmosdb fix

Cosmos DB error when upserting from multiple processes:

Process runner0:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/azp/_work/2/s/tools/benchmarkers/accera_gemm.py", line 633, in gemm_runner
    cosmosdb.upsert_benchmark_results(resultRows, containerName, verboseLogs)
  File "/azp/_work/2/s/tools/benchmarkers/cosmosdb.py", line 27, in upsert_benchmark_results
    container = get_container(containerName, verboseLogs)
  File "/azp/_work/2/s/tools/benchmarkers/cosmosdb.py", line 18, in get_container
    container = db.create_container_if_not_exists(id=containerName, partition_key=PartitionKey(path='/partitionKey'))
  File "/usr/local/lib/python3.8/dist-packages/azure/core/tracing/decorator.py", line 62, in wrapper_use_tracer
    return func(*args, **kwargs) # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/database.py", line 287, in create_container_if_not_exists
    container_proxy.read(
  File "/usr/local/lib/python3.8/dist-packages/azure/core/tracing/decorator.py", line 62, in wrapper_use_tracer
    return func(*args, **kwargs) # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/container.py", line 145, in read
    self._properties = self.client_connection.ReadContainer(
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/_cosmos_client_connection.py", line 469, in ReadContainer
    return self.Read(path, "colls", collection_id, None, options, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/_cosmos_client_connection.py", line 2162, in Read
    result, self.last_response_headers = self.__Get(path, request_params, headers, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/_cosmos_client_connection.py", line 2209, in __Get
    return synchronized_request.SynchronizedRequest(
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/_synchronized_request.py", line 210, in SynchronizedRequest
    return _retry_utility.Execute(
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/_retry_utility.py", line 73, in Execute
    result = ExecuteFunction(function, global_endpoint_manager, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/_retry_utility.py", line 130, in ExecuteFunction
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/azure/cosmos/_synchronized_request.py", line 158, in _Request
    raise exceptions.CosmosHttpResponseError(message=data, response=response)
azure.cosmos.exceptions.CosmosHttpResponseError: Status code: 400
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request</h2>
<hr><p>HTTP Erro...

commit 34fadcd529161a656a34f67288fb934b42a73524
Author: Ritwik Das <ritdas@microsoft.com>
Date: Tue May 31 00:37:59 2022 +0000

Merged PR 2613: Enable daily CUDA benchmarks

- Enable CUDA benchmarks
- some refactoring

commit 52d7f77ecea2d437f658be9cf898724ee220e670
Author: Mason Remy <masonr@microsoft.com>
Date: Mon May 30 05:57:31 2022 +0000

Merged PR 2596: Updates to affine simplifications

Updates to affine simplifications
- Run simplifications on AffineApplyOps
- Detect and simplify some single-element-numerator cases for floordiv and mod
- Detect GPU constants such as grid dim size and block dim size and incorporate those constants into affine maps for later simplification
- Detect GPU bound dimensions block id and thread id in affine ops and incorporate those ranges into simplification passes

Related work items: #3667

commit 2e1b837854692a64df920b7b3ef44d9d5a5ca3fa
Author: Mason Remy <masonr@microsoft.com>
Date: Fri May 27 22:02:47 2022 +0000

Merged PR 2594: Always resolve unrealized loopnest indices when computing cache positions

Always resolve unrealized loopnest indices when computing cache positions

commit 53edefaddd4de4f2eba90033fda3e9b61d2c3c95
Author: Mason Remy <masonr@microsoft.com>
Date: Fri May 27 21:14:50 2022 +0000

Merged PR 2574: Support binding multiple indices to a processor handle

Support binding multiple indices to a processor handle
- This creates a mapping of the processor handle to the index iterations based on the ordering of the indices in the tuple

commit 6cf6faf95f72c7df3bf6acf39a29bb2591275781
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Fri May 27 19:55:20 2022 +0000

Merged PR 2611: Fix issue when splitting indices by factors that don't divide evenly

This PR fixes an issue when splitting by a factor that doesn't evenly divide the parent index's range.
E.g., if `i` has a range of `[0, 320)`, then `ii = split(i, 128)` would end up with `ii` having a range of `192` instead of `128`. commit 8c1a2da2e83e92fa07dd45c6d734cca1ab678f74 Author: Ritwik Das <ritdas@microsoft.com> Date: Fri May 27 16:09:39 2022 +0000 Merged PR 2612: Add missing psutil dependency - Add missing psutil dependency - Remove private branch from benchmarks commit 3e4a765acb1ab2905e57e6a37ee9e2448faaa870 Author: Ritwik Das <ritdas@microsoft.com> Date: Fri May 27 07:17:14 2022 +0000 Merged PR 2608: Caching fixes and benchmarking optimizations - Explore k_split independently of outer tile dims, allows for arbitrary k splits - Fix for workPerThread < 1 (from Mason), which was exposed since the benchmark now explores k-split of size 1, 2, 4, etc. and this causes small active blocks for caching, and when work per thread becomes less than 1 the compiler crashes during package.build. commit 11810900f837a261d97bb7589620a8fe7e82b70a Author: Lisa Ong <onglisa@microsoft.com> Date: Fri May 27 01:29:45 2022 +0000 Merged PR 2610: Opportunistically add more targets used in CI machines and update Model.md * Renamed some fields to add units * Added some Intel Xeon models as we encounter them * Updated some cache sizes commit e6d31889676e69d9f884da9e543b295b1bfc7aed Author: Denny Sun <dennys@microsoft.com> Date: Thu May 26 19:47:02 2022 +0000 Merged PR 2606: Parameterize Array.sub_array ` P0, P1 = create_parameters() arr = Array(role=Array.Role.INPUT_OUTPUT, element_type=ScalarType.float32, shape=(256, 256)) arr0 = arr.sub_array(offsets=(0, 0), shape=(P0, P1)) package.add(nest, args=(arr0, ), parameters={P0: 128, P1: 128}) ` Related work items: #3707 commit cd39c8d4ead0392c7be87a059a2df70e6cf373a4 Author: Lisa Ong <onglisa@microsoft.com> Date: Thu May 26 09:13:30 2022 +0000 Merged PR 2609: [build] peg protobuf to 3.20.1 due to incompatibilities with latest version Even though we peg to onnx==1.9.0, onnx requires protobuf >= 3.20.1 which pulls an incompatible version 
of protobuf (4.x). commit ee0ad728c60353a3e8562d428dc1f6d9355cffa2 Author: Lisa Ong <onglisa@microsoft.com> Date: Wed May 25 08:08:31 2022 +0000 Merged PR 2576: [doc] MFMA thread assignment visualizations for AMD Some helper visualizations for MFMA: * 2x2x16 * 4x4x32 commit 90edd8ad70675bfb10bf448e0919f18a95b8453f Author: Lisa Ong <onglisa@microsoft.com> Date: Wed May 25 07:12:07 2022 +0000 Merged PR 2601: [ci] CUDA pipeline and buddy build * Container for CUDA self-hosted Azure devops agent * Initial buddy build pipeline (similar to ROCm) * Replaces references to Dockerhub with Azure Container Registry for compliance purposes commit 0c2d6416bb1bf96e184b43289c55cf01522fa05b Author: Lisa Ong <onglisa@microsoft.com> Date: Wed May 25 06:25:53 2022 +0000 Merged PR 2603: Add CUDA pipeline host to known targets Note that the CPU frequency is conflicting, I went with cpuinfo and dmesg. References: ``` > python -m cpuinfo Python Version: 3.8.10.final.0 (64 bit) Cpuinfo Version: 8.0.0 Vendor ID Raw: AuthenticAMD Hardware Raw: Brand Raw: AMD EPYC 7V12 64-Core Processor Hz Advertised Friendly: 3.3049 GHz Hz Actual Friendly: 3.3049 GHz Hz Advertised: (3304919000, 0) Hz Actual: (3304919000, 0) Arch: X86_64 Bits: 64 Count: 128 Arch String Raw: x86_64 L1 Data Cache Size: 2 MiB L1 Instruction Cache Size: 2 MiB L2 Cache Size: 32 MiB L2 Cache Line Size: 512 L2 Cache Associativity: 6 L3 Cache Size: 524288 Stepping: Model: 49 Family: 23 Processor Type: Flags: 3dnowext, 3dnowprefetch, abm, adx, aes, aperfmperf, apic, arat, avic, avx, avx2, bmi1, bmi2, bpext, cat_l3, cdp_l3, clflush, clflushopt, clwb, clzero, cmov, cmp_legacy, constant_tsc, cpb, cpuid, cqm, cqm_llc, cqm_mbm_local, cqm_mbm_total, cqm_occup_llc, cr8_legacy, cx16, cx8, dbx, de, decodeassists, extapic, extd_apicid, f16c, flushbyasid, fma, fpu, fsgsbase, fxsr, fxsr_opt, ht, hw_pstate, ibpb, ibrs, ibs, irperf, lahf_lm, lbrv, lm, mba, mca, mce, misalignsse, mmx, mmxext, monitor, movbe, msr, mtrr, mwaitx, nonstop_tsc, nopl, 
npt, nrip_save, nx, osvw, osxsave, overflow_recov, pae, pat, pausefilter, pci_l2i, pclmulqdq, pdpe1gb, perfctr_core, perfctr_llc, perfctr_nb, pfthreshold, pge, pni, popcnt, pqe, pqm, pse, pse36, rdpid, rdrand, rdrnd, rdseed, rdt_a, rdtscp, rep_good, sep, sev, sha, sha_ni, skinit, smap, smca, sme, smep, ssbd, sse, sse2, sse4_1, sse4_2, sse4a, ssse3, stibp, succor, svm, svm_lock, syscall, tce, topoext, tsc, tsc_scale, umip, v_vmsave_vmload, vgif, vmcb_clean, vme, vmmcall, wbnoinvd, wdt, xgetbv1, xsave, xsavec, xsaveerptr, xsaveopt, xsaves ``` ``` > dmesg | grep MHz [ 0.000000] tsc: Detected 2450.083 MHz processor [ 7.731766] hpet0: 3 comparators, 32-bit 14.318180 MHz counter [ 8.979712] tsc: Refined TSC clocksource calibration: 2449.961 MHz ``` ``` > lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 43 bits physical, 48 bits virtual CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 2 Core(s) per socket: 64 Socket(s): 1 NUMA node(s): 1 Vendor ID: AuthenticAMD CPU family: 23 Model: 49 Model name: AMD EPYC 7V12 64-Core Processor Stepping: 0 Frequency boost: enabled CPU MHz: 1497.558 CPU max MHz: 2450.0000 CPU min MHz: 1500.0000 BogoMIPS: 4900.16 Virtualization: AMD-V L1d cache: 2 MiB L1i cache: 2 MiB L2 cache: 32 MiB L3 cache: 2... commit d97008bfa45e1acbc2a37efee640bd15c3dfb8b4 Author: Ritwik Das <ritdas@microsoft.com> Date: Wed May 25 05:47:07 2022 +0000 Merged PR 2602: Add rocwmma plumbing in tensorize - Add rocwmma plumbing in tensorize - Cannot use this flag until the 5.2 ROCm release which natively supports rocWmma. 
Related work items: #3672

commit 426d382e6f2441fe40b29748895a1eba2cb2677d
Author: Ritwik Das <ritdas@microsoft.com>
Date: Wed May 25 03:52:48 2022 +0000

Merged PR 2570: Enhancements to the gpu benchmark tool

- Add multiprocess package builders and runners
- Support for running on different GPU devices
- Add clock speed determinism
- add composable_kernel benchmarks
- add cutlass benchmarks
- add cublas and rocblas benchmarks
- Add Cosmos DB result upload capability

Related work items: #3683, #3700, #3705, #3685

commit 4454f6455b1324b1a97706604f9051cc9076f97b
Author: Mason Remy <masonr@microsoft.com>
Date: Wed May 25 01:23:34 2022 +0000

Merged PR 2598: Fix mfma enum name typo

Fix mfma enum name typo

commit aaeb8d1883ce24873fd614efa9cc835383576f64
Author: Kern Handa <kerha@microsoft.com>
Date: Tue May 24 07:29:17 2022 +0000

Merged PR 2595: [nfc] Renames smoke_test.py -> smoke_tests.py

[nfc] Renames smoke_test.py -> smoke_tests.py

commit fa7e10bcfd7f7424c976a7decaaaa1569d818525
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue May 24 04:51:43 2022 +0000

Merged PR 2593: [docs] [release] bump docs version to 1.2.5 in preparation for release

bump docs version to 1.2.5 in preparation for release

commit 33aaf54a3018ab0904e722f965f441df3c9c9f9f
Author: Denny Sun <dennys@microsoft.com>
Date: Mon May 23 23:05:17 2022 +0000

Merged PR 2586: Loop order and indices as parameters

With this change, the user can write a schedule with loop_order parameterized:

    loop_order = create_parameters()
    schedule.reorder(order=loop_order)

    parameter_grid = {loop_order: (j, k, i, ii, jj, kk)}
    parameters = create_parameter_grid(
        parameter_grid,
        filter_func=lambda *p: schedule.is_valid_loop_order(p[0][0]),
        sample=5,
    )

    # Add another function to the package
    package.add(plan, args=(A, B, C), parameters=parameters, base_name="matmul_256_256_256")

Related work items: #3693

commit efe3934c4bf4619999995322da5bea00d162e1f4
Author: Kern Handa <kerha@microsoft.com>
Date: Fri May 20 19:49:29 2022 +0000
Merged PR 2591: Fixes more warnings. Enables STRICT_MODE for Linux PR CI

commit 6a882080f7b19e4cc7e4f10cd3b82956fb758e8f
Author: Lisa Ong <onglisa@microsoft.com>
Date: Fri May 20 11:35:29 2022 +0000

Merged PR 2588: [test] Trim out redundant tests from ROCm pipeline

The ROCm pipeline currently runs on a single agent, so avoid running CPU tests that already run in other pipelines, to speed up the pipeline execution.

commit d9d11c44ad2d576b1c42ad09a5b267d1c8694994
Author: Kern Handa <kerha@microsoft.com>
Date: Fri May 20 10:06:50 2022 +0000

Merged PR 2590: [nfc] Fixes a bunch of warnings in C++ layer

[nfc] Fixes a bunch of warnings in C++ layer

commit 1440a549ce2510c99f920f29c1cff15ebd159b55
Author: Kern Handa <kerha@microsoft.com>
Date: Fri May 20 09:20:18 2022 +0000

Merged PR 2589: [test] Adds DSL tests for Schedule.pad

Adds DSL tests for Schedule.pad

commit adbf2ecd95d1389905e742e522702ef6fe66b615
Author: Lisa Ong <onglisa@microsoft.com>
Date: Fri May 20 01:31:22 2022 +0000

Merged PR 2587: Sync Github to ADO

commit b934ad05f6b8cd84420226b93f57b8ac3229eadc
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date: Thu May 12 08:44:15 2022 +0800

Update CONTRIBUTING.md

commit f9f967cbbf36d8fe85b3078ae2e55c64501ac839
Author: Marina Neseem <marinahesham21@gmail.com>
Date: Wed May 11 20:40:07 2022 -0400

Add link to the NCHWc 2D Convolution Case Study (#41)

* Add link to the NCHWC 2D Convolution Case Study
* Update README.md

commit ea3b02fa3736c13b65f1eee38cee3035c9830b3b
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Thu May 19 20:16:27 2022 +0000

Merged PR 2585: Use conditional instead of loop-unswitching on GPU

This PR changes how boundary conditions are handled on GPU-bound loop indices. If a loop's increment doesn't evenly divide its bounds, the body is guarded by a conditional instead of unswitching that loop.
Related work items: #3703

commit f34beb256d542e2f30b72c36b92bd02d96a7dba8
Author: Denny Sun <dennys@microsoft.com>
Date: Wed May 18 20:12:00 2022 +0000

Merged PR 2571: Add random seed to enable reproducible sampling

Giving users control over sampling strategies.

commit 7ece029087bcb0458e3c597b8ed1c3340b4307c1
Author: Ritwik Das <ritdas@microsoft.com>
Date: Wed May 18 03:19:13 2022 +0000

Merged PR 2581: Add CUDA tensor core support

- Added CUDA tensor ops (no caching)
- Added validation tests
- Changed MMA enum names
- Bit of generated tensor op code in CUDA:

```
...
vhalf *var11 = (vhalf*)arg2;
wmma::fragment<wmma::accumulator, 16, 16, 16, vhalf> mmaMatrix_12;
wmma::load_matrix_sync(mmaMatrix_12, var11 + var9 * 16 + var10, 16, wmma::layout_t::mem_row_major);
vhalf *var13 = (vhalf*)arg0;
wmma::fragment<wmma::matrix_a, 16, 16, 16, vhalf, wmma::row_major> mmaMatrix_14;
wmma::load_matrix_sync(mmaMatrix_14, var13 + var9 * 16 + 0, 16);
vhalf *var15 = (vhalf*)arg1;
wmma::fragment<wmma::matrix_b, 16, 16, 16, vhalf, wmma::row_major> mmaMatrix_16;
wmma::load_matrix_sync(mmaMatrix_16, var15 + 0 * 16 + var10, 16);
wmma::fragment<wmma::accumulator, 16, 16, 16, vhalf> mmaMatrix_17;
wmma::mma_sync(mmaMatrix_17, mmaMatrix_14, mmaMatrix_16, mmaMatrix_12);
wmma::store_matrix_sync(var11 + var9 * 16 + var10, mmaMatrix_17, 16, wmma::layout_t::mem_row_major);
```

Related work items: #3694

commit 92e02a82046a1d8547b4ed4cae027cebabf663ff
Author: Kern Handa <kerha@microsoft.com>
Date: Tue May 17 20:42:38 2022 +0000

Merged PR 2584: Adds cublas_gemm benchmarking tool

Adds cublas_gemm benchmarking tool

commit 35ef308ded44d277a62ba3861b5aadcef8c327c7
Author: Mason Remy <masonr@microsoft.com>
Date: Mon May 16 20:08:47 2022 +0000

Merged PR 2583: Don't hold ResolveWarpSize results with rvalue

Don't hold ResolveWarpSize results with rvalue

gcc appears to be inlining ResolveWarpSize incorrectly in some cases, and not holding the result with an rvalue pair appears to fix it.
This was resulting in some mod 0's and floordiv 0's when we would expect the warp size constants to be exactly 32 or 64.

commit bece463adb08fd82a0698e571afe1e9cf850c082
Author: Kern Handa <kerha@microsoft.com>
Date: Fri May 13 23:44:58 2022 +0000

Merged PR 2580: Fixes rocblas_gemm's fp32 -> fp16 conversion

commit 3023409e3d229811e5c3e1bfb1c522684cbdf090
Author: Kern Handa <kerha@microsoft.com>
Date: Thu May 12 09:22:08 2022 +0000

Merged PR 2579: Improves accera_gemm.py's handling of unsupported configs

Improves accera_gemm.py's handling of unsupported configs

commit 279b916ad2c38f069170ed457df3a2f41c7b4afd
Author: Kern Handa <kerha@microsoft.com>
Date: Thu May 12 07:46:35 2022 +0000

Merged PR 2578: Fixes time unit conversions in accera_gemm.py

Also addresses comments for the previous rocblas_gemm PR

commit 2e34b46db3bb524085dddc91250d6f95ec04ec02
Author: Kern Handa <kerha@microsoft.com>
Date: Wed May 11 22:17:22 2022 +0000

Merged PR 2577: Fixes accera_gemm.py code after Plan.tensorize API change

Fixes accera_gemm.py code after Plan.tensorize API change

commit 754c2125c7a2a254552bd27e8030c4deab18e8ae
Author: Kern Handa <kerha@microsoft.com>
Date: Wed May 11 17:27:09 2022 +0000

Merged PR 2575: Adds library warmup to rocblas_gemm benchmarker

Adds library warmup to rocblas_gemm benchmarker

commit 01fed5a32d19c2ca2cf212cb73ff7244a1bcc94f
Author: Kern Handa <kerha@microsoft.com>
Date: Tue May 10 12:50:59 2022 +0000

Merged PR 2572: [nfc] Move accera/viz -> tools/viz

[nfc] Move accera/viz -> tools/viz

commit 1eaf20d24fab93e08709c4639acfd2aaa9a7f072
Author: Mason Remy <masonr@microsoft.com>
Date: Tue May 10 09:09:58 2022 +0000

Merged PR 2573: Update setup.cfg hatlib dependency version

Update setup.cfg hatlib dependency version

commit e946c9f8c6b6bbd5263c48340231d51137fdd1fd
Author: Kern Handa <kerha@microsoft.com>
Date: Tue May 10 07:39:40 2022 +0000

Merged PR 2557: Overhauls the benchmarking tool

This change moves the benchmarking tool to a top-level `tools/benchmarkers` directory. The tool has also been split up so that the accera portion is in its own file, while the driver portion of the tool remains intact and has gained the ability to run a rocblas gemm benchmarking utility.

The aforementioned rocblas gemm benchmarking utility is also added in this change. `rocblas_gemm` is a new executable that is not built by default since it relies on the rocblas library, which may not be available everywhere. Once this tool has been explicitly built, it can be passed in as an argument to the benchmarker tool, which will use it to generate a comparison between accera's benchmark results and rocblas's.

An example:

```sh
# <build accera like usual>
ninja -C `git rev-parse --show-toplevel`/build/temp.linux-x86_64-3.8 rocblas_gemm
cd tools/benchmarkers
mkdir ~/accera_benchmarks
./gpu_benchmark_tool.py -i sgemm_bert_assorted.csv -t 'AMD MI100' -o ~/accera_benchmarks/results -r `git rev-parse --show-toplevel`/build/temp.linux-x86_64-3.8/tools/benchmarkers/rocblas/rocblas_gemm
```

Related work items: #3685

commit 6a41aa27c07bdacd1475bff0f830bfd4c6fd514b
Author: Ritwik Das <ritdas@microsoft.com>
Date: Tue May 10 03:32:50 2022 +0000

Merged PR 2569: Make tensorization passes configurable, remove dependency from split indices

- Make the mfma type a required parameter for tensorize(); this only chooses the underlying mfma op to use
- Additionally, the user can pass in the total number of passes to run (which defaults to 1) instead of implicitly calculating a square tile.
- Added documentation for the new enum type.
- Added some tests
- Current code does not work with K > M (still investigating this, but should not block this PR)

Related work items: #3688

commit 9897093985ed70d9f62a4c980f03a37c94ae46d6
Author: Mason Remy <masonr@microsoft.com>
Date: Tue May 10 02:32:45 2022 +0000

Merged PR 2567: Fix vectorized access of LAST_MAJOR arrays

Fix vectorized access of LAST_MAJOR arrays

- mlir::vector::LoadOp and mlir::vector::StoreOp only support unit strides on the minor dimension of the memref they access, so reinterpretcast the memref to a flat buffer to pass that check
- add translation for reinterpretcastop
- improve vectorization of LAST_MAJOR matrices in cache accesses by changing the traversal order of the cache region (when filling/reducing) based on the memory ordering of the outer array being acted on.

commit 79d169d7ab8acaf989e19b6fb13cf960ed5f6260
Author: Lisa Ong <onglisa@microsoft.com>
Date: Mon May 9 22:50:55 2022 +0000

Merged PR 2568: [Compliance] [nfc] Switch to Azure Container Registry for ROCm build agent

commit 43d0883d6b353533dbe754092d4b34435d71fb2f
Author: Ritwik Das <ritdas@microsoft.com>
Date: Fri May 6 07:12:21 2022 +0000

Merged PR 2560: Make register allocation during tensorization tunable

- Add controllable number of fused mfma passes
- Add controllable scheduling policy of mfma ops
- Add tests

Related work items: #3687

commit 056e108c3e31a5fc1a2529690062c695a3613005
Author: Lisa Ong <onglisa@microsoft.com>
Date: Fri May 6 05:11:23 2022 +0000

Merged PR 2565: [build] bump hatlib dependency to 0.0.13

hatlib 0.0.13 contains a fix to unblock ROCm buddy builds

commit 5c3050fa5dcc64af16aa46fe882c25795a4ec9ac
Author: Denny Sun <dennys@microsoft.com>
Date: Thu May 5 06:08:05 2022 +0000

Merged PR 2563: Add a table of operators and code examples to the Parameters.md

Update the Manuals with the supported operators and code examples.
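PR 2563 documents which operators Parameter objects support, and PR 2536 (further down this log) introduced +, -, *, // and % on them, so an expression such as `fma_unit_count * vector_size` can be written before any value is chosen. The sketch below shows one generic way to get that deferred behaviour with Python operator overloading. The `Deferred`, `Parameter` and `Expr` classes and the `resolve()` method are made up for illustration; this is not Accera's implementation.

```python
import operator


class Deferred:
    """Hypothetical base class for values that are only known once parameters are bound."""

    def resolve(self, bindings):
        raise NotImplementedError


class Parameter(Deferred):
    """A named placeholder; its value comes from the bindings dict at resolve time."""

    def __init__(self, name):
        self.name = name

    def resolve(self, bindings):
        return bindings[self.name]


class Expr(Deferred):
    """A recorded binary operation on parameters and/or plain numbers."""

    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

    def resolve(self, bindings):
        return self.op(_resolve(self.lhs, bindings), _resolve(self.rhs, bindings))


def _resolve(value, bindings):
    # Deferred values are evaluated recursively; plain numbers pass through unchanged.
    return value.resolve(bindings) if isinstance(value, Deferred) else value


def _install_operator(name, op):
    # Forward form (param * 2) and reflected form (2 * param) both build an Expr node.
    setattr(Deferred, f"__{name}__", lambda self, other: Expr(op, self, other))
    setattr(Deferred, f"__r{name}__", lambda self, other: Expr(op, other, self))


for _name, _op in [("add", operator.add), ("sub", operator.sub), ("mul", operator.mul),
                   ("floordiv", operator.floordiv), ("mod", operator.mod)]:
    _install_operator(_name, _op)


fma_unit_count, vector_size = Parameter("fma_unit_count"), Parameter("vector_size")
split_factor = fma_unit_count * vector_size  # nothing is computed yet
print(split_factor.resolve({"fma_unit_count": 4, "vector_size": 8}))  # prints 32
```

The reflected methods (`__rmul__`, `__rsub__`, ...) are what allow a plain number on the left, as in `100 - vector_size`; without them Python would raise a `TypeError` for `int - Parameter`.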
commit 5417fa787349a866adda36f3d7d8531c83e42699
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu May 5 01:43:25 2022 +0000

Merged PR 2562: [nfc] Add some macOS targets and synced Model.md

* Re-generated Model.md to add missing models
* Handle zero (unknown) vector_bytes cases in tests
* Opportunistically added these models used during development:
  * 2016 MacBook Pro
  * M1 Max

commit 9672086febe0523785988a47922e44692ae18a00
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu May 5 00:09:18 2022 +0000

Merged PR 2561: [docs][nfc] Sync changes from Github remote, bump doc versions to 1.2.4

commit 7b67f68344fe3ecedf9e5a84a4ac9667d7eaea96
Author: Lisa Ong <onglisa@microsoft.com>
Date: Wed May 4 18:26:56 2022 +0000

Merged PR 2558: [nfc] update requirements to latest version of six

Fixes this warning:

```
<frozen importlib._bootstrap>:914: ImportWarning: _SixMetaPathImporter.find_spec() not found; falling back to find_module()
```

commit 6d3f21a5136c9122feee71bb9990f85970e3db98
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Tue May 3 19:03:44 2022 +0000

Merged PR 2559: Finer-granularity error reporting for python tests

This PR modifies how the python tests are invoked, so that they can report pass/fail results per test. Hopefully that'll make it easier to pinpoint where things are failing during CI builds.
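PR 2559 changes how the Python tests are invoked so that CI reports pass/fail per test rather than per run. As a rough illustration only, the sketch below collects per-test outcomes with the standard-library `unittest` module; `PerTestResult`, `run_per_test` and the example test case are hypothetical names, not the Accera CI scripts.

```python
import io
import unittest


class PerTestResult(unittest.TextTestResult):
    """TextTestResult that also records one outcome string per test id."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.outcomes = {}

    def addSuccess(self, test):
        super().addSuccess(test)
        self.outcomes[test.id()] = "pass"

    def addFailure(self, test, err):
        super().addFailure(test, err)
        self.outcomes[test.id()] = "fail"

    def addError(self, test, err):
        super().addError(test, err)
        self.outcomes[test.id()] = "error"

    def addSkip(self, test, reason):
        super().addSkip(test, reason)
        self.outcomes[test.id()] = "skip"


def run_per_test(case_class):
    """Run every test method of case_class and return {test_id: outcome}."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case_class)
    # The usual dots and tracebacks go to a buffer; the caller gets the per-test dict instead.
    runner = unittest.TextTestRunner(stream=io.StringIO(), resultclass=PerTestResult, verbosity=0)
    return runner.run(suite).outcomes


def make_example_case():
    # Defined inside a function so test collectors don't pick up the deliberately failing test.
    class ExampleTests(unittest.TestCase):
        def test_passes(self):
            self.assertEqual(1 + 1, 2)

        def test_fails(self):
            self.assertEqual(1 + 1, 3)

    return ExampleTests


if __name__ == "__main__":
    for test_id, outcome in sorted(run_per_test(make_example_case()).items()):
        print(f"{outcome:<5} {test_id}")
```

A CI step could then print or upload this dictionary so that one failing test shows up by name instead of as a single failed invocation.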
commit b77a99ea7cce3a82a3b36272458895f47773b666
Author: Ritwik Das <ritdas@microsoft.com>
Date: Sat Apr 30 01:57:09 2022 +0000

Merged PR 2556: [non-functional] Change ROCM code to generate gcn intrinsics when possible

- Use amd gcn intrinsics when possible (threadIdx, blockIdx, barrier)
- Add helpers which automatically check for runtime before emitting the proper code

Related work items: #3698

commit ca0a6fe390900c46ea345ac03662c44827827ca5
Author: Ritwik Das <ritdas@microsoft.com>
Date: Fri Apr 29 05:43:55 2022 +0000

Merged PR 2547: [non-functional] Change custom mfma types to Memref and some refactoring

Make initial changes to remove custom mfma types

Related work items: #3691

commit b8b9631601eb5eab88af035eca1e5ecdf27741ba
Author: Denny Sun <dennys@microsoft.com>
Date: Thu Apr 28 01:11:54 2022 +0000

Merged PR 2555: create_parameters(count: int) no longer needs count as an argument

1. Remove the count of parameters to be created from the DSL
2. Throw an exception when users write the following code: `create_parameters()`
3. The correct way of calling create_parameters() is: `p1, p2, p3, ..., pN = create_parameters()`

commit 68eb52ad7cea75671916c49a13617d40f6488471
Author: Lisa Ong <onglisa@microsoft.com>
Date: Wed Apr 27 18:23:57 2022 +0000

Merged PR 2554: [doc] Updated some missing enums and fixed Case Study path

commit 6654c774fb0d2d6fac760b911a547b4e66b23127
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Wed Apr 27 00:44:53 2022 +0000

Merged PR 2522: Generalize array indexing in tensorized GEMM

This PR generalizes the MFMA tensorization pass to improve the handling of code in the innermost loop. It recognizes more ways of writing the GEMM kernel, and rejects many ill-formed GEMM kernels. There are also a number of tests.

This PR doesn't yet generalize to batch-GEMM, where the matrices (typically) have 3 indices.
Related work items: #3676

commit 4d030709101f3653712b805bd8f3698e0e293bd3
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 26 17:50:18 2022 +0000

Merged PR 2551: [nfc][ci] Switch hosted pipelines to 1ES hosted pool

* The Linux1ESPool is created to support internal builds of LLVM
* Fix regression in pipeline due to overzealous .dockerignore

commit 9b9d6b4b77c46b12788665412b9d0d1c2ff62d18
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 26 10:43:28 2022 +0000

Merged PR 2550: [nfc] [docs] Merge changes from GitHub remote

In preparation for merge from ADO to GitHub for Case Studies publishing

commit c1298946d18fb785788c556ea2959b9438f9c6b7
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 26 08:10:47 2022 +0000

Merged PR 2549: [Compliance] Switching from Dockerhub to ACR for third party containers

Updating Dockerfile references

commit 0c7a3610ba082e82e554297bdadbf9579b094745
Author: Denny Sun <dennys@microsoft.com>
Date: Tue Apr 26 04:40:05 2022 +0000

Merged PR 2548: Add README file for case studies

README file has a table where each case study points to the external repo link.
commit edbc50edd00efe8f12a675735d7e52371e43f7b1
Author: Lisa Ong <onglisa@microsoft.com>
Date: Mon Apr 25 23:49:15 2022 +0000

Merged PR 2546: [dev] [nfc] Natively support macOS/arm64 for development

Limited to local development scenarios (LLVM_SETUP_VARIANT=Default)

No plans to release pip packages until there is CI support

Verified on: Big Sur (MacOSX 12.3 arm64) / Python 3.10

commit 166e333a3d10b77c804dc3edc1c71bfc5716c768
Author: Ritwik Das <ritdas@microsoft.com>
Date: Mon Apr 25 17:50:22 2022 +0000

Merged PR 2543: Add precomputed offset map optimization for tensorization (no caching)

- Add flag to tensorize() to enable optimization (off by default)
- Optimization only affects load/store of accumulator (C) argument
- Supports all 4 mfma shapes

Related work items: #3671

commit e11c4d4e87bbae87f7cb9035eff8e6af650c9d1a
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Sun Apr 24 01:00:41 2022 +0000

Merged PR 2542: An assortment of minor fixes

This PR is a hodgepodge of tiny fixes. I'm happy to split it up into separate PRs if a kitchen-sink PR is too gross.
The specific things are:
- Add 2 new target models to `Targets.py` (that correspond to my local dev boxes)
- Change the snapshot IR format for sub-passes to use the same format as the top-level passes (that is, not "generic" format)
- Print a warning message if `check_correctness` skips a correctness check because no hat file was generated
- Add a "minimum version" constraint to `requirements.txt` for `hatlib`

commit 8da7903ac9b6d8612711593308e49a7a3e82678d
Author: Kern Handa <kerha@microsoft.com>
Date: Sat Apr 23 23:59:53 2022 +0000

Merged PR 2545: Unifies CUDA and CPP enum values to SOURCE for Package.Format

Unifies CUDA and CPP enum values to SOURCE for Package.Format

Related work items: #3679

commit fe2c40fa8f1c28dcf47e1533223457fd3e6bf195
Author: Kern Handa <kerha@microsoft.com>
Date: Sat Apr 23 23:17:43 2022 +0000

Merged PR 2544: [nfc] Removes now unnecessary ldebug output

[nfc] Removes now unnecessary ldebug output

commit 32090d786ce13299bb77a6675c3478b3d7cdf48c
Author: Mason Remy <masonr@microsoft.com>
Date: Fri Apr 22 21:31:01 2022 +0000

Merged PR 2527: Enable vectorized shared memory write

Enable vectorized shared memory write

- This adds mod simplification support needed for vectorizing shared memory writes
- Also refactors some of the affine simplification code slightly to share some common code between the floordiv and mod simplifications

Related work items: #3586, #3661, #3689

commit 0eb698af118b94bf3f4d4862a142c86055f8b7bb
Author: Mason Remy <masonr@microsoft.com>
Date: Fri Apr 22 19:13:27 2022 +0000

Merged PR 2526: Enable GPU global read vectorization

Enable GPU global read vectorization

- Implements a floor div simplification that enables better recognition of vectorizable loads and stores

Related work items: #3661, #3690

commit df849f066ff6c2c82c796d9b48e3bea6390c7877
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Fri Apr 22 06:03:27 2022 +0000

Merged PR 2541: Fix a few issues with GEMM benchmarking script

This PR fixes a couple of errors:
- there was a bug in the GEMM kernel
- sometimes hatlib would fail to return a compiled function, but not throw an exception. These are now flagged as "uncompilable".

It makes a couple of other tweaks:
- it fails if the `alpha` and `beta` parameters aren't `1.0` and `0.0`
- it culls some variants with known-uncompilable tensorization parameters before trying to compile them

commit 339253767ae4bb4f7e5c323f77fc938ba1a4ab92
Author: Lisa Ong <onglisa@microsoft.com>
Date: Fri Apr 22 01:26:53 2022 +0000

Merged PR 2538: Fix std::pair unpacking issue in TensorizeAffineForOpConversion

In debug builds, we are getting garbage values for warpSizeX and warpSizeY, resulting in division by 0 errors in the emitted .cu files

commit 075c83247d34bfd9fb291e4ea6b9df059a94993a
Author: Denny Sun <dennys@microsoft.com>
Date: Fri Apr 22 00:26:56 2022 +0000

Merged PR 2536: Parameter supports most of the arithmetic/binary/unary operations defined in operator lib

Parameter supports the basic arithmetic operations (+, -, *, //, %). For example, the user can write the following code:

    fma_unit_count, vector_size = acc.create_parameters(2)
    jjj = schedule.split(jj, fma_unit_count * vector_size)
    jjjj = schedule.split(jjj, vector_size)

Related work items: #3692

commit 6d5e71899c6fb606e32ec46ee871ae1af25d3cd6
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Apr 21 18:22:12 2022 +0000

Merged PR 2539: [nfc][docs] Merging commits from Github/main

commit ee28126a338d905eb5931038d3c5daba6ead3811
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date: Wed Apr 20 21:35:20 2022 +0800

Update arrow label positions (#35)

* [nfc] [doc] Update arrow label positions
* make arrowhead more visible
* nfc

commit ddcecaaffd9dd0861999a6d29443dc7c37d79665
Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
Date: Wed Apr 20 21:34:40 2022 +0800

demo fixes for hatlib 0.0.11 (#36)

commit 9531a2eb4bb9edf9484d09a20a7b2fd74b73720c
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Apr 21 09:55:57 2022 +0000
Merged PR 2535: [ci] Self-hosted Azure DevOps build agent for ROCm smoke tests

* Docker image for self-hosted build agent on the ROCm development machine
* Pipeline will front-load the Python ROCm tests so that we fail faster
* The agent runs ROCm 5.1.1 (the current latest). We can build/launch different containers for different versions if needed.
* CUDA_VISIBLE_DEVICES = 0 by default. This can be overridden at pipeline scheduling time.
* The pipeline currently fails in the ROCm Python tests, so it does not block completion of the PR.
* Included some fixes that are not related to ROCm but generally needed to run on systems whose CPU names are resolved (e.g. "zen2"), i.e. the build agent itself.

Related work items: #3682

commit 49f176ad8f2f56a20b3028e9c1648b0518b71bd4
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Apr 21 07:27:28 2022 +0000

Merged PR 2537: [Compliance] Make dependency on ffmpeg optional

ffmpeg-python is only needed for video export from the Iteration Visualizer Tool. Removing the hard dependency from the tool.
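PR 2537 makes ffmpeg-python optional because only video export needs it. A minimal Python sketch of the usual optional-import pattern follows, assuming the export code is Python; `optional_import`, `video_export_available` and `export_video` are hypothetical names, not the Iteration Visualizer's actual code, and the ffmpeg call follows ffmpeg-python's fluent `input(...).output(...).run()` style.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# ffmpeg-python is imported as "ffmpeg"; everything else keeps working without it.
ffmpeg = optional_import("ffmpeg")


def video_export_available():
    return ffmpeg is not None


def export_video(frame_glob, output_path, framerate=25):
    """Stitch rendered frames (e.g. "frames/*.png") into a video file."""
    if not video_export_available():
        raise RuntimeError(
            "Video export needs the optional ffmpeg-python package "
            "(pip install ffmpeg-python) and an ffmpeg binary on PATH."
        )
    # Only this code path touches ffmpeg, so only video export requires the dependency.
    ffmpeg.input(frame_glob, pattern_type="glob", framerate=framerate).output(output_path).run()
```

The design choice is to fail late and with an actionable message: importing the tool never breaks, and users who don't export videos never need ffmpeg installed.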
commit 8519558ff63c2142ad1cb6ab3ebcfec556416432
Author: Mason Remy <masonr@microsoft.com>
Date: Thu Apr 21 01:06:25 2022 +0000

Merged PR 2525: Fix vectorization plumbing for GPU scenarios

Fix vectorization plumbing for GPU scenarios

Related work items: #3661

commit 793343a838492fba63b344dbe0c3c147721b11da
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Apr 21 00:09:09 2022 +0000

Merged PR 2531: [nfc][docs] Merging weekly commits from Github/main

commit d75d4a6b9cec2ccf90bdf27911d843be1833bc8d
Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
Date: Mon Apr 18 20:15:49 2022 +0500

Refactoring of functions docs in reference files (#34)

* complete refactoring of safety analysis
* minor tweaks
* rebasing and minor tweak
* Update create_parameter_grid.md

Co-authored-by: Lisa Ong <11318241+lisaong@users.noreply.github.com>

commit fe880fb269cefb5af774b7085b0e4c1a95692630
Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
Date: Mon Apr 18 19:12:46 2022 +0500

Complete refactoring of safety analysis (#33)

* minor fixes and ensure that all links are working
* complete refactoring of safety analysis
* Update accera.md
* Update safety_analysis.md

Co-authored-by: Lisa Ong <11318241+lisaong@users.noreply.github.com>

commit d21918b8b366c369d63a507ede696236cbbd8dc6
Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
Date: Mon Apr 18 16:22:41 2022 +0500

Refactoring of Accera.md from reference docs (#32)

* minor fixes and ensure that all links are working
* Update accera.md

Co-authored-by: Lisa Ong <11318241+lisaong@users.noreply.github.com>

commit 4dc0ce9ee841db837350f9288c821257df53acc3
Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
Date: Mon Apr 18 07:26:00 2022 +0500

Docs refactoring tutorials optimized matmul (#31)

* minor tweaks
* initial fixes
* complete the file with minor tweaks and grammatical fixes
* did grammatical fixes, rephrasing, and ensure conciseness
* Update Optimized_MatMul.md
* Update Optimized_MatMul.md

Co-authored-by: Lisa Ong <11318241+lisaong@users.noreply.github.com>

commit ae92be4e67b45b60dd37921974387bc8dd34088e
Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
Date: Mon Apr 18 07:07:20 2022 +0500

Docs refactoring tutorials hello matmul gpu (#30)

* minor tweaks
* initial fixes
* complete the file with minor tweaks and grammatical fixes
* Addressed the provided feedback

commit 543ea83e8272923b2e44c363bc376a50131622d5
Author: Kern Handa <kerha@microsoft.com>
Date: Wed Apr 20 21:05:53 2022 +0000

Merged PR 2530: Adds initial GPU benchmarking infrastructure

Related work items: #3685

commit 6cf59ed41eada571019a71e1af11a887d39a7aad
Author: Mason Remy <masonr@microsoft.com>
Date: Wed Apr 20 20:17:49 2022 +0000

Merged PR 2524: [nfc] Refactor RangeValue utilities to separate file

[nfc] Refactor RangeValue utilities to separate file

Related work items: #3661

commit fe78918d25f2925053a37b13ee8c71d7f111b32f
Author: Lisa Ong <onglisa@microsoft.com>
Date: Wed Apr 20 07:44:57 2022 +0000

Merged PR 2532: [prog] Fallback to known TargetDevice names for looking up the LLVM triple

Resolves the issue where the CPU type is resolved (e.g. "zen2"), but does not match anything in the known triples list in TargetDevice.cpp

Future work can consider lifting the TargetDevice.cpp list to the Python layer

commit 734dd15193c571bef2d0ce3b62f24c29778f01d8
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 19 23:46:54 2022 +0000

Merged PR 2523: [nfc][docs] Incorporate generated visualizations from Iteration Space Visualizer

* Add Alex's visualization tool to our tree
* Updated Schedule documentation and examples to align with existing visualizations
* Moved logos to subfolder under assets

TODO: Add Fusing visualizations in a subsequent PR

commit b48296b39a6ef84b3dd3220624fa1c681b98caf0
Author: Kern Handa <kerha@microsoft.com>
Date: Sun Apr 17 18:33:10 2022 +0000

Merged PR 2521: Updates formatting of the unknown HOST warning message

Updates formatting of the unknown HOST warning message

commit 8d487fa45d49c2379a4899810716aa3dcde2eb46
Author: Kern Handa <kerha@microsoft.com>
Date: Fri Apr 15 09:47:42 2022 +0000

Merged PR 2514: Makes module compilation resist func compilation fails

Makes module compilation resist func compilation fails

commit 6bcbd1892edba8bcd5c0c82d7cbde57ff5896c0b
Author: Denny Sun <dennys@microsoft.com>
Date: Thu Apr 14 00:24:40 2022 +0000

Merged PR 2517: Get the known device for host machine and give a warning if the host is an unknown device

When it is a host target, we call cpuinfo to query the CPU model from the host machine, then use a regex to match it against the model names in the known devices. If there is a match, we use that known device's configs; otherwise, we use some default configs to generate code for the host target and warn our users that the generated code may be suboptimal.
Related work items: #3546

commit 41166ffd6b012464dd70eb021efba0dc1485fe0f
Author: Lisa Ong <onglisa@microsoft.com>
Date: Wed Apr 13 18:40:23 2022 +0000

Merged PR 2519: Merging changes from Github remote

commit ee8ad1ed7b7911109d76a40fb3990a419de05fe5
Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
Date: Tue Apr 12 16:16:38 2022 +0500

Revise Pi3_Cross_Compilation.md (28)

commit aa1b9672d1d76fe1f1493959c8cced6b89e1b0a0
Author: Arslan-e-Mustafa <70168134+Arslan-e-Mustafa@users.noreply.github.com>
Date: Fri Apr 8 17:51:34 2022 +0500

Docs refactoring install (27)

commit 77f4ae34c3cf1dd506bbe6bd148577c670d4ec53
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Tue Apr 12 20:12:52 2022 +0000

Merged PR 2513: Removed inaccurate warp size computation for Vulkan targets

The previous barrier optimization PR added some inaccurate code to `util::resolveWarpSize()` for Vulkan targets. This PR removes that, and fixes up some tests that depended on it.

commit 9fa84695b15c73a403649c269a4b71c31c2375d5
Author: Ritwik Das <ritdas@microsoft.com>
Date: Tue Apr 12 17:59:15 2022 +0000

Merged PR 2516: Add fp16 support for mfma in the DSL (+tests)

- Add support for fp16 input and fp32 output
- Support fp16 input and output
- Clean up some tests

Related work items: #3670

commit 79594afb12332e368e8f156393461838c8592a1e
Author: Ritwik Das <ritdas@microsoft.com>
Date: Fri Apr 8 05:42:24 2022 +0000

Merged PR 2510: Add different mfma tile sizes for FP32

- Fix a couple of offset bugs
- Add multi-block tile sizes
- Add unit tests

Related work items: #3666

commit 2046b0529f9aac47c0a7ea50467a1b08a36dc5bb
Author: Mason Remy <masonr@microsoft.com>
Date: Fri Apr 8 02:26:37 2022 +0000

Merged PR 2511: Enable smoke test GPU matmul correctness checks

Enable smoke test GPU matmul correctness checks

- Also fix some FP16 scenarios
- Add some more Accera <-> numpy mapping utilities

commit 40ad2ee4bfb1dc44ec028baeef5908204f375c30
Author: Mason Remy <masonr@microsoft.com>
Date: Thu Apr 7 18:37:31 2022 +0000

Merged PR 2502: Support different input array layouts for GPU caching

Support different input array layouts for GPU caching

This change mainly configures the thread assignments in order to get coalesced global memory access. The logical accessing should already have been correct; this change is primarily for performance.

Related work items: #3660

commit 3a62c46308be00ae70d28e3f05df9a60b4999021
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date: Thu Apr 7 16:32:34 2022 +0000

Merged PR 2487: Barrier optimization, part 2

This PR improves the previous barrier optimization code. It now works with non-straight-line code (if/else constructs and loops). It doesn't yet do the "move barriers outside of loops" optimization.

For debugging, there's an option to output a graphviz dot file showing the graph of relevant instructions that are used during the optimization:

```
acc-opt ... --barrier-opt-dot --barrier-opt-dot-filename="barrier.dot"
```

Related work items: #3649

commit a306c54ce09c8257d791a7927da8ac9f80ddafa4
Author: Lisa Ong <onglisa@microsoft.com>
Date: Thu Apr 7 00:59:58 2022 +0000

Merged PR 2509: [nfc] sync quickstart demo from GitHub/demo branch

Use a subset of MLAS optimizations that are sufficient to show a 3x improvement over the default schedule. This version was already in the Github repo for some time.
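PR 2502 adjusts GPU thread assignments per input layout so that global-memory reads stay coalesced. The simplified Python model below illustrates the idea under one stated assumption: a warp's access counts as coalesced when consecutive threads read consecutive elements (real hardware groups accesses into memory transactions, which this ignores). The function names and the `row_major`/`col_major` labels are hypothetical, not Accera's API.

```python
def element_offset(row, col, shape, layout):
    """Linear offset of element (row, col) in a 2-D array with the given layout."""
    rows, cols = shape
    if layout == "row_major":
        return row * cols + col
    if layout == "col_major":
        return row + col * rows
    raise ValueError(f"unknown layout: {layout}")


def layout_aware_assignment(tid, shape, layout):
    """Map a linear thread id to (row, col) so neighbouring threads hit neighbouring addresses."""
    rows, cols = shape
    if layout == "row_major":
        row, col = divmod(tid, cols)  # consecutive threads walk along a row
    else:
        col, row = divmod(tid, rows)  # consecutive threads walk down a column
    return row, col


def fixed_row_assignment(tid, shape, layout):
    """Layout-unaware baseline: always walk along rows, whatever the storage order."""
    return divmod(tid, shape[1])


def warp_offsets(shape, layout, assign, warp_size=32):
    """Offsets touched by threads 0..warp_size-1 for one load instruction."""
    return [element_offset(*assign(tid, shape, layout), shape, layout) for tid in range(warp_size)]


def is_contiguous(offsets):
    # Our stand-in for "coalesced": each thread reads the element right after its neighbour's.
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


if __name__ == "__main__":
    shape = (64, 64)
    for layout in ("row_major", "col_major"):
        for assign in (fixed_row_assignment, layout_aware_assignment):
            offsets = warp_offsets(shape, layout, assign)
            print(f"{layout:9} {assign.__name__:24} contiguous={is_contiguous(offsets)} first={offsets[:3]}")
```

With the fixed row-walking assignment, a column-major array gives offsets 0, 64, 128, ... across the warp (a stride equal to the number of rows), which is the uncoalesced pattern; choosing the walking direction from the layout restores unit stride without changing which elements are loaded overall.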
commit 7af0adfe88b4db022c9a7c8cfe712181ab3c4df5
Author: Lisa Ong <onglisa@microsoft.com>
Date: Wed Apr 6 22:34:28 2022 +0000

Merged PR 2508: [release] Bump docs version to 1.2.3

In preparation for a PyPI release to facilitate community contributions for case studies

Synced doc editorials from public Github repo

commit 958d64e74ed02a08462d2316c488b21dc51804cd
Author: Lisa Ong <onglisa@microsoft.com>
Date: Wed Apr 6 09:32:27 2022 +0000

Merged PR 2503: [prog] Support unsigned integer types in the DSL

* Add ScalarType.uint8/16/32/64 support
* Use UnrealizedConversionCastOps to convert these unsigned ints to signless ints
* Refactored CastImpl now that we have to handle both unsigned and signless cases for casts to/from ints
* Use a tuple of (mlir Type, llvm Type) to infer the C type when writing function declarations in the HAT file. The former holds sign-ness information, the latter determines the C type (e.g. pointer or not)
* Simplified CheckAllClose function to reduce unnecessary casting
* Doc updates
* Fixed HAT file issues with ScalarType.bool

Note: Pipelines will fail until the next release of hatlib (https://github.com/microsoft/hat/pull/37)

Related work items: #3520

commit 510f2f59eef1de8c6a4872c3bb5a6ff147da2f5f
Author: Kern Handa <kerha@microsoft.com>
Date: Wed Apr 6 01:58:48 2022 +0000

Merged PR 2507: Updates acc-translate output for ROCm 5.1

commit adcc6f0888fa111d76049e90e9d77168e0a47c68
Author: Denny Sun <dennys@microsoft.com>
Date: Wed Apr 6 00:48:02 2022 +0000

Merged PR 2437: Add more known targets(from our team's devices)

The new list covers the following CPUs, which are being used by our devs:
- Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz
- 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
- Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz 2.11 GHz
- Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
- Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz

Related work items: #3546

commit c2d5af44077a6344eb1cc595cd4bcf137f77164c
Author: Kern Handa <kerha@microsoft.com>
Date: Tue Apr 5 07:32:47 2022 +0000

Merged PR 2505: [nfc] Rename parameters for schedule.tile and plan.bind

[nfc] Rename parameters for schedule.tile and plan.bind

commit d61827f078676de5d660ba819f7d30d2896cb449
Author: Kern Handa <kerha@microsoft.com>
Date: Tue Apr 5 06:00:32 2022 +0000

Merged PR 2501: Adds support for more than one GPU function per package

Adds support for more than one GPU function per package

Related work items: #3686

commit 2d2605a3b90ffc70fc9b078315237bbff51b1273
Author: Lisa Ong <onglisa@microsoft.com>
Date: Tue Apr 5 05:16:10 2022 +0000

Merged PR 2504: [docs] Update stale versions in Reference docs

Fixing while considering better approaches...

commit 7fd0d8f0412343d0dd82a8817775bf853a700074
Author: Kern Handa <kerha@microsoft.com>
Date: Tue Apr 5 00:53:42 2022 +0000

Merged PR 2499: Updates the syntax for schedule.tile

Updates the syntax for schedule.tile

commit e221cad3ffabe4c31f7c92e9bc0e112002581b1a
Author: Kern Handa <kerha@microsoft.com>
Date: Mon Apr 4 23:21:14 2022 +0000

Merged PR 2498: Updates the syntax for plan.bind

Updates the syntax for plan.bind Re…

Commit adc70ce: demo fixes for hatlib 0.0.11

lisaong merged commit 31b8ff7 into main Apr 20, 2022

lisaong deleted the dev/onglisa/demo_hatlib_0.0.11 branch April 20, 2022 13:34



lisaong commented Apr 20, 2022
