v1.2.3
What's Changed
- Docs refactoring manual fusing by @Arslan-e-Mustafa in #26
-
Merged PR 2508: [release] Bump docs version to 1.2.3. [Lisa Ong]
In preparation for a PyPI release to facilitate community contributions for case studies
Synced doc editorials from the public GitHub repo
-
Merged PR 2503: [prog] Support unsigned integer types in the DSL. [Lisa Ong]
- Add ScalarType.uint8/16/32/64 support
- Use UnrealizedConversionCastOps to convert these unsigned ints to signless ints
- Refactored CastImpl now that we have to handle both unsigned and signless cases for casts to/from ints
- Use a tuple of (mlir Type, llvm Type) to infer the C type when writing function declarations in the HAT file. The former holds signedness information; the latter determines the C type (e.g. pointer or not)
- Simplified CheckAllClose function to reduce unnecessary casting
- Doc updates
- Fixed HAT file issues with ScalarType.bool
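As a rough illustration of the (mlir Type, llvm Type) idea in this entry, the sketch below picks a C type name from a sign-aware integer spelling plus a pointer flag. The function `infer_c_type`, the use of the strings `ui8`/`si8`/`i8`, and the mapping of 1-bit integers to `bool` are assumptions made for this example; this is not Accera's implementation.

```python
import re

# Hypothetical helper, not Accera's code: choose a C type name from a
# sign-aware integer type spelling (e.g. "ui8", "i32") and a pointer flag.
_SUPPORTED_WIDTHS = {8, 16, 32, 64}


def infer_c_type(sign_aware_type: str, is_pointer: bool) -> str:
    match = re.fullmatch(r"(ui|si|i)(\d+)", sign_aware_type)
    if match is None:
        raise ValueError(f"unsupported integer type: {sign_aware_type!r}")
    prefix, width = match.group(1), int(match.group(2))
    if width == 1:
        # Assumption: 1-bit integers are written as C bool.
        base = "bool"
    elif width in _SUPPORTED_WIDTHS:
        # Only an explicit "ui" prefix yields an unsigned C type; signless
        # ("i") and signed ("si") spellings both map to intN_t here.
        base = f"uint{width}_t" if prefix == "ui" else f"int{width}_t"
    else:
        raise ValueError(f"unsupported bit width: {width}")
    # The pointer flag stands in for what the LLVM-level type would tell us.
    return base + "*" if is_pointer else base
```

The point of keeping two pieces of information is that a signless lowered type alone can no longer say whether `uint8_t` or `int8_t` is the right declaration.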
-
Merged PR 2507: Updates acc-translate output for ROCm 5.1. [Kern Handa]
-
Merged PR 2437: Add more known targets (from our team's devices). [Denny Sun]
The new list covers the following CPUs, which are in use by our developers:
Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz
11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz 2.11 GHz
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz
Related work items: #3546
-
Merged PR 2505: [nfc] Rename parameters for schedule.tile and plan.bind. [Kern Handa]
-
Merged PR 2501: Adds support for more than one GPU function per package. [Kern Handa]
Related work items: #3686
-
Merged PR 2504: [docs] Update stale versions in Reference docs. [Lisa Ong]
Fixing while considering better approaches...
-
Merged PR 2499: Updates the syntax for schedule.tile. [Kern Handa]
-
Merged PR 2498: Updates the syntax for plan.bind. [Kern Handa]
Related work items: #3678
-
Merged PR 2500: Adds support for specifying index bitwidth for acc-translate. [Kern Handa]
Story #3669
Related work items: #3669
-
Merged PR 2490: Restore CMake Export. [Abdul Dakkak]
Restore the CMake Export feature, as it is used by argo-experiments. Note that you cannot use this feature if you are using the vcpkg LLVM build.
-
Merged PR 2497: Fix vectorization plumbing to correctly handle zero vectorization budget cases in cache reduce ops. [Mason Remy]
-
Merged PR 2496: [nfc] Switch docs versioning to bump2version, replace VERSION with simple git tag-based version. [Lisa Ong]
- Populate ACCERA_VERSION from the latest git tag
- bump2version is now configured for the docs/ tree
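A minimal sketch of deriving a version string from the latest git tag, assuming tags of the form `v1.2.3`. The helper names `latest_git_tag` and `version_from_tag` are hypothetical, and the project's actual mechanism for populating ACCERA_VERSION may differ (for example, it may live in the build scripts rather than in Python).

```python
import re
import subprocess


def latest_git_tag(repo_dir: str = ".") -> str:
    """Return the most recent tag reachable from HEAD in repo_dir."""
    # `git describe --tags --abbrev=0` prints only the tag name.
    result = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def version_from_tag(tag: str) -> str:
    """Turn a tag such as 'v1.2.3' into the version string '1.2.3'."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"tag does not look like a release version: {tag!r}")
    return ".".join(match.groups())
```

Inside a git checkout, `version_from_tag(latest_git_tag())` would then give the value to expose as the package version.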
-
Merged PR 2495: [test] Import break with python -m unittest discover. [Lisa Ong]
python -m unittest discover accera/test *.py will interrogate verifiers.py and fail because of the relative import
-
Merged PR 2492: Updates test verifier code to match hatlib API changes. [Kern Handa]
-
Merged PR 2488: Simplify RangeValue analysis. [Abdul Dakkak]
Uses LLVM's ConstantRange instead of implementing our own to delete a lot of code
-
Merged PR 2489: add missing type_traits include. [Abdul Dakkak]
-
Merged PR 2482: Fix parameterized caches producing multiple caches erroneously. [Mason Remy]
- This is more of a one-off fix. A more generalized fix for resetting schedules/plans for different parameter value resolution should be implemented down the road
-
Merged PR 2479: FP16 tensorization for ROCM. [Abdul Dakkak]
-
Merged PR 2472: Tensorization + Caching. [Abdul Dakkak]
-
Merged PR 2485: Add another keyword to function's auxiliary table. [Denny Sun]
Add a 'parameters' keyword for the parameter values in a function's auxiliary table, so that the table looks like:
[functions.matmul_256_256_256_bdec0fac.auxiliary.accera]
[functions.matmul_256_256_256_bdec0fac.auxiliary.accera.parameters]
p_m_split_size = 16
p_n_split_size = 128
p_s_split_size = 256
p_s_split_2_size = 8
p_n_split_2_size = 16
p_n_split_3_size = 4
Related work items: #3662
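To make the nesting of the table explicit, here is a small sketch that renders the same layout from a Python dictionary. The function `render_parameters_table` is hypothetical and handles only integer values; it is not how Accera or hatlib writes HAT files.

```python
def render_parameters_table(function_name: str, parameters: dict) -> str:
    """Render the auxiliary 'parameters' sub-table for one function."""
    base = f"functions.{function_name}.auxiliary.accera"
    # The parent table header comes first, then the nested 'parameters' table.
    lines = [f"[{base}]", f"[{base}.parameters]"]
    # One `name = value` line per parameter, in insertion order.
    lines += [f"{name} = {value}" for name, value in parameters.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_parameters_table(
            "matmul_256_256_256_bdec0fac",
            {"p_m_split_size": 16, "p_n_split_size": 128},
        )
    )
```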
-
Merged PR 2484: [Pipelines] Enable uploads to PyPI when tagging a release. [Lisa Ong]
A configurable service connection variable allows setting the test or production PyPI service connection during scheduling.
Also cleaned up a stale workaround for auditwheel in the ManyLinux pipeline.
-
Merged PR 2471: Fix to caching. [Abdul Dakkak]
This avoids the aggressive cache deletion specifically when it occurs within a loop. This is a temporary fix; a more elegant solution would handle memory access info across loop boundaries.
-
Merged PR 2476: Add accera.create_parameter_grid() with self-defined filter and sample as arguments. [Denny Sun]
Provides a generic DSL function for creating a list of parameters from a dictionary (grid). Users can define their own filter function to remove invalid parameter values, and can limit the number of parameter combinations in the grid as well as the number of functions generated.
We identified the need for this function while updating our matmul grid search case study.
Related work items: #3662
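The idea of a filtered and sampled parameter grid can be sketched in plain Python as follows. This is a stand-in written for illustration: the name `make_parameter_grid`, its arguments, and the shape of the filter callback are assumptions, and they are not the signature of accera.create_parameter_grid().

```python
import itertools
import random


def make_parameter_grid(grid, filter_func=None, sample=0, seed=None):
    """Expand a dict of choices into a list of parameter dicts.

    grid:        maps each parameter name to a list of candidate values
    filter_func: optional predicate; combinations it rejects are dropped
    sample:      if > 0, keep at most this many randomly chosen combinations
    seed:        optional seed so the sampling is reproducible
    """
    names = list(grid)
    # Cartesian product of all candidate values, one dict per combination.
    combos = [
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    ]
    if filter_func is not None:
        combos = [combo for combo in combos if filter_func(combo)]
    if sample and sample < len(combos):
        combos = random.Random(seed).sample(combos, sample)
    return combos


if __name__ == "__main__":
    choices = {"m_split": [16, 32], "n_split": [4, 8, 16, 64]}
    # Keep only combinations where the inner split fits in the outer one.
    valid = make_parameter_grid(
        choices, filter_func=lambda p: p["n_split"] <= p["m_split"], sample=3, seed=0
    )
    print(valid)
```

Filtering before sampling means the sample size bounds the number of valid combinations, which is what limits the number of generated functions.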
-
Merged PR 2483: [Test] Integrate FileCheck into Python tests. [Lisa Ong]
- Added FileCheck utility to the accera-llvm package
- Can be run on any output file produced by the Package.build process, e.g. .cu, .mlir
- Support some basic directives
- Added examples for caching and rocm validation
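For readers unfamiliar with FileCheck, the sketch below imitates in Python what a `CHECK-COUNT-N` directive with `{{...}}` regex blocks asserts: the pattern must match N times in sequence. It is a deliberately simplified illustration (no whitespace canonicalization, no other directives), the helper names are hypothetical, and the real tests invoke LLVM's FileCheck binary rather than anything like this.

```python
import re


def filecheck_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a FileCheck-style pattern into a Python regex.

    Text inside {{...}} is taken as a regex; everything else is literal.
    """
    # re.split with one capturing group alternates literal and regex parts.
    parts = re.split(r"\{\{(.*?)\}\}", pattern)
    pieces = [
        part if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    ]
    return re.compile("".join(pieces))


def check_count(pattern: str, text: str, expected: int) -> bool:
    """Return True if the pattern matches `expected` times in sequence."""
    regex = filecheck_pattern_to_regex(pattern)
    position = 0
    for _ in range(expected):
        match = regex.search(text, position)
        if match is None:
            return False
        # Continue scanning after the previous match.
        position = match.end()
    return True


if __name__ == "__main__":
    loop = "for (int64_t idx{{[0-9]}} = 0; idx{{[0-9]}} < 16; idx{{[0-9]}} += 1) {"
    generated = "\n".join(
        f"for (int64_t idx{i} = 0; idx{i} < 16; idx{i} += 1) {{" for i in range(3)
    )
    print(check_count(loop, generated, 3), check_count(loop, generated, 4))
```

With three generated loops, asking for four matches fails on the fourth search, which corresponds to the "(4 out of 4)" kind of failure FileCheck reports.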
Example error spew:

    /root/Accera/build/lib.linux-x86_64-3.9/accera/bin/FileCheck /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu.filecheck --input-file /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu
    /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu.filecheck:2:16: error: CHECK-COUNT: expected string not found in input (4 out of 4)
    CHECK-COUNT-4: for (int64_t idx{{[0-9]}} = 0; idx{{[0-9]}} < 16; idx{{[0-9]}} += 1) {
    ^
    /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu:42:47: note: scanning from here
    for (int64_t idx2 = 0; idx2 < 16; idx2 += 1) {
    ^

    Input file: /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu
    Check file: /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu.filecheck

    -dump-input=help explains the following input dump.

    Input was:
    <<<<<<
    .
    .
    .
    37:
    38:
    39: extern "C" __global__ __launch_bounds__(1) void test_rocm_gemm_tiled_output_710d7d7d2ca9ca9e__gpu__(float *arg0, float *arg1, float *arg2) {
    40: for (int64_t idx0 = 0; idx0 < 16; idx0 += 1) {
    41: for (int64_t idx1 = 0; idx1 < 16; idx1 += 1) {
    42: for (int64_t idx2 = 0; idx2 < 16; idx2 += 1) {
    count:2 X error: no match found
    43: /*%0 = memref.load %arg0[%arg3, %arg5] : memref<16x16xf32, affine_map<(d0, d1) -> (d0 * 16 + d1)>>*/
    count:2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    44: const auto arg0_offset0 = affine_map_func_0_i0(idx0, idx2);
    count:2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    45: float var3 = ((float*)arg0)[arg0_offset0];
    count:2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    46: /*%1 = memref.load %arg1[%arg5, %arg4] : memref<16x16xf32, affine_map<(d0, d1) -> (d0 * 16 + d1)>>*/
    count:2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    47: const auto arg1_offset1 = affine_map_func_0_i0(idx2, idx1);
    count:2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    .
    .
    .
    >>>>>>

-
Merged PR 2480: Clean up cache vectorization argument plumbing. [Mason Remy]
-
Merged PR 2481: Enables verification for ROCm smoke tests. [Kern Handa]
-
Merged PR 2473: Extends range analysis by adding support for udiv, sdiv, urem, srem. [Abdul Dakkak]
These come up during code gen.
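To show what range analysis over division and remainder involves, here is a small interval-arithmetic sketch for the unsigned cases only. The `URange` class and the conservative fallback to a full 64-bit range are assumptions for this example; it is neither Accera's analysis nor LLVM's ConstantRange, and the signed variants (sdiv, srem) need extra sign handling that is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URange:
    """Inclusive range [lo, hi] of unsigned integer values."""
    lo: int
    hi: int


# Conservative answer when nothing useful can be proven.
FULL_U64 = URange(0, 2**64 - 1)


def udiv(a: URange, b: URange) -> URange:
    """Range of x // y for x in a, y in b."""
    if b.lo == 0:
        return FULL_U64  # divisor may be zero: give up
    # Smallest quotient: smallest dividend over largest divisor, and vice versa.
    return URange(a.lo // b.hi, a.hi // b.lo)


def urem(a: URange, b: URange) -> URange:
    """Range of x % y for x in a, y in b."""
    if b.lo == 0:
        return FULL_U64  # divisor may be zero: give up
    if a.hi < b.lo:
        return a  # x < y always holds, so x % y == x
    # The remainder is below the divisor and never exceeds the dividend.
    return URange(0, min(a.hi, b.hi - 1))
```

Bounds like these let later passes prove, for instance, that an index computed with a remainder stays inside a tile.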
-
Merged PR 2474: Add vectorize arg to plan.cache. [Mason Remy]
- Enables specifying whether or not to vectorize ops for a given cache, including an "AUTO" option, which behaves as caching vectorization has in the past: it vectorizes the cache if any loop in the loopnest is also vectorized
- Also fix some include paths
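The three-way choice described in this entry can be sketched as below. The enum `CacheVectorize` and the function `should_vectorize_cache` are hypothetical names chosen for illustration; the actual argument values accepted by plan.cache are not shown in these notes.

```python
from enum import Enum
from typing import Iterable


class CacheVectorize(Enum):
    """Hypothetical stand-in for the values of the vectorize argument."""
    ON = "on"
    OFF = "off"
    AUTO = "auto"


def should_vectorize_cache(setting: CacheVectorize, loops_vectorized: Iterable[bool]) -> bool:
    """Decide whether the ops of one cache get vectorized.

    loops_vectorized holds one flag per loop in the loopnest.
    """
    if setting is CacheVectorize.AUTO:
        # Previous behavior: follow the loopnest, vectorizing the cache
        # if any of its loops is vectorized.
        return any(loops_vectorized)
    # Explicit settings override whatever the loops do.
    return setting is CacheVectorize.ON
```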
Full Changelog: v1.2.2...v1.2.3