v1.2.10
What's Changed
- Update ci.yml to fix path changes by @lisaong in #49
- Add unrolled convolution case study link by @marina-neseem in #50
- Bump protobuf from 3.20.1 to 3.20.2 in /accera/onnx-emitter/test by @dependabot in #51
- Merged PR 2886: [release] Bump docs to 1.2.10, sync GH to ADO. [Lisa Ong]
  - Bulk docs version update
  - Bump protobuf from 3.20.1 to 3.20.2 in /accera/onnx-emitter/test (d1b87ec)
  - Also fixing a minor docs bug (errant backtick)
- Merged PR 2884: Add DSL test for runtime size correctness. [Denny Sun]
- Merged PR 2878: Optimize warp id calculation by forcing scalar registers. [Ritwik Das]
  - ROCM: use __builtin_amdgcn_readfirstlane to force scalar reg usage
  - CUDA: don't use anything special since __shfl_sync seems to generate slower code
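The warp id optimization in PR 2878 relies on the warp id being identical for every lane of a warp, which is why reading it from the first lane loses nothing. The following is a minimal Python model of that property, not Accera's generated code: `warp_id`, `read_first_lane`, and `warp_ids_are_uniform` are hypothetical names, and the warp sizes used (64 for a ROCM wavefront, 32 for a CUDA warp) are assumptions for illustration.

```python
def warp_id(thread_id: int, warp_size: int) -> int:
    """Warp index of a linear thread id (hypothetical helper)."""
    return thread_id // warp_size


def read_first_lane(lane_values: list) -> int:
    """Model of a first-lane broadcast: every lane receives lane 0's value."""
    return lane_values[0]


def warp_ids_are_uniform(block_size: int, warp_size: int) -> bool:
    """Check that each warp's lanes all agree with the first lane's warp id."""
    for start in range(0, block_size, warp_size):
        end = min(start + warp_size, block_size)
        lanes = [warp_id(t, warp_size) for t in range(start, end)]
        first = read_first_lane(lanes)
        if any(value != first for value in lanes):
            return False
    return True


# Assumed sizes: 256 threads with 64-wide wavefronts, 128 threads with 32-wide warps.
rocm_uniform = warp_ids_are_uniform(256, 64)
cuda_uniform = warp_ids_are_uniform(128, 32)
```

Because the value is uniform across the warp, a compiler that is told so can keep it in a scalar register instead of one vector register lane per thread.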
- Merged PR 2885: Updates python dependencies. [Kern Handa]
  Updates hatlib version
- Merged PR 2881: Fix the runtime crash caused by incorrectly generated LLVM IR. [Denny Sun]
  - Call the specific version of LLVM type converter for dynamic memory
  - Create MemRefDescriptor from dynamic memory shape by associating the arrays with correct size arguments

  With this change, the following DSL test can succeed and pass correctness check.

      M = Dimension()
      N = Dimension()
      K = Dimension()

      A = Array(shape=(M, K), element_type=ScalarType.float32, role=Array.Role.INPUT)
      B = Array(shape=(K, N), element_type=ScalarType.float32, role=Array.Role.INPUT)
      C = Array(shape=(M, N), element_type=ScalarType.float32, role=Array.Role.INPUT_OUTPUT)

      @nest.iteration_logic
      def _():
          C[i, j] += A[i, k] * B[k, j]

      M_test = np.int64(64)
      N_test = np.int64(128)
      K_test = np.int64(32)
      A_test = np.random.random((M_test, K_test)).astype(np.float32)
      B_test = np.random.random((K_test, N_test)).astype(np.float32)
      C_test = np.random.random((M_test, N_test)).astype(np.float32)
      correctness_check_values = {
          "pre": [M_test, N_test, K_test, A_test, B_test, C_test],
          "post": [M_test, N_test, K_test, A_test, B_test, C_test + A_test @ B_test],
      }

      function = package.add(nest, args=(M, N, K, A, B, C), base_name="runtimesizes")

      with verifiers.VerifyPackage(self, "test_runtimesizes", TEST_PACKAGE_DIR) as v:
          package.build("test_runtimesizes", format=TEST_FORMAT | Package.Format.MLIR_VERBOSE, mode=TEST_MODE, output_dir=TEST_PACKAGE_DIR)
          if correctness_check_values:
              v.check_correctness(
                  function.name,
                  before=correctness_check_values["pre"],
                  after=correctness_check_values["post"],
              )
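To make "associating the arrays with correct size arguments" in PR 2881 concrete, here is a rough Python sketch of how the sizes and strides of a memref descriptor could be filled in from runtime dimension values. It is an illustration only, not Accera's or MLIR's implementation: `MemRefDescriptorSketch` and `describe` are hypothetical names, the pointer fields of a real descriptor are omitted, and a contiguous row-major layout is assumed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MemRefDescriptorSketch:
    """Hypothetical stand-in for the offset/sizes/strides part of a memref descriptor."""
    offset: int
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]


def describe(shape_names: Tuple[str, ...], runtime_sizes: Dict[str, int]) -> MemRefDescriptorSketch:
    """Bind each symbolic dimension of an array to its runtime size argument."""
    sizes = tuple(runtime_sizes[name] for name in shape_names)
    # Assumed row-major layout: the innermost dimension has stride 1.
    strides = []
    running = 1
    for extent in reversed(sizes):
        strides.append(running)
        running *= extent
    return MemRefDescriptorSketch(offset=0, sizes=sizes, strides=tuple(reversed(strides)))


# Runtime values matching M_test, N_test, and K_test in the DSL test above.
runtime_sizes = {"M": 64, "N": 128, "K": 32}
desc_a = describe(("M", "K"), runtime_sizes)  # A has shape (M, K)
desc_b = describe(("K", "N"), runtime_sizes)  # B has shape (K, N)
desc_c = describe(("M", "N"), runtime_sizes)  # C has shape (M, N)
```

If an array were paired with the wrong size arguments (for example A with N instead of K), its strides would be wrong and indexing would read out of bounds, which is the kind of failure a runtime crash would be consistent with.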
- Merged PR 2879: Fix exception in GPU baseline benchmark. [Ritwik Das]
  Fix exception in GPU baseline benchmark
- Merged PR 2856: Enable output caching in ROCM for all MMA shapes. [Ritwik Das]
- Merged PR 2876: Introduce warp bindings in CUDA. [Ritwik Das]
  - Bind indices to WARP_X/Y along with tensorization (exclusively from thread id mapping)
  - The block's x dimension is always a multiple of the warp size. For example, when a 64x64 block tile is divided into 4 subtiles of 32x32 each, and each subtile is computed by a single warp, the blockDim would be (64,2,1).
  - This is required because with tensorization the block dims need to be generated differently than without it. Calculating offsets within the matrix based on warps is non-trivial, if not impossible, with just thread bindings.

  Related work items: #3726
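The blockDim example in PR 2876 can be reproduced with a little arithmetic. The sketch below is a hedged illustration rather than Accera's binding logic: `block_dims_for_warp_tiles` and `warp_tile_origin` are hypothetical helpers, a warp size of 32 is assumed, and mapping tile columns to WARP_X and tile rows to WARP_Y is an assumption made for the example.

```python
from typing import Tuple


def block_dims_for_warp_tiles(block_tile: Tuple[int, int],
                              warp_tile: Tuple[int, int],
                              warp_size: int = 32) -> Tuple[int, int, int]:
    """Derive blockDim when each warp computes one warp_tile of the block_tile."""
    tile_rows, tile_cols = block_tile
    warp_rows, warp_cols = warp_tile
    if tile_rows % warp_rows or tile_cols % warp_cols:
        raise ValueError("block tile must be evenly divisible by the warp tile")
    warps_x = tile_cols // warp_cols  # assumed: columns map to WARP_X
    warps_y = tile_rows // warp_rows  # assumed: rows map to WARP_Y
    # Each warp contributes warp_size threads along x and one "row" along y.
    return (warps_x * warp_size, warps_y, 1)


def warp_tile_origin(thread_idx: Tuple[int, int],
                     warp_tile: Tuple[int, int],
                     warp_size: int = 32) -> Tuple[int, int]:
    """(row, col) offset of the subtile owned by the warp containing thread_idx."""
    thread_x, thread_y = thread_idx
    warp_rows, warp_cols = warp_tile
    warp_x = thread_x // warp_size
    warp_y = thread_y
    return (warp_y * warp_rows, warp_x * warp_cols)


# The case from the notes: a 64x64 block tile split into four 32x32 warp subtiles.
dims = block_dims_for_warp_tiles((64, 64), (32, 32))
origin = warp_tile_origin((40, 1), (32, 32))
```

With this layout the subtile offset falls out of the warp indices directly, which is the calculation the notes describe as hard to express with thread bindings alone.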
- Merged PR 2874: Add unrolled convolution case study link (#50) [Lisa Ong]
  Add unrolled convolution case study link (#50)
  - Update README.md
    Add unrolled convolution case study reference link
  - Update the reference link
    Update the reference according to latest updates in the case study
- Merged PR 2873: Convert function signature from dynamic memref type to llvm type. [Denny Sun]
  With this change, Accera is able to write the correct function signature of dynamic memref type to HAT file
- Merged PR 2871: Update hatlib version. [Denny Sun]
  from 0.0.23 to 0.0.25
- Merged PR 2870: Filter benchmark kernels based on scheduling policy. [Ritwik Das]
  Filter benchmark kernels based on scheduling policy
- Merged PR 2867: [build][github] Update test path in github actions. [Lisa Ong]
  Fixes https://github.com/microsoft/Accera/actions/runs/3071905923
Full Changelog: v1.2.9...v1.2.10