v1.2.10

@lisaong released this 29 Sep 01:33 · 24 commits to main since this release

What's Changed

  • Merged PR 2886: [release] Bump docs to 1.2.10, sync GH to ADO. [Lisa
    Ong]

    • Bulk docs version update

    • Bump protobuf from 3.20.1 to 3.20.2 in /accera/onnx-emitter/test (d1b87ec)

    • Also fixing a minor docs bug (errant backtick)

  • Merged PR 2884: Add DSL test for runtime size correctness. [Denny Sun]

  • Merged PR 2878: Optimize warp id calculation by forcing scalar
    registers. [Ritwik Das]

    • ROCM: use __builtin_amdgcn_readfirstlane to force scalar reg usage
    • CUDA: don't use anything special since __shfl_sync seems to generate slower code
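
As a rough illustration of why a scalar register can hold the warp id: the value is identical for every lane of a warp, so taking it from the first lane loses nothing. The Python sketch below only simulates that property. `warp_id`, `read_first_lane`, `warp_ids_are_uniform`, and the default 32-lane warp size are illustrative assumptions, not Accera code or its generated output.

```python
# Illustrative simulation only; not Accera's generated code.
WARP_SIZE = 32  # assumed lane count; AMD wavefronts are often 64 wide


def warp_id(linear_thread_id: int, warp_size: int = WARP_SIZE) -> int:
    """Warp index of a thread within its block."""
    return linear_thread_id // warp_size


def read_first_lane(per_lane_values: list) -> int:
    """Models a first-lane broadcast: every lane receives lane 0's value."""
    return per_lane_values[0]


def warp_ids_are_uniform(block_threads: int, warp_size: int = WARP_SIZE) -> bool:
    """Checks that broadcasting lane 0's warp id matches each lane's own value."""
    for start in range(0, block_threads, warp_size):
        end = min(start + warp_size, block_threads)
        lanes = [warp_id(t, warp_size) for t in range(start, end)]
        if any(value != read_first_lane(lanes) for value in lanes):
            return False
    return True
```

Because the check holds for any block size, a compiler is free to keep the warp id in a register shared by the whole warp instead of one copy per lane.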

  • Merged PR 2885: Updates python dependencies. [Kern Handa]

    Updates hatlib version

  • Merged PR 2881: Fix the runtime crash caused by incorrectly generated
    LLVM IR. [Denny Sun]

    1. Call the specific version of LLVM type converter for dynamic memory
    2. Create MemRefDescriptor from dynamic memory shape by associating the arrays with correct size arguments

    With this change, the following DSL test succeeds and passes the correctness check.

            M = Dimension()
            N = Dimension()
            K = Dimension()
    
            A = Array(shape=(M, K), element_type=ScalarType.float32,
                role=Array.Role.INPUT)
    
            B = Array(shape=(K, N), element_type=ScalarType.float32,
                role=Array.Role.INPUT)
    
            C = Array(shape=(M, N),
                        element_type=ScalarType.float32,
                        role=Array.Role.INPUT_OUTPUT)
    
            # Loop nest over the runtime-sized dimensions
            nest = Nest(shape=(M, N, K))
            i, j, k = nest.get_indices()
    
            @nest.iteration_logic
            def _():
                C[i, j] += A[i, k] * B[k, j]
    
            M_test = np.int64(64)
            N_test = np.int64(128)
            K_test = np.int64(32)
            A_test = np.random.random((M_test, K_test)).astype(np.float32)
            B_test = np.random.random((K_test, N_test)).astype(np.float32)
            C_test = np.random.random((M_test, N_test)).astype(np.float32)
    
            correctness_check_values = {
                "pre": [M_test, N_test, K_test, A_test, B_test, C_test],
                "post": [M_test, N_test, K_test, A_test, B_test, C_test + A_test @ B_test],
            }
    
            function = package.add(nest, args=(M, N, K, A, B, C), base_name="runtimesizes")
    
            with verifiers.VerifyPackage(self, "test_runtimesizes", TEST_PACKAGE_DIR) as v:
                package.build("test_runtimesizes", format=TEST_FORMAT | Package.Format.MLIR_VERBOSE, mode=TEST_MODE, output_dir=TEST_PACKAGE_DIR)
                if correctness_check_values:
                    v.check_correctness(
                        function.name,
                        before=correctness_check_values["pre"],
                        after=correctness_check_values["post"],
                    )
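
To picture step 2 above: MLIR lowers a ranked memref to a descriptor holding an allocated pointer, an aligned pointer, an offset, and per-dimension sizes and strides. For a dynamically shaped array, the sizes come from runtime arguments. The Python sketch below mimics that pairing under the assumption of a row-major layout with strides counted in elements. `MemRefDescriptorSketch` and `describe` are hypothetical names, and plain integers stand in for pointers; this is not the code from the PR.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemRefDescriptorSketch:
    """Hypothetical stand-in for the fields of a ranked memref descriptor."""
    allocated_ptr: int
    aligned_ptr: int
    offset: int
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]


def describe(ptr: int, runtime_sizes: Tuple[int, ...]) -> MemRefDescriptorSketch:
    """Builds a descriptor from sizes known only at run time (row-major assumed)."""
    strides = []
    running = 1
    # Walk dimensions from innermost to outermost, accumulating the element stride.
    for size in reversed(runtime_sizes):
        strides.append(running)
        running *= size
    return MemRefDescriptorSketch(
        ptr, ptr, 0, tuple(runtime_sizes), tuple(reversed(strides))
    )


# Arrays from the test above: A is (M, K), B is (K, N), C is (M, N).
M, N, K = 64, 128, 32
A_desc = describe(0x1000, (M, K))
B_desc = describe(0x2000, (K, N))
C_desc = describe(0x3000, (M, N))
```

The point of the sketch is the association: each array's descriptor must be filled from the size arguments that match its own shape, so A uses (M, K) while C uses (M, N).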
    
  • Merged PR 2879: Fix exception in GPU baseline benchmark. [Ritwik Das]

  • Merged PR 2856: Enable output caching in ROCM for all MMA shapes.
    [Ritwik Das]

  • Merged PR 2876: Introduce warp bindings in CUDA. [Ritwik Das]

    • Bind indices to WARP_X/Y along with tensorization (exclusively from thread id mapping)
    • The x block dimension is always a multiple of the warp size. For example, when a 64x64 block tile is divided into 4 subtiles of 32x32, each computed by a single warp, the blockDim is (64,2,1).
    • This is required because tensorization needs block dims to be generated differently than they are without it. Calculating offsets within the matrix based on warps is non-trivial, if not impossible, with thread bindings alone.
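
The blockDim arithmetic in the example above can be sketched in Python. `block_dim_for_warp_tiles` is a hypothetical helper written for this note, and the 32-thread warp size is an assumption (the CUDA value); it is not part of the Accera API.

```python
WARP_SIZE = 32  # assumed CUDA warp size


def block_dim_for_warp_tiles(block_tile, warp_tile, warp_size=WARP_SIZE):
    """Returns (x, y, z) block dims when each warp computes one warp_tile.

    Both tiles are given as (x_extent, y_extent), a naming assumption of this sketch.
    """
    tile_x, tile_y = block_tile
    sub_x, sub_y = warp_tile
    if tile_x % sub_x or tile_y % sub_y:
        raise ValueError("warp tile must evenly divide the block tile")
    warps_x = tile_x // sub_x
    warps_y = tile_y // sub_y
    # x counts threads (a whole number of warps); y counts warps.
    return (warps_x * warp_size, warps_y, 1)


print(block_dim_for_warp_tiles((64, 64), (32, 32)))  # (64, 2, 1)
```

With a 2x2 grid of warps, the x dimension carries 2 warps of 32 threads each and the y dimension carries the remaining factor of 2, which matches the (64,2,1) figure quoted above.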

    Related work items: #3726

  • Merged PR 2874: Add unrolled convolution case study link (#50) [Lisa
    Ong]

    • Update README.md

    Add unrolled convolution case study reference link

    • Update the reference link

    Update the reference according to latest updates in the case study

  • Merged PR 2873: Convert function signature from dynamic memref type to
    llvm type. [Denny Sun]

    With this change, Accera can write the correct function signature for dynamic memref types to the HAT file.

  • Merged PR 2871: Update hatlib version. [Denny Sun]

    Updates hatlib from 0.0.23 to 0.0.25.

  • Merged PR 2870: Filter benchmark kernels based on scheduling policy.
    [Ritwik Das]

  • Merged PR 2867: [build][github] Update test path in github actions.
    [Lisa Ong]

    Fixes https://github.com/microsoft/Accera/actions/runs/3071905923

Full Changelog: v1.2.9...v1.2.10