v1.2.3

@lisaong released this 07 Apr 00:11

What's Changed

  • Merged PR 2508: [release] Bump docs version to 1.2.3. [Lisa Ong]

    In preparation for a PyPI release to facilitate community contributions for case studies.

    Synced doc editorials from the public GitHub repo.

  • Merged PR 2503: [prog] Support unsigned integer types in the DSL.
    [Lisa Ong]

    • Add ScalarType.uint8/16/32/64 support
    • Use UnrealizedConversionCastOps to convert these unsigned ints to signless ints
    • Refactored CastImpl now that we have to handle both unsigned and signless cases for casts to/from ints
    • Use a tuple of (MLIR Type, LLVM Type) to infer the C type when writing function declarations in the HAT file. The former holds signedness information; the latter determines the C type (e.g. pointer or not)
    • Simplified CheckAllClose function to reduce unnecessary casting
    • Doc updates
    • Fixed HAT file issues with ScalarType.bool

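    The (MLIR Type, LLVM Type) pairing above can be pictured with a small Python sketch. It is not Accera's HAT writer: the MLIR spellings (ui8 unsigned, si8 signed, i8 signless) are standard MLIR, but the lookup table, the c_type_for helper, the mapping of i1 to bool, and the choice to emit signless integers as signed C types are assumptions made for illustration. The struct calls show why the sign information has to be carried somewhere: the same byte decodes to 255 or -1 depending on signedness.

```python
import struct

# Hypothetical table: MLIR element-type spelling -> C scalar type.
# Assumption: signless integers (i8, i16, ...) are emitted as signed C types,
# and i1 maps to C's bool (which needs <stdbool.h> in C code).
_C_SCALAR = {
    "i1": "bool",
    "ui8": "uint8_t", "si8": "int8_t", "i8": "int8_t",
    "ui16": "uint16_t", "si16": "int16_t", "i16": "int16_t",
    "ui32": "uint32_t", "si32": "int32_t", "i32": "int32_t",
    "ui64": "uint64_t", "si64": "int64_t", "i64": "int64_t",
    "f32": "float", "f64": "double",
}


def c_type_for(mlir_elem_type: str, llvm_is_pointer: bool) -> str:
    """Combine sign information (from the MLIR type) with the
    pointer-or-not shape (from the LLVM type) into a C type string."""
    base = _C_SCALAR[mlir_elem_type]
    return base + "*" if llvm_is_pointer else base


# The same 8 bits read back differently depending on signedness.
raw = b"\xff"
as_unsigned = struct.unpack("<B", raw)[0]
as_signed = struct.unpack("<b", raw)[0]

if __name__ == "__main__":
    print(c_type_for("ui8", True))   # uint8_t*
    print(c_type_for("i32", False))  # int32_t
    print(as_unsigned, as_signed)    # 255 -1
```
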
  • Merged PR 2507: Updates acc-translate output for ROCm 5.1. [Kern
    Handa]

  • Merged PR 2437: Add more known targets (from our team's devices) [Denny
    Sun]

    The new list covers the following CPUs, which are used by our developers:

    • Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz
    • 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
    • Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz 2.11 GHz
    • Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
    • Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz

    Related work items: #3546

  • Merged PR 2505: [nfc] Rename parameters for schedule.tile and
    plan.bind. [Kern Handa]

  • Merged PR 2501: Adds support for more than one GPU function per
    package. [Kern Handa]

    Related work items: #3686

  • Merged PR 2504: [docs] Update stale versions in Reference docs. [Lisa
    Ong]

    Fixes the stale versions for now, while better approaches are being considered.

  • Merged PR 2499: Updates the syntax for schedule.tile. [Kern Handa]

  • Merged PR 2498: Updates the syntax for plan.bind. [Kern Handa]

    Related work items: #3678

  • Merged PR 2500: Adds support for specifying index bitwidth for acc-
    translate. [Kern Handa]

    Story #3669

    Related work items: #3669

  • Merged PR 2490: Restore CMake Export. [Abdul Dakkak]

    Restores the CMake Export feature, which is used by argo-experiments. Note that this feature cannot be used with the vcpkg LLVM build.

  • Merged PR 2497: Fix vectorization plumbing to correctly handle zero
    vectorization budget cases in cache reduce ops. [Mason Remy]

  • Merged PR 2496: [nfc] Switch docs versioning to bump2version, replace
    VERSION with simple git tag-based version. [Lisa Ong]

    • Populate ACCERA_VERSION from the latest git tag
    • bump2version is now configured for the docs/ tree

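    A sketch of deriving a version string from the latest git tag, assuming release tags of the form v1.2.3 as used on this page. The version_from_tag and latest_tag_version helpers are hypothetical and are not Accera's build code; git describe --tags --abbrev=0 is a standard git command that prints the most recent tag reachable from the current commit.

```python
import re
import subprocess

# Accepts "1.2.3" or "v1.2.3" (assumed tag format).
_TAG_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def version_from_tag(tag: str) -> str:
    """Turn a release tag such as 'v1.2.3' into a bare version string."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return ".".join(match.groups())


def latest_tag_version(repo_dir: str = ".") -> str:
    """Ask git for the most recent reachable tag and convert it."""
    tag = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return version_from_tag(tag)
```

Keeping the parsing separate from the git call makes the tag format easy to check without a repository on hand.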
  • Merged PR 2495: [test] Import break with python -m unittest discover.
    [Lisa Ong]

    Running python -m unittest discover accera/test *.py picks up verifiers.py and fails because of its relative import.
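    A minimal, self-contained reproduction of this failure mode. It assumes the standard unittest discovery behavior where, with no -t option, the start directory is the top-level directory, so matching files are imported as top-level modules with no parent package. The helpers.py file, VALUE, and the import_as_top_level function are hypothetical; only the name verifiers.py comes from the note above.

```python
import importlib
import os
import sys
import tempfile


def import_as_top_level(module_name: str, files: dict) -> str:
    """Write files into a temp dir, put it on sys.path, and try to import
    module_name as a top-level module. Returns the ImportError text, or ""."""
    with tempfile.TemporaryDirectory() as test_dir:
        for filename, source in files.items():
            with open(os.path.join(test_dir, filename), "w") as f:
                f.write(source)
        sys.path.insert(0, test_dir)
        importlib.invalidate_caches()
        try:
            importlib.import_module(module_name)
            return ""
        except ImportError as exc:
            return str(exc)
        finally:
            # Undo the side effects so the experiment can be repeated.
            sys.path.remove(test_dir)
            sys.modules.pop(module_name, None)


if __name__ == "__main__":
    error = import_as_top_level(
        "verifiers",
        {
            "helpers.py": "VALUE = 42\n",
            # A package-relative import, which needs a parent package.
            "verifiers.py": "from .helpers import VALUE\n",
        },
    )
    print(error)  # attempted relative import with no known parent package
```
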

  • Merged PR 2492: Updates test verifier code to match hatlib API
    changes. [Kern Handa]

  • Merged PR 2488: Simplify RangeValue analysis. [Abdul Dakkak]

    Uses LLVM's ConstantRange instead of our own implementation, which deletes a lot of code.

  • Merged PR 2489: add missing type_traits include. [Abdul Dakkak]

  • Merged PR 2482: Fix parameterized caches producing multiple caches
    erroneously. [Mason Remy]

    • This is more of a one-off fix. A more general fix that resets
      schedules/plans when different parameter values are resolved should
      be implemented down the road.

  • Merged PR 2479: FP16 tensorization for ROCm. [Abdul Dakkak]

  • Merged PR 2472: Tensorization + Caching. [Abdul Dakkak]

  • Merged PR 2485: Add another keyword to function's auxiliary table.
    [Denny Sun]

    Adds a 'parameters' keyword for the parameter values in a function's auxiliary table, so the table now looks like:

    [functions.matmul_256_256_256_bdec0fac.auxiliary.accera]
    [functions.matmul_256_256_256_bdec0fac.auxiliary.accera.parameters]
    p_m_split_size = 16
    p_n_split_size = 128
    p_s_split_size = 256
    p_s_split_2_size = 8
    p_n_split_2_size = 16
    p_n_split_3_size = 4

    Related work items: #3662

  • Merged PR 2484: [Pipelines] Enable uploads to PyPI when tagging a
    release. [Lisa Ong]

    Adds a configurable service connection variable, which allows the test or production PyPI service connection to be selected when a pipeline run is scheduled.

    Also cleaned up a stale workaround for auditwheel in the ManyLinux pipeline.

  • Merged PR 2471: Fix to caching. [Abdul Dakkak]

    This avoids the aggressive cache deletion specifically when it occurs within a loop. This is a temporary fix; a more elegant solution would be to handle memory access information across loop boundaries.

  • Merged PR 2476: Add accera.create_parameter_grid() with self-defined
    filter and sample as arguments. [Denny Sun]

    Provides a generic DSL function that creates the parameter list from a dictionary (grid). Users can supply their own filter function to remove invalid parameter values, and use sample to limit the size of the parameter grid, and therefore the number of functions generated.

    The need for this function came up while updating our matmul grid search case study.

    Related work items: #3662
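    A rough, self-contained approximation of the expand, filter, and sample steps described above. This is not accera.create_parameter_grid: the make_parameter_grid name, the convention of passing each combination to the filter as keyword arguments, and the seeded random sampling are all assumptions chosen for illustration.

```python
import itertools
import random
from typing import Any, Callable, Dict, List, Optional, Sequence


def make_parameter_grid(
    grid: Dict[str, Sequence[Any]],
    filter_func: Optional[Callable[..., bool]] = None,
    sample: int = 0,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: expand a grid, filter it, then optionally sample it."""
    names = list(grid)
    # Cartesian product of every parameter's candidate values.
    combos = [dict(zip(names, values))
              for values in itertools.product(*(grid[name] for name in names))]
    # Drop combinations that the user-supplied predicate rejects.
    if filter_func is not None:
        combos = [combo for combo in combos if filter_func(**combo)]
    # Keep at most `sample` combinations to bound how many functions get built.
    if 0 < sample < len(combos):
        combos = random.Random(seed).sample(combos, sample)
    return combos


if __name__ == "__main__":
    grid = {"p_m_split_size": [16, 32, 64], "p_n_split_size": [32, 64, 128]}

    def n_at_least_m(p_m_split_size, p_n_split_size):
        return p_n_split_size >= p_m_split_size

    valid = make_parameter_grid(grid, filter_func=n_at_least_m)
    print(len(valid))  # 8
    print(len(make_parameter_grid(grid, n_at_least_m, sample=4, seed=0)))  # 4
```

Filtering before sampling means the sample budget is spent only on valid combinations; of the 9 combinations in the example grid, only (64, 32) is rejected.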

  • Merged PR 2483: [Test] Integrate FileCheck into Python tests. [Lisa
    Ong]

    • Added FileCheck utility to the accera-llvm package
    • Can be run on any output file produced by the Package.build process, e.g. .cu, .mlir
    • Supports some basic directives
    • Added examples for caching and ROCm validation

    Example error output:

    /root/Accera/build/lib.linux-x86_64-3.9/accera/bin/FileCheck /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu.filecheck --input-file /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu
    
    /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu.filecheck:2:16: error: CHECK-COUNT: expected string not found in input (4 out of 4)
    CHECK-COUNT-4: for (int64_t idx{{[0-9]}} = 0; idx{{[0-9]}} < 16; idx{{[0-9]}} += 1) {
                   ^
    /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu:42:47: note: scanning from here
    for (int64_t idx2 = 0; idx2 < 16; idx2 += 1) {
                                                  ^
    
    Input file: /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu
    Check file: /root/Accera/build/lib.linux-x86_64-3.9/test_acccgen/test_rocm_gemm_tiled_output/test_rocm_gemm_tiled_output.cu.filecheck
    
    -dump-input=help explains the following input dump.
    
    Input was:
    <<<<<<
             .
             .
             .
            37:
            38:
            39: extern "C" __global__ __launch_bounds__(1) void test_rocm_gemm_tiled_output_710d7d7d2ca9ca9e__gpu__(float *arg0, float *arg1, float *arg2) {
            40: for (int64_t idx0 = 0; idx0 < 16; idx0 += 1) {
            41: for (int64_t idx1 = 0; idx1 < 16; idx1 += 1) {
            42: for (int64_t idx2 = 0; idx2 < 16; idx2 += 1) {
    count:2                                                   X error: no match found
            43: /*%0 = memref.load %arg0[%arg3, %arg5] : memref<16x16xf32, affine_map<(d0, d1) -> (d0 * 16 + d1)>>*/
    count:2     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            44: const auto arg0_offset0 = affine_map_func_0_i0(idx0, idx2);
    count:2     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            45: float var3 = ((float*)arg0)[arg0_offset0];
    count:2     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            46: /*%1 = memref.load %arg1[%arg5, %arg4] : memref<16x16xf32, affine_map<(d0, d1) -> (d0 * 16 + d1)>>*/
    count:2     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            47: const auto arg1_offset1 = affine_map_func_0_i0(idx2, idx1);
    count:2     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
             .
    >>>>>>
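    The failing CHECK-COUNT-4 line can be reasoned about with a simplified Python emulation of FileCheck's pattern syntax, where literal text must match exactly and {{...}} encloses a regular expression. This is only an approximation: it ignores FileCheck's default whitespace canonicalization and every other directive, and the helper names are hypothetical. Run against just the three loop headers visible in the input dump above, it finds three matches, one fewer than CHECK-COUNT-4 requires.

```python
import re

# The CHECK-COUNT-4 pattern from the .filecheck file above.
PATTERN = "for (int64_t idx{{[0-9]}} = 0; idx{{[0-9]}} < 16; idx{{[0-9]}} += 1) {"

# Only the three loop headers visible in the input dump (lines 40-42).
LOOPS = """\
for (int64_t idx0 = 0; idx0 < 16; idx0 += 1) {
for (int64_t idx1 = 0; idx1 < 16; idx1 += 1) {
for (int64_t idx2 = 0; idx2 < 16; idx2 += 1) {
"""


def filecheck_to_regex(pattern: str) -> "re.Pattern[str]":
    """Escape literal text and keep {{...}} bodies as regular expressions."""
    # With one capture group, re.split alternates literal, regex, literal, ...
    parts = re.split(r"\{\{(.*?)\}\}", pattern)
    return re.compile("".join(
        part if i % 2 else re.escape(part) for i, part in enumerate(parts)))


def count_in_order(pattern: str, text: str) -> int:
    """Count non-overlapping matches while scanning forward through text."""
    return sum(1 for _ in filecheck_to_regex(pattern).finditer(text))


if __name__ == "__main__":
    found = count_in_order(PATTERN, LOOPS)
    print(f"found {found} of 4")  # found 3 of 4
```
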
    
  • Merged PR 2480: Clean up cache vectorization argument plumbing. [Mason
    Remy]

  • Merged PR 2481: Enables verification for ROCm smoke tests. [Kern
    Handa]

  • Merged PR 2473: Extends range analysis by adding support for
    udiv, sdiv, urem, srem. [Abdul Dakkak]

    These operations come up during code generation.
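    To illustrate what range analysis computes for these operations, here is a small interval sketch for the unsigned cases (udiv and urem). It is neither Accera's RangeValue analysis nor LLVM's ConstantRange; the URange type and helpers are hypothetical, and the signed variants are omitted for brevity. For example, an index in [0, 255] divided by, or taken modulo, 16 stays within [0, 15].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URange:
    """Inclusive range [lo, hi] of non-negative integers."""
    lo: int
    hi: int

    def __post_init__(self):
        if not 0 <= self.lo <= self.hi:
            raise ValueError(f"invalid range [{self.lo}, {self.hi}]")


def udiv(a: URange, b: URange) -> URange:
    """Range of a // b; divisor ranges that include 0 are rejected."""
    if b.lo == 0:
        raise ZeroDivisionError("divisor range includes 0")
    # a // b grows with a and shrinks with b, so the extremes come from
    # pairing opposite ends of the two ranges.
    return URange(a.lo // b.hi, a.hi // b.lo)


def urem(a: URange, b: URange) -> URange:
    """Conservative range of a % b."""
    if b.lo == 0:
        raise ZeroDivisionError("divisor range includes 0")
    if a.hi < b.lo:
        # Every dividend is smaller than every divisor, so a % b == a.
        return a
    # Otherwise the remainder is below the largest divisor and never
    # exceeds the largest dividend.
    return URange(0, min(a.hi, b.hi - 1))


if __name__ == "__main__":
    idx = URange(0, 255)
    print(udiv(idx, URange(16, 16)))  # URange(lo=0, hi=15)
    print(urem(idx, URange(16, 16)))  # URange(lo=0, hi=15)
```
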

  • Merged PR 2474: Add vectorize arg to plan.cache. [Mason Remy]

    • Enables specifying whether to vectorize ops for a given cache,
      including an "AUTO" option that keeps the previous caching
      vectorization behavior: the cache is vectorized if any loop in the
      loop nest is also vectorized
    • Also fixes some include paths
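    The AUTO rule above reduces to a small decision function. A sketch, assuming a tri-state setting of True, False, or AUTO; the Vectorize enum and should_vectorize_cache helper are hypothetical and do not reflect the actual argument types accepted by plan.cache.

```python
from enum import Enum
from typing import Sequence, Union


class Vectorize(Enum):
    """Hypothetical marker for the automatic cache vectorization option."""
    AUTO = "auto"


def should_vectorize_cache(option: Union[bool, Vectorize],
                           loop_is_vectorized: Sequence[bool]) -> bool:
    """Resolve the per-cache vectorize setting.

    True/False are honored as given; AUTO reproduces the earlier behavior of
    vectorizing the cache if any loop in the loop nest is vectorized.
    """
    if option is Vectorize.AUTO:
        return any(loop_is_vectorized)
    return bool(option)


if __name__ == "__main__":
    print(should_vectorize_cache(Vectorize.AUTO, [False, True]))  # True
    print(should_vectorize_cache(False, [True]))                   # False
```
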

Full Changelog: v1.2.2...v1.2.3