v1.2.11

@lisaong released this 18 Oct 07:45

What's Changed

  • Update vcpkg by @AtariDreams in #52
  • Merged PR 2924: Update hatlib dependency in setup.cfg, add comment.
    [Lisa Ong]

  • Merged PR 2922: [Github] Update vcpkg. [Lisa Ong]

  • Merged PR 2910: Updates hatlib dependency to 0.0.29. [Kern Handa]

  • Merged PR 2905: Fix internal param name in GPU benchmarks. [Captain
    Jack Sparrow]

  • Merged PR 2902: Increase ROCm baseline benchmark timeout to 10 hours.
    [Captain Jack Sparrow]

    • Increase ROCm baseline benchmark to 10 hours
    • Add category to the gemm input for classification
  • Merged PR 2901: Increase ROCm baseline timeout to 7 hours. [Captain
    Jack Sparrow]

  • Merged PR 2900: Prune gemm benchmark input for big sizes by removing
    NT and TT configs. [Captain Jack Sparrow]

    • Prune gemm benchmark input for big sizes by removing NT and TT configs
    • Disable verification for resnet sizes
    • Fix baseline tagging for pytorch
  • Merged PR 2896: Dynamic shared memory allocation support. [Captain
    Jack Sparrow]

    • Add optional param in plan.cache for memory offset
    • Add optional param in schedule.create_plan for total dynamic memory size in bytes
    • Update benchmarks to allow dynamic shared memory usage

    Related work items: #3735

  • Merged PR 2898: Add pytorch gemm implementation for GPU benchmark
    baselines. [Ritwik Das]

  • Merged PR 2897: Generalize partial dynamic size support. [Mason Remy]

    Plumbs through, more generically, the mappings from each array to the
    arguments that provide its dimension sizes.

    This also generalizes dynamic size support beyond matmul scenarios.

    Note: due to assumptions in the debug mode plumbing, the size arguments
    must still come first in the argument list; a later PR should
    generalize that.

  • Merged PR 2894: Add one test case for partially dynamic sized array.
    [Denny Sun]

  • Merged PR 2891: [nfc][release] Rev docs to 1.2.11. [Lisa Ong]

  • Merged PR 2882: Add tests for thread coarsening and update GPU
    benchmarks. [Ritwik Das]

    Related work items: #3684

  • Merged PR 2890: Add folding scenario for cast ops where the only
    downcasts are internally generated. [Mason Remy]

    This is useful for converting uint8*uint8->uint8 to
    int16*int16->int32 using cache element types, as is needed in the
    vpmaddwd matmul scenario.

  • Merged PR 2889: [refactoring] Prevent overloading of keyword "Tensor" by
    disambiguating with "MMAFragment". [Ritwik Das]
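PR 2896 adds an optional memory offset to `plan.cache` and an optional total dynamic memory size (in bytes) to `schedule.create_plan`. The Python sketch below illustrates only the general idea of carving several caches out of one dynamic shared memory budget by byte offset. It does not use Accera's API: `CacheRequest`, `assign_offsets`, and the 16-byte default alignment are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheRequest:
    # Hypothetical description of one cache placed in dynamic shared memory.
    name: str
    num_elements: int
    element_size: int  # bytes per element


def assign_offsets(requests, total_bytes, alignment=16):
    """Pack caches back to back into a single dynamic shared memory buffer.

    Returns {name: byte_offset}. Each offset is rounded up to a multiple of
    `alignment`. Raises ValueError if the caches do not fit in total_bytes.
    """
    offsets = {}
    cursor = 0
    for req in requests:
        # Round the current position up to the next aligned boundary.
        cursor = (cursor + alignment - 1) // alignment * alignment
        size = req.num_elements * req.element_size
        if cursor + size > total_bytes:
            raise ValueError(
                f"{req.name} needs {size} bytes at offset {cursor}, "
                f"exceeding the {total_bytes}-byte budget"
            )
        offsets[req.name] = cursor
        cursor += size
    return offsets
```

For example, a 64x32 float32 tile (8192 bytes) followed by a 32x64 float16 tile (4096 bytes) in a 16384-byte budget lands at offsets 0 and 8192. Packing in declaration order keeps the sketch simple; a real allocator might reorder or reuse regions.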
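PR 2897 generalizes how each array is mapped to the arguments that supply its dimension sizes, and notes that size arguments must still come first in the argument list. The Python sketch below models that mapping with plain dictionaries, assuming each dimension is either a static integer or the name of a size argument. `resolve_shapes` and its input layout are hypothetical and are not Accera's internal representation.

```python
def resolve_shapes(arg_names, array_dims, call_values):
    """Resolve concrete array shapes from runtime size arguments.

    arg_names:   ordered argument names of a hypothetical function.
    array_dims:  {array_name: [dim, ...]}, where each dim is an int (static
                 size) or the name of a size argument (dynamic size).
    call_values: {size_arg_name: int}, runtime values of the size arguments.
    """
    if not array_dims:
        return {}
    size_args = {d for dims in array_dims.values() for d in dims if isinstance(d, str)}
    # Mirror the restriction noted in PR 2897: size arguments must precede arrays.
    first_array = min(arg_names.index(a) for a in array_dims)
    for s in size_args:
        if arg_names.index(s) > first_array:
            raise ValueError(f"size argument {s!r} must come before all array arguments")
    # Substitute each named dimension with its runtime value; keep static ints.
    return {
        name: tuple(call_values[d] if isinstance(d, str) else d for d in dims)
        for name, dims in array_dims.items()
    }
```

With `arg_names = ["M", "N", "A", "B"]`, `A` shaped `["M", 16]` and `B` shaped `[16, "N"]`, calling with `M=128, N=64` resolves to `A: (128, 16)` and `B: (16, 64)`. Nothing in the mapping is matmul-specific, which matches the PR's note that dynamic sizes now apply beyond matmul.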
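For context on PR 2890: the x86 `VPMADDWD` instruction (AVX2 intrinsic `_mm256_madd_epi16`) multiplies signed 16-bit lanes and adds each adjacent pair of products into a 32-bit lane. That is why int16*int16->int32 element types suit the vpmaddwd matmul scenario. The Python emulation below is illustrative only and is not Accera code; `wrap_int32`, `vpmaddwd`, and `dot_uint8_widened` are hypothetical helper names.

```python
def wrap_int32(x):
    # Two's-complement wraparound to 32 bits, as the hardware lanes behave.
    return (x + 2**31) % 2**32 - 2**31


def vpmaddwd(a, b):
    """Emulate VPMADDWD on lists of signed 16-bit lanes: multiply lane-wise,
    then add adjacent products into 32-bit results."""
    assert len(a) == len(b) and len(a) % 2 == 0
    for v in a + b:
        assert -2**15 <= v < 2**15, "lanes must be int16"
    return [wrap_int32(a[i] * b[i] + a[i + 1] * b[i + 1]) for i in range(0, len(a), 2)]


def dot_uint8_widened(x, y):
    # uint8 values (0..255) always fit in int16, so widening loses nothing
    # before the multiply; the int32 pair sums are then accumulated.
    x, y = list(x), list(y)
    for v in x + y:
        assert 0 <= v <= 255, "inputs must be uint8"
    return sum(vpmaddwd(x, y))
```

With uint8 inputs each pair sum is at most 2 * 255 * 255 = 130050, so the int32 lanes cannot wrap, whereas 255 * 255 computed in uint8 wraps to 1. The only wrapping case for VPMADDWD itself is all four operands equal to -32768, which widened uint8 values never produce.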

New Contributors

  • @AtariDreams made their first contribution in #52

Full Changelog: v1.2.10...v1.2.11