v1.2.11
What's Changed
- Update vcpkg by @AtariDreams in #52
- Merged PR 2924: Update hatlib dependency in setup.cfg, add comment. [Lisa Ong]
- Merged PR 2922: [Github] Update vcpkg. [Lisa Ong]
- Merged PR 2910: Updates hatlib dependency to 0.0.29. [Kern Handa]
- Merged PR 2905: Fix internal param name in GPU benchmarks. [Captain Jack Sparrow]
- Merged PR 2902: Increase ROCm baseline benchmark timeout to 10 hours. [Captain Jack Sparrow]
  - Add category to the gemm input for classification
- Merged PR 2901: Increase ROCm baseline timeout to 7 hours. [Captain Jack Sparrow]
- Merged PR 2900: Prune gemm benchmark input for big sizes by removing NT and TT configs. [Captain Jack Sparrow]
  - Disable verification for resnet sizes
  - Fix baseline tagging for pytorch
- Merged PR 2896: Dynamic shared memory allocation support. [Captain Jack Sparrow]
  - Add optional param in plan.cache for memory offset
  - Add optional param in schedule.create_plan for total dynamic memory size in bytes
  - Update benchmarks to allow dynamic shared memory usage
  - Related work items: #3735
- Merged PR 2898: Add pytorch gemm implementation for GPU benchmark baselines. [Ritwik Das]
- Merged PR 2897: Generalize partial dynamic size support. [Mason Remy]
  Plumbs through, more generically, the mappings from arrays to the arguments that provide the dimension sizes for those arrays. This also generalizes dynamic size support beyond matmul scenarios.
  Note: due to assumptions in the debug mode plumbing, the size arguments must still occur first in the argument list; a later PR should generalize that.
- Merged PR 2894: Add one test case for partially dynamic sized array. [Denny Sun]
- Merged PR 2891: [nfc][release] Rev docs to 1.2.11. [Lisa Ong]
- Merged PR 2882: Add tests for thread coarsening and update GPU benchmarks. [Ritwik Das]
  - Related work items: #3684
- Merged PR 2890: Add folding scenario for cast ops where the only downcasts are internally generated. [Mason Remy]
  This is useful for converting uint8*uint8->uint8 to int16*int16->int32 using cache element types, as is needed in the vpmaddwd matmul scenario.
- Merged PR 2889: [refactoring] Prevent overloading of keyword "Tensor" - disambiguate with "MMAFragment". [Ritwik Das]
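
As a rough illustration of the widening mentioned in the PR 2890 entry above, the pure-Python sketch below contrasts a dot product kept entirely in uint8 with one that widens the inputs to int16 and accumulates in int32, which is the shape of computation the x86 `vpmaddwd` instruction performs (multiply signed 16-bit pairs, add adjacent 32-bit products). The helper names `wrap_uint8`, `dot_uint8`, `pmaddwd`, and `dot_widened` are hypothetical and are not part of Accera; this models the arithmetic only, not how Accera folds the casts.

```python
def wrap_uint8(x):
    """Truncate an integer to 8 unsigned bits, as a uint8 result type would."""
    return x & 0xFF


def dot_uint8(a, b):
    """Dot product where every product and partial sum is stored as uint8."""
    acc = 0
    for x, y in zip(a, b):
        acc = wrap_uint8(acc + wrap_uint8(x * y))
    return acc


def pmaddwd(a, b):
    """Model of one vpmaddwd step: multiply 16-bit pairs into 32-bit
    products, then add each adjacent pair of products."""
    assert len(a) == len(b) and len(a) % 2 == 0
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]


def dot_widened(a, b):
    """Dot product with inputs treated as int16 and sums kept in int32.
    uint8 values (0..255) fit in int16 unchanged, and for short inputs
    like the ones here the 32-bit sums cannot overflow."""
    return sum(pmaddwd(a, b))


a = [200, 100, 50, 25]
b = [3, 2, 4, 8]
print(dot_uint8(a, b))    # result truncated to 8 bits
print(pmaddwd(a, b))      # pairwise 32-bit sums
print(dot_widened(a, b))  # full-precision result
```

With these inputs the uint8 version loses the high bits of the result, while the widened version keeps them, which is presumably why the wider cache element types are needed for the `vpmaddwd` path.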
New Contributors
- @AtariDreams made their first contribution in #52
Full Changelog: v1.2.10...v1.2.11