v0.2.8
What's Changed
- Support passing Triton function object to hl.triton_kernel() by @yf225 in #1263
- chore: Bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #1270
- chore: Bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #1271
- [Distributed] one_shot_allreduce_bias_rmsnorm example by @yf225 in #1266
- [Distributed] matmul_reduce_scatter example by @yf225 in #1269
- feat(benchmarks): add shapes to json output by @fulvius31 in #1273
- [Autotuner] Log the 'started' state to CSV, for easier user monitoring of kernel hanging at runtime by @yf225 in #1279
- default pattern search by @v0i0 in #1259
- Set LFBOPatternSearch as default by @ethche in #1280
- fix surrogate search for singleton population by @v0i0 in #1281
- Ignore bzl files in git. by @Myrthan in #1282
- chunk fused_linear_jsd by @v0i0 in #1277
- Fix buggy interaction between XYZProgramIDs and L2GroupingProgramIDs by @jansel in #1288
- Fix bug with torch.rand_like compile error by @jansel in #1289
- [autotuner] print path to generated Triton code after selection of kernel by @bringlein in #1285
- Add support for torch.gather by @jansel in #1290
- [docs] Add more examples to docs by @oulgen in #1301
- [lint] remove dead ignores by @oulgen in #1302
- Add proper error handling for torch.split and torch.tensor_split in device loops by @oulgen in #1297
- Fix BlockReductionStrategy to use existing index variables for argmax/argmin operations by @oulgen in #1298
- Skip more failing tests on cpu backend by @oulgen in #1304
- [CI] Fix broken notebook by @oulgen in #1305
- Fix shape inference for tile indexing on size-1 dimensions and use broadcast_to for block_ptr by @oulgen in #1299
- Fix codegen broadcasting for tile indexing on size-1 tensor dimensions by @oulgen in #1300
- Enable tests on py314 by @oulgen in #1306
New Contributors
- @bringlein made their first contribution in #1285
Full Changelog: v0.2.7...v0.2.8