v0.2.7
What's Changed
- [CI] Skip all failing distributed tests by @yf225 in #1206
- Include index_dtype in the printed decorator snippet by @choijon5 in #1207
- Add dict comprehension support by @oulgen in #1191
- settings: set appropriate dot_precision default by @fulvius31 in #1184
- [Interpret Mode] Support custom block size by @yf225 in #1194
- [Autotuner] Add `autotune_benchmark_fn` setting by @yf225 in #1199
- jagged_dense_bmm (#1126) by @trieuat in #1213
- benchmarks: Include AMD GCN arch in get_device_name() by @fulvius31 in #1214
- Fix linter errors by @yf225 in #1218
- Fix unit test breakage due to upstream change by @yf225 in #1219
- Fix `static_shapes` setting in test_dot.py by @yf225 in #1220
- Fix memory leak when Triton compile error occurs by @yf225 in #1217
- [Interpret Mode] Re-enable block-size dependent tests by @yf225 in #1212
- [Interpret Mode] Raise error if `hl.store` is used with duplicate indices by @yf225 in #1221
- [Interpret Mode] Fix `hl.store` automatic dtype conversion by @yf225 in #1226
- [Interpret Mode] Fix `hl.load` with multiple 1D tensor indices by @yf225 in #1227
- [CI] Fix NVSHMEM env vars and re-enable distributed CI job by @yf225 in #1201
- Move jagged_dense_bmm expected code to the right place by @yf225 in #1232
- Reduce log volume by moving output code logging behind HELION_PRINT_OUTPUT_CODE=1 by @yf225 in #1233
- Add setup for Helion to compile on MTIA with basic test by @Myrthan in #1169
- Make `hl.triton_kernel` support global var and recursive kernel call by @yf225 in #1234
- Make `hl.triton_kernel` support output_like=None without being DCE'd by @yf225 in #1237
- Show errors when pre-commit fails by @oulgen in #1238
- example: gated delta net fwd_h by @v0i0 in #1119
- Change property name from camel case to snake case. by @Myrthan in #1239
- Move distributed examples to `examples/distributed/` by @yf225 in #1240
- fix for circular dependency by @mengluy0125 in #1236
- Fix mask propagation for indexed stores when block_id is 0 by checking `is not None` instead of truthiness by @oulgen in #1244
- Clean up distributed examples path refs by @yf225 in #1241
- Fix RNG codegen for constant (specialized) dimensions by @yf225 in #1253
- Avoid broadcasting for non-consecutive tensor indexers by @yf225 in #1254
- Implement torch.sort support by @oulgen in #1247
- Implement torch.topk support by @oulgen in #1248
- Allow using `hl.specialize` to specialize on tensor strides by @yf225 in #1215
- Use `torch._dynamo.mark_static()` API to allow tensor shape specialization outside of the kernel code by @yf225 in #1210
- chore: Bump actions/cache from 4 to 5 by @dependabot[bot] in #1257
- Fix invalid Triton code for mixed scalar/block indexing in store operations when block dimension has size 1 by @oulgen in #1258
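
The mask propagation fix in #1244 turns on a common Python pitfall: an ID of 0 is falsy, so a truthiness check treats it the same as a missing value. The sketch below illustrates the general pattern only. The function names are hypothetical and are not taken from the Helion source.

```python
from typing import Optional


def has_block_truthy(block_id: Optional[int]) -> bool:
    # Hypothetical buggy pattern: block id 0 is falsy, so it is
    # indistinguishable from "no block id" (None).
    return bool(block_id)


def has_block_explicit(block_id: Optional[int]) -> bool:
    # Hypothetical fixed pattern: only None means "no block id";
    # 0 is accepted as a valid id.
    return block_id is not None


# Compare both checks for a missing id, id 0, and a nonzero id.
results = {
    block_id: (has_block_truthy(block_id), has_block_explicit(block_id))
    for block_id in (None, 0, 1)
}

for block_id, (truthy, explicit) in results.items():
    print(f"block_id={block_id!r}: truthy={truthy}, explicit={explicit}")
```

The two checks disagree only for `block_id == 0`, which is the case named in the changelog entry.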
New Contributors
Full Changelog: v0.2.6...v0.2.7