v0.2.7

@oulgen released this 13 Dec 08:16
· 1030 commits to main since this release
d5a2d61

What's Changed

  • [CI] Skip all failing distributed tests by @yf225 in #1206
  • Include index_dtype in the printed decorator snippet by @choijon5 in #1207
  • Add dict comprehension support by @oulgen in #1191
  • settings: set appropriate dot_precision default by @fulvius31 in #1184
  • [Interpret Mode] Support custom block size by @yf225 in #1194
  • [Autotuner] Add autotune_benchmark_fn setting by @yf225 in #1199
  • jagged_dense_bmm (#1126) by @trieuat in #1213
  • benchmarks: Include AMD GCN arch in get_device_name() by @fulvius31 in #1214
  • Fix linter errors by @yf225 in #1218
  • Fix unit test breakage due to upstream change by @yf225 in #1219
  • Fix static_shapes setting in test_dot.py by @yf225 in #1220
  • Fix memory leak when Triton compile error occurs by @yf225 in #1217
  • [Interpret Mode] Re-enable block-size dependent tests by @yf225 in #1212
  • [Interpret Mode] Raise error if hl.store is used with duplicate indices by @yf225 in #1221
  • [Interpret Mode] Fix hl.store automatic dtype conversion by @yf225 in #1226
  • [Interpret Mode] Fix hl.load with multiple 1D tensor indices by @yf225 in #1227
  • [CI] Fix NVSHMEM env vars and re-enable distributed CI job by @yf225 in #1201
  • Move jagged_dense_bmm expected code to the right place by @yf225 in #1232
  • Reduce log volume by moving output code logging behind HELION_PRINT_OUTPUT_CODE=1 by @yf225 in #1233
  • Add setup for Helion to compile on MTIA with basic test by @Myrthan in #1169
  • Make hl.triton_kernel support global var and recursive kernel call by @yf225 in #1234
  • Make hl.triton_kernel support output_like=None without being DCE'd by @yf225 in #1237
  • Show errors when pre-commit fails by @oulgen in #1238
  • example: gated delta net fwd_h by @v0i0 in #1119
  • Change property name from camel case to snake case. by @Myrthan in #1239
  • Move distributed examples to examples/distributed/ by @yf225 in #1240
  • fix for circular dependency by @mengluy0125 in #1236
  • Fix mask propagation for indexed stores when block_id is 0 by checking is not None instead of truthiness by @oulgen in #1244
  • Clean up distributed examples path refs by @yf225 in #1241
  • Fix RNG codegen for constant (specialized) dimensions by @yf225 in #1253
  • Avoid broadcasting for non-consecutive tensor indexers by @yf225 in #1254
  • Implement torch.sort support by @oulgen in #1247
  • Implement torch.topk support by @oulgen in #1248
  • Allow using hl.specialize to specialize on tensor strides by @yf225 in #1215
  • Use torch._dynamo.mark_static() API to allow tensor shape specialization outside of the kernel code by @yf225 in #1210
  • chore: Bump actions/cache from 4 to 5 by @dependabot[bot] in #1257
  • Fix invalid Triton code for mixed scalar/block indexing in store operations when block dimension has size 1 by @oulgen in #1258
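
Note on #1244: the fix title describes a common Python pitfall, where an integer id of 0 is falsy and so a truthiness check treats it the same as a missing value. The sketch below illustrates that pattern in plain Python only. The function names and the `Optional[int]` representation of a block id are hypothetical and are not taken from the Helion source.

```python
from typing import Optional


def has_block_truthy(block_id: Optional[int]) -> bool:
    # Pitfall: 0 is falsy in Python, so a valid block id of 0
    # is treated as if no block id were present.
    return bool(block_id)


def has_block(block_id: Optional[int]) -> bool:
    # Explicit None check: only a missing id is rejected,
    # and block id 0 is accepted like any other id.
    return block_id is not None


# Compare both checks on a missing id, id 0, and a non-zero id.
for candidate in (None, 0, 3):
    print(candidate, has_block_truthy(candidate), has_block(candidate))
```

The two checks disagree only for a block id of 0, which is the case the fix title calls out.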

Full Changelog: v0.2.6...v0.2.7