v0.2.4

@oulgen oulgen released this 01 Dec 22:30
· 1075 commits to main since this release
9d0b8bd

What's Changed

  • Add user-customizable autotune_baseline_atol / rtol settings by @yf225 in #1136
  • Fix specialize + reshape use case by @yf225 in #1146
  • Emit tl.constexpr dims for block-size-only view/reshape shapes by @oulgen in #1149
  • Add hl.triton_kernel to call Triton kernels from device code by @oulgen in #1150
  • Add torch.library.custom_op compatibility to @helion.kernel by @gmagogsfm in #1153
  • chore: Bump actions/checkout from 5 to 6 by @dependabot[bot] in #1154
  • Skip Resource temporarily unavailable error by @mengluy0125 in #1156
  • Automatically use zero tolerance for bitwise comparison for fp8 dtypes during autotuning by @gmagogsfm in #1158
  • Fix min hoisting bug by @yf225 in #1157
  • Fix scalar broadcast bug in inductor lowering by @gmagogsfm in #1159
  • Add LFBO Pattern Search by @ethche in #1115
  • benchmarks: allow external kernel mappings for Helion run.py by @fulvius31 in #1160
  • Fix CI dependency error for nvidia-nvshmem-cu12 when using PyTorch nightly and other CI lint errors from pyrefly change by @choijon5 in #1165
  • Support AMD-specific autotune parameters: waves_per_eu and matrix_instr_nonkdim by @choijon5 in #1162
  • Get remote tensors inside @helion.kernel by @kwen2501 in #1122
  • Fix shape bug in LFBO Pattern Search by @ethche in #1170
  • Fix lint errors in local dev env by @yf225 in #1174
  • [Ref Mode] Fix error message by @yf225 in #1175
  • Add support for x.view() by @oulgen in #1176
  • Add support for hl.randint by @oulgen in #1177
  • Support torch.tensor in helion.kernel by @oulgen in #1178
  • Support data-dependent hl.tile/hl.grid bounds in persistent kernels by @oulgen in #1180
  • [CI] remove all conda and move to uv by @oulgen in #1181
  • Fix unbacked symints in generated code by @oulgen in #1179
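
For readers unfamiliar with the tolerance settings named in #1136 and #1158, the usual atol/rtol closeness check can be sketched in plain Python. This illustrates the standard formula used by `torch.allclose`-style comparisons and is not Helion's autotuner code; the helper name `within_tolerance` and the default values are hypothetical, chosen only for illustration.

```python
def within_tolerance(actual, expected, atol=1e-2, rtol=1e-2):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|.

    Hypothetical helper: the defaults here are arbitrary and are not
    Helion's actual autotune_baseline_atol / rtol defaults.
    """
    if len(actual) != len(expected):
        return False
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


baseline = [1.0, 2.0, 4.0]       # reference output
candidate = [1.001, 2.002, 3.999]  # output of a tuned configuration

# Accepted with a loose tolerance.
print(within_tolerance(candidate, baseline))
# With atol = rtol = 0 the check requires exact equality of values,
# which is the spirit of the zero-tolerance comparison described for fp8 dtypes.
print(within_tolerance(candidate, baseline, atol=0.0, rtol=0.0))
```

Setting both tolerances to zero turns the check into an exact-match test, so any numerical difference from the baseline rejects the candidate.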

Full Changelog: v0.2.3...v0.2.4