v0.2.4
What's Changed
- Add user-customizable autotune_baseline_atol / rtol settings by @yf225 in #1136
- Fix specialize + reshape use case by @yf225 in #1146
- Emit tl.constexpr dims for block-size-only view/reshape shapes by @oulgen in #1149
- Add hl.triton_kernel to call Triton kernels from device code by @oulgen in #1150
- Add torch.library.custom_op compatibility to @helion.kernel by @gmagogsfm in #1153
- chore: Bump actions/checkout from 5 to 6 by @dependabot[bot] in #1154
- Skip Resource temporarily unavailable error by @mengluy0125 in #1156
- Automatically use zero tolerance for bitwise comparison for fp8 dtypes during autotuning by @gmagogsfm in #1158
- Fix min hoisting bug by @yf225 in #1157
- Fix scalar broadcast bug in inductor lowering by @gmagogsfm in #1159
- Add LFBO Pattern Search by @ethche in #1115
- benchmarks: allow external kernel mappings for Helion run.py by @fulvius31 in #1160
- Fix CI dependency error for nvidia-nvshmem-cu12 when using PyTorch nightly and other CI lint errors from pyrefly change by @choijon5 in #1165
- Support AMD-specific autotune parameters: waves_per_eu and matrix_instr_nonkdim by @choijon5 in #1162
- Get remote tensors inside @helion.kernel by @kwen2501 in #1122
- fix shape bug in lfbo pattern search by @ethche in #1170
- Fix lint errors in local dev env by @yf225 in #1174
- [Ref Mode] Fix error message by @yf225 in #1175
- Add support for x.view() by @oulgen in #1176
- Add support for hl.randint by @oulgen in #1177
- Support torch.tensor in helion.kernel by @oulgen in #1178
- Support data-dependent hl.tile/hl.grid bounds in persistent kernels by @oulgen in #1180
- [CI] remove all conda and move to uv by @oulgen in #1181
- Fix unbacked symints in generated code by @oulgen in #1179
New Contributors
- @gmagogsfm made their first contribution in #1153
- @ethche made their first contribution in #1115
- @kwen2501 made their first contribution in #1122
Full Changelog: v0.2.3...v0.2.4