Release v0.3.3 · pytorch/helion

What's Changed

bump cublas for b200 benchmarks by @v0i0 in #1666
Remove need to pass device=... into torch functions by @hinriksnaer in #1657
[docs] Updated docs to communicate fast math support by @hinriksnaer in #1669
Fix #762: Replace internal assertion with user-facing error in set_pid by @tianrengao in #1670
[Helion + torch.compile] Add expected_num_kernels validation to torch.compile tests by @yf225 in #1655
[pallas-tpu] Fix segfault xfails: softmax tests now pass, reclassify others by @v0i0 in #1721
Reject data-dependent output shapes in infer_output_spec by @gmagogsfm in #1722
[pallas-tpu] Add test_long_sum_manual to verify range bound fix by @v0i0 in #1732
[pallas-tpu] fix default configs for TPU examples by @v0i0 in #1731
Move Triton-specific implementations from Backend base class to TritonBackend. by @norx1991 in #1728
Fix lint in ref_mode.py by @jansel in #1681
Add missing rebenchmark, finishing phase, and effort profile wiring to DESurrogateHybrid by @fulvius31 in #1680
Faster expected generation in test_indexing.py by @jansel in #1682
[cutedsl] Implement layout planning phase by @jansel in #1664
Revert "[pallas-tpu] fix default configs for TPU examples (#1731)" by @norx1991 in #1740
[pallas-tpu] Fix Pallas test_add by making non-contiguous inputs contiguous in pallas launcher. by @norx1991 in #1737
[Helion + torch.compile] Refactor HelionTemplateBuffer to use TemplateBuffer base class by @yf225 in #1723
precompile in the current process by @shunting314 in #1730
[CI] Fix Pyrefly lint error in template_buffer.py by @yf225 in #1746
[Autotuner] Add autotune_baseline_accuracy_check_fn for custom accuracy checks by @yf225 in #1733
Add Dockerfile by @jansel in #1748
Add scripts/runpod.py by @jansel in #1749
[cutedsl] Initial mma support by @jansel in #1742
Fix doubled test output in non-distributed CI jobs by @norx1991 in #1741
Add hl.jagged_tile by @nullplay in #1651
Fix logging to be compatible with pytest by @bringlein in #1734
Add autotune_initial_population_strategy kernel setting by @bringlein in #1735
[CI] Increase atol for test_squeeze_and_excitation_net_fwd on B200 by @yf225 in #1752
Unpin H100 nightly torch and Triton versions by @v0i0 in #1654
Enable pyrefly on macOS with ignore-missing-imports by @aditvenk in #1760
[metal] Register "metal" backend with minimal MetalBackend and launcher by @aditvenk in #1761
[metal] Respect force_tile_mask() in NDTileStrategy mask generation by @aditvenk in #1762
Remove past hackathon event from README by @choijon5 in #1780
Remove deadcode _clone_tree and _assert_args_close by @choijon5 in #1781
Fix PermutationFragment.encode() returning wrong value by @choijon5 in #1779
Fix atomic_max ref to return previous value by @choijon5 in #1782
Fix lints by @jansel in #1775
Add runpod SKILL.md by @jansel in #1776
[Helion + torch.compile] Add store/load transform hooks and prologue/epilogue fusion codegen by @yf225 in #1724
[Helion + torch.compile] Enable torch.compile fusion tests by @yf225 in #1727
[Helion + torch.compile] Simplify _remap_or_resolve for compound sympy expressions by @yf225 in #1785
Add scheduled workflow to rerun GPU health check failures by @v0i0 in #1683
Fix device sync in generic benchmarking functions for TPU/Pallas by @norx1991 in #1773
Removing skips and in some cases adding skipIfNotCUDA for CUDA-only features. by @umechand-amd in #1790
APIs to debug distributed kernel by @shunting314 in #1743
[cutedsl] tcgen05 MMA support by @jansel in #1777
[cutedsl] Fix broadcast and reshape-backed matmul lowering by @jansel in #1783
[cutedsl] Fix packed-RHS lowering and add general stack/reshape views by @jansel in #1784
Fix ROCm failures on main by @jansel in #1805
add kernel-filter to select kernel for allreduce-rmsnorm by @shunting314 in #1744
helion distributed kernel autotuning by @shunting314 in #1532
[Helion + torch.compile] Add ref baseline kernel count checks by @yf225 in #1786

New Contributors

@norx1991 made their first contribution in #1728
@nullplay made their first contribution in #1651

Full Changelog: v0.3.2...v0.3.3
