v0.1.6
What's Changed
- ci: Always auth for benchmarking workflows by @seemethere in #719
- [Benchmark] jagged_sum kernel and test by @Sibylau in #676
- Skip default config printing if in ref eager mode by @yf225 in #721
- [Benchmark CI] Make benchmark runner respect custom CLI args by @yf225 in #723
- Upgrade rocm CI to 7.0 by @oulgen in #720
- Add eviction policy argument to tl.load by @oulgen in #714
- [CI] use complete rocm docker images by @oulgen in #724
- More inconsistent naming by @oulgen in #725
- [Benchmark] jagged_layer_norm kernel and test by @Sibylau in #704
- [Bug fix] Preserve masks on reduction inputs that depend on reduction outputs; fix layer_norm accuracy check failure by @yf225 in #722
- Support torch.matmul with 3D inputs by @yf225 in #715
- Slightly improve logs by @angelayi in #740
- Autotuning Progress Bar by @msaroufim in #739
- make tritonbench optional in run.py so install works again by @v0i0 in #746
- fix new factory when size comes from kwargs by @v0i0 in #750
- Add linting instructions to README by @msaroufim in #763
- Add backward kernel for exp by @aditvenk in #736
- fix roll reduction meta when for ops with none output (like wait), cl… by @v0i0 in #767
- Move upload benchmark results to a separate workflow by @huydhn in #758
- Add flash_attention to benchmarks by @oulgen in #769
- Fix jagged_layer_norm linter error by @yf225 in #770
- Add SIGINT handler for clean interrupt of autotuning background processes by @msaroufim in #766
- Enable tensor descriptor for XPU by @EikanWang in #765
- Fix the issue that the XPU kernels cannot be cached well by @EikanWang in #761
- Print Helion kernel source line in symbolic shape debugging by @yf225 in #771
- ci: Set fail-fast to false by @seemethere in #776
- Add XPU support for RNG operations by @EikanWang in #774
- Enable test_dot for XPU by @EikanWang in #773
- Handle XPU compilation error by @adam-smnk in #779
- Fix type prop for and/or by @oulgen in #781
- Make print output code more robust by @oulgen in #780
- Revert "Add SIGINT handler for clean interrupt of autotuning background processes" by @oulgen in #784
- Add torch compile unit test to helion by @oulgen in #782
New Contributors
- @seemethere made their first contribution in #719
- @angelayi made their first contribution in #740
- @msaroufim made their first contribution in #739
- @aditvenk made their first contribution in #736
- @EikanWang made their first contribution in #765
- @adam-smnk made their first contribution in #779
Full Changelog: v0.1.5...v0.1.6