v0.1.6

@oulgen oulgen released this 02 Oct 22:32
3322ca9

What's Changed

  • ci: Always auth for benchmarking workflows by @seemethere in #719
  • [Benchmark] jagged_sum kernel and test by @Sibylau in #676
  • Skip default config printing if in ref eager mode by @yf225 in #721
  • [Benchmark CI] Make benchmark runner respect custom CLI args by @yf225 in #723
  • Upgrade rocm CI to 7.0 by @oulgen in #720
  • Add eviction policy argument to tl.load by @oulgen in #714
  • [CI] use complete rocm docker images by @oulgen in #724
  • More inconsistent naming by @oulgen in #725
  • [Benchmark] jagged_layer_norm kernel and test by @Sibylau in #704
  • [Bug fix] Preserve masks on reduction inputs that depend on reduction outputs; fix layer_norm accuracy check failure by @yf225 in #722
  • Support torch.matmul with 3D inputs by @yf225 in #715
  • Slightly improve logs by @angelayi in #740
  • Autotuning Progress Bar by @msaroufim in #739
  • make tritonbench optional in run.py so install works again by @v0i0 in #746
  • fix new factory when size comes from kwargs by @v0i0 in #750
  • Add linting instructions to README by @msaroufim in #763
  • Add backward kernel for exp by @aditvenk in #736
  • fix roll reduction meta when for ops with none output (like wait), cl… by @v0i0 in #767
  • Move upload benchmark results to a separate workflows by @huydhn in #758
  • Add flash_attention to benchmarks by @oulgen in #769
  • Fix jagged_layer_norm linter error by @yf225 in #770
  • Add SIGINT handler for clean interrupt of autotuning background processes by @msaroufim in #766
  • Enable tensor descriptor for XPU by @EikanWang in #765
  • Fix the issue that the XPU kernels cannot be cached well by @EikanWang in #761
  • Print Helion kernel source line in symbolic shape debugging by @yf225 in #771
  • ci: Set fail-fast to false by @seemethere in #776
  • Add XPU support for RNG operations by @EikanWang in #774
  • Enable test_dot for XPU by @EikanWang in #773
  • Handle XPU compilation error by @adam-smnk in #779
  • Fix type prop for and/or by @oulgen in #781
  • Make print output code more robust by @oulgen in #780
  • Revert "Add SIGINT handler for clean interrupt of autotuning background processes" by @oulgen in #784
  • Add torch compile unit test to helion by @oulgen in #782

New Contributors

Full Changelog: v0.1.5...v0.1.6