v2.0.1
What's Changed
- Add documentation for MkDocs setup and API reference by @LoserCheems in #257
- Remove Public Package Exports section from API reference documentation by @LoserCheems in #258
- Update docstrings in attention functions for consistency by @LoserCheems in #259
- Add return type annotations for attention functions by @LoserCheems in #260
- Init cute version by @LoserCheems in #261
- Update CuTe namespace and enhance dependencies by @LoserCheems in #262
- Add support for upstream split reference in sync scripts by @LoserCheems in #263
- [BUG FIX] Refactor CuTe namespace and enhance sync scripts by @LoserCheems in #264
- [BUG FIX] Optimize LSE computation in forward combine kernel by @LoserCheems in #265
- Update CuTe namespace and functionality by @LoserCheems in #266
- Enhance sync script with cherry-pick functionality and improve merge conflict handling (co-authored by Copilot) by @LoserCheems in #267
- Rename triton function by @LoserCheems in #268
- Revert "Rename triton function" by @LoserCheems in #269
- Rename forward combine functions and clarify comments by @LoserCheems in #270
- [FEATURE SUPPORT] Add Triton decode support with KV-cache APIs by @LoserCheems in #271
- Enhance decoding functions with FP8 and quantization support by @LoserCheems in #272
- Fix bug for decode benchmark by @LoserCheems in #273
- Cache optim by @LoserCheems in #274
- [PERFORMANCE OPTIMIZATION] Add compile-time CHECK_NAN toggle to finalize for decode kernel fast-path by @LoserCheems in #275
- [FEATURE SUPPORT] Add HuggingFace Kernel Hub support by @LoserCheems in #276
- Bump version to 2.0.1 by @LoserCheems in #277
Full Changelog: v2.0.0...v2.0.1