Release v2.0.0 · HKUSTDial/flash-sparse-attention

What's Changed

Improve numerical stability in sparse attention with sink auxiliary logits by @LoserCheems in #220
[PERFORMANCE OPTIMIZATION] Flash Sparse Attention by @LoserCheems in #221
[BUG FIX] Refactor block min/max calculations by @LoserCheems in #223
[BUG FIX] Improve packed GQA handling by @LoserCheems in #224
Add utility functions for device management and input validation by @LoserCheems in #225
[PERFORMANCE OPTIMIZATION] Triton Sparse Base Forward Kernel with Gate-Based Sparsity by @LoserCheems in #226
[FEATURE] Enhance forward combine kernel and split attention by @LoserCheems in #227
Improves softmax stability with log2 scaling by @LoserCheems in #228
Renames variables and refactors functions for clarity by @LoserCheems in #229
Improve performance and configuration for SM90 forward path by @LoserCheems in #231
Refactor rescaling logic in online_softmax and rescale_o functions by @LoserCheems in #232
[BUG FIX] Improve forward kernel configuration and validation by @LoserCheems in #233
Refactor qheads_per_kvhead calculations for clarity by @LoserCheems in #234
[FEATURE SUPPORT] Add Triton backward support by @LoserCheems in #235
[FEATURE SUPPORT] Add Configurable Sparse Gate Modes and Adaptive Thresholding in Triton Forward Kernel by @LoserCheems in #236
Refactor log_sigmoid function for improved performance and accuracy by @LoserCheems in #237
[FEATURE SUPPORT] Add Configurable Sparse Gate Modes and Adaptive Thresholding in Triton Backward Kernel by @LoserCheems in #238
Enhance forward kernel for block range and masking logic by @LoserCheems in #239
Refactor backward kernels for clarity and optimization by @LoserCheems in #240
[BUG FIX] Update launch configuration for RTX Pro 6000 by @LoserCheems in #241
Add benchmark functions for Triton attention operations by @LoserCheems in #242
[FEATURE SUPPORT] Enable Softmax-Threshold Block Skipping in Triton Dense/Sparse Forward Attention by @LoserCheems in #243
[BUG FIX] Improve clarity and accuracy in gating mechanisms by @LoserCheems in #244
[BUG FIX] Update stride parameters for consistency by @LoserCheems in #245
Add softmax threshold parameter for enhanced flexibility by @LoserCheems in #246
[FEATURE] Implement dense attention with masking support by @LoserCheems in #247
Enhance sparse attention implementation and documentation by @LoserCheems in #248
[FEATURE] Implement gated attention mechanism and enhance performance by @LoserCheems in #249
Update project structure and dependencies by @LoserCheems in #250
[BUG FIX] Improve error reporting and occupancy in benchmarks by @LoserCheems in #251
Update repository URLs and improve documentation by @LoserCheems in #252
Refactor benchmark tests to simplify tensor initialization by @LoserCheems in #253
Refactor test utilities and add CUDA tensor operation tests by @LoserCheems in #254
Refactor masking logic in backward kernel functions by @LoserCheems in #255
Refactor GitHub Actions workflows for package building and publishing by @LoserCheems in #256

Full Changelog: v1.2.4...v2.0.0
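
Several entries above (#228, #232) concern log2 scaling and rescaling in the online softmax. As a rough illustration of that general technique, not the repository's actual kernels, the sketch below computes a softmax-weighted average block by block with base-2 exponentials. The function name, the `block_size` parameter, and the scalar values are hypothetical and chosen only for the example.

```python
import math

# log2(e): multiplying a score by this lets us use 2**x in place of exp(x).
LOG2E = math.log2(math.e)


def online_softmax_weighted_sum(scores, values, block_size=2):
    """Softmax-weighted average of `values`, computed one block at a time.

    Hypothetical helper for illustration; assumes at least one finite score.
    """
    m = -math.inf  # running maximum of the scaled scores (log2 domain)
    l = 0.0        # running sum of 2**(score - m)
    acc = 0.0      # running weighted sum of values, in the same scale as l

    for start in range(0, len(scores), block_size):
        s_blk = [s * LOG2E for s in scores[start:start + block_size]]
        v_blk = values[start:start + block_size]

        m_new = max(m, max(s_blk))
        # Rescale earlier partial results to the new maximum.
        # On the first block m is -inf, so the factor is 2**(-inf) == 0.0.
        scale = 2.0 ** (m - m_new)

        p = [2.0 ** (s - m_new) for s in s_blk]
        l = l * scale + sum(p)
        acc = acc * scale + sum(pi * vi for pi, vi in zip(p, v_blk))
        m = m_new

    return acc / l
```

Because every exponent is taken relative to the running maximum, the intermediate terms never exceed 1, so large scores do not overflow.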
