Conversation

@LoserCheems
Collaborator

Introduces a new split-K attention kernel that parallelizes the computation across the key/value sequence dimension. The implementation includes:

  • Split-K attention computation with configurable number of splits
  • Sparse matrix multiplication using active masks for both key and value operations
  • Dynamic masking support with causal and window-based attention patterns
  • Key-value cache appending with optional rotary positional embeddings
  • Accumulator combination logic for merging results across splits
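
The causal and window-based masking patterns listed above can be illustrated with a small boolean-mask sketch. This is a hypothetical NumPy reference, not the kernel's actual CUDA code; the function name `build_mask` and the right-aligned indexing convention (keys aligned to the end of the query range, as in decoding where `seq_k >= seq_q`) are assumptions for illustration:

```python
import numpy as np

def build_mask(seq_q, seq_k, window=None, causal=True):
    # Right-align query positions against key positions, so the last
    # query attends up to the last key (typical decode-time layout).
    q_idx = np.arange(seq_q)[:, None] + (seq_k - seq_q)
    k_idx = np.arange(seq_k)[None, :]
    mask = np.ones((seq_q, seq_k), dtype=bool)
    if causal:
        # A query may only attend to keys at or before its own position.
        mask &= k_idx <= q_idx
    if window is not None:
        # Sliding window: keep only the most recent `window` keys.
        mask &= k_idx > q_idx - window
    return mask
```

A mask like this is what the "active mask" sparse multiplications can consume: key/value blocks whose mask tile is entirely `False` can be skipped outright.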

Improves memory efficiency and parallelism for long-sequence attention by distributing the key/value dimension across multiple thread blocks, then merging the per-split partial results.
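The split-K scheme and the accumulator-combination step can be sketched in NumPy: each split computes an unnormalized attention accumulator plus its running max and softmax denominator, and the splits are then merged with a global log-sum-exp. This is a minimal single-head reference under assumed shapes `(seq, head_dim)`, not the kernel's CUDA implementation; the function names are invented for illustration:

```python
import numpy as np

def attention_ref(q, k, v):
    # Full (non-split) reference: softmax(q k^T / sqrt(d)) v
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(-1, keepdims=True))
    return (p / p.sum(-1, keepdims=True)) @ v

def attention_splitk(q, k, v, num_splits):
    d = q.shape[-1]
    partials = []
    # Each split sees only a chunk of the keys/values (in the kernel,
    # one chunk per thread block along the sequence dimension).
    for ks, vs in zip(np.array_split(k, num_splits),
                      np.array_split(v, num_splits)):
        s = q @ ks.T / np.sqrt(d)
        m = s.max(-1, keepdims=True)      # per-split row max
        p = np.exp(s - m)
        l = p.sum(-1, keepdims=True)      # per-split softmax denominator
        partials.append((m, l, p @ vs))   # (max, sum, unnormalized output)
    # Combine accumulators across splits via a global log-sum-exp:
    # rescale every partial to a shared max, then normalize once.
    ms = np.concatenate([m for m, _, _ in partials], axis=-1)
    g = ms.max(-1, keepdims=True)
    num = sum(np.exp(m - g) * o for m, _, o in partials)
    den = sum(np.exp(m - g) * l for m, l, _ in partials)
    return num / den
```

The rescale-and-sum at the end is the "accumulator combination logic": because each split subtracted its own local max, the partials must be brought to a common scale before their outputs and denominators can be added.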


@LoserCheems LoserCheems requested a review from Copilot June 23, 2025 06:02
Contributor

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.

@LoserCheems
Collaborator Author

Hi @wubingheng111, @Evanwu1125, @SNHuan — Copilot is unable to review this PR. Please review it instead! 🤗

@Evanwu1125 Evanwu1125 merged commit 5c3dad2 into main Jun 23, 2025

Labels

feature New feature request


5 participants