[WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache. #75

Draft
wants to merge 13 commits into
base: main
from

Conversation

yzh119
Collaborator

@yzh119 yzh119 commented Jan 19, 2024

Before this PR, FlashInfer supported KV sequence parallelism for single decode/prefill and batch decode, but not for batch prefill. The feature is just as important for the batch prefill kernels, so this PR implements KV partition for them (on both Paged & Ragged KV-Cache).
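For readers unfamiliar with the idea, here is a minimal pure-Python sketch of KV partitioning for a single query. It is an illustration only, not the code in this PR: the names `attend`, `merge`, and `attend_partitioned` are hypothetical, the 1/sqrt(d) scaling and masking are omitted, and the real kernels are CUDA. The sketch assumes each KV chunk produces a partial output plus the log-sum-exp (LSE) of its scores, and that partial results are combined by LSE-weighted averaging.

```python
import math

def attend(q, keys, values):
    """Softmax attention of one query over one KV chunk.

    Returns (output, lse), where lse is the log-sum-exp of the scores.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def merge(state_a, state_b):
    """Combine two partial attention states computed on disjoint KV chunks."""
    out_a, lse_a = state_a
    out_b, lse_b = state_b
    m = max(lse_a, lse_b)
    wa = math.exp(lse_a - m)
    wb = math.exp(lse_b - m)
    total = wa + wb
    out = [(wa * a + wb * b) / total for a, b in zip(out_a, out_b)]
    return out, m + math.log(total)

def attend_partitioned(q, keys, values, chunk_size):
    """Split the KV sequence into chunks, attend to each, then merge."""
    state = None
    for start in range(0, len(keys), chunk_size):
        part = attend(q, keys[start:start + chunk_size],
                      values[start:start + chunk_size])
        state = part if state is None else merge(state, part)
    return state
```

Because the merge is associative, the chunks can in principle be processed by independent thread blocks and reduced afterwards, which is what makes partitioning along the KV dimension useful when there are few queries but long KV sequences.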

@yzh119 yzh119 mentioned this pull request Feb 27, 2024
3 tasks
@AgrawalAmey

AgrawalAmey commented Mar 14, 2024

@yzh119 is this PR good to use? This would be extremely useful for some of my work.

@yzh119
Collaborator Author

yzh119 commented Mar 16, 2024

@AgrawalAmey We have done a large amount of refactoring since the last commit of this PR, so I need to rebase and add some new commits. Please stay tuned :)

@AgrawalAmey

@yzh119 looking forward to it! I would be happy to help accelerate this. Please let me know if I can help in any way.

@ZSL98

ZSL98 commented Apr 3, 2024

Looking forward to it!!


3 participants