
Implement sparse SGMV #64

Merged
merged 12 commits into from
Nov 28, 2023

Conversation

Contributor

@tgaddair tgaddair commented Nov 27, 2023

One of the major limitations of the original SGMV kernel is that it can only be applied to a batch consisting entirely of adapters of the same rank. This meant that in cases where ranks differed within a batch, we needed to fall back to the loop-and-mask approach. This was particularly problematic for batches that mix base model and adapter requests, as in a production setting a significant portion of requests will frequently target the base model.

One workaround would be to apply a zero-weight matrix as a stand-in for the LoRA weights for rows corresponding to the base model or to adapters of a different rank, but this means we need to allocate additional matrices for every rank, which can add up.

To work around this, we extend the existing SGMV kernels to support a sparse list of segments (meaning that not every segment in the batch has the SGMV operation applied to it). This allows us to avoid applying anything to the base model rows and to handle batches containing mixed-rank adapters.
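As an illustration of the idea (not the actual kernel interface, which lives in CUDA), the sparse segment list can be thought of as filtering the batch's contiguous segments down to only those with an adapter attached. In this hypothetical sketch, a segment tagged with adapter id -1 belongs to the base model and is simply skipped, so no zero-weight stand-in matrices are needed:

```python
# Illustrative sketch only: the batch is laid out as contiguous row segments,
# each tagged with an adapter id (-1 = base model). Only tagged segments are
# passed to the SGMV kernel; base-model rows receive no LoRA computation.

def sparse_segments(segment_starts, segment_ends, adapter_ids):
    """Return (starts, ends, ids) for segments that have a LoRA adapter.

    Segments tagged -1 (base model) are dropped entirely, which is what makes
    the segment list "sparse" relative to the full batch.
    """
    starts, ends, ids = [], [], []
    for s, e, a in zip(segment_starts, segment_ends, adapter_ids):
        if a >= 0:  # skip base-model segments
            starts.append(s)
            ends.append(e)
            ids.append(a)
    return starts, ends, ids

# Batch of 8 rows in 4 segments; segments 1 and 3 go to the base model.
print(sparse_segments([0, 2, 4, 6], [2, 4, 6, 8], [0, -1, 1, -1]))
# ([0, 4], [2, 6], [0, 1])
```

The kernel then only ever sees rows that genuinely need the LoRA matmul, rather than masking or zeroing the rest.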

In cases where adapters in a batch have mixed ranks, we process each rank in turn, so the number of SGMV operations becomes O(R), where R is the number of distinct ranks in the batch. This is a significant improvement over the O(A) cost of the loop implementation, where A is the number of adapters.
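A hedged sketch of the per-rank scheduling described above (the names `group_segments_by_rank` and the rank map are hypothetical, not the PR's actual API): segments are bucketed by their adapter's rank, and one SGMV call would then cover each bucket, so the number of launches scales with distinct ranks rather than with adapters.

```python
from collections import defaultdict

# Illustrative sketch: group adapter-tagged segments by LoRA rank so a single
# (hypothetical) SGMV launch can handle all same-rank segments at once,
# giving O(R) launches instead of an O(A) per-adapter loop.

def group_segments_by_rank(adapter_ids, adapter_ranks):
    """Map each distinct rank to the segment indices that use that rank.

    adapter_ids: per-segment adapter id, -1 for base-model segments.
    adapter_ranks: mapping from adapter id to its LoRA rank.
    """
    by_rank = defaultdict(list)
    for seg_idx, a in enumerate(adapter_ids):
        if a >= 0:  # base-model segments get no SGMV call at all
            by_rank[adapter_ranks[a]].append(seg_idx)
    return dict(by_rank)

# Four adapters with ranks 8, 8, 16, 8 across five segments (one base-model).
groups = group_segments_by_rank([0, 1, -1, 2, 3], {0: 8, 1: 8, 2: 16, 3: 8})
print(groups)       # {8: [0, 1, 4], 16: [3]}
print(len(groups))  # 2 launches for 4 adapters
```

Here four adapters collapse into two launches because only two distinct ranks appear, which is the O(R) vs. O(A) distinction the comment draws.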

With this change, the only case where SGMV is not being used is with tensor parallelism, which we'll address in a follow-up PR shortly.

@tgaddair tgaddair merged commit 54fafb9 into main Nov 28, 2023
1 check passed
@tgaddair tgaddair deleted the sparse-sgmv branch November 28, 2023 17:40