One of the major limitations of the original SGMV kernel is that it can only be applied to a batch consisting entirely of adapters of the same rank. This meant that when ranks differed within a batch, we needed to fall back to the loop-and-mask approach. This was particularly problematic for batches that mix base model requests with adapter requests, since in a production setting a significant portion of requests will frequently target the base model.
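For context, here is a minimal PyTorch sketch of that loop-and-mask fallback. All names (`lora_loop`, `adapter_ids`, and so on) are illustrative, not the actual implementation:

```python
import torch

def lora_loop(y, x, lora_a, lora_b, adapter_ids):
    # Naive fallback: one pair of matmuls per adapter, with a boolean
    # mask selecting the rows of the batch that belong to that adapter.
    # Base model rows carry an id (e.g. -1) that matches no adapter.
    # Cost grows with the number of distinct adapters: O(A).
    for adapter_id, (a, b) in enumerate(zip(lora_a, lora_b)):
        mask = adapter_ids == adapter_id
        if mask.any():
            y[mask] += x[mask] @ a @ b
    return y
```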
One workaround would be to apply a zero-weight matrix as a stand-in for the LoRA weights for rows corresponding to the base model or to adapters of a different rank, but this requires allocating additional matrices for every rank, which adds up.
To work around this, we extend the existing SGMV kernels to support a sparse list of segments (meaning that not every segment in the batch has the SGMV operation applied to it). This allows us to avoid applying anything to the base model rows, and to handle batches that contain mixed-rank adapters.
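Roughly, the sparse-segment semantics look like the following reference sketch. This is pure PyTorch standing in for the CUDA kernel, and representing `segments` as `(start, end)` row ranges is an assumption about the layout:

```python
import torch

def sgmv_sparse_reference(y, x, weights_a, weights_b, segments):
    # `segments` lists only the contiguous row ranges that have a LoRA
    # adapter attached. Rows belonging to the base model appear in no
    # segment, so y is simply left untouched for them: no zero-weight
    # stand-in matrices are needed.
    for (start, end), a, b in zip(segments, weights_a, weights_b):
        y[start:end] += x[start:end] @ a @ b
    return y
```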
In cases where adapters in a batch have mixed ranks, we process each rank in turn, so the number of SGMV operations becomes O(R), where R is the number of distinct ranks in the batch. This is a significant improvement over the O(A) cost of the loop implementation, where A is the number of adapters.
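A sketch of how that per-rank dispatch might look, assuming segments can be bucketed by the inner dimension of their `lora_a` weights. `apply_by_rank` and `sgmv_kernel` are hypothetical names, not the real API; `sgmv_kernel` could be the reference function from the sketch above:

```python
from collections import defaultdict

def apply_by_rank(y, x, weights_a, weights_b, segments, sgmv_kernel):
    # Bucket adapter segments by rank so each SGMV launch sees a
    # homogeneous-rank batch: O(R) launches for R distinct ranks,
    # instead of O(A) matmul pairs for A adapters in the loop fallback.
    buckets = defaultdict(list)
    for i, a in enumerate(weights_a):
        buckets[a.shape[1]].append(i)  # rank = inner dim of lora_a
    for rank, idxs in buckets.items():
        sgmv_kernel(
            y, x,
            [weights_a[i] for i in idxs],
            [weights_b[i] for i in idxs],
            [segments[i] for i in idxs],
        )
    return y
```

Since production batches tend to contain only a handful of distinct ranks, this keeps the launch count small even when the number of adapters is large.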
With this change, the only remaining case where SGMV is not used is tensor parallelism, which we'll address in a follow-up PR shortly.