Fix tensor parallelism with SGMV to use true rank of the LoRA after splitting #324

tgaddair · 2024-03-13T05:24:10Z

Fixes #308.

The previous implementation assumed that the rank set in the config was correct and that we only needed to scale it by the process group size. However, this is not always the case: it depends on how the weights are split for tensor parallelism (column- vs. row-parallel). We should therefore ignore the config and rely on the true rank dimension of the tensor, independent of the process group size.
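
To illustrate the difference, here is a minimal sketch that works on plain shape tuples rather than real tensors. The helper names `assumed_rank` and `true_rank` are hypothetical and not part of the codebase, and "scaling by process group size" is modeled here as integer division, which is an assumption. The rank axes (dim 1 for A, dim 0 for B) follow the `pad_rank` calls in the diff.

```python
def assumed_rank(config_r: int, world_size: int) -> int:
    # Hypothetical model of the old behavior: trust the config rank and
    # scale it by the process group size (assumed to mean division).
    return config_r // world_size


def true_rank(lora_a_shape: tuple, lora_b_shape: tuple) -> int:
    # Read the rank directly off the local shards: dim 1 of A and dim 0 of B,
    # matching the dims passed to pad_rank in the diff.
    rank_a = lora_a_shape[1]
    rank_b = lora_b_shape[0]
    if rank_a != rank_b:
        raise ValueError(f"rank mismatch between A ({rank_a}) and B ({rank_b})")
    return rank_a


config_r, world_size = 16, 4

# Case 1: the shard was split along the non-rank dimension, so each process
# still holds the full rank. The config-based guess is wrong here.
print(assumed_rank(config_r, world_size))   # 4
print(true_rank((8192, 16), (16, 2048)))    # 16

# Case 2: the shard was split along the rank dimension, so both agree.
print(true_rank((8192, 4), (4, 8192)))      # 4
```

Reading the rank from the shard shape gives the right answer in both cases, which is why it does not need to know how the split was performed.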

jeffreyftang · 2024-03-13T16:32:18Z

server/lorax_server/models/model.py

@@ -261,9 +261,9 @@ def load_batched_adapter_weights(
        lora_a_list = [pad_rank(w, dim=1, world_size=self.world_size) for w in lora_a_list]
        lora_b_list = [pad_rank(w, dim=0, world_size=self.world_size) for w in lora_b_list]

-        if lora_b_list:
+        if lora_a_list:


Curious what the significance of swapping from lora_b_list to lora_a_list here is.

It is the same, actually.
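
A minimal sketch of why the two checks would agree, assuming (the diff does not show this) that the two lists are filled in lockstep with one A and one B matrix per layer. The function `build_lists` is hypothetical and only stands in for that loading loop.

```python
def build_lists(layers):
    # Hypothetical stand-in for the loading loop: every layer contributes
    # exactly one entry to each list, so the lists always have equal length.
    lora_a_list, lora_b_list = [], []
    for lora_a, lora_b in layers:
        lora_a_list.append(lora_a)
        lora_b_list.append(lora_b)
    return lora_a_list, lora_b_list


for layers in ([], [("a0", "b0")], [("a0", "b0"), ("a1", "b1")]):
    lora_a_list, lora_b_list = build_lists(layers)
    # Both lists are empty together or non-empty together, so
    # `if lora_a_list:` and `if lora_b_list:` take the same branch.
    assert bool(lora_a_list) == bool(lora_b_list)
```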

tgaddair added 3 commits March 12, 2024 21:40

WIP: fix tensor parallel issue

50395a3

Fixed

41db5b0

Removed debug code

d3618de

tgaddair marked this pull request as ready for review March 13, 2024 05:29

tgaddair mentioned this pull request Mar 13, 2024

SGMV not working for llama 70b #308

Closed

4 tasks

Added tests

16ea686

tgaddair requested review from jeffreyftang and magdyksaleh March 13, 2024 16:06

jeffreyftang approved these changes Mar 13, 2024

tgaddair added 3 commits March 13, 2024 09:39

Mock

da8839f

Reroder

a8db9f1

Fix mock

5bf1599

tgaddair merged commit 49f3f53 into main Mar 14, 2024
1 check passed

tgaddair deleted the fix-308 branch March 14, 2024 04:24

tgaddair mentioned this pull request Mar 14, 2024

performance issue #323

Open
