
Fix: Support model parallelism in HF transformer #3459

Merged
merged 11 commits into from
Mar 17, 2024

Conversation

gavrishp
Contributor

What this PR does / why we need it:

Includes device_map changes, setting it to "auto" to enable model parallelism and prevent OOM errors caused by trying to fit the model onto a single device.
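A minimal sketch of the selection logic this PR adds (attribute and function names here are illustrative, not the exact KServe code): models that declare `_no_split_modules` can be safely sharded by accelerate, so `device_map="auto"` is passed; otherwise a single default device is kept.

```python
def resolve_device_map(model, default_device="cuda"):
    """Pick the device_map to hand to from_pretrained (sketch)."""
    # Models that define _no_split_modules tell accelerate which blocks
    # must stay on one device, so layer-wise sharding across GPUs is safe.
    if getattr(model, "_no_split_modules", None):
        return "auto"  # let accelerate spread layers across devices
    return default_device  # fall back to a single device


# Hypothetical stand-ins for loaded HF models:
class ShardableModel:
    _no_split_modules = ["LlamaDecoderLayer"]


class PlainModel:
    _no_split_modules = None
```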

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Tested with large models such as Llama 2 70B.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Comment on lines +123 to +124
if self.model._no_split_modules:
    self.device_map = "auto"
Member


should we call infer_auto_device_map here?


Contributor Author


I think functionality-wise it's the same.
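For context on why the two options are equivalent here: passing `device_map="auto"` makes accelerate compute the placement internally, while calling `infer_auto_device_map` explicitly is only needed to customize memory budgets. The toy function below sketches, with made-up names and numbers, the greedy placement idea behind accelerate's `infer_auto_device_map`; it is not accelerate's actual API.

```python
def toy_infer_device_map(layer_sizes, max_memory):
    """Greedy sketch of auto device mapping.

    layer_sizes: {layer_name: size_in_bytes}, in model order.
    max_memory:  {device_id: capacity_in_bytes}, in fill order.
    """
    device_map = {}
    devices = list(max_memory.items())
    idx = 0
    free = devices[0][1]
    for name, size in layer_sizes.items():
        # Move to the next device once the current one is full.
        while size > free:
            idx += 1
            if idx >= len(devices):
                raise RuntimeError("model does not fit in the given budget")
            free = devices[idx][1]
        device_map[name] = devices[idx][0]
        free -= size
    return device_map
```

The real `infer_auto_device_map` also honors `no_split_module_classes` (fed from `_no_split_modules`) so that indivisible blocks are never split across devices.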

@yuzisun
Copy link
Member

yuzisun commented Mar 17, 2024

/lgtm
/approve


oss-prow-bot bot commented Mar 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@oss-prow-bot oss-prow-bot bot merged commit aa1aa24 into kserve:master Mar 17, 2024
60 checks passed
tjandy98 pushed a commit to tjandy98/kserve that referenced this pull request Apr 10, 2024
* Support model parallelism in HF transformer
* Support models that don't support split
* fix padding
* fix defaults
* set cuda as default
* set cuda as default
* fix lint
* update automodel
* fix review comment
* update review comment
* update comment

---------

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>

4 participants