Fix:Support Parallelism in vllm runtime #3464

gavrishp · 2024-02-21T05:26:04Z (edited by yuzisun)

What this PR does / why we need it:

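Changes to support tensor parallelism in the vLLM runtime: when vLLM is used on a CUDA device and the model architecture is supported, `tensor_parallel_size` is set to the number of available GPUs (`torch.cuda.device_count()`) so the model is sharded across them.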

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Test A
Test B
Logs

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

Have you added unit/e2e tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

Release note:

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

terrytangyuan

Should this be part of #3459?

yuzisun · 2024-02-24T10:41:02Z

python/huggingfaceserver/huggingfaceserver/model.py

@@ -105,6 +105,7 @@ def load(self) -> bool:
            # TODO Read the mapping file, index to object name
        if self.use_vllm and self.device == torch.device("cuda"):   # vllm needs gpu
            if self.infer_vllm_supported_from_model_architecture(model_id_or_path):
+                self.vllm_engine_args.tensor_parallel_size = torch.cuda.device_count()


Set it to the device count only when `tensor_parallel_size` is not passed in.
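The reviewer's suggestion (use the GPU count only as a fallback) can be sketched in Python as below. This is not the merged code: `EngineArgsSketch`, `parse_engine_args`, and `resolve_tensor_parallel_size` are hypothetical names, the CLI flag only mirrors the name of vLLM's `--tensor-parallel-size`, and the GPU count is hard-coded where the server would call `torch.cuda.device_count()`. The sketch assumes a `None` default is how "not passed in" is detected, since a default of 1 would make an explicit `1` indistinguishable from an omitted flag.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EngineArgsSketch:
    """Hypothetical stand-in for the vLLM engine args object; only the
    field discussed in the review is modelled."""

    # None means the user did not pass --tensor-parallel-size.
    tensor_parallel_size: Optional[int] = None


def parse_engine_args(argv: List[str]) -> EngineArgsSketch:
    """Parse a hypothetical CLI; default=None separates 'omitted' from an explicit 1."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tensor-parallel-size", type=int, default=None)
    ns = parser.parse_args(argv)
    return EngineArgsSketch(tensor_parallel_size=ns.tensor_parallel_size)


def resolve_tensor_parallel_size(requested: Optional[int], device_count: int) -> int:
    """Keep an explicit value; otherwise use one shard per visible GPU."""
    if requested is not None:
        if requested < 1:
            raise ValueError("tensor_parallel_size must be >= 1")
        return requested
    # Not passed in: fall back to the GPU count, never below 1.
    return max(device_count, 1)


if __name__ == "__main__":
    # In the server this would come from torch.cuda.device_count(); fixed here.
    gpus = 4

    auto = parse_engine_args([])
    auto.tensor_parallel_size = resolve_tensor_parallel_size(auto.tensor_parallel_size, gpus)
    print(auto.tensor_parallel_size)  # 4

    explicit = parse_engine_args(["--tensor-parallel-size", "2"])
    explicit.tensor_parallel_size = resolve_tensor_parallel_size(explicit.tensor_parallel_size, gpus)
    print(explicit.tensor_parallel_size)  # 2
```

With the flag omitted the sketch falls back to one shard per GPU; with an explicit value it leaves the user's choice untouched, which is the behavior the review asks for.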

yuzisun · 2024-02-26T18:56:36Z

/lgtm
/approve

oss-prow-bot · 2024-02-26T18:59:10Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

Needs approval from an approver in each of these files:

~~OWNERS~~ [yuzisun]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* set tensor parallel to num of devices
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
* fix lint
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
---------
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Add support for pluggable explainer runtimes
  Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>
* Fix azure workload identity federation by excluding azure client secret (kserve#3390)
  * Fix azure workload identity federation by excluding azure client secret
    Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
  * comment code
    Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
  ---------
  Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
  Co-authored-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
* Revert a wrong commit for self.ready
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>
* Publish v0.12.0 release (kserve#3458)
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Change `certificate` to `ca_bundle` in json style of s3 storageSecret (kserve#3463)
  * change certificate field for s3 storageSecret in python
    Signed-off-by: jooho <jlee@redhat.com>
  * change the field name to ca_bundle that is the same pattern with aws
    Signed-off-by: jooho <jlee@redhat.com>
  ---------
  Signed-off-by: jooho <jlee@redhat.com>
* fix: Add 'model_version' to InferResponse in python library (kserve#3466)
  * Add model_version to InferResponse class
    Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>
  * Update infer_type tests
    Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>
  * Updated from_grpc class method argument ordering
    Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>
  ---------
  Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>
* Fix:Support Parallelism in vllm runtime (kserve#3464)
  * set tensor parallel to num of devices
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fix lint
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  ---------
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
* Enhance CI environment (kserve#3440)
  * Increase workers to 4
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * merge fast and slow
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Increase workers to 6 and merge explainer, transformer test
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Separate graph tests
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Change artifact upload/download action
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Use docker-container driver
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Update all e2e tests
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Remove knative installation in raw tests
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Fix raw logger test
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Wait for model to be ready in mlserver e2e tests
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Cleanup markers
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Clean up workflow and upgrade actions
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Use worksteal dist type for pytest
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * fix typo
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Prettify pod log printing in status check
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Add inference graph pod logs for debugging in status check
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * Reorder Go docker files for better layer caching
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * rebase master
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  * revert removal of time.sleep
    Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
  ---------
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Fixed go lint error using golangci-lint tool. (kserve#3378)
  * Fixed go lint error using golangci-lint tool.
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * go lint error fix
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * golint error fix for math/random
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * fix go tests and go-lint
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * commit for code review comments
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * fix for golint errcheck
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * gosec error fix
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  ---------
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* Add support for pluggable explainer runtimes
  Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>
* Add support for runtimes for explainers
* Add Deeploy explainability sample
* Add predictor_host arg to Deeploy Shap ClusterServingRuntime
* Improvements after peer review
* Move new runtime to correct place in codebase
---------
Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>
Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: jooho <jlee@redhat.com>
Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Co-authored-by: Robbert van der Gugten <robbertvdg@gmail.com>
Co-authored-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Sivanantham <90966311+sivanantha321@users.noreply.github.com>
Co-authored-by: Jooho Lee <jlee@redhat.com>
Co-authored-by: Adam Stewart <ajstewart@users.noreply.github.com>
Co-authored-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Co-authored-by: Andrews Arokiam <87992092+andyi2it@users.noreply.github.com>

gavrishp added 3 commits February 20, 2024 17:53

set tensor parallel to num of devices

b16b524

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

Merge branch 'master' into vllm_distributed

a47d943

fix lint

f6fd442

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

oss-prow-bot requested review from Iamlovingit and theofpa February 21, 2024 05:26

terrytangyuan reviewed Feb 21, 2024

yuzisun reviewed Feb 24, 2024

oss-prow-bot assigned yuzisun Feb 26, 2024

oss-prow-bot added the lgtm label Feb 26, 2024

oss-prow-bot added the approved label Feb 26, 2024

oss-prow-bot merged commit 3b15ef5 into kserve:master Feb 26, 2024
60 checks passed
