Fix:Support Parallelism in vllm runtime #3464

Merged: 3 commits, Feb 26, 2024

@gavrishp gavrishp commented Feb 21, 2024

What this PR does / why we need it:

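Changes to support tensor parallelism in the vLLM runtime: when vLLM is used on a CUDA device, `tensor_parallel_size` is set to `torch.cuda.device_count()`.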

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

@terrytangyuan terrytangyuan left a comment

Should this be part of #3459?

@@ -105,6 +105,7 @@ def load(self) -> bool:
        # TODO Read the mapping file, index to object name
        if self.use_vllm and self.device == torch.device("cuda"):  # vllm needs gpu
            if self.infer_vllm_supported_from_model_architecture(model_id_or_path):
                self.vllm_engine_args.tensor_parallel_size = torch.cuda.device_count()

Set `tensor_parallel_size` to the device count only when it is not passed in.
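
One way to read this suggestion, as a minimal sketch and not the code that was merged: treat an unset value as `None` and fall back to the GPU count only in that case. The helper name `resolve_tensor_parallel_size` and the `None` sentinel are assumptions made for illustration. In the runtime, `device_count` would presumably come from `torch.cuda.device_count()`.

```python
from typing import Optional


def resolve_tensor_parallel_size(requested: Optional[int], device_count: int) -> int:
    """Pick the tensor-parallel degree (hypothetical helper, not part of kserve).

    requested: value supplied by the user, or None if nothing was passed in.
    device_count: number of visible GPUs, e.g. torch.cuda.device_count().
    """
    if requested is not None:
        # An explicit user value always wins over the detected device count.
        if requested < 1:
            raise ValueError("tensor_parallel_size must be >= 1")
        return requested
    # Nothing was passed in: default to all visible devices, never below 1.
    return max(device_count, 1)


# Hypothetical usage inside load():
#   args.tensor_parallel_size = resolve_tensor_parallel_size(
#       user_value, torch.cuda.device_count()
#   )
```

Whether "not passed in" can be detected this way depends on how the engine arguments are parsed. If the field has a non-`None` default, an explicit value equal to that default cannot be told apart from an omitted one.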

yuzisun commented Feb 26, 2024

/lgtm
/approve

oss-prow-bot bot commented Feb 26, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@oss-prow-bot oss-prow-bot bot merged commit 3b15ef5 into kserve:master Feb 26, 2024
60 checks passed
TimKleinloog pushed a commit to TimKleinloog/kserve that referenced this pull request Feb 28, 2024
* set tensoor parallel to num of devices

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* fix lint

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

---------

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
TimKleinloog added a commit to TimKleinloog/kserve that referenced this pull request Apr 2, 2024
* Add support for pluggable explainer runtimes

Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>

* Fix azure workload identity federation by excluding azure client secret (kserve#3390)

* Fix azure workload identity federation by excluding azure client secret

Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>

* comment code

Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>

---------

Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
Co-authored-by: Robbert van der Gugten <rvandergugten@deeploy.ml>

* Revert a wrong commit for self.ready

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Publish v0.12.0 release (kserve#3458)

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Change `certificate` to `ca_bundle` in json style of s3 storageSecret  (kserve#3463)

* change certificate field for s3 storageSecret in python

Signed-off-by: jooho <jlee@redhat.com>

* change the field name to ca_bundle that is the same pattern with aws

Signed-off-by: jooho <jlee@redhat.com>

---------

Signed-off-by: jooho <jlee@redhat.com>

* fix: Add 'model_version' to InferResponse in python library (kserve#3466)

* Add model_version to InferResponse class

Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>

* Update infer_type tests

Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>

* Updated from_grpc class method argument ordering

Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>

---------

Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>

* Fix:Support Parallelism in vllm runtime (kserve#3464)

* set tensoor parallel to num of devices

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* fix lint

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

---------

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Enhance CI environment (kserve#3440)

* Increase workers to 4

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* merge fast and slow

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Increase workers to 6 and merge explainer, transformer test

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Separate graph tests

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Change artifact upload/download action

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Use docker-container driver

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Update all e2e tests

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Remove knative installation in raw tests

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Fix raw logger test

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Wait for model to be ready in mlserver e2e tests

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Cleanup markers

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Clean up workflow and upgrade actions

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Use worksteal dist type for pytest

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* fix typo

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Prettify pod log printing in status check

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Add inference graph pod logs for debugging in status check

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Reorder Go docker files for better layer caching

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* rebase master

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* revert removal of time.sleep

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

---------

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Fixed go lint error using golangci-lint tool. (kserve#3378)

* Fixed go lint error using golangci-lint tool.

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* go lint error fix

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* golint error fix for math/random

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* fix go tests and go-lint

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* commit for code review comments

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* fix for golint errcheck

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* gosec error fix

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

---------

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* Add support for pluggable explainer runtimes

Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>

* Add support for runtimes for explainers

* Add Deeploy explainability sample

* Add predictor_host arg to Deeploy Shap ClusterServingRuntime

* Improvements after peer review

* Move new runtime to correct place in codebase

---------

Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>
Signed-off-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: jooho <jlee@redhat.com>
Signed-off-by: Adam Stewart <adam.stewart73@gmail.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Co-authored-by: Robbert van der Gugten <robbertvdg@gmail.com>
Co-authored-by: Robbert van der Gugten <rvandergugten@deeploy.ml>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Sivanantham <90966311+sivanantha321@users.noreply.github.com>
Co-authored-by: Jooho Lee <jlee@redhat.com>
Co-authored-by: Adam Stewart <ajstewart@users.noreply.github.com>
Co-authored-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Co-authored-by: Andrews Arokiam <87992092+andyi2it@users.noreply.github.com>