
Implement support for vllm as alternative backend #3415

Merged
18 commits merged into kserve:master on Feb 18, 2024

Conversation

@gavrishp (Contributor) commented Feb 7, 2024

What this PR does / why we need it:
Support vLLM as an alternative backend for text-generation use cases.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Tests have not been added because vLLM requires a GPU driver as a prerequisite.

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


@gavrishp gavrishp changed the title Initial commit to support vllm as alternative backend [WIP] Initial commit to support vllm as alternative backend Feb 7, 2024
@gavrishp gavrishp changed the title [WIP] Initial commit to support vllm as alternative backend [WIP] Implement support for vllm as alternative backend Feb 7, 2024
@gavrishp gavrishp changed the title [WIP] Implement support for vllm as alternative backend Implement support for vllm as alternative backend Feb 7, 2024
@terrytangyuan (Member) left a comment

Should this be in draft?

parser.add_argument('--tensor_input_names', type=list_of_strings, default=None,
help='the tensor input names passed to the model')
parser.add_argument('--task', required=False, help="The ML task name")
parser.add_argument('--disable_vllm', action='store_true', help="Do not use vllm as the default runtime")
Member:

Missing default here

Contributor Author:

This is an on/off flag argument, not an option that takes a value.

Member:

Why is the default true? I think we should default to false unless the user explicitly wants to turn it off and let it fall back to the huggingface API.

Contributor Author:

action='store_true' sets the boolean value when the flag is present. In this case, if --disable_vllm is passed, args.disable_vllm will be True; in all other cases it is False.

https://docs.python.org/3/library/argparse.html#action

The intended behaviour is that vLLM is skipped only when --disable_vllm is set explicitly. If the flag is not set, vLLM is used, with the huggingface API as the fallback.
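
A minimal, standalone sketch (not the PR's actual parser) of the store_true behaviour described above:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--disable_vllm', action='store_true',
                    help='Do not use vllm as the default runtime')

print(parser.parse_args([]).disable_vllm)                  # False: flag omitted, vLLM stays enabled
print(parser.parse_args(['--disable_vllm']).disable_vllm)  # True: flag present, vLLM is turned off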

Comment on lines 60 to 61
if args.model != args.model_id: # vllm sets default model
args.model = args.model_id
Member:

We can just assign directly to model_id without checking
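
For illustration, a sketch of the suggested simplification (not the merged code):

# current: conditional copy
if args.model != args.model_id:  # vllm sets default model
    args.model = args.model_id

# suggested: an unconditional assignment has the same effect and is simpler
args.model = args.model_id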

python/huggingfaceserver/huggingfaceserver/model.py (outdated review thread, resolved)
Comment on lines +38 to +40
parser.add_argument('--disable_lower_case', action='store_true',
help='do not use lower case for the tokenizer')
parser.add_argument('--disable_special_tokens', action='store_true',
Member:

Is there a reference for why these should be disabled by default?

@gavrishp (Contributor Author) commented Feb 12, 2024:

I noticed that in recent Python versions parser.add_argument(type=bool, ...) does not work as expected: argparse applies bool() to the raw string value, so any non-empty string (including 'False') parses as True.

https://docs.python.org/3/library/argparse.html#type
https://stackoverflow.com/questions/60999816/argparse-not-parsing-boolean-arguments
https://docs.python.org/3/library/argparse.html#action

To support this I added the same functionality by inverting the flags, since the expectation is that add_special_tokens is True by default. Hence, unless --disable_special_tokens is set, add_special_tokens will be True. The same applies to vllm and lower_case.
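
A small sketch contrasting the two approaches (the --add_special_tokens option with type=bool is hypothetical, used only to show the problem; the PR itself adds the inverted flag):

import argparse

parser = argparse.ArgumentParser()
# type=bool applies bool() to the raw string, and bool('False') is True,
# so this option can never actually be turned off from the command line.
parser.add_argument('--add_special_tokens', type=bool, default=True)
# Inverted flag: the positive behaviour stays on unless explicitly disabled.
parser.add_argument('--disable_special_tokens', action='store_true')

args = parser.parse_args(['--add_special_tokens', 'False'])
print(args.add_special_tokens)           # True, even though 'False' was passed
print(not args.disable_special_tokens)   # True by default

args = parser.parse_args(['--disable_special_tokens'])
print(not args.disable_special_tokens)   # False once the disable flag is set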

return model_cls

def load(self, engine_args=None) -> bool:
if self.use_vllm and self.device == torch.device("cuda"): # vllm needs gpu
Member:

Do we need use_vllm if we can determine the list of model architectures vLLM can support?

@gavrishp (Contributor Author) commented Feb 12, 2024:

I went this route to decide whether to use vLLM instead of relying on the global variable _vllm:
https://github.com/kserve/kserve/pull/3415/files#diff-d02c376ad82bad8a11ceaecb4ea815822123e34b961f29fd5050225af82e62b4R80
self.use_vllm = not kwargs.get('disable_vllm', False) if _vllm else False

This is a pre-check for the case where someone explicitly disables vLLM or does not have the vLLM package installed locally.
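
A sketch of how these pieces fit together (an assumed simplification; the class name and helper methods are illustrative, not the PR's exact code):

import torch

try:
    import vllm  # noqa: F401
    _vllm = True
except ImportError:
    _vllm = False

class HuggingfaceModel:  # illustrative name
    def __init__(self, device: torch.device, **kwargs):
        self.device = device
        # use vLLM only if the package is importable and the user did not opt out
        self.use_vllm = not kwargs.get('disable_vllm', False) if _vllm else False

    def load(self) -> bool:
        if self.use_vllm and self.device == torch.device("cuda"):  # vllm needs gpu
            return self._load_vllm_engine()        # hypothetical vLLM path
        return self._load_huggingface_pipeline()   # hypothetical Hugging Face fallback

    def _load_vllm_engine(self) -> bool:
        ...  # hypothetical: construct the vLLM engine here
        return True

    def _load_huggingface_pipeline(self) -> bool:
        ...  # hypothetical: load tokenizer/model via transformers here
        return True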

@terrytangyuan (Member):

HuggingFace out-of-box support has been added in case you'd like to test it together. @gavrishp #3395

@gavrishp (Contributor Author):

HuggingFace out-of-box support has been added in case you'd like to test it together. @gavrishp #3395

That's great. I'll give it a try

@@ -92,6 +93,7 @@ spec:
- --model_id=bert-base-uncased
- --predictor_protocol=v2
- --tensor_input_names=input_ids
- --disable_vllm
@yuzisun (Member) commented Feb 15, 2024:

I think we probably do not want to confuse users into thinking they have to set this flag to get BERT to work. A better example would show vllm as the default, with a separate example showing how it can be disabled.

Contributor Author:

I have added a disable vllm example

Member:

I was just trying out the new huggingface runtime and noticed this. It feels a bit weird to have vllm as the default runtime (the PR describes vllm as an alternative backend).

How about --backend=vllm, keeping huggingface as the default and vllm as an alternative? That would also let us support other backends such as TensorRT in the future (see the sketch below).

What do you think? @yuzisun
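
A sketch of what such a --backend option could look like (purely illustrative, not part of this PR):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--backend', choices=['huggingface', 'vllm'], default='huggingface',
                    help='runtime backend to use for text generation')
print(parser.parse_args([]).backend)  # 'huggingface' unless --backend=vllm is passed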

@yuzisun (Member) commented Feb 18, 2024

Great work on this @gavrishp !

/lgtm
/approve

oss-prow-bot (bot) commented Feb 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@oss-prow-bot oss-prow-bot bot merged commit 1433e95 into kserve:master Feb 18, 2024
60 checks passed
TimKleinloog pushed a commit to TimKleinloog/kserve that referenced this pull request Feb 19, 2024

* Initial commit to support vllm as alternative backend
* include minor fixes and readme changes
* fix poetry lock issues
* fix lint issues
* use_vllm support True as default
* refactor code and fix review comments
* build failure - fix tests and install vllm part of dockerfile
* fix poetry lock issue
* include string constants
* linting fix
* fix review comments
* fix tests
* fix review comments
* add support in vllm for locally downloaded models
* Update Readme
* Update Readme
* Update python/huggingfaceserver/README.md

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>
TimKleinloog pushed a commit to TimKleinloog/kserve that referenced this pull request Feb 20, 2024 (same commit message as above)
timothyjlaurent pushed a commit to timothyjlaurent/kserve that referenced this pull request Feb 21, 2024 (same commit message as above)