
Implement support for vllm as alternative backend #3415

Merged
18 commits merged into kserve:master on Feb 18, 2024

Conversation

@gavrishp (Contributor) commented Feb 7, 2024

What this PR does / why we need it:
Support vLLM as an alternative backend for text-generation use cases.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Tests have not been added because vLLM requires a GPU driver as a prerequisite.

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


@gavrishp gavrishp changed the title Initial commit to support vllm as alternative backend [WIP] Initial commit to support vllm as alternative backend Feb 7, 2024
@gavrishp gavrishp changed the title [WIP] Initial commit to support vllm as alternative backend [WIP] Implement support for vllm as alternative backend Feb 7, 2024
@gavrishp gavrishp changed the title [WIP] Implement support for vllm as alternative backend Implement support for vllm as alternative backend Feb 7, 2024
@terrytangyuan (Member) left a comment

Should this be in draft?

parser.add_argument('--tensor_input_names', type=list_of_strings, default=None,
help='the tensor input names passed to the model')
parser.add_argument('--task', required=False, help="The ML task name")
parser.add_argument('--disable_vllm', action='store_true', help="Do not use vllm as the default runtime")
Member:

Missing default here

Contributor Author:

This is an on/off flag argument, not an option that takes a value.

Member:

Why is the default true? I think we should default to false unless the user explicitly wants to turn it off and let it fall back to the huggingface API.

Contributor Author:

action='store_true' sets the boolean value when the flag is present. In this case, if --disable_vllm is passed, args.disable_vllm will be True; in all other cases it is False.

https://docs.python.org/3/library/argparse.html#action

The intended behaviour is that vLLM is skipped only when --disable_vllm is set explicitly. If the flag is not set, vLLM is used, with the huggingface API as the fallback.
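
A minimal, standalone sketch (not the PR's actual parser) of the store_true behaviour described above:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--disable_vllm', action='store_true',
                    help='Do not use vllm as the default runtime')

print(parser.parse_args([]).disable_vllm)                  # False: flag omitted, vLLM stays enabled
print(parser.parse_args(['--disable_vllm']).disable_vllm)  # True: flag present, vLLM is turned off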

Comment on lines 60 to 61
if args.model != args.model_id: # vllm sets default model
args.model = args.model_id
Member:

We can just assign directly to model_id without checking
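
For illustration, a sketch of the suggested simplification (not the merged code):

# current: conditional copy
if args.model != args.model_id:  # vllm sets default model
    args.model = args.model_id

# suggested: an unconditional assignment has the same effect and is simpler
args.model = args.model_id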

python/huggingfaceserver/huggingfaceserver/model.py (outdated review thread, resolved)
Comment on lines +38 to +40
parser.add_argument('--disable_lower_case', action='store_true',
help='do not use lower case for the tokenizer')
parser.add_argument('--disable_special_tokens', action='store_true',
Member:

Is there a reference for why these should be disabled by default?

@gavrishp (Contributor Author) commented Feb 12, 2024:

I noticed that in recent Python versions parser.add_argument(type=bool, ...) does not work as expected: argparse applies bool() to the raw string value, so any non-empty string (including 'False') parses as True.

https://docs.python.org/3/library/argparse.html#type
https://stackoverflow.com/questions/60999816/argparse-not-parsing-boolean-arguments
https://docs.python.org/3/library/argparse.html#action

To support this I added the same functionality by inverting the flags, since the expectation is that add_special_tokens is True by default. Hence, unless --disable_special_tokens is set, add_special_tokens will be True. The same applies to vllm and lower_case.
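
A small sketch contrasting the two approaches (the --add_special_tokens option with type=bool is hypothetical, used only to show the problem; the PR itself adds the inverted flag):

import argparse

parser = argparse.ArgumentParser()
# type=bool applies bool() to the raw string, and bool('False') is True,
# so this option can never actually be turned off from the command line.
parser.add_argument('--add_special_tokens', type=bool, default=True)
# Inverted flag: the positive behaviour stays on unless explicitly disabled.
parser.add_argument('--disable_special_tokens', action='store_true')

args = parser.parse_args(['--add_special_tokens', 'False'])
print(args.add_special_tokens)           # True, even though 'False' was passed
print(not args.disable_special_tokens)   # True by default

args = parser.parse_args(['--disable_special_tokens'])
print(not args.disable_special_tokens)   # False once the disable flag is set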

return model_cls

def load(self, engine_args=None) -> bool:
if self.use_vllm and self.device == torch.device("cuda"): # vllm needs gpu
Member:

Do we need use_vllm if we can determine the list of model architectures vLLM can support?

@gavrishp (Contributor Author) commented Feb 12, 2024:

I went this route to decide whether to use vLLM instead of relying on the global variable _vllm:
https://github.com/kserve/kserve/pull/3415/files#diff-d02c376ad82bad8a11ceaecb4ea815822123e34b961f29fd5050225af82e62b4R80
self.use_vllm = not kwargs.get('disable_vllm', False) if _vllm else False

This is a pre-check for the case where someone explicitly disables vLLM or does not have the vLLM package installed locally.
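
A sketch of how these pieces fit together (an assumed simplification; the class name and helper methods are illustrative, not the PR's exact code):

import torch

try:
    import vllm  # noqa: F401
    _vllm = True
except ImportError:
    _vllm = False

class HuggingfaceModel:  # illustrative name
    def __init__(self, device: torch.device, **kwargs):
        self.device = device
        # use vLLM only if the package is importable and the user did not opt out
        self.use_vllm = not kwargs.get('disable_vllm', False) if _vllm else False

    def load(self) -> bool:
        if self.use_vllm and self.device == torch.device("cuda"):  # vllm needs gpu
            return self._load_vllm_engine()        # hypothetical vLLM path
        return self._load_huggingface_pipeline()   # hypothetical Hugging Face fallback

    def _load_vllm_engine(self) -> bool:
        ...  # hypothetical: construct the vLLM engine here
        return True

    def _load_huggingface_pipeline(self) -> bool:
        ...  # hypothetical: load tokenizer/model via transformers here
        return True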

@terrytangyuan (Member):

HuggingFace out-of-box support has been added in case you'd like to test it together. @gavrishp #3395

@gavrishp (Contributor Author):

HuggingFace out-of-box support has been added in case you'd like to test it together. @gavrishp #3395

That's great. I'll give it a try

@@ -92,6 +93,7 @@ spec:
- --model_id=bert-base-uncased
- --predictor_protocol=v2
- --tensor_input_names=input_ids
- --disable_vllm
@yuzisun (Member) commented Feb 15, 2024:

I think we probably do not want to confuse users into thinking they have to set this flag to get BERT to work. A better example would show vllm as the default, with a separate example showing how it can be disabled.

Contributor Author:

I have added a disable vllm example

Member:

I was just trying out the new huggingface runtime and noticed this. It feels a bit weird to have vllm as the default runtime (the PR describes vllm as an alternative backend).

How about --backend=vllm, keeping huggingface as the default and vllm as an alternative? That would also let us support other backends such as TensorRT in the future (see the sketch below).

What do you think? @yuzisun
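
A sketch of what such a --backend option could look like (purely illustrative, not part of this PR):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--backend', choices=['huggingface', 'vllm'], default='huggingface',
                    help='runtime backend to use for text generation')
print(parser.parse_args([]).backend)  # 'huggingface' unless --backend=vllm is passed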

@yuzisun (Member) commented Feb 18, 2024

Great work on this @gavrishp !

/lgtm
/approve

oss-prow-bot (bot) commented Feb 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@oss-prow-bot oss-prow-bot bot merged commit 1433e95 into kserve:master Feb 18, 2024
60 checks passed
TimKleinloog pushed a commit to TimKleinloog/kserve that referenced this pull request Feb 19, 2024

* Initial commit to support vllm as alternative backend
* include minor fixes and readme changes
* fix poetry lock issues
* fix lint issues
* use_vllm support True as default
* refactor code and fix review comments
* build failure - fix tests and install vllm part of dockerfile
* fix poetry lock issue
* include string constants
* linting fix
* fix review comments
* fix tests
* fix review comments
* add support in vllm for locally downloaded models
* Update Readme
* Update Readme
* Update python/huggingfaceserver/README.md

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>
TimKleinloog pushed a commit to TimKleinloog/kserve that referenced this pull request Feb 20, 2024 (same commit message as above)
timothyjlaurent pushed a commit to timothyjlaurent/kserve that referenced this pull request Feb 21, 2024 (same commit message as above)