Implement support for vllm as alternative backend #3415
Conversation
Force-pushed 077dd46 to 0d71b61 (Compare)
Should this be in draft?
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Force-pushed 391b05c to 9dcf554 (Compare)
parser.add_argument('--tensor_input_names', type=list_of_strings, default=None,
                    help='the tensor input names passed to the model')
parser.add_argument('--task', required=False, help="The ML task name")
parser.add_argument('--disable_vllm', action='store_true', help="Do not use vllm as the default runtime")
Missing default here
This is an on/off flag argument, not an option that takes a value.
Why is the default true? I think we should default to false unless the user explicitly turns it off, and let it fall back to the Hugging Face API.
`action='store_true'` sets a bool value when the flag is present: if `--disable_vllm` is passed, `args.disable_vllm` is True; in all other cases it is False.
https://docs.python.org/3/library/argparse.html#action
The intended experience: only if `--disable_vllm` is set is vLLM not used. If the flag is not set, vLLM is used, with the Hugging Face API as fallback.
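A minimal, self-contained sketch of the `store_true` behavior described above (the parser here is illustrative, not the PR's actual argument parser):

```python
import argparse

parser = argparse.ArgumentParser()
# store_true makes this a pure flag: absent -> False, present -> True
parser.add_argument('--disable_vllm', action='store_true',
                    help='Do not use vllm as the default runtime')

args = parser.parse_args([])                  # flag not passed
assert args.disable_vllm is False

args = parser.parse_args(['--disable_vllm'])  # flag passed
assert args.disable_vllm is True
```

This is why no `default=` is needed: `store_true` implicitly defaults the destination to False.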
if args.model != args.model_id:  # vllm sets default model
    args.model = args.model_id
We can just assign `args.model_id` directly without the check.
parser.add_argument('--disable_lower_case', action='store_true',
                    help='do not use lower case for the tokenizer')
parser.add_argument('--disable_special_tokens', action='store_true',
Is there a reference for why these should default to disabled?
I noticed that in recent Python versions `parser.add_argument(type=bool, ...)` does not work as expected: argparse applies `bool()` to the raw string, so any non-empty value (even "False") parses as True.
https://docs.python.org/3/library/argparse.html#type
https://stackoverflow.com/questions/60999816/argparse-not-parsing-boolean-arguments
https://docs.python.org/3/library/argparse.html#action
To support this I added the same functionality by inverting the flag, since the expectation is for `add_special_tokens` to be True by default. Hence, unless `--disable_special_tokens` is set, `add_special_tokens` will be True. Similarly for vllm and lower_case.
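The pitfall and the inverted-flag workaround can be demonstrated directly (argument names here are illustrative):

```python
import argparse

# Pitfall: type=bool applies bool() to the raw string, and any
# non-empty string -- including "False" -- is truthy.
parser = argparse.ArgumentParser()
parser.add_argument('--use_feature', type=bool, default=False)
args = parser.parse_args(['--use_feature', 'False'])
print(args.use_feature)  # True, not False: bool('False') is True

# Workaround: invert the flag so the destination defaults to the
# desired True, and the flag explicitly turns it off.
parser2 = argparse.ArgumentParser()
parser2.add_argument('--disable_special_tokens', action='store_true')
args2 = parser2.parse_args([])
add_special_tokens = not args2.disable_special_tokens
print(add_special_tokens)  # True by default
```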
        return model_cls

    def load(self, engine_args=None) -> bool:
        if self.use_vllm and self.device == torch.device("cuda"):  # vllm needs gpu
Do we need `use_vllm` if we can determine the list of model architectures vLLM can support?
I thought of going this route to set whether to use vLLM, instead of using the global variable `_vllm`:
https://github.com/kserve/kserve/pull/3415/files#diff-d02c376ad82bad8a11ceaecb4ea815822123e34b961f29fd5050225af82e62b4R80
`self.use_vllm = not kwargs.get('disable_vllm', False) if _vllm else False`
This is a pre-check for whether someone explicitly disabled vLLM or does not have the vLLM package installed locally.
python/huggingfaceserver/README.md (Outdated)
@@ -92,6 +93,7 @@ spec:
      - --model_id=bert-base-uncased
      - --predictor_protocol=v2
      - --tensor_input_names=input_ids
      - --disable_vllm
I think we probably do not want to confuse users into thinking they have to set this config to get BERT to work. A better example would show that vllm is the default and demonstrate that it can be disabled.
I have added a disable vllm example
I was just trying out the new huggingface runtime and noticed this. It feels a bit weird to have vllm as the default runtime (the PR title calls vllm an alternative backend). How about `--backend=vllm`, with `huggingface` as the default and vllm as the alternative? In the future this could support other backends like TensorRT. What do you think? @yuzisun
Great work on this @gavrishp! /lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
* Initial commit to support vllm as alternative backend
* include minor fixes and readme changes
* fix poetry lock issues
* fix lint issues
* use_vllm support True as default
* refactor code and fix review comments
* build failure - fix tests and install vllm part of dockerfile
* fix poetry lock issue
* include string constants
* linting fix
* fix review comments
* fix tests
* fix review comments
* add support in vllm for locally downloaded models
* Update Readme
* Update Readme
* Update python/huggingfaceserver/README.md

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Tim Kleinloog <tkleinloog@deeploy.ml>
What this PR does / why we need it:
Support vllm as alternative backend for text generation use-cases
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format; will close the issue(s) when the PR gets merged):
Fixes #
Type of changes
Please delete options that are not relevant.
Feature/Issue validation/testing:
Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Have not added tests, as vllm requires a GPU driver as a prerequisite.
Special notes for your reviewer:
Checklist:
Release note: