Add OpenAI API support to Huggingfaceserver #3582
Conversation
The commits seem a bit messy since this is based on #3477. Would you like to clean them up?
    kwargs=vars(args),
)
engine_args = build_vllm_engine_args(args)
model = VLLMModel(args.model_name, engine_args)
I'm just curious: if the vLLM load fails here, should we fall back to HF?
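One possible shape for such a fallback — a minimal sketch with illustrative names; `build_vllm` and `build_hf` are hypothetical stand-ins, not the kserve API:

```python
# Sketch of a "try vLLM first, fall back to HF" loader.
# build_vllm and build_hf are hypothetical backend constructors;
# the pattern, not the names, is the point.

def load_model(model_name, build_vllm, build_hf):
    """Return (model, backend): vLLM if it loads, otherwise HF."""
    try:
        return build_vllm(model_name), "vllm"
    except Exception:
        # vLLM can fail for many reasons (unsupported architecture,
        # no GPU); fall back to the plain Hugging Face runtime.
        return build_hf(model_name), "hf"
```

Whether a silent fallback is desirable is a design question: it keeps the server up, but the operator may prefer a hard failure over silently losing vLLM's throughput.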
For now, you might need to rebase. @yuzisun merged a PR yesterday, I believe, to pin the ray version to 2.10 to avoid this issue for now.
self._tokenizer = AutoTokenizer.from_pretrained(
    str(model_id_or_path),
    revision=self.tokenizer_revision,
    do_lower_case=self.do_lower_case,
Can you verify the tokenizer args once more? The tokenizer also has a device_map setting.
I am having a hard time finding any reference to device_map in the HF transformers code. There's also no mention of tokenizers supporting device_map in the docs. This comment suggests it's not needed/supported: huggingface/transformers#16359 (comment)
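For what it's worth, one defensive way to handle this — a hypothetical helper, not code from this PR — is to whitelist the kwargs the tokenizer actually accepts, so a stray `device_map` never reaches `AutoTokenizer.from_pretrained`:

```python
# Hypothetical kwarg filter: device_map is a model-loading argument, not
# a tokenizer one, so drop anything outside a known-good set before
# forwarding shared kwargs to the tokenizer. The set below is an
# illustrative assumption, not an exhaustive list.

TOKENIZER_KWARGS = {"revision", "do_lower_case", "use_fast", "trust_remote_code"}

def filter_tokenizer_kwargs(kwargs):
    """Keep only kwargs known to be accepted by the tokenizer loader."""
    return {k: v for k, v in kwargs.items() if k in TOKENIZER_KWARGS}
```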
Let me try this again tomorrow. By the way, the model used was gemma-2b.
I also cannot find anything for gemma-2b.
Commits (all signed off by Tessa Pham <hpham111@bloomberg.net>):
more components for OpenAI endpoints
add OpenAI endpoints to router
modify generate() in data plane
class OpenAIModel
delete and rename files
add create_chat_completion() to OpenAIModel
update routers and lint
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Fix tests. Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Pass loop as argument to the background request handler. Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
@@ -351,7 +350,7 @@ async def generate(

    Args:
        model_name (str): Model name.
-       request (bytes|GenerateRequest): Generate Request body data.
+       request (bytes|GenerateRequest): Generate Request / ChatCompletion Request body data.
Is this generate function still used? I think it goes through the OpenAI data plane now, right?
Don't try to load table question-answering models, as they are not supported. Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cmaddalozzo, yuzisun. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
* master:
  Add OpenAI API support to Huggingfaceserver (kserve#3582)
  Allow rerunning failed workflows by comment (kserve#3550)
  Fix CVE-2023-45288 for qpext (kserve#3618)
  chore: v0.12.1 install files (kserve#3619)
  build: Fix CRD copying in generate-install.sh (kserve#3620)
  Fix Pydantic 2 warnings (kserve#3622)
  Fix make deploy-dev-storage-initializer not working (kserve#3617)
What this PR does / why we need it:
This PR adds support for the OpenAI completion and chat completion endpoints to the HuggingfaceServer.
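As a quick illustration of what a request to such an endpoint carries — the body below follows the OpenAI chat completion schema; the exact route the server exposes is not shown here, and this helper is purely illustrative:

```python
import json

def chat_completion_body(model: str, user_message: str, stream: bool = False) -> str:
    """Build an OpenAI-style chat completion request body as JSON."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    })
```

A client would POST this body to the server's chat completion route; with `stream=True` the OpenAI protocol returns server-sent events instead of a single JSON response.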
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #3419, #3580
Type of changes
Please delete options that are not relevant.
Checklist:
Release note: