
Add OpenAI API support to Huggingfaceserver #3582

Merged

19 commits merged into kserve:master on Apr 25, 2024

Conversation

cmaddalozzo
Contributor

@cmaddalozzo cmaddalozzo commented Apr 8, 2024

What this PR does / why we need it:
This PR adds support for the OpenAI completion and chat completion endpoints to the HuggingfaceServer.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3419, #3580

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

Support OpenAI completion and chat completion endpoints in huggingfaceserver.
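The completion and chat completion endpoints this PR adds follow the OpenAI request schema. A minimal sketch of the two request bodies a client would POST to the server (the model name and any endpoint URL here are illustrative assumptions, not defined by this PR):

```python
# Minimal OpenAI-style request bodies for the two endpoints this PR adds.
# The model name "gpt2" and the URL in the comment below are assumptions
# for illustration only.
import json


def chat_completion_payload(model: str, user_message: str) -> dict:
    """Build a minimal OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 32,
    }


def completion_payload(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style text completion request body."""
    return {"model": model, "prompt": prompt, "max_tokens": 32}


if __name__ == "__main__":
    body = chat_completion_payload("gpt2", "Hello!")
    # POST this JSON to the server's chat completions route,
    # e.g. something like http://localhost:8080/openai/v1/chat/completions
    print(json.dumps(body))
```

Both shapes mirror the public OpenAI API, which is what lets existing OpenAI client libraries talk to the huggingfaceserver unchanged.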

@cmaddalozzo cmaddalozzo force-pushed the huggingfaceserver-openai branch 6 times, most recently from 68ecc53 to b8a7ed8 Compare April 8, 2024 19:03
Member

@terrytangyuan terrytangyuan left a comment


The commits seem a bit messy since it's based on #3477

Would you like to clean it up?

kwargs=vars(args),
)
engine_args = build_vllm_engine_args(args)
model = VLLMModel(args.model_name, engine_args)
Contributor


I'm just curious: if the vLLM load fails here, should we fall back to HF?
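The fallback being asked about could be sketched roughly as below. The loader callables are hypothetical stand-ins, not KServe APIs; the actual code builds `VLLMModel(args.model_name, engine_args)` directly.

```python
# Sketch of a vLLM -> HF fallback, as the reviewer suggests. `load_vllm`
# and `load_hf` are hypothetical stand-ins for the respective model
# constructors, not actual KServe functions.

def load_model(model_name: str, load_vllm, load_hf):
    """Try the vLLM backend first; fall back to the HF runtime on load error."""
    try:
        return load_vllm(model_name)
    except Exception as exc:  # e.g. unsupported architecture, missing CUDA
        print(f"vLLM load failed ({exc}); falling back to the HF runtime")
        return load_hf(model_name)
```

The trade-off is that a silent fallback can mask real configuration errors, so logging the vLLM failure loudly matters if this is adopted.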

@cmaddalozzo cmaddalozzo force-pushed the huggingfaceserver-openai branch 2 times, most recently from f5921a2 to 503aefe Compare April 22, 2024 15:29
@cmaddalozzo cmaddalozzo changed the title Huggingfaceserver openai Add OpenAI API support to Huggingfaceserver Apr 22, 2024
@cmaddalozzo

This comment was marked as outdated.

@spolti
Contributor

spolti commented Apr 23, 2024

You might need to rebase: @yuzisun merged a PR yesterday, I believe, to pin the ray version to 2.10 to avoid this issue for now.

self._tokenizer = AutoTokenizer.from_pretrained(
str(model_id_or_path),
revision=self.tokenizer_revision,
do_lower_case=self.do_lower_case,
Contributor


Can you verify the tokenizer args once more? The tokenizer also has a device_map setting.

Contributor Author


I am having a hard time finding any reference to device_map in the HF transformers code. There's also no mention of tokenizers supporting device_map in the docs. This comment suggests it's not needed/supported: huggingface/transformers#16359 (comment)
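The usual pattern that makes `device_map` unnecessary for tokenizers: tokenization happens on CPU, and the caller moves the encoded tensors to the model's device afterwards. A hedged sketch (the device string is an assumption for illustration):

```python
# Sketch of why a tokenizer needs no device_map: it produces plain CPU
# tensors, and the caller moves them to the model's device afterwards.
# The tokenizer and device name here are illustrative.

def encode_and_move(tokenizer, text: str, device: str) -> dict:
    """Tokenize on CPU, then move the resulting tensors to `device`."""
    batch = tokenizer(text, return_tensors="pt")
    return {k: v.to(device) for k, v in batch.items()}
```

With a real HF tokenizer this would be `encode_and_move(AutoTokenizer.from_pretrained(model_id), "hello", "cuda:0")`; `device_map` only applies to model weights loaded via `from_pretrained` on model classes.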

Contributor


Let me try this again tomorrow. Btw, model used was gemma-2b

Member


I also cannot find anything for gemma-2b.

tessapham and others added 5 commits April 24, 2024 16:01
Signed-off-by: Tessa Pham <hpham111@bloomberg.net>

more components for OpenAI endpoints

Signed-off-by: Tessa Pham <hpham111@bloomberg.net>

add OpenAI endpoints to router

Signed-off-by: Tessa Pham <hpham111@bloomberg.net>

modify generate() in data plane

Signed-off-by: Tessa Pham <hpham111@bloomberg.net>

class OpenAIModel

Signed-off-by: Tessa Pham <hpham111@bloomberg.net>

delete and rename files

Signed-off-by: Tessa Pham <hpham111@bloomberg.net>

add create_chat_completion() to OpenAIModel

Signed-off-by: Tessa Pham <hpham111@bloomberg.net>

update routers and lint

Signed-off-by: Tessa Pham <hpham111@bloomberg.net>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Fix tests.

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Pass loop as argument to the background request handler.

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
@@ -351,7 +350,7 @@ async def generate(

     Args:
         model_name (str): Model name.
-        request (bytes|GenerateRequest): Generate Request body data.
+        request (bytes|GenerateRequest): Generate Request / ChatCompletion Request body data.
Member


Is this generate function still used? I think it uses the OpenAI data plane now, right?
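The split the reviewer is describing — OpenAI-shaped requests going to the new data plane while raw generate requests keep the legacy path — could be dispatched along these lines. The handler names and the `"messages"`-key heuristic are illustrative assumptions, not KServe's actual routing:

```python
# Hypothetical dispatch sketch: chat-style bodies go to the OpenAI data
# plane handler, plain generate bodies keep the legacy generate path.
# Handler names and the routing heuristic are illustrative only.

def dispatch(request: dict, openai_handler, generate_handler):
    """Route OpenAI chat-shaped request bodies separately from generate bodies."""
    if "messages" in request:  # OpenAI chat completion request shape
        return openai_handler(request)
    return generate_handler(request)
```

In practice the routing is done per URL path rather than by inspecting the body, which is why the legacy generate docstring above still mentions both shapes.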

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Don't try to load table question answering models as they are not
supported.

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
@yuzisun
Member

yuzisun commented Apr 25, 2024

/lgtm
/approve


oss-prow-bot bot commented Apr 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cmaddalozzo, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuzisun yuzisun merged commit a9d747e into kserve:master Apr 25, 2024
56 of 57 checks passed
cmaddalozzo added a commit to cmaddalozzo/kserve that referenced this pull request Apr 26, 2024
* master:
  Add OpenAI API support to Huggingfaceserver (kserve#3582)
  Allow rerunning failed workflows by comment (kserve#3550)
  Fix CVE-2023-45288 for qpext (kserve#3618)
  chore: v0.12.1 install files (kserve#3619)
  build: Fix CRD copying in generate-install.sh (kserve#3620)
  Fix Pydantic 2 warnings (kserve#3622)
  Fix make deploy-dev-storage-initializer not working (kserve#3617)
Successfully merging this pull request may close these issues.

Support OpenAI Schema for KServe LLM runtime
8 participants