Merge sequence steps response #3690

asd981256 · 2024-05-14T12:20:52Z

What this PR does / why we need it:
Add a field to InferenceStep so that a Sequence node can return, alongside the final response, information produced by steps in the middle of the model chain.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3639
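
For reviewers, the intended behavior can be illustrated with a minimal, self-contained Go sketch. This is not the router code from this PR: the `Step` type, the `KeepResponse` flag, the `Call` function and the merged output shape (a JSON object keyed by step name) are all assumptions made for illustration. The commits in this PR name the actual field `beMerged` and later `Response`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Step is a hypothetical stand-in for an InferenceStep in a Sequence node.
// KeepResponse is an illustrative name, not the field added by this PR.
type Step struct {
	Name         string
	KeepResponse bool
	// Call stands in for the HTTP request the router sends to the step's model.
	Call func(input []byte) ([]byte, error)
}

// RunSequence feeds each step's output into the next step. It returns a JSON
// object keyed by step name that holds the final step's response plus the
// response of every step marked with KeepResponse.
func RunSequence(steps []Step, input []byte) ([]byte, error) {
	if len(steps) == 0 {
		return nil, fmt.Errorf("sequence has no steps")
	}
	merged := make(map[string]json.RawMessage)
	current := input
	for i, step := range steps {
		out, err := step.Call(current)
		if err != nil {
			return nil, fmt.Errorf("step %q failed: %w", step.Name, err)
		}
		// Always keep the last step; keep earlier steps only when marked.
		if step.KeepResponse || i == len(steps)-1 {
			merged[step.Name] = json.RawMessage(out)
		}
		current = out
	}
	// Each step response must itself be valid JSON for Marshal to succeed.
	return json.Marshal(merged)
}
```

With three steps where only the first is marked, the result contains the first and last step responses and omits the unmarked middle one. Keying by step name is one possible design; it assumes step names are unique within the sequence.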

Type of changes
Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Added a unit test to verify that the specified step's response is returned alongside the final step's response.
Test B
Logs

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

Have you added unit/e2e tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

Release note:

Re-running failed tests

/rerun-all - rerun all failed workflows.
/rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

oss-prow-bot · 2024-05-14T12:20:56Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: asd981256
Once this PR has been reviewed and has the lgtm label, please assign yuzisun for approval by writing /assign @yuzisun in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: asd981256 <asd981256@gmail.com>

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

… backend (kserve#3657)
* Assign device of input tensors
  Signed-off-by: sailgpu <sailesh.duddupudi@nutanix.com>
* lint fix
  Signed-off-by: sailgpu <sailesh.duddupudi@nutanix.com>
---------
Signed-off-by: sailgpu <sailesh.duddupudi@nutanix.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

* Test image builds for ARM64 arch in CI
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Update lockfiles
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Add ARM64 support for paddle
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
---------
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

* Encoder-decoder models do not include input tokens in their output
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* Pass stopping criteria into streamer
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
---------
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: asd981256 <asd981256@gmail.com>

…3615)
* Added the field AdditionalIngressDomains into the struct IngressConfig
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Added the additional ingress domains into the hosts
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Fixed the indentation
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Added isvc name and namespace into the domain name
* Added the validation for the URLs
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Validate the domain in the additionalIngressDomains
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Create the hosts from the list of additionalIngressDomains
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Change the way to validate the host
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Change the validation error message
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Revert the name to url
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Get all the available domain list
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* gofmt -s -w the file
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Add additionalIngressDomains into the charts
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Added the comments and refactor the tests
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Regenerate the manifests
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Modify createHTTPMatchRequest, the charts and the test cases
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
* Run make generate
  Signed-off-by: Vincent Hou <shou73@bloomberg.net>
---------
Signed-off-by: Vincent Hou <shou73@bloomberg.net>
Signed-off-by: asd981256 <asd981256@gmail.com>

* Add fall back pad token for tokenizer
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Make linter happy
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Update test
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Rebase master
  Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
---------
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

Fix quick install does not cleansup Istio installer Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> Signed-off-by: asd981256 <asd981256@gmail.com>

…rver (kserve#3621)
* Add model to proxy requests to an OpenAI-enabled predictor
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* Set default timeout
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* Add error handling
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* Add missing licenses
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
---------
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: asd981256 <asd981256@gmail.com>

* Add headers to predictor exception logging
  Signed-off-by: grandbora <grandbora@fb.com>
* Log request id only
  Signed-off-by: grandbora <grandbora@fb.com>
* Update log
  Signed-off-by: grandbora <grandbora@fb.com>
---------
Signed-off-by: grandbora <grandbora@fb.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

* workflow file for cherry-pick on comment
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* updated release notes and workflow
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* Remove obsolete cherry pick workflow
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
---------
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

* Enhance controller setup based on available CRDs
  This enhances the setup of the InferenceService controller and the InferenceGraph controller. Instead of relying on the `defaultDeploymentMode` configuration to determine what CRDs to watch, the setup now checks whether KNative Services and Istio VirtualServices are available in the cluster and setup the watches (invoke `Owns`) accordingly.
  This enhancement has the following advantages:
  * A crashloop is prevented if the CRDs are missing in the cluster. The user would still be able to create InferenceServices by taking care of annotating the ISVC for RawDeployment mode.
  * If RawDeployment mode is configured as the default mode, the controllers would still watch for KNative and Istio resources if these components are available. This will let the controller watch for changes for the dependent resources if the user uses Serverless mode for some of the InferenceServices.
  * In the InferenceService controller, the watch for the VirtualServices is still conditioned to the value of the `disableVirtualHost` configuration.
  Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
* Controller setup - add schemas based on CRDs available
  Since KServe controllers are modified to watch resources based on available CRDs, a similar change in the setup of the manager is needed: schemas need to be added to the manager based on available CRDs rather than based only on the values in the inferenceservice-config ConfigMap. This would keep both manager setup and controller setup in sync with regards schemas and watches around the CRDs.
  Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
---------
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

upgrade vllm version Signed-off-by: Johnu George <johnugeorge109@gmail.com> Signed-off-by: asd981256 <asd981256@gmail.com>

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

…Fixes kserve#3452 (kserve#3603)
* feat: Support customizable deployment strategy for RawDeployment mode
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* regen
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* lint
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Correctly apply rollingupdate
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* address comments
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Add validation
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

* Enable dtype for huggingface server
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Set float16 as default. Fixup linter
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Add small comment to make the changes understandable
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Fixup linter
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Adapt to new huggingfacemodel
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Fixup merge :)
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Explicitly mention the behaviour of dtype flag on auto.
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Default to FP32 for encoder models
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Fixup linter
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Update poetry
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Use torch.float32 forr tests explicitly
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
---------
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

* fix for extract zip from gcs
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* initial commit for gcs model download unittests
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* unittests for model download from gcs
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* black format fix
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* code verification
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
---------
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> Signed-off-by: asd981256 <asd981256@gmail.com>

* update wording for huggingface README
  small update to make readme easier to understand
  Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
* Update README.md
  Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
* Update python/huggingfaceserver/README.md
  Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
  Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
* update vllm
  Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
* Update README.md
---------
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: asd981256 <asd981256@gmail.com>

* fix: HPA equality check should include annotations
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Only watch related autoscalerclass annotation
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* simplify
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Add missing delete action
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* fix logic
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: asd981256 <asd981256@gmail.com>

fix huggingface runtime in chart Signed-off-by: Dan Sun <dsun20@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

Signed-off-by: asd981256 <asd981256@gmail.com>

oss-prow-bot bot requested review from cmaddalozzo and israel-hdez May 14, 2024 12:20

asd981256 and others added 26 commits May 14, 2024 12:32

beMerged field in crd, BeMerged field in struct, router behavior

d7cdd1e

Signed-off-by: asd981256 <asd981256@gmail.com>

add test func

af4b42d

Signed-off-by: asd981256 <asd981256@gmail.com>

go mod vendor, make generate, make test

46658e2

Signed-off-by: asd981256 <asd981256@gmail.com>

Remove generate endpoints (kserve#3654)

3d1f5a3

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

Fix quick install does not cleans up Istio installer (kserve#3660)

3138e5a

Fix quick install does not cleansup Istio installer Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> Signed-off-by: asd981256 <asd981256@gmail.com>

Bump version to 0.13.0-rc0 (kserve#3665)

dcb71d8

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

upgrade vllm/transformers version (kserve#3671)

dc49c86

upgrade vllm version Signed-off-by: Johnu George <johnugeorge109@gmail.com> Signed-off-by: asd981256 <asd981256@gmail.com>

Add openai models endpoint (kserve#3666)

4437792

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

Add method for checking model health/readiness (kserve#3673)

81cef35

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

Update Dockerfile and Readme (kserve#3676)

73bad9a

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> Signed-off-by: asd981256 <asd981256@gmail.com>

Fix: huggingface runtime in helm chart (kserve#3679)

f47bb9a

fix huggingface runtime in chart Signed-off-by: Dan Sun <dsun20@bloomberg.net> Signed-off-by: asd981256 <asd981256@gmail.com>

rename field to Response, re-generate code

9684f3d

Signed-off-by: asd981256 <asd981256@gmail.com>

asd981256 force-pushed the merge-sequence-steps branch from 33a0b27 to 9684f3d Compare May 14, 2024 12:33

asd981256 marked this pull request as draft May 14, 2024 12:51

oss-prow-bot bot added the do-not-merge/work-in-progress label May 14, 2024

asd981256 closed this May 14, 2024

asd981256 deleted the merge-sequence-steps branch May 14, 2024 12:58
