Ability to change parameters of the default `Deployment` created from `ServingRuntime` and `InferenceService` #3452

averyngo34 · 2024-02-19T15:09:10Z

/kind feature

Describe the solution you'd like
I'd like to be able to modify parts of the Deployment.spec for the predictor Deployment created from a ServingRuntime and InferenceService. Examples include strategy, to adjust the rolling update behavior, and template.spec.securityContext, to adjust the security context. The ideal solution would allow overriding any Deployment.spec field from within the ServingRuntime and InferenceService.

Anything else you would like to add:
Currently, the Deployment created from a ServingRuntime and InferenceService uses the default Kubernetes Deployment values.
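
For reference, the defaults in question can be written out as plain data. The sketch below is illustrative only: `DEFAULT_DEPLOYMENT_FIELDS` and `with_overrides` are hypothetical names, not part of KServe, and the values are the documented Kubernetes defaults for the Deployment fields discussed in this issue. It shows the kind of override being requested, not how KServe would implement it.

```python
import copy

# Kubernetes defaults for the Deployment fields discussed in this issue.
# DEFAULT_DEPLOYMENT_FIELDS is a hypothetical name used only for illustration.
DEFAULT_DEPLOYMENT_FIELDS = {
    "strategy": {
        "type": "RollingUpdate",
        "rollingUpdate": {"maxUnavailable": "25%", "maxSurge": "25%"},
    },
    "progressDeadlineSeconds": 600,
    "template": {
        "spec": {
            "terminationGracePeriodSeconds": 30,
            "enableServiceLinks": True,
            # securityContext has no default value; it is simply unset.
            "securityContext": {},
        }
    },
}


def with_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a deep copy of `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The kind of override requested here: a stricter rollout and a pod security context.
requested = with_overrides(
    DEFAULT_DEPLOYMENT_FIELDS,
    {
        "strategy": {"rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1}},
        "template": {"spec": {"securityContext": {"runAsNonRoot": True}}},
    },
)
```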

terrytangyuan · 2024-02-19T15:43:11Z

/assign

yuzisun · 2024-03-30T23:46:22Z

I think we need to consider how to unify these fields with serverless mode. I am not sure we want to simply expose the whole deployment.spec.

In serverless mode, progressDeadlineSeconds is specified as a Knative annotation and terminationGracePeriodSeconds is set according to the timeout field.
enableServiceLinks is a legacy field which I think should not be exposed.
deploymentStrategy is probably only applicable to raw deployment mode.
securityContext should already be defined on the podSpec of the InferenceService.

terrytangyuan · 2024-04-09T00:36:01Z

For deployment spec, we'd like to customize the following:

spec.progressDeadlineSeconds: in serverless mode, this can be configured through the Knative annotation serving.knative.dev/progress-deadline. I am proposing that we add a new field, ComponentExtensionSpec.ProgressDeadlineSeconds, that can be used in both modes, and that we document its precedence over the Knative annotation in serverless mode.
spec.strategy (the Deployment's DeploymentStrategy): expose this through a new field, ComponentExtensionSpec.RawDeploymentStrategy.

For deployment template spec, we'd like to customize the following:

spec.template.spec.terminationGracePeriodSeconds: can be set via the timeout field.
spec.template.spec.enableServiceLinks: @averyngo34 this is a legacy field. Is it really necessary to customize it?
spec.template.spec.securityContext: can already be customized in the podSpec of the InferenceService.

In short, we will only introduce two new fields:

ComponentExtensionSpec.ProgressDeadlineSeconds
ComponentExtensionSpec.RawDeploymentStrategy

WDYT? Anything I missed here? @averyngo34 @yuzisun
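
The precedence proposed for ProgressDeadlineSeconds could look roughly like the sketch below. This is an assumption-laden illustration, not KServe code: `resolve_progress_deadline` is a hypothetical helper, and it assumes the annotation value is a duration string such as "600s". Only the annotation key comes from the comment above.

```python
from typing import Mapping, Optional

# Knative annotation named in the proposal above.
PROGRESS_DEADLINE_ANNOTATION = "serving.knative.dev/progress-deadline"


def resolve_progress_deadline(
    field_seconds: Optional[int],
    annotations: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical helper: an explicitly set field wins over the annotation.

    Returns a duration string (assumed format, e.g. "300s"), or None when
    neither the field nor the annotation is set.
    """
    if field_seconds is not None:
        return f"{field_seconds}s"
    return annotations.get(PROGRESS_DEADLINE_ANNOTATION)


# Field set: it takes precedence over the annotation.
from_field = resolve_progress_deadline(300, {PROGRESS_DEADLINE_ANNOTATION: "600s"})
# Field unset: fall back to the annotation, if any.
from_annotation = resolve_progress_deadline(None, {PROGRESS_DEADLINE_ANNOTATION: "600s"})
```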

averyngo34 · 2024-04-09T13:55:47Z

I agree with the approach you suggested. Thank you very much.

terrytangyuan · 2024-04-10T13:59:27Z

Update: @averyngo34 no longer needs ComponentExtensionSpec.ProgressDeadlineSeconds. Some known issues, for reference: kubernetes/kubernetes#106697, knative/serving#14835, knative/serving#14835 (comment).

terrytangyuan · 2024-04-10T16:38:14Z

We discussed this in the community meeting today. I'll update my PR to introduce a new field, ComponentExtensionSpec.DeploymentStrategy.
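
As a rough sketch of how the DeploymentStrategy field described above might be used, the snippet below builds an InferenceService manifest for RawDeployment mode. Several things here are assumptions rather than confirmed details of the PR: the JSON key `deploymentStrategy` is inferred from the Go field name, the `inference_service` helper is hypothetical, and the model name and storage URI are placeholders.

```python
import json


def inference_service(name: str, storage_uri: str, strategy: dict) -> dict:
    """Build an InferenceService manifest for RawDeployment mode.

    The `deploymentStrategy` key is an assumed JSON name for the
    ComponentExtensionSpec.DeploymentStrategy field mentioned above.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            # Selects raw Deployment mode instead of serverless (Knative) mode.
            "annotations": {"serving.kserve.io/deploymentMode": "RawDeployment"},
        },
        "spec": {
            "predictor": {
                # Same shape as a Kubernetes Deployment's spec.strategy.
                "deploymentStrategy": strategy,
                "model": {
                    "modelFormat": {"name": "sklearn"},
                    "storageUri": storage_uri,
                },
            }
        },
    }


manifest = inference_service(
    name="example-model",  # placeholder name
    storage_uri="gs://example-bucket/model",  # placeholder URI
    strategy={
        "type": "RollingUpdate",
        "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
    },
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could in principle be applied directly, provided the assumed field name matches the released CRD.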

…Fixes #3452 (#3603)

* feat: Support customizable deployment strategy for RawDeployment mode
* regen
* lint
* Correctly apply rollingupdate
* address comments
* Add validation

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

[RHOAIENG-3375][Cherry-pick] feat: Support customizable deployment strategy for RawDeployment mode. Fixes kserve#3452 (kserve#3603)

* upgrade vllm/transformers version (#3671)
  * upgrade vllm version
* Add openai models endpoint (#3666)
* feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 (#3603)
  * feat: Support customizable deployment strategy for RawDeployment mode
  * regen
  * lint
  * Correctly apply rollingupdate
  * address comments
  * Add validation
* Enable dtype support for huggingface server (#3613)
  * Enable dtype for huggingface server
  * Set float16 as default. Fixup linter
  * Add small comment to make the changes understandable
  * Fixup linter
  * Adapt to new huggingfacemodel
  * Fixup merge :)
  * Explicitly mention the behaviour of dtype flag on auto.
  * Default to FP32 for encoder models
  * Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU
  * Fixup linter
  * Update poetry
  * Use torch.float32 for tests explicitly
* Add method for checking model health/readiness (#3673)
* fix for extract zip from gcs (#3510)
  * fix for extract zip from gcs
  * initial commit for gcs model download unittests
  * unittests for model download from gcs
  * black format fix
  * code verification
* Update Dockerfile and Readme (#3676)
* Update huggingface readme (#3678)
  * update wording for huggingface README small update to make readme easier to understand
  * Update README.md
  * Update python/huggingfaceserver/README.md
  * update vllm
  * Update README.md
* fix: HPA equality check should include annotations (#3650)
  * fix: HPA equality check should include annotations
  * Only watch related autoscalerclass annotation
  * simplify
  * Add missing delete action
  * fix logic
* Fix: huggingface runtime in helm chart (#3679)
  * fix huggingface runtime in chart
* Fix: model id and model dir check order (#3680)
  * fix huggingface runtime in chart
  * Allow model_dir to be specified on template
  * Default model_dir to /mnt/models for HF
  * Lint format
* Fix:vLLM Model Supported check throwing circular dependency (#3688)
  * Fix:vLLM Model Supported check throwing circular dependency
  * remove unwanted comments
  * remove unwanted comments
  * fix return case
  * fix to check all arch in model config for vllm support
  * fixlint
* Fix: Allow null in Finish reason streaming response in vLLM (#3684)
  * Fix: allow null in Finish reason

Signed-off-by: Johnu George <johnugeorge109@gmail.com>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Curtis Maddalozzo <cmaddalozzo@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Datta Nimmaturi <39181234+Datta0@users.noreply.github.com>
Co-authored-by: Andrews Arokiam <87992092+andyi2it@users.noreply.github.com>
Co-authored-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Co-authored-by: Alexa Griffith <agriffith50@bloomberg.net>
Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

oss-prow-bot bot added the kind/feature label Feb 19, 2024

averyngo34 changed the title ~~Ability to change parameters of of the default Deployment created from ServingRuntime and InferenceService`~~ Ability to change parameters of of the default Deployment created from ServingRuntime and InferenceService Feb 19, 2024

oss-prow-bot bot assigned terrytangyuan Feb 19, 2024

terrytangyuan mentioned this issue Feb 26, 2024

feat: Support customizable deployment spec for RawDeployment mode #3479

Closed

terrytangyuan mentioned this issue Apr 15, 2024

feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 #3603

Merged

yuzisun closed this as completed in #3603 May 9, 2024
