
Commit

initial changes for IG raw deployment mode
Signed-off-by: Mopuri, Bharath <bharath_mopuri@intuit.com>
Mopuri, Bharath committed Feb 3, 2024
1 parent fc1ba3e commit 9b5b762
Showing 5 changed files with 148 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docs/admin/kubernetes_deployment.md
@@ -1,6 +1,9 @@
# Kubernetes Deployment Installation Guide
KServe supports `RawDeployment` mode to enable `InferenceService` deployment with Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Compared to serverless deployment, it removes Knative limitations such as being unable to mount multiple volumes; on the other hand, `Scale down to and from Zero` is not supported in `RawDeployment` mode.
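
For example, the deployment mode can be selected per `InferenceService` through the `serving.kserve.io/deploymentMode` annotation. The following is a minimal sketch; the model format and `storageUri` are illustrative placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # illustrative name
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Placeholder model location; replace with your own model storage URI.
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```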

**Note:** Starting with the KServe vx.xx release, `InferenceGraph` also supports `RawDeployment` mode. See the release notes for details.

Kubernetes 1.22 is the minimum required version; please check the following recommended Istio versions for the corresponding Kubernetes version.

88 changes: 88 additions & 0 deletions docs/blog/articles/2024-XX-XX-KServe-X.XX-release.md
@@ -0,0 +1,88 @@
# Announcing: KServe vx.xx

We are excited to announce the release of KServe x.xx. In this release we made enhancements to the KServe control plane, most notably bringing `RawDeployment` mode to `InferenceGraph` as well; previously, `RawDeployment` was available only for `InferenceService`.

Here is a summary of the key changes:

## KServe Core Inference Enhancements

- Inference Graph enhancements to support `RawDeployment` mode, along with autoscaling configuration directly within the `InferenceGraphSpec`.

IG `RawDeployment` makes the deployment lightweight by using native Kubernetes resources. See the comparison below:

![Inference graph Knative based deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png)

![Inference graph raw deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png)

Autoscaling configuration fields were introduced to support scaling needs in
`RawDeployment` mode. These fields are optional and, when set, take effect only when the `serving.kserve.io/autoscalerClass` annotation does not point to `external`.
See the following example with the autoscaling fields `minReplicas`, `maxReplicas`, `scaleTarget`, and `scaleMetric`:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: graph_with_switch_node
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: "rootStep1"
          nodeName: node1
          dependency: Hard
        - name: "rootStep2"
          serviceName: {{ success_200_isvc_id }}
    node1:
      routerType: Switch
      steps:
        - name: "node1Step1"
          serviceName: {{ error_404_isvc_id }}
          condition: "[@this].#(decision_picker==ERROR)"
          dependency: Hard
  minReplicas: 5
  maxReplicas: 10
  scaleTarget: 50
  scaleMetric: "cpu"
```
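
Conversely, when scaling is handled by an external autoscaler, the built-in autoscaling fields are ignored by pointing the autoscaler class annotation to `external`. A minimal sketch of the relevant metadata is shown below; the graph name is illustrative and the `spec` is the same as in the example above:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: graph-with-external-autoscaler   # illustrative name
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
    # With the autoscaler class set to external, KServe does not create an HPA
    # for the graph, and the minReplicas/maxReplicas/scaleTarget/scaleMetric
    # fields have no effect; scaling is left to an external controller.
    serving.kserve.io/autoscalerClass: "external"
```
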
For more details, please refer to the [issue](https://github.com/kserve/kserve/issues/2454).

-

### Enhanced Python SDK Dependency Management

-
-

### KServe Python Runtimes Improvements
-

### LLM Runtimes

#### TorchServe LLM Runtime

#### vLLM Runtime

## ModelMesh Updates

### Storing Models on Kubernetes Persistent Volumes (PVC)

### Horizontal Pod Autoscaling (HPA)

### Model Metrics, Metrics Dashboard, Payload Event Logging

## What's Changed? :warning:

## Join the community

- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://kubeflow.slack.com/?redir=%2Farchives%2FCH6E58LNP))
- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars).
- View our [community GitHub repository](https://github.com/kserve/community) to learn how to contribute. We are excited to work with you to make KServe better and promote its adoption!


Thanks to all the contributors who have made commits to this release!

The KServe Working Group
(Two binary image files added: the Inference Graph Knative and raw deployment diagrams referenced above. They cannot be displayed in the diff view.)
57 changes: 57 additions & 0 deletions docs/reference/api.md
@@ -524,6 +524,63 @@ Kubernetes core/v1.Affinity
<em>(Optional)</em>
</td>
</tr>

<tr>
<td>
<code>minReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.</p>
</td>
</tr>
<tr>
<td>
<code>maxReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Maximum number of replicas for autoscaling.</p>
</td>
</tr>
<tr>
<td>
<code>scaleTarget</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleTarget specifies the integer target value of the metric type that the autoscaler watches for.
The concurrency and rps targets are supported by the Knative Pod Autoscaler
(<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-targets/">https://knative.dev/docs/serving/autoscaling/autoscaling-targets/</a>).</p>
</td>
</tr>
<tr>
<td>
<code>scaleMetric</code><br/>
<em>
<a href="#serving.kserve.io/v1beta1.ScaleMetric">
ScaleMetric
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleMetric defines the scaling metric type watched by the autoscaler.
Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the
Knative Pod Autoscaler (<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-metrics">https://knative.dev/docs/serving/autoscaling/autoscaling-metrics</a>).</p>
</td>
</tr>


</tbody>
</table>
<h3 id="serving.kserve.io/v1alpha1.InferenceGraphStatus">InferenceGraphStatus
