Merge changes from master to release-0.13 branch (#3698)

* upgrade vllm/transformers version (#3671)
  upgrade vllm version
  Signed-off-by: Johnu George <johnugeorge109@gmail.com>

* Add openai models endpoint (#3666)
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>

* feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 (#3603)
  * feat: Support customizable deployment strategy for RawDeployment mode
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * regen
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * lint
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Correctly apply rollingupdate
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * address comments
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Add validation
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  ---------
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Enable dtype support for huggingface server (#3613)
  * Enable dtype for huggingface server
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Set float16 as default. Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Add small comment to make the changes understandable
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Adapt to new huggingfacemodel
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup merge :)
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Explicitly mention the behaviour of dtype flag on auto.
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Default to FP32 for encoder models
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Update poetry
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Use torch.float32 for tests explicitly
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  ---------
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Add method for checking model health/readiness (#3673)
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>

* fix for extract zip from gcs (#3510)
  * fix for extract zip from gcs
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * initial commit for gcs model download unittests
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * unittests for model download from gcs
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * black format fix
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * code verification
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  ---------
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* Update Dockerfile and Readme (#3676)
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Update huggingface readme (#3678)
  * update wording for huggingface README
    small update to make readme easier to understand
    Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  * Update README.md
    Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
  * Update python/huggingfaceserver/README.md
    Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
    Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  * update vllm
    Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
  * Update README.md
  ---------
  Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
  Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
  Co-authored-by: Dan Sun <dsun20@bloomberg.net>

* fix: HPA equality check should include annotations (#3650)
  * fix: HPA equality check should include annotations
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Only watch related autoscalerclass annotation
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * simplify
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Add missing delete action
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * fix logic
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  ---------
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Fix: huggingface runtime in helm chart (#3679)
  fix huggingface runtime in chart
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Fix: model id and model dir check order (#3680)
  * fix huggingface runtime in chart
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Allow model_dir to be specified on template
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Default model_dir to /mnt/models for HF
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Lint format
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  ---------
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Fix:vLLM Model Supported check throwing circular dependency (#3688)
  * Fix:vLLM Model Supported check throwing circular dependency
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * remove unwanted comments
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * remove unwanted comments
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fix return case
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fix to check all arch in model config for vllm support
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fixlint
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  ---------
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Fix: Allow null in Finish reason streaming response in vLLM (#3684)
  Fix: allow null in Finish reason
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

---------

Signed-off-by: Johnu George <johnugeorge109@gmail.com>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Curtis Maddalozzo <cmaddalozzo@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Datta Nimmaturi <39181234+Datta0@users.noreply.github.com>
Co-authored-by: Andrews Arokiam <87992092+andyi2it@users.noreply.github.com>
Co-authored-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Co-authored-by: Alexa Griffith <agriffith50@bloomberg.net>
Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
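
The dtype entries in the commit message above (#3613) describe a device-dependent default: FP16 on GPU and FP32 on CPU when no explicit dtype is given. A minimal sketch of that rule follows. The function name `resolve_dtype`, the `"auto"` keyword handling, and the list of accepted values are assumptions made for illustration; this is not the huggingfaceserver implementation, and it returns plain strings rather than `torch` dtypes so it runs without PyTorch installed.

```python
# Hypothetical helper illustrating "Use FP16 for GPU and FP32 for CPU".
# The accepted values below are an assumption, not taken from the commit.
SUPPORTED_DTYPES = ("auto", "float16", "float32", "bfloat16")


def resolve_dtype(requested: str, gpu_available: bool) -> str:
    """Return the dtype name to load the model with.

    An explicit request is honoured as-is; "auto" falls back to a
    device-dependent default (float16 on GPU, float32 on CPU).
    """
    if requested not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {requested!r}")
    if requested != "auto":
        return requested
    return "float16" if gpu_available else "float32"


if __name__ == "__main__":
    print(resolve_dtype("auto", gpu_available=True))
    print(resolve_dtype("auto", gpu_available=False))
    print(resolve_dtype("bfloat16", gpu_available=False))
```

In a real server the `gpu_available` flag would typically come from something like `torch.cuda.is_available()`, and the commit message also mentions a separate FP32 default for encoder models, which this sketch does not model.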
kserve · May 18, 2024 · 16d391b
1 parent bfc2e21
commit 16d391b
Showing 57 changed files with 2,360 additions and 1,096 deletions.
diff --git a/charts/kserve-resources/templates/clusterservingruntimes.yaml b/charts/kserve-resources/templates/clusterservingruntimes.yaml
@@ -359,13 +359,13 @@ spec:
       autoSelect: true
       priority: 1
   protocolVersions:
+    - v1
     - v2
   containers:
     - name: kserve-container
       image: "{{ .Values.kserve.servingruntime.huggingfaceserver.image }}:{{ .Values.kserve.servingruntime.huggingfaceserver.tag }}"
       args:
-        - --model_id={{.Name}}
-        - --model_dir=/mnt/models
+        - --model_name={{ .Values.kserve.servingruntime.modelNamePlaceholder }}
       resources:
         requests:
           cpu: "1"
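
The hunk above drops the explicit `--model_id` and `--model_dir` arguments from the runtime template and passes only `--model_name`; the commit message (#3680) notes that `model_dir` defaults to `/mnt/models` for HF. The sketch below shows how an entry point could accept these flags with that default, using the standard-library `argparse`. The parser itself is hypothetical: only the flag names and the default path come from this commit, and marking `--model_name` as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; it only mirrors the flag names seen in the diff.
    parser = argparse.ArgumentParser(description="illustrative runtime flags")
    parser.add_argument("--model_name", required=True,
                        help="name the model is served under")
    # Default taken from the commit message: "Default model_dir to /mnt/models for HF".
    parser.add_argument("--model_dir", default="/mnt/models",
                        help="local directory holding the model files")
    parser.add_argument("--model_id", default=None,
                        help="optional model identifier; unset by default")
    return parser


# With only --model_name supplied (as in the updated template),
# model_dir falls back to its default and model_id stays unset.
args = build_parser().parse_args(["--model_name=my-model"])
print(args.model_name, args.model_dir, args.model_id)
```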

diff --git a/config/crd/full/serving.kserve.io_inferenceservices.yaml b/config/crd/full/serving.kserve.io_inferenceservices.yaml
@@ -1688,6 +1688,24 @@ spec:
                           - name
                         type: object
                       type: array
+                    deploymentStrategy:
+                      properties:
+                        rollingUpdate:
+                          properties:
+                            maxSurge:
+                              anyOf:
+                                - type: integer
+                                - type: string
+                              x-kubernetes-int-or-string: true
+                            maxUnavailable:
+                              anyOf:
+                                - type: integer
+                                - type: string
+                              x-kubernetes-int-or-string: true
+                          type: object
+                        type:
+                          type: string
+                      type: object
                     dnsConfig:
                       properties:
                         nameservers:
@@ -4333,6 +4351,24 @@ spec:
                           - name
                         type: object
                       type: array
+                    deploymentStrategy:
+                      properties:
+                        rollingUpdate:
+                          properties:
+                            maxSurge:
+                              anyOf:
+                                - type: integer
+                                - type: string
+                              x-kubernetes-int-or-string: true
+                            maxUnavailable:
+                              anyOf:
+                                - type: integer
+                                - type: string
+                              x-kubernetes-int-or-string: true
+                          type: object
+                        type:
+                          type: string
+                      type: object
                     dnsConfig:
                       properties:
                         nameservers:
@@ -13886,6 +13922,24 @@ spec:
                           - name
                         type: object
                       type: array
+                    deploymentStrategy:
+                      properties:
+                        rollingUpdate:
+                          properties:
+                            maxSurge:
+                              anyOf:
+                                - type: integer
+                                - type: string
+                              x-kubernetes-int-or-string: true
+                            maxUnavailable:
+                              anyOf:
+                                - type: integer
+                                - type: string
+                              x-kubernetes-int-or-string: true
+                          type: object
+                        type:
+                          type: string
+                      type: object
                     dnsConfig:
                       properties:
                         nameservers:
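
The three hunks above add the same `deploymentStrategy` schema: a string `type` plus a `rollingUpdate` object whose `maxSurge` and `maxUnavailable` accept either an integer or a string. A small sketch of building a matching fragment as a Python dict follows. The helper name `rolling_update_strategy` is hypothetical, the value `"RollingUpdate"` follows the Kubernetes Deployment strategy naming rather than anything stated in this diff, and placing the field under a `predictor` component is an assumption.

```python
import json
from typing import Union

IntOrString = Union[int, str]


def rolling_update_strategy(max_surge: IntOrString,
                            max_unavailable: IntOrString) -> dict:
    """Build a deploymentStrategy fragment matching the schema in the diff."""
    for name, value in (("maxSurge", max_surge),
                        ("maxUnavailable", max_unavailable)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(
                f"{name} must be an int or a string, got {type(value).__name__}"
            )
    return {
        "type": "RollingUpdate",  # assumed value, per Kubernetes Deployment naming
        "rollingUpdate": {
            "maxSurge": max_surge,
            "maxUnavailable": max_unavailable,
        },
    }


# Assumed placement: under a component spec such as the predictor.
predictor = {"deploymentStrategy": rolling_update_strategy("25%", 0)}
print(json.dumps(predictor, indent=2))
```

The type check mirrors only the `x-kubernetes-int-or-string` constraint; any further validation (for example, that strings are percentages) would happen server-side and is not modelled here.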

diff --git a/docs/samples/client/kfserving_sdk_v1beta1_sample.ipynb b/docs/samples/client/kfserving_sdk_v1beta1_sample.ipynb
@@ -22,7 +22,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from kubernetes import client \n",
+    "from kubernetes import client\n",
     "from kfserving import KFServingClient\n",
     "from kfserving import constants\n",
     "from kfserving import utils\n",
@@ -45,8 +45,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "#namespace = utils.get_default_target_namespace()\n",
-    "namespace = 'kfserving-test'"
+    "# namespace = utils.get_default_target_namespace()\n",
+    "namespace = \"kfserving-test\""
    ]
   },
   {
@@ -69,16 +69,21 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "api_version = constants.KFSERVING_GROUP + '/' + kfserving_version\n",
+    "api_version = constants.KFSERVING_GROUP + \"/\" + kfserving_version\n",
     "\n",
-    "isvc = V1beta1InferenceService(api_version=api_version,\n",
-    "                               kind=constants.KFSERVING_KIND,\n",
-    "                               metadata=client.V1ObjectMeta(\n",
-    "                                   name='flower-sample', namespace=namespace),\n",
-    "                               spec=V1beta1InferenceServiceSpec(\n",
-    "                               predictor=V1beta1PredictorSpec(\n",
-    "                               tensorflow=(V1beta1TFServingSpec(\n",
-    "                                   storage_uri='gs://kfserving-examples/models/tensorflow/flowers'))))\n",
+    "isvc = V1beta1InferenceService(\n",
+    "    api_version=api_version,\n",
+    "    kind=constants.KFSERVING_KIND,\n",
+    "    metadata=client.V1ObjectMeta(name=\"flower-sample\", namespace=namespace),\n",
+    "    spec=V1beta1InferenceServiceSpec(\n",
+    "        predictor=V1beta1PredictorSpec(\n",
+    "            tensorflow=(\n",
+    "                V1beta1TFServingSpec(\n",
+    "                    storage_uri=\"gs://kfserving-examples/models/tensorflow/flowers\"\n",
+    "                )\n",
+    "            )\n",
+    "        )\n",
+    "    ),\n",
     ")"
    ]
   },
@@ -152,7 +157,7 @@
     }
    ],
    "source": [
-    "KFServing.get('flower-sample', namespace=namespace, watch=True, timeout_seconds=120)"
+    "KFServing.get(\"flower-sample\", namespace=namespace, watch=True, timeout_seconds=120)"
    ]
   },
   {
@@ -223,18 +228,23 @@
     }
    ],
    "source": [
-    "isvc = V1beta1InferenceService(api_version=api_version,\n",
-    "                               kind=constants.KFSERVING_KIND,\n",
-    "                               metadata=client.V1ObjectMeta(\n",
-    "                                   name='flower-sample', namespace=namespace),\n",
-    "                               spec=V1beta1InferenceServiceSpec(\n",
-    "                               predictor=V1beta1PredictorSpec(\n",
-    "                                   canary_traffic_percent=20,\n",
-    "                                   tensorflow=(V1beta1TFServingSpec(\n",
-    "                                       storage_uri='gs://kfserving-examples/models/tensorflow/flowers-2'))))\n",
+    "isvc = V1beta1InferenceService(\n",
+    "    api_version=api_version,\n",
+    "    kind=constants.KFSERVING_KIND,\n",
+    "    metadata=client.V1ObjectMeta(name=\"flower-sample\", namespace=namespace),\n",
+    "    spec=V1beta1InferenceServiceSpec(\n",
+    "        predictor=V1beta1PredictorSpec(\n",
+    "            canary_traffic_percent=20,\n",
+    "            tensorflow=(\n",
+    "                V1beta1TFServingSpec(\n",
+    "                    storage_uri=\"gs://kfserving-examples/models/tensorflow/flowers-2\"\n",
+    "                )\n",
+    "            ),\n",
+    "        )\n",
+    "    ),\n",
     ")\n",
     "\n",
-    "KFServing.patch('flower-sample', isvc, namespace=namespace)"
+    "KFServing.patch(\"flower-sample\", isvc, namespace=namespace)"
    ]
   },
   {
@@ -250,7 +260,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "KFServing.wait_isvc_ready('flower-sample', namespace=namespace)"
+    "KFServing.wait_isvc_ready(\"flower-sample\", namespace=namespace)"
    ]
   },
   {
@@ -268,7 +278,7 @@
     }
    ],
    "source": [
-    "KFServing.get('flower-sample', namespace=namespace, watch=True)"
+    "KFServing.get(\"flower-sample\", namespace=namespace, watch=True)"
    ]
   },
   {
@@ -313,7 +323,7 @@
     }
    ],
    "source": [
-    "KFServing.delete('flower-sample', namespace=namespace)"
+    "KFServing.delete(\"flower-sample\", namespace=namespace)"
    ]
   },
   {