Merge changes from master to release-0.13 branch (#3698)
* upgrade vllm/transformers version (#3671)

upgrade vllm version

Signed-off-by: Johnu George <johnugeorge109@gmail.com>

* Add openai models endpoint (#3666)

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
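
For orientation, the OpenAI "list models" response has a simple shape: a list object wrapping one entry per model. The sketch below builds that shape for a set of model names. The function name `list_models_response` and the `owned_by` placeholder value are assumptions for illustration and are not taken from the KServe code in #3666.

```python
import time
from typing import Any, Dict, List, Optional


def list_models_response(
    model_names: List[str], created: Optional[int] = None
) -> Dict[str, Any]:
    """Build an OpenAI-style 'list models' payload (hypothetical helper)."""
    # Use the current Unix time unless the caller pins a timestamp.
    created = int(time.time()) if created is None else created
    return {
        "object": "list",
        "data": [
            {
                "id": name,
                "object": "model",
                "created": created,
                # Placeholder owner; the real server may report something else.
                "owned_by": "kserve",
            }
            for name in model_names
        ],
    }


if __name__ == "__main__":
    print(list_models_response(["flower-sample"], created=0))
```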

* feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 (#3603)

* feat: Support customizable deployment strategy for RawDeployment mode

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* regen

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* lint

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Correctly apply rollingupdate

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* address comments

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Add validation

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Enable dtype support for huggingface server (#3613)

* Enable dtype for huggingface server

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Set float16 as default. Fixup linter

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Add small comment to make the changes understandable

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Fixup linter

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Adapt to new huggingfacemodel

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Fixup merge :)

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Explicitly mention the behaviour of dtype flag on auto.

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Default to FP32 for encoder models

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Fixup linter

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Update poetry

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Use torch.float32 for tests explicitly

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

---------

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
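
The dtype commits above describe a default that depends on the device and the model type. A minimal sketch of one reading of those messages follows. The function name `resolve_dtype` and its parameters are hypothetical and are not taken from the huggingfaceserver code; dtypes are plain strings so the example runs without torch.

```python
from typing import Optional


def resolve_dtype(requested: Optional[str], has_gpu: bool, is_encoder: bool) -> str:
    """Pick a dtype name; hypothetical helper, not the KServe implementation."""
    # An explicit, non-"auto" request is passed through unchanged.
    if requested is not None and requested != "auto":
        return requested
    # Assumption from the commit messages: encoder models default to FP32.
    if is_encoder:
        return "float32"
    # Assumption from the commit messages: FP16 on GPU, FP32 on CPU.
    return "float16" if has_gpu else "float32"


if __name__ == "__main__":
    print(resolve_dtype("auto", has_gpu=True, is_encoder=False))
    print(resolve_dtype(None, has_gpu=False, is_encoder=False))
    print(resolve_dtype("bfloat16", has_gpu=True, is_encoder=True))
```

Whether the encoder rule should take precedence over the GPU rule is a guess here; the commit messages do not state the order.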

* Add method for checking model health/readiness (#3673)

Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>

* fix for extract zip from gcs (#3510)

* fix for extract zip from gcs

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* initial commit for gcs model download unittests

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* unittests for model download from gcs

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* black format fix

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* code verification

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

---------

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* Update Dockerfile and Readme (#3676)

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Update huggingface readme (#3678)

* update wording for huggingface README

small update to make readme easier to understand

Signed-off-by: Alexa Griffith  <agriffith50@bloomberg.net>

* Update README.md

Signed-off-by: Alexa Griffith agriffith50@bloomberg.net

* Update python/huggingfaceserver/README.md

Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Signed-off-by: Alexa Griffith  <agriffith50@bloomberg.net>

* update vllm

Signed-off-by: alexagriffith <agriffith50@bloomberg.net>

* Update README.md

---------

Signed-off-by: Alexa Griffith  <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

* fix: HPA equality check should include annotations (#3650)

* fix: HPA equality check should include annotations

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Only watch related autoscalerclass annotation

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* simplify

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Add missing delete action

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* fix logic

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
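
The idea behind this fix, comparing the HPA spec and also the autoscaler-class annotation while ignoring unrelated annotations, can be sketched as follows. This is an illustration in Python over plain dicts, not the controller's Go code; the helper name `hpa_semantically_equal` is hypothetical, and the annotation key is assumed to be `serving.kserve.io/autoscalerClass`.

```python
from typing import Any, Dict, Optional

# Assumed annotation key that selects the autoscaler class.
AUTOSCALER_CLASS_ANNOTATION = "serving.kserve.io/autoscalerClass"


def _watched_annotation(obj: Dict[str, Any]) -> Optional[str]:
    """Return only the annotation the comparison cares about, if present."""
    annotations = obj.get("metadata", {}).get("annotations") or {}
    return annotations.get(AUTOSCALER_CLASS_ANNOTATION)


def hpa_semantically_equal(desired: Dict[str, Any], existing: Dict[str, Any]) -> bool:
    """True when the specs match and the watched annotation matches."""
    same_spec = desired.get("spec") == existing.get("spec")
    same_class = _watched_annotation(desired) == _watched_annotation(existing)
    return same_spec and same_class


if __name__ == "__main__":
    spec = {"minReplicas": 1, "maxReplicas": 3}
    a = {"metadata": {"annotations": {AUTOSCALER_CLASS_ANNOTATION: "hpa"}}, "spec": spec}
    b = {"metadata": {"annotations": {AUTOSCALER_CLASS_ANNOTATION: "external"}}, "spec": spec}
    print(hpa_semantically_equal(a, b))
```

Restricting the check to one annotation keeps unrelated metadata changes from triggering an update.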

* Fix:  huggingface runtime in helm chart (#3679)

fix huggingface runtime in chart

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Fix: model id and model dir check order (#3680)

* fix huggingface runtime in chart

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Allow model_dir to be specified on template

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Default model_dir to /mnt/models for HF

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Lint format

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Fix:vLLM Model Supported check throwing circular dependency (#3688)

* Fix:vLLM Model Supported check throwing circular dependency

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* remove unwanted comments

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* remove unwanted comments

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* fix return case

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* fix to check all arch in model config for vllm support

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* fixlint

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

---------

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Fix: Allow null in Finish reason streaming response in vLLM (#3684)

Fix: allow null in Finish reason

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
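
In a streamed completion, intermediate chunks typically have no finish reason yet, so the field has to accept null. A minimal sketch of that idea follows, using a stdlib dataclass. The class `StreamChoice`, its fields, and the helper `to_sse_data` are hypothetical, loosely modelled on OpenAI-style chunks, and are not the types changed in #3684.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StreamChoice:
    """One choice inside a streamed chunk (hypothetical type)."""

    index: int
    text: str
    # None until the final chunk; serialized as JSON null.
    finish_reason: Optional[str] = None


def to_sse_data(choice: StreamChoice) -> str:
    """Render a choice as a server-sent-event data line."""
    return "data: " + json.dumps(asdict(choice))


if __name__ == "__main__":
    print(to_sse_data(StreamChoice(index=0, text="Hel")))
    print(to_sse_data(StreamChoice(index=0, text="lo", finish_reason="stop")))
```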

---------

Signed-off-by: Johnu George <johnugeorge109@gmail.com>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Alexa Griffith  <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Curtis Maddalozzo <cmaddalozzo@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Datta Nimmaturi <39181234+Datta0@users.noreply.github.com>
Co-authored-by: Andrews Arokiam <87992092+andyi2it@users.noreply.github.com>
Co-authored-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Co-authored-by: Alexa Griffith <agriffith50@bloomberg.net>
Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
9 people committed May 18, 2024
1 parent bfc2e21 commit 16d391b
Showing 57 changed files with 2,360 additions and 1,096 deletions.
4 changes: 2 additions & 2 deletions charts/kserve-resources/templates/clusterservingruntimes.yaml
@@ -359,13 +359,13 @@ spec:
autoSelect: true
priority: 1
protocolVersions:
- v1
- v2
containers:
- name: kserve-container
image: "{{ .Values.kserve.servingruntime.huggingfaceserver.image }}:{{ .Values.kserve.servingruntime.huggingfaceserver.tag }}"
args:
- --model_id={{.Name}}
- --model_dir=/mnt/models
- --model_name={{ .Values.kserve.servingruntime.modelNamePlaceholder }}
resources:
requests:
cpu: "1"
54 changes: 54 additions & 0 deletions config/crd/full/serving.kserve.io_inferenceservices.yaml
@@ -1688,6 +1688,24 @@ spec:
- name
type: object
type: array
deploymentStrategy:
properties:
rollingUpdate:
properties:
maxSurge:
anyOf:
- type: integer
- type: string
x-kubernetes-int-or-string: true
maxUnavailable:
anyOf:
- type: integer
- type: string
x-kubernetes-int-or-string: true
type: object
type:
type: string
type: object
dnsConfig:
properties:
nameservers:
@@ -4333,6 +4351,24 @@ spec:
- name
type: object
type: array
deploymentStrategy:
properties:
rollingUpdate:
properties:
maxSurge:
anyOf:
- type: integer
- type: string
x-kubernetes-int-or-string: true
maxUnavailable:
anyOf:
- type: integer
- type: string
x-kubernetes-int-or-string: true
type: object
type:
type: string
type: object
dnsConfig:
properties:
nameservers:
@@ -13886,6 +13922,24 @@ spec:
- name
type: object
type: array
deploymentStrategy:
properties:
rollingUpdate:
properties:
maxSurge:
anyOf:
- type: integer
- type: string
x-kubernetes-int-or-string: true
maxUnavailable:
anyOf:
- type: integer
- type: string
x-kubernetes-int-or-string: true
type: object
type:
type: string
type: object
dnsConfig:
properties:
nameservers:
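
The schema added in the hunks above accepts a `type` string plus `rollingUpdate.maxSurge` and `rollingUpdate.maxUnavailable`, each an integer or a string. A minimal sketch of a manifest that uses the field follows, built as a plain Python dict. Placing `deploymentStrategy` under `spec.predictor`, the `RollingUpdate` type value, and the helper name `build_isvc` are assumptions for illustration; the storage URI is the one used in the sample notebook.

```python
from typing import Any, Dict, Union

IntOrString = Union[int, str]


def build_isvc(
    name: str, storage_uri: str, max_surge: IntOrString, max_unavailable: IntOrString
) -> Dict[str, Any]:
    """Build an InferenceService manifest dict (hypothetical helper)."""
    for value in (max_surge, max_unavailable):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError("maxSurge/maxUnavailable must be int or str")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            # The strategy applies to RawDeployment mode per the commit title.
            "annotations": {"serving.kserve.io/deploymentMode": "RawDeployment"},
        },
        "spec": {
            "predictor": {
                # Assumed placement of the new field.
                "deploymentStrategy": {
                    "type": "RollingUpdate",
                    "rollingUpdate": {
                        "maxSurge": max_surge,
                        "maxUnavailable": max_unavailable,
                    },
                },
                "model": {
                    "modelFormat": {"name": "tensorflow"},
                    "storageUri": storage_uri,
                },
            }
        },
    }


if __name__ == "__main__":
    isvc = build_isvc(
        "flower-sample", "gs://kfserving-examples/models/tensorflow/flowers", "25%", 0
    )
    print(isvc["spec"]["predictor"]["deploymentStrategy"])
```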
62 changes: 36 additions & 26 deletions docs/samples/client/kfserving_sdk_v1beta1_sample.ipynb
@@ -22,7 +22,7 @@
"metadata": {},
"outputs": [],
"source": [
"from kubernetes import client \n",
"from kubernetes import client\n",
"from kfserving import KFServingClient\n",
"from kfserving import constants\n",
"from kfserving import utils\n",
@@ -45,8 +45,8 @@
"metadata": {},
"outputs": [],
"source": [
"#namespace = utils.get_default_target_namespace()\n",
"namespace = 'kfserving-test'"
"# namespace = utils.get_default_target_namespace()\n",
"namespace = \"kfserving-test\""
]
},
{
@@ -69,16 +69,21 @@
"metadata": {},
"outputs": [],
"source": [
"api_version = constants.KFSERVING_GROUP + '/' + kfserving_version\n",
"api_version = constants.KFSERVING_GROUP + \"/\" + kfserving_version\n",
"\n",
"isvc = V1beta1InferenceService(api_version=api_version,\n",
" kind=constants.KFSERVING_KIND,\n",
" metadata=client.V1ObjectMeta(\n",
" name='flower-sample', namespace=namespace),\n",
" spec=V1beta1InferenceServiceSpec(\n",
" predictor=V1beta1PredictorSpec(\n",
" tensorflow=(V1beta1TFServingSpec(\n",
" storage_uri='gs://kfserving-examples/models/tensorflow/flowers'))))\n",
"isvc = V1beta1InferenceService(\n",
" api_version=api_version,\n",
" kind=constants.KFSERVING_KIND,\n",
" metadata=client.V1ObjectMeta(name=\"flower-sample\", namespace=namespace),\n",
" spec=V1beta1InferenceServiceSpec(\n",
" predictor=V1beta1PredictorSpec(\n",
" tensorflow=(\n",
" V1beta1TFServingSpec(\n",
" storage_uri=\"gs://kfserving-examples/models/tensorflow/flowers\"\n",
" )\n",
" )\n",
" )\n",
" ),\n",
")"
]
},
@@ -152,7 +157,7 @@
}
],
"source": [
"KFServing.get('flower-sample', namespace=namespace, watch=True, timeout_seconds=120)"
"KFServing.get(\"flower-sample\", namespace=namespace, watch=True, timeout_seconds=120)"
]
},
{
@@ -223,18 +228,23 @@
}
],
"source": [
"isvc = V1beta1InferenceService(api_version=api_version,\n",
" kind=constants.KFSERVING_KIND,\n",
" metadata=client.V1ObjectMeta(\n",
" name='flower-sample', namespace=namespace),\n",
" spec=V1beta1InferenceServiceSpec(\n",
" predictor=V1beta1PredictorSpec(\n",
" canary_traffic_percent=20,\n",
" tensorflow=(V1beta1TFServingSpec(\n",
" storage_uri='gs://kfserving-examples/models/tensorflow/flowers-2'))))\n",
"isvc = V1beta1InferenceService(\n",
" api_version=api_version,\n",
" kind=constants.KFSERVING_KIND,\n",
" metadata=client.V1ObjectMeta(name=\"flower-sample\", namespace=namespace),\n",
" spec=V1beta1InferenceServiceSpec(\n",
" predictor=V1beta1PredictorSpec(\n",
" canary_traffic_percent=20,\n",
" tensorflow=(\n",
" V1beta1TFServingSpec(\n",
" storage_uri=\"gs://kfserving-examples/models/tensorflow/flowers-2\"\n",
" )\n",
" ),\n",
" )\n",
" ),\n",
")\n",
"\n",
"KFServing.patch('flower-sample', isvc, namespace=namespace)"
"KFServing.patch(\"flower-sample\", isvc, namespace=namespace)"
]
},
{
@@ -250,7 +260,7 @@
"metadata": {},
"outputs": [],
"source": [
"KFServing.wait_isvc_ready('flower-sample', namespace=namespace)"
"KFServing.wait_isvc_ready(\"flower-sample\", namespace=namespace)"
]
},
{
@@ -268,7 +278,7 @@
}
],
"source": [
"KFServing.get('flower-sample', namespace=namespace, watch=True)"
"KFServing.get(\"flower-sample\", namespace=namespace, watch=True)"
]
},
{
@@ -313,7 +323,7 @@
}
],
"source": [
"KFServing.delete('flower-sample', namespace=namespace)"
"KFServing.delete(\"flower-sample\", namespace=namespace)"
]
},
{