diff --git a/.github/wordlist.txt b/.github/wordlist.txt index a6627faf..db0f7393 100644 --- a/.github/wordlist.txt +++ b/.github/wordlist.txt @@ -168,3 +168,7 @@ gz xvf IMG mv +appName +src +appType +appVersions diff --git a/docs/getting-started/install.md b/docs/getting-started/install.md index 1d4fed2a..44e096e0 100644 --- a/docs/getting-started/install.md +++ b/docs/getting-started/install.md @@ -1,5 +1,5 @@ Install the latest stable release of the Iter8 CLI as follows. ```shell -go install github.com/iter8-tools/iter8@v0.14 +go install github.com/iter8-tools/iter8@v0.15 ``` \ No newline at end of file diff --git a/docs/getting-started/installghaction.md b/docs/getting-started/installghaction.md deleted file mode 100644 index 0415da4c..00000000 --- a/docs/getting-started/installghaction.md +++ /dev/null @@ -1,7 +0,0 @@ -=== "GitHub Actions" - Install the latest stable release of the Iter8 CLI in your GitHub Actions workflow as follows. - - ```yaml - - name: Install Iter8 - run: GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.14 - ``` diff --git a/docs/getting-started/installgoinstall.md b/docs/getting-started/installgoinstall.md deleted file mode 100644 index 2fcaca69..00000000 --- a/docs/getting-started/installgoinstall.md +++ /dev/null @@ -1,5 +0,0 @@ -Install the latest stable release of the Iter8 CLI as follows. - -```yaml -GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.14 -``` \ No newline at end of file diff --git a/docs/tutorials/abn/abn.md b/docs/tutorials/abn/abn.md index c251ee98..5f31c8e6 100644 --- a/docs/tutorials/abn/abn.md +++ b/docs/tutorials/abn/abn.md @@ -92,7 +92,7 @@ data: EOF ``` -In this definition, each version of the application is composed of a `Service` and a `Deployment`. In the primary version, both are named `backend`. In any candidate version they are named `backend-candidate-1`. Iter8 uses this definition to identify when any of the versions of the application are available. 
It can then respond appropriate to `Lookup()` requests. +In this definition, each version of the application is composed of a `Service` and a `Deployment`. In the primary version, both are named `backend`. In any candidate version they are named `backend-candidate-1`. Iter8 uses this definition to identify when any of the versions of the application are available. It can then respond appropriately to `Lookup()` requests. ## Generate load @@ -116,8 +116,8 @@ kubectl label deployment backend-candidate-1 iter8.tools/watch="true" kubectl expose deployment backend-candidate-1 --name=backend-candidate-1 --port=8091 ``` -Until the candidate version is ready; that is, until all expected resources are deployed and available, calls to `Lookup()` will return only the index 0; the existing version. -Once the candidate version is ready, `Lookup()` will return both indices (0 and 1) so that requests can be distributed across versions. +Until the candidate version is ready (that is, until all expected resources are deployed and available), calls to `Lookup()` will return only the version number `0`, the existing version. +Once the candidate version is ready, `Lookup()` will return both version numbers (`0` and `1`) so that requests can be distributed across versions. ## Compare versions using Grafana @@ -127,24 +127,20 @@ Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port- kubectl port-forward service/grafana 3000:3000 ``` -Open Grafana in a browser: - -```shell -http://localhost:3000/ -``` +Open Grafana in a browser by going to [http://localhost:3000](http://localhost:3000). [Add a JSON API data source](http://localhost:3000/connections/datasources/marcusolsson-json-datasource) `Iter8` with: -- URL `http://iter8.default:8080/metrics` and -- query string `application=default%2Fbackend` +- URL: `http://iter8.default:8080/metrics` +- Query string: `application=default%2Fbackend` -[Create a new dashboard](http://localhost:3000/dashboards) by *import*. 
Do so by pasting the contents of this [JSON definition](https://gist.githubusercontent.com/Alan-Cha/aa4ba259cc4631aafe9b43500502c60f/raw/034249f24e2c524ee4e326e860c06149ae7b2677/gistfile1.txt) into the box and *load* it. Associate it with the JSON API data source defined above. +[Create a new dashboard](http://localhost:3000/dashboards) by *import*. Copy and paste the contents of this [JSON definition](https://gist.githubusercontent.com/Alan-Cha/aa4ba259cc4631aafe9b43500502c60f/raw/034249f24e2c524ee4e326e860c06149ae7b2677/gistfile1.txt) into the text box and *load* it. Associate it with the JSON API data source above. The Iter8 dashboard allows you to compare the behavior of the two versions of the backend component against each other and select a winner. Since user requests are being sent by the load generation script, the values in the report may change over time. The Iter8 dashboard may look like the following: ![A/B dashboard](images/dashboard.png) -Once a winner is identified, the winner can be promoted, and the candidate version deleted. +Once you identify a winner, it can be promoted, and the candidate version deleted. ## Promote candidate version diff --git a/docs/tutorials/deleteiter8controller.md b/docs/tutorials/deleteiter8controller.md index ef94a7d4..527b53bb 100644 --- a/docs/tutorials/deleteiter8controller.md +++ b/docs/tutorials/deleteiter8controller.md @@ -1,13 +1,9 @@ === "Helm" - Delete the Iter8 controller using `helm` as follows. - ```shell helm delete iter8 ``` === "Kustomize" - Delete the Iter8 controller using `kustomize` as follows. 
- === "namespace scoped" ```shell kubectl delete -k 'https://github.com/iter8-tools/iter8.git/kustomize/iter8/namespaceScoped?ref=v0.15.3' diff --git a/docs/tutorials/installiter8controller.md b/docs/tutorials/installiter8controller.md index 5e1915af..158403b4 100644 --- a/docs/tutorials/installiter8controller.md +++ b/docs/tutorials/installiter8controller.md @@ -1,6 +1,4 @@ === "Helm" - Install the Iter8 controller using `helm` as follows. - === "namespace scoped" ```shell helm install --repo https://iter8-tools.github.io/iter8 iter8 traffic @@ -13,8 +11,6 @@ ``` === "Kustomize" - Install the Iter8 controller using `kustomize` as follows. - === "namespace scoped" ```shell kubectl apply -k 'https://github.com/iter8-tools/iter8.git/kustomize/iter8/namespaceScoped?ref=v0.15.3' diff --git a/docs/tutorials/integrations/ghactions.md b/docs/tutorials/integrations/ghactions.md index 3ffcaa32..a88e333a 100644 --- a/docs/tutorials/integrations/ghactions.md +++ b/docs/tutorials/integrations/ghactions.md @@ -8,11 +8,11 @@ There are two ways that you can use Iter8 with GitHub Actions. You can [run Iter # Use Iter8 in a GitHub Actions workflow -Install the latest version of the Iter8 CLI using `iter8-tools/iter8@v0.14`. Once installed, the Iter8 CLI can be used as documented in various tutorials. For example: +Install the latest version of the Iter8 CLI using `iter8-tools/iter8@v0.15`. Once installed, the Iter8 CLI can be used as documented in various tutorials. 
For example: ```yaml linenums="1" - name: Install Iter8 - run: GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.14 + run: GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.15 # Launch an experiment inside Kubernetes # This assumes that your Kubernetes cluster is accessible from the GitHub Actions pipeline diff --git a/docs/tutorials/integrations/kserve-mm/blue-green.md b/docs/tutorials/integrations/kserve-mm/blue-green.md index aa563b1f..b284e61c 100644 --- a/docs/tutorials/integrations/kserve-mm/blue-green.md +++ b/docs/tutorials/integrations/kserve-mm/blue-green.md @@ -4,9 +4,9 @@ template: main.html # Blue-Green Rollout of a ML Model -This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe modelmesh serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring the network to distribute inference requests. +This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe modelmesh serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring routing resources to distribute inference requests. -After a one time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying network configuration. 
+After a one-time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying routing configuration. ![Blue-Green rollout](images/blue-green.png) @@ -14,16 +14,21 @@ In this tutorial, we use the Istio service mesh to distribute inference requests ???+ "Before you begin" 1. Ensure that you have the [kubectl CLI](https://kubernetes.io/docs/reference/kubectl/). - 2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md) environment. + 2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/release-0.11/docs/quickstart.md) environment. If using the Quickstart environment, change your default namespace to `modelmesh-serving`: + ```shell + kubectl config set-context --current --namespace=modelmesh-serving + ``` 3. Install [Istio](https://istio.io). You can install the [demo profile](https://istio.io/latest/docs/setup/getting-started/). -## Install the Iter8 controller +## Install Iter8 --8<-- "docs/tutorials/installiter8controller.md" -## Deploy a primary model +## Initialize primary -Deploy the primary version of a model using an `InferenceService`: +### Application + +Deploy the primary version of the application. In this tutorial, the application is an ML model. 
Initialize the resources for the primary version of the model (`v0`) by deploying an `InferenceService` as follows: ```shell cat < + +## Install Iter8 + +--8<-- "docs/tutorials/installiter8controller.md" + +## Initialize primary + +### Application + +Deploy the primary version of the application. In this tutorial, the application is a KServe model. Initialize the resources for the primary version of the model (`v0`) by deploying an `InferenceService` as follows: + +```shell +cat < + +## Install Iter8 + +--8<-- "docs/tutorials/installiter8controller.md" + +## Initialize primary + +### Application + +Deploy the primary version of the application. In this tutorial, the application is a KServe model. Initialize the resources for the primary version of the model (`v0`) by deploying an `InferenceService` as follows: + +```shell +cat < $MANIFEST +apiVersion: apps/v1 +kind: Deployment +metadata: + name: sleep +spec: + replicas: 1 + selector: + matchLabels: + app: sleep + template: + metadata: + labels: + app: sleep +EOF +if [ "${SERVICE_MESH}" = "istio" ]; then +cat <<EOF >> $MANIFEST + sidecar.istio.io/inject: "true" +EOF +elif [ "${SERVICE_MESH}" = "servicemesh" ]; then +cat <<EOF >> $MANIFEST + annotations: + sidecar.istio.io/inject: "true" +EOF +fi +cat <<EOF >> $MANIFEST + spec: + containers: + - name: sleep + image: curlimages/curl + command: ["/bin/sh", "-c", "sleep 3650d"] + workingDir: /demo + imagePullPolicy: IfNotPresent + volumeMounts: + - name: config-volume + mountPath: /demo + securityContext: + runAsNonRoot: true + runAsUser: 1001040000 + allowPrivilegeEscalation: false + volumes: + - name: config-volume + configMap: + name: demo-input +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: demo-input +data: + input.json: | + { + "inputs": [ + { + "name": "input-0", + "shape": [2, 4], + "datatype": "FP32", + "data": [ + [6.8, 2.8, 4.8, 1.4], + [6.0, 3.4, 4.5, 1.6] + ] + } + ] + } + wisdom.sh: | + curl -H 'Content-Type: application/json' http://wisdom.default -d @input.json 
-s -D - \ + | grep -e HTTP -e app-version + wisdom-test.sh: | + curl -H 'Content-Type: application/json' http://wisdom.default -d @input.json -s -D - \ + -H 'traffic: test' \ + | grep -e HTTP -e app-version +EOF + +kubectl apply -f $MANIFEST +rm -f $MANIFEST diff --git a/samples/modelmesh-serving/sleep.sh b/samples/modelmesh-serving/sleep.sh index e69de29b..c203e792 100644 --- a/samples/modelmesh-serving/sleep.sh +++ b/samples/modelmesh-serving/sleep.sh @@ -0,0 +1,423 @@ +#!/bin/sh + +# To use with Red Hat OpenShift Service Mesh, set +# SERVICE_MESH=servicemesh + +if [ -z ${SERVICE_MESH+x} ]; then + SERVICE_MESH="istio" +fi + +MANIFEST=/tmp/manifest.$$ +cat <<EOF > $MANIFEST +apiVersion: apps/v1 +kind: Deployment +metadata: + name: sleep +spec: + replicas: 1 + selector: + matchLabels: + app: sleep + template: + metadata: + labels: + app: sleep +EOF +if [ "${SERVICE_MESH}" = "istio" ]; then +cat <<EOF >> $MANIFEST + sidecar.istio.io/inject: "true" +EOF +elif [ "${SERVICE_MESH}" = "servicemesh" ]; then +cat <<EOF >> $MANIFEST + annotations: + sidecar.istio.io/inject: "true" +EOF +fi +cat <<EOF >> $MANIFEST + spec: + containers: + - name: sleep + image: fullstorydev/grpcurl:latest-alpine + command: ["/bin/sh", "-c", "sleep 3650d"] + workingDir: /demo + imagePullPolicy: IfNotPresent + volumeMounts: + - name: config-volume + mountPath: /demo + securityContext: + runAsNonRoot: true + runAsUser: 1001040000 + allowPrivilegeEscalation: false + volumes: + - name: config-volume + configMap: + name: demo-input +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: demo-input +data: + kserve.proto: | + syntax = "proto3"; + package inference; + option go_package = "github.com/kserve/modelmesh-serving/fvt/generated;inference"; + + // Inference Server GRPC endpoints. + service GRPCInferenceService + { + // The ServerLive API indicates if the inference server is able to receive + // and respond to metadata and inference requests. 
+ rpc ServerLive(ServerLiveRequest) returns (ServerLiveResponse) {} + + // The ServerReady API indicates if the server is ready for inferencing. + rpc ServerReady(ServerReadyRequest) returns (ServerReadyResponse) {} + + // The ModelReady API indicates if a specific model is ready for inferencing. + rpc ModelReady(ModelReadyRequest) returns (ModelReadyResponse) {} + + // The ServerMetadata API provides information about the server. Errors are + // indicated by the google.rpc.Status returned for the request. The OK code + // indicates success and other codes indicate failure. + rpc ServerMetadata(ServerMetadataRequest) returns (ServerMetadataResponse) {} + + // The per-model metadata API provides information about a model. Errors are + // indicated by the google.rpc.Status returned for the request. The OK code + // indicates success and other codes indicate failure. + rpc ModelMetadata(ModelMetadataRequest) returns (ModelMetadataResponse) {} + + // The ModelInfer API performs inference using the specified model. Errors are + // indicated by the google.rpc.Status returned for the request. The OK code + // indicates success and other codes indicate failure. + rpc ModelInfer(ModelInferRequest) returns (ModelInferResponse) {} + } + + message ServerLiveRequest {} + + message ServerLiveResponse + { + // True if the inference server is live, false if not live. + bool live = 1; + } + + message ServerReadyRequest {} + + message ServerReadyResponse + { + // True if the inference server is ready, false if not ready. + bool ready = 1; + } + + message ModelReadyRequest + { + // The name of the model to check for readiness. + string name = 1; + + // The version of the model to check for readiness. If not given the + // server will choose a version based on the model and internal policy. + string version = 2; + } + + message ModelReadyResponse + { + // True if the model is ready, false if not ready. 
+ bool ready = 1; + } + + message ServerMetadataRequest {} + + message ServerMetadataResponse + { + // The server name. + string name = 1; + + // The server version. + string version = 2; + + // The extensions supported by the server. + repeated string extensions = 3; + } + + message ModelMetadataRequest + { + // The name of the model. + string name = 1; + + // The version of the model to check for readiness. If not given the + // server will choose a version based on the model and internal policy. + string version = 2; + } + + message ModelMetadataResponse + { + // Metadata for a tensor. + message TensorMetadata + { + // The tensor name. + string name = 1; + + // The tensor data type. + string datatype = 2; + + // The tensor shape. A variable-size dimension is represented + // by a -1 value. + repeated int64 shape = 3; + } + + // The model name. + string name = 1; + + // The versions of the model available on the server. + repeated string versions = 2; + + // The model's platform. See Platforms. + string platform = 3; + + // The model's inputs. + repeated TensorMetadata inputs = 4; + + // The model's outputs. + repeated TensorMetadata outputs = 5; + } + + message ModelInferRequest + { + // An input tensor for an inference request. + message InferInputTensor + { + // The tensor name. + string name = 1; + + // The tensor data type. + string datatype = 2; + + // The tensor shape. + repeated int64 shape = 3; + + // Optional inference input tensor parameters. + map<string, InferParameter> parameters = 4; + + // The tensor contents using a data-type format. This field must + // not be specified if "raw" tensor contents are being used for + // the inference request. + InferTensorContents contents = 5; + } + + // An output tensor requested for an inference request. + message InferRequestedOutputTensor + { + // The tensor name. + string name = 1; + + // Optional requested output tensor parameters. + map<string, InferParameter> parameters = 2; + } + + // The name of the model to use for inferencing. 
+ string model_name = 1; + + // The version of the model to use for inference. If not given the + // server will choose a version based on the model and internal policy. + string model_version = 2; + + // Optional identifier for the request. If specified will be + // returned in the response. + string id = 3; + + // Optional inference parameters. + map<string, InferParameter> parameters = 4; + + // The input tensors for the inference. + repeated InferInputTensor inputs = 5; + + // The requested output tensors for the inference. Optional, if not + // specified all outputs produced by the model will be returned. + repeated InferRequestedOutputTensor outputs = 6; + + // The data contained in an input tensor can be represented in "raw" + // bytes form or in the repeated type that matches the tensor's data + // type. To use the raw representation 'raw_input_contents' must be + // initialized with data for each tensor in the same order as + // 'inputs'. For each tensor, the size of this content must match + // what is expected by the tensor's shape and data type. The raw + // data must be the flattened, one-dimensional, row-major order of + // the tensor elements without any stride or padding between the + // elements. Note that the FP16 data type must be represented as raw + // content as there is no specific data type for a 16-bit float + // type. + // + // If this field is specified then InferInputTensor::contents must + // not be specified for any input tensor. + repeated bytes raw_input_contents = 7; + } + + message ModelInferResponse + { + // An output tensor returned for an inference request. + message InferOutputTensor + { + // The tensor name. + string name = 1; + + // The tensor data type. + string datatype = 2; + + // The tensor shape. + repeated int64 shape = 3; + + // Optional output tensor parameters. + map<string, InferParameter> parameters = 4; + + // The tensor contents using a data-type format. 
This field must + // not be specified if "raw" tensor contents are being used for + // the inference response. + InferTensorContents contents = 5; + } + + // The name of the model used for inference. + string model_name = 1; + + // The version of the model used for inference. + string model_version = 2; + + // The id of the inference request if one was specified. + string id = 3; + + // Optional inference response parameters. + map<string, InferParameter> parameters = 4; + + // The output tensors holding inference results. + repeated InferOutputTensor outputs = 5; + + // The data contained in an output tensor can be represented in + // "raw" bytes form or in the repeated type that matches the + // tensor's data type. To use the raw representation 'raw_output_contents' + // must be initialized with data for each tensor in the same order as + // 'outputs'. For each tensor, the size of this content must match + // what is expected by the tensor's shape and data type. The raw + // data must be the flattened, one-dimensional, row-major order of + // the tensor elements without any stride or padding between the + // elements. Note that the FP16 data type must be represented as raw + // content as there is no specific data type for a 16-bit float + // type. + // + // If this field is specified then InferOutputTensor::contents must + // not be specified for any output tensor. + repeated bytes raw_output_contents = 6; + } + + // An inference parameter value. The Parameters message describes a + // “name”/”value” pair, where the “name” is the name of the parameter + // and the “value” is a boolean, integer, or string corresponding to + // the parameter. + message InferParameter + { + // The parameter value can be a string, an int64, a boolean + // or a message specific to a predefined parameter. + oneof parameter_choice + { + // A boolean parameter value. + bool bool_param = 1; + + // An int64 parameter value. + int64 int64_param = 2; + + // A string parameter value. 
+ string string_param = 3; + } + } + + // The data contained in a tensor represented by the repeated type + // that matches the tensor's data type. Protobuf oneof is not used + // because oneofs cannot contain repeated fields. + message InferTensorContents + { + // Representation for BOOL data type. The size must match what is + // expected by the tensor's shape. The contents must be the flattened, + // one-dimensional, row-major order of the tensor elements. + repeated bool bool_contents = 1; + + // Representation for INT8, INT16, and INT32 data types. The size + // must match what is expected by the tensor's shape. The contents + // must be the flattened, one-dimensional, row-major order of the + // tensor elements. + repeated int32 int_contents = 2; + + // Representation for INT64 data types. The size must match what + // is expected by the tensor's shape. The contents must be the + // flattened, one-dimensional, row-major order of the tensor elements. + repeated int64 int64_contents = 3; + + // Representation for UINT8, UINT16, and UINT32 data types. The size + // must match what is expected by the tensor's shape. The contents + // must be the flattened, one-dimensional, row-major order of the + // tensor elements. + repeated uint32 uint_contents = 4; + + // Representation for UINT64 data types. The size must match what + // is expected by the tensor's shape. The contents must be the + // flattened, one-dimensional, row-major order of the tensor elements. + repeated uint64 uint64_contents = 5; + + // Representation for FP32 data type. The size must match what is + // expected by the tensor's shape. The contents must be the flattened, + // one-dimensional, row-major order of the tensor elements. + repeated float fp32_contents = 6; + + // Representation for FP64 data type. The size must match what is + // expected by the tensor's shape. The contents must be the flattened, + // one-dimensional, row-major order of the tensor elements. 
+ repeated double fp64_contents = 7; + + // Representation for BYTES data type. The size must match what is + // expected by the tensor's shape. The contents must be the flattened, + // one-dimensional, row-major order of the tensor elements. + repeated bytes bytes_contents = 8; + } + grpc_input.json: | + { + "inputs": [ + { + "name": "predict", + "shape": [1, 64], + "datatype": "FP32", + "contents": { + "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] + } + } + ] + } + wisdom.sh: | + cat grpc_input.json | \ + grpcurl -vv -plaintext -proto kserve.proto -d @ \ + -authority wisdom.modelmesh-serving \ + modelmesh-serving.modelmesh-serving:8033 \ + inference.GRPCInferenceService.ModelInfer \ + | grep -e app-version + wisdom-test.sh: | + cat grpc_input.json | \ + grpcurl -vv -plaintext -proto kserve.proto -d @ \ + -authority wisdom.modelmesh-serving \ + -H 'traffic: test' \ + modelmesh-serving.modelmesh-serving:8033 \ + inference.GRPCInferenceService.ModelInfer \ + | grep -e app-version + lightspeed.sh: | + cat grpc_input.json | \ + grpcurl -vv -plaintext -proto kserve.proto -d @ \ + -authority lightspeed.modelmesh-serving \ + modelmesh-serving.modelmesh-serving:8033 \ + inference.GRPCInferenceService.ModelInfer \ + | grep -e app-version + lightspeed-test.sh: | + cat grpc_input.json | \ + grpcurl -vv -plaintext -proto kserve.proto -d @ \ + -authority lightspeed.modelmesh-serving \ + -H 'traffic: test' \ + modelmesh-serving.modelmesh-serving:8033 \ + inference.GRPCInferenceService.ModelInfer \ + | grep -e app-version +EOF + +kubectl apply -f $MANIFEST +rm -f $MANIFEST