Merged
4 changes: 4 additions & 0 deletions .github/wordlist.txt
Original file line number Diff line number Diff line change
@@ -168,3 +168,7 @@ gz
xvf
IMG
mv
appName
src
appType
appVersions
2 changes: 1 addition & 1 deletion docs/getting-started/install.md
@@ -1,5 +1,5 @@
Install the latest stable release of the Iter8 CLI as follows.

```shell
go install github.com/iter8-tools/iter8@v0.14
go install github.com/iter8-tools/iter8@v0.15
```
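To make future upgrades a one-line change, the release can be kept in a variable; a minimal sketch (the version string mirrors the command above, and `INSTALL_CMD` is an illustrative name, not part of Iter8):

```shell
# Sketch: pin the release in one place; bumping ITER8_VERSION
# (e.g. v0.14 -> v0.15) is then a single-line change.
ITER8_VERSION="v0.15"
INSTALL_CMD="go install github.com/iter8-tools/iter8@${ITER8_VERSION}"
echo "${INSTALL_CMD}"
```

Running the echoed command installs the CLI into `$GOBIN` (or `$GOPATH/bin` if `GOBIN` is unset).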
7 changes: 0 additions & 7 deletions docs/getting-started/installghaction.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/getting-started/installgoinstall.md

This file was deleted.

20 changes: 8 additions & 12 deletions docs/tutorials/abn/abn.md
@@ -92,7 +92,7 @@ data:
EOF
```

In this definition, each version of the application is composed of a `Service` and a `Deployment`. In the primary version, both are named `backend`. In any candidate version they are named `backend-candidate-1`. Iter8 uses this definition to identify when any of the versions of the application are available. It can then respond appropriate to `Lookup()` requests.
In this definition, each version of the application is composed of a `Service` and a `Deployment`. In the primary version, both are named `backend`. In any candidate version they are named `backend-candidate-1`. Iter8 uses this definition to identify when any of the versions of the application are available. It can then respond appropriately to `Lookup()` requests.

## Generate load

@@ -116,8 +116,8 @@ kubectl label deployment backend-candidate-1 iter8.tools/watch="true"
kubectl expose deployment backend-candidate-1 --name=backend-candidate-1 --port=8091
```

Until the candidate version is ready; that is, until all expected resources are deployed and available, calls to `Lookup()` will return only the index 0; the existing version.
Once the candidate version is ready, `Lookup()` will return both indices (0 and 1) so that requests can be distributed across versions.
Until the candidate version is ready (that is, until all expected resources are deployed and available), calls to `Lookup()` will return only the version number `0`, the existing version.
Once the candidate version is ready, `Lookup()` will return both version numbers (`0` and `1`) so that requests can be distributed across versions.
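A hypothetical sketch of how a frontend might act on the version number returned by `Lookup()`; the `Service` names match those created above, while the `Lookup()` call itself is elided and its result hard-coded for illustration:

```shell
# Hypothetical: map the version number returned by Lookup() to a backend Service.
# VERSION_NUMBER stands in for the value a real frontend would get from the SDK.
VERSION_NUMBER=1
case "${VERSION_NUMBER}" in
  0) TARGET="backend" ;;
  1) TARGET="backend-candidate-1" ;;
esac
echo "routing request to ${TARGET}:8091"
```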

## Compare versions using Grafana

@@ -127,24 +127,20 @@ Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-
kubectl port-forward service/grafana 3000:3000
```

Open Grafana in a browser:

```shell
http://localhost:3000/
```
Open Grafana in a browser by going to [http://localhost:3000](http://localhost:3000).

[Add a JSON API data source](http://localhost:3000/connections/datasources/marcusolsson-json-datasource) `Iter8` with:

- URL `http://iter8.default:8080/metrics` and
- query string `application=default%2Fbackend`
- URL: `http://iter8.default:8080/metrics`
- Query string: `application=default%2Fbackend`
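In the query string, `%2F` is the URL encoding of `/`; the value identifies the application by its `namespace/name` pair. A small sketch of the encoding:

```shell
# The application is identified as "<namespace>/<name>"; the "/" must be
# percent-encoded as "%2F" when placed in a query string.
APPLICATION="default/backend"
ENCODED=$(printf '%s' "${APPLICATION}" | sed 's|/|%2F|g')
echo "http://iter8.default:8080/metrics?application=${ENCODED}"
```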

[Create a new dashboard](http://localhost:3000/dashboards) by *import*. Do so by pasting the contents of this [JSON definition](https://gist.githubusercontent.com/Alan-Cha/aa4ba259cc4631aafe9b43500502c60f/raw/034249f24e2c524ee4e326e860c06149ae7b2677/gistfile1.txt) into the box and *load* it. Associate it with the JSON API data source defined above.
[Create a new dashboard](http://localhost:3000/dashboards) by *import*. Copy and paste the contents of this [JSON definition](https://gist.githubusercontent.com/Alan-Cha/aa4ba259cc4631aafe9b43500502c60f/raw/034249f24e2c524ee4e326e860c06149ae7b2677/gistfile1.txt) into the text box and *load* it. Associate it with the JSON API data source above.

The Iter8 dashboard allows you to compare the behavior of the two versions of the backend component against each other and select a winner. Since user requests are being sent by the load generation script, the values in the report may change over time. The Iter8 dashboard may look like the following:

![A/B dashboard](images/dashboard.png)

Once a winner is identified, the winner can be promoted, and the candidate version deleted.
Once you identify a winner, it can be promoted, and the candidate version deleted.

## Promote candidate version

4 changes: 0 additions & 4 deletions docs/tutorials/deleteiter8controller.md
@@ -1,13 +1,9 @@
=== "Helm"
Delete the Iter8 controller using `helm` as follows.

```shell
helm delete iter8
```

=== "Kustomize"
Delete the Iter8 controller using `kustomize` as follows.

=== "namespace scoped"
```shell
kubectl delete -k 'https://github.com/iter8-tools/iter8.git/kustomize/iter8/namespaceScoped?ref=v0.15.3'
4 changes: 0 additions & 4 deletions docs/tutorials/installiter8controller.md
@@ -1,6 +1,4 @@
=== "Helm"
Install the Iter8 controller using `helm` as follows.

=== "namespace scoped"
```shell
helm install --repo https://iter8-tools.github.io/iter8 iter8 traffic
@@ -13,8 +11,6 @@
```

=== "Kustomize"
Install the Iter8 controller using `kustomize` as follows.

=== "namespace scoped"
```shell
kubectl apply -k 'https://github.com/iter8-tools/iter8.git/kustomize/iter8/namespaceScoped?ref=v0.15.3'
4 changes: 2 additions & 2 deletions docs/tutorials/integrations/ghactions.md
@@ -8,11 +8,11 @@ There are two ways that you can use Iter8 with GitHub Actions. You can [run Iter

# Use Iter8 in a GitHub Actions workflow

Install the latest version of the Iter8 CLI using `iter8-tools/iter8@v0.14`. Once installed, the Iter8 CLI can be used as documented in various tutorials. For example:
Install the latest version of the Iter8 CLI using `iter8-tools/iter8@v0.15`. Once installed, the Iter8 CLI can be used as documented in various tutorials. For example:

```yaml linenums="1"
- name: Install Iter8
run: GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.14
run: GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.15

# Launch an experiment inside Kubernetes
# This assumes that your Kubernetes cluster is accessible from the GitHub Actions pipeline
122 changes: 64 additions & 58 deletions docs/tutorials/integrations/kserve-mm/blue-green.md
@@ -4,26 +4,31 @@ template: main.html

# Blue-Green Rollout of a ML Model

This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe modelmesh serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring the network to distribute inference requests.
This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe modelmesh serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring routing resources to distribute inference requests.

After a one time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying network configuration.
After a one-time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying routing configuration.

![Blue-Green rollout](images/blue-green.png)

In this tutorial, we use the Istio service mesh to distribute inference requests between different versions of a model.

???+ "Before you begin"
1. Ensure that you have the [kubectl CLI](https://kubernetes.io/docs/reference/kubectl/).
2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md) environment.
2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/release-0.11/docs/quickstart.md) environment. If using the Quickstart environment, change your default namespace to `modelmesh-serving`:
```shell
kubectl config set-context --current --namespace=modelmesh-serving
```
3. Install [Istio](https://istio.io). You can install the [demo profile](https://istio.io/latest/docs/setup/getting-started/).

## Install the Iter8 controller
## Install Iter8

--8<-- "docs/tutorials/installiter8controller.md"

## Deploy a primary model
## Initialize primary

Deploy the primary version of a model using an `InferenceService`:
### Application

Deploy the primary version of the application. In this tutorial, the application is an ML model. Initialize the resources for the primary version of the model (`v0`) by deploying an `InferenceService` as follows:

```shell
cat <<EOF | kubectl apply -f -
@@ -48,36 +53,36 @@ EOF
```

??? note "About the primary `InferenceService`"
Naming the model with the suffix `-0` (and the candidate with the suffix `-1`) simplifies the rollout initialization. However, any name can be specified.
The base name (`wisdom`) and version (`v0`) are identified using the labels `app.kubernetes.io/name` and `app.kubernetes.io/version`, respectively. These labels are not required.

Naming the instance with the suffix `-0` (and the candidate with the suffix `-1`) simplifies the routing initialization (see below). However, any name can be specified.

The label `iter8.tools/watch: "true"` lets Iter8 know that it should pay attention to changes to this `InferenceService`.
The label `iter8.tools/watch: "true"` is required. It lets Iter8 know that it should pay attention to changes to this application resource.

Inspect the deployed `InferenceService`:
You can inspect the deployed `InferenceService`. When the `READY` field becomes `True`, the model is fully deployed.

```shell
kubectl get inferenceservice wisdom-0
```

When the `READY` field becomes `True`, the model is fully deployed.

## Initialize the Blue-Green routing policy
### Routing

Initialize model rollout with a blue-green traffic pattern as follows:

```shell
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/iter8 traffic-templates -f - | kubectl apply -f -
templateName: initialize-rollout
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
cat <<EOF | helm template routing --repo https://iter8-tools.github.io/iter8 routing-actions -f - | kubectl apply -f -
appType: kserve-modelmesh
appName: wisdom
action: initialize
strategy: blue-green
EOF
```

The `initialize-rollout` template (with `trafficStrategy: blue-green`) configures the Istio service mesh to route all requests to the primary version of the model (`wisdom-0`). Further, it defines the routing policy that will be used by Iter8 when it observes changes in the models. By default, this routing policy splits inference requests 50-50 between the primary and candidate versions. For detailed configuration options, see the Helm chart.
The `initialize` action (with strategy `blue-green`) configures the (Istio) service mesh to route all requests to the primary version of the application (`wisdom-0`). It further defines the routing policy that will be used when changes are observed in the application resources. By default, this routing policy splits requests 50-50 between the primary and candidate versions. For detailed configuration options, see the [Helm chart](https://github.com/iter8-tools/iter8/blob/v0.15.5/charts/routing-actions/values.yaml).
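The chart inputs can also be kept in a standalone values file and rendered with `helm template` for inspection before anything is applied; the fields below are exactly those passed on the command line above (the comments are illustrative glosses, not part of the chart):

```yaml
# values.yaml: inputs to the routing-actions chart, as used above
appType: kserve-modelmesh   # target environment
appName: wisdom             # base name of the application
action: initialize          # define the initial routing policy
strategy: blue-green        # traffic pattern
```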

## Verify network configuration
## Verify routing

To verify the network configuration, you can inspect the network configuration:
To verify the routing configuration, you can inspect the `VirtualService`:

```shell
kubectl get virtualservice -o yaml wisdom
@@ -88,7 +93,7 @@ To send inference requests to the model:
=== "From within the cluster"
1. Create a "sleep" pod in the cluster from which requests can be made:
```shell
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.14.3/samples/modelmesh-serving/sleep.sh | sh -
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.15.2/samples/modelmesh-serving/sleep.sh | sh -
```

2. Exec into the sleep pod:
@@ -111,21 +116,22 @@

2. Download the proto file and a sample input:
```shell
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/kserve.proto
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/grpc_input.json
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.15.1/samples/modelmesh-serving/kserve.proto
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.15.1/samples/modelmesh-serving/grpc_input.json
```

3. Send inference requests:
```shell
cat grpc_input.json | \
grpcurl -plaintext -proto kserve.proto -d @ \
grpcurl -vv -plaintext -proto kserve.proto -d @ \
-authority wisdom.modelmesh-serving \
localhost:8080 inference.GRPCInferenceService.ModelInfer
localhost:8080 inference.GRPCInferenceService.ModelInfer \
| grep -e app-version
```

Note that the model version responding to each inference request can be determined from the `modelName` field of the response.
Note that the model version responding to each inference request is noted in the response header `app-version`. In the requests above, we display only this header.
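Over several requests you can tally how traffic was split; a sketch using sample captured headers (the `printf` lines stand in for real `grpcurl` output collected from repeated requests):

```shell
# Sketch: count responses per version from captured app-version headers.
# The printf output stands in for headers collected from repeated requests.
printf 'app-version: v0\napp-version: v1\napp-version: v0\n' | sort | uniq -c
```

Here `uniq -c` prefixes each distinct header with its count, giving a rough picture of the traffic split.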

## Deploy a candidate model
## Deploy candidate

Deploy a candidate model using a second `InferenceService`:

@@ -152,45 +158,43 @@ EOF
```

??? note "About the candidate `InferenceService`"
The model name (`wisdom`) and version (`v1`) are recorded using the labels `app.kubernetes.io/name` and `app.kubernetes.io/version`.

In this tutorial, the model source (field `spec.predictor.model.storageUri`) is the same as for the primary version of the model. In a real world example, this would be different.

## Verify network configuration changes
## Verify routing changes

The deployment of the candidate model triggers an automatic reconfiguration by Iter8. Inspect the `VirtualService` to see that inference requests are now distributed between the primary model and the secondary model:
The deployment of the candidate model triggers an automatic reconfiguration by Iter8. Inspect the `VirtualService` to see that the routing has been changed. Requests are now distributed between the primary and candidate:

```shell
kubectl get virtualservice wisdom -o yaml
```

Send additional inference requests as described above.
You can send additional inference requests as described above. They will be handled by both versions of the model.

## Modify weights (optional)

You can modify the weight distribution of inference requests using the Iter8 `traffic-template` chart:
You can modify the weight distribution of inference requests as follows:

```shell
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/iter8 traffic-templates -f - | kubectl apply -f -
templateName: modify-weights
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
modelVersions:
cat <<EOF | helm template routing --repo https://iter8-tools.github.io/iter8 routing-actions -f - | kubectl apply -f -
appType: kserve-modelmesh
appName: wisdom
action: modify-weights
strategy: blue-green
appVersions:
- weight: 20
- weight: 80
EOF
```
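Treating the two weights as relative shares (and assuming the first `appVersions` entry corresponds to the primary and the second to the candidate), the resulting split can be computed as:

```shell
# Sketch: compute the traffic split implied by the weights above.
# Assumes the first entry is the primary and the second the candidate.
W_PRIMARY=20
W_CANDIDATE=80
TOTAL=$((W_PRIMARY + W_CANDIDATE))
echo "primary: $((100 * W_PRIMARY / TOTAL))%  candidate: $((100 * W_CANDIDATE / TOTAL))%"
```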

Note that using the `modify-weights` overrides the default traffic split for all future candidate deployments.
Note that using the `modify-weights` action overrides the default traffic split for all future candidate deployments.

As above, you can verify the network configuration changes.
As above, you can verify the routing changes.

## Promote the candidate model
## Promote candidate

Promoting the candidate involves redefining the primary `InferenceService` using the new model and deleting the candidate `InferenceService`.
Promoting the candidate involves redefining the primary version of the application and deleting the candidate version.

### Redefine the primary `InferenceService`
### Redefine primary

```shell
cat <<EOF | kubectl replace -f -
@@ -217,41 +221,43 @@ EOF
??? note "What is different?"
    The version label (`app.kubernetes.io/version`) was updated. In a real world example, `spec.predictor.model.storageUri` would also be updated.

### Delete the candidate `InferenceService`
### Delete candidate

Once the primary `InferenceService` has been redeployed, delete the candidate:

```shell
kubectl delete inferenceservice wisdom-1
```

### Verify network configuration changes
### Verify routing changes

Inspect the `VirtualService` to see that it has been automatically reconfigured to send requests only to the primary model.

## Clean up
## Cleanup

Delete the candidate model:
If not already deleted, delete the candidate:

```shell
kubectl delete --force isvc/wisdom-1
kubectl delete isvc/wisdom-1
```

Delete routing artifacts:
Delete routing:

```shell
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/iter8 traffic-templates -f - | kubectl delete --force -f -
templateName: initialize-rollout
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
cat <<EOF | helm template routing --repo https://iter8-tools.github.io/iter8 routing-actions -f - | kubectl delete -f -
appType: kserve-modelmesh
appName: wisdom
action: initialize
strategy: blue-green
EOF
```

Delete the primary model:
Delete primary:

```shell
kubectl delete --force isvc/wisdom-0
kubectl delete isvc/wisdom-0
```

Uninstall the Iter8 controller:
Uninstall Iter8:

--8<-- "docs/tutorials/deleteiter8controller.md"