38 changes: 34 additions & 4 deletions docs/getting-started/first-abn.md
@@ -26,10 +26,10 @@ This tutorial describes how to do A/B testing of a backend component using the [

A simple sample two-tier application using the Iter8 SDK is provided. Note that only the frontend component uses the Iter8 SDK. Deploy both the frontend and backend components of this application as described in each tab:

=== "frontend"
=== "Frontend"
Install the frontend component using an implementation in the language of your choice:

=== "node"
=== "Node"
```shell
kubectl create deployment frontend --image=iter8/abn-sample-frontend-node:0.17.3
kubectl expose deployment frontend --name=frontend --port=8090
@@ -43,7 +43,7 @@ A simple sample two-tier application using the Iter8 SDK is provided. Note that

The frontend component is implemented to call `Lookup()` before each call to the backend component. The frontend component uses the returned version number to route the request to the recommended version of the backend component.

=== "backend"
=== "Backend"
Release an initial version of the backend named `backend`:

```shell
@@ -66,9 +66,15 @@ In one shell, port-forward requests to the frontend component:
```
In another shell, run a script to generate load from multiple users:
```shell
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/abn-sample/generate_load.sh | sh -s --
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.3/samples/abn-sample/generate_load.sh | sh -s --
```

The load generator and sample frontend application output the backend that handled each recommendation. With just one version deployed, all requests are handled by `backend`. In the output you will see something like:

```
Recommendation: {"Id":19,"Name":"sample","Source":"backend-74ff88c76d-nb87j"}
```
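
To see at a glance which pods are serving requests, the load generator output can be tallied by its `Source` field. The following is a minimal sketch, not part of the Iter8 samples: the function name `count_sources` is hypothetical, and it assumes each output line has the `Recommendation: {...}` shape shown above.

```shell
# Tally load-generator output by the pod named in the "Source" field.
# Reads "Recommendation: {...}" lines on stdin and prints "<count> <pod>"
# for each distinct pod. (count_sources is a hypothetical helper name.)
count_sources() {
  sed -n 's/.*"Source":"\([^"]*\)".*/\1/p' | sort | uniq -c | awk '{print $1, $2}'
}
```

For example, if some of the load generator output has been saved to a file (here assumed to be `load.log`), run `count_sources < load.log`. With a single version deployed, every line should name a pod of the same backend.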

## Deploy candidate

A candidate version of the *backend* component can be deployed simply by adding a second version to the list of versions:
@@ -91,6 +97,12 @@ EOF
While the candidate version is deploying, `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model.
Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions.

Once both backend versions are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like:

```
Recommendation: {"Id":19,"Name":"sample","Source":"backend-candidate-1-56cb7cd5cf-bkrjv"}
Review comment (Member): I think you should say that it will be a mix, and also show some output that looks like a mix, not just a single recommendation.

```
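
Once the candidate is ready, the load generator output should contain a mix of both backends rather than only one. One way to confirm this is to reduce each pod name to the Deployment that owns it and list the distinct results. This is a sketch under an assumption: pod names follow the usual Kubernetes Deployment pattern `<deployment>-<replicaset-hash>-<pod-suffix>`. The function name `distinct_backends` is hypothetical.

```shell
# List the distinct Deployments that appear in load-generator output.
# Assumes pod names look like <deployment>-<replicaset-hash>-<pod-suffix>,
# so the last two dash-separated fields are dropped.
# (distinct_backends is a hypothetical helper name.)
distinct_backends() {
  sed -n 's/.*"Source":"\([^"]*\)".*/\1/p' | sed 's/-[^-]*-[^-]*$//' | sort -u
}
```

Fed the two sample lines shown in this tutorial, this prints `backend` and `backend-candidate-1`. Seeing both names indicates that requests are being distributed across both versions.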

## Compare versions using Grafana

Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows:
@@ -132,6 +144,12 @@ EOF

Calls to `Lookup()` will now recommend that all traffic be sent to the new primary version `backend` (currently serving the promoted version of the code).

The output of the load generator will again show just `backend`:

```
Recommendation: {"Id":19,"Name":"sample","Source":"backend-74ff88c76d-nb87j"}
```

## Cleanup

Delete the sample application:
@@ -144,3 +162,15 @@ helm delete backend
Uninstall the Iter8 controller:

--8<-- "docs/getting-started/uninstall.md"

If you installed Grafana, you can delete it as follows:

```shell
kubectl delete svc/grafana deploy/grafana
```

***

Congratulations! :tada: You completed your first A/B test with Iter8.

***
8 changes: 7 additions & 1 deletion docs/getting-started/first-performance.md
@@ -80,7 +80,7 @@ The Iter8 dashboard will look like the following:
![`http` Iter8 dashboard](../user-guide/tasks/images/httpdashboard.png)

## View logs
Logs are useful for debugging.
Logs are useful for debugging. To see the test logs:

```shell
kubectl logs -l iter8.tools/test=httpbin-test
@@ -102,6 +102,12 @@ kubectl delete deploy/httpbin

--8<-- "docs/getting-started/uninstall.md"

If you installed Grafana, you can delete it as follows:

```shell
kubectl delete svc/grafana deploy/grafana
```

***

Congratulations! :tada: You completed your first performance test with Iter8.
33 changes: 29 additions & 4 deletions docs/getting-started/first-release.md
@@ -63,7 +63,7 @@ You can also send requests from a pod within the cluster:

1. Create a `sleep` pod in the cluster from which requests can be made:
```shell
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/kserve-serving/sleep.sh | sh -
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.4/samples/kserve-serving/sleep.sh | sh -
```

2. Exec into the sleep pod:
@@ -76,7 +76,7 @@ kubectl exec --stdin --tty "$(kubectl get pod --sort-by={metadata.creationTimest
curl httpbin.default -s -D - | grep -e '^HTTP' -e app-version
```

The output includes the success of the request (the HTTP return code) and the version of the application that responded (the `app-version` response header). For example:
The output includes the success of the request (the HTTP return code) and the version of the application that responded (in the `app-version` response header). In this example:

```
HTTP/1.1 200 OK
@@ -123,7 +123,15 @@ When the second version is deployed and ready, the Iter8 controller automaticall

### Verify routing

You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will now be handled equally by both versions.
You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will now be handled equally by both versions. Output will be something like:

```
HTTP/1.1 200 OK
app-version: httpbin-0
...
HTTP/1.1 200 OK
app-version: httpbin-1
```
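
A single request only shows one version, so it can help to send several requests and count the `app-version` headers. The sketch below is illustrative only: `tally_versions` is a hypothetical helper, and it assumes the response headers have the form shown above.

```shell
# Count responses per version from "app-version: <name>" header lines on stdin.
# Carriage returns from HTTP headers are stripped first.
# (tally_versions is a hypothetical helper name.)
tally_versions() {
  tr -d '\r' | sed -n 's/^[Aa]pp-[Vv]ersion: *//p' | sort | uniq -c | awk '{print $2, $1}'
}

# Possible usage from inside the sleep pod (20 requests is an arbitrary choice):
#   for i in $(seq 1 20); do curl httpbin.default -s -D -; done | tally_versions
```

With equal weights, the two counts should be roughly the same, although a small sample will rarely split exactly evenly.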

## Modify weights (optional)

@@ -177,7 +185,12 @@ Once the (reconfigured) primary version ready, the Iter8 controller will automat

### Verify routing

You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version.
You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. Output will be something like:

```
HTTP/1.1 200 OK
app-version: httpbin-0
```
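
To check this over many requests rather than by eye, the collected headers can be tested for a single expected version. This is a sketch with a hypothetical helper name, `only_version`. It assumes the `app-version: <name>` header format shown above and treats empty input as a failure.

```shell
# Succeed only if at least one "app-version" header was read from stdin
# and every one of them names the expected version ($1).
# (only_version is a hypothetical helper name.)
only_version() {
  tr -d '\r' | sed -n 's/^app-version: *//p' |
    awk -v want="$1" '$0 != want {bad = 1} END {exit (bad || NR == 0)}'
}

# Possible usage from inside the sleep pod:
#   for i in $(seq 1 20); do curl httpbin.default -s -D -; done \
#     | only_version httpbin-0 && echo "all traffic on primary"
```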

## Cleanup

@@ -187,6 +200,18 @@ Delete the application and its routing configuration:
helm delete httpbin
```

If you used the `sleep` pod to generate load, remove it:

```shell
kubectl delete deploy sleep
```

Uninstall the Iter8 controller:

--8<-- "docs/getting-started/uninstall.md"

***

Congratulations! :tada: You completed your first blue-green rollout with Iter8.

***
13 changes: 5 additions & 8 deletions docs/roadmap.md
@@ -9,11 +9,8 @@ hide:

1. Stabilizing Iter8 APIs for CNCF sandboxing
2. Autoscaling the metrics service
3. Install infrastructure components such as Istio
4. Install ML components such as KServe and KServe ModelMesh
5. Extend routing templates to include application management
6. Support multi-cluster installs
7. Open Data Hub tier 1 project
8. Metrics & evaluation for foundation model/LLM-based apps
9. Hyperparameter tuning for foundation model/LLM-based inference pipelines
10. Data/concept drift detection for ML models
3. Support multi-cluster installs
4. Open Data Hub tier 1 project
5. Metrics & evaluation for foundation model/LLM-based apps
6. Hyperparameter tuning for foundation model/LLM-based inference pipelines
7. Data/concept drift detection for ML models
34 changes: 32 additions & 2 deletions docs/tutorials/integrations/kserve-mm/abn.md
@@ -61,6 +61,12 @@ application:
EOF
```

Wait for the backend model to be ready:

```shell
kubectl wait --for condition=ready isvc/backend-0 --timeout=600s
```

## Generate load

In one shell, port-forward requests to the frontend component:
@@ -70,9 +76,15 @@ In one shell, port-forward requests to the frontend component:

In another shell, run a script to generate load from multiple users:
```shell
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/abn-sample/generate_load.sh | sh -s --
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.3/samples/abn-sample/generate_load.sh | sh -s --
```

The load generator and sample frontend application output the backend that handled each recommendation. With just one version deployed, all requests are handled by `backend-0`. In the output you will see something like:

```
Recommendation: backend-0__isvc-3642375d03
```

## Deploy candidate

A candidate version of the model can be deployed simply by adding a second version to the list of versions:
@@ -105,6 +117,12 @@ EOF
Until the candidate version is ready, calls to `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model.
Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions.

Once both backend versions are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like:

```
Recommendation: backend-1__isvc-3642375d03
```
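
Because recommendations from both versions are interleaved, counting them per version makes the mix easier to see. The following sketch assumes that the text before `__isvc-` in each line is the version name, as the samples in this tutorial suggest. `count_model_versions` is a hypothetical helper name.

```shell
# Count recommendations per model version.
# Assumes lines of the form "Recommendation: <version>__isvc-<id>" on stdin,
# where <version> is taken to be the text before "__isvc-".
# (count_model_versions is a hypothetical helper name.)
count_model_versions() {
  sed -n 's/^Recommendation: \(.*\)__isvc-.*$/\1/p' | sort | uniq -c | awk '{print $2, $1}'
}
```

Applied to saved load generator output, this prints one line per version, such as `backend-0` and `backend-1`, each followed by its count.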

## Compare versions using Grafana

Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows:
@@ -155,6 +173,12 @@ EOF

Calls to `Lookup()` will now recommend that all traffic be sent to the new primary version `backend-0` (currently serving the promoted version of the code).

The output of the load generator will again show just `backend-0`:

```
Recommendation: backend-0__isvc-3642375d03
```

## Cleanup

Delete the backend:
@@ -171,4 +195,10 @@ kubectl delete deploy/frontend svc/frontend

Uninstall Iter8 controller:

--8<-- "docs/getting-started/uninstall.md"
--8<-- "docs/getting-started/uninstall.md"

If you installed Grafana, you can delete it as follows:

```shell
kubectl delete svc/grafana deploy/grafana
```
51 changes: 26 additions & 25 deletions docs/tutorials/integrations/kserve-mm/blue-green.md
@@ -50,6 +50,12 @@ application:
EOF
```

Wait for the backend model to be ready:

```shell
kubectl wait --for condition=ready isvc/wisdom-0 --timeout=600s
```

??? note "What happens?"
- Because `environment` is set to `kserve-modelmesh-istio`, an `InferenceService` object is created.
- The namespace `default` is inherited from the Helm release namespace since it is not specified in the version or in `application.metadata`.
@@ -90,33 +96,12 @@ cat grpc_input.json \
| grep -e app-version
```

The output includes the version of the application that responded (the `app-version` response header). For example:
The output includes the version of the application that responded (in the `app-version` response header). In this example:

```
app-version: wisdom-0
```

??? note "To send requests from outside the cluster"
To configure the release for traffic from outside the cluster, a suitable Istio `Gateway` is required. For example, this [sample gateway](https://raw.githubusercontent.com/kalantar/docs/release/samples/iter8-sample-gateway.yaml). When using the Iter8 `release` chart, set the `gateway` field to the name of your `Gateway`. Finally, to send traffic:

(a) In a separate terminal, port-forward the ingress gateway:
```shell
kubectl -n istio-system port-forward svc/istio-ingressgateway 8080:80
```
(b) Download the proto file and sample input:
```shell
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/modelmesh-serving/kserve.proto
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/modelmesh-serving/grpc_input.json
```
\(c) Send requests using the `Host` header:
```shell
cat grpc_input.json | \
grpcurl -vv -plaintext -proto kserve.proto -d @ \
-authority wisdom.modelmesh-serving \
localhost:8080 inference.GRPCInferenceService.ModelInfer \
| grep -e app-version
```

## Deploy candidate

A candidate version of the model can be deployed simply by adding a second version to the list of versions comprising the application:
@@ -151,7 +136,13 @@ When the candidate version is ready, the Iter8 controller will Iter8 will automa

### Verify Routing

You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will be handled equally by both versions.
You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will be handled equally by both versions. Output will be something like:

```
app-version: wisdom-0
...
app-version: wisdom-1
```

## Modify weights (optional)

@@ -186,7 +177,7 @@ Iter8 automatically reconfigures the routing to distribute traffic between the v

### Verify Routing

You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. 70 percent of requests will now be handled by the candidate version; the remaining 30 percent by the primary version.
You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. 70 percent of requests will now be handled by the candidate version (`wisdom-1`); the remaining 30 percent by the primary version (`wisdom-0`).
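
To compare the observed split against the configured weights, the `app-version` lines from a batch of requests can be converted to percentages. This is a minimal sketch: `version_share` is a hypothetical helper, it assumes the `app-version: <name>` output format shown earlier, and it truncates percentages to whole numbers.

```shell
# Print each version's share of responses as a whole-number percentage.
# Reads "app-version: <name>" lines on stdin.
# (version_share is a hypothetical helper name.)
version_share() {
  tr -d '\r' | sed -n 's/^app-version: *//p' | sort | uniq -c |
    awk '{count[$2] = $1; total += $1}
         END {for (v in count) printf "%s %d%%\n", v, count[v] * 100 / total}' |
    sort
}
```

Routing is weighted rather than strictly alternating, so a small batch of requests will only approximate 70 and 30 percent. A larger batch gives a closer match.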

## Promote candidate

@@ -216,7 +207,11 @@ Once the (reconfigured) primary `InferenceService` ready, the Iter8 controller w

### Verify Routing

You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version.
You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. Output will be something like:

```
app-version: wisdom-0
```

## Cleanup

@@ -226,6 +221,12 @@ Delete the models and their routing:
helm delete wisdom
```

If you used the `sleep` pod to generate load, remove it:

```shell
kubectl delete deploy sleep
```

Uninstall Iter8 controller:

--8<-- "docs/getting-started/uninstall.md"