From 111f57e5243b19eb2c17f1a423888658aaa99892 Mon Sep 17 00:00:00 2001 From: Michael Kalantar Date: Tue, 31 Oct 2023 09:27:48 -0400 Subject: [PATCH 1/4] recommended clarifications Signed-off-by: Michael Kalantar --- docs/getting-started/first-abn.md | 38 +++++++++++++++++-- docs/getting-started/first-performance.md | 8 +++- docs/getting-started/first-release.md | 31 +++++++++++++-- docs/tutorials/integrations/kserve-mm/abn.md | 34 ++++++++++++++++- .../integrations/kserve-mm/blue-green.md | 30 +++++++++++++-- .../integrations/kserve-mm/canary.md | 34 +++++++++++++++-- .../tutorials/integrations/kserve/abn-grpc.md | 34 ++++++++++++++++- .../tutorials/integrations/kserve/abn-http.md | 34 ++++++++++++++++- .../integrations/kserve/blue-green.md | 31 +++++++++++++-- docs/tutorials/integrations/kserve/canary.md | 37 ++++++++++++++++-- docs/tutorials/integrations/kserve/grpc.md | 6 +++ docs/tutorials/integrations/kserve/http.md | 6 +++ docs/tutorials/load-test-grpc-multiple.md | 6 +++ docs/tutorials/load-test-grpc.md | 6 +++ docs/tutorials/load-test-http-multiple.md | 6 +++ 15 files changed, 312 insertions(+), 29 deletions(-) diff --git a/docs/getting-started/first-abn.md b/docs/getting-started/first-abn.md index 1ee83a8a..1471b506 100644 --- a/docs/getting-started/first-abn.md +++ b/docs/getting-started/first-abn.md @@ -26,10 +26,10 @@ This tutorial describes how to do A/B testing of a backend component using the [ A simple sample two-tier application using the Iter8 SDK is provided. Note that only the frontend component uses the Iter8 SDK. 
Deploy both the frontend and backend components of this application as described in each tab: -=== "frontend" +=== "Frontend" Install the frontend component using an implementation in the language of your choice: - === "node" + === "Node" ```shell kubectl create deployment frontend --image=iter8/abn-sample-frontend-node:0.17.3 kubectl expose deployment frontend --name=frontend --port=8090 @@ -43,7 +43,7 @@ A simple sample two-tier application using the Iter8 SDK is provided. Note that The frontend component is implemented to call `Lookup()` before each call to the backend component. The frontend component uses the returned version number to route the request to the recommended version of the backend component. -=== "backend" +=== "Backend" Release an initial version of the backend named `backend`: ```shell @@ -66,9 +66,15 @@ In one shell, port-forward requests to the frontend component: ``` In another shell, run a script to generate load from multiple users: ```shell - curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/abn-sample/generate_load.sh | sh -s -- + curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.3/samples/abn-sample/generate_load.sh | sh -s -- ``` +The load generator and sample frontend application outputs the backend that handled each recommendation. With just one version deployed, all requests are handled by `backend-0`. In the output you will see something like: + +``` +Recommendation: {"Id":19,"Name":"sample","Source":"backend-74ff88c76d-nb87j"} +``` + ## Deploy candidate A candidate version of the *backend* component can be deployed simply by adding a second version to the list of versions: @@ -91,6 +97,12 @@ EOF While the candidate version is deploying, `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. 
Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. +Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: + +``` +Recommendation: {"Id":19,"Name":"sample","Source":"backend-candidate-1-56cb7cd5cf-bkrjv"} +``` + ## Compare versions using Grafana Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows: @@ -132,6 +144,12 @@ EOF Calls to `Lookup()` will now recommend that all traffic be sent to the new primary version `backend` (currently serving the promoted version of the code). +The output of the load generator will again show just `backend_0`: + +``` +Recommendation: {"Id":19,"Name":"sample","Source":"backend-74ff88c76d-nb87j"} +``` + ## Cleanup Delete the sample application: @@ -144,3 +162,15 @@ helm delete backend Uninstall the Iter8 controller: --8<-- "docs/getting-started/uninstall.md" + +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` + +*** + +Congratulations! :tada: You completed your first A/B test with Iter8. + +*** diff --git a/docs/getting-started/first-performance.md b/docs/getting-started/first-performance.md index 2484fba6..2aef6dc6 100644 --- a/docs/getting-started/first-performance.md +++ b/docs/getting-started/first-performance.md @@ -80,7 +80,7 @@ The Iter8 dashboard will look like the following: ![`http` Iter8 dashboard](../user-guide/tasks/images/httpdashboard.png) ## View logs -Logs are useful for debugging. +Logs are useful for debugging. 
To see the test logs: ```shell kubectl logs -l iter8.tools/test=httpbin-test @@ -102,6 +102,12 @@ kubectl delete deploy/httpbin --8<-- "docs/getting-started/uninstall.md" +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` + *** Congratulations! :tada: You completed your first performance test with Iter8. diff --git a/docs/getting-started/first-release.md b/docs/getting-started/first-release.md index fd1fa0c0..8284e5d2 100644 --- a/docs/getting-started/first-release.md +++ b/docs/getting-started/first-release.md @@ -76,7 +76,7 @@ kubectl exec --stdin --tty "$(kubectl get pod --sort-by={metadata.creationTimest curl httpbin.default -s -D - | grep -e '^HTTP' -e app-version ``` -The output includes the success of the request (the HTTP return code) and the version of the application that responded (the `app-version` response header). For example: +The output includes the success of the request (the HTTP return code) and the version of the application that responded (in the `app-version` response header). In this example: ``` HTTP/1.1 200 OK @@ -123,7 +123,15 @@ When the second version is deployed and ready, the Iter8 controller automaticall ### Verify routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will now be handled equally by both versions. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will now be handled equally by both versions. Output will be something like: + +``` +HTTP/1.1 200 OK +app-version: httpbin-0 +... 
+HTTP/1.1 200 OK +app-version: httpbin-0 +``` ## Modify weights (optional) @@ -177,7 +185,12 @@ Once the (reconfigured) primary version ready, the Iter8 controller will automat ### Verify routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. Output will be something like: + +``` +HTTP/1.1 200 OK +app-version: httpbin-0 +``` ## Cleanup @@ -187,6 +200,18 @@ Delete the application and its routing configuration: helm delete httpbin ``` +If you used the `sleep` pod to generate load, remove it: + +```shell +kubectl delete deploy sleep +``` + Uninstall Iter8 controller: --8<-- "docs/getting-started/uninstall.md" + +*** + +Congratulations! :tada: You completed your first blue-green rollout with Iter8. + +*** diff --git a/docs/tutorials/integrations/kserve-mm/abn.md b/docs/tutorials/integrations/kserve-mm/abn.md index 87c81c12..18ccde85 100644 --- a/docs/tutorials/integrations/kserve-mm/abn.md +++ b/docs/tutorials/integrations/kserve-mm/abn.md @@ -61,6 +61,12 @@ application: EOF ``` +Wait for the backend model to be ready: + +```shell +kubectl wait --for condition=ready isvc/backend-0 --timeout=600s +``` + ## Generate load In one shell, port-forward requests to the frontend component: @@ -70,9 +76,15 @@ In one shell, port-forward requests to the frontend component: In another shell, run a script to generate load from multiple users: ```shell - curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/abn-sample/generate_load.sh | sh -s -- + curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.3/samples/abn-sample/generate_load.sh | sh -s -- ``` +The load generator and sample frontend application outputs the backend that handled each recommendation. 
With just one version deployed, all requests are handled by `backend-0`. In the output you will see something like: + +``` +Recommendation: backend-0__isvc-3642375d03 +``` + ## Deploy candidate A candidate version of the model can be deployed simply by adding a second version to the list of versions: @@ -105,6 +117,12 @@ EOF Until the candidate version is ready, calls to `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. +Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: + +``` +Recommendation: backend-1__isvc-3642375d03 +``` + ## Compare versions using Grafana Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows: @@ -155,6 +173,12 @@ EOF Calls to `Lookup()` will now recommend that all traffic be sent to the new primary version `backend-0` (currently serving the promoted version of the code). 
+The output of the load generator will again show just `backend-0`: + +``` +Recommendation: backend-0__isvc-3642375d03 +``` + ## Cleanup Delete the backend: @@ -171,4 +195,10 @@ kubectl delete deploy/frontend svc/frontend Uninstall Iter8 controller: ---8<-- "docs/getting-started/uninstall.md" \ No newline at end of file +--8<-- "docs/getting-started/uninstall.md" + +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` diff --git a/docs/tutorials/integrations/kserve-mm/blue-green.md b/docs/tutorials/integrations/kserve-mm/blue-green.md index f89590e3..fd9e6466 100644 --- a/docs/tutorials/integrations/kserve-mm/blue-green.md +++ b/docs/tutorials/integrations/kserve-mm/blue-green.md @@ -50,6 +50,12 @@ application: EOF ``` +Wait for the backend model to be ready: + +```shell +kubectl wait --for condition=ready isvc/wisdom-0 --timeout=600s +``` + ??? note "What happens?" - Because `environment` is set to `kserve-modelmesh-istio`, an `InferenceService` object is created. - The namespace `default` is inherited from the Helm release namespace since it is not specified in the version or in `application.metadata`. @@ -90,7 +96,7 @@ cat grpc_input.json \ | grep -e app-version ``` -The output includes the version of the application that responded (the `app-version` response header). For example: +The output includes the version of the application that responded (in the `app-version` response header). In this example: ``` app-version: wisdom-0 @@ -151,7 +157,13 @@ When the candidate version is ready, the Iter8 controller will Iter8 will automa ### Verify Routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will be handled equally by both versions. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will be handled equally by both versions. 
Output will be something like: + +``` +app-version: wisdom-0 +... +app-version: wisdom-1 +``` ## Modify weights (optional) @@ -186,7 +198,7 @@ Iter8 automatically reconfigures the routing to distribute traffic between the v ### Verify Routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. 70 percent of requests will now be handled by the candidate version; the remaining 30 percent by the primary version. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. 70 percent of requests will now be handled by the candidate version (`wisdom-1`); the remaining 30 percent by the primary version (`wisdom-0`). ## Promote candidate @@ -216,7 +228,11 @@ Once the (reconfigured) primary `InferenceService` ready, the Iter8 controller w ### Verify Routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. Output will be something like: + +``` +app-version: wisdom-0 +``` ## Cleanup @@ -226,6 +242,12 @@ Delete the models are their routing: helm delete wisdom ``` +If you used the `sleep` pod to generate load, remove it: + +```shell +kubectl delete deploy sleep +``` + Uninstall Iter8 controller: --8<-- "docs/getting-started/uninstall.md" diff --git a/docs/tutorials/integrations/kserve-mm/canary.md b/docs/tutorials/integrations/kserve-mm/canary.md index 789a0ef7..305ef950 100644 --- a/docs/tutorials/integrations/kserve-mm/canary.md +++ b/docs/tutorials/integrations/kserve-mm/canary.md @@ -51,6 +51,12 @@ application: EOF ``` +Wait for the backend model to be ready: + +```shell +kubectl wait --for condition=ready isvc/wisdom-0 --timeout=600s +``` + ??? 
note "What happens?" - Because `environment` is set to `kserve-modelmesh-istio`, an `InferenceService` object is created. - The namespace `default` is inherited from the Helm release namespace since it is not specified in the version or in `application.metadata`. @@ -90,7 +96,7 @@ cat grpc_input.json \ inference.GRPCInferenceService.ModelInfer \ | grep -e app-version ``` -4. To send a request with header `traffic: test`: +4. Requests can also be sent with the header `traffic: test`. When a candidate is deployed, requests with this header will be routed to the candidate. When no candidate is deployed, all requests will be routed to the same model version. ```shell cat grpc_input.json \ | grpcurl -vv -plaintext -proto kserve.proto -d @ \ @@ -101,7 +107,7 @@ cat grpc_input.json \ | grep -e app-version ``` -The output includes the version of the application that responded (the `app-version` response header). For example: +The output includes the version of the application that responded (in the `app-version` response header). In this example: ``` app-version: wisdom-0 @@ -171,7 +177,17 @@ When the candidate version is ready, the Iter8 controller will Iter8 will automa ### Verify routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Those with header `traffic` set to `true` will be handled by the candidate model (`wisdom-1`) while all others will be handled by the primary version. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. 
Those with header `traffic` set to `true` will be handled by the candidate model (`wisdom-1`): + +``` +app-version: wisdom-1 +``` + +All others will be handled by the primary version (`wisdom-0`): + +``` +app-version: wisdom-0 +``` ## Promote candidate @@ -203,7 +219,11 @@ Once the (reconfigured) primary `InferenceService` ready, the Iter8 controller w ### Verify Routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. Output will be something like: + +``` +app-version: wisdom-0 +``` ## Cleanup @@ -213,6 +233,12 @@ Delete the models and their routing: helm delete wisdom ``` +If you used the `sleep` pod to generate load, remove it: + +```shell +kubectl delete deploy sleep +``` + Uninstall Iter8 controller: --8<-- "docs/getting-started/uninstall.md" diff --git a/docs/tutorials/integrations/kserve/abn-grpc.md b/docs/tutorials/integrations/kserve/abn-grpc.md index 6db1bc67..08c2ce27 100644 --- a/docs/tutorials/integrations/kserve/abn-grpc.md +++ b/docs/tutorials/integrations/kserve/abn-grpc.md @@ -65,6 +65,12 @@ application: EOF ``` +Wait for the backend model to be ready: + +```shell +kubectl wait --for condition=ready isvc/backend-0 --timeout=600s +``` + ## Generate load In one shell, port-forward requests to the frontend component: @@ -74,8 +80,14 @@ In one shell, port-forward requests to the frontend component: In another shell, run a script to generate load from multiple users: ```shell - curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/abn-sample/generate_load.sh | sh -s -- + curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.3/samples/abn-sample/generate_load.sh | sh -s -- ``` + +The load generator and sample frontend application 
outputs the backend that handled each recommendation. With just one version deployed, all requests are handled by `backend-0`. In the output you will see something like: + +``` +Recommendation: backend-0 +``` ## Deploy candidate @@ -113,6 +125,12 @@ EOF Until the candidate version is ready, calls to `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. +Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: + +``` +Recommendation: backend-1 +``` + ## Compare versions using Grafana Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows: @@ -167,6 +185,12 @@ EOF Calls to `Lookup()` will now recommend that all traffic be sent to the new primary version `backend-0` (currently serving the promoted version of the code). 
+The output of the load generator will again show just `backend-0`: + +``` +Recommendation: backend-0 +``` + ## Cleanup Delete the backend: @@ -183,4 +207,10 @@ kubectl delete deploy/frontend svc/frontend Uninstall Iter8 controller: ---8<-- "docs/getting-started/uninstall.md" \ No newline at end of file +--8<-- "docs/getting-started/uninstall.md" + +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` diff --git a/docs/tutorials/integrations/kserve/abn-http.md b/docs/tutorials/integrations/kserve/abn-http.md index c32bfddb..9dec42c6 100644 --- a/docs/tutorials/integrations/kserve/abn-http.md +++ b/docs/tutorials/integrations/kserve/abn-http.md @@ -62,6 +62,12 @@ application: EOF ``` +Wait for the backend model to be ready: + +```shell +kubectl wait --for condition=ready isvc/backend-0 --timeout=600s +``` + ## Generate load In one shell, port-forward requests to the frontend component: @@ -71,9 +77,15 @@ In one shell, port-forward requests to the frontend component: In another shell, run a script to generate load from multiple users: ```shell - curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/abn-sample/generate_load.sh | sh -s -- + curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.3/samples/abn-sample/generate_load.sh | sh -s -- ``` - + +The load generator and sample frontend application outputs the backend that handled each recommendation. With just one version deployed, all requests are handled by `backend-0`. In the output you will see something like: + +``` +Recommendation: backend-0 +``` + ## Deploy candidate A candidate version of the model can be deployed simply by adding a second version to the list of versions: @@ -105,6 +117,12 @@ EOF Until the candidate version is ready, calls to `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. 
Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. +Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: + +``` +Recommendation: backend-1 +``` + ## Compare versions using Grafana Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows: @@ -154,6 +172,12 @@ EOF Calls to `Lookup()` will now recommend that all traffic be sent to the new primary version `backend-0` (currently serving the promoted version of the code). +The output of the load generator will again show just `backend-0`: + +``` +Recommendation: backend-0 +``` + ## Cleanup Delete the backend: @@ -171,3 +195,9 @@ kubectl delete deploy/frontend svc/frontend Uninstall Iter8 controller: --8<-- "docs/getting-started/uninstall.md" + +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` diff --git a/docs/tutorials/integrations/kserve/blue-green.md b/docs/tutorials/integrations/kserve/blue-green.md index ff607670..fb3e0726 100644 --- a/docs/tutorials/integrations/kserve/blue-green.md +++ b/docs/tutorials/integrations/kserve/blue-green.md @@ -47,6 +47,12 @@ application: EOF ``` +Wait for the backend model to be ready: + +```shell +kubectl wait --for condition=ready isvc/wisdom-0 --timeout=600s +``` + ??? note "What happens?" - Because `environment` is set to `kserve`, an `InferenceService` object is created. - The namespace `default` is inherited from the Helm release namespace since it is not specified in the version or in `application.metadata`. 
@@ -83,7 +89,7 @@ http://wisdom.default -d @input.json -s -D - \ | grep -e HTTP -e app-version ``` -The output includes the success of the request (the HTTP return code) and the version of the application that responded (the `app-version` response header). For example: +The output includes the success of the request (the HTTP return code) and the version of the application that responded (in the `app-version` response header). In this example: ``` HTTP/1.1 200 OK @@ -142,7 +148,15 @@ When the candidate model is ready, Iter8 will automatically reconfigure the rout ### Verify Routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will be handled equally by both versions. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. Requests will be handled equally by both versions. Output will be something like: + +``` +HTTP/1.1 200 OK +app-version: wisdom-0 +... +HTTP/1.1 200 OK +app-version: wisdom-1 +``` ## Modify weights (optional) @@ -207,7 +221,12 @@ Once the (reconfigured) primary `InferenceService` ready, the Iter8 controller w ### Verify Routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will all be handled by the primary version. 
Output will be something like: + +``` +HTTP/1.1 200 OK +app-version: wisdom-0 +``` ## Cleanup @@ -217,6 +236,12 @@ Delete the models are their routing: helm delete wisdom ``` +If you used the `sleep` pod to generate load, remove it: + +```shell +kubectl delete deploy sleep +``` + Uninstall Iter8 controller: --8<-- "docs/getting-started/uninstall.md" diff --git a/docs/tutorials/integrations/kserve/canary.md b/docs/tutorials/integrations/kserve/canary.md index f4e315b4..97a46c1e 100644 --- a/docs/tutorials/integrations/kserve/canary.md +++ b/docs/tutorials/integrations/kserve/canary.md @@ -47,6 +47,12 @@ application: EOF ``` +Wait for the backend model to be ready: + +```shell +kubectl wait --for condition=ready isvc/wisdom-0 --timeout=600s +``` + ??? note "What happens?" - Because `environment` is set to `kserve`, an `InferenceService` object is created. - The namespace `default` is inherited from the Helm release namespace since it is not specified in the version or in `application.metadata`. @@ -84,7 +90,7 @@ http://wisdom.default -d @input.json -s -D - \ | grep -e HTTP -e app-version ``` -4. To send requests with the header `traffic: test`: +4. Requests can also be sent with the header `traffic: test`. When a candidate is deployed, requests with this header will be routed to the candidate. When no candidate is deployed, all requests will be routed to the same model version. ```shell curl -H 'Content-Type: application/json' \ -H 'traffic: test' \ @@ -92,7 +98,7 @@ http://wisdom.default -d @input.json -s -D - \ | grep -e HTTP -e app-version ``` -The output includes the success of the request (the HTTP return code) and the version of the application that responded (the `app-version` response header). For example: +The output includes the success of the request (the HTTP return code) and the version of the application that responded (in the `app-version` response header). In this example: ``` HTTP/1.1 200 OK @@ -155,7 +161,19 @@ EOF ??? 
note "About the candidate" In this tutorial, the model source (field `storageUri`) for the candidate version is the same as for the primary version of the model. In a real example, this would be different. The version label (`app.kubernetes.io/version`) can be used to distinguish between versions. -When the candidate version is ready, the Iter8 controller will Iter8 will automatically reconfigure the routing so that inference requests with the header `traffic` set to `true` will be sent to the candidate model. All other requests will be sent to the primary model. +When the candidate version is ready, the Iter8 controller will Iter8 will automatically reconfigure the routing so that inference requests with the header `traffic` set to `true` will be sent to the candidate model: + +``` +HTTP/1.1 200 OK +app-version: wisdom-1 +``` + +All other requests will be sent to the primary model (`wisdom-0`): + +``` +HTTP/1.1 200 OK +app-version: wisdom-0 +``` ### Verify routing @@ -190,7 +208,12 @@ Once the (reconfigured) primary `InferenceService` ready, the Iter8 controller w ### Verify Routing -You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will be handled by the primary version. +You can verify the routing configuration by inspecting the `VirtualService` and/or by sending requests as described above. They will be handled by the primary version. 
Output will be something like: + +``` +HTTP/1.1 200 OK +app-version: wisdom-0 +``` ## Cleanup @@ -200,6 +223,12 @@ Delete the models and their routing: helm delete wisdom ``` +If you used the `sleep` pod to generate load, remove it: + +```shell +kubectl delete deploy sleep +``` + Uninstall Iter8 controller: --8<-- "docs/getting-started/uninstall.md" diff --git a/docs/tutorials/integrations/kserve/grpc.md b/docs/tutorials/integrations/kserve/grpc.md index 37334b10..cd789018 100644 --- a/docs/tutorials/integrations/kserve/grpc.md +++ b/docs/tutorials/integrations/kserve/grpc.md @@ -105,5 +105,11 @@ kubectl delete inferenceservice sklearn-irisv2 --8<-- "docs/getting-started/uninstall.md" +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` + ??? note "Some variations and extensions of this performance test" 1. The [grpc task](../../../user-guide/tasks/grpc.md) can be configured with load related parameters such as the number of requests, requests per second, or number of concurrent connections. \ No newline at end of file diff --git a/docs/tutorials/integrations/kserve/http.md b/docs/tutorials/integrations/kserve/http.md index 237660c4..865ab5a9 100644 --- a/docs/tutorials/integrations/kserve/http.md +++ b/docs/tutorials/integrations/kserve/http.md @@ -94,6 +94,12 @@ kubectl delete inferenceservice sklearn-irisv2 --8<-- "docs/getting-started/uninstall.md" +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` + ??? note "Some variations and extensions of this performance test" 1. The [http task](../../../user-guide/tasks/http.md) can be configured with load related parameters such as the number of requests, queries per second, or number of parallel connections. 2. The [http task](../../../user-guide/tasks/http.md) can be configured to send various types of content as payload. 
diff --git a/docs/tutorials/load-test-grpc-multiple.md b/docs/tutorials/load-test-grpc-multiple.md index d8cf0692..db7b6b47 100644 --- a/docs/tutorials/load-test-grpc-multiple.md +++ b/docs/tutorials/load-test-grpc-multiple.md @@ -92,5 +92,11 @@ kubectl delete deploy/routeguide --8<-- "docs/getting-started/uninstall.md" +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` + ??? note "Some variations and extensions of this performance test" 1. The [grpc task](../user-guide/tasks/grpc.md) can be configured with load related parameters such as the total number of requests, requests per second, or number of concurrent connections. \ No newline at end of file diff --git a/docs/tutorials/load-test-grpc.md b/docs/tutorials/load-test-grpc.md index b646c429..a391c2bd 100644 --- a/docs/tutorials/load-test-grpc.md +++ b/docs/tutorials/load-test-grpc.md @@ -134,5 +134,11 @@ kubectl delete deploy/routeguide --8<-- "docs/getting-started/uninstall.md" +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` + ??? note "Some variations and extensions of this performance test" 1. The [grpc task](../user-guide/tasks/grpc.md) can be configured with load related parameters such as the total number of requests, requests per second, or number of concurrent connections. \ No newline at end of file diff --git a/docs/tutorials/load-test-http-multiple.md b/docs/tutorials/load-test-http-multiple.md index 9a6bc941..6d18cbba 100644 --- a/docs/tutorials/load-test-http-multiple.md +++ b/docs/tutorials/load-test-http-multiple.md @@ -90,6 +90,12 @@ kubectl delete deploy/httpbin --8<-- "docs/getting-started/uninstall.md" +If you installed Grafana, you can delete it as follows: + +```shell +kubectl delete svc/grafana deploy/grafana +``` + ??? note "Some variations and extensions of this performance test" 1. 
The [http task](../user-guide/tasks/http.md) can be configured with load related parameters such as the number of requests, queries per second, or number of parallel connections. 2. The [http task](../user-guide/tasks/http.md) can be configured to send various types of content as payload. \ No newline at end of file From 2c1deafe8ad3536b9654050183a90dbdc3d9ec6d Mon Sep 17 00:00:00 2001 From: Michael Kalantar Date: Tue, 31 Oct 2023 09:35:53 -0400 Subject: [PATCH 2/4] spelling Signed-off-by: Michael Kalantar --- docs/getting-started/first-abn.md | 2 +- docs/tutorials/integrations/kserve-mm/abn.md | 2 +- docs/tutorials/integrations/kserve/abn-grpc.md | 2 +- docs/tutorials/integrations/kserve/abn-http.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/getting-started/first-abn.md b/docs/getting-started/first-abn.md index 1471b506..f7f94068 100644 --- a/docs/getting-started/first-abn.md +++ b/docs/getting-started/first-abn.md @@ -97,7 +97,7 @@ EOF While the candidate version is deploying, `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. -Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: +Once both backend versions are responding to requests, the output of the load generator will include recommendations from the candidate version. 
In this example, you should see something like: ``` Recommendation: {"Id":19,"Name":"sample","Source":"backend-candidate-1-56cb7cd5cf-bkrjv"} diff --git a/docs/tutorials/integrations/kserve-mm/abn.md b/docs/tutorials/integrations/kserve-mm/abn.md index 18ccde85..3fde86e4 100644 --- a/docs/tutorials/integrations/kserve-mm/abn.md +++ b/docs/tutorials/integrations/kserve-mm/abn.md @@ -117,7 +117,7 @@ EOF Until the candidate version is ready, calls to `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. -Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: +Once both backend versions are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: ``` Recommendation: backend-1__isvc-3642375d03 diff --git a/docs/tutorials/integrations/kserve/abn-grpc.md b/docs/tutorials/integrations/kserve/abn-grpc.md index 08c2ce27..7df50ddd 100644 --- a/docs/tutorials/integrations/kserve/abn-grpc.md +++ b/docs/tutorials/integrations/kserve/abn-grpc.md @@ -125,7 +125,7 @@ EOF Until the candidate version is ready, calls to `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. -Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. 
In this example, you should see something like: +Once both backend versions are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: ``` Recommendation: backend-1 diff --git a/docs/tutorials/integrations/kserve/abn-http.md b/docs/tutorials/integrations/kserve/abn-http.md index 9dec42c6..c546a0e1 100644 --- a/docs/tutorials/integrations/kserve/abn-http.md +++ b/docs/tutorials/integrations/kserve/abn-http.md @@ -117,7 +117,7 @@ EOF Until the candidate version is ready, calls to `Lookup()` will return only the version index number `0`; that is, the first, or primary, version of the model. Once the candidate version is ready, `Lookup()` will return both `0` and `1`, the indices of both versions, so that requests can be distributed across both versions. -Once both backends are responding to requests, the output of the load generator will include recommendations from the candidate version. In this example, you should see something like: +Once both backend versions are responding to requests, the output of the load generator will include recommendations from the candidate version. 
In this example, you should see something like: ``` Recommendation: backend-1 From 670a5ef09c064c626c91d7ffae55e434d6632adc Mon Sep 17 00:00:00 2001 From: Michael Kalantar Date: Wed, 1 Nov 2023 14:43:58 -0400 Subject: [PATCH 3/4] add kubernetes gateway api tutorials Signed-off-by: Michael Kalantar --- docs/getting-started/first-release.md | 4 +- .../integrations/kserve-mm/blue-green.md | 21 -- .../integrations/kserve/blue-green.md | 2 +- docs/tutorials/integrations/kserve/canary.md | 2 +- .../kubernetes-gateway-api/blue-green.md | 205 ++++++++++++++++++ .../kubernetes-gateway-api/canary.md | 190 ++++++++++++++++ mkdocs.yml | 3 + 7 files changed, 402 insertions(+), 25 deletions(-) create mode 100644 docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md create mode 100644 docs/tutorials/integrations/kubernetes-gateway-api/canary.md diff --git a/docs/getting-started/first-release.md b/docs/getting-started/first-release.md index 8284e5d2..acd90273 100644 --- a/docs/getting-started/first-release.md +++ b/docs/getting-started/first-release.md @@ -63,7 +63,7 @@ You can also send requests from a pod within the cluster: 1. Create a `sleep` pod in the cluster from which requests can be made: ```shell -curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/kserve-serving/sleep.sh | sh - +curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.4/samples/kserve-serving/sleep.sh | sh - ``` 2. Exec into the sleep pod: @@ -130,7 +130,7 @@ HTTP/1.1 200 OK app-version: httpbin-0 ... 
HTTP/1.1 200 OK -app-version: httpbin-0 +app-version: httpbin-1 ``` ## Modify weights (optional) diff --git a/docs/tutorials/integrations/kserve-mm/blue-green.md b/docs/tutorials/integrations/kserve-mm/blue-green.md index fd9e6466..6e62c364 100644 --- a/docs/tutorials/integrations/kserve-mm/blue-green.md +++ b/docs/tutorials/integrations/kserve-mm/blue-green.md @@ -102,27 +102,6 @@ The output includes the version of the application that responded (in the `app-v app-version: wisdom-0 ``` -??? note "To send requests from outside the cluster" - To configure the release for traffic from outside the cluster, a suitable Istio `Gateway` is required. For example, this [sample gateway](https://raw.githubusercontent.com/kalantar/docs/release/samples/iter8-sample-gateway.yaml). When using the Iter8 `release` chart, set the `gateway` field to the name of your `Gateway`. Finally, to send traffic: - - (a) In a separate terminal, port-forward the ingress gateway: - ```shell - kubectl -n istio-system port-forward svc/istio-ingressgateway 8080:80 - ``` - (b) Download the proto file and sample input: - ```shell - curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/modelmesh-serving/kserve.proto - curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/modelmesh-serving/grpc_input.json - ``` - \(c) Send requests using the `Host` header: - ```shell - cat grpc_input.json | \ - grpcurl -vv -plaintext -proto kserve.proto -d @ \ - -authority wisdom.modelmesh-serving \ - localhost:8080 inference.GRPCInferenceService.ModelInfer \ - | grep -e app-version - ``` - ## Deploy candidate A candidate version of the model can be deployed simply by adding a second version to the list of versions comprising the application: diff --git a/docs/tutorials/integrations/kserve/blue-green.md b/docs/tutorials/integrations/kserve/blue-green.md index fb3e0726..c63fc5da 100644 --- a/docs/tutorials/integrations/kserve/blue-green.md +++ 
b/docs/tutorials/integrations/kserve/blue-green.md @@ -74,7 +74,7 @@ You can also send inference requests from a pod within the cluster: 1. Create a `sleep` pod in the cluster from which requests can be made: ```shell -curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/kserve-serving/sleep.sh | sh - +curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.4/samples/kserve-serving/sleep.sh | sh - ``` 2. Exec into the sleep pod: diff --git a/docs/tutorials/integrations/kserve/canary.md b/docs/tutorials/integrations/kserve/canary.md index 97a46c1e..5d61e18a 100644 --- a/docs/tutorials/integrations/kserve/canary.md +++ b/docs/tutorials/integrations/kserve/canary.md @@ -75,7 +75,7 @@ You can also send inference requests from a pod within the cluster: 1. Create a `sleep` pod in the cluster from which requests can be made: ```shell -curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.17.3/samples/kserve-serving/sleep.sh | sh - +curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.4/samples/kserve-serving/sleep.sh | sh - ``` 2. Exec into the sleep pod: diff --git a/docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md b/docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md new file mode 100644 index 00000000..ba2cdbdf --- /dev/null +++ b/docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md @@ -0,0 +1,205 @@ +--- +template: main.html +--- + +# Blue-green release + +This tutorial shows how Iter8 can be used to release a basic Kubernetes application using a blue-green rollout strategy. +In a blue-green rollout, a percentage of requests are directed to a candidate version of the model. +This percentage can be changed over time. +The user declaratively describes the desired application state at any given moment. +An Iter8 `release` chart assists users who describe the application state at any given moment. 
+The chart provides the configuration needed for Iter8 to automatically deploy application versions and configure the routing to implement the blue-green rollout strategy. + +![Blue-green rollout](../../images/blue-green.png) + +This tutorial uses the Kubernetes Gateway API to allow the use of any service mesh that supports this API. In this case, we demonstrate with [Linkerd](https://linkerd.io/). + +???+ warning "Before you begin" + 1. Ensure that you have a Kubernetes cluster and the [`kubectl`](https://kubernetes.io/docs/reference/kubectl/) and [`helm`](https://helm.sh/) CLIs. You can create a local Kubernetes cluster using tools like [Kind](https://kind.sigs.k8s.io/) or [Minikube](https://minikube.sigs.k8s.io/docs/). + 2. [Install Linkerd](https://linkerd.io/2.14/getting-started/). + +## Install the Iter8 controller + +--8<-- "docs/getting-started/install.md" + +## Deploy initial version + +Deploy the initial version of the application ([httpbin](https://httpbin.org/)) using the Iter8 `release` chart by identifying the environment into which it should be deployed, a list of the versions to be deployed (only one here), and the rollout strategy to be used. Note that we deploy the application to the namespace `test`. + +???+ note "About creating a namespace for linkerd deployments" + When creating a namespace, it should be annotated so that all created pods are injected with the linkerd proxy. This can be done, for example, by using the linkerd CLI: + ```shell + kubectl create ns test --dry-run=client -o yaml | linkerd inject - | kubectl apply -f - + ``` + +```shell +cat < Date: Wed, 1 Nov 2023 14:46:44 -0400 Subject: [PATCH 4/4] update roadmap Signed-off-by: Michael Kalantar --- docs/roadmap.md | 13 +++++-------- .../kubernetes-gateway-api/blue-green.md | 1 + 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/docs/roadmap.md b/docs/roadmap.md index 61361bd5..c7f8dbf5 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -9,11 +9,8 @@ hide: 1.
Stabilizing Iter8 APIs for CNCF sandboxing 2. Autoscaling the metrics service -3. Install infrastructure components such as Istio -4. Install ML components such as KServe and KServe ModelMesh -5. Extend routing templates to include application management -6. Support multi-cluster installs -7. Open Data Hub tier 1 project -8. Metrics & evaluation for foundation model/LLM-based apps -9. Hyperparameter tuning for foundation model/LLM-based inference pipelines -10. Data/concept drift detection for ML models \ No newline at end of file +3. Support multi-cluster installs +4. Open Data Hub tier 1 project +5. Metrics & evaluation for foundation model/LLM-based apps +6. Hyperparameter tuning for foundation model/LLM-based inference pipelines +7. Data/concept drift detection for ML models \ No newline at end of file diff --git a/docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md b/docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md index ba2cdbdf..57880ae2 100644 --- a/docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md +++ b/docs/tutorials/integrations/kubernetes-gateway-api/blue-green.md @@ -172,6 +172,7 @@ application: strategy: blue-green EOF ``` + ??? note "What is different?" The version label (`app.kubernetes.io/version`) of the primary version was updated. In a real world example, the image would also have been updated (with that from the candidate version).