diff --git a/.github/wordlist.txt b/.github/wordlist.txt
index 3b9fd240..d2a9fa0a 100644
--- a/.github/wordlist.txt
+++ b/.github/wordlist.txt
@@ -50,7 +50,9 @@ LitmusChaos
 localhost
 minikube
 MLOps
+mockoon
 modelmesh
+msec
 namespace
 namespaces
 NewRelic
diff --git a/docs/tutorials/abn/rewards.md b/docs/tutorials/abn/rewards.md
new file mode 100644
index 00000000..c86ad633
--- /dev/null
+++ b/docs/tutorials/abn/rewards.md
@@ -0,0 +1,115 @@
+---
+template: main.html
+---
+
+# A/B/n Experiments with Rewards
+
+This tutorial describes how to use Iter8 to evaluate two or more versions of an application or ML model and identify the "best" version according to one or more reward metrics.
+
+A reward metric measures the benefit or profit that a version of an application or ML model delivers. Reward metrics are usually application- or model-specific; examples include user engagement, sales, and net profit.
+
+## Assumptions
+
+We assume that you have deployed multiple versions of an application (or ML model) with the following characteristics:
+
+- There is a way to route user traffic to the deployed versions. This might be done using the Iter8 SDK, the Iter8 traffic control features, or some other mechanism.
+- Metrics, including reward metrics, are exported to a metrics store such as Prometheus.
+- Metrics can be retrieved from the metrics store by application (model) version.
+
+In this tutorial, we mock a Prometheus service and demonstrate how to write an Iter8 experiment that evaluates reward metrics.
+
+## Mock Prometheus
+
+For simplicity, we use [mockoon](https://mockoon.com/) to create a mocked Prometheus service instead of deploying Prometheus itself:
+
+```shell
+kubectl create deploy prometheus-mock \
+--image mockoon/cli:latest \
+--port 9090 \
+-- mockoon-cli start --daemon-off \
+--port 9090 \
+--data https://raw.githubusercontent.com/kalantar/docs/rewards/samples/abn/model-prometheus-abn-tutorial.json
+kubectl expose deploy prometheus-mock --port 9090
+```
+
+## Define template
+
+Create a [_provider specification_](../../user-guide/tasks/custommetrics.md#provider-spec) that describes how Iter8 should fetch each metric value from the metrics store. The specification provides the provider URL, the HTTP method to use, and any common headers. In addition, for each metric, it specifies:
+
+- metadata, such as name, type, and description,
+- HTTP query parameters, and
+- a jq expression describing how to extract the metric value from the response.
+
+For example, a specification for the mean latency metric from Prometheus might look like the following:
+
+```yaml
+metrics:
+- name: latency-mean
+  type: gauge
+  description: |
+    Mean latency
+  params:
+  - name: query
+    value: |
+      (sum(last_over_time(revision_app_request_latencies_sum{
+        {{- template "labels" . }}
+      }[{{ .elapsedTimeSeconds }}s])) or on() vector(0))/(sum(last_over_time(revision_app_request_latencies_count{
+        {{- template "labels" . }}
+      }[{{ .elapsedTimeSeconds }}s])) or on() vector(0))
+  jqExpression: .data.result[0].value[1] | tonumber
+```
+
+Note that the template is parameterized: the Iter8 experiment supplies the values (for example, `labels` and `elapsedTimeSeconds`) at run time.
+A [sample provider specification for Prometheus](https://gist.githubusercontent.com/kalantar/80c9efc0fd4cc34572d893cc82bdc4d2/raw/f3629aa62cdc9fd7e39ee2b6b113a8bf7b6b4463/model-prometheus-abn-tutorial.tpl) is also available.
+
+It describes the following metrics:
+
+- `request-count`
+- `latency-mean`
+- `profit-mean`
+
+## Launch experiment
+
+```shell
+iter8 k launch \
+--set "tasks={custommetrics,assess}" \
+--set custommetrics.templates.model-prometheus="https://gist.githubusercontent.com/kalantar/80c9efc0fd4cc34572d893cc82bdc4d2/raw/f3629aa62cdc9fd7e39ee2b6b113a8bf7b6b4463/model-prometheus-abn-tutorial.tpl" \
+--set custommetrics.values.labels.model_name=wisdom \
+--set 'custommetrics.versionValues[0].labels.mm_vmodel_id=wisdom-1' \
+--set 'custommetrics.versionValues[1].labels.mm_vmodel_id=wisdom-2' \
+--set assess.SLOs.upper.model-prometheus/latency-mean=50 \
+--set "assess.rewards.max={model-prometheus/profit-mean}" \
+--set runner=cronjob \
+--set cronjobSchedule="*/1 * * * *"
+```
+
+This experiment runs in a [loop](../../user-guide/topics/parameters.md), once every minute. It uses the [`custommetrics` task](../../user-guide/tasks/custommetrics.md) to read metrics from the (mocked) Prometheus provider. The [`assess` task](../../user-guide/tasks/assess.md) then verifies that `latency-mean` does not exceed 50 msec and identifies the version with the greatest reward, that is, the greatest mean profit.
+
+## Inspect experiment report
+
+=== "Text"
+    ```shell
+    iter8 k report
+    ```
+=== "HTML"
+    ```shell
+    iter8 k report -o html > report.html # view in a browser
+    ```
+
+Because the experiment loops, and the mocked Prometheus returns random values for each query, the reported results change over time.
+
+***
+
+## Cleanup
+
+Delete the experiment:
+
+```shell
+iter8 k delete
+```
+
+Terminate the mocked Prometheus service:
+
+```shell
+kubectl delete deploy/prometheus-mock svc/prometheus-mock
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index 9ee85aff..217ef176 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -122,7 +122,9 @@ nav:
     - Load test gRPC with SLOs: tutorials/load-test-grpc.md
     - Load test multiple gRPC methods: tutorials/load-test-grpc-multiple.md
     - Chaos injection with SLOs: tutorials/chaos/slo-validation-chaos.md
-    - A/B experiments: tutorials/abn/abn.md
+    - A/B experiments:
+      - Iter8 SDK: tutorials/abn/abn.md
+      - Evaluating rewards: tutorials/abn/rewards.md
     - Automated experiments: tutorials/autox/autox.md
     - Custom metrics:
       - One version: tutorials/custom-metrics/one-version.md
diff --git a/samples/abn/model-prometheus-abn-tutorial.json b/samples/abn/model-prometheus-abn-tutorial.json
new file mode 100644
index 00000000..a3fbdfdc
--- /dev/null
+++ b/samples/abn/model-prometheus-abn-tutorial.json
@@ -0,0 +1 @@
+{"uuid":"010a623b-dcbe-499c-a964-5501b725e663","lastMigration":25,"name":"Prometheus (model)","endpointPrefix":"api/v1/","latency":0,"port":9090,"hostname":"0.0.0.0","folders":[],"routes":[{"uuid":"387e3484-79f3-4844-8228-4cc2700a24d6","documentation":"","method":"get","endpoint":"query","responses":[{"uuid":"dc1c57ee-fe48-47f3-846e-8f67a9ac38e8","body":"{\n \"response\": \"wisdom-1: request-count\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 0 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-1: request-count","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"fa57be05-b2b1-4284-bf21-7d7a8fc3c779","body":"{\n \"response\": \"wisdom-2: request-count\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 0 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: request-count","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"2e36070b-162b-4af5-81c6-0df83ab2503c","body":"{\n \"response\": \"v1: latency-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ float 0 50 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-1: latency-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"model_request_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"\\)\\s*/\\s*\\(","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"9e7e7ef3-7aad-46bd-a469-2bed8c90917f","body":"{\n \"response\": \"v2: latency-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ float 0 50 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: latency-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"\\)\\s*/\\s*\\(","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"00e55214-d6f6-414a-8b52-10b202fef479","body":"{\n \"response\": \"v1: profit-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 10 80 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-1: profit-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"profit_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"e2a07264-2c5e-4877-993b-750296a31dab","body":"{\n \"response\": \"v2: profit-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 5 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: profit-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"profit_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"785190e8-3e45-4e7f-9352-fe8e06a4928b","body":"{\n \"response\": \"unable to identify query\",\n \"query\": \"{{ queryParam 'query' }}\"\n}","latency":0,"statusCode":400,"label":"unmatched query","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[],"rulesOperator":"OR","disableTemplating":false,"fallbackTo404":false,"default":true},{"uuid":"566f29dc-0bff-4fa9-8449-fb4b37e8f6df","body":"{}","latency":0,"statusCode":200,"label":"","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[],"rulesOperator":"OR","disableTemplating":false,"fallbackTo404":false,"default":false}],"enabled":true,"responseMode":null}],"rootChildren":[{"type":"route","uuid":"387e3484-79f3-4844-8228-4cc2700a24d6"}],"proxyMode":false,"proxyHost":"","proxyRemovePrefix":false,"tlsOptions":{"enabled":false,"type":"CERT","pfxPath":"","certPath":"","keyPath":"","caPath":"","passphrase":""},"cors":true,"headers":[{"key":"Content-Type","value":"application/json"}],"proxyReqHeaders":[{"key":"","value":""}],"proxyResHeaders":[{"key":"","value":""}],"data":[]}
\ No newline at end of file