Rewards #93
Merged
8 commits
- 2af3986 experiment with reward (kalantar)
- bb73166 update index (kalantar)
- ae90e3c wordsmith and spelling (kalantar)
- 20631b4 wordsmith and spelling (kalantar)
- 1b378ed mockoon configuration (kalantar)
- 23e39d7 udpate reference in rewards (kalantar)
- 545d37f update links (kalantar)
- fa42879 add explanation (kalantar)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -50,7 +50,9 @@ LitmusChaos | |
| localhost | ||
| minikube | ||
| MLOps | ||
| mockoon | ||
| modelmesh | ||
| msec | ||
| namespace | ||
| namespaces | ||
| NewRelic | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,115 @@ | ||
| --- | ||
| template: main.html | ||
| --- | ||
|
|
||
| # A/B/n Experiments with Rewards | ||
|
|
||
| This tutorial describes how to use Iter8 to evaluate two or more versions of an application or ML model and identify the "best" version according to one or more reward metrics. | |
|
|
||
| A reward metric is a metric that measures the benefit or profit of a version of an application or ML model. Reward metrics are usually application or model specific. User engagement, sales, and net profit are examples. | ||
|
|
||
| ## Assumptions | ||
|
|
||
| We assume that you have deployed multiple versions of an application (or ML model) with the following characteristics: | ||
|
|
||
| - There is a way to route user traffic to the deployed versions. This might be done using the Iter8 SDK, the Iter8 traffic control features, or some other mechanism. | ||
| - Metrics, including reward metrics, are being exported to a metrics store such as Prometheus. | ||
| - Metrics can be retrieved from the metrics store by application (model) version. | ||
|
|
||
| In this tutorial, we mock a Prometheus service and demonstrate how to write an Iter8 experiment that evaluates reward metrics. | ||
|
|
||
| ## Mock Prometheus | ||
|
|
||
| For simplicity, we use [mockoon](https://mockoon.com/) to create a mocked Prometheus service instead of deploying Prometheus itself: | ||
|
|
||
| ```shell | ||
| kubectl create deploy prometheus-mock \ | ||
| --image mockoon/cli:latest \ | ||
| --port 9090 \ | ||
| -- mockoon-cli start --daemon-off \ | ||
| --port 9090 \ | ||
| --data https://raw.githubusercontent.com/kalantar/docs/rewards/samples/abn/model-prometheus-abn-tutorial.json | ||
| kubectl expose deploy prometheus-mock --port 9090 | ||
| ``` | ||
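
Once the mock is running, it can be queried like a real Prometheus instant-query endpoint. The following is a minimal Python sketch, not part of the tutorial itself. It assumes the service has been made reachable locally (for example with `kubectl port-forward svc/prometheus-mock 9090:9090`, a step the tutorial does not include) and uses the `api/v1/query` path defined in the Mockoon configuration. The helper names and the example query are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the (mocked) Prometheus service."""
    # The Mockoon config serves a single GET route at api/v1/query.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def query_mock(base_url: str, promql: str) -> dict:
    """Send the query and return the decoded JSON body (needs network access)."""
    with urlopen(build_query_url(base_url, promql), timeout=5) as resp:
        return json.load(resp)


# Hypothetical usage, assuming a port-forward to localhost:9090.
# The mock selects a response by regex-matching the query text, so a query
# containing "profit_sum" and "wisdom-1" should hit the wisdom-1 profit route:
#   query_mock("http://localhost:9090", 'profit_sum{mm_vmodel_id="wisdom-1"}')
```

Because the mock matches responses on substrings of the `query` parameter, a query that matches none of the configured rules falls through to the default route, which returns status 400.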
|
|
||
| ## Define template | ||
|
|
||
| Create a [_provider specification_](../../user-guide/tasks/custommetrics.md#provider-spec) that describes how Iter8 should fetch each metric value from the metrics store. The specification provides information about the provider URL, the HTTP method to be used, and any common headers. Furthermore, for each metric, there is: | ||
| - metadata, such as name, type and description, | ||
| - HTTP query parameters, and | ||
| - a jq expression describing how to extract the metric value from the response. | ||
|
|
||
| For example, a specification for the mean latency metric from Prometheus can look like the following: | ||
|
|
||
| ```yaml | |
| metric: | |
| - name: latency-mean | |
|   type: gauge | |
|   description: | | |
|     Mean latency | |
|   params: | |
|   - name: query | |
|     value: | | |
|       (sum(last_over_time(revision_app_request_latencies_sum{ | |
|         {{- template "labels" . }} | |
|       }[{{ .elapsedTimeSeconds }}s])) or on() vector(0))/(sum(last_over_time(revision_app_request_latencies_count{ | |
|         {{- template "labels" . }} | |
|       }[{{ .elapsedTimeSeconds }}s])) or on() vector(0)) | |
|   jqExpression: .data.result[0].value[1] | tonumber | |
| ``` | |
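
To make the `jqExpression` in the specification above concrete, here is a small Python sketch of the same extraction. The sample response follows the shape returned by the mocked Prometheus service (a `vector` result whose `value` is a `[timestamp, "string"]` pair). The timestamp and the value `"42.5"` are made up for illustration, and the function name is hypothetical.

```python
import json

# Illustrative response in the shape produced by the mock; values are invented.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {}, "value": [1700000000.0, "42.5"]}
    ]
  }
}
"""


def extract_metric_value(response_text: str) -> float:
    """Python equivalent of the jq filter `.data.result[0].value[1] | tonumber`."""
    payload = json.loads(response_text)
    # value[0] is the sample timestamp; value[1] is the sample, encoded as a string.
    return float(payload["data"]["result"][0]["value"][1])


print(extract_metric_value(SAMPLE_RESPONSE))
```

Prometheus encodes sample values as strings, which is why the expression ends with `tonumber`.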
|
|
||
| Note that the template is parameterized. Values are provided by the Iter8 experiment at run time. | ||
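
As an illustration of the parameterization, the sketch below shows one way the `labels` values could be combined per version and rendered into a PromQL label selector. This is an assumption about the behavior, not Iter8's implementation: it supposes that values common to all versions are merged with each version's values, and that labels are rendered as `key="value"` pairs in key order. The label values are the ones used in the launch command later in this tutorial; the function names are hypothetical.

```python
def merge_labels(common: dict, version: dict) -> dict:
    """Combine labels shared by all versions with version-specific labels."""
    # Assumption: version-specific labels win if a key appears in both.
    return {**common, **version}


def render_labels(labels: dict) -> str:
    """Render labels as a PromQL selector body, e.g. a="1",b="2"."""
    return ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))


common = {"model_name": "wisdom"}
per_version = [{"mm_vmodel_id": "wisdom-1"}, {"mm_vmodel_id": "wisdom-2"}]

for version_labels in per_version:
    print(render_labels(merge_labels(common, version_labels)))
```

Under these assumptions, each version's query differs only in its `mm_vmodel_id` label, which is what lets the metrics store return values by version.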
|
|
||
| A sample provider specification for Prometheus is available [here](https://gist.githubusercontent.com/kalantar/80c9efc0fd4cc34572d893cc82bdc4d2/raw/f3629aa62cdc9fd7e39ee2b6b113a8bf7b6b4463/model-prometheus-abn-tutorial.tpl). | |
|
|
||
| It describes the following metrics: | ||
|
|
||
| - request-count | ||
| - latency-mean | ||
| - profit-mean | ||
|
|
||
| ## Launch experiment | ||
|
|
||
| ```shell | ||
| iter8 k launch \ | ||
| --set "tasks={custommetrics,assess}" \ | ||
| --set custommetrics.templates.model-prometheus="https://gist.githubusercontent.com/kalantar/80c9efc0fd4cc34572d893cc82bdc4d2/raw/f3629aa62cdc9fd7e39ee2b6b113a8bf7b6b4463/model-prometheus-abn-tutorial.tpl" \ | ||
| --set custommetrics.values.labels.model_name=wisdom \ | ||
| --set 'custommetrics.versionValues[0].labels.mm_vmodel_id=wisdom-1' \ | ||
| --set 'custommetrics.versionValues[1].labels.mm_vmodel_id=wisdom-2' \ | ||
| --set assess.SLOs.upper.model-prometheus/latency-mean=50 \ | ||
| --set "assess.rewards.max={model-prometheus/profit-mean}" \ | ||
| --set runner=cronjob \ | ||
| --set cronjobSchedule="*/1 * * * *" | ||
| ``` | ||
|
|
||
| This experiment executes in a [loop](../../user-guide/topics/parameters.md), once every minute. It uses the [`custommetrics` task](../../user-guide/tasks/custommetrics.md) to read metrics from the (mocked) Prometheus provider. Finally, the [`assess` task](../../user-guide/tasks/assess.md) verifies that the `latency-mean` is below 50 msec and identifies which version provides the greatest reward; that is, the greatest mean profit. | ||
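
The assessment described above can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not Iter8's implementation: the SLO is treated as an inclusive upper limit, the reward comparison is made independently of the SLO check, and the metric values are invented for illustration. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VersionMetrics:
    name: str
    latency_mean: float  # msec
    profit_mean: float


def assess(versions: list, latency_upper: float = 50.0):
    """Check the latency SLO per version and pick the version with the max reward."""
    # Assumption: a version satisfies the SLO when its value is <= the upper limit.
    slo_satisfied = {v.name: v.latency_mean <= latency_upper for v in versions}
    # Reward: the version with the greatest mean profit.
    best_version = max(versions, key=lambda v: v.profit_mean).name
    return slo_satisfied, best_version


# Invented sample values for the two versions used in this tutorial.
observed = [
    VersionMetrics("wisdom-1", latency_mean=32.1, profit_mean=45.0),
    VersionMetrics("wisdom-2", latency_mean=48.7, profit_mean=61.0),
]

slo_satisfied, best_version = assess(observed)
print(slo_satisfied, best_version)
```

Because the mock returns random values on every query, the version reported as best can differ from one loop iteration to the next.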
|
|
||
| ## Inspect experiment report | ||
|
|
||
| === "Text" | |
|     ```shell | |
|     iter8 k report | |
|     ``` | |
| === "HTML" | |
|     ```shell | |
|     iter8 k report -o html > report.html # view in a browser | |
|     ``` | |
|
|
||
| Because the experiment loops, the reported results will change over time. | ||
|
|
||
| *** | ||
|
|
||
| ## Cleanup | ||
|
|
||
| Delete the experiment: | ||
|
|
||
| ```shell | ||
| iter8 k delete | ||
| ``` | ||
|
|
||
| Terminate the mocked Prometheus service: | ||
|
|
||
| ```shell | ||
| kubectl delete deploy/prometheus-mock svc/prometheus-mock | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {"uuid":"010a623b-dcbe-499c-a964-5501b725e663","lastMigration":25,"name":"Prometheus (model)","endpointPrefix":"api/v1/","latency":0,"port":9090,"hostname":"0.0.0.0","folders":[],"routes":[{"uuid":"387e3484-79f3-4844-8228-4cc2700a24d6","documentation":"","method":"get","endpoint":"query","responses":[{"uuid":"dc1c57ee-fe48-47f3-846e-8f67a9ac38e8","body":"{\n \"response\": \"wisdom-1: request-count\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 0 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-1: request-count","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"fa57be05-b2b1-4284-bf21-7d7a8fc3c779","body":"{\n \"response\": \"wisdom-1: request-count\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 0 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: request-count","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"2e36070b-162b-4af5-81c6-0df83ab2503c","body":"{\n \"response\": \"v1: latency-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n 
\"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ float 0 50 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-1: latency-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"model_request_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"\\)\\s*/\\s*\\(","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"9e7e7ef3-7aad-46bd-a469-2bed8c90917f","body":"{\n \"response\": \"v2: latency-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ float 0 50 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: latency-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"\\)\\s*/\\s*\\(","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"00e55214-d6f6-414a-8b52-10b202fef479","body":"{\n \"response\": \"v1: profit-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 10 80 }}\"\n ]\n }]\n 
}\n}","latency":0,"statusCode":200,"label":"wisdom-1: profit-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"profit_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"e2a07264-2c5e-4877-993b-750296a31dab","body":"{\n \"response\": \"v2: profit-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 5 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: profit-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"profit_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"785190e8-3e45-4e7f-9352-fe8e06a4928b","body":"{\n \"response\": \"unable to identify query\"\n \"query\": \"{{ queryParam 'query' }}\",\n}","latency":0,"statusCode":400,"label":"unmatched 
query","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[],"rulesOperator":"OR","disableTemplating":false,"fallbackTo404":false,"default":true},{"uuid":"566f29dc-0bff-4fa9-8449-fb4b37e8f6df","body":"{}","latency":0,"statusCode":200,"label":"","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[],"rulesOperator":"OR","disableTemplating":false,"fallbackTo404":false,"default":false}],"enabled":true,"responseMode":null}],"rootChildren":[{"type":"route","uuid":"387e3484-79f3-4844-8228-4cc2700a24d6"}],"proxyMode":false,"proxyHost":"","proxyRemovePrefix":false,"tlsOptions":{"enabled":false,"type":"CERT","pfxPath":"","certPath":"","keyPath":"","caPath":"","passphrase":""},"cors":true,"headers":[{"key":"Content-Type","value":"application/json"}],"proxyReqHeaders":[{"key":"","value":""}],"proxyResHeaders":[{"key":"","value":""}],"data":[]} |