Add Kubernetes Monitoring Mixin #5
Conversation
ishanjainn commented Aug 26, 2023 (edited)
Some leading questions about the type of metrics you're using, and how exactly tokens are handled (and thus, what sort of query and thresholds make sense for the alerts).
I know very little about OpenAI in particular, which is why I ask the questions I do. If you've already considered these things, you can ignore my comments :)
Overall, no notes on the design of the dashboard.
kubernetes-mixin/README.md
Outdated
- HighCompletionTokensUsage
- HighPromptTokensUsage
- HighTotalTokensUsage
- LongRequestDuration
- HighUsageCost
The alerts for these have (apparently?) arbitrary numeric values. Could these be configurable?
Taking a quick look at the OpenAI tokens system, it looks like the maximum number of tokens per-request is 4097.
Some questions for which I do not have answers, but could inform these numbers.
Are the token metrics gauges or counters? If they're counters, a rate should be used instead of sum.
Can a single model process more than one request at once? If so, the sum of these over 5m could handle roughly 2 requests every 5min (assuming they're gauges, with a 5+min range).
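The two cases lead to different alert queries. A sketch under the assumptions above, using the metric names from this PR; the `$promptTokensThreshold` placeholder is hypothetical, standing in for whatever configurable threshold the mixin ends up exposing:

```promql
# If the token metrics were counters, alert on the per-second rate:
sum by (job) (rate(openai_promptTokens{job=~"$job"}[5m])) > $promptTokensThreshold

# If they are gauges (per-request snapshots), average over the window instead:
avg_over_time(openai_promptTokens{job=~"$job"}[5m]) > $promptTokensThreshold
```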
All of the metrics are gauges.
No, it can only process a single request at a time.
"values": false
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum by(job) (count_over_time(openai_promptTokens{job=~\"$job\"}[$__interval]))",
Prompt tokens are not the same as requests. Is there a bespoke request counter?
Sorry, I didn't get this.
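To illustrate the reviewer's distinction: counting samples of a token gauge is a proxy for request count, not a request count itself. A sketch of what a direct request-rate query could look like, assuming a hypothetical `openai_requests_total` counter that is not present in this integration:

```promql
# Hypothetical: if the integration exposed a dedicated request counter
# (openai_requests_total -- NOT part of this PR), requests per second
# could be graphed directly instead of counting token-gauge samples:
sum by (job) (rate(openai_requests_total{job=~"$job"}[$__rate_interval]))
```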
"useBackend": false
}
],
"title": "Total Tokens vs Request Duration",
Does it make sense to have both of these on the same timeseries? Request duration is likely to be less than 100 seconds, while average total tokens per request may be over 1000. A 10x scale difference will be hard to grok on the same graph.
Yeah, axis-wise I agree it looks a bit off, but I wanted to have a correlation panel. Is there a way we can do this better?
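One common way to keep a correlation panel readable in Grafana is a field override that moves one series onto a right-hand y-axis, so each quantity gets its own scale. A sketch of the override fragment for the panel's `fieldConfig`; the series name `Request Duration` is an assumption about how the query is labelled:

```json
"fieldConfig": {
  "overrides": [
    {
      "matcher": { "id": "byName", "options": "Request Duration" },
      "properties": [
        { "id": "custom.axisPlacement", "value": "right" },
        { "id": "unit", "value": "s" }
      ]
    }
  ]
}
```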
"useBackend": false
"disableTextWrap": false,
"editorMode": "code",
"expr": "openai_totalTokens{job=~\"$job\"}",
Again, it's not clear if this is a counter or a gauge.
If it is a gauge, it will only have the value of total available tokens when the metric is sampled?
If it is a counter, some sort of increase or rate function needs to be applied.
> If it is a gauge, it will only have the value of total available tokens when the metric is sampled?

Yup, this is a gauge. totalTokens is the sum of completion tokens and prompt tokens, hence I used it directly against duration in this panel.
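Given that `openai_totalTokens` is a gauge whose samples are per-request sums, a smoothed variant of the panel query can reduce single-request spikes; the window choice here is a judgment call, not part of the PR:

```promql
# Gauge: each sample is the token count of the most recent request.
# Averaging over the dashboard interval smooths single-request spikes:
avg_over_time(openai_totalTokens{job=~"$job"}[$__interval])
```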