Add Kubernetes Monitoring Mixin #5

Merged
merged 16 commits into main from mixin on Sep 2, 2023
Conversation

@ishanjainn (Member) commented Aug 26, 2023

[Screenshots: dashboard previews, 2023-08-25]

@ishanjainn ishanjainn changed the title Mixin Add Kubernetes Monitoring Mixin Aug 26, 2023
@ishanjainn ishanjainn requested a review from rgeyer August 28, 2023 15:08
@rgeyer left a comment

Some leading questions about the type of metrics you're using, and how exactly tokens are handled (and thus, what sort of query and thresholds make sense for the alerts).

I know very little about OpenAI in particular, which is why I ask the questions I do. If you've already considered these things, you can ignore my comments :)

Overall, no notes on the design of the dashboard.

Comment on lines 11 to 15
- HighCompletionTokensUsage
- HighPromptTokensUsage
- HighTotalTokensUsage
- LongRequestDuration
- HighUsageCost

The alerts for these have (apparently?) arbitrary numeric values. Could these be configurable?

Taking a quick look at the OpenAI token system, it looks like the maximum number of tokens per request is 4097.

Some questions for which I don't have answers, but whose answers could inform these numbers:

Are the token metrics gauges or counters? If they're counters, a rate should be used instead of sum.

Can a single model process more than one request at once? If so, the sum of these over 5m could handle roughly 2 requests every 5min (assuming they're gauges, with a 5+min range).
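
For illustration, here is how the two cases would differ in PromQL. This is a hedged sketch: the 5m window and the thresholds are placeholders, not values taken from this PR.

# If openai_promptTokens were a counter, the alert should use a rate,
# e.g. tokens consumed per 5-minute window (threshold is arbitrary):
sum by (job) (rate(openai_promptTokens{job=~"$job"}[5m])) * 300 > 1000

# If it is a gauge written once per request, averaging recent samples
# is more meaningful than summing them (threshold is arbitrary):
avg_over_time(openai_promptTokens{job=~"$job"}[5m]) > 3000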

@ishanjainn (Member Author) replied:

All of the metrics are gauges.
No, it can only process a single request at a time.

"values": false
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum by(job) (count_over_time(openai_promptTokens{job=~\"$job\"}[$__interval]))",

Prompt tokens are not the same as requests. Is there a bespoke request counter?

@ishanjainn (Member Author) replied:

Sorry, I didn't get this.
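
To unpack the question: counting samples of a token gauge only approximates request volume, and only if the gauge is written exactly once per request. A dedicated counter would be the robust signal. A hypothetical sketch (openai_requests_total is an assumed metric name, not one this exporter is known to expose):

# Robust: requests per second from a true counter, if one existed
sum by (job) (rate(openai_requests_total{job=~"$job"}[5m]))

# What the panel does instead: count gauge samples per interval,
# which equals requests only if each request writes one sample
sum by (job) (count_over_time(openai_promptTokens{job=~"$job"}[$__interval]))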

"useBackend": false
}
],
"title": "Total Tokens vs Request Duration",

Does it make sense to have both of these on the same time series panel? Request duration is likely to be less than 100 seconds, while average total tokens per request may be over 1000. A 10x scale difference will be hard to grok on the same graph.

@ishanjainn (Member Author) replied:

Yeah, axis-wise I agree it looks a bit off, but I just wanted to have a correlation panel. Is there a way we can do this better?
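
One common way to keep the correlation panel while fixing the scale problem is a Grafana field override that moves one series to a right-hand Y axis. A sketch of the relevant dashboard JSON; the display name "Request Duration" is an assumption about this panel:

"fieldConfig": {
  "overrides": [
    {
      "matcher": { "id": "byName", "options": "Request Duration" },
      "properties": [
        { "id": "custom.axisPlacement", "value": "right" },
        { "id": "unit", "value": "s" }
      ]
    }
  ]
}

This keeps both series on one graph but gives each its own scale.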

"useBackend": false
"disableTextWrap": false,
"editorMode": "code",
"expr": "openai_totalTokens{job=~\"$job\"}",

Again, it's not clear whether this is a counter or a gauge.

If it is a gauge, it will only have the value of total available tokens when the metric is sampled?

If it is a counter, some sort of increase or rate function needs to be applied.

@ishanjainn (Member Author) replied:

> If it is a gauge, it will only have the value of total available tokens when the metric is sampled?

Yup, this is a gauge. totalTokens is the sum of completion tokens and prompt tokens, hence I used it directly against duration in this panel.
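
Given that openai_totalTokens is a per-request gauge, window functions (rather than rate(), which is for counters) are the appropriate way to trend it. A hedged sketch:

# Average tokens per request over the dashboard window
avg_over_time(openai_totalTokens{job=~"$job"}[$__interval])

# Approximate token volume per window: sum the per-request samples,
# assuming each request writes exactly one sample
sum by (job) (sum_over_time(openai_totalTokens{job=~"$job"}[$__interval]))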

@ishanjainn ishanjainn merged commit 18395c8 into main Sep 2, 2023
7 checks passed
@ishanjainn ishanjainn deleted the mixin branch September 2, 2023 18:10