-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Export Ray Counter as Prometheus Counter metric #43795
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
rkooo567
reviewed
Mar 8, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM.
A couple high level comments;
- We should update the API doc
- Can we allow to disable this via env var? (and document it in the API doc)
rkooo567
approved these changes
Mar 8, 2024
8 tasks
@jjyao I assume you are going to follow up on a separate PR w/ documentation? Please make sure to update the serve monitoring page and docs for the serve metric wrappers |
8 tasks
jjyao
added a commit
that referenced
this pull request
Mar 13, 2024
Update the Serve Counter metric doc to mention the change in #43795 Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
jjyao
added a commit
that referenced
this pull request
Mar 14, 2024
Update the Serve Counter metric doc to mention the change in #43795 Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
8 tasks
ryanaoleary
pushed a commit
to ryanaoleary/ray
that referenced
this pull request
Jun 7, 2024
…3795) Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
ryanaoleary
pushed a commit
to ryanaoleary/ray
that referenced
this pull request
Jun 7, 2024
Update the Serve Counter metric doc to mention the change in ray-project#43795 Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why are these changes needed?
Currently
ray.utils.metrics.Counter
is exported as Prometheus gauge metric which is the wrong metric type. However, directly fixing the metric type is backward incompatible since Prometheus changes the metric name for counter type (append_total
suffix).To address this issue, #41446 has a fix where we vendor Prometheus client and change its internal code to not append the
_total
suffix. I feel this is undesirable since it's a well known convention that counter has_total
suffix (it's even mandatory in OpenMetrics spec https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md, which is a revision of the original Prometheus exposition format and Prometheus fully supports it now and what's acutally why the Prometheus always append _total suffix for counter since it's required by OpenMetrics). By not following this convention, we may accidentally break systems that rely on it (https://stackoverflow.com/questions/75202155/how-can-i-prevent-micrometer-from-adding-total-suffix-to-counter-metric-name).Instead, this PR tries to keep backward compatibility in a different way by exporting both the wrong gauge metric and the correct counter metric for
ray.utils.metrics.Counter
. This will double the metrics but I searched in our codebase, we don't have that many Counter metrics (tens of them) so I think it's fine. Later on when we have a major release we can stop exporting the wrong gauge metric. This also gives users time to migrate their dashboards in the meantime.Related issue number
Closes #37768
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.