New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix remote write metrics #300
Fix remote write metrics #300
Conversation
Previously, only failed requests were tracked; this makes it difficult to know how many requests were actually attempted. Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
This commit adds a new metric to track the number of remote write requests made when handling the v2 endpoint. It also disambiguates the metric name for forwarded v1 requests. Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
Help: "Total amount of samples successfully forwarded", | ||
}) | ||
forwardErrors = prometheus.NewCounter(prometheus.CounterOpts{ | ||
Name: "telemeter_forward_request_errors_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have alerts for these metrics? If so we should probably change these as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll investigate. In any case, there are no other references to those metrics in this repo, so we'lll need to make adjustments in either obs/cfg or the deployment repos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, confirmed we don't have any alerts or recording rules referencing these metrics today. eventually we'll want to add some alerts based on the error rates tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
/retest |
3 similar comments
/retest |
/retest |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: metalmatze, squat The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Today, we only track the number of errors encountered when forwarding requests from the v1 upload endpoint to thanos. We don't track the total amount of requests made from the v1 upload endpoint handler. We also don't track any requests made when forwarding requests from the v2 endpoint to thanos.
This PR adds these metrics to give us better visibility during another incident.
cc @metalmatze @brancz @bwplotka @kakkoyun