Expose `up_info` metric with label for potential errors #3068

Closed
RichiH opened this Issue Aug 14, 2017 · 9 comments

@RichiH
Member

RichiH commented Aug 14, 2017

`up` is obviously somewhat special. While it's possible to navigate to `/targets`, wait for the target list to load, and then search for the job/instance having issues, that's hardly a nice way to find the underlying issue. `/api/v1/targets` exists as well, but that also lives outside of what PromQL can access.
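For reference, a minimal sketch of the out-of-band lookup described above, assuming the response shape of `/api/v1/targets` (`data.activeTargets[]` with `labels`, `health`, and `lastError`). The function name `failing_targets` and the `SAMPLE` payload are hypothetical, and the payload is hard-coded rather than fetched from a live server:

```python
def failing_targets(payload):
    """Return (job, instance, last_error) for every active target that is not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    failing = []
    for target in targets:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            failing.append(
                (labels.get("job"), labels.get("instance"), target.get("lastError", ""))
            )
    return failing


# Hypothetical sample shaped like a /api/v1/targets response (not real data).
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "node", "instance": "a:9100"},
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "node", "instance": "b:9100"},
                "health": "down",
                "lastError": "context deadline exceeded",
            },
        ]
    },
}

print(failing_targets(SAMPLE))
```

This works, but the result lives in a script rather than in a query, which is the gap this request is about.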

It would be simpler to have an up_info or similar which could expose the current error state and use that as part of alerting.

This comes close to event logging, but I would argue that this is a somewhat special case: it's built into Prometheus, so it's hard to get context otherwise.

https://github.com/RichiH/OpenMetrics/issues/3 is somewhat related to this FR.

@brian-brazil

Member

brian-brazil commented Aug 14, 2017

This is event logging, which doesn't belong as a metric. There's also a really high bar for adding any metric to be ingested on every scrape, and a metric with unbounded cardinality is not likely to meet that bar.

This was already discussed in #2317.

@RichiH

Member Author

RichiH commented Aug 14, 2017

It's not unbounded as the source is Prometheus and it's only fixed strings.

@brian-brazil

Member

brian-brazil commented Aug 14, 2017

It's unbounded as it may contain arbitrary urls and error messages. This is something for logging and/or tracing.
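A rough illustration of the cardinality concern, assuming the raw scrape error text were used as a label value. The error format in `scrape_error` is hypothetical, loosely modelled on a refused connection, and the host names are made up:

```python
def scrape_error(host):
    # Hypothetical error text: the target URL and address are embedded in the message.
    return (
        f"Get http://{host}:9100/metrics: "
        f"dial tcp {host}:9100: connect: connection refused"
    )


def distinct_error_labels(hosts):
    """Count the distinct label values produced if each error string became a label."""
    return len({scrape_error(host) for host in hosts})


for n in (10, 100, 1000):
    hosts = [f"node{i}" for i in range(n)]
    # Every failing target contributes its own label value, so the count tracks n.
    print(n, distinct_error_labels(hosts))
```

Because the message embeds per-target details, the number of distinct label values grows with the number of failing targets instead of staying within a fixed set.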

@RichiH

Member Author

RichiH commented May 28, 2018

Given our recent discussions in OpenMetrics about ENUM and STATESET, it is probably possible to find some middle ground in this use case.

I would love to get a somewhat wider discussion regarding this going, but maybe it's best to add something like this to the dev summit.

@RichiH RichiH reopened this May 28, 2018

@brian-brazil

Member

brian-brazil commented May 28, 2018

OpenMetrics doesn't change anything; this is still event logging with unbounded cardinality. This can't be represented as an enum.

@RichiH

Member Author

RichiH commented May 28, 2018

@brian-brazil

Member

brian-brazil commented May 28, 2018

You are asking it to return dynamic text and URLs, as that's what the error from a failed scrape looks like.

I see what you are trying to do, but metrics are fundamentally not suitable for this use case. The only thing it is sane for us to expose as part of every single scrape is `up`; anything beyond that is debug information that belongs elsewhere, such as the Prometheus UI or logs.

@brian-brazil

Member

brian-brazil commented Jun 13, 2018

There's no information that wasn't considered originally, so closing again.

@lock


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
