
Enhanced Prometheus Remote Storage API #3902

Open
jkohen opened this Issue Mar 1, 2018 · 6 comments

5 participants
jkohen (Contributor) commented Mar 1, 2018

As part of Google's continued commitment to the open source community, to Kubernetes, and to all the CNCF projects, the Google Cloud Management team wants to support seamless, highly available, and scalable ingestion for Prometheus clients. Our team wants to work closely with the Prometheus community to improve Prometheus' interoperability with public cloud services by extending its ability to ingest data into other monitoring systems with rich querying and alerting capabilities.

Specifically, we are proposing to extend the Remote Storage API and the data pipeline that flows into it to support the full Prometheus (and eventually OpenMetrics) data model. We understand that this will require ongoing maintenance, and we are committed to contributing engineering time to ensure long-term success. We believe that the common extensibility framework we propose will be easily reused by other major cloud providers.

Last, but not least, we see working with the Prometheus community as an extension of our ongoing collaboration on the OpenMetrics standard.

We look forward to engaging with the Prometheus maintainers to further advance Prometheus adoption and benefit the overall open source community. Google would be happy to host a conversation to discuss this in more detail and outline possible next steps. Whom should we be working with to make that happen?

Mini-design

Many of the technical use cases for broad cloud integration are already addressed by the Prometheus client, but not by the Prometheus server. We have looked at the remote storage API in the Prometheus server and determined that the server and its remote storage API will require enhancements to meet public cloud use cases (the table below lists the full set of requirements). We are willing to contribute the changes needed to make an integration via the remote storage API viable.

| Property | Prometheus client exposes | Prometheus server exposes |
|---|---|---|
| Element type (int64, double) | double | double |
| Metric kind (gauge, cumulative/counter, distribution) | Yes | Nothing |
| Metric and label description | Metric description | Nothing |
| Samples with monotonically increasing timestamps | Yes, assuming client exposes monotonically increasing timestamps. | Yes, apparently. `QueueManager` states the need to preserve this invariant on resharding. |
| Reset timestamp for cumulative metrics | No. | No. |
| Atomic histogram | Yes, as multiple metrics that can be combined with `expfmt.TextParser.TextToMetricFamilies` | Sometimes. The metrics that make up a given histogram aren't guaranteed to arrive in a single `WriteRequest` (see previous row). |
| Inventory data from discovery stage | N/A | Yes, subject to Prometheus relabelling. Integrations would benefit from access to the unfiltered inventory data. |
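The ordering and atomic-histogram rows can be illustrated with a minimal Go sketch of hash-based sharding. It assumes, for illustration only, that a sender picks a shard by hashing each series' full label set (FNV-1a here) and drains each shard as a FIFO queue; `Sample`, `seriesKey`, `shardFor` and `distribute` are hypothetical names, not the actual `QueueManager` code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// Sample is a hypothetical, simplified remote-write sample: a series
// identified by its label set, plus one timestamp/value pair.
type Sample struct {
	Labels    map[string]string
	Timestamp int64
	Value     float64
}

// seriesKey builds a canonical string for a label set by sorting label
// names, so the same series always yields the same key.
func seriesKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		b.WriteString(name)
		b.WriteByte(0xff) // separator byte that never occurs in valid UTF-8
		b.WriteString(labels[name])
		b.WriteByte(0xff)
	}
	return b.String()
}

// shardFor hashes the series key with FNV-1a to pick one of numShards queues.
func shardFor(labels map[string]string, numShards int) int {
	h := fnv.New64a()
	h.Write([]byte(seriesKey(labels)))
	return int(h.Sum64() % uint64(numShards))
}

// distribute appends each sample to its shard's FIFO queue. All samples of
// one series share a queue, so their relative order is kept.
func distribute(samples []Sample, numShards int) [][]Sample {
	queues := make([][]Sample, numShards)
	for _, s := range samples {
		i := shardFor(s.Labels, numShards)
		queues[i] = append(queues[i], s)
	}
	return queues
}

func main() {
	// A histogram is exposed as several distinct series (one _bucket per le,
	// plus _sum and _count), so its parts may hash to different shards.
	samples := []Sample{
		{Labels: map[string]string{"__name__": "req_duration_seconds_bucket", "le": "0.5"}, Timestamp: 1000, Value: 3},
		{Labels: map[string]string{"__name__": "req_duration_seconds_bucket", "le": "+Inf"}, Timestamp: 1000, Value: 5},
		{Labels: map[string]string{"__name__": "req_duration_seconds_sum"}, Timestamp: 1000, Value: 1.7},
		{Labels: map[string]string{"__name__": "req_duration_seconds_count"}, Timestamp: 1000, Value: 5},
		{Labels: map[string]string{"__name__": "req_duration_seconds_count"}, Timestamp: 2000, Value: 6},
	}
	for i, q := range distribute(samples, 4) {
		for _, s := range q {
			fmt.Printf("shard %d: %s le=%q t=%d\n", i, s.Labels["__name__"], s.Labels["le"], s.Timestamp)
		}
	}
}
```

Because every sample of one series lands on the same queue, its timestamps stay in order across sends. The `_bucket`, `_sum` and `_count` series, however, are separate series, so nothing in a scheme like this keeps a histogram's parts together in one request.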
brian-brazil (Member) commented Mar 1, 2018

As I said when you last asked about this two months ago, if you want to do this, the way to integrate is via the client_golang parser, which can be found at https://github.com/prometheus/common/tree/master/expfmt, not via Prometheus itself. It already provides you with everything you ask for and does not require the deeply invasive changes that your request implies.
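To illustrate that the exposition format itself already carries metric kind and description, here is a minimal Go sketch that pulls the `# HELP` and `# TYPE` lines out of text-format input. It is a hand-rolled stand-in, not the `expfmt` API: `MetricMeta` and `parseMeta` are hypothetical names, and it skips escaping and other details that a real parser such as `expfmt.TextParser` handles.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// MetricMeta holds the per-family metadata carried by "# HELP" and
// "# TYPE" lines in the Prometheus text exposition format.
type MetricMeta struct {
	Help string
	Type string
}

// parseMeta is a deliberately minimal reader for HELP and TYPE lines.
// It assumes single-space separators, skips sample lines, and does not
// unescape HELP text; it is for illustration, not a replacement for expfmt.
func parseMeta(text string) map[string]*MetricMeta {
	meta := make(map[string]*MetricMeta)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		// Expected shapes: "# HELP <name> <docstring>", "# TYPE <name> <type>".
		fields := strings.SplitN(strings.TrimSpace(sc.Text()), " ", 4)
		if len(fields) < 3 || fields[0] != "#" {
			continue // sample line or blank line
		}
		if fields[1] != "HELP" && fields[1] != "TYPE" {
			continue // ordinary comment line
		}
		m, ok := meta[fields[2]]
		if !ok {
			m = &MetricMeta{}
			meta[fields[2]] = m
		}
		value := ""
		if len(fields) == 4 {
			value = fields[3]
		}
		if fields[1] == "HELP" {
			m.Help = value
		} else {
			m.Type = value
		}
	}
	return meta
}

func main() {
	const exposition = `# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
# HELP req_duration_seconds Request latency.
# TYPE req_duration_seconds histogram
req_duration_seconds_bucket{le="+Inf"} 5
req_duration_seconds_sum 1.7
req_duration_seconds_count 5
`
	meta := parseMeta(exposition)
	names := make([]string, 0, len(meta))
	for name := range meta {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: type=%s help=%q\n", name, meta[name].Type, meta[name].Help)
	}
}
```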

Prometheus is intended to be a monitoring system in and of itself, not merely the ingestion component for other monitoring systems. Remote write is intended primarily as a way to facilitate long-term storage for Prometheus, and it does not require the features you mention for this goal.

fabxc (Member) commented Mar 3, 2018

I think the fundamental point you are trying to make here, @brian-brazil, is that the remote API is not intended to be a stream-out protocol for SaaS integrations to begin with. We are certainly not making this clear enough, and by now we see more and more solutions being built on top of it. That simply results in several kinds of semantic incorrectness.

So the actual discussion we should be having is what a proper API for that use case would look like and where it could fit. If we don't provide a way to do this, people will keep abusing the remote API.

The metric metadata aspect is arguably a distinct issue. I know you are strongly against it, @brian-brazil, but there are multiple people, @beorn7 and me included, who see value in it, so we should discuss it.
It's perfectly okay for it to be best effort (e.g. in the face of changes), but having something would be valuable.

brian-brazil (Member) commented Mar 3, 2018

> The metric metadata aspect is arguably a distinct issue. I know you are strongly against it @brian-brazil but there are multiple people, @beorn7 and me included, that see value in it, so we should discuss it.

There's certainly debate there, but even if we resolved that, the remote API would be the last place to get that information. Doing it in the remote API first would be the tail wagging the dog, as Prometheus currently has no use for such information from remote storage, and the bandwidth and performance implications would be problematic.

> So the actual discussion we should be having is what a proper API for that use case would look like and where it could fit. If we don't provide a way to do this

We already have that in the parsers, which we provide for exactly this use case, and we indeed encourage their use by the various other monitoring systems out there that use our format.

The Stackdriver use case here is that they like the format, but not the server. Thus they should use the format directly rather than trying to turn Prometheus into Stackdriver.

keilw commented Jul 4, 2018

The metric metadata aspect cannot be ignored if OpenMetrics or similar efforts are really meant to work across vendors and clouds. Otherwise you end up with mishaps like the Mars Climate Orbiter, with similar or worse costs caused by misinterpreted data and units. Grafana has very strong units-of-measurement support, is even considering going further, and already goes far beyond what Prometheus offers today. A true open metrics standard will likely have to follow that example if it wants to be successful.
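As a concrete sketch of why unit metadata matters, the Go code below assumes a hypothetical `Quantity` type that carries its unit alongside the value; nothing like it exists in the Prometheus data model, and the type, function names and unit strings are illustrative only. With the unit attached, mixing pound-force seconds and newton seconds (the mix-up behind the Mars Climate Orbiter loss) fails loudly instead of silently producing a wrong number.

```go
package main

import (
	"errors"
	"fmt"
)

// Quantity is a hypothetical value-plus-unit pair; the unit strings used
// below ("N*s", "lbf*s") are illustrative, not taken from any standard.
type Quantity struct {
	Value float64
	Unit  string
}

// One pound-force is exactly 4.4482216152605 newtons.
const newtonsPerPoundForce = 4.4482216152605

// Add refuses to combine quantities whose units differ instead of
// silently summing the raw numbers.
func Add(a, b Quantity) (Quantity, error) {
	if a.Unit != b.Unit {
		return Quantity{}, fmt.Errorf("unit mismatch: %q vs %q", a.Unit, b.Unit)
	}
	return Quantity{Value: a.Value + b.Value, Unit: a.Unit}, nil
}

// ToNewtonSeconds normalizes an impulse to newton seconds.
func ToNewtonSeconds(q Quantity) (Quantity, error) {
	switch q.Unit {
	case "N*s":
		return q, nil
	case "lbf*s":
		return Quantity{Value: q.Value * newtonsPerPoundForce, Unit: "N*s"}, nil
	}
	return Quantity{}, errors.New("unknown impulse unit: " + q.Unit)
}

func main() {
	reading := Quantity{Value: 10, Unit: "lbf*s"}
	// Combining mismatched units is rejected rather than silently summed.
	if _, err := Add(reading, Quantity{Value: 10, Unit: "N*s"}); err != nil {
		fmt.Println("rejected:", err)
	}
	// An explicit conversion is needed before the values can be combined.
	converted, err := ToNewtonSeconds(reading)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%.2f %s\n", converted.Value, converted.Unit)
}
```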

brian-brazil (Member) commented Jul 4, 2018

This issue is unrelated to OpenMetrics, as this issue is about Prometheus internals rather than the OpenMetrics format, which we will ingest.

keilw commented Jul 4, 2018

Even if it's unrelated to OpenMetrics, I would say storing metadata is advisable.
Many systems do; OpenCensus even has a strong notion of a Measure (somewhat similar to Quantity or Measurement in JSR 363 and its implementations, as used in Parfait or in MP Metrics).
