Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't handle federation text output #54

Closed
brian-brazil opened this issue Sep 8, 2016 · 11 comments
Closed

Can't handle federation text output #54

brian-brazil opened this issue Sep 8, 2016 · 11 comments
Assignees

Comments

@brian-brazil
Copy link
Contributor

Prometheus can't scrape the text version of it's own federation output, e.g.

# TYPE cpu untyped
cpu{job="promclient_triton_5s",instance="container-metrics:9092",gs_instanceid="aabd3924",gs_svcuser="svcuser2",type="idle",gs_dc="us-east-1",gs_env="dev",id="prom-1"} 0.0886400014181114 1473307588759
# TYPE cpu untyped
cpu{instance="container-metrics:9092",gs_instanceid="aabd3924",gs_svcuser="svcuser2",type="wait",gs_dc="us-east-1",gs_env="dev",job="promclient_triton_5s",id="prom-1"} 0.6554035022739826 1473307588759
# TYPE cpu untyped
cpu{instance="container-metrics:9093",job="promclient_triton_5s",gs_dc="us-east-1",gs_env="dev",gs_instanceid="231deaa1",gs_svcuser="svcuser2",type="sys",id="prom-1"} 0.3062938875174634 1473307586226
# TYPE cpu untyped
cpu{job="promclient_triton_5s",gs_instanceid="fb310edz",gs_svcuser="svcuser1",type="user",gs_dc="us-east-1",instance="container-metrics:9091",gs_env="dev",id="prom-1"} 0.5589919972685718 1473307585070

From https://groups.google.com/forum/#!topic/prometheus-developers/ZVb2NjjUJ5A

@beorn7
Copy link
Member

beorn7 commented Sep 8, 2016

As said in the discussion, this is a problem on the generation side, not on the parse side, see https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details

I guess the problem is that the federation code creates one metric family per time series and doesn't bundle them. That's already wrong in the protobuf representation, and the text parser mirrors that.

@beorn7 beorn7 self-assigned this Sep 8, 2016
@fabxc
Copy link
Contributor

fabxc commented Sep 8, 2016

With the storage having no notion of metric families, that seems tricky to
do differently unless we want to apply some sorting to the entire output
first.

On Thu, Sep 8, 2016, 12:26 PM Björn Rabenstein notifications@github.com
wrote:

As said in the discussion, this is a problem on the generation side, not
on the parse side, see
https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details

I guess the problem is that the federation code creates one metric family
per time series and doesn't bundle them. That's already wrong in the
protobuf representation, and the text parser mirrors that.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#54 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AEuA8obroFutazFrzaMF6PgP1QcVxDm8ks5qn-LRgaJpZM4J3mQj
.

@beorn7
Copy link
Member

beorn7 commented Sep 8, 2016

I'll look into how much effort that is. I could imagine it's worth the savings in bandwidth and the reduced effort on the receiving side.

We can also approach the problem from the other side (not mutually exclusive) by loosing the requirement of not repeating the TYPE and HELP lines (last mention wins).
(I vaguely remember that I had implemented it in the loose way back then but deliberately enforced it later because of the spec... there might have been a concrete reason to do so but I don't remember.)

@RichiH
Copy link
Member

RichiH commented Sep 8, 2016

Instead of ordering, keeping a list of already-exposed metric TYPEs might be more efficient.

Last one wins would not save any bandwidth.

@beorn7
Copy link
Member

beorn7 commented Sep 8, 2016

Last one wins would not save any bandwidth.

Yes, that's what I tried to say. There are two solutions to the problem, any would solve it, but both could be applied:

  1. Make the text format spec more lenient about TYPE and HELP (and implement accordingly). Easy fix, helps in other situations, too. But doesn't save bandwidth and decoding effort and might have unintended consequences (e.g. unintentionally duplicated metrics might go undetected).
  2. Dedup federation on the MetricFamily level. Has some cost but saves bandwidth and decoding cost.

beorn7 pushed a commit that referenced this issue Sep 8, 2016
This acknowledges the fact that duplicate metrics in the slice passed
to MetricFamilyToText will result in invalid text format.

This is meant to document the status quo and bears no implication for
the possible loosening of the text format spec, cf. #54.
@brian-brazil
Copy link
Contributor Author

This restriction has been annoying to deal with when writing exporters, as it prevents streaming. Especially for federation where some users are trying to dump entire Prometheus servers I think we should be going for approaches that don't require collating millions of time series.

Duplicate time series we'll catch anyway due to out-of-order sample detection.

@beorn7
Copy link
Member

beorn7 commented Sep 8, 2016

"Last TYPE or HELP wins" will once more prevent streaming on the ingestion side.
Another thing would be if we allowed multiple occurrences of TYPE or HELP but they must be consistent.
Or we do "first wins".

@brian-brazil
Copy link
Contributor Author

I don't think the exact semantics matter, and tbh I'd leave this explicitly unspecified.

@beorn7
Copy link
Member

beorn7 commented Sep 9, 2016

After another read of the spec, I ran into an explicit requirement to have unique metric family names even in the protobuf format: "Each MetricFamily within the same exposition must have a unique name." Also, we have more about the text format: "The TYPE line for a metric name has to appear before the first sample is reported for that metric name." I'd say even if we want to be more lenient, we have to clearly specify how to deal with dupes, and the above points more towards "first wins".

Conclusion: The spec is more explicit than initially thought. The current federation exposition is clearly in breach of the spec. Making the requirements more lenient (or even "unspecify" them, about which I would have a really bad feeling) would be a change of the format, breaking existing consumer implementations. That needed to be expressed in a version bump. We should anyway think about versioning a bit more, the 0.0.4 we have right now appears inappropriate for a stable API.

In any case, the course of action would be to first change and set the spec and then create implementations compliant with it (rather than creating non-compliant implementation and then adjust the spec to them).

@beorn7 beorn7 removed the bug label Sep 9, 2016
@beorn7
Copy link
Member

beorn7 commented Sep 9, 2016

We have a number of issues around the exposition format by now. How about moving them all into the docs repo, with a dedicated label? We should have clarity on the spec before starting any work, and the spec lives in the docs repo. So it might be the best point to track those issues.

@beorn7
Copy link
Member

beorn7 commented Sep 9, 2016

This is for now a specification issue. I have filed prometheus/docs#547 for that. Closing this for now, as the course of action in prometheus/common heavily depends on the decision made there.

@beorn7 beorn7 closed this as completed Sep 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants