
Making joins simpler in PromQL #3094

Open

RichiH opened this issue Aug 20, 2017 · 9 comments
RichiH (Member) commented Aug 20, 2017

At yesterday's dev summit, as part of the discussion around prometheus/snmp_exporter#180, @brian-brazil suggested making joins easier to write.

This issue is to document possible changes / enhancements to PromQL.

brian-brazil (Member) commented Aug 20, 2017

I've pondered this previously, and haven't been able to come up with anything. Tooling on the Grafana end is likely going to be a more fruitful approach.

matthiasr (Contributor) commented Aug 21, 2017

Yes, we need this

That it is hard does not mean we don't need it. Solving this in Grafana is not enough; I frequently miss this both during exploration and in alerting. Having to construct long instance match regexes is not a solution: it takes a long time to assemble, makes the queries completely unreadable, and for large lists simply fails.

Mockup (proposal)

I'm imagining a `join on()` operator that would work on time series like

metric1{thing="name1", some_property="an_endpoint"} 123
thing_info{thing="name1", thang="name2", other_property="a_label"} 1
thang_info{thang="name2", third_property="a_role"} 1

by matching on the labels in the `on` clause, taking the remaining labels from both sides (favouring the left side in case of conflict) and the value from the left side:

  • metric1 join on(thing) thing_info => metric1{thing="name1", some_property="an_endpoint", thang="name2", other_property="a_label"}
  • metric1 join on(thing) thing_info join on(thang) thang_info => metric1{thing="name1", some_property="an_endpoint", thang="name2", other_property="a_label", third_property="a_role"}
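
For comparison, a sketch of the closest present-day equivalent. With today's syntax, every label to be carried over from the right-hand side must be named explicitly in `group_left`, which is exactly what this proposal tries to avoid (it assumes you already know the info metric's label names):

```
metric1 * on(thing) group_left(thang, other_property) thing_info
```

Multiplying by an info metric whose value is always 1 leaves the left-hand value unchanged, which is why `*` is the conventional join operator here.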

Open questions

What would the meaning of `group_left` / `group_right` be in this context? They could of course just be illegal.

How would I limit which labels get joined? Do I need this capability? It could be emulated with `label_replace`.

Real-world use-case

The above is distilled from the following use case: Given the request latency of an instance in Kubernetes, kube_pod_info which links the pod to the node, and kube_node_info which tells me the kernel version of each node, what is the 99th percentile latency for different kernel versions? By path and method?

brian-brazil (Member) commented Aug 21, 2017

Having to construct long instance match regexes is not a solution: it takes a long time to assemble, makes the queries completely unreadable, and for large lists simply fails.

I'm not seeing how this is relevant to this issue.

taking the remaining labels from both sides (favouring left in case of conflict) and the value from the left side:

I'm not particularly in favour of such a join, as the nature of info metrics is that they contain an unknown and changing set of label names over time. Taking all the labels from the right means you no longer know what the labels of the output are, and thus breaks `without`.

How would I limit which labels get joined? Do I need this capability?

That's what `group_left` already does, and this capability is essential for semantic sanity.
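
For instance, reusing the hypothetical series from the mockup above, `group_left` lists exactly which labels are copied from the right-hand side; here only one of the two info labels is pulled in:

```
metric1 * on(thing) group_left(other_property) thing_info
```

The output keeps metric1's own labels plus `other_property`, while `thang` is deliberately left out.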

Given the request latency of an instance in Kubernetes, kube_pod_info which links the pod to the node, and kube_node_info which tells me the kernel version of each node, what is the 99th percentile latency for different kernel versions? By path and method?

I don't see how the proposed change makes this significantly easier than what we have already; in fact it's likely to cause problems due to also pulling in all the other labels.

matthiasr (Contributor) commented Aug 28, 2017

I'm not seeing how this is relevant to this issue.

It's how we work around joins not being simple.

you no longer know what the labels of the output are, and thus breaks `without`.

Curious, I use `without` exactly for the case where I don't know what the exact label set is.

I don't see how the proposed change makes this significantly easier than what we have already

Ah, so this is possible already? How?

brian-brazil (Member) commented Aug 28, 2017

It's how we work around joins not being simple.

Can you explain exactly what you're doing right now? Selectors are orthogonal to joins.

Curious, I use `without` exactly for the case where I don't know what the exact label set is.

Thus why we need to be careful about keeping the set of in-play labels sufficiently knowable. Pulling in all labels from both sides of an expression where one side is an `_info` metric doesn't allow for that.

Ah, so this is possible already? How?

I'd need to see the exact metrics, but it'd be something like `a * on (pod) group_left(kernel_version) (pod_info * on (node) group_left(kernel_version) node_info)`
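
Spelled out for the latency question above, the nesting might look like the sketch below. The histogram metric name `request_latency_seconds_bucket` and the exact join label names are assumptions; they depend on how the cluster is instrumented:

```
histogram_quantile(0.99,
  sum by (kernel_version, path, method, le) (
    rate(request_latency_seconds_bucket[5m])
    * on (pod) group_left(kernel_version) (
        kube_pod_info * on (node) group_left(kernel_version) kube_node_info
      )
  )
)
```

The inner join attaches each node's `kernel_version` to the per-pod info series; the outer join then attaches it to the latency buckets. In practice matching on `(namespace, pod)` rather than `(pod)` alone may be needed to keep the many-to-one match unambiguous across namespaces.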

matthiasr (Contributor) commented Aug 29, 2017

Can you explain exactly what you're doing right now? Selectors are orthogonal to joins.

Out of band, we collect lists of hostnames for whatever we're interested in, and concatenate them into `instance=~"(host1|host2|host3|…).*"`. It's what Grafana does, but it's really awkward for exploration.

brian-brazil (Member) commented Aug 29, 2017

That'd be pretty awkward alright.

hanikesn commented Dec 13, 2017

We're also having trouble with the current `group_left` behavior with regard to labels. E.g. we want to aggregate container CPU usage in our Kubernetes cluster by labels placed on the pods, but we don't know beforehand what labels our users will put on the pods. This works fine when we can use kube_pod_info as the left-hand side and join it with per-pod metrics, but breaks when we want a per-service, per-container aggregation.

brian-brazil (Member) commented Dec 14, 2017

If you don't know the names of the labels you want to aggregate by, you're going to find it fairly difficult to do anything.
