how to handle aggregated sums from federated prometheus #1490

Closed
filippog opened this Issue Mar 16, 2016 · 3 comments

filippog commented Mar 16, 2016

Hi,
I'm testing Prometheus federation and aggregation. I have a "global" Prometheus at https://prometheus.wmflabs.org that pulls all metrics matching '^project:' from several per-project Prometheus servers, e.g. https://graphite-prometheus.wmflabs.org/status. In this context a "project" is simply a collection of VMs, following the job: example in the federation documentation. Currently the project: metrics are generated by rules on the federated servers, e.g. project:node_network_receive_bytes{} = sum(node_network_receive_bytes{device!="lo"}) BY (device), the idea being to pull only aggregated metrics from the federated servers.

Normally this works, but with some caveats. For example, rate(project:node_network_receive_bytes[5m]) on the global Prometheus artificially jumps when a new target is added to a federated server, since that is now a rate of sums instead of a sum of rates.

[Screenshot: jump on the global server at the same time a new target came up]

What's the recommendation in this case? I think having the 5m rate as a recording rule would achieve that? Thanks!

Member

brian-brazil commented Mar 16, 2016

You should be aggregating up on the project prometheus servers:

project_device:node_network_receive_bytes:rate5m = sum by (project, device) (rate(node_network_receive_bytes{job="node"}[5m]))

Never take the rate of a sum, as it causes the problem you've encountered.
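
To illustrate why the order matters, here is a minimal Python sketch with made-up numbers. It assumes a heavily simplified model of rate() (last sample minus first sample over the elapsed time, with no counter-reset handling and no extrapolation), and the names simple_rate, make_counter and sum_series, as well as the vm1..vm3 targets, are hypothetical. A third target whose counter already holds a large value starts being scraped partway through the window:

```python
# Simplified stand-in for PromQL rate(): (last - first) / elapsed seconds.
# Assumption: no counter-reset handling and no extrapolation, unlike real Prometheus.
def simple_rate(samples, start, end):
    window = [(t, v) for t, v in samples if start <= t <= end]
    if len(window) < 2:
        return None  # not enough samples in the window to compute a rate
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)


def make_counter(first_scrape, last_scrape, initial, per_second, step=15):
    """Hypothetical counter scraped every `step` seconds, growing linearly."""
    return [
        (t, initial + per_second * (t - first_scrape))
        for t in range(first_scrape, last_scrape + 1, step)
    ]


def sum_series(series):
    """Mimic sum(): add up whatever samples exist at each timestamp."""
    totals = {}
    for samples in series.values():
        for t, v in samples:
            totals[t] = totals.get(t, 0) + v
    return sorted(totals.items())


# Three made-up targets, each receiving 100 bytes/s.
# vm3 is only scraped from t=300, and its counter is already large by then.
series = {
    "vm1": make_counter(0, 600, 5_000_000, 100),
    "vm2": make_counter(0, 600, 7_000_000, 100),
    "vm3": make_counter(300, 600, 9_000_000, 100),
}

start, end = 150, 450  # a 5-minute window that spans vm3's first scrape

# Rate of sums: the summed series steps up by vm3's whole counter value.
rate_of_sum = simple_rate(sum_series(series), start, end)

# Sum of rates: each series is rated on its own, then the rates are added.
per_target = [simple_rate(s, start, end) for s in series.values()]
sum_of_rates = sum(r for r in per_target if r is not None)

print(rate_of_sum)   # 30250.0
print(sum_of_rates)  # 300.0
```

In this toy model the true traffic is 300 bytes/s throughout, yet the rate of the summed series reports 30250 bytes/s because vm3's existing counter value is counted as an increase inside the window. Taking the rate per series first avoids that.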

Author

filippog commented Mar 16, 2016

Thanks @brian-brazil! By the way, I'm happy to turn this issue into a documentation improvement, e.g. linking "best practices -> recording rules" from "federation". I'm fine with closing it too.

@grobie grobie closed this Apr 14, 2016

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
