how to handle aggregated sums from federated prometheus #1490
Comments
You should be aggregating up on the project Prometheus servers, taking the rate before the sum.
Never take the rate of a sum, as it causes the problem you've encountered.
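A minimal sketch of what such a rule could look like on the per-project servers, written in the same pre-2.0 rule syntax the question uses. The rule name `project:node_network_receive_bytes:rate5m` is an assumption (it follows the level:metric:operations naming convention) and does not appear in the thread; the metric name and the `device!="lo"` selector are taken from the question.

```
# Hypothetical recording rule for each per-project Prometheus server.
# rate() is applied to every target's counter first and the per-target
# rates are summed afterwards, so a newly added target only contributes
# its own rate instead of raising a summed counter.
# The rule name below is an assumption, not taken from the thread.
project:node_network_receive_bytes:rate5m = sum(rate(node_network_receive_bytes{device!="lo"}[5m])) BY (device)
```

With a rule along these lines, the global server would federate and graph `project:node_network_receive_bytes:rate5m` directly rather than calling `rate()` on the federated sum. Since the name still begins with `project:`, the existing '^project:' match from the question should pick it up.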
brian-brazil added the question label Mar 16, 2016
thanks @brian-brazil! btw I'm happy to turn this issue into a documentation improvement, e.g. linking "best practices -> recording rules" from "federation". I'm fine with closing it too.
grobie closed this Apr 14, 2016
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock bot locked and limited conversation to collaborators Mar 24, 2019
filippog commented Mar 16, 2016
hi,
I'm testing prometheus federation and aggregation, namely I have a "global" prometheus at https://prometheus.wmflabs.org pulling all metrics matching '^project:' from several per-project prometheus servers, e.g. https://graphite-prometheus.wmflabs.org/status. In this context a "project" is simply a collection of VMs, following the job: example in the federation documentation.
Currently project: metrics are generated via rules on the federated servers, e.g.
project:node_network_receive_bytes{} = sum(node_network_receive_bytes{device!="lo"}) BY (device)
the idea being to pull only aggregated metrics from the federated servers.
Normally this works, but with some caveats, e.g.
rate(project:node_network_receive_bytes[5m])
on the global prometheus will artificially jump when a new target is added to the federated servers, since that's now a rate of sums instead of a sum of rates.
[Graphs: jump on the global server; at the same time a new target came up]
What's the recommendation in this case? Having the 5m rate as a recording rule would achieve that I think? thanks!