Inconsistent node disk space and memory with metric aggregation #2513
Comments
Node metrics should be excluded from aggregation, as 100K nodes is not a feasible scenario. @gerhard @dcorbacho could we be aggregating these as well? References #2512.
The problem that I suspect you are hitting is the lack of labels on those metrics that would distinguish one node from another. This is the correct approach in Prometheus, where we use a
The I am not familiar with the service that you are using for visualising metrics, but I would recommend using a PromQL equivalent to get the metrics in the right format. We are implementing Prometheus exporter best practices, even if it was hard work at the time: prometheus/docs#1414 If this answer addresses your question, please close the issue @logicbomb421. Thanks!
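To illustrate the point about labels: Prometheus attaches an instance label to every sample at scrape time, so two nodes that expose the same unlabeled metric stay distinguishable as long as each node is scraped as its own target. The following is only a minimal Python sketch of that idea, not how Prometheus or the RabbitMQ plugin is implemented; the hostnames, the port in the target names, and the values are made up for illustration.

```python
def parse_samples(exposition_text):
    """Parse unlabeled 'name value' lines from Prometheus text exposition."""
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape_all(targets):
    """Key every sample by (metric name, instance), one instance per target."""
    series = {}
    for instance, text in targets.items():
        for name, value in parse_samples(text).items():
            series[(name, instance)] = value
    return series


# Hypothetical targets and readings (assumptions, not real cluster output).
targets = {
    "rabbit-0.example:15692": (
        "# TYPE rabbitmq_disk_space_available_bytes gauge\n"
        "rabbitmq_disk_space_available_bytes 5.0e10\n"
    ),
    "rabbit-1.example:15692": (
        "# TYPE rabbitmq_disk_space_available_bytes gauge\n"
        "rabbitmq_disk_space_available_bytes 7.5e10\n"
    ),
}

series = scrape_all(targets)
for (name, instance), value in sorted(series.items()):
    print(f'{name}{{instance="{instance}"}} {value}')
```

With one series per instance, a chart can group or filter by that label instead of plotting a single line that mixes values from different nodes.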
We don't think these are aggregated. They may be missing labels. @logicbomb421 has the above suggestion from @gerhard helped? Are the charts from GCP (Google Cloud Platform)? If that's the case, it sounds like GCP is unaware of the fact that these metrics are node-specific.
I am not sure how you are querying those metrics, but if you are using a load balancer or service to query rather than the nodes directly, then the values will change based on which node services the request.
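A small simulation may make this easier to picture. The Python sketch below assumes a round-robin balancer in front of three nodes; the node names, the readings, and the balancing strategy are all hypothetical. It contrasts the single "hopping" series seen through the balancer with the separate series obtained by scraping each node directly.

```python
from itertools import cycle

# Hypothetical per-node readings of rabbitmq_process_resident_memory_bytes.
node_values = {
    "rabbit-0": 120_000_000,
    "rabbit-1": 340_000_000,
    "rabbit-2": 95_000_000,
}


def scrape_via_load_balancer(values, scrapes):
    """Each scrape is answered by whichever node the balancer picks next."""
    backends = cycle(values)  # round-robin is an assumption for this sketch
    return [values[next(backends)] for _ in range(scrapes)]


def scrape_nodes_directly(values, scrapes):
    """Each node is its own target, so each one yields a separate series."""
    return {node: [value] * scrapes for node, value in values.items()}


mixed = scrape_via_load_balancer(node_values, 6)
per_node = scrape_nodes_directly(node_values, 6)

print("via load balancer:", mixed)
for node, readings in per_node.items():
    print(node, readings)
```

The single series returned through the balancer interleaves readings from all three nodes, which matches the hopping described in the issue, while the direct scrapes give one steady series per node.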
I'm assuming that this issue is solved for you @logicbomb421 👍🏻
I am seeing some oddness when I look at disk space (rabbitmq_disk_space_available_bytes) and memory (rabbitmq_process_resident_memory_bytes) metrics in a multi-node cluster. It seems the value returned hops between all nodes, which makes charting the value rather odd. After reading the documentation on metric aggregation and enabling prometheus.return_per_object_metrics, I expected to see these metrics expand into one per node, however that does not appear to be the case.

I would be very grateful if someone could explain how to view these two metrics per node.
Node Configuration
Example of rabbitmq_disk_space_available_bytes
Example of rabbitmq_process_resident_memory_bytes
Thanks!