
Inconsistent node disk space and memory with metric aggregation #2513

Closed
logicbomb421 opened this issue Jun 23, 2020 · 4 comments

@logicbomb421

I am seeing some oddness when I look at the disk space (rabbitmq_disk_space_available_bytes) and memory (rabbitmq_process_resident_memory_bytes) metrics in a multi-node cluster. The value returned appears to hop between nodes, which makes charting it rather odd. After reading the documentation on metric aggregation and enabling prometheus.return_per_object_metrics, I expected these metrics to expand into one series per node; however, that does not appear to be the case.

I would be very grateful if someone could explain how to view these two metrics per node.

Node Configuration

[Screenshot: node configuration in the RabbitMQ management UI]

Example of rabbitmq_disk_space_available_bytes

[Screenshot: Google Cloud Platform monitoring chart]

Example of rabbitmq_process_resident_memory_bytes

[Screenshot: Google Cloud Platform monitoring chart]

Thanks!

@michaelklishin
Member

Node metrics should be excluded from aggregation, as 100K nodes is not a feasible scenario. @gerhard @dcorbacho could we be aggregating these as well?

References #2512.

@gerhard
Contributor

gerhard commented Jun 24, 2020

rabbitmq_disk_space_available_bytes and rabbitmq_process_resident_memory_bytes are per-node metrics; there is nothing to aggregate.

The problem that I suspect you are hitting is the lack of labels on those metrics that would distinguish them between nodes. This is the correct approach in Prometheus, where we use an on(instance) group_left rabbitmq_identity_info operation on all metrics that need to be grouped by node and cluster. To be more exact, this is how the rabbitmq_disk_space_available_bytes metric gets queried:

rabbitmq_disk_space_available_bytes * on(instance) group_left(rabbitmq_cluster, rabbitmq_node) rabbitmq_identity_info{rabbitmq_cluster="$rabbitmq_cluster"}

The {{rabbitmq_node}} in the Legend splits all metrics per node when visualised in Grafana. This is what that looks like:

[Screenshot: Grafana chart with one series per node]
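For reference, the same join pattern would apply to the memory metric; a sketch, assuming the same $rabbitmq_cluster Grafana dashboard variable as in the query above:

```
rabbitmq_process_resident_memory_bytes * on(instance) group_left(rabbitmq_cluster, rabbitmq_node) rabbitmq_identity_info{rabbitmq_cluster="$rabbitmq_cluster"}
```

With {{rabbitmq_node}} in the Legend, this likewise splits into one series per node.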

I am not familiar with the service that you are using for visualising metrics, but I would recommend using an equivalent PromQL query to get the metrics into the right format. We are implementing Prometheus exporter best practices, even though it was hard work at the time: prometheus/docs#1414

If this answer addresses your question, please close the issue @logicbomb421. Thanks!

@michaelklishin
Member

We don't think these are aggregated; they may be missing labels. @logicbomb421, has the above suggestion from @gerhard helped? Are the charts from GCP (Google Cloud Platform)? If so, it sounds like GCP is unaware that these metrics are node-specific.

@gerhard
Contributor

gerhard commented Feb 2, 2021

I am not sure how you are querying those metrics, but if you are going through a load balancer or a service rather than querying the nodes directly, the values will change based on which node services the request.
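One way to avoid this is to scrape every node directly instead of a balanced endpoint, so each series keeps its own instance label. A minimal Prometheus scrape config sketch, assuming three hypothetical node hostnames and the rabbitmq_prometheus plugin's default port 15692:

```
scrape_configs:
  - job_name: rabbitmq
    static_configs:
      # Hypothetical hostnames; list each cluster node explicitly
      # rather than a load-balanced address.
      - targets:
          - rabbit-0.example.com:15692
          - rabbit-1.example.com:15692
          - rabbit-2.example.com:15692
```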

prometheus.return_per_object_metrics applies to multiple metrics within a single node, such as channels, queues, etc. There will be a single rabbitmq_process_resident_memory_bytes metric per node, regardless of what prometheus.return_per_object_metrics is set to.
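For completeness, this is how that switch is typically set in rabbitmq.conf (a sketch; it only affects per-object metrics, not the per-node ones discussed here):

```
# rabbitmq.conf
# Emit one metric per object (queue, channel, connection, etc.)
# instead of aggregated totals. Per-node metrics such as
# rabbitmq_process_resident_memory_bytes are unaffected.
prometheus.return_per_object_metrics = true
```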

I'm assuming that this issue is solved for you @logicbomb421 👍🏻
