Return real pend value in erlang:dist_get_stat/1 #2270
Only the dist_util code is using this function and it already is compatible with a non-boolean value.
@gerhard is this your tailor-made dashboard or a shared one?
It's a dashboard that we (the RabbitMQ team) plan on sharing, most likely via grafana.com. It is not RabbitMQ-specific: it will work with any Erlang cluster that runs prometheus.erl. The Erlang part was done in deadtrickster/prometheus.erl#92. This dashboard is part of a wider observability initiative within RabbitMQ, and all the code currently lives under https://github.com/rabbitmq/rabbitmq-prometheus/tree/master/docker. To get everything up and running, check the Makefile in the parent directory: the up target spins everything up locally.
It appears that the value is not the number of messages, as previously thought, but the number of bytes to be sent. Indeed, it corresponds to the
We have deployed a 3-node Erlang cluster with this patch applied and wired everything together. This is what the end result looks like (notice the "Data buffered in the distribution links queue" panel):
These are all the relevant
We are running on
Is there anything else you need from us before merging and cutting a new OTP release that includes this?
This was done so that we can validate erlang/otp#2270
We will leave this function undocumented, which means it is subject to change without prior notice. This is because future changes we may make could render it impossible, or very expensive, to answer the question of how much data is in the queue.
@garazdawi I think observability should be an important design consideration. It would be very useful to have this metric, or a reasonably close substitute.
As long as we can capture the busy dist limit buffer, how much of
The end goal is to quantify how busy a particular distribution link is and to know when it is becoming a bottleneck in an Erlang cluster. Can you suggest a better or different way of going about it? This is what this patch gives us, which is pretty good:
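The "how busy is this link" signal can be derived by comparing each link's queue size with the distribution buffer busy limit. The Python sketch that follows is a language-neutral model. It is not code from OTP or prometheus.erl. It assumes you already have each peer's queue size in bytes, meaning the pend value that `erlang:dist_get_stat/1` returns with this patch. It also assumes you have the busy limit in bytes, which a running node reports through `erlang:system_info(dist_buf_busy_limit)` (1024 KB unless changed with `+zdbbl`). The function names, the node names, and the 0.8 threshold are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed default distribution buffer busy limit: 1024 KB (the +zdbbl default), in bytes.
DEFAULT_BUSY_LIMIT_BYTES = 1024 * 1024


def link_saturation(
    queue_bytes: Dict[str, int],
    busy_limit_bytes: int = DEFAULT_BUSY_LIMIT_BYTES,
) -> Dict[str, float]:
    """Return queued bytes / busy limit for each peer node.

    The ratio is not capped, so a queue past the limit shows up as a value above 1.0.
    """
    if busy_limit_bytes <= 0:
        raise ValueError("busy limit must be positive")
    return {node: size / busy_limit_bytes for node, size in queue_bytes.items()}


def bottleneck_links(
    queue_bytes: Dict[str, int],
    threshold: float = 0.8,
    busy_limit_bytes: int = DEFAULT_BUSY_LIMIT_BYTES,
) -> List[Tuple[str, float]]:
    """Return (node, ratio) pairs at or above the threshold, worst link first."""
    saturation = link_saturation(queue_bytes, busy_limit_bytes)
    flagged = [(node, ratio) for node, ratio in saturation.items() if ratio >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-peer queue sizes in bytes.
    queues = {"rabbit@a": 900_000, "rabbit@b": 10_000, "rabbit@c": 2_097_152}
    for node, ratio in bottleneck_links(queues):
        print(f"{node}: {ratio:.2f} of the busy limit")
```

A plain ratio is used here because it stays meaningful when the busy limit is tuned per cluster, whereas an absolute byte threshold would have to be changed along with `+zdbbl`.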
Add node_queue_size_bytes metric to dist collector. erlang/otp#2270 has been available since Erlang/OTP 22.1, released on 17 September 2019, so it is time to ship this feature 🚢 Thanks @essen!
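On the dashboard side, one gauge sample per peer node is enough to plot the queue size. The sketch renders such samples in the Prometheus text exposition format. It is not the prometheus.erl collector, which is written in Erlang. Only the metric name `node_queue_size_bytes` comes from the commit message. The `peer` label name and the HELP text are assumptions, and the real collector may add a prefix to the name.

```python
from typing import Dict


def render_node_queue_size(
    queue_bytes: Dict[str, int],
    metric: str = "node_queue_size_bytes",
    label: str = "peer",  # hypothetical label name
) -> str:
    """Render one gauge sample per peer node in the Prometheus text format."""
    lines = [
        f"# HELP {metric} Bytes queued in the distribution output buffer.",
        f"# TYPE {metric} gauge",
    ]
    for node in sorted(queue_bytes):
        # Label values must escape backslash, double quote and line feed.
        value = node.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'{metric}{{{label}="{value}"}} {queue_bytes[node]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_node_queue_size({"rabbit@b": 5, "rabbit@a": 1024}), end="")
```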
We are interested in using this value as a metric to know
how large the distribution output queue is when we encounter
distribution-related issues.
/cc @gerhard