Description
Given the following RabbitMQ deployment:
- RabbitMQ v3.7.8
- Erlang/OTP v20.3.8.9
- Ubuntu 16.04.5 LTS
- Linux 4.15.0-32-generic x86_64
CPU utilisation is unpredictable when queues are mirrored
Given 10k idle queues mirrored across all nodes in a 3-node GCP n1-highcpu-8 cluster,
RabbitMQ nodes utilise CPU (user + system) in an unpredictable and arbitrary way:
```
dstat --cpu 60 1
```
| CPU (60s avg) | rmq0 (initial Master) | rmq1 (Slave1) | rmq2 (Slave2) |
|---|---|---|---|
| user | 35% | 62% | 29% |
| system | 18% | 8% | 18% |
| idle | 48% | 30% | 53% |
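For context, mirroring queues across all nodes is typically done with a classic HA policy; a minimal sketch (the policy name, pattern and vhost are assumptions, not taken from this deployment):

```
# hypothetical policy: mirror every queue across all cluster nodes
rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}' --apply-to queues
```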
This is what happens to CPU utilisation (user + system) when rmq0 (initial Master) gets stopped via rabbitmqctl shutdown:
| CPU (60s avg) | rmq0 (stopped) | rmq1 (promoted Master) | rmq2 (Slave1) |
|---|---|---|---|
| user | 0% | 64% | 30% |
| system | 0% | 7% | 19% |
| idle | 100% | 29% | 50% |
And this is what happens when rmq1 (promoted Master) gets stopped via rabbitmqctl shutdown:
| CPU (60s avg) | rmq0 (stopped) | rmq1 (stopped) | rmq2 (promoted Master) |
|---|---|---|---|
| user | 0% | 0% | 31% |
| system | 0% | 0% | 18% |
| idle | 100% | 100% | 51% |
We stop & restart the only node running rmq2 (restarted Master),
and notice significantly lower CPU utilisation (user + system) on this host (6% vs 49%):
| CPU (60s avg) | rmq0 (stopped) | rmq1 (stopped) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 0% | 0% | 4% |
| system | 0% | 0% | 2% |
| idle | 100% | 100% | 94% |
After rmq1 (new Slave1) gets started,
we observe the same CPU utilisation (user + system) on this host as before the stop (71% vs 70%):
| CPU (60s avg) | rmq0 (stopped) | rmq1 (new Slave1) | rmq2 (Master) |
|---|---|---|---|
| user | 0% | 64% | 4% |
| system | 0% | 7% | 2% |
| idle | 100% | 29% | 94% |
After rmq0 (new Slave2) gets started, we observe lower CPU utilisation (user + system) on this host (5% vs 53%):
| CPU (60s avg) | rmq0 (new Slave 2) | rmq1 (Slave1) | rmq2 (Master) |
|---|---|---|---|
| user | 3% | 64% | 5% |
| system | 2% | 7% | 2% |
| idle | 95% | 29% | 93% |
After all nodes are restarted, we observe the same CPU utilisation on each host as before the restart:
| CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 3% | 64% | 5% |
| system | 2% | 7% | 2% |
| idle | 95% | 29% | 93% |
After all nodes are restarted for the second time, we observe high CPU utilisation on all hosts:
| CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 32% | 64% | 33% |
| system | 20% | 8% | 18% |
| idle | 48% | 28% | 48% |
After all nodes are restarted for the third time, we observe low CPU utilisation on 2 out of 3 hosts:
| CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 4% | 64% | 10% |
| system | 2% | 7% | 5% |
| idle | 94% | 29% | 85% |
If we continue restarting all nodes enough times, they all end up utilising 5-7% CPU (user + system), with no other changes.
In other words, stopping & starting all nodes enough times was observed to reduce CPU utilisation from 50-70% to ~5-7%.
Does +sbwt none make a difference?
Yes, it does (1-3% vs 5-7% CPU utilisation), but some nodes still exhibit high & unpredictable CPU utilisation when restarted.
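For reference, +sbwt none is passed to the Erlang VM; one way to do this is via RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS, sketched below (the rabbitmq-env.conf path is the Debian/Ubuntu default and is an assumption here):

```
# /etc/rabbitmq/rabbitmq-env.conf  (path assumed for this Ubuntu deployment)
# disable scheduler busy waiting in the Erlang VM
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"
```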
With +sbwt none set and all nodes restarted for the first time:
| CPU (60s avg) | rmq0 (restarted Master) | rmq1 (restarted Slave2) | rmq2 (restarted Slave1) |
|---|---|---|---|
| user | 10% | 48% | 4% |
| system | 2% | 3% | 4% |
| idle | 88% | 50% | 92% |
rmq1 (restarted Slave2) msacc output per thread type, with system time:

```
rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

Thread alloc aux bif busy_wait check_io emulator ets gc gc_full nif other port send sleep timers

Stats per type:
async          0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%(82.6%) 0.00%( 0.0%)
aux            0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 6.24%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 93.76%( 0.6%) 0.00%( 0.0%)
dirty_cpu_sche 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%( 5.2%) 0.00%( 0.0%)
dirty_io_sched 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%( 6.5%) 0.00%( 0.0%)
scheduler      0.95%( 0.0%) 0.66%( 0.0%) 1.22%( 0.1%) 0.24%( 0.0%) 0.20%( 0.0%) 2.12%( 0.1%) 0.06%( 0.0%) 0.07%( 0.0%) 1.36%( 0.1%) 0.00%( 0.0%) 2.44%( 0.1%) 0.00%( 0.0%) 0.00%( 0.0%) 89.63%( 4.6%) 1.03%( 0.1%)
```
With +sbwt none set and all nodes restarted for the second time, CPU utilisation (user + system) drops to 1% on the slaves & stays at 27% on rmq0 (restarted Master):
| CPU (60s avg) | rmq0 (restarted Master) | rmq1 (restarted Slave2) | rmq2 (restarted Slave1) |
|---|---|---|---|
| user | 18% | 1% | 1% |
| system | 9% | 0% | 0% |
| idle | 73% | 99% | 99% |
rmq0 (restarted Master) msacc output per thread type, with system time:

```
rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

Thread alloc aux bif busy_wait check_io emulator ets gc gc_full nif other port send sleep timers

Stats per type:
async          0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%(82.6%) 0.00%( 0.0%)
aux            0.00%( 0.0%) 0.04%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 4.66%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 95.30%( 0.6%) 0.00%( 0.0%)
dirty_cpu_sche 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 15.65%( 0.8%) 0.00%( 0.0%) 0.35%( 0.0%) 0.08%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 83.93%( 4.3%) 0.00%( 0.0%)
dirty_io_sched 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%( 6.5%) 0.00%( 0.0%)
scheduler      0.89%( 0.0%) 0.57%( 0.0%) 1.51%( 0.1%) 0.18%( 0.0%) 0.17%( 0.0%) 2.69%( 0.1%) 0.65%( 0.0%) 0.07%( 0.0%) 1.22%( 0.1%) 0.00%( 0.0%) 2.04%( 0.1%) 0.00%( 0.0%) 0.00%( 0.0%) 88.99%( 4.6%) 1.01%( 0.1%)
```
With +sbwt none set and only rmq0 (restarted Slave2) restarted, CPU utilisation (user + system) drops to 1-3%:
| CPU (60s avg) | rmq0 (restarted Slave2) | rmq1 (restarted Master) | rmq2 (restarted Slave1) |
|---|---|---|---|
| user | 1% | 2% | 1% |
| system | 1% | 1% | 0% |
| idle | 98% | 97% | 99% |
```
rabbitmqctl -n rabbit@rmq0-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq0-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.
```

```
rabbitmqctl -n rabbit@rmq1-gm-gc list_queues pid slave_pids
<rabbit@rmq1-gm-gc.1.21866.0> [<rabbit@rmq2-gm-gc.1.32123.1>, <rabbit@rmq0-gm-gc.3.4944.1>]
<rabbit@rmq1-gm-gc.1.21911.0> [<rabbit@rmq2-gm-gc.1.32133.1>, <rabbit@rmq0-gm-gc.3.23329.0>]
...
```

```
rabbitmqctl -n rabbit@rmq2-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq2-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.
```
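As the error suggests, the unresponsive queues can be listed explicitly, and the listing can be retried with a longer operation timeout; a sketch (the 120 s value is arbitrary):

```
# follow the error's suggestion and list only the unresponsive queues
rabbitmqctl -n rabbit@rmq0-gm-gc list_unresponsive_queues
# retry the original listing with a longer operation timeout
rabbitmqctl -n rabbit@rmq0-gm-gc -t 120 list_queues pid slave_pids
```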
Other observations
- listing queues on slave nodes times out after 60s (default rabbitmqctl operation timeout)
- Mnesia overload messages are sometimes present across different nodes, regardless of whether they are masters or slaves
- microstate accounting reports a similar breakdown regardless of CPU utilisation, but unusual utilisation in dirty_cpu_schedulers
- Erlang v21.0.9 doesn't make a difference
- using aoffcbf as the memory allocation strategy for the eheap & binary allocators does not make a difference to CPU utilisation (v3.7.6 change)
- some evidence that hosts with high CPU utilisation exhibit high dirty pages (see the check sketched after this list)
- initial evidence shows the same behaviour on RabbitMQ v3.7.0; more observations are required to be certain
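For the dirty-pages observation above, a quick way to compare hosts is to snapshot the kernel's dirty/writeback counters; a minimal sketch (how much counts as "high" is left to the reader):

```
# dirty page-cache state on each host (values in kB)
grep -E '^(Dirty|Writeback):' /proc/meminfo
```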
Open questions
- Why does CPU utilisation differ between node restarts?
- Does RabbitMQ v3.7.0 exhibit the same behaviour?
- Does RabbitMQ v3.6.16 exhibit the same behaviour?
- Does RabbitMQ v3.6.6 exhibit the same behaviour? (Confirmed that it still exists in v3.6.15 on Erlang/OTP v19.1.5.)
- Do we observe the same behaviour with a 2-node RabbitMQ cluster?
- Do we observe the same behaviour when running latest stable Erlang/OTP 19.3.x?