Erratic CPU utilization profile with a large number of mirrored queues #1713

Description

@gerhard

Given the following RabbitMQ deployment:

  • RabbitMQ v3.7.8
  • Erlang/OTP v20.3.8.9
  • Ubuntu 16.04.5 LTS
  • Linux 4.15.0-32-generic x86_64

CPU utilisation is unpredictable when queues are mirrored

Given 10k idle queues mirrored across all nodes in a 3-node GCP n1-highcpu-8 cluster,
RabbitMQ nodes utilise CPU (user + system) in an unpredictable and arbitrary way:

dstat --cpu 60 1

CPU (60s avg)   rmq0 (initial Master)   rmq1 (Slave1)   rmq2 (Slave2)
user            35%                     62%             29%
system          18%                      8%             18%
idle            48%                     30%             53%
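
For reference, mirroring every queue across all cluster nodes is typically done with an ha-mode: all policy. The exact policy used is not included in this report, so the following is only an assumed sketch (the policy name is illustrative):

# assumed policy: mirror every queue across all cluster nodes
rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}' --apply-to queues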

This is what happens to CPU utilisation (user + system) when rmq0 (initial Master) gets stopped via rabbitmqctl shutdown:

CPU (60s avg)   rmq0 (stopped)   rmq1 (promoted Master)   rmq2 (Slave1)
user              0%             64%                      30%
system            0%              7%                      19%
idle            100%             29%                      50%

And this is what happens when rmq1 (promoted Master) gets stopped via rabbitmqctl shutdown:

CPU (60s avg)   rmq0 (stopped)   rmq1 (stopped)   rmq2 (promoted Master)
user              0%               0%             31%
system            0%               0%             18%
idle            100%             100%             51%

We then stop & restart the only remaining node, rmq2 (restarted Master),
and notice significantly lower CPU utilisation (user + system) on this host after the restart (6% vs 49%):

CPU (60s avg)   rmq0 (stopped)   rmq1 (stopped)   rmq2 (restarted Master)
user              0%               0%              4%
system            0%               0%              2%
idle            100%             100%             94%

After rmq1 gets started (rejoining as the new Slave1),
we observe the same CPU utilisation (user + system) on this host as before the stop (71% vs 70%):

CPU (60s avg)   rmq0 (stopped)   rmq1 (new Slave1)   rmq2 (Master)
user              0%             64%                  4%
system            0%              7%                  2%
idle            100%             29%                 94%

After rmq0 gets started (rejoining as the new Slave2), we observe lower CPU utilisation (user + system) on this host than initially (5% vs 53%):

CPU (60s avg)   rmq0 (new Slave2)   rmq1 (Slave1)   rmq2 (Master)
user             3%                 64%              5%
system           2%                  7%              2%
idle            95%                 29%             93%

After all nodes are restarted, we observe the same CPU utilisation across all hosts as before the restart:

CPU (60s avg)   rmq0 (restarted Slave2)   rmq1 (restarted Slave1)   rmq2 (restarted Master)
user             3%                       64%                        5%
system           2%                        7%                        2%
idle            95%                       29%                       93%

After all nodes are restarted for the second time, we observe high CPU utilisation on all hosts:

CPU (60s avg)   rmq0 (restarted Slave2)   rmq1 (restarted Slave1)   rmq2 (restarted Master)
user            32%                       64%                       33%
system          20%                        8%                       18%
idle            48%                       28%                       48%

After all nodes are restarted for the third time, we observe low CPU utilisation on 2 out of 3 hosts:

CPU (60s avg)   rmq0 (restarted Slave2)   rmq1 (restarted Slave1)   rmq2 (restarted Master)
user             4%                       64%                       10%
system           2%                        7%                        5%
idle            94%                       29%                       85%

If we keep restarting all nodes enough times, they all end up utilising 5-7% CPU (user + system), with no other changes.

In other words, repeatedly stopping & starting all nodes was observed to reduce CPU utilisation from 50-70% to ~5-7%.
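
For completeness, the repeated stop & start cycle can be scripted roughly as follows; the host names, SSH access and systemd management of rabbitmq-server are assumptions, not details from this report:

# assumed rolling restart: stop each node cleanly, then start it again via systemd
for node in rmq0 rmq1 rmq2; do
  ssh "$node" 'sudo rabbitmqctl shutdown && sudo systemctl start rabbitmq-server'
done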

Does +sbwt none make a difference?

Yes, it does (1-3% vs 5-7% CPU utilisation), but some nodes still exhibit high & unpredictable CPU utilisation when restarted.
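
For reference, +sbwt none can be passed to the Erlang VM through the server's additional arguments; the report does not state how it was set, so this is just one common way:

# /etc/rabbitmq/rabbitmq-env.conf (assumed location on Ubuntu)
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"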

With +sbwt none set and all nodes restarted for the first time:

CPU (60s avg)   rmq0 (restarted Master)   rmq1 (restarted Slave2)   rmq2 (restarted Slave1)
user            10%                       48%                        4%
system           2%                        3%                        4%
idle            88%                       50%                       92%

rmq1 (restarted Slave2) msacc output per thread type, with system time:

rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

        Thread    alloc           aux           bif     busy_wait      check_io      emulator           ets            gc       gc_full           nif         other          port          send         sleep        timers

Stats per type:
         async  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%(82.6%)  0.00%( 0.0%)
           aux  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  6.24%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 93.76%( 0.6%)  0.00%( 0.0%)
dirty_cpu_sche  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%( 5.2%)  0.00%( 0.0%)
dirty_io_sched  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%( 6.5%)  0.00%( 0.0%)
     scheduler  0.95%( 0.0%)  0.66%( 0.0%)  1.22%( 0.1%)  0.24%( 0.0%)  0.20%( 0.0%)  2.12%( 0.1%)  0.06%( 0.0%)  0.07%( 0.0%)  1.36%( 0.1%)  0.00%( 0.0%)  2.44%( 0.1%)  0.00%( 0.0%)  0.00%( 0.0%) 89.63%( 4.6%)  1.03%( 0.1%)

With +sbwt none set and all nodes restarted for the second time, CPU utilisation (user + system) drops to 1% on the slaves & stays at 27% on rmq0 (restarted Master):

CPU (60s avg)   rmq0 (restarted Master)   rmq1 (restarted Slave2)   rmq2 (restarted Slave1)
user            18%                        1%                        1%
system           9%                        0%                        0%
idle            73%                       99%                       99%

rmq0 (restarted Master) msacc output per thread type, with system time:

rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

        Thread    alloc           aux           bif     busy_wait      check_io      emulator           ets            gc       gc_full           nif         other          port          send         sleep        timers

Stats per type:
         async  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%(82.6%)  0.00%( 0.0%)
           aux  0.00%( 0.0%)  0.04%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  4.66%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 95.30%( 0.6%)  0.00%( 0.0%)
dirty_cpu_sche  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 15.65%( 0.8%)  0.00%( 0.0%)  0.35%( 0.0%)  0.08%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%) 83.93%( 4.3%)  0.00%( 0.0%)
dirty_io_sched  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)100.00%( 6.5%)  0.00%( 0.0%)
     scheduler  0.89%( 0.0%)  0.57%( 0.0%)  1.51%( 0.1%)  0.18%( 0.0%)  0.17%( 0.0%)  2.69%( 0.1%)  0.65%( 0.0%)  0.07%( 0.0%)  1.22%( 0.1%)  0.00%( 0.0%)  2.04%( 0.1%)  0.00%( 0.0%)  0.00%( 0.0%) 88.99%( 4.6%)  1.01%( 0.1%)

With +sbwt none set and only rmq0 restarted (it rejoins as Slave2), CPU utilisation (user + system) drops to 1-3%:

CPU (60s avg)   rmq0 (restarted Slave2)   rmq1 (restarted Master)   rmq2 (restarted Slave1)
user             1%                        2%                        1%
system           1%                        1%                        0%
idle            98%                       97%                       99%

Listing queues on the slave nodes times out, while the master (rmq1) responds:

rabbitmqctl -n rabbit@rmq0-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq0-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.

rabbitmqctl -n rabbit@rmq1-gm-gc list_queues pid slave_pids
<rabbit@rmq1-gm-gc.1.21866.0>   [<rabbit@rmq2-gm-gc.1.32123.1>, <rabbit@rmq0-gm-gc.3.4944.1>]
<rabbit@rmq1-gm-gc.1.21911.0>   [<rabbit@rmq2-gm-gc.1.32133.1>, <rabbit@rmq0-gm-gc.3.23329.0>]
...

rabbitmqctl -n rabbit@rmq2-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq2-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.
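
As the error message suggests, the unresponsive queues can be listed directly, and a longer operation timeout can be tried; a sketch, the output of which was not captured in this report:

rabbitmqctl -n rabbit@rmq0-gm-gc list_unresponsive_queues
rabbitmqctl -n rabbit@rmq0-gm-gc --timeout 120 list_queues pid slave_pids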

Other observations

  • listing queues on slave nodes times out after 60s (default rabbitmqctl operation timeout)
  • Mnesia overload messages are sometimes present across different nodes, regardless of whether they are masters or slaves
  • microstate accounting reports a similar breakdown regardless of CPU utilisation, but with unusual utilisation in the dirty CPU schedulers
  • Erlang v21.0.9 doesn't make a difference
  • using aoffcbf as the memory allocation strategy for the eheap & binary allocators does not make a difference to CPU utilisation (a v3.7.6 change)
  • some evidence that hosts with high CPU utilisation exhibit high dirty page counts (see the snippet after this list)
  • initial evidence shows the same behaviour on RabbitMQ v3.7.0; more observations are required to be certain
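
The dirty-page observation above can be cross-checked on each host with something along these lines (not part of the original data collection):

# sample the kernel's dirty / writeback page counters once a minute
watch -n 60 'grep -E "^(Dirty|Writeback):" /proc/meminfo'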

Open questions

  • Why does CPU utilisation differ between node restarts?
  • Does RabbitMQ v3.7.0 exhibit the same behaviour?
  • Does RabbitMQ v3.6.16 exhibit the same behaviour?
  • Does RabbitMQ v3.6.6 exhibit the same behaviour? Confirmed that the behaviour still exists in v3.6.15 on Erlang/OTP v19.1.5
  • Do we observe the same behaviour with a 2-node RabbitMQ cluster?
  • Do we observe the same behaviour when running latest stable Erlang/OTP 19.3.x?
