Description
Given the following RabbitMQ deployment:
- RabbitMQ v3.7.8
- Erlang/OTP v20.3.8.9
- Ubuntu 16.04.5 LTS
- Linux 4.15.0-32-generic x86_64
CPU utilisation is unpredictable when queues are mirrored
Given 10k idle queues mirrored across all nodes in a 3-node GCP n1-highcpu-8 cluster,
RabbitMQ nodes utilise CPU (user + system) in an unpredictable and arbitrary way:
```
dstat --cpu 60 1
```
| CPU (60s avg) | rmq0 (initial Master) | rmq1 (Slave1) | rmq2 (Slave2) |
|---|---|---|---|
| user | 35% | 62% | 29% |
| system | 18% | 8% | 18% |
| idle | 48% | 30% | 53% |
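For context, mirroring queues across all nodes is typically done with a classic HA policy; a minimal sketch (the policy name, pattern and vhost are assumptions, not taken from this deployment):

```
# hypothetical policy: mirror every queue across all cluster nodes
rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}' --apply-to queues
```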
This is what happens to CPU utilisation (user + system) when rmq0 (initial Master) gets stopped via rabbitmqctl shutdown:
| CPU (60s avg) | rmq0 (stopped) | rmq1 (promoted Master) | rmq2 (Slave1) |
|---|---|---|---|
| user | 0% | 64% | 30% |
| system | 0% | 7% | 19% |
| idle | 100% | 29% | 50% |
And this is what happens when rmq1 (promoted Master) gets stopped via rabbitmqctl shutdown:
| CPU (60s avg) | rmq0 (stopped) | rmq1 (stopped) | rmq2 (promoted Master) |
|---|---|---|---|
| user | 0% | 0% | 31% |
| system | 0% | 0% | 18% |
| idle | 100% | 100% | 51% |
We stop & restart the only node running rmq2 (restarted Master),
and notice significantly lower CPU utilisation (user + system) on this host (6% vs 49%):
| CPU (60s avg) | rmq0 (stopped) | rmq1 (stopped) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 0% | 0% | 4% |
| system | 0% | 0% | 2% |
| idle | 100% | 100% | 94% |
After rmq1 (new Slave1) gets started,
we observe the same CPU utilisation (user + system) on this host as before the stop (71% vs 70%):
| CPU (60s avg) | rmq0 (stopped) | rmq1 (new Slave1) | rmq2 (Master) |
|---|---|---|---|
| user | 0% | 64% | 4% |
| system | 0% | 7% | 2% |
| idle | 100% | 29% | 94% |
After rmq0 (new Slave2) gets started, we observe lower CPU utilisation (user + system) on this host (5% vs 53%):
| CPU (60s avg) | rmq0 (new Slave 2) | rmq1 (Slave1) | rmq2 (Master) |
|---|---|---|---|
| user | 3% | 64% | 5% |
| system | 2% | 7% | 2% |
| idle | 95% | 29% | 93% |
After all nodes are restarted, we observe the same CPU utilisation on each host as before the restart:
| CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 3% | 64% | 5% |
| system | 2% | 7% | 2% |
| idle | 95% | 29% | 93% |
After all nodes are restarted for the second time, we observe high CPU utilisation on all hosts:
| CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 32% | 64% | 33% |
| system | 20% | 8% | 18% |
| idle | 48% | 28% | 48% |
After all nodes are restarted for the third time, we observe low CPU utilisation on 2 out of 3 hosts:
| CPU (60s avg) | rmq0 (restarted Slave 2) | rmq1 (restarted Slave1) | rmq2 (restarted Master) |
|---|---|---|---|
| user | 4% | 64% | 10% |
| system | 2% | 7% | 5% |
| idle | 94% | 29% | 85% |
If we continue restarting all nodes enough times, they all end up utilising 5-7% CPU (user + system), with no other changes.
In other words, stopping & starting all nodes enough times was observed to reduce CPU utilisation from 50-70% to ~5-7%.
Does +sbwt none make a difference?
Yes, it does (1-3% vs 5-7% CPU utilisation), but some nodes still exhibit high & unpredictable CPU utilisation when restarted.
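For reference, +sbwt none is passed to the Erlang VM; one way to do this is via RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS, sketched below (the rabbitmq-env.conf path is the Debian/Ubuntu default and is an assumption here):

```
# /etc/rabbitmq/rabbitmq-env.conf  (path assumed for this Ubuntu deployment)
# disable scheduler busy waiting in the Erlang VM
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"
```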
With +sbwt none set and all nodes restarted for the first time:
| CPU (60s avg) | rmq0 (restarted Master) | rmq1 (restarted Slave2) | rmq2 (restarted Slave1) |
|---|---|---|---|
| user | 10% | 48% | 4% |
| system | 2% | 3% | 4% |
| idle | 88% | 50% | 92% |
rmq1 (restarted Slave2) msacc output per thread type, with system time:

```
rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

Thread alloc aux bif busy_wait check_io emulator ets gc gc_full nif other port send sleep timers

Stats per type:
async          0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%(82.6%) 0.00%( 0.0%)
aux            0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 6.24%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 93.76%( 0.6%) 0.00%( 0.0%)
dirty_cpu_sche 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%( 5.2%) 0.00%( 0.0%)
dirty_io_sched 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%( 6.5%) 0.00%( 0.0%)
scheduler      0.95%( 0.0%) 0.66%( 0.0%) 1.22%( 0.1%) 0.24%( 0.0%) 0.20%( 0.0%) 2.12%( 0.1%) 0.06%( 0.0%) 0.07%( 0.0%) 1.36%( 0.1%) 0.00%( 0.0%) 2.44%( 0.1%) 0.00%( 0.0%) 0.00%( 0.0%) 89.63%( 4.6%) 1.03%( 0.1%)
```
With +sbwt none set and all nodes restarted for the second time, CPU utilisation (user + system) drops to 1% on the slaves & stays at 27% on rmq0 (restarted Master):
| CPU (60s avg) | rmq0 (restarted Master) | rmq1 (restarted Slave2) | rmq2 (restarted Slave1) |
|---|---|---|---|
| user | 18% | 1% | 1% |
| system | 9% | 0% | 0% |
| idle | 73% | 99% | 99% |
rmq0 (restarted Master) msacc output per thread type, with system time:

```
rabbitmqctl eval 'msacc:start(60000), msacc:print(msacc:stats(),#{system => true}).'

Thread alloc aux bif busy_wait check_io emulator ets gc gc_full nif other port send sleep timers

Stats per type:
async          0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%(82.6%) 0.00%( 0.0%)
aux            0.00%( 0.0%) 0.04%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 4.66%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 95.30%( 0.6%) 0.00%( 0.0%)
dirty_cpu_sche 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 15.65%( 0.8%) 0.00%( 0.0%) 0.35%( 0.0%) 0.08%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 83.93%( 4.3%) 0.00%( 0.0%)
dirty_io_sched 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%) 0.00%( 0.0%)100.00%( 6.5%) 0.00%( 0.0%)
scheduler      0.89%( 0.0%) 0.57%( 0.0%) 1.51%( 0.1%) 0.18%( 0.0%) 0.17%( 0.0%) 2.69%( 0.1%) 0.65%( 0.0%) 0.07%( 0.0%) 1.22%( 0.1%) 0.00%( 0.0%) 2.04%( 0.1%) 0.00%( 0.0%) 0.00%( 0.0%) 88.99%( 4.6%) 1.01%( 0.1%)
```
With +sbwt none set and only rmq0 (restarted Slave2) restarted, CPU utilisation (user + system) drops to 1-3%:
| CPU (60s avg) | rmq0 (restarted Slave2) | rmq1 (restarted Master) | rmq2 (restarted Slave1) |
|---|---|---|---|
| user | 1% | 2% | 1% |
| system | 1% | 1% | 0% |
| idle | 98% | 97% | 99% |
```
rabbitmqctl -n rabbit@rmq0-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq0-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.
```

```
rabbitmqctl -n rabbit@rmq1-gm-gc list_queues pid slave_pids
<rabbit@rmq1-gm-gc.1.21866.0> [<rabbit@rmq2-gm-gc.1.32123.1>, <rabbit@rmq0-gm-gc.3.4944.1>]
<rabbit@rmq1-gm-gc.1.21911.0> [<rabbit@rmq2-gm-gc.1.32133.1>, <rabbit@rmq0-gm-gc.3.23329.0>]
...
```

```
rabbitmqctl -n rabbit@rmq2-gm-gc list_queues pid slave_pids
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
Error: operation list_queues on node rabbit@rmq2-gm-gc timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.
```
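As the error suggests, the unresponsive queues can be listed explicitly, and the listing can be retried with a longer operation timeout; a sketch (the 120 s value is arbitrary):

```
# follow the error's suggestion and list only the unresponsive queues
rabbitmqctl -n rabbit@rmq0-gm-gc list_unresponsive_queues
# retry the original listing with a longer operation timeout
rabbitmqctl -n rabbit@rmq0-gm-gc -t 120 list_queues pid slave_pids
```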
Other observations
- listing queues on slave nodes times out after 60s (default rabbitmqctl operation timeout)
- Mnesia overload messages are sometimes present across different nodes, regardless of whether they are masters or slaves
- microstate accounting reports a similar breakdown regardless of CPU utilisation, but unusual utilisation in dirty_cpu_schedulers
- Erlang v21.0.9 doesn't make a difference
- using aoffcbf as the memory allocation strategy for the eheap & binary allocators does not make a difference to CPU utilisation (v3.7.6 change)
- some evidence that hosts with high CPU utilisation exhibit high dirty pages (see the check sketched after this list)
- initial evidence shows the same behaviour on RabbitMQ v3.7.0; more observations are required to be certain
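For the dirty-pages observation above, a quick way to compare hosts is to snapshot the kernel's dirty/writeback counters; a minimal sketch (how much counts as "high" is left to the reader):

```
# dirty page-cache state on each host (values in kB)
grep -E '^(Dirty|Writeback):' /proc/meminfo
```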
Open questions
- Why does CPU utilisation differ between node restarts?
- Does RabbitMQ v3.7.0 exhibit the same behaviour?
- Does RabbitMQ v3.6.16 exhibit the same behaviour?
- Does RabbitMQ v3.6.6 exhibit the same behaviour? (Confirmed that it still exists in v3.6.15 on Erlang/OTP v19.1.5.)
- Do we observe the same behaviour with a 2-node RabbitMQ cluster?
- Do we observe the same behaviour when running latest stable Erlang/OTP 19.3.x?