
Load a backlog of messages on the cluster before upgrade #64

Closed
ferozjilla opened this issue Mar 26, 2020 · 7 comments
Comments

@ferozjilla
Contributor

ferozjilla commented Mar 26, 2020

Is your feature request related to a problem? Please describe.

At the moment, we do not load a backlog of messages in our cluster before we upgrade and test the results. This can be seen on these lines:

  • starting RabbitTestTool without any backlog here
  • running rollout restart on the StatefulSet as soon as consumers connect here

Loading a backlog of messages is useful because it exercises our logic that nodes which are critical to synchronisation must wait for the sync to complete before being rolled. Otherwise, messages may be lost.

Describe the solution you'd like

The solution has two parts:

  • working out a value to set for the backlog of messages
  • setting the backlog of messages

The size of the backlog

From the RabbitMQ memory docs, we know that paging starts at 50% of the memory high watermark (with the default paging ratio). The idea here is that paging adds further time to the synchronisation. This in turn increases the likelihood that nodes need to wait for the sync before they can be rolled. So, let's set the backlog above 50% of the memory high watermark to create this situation.

RabbitMQ internals and Maths 🤓

  • Work out the absolute value of the high watermark: vm_memory_high_watermark.absolute
  • Confirm the fraction of the high watermark at which paging starts by looking at vm_memory_high_watermark_paging_ratio (0.5 by default)
  • Calculate 70% of the absolute value of the high watermark (a quick sketch of this calculation follows the list).
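
For illustration, a minimal Python sketch of this calculation. It assumes the absolute high watermark has already been read from a node (for example, from rabbitmqctl status); the watermark value below is a placeholder, not a value from our cluster.

```python
# Minimal sketch: derive the backlog target from the node's memory limits.
# The watermark value is a placeholder; read the real value from the node first.
vm_memory_high_watermark_absolute = 1_000_000_000   # bytes (placeholder)
vm_memory_high_watermark_paging_ratio = 0.5         # RabbitMQ default: paging starts at 50% of the watermark

paging_threshold_bytes = vm_memory_high_watermark_absolute * vm_memory_high_watermark_paging_ratio
backlog_target_bytes = int(vm_memory_high_watermark_absolute * 0.7)  # 70% target, above the paging threshold

assert backlog_target_bytes > paging_threshold_bytes, "target must sit above the paging threshold"
print(f"paging starts at ~{paging_threshold_bytes / 1024**2:.0f} MiB, "
      f"backlog target ~{backlog_target_bytes / 1024**2:.0f} MiB")
```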

Setting the value

The RabbitTestTool has a flag to set the initial backlog (initialPublish perhaps), and the topology file also includes the size of each message. Set this combination such that (number of messages) * (size of a message) is about 70% of the high watermark, the value calculated in the previous section.
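
A hedged sketch of that sizing, continuing the placeholder numbers from above; the initialPublish name is only the guess made above and should be checked against the RabbitTestTool documentation.

```python
# Minimal sketch: turn the byte target into an initial message count.
# The actual flag name for the initial backlog (initialPublish?) needs verifying,
# and message_size_bytes must match the size set in the topology file.
backlog_target_bytes = 700_000_000   # 70% of a hypothetical 1 GB high watermark
message_size_bytes = 16              # per-message payload size from the topology file

initial_publish = backlog_target_bytes // message_size_bytes
print(f"initial backlog: ~{initial_publish:,} messages "
      f"(~{initial_publish * message_size_bytes / 1024**2:.0f} MiB of payload)")
```

Note that this payload-only sizing underestimates the real memory footprint; as the later comments show, the per-message metadata (>= 720 bytes) dominates when the payload is only 16 bytes.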

At this point, we have a backlog, and messages are being paged to disk.

@ferozjilla ferozjilla added the upgrades Any work related to upgrades label Mar 26, 2020
@ferozjilla ferozjilla added this to To do in RabbitMQ Cluster Kubernetes Operator via automation Mar 26, 2020
@ferozjilla ferozjilla self-assigned this Apr 29, 2020
@Zerpet Zerpet self-assigned this Apr 29, 2020
@Zerpet
Collaborator

Zerpet commented Apr 29, 2020

I'm having a bit of a 🤯 here. The initial backlog is set to 100k, the message size is set to 16 bytes, and we have 4 queues, 2 mirrored (with ha-all) and 2 quorum, as of here:

https://github.com/pivotal/rabbitmq-for-kubernetes-upgrades/blob/bdbde65ac75c41c3b34e592fe8f69d4b9339fe78/topologies/direct-safe.json#L12-L14

Therefore each node should have 100k messages per queue (leader or mirror), times 16 bytes each, therefore:

100,000 x 4 x 16 = 6,400,000 bytes
6,400,000 bytes / 1024 / 1024 ≈ 6.1 MB

However, I observe the node memory going up to ~600 MB 🤯 Moreover, the memory report from a node shows ~89 MB for the quorum queues and ~100-ish MB for the mirrors. These figures change over time as the backlog is being drained. Still, what the 🤯

@ferozjilla
Contributor Author

ferozjilla commented Apr 29, 2020

We could look at the Erlang Grafana dashboard made by the Core team to see where the memory is being used. Admittedly, the maths is an oversimplification, since it does not account for how Erlang uses memory.

Also, Gerhard's TGIR: RMQ ate my RAM

@mkuratczyk
Collaborator

Definitely reach out to our friends - there are known rough edges, especially with quorum queues, so this could well be one of them (known or not yet known).

@Zerpet
Collaborator

Zerpet commented Apr 29, 2020

Context

We reached out to the Core team with our analysis and expectations. We deployed Prometheus and Grafana in dev2-bunny and could not observe anything outstanding or obvious that explains the behaviour. We are waiting for the Core team to provide some insight into the memory utilisation.

We did a rolling restart of a 3-node RMQ cluster with 1.5M ready messages of size 16 bytes on each node, using 3 classic mirrored queues. We observed that nodes 1 and 2 rolled fairly quickly and pushed the queue masters to node 0. Subsequently, node 0 became mirror-sync critical and stayed in the Terminating state for some time, until the other two nodes finished synchronising the queues.

The problem was made worse by the memory usage being close to the high memory watermark (one node was OOM-killed), and the synchronisation took a relatively long time (> 5 minutes). Even though RabbitMQ was not unavailable per se, since we were able to connect to it, the queues were effectively "unavailable" because they were synchronising for a very long time.

Conclusions

  • Our preStop hook is working as intended
  • It's not wise to roll out the cluster when the memory usage is close to the high memory watermark

@Zerpet
Collaborator

Zerpet commented Apr 29, 2020

And the answer to the mystery is in RabbitMQ docs:

  • Payload: >= 1 byte, variable size, typically few hundred bytes to a few hundred kilobytes
  • Protocol attributes: >= 0 bytes, variable size, contains headers, priority, timestamp, reply to, etc.
  • RabbitMQ metadata: >= 720 bytes, variable size, contains exchange, routing keys, message properties, persistence, redelivery status, etc.
  • RabbitMQ message ordering structure: 16 bytes

If we consider minimum values for metadata and attributes, we get a message size of 736 bytes. If we instead estimate 1024 bytes of metadata, the message size would be 1040 bytes. Multiplied by 1.5M messages, this is roughly 1 GB and 1.4 GB respectively.
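
For reference, a quick arithmetic check of those figures, using the 16-byte payload from the topology and the metadata values quoted above:

```python
# Quick check of the per-message size and total backlog estimates above.
payload = 16               # bytes, as configured in the topology
metadata_min = 720         # bytes, the documented minimum RabbitMQ metadata per message
metadata_estimate = 1024   # bytes, the rougher estimate used above
messages = 1_500_000       # ready messages per node in the test

low = messages * (payload + metadata_min)        # 736 bytes per message
high = messages * (payload + metadata_estimate)  # 1040 bytes per message
print(f"~{low / 1024**3:.2f} GiB to ~{high / 1024**3:.2f} GiB")  # roughly 1 GB and 1.4 GB
```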

@j4mcs j4mcs moved this from To do to In progress in RabbitMQ Cluster Kubernetes Operator Apr 30, 2020
@Zerpet Zerpet assigned Zerpet and unassigned ferozjilla Apr 30, 2020
@Zerpet
Collaborator

Zerpet commented May 1, 2020

Context

Tweaked the default values in the run-test.sh file in https://github.com/pivotal/rabbitmq-for-kubernetes-upgrades/commit/199e51d58cffbdb959d2296a194fc7fef3b69d71. This script is used mostly in the pipeline, so it makes sense to adapt it rather than the topology file. The topology file has no delay in consumer consumption, which is desirable in most cases, and an even value for the initial backlog. All these values can be tweaked via command-line arguments.

Using an initial backlog of 120,000 messages per queue of size 16 bytes, we are able to generate a load of ~70-80% of the high watermark (~800 MB). This link sheds light on how to calculate the total message size. We are using four queues, two of each type, quorum and mirrored. The publisher rate is set to 100 messages per second and the consumer processing time to 1 millisecond. With these restrictions, we are able to keep ready messages in the queues at all times for the test duration (120 seconds).

The unavailability period is still set to 30 seconds. Lower values feel too aggressive and may report false positives. We could consider testing with a 20-second threshold, although we should first explore how long a leader election or master relocation takes in our setup, to ensure we are not setting too tight a value.

The following screenshots show the memory available before hitting the high memory watermark, the number of ready messages, and the number of incoming/outgoing messages.

[Screenshots: memory and ready messages; incoming messages; outgoing messages]

@Zerpet
Collaborator

Zerpet commented May 5, 2020

Verified today that the pipeline is running well with an initial backlog of messages, according to the tool configuration. We have to let it run and generate some data to analyse whether there are any signs of data loss or unavailability.

@Zerpet Zerpet closed this as completed May 5, 2020
RabbitMQ Cluster Kubernetes Operator automation moved this from In progress to Done May 5, 2020