Implements Mirror Queue Sync in Batches #344

videlalvaro · 2015-10-05T00:58:24Z

This PR is related to these other PRs, so they have to be merged together:
rabbitmq/rabbitmq-test#4
rabbitmq/rabbitmq-website#87

In order to improve the performance of mirror queue sync, this PR introduces the concept of batch publishing for the backing queue.

There are two new callbacks: `batch_publish/4` and `batch_publish_delivered/4`. Both behave like their non-batch counterparts, but they accept batches of messages, which lets them use optimizations like the ones already in place for queue purging, paging, and so on.
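As a rough illustration of what "behave like their non-batch counterparts" means, here is a hypothetical sketch. The module `batch_publish_sketch` and its 3-argument signature are made up for illustration and are not the backing queue API: a backing queue with no batch-specific optimization could fall back to applying its single-message publish to each message in the batch, in order.

```erlang
-module(batch_publish_sketch).
-export([batch_publish/3]).

%% Hypothetical fallback, not rabbit_backing_queue code: apply a
%% single-message publish function to every {Msg, Props} pair in the
%% batch, left to right, threading the queue state through each call.
batch_publish(Publishes, PublishFun, State0) when is_function(PublishFun, 3) ->
    lists:foldl(fun({Msg, Props}, State) -> PublishFun(Msg, Props, State) end,
                State0, Publishes).
```

For example, with a plain list standing in for the queue state, `batch_publish_sketch:batch_publish([{a, p}, {b, p}], fun(M, _P, Acc) -> [M | Acc] end, [])` returns `[b, a]`. The real batch callbacks exist so implementations can replace this per-message loop with batch-aware optimizations.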

These new callbacks have been implemented for rabbit_variable_queue, rabbit_priority_queue, rabbit_mirror_queue_master and rabbit_mirror_queue_slave.

Apart from this, mirrored queues can now be sync'ed in batches of messages instead of one by one. With a batch size of 20k messages, 1 million messages can be sync'ed in 6 seconds (previously 60 seconds).

To configure the sync batch size, there's a new mirroring policy key called ha-sync-batch-size, which is documented here: rabbitmq/rabbitmq-website#87

The sync'ing tests from rabbitmq-test have been updated to also test for ha-sync-batch-size and the non-batch mode.

Fixes #336

videlalvaro · 2015-10-09T00:12:42Z

To test mirror sync you can use these commands:

make -j test FILTER=eager_sync
make -j test FILTER=sync_detection

michaelklishin · 2015-10-10T11:07:49Z

src/rabbit_mirror_queue_master.erl

+                             false = dict:is_key(MsgId, SS), %% ASSERTION
+                             Sizes + rabbit_basic:msg_size(Msg)}
+                    end, {[], false, 0}, Publishes),
+    Publishes2 = lists:reverse(Publishes1),


If we reverse the list after foldl, how about using foldr?

Mostly going by the docs:

foldl/3 is tail recursive and would usually be preferred to foldr/3.

http://erlang.org/doc/man/lists.html#foldl-3

Also, AFAIK lists:reverse is a BIF.

I'm fine with changing this to foldr if you think it's required.

Let's keep foldl.
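For reference, the two options discussed above build the same list. A minimal sketch in a hypothetical `fold_order_sketch` module, with a made-up mapping (doubling integers) standing in for the real accumulator logic: foldl accumulates in reverse and therefore needs lists:reverse/1 at the end, while foldr visits elements from the right and needs no reverse but is not tail recursive.

```erlang
-module(fold_order_sketch).
-export([with_foldl/1, with_foldr/1]).

%% foldl is tail recursive; consing onto the accumulator reverses the
%% order, so lists:reverse/1 restores the original order afterwards.
with_foldl(L) ->
    lists:reverse(lists:foldl(fun(X, Acc) -> [X * 2 | Acc] end, [], L)).

%% foldr visits elements right to left, so consing produces the
%% original order directly, at the cost of non-tail recursion.
with_foldr(L) ->
    lists:foldr(fun(X, Acc) -> [X * 2 | Acc] end, [], L).
```

Both `fold_order_sketch:with_foldl([1, 2, 3])` and `fold_order_sketch:with_foldr([1, 2, 3])` return `[2, 4, 6]`.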

michaelklishin · 2015-10-10T13:28:41Z

src/rabbit_mirror_queue_misc.erl

            [policy_validator, <<"ha-promote-on-shutdown">>, ?MODULE]}},
     {requires, rabbit_registry},
     {enables, recovery}]}).

+%% For compatibility with versions that don't support sync batching.
+-define(DEFAULT_BATCH_SIZE, 1).


Are we talking about pre-3.6.0 versions here? Mixed 3.6.0/3.5.x clusters are not allowed, so we can use a different default.

You are right. I think this constant dates from my first POCs and is not required anymore. It's probably not used in the code.


It is used via rabbit_mirror_queue_misc:sync_batch_size/0. I just don't think it serves any compatibility purpose that requires it to be 1. I'd suggest making it 16K or so and moving it to the app config.

Thoughts?

Changing the default to 16K leads to sync test failures.

Looking at the code, I remembered the original purpose. The idea is for it to be 1, so you either use non-batched sync, or batched sync if the policy has been set. The logic that assumes the policy batch size is either 1 or > 1 is here: https://github.com/rabbitmq/rabbitmq-server/blob/rabbitmq-server-336/src/rabbit_mirror_queue_sync.erl#L212
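A hypothetical sketch of the batching idea, not the code in rabbit_mirror_queue_sync.erl (`sync_batch_sketch:batches/2` is made up for illustration): split the messages into order-preserving batches of at most BatchSize messages, so that a batch size of 1 degenerates to one message per batch, i.e. the non-batched behaviour.

```erlang
-module(sync_batch_sketch).
-export([batches/2]).

%% Split Msgs into consecutive batches of at most BatchSize elements,
%% preserving message order. BatchSize = 1 yields one batch per message.
batches(Msgs, BatchSize) when is_list(Msgs), is_integer(BatchSize), BatchSize > 0 ->
    batches(Msgs, BatchSize, 0, [], []).

%% Input exhausted: flush the partially filled batch, if any.
batches([], _Size, _N, [], Acc) ->
    lists:reverse(Acc);
batches([], _Size, _N, Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
%% Current batch is full: close it and start a new one.
batches(Msgs, Size, Size, Cur, Acc) ->
    batches(Msgs, Size, 0, [], [lists:reverse(Cur) | Acc]);
%% Otherwise add the next message to the current batch.
batches([M | Rest], Size, N, Cur, Acc) ->
    batches(Rest, Size, N + 1, [M | Cur], Acc).
```

For example, `sync_batch_sketch:batches([1, 2, 3], 2)` returns `[[1, 2], [3]]`, and `sync_batch_sketch:batches([1, 2, 3], 1)` returns `[[1], [2], [3]]`.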

OK, that makes more sense. I had changes that moved the default to the app file, bumped the default and simplified sync_batch_size/0. The error is:

Running 5 of 72 tests; FILTER=eager_sync; COVER=false
eager_sync
----------
eager_sync: [setup] [running]rabbit_test_runner: make_test_multi...*failed*
in function sync_detection:wait_for_sync_status/5 (test/src/sync_detection.erl, line 159)
in call from eager_sync:sync/2 (test/src/eager_sync.erl, line 167)
in call from eager_sync:eager_sync/1 (test/src/eager_sync.erl, line 63)
in call from rabbit_test_runner:'-make_test_multi/7-fun-2-'/3 (src/rabbit_test_runner.erl, line 129)
**error:{sync_status_max_tries_failed,[{queue,<<"ha.two.test">>},
                                       {node,c@urano},
                                       {expected_status,true},
                                       {max_tried,100.0}]}
  output:<<"">>

I see no reason not to batch all the time and just make the batch size configurable (with 16K or so as the default).

The problem is finding the right batch size. 16K is too much for big messages; it can even cause a network partition (which is why we have a 2 GB max message size in the first place). The right value depends on the workload (this is explained in the related rabbitmq-website PR), but if we provide a default, I think it has to be lower.

Messages that are hundreds of MB in size are probably very rare. Most messages in common workloads are < 4K in size.

We can go with 4096 as the default value, and those with large messages can adjust it. 4096 messages * 4 KiB per message = 16 MiB of payload, which is not particularly excessive.
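The arithmetic above, as a small hypothetical helper (`batch_payload_sketch` is made up for illustration). It computes a worst-case bound by assuming every message in the batch has the same size. It also shows why workloads with large messages need a lower setting: the same 4096-message batch with 1 MiB messages is 4 GiB of payload.

```erlang
-module(batch_payload_sketch).
-export([batch_payload_bytes/2, bytes_to_mib/1]).

%% Worst-case payload of one sync batch, assuming every message in the
%% batch is MsgBytes long.
batch_payload_bytes(BatchSize, MsgBytes)
  when is_integer(BatchSize), is_integer(MsgBytes) ->
    BatchSize * MsgBytes.

%% Convert bytes to MiB (1 MiB = 1024 * 1024 bytes); returns a float.
bytes_to_mib(Bytes) ->
    Bytes / (1024 * 1024).
```

`batch_payload_sketch:bytes_to_mib(batch_payload_sketch:batch_payload_bytes(4096, 4096))` returns `16.0`, while `batch_payload_sketch:batch_payload_bytes(4096, 1024 * 1024)` returns `4294967296` bytes (4 GiB).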

@carlhoerberg can you please help us pick the default batch size for eager (full) mirror sync? Maybe you have some stats on the median/95th percentile message size distribution at CloudAMQP, or any other data that can help us here?

videlalvaro · 2015-10-10T21:41:00Z

@michaelklishin ready for another round. Bugs discovered and fixed:

- improper handling of batch_publish_delivered accumulators
- re-ordering of messages due to how publish/publish_delivered messages were partitioned
- not all messages were sync'ed when the batch contained delivered messages
- the eager_sync_cancel test would always fail when SyncBatchSize > ?MSG_COUNT, since the sync would finish before the test was able to cancel it


videlalvaro added 21 commits September 29, 2015 00:41

implements BQ:batch_publish and BQ:batch_publish_delivered

f30aaa3

implement mirror message sync in batches

676e413

adds ha-sync-batch-size policy

5ec328d

retrieves batch size from policy

ef2d3f3

refactors msg broadcast

701ee99

oops

d480e1d

refactors shared logic

0e89449

cosmetics

e8fe201

adds explanation

ccae00a

implements batch publishing for mirrored queues

5c2d50d

implements batch publishing for priority queues

8076b1b

adds batch publish tests

90c0244

fixes failing test

7cdd8c7

fixes retrieving sync_batch_size from policy

474520d

improves comment about sync batch order

332c389

cosmetics

b813c42

fixes arguments passed to batch_publish_*

b6f44d6

off by one error

7f27f43

Merge branch 'master' into rabbitmq-server-336

e0bb5df

improves comment

3776634

restores new line

215dac3

Merge branch 'master' into rabbitmq-server-336

c863a81

michaelklishin reviewed Oct 10, 2015

Wording

199d5a9

michaelklishin reviewed Oct 10, 2015

videlalvaro and others added 3 commits October 10, 2015 18:01

removes unused constant

fbc7ff5

Clarify this comment

ed516d4

adds default sync batch size on the app config

0646e9d

videlalvaro added 5 commits October 10, 2015 23:23

get into account unacked messages when syncing

1a0a00c

send msg batch in proper order

b43b3b9

refactors message sync'ing in batches

1929533

fixes conflicts

1a08f21

removes non batch-sync code path

e17a5e2

videlalvaro assigned michaelklishin Oct 10, 2015

Move this constant closer to the only place it is used

dc72935

michaelklishin added a commit that referenced this pull request Oct 12, 2015

Merge pull request #344 from rabbitmq/rabbitmq-server-336

44a0ddb

Implements Mirror Queue Sync in Batches

michaelklishin merged commit 44a0ddb into master Oct 12, 2015

michaelklishin added this to the n/a milestone Oct 12, 2015

dumbbell deleted the rabbitmq-server-336 branch January 2, 2018 15:23
