
Unknown acks (e.g. after network partition heals) should be handled gracefully #255

Closed
michaelklishin opened this issue Aug 6, 2015 · 2 comments

@michaelklishin (Member)

Moving this from a legacy tracker (https://bugzilla.rabbitmq.com/show_bug.cgi?id=26308) as it is still relevant:

Logs from [redacted] show lots of:

=ERROR REPORT==== 28-Jul-2014::16:27:50 ===
** Generic server <0.271.0> terminating
** Last message in was {'$gen_cast',{ack,[842683],<7670.3568.247>}}
** When Server state == {q,
                         {amqqueue, <snip>,
                         undefined,undefined,undefined,7,running}
** Reason for termination == 
** {{case_clause,{empty,{[],[]}}},
    [{rabbit_queue_consumers,subtract_acks,4,[]},
     {rabbit_queue_consumers,subtract_acks,3,[]},
     {rabbit_amqqueue_process,subtract_acks,4,[]},
     {rabbit_amqqueue_process,handle_cast,2,[]},
     {gen_server2,handle_msg,2,[]},
     {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}

In all cases:

* It was immediately after running_partitioned_network
* The ack was from the other side of the partition
* The queue had not failed over

More investigation details:

OK, it's actually a bit harder to hit than suggested above, but it's doable. The queue throws away any acktags for the channel when it sees it go down, but is also happy to re-establish a channel record for the same channel. The channel is blissfully unaware.

This means that, as well as the crash in this bug, wrong behaviour can result: if the channel happens to send an ack with acktag 1 from the first time round, the queue will consider it to be acktag 1 from the second time round - which might not be the same message.
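The acktag reuse described above can be sketched in a few lines of Python. All names here are hypothetical and for illustration only - this is not RabbitMQ's actual bookkeeping, just the shape of the bug:

```python
# Sketch (hypothetical names) of the behaviour described above: the queue
# drops a channel's unacked tags when it sees the channel go down, then
# happily creates a fresh record for the "same" channel, so acktags restart
# from 1 and an old ack lands on the wrong message.

class QueueSketch:
    def __init__(self):
        self.channels = {}  # channel id -> {acktag: message}
        self.next_tag = {}  # channel id -> last acktag handed out

    def deliver(self, ch, message):
        tags = self.channels.setdefault(ch, {})
        tag = self.next_tag.get(ch, 0) + 1
        self.next_tag[ch] = tag
        tags[tag] = message
        return tag

    def channel_down(self, ch):
        # Queue forgets everything about the channel, including acktags.
        self.channels.pop(ch, None)
        self.next_tag.pop(ch, None)

    def ack(self, ch, tag):
        # Returns the message the queue *thinks* is being acked.
        return self.channels.get(ch, {}).pop(tag, None)

q = QueueSketch()
q.deliver("ch1", "msg-A")   # acktag 1, first time round
q.channel_down("ch1")       # partition: queue sees the channel go down
q.deliver("ch1", "msg-B")   # fresh record: acktag 1 again
assert q.ack("ch1", 1) == "msg-B"  # but the channel meant msg-A
```

The final assertion is the misattribution: the ack the channel sends for msg-A is silently applied to msg-B.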

It's somewhat racy to hit, as we need to get a channel talking to a remote queue again after being partitioned from it - but Mnesia will have removed the queue record by then. Therefore I added the following patch to reproduce:

diff -r b649bbf7d910 src/rabbit_channel.erl
--- a/src/rabbit_channel.erl    Wed Jul 30 12:51:15 2014 +0100
+++ b/src/rabbit_channel.erl    Wed Jul 30 14:08:49 2014 +0100
@@ -730,7 +730,18 @@
     check_read_permitted(QueueName, State),
     case rabbit_amqqueue:with_exclusive_access_or_die(
            QueueName, ConnPid,
-           fun (Q) -> rabbit_amqqueue:basic_get(
+           fun (Q) ->
+                   C = case get(slow) of
+                           undefined -> 1;
+                           N         -> N + 1
+                       end,
+                   put(slow, C),
+                   case C of
+                       3 -> io:format("SLEEPING~n"),
+                            timer:sleep(20000);
+                       _ -> ok
+                   end,
+                   rabbit_amqqueue:basic_get(
                         Q, self(), NoAck, rabbit_limiter:pid(Limiter))
            end) of
         {ok, MessageCount,

This will ensure the 3rd basic.get on a channel is very slow.

Then create a two node cluster {rabbit, hare} with a short net ticktime (say 5s). Create a queue "test" on hare with some messages. Then on rabbit:

rr(amqp_connection).
{ok, Conn} = amqp_connection:start(#amqp_params_network{}).
{ok, Ch} = amqp_connection:open_channel(Conn).
amqp_channel:call(Ch, #'basic.get'{queue = <<"test">>, no_ack=false}).
amqp_channel:call(Ch, #'basic.get'{queue = <<"test">>, no_ack=false}).
amqp_channel:call(Ch, #'basic.get'{queue = <<"test">>, no_ack=false}).
amqp_channel:call(Ch, #'basic.ack'{delivery_tag = 2}).

This will pause between the 2nd and 3rd basic.get. Meanwhile:

# for port in 25672 25673 ; do iptables -A INPUT -p tcp --dport ${port} -j DROP ; done

(wait for partition)

# for port in 25672 25673 ; do iptables -D INPUT -p tcp --dport ${port} -j DROP ; done

The partition should heal; then the channel does a 3rd basic.get. The queue thinks it's the first, and creates a new channel record with acktag 1. Then the channel sends acktag 2. Result: boom.
@michaelklishin (Member, Author)

Discussed solutions:

1) Establish some UUID for the channel-queue pair, generated by the queue the first time it hears of a channel and returned by the queue when acking. If a queue gets acks with a UUID it has never seen before, it ignores them.

Downside: complicated.

2) Make each acktag into a UUID and have the queue ignore unrecognised ones.

Downside: expensive (probably).

3) Have the channel forget acktags when it sees a queue go down.

Downside: not obvious it completely eliminates the problem if the queue sees the channel down before the channel sees the queue down.

4) Just ignore unrecognised acktags.

Downside: still possible to have incorrect behaviour even if no crash.

Option 2 may be worth it for the next feature release. Options 3 and 4 may work as temporary workarounds that do not change the inter-node communication protocol. The work on switching to Raft will open up additional options.
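Two of the options above can be sketched together in Python - option 1's per-record token and option 4's tolerance of unrecognised acktags. All names are hypothetical; this is an illustration of the idea, not RabbitMQ's actual implementation:

```python
import uuid

# Sketch (hypothetical names) combining two of the fixes discussed above:
# option 1's per channel-record token, and option 4's tolerance of
# unrecognised acktags.

class ChannelRecord:
    def __init__(self):
        # Option 1: a fresh token each time the queue (re)creates a record
        # for a channel. Acks carrying a token from an earlier incarnation
        # of the record are ignored wholesale.
        self.token = uuid.uuid4()
        self.unacked = {}  # acktag -> message

class QueueState:
    def __init__(self):
        self.records = {}  # channel id -> ChannelRecord

    def register(self, ch):
        """Create (or fetch) the record for a channel; return its token."""
        return self.records.setdefault(ch, ChannelRecord()).token

    def deliver(self, ch, tag, msg):
        self.records.setdefault(ch, ChannelRecord()).unacked[tag] = msg

    def channel_down(self, ch):
        self.records.pop(ch, None)

    def subtract_acks(self, ch, token, tags):
        rec = self.records.get(ch)
        if rec is None or rec.token != token:
            return []  # option 1: stale or unknown token - ignore the acks
        # Option 4: silently skip tags the queue has no record of, instead
        # of crashing with a case_clause.
        return [rec.unacked.pop(t) for t in tags if t in rec.unacked]
```

With this, an ack arriving after a partition either carries a stale token (dropped wholesale) or references a tag the new record never issued (skipped), so neither the crash nor the misattribution occurs.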

@michaelklishin (Member, Author)

So #496 implements option 4. We may try a variation on 1 in the future, or even make internal ack transfer a part of the Raft log (once that is integrated).
