A general note: we've seen a number of (at least on the surface) similar Mnesia issues in the last couple of months. Several were reported to the OTP team and are supposed to be fixed in OTP master.
The second issue was reported at the same time this bug was filed, and is going through the last round of testing right now #676 (comment). I will see if #675 still happens with this patch or needs further investigation.
Environment: 3 Ubuntu machines with RabbitMQ 3.6.1
We found two new problems while running regression tests for #581 in the 3.6.1 release.
Problem number 1:
Node 4 is blocked here:
Full log here
The problem seems to be on node 4. This transaction is blocked: https://github.com/rabbitmq/rabbitmq-common/blob/stable/src/rabbit_amqqueue.erl#L767
and mnesia seems to be restarting a transaction: https://github.com/erlang/otp/blob/maint/lib/mnesia/src/mnesia_tm.erl#L878 https://github.com/erlang/otp/blob/maint/lib/mnesia/src/mnesia_tm.erl#L914
From the output of maybe_stuck on node4, it seems that two operations might be running against the same queue: info() and terminate(). This needs more investigation; there might be a lock here.
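One way to check the lock hypothesis is to ask Mnesia what it currently holds. The sketch below is not from the original report: the module name `mnesia_lock_probe` is hypothetical, and it assumes it is run in a shell attached to the affected node with Mnesia running. It only reads the documented `mnesia:system_info/1` items `held_locks`, `lock_queue` and `transactions`.

```erlang
-module(mnesia_lock_probe).
-export([snapshot/0]).

%% Hypothetical helper (not part of RabbitMQ or OTP).
%% Returns a snapshot of Mnesia's lock state on the local node:
%%   held_locks   - locks currently granted by the local lock manager
%%   lock_queue   - lock requests still waiting to be granted
%%   transactions - transactions currently active on this node
%% Assumes Mnesia is already started on the node.
snapshot() ->
    [{Item, mnesia:system_info(Item)}
     || Item <- [held_locks, lock_queue, transactions]].
```

If `lock_queue` stays non-empty across several snapshots while the same entries remain in `held_locks`, that would be consistent with a transaction being restarted repeatedly, although it does not prove it.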
All the processes stuck in node5 are waiting for delegate_13 in node4 (see the monitors), and this process has a very large message queue:
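To confirm which processes have large mailboxes and what they are waiting on, something like the following could be run in a shell attached to the node. This is a minimal sketch, not the tooling used for this report: the module name `queue_probe` and its functions are hypothetical, and it relies only on `erlang:processes/0` and `erlang:process_info/2`, which work for local processes only.

```erlang
-module(queue_probe).
-export([top_mailboxes/1, describe/1]).

%% Hypothetical helper (not part of RabbitMQ or OTP).
%% Return the N local processes with the longest message queues,
%% as {Pid, QueueLength} pairs sorted longest first.
%% Processes that exit while we iterate return 'undefined' from
%% process_info/2 and are filtered out by the generator pattern.
top_mailboxes(N) when is_integer(N), N >= 0 ->
    Lens = [{Pid, Len} || Pid <- erlang:processes(),
                          {message_queue_len, Len} <-
                              [erlang:process_info(Pid, message_queue_len)]],
    Sorted = lists:reverse(lists:keysort(2, Lens)),
    lists:sublist(Sorted, N).

%% Summarise one local process: registered name, mailbox size,
%% what it monitors, and where it is currently executing.
%% Returns 'undefined' if the process has already exited.
describe(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [registered_name, message_queue_len,
                              monitors, current_stacktrace]).
```

On node4, `queue_probe:describe(whereis(delegate_13))` would then show the delegate's mailbox size and current stack trace, assuming the process is still registered under that name; if it is not, `whereis/1` returns `undefined` and the call fails with a `function_clause` error.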
Problem number 2 (related to 1):
TCP connections are not released:
There are no clients connected to the cluster and the server 10.100.0.121 has been stopped, but the connections are still there.
list_queues is stuck in node4, and this seems to be the stuck queue:
This is the queue blocked on terminate in the maybe_stuck output (4th process, <5588.27566.47>). All other processes are waiting for it, while the queue is using a worker (process ) to run a Mnesia transaction (process 6) which is being restarted (mnesia_tm:restart). The transaction is being executed here: https://github.com/rabbitmq/rabbitmq-common/blob/rabbitmq_v3_6_1_rc2/src/rabbit_misc.erl#L534 and is this one: https://github.com/rabbitmq/rabbitmq-common/blob/rabbitmq_v3_6_1_rc2/src/rabbit_amqqueue.erl#L768