AMQPConnectionException: Server connection error: 320, message: CONNECTION_FORCED - Node was put into maintenance mode
This happens when a consumer is connected to a RabbitMQ node which then goes into maintenance mode. We hit this issue every few months, because we use Amazon MQ (AWS's managed RabbitMQ service), which has mandatory maintenance windows for upgrades. We reached out to their support, who told us the following:
Hello,
Warm greetings from AWS Premium Support! This is Rajil from the Amazon MQ team regarding the support ticket #170677697201438. I will be assisting you with this case today.
As I understand, you have observed a 'Node was put into maintenance mode' error message on your MQ broker logs - you want to ensure that there would not be any issues during cluster maintenance. Please confirm if my understanding is correct.
Please note that, since Amazon MQ is a managed service, maintenance windows are a necessary period where AWS can make changes in the backend as well as to the broker engine to ensure it has the latest security patches available, among other service/architecture improvements. The two hour period that we configure as maintenance window for the broker has to be configured at a time where you would be expecting no or the least traffic, as restarts are to be expected during this period.
Since you already have a cluster broker deployment, kindly ensure that the client connecting to the broker attempts a retry in case the above error message is observed during a maintenance window. In RabbitMQ cluster deployments, the nodes are restarted one-by-one, meaning at least two nodes will be up and running at all times. Even if a connection is severed, a connection retry will result in the other nodes accepting the connection, and the clients can keep using the broker.
You can read more about maintenance windows in Amazon MQ here[1].
I hope the above information serves you well. Meanwhile, if you feel that I have missed out any point or have misunderstood your concern, or you have any extended queries or concerns related to anything that we discussed here, please feel free to write back to me with the query you might be facing. I will be glad to assist you further.
At the moment our consumers crash, but Docker restarts them, which causes a reconnect to a different node, and things continue normally. However, I think it would make sense for things to keep working without the consumer crashing.
I would like to propose that php-amqplib handles the reconnect automatically. Alternatively, what do you suggest the consumer should do? There isn't even a specific exception thrown for this case, so if we wanted to handle it in userspace, we would have to do something like this (not tested), right?
    try {
        $queue->get();
    } catch (AMQPConnectionException $e) {
        // No dedicated exception type exists, so match on the message text.
        if (str_contains($e->getMessage(), 'Node was put into maintenance mode')) {
            $connection->reconnect();
            return;
        }
        throw $e;
    }
I've also played with heartbeat configuration, but unfortunately it doesn't have any effect in this case.
AMQP connections are stateful, and we don't track which bindings and subscriptions were made. That's why reconnecting cannot be done so easily.
But if you have a vision for how to handle everything, I would be happy to review a PR.
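To illustrate the state-tracking problem described above, here is a minimal sketch of how userspace code could record its own setup steps and replay them after a reconnect. `TopologyRecorder` and its methods are hypothetical names invented for this example, not part of php-amqplib, and the channel is typed as a plain `object` so the sketch makes no assumption about a specific client class.

```php
<?php
declare(strict_types=1);

/**
 * Records setup steps (declare exchange, bind queue, start consumer, ...)
 * as closures so they can be replayed against a fresh channel.
 * Hypothetical helper, not part of php-amqplib.
 */
final class TopologyRecorder
{
    /** @var list<callable> */
    private array $steps = [];

    /** Run a setup step now and remember it for later replays. */
    public function apply(object $channel, callable $step): void
    {
        $step($channel);
        // Only remembered if the step did not throw.
        $this->steps[] = $step;
    }

    /** Re-run every remembered step, in original order, on a new channel. */
    public function replay(object $channel): void
    {
        foreach ($this->steps as $step) {
            $step($channel);
        }
    }
}
```

With a recorder like this, every declaration, binding and subscription would go through `apply()`, and the reconnect path would open a new channel and pass it to `replay()`. Whether that is enough depends on the application, for example on how unacknowledged messages should be treated.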
fabpot added a commit to symfony/symfony that referenced this issue on Feb 23, 2024: …n loss (ostrolucky)
This PR was merged into the 7.1 branch.
Discussion
----------
[Messenger][AMQP] Automatically reconnect on connection loss
| Q | A
| ------------- | ---
| Branch? | 7.1
| Bug fix? | no
| New feature? | no
| Deprecations? | no
| Issues |
| License | MIT
When running RabbitMQ in a cluster, there is a common need to upgrade the nodes while keeping clients connected. This is normally done by putting nodes in the cluster into `maintenance mode`, ensuring the cluster stays healthy at all times. However, neither symfony/messenger nor php-amqp handles this use case at the moment. Instead, an exception is thrown when getting a message, the worker crashes, the error is logged, and the process manager has to restart the worker. There is no way to handle this case better in user space. Messenger's retry mechanism does not help here, because it only kicks in when an exception is thrown in a handler. The concrete exception is the following:
> [AMQPConnectionException (320)]
> Server connection error: 320, message: CONNECTION_FORCED - Node was put into maintenance mode
What I'm proposing in this PR is that, if a connection error is detected, we _try to reconnect once_ before throwing the exception. That should handle the case outlined above. This is in line with the recommendation we got from AWS support:
> kindly ensure that the client connecting to the broker attempts a retry in case the above error message is observed during a maintenance window. In RabbitMQ cluster deployments, the nodes are restarted one-by-one, meaning at least two nodes will be up and running at all times. Even if a connection is severed, a connection retry will result in the other nodes accepting the connection, and the clients can keep using the broker.
I've also reported the issue at php-amqplib/php-amqplib#1161 in the hope that this could be fixed upstream at some point, but I don't think that is very likely.
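The "try to reconnect once" idea described above can be sketched in isolation as follows. This is not the code from this PR; `withSingleReconnect` and its parameters are hypothetical names chosen for illustration, and the exception class is passed in as a string so the sketch runs without any AMQP extension installed.

```php
<?php
declare(strict_types=1);

/**
 * Runs $operation; if it throws an instance of $connectionErrorClass,
 * calls $reconnect once and runs $operation a second time.
 * A failure on the second attempt propagates to the caller.
 * Hypothetical helper, not part of symfony/messenger or php-amqp.
 */
function withSingleReconnect(callable $operation, callable $reconnect, string $connectionErrorClass): mixed
{
    try {
        return $operation();
    } catch (\Throwable $e) {
        if (!$e instanceof $connectionErrorClass) {
            throw $e; // unrelated failure: do not mask it
        }
        $reconnect();

        return $operation(); // second and last attempt
    }
}
```

In a consumer this might be called as `withSingleReconnect(fn () => $queue->get(), fn () => $connection->reconnect(), \AMQPConnectionException::class)`, assuming `$queue` and `$connection` are the objects used elsewhere in the worker. Channel and queue objects may need to be recreated after a reconnect, in which case the `$reconnect` callable would be the place to do that.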
Commits
-------
056b4a5 [Messenger] AMQP:Automatically reconnect on connection loss