AMQPConnectionException: Server connection error: 320, message: CONNECTION_FORCED - Node was put into maintenance mode #1161

Open
ostrolucky opened this issue Feb 10, 2024 · 1 comment

ostrolucky commented Feb 10, 2024

AMQPConnectionException: Server connection error: 320, message: CONNECTION_FORCED - Node was put into maintenance mode

This happens when a consumer is connected to a RabbitMQ node that then goes into maintenance mode. We hit this issue every few months because we use Amazon MQ (AWS's managed RabbitMQ service), which has mandatory maintenance windows for upgrades. We reached out to their support, who told us the following:

Hello,

Warm greetings from AWS Premium Support! This is Rajil from the Amazon MQ team regarding the support ticket #170677697201438. I will be assisting you with this case today.

As I understand, you have observed a 'Node was put into maintenance mode' error message on your MQ broker logs - you want to ensure that there would not be any issues during cluster maintenance. Please confirm if my understanding is correct.

Please note that, since Amazon MQ is a managed service, maintenance windows are a necessary period where AWS can make changes in the backend as well as to the broker engine to ensure it has the latest security patches available, among other service/architecture improvements. The two hour period that we configure as maintenance window for the broker has to be configured at a time where you would be expecting no or the least traffic, as restarts are to be expected during this period.

Since you already have a cluster broker deployment, kindly ensure that the client connecting to the broker attempts a retry in case the above error message is observed during a maintenance window. In RabbitMQ cluster deployments, the nodes are restarted one-by-one, meaning at least two nodes will be up and running at all times. Even if a connection is severed, a connection retry will result in the other nodes accepting the connection, and the clients can keep using the broker.

You can read more about maintenance windows in Amazon MQ here[1].

I hope the above information serves you well. Meanwhile, if you feel that I have missed out any point or have misunderstood your concern, or you have any extended queries or concerns related to anything that we discussed here, please feel free to write back to me with the query you might be facing. I will be glad to assist you further.

Have a great day ahead!

========================= References ==========================

[1] Maintaining an Amazon MQ broker - https://docs.aws.amazon.com/amazon-mq/latest/developer-guide/maintaining-brokers.html
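The support advice above (retry the connection, and another node will accept it) could look roughly like the following sketch. It assumes a hypothetical `$connect` callable that opens a connection to one endpoint and throws on failure; the function name, the endpoint list and the backoff values are illustrative and are not part of php-amqplib or any other AMQP client.

```php
<?php

/**
 * Open a connection, retrying with exponential backoff and cycling through
 * the given endpoints until one of them accepts.
 *
 * @param list<string>             $endpoints broker endpoints (possibly just one)
 * @param callable(string): object $connect   hypothetical: opens a connection or throws
 */
function connectWithRetry(
    array $endpoints,
    callable $connect,
    int $maxAttempts = 5,
    int $baseDelayMs = 200
): object {
    $endpoints = array_values($endpoints);
    if ($endpoints === [] || $maxAttempts < 1) {
        throw new InvalidArgumentException('Need at least one endpoint and one attempt');
    }

    $lastError = null;
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        // Round-robin over endpoints; with a single endpoint this just retries it.
        $endpoint = $endpoints[$attempt % count($endpoints)];
        try {
            return $connect($endpoint);
        } catch (Throwable $e) {
            // The node may be restarting or in maintenance mode; try again.
            $lastError = $e;
            if ($attempt + 1 < $maxAttempts) {
                usleep($baseDelayMs * 1000 * (2 ** $attempt)); // 200 ms, 400 ms, 800 ms, ...
            }
        }
    }

    throw new RuntimeException('No broker endpoint accepted the connection', 0, $lastError);
}
```

Whether you pass one endpoint or several depends on how your broker is exposed (an assumption about your setup); the backoff simply gives the cluster time to finish restarting a node.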

At the moment our consumers crash, but Docker restarts them, which makes them reconnect to a different node, and then things continue normally. However, I think it would make sense for things to keep working without the consumer crashing.

I would like to propose that php-amqplib handle the reconnect automatically. Alternatively, what do you suggest a consumer should do? There isn't even a specific exception thrown here, so if we wanted to handle this in user space, we would have to do something like this (not tested), right?

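    try {
        $queue->get();
    } catch (AMQPConnectionException $e) {
        // The exception type is generic, so match on the broker's message text.
        if (str_contains($e->getMessage(), 'Node was put into maintenance mode')) {
            // Reopen the connection; the cluster should accept it on another node.
            // Note: channel/queue objects may need re-creating after a reconnect.
            $connection->reconnect();
            // Give up on this read; the caller is expected to call get() again.
            return;
        }
        throw $e;
    }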

I've also played with heartbeat configuration, but unfortunately it doesn't have any effect in this case.
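Matching on the full English sentence works, but it is brittle. The exception rendered later in this thread as `[AMQPConnectionException (320)]` suggests the reply code is also available via `getCode()`, and 320 is the AMQP 0-9-1 `connection-forced` reply code. A minimal sketch of a check based on that code or the `CONNECTION_FORCED` text, assuming your client exposes them this way (the function name is hypothetical):

```php
<?php

/**
 * Decide whether an exception is the broker forcibly closing the connection
 * (AMQP 0-9-1 reply code 320, CONNECTION_FORCED), as happens when a RabbitMQ
 * node is put into maintenance mode.
 *
 * Assumption: the client puts the reply code in getCode() and/or the reply
 * text in getMessage(), as the exception shown in this issue does.
 */
function isForcedConnectionClose(Throwable $e): bool
{
    // AMQP 0-9-1 reply code for "connection-forced".
    $connectionForced = 320;

    return $e->getCode() === $connectionForced
        || str_contains($e->getMessage(), 'CONNECTION_FORCED');
}
```

This also matches other forced closures that carry the same code, which is usually what you want when deciding whether to reconnect.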

ramunasd (Member) commented

AMQP connections are stateful, and we don't track which bindings and subscriptions were made, so a reconnect cannot be done that easily.
But if you have an idea of how to handle all of this, I would really like to review a PR.
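To make the maintainer's point concrete: a transparent reconnect would require the client to remember every declaration, binding and subscription and replay them on the new connection. A minimal sketch of that bookkeeping follows; the `TopologyRecorder` class and its methods are hypothetical (not part of php-amqplib), and each step is a callable that receives whatever channel object your client uses.

```php
<?php

/**
 * Remembers every setup step (declare, bind, consume) so it can be replayed
 * on a fresh connection after a reconnect. Hypothetical helper for illustration.
 */
final class TopologyRecorder
{
    /** @var list<callable(object): void> */
    private array $steps = [];

    /** Run a setup step now and, if it succeeds, remember it for later reconnects. */
    public function apply(object $channel, callable $step): void
    {
        $step($channel);
        $this->steps[] = $step;
    }

    /** Re-run all recorded steps, in their original order, on a new channel. */
    public function replay(object $channel): void
    {
        foreach ($this->steps as $step) {
            $step($channel);
        }
    }
}
```

Even with this in place, messages that were delivered but not yet acknowledged on the old connection are requeued by the broker, so handlers still need to tolerate redelivery.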

fabpot added a commit to symfony/symfony that referenced this issue Feb 23, 2024
…n loss (ostrolucky)

This PR was merged into the 7.1 branch.

Discussion
----------

[Messenger][AMQP] Automatically reconnect on connection loss

| Q             | A
| ------------- | ---
| Branch?       | 7.1
| Bug fix?      | no
| New feature?  | no
| Deprecations? | no
| Issues        |
| License       | MIT

When running RabbitMQ as a cluster, nodes commonly need to be upgraded while existing connections are kept. This is normally done by putting the cluster's nodes into `maintenance mode` one at a time, which keeps the cluster healthy throughout. However, neither symfony/messenger nor php-amqp handles this case at the moment. Instead, an exception is thrown when getting a message, the worker crashes, the error is logged, and the process manager has to restart the worker, with no way to handle the situation better in user space. Messenger's retry mechanism does not help here, because it only kicks in when an exception is thrown in a handler. The concrete exception is:

>  [AMQPConnectionException (320)]
>  Server connection error: 320, message: CONNECTION_FORCED - Node was put into maintenance mode

What I'm proposing in this PR is that when a connection error is detected, the transport should _try to reconnect once_ before throwing the exception. That should handle the case outlined above. This is in line with the recommendation we got from AWS support:
> kindly ensure that the client connecting to the broker attempts a retry in case the above error message is observed during a maintenance window. In RabbitMQ cluster deployments, the nodes are restarted one-by-one, meaning at least two nodes will be up and running at all times. Even if a connection is severed, a connection retry will result in the other nodes accepting the connection, and the clients can keep using the broker.
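The "try to reconnect once" behaviour proposed in this PR can be sketched as a small wrapper. This is an illustration under assumptions, not the code merged into Symfony: `withSingleReconnect` is a hypothetical name, and the three callables stand in for reading a message, reopening the connection, and recognising a connection error (for this PR, an `AMQPConnectionException`).

```php
<?php

/**
 * Run $operation; on a connection error, call $reconnect and retry exactly
 * once. Any other error, or a second failure, propagates to the caller.
 *
 * @param callable(): mixed          $operation         e.g. fetch one message
 * @param callable(): void           $reconnect         reopen connection/channel
 * @param callable(Throwable): bool  $isConnectionError which errors qualify
 */
function withSingleReconnect(callable $operation, callable $reconnect, callable $isConnectionError): mixed
{
    try {
        return $operation();
    } catch (Throwable $e) {
        if (!$isConnectionError($e)) {
            throw $e;
        }
    }

    // The connection was dropped (e.g. the node entered maintenance mode):
    // reconnect and try once more; the cluster should route us to a live node.
    $reconnect();

    return $operation();
}
```

Retrying only once keeps a genuinely unavailable broker from looping forever: the second failure still crashes the worker, and the process manager restarts it as before.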

I've also reported the issue at php-amqplib/php-amqplib#1161 in the hope that it could be fixed upstream at some point, but I don't think that is very likely.

Commits
-------

056b4a5 [Messenger] AMQP:Automatically reconnect on connection loss