[stable/rabbitmq-ha] Rejoin cluster on restarted pod #4474
Comments
A rejoining node will contact its last known peer upon boot. In most cases you do not want to reset a restarted cluster member. I'd recommend getting a sense of RabbitMQ clustering basics. RabbitMQ logs from all nodes are critically important when investigating such issues.
@michaelklishin I agree that a reset and file removal look like the wrong approach, but it is not clear why the node can't reconnect.
Should be fixed via #4610
Unfortunately, it is not fixed.
This rabbitmq-users response provides a very plausible hypothesis: automatic forced cleanup of unknown nodes was enabled in the Kubernetes example. Unintended removal of nodes temporarily leaving the cluster is one of the consequences of that decision, and apparently a pod restart can trigger it.
@maksimru I'd inspect (or at least post) full server logs instead of waiting for a magical fix to land from a stranger on the Internet.
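For collecting those logs from a chart deployment, something along these lines works (the pod names `rabbitmq-ha-0..2` and the `default` namespace are assumptions based on typical StatefulSet naming, not taken from this thread):

```shell
# Sketch: dump logs from each RabbitMQ pod for inspection.
# Pod names and namespace are assumptions; adjust to your release.
for i in 0 1 2; do
  kubectl logs "rabbitmq-ha-$i" -n default > "rabbitmq-ha-$i.log"
done
```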
@hadigoh please close this issue. #4823 demonstrates how to work around it (and includes a default change to the config map). The effect is documented as well. I don't know when the review is going to happen, but beyond that no chart changes are necessary to prevent nodes that temporarily leave the cluster from being cleaned up.
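For context, the kind of setting involved looks roughly like this (key names are from the RabbitMQ cluster formation documentation; the exact values shown are illustrative assumptions, not necessarily what #4823 merged):

```ini
## Sketch of a rabbitmq.conf fragment (RabbitMQ 3.7+ config format).
## only_log_warning = true makes node cleanup log unknown nodes
## instead of force-removing them from the cluster.
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.node_cleanup.interval = 10
cluster_formation.node_cleanup.only_log_warning = true
```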
Thanks a lot @michaelklishin |
#4823 is in.
Hi Michael, I have installed the rabbitmq-ha Helm chart on an Azure Kubernetes cluster. After a few days, one pod restarts automatically. There is nothing unusual in the logs except one line that mentions a Mnesia table exit. Any ideas what might be wrong?
Is this a request for help?:
No
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Version of Helm and Kubernetes:
Helm:
Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
Kubernetes:
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:21:50Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"", Minor:"", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2018-01-26T19:04:38Z", GoVersion:"go1.9.1", Compiler:"gc", Platform:"linux/amd64"}
Which chart:
stable/rabbitmq-ha
What happened:
On pod restart, the restarted pod isn't able to rejoin the cluster.
The message is:
init terminating in do_boot ({error,{inconsistent_cluster,Node 'rabbit@172.17.0.26' thinks it's clustered with node 'rabbit@172.17.0.19', but 'rabbit@172.17.0.19' disagrees}})
What you expected to happen:
The restarted pod should be able to rejoin the cluster.
This page https://www.rabbitmq.com/clustering.html says that the node should run
rabbitmqctl reset
before rejoining.

How to reproduce it (as minimally and precisely as possible):
Restart one of the pods; after that, the restarted pod won't be able to rejoin the cluster.
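The manual reset procedure referenced above is, per the clustering guide, roughly the following (shown for context only; the chart maintainers advise against resetting a restarted cluster member in this scenario):

```shell
# Sketch of the documented manual reset sequence,
# run on the node being reset before it rejoins the cluster:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
```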
Anything else we need to know:
Image used: rabbitmq:3.7-alpine
Chart version: 1.0.2
minikube version: v0.25.0