Cluster doesn't recover if all rabbitmq-server pods deleted from cluster #609
Comments
Hi, thanks for this report. This is a known issue that we are planning to address soon (see #578). If you are deploying a new cluster, you can set If you want to recover an existing cluster, you need to perform the following steps (
Now, if you delete all your pods, they will all get started in parallel, which solves the problem. The main disadvantage of
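The parallel-start behaviour mentioned above maps to the StatefulSet's podManagementPolicy. As a hedged sketch only (a values.yaml fragment in the same shape as the override block used later in this issue, not the exact setting elided from the comment above):

```yaml
# Sketch: start all pods at once instead of one-by-one (the StatefulSet
# default, OrderedReady), so restarted nodes can find their peers while
# waiting for Mnesia tables.
override:
  statefulSet:
    spec:
      # Note: podManagementPolicy is immutable on an existing StatefulSet;
      # the StatefulSet must be recreated for this to take effect.
      podManagementPolicy: Parallel
```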
Thank you Michal.
This is fixed by PR #621. The PR is on the main branch, not in a released version just yet. Will close after it's released.
@sheiks The fix is released in
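To check whether a given installation already contains the fix, one way is to look at the operator image version. A hedged sketch, assuming the operator was installed from the default release manifest (which creates a deployment named rabbitmq-cluster-operator in the rabbitmq-system namespace; adjust both names if your install differs):

```shell
# Print the image (and thus the version tag) the operator is running with.
kubectl -n rabbitmq-system get deployment rabbitmq-cluster-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```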
Describe the bug
The RabbitMQ cluster cannot recover if someone deletes all pods in the RabbitMQ cluster using the kubectl CLI.
To Reproduce
Steps to reproduce the behavior:
kubectl delete pods rabbitmq-server-0 rabbitmq-server-1 rabbitmq-server-2
kubectl get pods
rabbitmq-server-0 0/1 Running 5 69m
kubectl logs -f rabbitmq-server-0
2021-02-17 14:29:41.001 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2021-02-17 14:30:11.002 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:30:11.002 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2021-02-17 14:30:41.002 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:30:41.003 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2021-02-17 14:31:11.004 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:31:11.004 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2021-02-17 14:31:41.005 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:31:41.005 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2021-02-17 14:32:11.006 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:32:11.006 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2021-02-17 14:32:41.007 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:32:41.007 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2021-02-17 14:33:11.008 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:33:11.008 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2021-02-17 14:33:41.009 [warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-server-2.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-1.rabbitmq-nodes.rabbitmq-cl-op-poc','rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq-cl-op-poc'],[rabbit_durable_queue]}
2021-02-17 14:33:41.009 [info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
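The log above shows each node refusing to start until it can see its peers' Mnesia tables; with ordered pod startup only one pod runs at a time, so no node ever sees a peer and the retries run out. A hedged sketch of one commonly documented manual recovery (mirroring the Bitnami doc linked below; these are not the exact steps elided from the maintainer's comment, and force-booting is only safe if the chosen node holds the most recent data):

```shell
# Force the first node to boot without waiting for its peers' tables,
# so the remaining nodes have a running peer to rejoin.
kubectl exec rabbitmq-server-0 -- rabbitmqctl stop_app
kubectl exec rabbitmq-server-0 -- rabbitmqctl force_boot
kubectl exec rabbitmq-server-0 -- rabbitmqctl start_app
```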
The values.yaml file below was used with https://github.com/rabbitmq/cluster-operator/tree/main/charts/rabbitmq
labels:
  label1: foo
  label2: bar
annotations:
  annotation1: foo
  annotation2: bar
replicas: 3
imagePullSecrets:
service:
  type: LoadBalancer
resources:
  requests:
    cpu: 100m
    memory: 1Gi
  limits:
    cpu: 100m
    memory: 1Gi
tolerations:
  - operator: "Equal"
    value: "rabbitmq"
    effect: "NoSchedule"
rabbitmq:
  additionalPlugins:
    - rabbitmq_shovel
    - rabbitmq_shovel_management
  additionalConfig: |
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
  envConfig: |
    PLUGINS_DIR=/opt/rabbitmq/plugins:/opt/rabbitmq/community-plugins
  advancedConfig: |
    [
      {ra, [
        {wal_data_dir, '/var/lib/rabbitmq/quorum-wal'}
      ]}
    ].
terminationGracePeriodSeconds: 42
skipPostDeploySteps: true
override:
  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: rabbitmq
              ports:
                - containerPort: 12345 # opens an additional port on the rabbitmq server container
                  name: additional-port
                  protocol: TCP
Expected behavior
We had seen this problem when we were using Bitnami images; the solution for that case is documented here: https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq#recovering-the-cluster-from-complete-shutdown
Maybe it would be good to document the same for cluster-operator as well.