
[BUG] Reading stale RabbitmqCluster information can lead to undesired statefulset deletion #648

Closed
srteam2020 opened this issue Mar 30, 2021 · 3 comments · Fixed by #651
Labels: bug Something isn't working

@srteam2020 (Contributor)

Describe the bug

We find that rabbitmq-cluster-operator can accidentally delete a statefulset when the controller restarts and talks to a stale apiserver in an HA k8s cluster. After some inspection, we find that the root cause is stale data served by that apiserver, which makes the controller believe the RabbitmqCluster is about to be deleted (it carries a non-zero DeletionTimestamp) when it actually is not. One potential approach to handle this issue is to label each statefulset with the UID of its RabbitmqCluster, and check the label before deleting the statefulset (see the sketch below).
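
A minimal sketch of the labeling idea, assuming a controller-runtime based reconciler; the label key `rabbitmq.com/cluster-uid`, the helper name, and the import path for the API types are our assumptions, not the operator's actual code:

```go
package controllers

import (
	appsv1 "k8s.io/api/apps/v1"

	rabbitmqv1beta1 "github.com/rabbitmq/cluster-operator/api/v1beta1"
)

// clusterUIDLabel is a hypothetical label key recording which RabbitmqCluster
// instance a StatefulSet was created for.
const clusterUIDLabel = "rabbitmq.com/cluster-uid"

// labelStatefulSet stamps the StatefulSet with the owning RabbitmqCluster's
// UID at creation time, so later reconciles can verify ownership before
// deleting anything.
func labelStatefulSet(sts *appsv1.StatefulSet, cluster *rabbitmqv1beta1.RabbitmqCluster) {
	if sts.Labels == nil {
		sts.Labels = map[string]string{}
	}
	sts.Labels[clusterUIDLabel] = string(cluster.UID)
}
```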

To Reproduce

Steps to reproduce the behavior:

  1. Create a RabbitmqCluster named rabbitmq-cluster1 in an HA k8s cluster. The controller talks to apiserver1, and reconcile() creates a statefulset for rabbitmq-cluster1.
  2. Delete rabbitmq-cluster1. Apiserver1 sends update events with a non-zero DeletionTimestamp to the controller, and the controller deletes the statefulset of rabbitmq-cluster1 in prepareForDeletion. Meanwhile, apiserver2 is partitioned, so its watch cache stops at the moment rabbitmq-cluster1 is tagged with a non-zero DeletionTimestamp.
  3. Create a RabbitmqCluster with the same name again. rabbitmq-cluster1 and its statefulset come back, but apiserver2 still holds the stale view that rabbitmq-cluster1 has a non-zero DeletionTimestamp and is about to be deleted.
  4. The controller crashes due to a node failure and restarts, this time talking to the stale apiserver2. The restarted controller reads the stale rabbitmq-cluster1 from apiserver2, which still carries the non-zero DeletionTimestamp. Since the controller identifies the statefulset only by the name of the RabbitmqCluster, the statefulset belonging to the newly created RabbitmqCluster is deleted in prepareForDeletion.

Expected behavior
The controller should be able to differentiate between statefulsets belonging to different RabbitmqCluster instances that have shared the same name over time. When it reads stale information about a (previously deleted) RabbitmqCluster, the controller should check whether the existing statefulset really belongs to that RabbitmqCluster (e.g., by comparing UIDs) before performing any deletion, as in the sketch below.
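
A hedged sketch of that check, reusing the hypothetical `clusterUIDLabel` and imports from the sketch above. Because a recreated RabbitmqCluster gets a fresh UID, the stale object read from apiserver2 fails this comparison and the deletion is skipped:

```go
// statefulSetBelongsTo reports whether sts was created for exactly this
// RabbitmqCluster instance. A recreated cluster with the same name carries a
// different UID, so a stale read of the old (deleted) object fails the check.
func statefulSetBelongsTo(sts *appsv1.StatefulSet, cluster *rabbitmqv1beta1.RabbitmqCluster) bool {
	return sts.Labels[clusterUIDLabel] == string(cluster.UID)
}
```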

Version and environment information

  • RabbitMQ: 3.8.12
  • RabbitMQ Cluster Operator: 4f13b9a (main branch)
  • Kubernetes: 1.18.9

Additional context

We are willing to issue a PR to help fix this issue.
As mentioned above, we can label each statefulset with the UID of the RabbitmqCluster, and check the label before deleting the statefulset to ensure we are deleting the correct statefulset.

srteam2020 added the bug label on Mar 30, 2021
embano1 commented Mar 30, 2021

I think using Preconditions in DeleteOptions would be a good way to address this:

```go
type DeleteOptions struct {
	unversioned.TypeMeta `json:",inline"`

	// Optional duration in seconds before the object should be deleted. Value must be non-negative integer.
	// The value zero indicates delete immediately. If this value is nil, the default grace period for the
	// specified type will be used.
	GracePeriodSeconds *int64 `json:"gracePeriodSeconds,omitempty"`

	// Must be fulfilled before a deletion is carried out. If not possible, a 409 Conflict status will be
	// returned.
	Preconditions *Preconditions `json:"preconditions,omitempty"`

	// Should the dependent objects be orphaned. If true/false, the "orphan"
	// finalizer will be added to/removed from the object's finalizers list.
	OrphanDependents *bool `json:"orphanDependents,omitempty"`
}
```
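
With controller-runtime this could look roughly like the sketch below; the helper name is ours, and it assumes the caller already knows the UID of the StatefulSet it intends to delete (e.g. recorded when the StatefulSet was created):

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteStatefulSetIfUIDMatches asks the apiserver to delete sts only if the
// stored object still has expectedUID. If the name now points at a different
// object (e.g. after a delete-and-recreate), the precondition fails with a
// 409 Conflict and nothing is deleted.
func deleteStatefulSetIfUIDMatches(ctx context.Context, c client.Client, sts *appsv1.StatefulSet, expectedUID types.UID) error {
	err := c.Delete(ctx, sts, client.Preconditions{UID: &expectedUID})
	if apierrors.IsConflict(err) {
		// UID mismatch: this is not the StatefulSet we meant to delete.
		return nil
	}
	return client.IgnoreNotFound(err)
}
```

Since the precondition is evaluated server-side against the stored object, it stays correct even when the controller's own view of the cluster is stale.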

@srteam2020 (Contributor, Author)

@embano1 Thanks for the pointer!
Yes, using Preconditions to specify the UID for deletion is a good way to address this. We can label each RabbitmqCluster with the UID of its statefulset, and specify the UID of the statefulset in the precondition when performing deletion.
We will issue a PR for this.

Zerpet (Collaborator) commented Apr 7, 2021

Thank you @srteam2020 for reporting this and for submitting a PR to fix it. We discussed this issue yesterday in our internal sync up and we will have a look at your PR, likely tomorrow.

srteam2020 added a commit to srteam2020/cluster-operator that referenced this issue Apr 7, 2021
srteam2020 added a commit to srteam2020/cluster-operator that referenced this issue Apr 7, 2021
ChunyiLyu added a commit that referenced this issue Apr 9, 2021
fix #648: delete the statefulset with precondition set to the correct…