Change destroy operation to use foreground cascading delete #2379
Conversation
Does the PR have any schema changes? Looking good! No breaking changes found.
LGTM! Appears to do what it says on the box :)
I think this is a reasonable change. Given that we'd like to somewhat manage the lifecycle and ensure resources are deleted correctly, switching over to foreground cascading delete seems like a reasonable trade-off vs speed.
@lblackstone Loved the detailed PR description with motivation and explanation of the change. I'd love to see it extended with the testing approach: why you didn't think we needed extra tests, whether you tested it manually, what kind of risks it could have brought in, etc.
I added some additional detail to the PR description.
Proposed changes
By default, Kubernetes uses "background cascading deletion" (BCD) to clean up resources. This works in most cases with the eventual consistency model as resources are garbage collected. However, there are cases where BCD can lead to stuck resources due to race conditions between dependents. A concrete example is an application Deployment that includes a volume mount managed by a Container Storage Interface (CSI) driver. The underlying Pods managed by this Deployment depend on the CSI driver to unmount the volume on teardown, and this process can take some time. Thus, if a Namespace containing both the CSI driver and the application Deployment is deleted, it is possible for the CSI driver to be removed before it has finished tearing down the application Pods, leaving them stuck in a "Terminating" state.
A reliable way to avoid this race condition is by using "foreground cascading deletion" (FCD) instead. FCD blocks deletion of the parent resource until any children have been deleted. In the previous example, the application Deployment resource would not be deleted until all of the underlying Pods had unmounted the CSI volume and finished terminating. Once the application Deployment is gone, then Pulumi can safely clean up the CSI driver as well.
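At the API level, the deletion mode is selected via the `propagationPolicy` field of the `DeleteOptions` body sent with the DELETE request. A minimal sketch of the two request bodies, built with the Python standard library rather than a real Kubernetes client:

```python
import json

# DeleteOptions body requesting foreground cascading deletion: the API
# server keeps the parent object around (with a "foregroundDeletion"
# finalizer) until all of its dependents have been deleted.
foreground = {
    "apiVersion": "v1",
    "kind": "DeleteOptions",
    "propagationPolicy": "Foreground",
}

# The Kubernetes default: the parent is removed immediately and the
# garbage collector deletes dependents asynchronously in the background.
background = {
    "apiVersion": "v1",
    "kind": "DeleteOptions",
    "propagationPolicy": "Background",
}

print(json.dumps(foreground))
print(json.dumps(background))
```

The same choice is exposed on the command line as `kubectl delete --cascade=foreground` versus the default `--cascade=background`.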
One downside of this approach is that resource deletion can take longer to resolve since Kubernetes is explicitly waiting on the delete operation to complete. However, this increases reliability of the delete operation by making it less prone to race conditions, so the tradeoff seems worth it.
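While a foreground deletion is in progress, the parent object remains visible with a `deletionTimestamp` set and a `foregroundDeletion` finalizer; the API server only removes the object once its finalizer list is empty. A hypothetical snapshot of a Deployment mid-deletion (names and timestamps are illustrative, not from this PR):

```python
# Hypothetical snapshot of a Deployment while foreground deletion is in
# progress: deletionTimestamp is set, and the "foregroundDeletion"
# finalizer blocks removal until all dependent Pods are gone.
deployment_mid_delete = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "app",                                # illustrative name
        "deletionTimestamp": "2023-01-01T00:00:00Z",  # illustrative time
        "finalizers": ["foregroundDeletion"],
    },
}

def deletion_blocked(obj):
    # The object persists as long as any finalizer remains on it.
    return bool(obj["metadata"].get("finalizers"))

print(deletion_blocked(deployment_mid_delete))
```

This is the mechanism behind the longer delete times noted above: the blocking is implemented as an ordinary finalizer wait, not a separate API.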
This PR doesn't include additional testing because this scenario is already well covered by existing tests. Every test includes a destroy operation, which exercises the new behavior. This change was confirmed to fix the customer issue, and manual testing was also performed.
Related issues (optional)
Fix https://github.com/pulumi/customer-support/issues/931