-
Notifications
You must be signed in to change notification settings - Fork 22
Delete of spire-oidc might get stuck if the csi driver got removed first #330
Comments
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Any pod using a csi driver will behave badly when you pull the csi driver out from under it... It wont just affect spire-spiffe-oidc-discovery-provider, but any workload using the regular spiffe csi driver. If nested spire is being used, spire-server will malfunction if the upstream csi driver is pulled too. |
Here is a command to find out if the driver is being used by any pod:
|
I don't know how safe it is to pull the kubernetes finalizer. There may be other things that get affected by that, not just the spiffe csi driver.... If the user gets into that state with their own workload, I think we should probably recommend reinstalling the csi driver so it can properly delete the other pods before removal rather then recommend they remove finalizers. As for the workloads within our chart, while I generally don't like helm hooks much, a pre-delete hook feels pretty safe and reasonable to remove the resources that will get deleted anyway to make sure they are deleted before the csi driver gets removed. We need to test that carefully though to make sure its safe in all conditions. |
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
I just tried purposefully failing a pre-delete hook.
It leaves everything in place and the job in Error state, so the logs can be looked at. So, I think we can have 3 jobs, weighted in the right order:
|
It is generally not safe at all, the best case scenario involves the admin being fully aware of all the tasks the finalizer would have performed, and then deciding which ones still need to be performed, and then manually performing them. Part of this blocked one of my enterprise customers from using the CSI driver, as they don't expose the ability to perform such administrative operations to the same developers they permit deployment and removal of Helm Charts, and the authorized administrative team doesn't have the tolerance to perform this manual cleanup frequently (aka once).
The pre-delete hook could ensure the driver is installed, if values.yaml is configured to support it. The post-delete hook could ensure the driver is removed. This ordering should help, and will minimize future failed deployment removals.
I know the hooks aren't very elegant, but this is the use case that created them. I would not recommend removing the pods in pre-delete, as ensuring the CSI is installed / reinstalled in pre-delete has the "bulk" of the application removal located where it is expected. Once the work is done here, generally it should become easier to expand the pattern to all of the other helm hooks. The CSI driver then becomes part of the pre-install hook, and is "ensured to be present" in the post-install hook" While some of these steps might seem like busy work, they make it far easier to handle Helm rollbacks, including rollbacks from uninstallation, or rollbacks of installation that effectively remove the deployment. Note: sub-chart ordering complicates this process significantly, because sub-charts are intended to manage independent application dependencies, not components of the same micro-service. Much like I wouldn't put a service into a sub-chart and its ConfigMap into a different sub-chart and then attempt to figure out how to get them ordered, the CSI driver should not be in a different sub-chart from its users. |
I dont follow. What does csi install/reinstall have to do with deleting the chart? The issue described is when deleting the chart and helm's lack of ordering. Also, I don't think you can rollback a chart deletion? I could be wrong though. I dont think subcharts have much effect on hooks either. All the same type of hooks are ran at the same time regardless of nesting, unless you specify an explicit ordering, in which case you have control even over multiple subcharts via priority annotation. |
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
* 94a2b72 Merge pull request #324 from spiffe/enable-testing-multiple-charts * e426bc0 Downgrade chart-testing tool to 3.8.0 * 09b4664 Utilize ct install --github-groups in ci workflow * 42086bd Run example tests also on all k8s versions * 4848c48 Improve Makefile help and implementation * db06038 Increase some timeouts, trying to fix the tests * 54ed71f Add back tests for examples * 4b4cef1 Skip namespace-override test because of #330 * 380979c Prevent ci folder ending up in Helm package * 06ddb7c Use chart-testing ci/*-values.yaml for testing * a4c1de7 Add basic unit test framework (#390) * 088b296 Improve tornjak service API to have object structure (#392) * 522066e Align tornjak clientCA naming convention (#393) * f05cb4f Add option to configure TLS/mTLS endpoint for Tornjak (#338) * f461a01 Bump test chart dependencies (#386) * ce39e82 Bump test chart dependencies (#382) * 19d3208 Fix oidc provider config change not rolling out (#383) * 0197621 Bump helm/kind-action from 1.7.0 to 1.8.0 (#384) * 3ed1859 Add missing tolerations config to daemonsets (#381) * 4ad68b1 Add namespace to spiffe-oidc-discovery-provider RBAC definitions (#379) * c1b1dd3 Add additional domains to JWT issued items. (#230) * 3405e13 Bump test chart dependencies * 81452d5 Fix missing spiffe-csi-driver imagePullSecrets template (#376) Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
When doing a helm uninstall in some situations the spire-oidc pod gets stuck in a terminating state.
This also prevents you from deleting the namespace.
This could happen already when you do a simple
helm uninstall
.The best approach to prevent this issue is to delete the oidc deployment first.
In case you already got stuck your namespace removal you can do following:
Run following in a separate terminal
Then run following to cleanup the namespace
We probably should add a piece of documentation to inform the users of our chart. It could basically happen with any pod/namespace that depends on the csi driver.
The text was updated successfully, but these errors were encountered: