Deleting NS openstad gets stuck on DigitalOcean #31

Open

ToshKoevoets opened this issue Jun 21, 2020 · 13 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

ToshKoevoets (Collaborator) commented Jun 21, 2020

I've now had it happen for the second time that an NS delete got stuck in the Terminating phase while deleting on DigitalOcean. I'm not sure whether it's a problem with the repo or with the vendor / the way it's installed.
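Here is the stuck namespace object (output of kubectl get namespace openstad -o json, or similar):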

{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "creationTimestamp": "2020-06-20T12:25:08Z",
        "deletionTimestamp": "2020-06-21T14:21:14Z",
        "labels": {
            "name": "openstad"
        },
        "name": "openstad",
        "resourceVersion": "510470",
        "selfLink": "/api/v1/namespaces/openstad",
        "uid": "4de827ad-5c03-4585-94a7-9cabdd0c2d1f"
    },
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2020-06-21T14:21:24Z",
                "message": "All resources successfully discovered",
                "reason": "ResourcesDiscovered",
                "status": "False",
                "type": "NamespaceDeletionDiscoveryFailure"
            },
            {
                "lastTransitionTime": "2020-06-21T14:21:24Z",
                "message": "All legacy kube types successfully parsed",
                "reason": "ParsedGroupVersions",
                "status": "False",
                "type": "NamespaceDeletionGroupVersionParsingFailure"
            },
            {
                "lastTransitionTime": "2020-06-21T14:21:24Z",
                "message": "Failed to delete all resource types, 3 remaining: conversion webhook for acme.cert-manager.io/v1alpha2, Kind=Order failed: Post https://openstad-cert-manager-webhook.openstad.svc:443/convert?timeout=30s: service \"openstad-cert-manager-webhook\" not found, conversion webhook for cert-manager.io/v1alpha2, Kind=Certificate failed: Post https://openstad-cert-manager-webhook.openstad.svc:443/convert?timeout=30s: service \"openstad-cert-manager-webhook\" not found, conversion webhook for cert-manager.io/v1alpha2, Kind=CertificateRequest failed: Post https://openstad-cert-manager-webhook.openstad.svc:443/convert?timeout=30s: service \"openstad-cert-manager-webhook\" not found",
                "reason": "ContentDeletionFailed",
                "status": "True",
                "type": "NamespaceDeletionContentFailure"
            },
            {
                "lastTransitionTime": "2020-06-21T14:21:46Z",
                "message": "All content successfully removed",
                "reason": "ContentRemoved",
                "status": "False",
                "type": "NamespaceContentRemaining"
            },
            {
                "lastTransitionTime": "2020-06-21T14:21:46Z",
                "message": "All content-preserving finalizers finished",
                "reason": "ContentHasNoFinalizers",
                "status": "False",
                "type": "NamespaceFinalizersRemaining"
            }
        ],
        "phase": "Terminating"
    }
}

ghost self-assigned this Jun 22, 2020
ghost added the bug label Jun 22, 2020

ghost commented Jun 22, 2020

Hello @ToshKoevoets,

Thanks for reporting the issue. We have found that this issue is caused by the CRDs of Cert-Manager: they are not being uninstalled and therefore stall the uninstall process. I will work on implementing a fix!


ghost commented Jun 22, 2020

As a temporary fix, I will share how I resolved it locally.
kubectl get crd gave me a list of CRDs:

NAME                                             CREATED AT
certificaterequests.cert-manager.io              2020-06-19T14:02:48Z
certificates.cert-manager.io                     2020-06-19T14:02:49Z
challenges.acme.cert-manager.io                  2020-06-19T14:02:50Z
ciliumclusterwidenetworkpolicies.cilium.io       2020-06-01T17:42:50Z
ciliumendpoints.cilium.io                        2020-06-01T17:42:51Z
ciliumnetworkpolicies.cilium.io                  2020-06-01T17:42:48Z
ciliumnodes.cilium.io                            2020-06-01T17:42:52Z
clusterissuers.cert-manager.io                   2020-06-19T14:02:52Z
issuers.cert-manager.io                          2020-06-19T14:02:53Z
orders.acme.cert-manager.io                      2020-06-19T14:02:54Z
volumesnapshotclasses.snapshot.storage.k8s.io    2020-06-01T17:41:51Z
volumesnapshotcontents.snapshot.storage.k8s.io   2020-06-01T17:41:51Z
volumesnapshots.snapshot.storage.k8s.io          2020-06-01T17:41:51Z

Here I see that the Cert-Manager CRDs are still on the cluster.

I manually removed all cert-manager CRDs.
The command kubectl get crd | grep cert gave me the list of cert-manager CRDs:

certificaterequests.cert-manager.io              2020-06-19T14:02:48Z
certificates.cert-manager.io                     2020-06-19T14:02:49Z
challenges.acme.cert-manager.io                  2020-06-19T14:02:50Z
clusterissuers.cert-manager.io                   2020-06-19T14:02:52Z
issuers.cert-manager.io                          2020-06-19T14:02:53Z
orders.acme.cert-manager.io                      2020-06-19T14:02:54Z

I then ran kubectl delete crd $name for every CRD returned by the grep cert filter.
This left me with the following CRDs:

NAME                                             CREATED AT
ciliumclusterwidenetworkpolicies.cilium.io       2020-06-01T17:42:50Z
ciliumendpoints.cilium.io                        2020-06-01T17:42:51Z
ciliumnetworkpolicies.cilium.io                  2020-06-01T17:42:48Z
ciliumnodes.cilium.io                            2020-06-01T17:42:52Z
volumesnapshotclasses.snapshot.storage.k8s.io    2020-06-01T17:41:51Z
volumesnapshotcontents.snapshot.storage.k8s.io   2020-06-01T17:41:51Z
volumesnapshots.snapshot.storage.k8s.io          2020-06-01T17:41:51Z

After deleting the last Cert-Manager CRD, the namespace finished terminating and disappeared.
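
For anyone hitting the same thing, that cleanup could be scripted in one go, roughly like this (assuming every Cert-Manager CRD name contains "cert-manager", as in the listing above):

kubectl get crd -o name | grep cert-manager | xargs kubectl delete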

I will work on a fix that will either keep the CRDs or delete them correctly and automatically.


ghost commented Jun 22, 2020

Also found this in the Helm Documentation: https://helm.sh/docs/topics/charts/#limitations-on-crds

CRDs are never deleted. Deleting a CRD automatically deletes all of the CRD's contents across all namespaces in the cluster. Consequently, Helm will not delete CRDs.

Which means we shouldn't delete the CRDs, but rather keep them.


ghost commented Jun 22, 2020

I don't see a good workaround for this bug inside the Helm chart itself.

On the Helm Chart page for Cert-Manager, the installCRDs property is described as: "If true, CRD resources will be installed as part of the Helm chart. If enabled, when uninstalling CRD resources will be deleted causing all installed custom resources to be DELETED." I interpreted that as meaning the Cert-Manager CRDs would be deleted on uninstall when this is enabled, but it doesn't do that.

For now, I consider this a bug in the Cert-Manager chart and will create an OpenStad-CRD chart.
This chart has one goal: install all CRDs for OpenStad.

This means the ClusterIssuer can be installed without multiple steps, and it prevents hanging when deleting the NS of the OpenStad chart.
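
Roughly, the idea is a layout like this (names are illustrative, not the final structure):

openstad-crds/              # separate chart, installed once and left in place
  Chart.yaml
  templates/
    cert-manager-crds.yaml  # the Cert-Manager CRD manifests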

In the future, we may want to look into a better option.


ghost commented Jun 22, 2020

Or actually, since these are the only CRDs used so far, it may be better to install Cert-Manager manually instead.
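
For the record, installing Cert-Manager separately would look roughly like this (a sketch; in practice, pin a chart version with --version):

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true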


ghost commented Jun 22, 2020

Interestingly enough, I have tried the following and got it working on DigitalOcean:

Install:

helm install openstad . --namespace openstad --create-namespace --values custom-values.yaml

Upgrade (for ClusterIssuer):

helm upgrade openstad . --namespace openstad --values custom-values.yaml --set clusterIssuer.enabled=true

Uninstall:

helm uninstall openstad --namespace openstad

When I replaced the uninstall step with:

kubectl delete ns openstad

It started hanging and couldn't delete the NS.

After that, I used this to delete the cert-manager CRDs:

kubectl delete crd certificaterequests.cert-manager.io certificates.cert-manager.io clusterissuers.cert-manager.io challenges.acme.cert-manager.io issuers.cert-manager.io orders.acme.cert-manager.io

And it deleted the NS successfully.

ToshKoevoets (Collaborator, Author) commented Jun 22, 2020

I'm not sure when it happens, but it has happened twice now. Did you have working certs? So it's reproducible.


ghost commented Jun 22, 2020

I did get working certificates.
But I am only able to effectively reproduce it on DigitalOcean by deleting the namespace before uninstalling the chart.
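
In other words, the repro on DigitalOcean is roughly (using the same values file as in the earlier comment):

helm install openstad . --namespace openstad --create-namespace --values custom-values.yaml
kubectl delete ns openstad   # without running helm uninstall first; the namespace then hangs in Terminating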

ghost added the help wanted label Jun 22, 2020
ghost pushed a commit that referenced this issue Jun 22, 2020

ghost commented Jun 22, 2020

I have added a section in troubleshooting.md for this issue.

The fix for the case where I deleted the namespace before uninstalling the chart seems to work, but I'm not sure whether there are other causes. (Nor do I know whether the second cause named in troubleshooting.md is relevant.)

I will not consider this issue fixed yet. If anyone runs into this problem again after following the prevention section under the header Namespace hangs on terminations in troubleshooting.md, please let me know and share the logs of the steps you took to reach this bug.

ghost pushed a commit that referenced this issue Jun 22, 2020
Causation 2 is an edge case: another, but unlikely, way to get the same error.
ghost removed their assignment Jun 22, 2020
ToshKoevoets (Collaborator, Author) commented Jun 22, 2020

@S-Nigel

My bad, I didn't read your comment correctly. I haven't tried the uninstall step; it makes sense that that is the better way to do it.

Will try it in the coming days when I do some debugging.


ghost commented Jun 22, 2020

@ToshKoevoets No worries.
I can only replicate it in 2 different ways, both documented in the troubleshooting markdown.

Let me know after you have debugged a bit whether you can find other ways to break it ;)

ToshKoevoets (Collaborator, Author) commented:

Still an issue: helm uninstall doesn't delete the CRDs.

ToshKoevoets reopened this Jul 12, 2020

ghost commented Jul 20, 2020

This is kind of an interesting issue.
We can't just delete CRDs with Helm. Once we've created a CRD, other components may depend on it, so it is risky to delete it. For this reason, CRDs are often left unmanaged (e.g. Helm doesn't delete them by default).

From the Helm Documentation:

  • CRDs are never reinstalled. If Helm determines that the CRDs in the crds/ directory are already present (regardless of version), Helm will not attempt to install or upgrade.
  • CRDs are never installed on upgrade or rollback. Helm will only create CRDs on installation operations.
  • CRDs are never deleted. Deleting a CRD automatically deletes all of the CRD's contents across all namespaces in the cluster. Consequently, Helm will not delete CRDs.

Operators who want to upgrade or delete CRDs are encouraged to do this manually and with great care.
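
One way to follow that advice here would be to manage the Cert-Manager CRDs outside of Helm entirely; cert-manager publishes its CRDs as a standalone manifest that can be applied (and later removed) with kubectl, roughly like this (the version in the URL is only an example):

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.crds.yaml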
