Unable to create cluster issuer #4174

Closed
remcohaszing opened this issue Jul 2, 2021 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@remcohaszing

Describe the bug:

Today cert-manager suddenly stopped issuing certificates. I hoped to fix it by upgrading cert-manager from v1.3.1 to v1.4.0.

After identifying the existing instances of the cert-manager CRDs, I found there’s only a cluster issuer, nothing else.

I ran the following commands to upgrade.

$ kubectl delete clusterissuers.cert-manager.io letsencrypt-dev
$ helm uninstall --namespace managed cert-manager
$ helm install cert-manager jetstack/cert-manager --namespace managed --set 'installCRDs=true'
$ kubectl apply -f config/development/cluster-issuer.yaml
Error from server (InternalError): error when creating "config/development/cluster-issuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded

where the cluster issuer to create looks like this:

# Based on https://cert-manager.io/docs/configuration/acme/dns01/digitalocean
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dev
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@appsemble.com
    privateKeySecretRef:
      name: letsencrypt-dev
    solvers:
      - selector:
          dnsZones:
            - appsemble.review
            - mijneigenapp.nl
        dns01:
          digitalocean:
            tokenSecretRef:
              name: digitalocean-dns
              key: access-token
      - http01:
          ingress:
            class: nginx

TL;DR: I’m no longer able to create a cluster issuer.

Expected behaviour:

The cluster issuer is created.

Steps to reproduce the bug:

kubectl apply -f https://gitlab.com/appsemble/infra/wikis/config/development/cluster-issuer.yaml

Anything else we need to know?:

We upgraded the Kubernetes cluster from 1.20 to 1.21 yesterday.

Environment details:

  • Kubernetes version: 1.21.1
  • Cloud-provider/provisioner: DigitalOcean
  • cert-manager version: 1.4.0
  • Install method: helm

/kind bug

jetstack-bot added the kind/bug label Jul 2, 2021
@irbekrm
Contributor

irbekrm commented Jul 2, 2021

Hi @remcohaszing ,

Looks like the API server cannot reach the webhook. If you ran all those commands in sequence in a script, the webhook was probably not ready yet.
We currently have some ongoing work to provide a way to check whether an installation is ready: #4171

There's also https://github.com/alenkacz/cert-manager-verifier, or you could add something to the script that does a similar thing.
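
The "add something to the script" idea could be sketched roughly as follows in POSIX shell. This is an illustration only, not part of cert-manager or cert-manager-verifier: `retry` is a hypothetical helper, and the attempt count, delay, and timeout in the commented usage are arbitrary. The commented commands reuse the namespace and file path from the original report.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or ATTEMPTS runs have failed.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    # Success: stop retrying.
    if "$@"; then
      return 0
    fi
    # Out of attempts: report and fail.
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Possible usage in an install script (values are assumptions):
#   kubectl wait --namespace managed --for=condition=Available \
#     deployment/cert-manager-webhook --timeout=120s
#   retry 10 5 kubectl apply -f config/development/cluster-issuer.yaml
```

Retrying the `kubectl apply` itself exercises the path from the API server to the webhook, which a readiness check on the Deployment alone may not cover.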

@remcohaszing
Author

I didn’t run it as a script. When I try to create the cluster issuer again, it still fails.

These are my cert-manager deployments. They seem ready.

$ kubectl get deployments.apps -n managed
NAMESPACE     NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
managed       cert-manager                            1/1     1            1           71m
managed       cert-manager-cainjector                 1/1     1            1           71m
managed       cert-manager-webhook                    1/1     1            1           71m

The same goes for the pods.

$ kubectl get pod -n managed
NAME                                                    READY   STATUS    RESTARTS   AGE
cert-manager-5d7f97b46d-gnpwz                           1/1     Running   0          70m
cert-manager-cainjector-69d885bf55-6j9xd                1/1     Running   0          70m
cert-manager-webhook-5bc995fb5b-9xmtp                   1/1     Running   0          70m

cm-verifier seems to fail as well.

$ ./cm-verifier --namespace managed --debug
Waiting for deployments in namespace managed:
resource cert-manager-test already exists
Resource cert-manager-test created 
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": dial tcp 10.245.221.217:443: i/o timeout
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": dial tcp 10.245.221.217:443: i/o timeout
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": dial tcp 10.245.221.217:443: i/o timeout
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": dial tcp 10.245.221.217:443: i/o timeout
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": context deadline exceeded
resource test-selfsigned already deleted
resource selfsigned-cert already deleted
Deployment cert-manager READY! ヽ(•‿•)ノ
Deployment cert-manager-cainjector READY! ヽ(•‿•)ノ
Deployment cert-manager-webhook READY! ヽ(•‿•)ノ
error when waiting for certificate to be ready: Timeout reached: context deadline exceeded

The webhook logs don’t provide much information either.

$ kubectl logs -n managed cert-manager-webhook-5bc995fb5b-9xmtp
W0702 09:00:46.893039       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0702 09:00:46.895258       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0702 09:00:46.895708       1 webhook.go:69] cert-manager/webhook "msg"="using dynamic certificate generating using CA stored in Secret resource"  "secret_name"="cert-manager-webhook-ca" "secret_namespace"="managed"
I0702 09:00:46.896145       1 server.go:150] cert-manager/webhook "msg"="listening for insecure healthz connections"  "address"=":6080"
I0702 09:00:46.896310       1 server.go:163] cert-manager/webhook "msg"="listening for secure connections"  "address"=":10250"
I0702 09:00:46.896403       1 server.go:189] cert-manager/webhook "msg"="registered pprof handlers"  
I0702 09:00:46.898165       1 reflector.go:219] Starting reflector *v1.Secret (1m0s) from external/io_k8s_client_go/tools/cache/reflector.go:167
I0702 09:00:47.939976       1 dynamic_source.go:199] cert-manager/webhook "msg"="Updated serving TLS certificate"

@irbekrm
Contributor

irbekrm commented Jul 2, 2021

Retrying create of resource test-selfsigned, error: error when creating resource test-selfsigned/cert-manager-test. Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.managed.svc:443/mutate?timeout=10s": dial tcp 10.245.221.217:443: i/o timeout

Perhaps the webhook is not reachable from the Kubernetes API server, then. Is this DO managed Kubernetes? Depending on how the CNI is (or can be) set up, you might need to run the webhook on the host network; see the example for AWS EKS: https://cert-manager.io/docs/installation/compatibility/#aws-eks
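
For reference, a host-network setup of the kind described in the linked EKS notes comes down to two Helm chart values, `webhook.hostNetwork` and `webhook.securePort`. The sketch below only prints a candidate command rather than running it. `hostnetwork_upgrade_cmd` is a hypothetical helper; the release name, chart, and namespace are copied from the install command earlier in this issue; and port 10260 is an assumed free port, chosen because the default 10250 seen in the webhook logs would clash with the kubelet on the host network. Whether this helps on DigitalOcean was not tested in this thread.

```shell
#!/bin/sh
# hostnetwork_upgrade_cmd NAMESPACE [SECURE_PORT]
# Hypothetical helper: prints (does not execute) a helm command that moves the
# cert-manager webhook onto the host network. SECURE_PORT defaults to 10260,
# an assumed free port on the nodes.
hostnetwork_upgrade_cmd() {
  namespace="$1"
  secure_port="${2:-10260}"
  printf 'helm upgrade cert-manager jetstack/cert-manager --namespace %s --set installCRDs=true --set webhook.hostNetwork=true --set webhook.securePort=%s\n' \
    "$namespace" "$secure_port"
}

# Possible usage: review the printed command before running it by hand.
#   hostnetwork_upgrade_cmd managed
```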

@stvnksslr

stvnksslr commented Jul 3, 2021

I have been having a similar issue since upgrading to 1.21.1-do.1. No matter what I do, neither Issuers nor ClusterIssuers are able to reach the webhook.

This can be replicated on a fresh DOKS instance with just cert-manager installed. Happy to provide additional logs, but they look about the same as the above: no specific issues until an Issuer/ClusterIssuer is created, and then a timeout on the webhook. EDIT: DO seems to have pushed a new minor version, 1.21.2-do.2, which at least for me fixed the issue after a complete uninstall/reinstall.

@irbekrm
Contributor

irbekrm commented Jul 4, 2021

I cannot reproduce the issue on 1.21.2-do.2 (I don't have other 1.21 patch versions available for install).
I don't see release notes for 1.21.2-do.2 in the changelog yet, but there were networking-related changes in 1.21.2-do.1: https://docs.digitalocean.com/products/kubernetes/changelog/#1212-do1-2021-07-01

If it re-occurs it may be worth reaching out to DO folks (the actual issue appears to be that the Kubernetes API server in the managed control plane cannot reach a service in the data plane, specifically an admission webhook).
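
Both error strings seen in this thread ("context deadline exceeded" and "dial tcp ...: i/o timeout") are timeouts, which fits an unreachable service rather than a webhook that is down or misconfigured. A rough way to tell these cases apart when reading such errors is sketched below. The mapping is a heuristic, not something cert-manager defines: `classify_webhook_error` and its labels are hypothetical, and the "connection refused" and "x509" branches are assumptions about other common failure modes that did not occur in this issue.

```shell
#!/bin/sh
# classify_webhook_error MESSAGE
# Hypothetical helper: prints a rough guess at why a webhook call failed.
classify_webhook_error() {
  case "$1" in
    # Timeouts: the request never got an answer (as in this issue).
    *"i/o timeout"*|*"context deadline exceeded"*)
      echo "unreachable" ;;
    # Assumed case: the address answered but nothing accepted the connection.
    *"connection refused"*)
      echo "not-listening" ;;
    # Assumed case: the connection worked but certificate verification failed.
    *"x509:"*)
      echo "tls-trust" ;;
    *)
      echo "unknown" ;;
  esac
}

# Possible usage with one of the lines from the cm-verifier output:
#   classify_webhook_error 'dial tcp 10.245.221.217:443: i/o timeout'
```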

@remcohaszing
Author

I used this issue as an excuse to also upgrade our load balancer, which caused even more issues. I’ll report back here when I get everything up-and-running again, hopefully with a solution.

@remcohaszing
Author

remcohaszing commented Jul 7, 2021

The cert-manager issues seem to have been resolved by upgrading the cluster to 1.21.2-do.2.
