[stable/cert-manager] v0.6.0 - Internal error occurred: failed calling admission webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request #10869

rmuehlbauer · 2019-01-24T12:30:57Z

Is this a request for help?:

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT (maybe)

Version of Helm and Kubernetes:
→ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}

→ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.6-gke.3", GitCommit:"04ad69a117f331df6272a343b5d8f9e2aee5ab0c", GitTreeState:"clean", BuildDate:"2019-01-10T00:39:15Z", GoVersion:"go1.10.3b4", Compiler:"gc", Platform:"linux/amd64"}

Which chart:
cert-manager Version 0.6

What happened:
After upgrading, the cert-manager pod's log is full of messages like:
controller.go:147] certificates controller: Re-queuing item "some-certificate" due to error processing: Internal error occurred: failed calling admission webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
I've also run "helm delete --purge" and reinstalled the chart: same behaviour.
Therefore, I can reproduce the issue just by installing the cert-manager chart.

Anything else we need to know:
I've followed the install/upgrade instructions on https://cert-manager.readthedocs.io/en/latest/admin/upgrading/index.html and the upgrade went smoothly without any problems. After the upgrade, all the pods are also in "running" state.


haf · 2019-01-25T23:29:02Z

#10856

rmuehlbauer · 2019-01-28T09:20:31Z

I think #10856 might indeed be related somehow.
Before running the "helm upgrade" command, I labeled the existing namespace as described in the cert-manager documentation, and afterwards verified that the label is set correctly:
→ kubectl describe namespace cert-manager
Name:         ingress
Labels:       certmanager.k8s.io/disable-validation=true

haf · 2019-01-28T10:55:40Z

Good to know! I have a helm-let's-encrypt to upgrade going forward as well, so I'm not going to do that until this is resolved. /cc @kragniz

rmuehlbauer · 2019-01-30T15:08:46Z

Today I had to roll back completely to 0.5.2, as cert-manager 0.6.0 caused very strange side effects in my GKE environment:
Earlier today I upgraded to a new Kubernetes version and afterwards had huge problems with the cluster: two pods, "calico-typha-vertical-autoscaler" and "calico-node-vertical-autoscaler", didn't start anymore, and other cluster resources behaved strangely as well.
Pods restarted after error messages like
autoscaler.go:49] failed to discover apigroup for kind "DaemonSet": unable to retrieve the complete list of server APIs: admission.certmanager.k8s.io/v1beta1: an error on the server ("service unavailable") has prevented the request from succeeding
My first idea was that this might have been caused by the Kubernetes update, but that was not the case, as a second cluster, which had not been updated, had the same strange issues.
So I completely uninstalled cert-manager (also removed all CRDs) and installed version 0.5.2 again.
After that, I also had to recreate the ClusterIssuer, as it was gone as well.
Now everything is up and running again... For the moment I'm going to stay with the old cert-manager version; at least this version works for me and doesn't cause strange issues.

rmuehlbauer · 2019-02-18T15:28:34Z

Has anyone already tried cert-manager 0.6.5?

davi5e · 2019-02-19T04:49:56Z

@rmuehlbauer Trying with v0.6.5, same error... Any workarounds?

rmuehlbauer · 2019-02-19T08:07:14Z

at least not to my knowledge...

rmuehlbauer · 2019-02-25T13:43:41Z

Today I had some time to dig deeper and was finally able to resolve the issue with the new cert-manager versions. Maybe you can use this to sort things out on your side.

TL;DR:
There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.

After working my way through cert-manager's "getting started guide" and "troubleshooting guide", I found a note at the very bottom of the troubleshooting guide saying: "If the job continues to fail, please read the Webhook docs for additional information."
In those Webhook docs (https://cert-manager.readthedocs.io/en/latest/getting-started/webhook.html) I found an interesting piece of information about running cert-manager on private GKE clusters.
In GKE environments, the Kubernetes masters only have very limited access to their nodes, so to be able to use cert-manager's webhook you have to allow those connections as well.
This was the missing piece of information. From there it was easy to work my way through the GKE docs (https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules), gather all the little pieces together, and create a new firewall rule, which solved the issue for me.
Basically, I allowed the Kubernetes master network to access the webhook pod on port 6443. (If you take a closer look at the webhook pod, you will see that it actually listens on that port; only the webhook service maps its port 443 to the pod's port 6443.)

I hope this piece of information helps a bit to sort out situations on your side.

rmuehlbauer · 2019-02-26T08:12:21Z

issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod

Since the apiserver containers run with host networking and Pod networking is not routable outside the cluster, we can simply open up traffic for the relevant port. Reference to the issue: helm/charts#10869 (comment)

gajus · 2019-03-15T15:57:55Z

> issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod

What is the gcloud command line to create this rule?

Izopi4a · 2019-03-16T16:29:34Z

I would like to see that GKE command as well, please.

gajus · 2019-03-16T17:26:33Z

For the record, I did not have this problem when setting up cert-manager.

From my notes, here is literally everything that was needed to set up the cert-manager on a new cluster.

# Install the CustomResourceDefinition resources separately
kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.7/deploy/manifests/00-crds.yaml

# Create the namespace for cert-manager
kubectl create namespace cert-manager

# Label the cert-manager namespace to disable resource validation
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

# Install the cert-manager Helm chart
helm install \
  --name cert-manager \
  --namespace cert-manager \
  --version v0.7.0 \
  jetstack/cert-manager

kubectl get pods --namespace cert-manager

# Setup cluster issuer (using letsencrypt)

cat <<'EOF' | kubectl create -f -
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: 'gajus@gajus.com'
    privateKeySecretRef:
      name: letsencrypt-production
    http01: {}
EOF

# Set up the certificate (replace with your details)
cat <<'EOF' | kubectl apply -f -
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: queryalert-com
  namespace: default
spec:
  secretName: queryalert-com-tls
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  commonName: queryalert.com
  dnsNames:
  - queryalert.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - queryalert.com
EOF

Then just update the Ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ .Release.Name | quote }}
  labels:
    {{- include "release_labels" . | indent 4 }}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
+    certmanager.k8s.io/cluster-issuer: 'letsencrypt-production'
+    certmanager.k8s.io/acme-challenge-type: http01
+spec:
+  tls:
+    - hosts:
+      - queryalert.com
+      secretName: queryalert-com-tls
  rules:
    - host: queryalert.com
      http:
        paths:
          - path: /api
            backend:
              serviceName: {{ .Release.Name | quote }}
              servicePort: 8080

Izopi4a · 2019-03-18T09:25:05Z

With 0.7.0 it does indeed work, thanks.

rmuehlbauer · 2019-03-18T09:32:02Z

>> issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod
>
> What is the gcloud command line to create this rule?

Please have a look at #10869 (comment)

mlushpenko · 2019-04-03T20:06:43Z

So, we are running a private cluster on GKE:

Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.5-gke.5", GitCommit:"2c44750044d8aeeb6b51386ddb9c274ff0beb50b", GitTreeState:"clean", BuildDate:"2019-02-01T23:53:25Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}

I created a firewall rule:

CLUSTER=staging
REGION=europe-west4
SOURCE=$(gcloud container clusters describe $CLUSTER --region $REGION | grep masterIpv4CidrBlock| cut -d ':' -f 2 | tr -d ' ')
NETWORK=$(gcloud container clusters describe $CLUSTER --region $REGION | egrep '^network:' | cut -d ':' -f 2 | tr -d ' ')
TAGS=$(gcloud compute firewall-rules list --filter "name~^gke-$CLUSTER" --format 'value(targetTags.list():label=TARGET_TAGS)' | head -n 1)

gcloud compute firewall-rules create cert-manager-admission-webhook --action ALLOW --direction INGRESS --source-ranges $SOURCE --rules tcp:6443 --target-tags $TAGS --network $NETWORK

Then I tried repeating the steps from #10869 (comment) and got the same error when trying to create the ClusterIssuer:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request

Could you point out what else I am missing?
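The `SOURCE` lookup above greps the whole YAML printed by `gcloud container clusters describe`. If `masterIpv4CidrBlock` ever appears on more than one line, `SOURCE` ends up holding several lines and the firewall command is likely to fail. A minimal sketch of a parser that keeps only the first match, assuming the plain `key: value` layout shown in the test sample; the helper name `first_yaml_value` and the sample values are hypothetical. Asking gcloud for the field directly, e.g. `--format='value(privateClusterConfig.masterIpv4CidrBlock)'`, should avoid parsing altogether.

```shell
# Hypothetical helper: print the value of the first "KEY: value" line on stdin,
# ignoring indentation. Stopping at the first match keeps the result to one line.
first_yaml_value() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # drop YAML indentation
      if (index(line, key ":") == 1) {      # the line must start with "KEY:"
        val = substr(line, length(key) + 2) # everything after "KEY:"
        gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim surrounding whitespace
        print val
        exit
      }
    }'
}

# Usage against a live cluster (not run here):
#   DESC=$(gcloud container clusters describe "$CLUSTER" --region "$REGION")
#   SOURCE=$(printf '%s\n' "$DESC" | first_yaml_value masterIpv4CidrBlock)
```

Because `awk` exits on the first hit, a duplicated key can no longer turn `SOURCE` into a multi-line value.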

rmuehlbauer · 2019-04-04T13:27:17Z

> So, we are running a private cluster on GKE:
>
> Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.5-gke.5", GitCommit:"2c44750044d8aeeb6b51386ddb9c274ff0beb50b", GitTreeState:"clean", BuildDate:"2019-02-01T23:53:25Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}
>
> I created firewall
>
> CLUSTER=staging
> REGION=europe-west4
> SOURCE=$(gcloud container clusters describe $CLUSTER --region $REGION | grep masterIpv4CidrBlock| cut -d ':' -f 2 | tr -d ' ')
> TAGS=$(gcloud compute firewall-rules list --filter "name~^gke-$CLUSTER" --format 'value(targetTags.list():label=TARGET_TAGS)' | head -n 1)
>
> gcloud compute firewall-rules create cert-manager-admission-webhook --action ALLOW --direction INGRESS --source-ranges $SOURCE --rules tcp:6443 --target-tags $TAGS
>
> Then, I tried repeating steps from here #10869 (comment) and getting the same error when trying to create ClusterIssuer:
>
> Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request
>
> Could you point me what else am I missing?

Hmm... your SOURCE and TAGS variables seem to be populated with the correct values, at least when I tried your commands in my environment.
Your "firewall-rules create" line also looks good.
That is exactly what finally got it working in my case...
What about the resulting firewall rule: did you check that it is effective for the GKE hosts in your cluster? (I don't know how to check this using the CLI, but you can easily see it in the web UI, at the very bottom of the firewall rule details page...)
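For a CLI check: a GCE firewall rule only applies to instances in the rule's network and, when the rule has target tags, only to instances carrying at least one of those tags. A minimal sketch of that comparison, assuming the network and tag values have already been fetched; `rule_applies` is a hypothetical helper, it compares network names only (not projects), and it ignores rules that target service accounts.

```shell
# Hypothetical helper: succeed if a firewall rule (network + comma-separated
# target tags) would select an instance (network + comma-separated tags).
# Target service accounts are not considered.
rule_applies() {
  _rule_net=${1##*/}   # networks may be full URLs; compare the last segment
  _rule_tags=$2
  _inst_net=${3##*/}
  _inst_tags=$4

  [ "$_rule_net" = "$_inst_net" ] || return 1
  [ -n "$_rule_tags" ] || return 0   # no target tags: every instance matches

  _saved_ifs=$IFS
  IFS=,
  for _tag in $_rule_tags; do
    case ",$_inst_tags," in
      *",$_tag,"*) IFS=$_saved_ifs; return 0 ;;
    esac
  done
  IFS=$_saved_ifs
  return 1
}

# Inputs from a live project (not run here):
#   gcloud compute firewall-rules describe cert-manager-admission-webhook \
#     --format='value(network,targetTags.list())'
#   gcloud compute instances list \
#     --format='value(name,networkInterfaces[0].network,tags.items.list())'
```

A network mismatch makes the rule silently irrelevant even when the tags line up.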

mlushpenko · 2019-04-04T15:00:25Z

@rmuehlbauer thanks, good observation. I hadn't really checked whether the rules were applied, and they weren't. I am not sure why; maybe it has something to do with custom node pools or preemptible nodes.

UPDATE: It was the network, damn it. Our cluster doesn't run on the default network. I've updated my commands.

One more update: I got a step further with connections, but now I'm getting this:

I0404 15:31:48.644874       1 request.go:942] Request Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false}}
I0404 15:31:48.645004       1 round_trippers.go:419] curl -k -v -XPOST  -H "Content-Type: application/json" -H "User-Agent: image.app_linux-amd64.binary/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Authorization: Bearer blalblalal" 'https://10.125.192.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews'
I0404 15:31:48.654331       1 round_trippers.go:438] POST https://10.125.192.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 9 milliseconds
I0404 15:31:48.654376       1 round_trippers.go:444] Response Headers:
I0404 15:31:48.654392       1 round_trippers.go:447]     Audit-Id: c5990609-2b9d-47ce-9bda-d15180940f1c
I0404 15:31:48.654397       1 round_trippers.go:447]     Content-Type: application/json
I0404 15:31:48.654400       1 round_trippers.go:447]     Content-Length: 294
I0404 15:31:48.654403       1 round_trippers.go:447]     Date: Thu, 04 Apr 2019 15:31:48 GMT
I0404 15:31:48.654441       1 request.go:942] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false,"reason":"no RBAC policy matched"}}
I0404 15:31:48.654606       1 authorization.go:73] Forbidden: "/", Reason: "no RBAC policy matched"
I0404 15:31:48.654766       1 wrap.go:47] GET /: (10.220427ms) 403 [Go-http-client/2.0 172.16.0.10:54560]
I0404 15:31:49.774434       1 log.go:172] http: TLS handshake error from 172.16.0.11:56368: remote error: tls: bad certificate
I0404 15:31:50.357270       1 log.go:172] http: TLS handshake error from 172.16.0.10:58938: remote error: tls: bad certificate
I0404 15:31:52.656929       1 log.go:172] http: TLS handshake error from 172.16.0.10:58960: remote error: tls: bad certificate
I0404 15:31:55.179124       1 log.go:172] http: TLS handshake error from 172.16.0.10:58966: remote error: tls: bad certificate
I0404 15:31:56.677010       1 log.go:172] http: TLS handshake error from 172.16.0.10:58972: remote error: tls: bad certificate

I was upgrading from 0.5.2, so maybe something got messed up along the way. I may try a clean install again a bit later.
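The `TLS handshake error from 172.16.0.10` and `172.16.0.11` lines above suggest that connections now reach the webhook pod, presumably from the master. To confirm that such an address lies inside the `--source-ranges` value of the firewall rule, an IPv4 CIDR membership check is enough. A minimal POSIX shell sketch; the helper names and the `172.16.0.0/28` range are hypothetical examples, not values taken from this cluster, and the code assumes 64-bit shell arithmetic (as in bash and dash).

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  _saved_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS=$_saved_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed if IP (first argument) lies inside CIDR (second).
ip_in_cidr() {
  _ip=$(ip_to_int "$1")
  _net=$(ip_to_int "${2%/*}")
  _bits=${2#*/}
  if [ "$_bits" -eq 0 ]; then
    return 0         # a /0 range such as 0.0.0.0/0 matches every address
  fi
  _mask=$(( (0xFFFFFFFF << (32 - _bits)) & 0xFFFFFFFF ))
  [ $(( _ip & _mask )) -eq $(( _net & _mask )) ]
}

# Example (hypothetical range):
#   ip_in_cidr 172.16.0.10 172.16.0.0/28 && echo "covered by the rule"
```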

rmuehlbauer · 2019-04-04T16:18:44Z

@mlushpenko I think you are hitting some RBAC issues. Have a look at https://docs.cert-manager.io/en/latest/getting-started/install.html, especially the note regarding RBAC and GKE... hopefully that fixes your problem.

mlushpenko · 2019-04-05T12:27:23Z

@rmuehlbauer thanks for the suggestion, although it looks fine:

kubectl describe clusterrolebinding cluster-admin-binding                                           
Name:         cluster-admin-binding
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  cluster-admin
Subjects:
  Kind  Name                     Namespace
  ----  ----                     ---------
  User  mlushpenko@blockport.io

I was testing by running helm with my permissions, but it does look related to RBAC, as the error log states. It feels to me like whoever is calling the API (probably the webhook pod or cert-manager) is not running with a specific service account, because it tries to use the anonymous user:

"user":"system:anonymous","group":["system:unauthenticated"]}

Do you have any idea about the validation process? I read the How it works section but didn't find relevant info.
On the other hand, I probably won't spend much more time on this now, but maybe this will help other people if they run into similar issues.

yujunz · 2019-04-15T08:06:47Z

> TL;DR:
> There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.

This hint helped me solve the problem.

Some notes: for clusters created by kops, cross-subnet mode may need to be enabled explicitly on AWS when using Calico.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: k8s.local
spec:
  networking:
    # Ref: https://github.com/kubernetes/kops/blob/master/docs/networking.md#enable-cross-subnet-mode-in-calico-aws-only
    calico:
      crossSubnet: true

oshalygin · 2019-05-16T22:58:33Z

> TL;DR:
> There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.

I don't actually have masterIpv4CidrBlock listed when I describe the cluster, likely because it isn't a private cluster. Also, isn't masterIpv4CidrBlock noted as deprecated in these docs?

https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.zones.clusters

Does anyone have a workaround here that doesn't involve all the madness with private clusters? I'm trying to set this up on a VPC-native GKE cluster.

intellix · 2019-10-24T19:29:33Z

@oshalygin CIDR is 0.0.0.0/0 on a public cluster

mpvoss · 2020-01-22T18:00:06Z

Credit to @skuro for the gcloud commands to add the firewall:
cert-manager/cert-manager#2109 (comment)

dmytropetryk · 2020-02-28T08:18:07Z

In my case, I was deploying the cert-manager and cert-manager-issuer charts via Terraform in the same script.
I fixed it by making cert-manager-issuer depend on cert-manager: cert-manager has to be deployed first.
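Outside Terraform, a similar ordering can be approximated in a plain shell script: wait until the webhook is serving before creating issuers, and retry the creation a few times in case the API server still cannot reach it. A minimal POSIX sketch; `retry` is a hypothetical helper, and the deployment name `cert-manager-webhook` and the file name `cluster-issuer.yaml` in the usage comments are assumptions.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$_i" -lt "$_attempts" ]; then
      sleep "$_delay"
    fi
    _i=$((_i + 1))
  done
  return 1
}

# Usage against a live cluster (not run here):
#   kubectl -n cert-manager rollout status deployment/cert-manager-webhook --timeout=120s
#   retry 10 6 kubectl apply -f cluster-issuer.yaml
```

Waiting on the rollout covers the common case; the retry absorbs the short window where the pod is ready but the webhook is not yet reachable.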

rmuehlbauer closed this as completed on Feb 26, 2019.

Yanson mentioned this issue on Feb 26, 2019: Cannot create http01 ClusterIssuer with DigitalOcean provider using new static manifests, cert-manager/cert-manager#1149 (closed)

woodwardmatt mentioned this issue on Mar 1, 2019: Verifying Install: "failed calling admission webhook" (Azure, GKE private cluster), cert-manager/cert-manager#1425 (closed)

davi5e mentioned this issue on Mar 7, 2019: helm install stable/cert-manager does not work anymore? cert-manager/cert-manager#1255 (closed)

ismailbaskin mentioned this issue on Jul 11, 2019: Changing webhook port to 10250 instead of 6443 for better compability. cert-manager/cert-manager#1883 (closed)

neolit123 mentioned this issue on Dec 24, 2019: /apis/admission.certmanager.k8s.io/v1beta1?timeout=32s 503 Service Unavailable, kubernetes/kubernetes#86594 (closed)

raj-saxena mentioned this issue on Jul 7, 2020: Failed calling webhook...context deadline exceeded, actions/actions-runner-controller#68 (closed)
