An invalid certificate spec (such as an unknown unit) blocks other certificate signing. #1269

Closed
hping2 opened this issue Jan 25, 2019 · 3 comments

@hping2
commented Jan 25, 2019

Describe the bug:

On 0.6.0 (and probably earlier versions), I saw the following behavior.

  1. In a certificate Helm chart with an invalid certificate definition, such as a wrong duration unit (duration: 360d, where 'd' is not a supported unit), the certificate cannot be signed. (I am using a ClusterIssuer.) See the attached log; the certificate name is ns000/backend-tls-certificate.

  2. All other valid certificate signing requests are blocked. The certificates controller does not sync the later items. See the attached log; the second certificate name is ns001/dummy-cert-1.

  3. Only after I fixed the invalid definition did both certificates get signed.
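
For reference, the error text in the log below ("time: unknown unit d in duration 360d") has the shape of an error from Go's time.ParseDuration, which accepts only ns, us, ms, s, m and h as units. The following is a minimal sketch of that behaviour, not cert-manager code; the helper names isValidDuration and daysToHours are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// isValidDuration reports whether s is accepted by time.ParseDuration.
// (Hypothetical helper, not part of cert-manager.)
func isValidDuration(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

// daysToHours rewrites a day count in hours, the largest unit that
// time.ParseDuration understands.
func daysToHours(days int) string {
	return fmt.Sprintf("%dh", days*24)
}

func main() {
	fmt.Println(isValidDuration("360d"))           // 'd' is not a supported unit
	fmt.Println(daysToHours(360))                  // same span expressed in hours
	fmt.Println(isValidDuration(daysToHours(360))) // the hour form parses
}
```

Under these assumptions, writing the duration as 8640h instead of 360d expresses the same lifetime in a unit the parser accepts.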

Expected behaviour:
An invalid certificate should not block the others.

Steps to reproduce the bug:
Follow the description above to reproduce the bug.
Log messages:

I0125 16:49:51.609965       1 controller.go:141] Using the following nameservers for DNS01 checks: [10.244.14.1:53]
I0125 16:49:51.610532       1 leaderelection.go:175] attempting to acquire leader lease  ns001/cert-manager-controller...
I0125 16:50:58.003858       1 leaderelection.go:184] successfully acquired lease ns001/cert-manager-controller
I0125 16:50:58.004067       1 metrics.go:145] Listening on http://0.0.0.0:9402
I0125 16:50:58.004293       1 controller.go:82] Starting issuers controller
I0125 16:50:58.004321       1 controller.go:82] Starting ingress-shim controller
I0125 16:50:58.004342       1 controller.go:82] Starting certificates controller
I0125 16:50:58.004375       1 controller.go:82] Starting clusterissuers controller
I0125 16:50:58.004389       1 controller.go:82] Starting orders controller
I0125 16:50:58.004409       1 controller.go:82] Starting challenges controller
E0125 16:50:58.007586       1 reflector.go:205] pkg/client/informers/externalversions/factory.go:72: Failed to list *v1alpha1.Certificate: v1alpha1.CertificateList: Items: []v1alpha1.Certificate: v1alpha1.Certificate: Spec: v1alpha1.CertificateSpec: IssuerRef: v1alpha1.ObjectReference: Duration: unmarshalerDecoder: time: unknown unit d in duration 360d, error found in #10 byte of ...|on":"360d","issuerRe|..., bigger context ...|er.svc.cluster.local","backend"],"duration":"360d","issuerRef":{"kind":"ClusterIssuer","name":"testp|...
I0125 16:50:58.108300       1 controller.go:141] clusterissuers controller: syncing item 'test-verify-mpki'
I0125 16:50:58.108340       1 controller.go:141] clusterissuers controller: syncing item 'control-plane-internal-tls'
I0125 16:50:58.196389       1 setup.go:106] Vault verified
I0125 16:50:58.196424       1 controller.go:147] clusterissuers controller: Finished processing work item "test-verify-mpki"
I0125 16:50:58.270623       1 setup.go:106] Vault verified
I0125 16:50:58.270651       1 controller.go:147] clusterissuers controller: Finished processing work item "control-plane-internal-tls"
E0125 16:50:59.010038       1 reflector.go:205] pkg/client/informers/externalversions/factory.go:72: Failed to list *v1alpha1.Certificate: v1alpha1.CertificateList: Items: []v1alpha1.Certificate: v1alpha1.Certificate: Spec: v1alpha1.CertificateSpec: IssuerRef: v1alpha1.ObjectReference: Duration: unmarshalerDecoder: time: unknown unit d in duration 360d, error found in #10 byte of ...|on":"360d","issuerRe|..., bigger context ...|er.svc.cluster.local","backend"],"duration":"360d","issuerRef":{"kind":"ClusterIssuer","name":"testp|...

E0125 16:51:00.012084       1 reflector.go:205] pkg/client/informers/externalversions/factory.go:72: Failed to list *v1alpha1.Certificate: v1alpha1.CertificateList: Items: []v1alpha1.Certificate: v1alpha1.Certificate: Spec: v1alpha1.CertificateSpec: IssuerRef: v1alpha1.ObjectReference: Duration: unmarshalerDecoder: time: unknown unit d in duration 360d, error found in #10 byte of ...|on":"360d","issuerRe|..., bigger context ...|er.svc.cluster.local","backend"],"duration":"360d","issuerRef":{"kind":"ClusterIssuer","name":"testp|...
...
...  huge amount of repeated error logs like above ...
...

E0125 17:31:19.894329       1 reflector.go:205] pkg/client/informers/externalversions/factory.go:72: Failed to list *v1alpha1.Certificate: v1alpha1.CertificateList: Items: []v1alpha1.Certificate: v1alpha1.Certificate: Spec: v1alpha1.CertificateSpec: IssuerRef: v1alpha1.ObjectReference: Duration: unmarshalerDecoder: time: unknown unit d in duration 360d, error found in #10 byte of ...|on":"360d","issuerRe|..., bigger context ...|er.svc.cluster.local","backend"],"duration":"360d","issuerRef":{"kind":"ClusterIssuer","name":"testp|...

I0125 17:31:20.904483       1 controller.go:145] certificates controller: syncing item 'ns000/backend-tls-certificate'
I0125 17:31:20.904559       1 controller.go:145] certificates controller: syncing item 'ns001/dummy-cert-1'
I0125 17:31:20.904584       1 helpers.go:183] Setting lastTransitionTime for Certificate "dummy-cert-1" condition "Ready" to 2019-01-25 17:31:20.904574181 +0000 UTC m=+2489.301729630
I0125 17:31:20.904544       1 helpers.go:183] Setting lastTransitionTime for Certificate "backend-tls-certificate" condition "Ready" to 2019-01-25 17:31:20.904520222 +0000 UTC m=+2489.301675667
I0125 17:31:20.911172       1 controller.go:151] certificates controller: Finished processing work item "ns000/backend-tls-certificate"
I0125 17:31:20.911210       1 controller.go:145] certificates controller: syncing item 'ns000/backend-tls-certificate'
I0125 17:31:20.911259       1 controller.go:151] certificates controller: Finished processing work item "ns000/backend-tls-certificate"
E0125 17:31:21.172931       1 sync.go:254] [ns001/dummy-cert-1] Error getting certificate 'dummy-cert-tls-1': secret "dummy-cert-tls-1" not found
I0125 17:31:21.178255       1 controller.go:151] certificates controller: Finished processing work item "ns001/dummy-cert-1"
I0125 17:31:21.178283       1 controller.go:145] certificates controller: syncing item 'ns001/dummy-cert-1'
I0125 17:31:21.178572       1 helpers.go:190] Found status change for Certificate "dummy-cert-1" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2019-01-25 17:31:21.178567808 +0000 UTC m=+2489.575723236

Anything else we need to know?:
In my setup, the two certificates are in different namespaces.

Environment details:

  • Kubernetes version: v1.12.3
  • Cloud-provider/provisioner (GKE, kops AWS, etc):
  • cert-manager version: v0.6.0
  • Install method: helm

/kind bug

@munnerz

Member

commented Jan 28, 2019

Hm, this is tough as it's the underlying lister that is throwing an error here. I'll ask around to see how we can 'absorb' these decode errors more generally.

That said, if you enable the webhook component then this sort of error should not occur in the first place, as the resource will be rejected before it is persisted to the apiserver.
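
As a rough illustration of "rejected before it is persisted", here is a minimal Go sketch of an admission-style check. It is not the cert-manager webhook: certificateSpec, admit and the in-memory store are hypothetical stand-ins, and the only rule checked is that the duration parses with time.ParseDuration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// certificateSpec is a hypothetical, trimmed-down stand-in for the real spec.
type certificateSpec struct {
	Duration string
}

// validateSpec returns an error if the duration cannot be parsed.
func validateSpec(spec certificateSpec) error {
	if spec.Duration == "" {
		return nil // treated as optional in this sketch
	}
	d, err := time.ParseDuration(spec.Duration)
	if err != nil {
		return fmt.Errorf("spec.duration: %w", err)
	}
	if d <= 0 {
		return errors.New("spec.duration: must be positive")
	}
	return nil
}

// admit stores the object only if validation passes, so an invalid
// object never reaches the store that listers later read from.
func admit(store map[string]certificateSpec, name string, spec certificateSpec) error {
	if err := validateSpec(spec); err != nil {
		return err
	}
	store[name] = spec
	return nil
}

func main() {
	store := map[string]certificateSpec{}
	fmt.Println(admit(store, "ns000/backend-tls-certificate", certificateSpec{Duration: "360d"}))
	fmt.Println(admit(store, "ns001/dummy-cert-1", certificateSpec{Duration: "8640h"}))
	fmt.Println(len(store)) // only the valid object was stored
}
```

The point of the design is that validation happens at write time, so readers never have to cope with an object they cannot decode.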

@munnerz

Member

commented Jan 30, 2019

After speaking to sig-api-machinery, it's not possible to suppress errors when a decoding error occurs on our 'informers' as it could lead to an inconsistent cache state, resulting in cert-manager making invalid assumptions about the 'state of the world' and taking invalid actions.
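
To see why one bad object affects every other object, consider this minimal Go sketch of an all-or-nothing list decode. It uses the standard encoding/json package and hypothetical types (duration, certificate, certificateList), so it only approximates what the real client and informer do; the certificate names are taken from the log above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// duration is a hypothetical wrapper that decodes a JSON string
// with time.ParseDuration.
type duration struct{ time.Duration }

func (d *duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	d.Duration = parsed
	return nil
}

type certificate struct {
	Name     string   `json:"name"`
	Duration duration `json:"duration"`
}

type certificateList struct {
	Items []certificate `json:"items"`
}

// decodeList returns the number of items only if the whole list decodes.
func decodeList(raw string) (int, error) {
	var list certificateList
	if err := json.Unmarshal([]byte(raw), &list); err != nil {
		return 0, err
	}
	return len(list.Items), nil
}

func main() {
	good := `{"items":[{"name":"dummy-cert-1","duration":"8640h"}]}`
	mixed := `{"items":[{"name":"backend-tls-certificate","duration":"360d"},` +
		`{"name":"dummy-cert-1","duration":"8640h"}]}`

	n, err := decodeList(good)
	fmt.Println(n, err)
	n, err = decodeList(mixed) // one bad item fails the whole list
	fmt.Println(n, err)
}
```

In this sketch the valid item in the mixed list is never handed to the caller, which mirrors the symptom reported here: the list call fails as a whole, so no Certificate reaches the controller until the invalid one is fixed.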

I think all we can do here is strongly recommend that you keep the webhook component available, and document the format that is expected for these fields. #1279 does both of these 😄

@munnerz

Member

commented Jan 30, 2019

/remove-kind bug
/kind documentation
