Certificate status is false, but message is success #2985

Closed
Romiko opened this issue Jun 6, 2020 · 13 comments
Labels
kind/flake: Categorizes issue or PR as related to a flaky test.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/awaiting-more-evidence: Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@Romiko

Romiko commented Jun 6, 2020

I have posted this in two other issue threads, but I realised those issues were closed. Since this is now version 0.15.1, I have decided to open a new issue, as the others were marked resolved. Sorry for the repetition.

Overview

The Certificate Order and Request completed, but the Certificate is stuck in progress and the logs are spamming in a loop.

minikube version
minikube version: v1.10.1
commit: 63ab801ac27e5742ae442ce36dff7877dcccb278

dependencies:
  - name: cert-manager
    version: "v0.15.1"
    repository: https://charts.jetstack.io

Type Reason Age From Message
Normal Issued 22m (x72401 over 8h) cert-manager Certificate issued successfully
Normal Issued 4m56s (x2228 over 19m) cert-manager Certificate issued successfully
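
To put a number on how hot this loop is, the repeat counts in the Age column can be turned into a rough event rate. This is only a sketch: `parse_age` and `event_rate` are hypothetical helper names, and the windows kubectl prints are rounded, so the result is approximate.

```python
import re

# Units kubectl uses in human-readable durations such as "8h", "19m", "2d18h".
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(text):
    """Convert a duration like '2d18h' or '4m56s' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def event_rate(age_cell):
    """Return (count, window_seconds, events_per_second) for an Age cell
    such as '22m (x72401 over 8h)'."""
    match = re.search(r"\(x(\d+) over ([0-9dhms]+)\)", age_cell)
    if match is None:
        raise ValueError(f"no repeat count in: {age_cell!r}")
    count = int(match.group(1))
    window = parse_age(match.group(2))
    return count, window, count / window


if __name__ == "__main__":
    # The two Age cells from the events listed above.
    for cell in ("22m (x72401 over 8h)", "4m56s (x2228 over 19m)"):
        count, window, rate = event_rate(cell)
        print(f"{count} events over {window}s = {rate:.2f} events/s")
```

For the two events above this works out to roughly 2 to 2.5 "Issued" events per second, sustained for hours, which is consistent with the controller re-syncing the same Certificate in a tight loop.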

Logs

I0606 23:10:36.439826 1 sync.go:485] cert-manager/controller/certificates "msg"="checking if certificate stored on CertificateRequest is up to date" "related_resource_kind"="CertificateRequest" "related_resource_name"

So I have a valid certificate.

CertificateRequest

Certificate fetched from issuer successfully

Order completed successfully

Certificate

Certificate issued successfully

But the logs say otherwise, and the Certificate status is still InProgress:

Status:
Conditions:
Last Transition Time: 2020-06-06T14:42:12Z
Message: Waiting for CertificateRequest "elastic.*abc.me-1703632688" to complete
Reason: InProgress
Status: False
Type: Ready
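
For anyone who wants to detect this mismatch programmatically, here is a minimal sketch. It assumes you have the JSON form of both resources (for example from `kubectl get certificate NAME -o json` and `kubectl get certificaterequest NAME -o json`). `ready_condition` and `is_stuck` are hypothetical helper names, not part of cert-manager.

```python
def ready_condition(resource):
    """Return the condition of type Ready from a resource dict, or None."""
    conditions = resource.get("status", {}).get("conditions") or []
    for condition in conditions:
        if condition.get("type") == "Ready":
            return condition
    return None


def is_stuck(certificate, certificate_request):
    """True when the CertificateRequest is Ready but the Certificate is not.

    Condition status values are the strings "True", "False" or "Unknown".
    """
    cert = ready_condition(certificate)
    request = ready_condition(certificate_request)
    cert_ready = cert is not None and cert.get("status") == "True"
    request_ready = request is not None and request.get("status") == "True"
    return request_ready and not cert_ready


if __name__ == "__main__":
    # Values copied from the status shown above; the request side is
    # reduced to the fields needed for the check.
    certificate = {
        "status": {
            "conditions": [
                {
                    "type": "Ready",
                    "status": "False",
                    "reason": "InProgress",
                    "message": 'Waiting for CertificateRequest '
                               '"elastic.*abc.me-1703632688" to complete',
                    "lastTransitionTime": "2020-06-06T14:42:12Z",
                }
            ]
        }
    }
    certificate_request = {
        "status": {
            "conditions": [
                {
                    "type": "Ready",
                    "status": "True",
                    "message": "Certificate fetched from issuer successfully",
                }
            ]
        }
    }
    print("stuck:", is_stuck(certificate, certificate_request))
```

The check only compares the two Ready conditions; it does not explain why the controller fails to propagate the result.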

ClusterIssuer

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: r***@gmail.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - selector:
        dnsZones:
          - "***bc*.me"
      dns01:
        route53:
          region: ap-southeast-2a
          accessKeyID: A****
          secretAccessKeySecretRef:
            name: staging-route53-credentials-secret
            key: secret-access-key
@Romiko
Author

Romiko commented Jun 7, 2020

Hi,

I forgot to mention that the ingress resources are also set up correctly, and the certificate is installed and working.

elasticsearch:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "nginx"
      cert-manager.io/cluster-issuer: "letsencrypt-production"
    path: /
    hosts:
      - elastic.baz.xyz.com
    tls:
    - hosts:
      - elastic.internal.romiko.me
      secretName: elastic.baz.xyz.com-tls

Chrome also shows a trusted website.

@Romiko
Author

Romiko commented Jun 7, 2020

Certificate in failed state

@Romiko
Author

Romiko commented Jun 7, 2020

image

@Romiko
Author

Romiko commented Jun 7, 2020

image


@SamuelTechwolf

We've encountered a similar issue today on one of our TLS certificates:

This morning a certificate managed by an ingress issued a new request. This request was completed successfully, but the certificate itself is still stuck in a non-ready state.

We are using Traefik as our ingress controller, and the certificate being in a non-ready state causes Traefik to use its default (self-signed cert), resulting in loss of TLS functionality.

Before the issue of the new request, this setup worked properly and the certificate was in a ready state. A near identical deployment did not issue a new request and does still have a working certificate.

Certificate resource description:
Certificate not ready

Certificate request status:
certificate request successful

cert-manager-logs:
cert manager logs

@SamuelTechwolf

Manually deleting the certificaterequest resource to force a new request resolved the issue for me. Not 100% sure if it is just this deletion that did the trick, I did refresh a number of other resources this way.

Deleting the following resources to force an update did not initially resolve the issue (listed in order of trial): Certificate, secret tied to certificate
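
The workaround above can be sketched as a small script. It assumes that each CertificateRequest carries an `ownerReferences` entry pointing at its Certificate (please verify this on your own cluster) and that you feed it the output of `kubectl get certificaterequest -n NAMESPACE -o json`. `requests_owned_by` and `delete_command` are hypothetical helper names, and the script only builds the command; it does not run it.

```python
def requests_owned_by(request_list, certificate_name):
    """Names of CertificateRequests whose ownerReferences name the Certificate.

    `request_list` is the parsed JSON of a `kubectl get ... -o json` list.
    """
    names = []
    for item in request_list.get("items", []):
        metadata = item.get("metadata", {})
        for owner in metadata.get("ownerReferences") or []:
            if (owner.get("kind") == "Certificate"
                    and owner.get("name") == certificate_name):
                names.append(metadata["name"])
                break
    return names


def delete_command(request_name, namespace):
    """argv for deleting one CertificateRequest, suitable for subprocess.run."""
    return ["kubectl", "delete", "certificaterequest",
            request_name, "-n", namespace]


if __name__ == "__main__":
    import json
    import sys

    # Usage (names are placeholders):
    #   kubectl get certificaterequest -n NS -o json | python this.py CERT NS
    certificate_name, namespace = sys.argv[1], sys.argv[2]
    for name in requests_owned_by(json.load(sys.stdin), certificate_name):
        print(" ".join(delete_command(name, namespace)))
```

Printing the commands instead of executing them keeps the step reviewable, since it is not confirmed that deleting the request alone is what clears the condition.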

@Romiko
Author

Romiko commented Jun 8, 2020

Manually deleting the certificaterequest resource to force a new request resolved the issue for me. Not 100% sure if it is just this deletion that did the trick, I did refresh a number of other resources this way.

Deleting the following resources to force an update did not initially resolve the issue (listed in order of trial): Certificate, secret tied to certificate

Yes, I did something similar. I configured an ingress resource that used the same certificate, then deleted the older certificate request, and after that the status went green. But this is not a practical solution. There seems to be a race condition occurring.

@kmai

kmai commented Jun 29, 2020

Same issue (or worse): when I delete the CertificateRequest, a new one gets generated (and its status is True) without any change to the Certificate resource. I tried removing the secret for the certificate as well, and a new secret gets created, with an identical modulus for key and certificate, and with an apparently valid certificate.

I'm running cert-manager 0.15.1 using the DNS01 Route53 validation on EKS 1.16 with Istio.

 kmai | at stg-eks ⎈ | at 10:28:14 
❯ kubectl get secret/staging-penneo-com-tls -o json -n cert-manager | jq -r '.data."tls.crt" | @base64d' | openssl x509  -modulus -noout
Modulus=DF3A179B3C42BBBB67B262D621B7175A4142E84486B8D94065C9E66EEEAA7A2320B294B8C5E5A91BE5822006BC0026D46325638223B5653BA531A067EA25946BCE0CC6B5C97D75C4D5885C5CD2ED76AA4B4E816ED8AF5318901A5A9E14AE30C36FFCAE4DCF780D9A596EF0EA4094DB51A59E5BBE8906966635404F7540CA68B630FBB764A94CB28DC120CE37518CA1B48FFF0A79399D3DC5349DE1020385ACF5F0E5289D9F92089518EE74629414E2C5C0710185B811A0BC4EBB07D9C928FB20C78316F9C8F1737B78C7075C8DED6A9E317AA3299F9A34A494FA34672CD18F7CE8CA106D7E3839733D30F30B99D34E08DC55B26D267937925772E4F9E9A981D9

  kmai | at stg-eks ⎈ | at 10:28:40 
❯ kubectl get secret/staging-penneo-com-tls -o json -n cert-manager | jq -r '.data."tls.key" | @base64d' | openssl rsa -modulus -noout
Modulus=DF3A179B3C42BBBB67B262D621B7175A4142E84486B8D94065C9E66EEEAA7A2320B294B8C5E5A91BE5822006BC0026D46325638223B5653BA531A067EA25946BCE0CC6B5C97D75C4D5885C5CD2ED76AA4B4E816ED8AF5318901A5A9E14AE30C36FFCAE4DCF780D9A596EF0EA4094DB51A59E5BBE8906966635404F7540CA68B630FBB764A94CB28DC120CE37518CA1B48FFF0A79399D3DC5349DE1020385ACF5F0E5289D9F92089518EE74629414E2C5C0710185B811A0BC4EBB07D9C928FB20C78316F9C8F1737B78C7075C8DED6A9E317AA3299F9A34A494FA34672CD18F7CE8CA106D7E3839733D30F30B99D34E08DC55B26D267937925772E4F9E9A981D9

 kmai | at stg-eks ⎈ | at 10:28:46 
❯ kubectl get secret/staging-penneo-com-tls -o json -n cert-manager | jq -r '.data."tls.crt" | @base64d' | openssl x509 -text -noout | egrep -A2 "Validity"
        Validity
            Not Before: Jun 29 07:23:24 2020 GMT
            Not After : Sep 27 07:23:24 2020 GMT
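
The same two checks can be scripted instead of compared by eye. This is a sketch that works on the text `openssl` prints (a `Modulus=...` line, and the `Not Before` / `Not After` dates). `modulus_of`, `key_matches_certificate` and `validity_days` are hypothetical helper names, and the date parsing assumes an English/C locale.

```python
from datetime import datetime

# Date layout used by `openssl x509 -text`, e.g. "Jun 29 07:23:24 2020 GMT".
_OPENSSL_DATE = "%b %d %H:%M:%S %Y GMT"


def modulus_of(openssl_output):
    """Extract the hex modulus from `openssl ... -modulus -noout` output."""
    for line in openssl_output.splitlines():
        line = line.strip()
        if line.startswith("Modulus="):
            return line[len("Modulus="):].upper()
    raise ValueError("no 'Modulus=' line found")


def key_matches_certificate(cert_output, key_output):
    """True when the certificate and the private key share one RSA modulus."""
    return modulus_of(cert_output) == modulus_of(key_output)


def validity_days(not_before, not_after):
    """Whole days between the Not Before and Not After timestamps."""
    start = datetime.strptime(not_before.strip(), _OPENSSL_DATE)
    end = datetime.strptime(not_after.strip(), _OPENSSL_DATE)
    return (end - start).days


if __name__ == "__main__":
    # Dates copied from the Validity block above.
    print(validity_days("Jun 29 07:23:24 2020 GMT",
                        "Sep 27 07:23:24 2020 GMT"), "days")
```

For the dates above this gives 90 days, which matches the `2160h` duration on the Certificate, so the secret really does hold a freshly issued certificate even though the resource reports `Ready: False`.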
kubectl describe cert/staging-cert -n cert-manager
Name:         staging-cert
Namespace:    cert-manager
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"cert-manager.io/v1alpha2","kind":"Certificate","metadata":{"annotations":{},"name":"staging-cert","namespace":"cert-manager...
API Version:  cert-manager.io/v1alpha3
Kind:         Certificate
Metadata:
  Creation Timestamp:  2020-06-26T13:37:38Z
  Generation:          1
  Resource Version:    2278005
  Self Link:           /apis/cert-manager.io/v1alpha3/namespaces/cert-manager/certificates/staging-cert
  UID:                 3c9590ca-7910-44e7-b1f8-a0b181c9e070
Spec:
  Dns Names:
    *.staging.penneo.com
    staging.penneo.com
  Duration:  2160h0m0s
  Issuer Ref:
    Group:        cert-manager.io
    Kind:         ClusterIssuer
    Name:         acme-staging
  Key Algorithm:  rsa
  Key Encoding:   pkcs1
  Key Size:       2048
  Renew Before:   360h0m0s
  Secret Name:    staging-penneo-com-tls
  Subject:
    Organizations:
      penneo
  Uri SA Ns:
    spiffe://cluster.local/ns/default/sa/example
  Usages:
    server auth
    client auth
Status:
  Conditions:
    Last Transition Time:  2020-06-26T13:37:39Z
    Message:               Waiting for CertificateRequest "staging-cert-2842903884" to complete
    Reason:                InProgress
    Status:                False
    Type:                  Ready
Events:
  Type    Reason  Age                          From          Message
  ----    ------  ----                         ----          -------
  Normal  Issued  2m40s (x1202305 over 2d18h)  cert-manager  Certificate issued successfully

After that, I updated my Certificate resource to remove the uriSANs key, since we're not using it right now, and immediately afterwards the resource went into a Ready state:

spec:
  dnsNames:
  - "*.staging.penneo.com"
  - staging.penneo.com
  duration: 2160h0m0s
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: acme-staging
  keyAlgorithm: rsa
  keyEncoding: pkcs1
  keySize: 2048
  renewBefore: 360h0m0s
  secretName: staging-penneo-com-tls
  subject:
    organizations:
    - penneo
  usages:
  - server auth
  - client auth
status:
  conditions:
  - lastTransitionTime: "2020-06-29T08:37:44Z"
    message: Certificate is up to date and has not expired
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2020-09-27T07:23:24Z"
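
If someone wants to try the same change without editing the manifest, one way to express it is a JSON Patch passed to `kubectl patch`. This is only a sketch of what I did by hand: `remove_field_patch` and `patch_command` are hypothetical helper names, and I can't say that removing `uriSANs` is the actual fix rather than just a spec change that happened to trigger a re-sync.

```python
import json


def remove_field_patch(field):
    """Build a JSON Patch (RFC 6902) document that removes spec.<field>.

    A "remove" operation fails if the path does not exist on the resource.
    """
    return [{"op": "remove", "path": f"/spec/{field}"}]


def patch_command(certificate_name, namespace, patch):
    """argv for `kubectl patch` using the JSON patch type."""
    return [
        "kubectl", "patch", "certificate", certificate_name,
        "-n", namespace,
        "--type=json",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Resource name and namespace taken from the describe output above.
    command = patch_command("staging-cert", "cert-manager",
                            remove_field_patch("uriSANs"))
    print(command)
```

The script prints the argv list rather than running it, so the patch can be reviewed before being applied with `subprocess.run(command, check=True)`.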

@meyskens
Contributor

meyskens commented Aug 4, 2020

In 0.16 we added a bunch of improvements to the certificate controller. Can you try to test that one to see if it still happens?

@meyskens added the kind/flake and priority/awaiting-more-evidence labels Aug 4, 2020
@jetstack-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot added the lifecycle/stale label Sep 15, 2021
@jetstack-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale

@jetstack-bot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 15, 2021
@jetstack-bot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

@jetstack-bot
Contributor

@jetstack-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


5 participants