Order stuck in errored state #4441

rittneje · 2021-09-09T00:54:13Z

Describe the bug:

We have a certificate that is supposed to be refreshed with Let's Encrypt once a week. Back in June, the Order failed mysteriously:

Failed to finalize Order: 403 urn:ietf:params:acme:error:orderNotReady: Order's status ("valid") is not acceptable for finalization

The Order then remained in the "errored" state for about 3 months, until the certificate itself finally expired. It seems that it is incorrectly trying to reuse the broken Order instead of starting a new one.
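For context, the 403 above matches the ACME order state machine: RFC 8555 (section 7.4) only accepts a finalize request while the order is in the "ready" state and requires a 403 orderNotReady otherwise. A minimal Python sketch of that rule follows. The names `OrderStatus`, `can_finalize` and `finalize_error` are made up for illustration and are not cert-manager or Let's Encrypt code; the message text simply mirrors the error quoted above.

```python
from enum import Enum
from typing import Optional


class OrderStatus(Enum):
    """Order states listed in RFC 8555, section 7.1.6."""
    PENDING = "pending"
    READY = "ready"
    PROCESSING = "processing"
    VALID = "valid"
    INVALID = "invalid"


def can_finalize(status: OrderStatus) -> bool:
    """Finalization is only acceptable while the order is 'ready'."""
    return status is OrderStatus.READY


def finalize_error(status: OrderStatus) -> Optional[str]:
    """Return the error a server would send for a finalize request, or None.

    Hypothetical helper: the wording copies the message from this report.
    """
    if can_finalize(status):
        return None
    detail = "Order's status (\"{}\") is not acceptable for finalization".format(
        status.value
    )
    return "403 urn:ietf:params:acme:error:orderNotReady: " + detail


if __name__ == "__main__":
    # A second finalize attempt on an already-issued ("valid") order is rejected.
    print(finalize_error(OrderStatus.VALID))
```

Under this rule, a client that retries finalization after the order has already become "valid" will always be rejected, which would fit the symptom described here.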

This particular certificate had been renewed many times previously. We did notice similar issues with other CertificateRequests/Orders, but not all at the same time. Nonetheless, the results were identical: if an Order fails due to some transient issue, cert-manager incorrectly tries to reuse that broken CertificateRequest forever rather than making a new one, and eventually the Certificate expires.

The only way to fix it is to manually delete the offending CertificateRequest, and then wait an hour for it to try again.

Expected behaviour:

It should have fixed itself automatically without any manual intervention.

Steps to reproduce the bug:

We really don't know what specifically caused the Order to fail in the first place, but once it does, this will happen.

Anything else we need to know?:

Environment details:

Kubernetes version: 1.18
Cloud-provider/provisioner: AWS EKS
cert-manager version: 1.2.0
Install method: kubectl apply

/kind bug

irbekrm · 2021-09-10T17:14:43Z

This should have been fixed in #4130, which was released in v1.5.0. Would you be able to upgrade and let us know if it got fixed?

rittneje · 2021-09-10T17:31:20Z

@irbekrm Sure, we can check after our next upgrade cycle for cert-manager.

jakexks · 2021-10-07T11:27:42Z

Please re-open if you experience the issue when cert-manager is up to date.
/remove-kind bug
/triage support
/close

jetstack-bot · 2021-10-07T11:27:45Z

@jakexks: Closing this issue.

PSanetra · 2021-11-29T16:12:08Z

@irbekrm I guess this is a duplicate of #2765 and does not seem to be fixed by 1.6.1

irbekrm · 2021-11-29T16:25:29Z

> @irbekrm I guess this is a duplicate of #2765 and does not seem to be fixed by 1.6.1

From a brief look, #2765 is about not retrying finalization of orders that have already been finalized.

The issue described here is also caused by retrying to finalize already-finalized Orders. However, creating a new Order after the previous one errored due to the repeated finalize attempt may be a sufficient solution here, as that should at least ensure that failed certificate requests are retried (by creating a new CertificateRequest after the backoff period).
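To make that suggestion concrete, here is a rough Python sketch of the decision it implies: never re-finalize an order that is no longer "ready", and once an order has failed, wait out a backoff and then start over with a new request. This is not how cert-manager is implemented. The function name `next_action`, the action strings, and the one-hour default backoff (taken from the "wait an hour" remark in the original report) are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_action(
    order_state: str,
    failed_at: Optional[datetime],
    now: datetime,
    backoff: timedelta = timedelta(hours=1),  # assumed default, not a cert-manager constant
) -> str:
    """Pick the next step for an issuance attempt (hypothetical helper)."""
    if order_state == "ready":
        # Only a ready order may be finalized.
        return "finalize"
    if order_state == "valid":
        # Already finalized: fetch the certificate, do not finalize again.
        return "fetch-certificate"
    if order_state in ("errored", "invalid", "expired"):
        # A failed order is never reused; retry with a fresh request
        # once the backoff period has elapsed.
        if failed_at is not None and now - failed_at < backoff:
            return "wait"
        return "create-new-request"
    # pending / processing: nothing to do yet.
    return "wait"


if __name__ == "__main__":
    now = datetime(2021, 9, 9, 12, 0)
    print(next_action("errored", now - timedelta(hours=2), now))
```

The point of the sketch is only that the failed state has an exit: after the backoff, the old Order is abandoned instead of being retried indefinitely.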

> does not seem to be fixed by 1.6.1

Do you have any logs that you could add to #2765? The status of the Orders/CertificateRequests etc. would also be useful, as well as what you were trying to achieve (why did you need to run kubectl cert-manager -n my-namespace renew my-cert manually?).

PSanetra · 2021-11-30T10:04:56Z

@irbekrm Sorry, I don't have the exact logs and resource states anymore. The issue is now resolved for us.

> why did you need to run kubectl cert-manager -n my-namespace renew my-cert manually?

I needed to run this command because we have set the preferredChain option on the cluster issuer so that we always get the ISRG Root X1 chain. The certificates with the old chain were still considered valid by cert-manager, so we needed to run the renew command.

jetstack-bot added the kind/bug label Sep 9, 2021

jetstack-bot added the triage/support label and removed the kind/bug label Oct 7, 2021

jetstack-bot closed this as completed Oct 7, 2021
