
Challenge Records Not Always Cleaned Up #3640

Open
Evesy opened this issue Feb 8, 2021 · 28 comments
Labels
  • area/acme: Indicates a PR directly modifies the ACME Issuer code
  • kind/bug: Categorizes issue or PR as related to a bug.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone
1.16

Comments

Evesy (Contributor) commented Feb 8, 2021

Describe the bug:
After successfully completing dns-01 challenges, cert-manager does not always clean up the TXT records it created.

Expected behaviour:
All DNS records related to challenges should be deleted once completed.

Steps to reproduce the bug:
TBC.

I currently cannot reproduce the issue consistently.

Anything else we need to know?:

Environment details:

  • Kubernetes version: v1.17.14-gke.1600
  • Cloud-provider/provisioner: GKE
  • cert-manager version: 1.1.0
  • Install method: Custom helm chart

The issue only seems to affect challenge records provisioned in Google Cloud DNS; we don't see the same behaviour for Cloudflare DNS (though about 95% of our challenges go via Cloud DNS).

I can see in the GCP logging for one example the API requests to create the record, but no requests to later delete the record.

/kind bug
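For context on what these leftover entries look like: per RFC 8555 §8.4, a dns-01 challenge places a TXT record at _acme-challenge.<domain> whose value is the base64url-encoded SHA-256 digest of the key authorization. A minimal sketch (the sample key-authorization string below is made up):

```python
import base64
import hashlib

def dns01_txt_value(key_authorization: str) -> str:
    """Return the TXT record value for a dns-01 challenge.

    Per RFC 8555 section 8.4, the record placed at
    _acme-challenge.<domain> holds the base64url-encoded SHA-256
    digest of the key authorization, with '=' padding stripped.
    """
    digest = hashlib.sha256(key_authorization.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# Hypothetical key authorization: "<token>.<account key thumbprint>"
print(dns01_txt_value("example-token.example-thumbprint"))
```

Any 43-character base64url value on an _acme-challenge name is therefore a likely leftover challenge record.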

@jetstack-bot jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 8, 2021
maelvls (Member) commented Feb 9, 2021

Hi! It looks like a bug during the "finalizer" stage; when the issue happens, would you be able to share the cert-manager-controller logs?

/triage needs-information

@jetstack-bot jetstack-bot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Feb 9, 2021
Evesy (Contributor, Author) commented Feb 9, 2021

> Hi! It looks like a bug during the "finalizer" stage; when the issue happens, would you be able to share the cert-manager-controller logs?
>
> /triage needs-information

Absolutely. I've increased the logging around cert-manager and will grab a copy of the logs the next time it happens

Evesy (Contributor, Author) commented Feb 11, 2021

Hey @maelvls -- Logs are here. Best I could do was a csv as they were exported from Kibana, cheers

maelvls (Member) commented Feb 23, 2021

Almost 30 minutes into investigating the logs, I realized I was looking at the entries in reverse-chronological order 😅

I was then surprised by the absence of any line mentioning the finalizer (something like controller/challenges/finalizer). The removal of the TXT records happens in acmechallenges/sync.go, so perhaps the Challenge object never actually gets deleted?

The challenge itself does seem to be properly deleted (i.e., metadata.deletionTimestamp becomes non-null):

sync.go:101] controller/orders msg="Order has already been completed, cleaning up any owned Challenge resources" resource_kind="Order" resource_name="sauron-adverts-evo-app-tls-78s5d-3403441770" "resource_namespace"="sauron-adverts-evo-app" "resource_version"="v1"
round_trippers.go:443] DELETE https://10.192.0.1:443/apis/acme.cert-manager.io/v1/namespaces/sauron-adverts-evo-app/challenges/sauron-adverts-evo-app-tls-78s5d-3403441770-1727866623 200 OK in 4 milliseconds

Not sure why the finalizer logs don't show :(
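For readers following along: Kubernetes only removes an object once its finalizer list is empty, so the cleanup in acmechallenges/sync.go is expected to run while the finalizer is still attached; if the finalizer was never attached (or was removed early), the Challenge disappears and the TXT record survives. A rough, hypothetical simulation of those semantics (illustrative Python, not cert-manager code; the finalizer name is an assumption):

```python
class FakeChallenge:
    """Minimal stand-in for a Challenge object with finalizer semantics."""
    def __init__(self, finalizers):
        self.finalizers = list(finalizers)
        self.deletion_timestamp = None
        self.exists = True

def delete(obj, run_cleanup):
    """Mimic the API server + controller interaction on delete."""
    obj.deletion_timestamp = "2021-02-08T00:00:00Z"  # deletion requested
    if "finalizer.acme.cert-manager.io" in obj.finalizers:
        run_cleanup()  # controller cleans up the DNS record first
        obj.finalizers.remove("finalizer.acme.cert-manager.io")
    if not obj.finalizers:
        obj.exists = False  # object actually removed

# With the finalizer present, cleanup runs before removal:
dns_records = {"_acme-challenge.example.com"}
ch = FakeChallenge(["finalizer.acme.cert-manager.io"])
delete(ch, dns_records.clear)
print(ch.exists, dns_records)  # False set()

# Without it, the object is removed but the record leaks:
dns_records = {"_acme-challenge.example.com"}
ch = FakeChallenge([])
delete(ch, dns_records.clear)
print(ch.exists, dns_records)  # record leaked
```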

Evesy (Contributor, Author) commented Aug 9, 2021

Hey, is there any more information you need on this? We're still seeing quite a lot of challenge records left behind after certificate issuance.

Happy to collect anything that'd be useful to help debug.

jetstack-bot (Contributor) commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2021
Evesy (Contributor, Author) commented Nov 8, 2021

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2021

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 6, 2022
Evesy (Contributor, Author) commented Feb 8, 2022

/remove-lifecycle stale

This is still occurring as of 1.6

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2022
@wallrj wallrj added the area/acme Indicates a PR directly modifies the ACME Issuer code label Apr 28, 2022
wallrj (Member) commented May 10, 2022

I've been looking at the code and noticed a few problems and potential cleanups:


@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2022
Evesy (Contributor, Author) commented Aug 12, 2022

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2022

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2022
Evesy (Contributor, Author) commented Nov 11, 2022

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 11, 2022
Evesy (Contributor, Author) commented Jan 3, 2023

@wallrj Hi, are there any plans to pick up the open PR and progress towards a fix for challenge records not always being cleaned up?

@irbekrm irbekrm removed the triage/needs-information Indicates an issue needs more information in order to work on it. label Feb 14, 2023
@irbekrm irbekrm added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 14, 2023

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 15, 2023
mecseid commented May 15, 2023

/remove-lifecycle stale

I ran into the same issue with DigitalOcean's DNS service, which now contains a lot of TXT records from DNS challenges.

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 15, 2023
maaft commented Jul 21, 2023

Same issue here, also with DigitalOcean (didn't try other DNS services)! It's a bit annoying.


@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2023
Evesy (Contributor, Author) commented Oct 20, 2023

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2023
@wallrj wallrj self-assigned this Nov 1, 2023
mycarrysun commented:

Are there any updates on this? We're experiencing the same behavior in 1.13.3 with the azureDNS solver, but only with delegated domains. Regular subdomains in the same DNS zone are cleaned up as normal.

D3CK3R commented Jan 19, 2024

Any update here?

smeng9 commented Jan 24, 2024

The DigitalOcean TXT records keep piling up.

After several renewals, the TXT record set grows so large that the DNS response exceeds the maximum response size, and Let's Encrypt refuses to parse it: https://community.letsencrypt.org/t/max-response-size-for-dns-01/122700/6

Is there a solution to the TXT record clean-up issue?
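A rough back-of-the-envelope on why the pile-up eventually breaks validation (the 4096-byte figure is the common EDNS0 ceiling; the per-record overhead is an assumption, not exact wire math):

```python
def approx_txt_response_size(num_records, value_len=43, overhead=16):
    """Very rough size of a DNS answer carrying num_records TXT records.

    value_len: dns-01 values are 43-byte base64url SHA-256 digests.
    overhead: approximate per-RR cost (compressed name, type, class,
    TTL, RDLENGTH, TXT length byte) -- an estimate, not exact wire format.
    """
    return num_records * (value_len + overhead)

EDNS0_LIMIT = 4096  # commonly advertised maximum UDP response size

# Roughly how many leftover records fit before responses pass the limit?
max_records = EDNS0_LIMIT // (43 + 16)
print(max_records)  # 69
```

So on the order of a few dozen uncleaned records per name is enough to start hitting the limit described in the linked thread.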

D3CK3R commented Jan 24, 2024

Any simple workaround for this? We have hundreds of records in our DNS zone.
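No official workaround appears in this thread, but one manual option is to list the zone's TXT records and delete leftover _acme-challenge entries by hand (after confirming no issuance is in flight). A hypothetical helper for spotting candidates; the (name, type) tuple shape is an assumption, and actual deletion would go through your DNS provider's API or console:

```python
def stale_challenge_records(records):
    """Return record names that look like leftover dns-01 challenge entries.

    records: iterable of (name, record_type) pairs, e.g. from a zone export.
    """
    return sorted(
        name
        for name, rtype in records
        if rtype == "TXT" and name.startswith("_acme-challenge.")
    )

# Hypothetical zone export:
zone = [
    ("_acme-challenge.app.example.com.", "TXT"),
    ("app.example.com.", "A"),
    ("_dmarc.example.com.", "TXT"),
]
print(stale_challenge_records(zone))  # ['_acme-challenge.app.example.com.']
```

Note that an _acme-challenge record may be legitimately in use mid-issuance, so check challenge status before deleting anything.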

@wallrj wallrj added this to the 1.15 milestone Feb 3, 2024
@wallrj wallrj removed their assignment Feb 3, 2024

@cert-manager-prow cert-manager-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 3, 2024
mycarrysun commented:

/remove-lifecycle stale

@cert-manager-prow cert-manager-prow bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 3, 2024
@inteon inteon modified the milestones: 1.15, 1.16 May 14, 2024
Routhinator commented:

This is a problem: the leftover records lead to rate-limiting issues, as DNS automations such as cert-manager and external-dns have to perform more and more paginated queries to list all records.

pre commented Aug 21, 2024
