Solutions for split-horizon and restricted DNS environment issues #903
Comments
I believe there's a third option, which is not as nice, but may provide an escape hatch for users. From https://github.com/ietf-wg-acme/acme/blob/master/draft-ietf-acme-acme.md#retrying-challenges: I think this leaves open the possibility that we can allow disabling the self check entirely and rely on a user-specified timeout instead. This essentially hopes the user will configure things correctly to avoid using up their rate limit, so such an option should come with a big warning. However, it would be simple to implement and would let people bypass this issue if their local DNS environment is the problem (e.g. the user can only use internal resolvers, and has no resolver that will actually see the real state, only the internal zone). That said, I think the second option is better, but I don't know that it's viable in all environments.
https://tools.ietf.org/html/rfc2119 The RFC does support this, and describes this case quite well. Given we now have a 'back-off' on re-issuance attempts (currently hardcoded at every 5 minutes), we could consider implementing this. Users would still hit the quota for attempts quite quickly in failure cases. With #809 we could extend this backoff behaviour further as well.
For Route 53, the GetChange API call helps validate that the record is resolvable. https://docs.aws.amazon.com/sdk-for-go/api/service/route53/#Route53.GetChange This is already done at https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/dns/route53/route53.go#L181-L193 Hence I think it is safe to assume that when R53 says That covers the Challenge validation. However, cert-manager also checks that the hostname requesting a cert is resolvable, but afaik the DNS01 challenge doesn't require this. Why is this check implemented in cert-manager? It makes split DNS even harder. I think solution
In my case, I have a dnsmasq server just for cert-manager; it only forwards queries for And this works like a charm if
Additional info to ticket
We are also facing this issue, and @mikebryant's suggestion of a backoff policy seems very reasonable. Split DNS resolves public and internal IPs for the same domains, depending on the origin. Worse yet, we cannot simply work around this using
Hey guys, any update on this? Is removing the check a valid option here? I could submit a PR. In my case I do the validation using the It works great.
Even in 0.5.2, the problem still seems to occur.
Hey guys, we have just run into this issue: our environment has no internet access and cannot reach the NS of the zone to check propagation. It's understandable why this would be the desired behaviour, as it removes the chance that negative caching comes into effect. For our purposes we understand this possibility, and we ran a quick test with modified code that uses the dns01-self-check-nameservers as the authoritative name servers instead. It worked as expected and allowed the process to complete. I would propose a new flag, --dns01-check-authoritative, that defaults to true so we can toggle the behaviour. Happy to put the PR together as we have most of the work already done.
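To illustrate the proposal above, here is a rough sketch of how such a toggle might select which nameservers the propagation check queries. The flag name and its default of true come from the comment; `parseOptions`, `propagationTargets` and the example addresses are hypothetical and are not cert-manager code.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the single proposed setting (hypothetical type).
type options struct {
	checkAuthoritative bool
}

// parseOptions parses the proposed flag from args; it defaults to true.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.BoolVar(&o.checkAuthoritative, "dns01-check-authoritative", true,
		"query the zone's authoritative nameservers during the self check")
	err := fs.Parse(args)
	return o, err
}

// propagationTargets picks the servers to query: the zone's authoritative
// nameservers when the flag is true, otherwise the configured resolvers.
func propagationTargets(o options, authoritative, configured []string) []string {
	if o.checkAuthoritative {
		return authoritative
	}
	return configured
}

func main() {
	o, err := parseOptions([]string{"--dns01-check-authoritative=false"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	// Example addresses only.
	fmt.Println(propagationTargets(o,
		[]string{"ns1.example.com:53"}, []string{"10.0.0.2:53"}))
}
```

Defaulting to true keeps the existing behaviour for everyone who does not opt out, which matches the intent stated in the comment.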
Happy to report that the new 0.6.0 works perfectly as a workaround for this issue. Using Many thanks to the community.
For anyone running split-horizon DNS who finds this like I did, the config changes end up looking like this, despite the docs specifying otherwise:
- --dns01-recursive-nameservers="1.1.1.1:53"
- --dns01-recursive-nameservers="1.0.0.1:53"
- --dns01-recursive-nameservers-only
An [issue][0] describes a problem I was experiencing where the domain I was trying to update is overridden by my local DNS settings (split DNS). This change makes it so that, when performing a DNS01 challenge, `cert-manager` will use a public DNS server instead of the local one.
[0]: cert-manager/cert-manager#903
The [docs][0] don't seem to be correct. A [comment on an issue][1] might be more correct.
[0]: https://cert-manager.io/docs/configuration/acme/dns01/#setting-nameservers-for-dns01-self-check
[1]: cert-manager/cert-manager#903 (comment)
I have seen a lot of issues regarding the DNS01 self check come up.
There seem to be two categories of problem, and this issue tries to encapsulate them in order to come up with a solution that helps resolve both.
So far, the two big problems I see:
Restricted DNS environments
Here, a user has configured their cluster/VPC so that all outbound traffic on port 53 is denied, except for the one cluster DNS server (i.e. kube-dns, or their route53 resolver).
This consequently means the DNS01 self-check will time out, because cert-manager will recurse up the DNS records to find the authoritative nameserver, and query that.
We attempt to query the authoritative nameserver so we can be sure the record has been updated at the root - this is how Let's Encrypt performs its own validations, and as such is the most concrete way to verify a validation will succeed.
#877
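The "recurse up the DNS records" step described above can be sketched roughly as follows. This is a simplified illustration, not cert-manager's implementation: `nsLookup`, `candidateZones` and `findAuthority` are hypothetical names, and the lookup is injected so the walk can be shown without network access. In a restricted environment it is this kind of lookup, and the follow-up query to the authority, that gets blocked.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// nsLookup abstracts an NS query for a zone (hypothetical type). A real
// version would send the query to a DNS server.
type nsLookup func(zone string) ([]string, error)

// candidateZones lists fqdn and each of its parent domains, most specific
// first, each with a trailing dot.
func candidateZones(fqdn string) []string {
	labels := strings.Split(strings.TrimSuffix(fqdn, "."), ".")
	zones := make([]string, 0, len(labels))
	for i := range labels {
		zones = append(zones, strings.Join(labels[i:], ".")+".")
	}
	return zones
}

// findAuthority walks up from fqdn and returns the first zone for which
// the lookup yields nameservers, together with those nameservers.
func findAuthority(fqdn string, lookup nsLookup) (string, []string, error) {
	for _, zone := range candidateZones(fqdn) {
		if ns, err := lookup(zone); err == nil && len(ns) > 0 {
			return zone, ns, nil
		}
	}
	return "", nil, errors.New("no authoritative nameservers found for " + fqdn)
}

func main() {
	// Fake DNS data standing in for real queries.
	fake := map[string][]string{"example.com.": {"ns1.example.com."}}
	lookup := func(zone string) ([]string, error) { return fake[zone], nil }

	zone, ns, err := findAuthority("_acme-challenge.www.example.com.", lookup)
	fmt.Println(zone, ns, err)
}
```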
Split-horizon DNS environments
Here, a user has a public and private zone configured in Route53 (or similar) for the same domain. This is typically done to allow applications in the cluster to resolve hostnames to internal endpoints instead of using the regular 'user facing' endpoints.
This creates issues for cert-manager because when we perform a DNS query to find the DNS authority, the internal nameserver will respond with the private DNS zone root, consequently failing the self-check (as cert-manager will have updated the public zone).
This is mitigated by allowing users to specify the `--dns01-self-check-nameservers` flag, which will alter the 'root' nameserver used to perform the initial query - the idea here being that by specifying e.g. 8.8.8.8, they will begin recursing the public zone to find the authority that Let's Encrypt will see.
In a worse class of this problem, cert-manager may actually update the private DNS zone without being aware it is internal only, and consequently pass the self check despite the record not being publicly available, eventually resulting in the Let's Encrypt quota/rate limit being used up.
In order to mitigate this, cert-manager also allows specifying a `hostedZoneID` in some DNS provider configurations (e.g. route53), which allows the user to override the automatic hosted zone selection logic. This works well, but then requires a DNS provider configuration per DNS zone, which is awkward and not how we designed the API to be used.
Both of the above combined
Some users experience both of these issues together.
cert-manager will update the public DNS zone (assuming they use the hostedZoneID field in Route53's case), but they'll only be able to query the local DNS server due to network policies.
In order to mitigate this, a user would need to set up a separate DNS server that uses a public resolver upstream and point cert-manager at that using the `--dns01-self-check-nameservers` flag.
#894
Possible solutions
Allow forcing all DNS queries to go through a single resolver - this solution kind of works, but will fall down in the face of DNS caches/TTLs. It would not be possible to point all traffic at the authoritative nameserver, as that would restrict users to only being able to obtain certificates for domains hosted within that zone.
Clearly document, or provide assistance in configuring a DNS resolver that will handle this for us. Users in restricted network environments could then allow just that DNS resolver access to the wider internet (or simply 8.8.8.8 et al).
???
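As a rough illustration of the single-resolver option above, the Go standard library lets a `net.Resolver` send every query to one fixed server by overriding its `Dial` function. The helper names `withDefaultPort` and `newResolver`, and the 5 second timeout, are assumptions made for this sketch; it is not how cert-manager performs its self check.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// withDefaultPort appends the DNS port 53 when addr has no port.
func withDefaultPort(addr string) string {
	if _, _, err := net.SplitHostPort(addr); err == nil {
		return addr
	}
	return net.JoinHostPort(addr, "53")
}

// newResolver returns a resolver that ignores the system DNS configuration
// and dials the given server for every query.
func newResolver(server string) *net.Resolver {
	target := withDefaultPort(server)
	return &net.Resolver{
		PreferGo: true, // use Go's resolver so the Dial override applies
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, target)
		},
	}
}

func main() {
	r := newResolver("8.8.8.8")
	fmt.Println("resolver pinned to", withDefaultPort("8.8.8.8"), "PreferGo:", r.PreferGo)
	// A TXT lookup through this resolver would look like:
	//   r.LookupTXT(ctx, "_acme-challenge.example.com")
	// It is left out here because it needs network access.
}
```

Because the answers come from a recursive resolver, they are still subject to its cache and the record TTLs, which is the weakness noted in the first option.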
It seems the majority of users experiencing this are on AWS, so we've not heard many complaints about the lack of a `hostedZoneID` field in other providers.
The field was actually not meant to be a part of cert-manager, but I didn't notice its addition until after we began shipping releases. Ideally, I'd like to remove the field before we reach 1.0.
In order to do so, we need to work out how to either auto-detect public/private zones for all DNS providers, or otherwise add some dns-provider specific configuration to our Certificate resource.
Related #783
/area provider-acme