Title: On-demand DNS resolution
Description:
Currently, DNS clusters (LOGICAL_DNS and STRICT_DNS) constantly re-resolve DNS in the background.
We fairly commonly see reports of DNS servers being overloaded because the number of these clusters explodes across many Envoy workloads that share the same DNS server (typically kube-dns).
Often, these workloads send requests to these clusters only infrequently, or never. However, because fine-grained configuration is operationally complex, the cluster is still present on an excessive number of Envoy instances. Even with perfect configuration, we may have a service that we call once per hour but must still re-resolve repeatedly.
We use the respect_ttl field, so sometimes this can be mitigated with a larger TTL. However, the TTL is not always configurable, and even a high TTL can still lead to thundering-herd problems.
It would be ideal if we could support on-demand DNS resolution. This could be implemented in a few ways:
1. (preferred) On the first request to a cluster, perform DNS resolution and set a timer to re-resolve based on the TTL. When the timer fires, re-resolve only if there were any requests during that interval. This means that as long as 1/QPS < TTL (the gap between requests is shorter than the TTL), only the first request blocks on DNS. If requests are infrequent, we resolve once per request.
2. On the first request to a cluster, start DNS resolution as normal (continuous background re-resolution from then on).
If we had full on-demand CDS, this would look a lot like (2). However, I expect that to take a long time, and even once we have it, this is still likely to be useful.
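To make the preferred approach's behaviour concrete, here is a minimal, single-threaded Python sketch of the policy. It is not Envoy code: `OnDemandDnsCluster`, `resolve`, `advance_to`, `on_request`, `simulate` and the hostname `svc.example` are all hypothetical. Time is passed in explicitly instead of using a real timer, resolution is treated as instantaneous, and it assumes that after an idle TTL window the cached addresses simply expire, so the next request blocks on a fresh lookup. It counts every DNS query and the ones a request had to wait for.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OnDemandDnsCluster:
    """Hypothetical single-threaded model of on-demand DNS for one cluster.

    Time (in seconds) is passed in explicitly so the policy can be simulated
    without real timers or a real DNS server; resolution is instantaneous.
    """

    resolve: Callable[[str], List[str]]  # hypothetical resolver hook
    hostname: str
    ttl: float
    addresses: Optional[List[str]] = None  # cached result, None when expired
    next_refresh: Optional[float] = None  # when the re-resolve timer fires
    requested_since_refresh: bool = False
    lookups: int = 0  # every DNS query issued
    blocking_lookups: int = 0  # queries a request had to wait for

    def _do_resolve(self, now: float) -> None:
        # Issue one query, cache the result, and arm the TTL timer.
        self.addresses = self.resolve(self.hostname)
        self.lookups += 1
        self.next_refresh = now + self.ttl
        self.requested_since_refresh = False

    def advance_to(self, now: float) -> None:
        # Fire every timer that is due at or before `now`, in order.
        while self.next_refresh is not None and self.next_refresh <= now:
            fired_at = self.next_refresh
            if self.requested_since_refresh:
                # Traffic arrived during the last TTL window: refresh in the background.
                self._do_resolve(fired_at)
            else:
                # Idle window (assumption): let the entry expire and stop re-resolving.
                self.addresses = None
                self.next_refresh = None

    def on_request(self, now: float) -> List[str]:
        self.advance_to(now)
        if self.addresses is None:
            # First request, or first request after an idle window: block on DNS.
            self.blocking_lookups += 1
            self._do_resolve(now)
        else:
            # Served from cache; remember there was traffic in this window.
            self.requested_since_refresh = True
        return self.addresses


def simulate(request_interval: float, ttl: float, num_requests: int) -> OnDemandDnsCluster:
    # Send evenly spaced requests, i.e. 1/QPS == request_interval.
    cluster = OnDemandDnsCluster(
        resolve=lambda host: ["10.0.0.1"], hostname="svc.example", ttl=ttl
    )
    for i in range(num_requests):
        cluster.on_request(i * request_interval)
    return cluster


if __name__ == "__main__":
    fast = simulate(request_interval=1.0, ttl=30.0, num_requests=100)
    slow = simulate(request_interval=3600.0, ttl=30.0, num_requests=3)
    print(fast.lookups, fast.blocking_lookups)  # 4 1
    print(slow.lookups, slow.blocking_lookups)  # 3 3
```

With one request per second and a 30-second TTL (1/QPS < TTL), only the very first request blocks, and the cluster keeps refreshing every 30 seconds while traffic continues. With one request per hour, every request pays for its own blocking lookup, but no queries are issued in between, which is the load reduction the preferred approach is after.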