On-demand DNS resolution #20562

@howardjohn

Title: On-demand DNS resolution

Description:
Currently, DNS clusters (LOGICAL_DNS and STRICT_DNS) constantly re-resolve DNS in the background.

We see fairly common reports of DNS servers being overloaded because a large number of these clusters is spread across many Envoy workloads that share the same DNS server (typically kube-dns).

Often, these workloads send requests to these clusters infrequently or never. However, because fine-grained configuration is operationally complex, the cluster is still present on an excessive number of Envoy instances. Even with perfect configuration, we may have a service we call once per hour but still resolve continuously.

We use the respect_dns_ttl field, so sometimes this can be fixed with a larger TTL. However, the TTL is not always configurable, and even a high TTL can still lead to thundering herd problems.

It would be ideal if we could support on-demand DNS resolution. This could be implemented in a few ways:

  1. (preferred) On the first request to a cluster, do DNS resolution and set a timer to re-resolve (based on TTL). When the timer fires, re-resolve only if there were any requests during that interval. This means that as long as the interval between requests (1/QPS) is shorter than the TTL, we are blocked on DNS only for the first request. If we have infrequent requests, we only resolve once per request.

  2. On the first request to a cluster, start periodic background DNS resolution as it works today.
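
To make the trade-off in option 1 concrete, here is a rough simulation sketch. It is not Envoy code: `simulate`, `background_lookups`, and the fixed `ttl` are hypothetical names and simplifications chosen for illustration. It also assumes that the request that triggers a blocking lookup does not itself count as "a request during the interval", which is what makes the infrequent case come out to one lookup per request.

```python
from typing import Iterable, Tuple


def simulate(request_times: Iterable[float], ttl: float) -> Tuple[int, int]:
    """Model option 1 for one cluster on one proxy (hypothetical sketch).

    Returns (dns_lookups, blocked_requests). Timer expiries after the
    last request are not counted.
    """
    lookups = 0
    blocked = 0
    deadline = None   # when the re-resolve timer fires; None = no timer armed
    used = False      # was there a request since the last resolution?

    for t in sorted(request_times):
        # Fire every timer expiry that happens up to this request.
        while deadline is not None and deadline <= t:
            if used:
                lookups += 1      # background re-resolve, nobody waits
                deadline += ttl
                used = False
            else:
                deadline = None   # idle for a full TTL: stop resolving

        if deadline is None:
            # No fresh addresses: this request waits for a lookup.
            lookups += 1
            blocked += 1
            deadline = t + ttl
            used = False          # assumption: the trigger does not count as use
        else:
            used = True

    return lookups, blocked


def background_lookups(duration: float, ttl: float) -> int:
    """Today's behavior for comparison: one initial lookup, then one per TTL."""
    return int(duration // ttl) + 1


# Frequent traffic (every 10s, TTL 30s): only the first request blocks.
frequent = simulate(range(0, 100, 10), ttl=30)

# One call per hour with TTL 30s: one lookup per request instead of one per TTL.
hourly = simulate([0, 3600, 7200], ttl=30)
always_on = background_lookups(7200, ttl=30)
```

With these inputs, the hourly caller performs 3 lookups over two hours under option 1, versus 241 with constant background resolution, at the cost of each of those requests waiting on DNS.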

If we had full on-demand CDS, this would look a lot like (2). However, I expect that will take a long time, and even once we have it, this is still likely to be useful.

Labels: area/dns, enhancement, stale
