STRICT_DNS drops cluster members on lookup failure #2691
@jasonmartens the history here is that when we used to use … Your other option is to enable active health checking against the endpoints. This will stabilize the endpoints, since Envoy trusts active health checking over discovery.
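Why active health checking stabilizes the endpoints can be sketched as a reconciliation rule. This is a simplified model of "trust active HC over discovery", not Envoy's implementation: a host is dropped only when it is both missing from the latest discovery (DNS) answer and failing its health check. The function name `reconcile` and its inputs are hypothetical, and any configuration that changes this policy is ignored.

```python
from typing import Dict, Set


def reconcile(current: Dict[str, bool], discovered: Set[str]) -> Set[str]:
    """Return the hosts that remain in the cluster after a discovery update.

    current:    host -> result of its latest active health check (True = passing)
    discovered: hosts named in the latest discovery answer (empty on NXDOMAIN)

    Hypothetical model: a host is removed only if it is absent from
    discovery AND failing its health check.
    """
    # Keep hosts that discovery still reports, or that are still passing HC.
    kept = {host for host, healthy in current.items() if host in discovered or healthy}
    # Newly discovered hosts are added; they have not been health checked yet.
    return kept | discovered
```

With `current = {"10.0.0.1": True, "10.0.0.2": False}`, an NXDOMAIN answer (`discovered = set()`) leaves only the healthy `10.0.0.1` in place instead of emptying the cluster.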
@junr03 fixed this recently.
Description:
We are using Envoy in a Consul environment and would like to use DNS lookups to configure our clusters. In our use case, Envoy instances in datacenters around the world need to locate a set of hosts in one datacenter. To do this, we use a Consul prepared query, which lets us do a global lookup of the set of hosts we need and query it over DNS.
However, when network lag is too great, the DNS query occasionally returns NXDOMAIN instead of the set of IPs it normally returns. With a STRICT_DNS cluster this is catastrophic: all hosts are removed from the cluster, causing downtime until the next successful DNS query.
Instead, I would like Envoy to treat the DNS results as advisory and keep using the last known set of hosts until lookups recover.
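The difference between the reported STRICT_DNS behavior and the requested "advisory" behavior can be sketched as a toy model of a DNS-backed host set. This is not Envoy's code; `DnsClusterModel`, `keep_last_known`, and `on_resolution` are hypothetical names, and a failed lookup (NXDOMAIN or resolver error) is represented as `None`.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class DnsClusterModel:
    """Hypothetical model of a DNS-backed cluster's host set.

    keep_last_known=False: every answer, including a failed one, replaces
        the host set (the STRICT_DNS behavior reported in this issue).
    keep_last_known=True: a failed lookup leaves the previous host set in
        place until a later lookup succeeds (the requested behavior).
    """

    keep_last_known: bool
    hosts: frozenset = field(default_factory=frozenset)

    def on_resolution(self, answer: Optional[Sequence[str]]) -> frozenset:
        # `answer` is None on NXDOMAIN / resolver failure, else the A records.
        if answer is None:
            if not self.keep_last_known:
                # Strict: the failure wipes out every host.
                self.hosts = frozenset()
            # Advisory: ignore the failure and keep serving the old set.
            return self.hosts
        # A successful answer always becomes the new host set.
        self.hosts = frozenset(answer)
        return self.hosts
```

For example, after resolving `["10.0.0.1", "10.0.0.2"]`, a strict model returns an empty set on the next failed lookup, while the advisory model keeps both addresses. The design choice is deliberately narrow: only failures are ignored, so a successful answer with different IPs still updates the cluster immediately.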
Workarounds
We are trying LOGICAL_DNS instead, which seems to have the DNS lookup behavior we want. However, the hosts returned by the lookup are themselves Envoy sidecars, and it would be better if the downstream Envoy could maintain connections to each upstream Envoy instance. Also, from what I can tell, LOGICAL_DNS does not use HTTP/2?
We are just getting started with Envoy, so maybe there is something obvious I'm missing. But from what I can tell, the behavior of STRICT_DNS is closer to what we want than that of LOGICAL_DNS.
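Why LOGICAL_DNS does not give one host per upstream Envoy can be sketched as follows. Envoy's documentation describes STRICT_DNS as treating every returned IP as its own host, while LOGICAL_DNS keeps a single logical host and uses only the first returned IP when opening a new connection. The names below are hypothetical, and the assumption that a failed or empty lookup leaves the current address in place is based on the reporter's observation, not on Envoy's source.

```python
from typing import List, Optional


def strict_dns_hosts(answer: List[str]) -> List[str]:
    # STRICT_DNS: each returned address becomes an explicit host in the cluster.
    return list(answer)


class LogicalDnsHost:
    """Hypothetical single logical host for a LOGICAL_DNS-style cluster."""

    def __init__(self) -> None:
        self.address: Optional[str] = None

    def on_resolution(self, answer: List[str]) -> None:
        # Assumption: an empty or failed answer keeps the current address.
        if answer:
            self.address = answer[0]

    def connect_target(self) -> Optional[str]:
        # New connections go to the first address from the latest good answer.
        return self.address
```

With the answer `["10.0.0.1", "10.0.0.2"]`, the strict model yields two hosts, while the logical host directs new connections only to `10.0.0.1`. That single logical host is why LOGICAL_DNS cannot keep a separate connection pool per upstream sidecar.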