Describe the bug
Our k8s cluster (emissary-ingress running on bare metal as a daemon set on 3 dedicated nodes) started to behave very strangely; we noticed this by receiving (for particular services only) a:
no_healthy_upstream
But the upstream services were OK and fully reachable from all pods (verified by manually running a curl command).
We decided to restart one of the pods, and after that it started returning the expected status, but the others still did not.
After further investigation we noticed the following in the logs:
emissary-ingress-l55kb emissary-ingress [2024-02-14 10:57:03.261][85][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:152] dns resolution for bug-service-fastapi.bug-service failed with c-ares status 12
emissary-ingress-l55kb emissary-ingress [2024-02-14 10:57:03.261][85][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:245] DNS request timed out 4 times
emissary-ingress-l55kb emissary-ingress [2024-02-14 10:57:03.261][85][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:278] dns resolution for bug-service-fastapi.bug-service completed with status 1
This gave us a suspicion that the problem might be related to Envoy DNS.
Funnily enough, when checking on each affected pod, DNS resolution for the given host works fine from the pod itself, which again leads to the conclusion that this is something inside Envoy.
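For context, c-ares status 12 in the log lines above corresponds to ARES_ETIMEOUT in the c-ares headers, which matches the "DNS request timed out 4 times" message. A minimal, illustrative sketch (the helper name and the subset of status codes are my own, taken from c-ares's `ares.h`) for decoding the status out of such an Envoy log line:

```python
import re

# Subset of c-ares status codes from ares.h (names assumed stable across versions)
ARES_STATUS = {
    0: "ARES_SUCCESS",
    4: "ARES_ENOTFOUND",
    11: "ARES_ECONNREFUSED",
    12: "ARES_ETIMEOUT",  # the "c-ares status 12" seen in the logs above
}

def decode_cares_status(log_line: str):
    """Extract and name the c-ares status code from an Envoy dns_impl.cc log line."""
    m = re.search(r"c-ares status (\d+)", log_line)
    if m is None:
        return None
    code = int(m.group(1))
    return ARES_STATUS.get(code, f"unknown ({code})")

line = ("dns resolution for bug-service-fastapi.bug-service "
        "failed with c-ares status 12")
print(decode_cares_status(line))  # -> ARES_ETIMEOUT
```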
We also noticed that when running netstat -anp | grep :53 | grep ^udp | grep ESTABLISHED on affected pods, we constantly see the same ESTABLISHED connection, similar to:
# netstat -anp | grep :53 | grep ^udp | grep ESTABLISHED
udp 0 0 10.243.192.5:34923 192.168.0.10:53 ESTABLISHED -
and when repeating that command multiple times we got the same output (same source port), whereas on working pods the source port was changing.
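The check above can be automated across repeated samples; a hedged sketch (the helper names are hypothetical, parsing the netstat column layout shown above) that flags a pod whose UDP :53 source port never changes:

```python
def source_ports(netstat_output: str):
    """Collect local source ports of ESTABLISHED UDP connections to port 53."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # netstat -anp columns: proto recv-q send-q local remote state pid/program
        if len(fields) >= 6 and fields[0] == "udp" and fields[5] == "ESTABLISHED":
            local, remote = fields[3], fields[4]
            if remote.endswith(":53"):
                ports.add(local.rsplit(":", 1)[1])
    return ports

def looks_stuck(samples):
    """True when repeated netstat samples always show the same single source port."""
    seen = set()
    for sample in samples:
        seen |= source_ports(sample)
    return len(seen) == 1

# The output observed on an affected pod, sampled three times:
affected = "udp        0      0 10.243.192.5:34923      192.168.0.10:53         ESTABLISHED -"
print(looks_stuck([affected, affected, affected]))  # -> True
```

On a healthy pod each sample would show a different ephemeral source port, so `looks_stuck` would return False.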
The issue was also visible when checking the envoy_dns_cares_timeouts metric.
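That metric check can also be scripted; a minimal sketch (assuming access to a Prometheus-format scrape of the Envoy stats, e.g. from the admin endpoint, and summing all samples of the counter):

```python
def cares_timeouts(prometheus_text: str) -> int:
    """Sum all envoy_dns_cares_timeouts counter samples in a Prometheus-format scrape."""
    total = 0
    for line in prometheus_text.splitlines():
        # Comment lines start with '#'; sample lines end with the numeric value.
        if line.startswith("envoy_dns_cares_timeouts"):
            total += int(float(line.rsplit(" ", 1)[1]))
    return total

# Hypothetical scrape excerpt:
scrape = """# TYPE envoy_dns_cares_timeouts counter
envoy_dns_cares_timeouts{} 4
"""
print(cares_timeouts(scrape))  # -> 4
```

A non-zero, growing value on a pod that should have working DNS is the symptom described here.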
Is this something known?
Unfortunately, after restarting the remaining pods, everything returned to normal.
At the moment we cannot reproduce this, but I'm afraid (per Murphy's law) it will happen again sooner rather than later.
To Reproduce
Not deterministic, unfortunately.
Expected behavior
emissary-ingress (and the underlying Envoy) should be able to perform DNS resolution and therefore return the expected status for the upstream service (instead of no_healthy_upstream).
Versions (please complete the following information):
Hi @arista-marcin, to my knowledge we've not seen anything like that, and I'm not finding any references to this error. Thanks for reporting. If you do see it again, please let us know.