DNS resolution timeout/failure in 1.8.9 #4260
Comments
This issue could be related to #4050.
Please also see #4257.
I have a similar issue; the fluent-bit logs show:
I get the same error with the Elasticsearch output when configuring it with the Cloud_ID and Cloud_Auth config in both Minikube and AKS. The exact error (with v1.8.10) is:
So I have two observations:
This also reproduces sporadically on 1.8.10.
I tried both the Helm chart and a manual installation, but I have the same problem. Is there any solution?
[2022/02/07 20:21:18] [ warn] [net] getaddrinfo(host='*****.westeurope.azure.elastic-cloud.com:9243', err=4): Domain name not found
Hello, I had the same issue. Regards,
As we see this issue in the latest version fluent/fluent-bit#4260
* Update fluent-bit to the latest version As the stable chart is not supported, used: https://github.com/fluent/helm-charts/blob/main/charts/fluent-bit/Chart.yaml * Pinned app version to "1.8.4" As we see this issue in the latest version fluent/fluent-bit#4260 * Add Unit tests and Documentation actions
Update fluent-bit to the latest version, as the stable chart is not supported, used: https://github.com/fluent/helm-charts/blob/main/charts/fluent-bit/Chart.yaml Pinned app version to "1.8.3", as we see this issue in the latest version fluent/fluent-bit#4260
Update fluent-bit to the latest version, as the stable chart is not supported, used: https://github.com/fluent/helm-charts/blob/main/charts/fluent-bit/Chart.yaml Pinned app version to "1.8.4", as we see this issue in the latest version fluent/fluent-bit#4260
* Upgrade logging to use latest version Update fluent-bit to the latest version, as the stable chart is not supported, used: https://github.com/fluent/helm-charts/blob/main/charts/fluent-bit/Chart.yaml Pinned app version to "1.8.4", as we see this issue in the latest version fluent/fluent-bit#4260
Can you retest with the latest 1.8 (1.8.13 currently) or 1.9.0 release? There have been various fixes around DNS.
@patrick-stephens Issue also occurs in 1.9.3.
@bensta I think you are right. When I issue a curl inside the fluent-debug container, a response is returned. If the issue were related to kube-dns or name resolution, curl should return a resolution error as well.
…t using cloud_id. Signed-off-by: 030 <chocolatey030@gmail.com>
Which fluent-bit version are they running?
@leonardo-albertovich 1.8.9, same as reported in this issue.
We are also seeing this same issue with fluent-bit v1.9.3. We are seeing repeated log messages like the following, and fluent-bit does not upload logs using the Stackdriver output plugin:
I have not yet tried
Side note: our platform is very geographically dispersed (all over the globe). Anecdotally, we are seeing these DNS issues with fluent-bit instances running on nodes that are very far away geographically from the VMs where CoreDNS is running in our clusters. We have been hypothesizing that the repeated failing DNS queries caused by
…s a port. [fluentGH-4260] Resolve domain name not found by adding code that is capable of extracting the port if it exists. If not then the default 443 will be used. Signed-off-by: 030 <chocolatey030@gmail.com>
@nkinkade I had to test stackdriver in GCE yesterday, and in order to get it to work properly there you need to add this option
@leonardo-albertovich How come not all DNS settings are documented? https://github.com/fluent/fluent-bit/blob/master/src/flb_upstream.c#L43
I think some of those settings catered to very specific corner cases and weren't meant for general usage.
@leonardo-albertovich I will bring this up next time we have some sort of community or other meeting. IMO we should not have special hidden settings that only some maintainers understand and know about. If a setting needs a warning attached to it or some caveats, sure, that makes sense, but anything that exists should be documented IMO.
@leonardo-albertovich: Thanks for the tip. Some small parts of our cluster run in GCP, but the overwhelming majority consists of globally distributed bare-metal machines. To be sure I understand the option: does it simply mean that if a DNS query returns both a v4 and a v6 address for a name, fluent-bit will always choose the v4 address over the v6 address? If so, I'm not sure how that would help us. In my previous post I said I would report back on what the
I still suspect there is some sort of bug in fluent-bit that causes a deadlock or something similar after certain network timeouts or failures.
Yes @nkinkade, that setting would cause fluent-bit to prefer IPv4 records any time both IPv4 and IPv6 records are available. It doesn't mean it will stop using IPv6 if that's the only record type available; it's just about ordering. It's meant to address a very specific issue in GCE and isn't useful outside of that environment.
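To make the "it's just about ordering" point concrete, here is a minimal Python sketch of what preferring IPv4 could look like. It is not fluent-bit's implementation (fluent-bit is written in C); the `prefer_ipv4` helper and the sample addresses are hypothetical and only illustrate the behavior described above: IPv4 records move to the front, and IPv6-only results are left usable.

```python
import socket

def prefer_ipv4(records):
    """Return getaddrinfo-style records with AF_INET entries first.

    sorted() is stable, so the relative order inside each address
    family is preserved. Nothing is dropped: if only IPv6 records
    exist, they are returned unchanged.
    """
    return sorted(records, key=lambda rec: 0 if rec[0] == socket.AF_INET else 1)

# Hypothetical resolver results, shaped like socket.getaddrinfo() output:
# (family, type, proto, canonname, sockaddr)
both = [
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 443, 0, 0)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 443)),
]
v6_only = [
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 443, 0, 0)),
]

print(prefer_ipv4(both)[0][4][0])     # 192.0.2.10
print(prefer_ipv4(v6_only)[0][4][0])  # 2001:db8::1
```

Because this only reorders candidates, it would not by itself explain or fix lookups that fail outright, which matches the doubt raised above about whether the option helps outside GCE.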
@PettitWesley I think there is no intention to hide configuration options; actually, the binary helper lists them here:
We will make sure to update the web docs with the same info. But again, there are no "special hidden settings", just settings that are undocumented on the web. It will be fixed soon.
…s a port. (fluent#5458) [fluentGH-4260] Resolve domain name not found by adding code that is capable of extracting the port if it exists. If not then the default 443 will be used. Signed-off-by: 030 <chocolatey030@gmail.com> Signed-off-by: 030 <chocolatey030@gmail.com> Signed-off-by: Manal Geries <mgeriesa@gmail.com>
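The commit message above describes extracting the port from the host string when one is present and falling back to 443 otherwise. A rough Python sketch of that idea follows. It is an illustration only, not the code from the commit (fluent-bit is written in C), and the `split_host_port` helper and the example hostname are hypothetical. It shows why a value such as the one in the `getaddrinfo(host='…:9243')` warning earlier in the thread fails to resolve until the `:9243` suffix is separated from the hostname.

```python
DEFAULT_PORT = 443  # fallback named in the commit message

def split_host_port(endpoint, default_port=DEFAULT_PORT):
    """Split 'host:port' into (host, port); use default_port if no port is given.

    Assumption: endpoint is a plain hostname with an optional ':<digits>'
    suffix. Bracketed IPv6 literals are not handled in this sketch.
    """
    host, sep, port_text = endpoint.rpartition(":")
    if sep and host and port_text.isascii() and port_text.isdigit():
        return host, int(port_text)
    # No usable port suffix: pass the string through unchanged.
    return endpoint, default_port

print(split_host_port("es.example.com:9243"))  # ('es.example.com', 9243)
print(split_host_port("es.example.com"))       # ('es.example.com', 443)
```

Only the host part should then be handed to the resolver, with the port passed separately when opening the connection.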
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the
This issue was closed because it has been stalled for 5 days with no activity.
…s a port. (fluent#5458) [fluentGH-4260] Resolve domain name not found by adding code that is capable of extracting the port if it exists. If not then the default 443 will be used. Signed-off-by: 030 <chocolatey030@gmail.com> Signed-off-by: 030 <chocolatey030@gmail.com> Signed-off-by: root <root@sumit-acs.novalocal>
Bug Report
Describe the bug
Hi, I am facing a DNS resolution timeout/failure using 1.8.9 with the forward module to a stackdriver.
To Reproduce
Your Environment
Additional context
Some fluent-bit pods eventually output logs such as "Resource temporarily unavailable" and gave up: