New ASYNC net.dns.resolver fails with getaddrinfo(err=12): Timeout while contacting DNS servers with Elasticsearch shipper #7105

Closed
gavenkoa opened this issue Apr 2, 2023 · 6 comments


gavenkoa commented Apr 2, 2023

Reproduced in v1.8.11 & 1.9.7 with Elasticsearch plugin & net.dns.resolver ASYNC:

[ warn] [net] getaddrinfo(host='evil.com', err=12): Timeout while contacting DNS servers
[OUTPUT]
    # https://docs.fluentbit.io/manual/pipeline/outputs/elasticsearch
    Name            es
    Match           app
    Host            ${es_host}
    Port            9200
    HTTP_User       ${es_user}
    HTTP_Passwd     ${es_pass}
    Retry_Limit     False

    Logstash_Format  On
    Logstash_Prefix  app
    Logstash_DateFormat %Y-%m-%d
    Current_Time_Index Off

    Include_Tag_Key Off
    Generate_ID     Off
    Trace_Output    On
    Trace_Error     On

    tls on
    tls.verify off

    # TCP or UDP
    net.dns.mode UDP
    # true / false
    # net.dns.prefer_ipv4 true
    # LEGACY or ASYNC
    net.dns.resolver ASYNC

Changing net.dns.resolver to LEGACY resolved the problem in both versions. net.dns.resolver is documented only in v1.9:

https://docs.fluentbit.io/manual/v/1.9-pre/administration/networking
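
For reference, the workaround described above amounts to a one-line change in the output section. This is a trimmed sketch of the configuration from this report, not a verified fix; the `${es_host}` variable is the reporter's placeholder, and the other options are left out only for brevity.

```text
[OUTPUT]
    Name              es
    Match             app
    Host              ${es_host}
    Port              9200
    # Workaround reported in this issue: switch from ASYNC to LEGACY
    net.dns.resolver  LEGACY
```

This only selects a different resolver; it does not explain why the ASYNC resolver times out.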

Similar reports:


gavenkoa commented Apr 2, 2023

OS: Windows Server 2012 R2, 64-bit. Setting net.dns.mode to TCP also fails, because the corporate environment forbids it.

@patrick-stephens
Contributor

Do you know if the 2.0.10 releases work correctly?
There will be no further official 1.8 or 1.9 releases, hence the question: we need to know whether this is an issue to resolve for the 2.0 and upcoming 2.1 series.


wtchangdm commented Apr 19, 2023

I have the same problem and have seen it on versions 2.0.10, 2.0.11 and 2.1.0. I am using the loki output, though.

Some info:

  1. Haven't explicitly set anything for net.dns.resolver.
  2. We are running daemonsets (with 2.0.11) on an EKS cluster.
  3. The cluster has NodeLocal DNSCache. The DNS resolution time shouldn't be an issue.
[2023/04/19 05:15:00] [ warn] [net] getaddrinfo(host='<REDACTED>.', err=12): Timeout while contacting DNS servers
[2023/04/19 05:15:00] [error] [output:loki:loki.0] no upstream connections available
[2023/04/19 05:15:00] [ warn] [net] getaddrinfo(host='<REDACTED>.', err=12): Timeout while contacting DNS servers
[2023/04/19 05:15:00] [error] [output:loki:loki.0] no upstream connections available
[2023/04/19 05:15:00] [error] [upstream] connection #-1 to tcp://unavailable:0 timed out after 10 seconds (connection timeout)
...

The output looks like the following:

[Output]
        Name loki
        Match *
        host    <REDACTED>
        port    443
        tls     On
        workers 1
        labels service=$kube_namesapce

Will try net.dns.resolver LEGACY and report here if it helps.

@wtchangdm

Almost 24 hours later, the DNS error seems to be gone after setting net.dns.resolver to LEGACY with version 2.1.0.
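
As a sketch of what that change looks like for the loki output posted earlier in this thread (the host stays redacted as in the original, and the remaining options are omitted for brevity), the resolver is set explicitly where it was previously left unset:

```text
[Output]
        Name loki
        Match *
        host    <REDACTED>
        port    443
        tls     On
        # Set explicitly; this option was not present in the original config
        net.dns.resolver LEGACY
```

This reflects roughly 24 hours of observation on a single cluster, so treat it as a workaround that has held up so far, not a confirmed root-cause fix.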

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Jul 20, 2023
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as not planned Jul 25, 2023