Can't resolve internal host name via IPv4 #1085

Closed
svenstaro opened this issue Apr 27, 2020 · 16 comments

Comments

@svenstaro

Describe the bug
Carried over from hatoo/oha#56. I have a host, internal.host (IPv4-only), that only resolves through an internal DNS server. nslookup and host resolve the name just fine; trust-dns can't. I get:

no record found for name: internal.host type: A class: IN

To Reproduce
https://github.com/hatoo/oha-56 was made by the author of oha to illustrate the problem.

Expected behavior
The host should resolve just as it does with nslookup or host.

System:

  • OS: Arch Linux
  • Architecture: x86_64
  • Version: latest
  • rustc version: 1.44.0

Version:
Crate: trust-dns-resolver
Version: 0.19.3

Additional context
See https://github.com/hatoo/oha-56 and hatoo/oha#56

@bluejekyll
Member

Can you share your /etc/resolv.conf? Feel free to remove specific IPs and names, but we need to understand the upstream resolver configuration (internal vs. external, for example). I'm guessing you have an internal DNS service.

I think what's possibly going on is that trust-dns is treating an upstream response from a public name server as authoritative for .host, which is a real top-level domain. This comes down to a bad name choice for internal names combined with trust-dns-resolver's dynamic use of the configured nameservers in any order (unlike glibc, which always queries resolvers in order).

I've been considering loosening the current use of this configuration option so that it also distrusts NXDomain and NoError responses: https://docs.rs/trust-dns-resolver/0.19.4/trust_dns_resolver/config/struct.ResolverOpts.html#structfield.distrust_nx_responses

@svenstaro
Author

Alright I'll try to share as much as I can. So:

My computer's network is managed by systemd-networkd and as the first resolver I use systemd-resolved. However, I also use dnsmasq after systemd-resolved to do some weirder stuff where I choose specific upstream DNS servers depending on the host. I'll illustrate this using actual config.

10.13.37.1 is my local network's router and my primary gateway. In this scenario, I'll try to reach enterprise.wtf, wtf being a real TLD (though obviously this is not the real host :P). enterprise.wtf resolves fine using nslookup, host, etc., but not using trust-dns.

/etc/resolv.conf
# Managed by systemd-resolved
nameserver 127.0.0.1 # This is dnsmasq
nameserver 10.13.37.1
nameserver fd00::3681:c4ff:fee4:7801

resolvectl
Global
       LLMNR setting: yes                 
MulticastDNS setting: yes                 
  DNSOverTLS setting: no                  
      DNSSEC setting: allow-downgrade     
    DNSSEC supported: no                  
  Current DNS Server: 127.0.0.1           
         DNS Servers: 127.0.0.1           
Fallback DNS Servers: 1.1.1.1             
                      9.9.9.10            
                      8.8.8.8             
                      2606:4700:4700::1111
                      2620:fe::10         
                      2001:4860:4860::8888
          DNSSEC NTA: 10.in-addr.arpa     
                      16.172.in-addr.arpa 
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa 
                      18.172.in-addr.arpa 
                      19.172.in-addr.arpa 
                      20.172.in-addr.arpa 
                      21.172.in-addr.arpa 
                      22.172.in-addr.arpa 
                      23.172.in-addr.arpa 
                      24.172.in-addr.arpa 
                      25.172.in-addr.arpa 
                      26.172.in-addr.arpa 
                      27.172.in-addr.arpa 
                      28.172.in-addr.arpa 
                      29.172.in-addr.arpa 
                      30.172.in-addr.arpa 
                      31.172.in-addr.arpa 
                      corp                
                      d.f.ip6.arpa        
                      home                
                      internal            
                      intranet            
                      lan                 
                      local               
                      private             
                      test                
Link 3 (br0)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
DefaultRoute setting: yes                      
       LLMNR setting: yes                      
MulticastDNS setting: no                       
  DNSOverTLS setting: no                       
      DNSSEC setting: allow-downgrade          
    DNSSEC supported: no                       
  Current DNS Server: 10.13.37.1               
         DNS Servers: 10.13.37.1               
                      fd00::3681:c4ff:fee4:7801

/etc/dnsmasq.conf
server=/enterprise.wtf/192.168.99.100 # Use a special DNS upstream server for domains under enterprise.wtf
server=1.1.1.1

So yes, the internal host I'm trying to reach does use an actual TLD just as you suspected. And yes, this kind of sucks on the enterprise's part. However, this is nothing I have control over and I'd like my Rust stuff to work even in this kind of situation. Obviously I can't expect any global authority to validate my internal host's validity but that's fine for my case.
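
To make the dnsmasq rule above concrete, here is a minimal sketch of per-domain upstream selection. It is not dnsmasq's implementation: the names `DomainRule`, `is_under`, and `pick_upstream` are made up for illustration, and the "first matching rule wins" policy is a simplifying assumption that is harmless with a single rule.

```rust
/// One `server=/domain/upstream` rule, modelled after the dnsmasq.conf line above.
/// (Hypothetical type for illustration only.)
struct DomainRule<'a> {
    domain: &'a str,
    upstream: &'a str,
}

/// Returns true if `name` equals `domain` or is a subdomain of it.
/// Matching is done on whole labels, so "notenterprise.wtf" does not match "enterprise.wtf".
fn is_under(name: &str, domain: &str) -> bool {
    let name = name.trim_end_matches('.').to_ascii_lowercase();
    let domain = domain.trim_end_matches('.').to_ascii_lowercase();
    name == domain || name.ends_with(&format!(".{}", domain))
}

/// Picks the upstream for `name`: the first matching rule wins (an assumption),
/// otherwise the default upstream is used.
fn pick_upstream<'a>(name: &str, rules: &[DomainRule<'a>], default: &'a str) -> &'a str {
    rules
        .iter()
        .find(|rule| is_under(name, rule.domain))
        .map(|rule| rule.upstream)
        .unwrap_or(default)
}

fn main() {
    let rules = [DomainRule {
        domain: "enterprise.wtf",
        upstream: "192.168.99.100",
    }];
    let names = ["enterprise.wtf", "db.enterprise.wtf", "google.com"];
    for name in names.iter() {
        println!("{} -> {}", name, pick_upstream(name, &rules, "1.1.1.1"));
    }
}
```

The point of the sketch is that only a resolver that goes through dnsmasq (127.0.0.1) ever reaches 192.168.99.100; a client that talks to a public resolver directly never sees the internal answer.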

@bluejekyll
Member

For reference, the only reserved DNS names are these: https://tools.ietf.org/html/rfc6761

See this thread for a good understanding of why using non-reserved DNS names is dangerous: https://serverfault.com/questions/17255/top-level-domain-domain-suffix-for-private-network

I'd highly recommend encouraging your enterprise to stop using a non-registered domain. This is a potential security vulnerability waiting to happen.

As for fixing this: if you're interested in testing it, I think it could be done by expanding this check to include NXDomain and NoError responses: https://github.com/bluejekyll/trust-dns/blob/76a3776d8840b4915390f241cda94d1f12b0df73/crates/resolver/src/name_server/name_server.rs#L141-L147

If that works, I was already considering expanding this for reasons like this.

@svenstaro
Author

I'd highly recommend encouraging your enterprise to stop using a non-registered domain. This is a potential security vulnerability waiting to happen.

Yeah... I believe I'll need a few more years of convincing before making progress on that front. The implications are bad. :)

I did some hacking but I don't really have any overview of this code base and barely have any idea what I'm doing. I did this:

                if self.options.distrust_nx_responses {
                    match response.response_code() {
                        ResponseCode::ServFail => {
                            let note = "Nameserver responded with SERVFAIL";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        ResponseCode::NXDomain => {
                            dbg!(&response.response_code());
                            let note = "Nameserver responded with NXDomain";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        ResponseCode::NoError => {
                            dbg!(&response.response_code());
                            let note = "Nameserver responded with NoError";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        _ => (),
                    }
                }

But I'm not sure I understand the result:

[/home/svenstaro/src/trust-dns/crates/resolver/src/name_server/name_server.rs:149] &response.response_code() = NXDomain
[... the same line is printed 24 times in total ...]
Error: proto error: Nameserver responded with NXDomain

Probably not how you wanted me to hack this. :)

@bluejekyll
Member

That's the change I was expecting :)

That's probably correct: it will now treat these responses as "failures" rather than, more accurately, as "NXDomain", which is technically a successful response. That triggers the logic to continue trying to resolve the name.

I see that you're still getting an error, though. Is it not finding the name you're looking for?

@svenstaro
Author

Indeed, it still doesn't appear to resolve it. The code is the same as linked above: https://github.com/hatoo/oha-56

I put in google.com just for shits and that outputs the IP just fine.

drill is perfectly happy with my host:

drill enterprise.wtf
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 19998
;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
;; enterprise.wtf.	IN	A

;; ANSWER SECTION:
enterprise.wtf.	297	IN	CNAME	enterprise2.wtf.
enterprise2.wtf.	297	IN	A	10.1.241.11

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 3 msec
;; SERVER: 127.0.0.1
;; WHEN: Mon Apr 27 21:03:40 2020
;; MSG SIZE  rcvd: 91

@bluejekyll
Member

Oh, sorry, I realize I misled you on the "NoError" state. That arm has to incorporate one more check: ResponseCode::NoError if response.answers().is_empty() => ...

I bet if you run cargo test on your current changes, a lot would be broken :)
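
As a rough illustration of the idea discussed here (treat negative responses as failures so the lookup moves on to the next name server), the following self-contained sketch models the decision with stand-in types. `Rcode`, `Response`, `is_distrusted`, and `resolve` are hypothetical names and are not part of trust-dns; the real code lives in name_server.rs and works on `ResponseCode`.

```rust
/// Minimal stand-ins for DNS response codes; not the trust-dns `ResponseCode` type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rcode {
    NoError,
    ServFail,
    NxDomain,
}

/// A canned response from one (hypothetical) name server.
struct Response {
    rcode: Rcode,
    answers: Vec<&'static str>,
}

/// Returns true when the response should be treated as a failure,
/// so that the caller moves on to the next name server.
fn is_distrusted(response: &Response) -> bool {
    match response.rcode {
        Rcode::ServFail | Rcode::NxDomain => true,
        // NoError with answers is a real result; only an empty NoError is suspect.
        Rcode::NoError => response.answers.is_empty(),
    }
}

/// Walks the responses in order and returns the answers of the first trusted one.
fn resolve(responses: &[Response]) -> Option<&[&'static str]> {
    responses
        .iter()
        .find(|r| !is_distrusted(r))
        .map(|r| r.answers.as_slice())
}

fn main() {
    let responses = vec![
        // A public resolver that knows nothing about the internal name.
        Response { rcode: Rcode::NxDomain, answers: vec![] },
        // A server that answers NoError but with no records.
        Response { rcode: Rcode::NoError, answers: vec![] },
        // The internal resolver with the actual record.
        Response { rcode: Rcode::NoError, answers: vec!["10.1.241.11"] },
    ];
    println!("{:?}", resolve(&responses));
}
```

Note that if every configured server returns a negative response, the lookup still ends in an error, which matches a run where only NXDomain is ever logged.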

@svenstaro
Author

svenstaro commented Apr 27, 2020

True, without that extra check a lot of tests are broken, and they pass once I add the condition. Sadly, it doesn't change anything in my particular case.

Just to confirm, my code is now

                    match response.response_code() {
                        ResponseCode::ServFail => {
                            let note = "Nameserver responded with SERVFAIL";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        ResponseCode::NXDomain => {
                            dbg!(&response.response_code());
                            let note = "Nameserver responded with NXDomain";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        ResponseCode::NoError if response.answers().is_empty() => {
                            dbg!(&response.response_code());
                            let note = "Nameserver responded with NoError";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        _ => (),
                    }

and the output is the same as in #1085 (comment).

Judging by the output, my case never hits that match arm anyway.

@bluejekyll
Member

Hm, that would imply we're never hitting your internal DNS server. Can you try setting num_concurrent_reqs to 1 (https://docs.rs/trust-dns-resolver/0.19.4/trust_dns_resolver/config/struct.ResolverOpts.html#structfield.num_concurrent_reqs)? I'm wondering if you're running into another existing bug: #933

@svenstaro
Author

I now have

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let resolver = trust_dns_resolver::AsyncResolver::tokio(
        Default::default(),
        trust_dns_resolver::config::ResolverOpts {
            ip_strategy: trust_dns_resolver::config::LookupIpStrategy::Ipv4Only, // oha --ipv4
            num_concurrent_reqs: 1,
            ..Default::default()
        },
    )
    .await?;

    let addrs = resolver
        .lookup_ip("enterprise.wtf") // put hostname here
        .await?
        .iter()
        .collect::<Vec<_>>();

    dbg!(addrs);

    Ok(())
}

and that didn't seem to change the behavior at all.

@bluejekyll
Member

We need to clean up these APIs... ugh.

So the issue with your new configuration is that the default ResolverConfig only uses the public Google DNS resolvers. You'll want to use this function to get your system's config: https://docs.rs/trust-dns-resolver/0.19.4/trust_dns_resolver/system_conf/fn.read_system_conf.html

That function should be easier to find. Once you have the system config, you can change any of the ResolverOpts as necessary, starting from the opts read from resolv.conf. At that point your internal DNS resolvers should be included in the lookup list.
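
To show what "reading the system config" buys you here, this is a small standard-library-only sketch that pulls the nameserver entries out of resolv.conf-style text. It is not how trust-dns parses the file (use read_system_conf for that); `parse_nameservers` is a hypothetical helper, and ignoring everything after the address on a line is an assumption made so that annotated lines still parse.

```rust
use std::net::IpAddr;

/// Extracts the `nameserver` addresses from resolv.conf-style text.
/// Only the first token after the keyword is parsed; anything after it is ignored.
/// Lines that do not start with `nameserver`, or whose address fails to parse, are skipped.
fn parse_nameservers(conf: &str) -> Vec<IpAddr> {
    conf.lines()
        .filter_map(|line| {
            let mut tokens = line.split_whitespace();
            match tokens.next() {
                Some("nameserver") => tokens.next()?.parse::<IpAddr>().ok(),
                _ => None,
            }
        })
        .collect()
}

fn main() {
    // Sample input modelled on the resolv.conf posted earlier in this thread.
    let conf = "# Managed by systemd-resolved\n\
                nameserver 127.0.0.1\n\
                nameserver 10.13.37.1\n\
                nameserver fd00::3681:c4ff:fee4:7801\n";
    for ip in parse_nameservers(conf) {
        println!("{}", ip);
    }
}
```

With a system-derived list like this, the local dnsmasq instance and the router are the servers being queried, rather than the public defaults that know nothing about internal names.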

@svenstaro
Author

Oh gee, it turns out that we simply used that wrong then. Now when using read_system_conf(), everything just works without code changes in trust_dns_resolver!

However, I do run into #933 if I don't limit the concurrent requests to 1.

@svenstaro
Author

All in all, I'm happy we quickly found the problem thanks to your help. It resulted in hatoo/oha#59 so at least we made the world a slightly better place.

Keep rocking.

@bluejekyll
Member

That's great news. Is the concurrent lookup issue a bug with the code that you were experimenting with before? (the changes to error detection you made to name_server.rs)

@svenstaro
Author

I reverted my changes, found that I hit #933, added the workaround, and the spurious problems went away. I'm not sure what else I should be checking.

@bluejekyll
Member

bluejekyll commented Apr 27, 2020

I was wondering if you could help debug #933, if you have time. I'm wondering whether this change:

                    match response.response_code() {
                        ResponseCode::ServFail => {
                            let note = "Nameserver responded with SERVFAIL";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        ResponseCode::NXDomain => {
                            dbg!(&response.response_code());
                            let note = "Nameserver responded with NXDomain";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        ResponseCode::NoError if response.answers().is_empty() => {
                            dbg!(&response.response_code());
                            let note = "Nameserver responded with NoError";
                            debug!("{}", note);
                            return Err(ProtoError::from(note));
                        }
                        _ => (),
                    }

fixes that. If you have some time to check with the concurrency set to something more than 1, it would help debug that other issue.
