New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove hardcoded DNS servers #56
Conversation
|
Hi @fenichelar, It seems you haven't yet signed a CLA. Please do so here. Once you do that we will be able to review and accept this pull request. Thanks! |
|
The logic is consistent with the way most systems work:
As an additional fallback, if no DNS servers are manually configured or configured via DHCP, it will use Cloudflare's DNS servers. Most systems don't have functionality like this so DNS requests would just fail, but this may help reduce user issues. |
|
I do not think it is necessary, or even correct, to remove the Docker Embedded DNS server (dns://127.0.0.11). As far as I interpret the corefile template, it does not treat the Docker Embedded DNS server as a fallback server, but it ensures that either a DHCP supplied DNS server is used or a Docker supplied DNS server. |
|
@gjdoornink I updated the logic. Expanded for readability: Thoughts? |
|
@fenichelar |
|
@gjdoornink How? |
|
The hosts in the forward statement form a pool of load balanced DNS resolvers. How the requests are distributed to the resolvers in the pool is determined by the policy. But regardless of which policy is used, requests for local host names will eventually be sent to any of the resolvers in the pool. This means that a certain amount of requests for local host names will inevitably be sent to one of the cloudflare servers and those requests will fail. In pull request #55 I aimed to fix a single use case since I did not feel comfortable overhauling the entire structure without first discussing the feasibility of such a change with the original authors. But since you are suggesting a more encompassing change ... :-) Original corefile, without manually configured DNS servers, but with DHCP assigned DNS server: Your corefile, without manually configured DNS servers and without DHCP assigned DNS server: All requests to dns://127.0.0.1:5553 are forwarded to cloudflare servers which cannot resolve hosts in the local network/docker infrastructure. |
|
@gjdoornink I'm not following what the issue is.
Yes. This is the same way it behaves today and the same way it behaves with your PR. If you don't provide DNS servers to Home Assistant that are capable of resolving local domain names, then it won't be able to resolve local domain names. There is nothing we can do about this. |
|
@fenichelar |
If there are DNS servers that can resolve local domain names, then I agree. But if no DNS servers that can resolve local domain names are known to Home Assistant, then the forward line will contain DNS servers that can't resolve local domains. The only alternative would be to not have any forwarding configured. Maybe the best way to communicate would be to just let me know what you are thinking for the forwarding line. I'm open to suggestions for my PR. I like your PR but it doesn't solve the problem I am having (I don't want DNS requests to be sent to Cloudflare under any circumstances because the requests are blocked so it just causes noisy logs and slow DNS failures). The issue I am having has been reported by many others as well. So I am trying to solve both of our problems with this PR How would you change the above? |
|
It can indeed be difficult to communicate across an ocean and a language barrier :-) My statement is that dns://127.0.0.11:53 is also a local resolver just as the DHCP assigned DNS server. Your pull request tries to resolve a different issue than mine, that is clear. If I were in your shoes, I would reinstate the following line: But I have not tested this, of course :-) Kind regards, |
The above line is what is causing my issue. If my DNS server returns any of these errors, then I do not want Home Assistant to try Cloudflare's DNS server. The errors are intentionally returned in many cases (DNSSec failure, domain blocking, domain not found, etc.) Thoughts on this? |
|
@fenichelar |
|
@gjdoornink What is wrong with the corefile I provided? I feel like we are overcomplicating this. Think about how DNS works in ever other device you have (computers, phones, streaming boxes, game consoles, camera, etc.). If DNS servers are manually configured, those DNS servers are used and no other servers. If DNS servers are provided via DHCP, those DNS servers are used and no other servers. Failover is handled by providing multiple DNS servers manually or via DHCP. In every other device I have, if the configured DNS server returns And this has significant drawbacks: From my Mac: From Home Assistant: Both my Mac and Home Assistant are connected to the same network, same switch, same DNS server provided via DHCP. Yet Home Assistant took 16 times as long to determine the domain doesn't exist. And this is with Cloudflare not being blocked by my firewall. And here is an example for DNSSec failure: Mac: Home Assistant: Home Assistant was 140 times slower. I only left Cloudflare in the corefile because I wanted to change as little as possible and solve the issue. But I really don't think any DNS server should be hardcoded like that. What other device has hardcoded DNS servers? Personally, I think the config should be: Which makes Home Assistant behave just like every other device does. I don't understand why we need hardcoded DNS servers. |
|
what would happen if : Requests would go to the Docker embedded server ( dns://127.0.0.11:53 ), right ? I think this might be the reason why the core devs added cloudflare servers. Any solution to this dns mess shouldn't remove the ultimate fallback, even though in any half-decent network setup this shouldn't ever happen. |
Yes
I'm not sure but the way it currently works if no DNS servers are configured manually or with DHCP is that requests are sent to the Docker Embedded DNS server and then Cloudflare. I just removed Cloudflare but kept the rest of the logic essentially the same.
Can you explain why you think Home Assistant should have a failover DNS server if one is not provided manually or through DHCP? I think DNS requests should fail in this scenario. Does you computer, phone, streaming box, game console, printer, or any other network device have a failover DNS server? Mine don't. Home Assistant is the only device in my network that has a hard coded failover DNS server as far as I know. I truly don't understand why Home Assistant should be any different than every other device. I know making Home Assistant "just work" is an important goal, but Apple has the same goal with their products and neither Macs nor iOS devices have a hard coded failover DNS servers. If DNS doesn't work in your network, none of your devices are going to work. I think Home Assistant not working in this scenario is not only acceptable, it is appropriate. That being said, I am certainly happy to contribute on a PR that makes failing over to Cloudflare's DNS servers optional. But I am not sure I am familiar enough with supervisor (where the config option would be managed) to try and tackle this on my own. |
It shouldn't have a failover. I fully agree with your reasoning. My only concern is that this PR might get refused/ignored because it is contrary to what core devs have decided.
Couldn't your logic/PR be extended to add the failover into the corefile ? |
|
For me, Core DNS was always a solution looking for a problem, DNS has worked the same way for decades and does not need 'fixing'' HA should use DNS servers configured by DHCP or the user directly. If none are set, or the ones that are set fail, then HA should just fail. Adding a fallback is just moving a network problem elsewhere and making it more difficult to trace and resolve. Personally I think hard coded DNS of any kind is a breach of the users privacy and trust and should be removed. If the core devs believe that there should be a fallback (as I believe they do, having had four requests to change this over the last 12 months rejected), then: Firstly it should just use plain port 53 unencrypted DNS, not DoT. This is because port 53 DNS requests can easily be redirected to a local/alternate DNS server at the network firewall, this cannot be done with DoT requests as the SSL cert of a local DoT server will not match the requested domain. Secondly, the fallback server should only be contacted on SERVFAIL, because NXDOMAIN and REFUSED are not errors. Thirdly, currently if the fallback is blocked, a cascading failure mode occurs. HA gets stuck in a loop sending millions of DNS request to the fallback to the point where all other components become unresponsive. (the fallback for the fallback is the fallback). Clearly if the fallback fails, it's should not be contacted again. |
|
perhaps trying a parallel PR which does nothing more than removing NXDOMAIN and REFUSED from the corefile ? |
Maybe they do, maybe they don't. At least some of the requests were hostile and just turned into unproductive name calling. I also haven't seen any PRs. Submitting an issue asking a maintainer to do something (particularly in an aggressive tone) and submitting a PR are very different. I'm not trying to point fingers at anyone, I am just saying that I think a productive conversation with the maintainers should be had about this PR before we come to the conclusion that this PR or a variation of it will not be accepted
Below is an incomplete list of potential paths forward:
Hopefully one of the maintainers can response providing some directions on how they want to proceed. I'm optimistic we can figure this out |
|
In 25y of IT experience as a system and network administrator, I never found a server software, or an enterprise sw, or an automation sw in this case, that extends the network configuration of the user/environment in which it runs, by auto-provisioning such an important thing as a FALLBACK DNS server. It's a bad design and practice. If the environment is not properly configured, it is CORRECT that the sw doesn't work, it should simply throw an error. Network configuration should never be decided by an app, it is up to the user/admin to make those choices. If they don't have the skills to decide, it means they're using standard DHCP, and that's it. Can you simply imagine a WEB SERVER "fallbacking" to a fixed external DNS server? Gosh...it would be considered a serious security violation. I really can't understand what was the idea behind the fallback enry in that configuration. If some dev could transparently explain what was the logic behind that decision, we could maybe understand better what most people consider a bad implementation. If the network environment is not properly configured, it is absolutely correct that HA doesn't work. HA should throw an error when it has DNS issues, like properly designed apps/frameworks/enterprise sw do, that's it. Furthermore, those benchmarks I read about DNS performance in HA would explain a lot of timeout/connection errors we see in HA logs. DNS is a critical service, it should be reliable, fast, efficient. Actual implementation is not at the level of quality we expect from HA. All this discussion about this DNS/networking architecture is almost surreal... |
I think you are bringing up a good point. We need to make sure that Home Assistant logs a clear error message if no DNS servers have been provided via DHCP or manually. It would also be good if Home Assistant logged clear error messages if the health checks to the DNS servers fail. I will dig into what the logs look like today under these scenarios and see if any improvements can be made. |
|
Agree that this is a good point - I ran into this as I was investigating issues caused by robots aggressively scanning intentionally exposed web servers. Doing due diligence and troubleshooting when I ran into this CloudFlare DoH problem. (Yes it's a problem, as I had made no conscious choice as the Security Architect for this functional requirement). Had to do a lot of digging to realize what was going on. Disturbed as this is neither a configurable item nor is it mentioned in the documentation. This must be documented so that the requirement is known and security administrators will expect this in their logs. I would hope the developers would allow an option to, um, opt out. If not that then at least clearly document what is going on. |
|
I have had discussions with the devs in the past on this subject, and it boils down to this: They believe that more 'issues' will be raised, that they then have to deal with, if a fallback DNS service is removed. Implementing a fallback DNS is more common than you would think, and is done by small and large organisations alike. eg. All Google devices (Google Home Hub, Google Speakers Nest Hubs, Chromecast, Android Phones, Nest Thermostats etc.) are all hard coded to use Googles public DNS servers, some as a fallback, and some just ignore any local DNS settings all together. The difference is when, for example, you block access to Google public DNS on your local network, these devices carry on working without issues using the locally configured DNS. HA does not. Block access to CloudFlair on your local network and HA goes into meltdown, sending out millions of requests until eventually the system becomes overloaded and fails. IMHO removing this all together would be the best option, but if it has to stay, then at least fix the 'meltdown' issue if access to the fallback is blocked, and/or move away from DoT and use basic DNS on Port 53 which can be easily redirected to a local service. |
If that is true, they should implement a configurable option to disable it, leaving it as default (horrible decision). You can't create problems to many people because you "think" you might have more issues. Let these things be configurable, hardcoding these kind of things is really a bad choice, especially for an open-source solution.
Sorry but I mentioned enteprise software and apps, not hardware devices designed explicitly to work with cloud. And even in that case, it is bad practice and design. My firewall disables all kind of dns traffic from LAN to WAN. I enforce usage of my local DNS server, had to do it exactly because many bad designed firmware in devices behave like that. But you'll never find a commercial enterprise app with a fallback dns not configurable, because it's an awful design choice.
My firewall/router (OPNsense) has been configured to block all DNS traffic to the internet, except from the DNS server itself, that that runs on the firewall. HA doesn't meltdown, it works. But I stumbled across the DNS problem because sometimes, only some times, it starts wanting to use the CloudFlare servers, and it can't, so it starts not working. But 95% of the time it works fine. I didn't notice the "millions of queries" issue. What I noticed is that generally, HA is really SLOW in resolving hostnames, because of the awful dns design/implementation. |
core maintainer :
May we know what is/was the main issue that you solved with this ? |
May we kindly ask to know what is the "main issue" you tried to solve? Maybe there are better ways to solve it, without creating other issues.
So, if we suggest to disable the fallback, explaining why, you ask for all kinds of PR and options to be implemented, but when you implemented the fallback, you didn't provide everything needed to enable/disable it. May I kindly ask why you didn't put that option in first place when you decided for the hard-coded fallback? Last question: why would a system with a disabled fallback option be considered "unsupported"? I really hope HA is still an open-source project, because it seems like when users have different opinions vs the official devs, there are too many imposed obstacles to overcome. Collaboration should be the way to pursue, not the opposite. The DNS implementation of HA is clearly badly designed and inefficient (@fenichelar benchmarked it, it is amazingly slow and buggy). We're asking to improve it because we love HA and would like the product to be better than it currently is. I look forward to devs helping us help them realize that there's much to improve regarding DNS. But that doesn't seem the case... You guys, with all other contributors, made an incredible product, it's a shame that a fundamental functionality of the system is not at the same general level, but the worst thing is you don't even want to discuss about it. BTW: you also closed #55 which has nothing to do with #56, it tackles a totally separate issue (yes, DNS has several issues). |
I don't have that kind of behaviour, but sometimes I have the problem that for some strange reasons HA decides to use cloudflare also for local resolution, and it obviously fails because any DNS lookup to the internet is blocked. See #55. This is what I see in the dns logs, with internet dns traffic blocked (default rule in my firewall): Would love to reproduce the same issue you have... |
|
As you can see from the reference blurb above I'm asking that his behavior and associated functional requirements be documented. As it stands now this is a hidden feature and causing some heartburn and performance issues for some of us. Let's see if we can get this documented. |
I agree that documenting it would be more appropriate, but it wouldn't solve the fundamental issue. At least we would know the reasons behind the decision to enforce a harcoded fallback without opt-out option. Will be very interesting reading it. |
|
@alexdelprete I am able to reproduce the @tescophil described. Try running |
|
@pvizeli Would a PR that changes the Cloudflare DNS servers to UDP instead of TLS be accepted? This would allow the DNS queries to be redirected to a different server while still providing failover like it does today. |
yes, all those green locks you see mean "DNSSEC validated" query. That's AdGuardHome on OPNsense, that acts as LAN DNS server, it then forwards all requests to unbound on OPNsense that is properly configured for DNSSEC, etc. I use AdGuard to filter, and unbound to do the real requests. These are the DIG tests: I don't see all those queries... |
|
@alexdelprete You won't see the queries because they use TLS. Setup packet logging in your pfSense firewall rules that block TCP port 853 and you should see all of the traffic. |
|
@alexdelprete |
|
This PR has been closed and possible/acceptable guidance has been provided. Meanwhile the discussion goes on and is no longer about the code provided in this PR but slowly becoming a more generic discussion. Please use our community forum or Discord chat for these purposes. Thanks! ../Frenck |
Frenck, what's the appropriate discord channel to discuss DNS? |
|
depends on what you'd like to discuss, probably helps to ask around on Discord for that matter, not GitHub. |
I am very much still focused on updating this PR. We are hoping to get an answer to this question: #56 (comment)
Can you please help us understand? The current failover implementation causes a lot of problems. For example, if internet connectivity is lost, HA slows to a halt on low powered devices because of the constant attempts to establish a TLS connection to Cloudflare. We are trying to understand what the goal of failover is so that this PR can be adjusted accordinly. Also, can you help with this?
|
This PR is closed. |
|
Thanks for the help :-/ |
|
@frenck We are all just trying to solve the problems we are facing. Two different PRs were implemented and both closed. When we try to discuss what PR would be accepted, we just get shot down. We appreciate the small amount of guidance received. But we were told to go do a bunch of software development if we want to solve this problem but when we try to get more guidance, we can't get any information. If you don't want to have this conversation on GitHub, then can you provide the specific location you want to have it? I understand there has been some hostility towards HA maintainers about this issue in the past. But that was coming from different people, don't punish us for their behavior. |
I'm trying to stop notifications and discussions on a closed PR; especially it's no longer discussing the actual code inside this PR. It generates tons of notifications that aren't actionable to any of us.
See: #56 (review)
See: #56 (comment)
We are not punishing, we don't consider it an issue, but are open for a change as written above. |
I am trying to discuss the code in this PR. I want to update the code so that it is accepted.
Can you add any more details to this comment? Specifically:
What issue?
What should be config flag be called? What should its type be?
Okay
Okay
What specifically should this flag do? Remove the Cloudflare DNS servers from the forward line, failover line, or both?
Does that mean Discord is the right place?
Are you saying that when Home Assistant losses internet connectivity the constant flood of TCP connection attempts slowing down HA is expected and therefore isn't an issue, is an unrelated issue, or is not something you have been able to reproduce? |
You can reopen it. If somebody decided to close it because there's no will for a serious discussion, doesn't mean it's a valid reason to close a PR and then use the closing as a reason not to discuss. Looks like a good method to you, but it's clear it's a method to avoid discussions.
That's the real issue that is neither logic nor understandable: can you, Pascal, or someone else explain why it's not an issue for you and it's an issue for many other people? We provided several arguments for which it IS an issue, but haven't read any serious factual argument from the ones who close PRs because they don't consider this an issue. You might as well close any possibility to open a PR for the DNS plugin, would avoid these discussions entirely. :)
That is not true: Pascal said that even following those guidelines, the systems that opt-out would be considered "unsupported". The reason for this? Not discussed, undisclosed. |
It's not an issue for them, full stop. They don't care if it's an issue for other users. The PR is closed, so you can't even discuss, because...it's closed, in agreement with who opened it. ;) |
This comment has been minimized.
This comment has been minimized.
|
also, stop mentioning Them, you'll get muzzled. |
|
I'm happy to discuss anything, all I asked was to stop discussing this on a closed PR. Yet, seems like that is ignored completely and additionally some snarky comments need to be made, which is really sad and doesn't motivate me at all... Locking down this PR as resolved/heated/off-topic. |
|
We make the decisions we do because with Home Assistant OS we are offering a solutions to users that works, for users that want to focus on about home automation, not system administration. We used to get a ton of issues with people incorrectly configuring their DNS and then Home Assistant stops working. All those issues disappeared when we did this. If you're interested in doing DNS stuff, probably Home Assistant OS is not for you. Consider using a Supervised or Container installation. Now Pascal did mention a way the forwarding can be resolved, but that people in this PR consider too much work nor do commenters agree with it being marked as unsupported. You can't have it all. Stray off the easy path, accept the consequences. If you want it to remain supported, it means it becomes our problem, because we are supporting it then. We are not interested in taking up any extra work that a) slows down any future improvements and b) benefits only a minority of our users. And as usual, any topic about DNS ends up with insults towards us, so it's closed. We are not interested in further discussing this topic. |






If DNS servers have been manually configured, use the manually configured DNS servers only. Otherwise if DNS servers have been supplied by DHCP, use the DNS servers supplied by DHCP only. Otherwise use Docker Embedded DNS server.
#20