Use nine hostnames per line instead of one #49
Hi @lewisje, I've thought about this. I occasionally find myself eyeballing various regions of the hosts file, for various reasons, and it seems much easier to scan a single column. If we went to multiple hosts per line, I think I would keep it to 80 to 100 columns wide, or thereabouts, which would certainly impose a constraint of fewer than nine hosts per line. Know what interests me greatly? Metrics for the performance of hosts files as a function of orthogonal factors such as
So far I've anecdotally seen few benefits, one way or another. The hosts file lookup appears to sit high enough in the latency stack that it may not be worth fretting about. Either way, I'm curious to know. |
I think I should figure out how to measure this precisely. I'm thinking this suggestion is more akin to delivering a minified JS file for wide-scale Web deployment while retaining a properly spaced-out JS file for development. |
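For anyone who wants a low-effort starting point, a minimal sketch of timing one resolution follows. This assumes a Linux-like system with getent; it is an illustrative suggestion, not the methodology used in the tests later in this thread.

```shell
# Time one name resolution through the system resolver (NSS), which
# consults /etc/hosts before DNS on typical Linux configurations.
# "localhost" is used here only because it is guaranteed to resolve locally.
time getent hosts localhost
```

Repeating the call and comparing the first run against subsequent runs gives a rough feel for cache effects.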
@StevenBlack wrote:
That will be extremely valuable information if anyone performs the testing. I'm amazed that detailed tests have not already been publicly documented. Cross-platform testing is essential, and will enhance the value of the data even further. |
Hey guys, I ran some short tests. First of all, it's important to mention that I did NOT do anything statistically evaluable here: just one try for every test case, no repetition. Just a "let's see where this could possibly lead" thingy.

System
Client: Windows 7 Desktop
Connection: wired gigabit Ethernet

Test Case
Remote DNS server is 85.214.20.141 (https://digitalcourage.de/support/zensurfreier-dns-server)

Results
I used a hosts file with 355,981 entries. It is a 0.0.0.0-only file, with no ::1 entries.
S = single entry (one host per line), size 11 MB

Unblocked Sites
Blocked Sites
Note: For this case I added the ::1 entries for googleanalytics and zzzha, so the AAAA request doesn't get forwarded.

Single-entry to nine-entries-per-line conversion - Bash script
I wrote a short script so you can try it yourself. It takes the input hosts file as its argument and writes the file hosts_nine.

#!/bin/bash
echo "127.0.0.1 localhost" > hosts_nine
cat $1 | grep "^0" | sed "s/0\.0\.0\.0//g" | tr -d "\n" | egrep -o '\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+' | sed 's/^/0\.0\.0\.0 /g' >> hosts_nine

NOTE: There will be 0-8 entries missing in the generated file. With a base file of 300,000+ entries this is "okay" for testing purposes, I hope. This behaviour is a result of "let's not put too much time into this and live with the bias". The problem here is the egrep expression: if the leftover entries at the end of the file don't make a full group of nine, they are dismissed. |
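As a hypothetical alternative that keeps the trailing partial group: using paste to group hostnames pads the final line rather than dropping it. All file names and sample data below are illustrative only, not from the tests above.

```shell
# Sketch of a nine-per-line conversion that does not lose the trailing
# partial group: paste cycles stdin across nine columns and leaves the
# last, incomplete group padded instead of discarding it.
# Generate a small illustrative sample file with 10 entries.
seq 1 10 | sed 's/^/0.0.0.0 host/; s/$/.example/' > hosts_sample
echo "127.0.0.1 localhost" > hosts_nine
grep '^0\.0\.0\.0 ' hosts_sample | awk '{print $2}' \
  | paste -d' ' - - - - - - - - - \
  | sed 's/^/0.0.0.0 /; s/ *$//' >> hosts_nine
cat hosts_nine
```

With 10 sample entries this yields one full nine-host line plus a one-host remainder line, instead of silently discarding the tenth entry.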
Thank you @hd074, that's vastly interesting. This seems to confirm what I've seen through informal observation: not much, if any, measurable benefit. |
Next thing: (127.0.0.1 + ::1) vs (0.0.0.0 + ::) and file size

Again: I did NOT do anything statistically evaluable here.

Test Case 1: 127.0.0.1 vs 0.0.0.0
Since the last test had shown that there's no real difference between cached and uncached entries when using blocked host names, I did not test this separately this time.

Results
L = localhost version (127.0.0.1 and ::1)
Surprise, surprise: the DNS request itself does not differ. That's what we expected.

Test Case 2: File size
I just compared the results from both tests (355,981 vs 712,131 entries).

NOTE: What I compared here is the following:
The fact that the second file doesn't contain new "unique" entries (it's just all the 0.0.0.0 entries duplicated as ::1 entries) MAY have an impact on the results. The point is that I can't (and don't want to) look into dnsmasq. Nonetheless, the results show the same behaviour as when I moved from a pure 0.0.0.0 hosts file with 25,000 entries to a pure 0.0.0.0 hosts file with 355,000+ entries some time ago.

Results
Doubled file size, but the response time is not doubled. When I moved from a small file to an approximately ten-times-larger file some time ago, the response time increased from 0.032 to 0.050 (if I remember correctly). So the file size itself does not seem to have a very big impact on response time... if using dnsmasq. |
This is great! |
@hd074 This is _fantastic_ data you are generating. For completeness, is this 32-bit or 64-bit Win7? Is it Win7 or Win7 SP1? Also, which edition of Windows are you testing? |
@StevenBlack thank you very much. @Gitoffthelawn thanks to you, too. Further relevant:
I think that in your script, where you have |
@lewisje you're right, thank you. Corrected it. |
I forgot another tiny thing: You could also match for the start of the line and for a space after |
So is there a best methodology that can be adopted based on this dataset? |
See also #47 for more related discussion. |
Regarding OS X, see also the Open Radar bug "Long /etc/hosts entries lead to unbearably slow resolution" (rdar://24237290) and the response of an Apple engineer.
I guess that means that nine hostnames per line is a best practice for both Windows and Mac. |
It means that a 9-hosts-per-line file performs better than a more-than-9-hosts-per-line file (on a Mac). I don't really see the advantage of the nine-hosts-per-line method (vs a single entry per line). My concerns regarding this method are readability and maintainability. |
The way I understood it, Windows doesn't read hostnames after the ninth on a line, so the maximum for that platform is nine per line. I had remembered that OS X could read 24 per line (never tested higher) but bogged down; I wasn't aware that 10 was the tipping point (and that 9 is still within the safe zone for a Mac). With that said, it definitely is easier to maintain a list of hostnames with one per line and then output a nine-per-line version for deployment. |
@lewisje Maybe I got you wrong. I thought "9 entries is best practice" was referring to the whole "1 entry vs 9 entries vs X entries" problem. In that case I did not and do not agree.
Given that the only benefit of this proposed readability decrease is file-size reduction, it does not seem to be worth it. Even on mobile devices this file-size difference is not significant.
So closing this now. |
Are there any dnsmasq settings that would load the full hosts file into memory and thereby make everything quicker, or is that the default?
@RoelVdP dnsmasq caches the hosts file(s) in memory by default, and it's by far the fastest DNS resolver. If there are any slowdowns on your end, you need to look for the problem elsewhere.
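For reference, the relevant knobs look roughly like this in dnsmasq.conf. This is a sketch: the file path and cache size are illustrative, while `addn-hosts` and `cache-size` are standard dnsmasq options.

```
# Extra hosts file, read into memory at startup alongside /etc/hosts.
# The path below is illustrative.
addn-hosts=/etc/hosts.blocklist
# Size of the cache for upstream DNS answers. Hosts-file entries are
# held in memory regardless of this setting.
cache-size=10000
```

Sending dnsmasq a SIGHUP makes it re-read its hosts files without a restart.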
@dnmTX thanks mate. Any way to check that it is effectively loaded into memory when the file is rather large? Also, any way to make the cache larger? Thank you, much appreciated.
@RoelVdP there is not really an easy way to check this, as everything cached in memory is in some hidden files, but I can assure you that this is the case. dnsmasq is designed to work from memory, and that is why it is so fast. Along with the given hosts file(s), it caches every response as well, so to check how effective it is, simply do
Now, you need to clarify how you are blocking those domains. There are two options; one is through the |
@dnmTX Thank you very much for the detailed reply. Excellent idea on the nslookup. Tried that, and results are about 0.5 seconds for first lookups. So, I am not using any special config in dnsmasq, but rather a large |
@RoelVdP I'm really not sure what you mean by that. As long as you point dnsmasq to the file, it will read it and cache it. The easiest way to check is from the system log (
Yeah, like a bunch. I went briefly through your script, and you can do some improvements to lower the size (entries) and make it more responsive:
You do realize what
I just looked at it and it's wrong. This list does not come with any comments or empty lines, and when I tried the command it was soooo slow. So for this one (only) just use
Another TIP: Some lists come with a bunch of comments at the top and that's it; the rest is only domain entries, so in this case (after confirmation, aka visual inspection) use: |
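The kind of pre-filtering being discussed can be sketched like this. The file names and the sample list are illustrative only, not the actual lists or the exact commands from this exchange.

```shell
# Sketch: strip comment lines and blank lines from a downloaded list
# before merging it, so the resolver has fewer lines to parse.
# Create a small illustrative raw list.
printf '# header\n\n0.0.0.0 ads.example\n# note\n0.0.0.0 trk.example\n' > list_raw
grep -v '^#' list_raw | grep -v '^[[:space:]]*$' > list_clean
cat list_clean
```

As noted above, skip the comment-stripping step for lists that are already pure domain entries; running filters over lines that don't need them only slows things down.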
@RoelVdP this will be my last post here, as we really went OFF TOPIC on this one and I know... some... are not happy about it. So good luck, and I hope whatever I posted above helps make your project better. 👍 |
If there are multiple hostnames on a line, the names after the first are treated as aliases for the first, which means that it takes less time to load the file; it also trims file size by minimizing the number of occurrences of the redirect IP address in the file.
Even 24 hostnames per line works in Unix-like systems (though too many names per line has problems of its own), but Windows ignores any hostnames on a line after the first nine, so nine per line is ideal: http://forum.hosts-file.net/viewtopic.php?p=16438&sid=3e0ec8605c66da5a6a4bdd1bb49b5fbb#p16438