inet_db, inet_hosts: improve inet_db behaviour when .hosts file changes #2516
Avoid re-parsing a changed file over and over; also fix the race condition that occurs when the ETS table with IP <-> name mappings is first deleted, then re-created.
5cbb70d to fb39f10
Could you describe in more detail what changes you have made, and why?
It seems to be using lookup in the hot paths instead of match, which is a mandatory speedup if you're querying 100k+ names per second. It uses a set instead of a bag. Also, it does not clear the ETS representation of the hosts file every time the file changes, but adds the difference. It looks like a big performance win if you're doing a lot of gethostbyname calls.
@vans163 Exactly that, thank you. The most important fix is the smarter handling of .hosts file timestamp changes. The existing code reads the file and removes all entries from the ETS table, then adds the entries from the file one by one. So even if the contents of the .hosts file are unchanged, there is quite a long window for race conditions (especially if your installation has 15,000 entries in the .hosts file). The suggested code first adds missing entries, then removes deleted entries. This is still not atomic (I wish there were an atomic ets:swap(Tab1, Tab2)), but at least name resolution does not fail while the ETS table is being re-populated.
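The "add missing entries first, then remove deleted entries" order described above can be sketched as follows. This is only a minimal illustration, not the code from this PR: the module name hosts_sync, the function names, and the {Name, IP} tuple layout are all assumptions made for the example, and the table is assumed to be a set keyed on the first tuple element.

```erlang
-module(hosts_sync).
-export([sync/2, demo/0]).

%% Bring Tab in line with NewEntries (a list of {Key, Value} tuples)
%% without ever emptying the table. Hypothetical helper, not OTP code.
sync(Tab, NewEntries) ->
    %% Step 1: insert (or overwrite) every entry from the new file,
    %% so lookups for surviving names keep succeeding throughout.
    true = ets:insert(Tab, NewEntries),
    %% Step 2: only now delete keys that are no longer in the file.
    Keep = sets:from_list([K || {K, _} <- NewEntries]),
    Stale = [K || {K, _} <- ets:tab2list(Tab),
                  not sets:is_element(K, Keep)],
    lists:foreach(fun(K) -> ets:delete(Tab, K) end, Stale),
    ok.

%% Small usage example: "a.example" disappears, "c.example" appears,
%% and "b.example" stays resolvable the whole time.
demo() ->
    Tab = ets:new(hosts_demo, [set, public]),
    ok = sync(Tab, [{"a.example", {10,0,0,1}}, {"b.example", {10,0,0,2}}]),
    ok = sync(Tab, [{"b.example", {10,0,0,2}}, {"c.example", {10,0,0,3}}]),
    lists:sort(ets:tab2list(Tab)).
```

As noted above, the two steps together are still not atomic: between step 1 and step 2 a reader may briefly see a stale entry, but it never sees an empty table.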
Also, in the existing implementation, if multiple processes try to resolve a name concurrently and the .hosts file has changed, every process schedules a file reload. The file is then reloaded multiple times, aggravating the race condition. This diff avoids multiple reloads.
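One way to avoid the repeated reloads described above is to let a single server process remember the last modification time it acted on, so that concurrent requests reporting the same change collapse into one reload. The sketch below is an assumption-laden illustration, not the mechanism used in this PR: the module reload_once, its API, and the state layout are all hypothetical, and the actual file parsing is left out.

```erlang
-module(reload_once).
-behaviour(gen_server).
-export([start_link/0, maybe_reload/2, reload_count/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() -> gen_server:start_link(?MODULE, [], []).

%% MTime is whatever modification time the caller observed for the file.
maybe_reload(Pid, MTime) -> gen_server:call(Pid, {maybe_reload, MTime}).

%% Number of reloads performed so far (for the demo only).
reload_count(Pid) -> gen_server:call(Pid, reload_count).

init([]) -> {ok, #{mtime => undefined, reloads => 0}}.

%% Same MTime as the one already handled: nothing to do.
handle_call({maybe_reload, MTime}, _From, #{mtime := MTime} = S) ->
    {reply, unchanged, S};
%% New MTime: reload once and remember it. Calls are serialized by the
%% server, so later callers with the same MTime hit the clause above.
handle_call({maybe_reload, MTime}, _From, #{reloads := N} = S) ->
    %% A real implementation would parse the file and update ETS here.
    {reply, reloaded, S#{mtime := MTime, reloads := N + 1}};
handle_call(reload_count, _From, #{reloads := N} = S) ->
    {reply, N, S}.

handle_cast(_Msg, S) -> {noreply, S}.
```

With this shape, ten processes that all notice the same timestamp change trigger a single reload; the remaining nine calls return unchanged.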
This PR certainly points out a bunch of weak spots in how [...] However, I have read the code and, if I managed to understand it right, I have some concerns...
I have pushed a branch to my GitHub account with a commit on top of this that tries to fix the points above. It has not even been tested in our daily builds yet, but it might be interesting to look at for the discussion: master...RaimoNiskanen:raimo/kernel/inet_db-hosts-file. I guess what I should do is write test cases to expose the first two bullet points above, which would be considered bugs, and also get regression tests on the undocumented behaviour of [...]
I am fairly confident that I have a solution to the clustering update problem for the /etc/hosts file, which I have pushed to my branch mentioned above. While at it, I added a few micro-optimizations of timestamp handling and ETS lookups too. The solution moves reading file info to the server in [...] I will throw this into our daily tests on Monday and add the regression test I talked about. I would be delighted to hear if my branch works as well in your use case as yours, or if I have introduced new bugs...
I just force-pushed an update - a missing [...]
I have force pushed another update. This time the first commit is a regression test. The rest of my branch is as before. This PR branch with just the regression test commit fails the regression test:
With this PR's commit reverted it passes, and with my rewrites it passes. I have added my branch to our daily tests; we'll have a result from those in a few days... @max-au The question remains whether my branch works in your use case.
@max-au Although this is a performance fix, it is quite a big rewrite of (albeit often unused) core functionality, so it would be more appropriate to have it on 'master'. Is that OK?
@RaimoNiskanen Looking at your fix, it is totally superior! I agree this turned into a feature worth putting on master rather than maint. Unfortunately, I do not have a test to confirm whether it works or not. However, it's not a terribly difficult test case to write.
@max-au Thank you for your feedback! R16 - it sure has been a while... Back then I guess it could have been possible used [...] The test case sounds like fun to write! I'll see if I can figure out a not too dirty way to ensure that the hosts file is reloaded only once... Regarding this PR (there is no competing PR): if you change the target branch to 'master' (that should work, since there have been no changes in this area on 'maint'), then when I merge my rewrite, which is on top of this PR, to 'master', this PR should be marked as merged into 'master'...
IIRC we tried using heap-based structures, and fprof reported a significant regression. At that time we did not realise fprof was not the right tool to measure performance. Now we have a tool that suits better, but we haven't re-measured.
Curious, if it's no secret: what tool did you use instead? fprof does not report GC time, so the odd case where producing tons of garbage is much faster in execution but slower once GC cleans up won't be noticed by fprof, AFAIK. EDIT: This made me curious, so I did a small benchmark. It seems that on a table lookup 'miss', lookup is 2x faster than a lookup_element wrapped in try/catch. I think most use cases will see a miss here in 99.9% of cases. I did it in Elixir, so maybe its try/catch implementation is much worse than Erlang's?
:inets.start
:ssl.start
# get benchwarmer dep
{:ok, {_, _, req1}} = :httpc.request 'https://raw.githubusercontent.com/mroth/benchwarmer/master/lib/benchwarmer/results.ex'
{:ok, {_, _, req2}} = :httpc.request 'https://raw.githubusercontent.com/mroth/benchwarmer/master/lib/benchwarmer.ex'
Code.eval_string(req1)
Code.eval_string(req2)
tab = :ets.new(Tab, [:public, :ordered_set])
:ets.insert(tab, {1, 1})
Benchwarmer.benchmark fn ->
  try do :ets.lookup_element(tab, 1, 2) catch _, _ -> nil end
end
Benchwarmer.benchmark fn ->
  try do :ets.lookup_element(tab, 2, 2) catch _, _ -> nil end
end
Benchwarmer.benchmark fn ->
  case :ets.lookup(tab, 1) do
    [] -> nil
    [{_, a}] -> a
  end
end
Benchwarmer.benchmark fn ->
  case :ets.lookup(tab, 2) do
    [] -> nil
    [{_, a}] -> a
  end
end
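For comparison, the same two miss-handling styles can be written directly in Erlang. This is a rough sketch rather than a rigorous benchmark: the module name lookup_styles and its functions are made up for the example, timer:tc/1 is only a coarse timer, and no particular speed ratio is claimed.

```erlang
-module(lookup_styles).
-export([via_lookup/2, via_lookup_element/2, time_miss/1]).

%% Style 1: ets:lookup/2 returns [] on a miss, no exception involved.
via_lookup(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [] -> undefined;
        [{_, V}] -> V
    end.

%% Style 2: ets:lookup_element/3 raises badarg on a miss, so it needs
%% a try/catch. Note this also hides a badarg caused by a missing table.
via_lookup_element(Tab, Key) ->
    try ets:lookup_element(Tab, Key, 2)
    catch error:badarg -> undefined
    end.

%% Time N misses (key 2 is absent) with each style, in microseconds.
time_miss(N) ->
    Tab = ets:new(t, [public, ordered_set]),
    true = ets:insert(Tab, {1, 1}),
    Loop = fun(F) ->
                   lists:foreach(fun(_) -> F(Tab, 2) end, lists:seq(1, N))
           end,
    {T1, ok} = timer:tc(fun() -> Loop(fun via_lookup/2) end),
    {T2, ok} = timer:tc(fun() -> Loop(fun via_lookup_element/2) end),
    #{lookup_us => T1, lookup_element_us => T2}.
```

Calling lookup_styles:time_miss(1000000) in a shell gives two timings to compare; a dedicated benchmarking tool is the better choice for anything beyond a quick check.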
@vans163 For quick perf-related experiments, we use erlperf (https://github.com/max-au/erlperf). It's very simple yet powerful enough, and lets us quickly see concurrency-related issues. For fprof, we use a patched version that does record GC and suspend time. I cannot remember or even find the original patch (I suspect @RaimoNiskanen is the author), so I just shared what we use, max-au@17f39c8, for your convenience.
Even if [...] It evens out in the long run between function calls, so comparing between different Erlang functions works.