dns.lookup shares a thread pool with disk operations #2868

Closed
ntoshev opened this Issue Mar 5, 2012 · 19 comments

Projects

None yet
@ntoshev

If you generate a lot of DNS requests, they get queued and become slow. Disk I/O gets in the same queue so it is slowed down as well.

@Southern

When you say "disk I/O", are you referring to fs or the actual I/O of Node?

If you're referring to fs, the problem is that both fs and dns both use process.nextTick() callbacks. A simple solution to avoid this issue is to use child_process or cluster to create a couple of threads to handle the massive amounts of requests you require.

@thejh
@Southern

@thejh Okay. Maybe using "threads" wasn't the best word to choose. Should have known someone would be a technical brat. Regardless of whether child_process and cluster are true threads, each has their own process variable. Thus the callbacks for each process.nextTick should function independently. This eliminates the need to "revive" code that isn't broken in the first place.

This is not a Node error. It is a user error. You act as if this wouldn't happen in another coding language where a lot of requests were made back-to-back, using the same queued callback for everything.

By the way, where was your suggestion on how to fix this? If you're here to TroLLz, can I play too?

@ntoshev

I'm referring to fs. See this thread for context:

http://groups.google.com/group/nodejs/browse_thread/thread/6f564ac1dcd650bf/100bc657172b88e2

I solved my own problem using dns.resolve4 and then handing the IP to an http request, the performance difference is amazing. I also contributed a patch to the fetch module (https://github.com/andris9/fetch) doing that. I'm a little surprised no other http client library implements this. It would be the bottleneck of many network-bound projects.

@thejh
@bnoordhuis
Node.js Foundation member

To clear up any misunderstandings, what happens here is that getaddrinfo() DNS lookups and async file I/O share the same thread pool in C land. Said thread pool has limited concurrency so if DNS lookups are slow because the nameserver is down (to name something), that also slows down file ops. It's a libuv/libeio issue.

@arnaud-lb

This is a very serious issue, as queuing only 4 DNS queries will delay everything else using the thread pool.

Considering that a DNS query can take more than 20 seconds with default timeout/retry settings, if the thread pool queue is filled with DNS queries, everything can be delayed for minutes or even hours.

@jorangreef

Are there any blockers to Node using dns.resolve internally for net.connect etc? And leaving dns.lookup for those that need it?

@bnoordhuis
Node.js Foundation member

I've responded to your question on the mailing list. For posterity:

Is it really necessary that Node's net.connect etc. still use dns.lookup
internally?

Yes, for several reasons. For example, c-ares doesn't speak mDNS and
probably never will, its notion of what the upstream DNS servers are
and in what order they should be tried can be wildly at odds with the
system resolver, and so on.

tl;dr net.connect() and friends won't be switching to dns.resolve() anytime soon.

The scalability of the thread pool is a subject of ongoing research; it's possible to speed up specific workloads by a factor of 10 - but not, it seems, without regressing other workloads by a factor of 5 or more. That won't do, of course.

If you want node.js to use more threads, run export UV_THREADPOOL_SIZE=<num-threads> before starting node.

@bnoordhuis bnoordhuis closed this Oct 23, 2013
@bnoordhuis
Node.js Foundation member

Let me add that a change that makes the DNS resolver configurable in the call to net.connect() might be acceptable. (Should target master, come with docs and regression tests, etc. See CONTRIBUTING.md for details.)

The API I have in mind is something like this:

net.connect({
  host: "nodejs.org",
  port: 80,
  resolver: function(host, type, callback) {
    return dns.resolve(host, type, callback);
  }
});

It's up for debate if the 'type' argument should be called 'family' instead and be a number, like how dns.lookup() works. I'm open to well-reasoned suggestions.

@jorangreef
@tjfontaine

also for posterity there are things like http://npmjs.org/package/native-dns which implement dns in terms of node primitives such that you're not subject to the thread pool

@bminer bminer referenced this issue in EmergingTechnologyAdvisors/node-serialport Dec 24, 2013
Closed

Uncaught exceptions can cause node to hang #282

@bminer

Can this issue be re-opened? I like the API that @bnoordhuis has proposed. :)

@bnoordhuis - Is it also possible to revise dns.lookup to accept either a family or rrtype? Instead of accepting 4 or 6, it could accept any of these: 4, 6, A or AAAA. This would allow one to use dns.lookup just like dns.resolve for resolving A or AAAA records.

Thoughts?

@indutny
Node.js Foundation member

I'm ok with it, but have no time to work on it :(

@bminer

That's okay. I will try to take a shot at it over the next few days here. Just too busy at the moment.

At any rate, if the issue is re-opened, it might get a bit more attention.

Thanks!

@bminer bminer referenced this issue in visionmedia/superagent Dec 27, 2013
Closed

Add option for DNS resolving #306

@bminer

I think that there are a few issues here:

1.) The API for net.connect and for http.request should be changed to what @bnoordhuis proposed, allowing one to specify a resolver option to handle DNS resolution.
2.) dns.lookup should be revised to accept either a family or rrtype. Instead of accepting 4 or 6, it could accept any of these: 4, 6, A or AAAA. This would allow one to use dns.lookup just like dns.resolve for resolving A or AAAA records and make the API a bit more user-friendly.
3.) Most importantly: For most users who just want to use the default dns.lookup routine and utilize the operating system's resolver, DNS lookups should probably use a separate thread pool, or alternatively, they should not be allowed to fill up the entire shared thread pool. As @arnaud-lb mentioned, DNS queries can take more than a few seconds, which can significantly (and undesirably) slow other asynchronous I/O operations. This might be unacceptable in time-sensitive applications (i.e. reading from a hardware device such as a serial port).
4.) dns.resolve and friends read /etc/resolv.conf when Node starts to determine which nameservers to use. Unfortunately, C-Ares does not expose a ares_reinit() function to re-read this file if it changes. This can make using the other DNS functions a little less reliable than dns.lookup at times.

With all that being said... How should this issue be handled? Should a new issue be opened? Should this one be re-opened?

@trevnorris

Going to reopen since we seem to be cool w/ the possibility of implementation. But yeah, core doesn't have time for this now. PR's will be welcome. :)

@trevnorris trevnorris reopened this Jan 7, 2014
@bminer

@trevnorris - Thanks for re-opening! If time permits, I will work on a PR.

@chrisdickinson

Closing this in favor of the reduced issue #8475.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment