Add timeout #83

Open
joshribakoff opened this Issue Aug 16, 2017 · 2 comments


joshribakoff commented Aug 16, 2017

cd repos/node_modules/broken-link-checker/
[josh@localhost broken-link-checker]$ grep -r 'timeout' .

No timeout option exists.

#67

The timeout is defined by the operating system and is usually 2000ms. Setting Node's http timeout to anything longer than that will not override the OS setting. Setting to anything shorter might be insufficient.

I've implemented timeouts in Node.js just fine. If there is some kind of OS issue, you could use setTimeout() and clearTimeout() to work around it.

What happens is that BLC tries to make an HTTP request to some random IP that isn't running a web server. The whole scraper/queue blocks until the OS times out the TCP socket, like you said. That happens because I enqueue only 1 page at a time, then wait until it completes before enqueuing the next page. Basically I'm running multiple instances of BLC on different servers and reading URLs off my own queue system, as opposed to using the internal queue, which would only be local to 1 server.

In your script you can implement a user-land timeout that defaults to 20 seconds and can be customized. Simply use setTimeout to register a callback that removes your listener and resumes the queue. If the request completes before this callback fires, use clearTimeout. Behind the scenes the OS may still be trying to open a TCP connection, but there's no reason the BLC queue has to block. This is a huge performance problem. Instead of testing 100s of URLs a second, I'm only getting a throughput of about 1 URL a second after averaging in all the blocking, and I have to check 100s of thousands of URLs.
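
To make the idea concrete, here is a minimal sketch of the setTimeout/clearTimeout approach described above. Note that `checkWithTimeout`, `startCheck`, and `done` are hypothetical names I made up for illustration; none of this is BLC's or bhttp's actual API. It assumes the request is started by a function that accepts a Node-style callback.

```javascript
'use strict';

// Hypothetical helper, NOT part of broken-link-checker or bhttp.
// `startCheck(cb)` is assumed to start one request and call cb(err, result).
// `done(err, result)` is called exactly once: with the request's outcome,
// or with a timeout error after `timeoutMs` milliseconds (default 20 s).
function checkWithTimeout(startCheck, done, timeoutMs = 20000) {
  let settled = false;

  // User-land timeout: give up waiting and let the caller move on.
  const timer = setTimeout(() => {
    if (settled) return;
    settled = true;
    const err = new Error(`Timed out after ${timeoutMs} ms`);
    err.code = 'ETIMEDOUT';
    // The OS may still be trying to open the TCP connection in the
    // background, but the queue no longer has to wait for it.
    done(err);
  }, timeoutMs);

  startCheck((err, result) => {
    if (settled) return; // late response after the timeout: ignore it
    settled = true;
    clearTimeout(timer); // request finished first, so cancel the timer
    done(err, result);
  });
}
```

A caller would pass its own function that kicks off the link check, plus a callback that enqueues the next URL. This only stops the queue from waiting; it does not abort the underlying socket, so that would still need to be handled wherever the request is actually made.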

Owner

stevenvachon commented Aug 16, 2017

If a request is not timing out, then this is an issue with bhttp.

Author

joshribakoff commented Aug 16, 2017

It does time out eventually (after about 30-60 seconds). I want to lower the timeout to maybe 5-10 seconds, but this library provides no API to do so.

@stevenvachon stevenvachon added the bug label Nov 8, 2017
