addFetchCondition and addDownloadCondition not working correctly anymore #360
Comments
Behavior is reproducible
Thanks for reporting an issue, @geramy92! This looks to have been an issue in the documentation. The callbacks use the node convention of taking any potential error as the first argument, and the result as the second argument. Could you try to change your code to
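For illustration, here is a minimal sketch of the error-first callback convention the maintainer describes. The condition logic and the queue-item fields used here are hypothetical, not taken from the thread:

```javascript
// Sketch of a fetch condition using the node-style callback convention:
// the FIRST callback argument is a potential error, the SECOND is the
// boolean result. Passing the boolean as the first argument would be
// interpreted as an error, not as the decision.
function fetchCondition(queueItem, referrerQueueItem, callback) {
    // Hypothetical rule: only fetch URLs on the same host as the referrer.
    var allowed = queueItem.host === referrerQueueItem.host;
    callback(null, allowed); // null = no error; decision goes second
}

// Registered on a crawler instance roughly like:
// crawler.addFetchCondition(fetchCondition);
```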
Yes, with this variant it is working.
You're absolutely right that the documentation was faulty; I've already updated it to keep others from running into the same issue. We've added two new events to deal with potential errors from fetch conditions and download conditions:
Yes, I think that's a good way to solve it.
Cool! Let me know if you run into any other issues around this.
What happened?
I just updated to version 1.1.0 and also changed addFetchCondition and addDownloadCondition to work async (I also tried sync, with the same behaviour). If I use any download condition, the crawler gets stuck at the download, and if I use only a fetch condition, it skips linked documents.
What should have happened?
It should work like it did in 1.0.3.
Steps to reproduce the problem
It's very helpful if you include a code sample here (including a URL to the site you tried to crawl)
I think it's best if I explain what we are doing.
We have a test HTML page containing a link to an XML page. In version 1.0.3, everything was crawled if I simply used (this code was only to check the problem; normally we have regexp logic there)
but after adding the fetch condition, it only crawls the HTML page and completely ignores the link to the XML files (so also
crawler.on("fetchstart", function(queueItem) { console.log(queueItem.url); });
is not executed for the XML file anymore. In case of using a download condition, I added
and after the update to 1.1.0, we get a fetchstart console output only for the HTML file, but after this console output nothing happens anymore (so no fetchcomplete).
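For completeness, a download condition presumably follows the same error-first callback convention, which would explain the hang: calling back with the boolean decision as the first argument reads as an error, so the download never proceeds. The signature, condition logic, and field names below are assumptions for illustration:

```javascript
// Sketch of an async download condition under the assumed convention:
// the decision is returned via callback(error, result). Calling
// callback(true) would signal an error rather than "yes, download".
function downloadCondition(queueItem, response, callback) {
    // Hypothetical rule: only download XML documents.
    var isXml = /xml/i.test((queueItem.stateData && queueItem.stateData.contentType) || "");
    // Defer to the next tick to illustrate that the decision may be async.
    process.nextTick(function () {
        callback(null, isXml); // error first (null), decision second
    });
}

// crawler.addDownloadCondition(downloadCondition);
```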