
addFetchCondition and addDownloadCondition not working correctly anymore #360

Closed

JessyCat92 opened this issue Mar 16, 2017 · 6 comments

@JessyCat92 commented Mar 16, 2017

What happened?

I just updated to version 1.1.0 and also changed addFetchCondition and addDownloadCondition to work asynchronously (I also tried them synchronously; same behaviour). If I use any download condition, the crawler gets stuck at the download stage, and if I use only a fetch condition, it skips linked documents.

What should have happened?

It should work like it did in 1.0.3.

Steps to reproduce the problem

I think it's best if I explain what we are doing. We have a test HTML page containing a link to an XML page. In version 1.0.3 everything was crawled if I simply used the following (this code was only to isolate the problem; normally we have regexp logic there):

crawler.addFetchCondition(function (queueItem, referrerQueueItem) {
    // in version 1.1.0 I changed this to callback(true)
    return true;
});

but after adding the fetch condition, it only crawls the HTML page and completely ignores the link to the XML file (so crawler.on("fetchstart", function (queueItem) { console.log(queueItem.url); }); is no longer executed for the XML file either).

When using a download condition instead, I added

crawler.addDownloadCondition(function (queueItem, referrerQueueItem) {
    // in version 1.1.0 I changed this to callback(true)
    return true;
});

and after the update to 1.1.0 we only get fetchstart console output for the HTML file; after that output, nothing happens anymore (so no fetchcomplete).

@choerl commented Mar 16, 2017

Behavior is reproducible

@fredrikekelund (Collaborator)

Thanks for reporting an issue, @geramy92! This looks to have been an issue in the documentation. The callbacks use the Node.js convention of taking any potential error as the first argument and the result as the second.

Could you try to change your code to callback(null, true) and see if it works as expected?
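
Something like this, roughly (a minimal sketch; I'm assuming the async form where the callback is passed as the third argument to the condition function):

crawler.addFetchCondition(function (queueItem, referrerQueueItem, callback) {
    // Error-first callback: null for the error, then the boolean result
    callback(null, true);
});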

@JessyCat92 (Author)

Yes, with this variant it works.
So the documentation and examples are incorrect. But I think that if the first parameter is an error (per the usual Node.js convention), it should be handled, or at least surfaced somehow, instead of silently running into issues like this.

@fredrikekelund (Collaborator)

You're absolutely right that the documentation was faulty; I've already updated it to keep others from running into the same issue.

We've added two new events to deal with potential errors from fetch conditions and download conditions: fetchconditionerror and downloadconditionerror. If an error is encountered, simplecrawler will treat it the same as the condition having returned false (i.e. it won't add the item to the queue or download the resource). Does that make sense?
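
As a rough sketch, you could listen for them like this (I'm assuming each event passes the queue item and the error, in that order):

crawler.on("fetchconditionerror", function (queueItem, error) {
    // Treated as if the condition returned false: the item isn't queued
    console.error("Fetch condition error for " + queueItem.url, error);
});

crawler.on("downloadconditionerror", function (queueItem, error) {
    // Treated as if the condition returned false: the body isn't downloaded
    console.error("Download condition error for " + queueItem.url, error);
});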

@JessyCat92 (Author)

Yes, I think that's a good way to solve it.

@fredrikekelund (Collaborator)

Cool! Let me know if you run into any other issues around this.
