This repository has been archived by the owner on Mar 7, 2021. It is now read-only.
simplecrawler 0.7.0
This release includes a range of fixes that were originally targeted for 1.0.0, but after several delays, we decided to cut an intermediary release before 1.0.0, which will break some of the existing APIs in favor of more flexible solutions.
Features
- Added support for `srcset` attributes in the default `discoverRegex`, as per #246
- Added a config option to follow initial redirects to a different domain, as per #234 (@tobli and @thejoshcrawford)
- Added an `Accept` header to the default headers, as per #228
- Added support for decompressing compressed response bodies. This feature sits behind the `crawler.decompressResponses` flag, which is now true by default, and adds an `Accept-Encoding` header to the default ones. Added in #261
- 404 and 410 HTTP statuses now emit different events: `fetch404` and `fetch410`, as per #224 (@tmpfs)
- Added a config option `crawler.whitelistedMimeTypes` that can be used to customize the behavior of `crawler.fetchWhitelistedMimeTypesBelowMaxDepth` (@autarc)
Bug fixes
- The port number has been removed from the `Host` header when crawling sites that use HTTPS, as per #241
- Fixed a bug where the crawler would register multiple crawl intervals if the `start` method was called multiple times, as per #243
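The multiple-interval fix boils down to making `start` idempotent. The sketch below shows the guard pattern with an illustrative `MiniCrawler` class; the names and fields are hypothetical, not simplecrawler's internals.

```javascript
// Minimal sketch of an idempotent start() guard: a second call must not
// register another crawl interval. MiniCrawler is an illustrative stand-in.
class MiniCrawler {
    constructor() {
        this.crawlIntervalID = null; // handle of the active crawl interval
        this.registrations = 0;      // how many intervals were ever registered
    }

    start() {
        // Guard: if an interval is already running, don't add another one
        if (this.crawlIntervalID !== null) return this;
        this.registrations++;
        this.crawlIntervalID = setInterval(() => { /* fetch next item */ }, 250);
        return this;
    }

    stop() {
        if (this.crawlIntervalID !== null) clearInterval(this.crawlIntervalID);
        this.crawlIntervalID = null;
        return this;
    }
}

const crawler = new MiniCrawler();
crawler.start().start(); // second call is a no-op
console.log(crawler.registrations); // 1
crawler.stop();          // clear the interval so the process can exit
```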
Misc
- Code clean-up by using `Array.prototype.some` instead of `Array.prototype.reduce` in multiple places (@autarc)
- The README has been given some much needed love: errant examples have been fixed, an index has been added, some clean-up has been done, and an FAQ section has been added (@fredrikekelund, @cgiffard and @hbakhtiyor)