This repository has been archived by the owner on Mar 7, 2021. It is now read-only.
simplecrawler 0.7.0
This release includes a range of fixes that were originally targeted for 1.0.0, but after several delays, we decided to cut an intermediary release before 1.0.0, which will break some of the existing APIs in favor of more flexible solutions.
Features
- Added support for `srcset` attributes in the default `discoverRegex`, as per #246
- Added a config option to follow initial redirects to a different domain, as per #234 (@tobli and @thejoshcrawford)
- Added an `Accept` header to the default headers, as per #228
- Added support for decompressing compressed response bodies. This feature sits behind the `crawler.decompressResponses` flag, which is now true by default, and adds an `Accept-Encoding` header to the default ones. Added in #261
- 404 and 410 HTTP statuses now emit different events: `fetch404` and `fetch410`, as per #224 (@tmpfs)
- Added a config option `crawler.whitelistedMimeTypes` that can be used to customize the behavior of `crawler.fetchWhitelistedMimeTypesBelowMaxDepth` (@autarc)
Bug fixes
- The port number has been removed from the `Host` header when crawling sites that use HTTPS, as per #241
- Fixed a bug where the crawler would register multiple crawl intervals if the `start` method was called multiple times, as per #243
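The multiple-interval fix boils down to making `start` idempotent. The sketch below shows the guard pattern with an illustrative `MiniCrawler` class; the names and fields are hypothetical, not simplecrawler's internals.

```javascript
// Minimal sketch of an idempotent start() guard: a second call must not
// register another crawl interval. MiniCrawler is an illustrative stand-in.
class MiniCrawler {
    constructor() {
        this.crawlIntervalID = null; // handle of the active crawl interval
        this.registrations = 0;      // how many intervals were ever registered
    }

    start() {
        // Guard: if an interval is already running, don't add another one
        if (this.crawlIntervalID !== null) return this;
        this.registrations++;
        this.crawlIntervalID = setInterval(() => { /* fetch next item */ }, 250);
        return this;
    }

    stop() {
        if (this.crawlIntervalID !== null) clearInterval(this.crawlIntervalID);
        this.crawlIntervalID = null;
        return this;
    }
}

const crawler = new MiniCrawler();
crawler.start().start(); // second call is a no-op
console.log(crawler.registrations); // 1
crawler.stop();          // clear the interval so the process can exit
```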
Misc
- Code clean-up by using `Array.prototype.some` instead of `Array.prototype.reduce` in multiple places (@autarc)
- The README has been given some much needed love: errant examples have been fixed, an index has been added, some clean-up has been done, and an FAQ section has been added (@fredrikekelund, @cgiffard and @hbakhtiyor)