
simplecrawler 0.7.0

@fredrikekelund released this 07 May 16:48

This release includes a range of fixes that were originally targeted for 1.0.0, but because of several delays, we decided to cut an intermediary release before 1.0.0, which will break some of the existing APIs in favor of more flexible solutions.

Features

  • Added support for srcset attributes in the default discoverRegex, as per #246
  • Added config option to follow initial redirects to a different domain, as per #234 (@tobli and @thejoshcrawford)
  • Added Accept header to default headers, as per #228
  • Added support for decompressing compressed response bodies. This feature is controlled by the crawler.decompressResponses flag, which is now true by default, and it adds an Accept-Encoding header to the default headers. Added in #261 (see the sketch after this list)
  • 404 and 410 HTTP statuses now emit different events: fetch404 and fetch410, as per #224 (@tmpfs)
  • Added config option crawler.whitelistedMimeTypes that can be used to customize the behavior of crawler.fetchWhitelistedMimeTypesBelowMaxDepth (@autarc)
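
Below is a minimal sketch of how the new options and events from this release could be combined. It assumes the hostname-based constructor and start() method from the 0.x README; the target host, depth value, and MIME-type patterns are placeholders for illustration, not recommendations.

```js
var Crawler = require("simplecrawler");

// Placeholder host — substitute the site you actually want to crawl
var crawler = new Crawler("example.com");

// Response bodies are decompressed by default in 0.7.0; set the flag
// explicitly only to opt out (false) or to make the behavior visible (true)
crawler.decompressResponses = true;

// Only fetch whitelisted MIME types once maxDepth has been exceeded,
// and customize which types count as whitelisted (illustrative patterns)
crawler.maxDepth = 2;
crawler.fetchWhitelistedMimeTypesBelowMaxDepth = true;
crawler.whitelistedMimeTypes = [/^text\/css/i, /^image\//i];

// 404 and 410 responses now emit separate events
crawler.on("fetch404", function(queueItem, response) {
    console.log("404 Not Found:", queueItem.url);
});
crawler.on("fetch410", function(queueItem, response) {
    console.log("410 Gone:", queueItem.url);
});

crawler.start();
```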

Bug fixes

  • The port number has been removed from the Host header when crawling sites that use HTTPS, as per #241
  • Fixed a bug where the crawler would register multiple crawl intervals if the start method was called multiple times, as per #243

Misc

  • Code clean-up by using Array.prototype.some instead of Array.prototype.reduce in multiple places (@autarc)
  • The README has been given some much-needed love: errant examples have been fixed, an index and an FAQ section have been added, and general clean-up has been done (@fredrikekelund, @cgiffard and @hbakhtiyor)