Changes since v0.2.0


  • Added ":redirect_limit" option (default 5);
  • Ensure that hyperlinks that can't be followed are skipped. Examples:

  • The old "anemone_count" and "anemone_pagedepth" CLI scripts are now available through the new anemone command;

  • No more global Anemone.options; parameters hash is unique per crawl. This enables running multiple crawls with different parameters;
  • Added ":traverse_up => false" option to restrict the crawl to the given paths;
  • Skip URLs ending in file extensions like ".pdf", ".jpg", etc.;
  • Added ":allowed_urls" and ":skip_urls" options for specifying URL string or regexp patterns to follow or block, respectively.
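
The per-crawl options and URL filters listed above can be pictured with a small standalone sketch. It does not use Anemone itself: DEFAULTS, crawl_options, pattern_match?, follow_link? and the :skip_extensions key are hypothetical names made up for illustration, string patterns are assumed to match as substrings, and every default except :redirect_limit => 5 is an assumption.

```ruby
require "uri"

# Hypothetical defaults for illustration; only :redirect_limit => 5 comes
# from the changelog, the other values are assumptions.
DEFAULTS = {
  redirect_limit: 5,
  traverse_up: true,
  allowed_urls: [],
  skip_urls: [],
  skip_extensions: %w[.pdf .jpg]
}.freeze

# Build a fresh options hash for each crawl instead of mutating global state.
def crawl_options(overrides = {})
  DEFAULTS.merge(overrides)
end

# True if pattern (String or Regexp) matches url.
# Assumption: String patterns are treated as plain substrings.
def pattern_match?(pattern, url)
  pattern.is_a?(Regexp) ? !!(pattern =~ url) : url.include?(pattern)
end

# Decide whether a link should be followed under the given options.
def follow_link?(url, opts)
  path = URI(url).path.to_s
  return false if opts[:skip_extensions].any? { |ext| path.downcase.end_with?(ext) }
  return false if opts[:skip_urls].any? { |pat| pattern_match?(pat, url) }

  allowed = opts[:allowed_urls]
  allowed.empty? || allowed.any? { |pat| pattern_match?(pat, url) }
rescue URI::InvalidURIError
  false # a link that cannot even be parsed is skipped
end
```

Because crawl_options returns a new hash on every call, two crawls can run side by side with different settings, for example crawl_options(redirect_limit: 2) next to crawl_options(skip_urls: ["private"]).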


  • Crawl options are no longer an OpenStruct but a Hash;
  • Added Page#fetch(url) as a shortcut for Page.fetch(url, page);
  • Page body and links are now parsed lazily;
  • Added Page#discard_document! to delete the page body;
  • Introduced URI::Generic#path_with_query;
  • Refactored Anemone::HTTP.
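
As a rough illustration of lazy link parsing, discarding the page body, and a path-plus-query helper, here is a minimal standalone sketch. LazyPage, parse_count and the free function path_with_query are hypothetical stand-ins, not Anemone's actual Page class or its URI::Generic extension. The regexp-based link extraction and the choice to extract links before the body is discarded are assumptions made to keep the example self-contained.

```ruby
require "uri"

# Hypothetical stand-in for a crawled page; not Anemone's Page class.
class LazyPage
  attr_reader :url, :body, :parse_count

  def initialize(url, body)
    @url = URI(url)
    @body = body
    @parse_count = 0 # counts how often the body was actually parsed
  end

  # Links are extracted on first access and memoized afterwards.
  # Assumption: a naive href regexp stands in for a real HTML parser.
  def links
    @links ||= begin
      @parse_count += 1
      @body.to_s.scan(/href="([^"]+)"/).flatten.map { |href| @url.merge(href) }
    end
  end

  # Drop the body to save memory; links are extracted first so they survive.
  def discard_document!
    links
    @body = nil
  end
end

# Path plus query string, e.g. "/search?q=ruby"; just the path if no query.
def path_with_query(uri)
  uri.query ? "#{uri.path}?#{uri.query}" : uri.path
end
```

With this shape, a page whose links are never requested is never parsed, and a page whose body has been discarded can still report the links it contained.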
