COOKLER

Cookler is a Ruby library that provides spidering and analysing features. It helps you quickly write a program that retrieves the content and statistics of any website. The API is based on Anemone (http://anemone.rubyforge.org) and uses MongoDB as its storage solution. Cookler has a multi-threaded, easy-to-use design.

Usage

#!/usr/bin/env ruby

require 'cookler'

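# Each target maps a name to an array of four elements:
#   1. the URL where crawling starts
#   2. a Regexp that links must match
#   3. the class or id selector of the HTML element to catch
#   4. a flag; when true, links on the currently crawled page that do
#      not match the Regexp are deleted
targets = {
  :target_site1 => [
    "http://target.domain.com/starting/path/",
    /\/starting\/path\/example\/uri\/regexp\/.+-[0-9]*[a-z]*$/,
    ".htmlelement_class_or_id_to_catch",
    true
  ],
}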

Cookler.analyze(targets, 15)
  • The retrieved data and generated stats are stored in MongoDB, in 'cookler-db'.
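
Before starting a crawl, it can help to check which links the Regexp from the usage example would keep. The following is a minimal sketch in plain Ruby that does not call Cookler at all. The `filter_links` helper and the sample URIs are hypothetical and exist only for illustration; how Cookler applies the pattern internally may differ.

```ruby
# Pattern copied from the usage example above.
LINK_PATTERN = /\/starting\/path\/example\/uri\/regexp\/.+-[0-9]*[a-z]*$/

# Hypothetical URIs, invented purely to exercise the pattern.
SAMPLE_URIS = [
  "http://target.domain.com/starting/path/example/uri/regexp/article-123abc",
  "http://target.domain.com/starting/path/example/uri/regexp/article-",
  "http://target.domain.com/starting/path/other/page",
  "http://target.domain.com/starting/path/example/uri/regexp/nohyphen"
].freeze

# Hypothetical helper (not part of Cookler): splits the URIs into
# those the pattern matches and those it does not.
def filter_links(uris, pattern)
  uris.partition { |uri| uri.match?(pattern) }
end

kept, dropped = filter_links(SAMPLE_URIS, LINK_PATTERN)

kept.each    { |uri| puts "keep  #{uri}" }
dropped.each { |uri| puts "drop  #{uri}" }
```

Note that the pattern requires a hyphen after the path prefix, while the digits and letters that follow it are optional, so a URI ending in "article-" still matches.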

Bug tracker

Have a bug? Please create an issue here on GitHub!

https://github.com/gastounage/cookler/issues

Authors

Alexandre Mootassem