Anemone web-spider framework

Anemone

Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

Features:

  • Multi-threaded design for high performance

  • Tracks 301 HTTP redirects to understand a page's aliases

  • Built-in BFS algorithm for determining page depth

  • Allows exclusion of URLs based on regular expressions
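To illustrate the last two features, here is a minimal sketch of how a breadth-first traversal assigns a depth to each page while skipping URLs that match a regular expression. It is not Anemone's own code: the `crawl_depths` method, the `site_links` hash, and the paths in it are hypothetical, and an in-memory link graph stands in for real HTTP requests.

```ruby
# Hypothetical in-memory site: each path maps to the links found on that page.
site_links = {
  '/'            => ['/about', '/blog', '/login'],
  '/about'       => ['/', '/team'],
  '/blog'        => ['/blog/post-1', '/about'],
  '/blog/post-1' => ['/team'],
  '/team'        => [],
  '/login'       => ['/admin']
}

# Breadth-first traversal starting at +start+.
# Returns a hash mapping each reached URL to its depth (link distance from +start+).
# Links matching any regular expression in +skip_patterns+ are never queued.
def crawl_depths(links, start, skip_patterns = [])
  depths = { start => 0 }
  queue = [start]

  until queue.empty?
    url = queue.shift
    links.fetch(url, []).each do |link|
      next if depths.key?(link)                               # already seen
      next if skip_patterns.any? { |pattern| link =~ pattern } # excluded
      depths[link] = depths[url] + 1
      queue << link
    end
  end

  depths
end

depths = crawl_depths(site_links, '/', [/login/, /admin/])
depths.each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is processed in first-in, first-out order, the first time a page is reached is always along a shortest link path, so the recorded depth is the minimum number of clicks from the start page. In this sketch `/team` gets depth 2 via `/about`, even though it is also linked from the deeper `/blog/post-1`.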

Examples

See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.

Requirements

  • nokogiri
