Description

A library to guide a web crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even for very large crawls (on the order of one billion pages). It is usable as a Frontera backend.
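
As background, the core idea is to score frontier URLs by link structure rather than by discovery order. Below is a minimal, illustrative sketch of plain PageRank power iteration over a small in-memory graph; it is not aduana's API (aduana computes scores incrementally in C so it can scale to billion-page crawls):

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {dst for dsts in links.values() for dst in dsts}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page keeps a base score, plus a damped share of the
        # rank of each page that links to it.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for src, dsts in links.items():
            for dst in dsts:
                new_rank[dst] += damping * rank[src] / len(dsts)
        rank = new_rank
    return rank

# Tiny usage example: "c" is linked to twice, so it scores highest.
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))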

Warning: I test regularly only under Linux, my development platform. From time to time I also test on OS X and on Windows 8 using MinGW64.

Installation

pip install aduana
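
To use it as a Frontera backend, you point a Frontera settings module at aduana. The sketch below shows the general shape only; the backend path and the page-database option name are my assumptions, so check the readthedocs documentation for the authoritative settings.

# frontera_settings.py -- illustrative sketch, not verbatim from the docs.
# ASSUMPTION: the backend path and PAGE_DB_PATH option are placeholders;
# consult the aduana documentation for the real names.
BACKEND = 'aduana.frontera.Backend'
PAGE_DB_PATH = 'crawl-data'   # on-disk database of pages and links
MAX_NEXT_REQUESTS = 256       # standard Frontera setting: batch size per cycle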

Documentation

Available at readthedocs

I have started documenting plans/ideas at the wiki.

Example

Single spider example:

cd examples
pip install -r requirements.txt
scrapy crawl example
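
The example spider is an ordinary Scrapy spider; request ordering comes from the frontier, not from the spider itself. Here is a hypothetical sketch of that shape (the seed URL and parsing logic are illustrative, not the example's actual code):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    # ASSUMPTION: any seed page works here; this URL is illustrative.
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Yield every outgoing link; the aduana-backed frontier decides
        # which requests to schedule next based on link-structure scores.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)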

To run the distributed crawler, see the docs.
