Patu is a small spider
A small spider, useful for checking a site for 404 and 500 responses. Patu requires httplib2 and lxml:

pip install -U httplib2 lxml

Quick Usage

To see available options: --help

To spider an entire site using 5 workers, only showing errors: --spiders=5

To spider, stopping after the first level of links: --depth=1

To get a list of every linked page on a site: --generate > urls.txt

To read URLs from a file instead of spidering for them, and show all responses: --input=urls.txt --verbose
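
To make the effect of --depth concrete, here is a minimal Python sketch of a depth-limited, breadth-first crawl. It is not Patu's actual code: the names `crawl`, `get_links`, and `max_depth` are hypothetical, and it assumes that the start page counts as depth 0, so a limit of 1 visits the pages linked from the start page without following their links.

```python
from collections import deque


def crawl(start, get_links, max_depth=None):
    """Breadth-first walk from start; return URLs in visit order.

    get_links is a hypothetical callable returning the URLs linked
    from a given URL. max_depth=None means no depth limit.
    Assumption: the start page is depth 0.
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        # At the depth limit, visit the page but do not follow its links.
        if max_depth is not None and depth >= max_depth:
            continue
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

In a real spider, `get_links` would fetch the page and extract its links; passing a plain dictionary lookup is enough to try the traversal on its own.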

Format of URLs File

The output produced by --generate is formatted like so:


--input can take a file of that format, or one URL per line with no referer. --input=- reads from stdin.
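
As a rough illustration of the two accepted line shapes, the following Python sketch reads such a file. It is not Patu's own parser: the function names `read_url_lines` and `open_input` are hypothetical, and it assumes the URL and referer are separated by whitespace, which this document does not show.

```python
import sys


def read_url_lines(lines):
    """Yield (url, referer) pairs; referer is None for URL-only lines.

    Assumption: URL and referer are separated by whitespace. The real
    delimiter written by --generate is not shown in this README.
    """
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        url = parts[0]
        referer = parts[1] if len(parts) > 1 else None
        yield url, referer


def open_input(path):
    """Return an iterable of lines; '-' means stdin, mirroring --input=-."""
    return sys.stdin if path == "-" else open(path)
```

A caller could then loop over `read_url_lines(open_input("urls.txt"))` and treat a `None` referer as "no referer known".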


Patu uses Nose for testing. To install Nose and test:

pip install -U nose
