Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
34 lines (21 sloc) 591 Bytes

Cookbook

Crawl a web page

The most simple way to use our program is with no arguments. Simply run:

python main.py -u <url>

to crawl a webpage.

Crawl a page slowly

To add a delay to your crawler, use :option:`-d`:

python main.py -d 10 -u <url>

This will wait 10 seconds between page fetches.

Crawl only your blog

You will want to use the :option:`-i` flag, which while ignore URLs matching the passed regex:

python main.py -i "^blog" -u <url>

This will only crawl pages that contain your blog URL.