Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Fetching contributors…

Cannot retrieve contributors at this time

23 lines (18 sloc) 0.575 kB
Features:
  • HTTP, HTTPS, FTP requests
  • HTTP caching, compression, cookies
  • redirect following
  • request throttling
  • robots.txt compliance (optional)
  • robust error handling

Example:

import scrapelib
s = scrapelib.Scraper(requests_per_minute=10, allow_cookies=True,
                      follow_robots=True)

# Grab Google front page
s.urlopen('http://google.com')

# Will raise RobotExclusionError
s.urlopen('http://google.com/search')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.urlopen('http://example.com')
Jump to Line
Something went wrong with that request. Please try again.