=========
scrapelib
=========

.. image:: https://travis-ci.org/jamesturk/scrapelib.svg?branch=master
    :target: https://travis-ci.org/jamesturk/scrapelib

.. image:: https://coveralls.io/repos/jamesturk/scrapelib/badge.png?branch=master
    :target: https://coveralls.io/r/jamesturk/scrapelib

.. image:: https://img.shields.io/pypi/v/scrapelib.svg
    :target: https://pypi.python.org/pypi/scrapelib

.. image:: https://readthedocs.org/projects/scrapelib/badge/?version=latest
    :target: https://readthedocs.org/projects/scrapelib/?badge=latest
    :alt: Documentation Status

scrapelib is a library for making requests to less-than-reliable websites. As of
version 0.7 it is implemented as a wrapper around `requests <http://python-requests.org>`_.

scrapelib originated as part of the `Open States <http://openstates.org/>`_
project to scrape the websites of all 50 state legislatures. As a result it
was designed with features that are useful when dealing with sites that
have intermittent errors or require rate-limiting.

Advantages of using scrapelib over alternatives like httplib2, or over simply using
requests as-is:

* All of the power of the superb `requests <http://python-requests.org>`_ library
* HTTP, HTTPS, and FTP requests via an identical API
* Support for simple caching with pluggable cache backends
* Request throttling
* Configurable retries for non-permanent site failures

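The retry behaviour in the list above can be illustrated with a small, self-contained sketch. The helper ``retry_call`` is hypothetical and is not scrapelib's API or implementation. It makes two assumptions: "non-permanent" failures surface as ``ConnectionError`` or ``TimeoutError``, and the wait doubles after each failed attempt (exponential backoff).

```python
import time


def retry_call(func, retry_attempts=2, retry_wait_seconds=1.0,
               retriable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call ``func`` and retry on transient errors with exponential backoff.

    Hypothetical helper for illustration only; it is not part of scrapelib.
    ``retry_attempts`` counts retries after the first try, so ``func`` is
    called at most ``retry_attempts + 1`` times.
    """
    for attempt in range(retry_attempts + 1):
        try:
            return func()
        except retriable:
            if attempt == retry_attempts:
                raise  # out of retries: surface the last transient error
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(retry_wait_seconds * (2 ** attempt))


# Usage: a function that fails twice with a transient error, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []
result = retry_call(flaky, retry_attempts=3, retry_wait_seconds=0.5,
                    sleep=waits.append)  # record waits instead of sleeping
```

Injecting ``sleep`` keeps the sketch testable without real delays. Exceptions outside ``retriable``, such as a ``ValueError``, propagate immediately, in keeping with the idea that permanent failures should not be retried.
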
Written by James Turk <james.p.turk@gmail.com>, with thanks to Michael Stephens for the initial urllib2/httplib2 version.

See https://github.com/jamesturk/scrapelib/graphs/contributors for contributors.

Requirements
============

* Python 2.7, 3.3, 3.4
* requests >= 2.0 (earlier versions may work but aren't tested)

Example Usage
=============

Documentation: http://scrapelib.readthedocs.org/en/latest/

::

    import scrapelib
    s = scrapelib.Scraper(requests_per_minute=10)

    # Grab Google front page
    s.get('http://google.com')

    # Will be throttled to 10 HTTP requests per minute
    while True:
        s.get('http://example.com')
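
With ``requests_per_minute=10``, a limit enforced as a fixed minimum gap between requests would let the loop above send at most one request every 60 / 10 = 6 seconds. The class below is a minimal sketch of that minimum-interval approach. ``MinimumIntervalThrottle`` is a hypothetical name, and scrapelib's own throttling code may work differently.

```python
import time


class MinimumIntervalThrottle:
    """Space calls at least ``60 / requests_per_minute`` seconds apart.

    Hypothetical illustration only; not scrapelib's implementation.
    ``clock`` and ``sleep`` are injectable so the logic can be checked
    without actually waiting.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now


# Usage with a fake clock, so no real time passes.
fake_time = [0.0]
slept = []


def fake_clock():
    return fake_time[0]


def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds


throttle = MinimumIntervalThrottle(10, clock=fake_clock, sleep=fake_sleep)
throttle.wait()          # first request: no waiting
fake_time[0] += 1.0      # the next request is attempted one second later
throttle.wait()          # must wait the remaining 5 of the 6 seconds
```

In a real loop, calling ``throttle.wait()`` immediately before each request spaces the requests evenly. A token-bucket design would be an alternative if short bursts were acceptable.
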